About Us
Welcome to IMI, the student-run AI Safety research group at Imperial College London, focused on mechanistic interpretability! Our team runs several projects that aim to explore how AI systems work, from individual neurons to complex circuits. We want to contribute to making AI development more transparent and reliable.
Research Interests
We started with fundamental mechanistic interpretability techniques and are now branching out into our own research. Currently, we focus on:
- Circuit Detection: Mapping connections in neural networks to understand specific pathways and circuits, gaining insight into the roles of individual neurons (a minimal sketch of the idea follows this list).
- Neuron and Layer Functionality Mapping: Developing tools to identify and categorize the functions of neurons and layers, aiming to build a comprehensive view of how information is represented and processed at different depths of the model.
- Core Mechanistic Interpretability Studies: Our main projects currently include interpreting denoising diffusion models, understanding feature interactions between layers, and studying the sparse feature circuits behind refusal mechanisms in large language models.
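To make the circuit-detection idea concrete, here is a minimal, illustrative sketch (not our actual codebase) of activation patching on a toy PyTorch model: we cache an activation from a run on a "clean" input, splice it into a run on a "corrupted" input, and check how much of the clean behaviour is restored. The toy model, inputs, and choice of layer are all placeholders.

```python
# Illustrative sketch of activation patching, a standard circuit-detection
# technique: cache a "clean" activation, patch it into a "corrupted" run,
# and see whether the output moves back toward the clean behaviour.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a real network; any model whose layers we can hook works.
model = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 2),
)
target_layer = model[2]  # layer whose contribution we want to test

clean_x = torch.randn(1, 8)    # placeholder "clean" input
corrupt_x = torch.randn(1, 8)  # placeholder "corrupted" input

# 1. Cache the clean activation at the target layer.
cache = {}
def save_hook(module, inputs, output):
    cache["clean"] = output.detach()

handle = target_layer.register_forward_hook(save_hook)
clean_out = model(clean_x)
handle.remove()

# 2. Run the corrupted input, but patch in the cached clean activation.
def patch_hook(module, inputs, output):
    return cache["clean"]  # returning a tensor replaces the layer's output

handle = target_layer.register_forward_hook(patch_hook)
patched_out = model(corrupt_x)
handle.remove()

corrupt_out = model(corrupt_x)  # unpatched baseline for comparison

# 3. If patching this layer restores the clean output, the layer (or, with a
#    finer-grained patch, a specific neuron) lies on the circuit of interest.
print("clean:  ", clean_out)
print("corrupt:", corrupt_out)
print("patched:", patched_out)
```

In practice we apply the same patch-and-compare loop to individual attention heads, MLP neurons, or sparse features rather than whole layers, which is what lets us localise a behaviour to a specific circuit.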