Course description

What is the internal structure of modern neural networks and how can we study it? This course provides a broad and deep introduction to interpretability, the subfield of machine learning concerned with understanding precisely how models process information and why they produce the outputs they do. We will cover topics such as probing, steering, causal abstraction, and sparse autoencoders, with a particular emphasis on causal methods and large language models. The course will include guest lectures from leading interpretability labs across academia and industry.

Staff

Thomas Icard, Stanford, Instructor
Atticus Geiger, Goodfire, Instructor
Amir Zur, Stanford, Instructor
Jing Huang, Stanford, Instructor
Junyi Tao, Stanford, Teaching Assistant
Taka Yamakoshi, Stanford, Teaching Assistant
Siri Vatsavaya, Goodfire, Course Manager

Please reach the staff at cs221m-spr2526-staff@lists.stanford.edu.

Logistics

Coursework

The course will have five weeks of notebook-guided lectures, four weeks of guest lectures, and one week of final presentations. Students will be graded on their participation in lectures and on their final project.

Syllabus

Please download the syllabus here.

Schedule

Note: schedule is subject to change.

Date | Lesson | Readings | Materials
Week 1, Mon. March 30 | Introduction | |
Week 1, Wed. April 1 | Review of language models | | Slides, Interactive notebook
Week 2, Mon. April 6 | Behavioral analysis and input attribution | | Slides, Interactive notebook
Week 2, Wed. April 8 | Probes for decoding activations | | Slides, Interactive notebook
Week 3, Mon. April 13 | Causal methods for interpretability | | Slides
Week 3, Wed. April 15 | Interventions for steering activations | | Slides, Interactive notebook
Week 4, Mon. April 20 | Theory of causal abstraction | | Interactive notebook
Week 4, Wed. April 22 | Causal mediation analysis | | Slides, Interactive notebook
Week 5, Mon. April 27 | Designing counterfactuals | | Slides, Interactive notebook
Week 5, Wed. April 29 | Automated causal interpretability | Davies et al. 2023; Cao et al. 2020, 2022; Geiger et al. 2023 (DAS); Wu et al. 2023 (boundless DAS) |
Week 6, Mon. May 4 | Guest lecture: Chris Potts | |
Week 6, Wed. May 6 | Guest lecture: Jack Merullo | |
Week 7, Mon. May 11 | Guest lecture: David Bau | |
Week 7, Wed. May 13 | Mid-project check-in | | Final project description
Week 8, Mon. May 18 | Guest lecture: Neel Nanda | |
Week 8, Wed. May 20 | Guest lecture: Jing Huang | |
Week 9, Mon. May 25 | No lecture (Memorial Day) | |
Week 9, Wed. May 27 | Guest lecture: Jack Lindsey | |
Week 10, Mon. June 1 | Guest lecture: Naomi Saphra | |
Week 10, Wed. June 3 | Project presentations | | Final project description

Frequently asked questions

I submitted an application but had not heard back by March 27. Is it still possible to enroll in the course?

We have received more than 200 applications, far more than we initially expected. It is truly exciting to see so many students interested in interpretability! We have increased the course capacity to accommodate as many students as we can. However, we are constrained by resources such as course staff, project mentors, and compute, so at this point we do not plan to increase the class size further. We will likely offer the course again next year, so if you are still around, check it out next spring!

Can I audit this course without enrollment?

We generally do not allow auditing. However, you are more than welcome to attend the guest lectures, which will be in the second half of the course. We will also try to make most of the course materials public.

I have enrolled in the class, but cannot attend some lectures in person.

We value participation. Students are expected to attend all lectures and engage with the course materials. If you are unable to attend a lecture due to travel or other unforeseen circumstances, you must notify us by email in advance, i.e., before the lecture. Please include the date of the anticipated absence and the reason for your absence. We will follow up with you as necessary.

Will the lectures be recorded? Will the recordings be available online?

As this is the first offering of the course, lectures during the first half of the course will not be recorded. Guest lectures may be recorded and shared publicly at the discretion of the speakers.