Causal Abstraction for Faithful Model Interpretation
Atticus Geiger, Christopher Potts, Thomas Icard
Preprint
pdf
bibtex
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Atticus Geiger*, Zhengxuan Wu*, Christopher Potts, Thomas Icard, Noah D. Goodman
CLeaR 2024
pdf
bibtex
CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior
Eldar D. Abraham, Karel D'Oosterlinck, Amir Feder, Yair Gat, Atticus Geiger, Christopher Potts, Roi Reichart, Zhengxuan Wu
NeurIPS 2022
pdf
bibtex
Inducing Causal Structure for Interpretable Neural Networks
Atticus Geiger*, Zhengxuan Wu*, Hanson Lu*, Josh Rozner, Elisa Kreiss, Thomas Icard, Noah D. Goodman, Christopher Potts
ICML 2022
pdf
bibtex
Causal Abstractions of Neural Networks
Atticus Geiger*, Hanson Lu*, Thomas Icard, Christopher Potts
NeurIPS 2021
pdf
bibtex