Natural language is an appealing medium for explaining how large languag...
A number of recent benchmarks seek to assess how well models handle natu...
Obtaining human-interpretable explanations of large, general-purpose lan...
Causal abstraction is a promising theoretical framework for explainable...
A faithful and interpretable explanation of an AI model's behavior and i...
Causal abstraction provides a theory describing how several causal model...
Explainability methods for NLP systems encounter a version of the fundam...
The increasing size and complexity of modern ML systems has improved the...
Distillation efforts have led to language models that are more compact a...
In many areas, we have well-founded insights about causal structure that...
Structural analysis methods (e.g., probing and feature attribution) are...
We introduce DynaSent ('Dynamic Sentiment'), a new English-language benc...
Humans have a remarkable capacity to reason about abstract relational st...
In adversarial (challenge) testing, we pose hard generalization tasks in...
Deep learning models for semantics are generally evaluated using natural...
Standard evaluations of deep learning models for semantics using natural...