Reinforcement learning from human feedback (RLHF) is a technique for tra...
Large Language Models (LLMs) perform complex reasoning by generating
exp...
Biological vision systems make adaptive use of context to recognize obje...
Language models are known to learn a great quantity of factual informati...
Recent work on explainable NLP has shown that few-shot prompting can ena...
Current abstractive summarization models either suffer from a lack of cl...
Many past works aim to improve visual reasoning in models by supervising...
Providing natural language instructions in prompts is a useful new parad...
Do language models have beliefs about the world? Dennett (1995) famously...
The problem of identifying algorithmic recourse for people affected by
m...
Feature importance (FI) estimates are a popular form of explanation, and...
Many methods now exist for conditioning model outputs on task instructio...
Influence functions approximate the 'influences' of training data-points...
Data collection for natural language (NL) understanding tasks has
increa...
Algorithmic approaches to interpreting machine learning models have
prol...
Vision models are interpretable when they classify objects on the basis ...
We provide code that produces beautiful poetry. Our sonnet-generation
al...