Language models (LMs) are becoming the foundation for almost all major l...
In-context learning refers to the ability of a model to condition on a p...
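As background for the entry above, a minimal sketch of what in-context learning looks like in practice: the model is conditioned on a prompt containing a few input-label demonstrations and is then asked to complete a new input, with no parameter updates. The sentiment task, the demonstrations, and the query_lm call are hypothetical placeholders, not the setup studied in the paper.

    # Minimal in-context learning sketch: conditioning happens entirely
    # through the prompt text, with no weight updates.
    demonstrations = [
        ("the movie was a delight", "positive"),
        ("the plot made no sense", "negative"),
    ]
    test_input = "the acting was superb"

    # Concatenate the demonstrations in front of the new input; the model's
    # continuation after "Sentiment:" is read off as its prediction.
    prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in demonstrations)
    prompt += f"Review: {test_input}\nSentiment:"

    # prediction = query_lm(prompt)  # hypothetical LM interface
    print(prompt)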
We present a methodology for modifying the behavior of a classifier by d...
To improve model generalization, model designers often restrict the feat...
As machine learning systems grow in scale, so do their training data req...
We develop a methodology for assessing the robustness of models to subpo...
We study the roots of algorithmic progress in deep policy gradient algor...
Building rich machine learning datasets in a scalable manner often neces...
Dataset replication is a useful tool for assessing whether improvements ...
Deep neural networks have been demonstrated to be vulnerable to backdoor...
We show that the basic classification framework alone can be used to tac...
Many applications of machine learning require models that are human-alig...
Adversarial examples have attracted significant attention in machine lea...
Correctly evaluating defenses against adversarial examples has proven to...
We study how the behavior of deep policy gradient algorithms reflects th...
We provide a new understanding of the fundamental nature of adversariall...
Batch Normalization (BatchNorm) is a widely adopted technique that enabl...
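For reference on the entry above, the standard BatchNorm transformation (due to Ioffe and Szegedy; stated here as background, not as a contribution of the paper): each activation is normalized by its mini-batch statistics and then rescaled by learned parameters,

    \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta,

where \mu_B and \sigma_B^2 are the mini-batch mean and variance, \epsilon is a small constant for numerical stability, and \gamma, \beta are learned per-feature parameters.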
Machine learning models are often susceptible to adversarial perturbatio...
Recent work has shown that neural network-based vision classifiers exhib...
Recent work has demonstrated that neural networks are vulnerable to adve...
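As background for the adversarial-examples entries above, a common formalization from the robust optimization literature (a standard framing, not necessarily the exact objective used in any one of these papers): train the model against the worst-case perturbation within an \epsilon-ball around each input,

    \min_\theta \; \mathbb{E}_{(x,y) \sim \mathcal{D}} \Big[ \max_{\|\delta\| \le \epsilon} L(\theta, x + \delta, y) \Big],

where L is the training loss and \delta ranges over the allowed perturbations of the input x.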