It's better to say "I can't answer" than answering incorrectly: Towards Safety critical NLP systems
In order to make AI systems more reliable and their adoption in safety critical applications possible, it is essential to impart the capability to abstain from answering when their prediction is likely to be incorrect and seek human intervention. Recently proposed "selective answering" techniques model calibration as a binary classification task. We argue that, not all incorrectly answered questions are incorrect to the same extent and the same is true for correctly answered questions. Hence, treating all correct predictions equally and all incorrect predictions equally constraints calibration. In this work, we propose a methodology that incorporates the degree of correctness, shifting away from classification labels as it directly tries to predict the probability of model's prediction being correct. We show the efficacy of the proposed method on existing Natural Language Inference (NLI) datasets by training on SNLI and evaluating on MNLI mismatched and matched datasets. Our approach improves area under the curve (AUC) of risk-coverage plot by 10.22% and 8.06% over maxProb with respect to the maximum possible improvement on MNLI mismatched and matched set respectively. In order to evaluate our method on Out of Distribution (OOD) datasets, we propose a novel setup involving questions with a variety of reasoning skills. Our setup includes a test set for each of the five reasoning skills: numerical, logical, qualitative, abductive and commonsense. We select confidence threshold for each of the approaches where the in-domain accuracy (SNLI) is 99%. Our results show that, the proposed method outperforms existing approaches by abstaining on 2.6% more OOD questions at respective confidence thresholds.
READ FULL TEXT