Transformer Uncertainty Estimation with Hierarchical Stochastic Attention

12/27/2021
by Jiahuan Pei, et al.

Transformers are state-of-the-art in a wide range of NLP tasks and have also been applied to many real-world products. Understanding the reliability and certainty of transformer model predictions is crucial for building trustworthy machine learning applications, e.g., medical diagnosis. Although many recent transformer extensions have been proposed, the study of uncertainty estimation for transformer models remains under-explored. In this work, we propose a novel way to enable transformers to estimate uncertainty while retaining their original predictive performance. This is achieved by learning a hierarchical stochastic self-attention that attends to values and a set of learnable centroids, respectively. New attention heads are then formed with a mixture of sampled centroids using the Gumbel-Softmax trick. We theoretically show that the self-attention approximation obtained by sampling from a Gumbel distribution is upper bounded. We empirically evaluate our model on two text classification tasks with both in-domain (ID) and out-of-domain (OOD) datasets. The experimental results demonstrate that our approach: (1) achieves the best predictive performance and uncertainty trade-off among the compared methods; (2) exhibits very competitive (and in most cases improved) predictive performance on ID datasets; and (3) is on par with Monte Carlo dropout and ensemble methods for uncertainty estimation on OOD datasets.
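
As a rough illustration of the centroid-based stochastic attention idea described in the abstract, the PyTorch sketch below replaces the key/value pairs of a single attention head with a set of learnable centroids and samples the attention weights with torch.nn.functional.gumbel_softmax. This is a simplified, single-level sketch for intuition only: the class name, centroid parameterization, and hyperparameters are illustrative assumptions and not the authors' implementation, which additionally attends hierarchically over values and centroids.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelCentroidAttention(nn.Module):
    """Illustrative sketch (not the paper's code): a single attention head whose
    keys/values are replaced by learnable centroids, with attention weights
    sampled via the Gumbel-Softmax trick so repeated forward passes yield
    stochastic outputs that can be used for uncertainty estimation."""

    def __init__(self, d_model: int, n_centroids: int = 32, tau: float = 1.0):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        # Learnable centroids act as a shared "codebook" the queries attend to
        # (hypothetical parameterization for this sketch).
        self.centroids = nn.Parameter(torch.randn(n_centroids, d_model))
        self.out_proj = nn.Linear(d_model, d_model)
        self.tau = tau  # Gumbel-Softmax temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q = self.q_proj(x)                                    # (B, T, D)
        logits = q @ self.centroids.t() / q.size(-1) ** 0.5   # (B, T, n_centroids)
        # Relaxed samples from the Gumbel-Softmax distribution; hard=True would
        # give one-hot samples with a straight-through gradient.
        attn = F.gumbel_softmax(logits, tau=self.tau, hard=False, dim=-1)
        mixed = attn @ self.centroids                          # mixture of sampled centroids
        return self.out_proj(mixed)


# Toy usage: multiple stochastic forward passes approximate predictive uncertainty.
layer = GumbelCentroidAttention(d_model=64)
x = torch.randn(2, 10, 64)
samples = torch.stack([layer(x) for _ in range(8)])
print(samples.var(dim=0).mean())  # higher variance suggests higher uncertainty
```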


Related research

10/18/2022
Uncertainty estimation for out-of-distribution detection in computational histopathology
In computational histopathology algorithms now outperform humans on a ra...

04/26/2023
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Large language models (LLMs) are known for their exceptional performance...

10/25/2022
Revisiting Softmax for Uncertainty Approximation in Text Classification
Uncertainty approximation in text classification is an important area wi...

10/12/2022
Deep Combinatorial Aggregation
Neural networks are known to produce poor uncertainty estimations, and a...

03/04/2023
Calibrating Transformers via Sparse Gaussian Processes
Transformer models have achieved profound success in prediction tasks in...

12/17/2020
Transformer Interpretability Beyond Attention Visualization
Self-attention techniques, and specifically Transformers, are dominating...

06/02/2022
BayesFormer: Transformer with Uncertainty Estimation
Transformer has become ubiquitous due to its dominant performance in var...
