Estimating Predictive Uncertainty Under Program Data Distribution Shift

07/23/2021
by   Yufei Li, et al.

Deep learning (DL) techniques have achieved great success in predictive accuracy on a variety of tasks, but deep neural networks (DNNs) are known to produce highly overconfident scores even for abnormal samples. Well-calibrated uncertainty indicates whether a model's output should (or should not) be trusted and thus becomes critical in real-world scenarios, which typically involve input distributions shifted by many factors. Existing uncertainty approaches assume that testing samples drawn from a different data distribution induce unreliable model predictions and should therefore receive higher uncertainty scores. They quantify model uncertainty by calibrating a DL model's confidence on a given input, and they evaluate their effectiveness on computer vision (CV) and natural language processing (NLP) tasks. However, their reliability may be compromised on programming tasks because of differences in data representations and shift patterns. In this paper, we first define three types of distribution shift in program data and build a large-scale shifted Java dataset. We implement two common programming-language tasks on our dataset to study the effect of each distribution shift on DL model performance. We also propose a large-scale benchmark of existing state-of-the-art predictive-uncertainty methods on programming tasks and investigate their effectiveness under data distribution shift. Experiments show that program distribution shift does degrade DL model performance to varying degrees, and that all existing uncertainty methods exhibit limitations when quantifying uncertainty on program data.
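As one illustration of the confidence-calibration view of uncertainty mentioned above, the sketch below computes a model's predictive uncertainty as the entropy of an averaged softmax distribution over several stochastic forward passes (a common Monte Carlo dropout-style baseline). This is a generic example under assumed inputs, not the method evaluated in the paper; the simulated logits and function names are illustrative only.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predictive_entropy(probs):
    # Entropy of the predictive distribution; higher means more uncertain.
    return -np.sum(probs * np.log(probs + 1e-12), axis=-1)

# Simulate T stochastic forward passes for a single input (e.g., with
# dropout left on at test time); each row holds the logits of one pass.
rng = np.random.default_rng(0)
T, num_classes = 20, 3
logits = rng.normal(loc=[2.0, 0.0, -1.0], scale=0.5, size=(T, num_classes))

# Averaging the per-pass softmax outputs approximates the predictive
# distribution; its max probability is the model's confidence score.
mean_probs = softmax(logits).mean(axis=0)
confidence = mean_probs.max()
uncertainty = predictive_entropy(mean_probs)
```

Under a distribution shift, the expectation tested by such benchmarks is that `confidence` drops and `uncertainty` rises for out-of-distribution inputs; the paper's finding is that this assumption does not reliably hold for program data.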

Related research

- 06/06/2019: Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
- 12/13/2022: An Exploratory Study of AI System Risk Assessment from the Lens of Data Distribution and Uncertainty
- 06/11/2022: CodeS: A Distribution Shift Benchmark Dataset for Source Code Learning
- 05/14/2020: Estimating predictive uncertainty for rumour verification models
- 09/03/2021: Learning Neural Models for Natural Language Processing in the Face of Distributional Shift
- 12/17/2021: Can uncertainty boost the reliability of AI-based diagnostic methods in digital pathology?
- 04/24/2020: Towards Characterizing Adversarial Defects of Deep Learning Software from the Lens of Uncertainty
