Uncertainty Quantification in the Classification of High Dimensional Data

03/26/2017
by Andrea L. Bertozzi, et al.

Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of, a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based around the graph formulation of semi-supervised learning. We provide a unified framework which brings together a variety of methods that were introduced in different communities within the mathematical sciences. We study probit classification, generalize the level-set method for Bayesian inverse problems to the classification setting, and generalize the Ginzburg-Landau optimization-based classifier to a Bayesian setting; we also show that the probit and level-set approaches are natural relaxations of the harmonic function approach. We introduce efficient numerical methods, suited to large datasets, for both MCMC-based sampling and gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semi-supervised learning algorithms.
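For readers unfamiliar with the harmonic function approach mentioned above, the following is a minimal sketch of it on a small synthetic problem. It is not the paper's code: the fully connected Gaussian-weighted graph, the bandwidth `sigma`, the unnormalized Laplacian, the function name `harmonic_labels`, and the synthetic two-cluster data are all assumptions made for illustration. The sketch minimizes the graph Dirichlet energy subject to the known labels and classifies by the sign of the result.

```python
import numpy as np

def harmonic_labels(X, labeled_idx, y_labeled, sigma=1.0):
    """Harmonic-function semi-supervised classifier on a Gaussian-weighted graph.

    Hypothetical helper for illustration; sigma is an assumed kernel bandwidth.
    """
    # Pairwise squared distances and Gaussian edge weights (fully connected graph).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian

    n = X.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    y_labeled = np.asarray(y_labeled, dtype=float)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    # Minimize u^T L u subject to u = y on the labeled nodes, which gives
    # the linear system L_uu u_u = -L_ul y_l for the unlabeled nodes.
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    u = np.empty(n)
    u[labeled_idx] = y_labeled
    u[unlabeled_idx] = np.linalg.solve(L_uu, -L_ul @ y_labeled)
    return u

# Synthetic data: two well-separated clusters, one labeled point in each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 5)),
               rng.normal(2.0, 0.5, size=(20, 5))])
u = harmonic_labels(X, labeled_idx=[0, 20], y_labeled=[-1.0, 1.0])
labels = np.sign(u)  # binary classification by thresholding at zero
```

The harmonic solution yields a single real-valued function rather than a distribution over labels; the Bayesian models described in the abstract are what supply the posterior-based uncertainty measures.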


research
07/10/2020

Semi-supervised Learning for Multilayer Graphs Using Diffuse Interface Methods and Fast Matrix Vector Products

We generalize a graph-based multiclass semi-supervised classification te...
research
11/01/2021

Combating Noise: Semi-supervised Learning by Region Uncertainty Quantification

Semi-supervised learning aims to leverage a large amount of unlabeled da...
research
05/23/2018

Large Data and Zero Noise Limits of Graph-Based Semi-Supervised Learning Algorithms

Scalings in which the graph Laplacian approaches a differential operator...
research
09/14/2017

Fast semi-supervised discriminant analysis for binary classification of large data-sets

High-dimensional data requires scalable algorithms. We propose and analy...
research
10/23/2022

A study of uncertainty quantification in overparametrized high-dimensional models

Uncertainty quantification is a central challenge in reliable and trustw...
research
01/31/2019

Bayesian active learning for optimization and uncertainty quantification in protein docking

Motivation: Ab initio protein docking represents a major challenge for o...
research
10/30/2019

A Unified Framework for Data Poisoning Attack to Graph-based Semi-supervised Learning

In this paper, we proposed a general framework for data poisoning attack...
