Deep Reference Priors: What is the best way to pretrain a model?

by Yansong Gao, et al.

What is the best way to exploit extra data – be it unlabeled data from the same task, or labeled data from a related task – to learn a given task? This paper formalizes the question using the theory of reference priors. Reference priors are objective, uninformative Bayesian priors that maximize the mutual information between the task and the weights of the model. Such priors enable the task to maximally affect the Bayesian posterior. For example, reference priors depend upon the number of samples available for learning the task, and for very small sample sizes the prior puts more probability mass on low-complexity models in the hypothesis space. This paper presents the first demonstration of reference priors for medium-scale deep networks and image-based data. We develop generalizations of reference priors and demonstrate applications to two problems. First, by using unlabeled data to compute the reference prior, we develop new Bayesian semi-supervised learning methods that remain effective even with very few samples per class. Second, by using labeled data from the source task to compute the reference prior, we develop a new pretraining method for transfer learning that allows data from the target task to maximally affect the Bayesian posterior. Empirical validation of these methods is conducted on image classification datasets.
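To make the definition concrete, here is a minimal toy sketch, not the paper's method for deep networks. It assumes a one-parameter Bernoulli model with the parameter restricted to a discrete grid, and it maximizes the mutual information between the parameter and n observed samples using the standard Blahut-Arimoto iteration. The function names, grid size, and iteration count are all illustrative choices.

```python
import numpy as np
from math import comb

def binomial_likelihood(thetas, n):
    """Return P[i, k] = p(k successes in n draws | theta_i)."""
    ks = np.arange(n + 1)
    coeffs = np.array([comb(n, k) for k in ks], dtype=float)
    # NumPy evaluates 0.0 ** 0 as 1.0, so theta = 0 and theta = 1 are handled.
    return coeffs * thetas[:, None] ** ks * (1.0 - thetas[:, None]) ** (n - ks)

def kl_rows(P, marginal):
    """KL(p(. | theta_i) || marginal) for every row, with 0 * log 0 := 0."""
    safe_P = np.where(P > 0, P, 1.0)
    ratio = np.where(P > 0, safe_P / np.maximum(marginal, 1e-300), 1.0)
    return np.sum(P * np.log(ratio), axis=1)

def reference_prior(n, grid_size=51, iters=500):
    """Approximate the n-sample reference prior on a grid (toy assumption)."""
    thetas = np.linspace(0.0, 1.0, grid_size)
    P = binomial_likelihood(thetas, n)
    prior = np.full(grid_size, 1.0 / grid_size)  # start from a uniform prior
    for _ in range(iters):
        # Blahut-Arimoto step: reweight each theta by exp(KL to the marginal).
        prior = prior * np.exp(kl_rows(P, prior @ P))
        prior /= prior.sum()
    # Mutual information (in nats) between theta and the data under the prior.
    mi = float(prior @ kl_rows(P, prior @ P))
    return thetas, prior, mi

if __name__ == "__main__":
    for n in (1, 10):
        thetas, prior, mi = reference_prior(n)
        support = thetas[prior > 1e-3]
        print(f"n={n}: mutual information {mi:.3f} nats, support {support}")
```

With a single sample (n = 1), the maximizing prior in this toy setting concentrates on the two endpoints of the grid, which is one small-scale picture of how the prior depends on the number of samples.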




