SampleAhead: Online Classifier-Sampler Communication for Learning from Synthesized Data

by Qi Chen, et al.

State-of-the-art techniques in artificial intelligence, deep learning in particular, are mostly data-driven. However, collecting and manually labeling a large-scale dataset is both difficult and expensive. A promising alternative is to introduce synthesized training data, so that the dataset can be significantly enlarged with little human labor. This raises an important problem in active vision: given an infinite data space, how can we effectively sample a finite subset to train a visual classifier? This paper presents an approach for learning effectively from synthesized data. The motivation is straightforward: increase the probability of seeing difficult training data. We introduce a module named SampleAhead that formulates the learning process as an online communication between a classifier and a sampler, and updates them iteratively. In each round, we adjust the sampling distribution according to the classification results, and train the classifier using data sampled from the updated distribution. Experiments are performed by introducing synthesized images rendered from ShapeNet models to assist PASCAL3D+ classification. Our approach achieves higher classification accuracy, especially when the number of training samples is limited. This demonstrates its efficiency in exploring the infinite data space.
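The round structure described in the abstract (adjust the sampling distribution from classification results, then train on data drawn from the updated distribution) can be sketched roughly as follows. This is only an illustrative sketch, not the paper's method: the abstract does not state the update rule, so the error-proportional mixing below is an assumption, as are the discretization of the data space into bins and every name used here (`update_distribution`, `sample_ahead_loop`, `train_fn`, `error_fn`, `step`, `floor`).

```python
import random


def update_distribution(probs, error_rates, step=0.5, floor=0.01):
    """Shift sampling mass toward bins where the classifier errs more.

    Assumed rule (not from the paper): mix the current distribution with
    the normalized per-bin error rates, then floor and renormalize.
    """
    total_err = sum(error_rates)
    if total_err == 0:
        # No errors observed anywhere: fall back to a uniform target.
        target = [1.0 / len(probs)] * len(probs)
    else:
        target = [e / total_err for e in error_rates]
    mixed = [(1 - step) * p + step * t for p, t in zip(probs, target)]
    # Keep every bin reachable so easy regions are never dropped entirely.
    floored = [max(m, floor) for m in mixed]
    norm = sum(floored)
    return [f / norm for f in floored]


def sample_ahead_loop(num_bins, rounds, batch_size, train_fn, error_fn, rng=None):
    """Alternate between sampler updates and classifier training.

    train_fn(batch) and error_fn() are hypothetical callbacks standing in
    for the classifier: error_fn returns one error rate per bin, and
    train_fn consumes a list of sampled bin indices (in practice, the
    parameters from which synthetic images would be rendered).
    """
    rng = rng or random.Random(0)
    probs = [1.0 / num_bins] * num_bins  # start from a uniform sampler
    for _ in range(rounds):
        error_rates = error_fn()  # classifier -> sampler feedback
        probs = update_distribution(probs, error_rates)
        batch = rng.choices(range(num_bins), weights=probs, k=batch_size)
        train_fn(batch)  # sampler -> classifier training data
    return probs
```

A caller would supply `error_fn` from a held-out evaluation of the current classifier and `train_fn` as one or more optimization steps on images rendered for the sampled bins. The floor on each bin's probability is a design choice of this sketch, meant to keep the classifier from forgetting regions it already handles well.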




Synthesize-It-Classifier: Learning a Generative Classifier through Recurrent Self-analysis

In this work, we show the generative capability of an image classifier n...

Minimum Cost Active Labeling

Labeling a data set completely is important for groundtruth generation. ...

Active Sampler: Light-weight Accelerator for Complex Data Analytics at Scale

Recent years have witnessed amazing outcomes from "Big Models" trained b...

FedDBL: Communication and Data Efficient Federated Deep-Broad Learning for Histopathological Tissue Classification

Histopathological tissue classification is a fundamental task in computa...

Overcoming Overconfidence for Active Learning

It is not an exaggeration to say that the recent progress in artificial ...

Material Classification in the Wild: Do Synthesized Training Data Generalise Better than Real-World Training Data?

We question the dominant role of real-world training images in the field...

Efficient Failure Pattern Identification of Predictive Algorithms

Given a (machine learning) classifier and a collection of unlabeled data...
