Learning From Less Data: Diversified Subset Selection and Active Learning in Image Classification Tasks

by   Vishal Kaushal, et al.

Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry and pose the challenges of not having adequate computing resources and of high costs involved in human labeling efforts. Training data subset selection and active learning techniques have been proposed as possible solutions to these challenges respectively. A special class of subset selection functions naturally model notions of diversity, coverage and representation and they can be used to eliminate redundancy and thus lend themselves well for training data subset selection. They can also help improve the efficiency of active learning in further reducing human labeling efforts by selecting a subset of the examples obtained using the conventional uncertainty sampling based techniques. In this work we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Disparity-Min models for training-data subset selection and reducing labeling effort. We do this for a variety of computer vision tasks including Gender Recognition, Scene Recognition and Object Recognition. Our results show that subset selection done in the right way can add 2-3 accuracy on existing baselines, particularly in the case of less training data. This allows the training of complex machine learning models (like Convolutional Neural Networks) with much less training data while incurring minimal performance loss.


page 1

page 2

page 3

page 4


Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

Supervised machine learning based state-of-the-art computer vision techn...

Balancing Constraints and Submodularity in Data Subset Selection

Deep learning has yielded extraordinary results in vision and natural la...

Challenges and Opportunities for Machine Learning Classification of Behavior and Mental State from Images

Computer Vision (CV) classifiers which distinguish and detect nonverbal ...

Training Data Subset Selection for Regression with Controlled Generalization Error

Data subset selection from a large number of training instances has been...

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Large scale machine learning and deep models are extremely data-hungry. ...

Learning to Abstain via Curve Optimization

In practical applications of machine learning, it is often desirable to ...

Data Lifecycle Management in Evolving Input Distributions for Learning-based Aerospace Applications

As input distributions evolve over a mission lifetime, maintaining perfo...

Please sign up or login with your details

Forgot password? Click here to reset