Latent Model Ensemble with Auto-localization

by   Miao Sun, et al.

Deep Convolutional Neural Networks (CNN) have exhibited superior performance in many visual recognition tasks including image classification, object detection, and scene label- ing, due to their large learning capacity and resistance to overfit. For the image classification task, most of the current deep CNN- based approaches take the whole size-normalized image as input and have achieved quite promising results. Compared with the previously dominating approaches based on feature extraction, pooling, and classification, the deep CNN-based approaches mainly rely on the learning capability of deep CNN to achieve superior results: the burden of minimizing intra-class variation while maximizing inter-class difference is entirely dependent on the implicit feature learning component of deep CNN; we rely upon the implicitly learned filters and pooling component to select the discriminative regions, which correspond to the activated neurons. However, if the irrelevant regions constitute a large portion of the image of interest, the classification performance of the deep CNN, which takes the whole image as input, can be heavily affected. To solve this issue, we propose a novel latent CNN framework, which treats the most discriminate region as a latent variable. We can jointly learn the global CNN with the latent CNN to avoid the aforementioned big irrelevant region issue, and our experimental results show the evident advantage of the proposed latent CNN over traditional deep CNN: latent CNN outperforms the state-of-the-art performance of deep CNN on standard benchmark datasets including the CIFAR-10, CIFAR- 100, MNIST and PASCAL VOC 2007 Classification dataset.


page 1

page 3


Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Existing deep convolutional neural networks (CNNs) require a fixed-size ...

Gaussian RAM: Lightweight Image Classification via Stochastic Retina-Inspired Glimpse and Reinforcement Learning

Previous studies on image classification have mainly focused on the perf...

Multi-Label Image Classification with Regional Latent Semantic Dependencies

Deep convolution neural networks (CNN) have demonstrated advanced perfor...

ViP: Virtual Pooling for Accelerating CNN-based Image Classification and Object Detection

In recent years, Convolutional Neural Networks (CNNs) have shown superio...

Learn To Pay Attention

We propose an end-to-end-trainable attention module for convolutional ne...

OCFormer: One-Class Transformer Network for Image Classification

We propose a novel deep learning framework based on Vision Transformers ...

MgNet: A Unified Framework of Multigrid and Convolutional Neural Network

We develop a unified model, known as MgNet, that simultaneously recovers...

Please sign up or login with your details

Forgot password? Click here to reset