Inter-Image Communication for Weakly Supervised Localization

by Xiaolin Zhang, et al.

Weakly supervised localization aims at finding target object regions using only image-level supervision. However, localization maps extracted from classification networks are often inaccurate due to the lack of fine pixel-level supervision. In this paper, we propose to leverage pixel-level similarities across different objects to learn more accurate object locations in a complementary way. In particular, two kinds of constraints are proposed to promote the consistency of object features within the same category. The first constraint learns the stochastic feature consistency among discriminative pixels that are randomly sampled from different images within a batch. Through this inter-image communication, the discriminative information embedded in one image can be leveraged to benefit its counterpart. The second constraint learns the global consistency of object features throughout the entire dataset. We learn a feature center for each category and realize the global feature consistency by forcing the object features to approach class-specific centers. The global centers are actively updated during training. The two constraints benefit each other in learning consistent pixel-level features within the same category, and ultimately improve the quality of localization maps. We conduct extensive experiments on two popular benchmarks, i.e., ILSVRC and CUB-200-2011. Our method achieves a Top-1 localization error rate of 45.17%, surpassing the current state-of-the-art method by a large margin. The code is available at
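The two constraints described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the sampling rule, the distance measure, or the center-update rule. The sketch therefore assumes that "discriminative" pixels are those in the top 20% of a class activation map, that consistency is measured by squared Euclidean distance, and that class centers are updated with an exponential moving average. All function names and parameters (sample_discriminative_features, top_fraction, momentum, and so on) are hypothetical.

```python
import numpy as np


def sample_discriminative_features(features, cam, num_samples, rng, top_fraction=0.2):
    """Randomly pick pixel features from the most activated locations.

    features: array of shape (C, H, W), one image's feature map.
    cam:      array of shape (H, W), its class activation map.
    Returns an array of shape (num_samples, C).
    Assumption: "discriminative" means the top `top_fraction` of CAM values.
    """
    c, h, w = features.shape
    flat_feats = features.reshape(c, h * w).T      # (H*W, C)
    flat_cam = cam.reshape(-1)                     # (H*W,)
    # Candidate pool: at least num_samples pixels, at most all pixels.
    k = min(h * w, max(num_samples, int(top_fraction * h * w)))
    top_idx = np.argsort(flat_cam)[-k:]
    chosen = rng.choice(top_idx, size=num_samples, replace=False)
    return flat_feats[chosen]


def stochastic_consistency_loss(feats_a, feats_b):
    """Mean squared distance between sampled pixel features of two
    same-class images in a batch (assumed distance measure)."""
    return float(np.mean(np.sum((feats_a - feats_b) ** 2, axis=1)))


def global_consistency_loss(feats, center):
    """Mean squared distance between object features and their class center."""
    return float(np.mean(np.sum((feats - center) ** 2, axis=1)))


def update_center(center, feats, momentum=0.9):
    """Move the class center toward the mean of the current features
    (assumed exponential-moving-average update)."""
    return momentum * center + (1.0 - momentum) * feats.mean(axis=0)
```

In such a sketch, the two loss terms would be added to the usual classification loss for pairs of same-class images in a batch, and update_center would be called once per training step for each class present in that batch.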




