New Image Captioning Encoder via Semantic Visual Feature Matching for Heavy Rain Images

by Chang-Hwan Son, et al.

Image captioning generates text that describes the scene in an input image. Existing methods have been developed for high-quality images captured in clear weather. In bad weather conditions, however, such as heavy rain, snow, and dense fog, poor visibility caused by rain streaks, rain accumulation, and snowflakes severely degrades image quality. This hinders the extraction of useful visual features and deteriorates image captioning performance. To address this practical issue, this study introduces a new encoder for captioning heavy rain images. The central idea is to transform the features extracted from heavy rain input images into semantic visual features associated with words and sentence context. To achieve this, a target encoder is first trained in an encoder-decoder framework to associate visual features with semantic words. Next, the objects in a heavy rain image are made visible using an initial reconstruction subnetwork (IRS) based on a heavy rain model. The IRS is then combined with a semantic visual feature matching subnetwork (SVFMS) that matches the output features of the IRS to the semantic visual features of the pretrained target encoder. The proposed encoder is based on the joint learning of the IRS and SVFMS. It is trained in an end-to-end manner and then connected to the pretrained decoder for image captioning. It is experimentally demonstrated that the proposed encoder can generate semantic visual features associated with words even from heavy rain images, thereby increasing the accuracy of the generated captions.
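The training objective described in the abstract, pulling the joint IRS+SVFMS output toward the frozen target encoder's semantic visual features, can be sketched as a simple feature-matching loss. This is a minimal illustration only: the names (`proposed_encoder`, `matching_loss`) and the linear stand-ins for the paper's CNN subnetworks are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "networks": random linear maps standing in for the paper's
# CNN subnetworks (illustrative only, not the actual architecture).
W_irs = rng.standard_normal((64, 64)) * 0.1     # initial reconstruction subnetwork (IRS)
W_svfms = rng.standard_normal((64, 64)) * 0.1   # semantic visual feature matching subnetwork (SVFMS)
W_target = rng.standard_normal((64, 64)) * 0.1  # pretrained target encoder (kept frozen)

def proposed_encoder(rain_feat):
    """IRS first restores visibility, then SVFMS maps the restored
    features toward semantic visual features."""
    restored = W_irs @ rain_feat      # initial reconstruction step
    return W_svfms @ restored         # semantic visual feature matching step

def matching_loss(rain_feat, clean_feat):
    """L2 loss between the proposed encoder's output on a rainy input
    and the frozen target encoder's features on the clean counterpart."""
    pred = proposed_encoder(rain_feat)
    target = W_target @ clean_feat    # target encoder is not updated
    return float(np.mean((pred - target) ** 2))

rain = rng.standard_normal(64)   # features of a heavy rain image (toy data)
clean = rng.standard_normal(64)  # features of the corresponding clean image

print(matching_loss(rain, clean))
```

Minimizing this loss end-to-end over the IRS and SVFMS parameters (here, `W_irs` and `W_svfms`) is what lets the proposed encoder plug directly into the pretrained captioning decoder.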




