Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition

11/24/2021
by   Changxu Cheng, et al.
0

Semantic information has been proved effective in scene text recognition. Most existing methods tend to couple both visual and semantic information in an attention-based decoder. As a result, the learning of semantic features is prone to have a bias on the limited vocabulary of the training set, which is called vocabulary reliance. In this paper, we propose a novel Visual-Semantic Decoupling Network (VSDN) to address the problem. Our VSDN contains a Visual Decoder (VD) and a Semantic Decoder (SD) to learn purer visual and semantic feature representation respectively. Besides, a Semantic Encoder (SE) is designed to match SD, which can be pre-trained together by additional inexpensive large vocabulary via a simple word correction task. Thus the semantic feature is more unbiased and precise to guide the visual feature alignment and enrich the final character representation. Experiments show that our method achieves state-of-the-art or competitive results on the standard benchmarks, and outperforms the popular baseline by a large margin under circumstances where the training set has a small size of vocabulary.

READ FULL TEXT
research
11/22/2021

CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

The attention-based encoder-decoder framework is becoming popular in sce...
research
12/02/2021

Visual-Semantic Transformer for Scene Text Recognition

Modeling semantic information is helpful for scene text recognition. In ...
research
07/26/2021

Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition

Although text recognition has significantly evolved over the years, stat...
research
09/05/2019

Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation

Recently, automatic image caption generation has been an important focus...
research
05/08/2020

On Vocabulary Reliance in Scene Text Recognition

The pursuit of high performance on public benchmarks has been the drivin...
research
07/20/2023

Learning Discriminative Visual-Text Representation for Polyp Re-Identification

Colonoscopic Polyp Re-Identification aims to match a specific polyp in a...
research
04/18/2021

Multi-scale Self-calibrated Network for Image Light Source Transfer

Image light source transfer (LLST), as the most challenging task in the ...

Please sign up or login with your details

Forgot password? Click here to reset