Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

03/11/2021
by Shancheng Fang, et al.

Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from three sources: 1) implicit language modeling; 2) unidirectional feature representation; and 3) noisy input to the language model. Correspondingly, we propose ABINet, an autonomous, bidirectional and iterative network for scene text recognition. Firstly, "autonomous" means blocking gradient flow between the vision and language models to enforce explicit language modeling. Secondly, a novel bidirectional cloze network (BCN) is proposed as the language model, based on bidirectional feature representation. Thirdly, we propose iterative correction as the execution manner for the language model, which effectively alleviates the impact of noisy input. Additionally, based on an ensemble of iterative predictions, we propose a self-training method that can learn effectively from unlabeled images. Extensive experiments indicate that ABINet is superior on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Moreover, ABINet trained with ensemble self-training shows promising improvement toward human-level recognition. Code is available at https://github.com/FangShancheng/ABINet.
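
The cloze-style language model and the iterative correction loop described above can be illustrated with a minimal toy sketch. This is not the authors' implementation: ABINet uses neural networks, whereas here a hypothetical four-word lexicon with simple match counting stands in for the BCN, and plain averaging stands in for the fusion of vision and language predictions. All names (`LEXICON`, `cloze_probs`, `iterative_correction`) are assumptions made for illustration. Gradient blocking is not shown, since this sketch involves no gradients.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a learned language model.
LEXICON = ["shop", "show", "stop", "ship"]
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def cloze_probs(text, pos, lexicon=LEXICON, smoothing=1e-3):
    """Distribution over characters at `pos`, using the characters on BOTH
    sides of `pos` as context while the character at `pos` itself is masked."""
    counts = Counter()
    for word in lexicon:
        if len(word) != len(text):
            continue
        # Count agreeing context characters, skipping the masked position.
        matches = sum(
            1 for i, (a, b) in enumerate(zip(word, text)) if i != pos and a == b
        )
        counts[word[pos]] += matches
    total = sum(counts.values()) + smoothing * len(ALPHABET)
    return {c: (counts[c] + smoothing) / total for c in ALPHABET}


def argmax_text(dists):
    """Most likely string under a list of per-position distributions."""
    return "".join(max(d, key=d.get) for d in dists)


def iterative_correction(vision_dists, iterations=3):
    """Repeatedly feed the current best guess to the language model and fuse
    its output with the (fixed) vision output. Returns the guess per round."""
    fused = vision_dists
    history = [argmax_text(fused)]
    for _ in range(iterations):
        guess = argmax_text(fused)  # language-model input for this round
        language_dists = [cloze_probs(guess, i) for i in range(len(guess))]
        # Fusion by plain averaging (an assumption made for this sketch).
        fused = [
            {c: 0.5 * v.get(c, 0.0) + 0.5 * l[c] for c in ALPHABET}
            for v, l in zip(vision_dists, language_dists)
        ]
        history.append(argmax_text(fused))
    return history


# Toy vision output: the third character is ambiguous between "c" and "o".
vision = [{"s": 1.0}, {"h": 1.0}, {"c": 0.55, "o": 0.45}, {"p": 1.0}]
print(iterative_correction(vision))
```

In this toy case the vision-only reading is "shcp", and the context on both sides of the third character pulls the fused prediction toward "shop". Because each round's language-model input is the previous round's fused guess, later rounds work from a cleaner input than the first, which is the intuition behind iterative correction.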


