On Vocabulary Reliance in Scene Text Recognition

05/08/2020
by   Zhaoyi Wan, et al.
6

The pursuit of high performance on public benchmarks has been the driving force for research in scene text recognition, and notable progress has been achieved. However, a close investigation reveals a startling fact that the state-of-the-art methods perform well on images with words within vocabulary but generalize poorly to images with words outside vocabulary. We call this phenomenon "vocabulary reliance". In this paper, we establish an analytical framework to conduct an in-depth study on the problem of vocabulary reliance in scene text recognition. Key findings include: (1) Vocabulary reliance is ubiquitous, i.e., all existing algorithms more or less exhibit such characteristic; (2) Attention-based decoders prove weak in generalizing to words outside vocabulary and segmentation-based decoders perform well in utilizing visual features; (3) Context modeling is highly coupled with the prediction layers. These findings provide new insights and can benefit future research in scene text recognition. Furthermore, we propose a simple yet effective mutual learning strategy to allow models of two families (attention-based and segmentation-based) to learn collaboratively. This remedy alleviates the problem of vocabulary reliance and improves the overall scene text recognition performance.

READ FULL TEXT
research
09/01/2022

1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: End-to-End Recognition of Out of Vocabulary Words

Scene text recognition has attracted increasing interest in recent years...
research
08/04/2022

1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: Cropped Word Recognition

This report presents our winner solution to ECCV 2022 challenge on Out-o...
research
05/01/2022

Fine-Grained Address Segmentation for Attention-Based Variable-Degree Prefetching

Machine learning algorithms have shown potential to improve prefetching ...
research
11/24/2021

Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition

Semantic information has been proved effective in scene text recognition...
research
09/02/2022

Vision-Language Adaptive Mutual Decoder for OOV-STR

Recent works have shown huge success of deep learning models for common ...
research
09/07/2017

Focusing Attention: Towards Accurate Text Recognition in Natural Images

Scene text recognition has been a hot research topic in computer vision ...
research
12/20/2022

Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features

Detecting actions in untrimmed videos should not be limited to a small, ...

Please sign up or login with your details

Forgot password? Click here to reset