Pushing the Performance Limit of Scene Text Recognizer without Human Annotation

04/16/2022
by Caiyuan Zheng, et al.

Scene text recognition (STR) has attracted much attention over the years because of its wide range of applications. Most methods train STR models in a fully supervised manner, which requires large amounts of labeled data. Although synthetic data contributes a lot to STR, it suffers from the real-to-synthetic domain gap, which restricts model performance. In this work, we aim to boost STR models by leveraging both synthetic data and the numerous real unlabeled images, eliminating the cost of human annotation entirely. A robust consistency-regularization-based semi-supervised framework is proposed for STR, which effectively resolves the instability caused by domain inconsistency between synthetic and real images. A character-level consistency regularization is designed to mitigate the misalignment between characters in sequence recognition. Extensive experiments on standard text recognition benchmarks demonstrate the effectiveness of the proposed method. It steadily improves existing STR models and boosts an STR model to new state-of-the-art results. To the best of our knowledge, this is the first consistency-regularization-based framework successfully applied to STR.
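
To make the idea of character-level consistency regularization more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the loss, so everything here is an assumption: the per-character KL divergence, the confidence threshold, and the hypothetical names `softmax` and `char_consistency_loss`. It assumes two sets of per-character logits for the same unlabeled image, one from a target branch (for example, a weakly augmented view) and one from an online branch (for example, a strongly augmented view), already aligned position by position.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the character-class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def char_consistency_loss(target_logits, online_logits, threshold=0.5):
    """Hypothetical character-level consistency loss (an assumption, not the paper's).

    Both inputs have shape (T, C): T character positions, C character classes.
    For each position, compute KL(target || online), then average only over
    positions where the target branch is confident (max prob >= threshold).
    """
    eps = 1e-12  # avoids log(0)
    p = softmax(target_logits)                  # pseudo-targets, treated as fixed
    log_p = np.log(p + eps)
    log_q = np.log(softmax(online_logits) + eps)
    kl_per_char = (p * (log_p - log_q)).sum(axis=-1)   # shape (T,)
    confident = p.max(axis=-1) >= threshold            # shape (T,), boolean mask
    if not confident.any():
        return 0.0  # no confident character: this sample contributes nothing
    return float(kl_per_char[confident].mean())

# Usage: two character positions, three character classes.
weak_view = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
strong_view = np.array([[4.0, 1.0, 0.0], [0.0, 3.0, 2.0]])
loss = char_consistency_loss(weak_view, strong_view)
```

Computing the penalty per character, rather than on the whole decoded string, is one way to keep a single misaligned or uncertain position from dominating the unsupervised signal. In a real training loop the target branch would be excluded from gradient computation.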

research
11/26/2021

Traditional Chinese Synthetic Datasets Verified with Labeled Data for Scene Text Recognition

Scene text recognition (STR) has been widely studied in academia and ind...
research
03/07/2021

What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels

Scene text recognition (STR) task has a common practice: All state-of-th...
research
03/08/2021

Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation

Data-driven based approaches, in spite of great success in many tasks, h...
research
08/23/2022

Consistency Regularization for Domain Adaptation

Collection of real world annotations for training semantic segmentation ...
research
08/17/2022

Domestic sound event detection by shift consistency mean-teacher training and adversarial domain adaptation

Semi-supervised learning and domain adaptation techniques have drawn inc...
research
07/17/2023

Revisiting Scene Text Recognition: A Data Perspective

This paper aims to re-assess scene text recognition (STR) from a data-or...
research
11/24/2020

Temporal Action Detection with Multi-level Supervision

Training temporal action detection in videos requires large amounts of l...
