CSTR: A Classification Perspective on Scene Text Recognition

02/22/2021
by Hongxiang Cai, et al.

Scene text recognition is usually approached from one of two perspectives: sequence to sequence (seq2seq) or segmentation. In this paper, we propose a new perspective, in which scene text recognition is modeled as an image classification problem. Based on this perspective, we propose a scene text recognition model named CSTR. The CSTR model consists of a series of convolutional layers with a global average pooling layer at the end, followed by independent multi-class classification heads, each of which predicts the corresponding character of the word sequence in the input image. The CSTR model is easy to train using parallel cross entropy losses. CSTR is as simple as image classification models like ResNet, which makes it easy to implement, and its fully convolutional architecture makes it efficient to train and deploy. We demonstrate the effectiveness of the classification perspective on scene text recognition with thorough experiments. Furthermore, CSTR achieves nearly state-of-the-art performance on six public benchmarks covering both regular and irregular text. The code will be available at https://github.com/Media-Smart/vedastr.
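To make the classification view concrete, the following is a minimal sketch in plain Python of the pipeline the abstract describes: a feature map is reduced by global average pooling, each character position gets its own independent classification head, and the per-position cross entropy losses are combined. It is not the authors' implementation. The convolutional stack is replaced by a random feature map, and the class count, maximum word length, channel count, padding class, and the choice to sum the losses are all illustrative assumptions.

```python
import math
import random

# All sizes below are assumptions for illustration, not values from the paper.
NUM_CLASSES = 37  # assumed: 26 letters + 10 digits + 1 padding class
MAX_LEN = 5       # assumed maximum word length; one head per character position
CHANNELS = 8      # assumed channel count of the final convolutional feature map


def global_average_pool(feature_map):
    """Average each channel's H x W grid down to a single value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def linear(weights, bias, x):
    """One classification head: logits = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def cross_entropy(logits, target):
    """Numerically stable softmax cross entropy for a single head."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def make_heads(rng):
    """Create MAX_LEN independent heads, each with its own weights."""
    heads = []
    for _ in range(MAX_LEN):
        w = [[rng.uniform(-0.1, 0.1) for _ in range(CHANNELS)] for _ in range(NUM_CLASSES)]
        heads.append((w, [0.0] * NUM_CLASSES))
    return heads


def forward(feature_map, heads):
    """Pool once, then let every head classify its own character position."""
    pooled = global_average_pool(feature_map)
    return [linear(w, b, pooled) for w, b in heads]


def total_loss(all_logits, targets):
    """Combine the per-position losses (summed here, which is an assumption)."""
    return sum(cross_entropy(l, t) for l, t in zip(all_logits, targets))


rng = random.Random(0)
# Stand-in for the output of the convolutional layers: CHANNELS x 2 x 4.
feature_map = [[[rng.random() for _ in range(4)] for _ in range(2)] for _ in range(CHANNELS)]
heads = make_heads(rng)
logits = forward(feature_map, heads)
targets = [3, 1, 20, 36, 36]  # hypothetical labels; 36 plays the role of padding
loss = total_loss(logits, targets)
prediction = [max(range(NUM_CLASSES), key=l.__getitem__) for l in logits]
```

Because every head reads the same pooled vector and is scored independently, no recurrent decoding or per-pixel segmentation is needed at training or inference time, which is the sense in which the model resembles an ordinary image classifier.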


