ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

08/20/2023
by Mingxin Huang, et al.

In recent years, end-to-end scene text spotting approaches have been moving toward Transformer-based frameworks. While previous studies have shown the crucial importance of the intrinsic synergy between text detection and recognition, recent Transformer-based methods usually adopt an implicit synergy strategy with a shared query, which cannot fully realize the potential of these two interactive tasks. In this paper, we argue that explicit synergy that considers the distinct characteristics of text detection and recognition can significantly improve the performance of text spotting. To this end, we introduce a new model named Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter), which achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder. Specifically, we decompose the conventional shared query into task-aware queries for text polygon and content, respectively. Through the decoder with the proposed vision-language communication module, the queries interact with each other in an explicit manner while preserving the discriminative patterns of text detection and recognition, thus improving performance significantly. Additionally, we propose a task-aware query initialization scheme to ensure stable training. Experimental results demonstrate that our model significantly outperforms previous state-of-the-art methods. Code is available at https://github.com/mxin262/ESTextSpotter.
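The idea of splitting a shared query into task-aware queries that still exchange information can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked above for that): the function names `attend` and `communicate`, the use of plain dot-product attention with a residual update, and the 4-dimensional example vectors are all assumptions made purely for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, memory):
    """Scaled dot-product attention; memory vectors serve as keys and values."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in memory]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, memory)) for j in range(d)])
    return out

def communicate(det_queries, rec_queries):
    """Hypothetical exchange step: each query group attends over both groups
    and is updated residually, while the two groups stay separate."""
    joint = det_queries + rec_queries
    det_msg = attend(det_queries, joint)
    rec_msg = attend(rec_queries, joint)
    new_det = [[a + b for a, b in zip(q, m)] for q, m in zip(det_queries, det_msg)]
    new_rec = [[a + b for a, b in zip(q, m)] for q, m in zip(rec_queries, rec_msg)]
    return new_det, new_rec

# Two text instances, each with one detection query and one recognition query.
det = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
rec = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
new_det, new_rec = communicate(det, rec)
print(len(new_det), len(new_rec))
```

In this sketch the detection and recognition queries remain two distinct lists after the exchange, which loosely mirrors the abstract's point that the queries interact while keeping task-specific representations; a shared-query design would instead carry a single vector per text instance for both tasks.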


