Enhancing Handwritten Text Recognition with N-gram sequence decomposition and Multitask Learning

12/28/2020
by Vasiliki Tassopoulou, et al.

Current state-of-the-art approaches in the field of Handwritten Text Recognition are predominantly single-task with unigram, character-level target units. In our work, we utilize a Multi-task Learning scheme, training the model to perform decompositions of the target sequence with target units of different granularity, from fine to coarse. We consider this method a way to utilize n-gram information implicitly in the training process, while the final recognition is performed using only the unigram output. The difference in the internal unigram decoding of such a multi-task approach highlights the capability of the learned internal representations, imposed by the different n-grams at the training step. We select n-grams as our target units and experiment from unigrams to fourgrams, namely subword-level granularities. These multiple decompositions are learned by the network with task-specific CTC losses. Concerning network architectures, we propose two alternatives, namely the Hierarchical and the Block Multi-task. Overall, our proposed model, even though evaluated only on the unigram task, outperforms its single-task counterpart by an absolute 2.52% WER and 1.02% CER under greedy decoding, without any computational overhead during inference, hinting towards the successful imposition of an implicit language model.
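
As an illustration of the training scheme described in the abstract, the minimal PyTorch sketch below uses a shared encoder with one CTC head per n-gram granularity and simply sums the per-task losses. The module name, vocabulary sizes, encoder configuration, and equal loss weighting are assumptions made for the example; they are not the paper's exact Hierarchical or Block Multi-task configurations.

```python
import torch
import torch.nn as nn

class MultiTaskCTC(nn.Module):
    def __init__(self, feat_dim=64, hidden=256, vocab_sizes=None):
        super().__init__()
        # Hypothetical vocabulary sizes per target granularity; the paper's
        # actual lexica and head layout (Hierarchical vs. Block) differ.
        vocab_sizes = vocab_sizes or {"unigram": 80, "bigram": 600, "trigram": 2000}
        # Shared recurrent encoder over column features of the text-line image
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        # One output head per granularity (+1 class for the CTC blank)
        self.heads = nn.ModuleDict({
            name: nn.Linear(2 * hidden, size + 1)
            for name, size in vocab_sizes.items()
        })
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, feats, feat_lens, targets, target_lens):
        # feats: (B, T, feat_dim); targets and target_lens are dicts keyed by task
        enc, _ = self.encoder(feats)  # (B, T, 2*hidden)
        total_loss = feats.new_zeros(())
        for name, head in self.heads.items():
            log_probs = head(enc).log_softmax(-1).transpose(0, 1)  # (T, B, C)
            total_loss = total_loss + self.ctc(
                log_probs, targets[name], feat_lens, target_lens[name])
        return total_loss
```

At inference, only the unigram head would be decoded (e.g. greedy best path), which is why the auxiliary n-gram heads add no computational overhead at test time.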
