A large-scale field test on word-image classification in large historical document collections using a traditional and two deep-learning methods

04/17/2019
by   Lambert Schomaker, et al.
0

This technical report describes a practical field test on word-image classification in a very large collection of more than 300 diverse handwritten historical manuscripts, with 1.6 million unique labeled images and more than 11 million images used in testing. Results indicate that several deep-learning tests completely failed (mean accuracy 83 output units (lexical words) in one-hot encoding for classification, performance steeply drops to almost zero percent accuracy, even with a modest size of the pre-final (i.e., penultimate) layer (150 units). A traditional feature method (BOVW) displays a consistent performance over numbers of classes and numbers of training examples (mean accuracy 87 nearest mean on the output of the pre-final layer of an Inception V3 network, for each book, only yielded mediocre results (mean accuracy 49%), but was not sensitive to high numbers of classes. Notably, this experiment was only possible on the basis of labels that were harvested on the basis of a traditional method which already works starting from a single labeled image per class. It is expected that the performance of the failed deep learning tests can be repaired, but only on the basis of human handcrafting (sic) of network architecture and hyperparameters. When the failed problematic books are not considered, end-to-end CNN training yields about 95 dominated by a large subset of Chinese characters, performances for other script styles being lower.

READ FULL TEXT
research
03/26/2020

Classification of the Chinese Handwritten Numbers with Supervised Projective Dictionary Pair Learning

Image classification has become a key ingredient in the field of compute...
research
12/11/2019

Lifelong learning for text retrieval and recognition in historical handwritten document collections

This chapter provides an overview of the problems that need to be dealt ...
research
06/27/2018

This looks like that: deep learning for interpretable image recognition

When we are faced with challenging image classification tasks, we often ...
research
09/06/2023

Improving Image Classification of Knee Radiographs: An Automated Image Labeling Approach

Large numbers of radiographic images are available in knee radiology pra...
research
04/20/2016

Local Binary Pattern for Word Spotting in Handwritten Historical Document

Digital libraries store images which can be highly degraded and to index...
research
01/08/2018

Deep Nearest Class Mean Model for Incremental Odor Classification

In recent years, more and more machine learning algorithms have been app...
research
04/20/2016

Network of Experts for Large-Scale Image Categorization

We present a tree-structured network architecture for large scale image ...

Please sign up or login with your details

Forgot password? Click here to reset