Synthetic Map Generation to Provide Unlimited Training Data for Historical Map Text Detection

by Zekun Li, et al.

Many historical map sheets are publicly available for studies that require long-term historical geographic data. The cartographic design of these maps includes a combination of map symbols and text labels. Automatically reading text labels from map images could greatly speed up map interpretation and help generate rich metadata describing the map content. Many text detection algorithms have been proposed to locate text regions in map images automatically, but most of the algorithms are trained on out-of-domain datasets (e.g., scenic images). Training data determines the quality of machine learning models, and manually annotating text regions in map images is labor-intensive and time-consuming. On the other hand, existing geographic data sources, such as OpenStreetMap (OSM), contain machine-readable map layers, which allow us to separate out the text layer and obtain text label annotations easily. However, the cartographic styles of OSM map tiles and historical maps differ significantly. This paper proposes a method to automatically generate an unlimited amount of annotated historical map images for training text detection models. We use a style transfer model to convert contemporary map images into historical style and place text labels upon them. We show that the state-of-the-art text detection models (e.g., PSENet) can benefit from the synthetic historical maps and achieve significant improvement for historical map text detection.
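
The annotation side of this idea can be illustrated with a minimal sketch. It is not the authors' implementation: the style transfer step is omitted entirely, and the function names, tile size, character width, and rotation range are assumptions chosen for illustration. The point is only that when labels are placed programmatically, the ground-truth text polygons come for free.

```python
import math
import random


def label_polygon(cx, cy, width, height, angle_deg):
    """Return the four corners of a text box rotated about its center.

    Corners are listed clockwise from top-left in image coordinates
    (y grows downward). Hypothetical helper, for illustration only.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2.0, height / 2.0
    corners = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    return [(cx + x * cos_t - y * sin_t, cy + x * sin_t + y * cos_t)
            for x, y in corners]


def synthesize_annotations(labels, tile_size=512, char_width=10,
                           text_height=16, seed=0):
    """Pick a random position and rotation for each label on a square tile
    and record the resulting polygon as its annotation.

    Assumes every label is short enough to fit inside the tile.
    """
    rng = random.Random(seed)
    annotations = []
    for text in labels:
        width = char_width * len(text)
        angle = rng.uniform(-30.0, 30.0)
        # Half the box diagonal keeps every rotated corner inside the tile.
        margin = math.hypot(width, text_height) / 2.0
        cx = rng.uniform(margin, tile_size - margin)
        cy = rng.uniform(margin, tile_size - margin)
        annotations.append({
            "text": text,
            "polygon": label_polygon(cx, cy, width, text_height, angle),
        })
    return annotations


if __name__ == "__main__":
    for ann in synthesize_annotations(["Los Angeles", "Main St", "River"]):
        print(ann["text"], [(round(x), round(y)) for x, y in ann["polygon"]])
```

In a fuller pipeline, each polygon would be paired with a label actually rendered onto the style-transferred tile, and the image and polygon list together would form one training sample for a text detector.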




An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images

Historical maps contain detailed geographic information difficult to fin...

The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps

Scanned historical maps in libraries and archives are valuable repositor...

Aligning geographic entities from historical maps for building knowledge graphs

Historical maps contain rich geographic information about the past of a ...

Combining Deep Learning and Mathematical Morphology for Historical Map Segmentation

The digitization of historical maps enables the study of ancient, fragil...

SynthTIGER: Synthetic Text Image GEneratoR Towards Better Text Recognition Models

For successful scene text recognition (STR) models, synthetic text image...

Using maps to predict economic activity

We introduce a novel machine learning approach to leverage historical an...

A Large-Scale Comparison of Historical Text Normalization Systems

There is no consensus on the state-of-the-art approach to historical tex...
