A Machine Learning Approach to Quantitative Prosopography

by Aayushee Gupta, et al.

Prosopography is the investigation of the common characteristics of a historical group of people through a collective study of their lives. It involves studying biographies to solve historical problems; where such biographies are unavailable, surviving documents and secondary biographical data are used instead. Quantitative prosopography involves the analysis of information from a wide variety of sources about "ordinary people". In this paper, we present a machine learning framework for automatically constructing a people gazetteer, which forms the basis of quantitative prosopographical research. The gazetteer is learnt from the noisy text of newspapers using a Named Entity Recognizer (NER). The framework can then identify influential people in the gazetteer by means of a custom-designed Influential Person Index (IPI). Our corpus comprises 14020 articles from a local newspaper, "The Sun", published in New York in 1896. Influential people identified by our algorithm include Captain Donald Hankey (an English soldier), Dame Nellie Melba (an Australian operatic soprano), Hugh Allan (a Canadian shipping magnate) and Sir Hugh John McDonald (the first Prime Minister of Canada).
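To make the idea of a people gazetteer concrete, the following is a minimal sketch of how person mentions from an NER step might be grouped by name and ranked. It assumes the NER output is already available as (article id, person name) pairs. The function names `build_gazetteer` and `rank_by_article_count` are hypothetical. The ranking score is a plain article count, used only as a stand-in: the actual definition of the IPI is not given in this abstract.

```python
from collections import defaultdict


def build_gazetteer(mentions):
    """Group (article_id, person_name) pairs into a name -> set of article ids map.

    `mentions` is assumed to come from an upstream NER step (not shown here).
    """
    gazetteer = defaultdict(set)
    for article_id, name in mentions:
        # Collapse repeated whitespace and unify case so that spacing or
        # capitalisation noise does not split one person into several entries.
        key = " ".join(name.split()).title()
        gazetteer[key].add(article_id)
    return dict(gazetteer)


def rank_by_article_count(gazetteer):
    """Rank names by the number of distinct articles that mention them.

    This is a hypothetical stand-in score, NOT the paper's IPI.
    Ties are broken alphabetically by name.
    """
    counts = [(name, len(articles)) for name, articles in gazetteer.items()]
    return sorted(counts, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Illustrative mentions only; these are not results from the corpus.
    sample_mentions = [
        (1, "Nellie  Melba"),
        (2, "nellie melba"),
        (2, "Hugh Allan"),
        (3, "Nellie Melba"),
    ]
    for name, count in rank_by_article_count(build_gazetteer(sample_mentions)):
        print(name, count)
```

Simple case folding of this kind only handles the mildest noise; OCR errors that alter characters inside a name would need fuzzy matching or a similar technique, which this sketch does not attempt.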




