Not always about you: Prioritizing community needs when developing endangered language technology

by Zoey Liu, et al.

Languages are classified as low-resource when they lack the quantity of data necessary for training statistical and machine learning tools and models. Causes of resource scarcity vary but can include poor access to technology for developing these resources, a relatively small population of speakers, or a lack of urgency for collecting such resources in bilingual populations where the second language is high-resource. As a result, the languages described as low-resource in the literature are as different as Finnish on the one hand, with millions of speakers using it in every imaginable domain, and Seneca on the other, with only a small handful of fluent speakers using the language primarily in a restricted domain. While issues stemming from the lack of resources necessary to train models unite this disparate group of languages, many other issues cut across the divide between widely spoken low-resource languages and endangered languages. In this position paper, we discuss the unique technological, cultural, practical, and ethical challenges that researchers and indigenous speech community members face when working together to develop language technology to support endangered language documentation and revitalization. We report the perspectives of language teachers, Master Speakers, and elders from indigenous communities, as well as the point of view of academics. We describe an ongoing fruitful collaboration and make recommendations for future partnerships between academic researchers and language community stakeholders.




