Demo of the Linguistic Field Data Management and Analysis System – LiFE

03/22/2022
by Siddharth Singh, et al.

In the proposed demo, we will present LiFE, the Linguistic Field Data Management and Analysis System (https://github.com/kmi-linguistics/life): a new open-source, web-based application for the systematic storage, management, sharing and usage of linguistic data collected in the field. The application allows users to store lexical items, sentences, paragraphs and audio-visual content with rich glossing and annotation; to generate interactive and print dictionaries; and to train and use natural language processing (NLP) tools and models for various purposes on this data. Since it is a web-based application, it also allows seamless collaboration among multiple users, who can share data, models, etc. with each other. The system uses the Python-based Flask framework and MongoDB at the backend, and HTML, CSS and JavaScript at the frontend. The interface allows users to create multiple projects and share them with other users. At the backend, the application stores the data in RDF format so that it can be released as Linked Data over the web using Semantic Web technologies. At present, it uses the OntoLex-Lemon model for storing lexical data and Ligt for storing interlinear glossed text, and it links this data internally to other linked lexicons and databases such as DBpedia and WordNet. Furthermore, it supports training NLP systems with the scikit-learn and HuggingFace Transformers libraries and can make use of any model trained with these libraries. While the user interface itself offers only limited options for tuning the system, an externally trained model can easily be incorporated into the application; similarly, the dataset itself can easily be exported into a standard machine-readable format such as JSON or CSV for consumption by other programs and pipelines.
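As a rough illustration of the JSON/CSV export described in the abstract, here is a minimal Python sketch that uses only the standard library. It is not LiFE's code: the record fields (`lemma`, `gloss`, `pos`, `audio`) and the functions `export_json` and `export_csv` are hypothetical, and a real export would start from whatever documents the application's MongoDB collections actually hold.

```python
"""Minimal sketch: export hypothetical lexical records to JSON and CSV.

The field names and function names are assumptions for illustration only;
they are not taken from the LiFE code base.
"""
import csv
import io
import json


def export_json(records: list[dict]) -> str:
    """Serialize records as a JSON array.

    ensure_ascii=False keeps non-Latin scripts readable; default=str turns
    values JSON cannot encode natively (e.g. a MongoDB ObjectId) into strings.
    """
    return json.dumps(records, ensure_ascii=False, indent=2, default=str)


def export_csv(records: list[dict], fields: list[str]) -> str:
    """Serialize records as CSV with a header row.

    Missing fields become empty cells; keys not listed in `fields` are dropped.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical entries: Hindi lemmas with English glosses.
    entries = [
        {"lemma": "पानी", "gloss": "water", "pos": "noun"},
        {"lemma": "जाना", "gloss": "to go", "pos": "verb", "audio": "jaana.wav"},
    ]
    print(export_json(entries))
    print(export_csv(entries, ["lemma", "gloss", "pos", "audio"]))
```

JSON can carry nested, uneven records as they are, whereas CSV is flat and needs a fixed column list, which is why the CSV function takes `fields` explicitly and fills gaps with empty cells.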

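The abstract also states that lexical data is stored as RDF using OntoLex-Lemon. The sketch below is an illustration under stated assumptions, not the application's implementation: it renders one hypothetical lexical record as OntoLex-Lemon triples in Turtle syntax using plain string formatting. The vocabulary terms (`ontolex:LexicalEntry`, `ontolex:Form`, `ontolex:LexicalSense`, `ontolex:canonicalForm`, `ontolex:writtenRep`, `ontolex:sense`, `ontolex:reference`) belong to the W3C OntoLex-Lemon model; the record layout, the `http://example.org/lexicon/` base URI and the function names `to_ontolex_turtle` and `turtle_literal` are invented for the example.

```python
"""Minimal sketch: render a hypothetical lexical record as OntoLex-Lemon Turtle.

Assumptions (not taken from LiFE): the record layout, the base URI and the
function names are invented for illustration. Entry ids are assumed to be
simple ASCII identifiers that are valid as Turtle local names.
"""

ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
EX_BASE = "http://example.org/lexicon/"  # hypothetical base URI for entries


def turtle_literal(text: str, lang: str) -> str:
    """Return a language-tagged Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_ontolex_turtle(record: dict) -> str:
    """Convert one lexical record into OntoLex-Lemon triples (Turtle syntax)."""
    entry = f"ex:{record['id']}"
    form = f"ex:{record['id']}_form"
    senses = record.get("senses", [])
    sense_ids = [f"ex:{record['id']}_sense{i}" for i in range(1, len(senses) + 1)]

    out = [f"@prefix ontolex: <{ONTOLEX}> .", f"@prefix ex: <{EX_BASE}> .", ""]

    # The lexical entry points to its canonical (citation) form and its senses.
    props = [f"ontolex:canonicalForm {form}"]
    if sense_ids:
        props.append("ontolex:sense " + ", ".join(sense_ids))
    out.append(f"{entry} a ontolex:LexicalEntry ;")
    out.append("    " + " ;\n    ".join(props) + " .")

    # The form carries the written representation, tagged with a language code.
    out.append("")
    out.append(f"{form} a ontolex:Form ;")
    out.append(f"    ontolex:writtenRep {turtle_literal(record['lemma'], record['lang'])} .")

    # Each sense may point to an external concept URI (e.g. a DBpedia resource).
    for sense_id, sense in zip(sense_ids, senses):
        out.append("")
        if sense.get("reference"):
            out.append(f"{sense_id} a ontolex:LexicalSense ;")
            out.append(f"    ontolex:reference <{sense['reference']}> .")
        else:
            out.append(f"{sense_id} a ontolex:LexicalSense .")

    return "\n".join(out) + "\n"


if __name__ == "__main__":
    record = {
        "id": "paani",
        "lemma": "पानी",
        "lang": "hi",
        "senses": [{"reference": "http://dbpedia.org/resource/Water"}],
    }
    print(to_ontolex_turtle(record))
```

In this sketch the link to an external resource such as DBpedia is expressed by `ontolex:reference`, which points a sense at a concept URI. A production system would more likely build the graph with an RDF library such as rdflib, which also takes care of URI escaping and serialization; string formatting is used here only to keep the example dependency-free.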