PyTAIL: Interactive and Incremental Learning of NLP Models with Human in the Loop for Online Data

11/24/2022
by   Shubhanshu Mishra, et al.

Online data streams make training machine learning models hard because of distribution shift and new patterns emerging over time. For natural language processing (NLP) tasks that utilize a collection of features based on lexicons and rules, it is important to adapt these features to the changing data. To address this challenge, we introduce PyTAIL, a Python library that allows a human-in-the-loop approach to actively train NLP models. PyTAIL enhances generic active learning, which only suggests new instances to label, by also suggesting new features like rules and lexicons to label. Furthermore, PyTAIL is flexible enough for users to accept, reject, or update rules and lexicons as the model is being trained. Finally, we simulate the performance of PyTAIL on existing social media benchmark datasets for text classification. We compare various active learning strategies on these benchmarks. The model closes the gap with as few as 10% of the training data. We also highlight the importance of tracking the evaluation metric on the remaining data (which is not yet merged with active learning) alongside the test dataset. This highlights the effectiveness of the model in accurately annotating the remaining dataset, which is especially suitable for batch processing of large unlabelled corpora. PyTAIL will be available at https://github.com/socialmediaie/pytail.
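The simulation setup described in the abstract can be illustrated with a minimal sketch of pool-based active learning that tracks accuracy on both the remaining (not yet labeled) pool and a held-out test set. This is not PyTAIL's API: the tiny Naive Bayes classifier, the least-confidence query strategy, the toy sentences, and every function name below are assumptions chosen for illustration only.

```python
import math
from collections import Counter

def train(labeled):
    """Fit a tiny multinomial Naive Bayes model on (text, label) pairs, labels in {0, 1}."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for text, label in labeled:
        counts[label].update(text.split())
        docs[label] += 1
    return counts, docs

def prob_positive(model, text):
    """Return P(label = 1 | text) using add-one smoothing."""
    counts, docs = model
    vocab_size = len(set(counts[0]) | set(counts[1])) + 1  # +1 for unseen words
    total = {c: sum(counts[c].values()) for c in (0, 1)}
    log_odds = math.log((docs[1] + 1) / (docs[0] + 1))
    for word in text.split():
        p1 = (counts[1][word] + 1) / (total[1] + vocab_size)
        p0 = (counts[0][word] + 1) / (total[0] + vocab_size)
        log_odds += math.log(p1 / p0)
    return 1.0 / (1.0 + math.exp(-log_odds))

def accuracy(model, data):
    """Fraction of (text, label) pairs predicted correctly; None if data is empty."""
    if not data:
        return None
    hits = sum((prob_positive(model, t) >= 0.5) == bool(y) for t, y in data)
    return hits / len(data)

def active_learn(seed, pool, test, oracle, rounds, batch_size):
    """Simulated human-in-the-loop loop: query, label via oracle, retrain, evaluate."""
    labeled, pool, history = list(seed), list(pool), []
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        # Least-confidence sampling: texts whose probability is closest to 0.5 come first.
        pool.sort(key=lambda t: abs(prob_positive(model, t) - 0.5))
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled += [(t, oracle(t)) for t in batch]  # oracle stands in for the annotator
        model = train(labeled)
        history.append({
            "labeled": len(labeled),
            "remaining_acc": accuracy(model, [(t, oracle(t)) for t in pool]),
            "test_acc": accuracy(model, test),
        })
    return history

# Toy gold labels play the role of a benchmark dataset (hypothetical data).
gold = {
    "great phone love it": 1, "terrible battery hate it": 0,
    "love the screen": 1, "hate the lag": 0,
    "great camera": 1, "terrible support": 0,
    "love great battery": 1, "hate terrible screen": 0,
}
seed = [(t, gold[t]) for t in ("great phone love it", "terrible battery hate it")]
test = [(t, gold[t]) for t in ("love great battery", "hate terrible screen")]
pool = ["love the screen", "hate the lag", "great camera", "terrible support"]

history = active_learn(seed, pool, test, gold.get, rounds=2, batch_size=2)
for row in history:
    print(row)
```

Reporting `remaining_acc` next to `test_acc` mirrors the abstract's point: accuracy on the unlabeled remainder indicates how well the current model could annotate the rest of the corpus in batch, which the test set alone does not show.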


research
07/21/2021

Small-text: Active Learning for Text Classification in Python

We present small-text, a simple modular active learning library, which o...
research
01/31/2022

POTATO: exPlainable infOrmation exTrAcTion framewOrk

We present POTATO, a task- and language-independent framework for human-i...
research
01/28/2022

Dominant Set-based Active Learning for Text Classification and its Application to Online Social Media

Recent advances in natural language processing (NLP) in online social me...
research
04/05/2022

Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks

We introduce Dynatask: an open source system for setting up custom NLP t...
research
06/15/2023

Re-Benchmarking Pool-Based Active Learning for Binary Classification

Active learning is a paradigm that significantly enhances the performanc...
research
04/20/2022

Active Few-Shot Learning with FASL

Recent advances in natural language processing (NLP) have led to strong ...
research
06/02/2023

Active Code Learning: Benchmarking Sample-Efficient Training of Code Models

The costly human effort required to prepare the training data of machine...
