Multi-Source (Pre-)Training for Cross-Domain Measurement, Unit and Context Extraction

08/05/2023
by Yueling Li et al.

We present a cross-domain approach for automated measurement and context extraction based on pre-trained language models. We construct a multi-source, multi-domain corpus and train an end-to-end extraction pipeline. We then apply multi-source task-adaptive pre-training and fine-tuning to benchmark the cross-domain generalization capability of our model. Further, we conceptualize and apply a task-specific error analysis and derive insights for future work. Our results suggest that multi-source training leads to the best overall results, while single-source training yields the best results for the respective individual domain. While our setup is successful at extracting quantity values and units, more research is needed to improve the extraction of contextual entities. We make the cross-domain corpus used in this work available online.
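The abstract describes a two-stage recipe: task-adaptive pre-training on a pooled multi-source corpus, followed by fine-tuning for extraction of quantities, units, and contextual entities. The sketch below illustrates that recipe with masked-language-model adaptation and a token-classification head in HuggingFace Transformers; the base model, corpus file names, BIO label set, and hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch of multi-source task-adaptive pre-training (TAPT) followed by
# fine-tuning, assuming a BERT-style encoder. File names, labels, and
# hyperparameters are hypothetical placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# --- Stage 1: task-adaptive MLM pre-training on the pooled multi-domain corpus
# Hypothetical plain-text files, one per source domain.
corpus = load_dataset(
    "text",
    data_files={"train": ["domain_a.txt", "domain_b.txt", "domain_c.txt"]},
)["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

corpus = corpus.map(tokenize, batched=True, remove_columns=["text"])

mlm_model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="tapt", num_train_epochs=3),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm_probability=0.15
    ),
).train()
mlm_model.save_pretrained("tapt")
tokenizer.save_pretrained("tapt")

# --- Stage 2: fine-tune the adapted encoder for span extraction -------------
# Illustrative BIO labels for quantity values, units, and contextual entities.
labels = ["O", "B-QUANT", "I-QUANT", "B-UNIT", "I-UNIT", "B-CONTEXT", "I-CONTEXT"]
ner_model = AutoModelForTokenClassification.from_pretrained(
    "tapt", num_labels=len(labels)
)
# ...fine-tune ner_model with a token-classification Trainer on the
# annotated measurement corpus.
```

Under this setup, the single-source condition the abstract compares against would correspond to passing only one domain file to data_files in Stage 1, while the multi-source condition pools all of them.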


Related research

12/08/2020 · CrossNER: Evaluating Cross-Domain Named Entity Recognition
08/21/2023 · Contrastive Graph Prompt-tuning for Cross-domain Recommendation
02/23/2022 · UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training
05/05/2023 · Harnessing the Power of BERT in the Turkish Clinical Domain: Pretraining Approaches for Limited Data Scenarios
05/18/2023 · Silver Syntax Pre-training for Cross-Domain Relation Extraction
02/27/2023 · Fluid Transformers and Creative Analogies: Exploring Large Language Models' Capacity for Augmenting Cross-Domain Analogical Creativity
02/25/2023 · Prompt-based Learning for Text Readability Assessment
