Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language Models

02/09/2023
by Maciej P. Polak, et al.

Accurate and comprehensive material databases extracted from research papers are critical for materials science and engineering but require significant human effort to develop. In this paper we present a simple method of extracting materials data from the full texts of research papers, suitable for quickly developing modest-sized databases. The method requires minimal to no coding, prior knowledge about the extracted property, or model training, and provides high recall and almost perfect precision in the resultant database. The method is fully automated except for one human-assisted step, which typically requires just a few hours of human labor. The method builds on top of natural language processing and large general language models but can work with almost any such model. The language models GPT-3/3.5, BART and DeBERTaV3 are evaluated here for comparison. We provide a detailed analysis of the method's performance in extracting bulk modulus data, obtaining up to 90% precision at 96% recall, depending on the amount of human effort involved. We then demonstrate the method's broader effectiveness by developing a database of critical cooling rates for metallic glasses.
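The precision and recall figures quoted in the abstract can be made concrete with a small calculation. The sketch below is only an illustration of how the two metrics are defined, not the authors' evaluation code; the function name `precision_recall` and the sentence IDs are hypothetical and chosen purely for the example.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for predicted items against a gold set.

    precision = correct predictions / all predictions
    recall    = correct predictions / all relevant items
    """
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: IDs of sentences a model flagged as containing a
# property value, and IDs a human annotator marked as truly relevant.
flagged = {1, 2, 3, 4, 5}
gold = {2, 3, 4, 5, 6, 7}

p, r = precision_recall(flagged, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case four of the five flagged sentences are correct and four of the six relevant sentences are found. A human review step that discards false positives raises precision, while recall depends on how many relevant sentences the model surfaced in the first place.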

research
03/07/2023

Extracting Accurate Materials Data from Research Papers with Conversational Language Models and Prompt Engineering – Example of ChatGPT

There has been a growing effort to replace hand extraction of data from ...
research
09/27/2022

A general-purpose material property data extraction pipeline from large polymer corpora using Natural Language Processing

The ever-increasing number of materials science articles makes it hard t...
research
12/10/2022

Structured information extraction from complex scientific text with fine-tuned large language models

Intelligently extracting and linking complex scientific information from...
research
06/03/2023

Towards Coding Social Science Datasets with Language Models

Researchers often rely on humans to code (label, annotate, etc.) large s...
research
01/05/2021

Looking Through Glass: Knowledge Discovery from Materials Science Literature using Natural Language Processing

Most of the knowledge in materials science literature is in the form of ...
research
06/23/2023

LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding

Deductive coding is a widely used qualitative research method for determ...
research
03/30/2023

Recognition, recall, and retention of few-shot memories in large language models

The training of modern large language models (LLMs) takes place in a reg...
