Med-EASi: Finely Annotated Dataset and Models for Controllable Simplification of Medical Texts

02/17/2023
by Chandrayee Basu, et al.

Automatic medical text simplification can assist providers with patient-friendly communication and make medical texts more accessible, thereby improving health literacy. However, curating a high-quality corpus for this task requires supervision by medical experts. In this work, we present Med-EASi (Medical dataset for Elaborative and Abstractive Simplification), a uniquely crowdsourced and finely annotated dataset for supervised simplification of short medical texts. Its expert-layman-AI collaborative annotations facilitate controllability over text simplification by marking four kinds of textual transformations: elaboration, replacement, deletion, and insertion. To learn medical text simplification, we fine-tune T5-large with four different styles of input-output combinations, leading to two control-free and two controllable versions of the model. Using a multi-angle training approach, we add two types of controllability to text simplification: position-aware, which uses in-place annotated inputs and outputs, and position-agnostic, where the model knows only the contents to be edited, not their positions. Our results show that our fine-grained annotations improve learning compared to the unannotated baseline. Furthermore, position-aware control generates better simplifications than position-agnostic control. The data and code are available at https://github.com/Chandrayee/CTRL-SIMP.
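To make the difference between the two control styles concrete, here is a minimal Python sketch of how an annotated sentence might be turned into a position-aware input versus a position-agnostic one. The `Edit` class, the inline tag format, and the ` | edits: ` separator are all hypothetical choices made for illustration; they are not taken from the Med-EASi release, whose actual input format should be checked in the linked repository.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical annotation: an edit type plus a character span."""
    kind: str   # "elaboration", "replacement", "deletion", or "insertion"
    start: int  # offset where the annotated span begins
    end: int    # offset where the span ends (exclusive)


def position_aware(text: str, edits: list[Edit]) -> str:
    """Wrap each annotated span in inline tags, so its position is visible."""
    pieces, cursor = [], 0
    for e in sorted(edits, key=lambda e: e.start):
        pieces.append(text[cursor:e.start])
        pieces.append(f"<{e.kind}>{text[e.start:e.end]}</{e.kind}>")
        cursor = e.end
    pieces.append(text[cursor:])
    return "".join(pieces)


def position_agnostic(text: str, edits: list[Edit]) -> str:
    """Leave the text unmarked and list the spans to edit after it."""
    listed = "; ".join(
        f"{e.kind}: {text[e.start:e.end]}"
        for e in sorted(edits, key=lambda e: e.start)
    )
    return f"{text} | edits: {listed}"


# Illustrative sentence (not from the dataset).
text = "The patient has hypertension."
start = text.index("hypertension")
edits = [Edit("replacement", start, start + len("hypertension"))]

print(position_aware(text, edits))
print(position_agnostic(text, edits))
```

In the first form the model sees where the term to be replaced sits in the sentence; in the second it sees only which term is to be replaced and must locate it itself.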


