CHiLL: Zero-shot Custom Interpretable Feature Extraction from Clinical Notes with Large Language Models

by Denis Jered McInerney, et al.

Large Language Models (LLMs) have yielded fast and dramatic progress in NLP, and now offer strong few- and zero-shot capabilities on new tasks, reducing the need for annotation. This is especially exciting for the medical domain, in which supervision is often scant and expensive. At the same time, model predictions are rarely so accurate that they can be trusted blindly. Clinicians therefore tend to favor "interpretable" classifiers over opaque LLMs. For example, risk prediction tools are often linear models defined over manually crafted predictors that must be laboriously extracted from EHRs. We propose CHiLL (Crafting High-Level Latents), which uses LLMs to permit natural language specification of high-level features for linear models via zero-shot feature extraction using expert-composed queries. This approach has the promise to empower physicians to use their domain expertise to craft features which are clinically meaningful for a downstream task of interest, without having to manually extract these from raw EHR (as often done now). We are motivated by a real-world risk prediction task, but as a reproducible proxy, we use MIMIC-III and MIMIC-CXR data and standard predictive tasks (e.g., 30-day readmission) to evaluate our approach. We find that linear models using automatically extracted features are comparably performant to models using reference features, and provide greater interpretability than linear models using "Bag-of-Words" features. We verify that learned feature weights align well with clinical expectations.
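The core idea, turning a clinician's natural-language questions into binary features for a linear model, can be sketched in a few lines. The sketch below is illustrative only: the queries, the `keyword_llm` stand-in, and all function names are assumptions for demonstration, not the paper's implementation. In practice the `answer` callable would wrap a zero-shot LLM query over the note.

```python
from typing import Callable, List

# Hypothetical expert-composed yes/no queries (illustrative, not from the paper).
QUERIES = [
    "Does the patient have a history of heart failure?",
    "Is the patient on anticoagulants?",
    "Does the note mention shortness of breath?",
]

def extract_features(note: str, answer: Callable[[str, str], bool],
                     queries: List[str] = QUERIES) -> List[int]:
    """Map a clinical note to a binary feature vector by asking the
    answer function (e.g., a zero-shot LLM) one query at a time."""
    return [int(answer(note, q)) for q in queries]

# Stand-in for an LLM: a trivial keyword matcher, used only so the
# sketch runs without model access.
def keyword_llm(note: str, query: str) -> bool:
    keywords = {
        QUERIES[0]: "heart failure",
        QUERIES[1]: "warfarin",
        QUERIES[2]: "shortness of breath",
    }
    return keywords[query] in note.lower()

note = "Pt with chronic heart failure, started on warfarin."
features = extract_features(note, keyword_llm)
print(features)  # [1, 1, 0]
```

The resulting vectors would then be fed to an ordinary linear classifier (e.g., logistic regression), so each learned weight corresponds directly to one clinician-authored question, which is what makes the model's predictions inspectable.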


Related work:

- Large Language Models are Zero-Shot Clinical Information Extractors
- A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks
- An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing
- Almanac: Knowledge-Grounded Language Models for Clinical Medicine
- CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition
