Inference for the Case Probability in High-dimensional Logistic Regression

12/13/2020
by Zijian Guo, et al.

Labeling patients in electronic health records by disease or condition status, i.e., as cases or controls, increasingly relies on prediction models that use high-dimensional variables derived from structured and unstructured electronic health record data. A major hurdle is the lack of valid statistical inference methods for the case probability. In this paper, we consider high-dimensional sparse logistic regression models for prediction and propose a novel bias-corrected estimator for the case probability, developed through linearization and variance enhancement techniques. We establish asymptotic normality of the proposed estimator for any loading vector in high dimensions. We construct a confidence interval for the case probability and propose a hypothesis testing procedure for patient case-control labeling. We demonstrate the proposed method through extensive simulation studies and an application to real-world electronic health record data.
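
To make the last two steps concrete, the following is a minimal sketch of how a confidence interval and a labeling test for the case probability might be formed once an asymptotically normal estimate of the linear predictor is available. It is not the paper's procedure: the bias correction and variance enhancement are not reproduced, and the inputs `linear_est` and `std_err`, the function names, and the 0.5 labeling threshold are all assumptions made for illustration. The sketch relies only on the fact that the logistic function is monotone, so an interval for the linear predictor maps to an interval for the probability.

```python
import math
from statistics import NormalDist


def expit(t: float) -> float:
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def case_probability_interval(linear_est: float, std_err: float, alpha: float = 0.05):
    """Return (lower, point, upper) for the case probability.

    linear_est and std_err are hypothetical inputs: an estimate of the
    linear predictor x'beta and its standard error, assumed to be
    asymptotically normal. The interval for x'beta is mapped through
    the monotone logistic function.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    lower = linear_est - z * std_err
    upper = linear_est + z * std_err
    return expit(lower), expit(linear_est), expit(upper)


def label_as_case(linear_est: float, std_err: float, alpha: float = 0.05) -> bool:
    """One-sided test of H0: case probability <= 0.5 (i.e. x'beta <= 0).

    The 0.5 threshold is an assumption for illustration. Returns True
    when H0 is rejected at level alpha.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    return linear_est - z * std_err > 0.0


if __name__ == "__main__":
    lo, p, hi = case_probability_interval(1.2, 0.4)
    print(f"estimate {p:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
    print("label as case:", label_as_case(1.2, 0.4))
```

Mapping the interval endpoints through the logistic function keeps the interval inside (0, 1), which a symmetric interval built directly on the probability scale would not guarantee.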


research
09/07/2021

SIHR: An R Package for Statistical Inference in High-dimensional Linear and Logistic Regression Models

We introduce and illustrate through numerical examples the R package wh...
research
04/29/2019

Individualized Treatment Selection: An Optimal Hypothesis Testing Approach In High-dimensional Models

The ability to predict individualized treatment effects (ITEs) based on ...
research
01/20/2022

A Visual Analytics Approach to Building Logistic Regression Models and its Application to Health Records

Multidimensional data analysis has become increasingly important in many...
research
01/30/2019

Electronic Health Record Phenotyping with Internally Assessable Performance (PhIAP) using Anchor-Positive and Unlabeled Patients

Building phenotype models using electronic health record (EHR) data conv...
research
03/23/2021

On the global identifiability of logistic regression models with misclassified outcomes

In the last decade, the secondary use of large data from health systems,...
research
02/16/2019

Privacy Preserving Integrative Regression Analysis of High-dimensional Heterogeneous Data

Meta-analyzing multiple studies, enabling more precise estimation and in...
research
12/01/2017

Prediction-Constrained Topic Models for Antidepressant Recommendation

Supervisory signals can help topic models discover low-dimensional data ...
