Computationally Assisted Quality Control for Public Health Data Streams

06/29/2023
by Ananya Joshi, et al.

Irregularities in public health data streams (such as COVID-19 cases) hamper data-driven decision-making for public health stakeholders. A real-time, computer-generated list of the most important outlying data points from thousands of daily-updated public health data streams could help an expert reviewer identify these irregularities. However, existing outlier detection frameworks perform poorly on this task because they account for neither the data volume nor the statistical properties of public health streams. Accordingly, we developed FlaSH (Flagging Streams in public Health), a practical outlier detection framework for public health data users that uses simple, scalable models to capture these statistical properties explicitly. In an experiment where human experts evaluated FlaSH and existing methods (including deep learning approaches), FlaSH scaled to the data volume of this task, matched or exceeded the other methods in mean accuracy, and identified outlier points that users rated as more helpful. Based on these results, FlaSH has been deployed on data streams used by public health stakeholders.
