Improving Screening Processes via Calibrated Subset Selection

02/02/2022
by Lequn Wang, et al.

Many selection processes, such as finding patients who qualify for a medical trial or retrieval pipelines in search engines, consist of multiple stages, where an initial screening stage focuses resources on shortlisting the most promising candidates. In this paper, we investigate what guarantees a screening classifier can provide, independently of whether it is constructed manually or trained. We find that current solutions do not enjoy distribution-free theoretical guarantees: we show that, in general, even for a perfectly calibrated classifier, there always exist specific pools of candidates for which its shortlist is suboptimal. We then develop a distribution-free screening algorithm, called Calibrated Subset Selection (CSS), that, given any classifier and some amount of calibration data, finds near-optimal shortlists of candidates that contain a desired number of qualified candidates in expectation. Moreover, we show that a variant of our algorithm that calibrates a given classifier multiple times across specific groups can create shortlists with provable diversity guarantees. Experiments on US Census survey data validate our theoretical results and show that the shortlists provided by our algorithm are superior to those provided by several competitive baselines.
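
To make the idea of threshold calibration concrete, the following is a minimal Python sketch, not the paper's exact CSS procedure. It assumes the shortlist is formed by thresholding classifier scores, and it picks the highest threshold for which a conservative estimate of the expected number of qualified candidates in a pool reaches the target. The function names (`select_threshold`, `shortlist`), the parameter `alpha`, and the one-sided DKW-style margin `eps` are illustrative assumptions rather than details taken from the abstract.

```python
import math
from typing import List, Sequence


def select_threshold(
    cal_scores: Sequence[float],
    cal_labels: Sequence[int],
    pool_size: int,
    target_qualified: float,
    alpha: float = 0.1,
) -> float:
    """Return the highest score threshold whose conservative estimate of the
    expected number of qualified shortlisted candidates meets the target.

    Assumption: the margin eps is a one-sided DKW-style bound; the paper's
    actual correction term may differ.
    """
    n = len(cal_scores)
    if n == 0 or n != len(cal_labels):
        raise ValueError("calibration scores and labels must be non-empty and aligned")

    # Uniform deviation margin that holds with probability about 1 - alpha.
    eps = math.sqrt(math.log(1.0 / alpha) / (2.0 * n))

    # Scan candidate thresholds from the highest calibration score downwards.
    pairs = sorted(zip(cal_scores, cal_labels), key=lambda p: p[0], reverse=True)
    qualified_above = 0
    i = 0
    while i < n:
        threshold = pairs[i][0]
        # Absorb every calibration point tied at this score.
        while i < n and pairs[i][0] == threshold:
            qualified_above += pairs[i][1]
            i += 1
        # Empirical rate of "qualified and shortlisted", shrunk by the margin.
        lower_bound = pool_size * (qualified_above / n - eps)
        if lower_bound >= target_qualified:
            return threshold

    # No threshold certifies the target: fall back to shortlisting everyone.
    return float("-inf")


def shortlist(pool_scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of pool candidates whose score is at or above the threshold."""
    return [i for i, s in enumerate(pool_scores) if s >= threshold]
```

A call such as `shortlist(pool_scores, select_threshold(cal_scores, cal_labels, pool_size=len(pool_scores), target_qualified=5))` would then produce a shortlist for a new pool. Because the threshold is chosen only from held-out calibration data, this kind of sketch treats the classifier as a black box, which matches the "given any classifier" framing of the abstract.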


research
01/31/2023

On the Within-Group Discrimination of Screening Classifiers

Screening classifiers are increasingly used to identify qualified candid...
research
10/04/2022

Selection by Prediction with Conformal p-values

Decision making or scientific discovery pipelines such as job hiring and...
research
05/30/2022

Fairness in the First Stage of Two-Stage Recommender Systems

Many large-scale recommender systems consist of two stages, where the fi...
research
03/09/2023

Improving computation efficiency using input and architecture features for a virtual screening application

Virtual screening is an early stage of the drug discovery process that s...
research
09/23/2021

Optimal Decision Making in High-Throughput Virtual Screening Pipelines

Effective selection of the potential candidates that meet certain condit...
research
07/17/2023

Towards Automated Design of Riboswitches

Experimental screening and selection pipelines for the discovery of nove...
research
07/31/2019

SketchyCoreSVD: SketchySVD from Random Subsampling of the Data Matrix

We present a method called SketchyCoreSVD to compute the near-optimal ra...
