DIVINE: Diverse Influential Training Points for Data Visualization and Model Refinement

by Umang Bhatt, et al.

As the complexity of machine learning (ML) models increases, resulting in a lack of prediction explainability, several methods have been developed to explain a model's behavior in terms of the training data points that most influence the model. However, these methods tend to mark outliers as highly influential points, limiting the insights that practitioners can draw from points that are not representative of the training data. In this work, we take a step towards finding influential training points that also represent the training data well. We first review methods for assigning importance scores to training points. Given importance scores, we propose a method to select a set of DIVerse INfluEntial (DIVINE) training points as a useful explanation of model behavior. As practitioners might not only be interested in finding data points influential with respect to model accuracy, but also with respect to other important metrics, we show how to evaluate training data points on the basis of group fairness. Our method can identify unfairness-inducing training points, which can be removed to improve fairness outcomes. Our quantitative experiments and user studies show that visualizing DIVINE points helps practitioners understand and explain model behavior better than earlier approaches.
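The selection step described above, choosing a small set of points that are both influential and representative, can be illustrated with a minimal sketch. It assumes importance scores have already been computed by some scoring method. The diversity term used here is a simple greedy redundancy penalty (in the style of maximal marginal relevance) over an RBF similarity, chosen for illustration only; it is not necessarily the objective used by the authors. The function name `select_diverse_influential` and the parameters `gamma` and `m` are hypothetical.

```python
import numpy as np


def select_diverse_influential(scores, features, m, gamma=1.0):
    """Greedily pick m training-point indices, trading off importance and diversity.

    scores   : per-point importance scores (assumed precomputed, higher = more influential)
    features : array of shape (n, d) used to measure similarity between points
    m        : number of points to select
    gamma    : hypothetical trade-off weight; gamma=0 reduces to plain top-m by score
    """
    scores = np.asarray(scores, dtype=float)
    features = np.asarray(features, dtype=float)
    n = len(scores)

    # Pairwise RBF similarity in [0, 1]; 1 means the two points are identical.
    sq_dists = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    sim = np.exp(-sq_dists)

    selected = []
    for _ in range(min(m, n)):
        best_idx, best_gain = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            # Penalize a candidate by its similarity to the closest point already chosen.
            redundancy = max((sim[i, j] for j in selected), default=0.0)
            gain = scores[i] - gamma * redundancy
            if gain > best_gain:
                best_idx, best_gain = i, gain
        selected.append(best_idx)
    return selected


# Toy data: points 0 and 1 are duplicates with high scores; point 2 is far away.
toy_scores = [1.0, 0.9, 0.5]
toy_features = [[0.0, 0.0], [0.0, 0.0], [5.0, 5.0]]
print(select_diverse_influential(toy_scores, toy_features, m=2, gamma=1.0))
print(select_diverse_influential(toy_scores, toy_features, m=2, gamma=0.0))
```

With `gamma=0` the rule falls back to ranking by score alone, which in this toy example returns the two duplicate points; a positive `gamma` swaps the duplicate for the distant point, which is the kind of behavior the abstract motivates.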




