SHAPr: An Efficient and Versatile Membership Privacy Risk Metric for Machine Learning

by Vasisht Duddu, et al.

Data used to train machine learning (ML) models can be sensitive. Membership inference attacks (MIAs), which attempt to determine whether a particular data record was used to train an ML model, risk violating membership privacy. ML model builders need a principled definition of a metric that enables them to quantify the privacy risk of (a) individual training data records, (b) independently of specific MIAs, (c) efficiently. None of the prior work on membership privacy risk metrics simultaneously meets all of these criteria. We propose such a metric, SHAPr, which uses Shapley values to quantify a model's memorization of an individual training data record by measuring its influence on the model's utility. This memorization is a measure of the likelihood of a successful MIA. Using ten benchmark datasets, we show that SHAPr is effective (precision: 0.94 ± 0.06, recall: 0.88 ± 0.06) in estimating the susceptibility of a training data record to MIAs, and is efficient (computable within minutes for smaller datasets and in about 90 minutes for the largest dataset). SHAPr is also versatile in that it can be used for other purposes, such as assessing fairness or assigning a valuation to subsets of a dataset. For example, we show that SHAPr correctly captures the disproportionate vulnerability of different subgroups to MIAs. Using SHAPr, we show that the membership privacy risk of a dataset is not necessarily improved by removing high-risk training data records, thereby confirming an observation from prior work in a significantly extended setting (in ten datasets, removing up to 50




Systematic Evaluation of Privacy Risks of Machine Learning Models

Machine learning models are prone to memorizing sensitive data, making t...

Enhanced Membership Inference Attacks against Machine Learning Models

How much does a given trained model leak about each individual data reco...

On Inferring Training Data Attributes in Machine Learning Models

A number of recent works have demonstrated that API access to machine le...

Who's responsible? Jointly quantifying the contribution of the learning algorithm and training data

A fancy learning algorithm A outperforms a baseline method B when they a...

Private Training Set Inspection in MLaaS

Machine Learning as a Service (MLaaS) is a popular cloud-based solution ...

Membership Encoding for Deep Learning

Machine learning as a service (MLaaS), and algorithm marketplaces are on...

The Natural Auditor: How To Tell If Someone Used Your Words To Train Their Model

To help enforce data-protection regulations such as GDPR and detect unau...
