Differentially Private Distributed Data Summarization under Covariate Shift

by Kanthi Sarpatwar, et al.

We envision AI marketplaces as platforms where consumers, who have very little data for a target task, can obtain a relevant model by accessing many private data sources with a vast number of data samples. One of the key challenges is to construct a training dataset that matches a target task without compromising the privacy of the data sources. To this end, we consider the following distributed data summarization problem. Given K private source datasets denoted by {D_i}_{i∈[K]} and a small target validation set D_v, which may involve a considerable covariate shift with respect to the sources, compute a summary dataset D_s ⊆ ⋃_{i∈[K]} D_i such that its statistical distance from the validation dataset D_v is minimized. We use the popular Maximum Mean Discrepancy (MMD) as the measure of statistical distance. The non-private problem has received considerable attention in prior art, for example in prototype selection (Kim et al., NIPS 2016). Our work is the first to obtain strong differential privacy guarantees while preserving the quality guarantees of the non-private version. We study this problem in a Parsimonious Curator Privacy Model, where a trusted curator coordinates the summarization process while minimizing the amount of private information accessed. Our central result is a novel protocol that (a) ensures the curator accesses at most O(K^{1/3}|D_s| + |D_v|) points, (b) has formal privacy guarantees on the leakage of information between the data owners, and (c) closely matches the best known non-private greedy algorithm. Our protocol uses two hash functions: one inspired by the Rahimi-Recht random features method and the other leveraging state-of-the-art differential privacy mechanisms. We introduce a novel "noiseless" differentially private auctioning protocol for winner notification and demonstrate the efficacy of our protocol using real-world datasets.
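To make the MMD objective concrete, below is a minimal sketch of how a statistical distance between a candidate summary and the validation set can be approximated with Rahimi-Recht random Fourier features, as referenced in the abstract. This is an illustrative example only: the function names, the RBF kernel choice, and all parameters (`n_features`, `gamma`) are assumptions, not the paper's actual protocol or its privacy mechanisms.

```python
import numpy as np

def random_fourier_features(X, n_features=256, gamma=1.0, seed=0):
    """Rahimi-Recht random features approximating an RBF kernel.
    X: (n, d) array of points; returns an (n, n_features) feature map."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the Fourier transform of the RBF kernel.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def mmd_rff(X, Y, n_features=256, gamma=1.0, seed=0):
    """Approximate MMD(X, Y) as the distance between mean feature embeddings.
    The same seed must be used for both sets so the features are shared."""
    mu_x = random_fourier_features(X, n_features, gamma, seed).mean(axis=0)
    mu_y = random_fourier_features(Y, n_features, gamma, seed).mean(axis=0)
    return float(np.linalg.norm(mu_x - mu_y))
```

A summary D_s that matches D_v well yields a small `mmd_rff(D_s, D_v)`; a covariate-shifted source yields a larger value, which is the quantity the (non-private) greedy selection would drive down.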


