P3GM: Private High-Dimensional Data Release via Privacy Preserving Phased Generative Model

by Shun Takagi, et al.
Kyoto University

How can we release a massive volume of sensitive data while mitigating privacy risks? Privacy-preserving data synthesis enables the data holder to outsource analytical tasks to an untrusted third party. The state-of-the-art approach to this problem is to build a generative model under differential privacy, which offers a rigorous privacy guarantee. However, existing methods cannot adequately handle high-dimensional data. In particular, when the input dataset contains a large number of features, existing techniques must inject a prohibitive amount of noise to satisfy differential privacy, which renders the outsourced data analysis meaningless. To address this issue, this paper proposes the privacy-preserving phased generative model (P3GM), a differentially private generative model for releasing such sensitive data. P3GM employs a two-phase learning process to make training robust against the injected noise and to increase learning efficiency (e.g., faster convergence). We give theoretical analyses of the learning complexity and privacy loss in P3GM. We further evaluate the proposed method experimentally and demonstrate that P3GM significantly outperforms existing solutions: compared with state-of-the-art methods, our generated samples contain less noise and are closer to the original data in terms of diversity. Moreover, in several data mining tasks on the synthesized data, our model outperforms the competitors in terms of accuracy.
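The two-phase idea can be sketched generically: first learn a dimensionality-reducing encoder under differential privacy, then freeze it and privately train the generative part, so each phase solves a simpler, easier-to-converge problem. Below is a minimal illustrative sketch using DP-SGD-style clipped and noised gradients on a toy linear autoencoder; the model, loss, and hyperparameters are assumptions for illustration, not the paper's actual construction.

```python
import numpy as np

# Illustrative sketch only: a toy two-phase training loop with DP-SGD-style
# noisy gradients. The architecture and hyperparameters are assumptions,
# not the actual P3GM construction.
rng = np.random.default_rng(0)

def dp_mean_gradient(per_example_grads, clip_norm=1.0, noise_mult=0.5):
    """Clip each per-example gradient to `clip_norm`, sum, add Gaussian
    noise calibrated to the clipping norm, and average (DP-SGD style)."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_mult * clip_norm, size=clipped.shape[1])
    return noisy_sum / len(per_example_grads)

# Toy linear autoencoder: x -> z = x @ W_enc -> x_hat = z @ W_dec.
n, d, k = 64, 8, 3
X = rng.normal(size=(n, d))
W_enc = 0.1 * rng.normal(size=(d, k))
W_dec = 0.1 * rng.normal(size=(k, d))

def grads(X, W_enc, W_dec, wrt):
    """Per-example gradients of the reconstruction error (up to a constant)."""
    Z = X @ W_enc
    R = Z @ W_dec - X  # residuals
    if wrt == "enc":   # dL_i/dW_enc = x_i^T (r_i W_dec^T)
        return np.stack([np.outer(X[i], R[i] @ W_dec.T).ravel()
                         for i in range(n)])
    return np.stack([np.outer(Z[i], R[i]).ravel()  # dL_i/dW_dec = z_i^T r_i
                     for i in range(n)])

init_err = np.mean((X @ W_enc @ W_dec - X) ** 2)
lr = 0.05

# Phase 1: privately learn the dimensionality-reducing encoder.
for _ in range(50):
    g = dp_mean_gradient(grads(X, W_enc, W_dec, "enc"))
    W_enc -= lr * g.reshape(W_enc.shape)

# Phase 2: freeze the encoder, privately train the generative decoder.
for _ in range(50):
    g = dp_mean_gradient(grads(X, W_enc, W_dec, "dec"))
    W_dec -= lr * g.reshape(W_dec.shape)

recon_err = np.mean((X @ W_enc @ W_dec - X) ** 2)
```

The point of the split is that each phase perturbs gradients of a smaller, better-conditioned subproblem, which is one plausible reading of why phased training is more robust to the DP noise than training everything jointly.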




