DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization

by Honglu Jiang, et al.

Large volumes of high-dimensional, heterogeneous data arise in practical applications and are often published to third parties for data analysis, recommendation, targeted advertising, and reliable prediction. However, publishing such data may disclose sensitive personal information, raising growing concerns about privacy violations. Privacy-preserving data publishing has therefore received considerable attention in recent years, yet the differentially private publication of high-dimensional data remains a challenging problem. In this paper, we propose DP2-Pub, a differentially private high-dimensional data publication mechanism that runs in two phases: a Markov-blanket-based attribute clustering phase and an invariant post randomization (PRAM) phase. Splitting the attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling yields a reasonable allocation of the privacy budget, while a double-perturbation mechanism satisfying local differential privacy enables an invariant PRAM that ensures no loss of statistical information and thus preserves data utility well. We also extend DP2-Pub to a scenario with a semi-honest server, in which the extended mechanism satisfies local differential privacy. Extensive experiments on four real-world datasets show that our mechanism significantly improves the utility of the published data while satisfying differential privacy.
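The "invariant PRAM" idea can be illustrated with the classic construction from the statistical disclosure control literature. This is a generic sketch, not DP2-Pub's exact mechanism. The first perturbation matrix P is assumed here to be k-ary randomized response (which is ε-LDP). The second matrix Q is the Bayes posterior of the original category given the perturbed one, built from the attribute's marginal distribution π. The product R = PQ then leaves π unchanged in expectation, because π^T P Q = π^T. The function names `krr_matrix`, `invariant_pram`, and `perturb` are hypothetical. Treating π as public (or separately estimated) is an assumption; under it, applying Q is post-processing of the ε-LDP first step.

```python
import numpy as np


def krr_matrix(k: int, epsilon: float) -> np.ndarray:
    """Hypothetical first perturbation: k-ary randomized response (eps-LDP).

    Keep the true category with probability e^eps / (e^eps + k - 1);
    otherwise report each other category with probability 1 / (e^eps + k - 1).
    """
    e = np.exp(epsilon)
    P = np.full((k, k), 1.0 / (e + k - 1))
    np.fill_diagonal(P, e / (e + k - 1))
    return P


def invariant_pram(P: np.ndarray, pi: np.ndarray) -> np.ndarray:
    """Return R = P @ Q, where Q[k, l] = P[l, k] * pi[l] / (pi @ P)[k].

    Q is the posterior of the original value given the perturbed value, so
    pi @ R == pi: expected category frequencies are invariant.
    Assumes every entry of pi @ P is positive.
    """
    lam = pi @ P                                  # marginal after the first perturbation
    Q = (P.T * pi[np.newaxis, :]) / lam[:, np.newaxis]
    return P @ Q


def perturb(column: np.ndarray, R: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Replace each category index c by a draw from the distribution R[c]."""
    out = np.empty_like(column)
    for c in range(R.shape[0]):
        idx = np.flatnonzero(column == c)
        out[idx] = rng.choice(R.shape[0], size=idx.size, p=R[c])
    return out


# Usage on a synthetic categorical attribute with 3 categories.
rng = np.random.default_rng(0)
column = rng.choice(3, size=10_000, p=[0.6, 0.3, 0.1])
pi = np.bincount(column, minlength=3) / column.size
R = invariant_pram(krr_matrix(3, epsilon=1.0), pi)
print(np.allclose(pi @ R, pi))                    # True: invariance holds exactly
released = perturb(column, R, rng)
print(np.bincount(released, minlength=3) / released.size)  # differs from pi only by sampling noise
```

The invariance follows directly from the construction: (π^T P)_k Q_{kl} = p_{lk} π_l, and summing over k gives π_l Σ_k p_{lk} = π_l because each row of P sums to one. Each row of Q also sums to one, so R is a valid transition matrix.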


Differentially Private Empirical Risk Minimization with Input Perturbation

We propose a novel framework for the differentially private ERM, input p...

Differentially private sliced inverse regression in the federated paradigm

We extend the celebrated sliced inverse regression to address the challe...

Utility-efficient Differentially Private K-means Clustering based on Cluster Merging

Differential privacy is widely used in data analysis. State-of-the-art k...

Improving Utility for Privacy-Preserving Analysis of Correlated Columns using Pufferfish Privacy

Surveys are an important tool for many areas of social science research,...

HDPView: Differentially Private Materialized View for Exploring High Dimensional Relational Data

How can we explore the unknown properties of high-dimensional sensitive ...

Differentially Private Data Publication with Multi-level Data Utility

Conventional private data publication mechanisms aim to retain as much d...

Plausible Deniability for Privacy-Preserving Data Synthesis

Releasing full data records is one of the most challenging problems in d...
