Latent Dirichlet Allocation Model Training with Differential Privacy

10/09/2020
by Fangyuan Zhao, et al.

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovering hidden semantics in text data and serves as a fundamental tool for text analysis in various applications. However, both the LDA model and its training process may expose text information from the training data, raising significant privacy concerns. To address this privacy issue, we systematically investigate the privacy protection of the mainstream LDA training algorithm based on Collapsed Gibbs Sampling (CGS) and propose several differentially private LDA algorithms for typical training scenarios. In particular, we present the first theoretical analysis of the inherent differential privacy guarantee of CGS-based LDA training and further propose a centralized privacy-preserving algorithm (HDP-LDA) that can prevent data inference from the intermediate statistics in CGS training. We also propose a locally private LDA training algorithm (LP-LDA) on crowdsourced data to provide local differential privacy for individual data contributors. Furthermore, we extend LP-LDA to an online version, OLP-LDA, to achieve LDA training on locally private mini-batches in a streaming setting. Extensive analysis and experimental results validate both the effectiveness and efficiency of our proposed privacy-preserving LDA training algorithms.
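
To make the idea of protecting the intermediate statistics in CGS training more concrete, the following is a minimal sketch of collapsed Gibbs sampling for LDA in which the word-topic counts are perturbed with Laplace noise before each sampling step. It is not the HDP-LDA algorithm from the paper: the function name `noisy_cgs_lda`, the noise scale `1/epsilon`, the clipping step, and the choice to perturb only the word-topic counts are all assumptions made for illustration, and the sketch carries no formal privacy guarantee.

```python
import numpy as np

def noisy_cgs_lda(docs, vocab_size, num_topics, iters=20,
                  alpha=0.1, beta=0.01, epsilon=None, seed=0):
    """Collapsed Gibbs sampling for LDA with optional Laplace noise.

    docs is a list of documents, each a list of integer word ids.
    If epsilon is None, this is plain (non-private) CGS.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), num_topics))   # document-topic counts
    n_kw = np.zeros((num_topics, vocab_size))  # topic-word counts
    n_k = np.zeros(num_topics)                 # tokens per topic

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = rng.integers(num_topics, size=len(doc))
        z.append(z_d)
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1

                word_counts = n_kw[:, w]
                if epsilon is not None:
                    # Assumption: Laplace noise with scale 1/epsilon on the
                    # word-topic counts, clipped at zero. Illustrative only.
                    noise = rng.laplace(0.0, 1.0 / epsilon, size=num_topics)
                    word_counts = np.clip(word_counts + noise, 0.0, None)

                # Standard CGS conditional, using the (possibly noisy) counts.
                p = (n_dk[d] + alpha) * (word_counts + beta) \
                    / (n_k + vocab_size * beta)
                p /= p.sum()

                # Resample the topic and restore the exact counts.
                k = rng.choice(num_topics, p=p)
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    return n_kw, n_dk

# Toy corpus: three documents over a five-word vocabulary.
docs = [[0, 1, 2, 0], [3, 4, 3], [0, 3, 1]]
n_kw, n_dk = noisy_cgs_lda(docs, vocab_size=5, num_topics=2,
                           iters=5, epsilon=1.0)
```

In this sketch the noise only affects the sampling distribution; the stored counts remain exact, so they would still need protection before being released. A smaller `epsilon` means more noise and, presumably, lower topic quality.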


