A Submodularity-based Agglomerative Clustering Algorithm for the Privacy Funnel
We propose an efficient iterative agglomerative clustering algorithm for the privacy funnel (PF) problem, based on the minimization of the difference of submodular functions (IAC-MDSF). In the PF problem, a data curator wants to share data X that is correlated with sensitive information S; the goal is to generate sanitized data X̂ that maintains a specified utility/fidelity threshold on I(X; X̂) while minimizing the privacy leakage I(S; X̂). The IAC-MDSF algorithm starts with the original alphabet X̂ := X and, in each iteration, merges the subset of elements in the current alphabet X̂ that minimizes the Lagrangian function I(S; X̂) - λ I(X; X̂). We prove that the best merge in each iteration of IAC-MDSF can be searched efficiently over all subsets of X̂ using existing MDSF algorithms. We show that the IAC-MDSF algorithm also applies to the information bottleneck (IB), a dual problem to PF. By varying the value of the Lagrangian multiplier λ, we obtain experimental results on a heart disease data set in the form of the Pareto frontier of I(S; X̂) versus -I(X; X̂). These results show that IAC-MDSF outperforms the existing iterative pairwise merge approaches for both PF and IB and has much lower computational complexity.
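To make the merge objective concrete, the following is a minimal sketch of how the Lagrangian I(S; X̂) - λ I(X; X̂) could be evaluated for a candidate merge, assuming X̂ is a deterministic clustering of X (so that I(X; X̂) = H(X̂)). The function names and the toy joint distribution `p_sx` are hypothetical and not taken from the paper. The exhaustive subset search here only illustrates what "best merge over all subsets" means; it is exponential in the alphabet size and stands in for the MDSF solver that the abstract refers to.

```python
import math
from itertools import combinations


def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as rows joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (pa[a] * pb[b]))
    return mi


def lagrangian(p_sx, partition, lam):
    """I(S;Xhat) - lam * I(X;Xhat), where Xhat is the block index of X."""
    # Joint of (S, Xhat): add up the columns of p_sx inside each block.
    p_sxhat = [[sum(row[x] for x in block) for block in partition]
               for row in p_sx]
    # Xhat is a deterministic function of X, so I(X;Xhat) = H(Xhat).
    p_xhat = [sum(col) for col in zip(*p_sxhat)]
    h_xhat = -sum(p * math.log2(p) for p in p_xhat if p > 0)
    return mutual_information(p_sxhat) - lam * h_xhat


def best_merge_brute_force(p_sx, partition, lam):
    """Try merging every subset of two or more blocks (exponential cost).

    Illustrative stand-in for an MDSF solver; returns the partition with
    the lowest Lagrangian, or the input partition if no merge improves it.
    """
    best_value = lagrangian(p_sx, partition, lam)
    best_partition = partition
    n = len(partition)
    for size in range(2, n + 1):
        for chosen in combinations(range(n), size):
            merged = [x for i in chosen for x in partition[i]]
            candidate = [partition[i] for i in range(n) if i not in chosen]
            candidate.append(merged)
            value = lagrangian(p_sx, candidate, lam)
            if value < best_value - 1e-12:
                best_value, best_partition = value, candidate
    return best_partition, best_value


# Hypothetical toy joint p(s, x): 2 sensitive values, 4 data symbols.
p_sx = [[0.30, 0.10, 0.05, 0.05],
        [0.05, 0.05, 0.10, 0.30]]
singletons = [[0], [1], [2], [3]]  # starting point: Xhat := X
partition, value = best_merge_brute_force(p_sx, singletons, lam=0.3)
print(partition, value)
```

Calling `best_merge_brute_force` repeatedly on its own output until the partition stops changing would give an agglomerative loop in the spirit of the one described above. Sweeping `lam` would then trace out different trade-offs between leakage and utility.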