PRIVEE: A Visual Analytic Workflow for Proactive Privacy Risk Inspection of Open Data

by Kaustav Bhattacharjee, et al.

Open datasets that contain personal information are susceptible to adversarial attacks even when anonymized. By performing low-cost joins on multiple datasets with shared attributes, malicious users of open data portals can gain access to information that violates individuals' privacy. However, open datasets are primarily published using a release-and-forget model, whereby data owners and custodians have little to no cognizance of these privacy risks. We address this critical gap by developing a visual analytic solution that enables data defenders to gain awareness about the disclosure risks in local, joinable data neighborhoods. The solution is derived through a design study with data privacy researchers, where we initially play the role of a red team and engage in an ethical data hacking exercise based on privacy attack scenarios. We use this problem and domain characterization to develop a set of visual analytic interventions as a defense mechanism and realize them in PRIVEE, a visual risk inspection workflow that acts as a proactive monitor for data defenders. PRIVEE uses a combination of risk scores and associated interactive visualizations to let data defenders explore vulnerable joins and interpret risks at multiple levels of data granularity. We demonstrate how PRIVEE can help emulate the attack strategies and diagnose disclosure risks through two case studies with data privacy experts.
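To make the join-based threat concrete, the following is a minimal sketch of how two individually anonymized datasets can be linked on shared attributes. It is not PRIVEE's implementation or its risk score: the records, the attribute names (`age`, `zip`, `gender`), and the helpers `join_on_shared` and `unique_match_fraction` are all hypothetical, chosen only to illustrate the idea that a record matching exactly one partner in another dataset is a likely disclosure candidate.

```python
from collections import defaultdict


def join_on_shared(left, right, keys):
    """Inner-join two lists of dict records on the shared attribute names in `keys`."""
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            # Merge the two records; shared attributes have equal values.
            joined.append({**row, **match})
    return joined


def unique_match_fraction(left, right, keys):
    """Fraction of `left` records that match exactly one `right` record.

    Illustrative proxy only (an assumption of this sketch), not the
    risk score used by PRIVEE.
    """
    counts = defaultdict(int)
    for row in right:
        counts[tuple(row[k] for k in keys)] += 1
    if not left:
        return 0.0
    unique = sum(
        1 for row in left if counts.get(tuple(row[k] for k in keys), 0) == 1
    )
    return unique / len(left)


# Hypothetical toy records; no real data is used here.
health = [
    {"age": 34, "zip": "07102", "gender": "F", "diagnosis": "asthma"},
    {"age": 51, "zip": "07103", "gender": "M", "diagnosis": "diabetes"},
    {"age": 29, "zip": "07102", "gender": "M", "diagnosis": "flu"},
]
permits = [
    {"age": 34, "zip": "07102", "gender": "F", "permit_id": "P-1"},
    {"age": 29, "zip": "07102", "gender": "M", "permit_id": "P-2"},
    {"age": 29, "zip": "07102", "gender": "M", "permit_id": "P-3"},
]
shared = ["age", "zip", "gender"]

joined = join_on_shared(health, permits, shared)
fraction = unique_match_fraction(health, permits, shared)
```

In this toy setup, the first health record links to exactly one permit record, so its diagnosis becomes attached to a single permit holder, while the third record matches two candidates and remains ambiguous. A defender could run such a check over pairs of joinable datasets before release.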



Power to the Data Defenders: Human-Centered Disclosure Risk Calibration of Open Data

The open data ecosystem is susceptible to vulnerabilities due to disclos...

Privacy with Good Taste: A Case Study in Quantifying Privacy Risks in Genetic Scores

Analysis of genetic data opens up many opportunities for medical and sci...

Toward Evaluating Re-identification Risks in the Local Privacy Model

LDP (Local Differential Privacy) has recently attracted much attention a...

Revealing Cumulative Risks in Online Personal Information: A Data Narrative Study

When pieces from an individual's personal information available online a...

From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy

Undoubtedly, the evolution of Generative AI (GenAI) models has been the ...

InfoScrub: Towards Attribute Privacy by Targeted Obfuscation

Personal photos of individuals when shared online, apart from exhibiting...

Open Data, Grey Data, and Stewardship: Universities at the Privacy Frontier

As universities recognize the inherent value in the data they collect an...
