Labeling without Seeing? Blind Annotation for Privacy-Preserving Entity Resolution

08/07/2023
by Yixiang Yao, et al.

The entity resolution problem requires finding pairs of records across datasets that belong to different owners but refer to the same real-world entity. Training and evaluating solutions to this problem (either rule-based or machine-learning-based) requires a ground truth dataset of entity pairs or clusters. However, producing such annotations involves humans acting as domain oracles who review the plaintext data for all candidate record pairs from different parties, which inevitably infringes on the privacy of data owners, especially in privacy-sensitive cases such as medical records. To the best of our knowledge, there is no prior work on privacy-preserving ground truth dataset generation, especially in the domain of entity resolution. We propose a novel blind annotation protocol based on homomorphic encryption that allows domain oracles to collaboratively label ground truths without sharing plaintext data with other parties. In addition, we design a domain-specific, easy-to-use language that hides the sophisticated underlying homomorphic encryption layer. A rigorous proof of the privacy guarantee is provided, and our empirical experiments with an annotation simulator indicate the feasibility of our privacy-preserving protocol (the f-measure averages more than 90% when measured against the real ground truths).
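As an illustration of the evaluation metric mentioned above, here is a minimal sketch of how an f-measure could be computed between the pairs labeled by a blind annotation process and the real ground truth pairs. The function name `f_measure`, the record identifiers, and the representation of matches as sets of ID pairs are assumptions made for this example; they are not taken from the paper.

```python
from typing import Iterable, Set, Tuple

# A matched pair is (record id from owner A, record id from owner B).
# This representation is an assumption for illustration only.
Pair = Tuple[str, str]


def f_measure(predicted: Iterable[Pair], truth: Iterable[Pair]) -> float:
    """Return the F1 score of predicted match pairs against true match pairs."""
    pred: Set[Pair] = set(predicted)
    gold: Set[Pair] = set(truth)
    true_positives = len(pred & gold)
    if true_positives == 0:
        # Covers empty inputs and the case of no overlap at all.
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: three true matches, two of which were recovered.
real_ground_truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
blind_annotations = {("a1", "b1"), ("a2", "b2"), ("a4", "b9")}

score = f_measure(blind_annotations, real_ground_truth)
print(f"f-measure: {score:.3f}")
```

In this toy case precision and recall are both 2/3, so the f-measure is also 2/3. The abstract reports an average above 90% for the actual protocol.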

