De-identification of Privacy-related Entities in Job Postings

De-identification is the task of detecting privacy-related entities in text, such as person names, email addresses and contact data. It has been well studied within the medical domain. The need for de-identification technology is increasing, as privacy-preserving data handling is in high demand in many domains. In this paper, we focus on job postings. We present JobStack, a new corpus for de-identification of personal data in job vacancies on Stackoverflow. We introduce baselines, comparing Long Short-Term Memory (LSTM) and Transformer models. To improve upon these baselines, we experiment with contextualized embeddings and distantly related auxiliary data via multi-task learning. Our results show that auxiliary data improves de-identification performance. Surprisingly, vanilla BERT turned out to be more effective than a BERT model trained on other portions of Stackoverflow.
