The Data Airlock: infrastructure for restricted data informatics

03/17/2022
by Gregory Rolan, et al.

Data science collaboration is problematic when access to operational data or models from outside the data-holding organisation is prohibited for legal, security, ethical, or practical reasons. Performing collaborative data science work against such restricted data poses significant data privacy challenges. In this paper we describe a range of causes and risks associated with restricted data, along with the social, environmental, data, and cryptographic measures that may be used to mitigate them. We then show that these measures are generally inadequate in restricted data contexts and introduce the 'Data Airlock', secure infrastructure that facilitates 'eyes-off' data science workloads. After describing our use case, we detail the architecture and implementation of a first, single-organisation version of the Data Airlock infrastructure. We conclude with outcomes and lessons learned from this implementation, and outline requirements for a second, federated version.
