Censoring chemical data to mitigate dual use risk

04/20/2023
by Quintina L. Campbell, et al.

The dual use of machine learning applications, where models can be used for both beneficial and malicious purposes, presents a significant challenge. This has recently become a particular concern in chemistry, where chemical datasets containing sensitive labels (e.g. toxicological information) could be used to develop predictive models that identify novel toxins or chemical warfare agents. To mitigate dual use risks, we propose a model-agnostic method of selectively noising datasets while preserving the utility of the data for training deep neural networks in a beneficial region. We evaluate the effectiveness of the proposed method across least squares, a multilayer perceptron, and a graph neural network. Our findings show that selectively noised datasets can induce a controllable amount of model variance and bias in predictions for sensitive labels, suggesting that the safe sharing of datasets containing sensitive information is feasible. We also find that omitting sensitive data often increases model variance enough to mitigate dual use risk. This work is proposed as a foundation for future research on enabling more secure and collaborative data sharing practices and safer machine learning applications in chemistry.
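
The idea of selectively noising a dataset can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `selectively_noise`, the Gaussian noise model, the toy linear labels, and the threshold that defines the sensitive region are all assumptions made for illustration. Labels outside the sensitive region are left untouched, so a model trained on the shared data should keep its utility there.

```python
import numpy as np


def selectively_noise(y, sensitive_mask, noise_scale, rng):
    """Return a copy of y with Gaussian noise added only where sensitive_mask is True.

    Hypothetical helper: the noise distribution and scale are assumptions,
    not the scheme described in the paper.
    """
    y_noised = np.array(y, dtype=float, copy=True)
    n_sensitive = int(np.count_nonzero(sensitive_mask))
    y_noised[sensitive_mask] += rng.normal(0.0, noise_scale, size=n_sensitive)
    return y_noised


rng = np.random.default_rng(0)

# Toy one-dimensional dataset standing in for a chemical property (assumption).
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + 0.5

# Assume the sensitive region is everything above an arbitrary threshold.
sensitive = x > 0.5

# Dataset that would be shared: clean in the beneficial region, noised elsewhere.
y_shared = selectively_noise(y, sensitive, noise_scale=1.0, rng=rng)
```

Raising `noise_scale` in this sketch degrades the labels in the sensitive region further while the remaining labels stay exactly as they were, which is the kind of control the abstract refers to.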

research
10/04/2018

Dual Convolutional Neural Network for Graph of Graphs Link Prediction

Graphs are general and powerful data representations which can model com...
research
01/25/2022

Maximizing information from chemical engineering data sets: Applications to machine learning

It is well-documented how artificial intelligence can have (and already ...
research
11/27/2022

Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models

A growing ecosystem of large, open-source foundation models has reduced ...
research
07/10/2023

Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey

In graph machine learning, data collection, sharing, and analysis often ...
research
09/15/2023

Mining Patents with Large Language Models Demonstrates Congruence of Functional Labels and Chemical Structures

Predicting chemical function from structure is a major goal of the chemi...
research
11/19/2019

Forbidden knowledge in machine learning – Reflections on the limits of research and publication

Certain research strands can yield "forbidden knowledge". This term refe...
