Unassisted Noise Reduction of Chemical Reaction Data Sets

by Alessandra Toniato et al.

Existing deep learning models applied to reaction prediction in organic chemistry can reach high levels of accuracy (> 90% for Natural Language Processing-based ones). With no chemical knowledge embedded other than the information learnt from reaction data, the quality of the data sets plays a crucial role in the performance of the prediction models. Because human curation is prohibitively expensive, unaided approaches to remove chemically incorrect entries from existing data sets are essential to improve the performance of artificial intelligence models in synthetic chemistry tasks. Here we propose a machine learning-based, unassisted approach to remove chemically wrong entries from chemical reaction collections. We applied this method to the Pistachio collection of chemical reactions and to an open data set, both extracted from USPTO (United States Patent and Trademark Office) patents. Our results show an improved prediction quality for models trained on the cleaned and balanced data sets. For the retrosynthetic models, the round-trip accuracy metric grows by 13 percentage points and the value of the cumulative Jensen-Shannon divergence decreases by 30%, while the coverage remains high at 97%. The proposed strategy is the first unassisted, rule-free technique to address automatic noise reduction in chemical data sets.
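The abstract reports a drop in a cumulative Jensen-Shannon divergence but does not define it here. As a point of reference only, the following is a minimal sketch of the plain Jensen-Shannon divergence between two discrete probability distributions, in bits. The function names and the example distributions are illustrative assumptions, and the sketch does not reproduce the cumulative variant used by the authors.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Plain Jensen-Shannon divergence between two discrete distributions.

    Both inputs are assumed to be sequences of non-negative numbers that
    each sum to 1 and have the same length (hypothetical helper, not the
    authors' cumulative metric).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution; m_i > 0 whenever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Illustrative likelihood distributions over three reaction classes.
    before = [0.7, 0.2, 0.1]
    after = [0.4, 0.3, 0.3]
    print(jensen_shannon_divergence(before, after))
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (distributions with disjoint support), so a lower value indicates more similar distributions.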




Judging Chemical Reaction Practicality From Positive Sample only Learning

Chemical reaction practicality is the core task among all symbol intelli...

Prognosis of Rotor Parts Fly-off Based on Cascade Classification and Online Prediction Ability Index

Large rotating machines, e.g., compressors, steam turbines, gas turbines...

Deep-learning-based prediction of nanoparticle phase transitions during in situ transmission electron microscopy

We develop the machine learning capability to predict a time sequence of...

Stress Testing BERT Anaphora Resolution Models for Reaction Extraction in Chemical Patents

The high volume of published chemical patents and the importance of a ti...

ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision

Structured chemical reaction information plays a vital role for chemists...

ChemVise: Maximizing Out-of-Distribution Chemical Detection with the Novel Application of Zero-Shot Learning

Accurate chemical sensors are vital in medical, military, and home safet...

Rxn Hypergraph: a Hypergraph Attention Model for Chemical Reaction Representation

It is fundamental for science and technology to be able to predict chemi...
