AugLiChem: Data Augmentation Library of Chemical Structures for Machine Learning

11/30/2021
by Rishikesh Magar, et al.

Machine learning (ML) has shown promise for accurate and efficient property prediction of molecules and crystalline materials. Developing highly accurate ML models for chemical structure property prediction requires datasets with sufficient samples. However, obtaining clean and sufficient data on chemical properties can be expensive and time-consuming, which greatly limits the performance of ML models. Inspired by the success of data augmentation in computer vision and natural language processing, we developed AugLiChem: a data augmentation library for chemical structures. We introduce augmentation methods for both crystalline systems and molecules, which can be used with fingerprint-based ML models and Graph Neural Networks (GNNs). We show that our augmentation strategies significantly improve the performance of ML models, especially GNNs. In addition, the augmentations we developed can be used as a direct plug-in module during training and have proven effective when combined with different GNN models through the AugLiChem library. The Python package implementing AugLiChem is publicly available at: https://github.com/BaratiLab/AugLiChem.
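
To make the idea of a label-preserving, plug-in augmentation concrete, here is a minimal sketch in plain Python. It does not use the AugLiChem API. The abstract does not name the individual augmentations, so random atom masking is an assumed example, and `MolGraph`, `MASK_TOKEN`, `mask_atoms`, and `augment_dataset` are hypothetical names introduced only for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical placeholder written in place of a masked atom's atomic number.
MASK_TOKEN = -1


@dataclass(frozen=True)
class MolGraph:
    """Toy molecular graph (an assumption, not an AugLiChem type)."""
    atoms: tuple  # atomic numbers, one per node
    bonds: tuple  # pairs of atom indices, one per edge


def mask_atoms(graph, mask_fraction=0.25, rng=None):
    """Return a copy of `graph` with a random subset of atoms masked.

    At least one atom is masked for any non-empty graph; bonds are untouched.
    """
    rng = rng or random.Random()
    n = len(graph.atoms)
    n_mask = max(1, int(n * mask_fraction)) if n else 0
    masked = set(rng.sample(range(n), n_mask))
    atoms = tuple(
        MASK_TOKEN if i in masked else z for i, z in enumerate(graph.atoms)
    )
    return MolGraph(atoms=atoms, bonds=graph.bonds)


def augment_dataset(samples, copies=2, seed=0):
    """Expand (graph, label) pairs with masked copies that keep the label."""
    rng = random.Random(seed)
    out = []
    for graph, label in samples:
        out.append((graph, label))  # keep the original sample
        for _ in range(copies):
            out.append((mask_atoms(graph, rng=rng), label))
    return out


# Ethanol heavy atoms (C, C, O) with a made-up property value of 0.5.
ethanol = MolGraph(atoms=(6, 6, 8), bonds=((0, 1), (1, 2)))
augmented = augment_dataset([(ethanol, 0.5)], copies=2, seed=0)
```

Each augmented copy keeps the original label, which is what lets such a transform be dropped into an existing training loop without changing the model or the loss. Whether a given augmentation actually preserves the target property depends on the task and would need to be checked per dataset.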

research
08/21/2022

MolGraph: a Python package for the implementation of small molecular graphs and graph neural networks with TensorFlow and Keras

Molecular machine learning (ML) has proven important for tackling variou...
research
09/17/2023

Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties

The application of machine learning (ML) techniques in computational che...
research
10/14/2021

Predictive models of RNA degradation through dual crowdsourcing

Messenger RNA-based medicines hold immense potential, as evidenced by th...
research
05/04/2022

Crystal Twins: Self-supervised Learning for Crystalline Material Property Prediction

Machine learning (ML) models have been widely successful in the predicti...
research
02/07/2023

Data augmentation for machine learning of chemical process flowsheets

Artificial intelligence has great potential for accelerating the design ...
research
05/25/2021

Improving Machine Learning-Based Modeling of Semiconductor Devices by Data Self-Augmentation

In the electronics industry, introducing Machine Learning (ML)-based tec...
research
06/19/2023

Human Limits in Machine Learning: Prediction of Plant Phenotypes Using Soil Microbiome Data

The preservation of soil health has been identified as one of the main c...
