On the Interplay of Subset Selection and Informed Graph Neural Networks

06/15/2023
by Niklas Breustedt, et al.

Machine learning (ML) techniques, paired with the availability of massive datasets, dramatically enhance our ability to explore the chemical compound space by providing fast and accurate predictions of molecular properties. However, learning on large datasets is strongly limited by the available computational resources and can be infeasible in some scenarios. Moreover, the instances in the datasets may not yet be labelled, and generating the labels can be costly, as in the case of quantum chemistry computations. Thus, there is a need to select small training subsets from large pools of unlabelled data points and to develop reliable ML methods that can learn effectively from small training sets. This work focuses on predicting the atomization energy of the molecules in the QM9 dataset. We investigate the advantages of combining domain-knowledge-based data sampling methods for efficient training set selection with informed ML techniques. In particular, we show how maximizing molecular diversity during training set selection increases the robustness of linear and nonlinear regression techniques such as kernel methods and graph neural networks. We also check the reliability of the predictions made by the graph neural network with a model-agnostic explainer based on the rate-distortion explanation framework.
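The abstract does not spell out the selection algorithm. The following is a minimal sketch of one standard way to maximize diversity when picking a training subset from an unlabelled pool: greedy farthest-point (MaxMin) selection. Each step adds the point farthest from everything chosen so far. It assumes that each molecule is already encoded as a fixed-length numeric descriptor vector and that diversity is measured by Euclidean distance. Both assumptions are hypothetical here, and this generic technique is not necessarily the sampling method used in the paper. The function name `farthest_point_sampling` is illustrative.

```python
import math
from typing import List, Sequence


def farthest_point_sampling(
    features: Sequence[Sequence[float]], k: int, start: int = 0
) -> List[int]:
    """Greedily select k indices using the MaxMin criterion.

    Each new index is the unselected point whose Euclidean distance to its
    nearest already-selected point is largest. `features` stands in for
    hypothetical molecular descriptors; the paper's descriptors and distance
    may differ.
    """
    n = len(features)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= len(features)")
    selected = [start]
    chosen = {start}
    # min_dist[i]: distance from point i to its nearest selected point
    min_dist = [math.dist(features[i], features[start]) for i in range(n)]
    while len(selected) < k:
        # Pick the unselected point farthest from the current selection
        # (ties resolve to the lowest index).
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        # Update nearest-selected distances with the newly added point.
        for i in range(n):
            d = math.dist(features[i], features[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


if __name__ == "__main__":
    # Toy 2-D "descriptors" for five hypothetical molecules.
    points = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [0.0, 4.0], [5.0, 4.0]]
    print(farthest_point_sampling(points, 3))  # [0, 4, 2]
```

In the toy example, the near-duplicate point at index 1 is skipped in favour of points that spread out over the descriptor space. That is the intended effect of a diversity-driven selection. Each step costs O(n) distance evaluations, so selecting k points costs O(nk) in total. For the metric k-center objective, this greedy rule is known to be a 2-approximation.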

