How Do Graph Networks Generalize to Large and Diverse Molecular Systems?

04/06/2022
by Johannes Gasteiger, et al.

The predominant method of demonstrating progress in atomic graph neural networks is benchmarking on small and limited datasets. The implicit hypothesis behind this approach is that progress on these narrow datasets generalizes to the large diversity of chemistry. This generalizability would be very helpful for research, but it currently remains untested. In this work we test this assumption by identifying four aspects of complexity in which many datasets are lacking: 1. chemical diversity (number of different elements), 2. system size (number of atoms per sample), 3. dataset size (number of data samples), and 4. domain shift (similarity of the training and test set). We introduce multiple subsets of the large Open Catalyst 2020 (OC20) dataset to investigate each of these aspects independently. We then perform 21 ablation studies and sensitivity analyses on 9 datasets, testing both previously proposed and new model enhancements. We find that some improvements are consistent between datasets, but many are not, and some even have opposite effects. Based on this analysis, we identify a smaller dataset that correlates well with the full OC20 dataset, and propose the GemNet-OC model, which outperforms the previous state of the art on OC20 by 16% while reducing training time by a factor of 10. Overall, our findings challenge the common belief that graph neural networks work equally well independent of dataset size and diversity, and suggest that caution must be exercised when generalizing from narrow datasets.
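As a rough illustration of the first three complexity aspects named above, the sketch below computes them for a toy dataset. It assumes a hypothetical format in which each sample is simply a list of atomic numbers; the function names are illustrative and not taken from the paper or the OC20 tooling. The domain-shift measure (share of test samples containing an element never seen in training) is a crude stand-in chosen for this example only, not the definition of distribution shift used for OC20's test splits.

```python
from dataclasses import dataclass


@dataclass
class ComplexityStats:
    n_samples: int     # dataset size: number of samples
    n_elements: int    # chemical diversity: number of distinct elements
    mean_atoms: float  # system size: mean number of atoms per sample
    max_atoms: int     # system size: largest sample


def complexity_stats(samples: list[list[int]]) -> ComplexityStats:
    """Summarize a dataset given as lists of atomic numbers (hypothetical format)."""
    if not samples:
        raise ValueError("dataset is empty")
    elements = {z for sample in samples for z in sample}
    sizes = [len(sample) for sample in samples]
    return ComplexityStats(
        n_samples=len(samples),
        n_elements=len(elements),
        mean_atoms=sum(sizes) / len(sizes),
        max_atoms=max(sizes),
    )


def unseen_element_fraction(train: list[list[int]], test: list[list[int]]) -> float:
    """Crude domain-shift proxy (assumption for illustration only):
    fraction of test samples containing an element absent from the training set."""
    if not test:
        return 0.0
    train_elements = {z for sample in train for z in sample}
    unseen = sum(1 for sample in test if any(z not in train_elements for z in sample))
    return unseen / len(test)


if __name__ == "__main__":
    train = [[1, 1, 8], [6, 1, 1, 1, 1], [8, 8]]  # toy molecules: H2O, CH4, O2
    test = [[1, 8], [78, 8, 1]]                   # the second sample contains Pt (Z=78)
    print(complexity_stats(train))
    print(unseen_element_fraction(train, test))
```

On this toy input the training set has 3 samples, 3 distinct elements and at most 5 atoms per sample, and half of the test samples contain an unseen element. Real catalyst datasets would need richer measures (for example adsorbate and surface composition), which this sketch does not attempt.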
