Pre-training Graph Neural Networks

05/29/2019
by Weihua Hu, et al.

Many applications of machine learning in science and medicine, including molecular property and protein function prediction, can be cast as problems of predicting properties of graphs, where having good graph representations is critical. However, two key challenges in these domains are (1) extreme scarcity of labeled data due to expensive lab experiments, and (2) the need to extrapolate to test graphs that are structurally different from those seen during training. In this paper, we explore pre-training to address both of these challenges. In particular, working with Graph Neural Networks (GNNs) for representation learning of graphs, we wish to obtain node representations that (1) capture the similarity of nodes' network neighborhood structure, (2) can be composed to give accurate graph-level representations, and (3) capture domain knowledge. To achieve these goals, we propose a series of methods to pre-train GNNs at both the node level and the graph level, using both unlabeled data and labeled data from related auxiliary supervised tasks. We perform extensive evaluation on two applications, molecular property and protein function prediction. We observe that performing only graph-level supervised pre-training often leads to marginal performance gains, or can even worsen performance compared to non-pre-trained models. On the other hand, effectively combining both node- and graph-level pre-training techniques significantly improves generalization to out-of-distribution graphs, consistently outperforming non-pre-trained GNNs across 8 datasets in molecular property prediction (resp. 40 tasks in protein function prediction), with an average ROC-AUC improvement of 7.2%.
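To make the node-level versus graph-level distinction concrete, here is a minimal NumPy sketch of the pipeline the abstract describes: a message-passing GNN produces per-node representations, which a readout then pools into a graph-level representation. This is an illustrative toy (mean-aggregation layers, mean-pool readout, random weights), not the paper's actual architecture or pre-training objectives; all shapes and the example graph are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def gnn_layer(X, A, W):
    """One message-passing layer: mean-aggregate neighbor features,
    then apply a linear map and ReLU. A is a binary adjacency matrix."""
    deg = A.sum(axis=1, keepdims=True) + 1e-9   # avoid division by zero
    H = (A @ X) / deg                           # mean over each node's neighbors
    return np.maximum(0.0, H @ W)

def node_embeddings(X, A, weights):
    """Stack GNN layers to get node-level representations — the objects
    that node-level pre-training (e.g. attribute masking) would supervise."""
    H = X
    for W in weights:
        H = gnn_layer(H, A, W)
    return H

def graph_embedding(H):
    """Mean-pool readout: compose node representations into one
    graph-level vector — the object graph-level pre-training supervises."""
    return H.mean(axis=0)

# Toy molecule-like graph: 5 nodes, 4-dim input features, two layers (4->8->8).
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(5, 4))
weights = [rng.normal(scale=0.5, size=(4, 8)),
           rng.normal(scale=0.5, size=(8, 8))]

H = node_embeddings(X, A, weights)   # shape (5, 8): one vector per node
g = graph_embedding(H)               # shape (8,): pooled graph vector
```

The point of the sketch is structural: node-level pretext tasks attach losses to `H`, while graph-level supervised tasks attach losses to `g`; the paper's finding is that pre-training at both levels, rather than only the graph level, is what transfers to out-of-distribution graphs.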
