Learning to Group Auxiliary Datasets for Molecule

by Tinglin Huang, et al.

The limited availability of annotations in small molecule datasets presents a challenge to machine learning models. To address this, one common strategy is to train jointly with additional auxiliary datasets. However, having more data does not always guarantee improvements. Negative transfer can occur when the knowledge in the target dataset differs from or contradicts that of the auxiliary molecule datasets. In light of this, identifying the auxiliary molecule datasets that can benefit the target dataset when jointly trained remains a critical and unresolved problem. Through an empirical analysis, we observe that combining graph structure similarity and task similarity can serve as a more reliable indicator for identifying high-affinity auxiliary datasets. Motivated by this insight, we propose MolGroup, which separates the dataset affinity into task and structure affinity to predict the potential benefits of each auxiliary molecule dataset. MolGroup achieves this by utilizing a routing mechanism optimized through a bi-level optimization framework. Empowered by the meta gradient, the routing mechanism is optimized toward maximizing the target dataset's performance and quantifies the affinity as the gating score. As a result, MolGroup is capable of predicting the optimal combination of auxiliary datasets for each target dataset. Our extensive experiments demonstrate the efficiency and effectiveness of MolGroup, showing an average improvement of 4.41 for models trained with the auxiliary datasets selected by MolGroup on 11 target molecule datasets.
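To make the idea of a gating score learned by bi-level optimization concrete, here is a minimal toy sketch. It is not MolGroup's implementation: the scalar linear model, the two auxiliary datasets named "helpful" and "harmful", the one-step unrolled meta gradient, and all hyperparameters are assumptions chosen for illustration. The inner step trains the model on the target loss plus gate-weighted auxiliary losses; the outer step adjusts each gate to reduce the target validation loss.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def learn_gates(steps=200, lr=0.01, meta_lr=1.0):
    # Hypothetical toy data: the target task is y = 2x.
    xs = [1.0, 2.0, 3.0]
    target_train = (xs, [2.0 * x for x in xs])
    val_xs = [1.5, 2.5]
    target_val = (val_xs, [2.0 * x for x in val_xs])
    # One auxiliary dataset agrees with the target, one contradicts it.
    aux = {
        "helpful": (xs, [2.0 * x for x in xs]),
        "harmful": (xs, [-2.0 * x for x in xs]),
    }

    w = 0.0                                   # model parameter
    logits = {name: 0.0 for name in aux}      # gate = sigmoid(logit)

    for _ in range(steps):
        gates = {n: sigmoid(a) for n, a in logits.items()}
        aux_grads = {n: mse_grad(w, *data) for n, data in aux.items()}

        # Inner step: target loss plus gate-weighted auxiliary losses.
        total_grad = mse_grad(w, *target_train)
        total_grad += sum(gates[n] * aux_grads[n] for n in aux)
        w_new = w - lr * total_grad

        # Outer step: one-step unrolled meta gradient of the target
        # validation loss with respect to each gate logit (chain rule).
        val_grad = mse_grad(w_new, *target_val)
        for n in aux:
            dw_dgate = -lr * aux_grads[n]
            dgate_dlogit = gates[n] * (1.0 - gates[n])
            logits[n] -= meta_lr * val_grad * dw_dgate * dgate_dlogit

        w = w_new

    return w, {n: sigmoid(a) for n, a in logits.items()}


if __name__ == "__main__":
    weight, gate_scores = learn_gates()
    print(weight, gate_scores)
```

In this toy setting the gate of the contradicting dataset is pushed down and the gate of the agreeing dataset is pushed up, which mirrors how a gating score could be read as a dataset affinity. A real system would replace the scalar model with a graph neural network and compute the meta gradient with automatic differentiation.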




