Reprogramming Language Models for Molecular Representation Learning

12/07/2020
by Ria Vinod, et al.

Recent advancements in transfer learning have made it a promising approach for domain adaptation via transfer of learned representations. This is especially relevant when alternate tasks have limited samples of well-defined and labeled data, which is common in the molecule data domain. This makes transfer learning an ideal approach to solve molecular learning tasks. While adversarial reprogramming has proven to be a successful method to repurpose neural networks for alternate tasks, most works consider source and alternate tasks within the same domain. In this work, we propose a new algorithm, Representation Reprogramming via Dictionary Learning (R2DL), for adversarially reprogramming pretrained language models for molecular learning tasks, motivated by leveraging learned representations in massive state-of-the-art language models. The adversarial program learns a linear transformation between a dense source model input space (language data) and a sparse target model input space (e.g., chemical and biological molecule data), using a k-SVD solver to approximate a sparse representation of the encoded data via dictionary learning. R2DL matches the baseline established by state-of-the-art toxicity prediction models trained on domain-specific data and outperforms that baseline in a limited training-data setting, thereby establishing avenues for domain-agnostic transfer learning for tasks with molecule data.
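
The idea of a sparse linear map between the two input spaces can be illustrated with a small sketch. This is not the R2DL implementation: it shows only a greedy sparse coding step (orthogonal matching pursuit, the kind of routine a k-SVD solver alternates with dictionary updates), and every name, size, and the random data below are assumptions made for illustration. Each hypothetical target-token vector is approximated as a combination of at most a few source-token embeddings, giving a sparse matrix theta with targets ≈ theta @ source_emb.

```python
import numpy as np

def omp(dictionary, signal, n_nonzero):
    """Orthogonal matching pursuit: approximate `signal` using at most
    `n_nonzero` columns (atoms) of `dictionary`."""
    coeffs = np.zeros(dictionary.shape[1])
    residual = signal.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = -np.inf  # do not reselect an atom
        support.append(int(np.argmax(correlations)))
        # Refit all selected atoms jointly by least squares.
        sub = dictionary[:, support]
        sol, *_ = np.linalg.lstsq(sub, signal, rcond=None)
        residual = signal - sub @ sol
    coeffs[support] = sol
    return coeffs

# Assumed toy sizes: 50 source tokens, 5 target tokens, embedding dim 16.
rng = np.random.default_rng(0)
source_emb = rng.normal(size=(50, 16))   # stand-in for language-model token embeddings
source_emb /= np.linalg.norm(source_emb, axis=1, keepdims=True)  # unit-norm atoms
targets = rng.normal(size=(5, 16))       # stand-in for encoded molecule tokens

# One sparse row of theta per target token; atoms are the columns of source_emb.T.
theta = np.vstack([omp(source_emb.T, t, n_nonzero=3) for t in targets])

print("nonzeros per row:", np.count_nonzero(theta, axis=1))
print("relative error:",
      np.linalg.norm(targets - theta @ source_emb) / np.linalg.norm(targets))
```

With random data the approximation is coarse; the point is only that each row of theta uses a handful of source tokens. In the setting the abstract describes, the transformation is learned for the downstream task rather than fitted to fixed target vectors as done here.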


research
01/05/2023

Reprogramming Pretrained Language Models for Protein Sequence Representation Learning

Machine Learning-guided solutions for protein learning tasks have made s...
research
02/27/2019

An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models

A growing number of state-of-the-art transfer learning methods employ la...
research
01/13/2022

Improving VAE based molecular representations for compound property prediction

Collecting labeled data for many important tasks in chemoinformatics is ...
research
08/13/2018

Multimodal Deep Neural Networks using Both Engineered and Learned Representations for Biodegradability Prediction

Deep learning algorithms excel at extracting patterns from raw data. Thr...
research
04/12/2018

Cross-Domain Visual Recognition via Domain Adaptive Dictionary Learning

In real-world visual recognition problems, the assumption that the train...
research
05/19/2015

Multi-task additive models with shared transfer functions based on dictionary learning

Additive models form a widely popular class of regression models which r...
research
11/14/2021

Improving Compound Activity Classification via Deep Transfer and Representation Learning

Recent advances in molecular machine learning, especially deep neural ne...
