Improving Small Molecule Generation using Mutual Information Machine

by Danny Reidenbach, et al.

We address the task of controlled generation of small molecules, which entails finding novel molecules with desired properties under certain constraints (e.g., similarity to a reference molecule). Here we introduce MolMIM, a probabilistic auto-encoder for small-molecule drug discovery that learns an informative and clustered latent space. MolMIM is trained with Mutual Information Machine (MIM) learning and provides a fixed-length representation of variable-length SMILES strings. Since encoder-decoder models can learn representations with "holes" of invalid samples, we propose a novel extension to the training procedure which promotes a dense latent space and allows the model to sample valid molecules from random perturbations of latent codes. We provide a thorough comparison of MolMIM to several variable-size and fixed-size encoder-decoder models, demonstrating MolMIM's superior generation as measured by validity, uniqueness, and novelty. We then utilize CMA-ES, a naive black-box, gradient-free search algorithm, over MolMIM's latent space for the task of property-guided molecule optimization. We achieve state-of-the-art results in several constrained single-property optimization tasks as well as in the challenging task of multi-objective optimization, improving over the previous success-rate SOTA by more than 5%. We attribute these strong results to MolMIM's latent representation, which clusters similar molecules in the latent space, despite CMA-ES being a simple search procedure often used only as a baseline. We also demonstrate MolMIM to be favourable in a compute-limited regime, making it an attractive model for such cases.
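To make the optimization setup concrete, the following is a minimal pure-Python sketch of the black-box loop described above. It uses a simplified elitist (1+λ) evolution strategy with a decaying step size as a crude stand-in for full CMA-ES (which additionally adapts a full covariance matrix), and a toy quadratic `oracle` in place of the real pipeline of decoding a latent code into a SMILES string and scoring the molecule's property. All names here (`optimize_latent`, `oracle`, `target`) are illustrative, not from the paper's code.

```python
import random

def optimize_latent(score, dim=8, sigma=0.5, lam=16, iters=60, seed=0):
    """Gradient-free search over a fixed-length latent code.

    Simplified (1+lambda) evolution strategy: sample lam Gaussian
    perturbations of the current code, keep the best if it improves.
    `score` plays the role of the property oracle applied to the
    decoded molecule (decoding itself is omitted in this sketch).
    """
    rng = random.Random(seed)
    z = [0.0] * dim          # e.g. the encoded latent code of a seed molecule
    best = score(z)
    for _ in range(iters):
        # Propose lam random perturbations of the current latent code.
        cands = [[zi + rng.gauss(0.0, sigma) for zi in z] for _ in range(lam)]
        cand = max(cands, key=score)
        s = score(cand)
        if s > best:         # elitist: accept only improvements
            z, best = cand, s
        sigma *= 0.97        # shrink step size (crude stand-in for
                             # CMA-ES's step-size/covariance adaptation)
    return z, best

# Toy oracle: reward latent codes close to a hidden optimum.
target = [0.7] * 8
oracle = lambda z: -sum((a - b) ** 2 for a, b in zip(z, target))

z_opt, s_opt = optimize_latent(oracle)
```

Because MolMIM's latent space is dense and clustered, perturbations like those in the inner loop tend to decode to valid molecules similar to the current one, which is what makes even this simple search procedure effective.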


Related research:

- Conditional β-VAE for De Novo Molecular Generation
- Improving black-box optimization in VAE latent space using decoder uncertainty
- A Two-Step Graph Convolutional Decoder for Molecule Generation
- A COLD Approach to Generating Optimal Samples
- SentenceMIM: A Latent Variable Language Model
- Improving Chemical Autoencoder Latent Space and Molecular De novo Generation Diversity with Heteroencoders
- Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures
