Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations

by Gihan Panapitiya, et al.

Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite decades of effort, developing a solubility prediction model that is accurate enough for many of these applications remains a challenge. The goal of this study is to develop a general model capable of predicting the solubility of a broad range of organic molecules. Using the largest currently available solubility dataset, we implement deep learning-based models to predict solubility from molecular structure. We explore several molecular representations, including molecular descriptors, simplified molecular-input line-entry system (SMILES) strings, molecular graphs, and three-dimensional (3D) atomic coordinates, using four neural network architectures: fully connected neural networks (FCNNs), recurrent neural networks (RNNs), graph neural networks (GNNs), and SchNet. We find that models using molecular descriptors achieve the best performance, with GNN models also performing well. We carry out an extensive error analysis to understand the molecular properties that influence model performance, a feature analysis to identify which information about molecular structure is most valuable for prediction, and a transfer learning and data size study to assess the impact of data availability on model performance.
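To make one of the representations above concrete, the following is a minimal sketch of how a SMILES string might be turned into a fixed-length integer sequence of the kind an RNN could consume. It is an illustration only, not the authors' preprocessing: the token pattern, the function names (`tokenize`, `build_vocab`, `encode`), the padding index 0, and the three example molecules are all assumptions made for this sketch.

```python
import re

# Assumed token pattern (not from the paper): bracket atoms such as [NH3+],
# the two-letter halogens Br and Cl, two-digit ring closures such as %12,
# and otherwise any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map every token seen in smiles_list to an integer id starting at 1.

    Index 0 is reserved for padding.
    """
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(smiles, vocab, max_len):
    """Encode a SMILES string as max_len integer ids, truncating or zero-padding.

    Raises KeyError for tokens that are not in vocab.
    """
    ids = [vocab[tok] for tok in tokenize(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical example molecules: ethanol, chlorobenzene, methylammonium.
molecules = ["CCO", "c1ccccc1Cl", "C[NH3+]"]
vocab = build_vocab(molecules)
encoded = [encode(s, vocab, max_len=12) for s in molecules]
```

Each row of `encoded` has the same length, so the rows can be stacked into a batch and passed through an embedding layer before a recurrent network. The other representations mentioned in the abstract (descriptors, graphs, 3D coordinates) would typically require a cheminformatics toolkit and are not sketched here.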




