Predicting Drug Solubility Using Different Machine Learning Methods – Linear Regression Model with Extracted Chemical Features vs Graph Convolutional Neural Network

08/23/2023
by   John Ho, et al.
0

Predicting the solubility of given molecules is an important task in the pharmaceutical industry, and consequently this is a well-studied topic. In this research, we revisited this problem with the advantage of modern computing resources. We applied two machine learning models, a linear regression model and a graph convolutional neural network model, on multiple experimental datasets. Both methods can make reasonable predictions while the GCNN model had the best performance. However, the current GCNN model is a black box, while feature importance analysis from the linear regression model offers more insights into the underlying chemical influences. Using the linear regression model, we show how each functional group affects the overall solubility. Ultimately, knowing how chemical structure influences chemical properties is crucial when designing new drugs. Future work should aim to combine the high performance of GCNNs with the interpretability of linear regression, unlocking new advances in next generation high throughput screening.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset