A smile is all you need: Predicting limiting activity coefficients from SMILES with natural language processing

06/15/2022
by Benedikt Winter, et al.
ETH Zurich

Knowledge of the phase equilibria of mixtures is crucial in nature and in technical chemistry. Calculating these phase equilibria requires activity coefficients. However, experimental data on activity coefficients are often limited because experiments are costly. Machine learning approaches have recently been developed to predict activity coefficients accurately and efficiently, but current approaches still extrapolate poorly to unknown molecules. In this work, we introduce the SMILES-to-Properties-Transformer (SPT), a natural language processing network that predicts binary limiting activity coefficients from SMILES codes. To overcome the limited availability of experimental data, we first train our network on a large dataset of synthetic data sampled from COSMO-RS (10 million data points) and then fine-tune the model on experimental data (20 870 data points). This training strategy enables SPT to accurately predict limiting activity coefficients even for unknown molecules, cutting the mean prediction error in half compared to state-of-the-art models for activity coefficient prediction such as COSMO-RS and UNIFAC, and improving on recent machine learning approaches.
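To make the idea of feeding SMILES codes to a language model concrete, the following is a minimal sketch of how a solute/solvent pair might be turned into a token sequence. The token pattern, the special tokens `<SOS>`, `<SEP>`, `<EOS>`, and the function names are assumptions chosen for illustration; the abstract does not specify the tokenization that SPT actually uses.

```python
import re
from typing import List

# Hypothetical token pattern (an assumption, not the paper's scheme):
# bracket atoms such as [Na+], two-letter halogens, two-digit ring
# closures such as %12, and otherwise single characters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens using the pattern above."""
    return SMILES_TOKEN.findall(smiles)


def encode_pair(solute: str, solvent: str) -> List[str]:
    """Build one input sequence for a binary solute/solvent system.

    The special tokens are illustrative placeholders.
    """
    return (
        ["<SOS>"]
        + tokenize_smiles(solute)
        + ["<SEP>"]
        + tokenize_smiles(solvent)
        + ["<EOS>"]
    )


if __name__ == "__main__":
    # Ethanol as solute at infinite dilution in water as solvent.
    print(encode_pair("CCO", "O"))
```

A sequence of this kind would then be mapped to integer IDs and passed to a transformer that regresses the limiting activity coefficient; in the two-stage strategy described above, the same input format would be used for both the synthetic COSMO-RS data and the experimental data.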

