Multimodal Deep Neural Networks using Both Engineered and Learned Representations for Biodegradability Prediction

08/13/2018
by Garrett B. Goh, et al.

Deep learning algorithms excel at extracting patterns from raw data. Through representation learning and automated feature engineering on large datasets, such models have been highly successful in computer vision and natural language applications. However, in many other technical domains, large datasets from which to learn representations may not be available. In this work, we develop a novel multimodal CNN-MLP neural network architecture that uses both domain-specific engineered features and representations learned from raw data. We illustrate the effectiveness of this approach in the chemical sciences, for predicting chemical properties, where labeled data is scarce owing to the high cost of acquiring labels through experimental measurements. By training on both raw chemical data and engineered chemical features, while leveraging weakly supervised learning and transfer learning methods, we show that the multimodal CNN-MLP network is more accurate than either a standalone CNN that uses only raw data or a standalone MLP that uses only engineered features. Using this multimodal network, we then develop the DeepBioD model for predicting chemical biodegradability, which achieves a classification error rate of 0.125, 27% lower than the current state of the art. Thus, our work indicates that combining traditional feature engineering with representation learning on raw data can be an effective approach, particularly when labeled training data is limited. Such a framework could also be applied to other technical fields where substantial research effort in feature engineering has already been established.
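To make the two-branch idea concrete, here is a minimal forward-pass sketch in NumPy. It assumes the raw input is a single-channel 2D image of a molecule and the engineered input is a fixed-length descriptor vector; the function names (`conv2d_valid`, `init_params`, `forward`), the layer sizes, and the global-average-pooling and concatenation fusion are hypothetical illustration choices, not the authors' DeepBioD implementation. Training, weak supervision, and transfer learning are deliberately omitted.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def conv2d_valid(image, kernels, bias):
    """Single-channel 'valid' cross-correlation.
    image: (H, W); kernels: (C, k, k); bias: (C,) -> (C, H-k+1, W-k+1)."""
    c, k, _ = kernels.shape
    h_out = image.shape[0] - k + 1
    w_out = image.shape[1] - k + 1
    out = np.empty((c, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = image[i:i + k, j:j + k]
            # Contract every kernel against the same k x k patch.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1])) + bias
    return out


def init_params(rng, n_kernels=4, k=3, n_features=8, hidden=16):
    """Random weights for a toy two-branch network (hypothetical sizes)."""
    return {
        "conv_w": rng.normal(0.0, 0.1, (n_kernels, k, k)),
        "conv_b": np.zeros(n_kernels),
        "mlp_w": rng.normal(0.0, 0.1, (n_features, hidden)),
        "mlp_b": np.zeros(hidden),
        "head_w": rng.normal(0.0, 0.1, (n_kernels + hidden,)),
        "head_b": 0.0,
    }


def forward(params, image, features):
    # CNN branch: representation learned from the raw input (assumed 2D image).
    fmap = relu(conv2d_valid(image, params["conv_w"], params["conv_b"]))
    cnn_vec = fmap.mean(axis=(1, 2))  # global average pooling -> (n_kernels,)
    # MLP branch: engineered descriptors (assumed fixed-length vector).
    mlp_vec = relu(features @ params["mlp_w"] + params["mlp_b"])  # (hidden,)
    # Fusion: concatenate both representations, then one logistic output unit.
    joint = np.concatenate([cnn_vec, mlp_vec])
    logit = joint @ params["head_w"] + params["head_b"]
    return 1.0 / (1.0 + np.exp(-logit))  # probability of the positive class


# Usage with random placeholder inputs (not real chemical data).
rng = np.random.default_rng(0)
params = init_params(rng)
image = rng.random((16, 16))
features = rng.random(8)
p = forward(params, image, features)
print(f"predicted probability: {p:.3f}")
```

Concatenating the pooled CNN vector with the MLP hidden vector lets a single output layer weigh learned and engineered evidence jointly; in a real model both branches would be trained end to end on labeled data.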
