Hierarchical Classification of Enzyme Promiscuity Using Positive, Unlabeled, and Hard Negative Examples

02/18/2020
by Gian Marco Visani, et al.

Despite significant progress in sequencing technology, many cellular enzymatic activities remain unknown. We develop a new method, referred to as SUNDRY (Similarity-weighting for UNlabeled Data in a Residual HierarchY), for training enzyme-specific predictors that take a query substrate molecule as input and return whether the enzyme would act on that substrate. A major challenge in this enzyme promiscuity prediction problem is the scarcity of labeled data, especially for negative cases (enzyme-substrate pairs where the enzyme does not transform the substrate into a product molecule). To overcome this issue, our method learns to classify substrates for a target enzyme by sharing information from related enzymes via known tree hierarchies. It also incorporates three types of data: molecules known to be acted on by an enzyme (positive cases), molecules whose relationship to the enzyme is unknown (unlabeled cases), and molecules labeled as inhibitors of the enzyme. We refer to inhibitors as hard negative cases because they may be difficult to classify well: like positive cases, they bind to the enzyme, but they are not transformed by it. Our method uses confidence scores derived from structural similarity to treat unlabeled examples as weighted negatives. We compare our hierarchy-aware predictor against a baseline that cannot share information across related enzymes. Using data from the BRENDA database, we show that each of our contributions (hierarchical sharing, per-example confidence weighting of unlabeled data based on molecular similarity, and inclusion of inhibitors as hard negative examples) helps to better characterize enzyme promiscuity.
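
The idea of treating unlabeled molecules as weighted negatives can be illustrated with a minimal sketch. The abstract does not state which similarity measure or weighting formula is used, so everything below is an assumption made for illustration: molecules are represented as binary fingerprint vectors, similarity is Tanimoto similarity, and the confidence that an unlabeled molecule is a true negative is taken to be one minus its maximum similarity to any known positive. The function names (`tanimoto`, `unlabeled_weights`, `weighted_log_loss`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two binary fingerprint vectors."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two empty fingerprints: define similarity as 0 to avoid dividing by zero.
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def unlabeled_weights(unlabeled_fps, positive_fps):
    """Assumed confidence that each unlabeled molecule is a true negative.

    The weight is 1 minus the highest similarity to any known positive, so
    molecules that closely resemble a known substrate count less as negatives.
    """
    return np.array([
        1.0 - max(tanimoto(u, p) for p in positive_fps)
        for u in unlabeled_fps
    ])


def weighted_log_loss(y_true, y_prob, weights, eps=1e-12):
    """Binary cross-entropy in which each example carries its own weight."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    weights = np.asarray(weights, dtype=float)
    per_example = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.sum(weights * per_example) / np.sum(weights))


# Toy fingerprints (hypothetical data, not from BRENDA).
positives = [[1, 1, 0, 0]]
unlabeled = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]
weights = unlabeled_weights(unlabeled, positives)
```

In this toy setup, an unlabeled molecule identical to a known positive receives weight 0 and a molecule sharing no fingerprint bits with any positive receives weight 1. In a full training loop, positives (and inhibitors, as hard negatives) would presumably keep weight 1 while unlabeled examples enter the loss as negatives with these reduced weights.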

