Identification of functionally related enzymes by learning-to-rank methods

by Michiel Stock, et al.

Enzyme sequences and structures are routinely used in the biological sciences as queries to search online databases for functionally related enzymes. To this end, one usually starts from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation produces a ranking of the enzymes in the database, from very similar to dissimilar, while information about the biological function of annotated database enzymes is ignored. In this work we show that such rankings can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between active-cleft similarities and the biological function of annotated enzymes, in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We use the Enzyme Commission (EC) classification hierarchy to obtain annotated enzymes for the training phase. The results of a series of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the enzyme surface.
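The abstract does not specify the learning algorithm beyond "kernel-based", so the following is only a minimal sketch of one way to learn a ranking from annotated enzymes, assuming Kronecker-kernel ridge regression over (query, database enzyme) pairs. Everything concrete here is a hypothetical stand-in rather than the authors' setup: the relevance label (number of leading EC levels two enzymes share), the Gaussian kernel on toy 2-D "cleft descriptors" used in place of a real active-cleft similarity, the regularisation value, and the names `ec_relevance`, `fit_pairwise_krr` and `rank_database`.

```python
import math
from itertools import product


def ec_relevance(ec_a: str, ec_b: str) -> int:
    """Hypothetical label: number of leading EC levels two annotations share (0 to 4)."""
    shared = 0
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        if x != y:
            break
        shared += 1
    return shared


def rbf(u, v, gamma=1.0):
    """Gaussian kernel on toy descriptors; a stand-in for an active-cleft similarity."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_pairwise_krr(sim, ec, reg=1e-6):
    """Kernel ridge regression on (query, target) pairs of training enzymes with the
    Kronecker kernel K((q, d), (q2, d2)) = sim[q][q2] * sim[d][d2].
    reg is tiny here (near-interpolation on toy data); in practice it would be tuned."""
    n = len(ec)
    pairs = [(q, d) for q, d in product(range(n), repeat=2) if q != d]
    y = [float(ec_relevance(ec[q], ec[d])) for q, d in pairs]
    gram = [
        [sim[q][q2] * sim[d][d2] + (reg if i == j else 0.0)
         for j, (q2, d2) in enumerate(pairs)]
        for i, (q, d) in enumerate(pairs)
    ]
    return pairs, solve(gram, y)


def rank_database(query_sims, sim, pairs, alpha):
    """Rank training enzymes for a query, given query_sims[i] = k(query, enzyme i)."""
    def score(d):
        return sum(a * query_sims[q] * sim[d][d2] for a, (q, d2) in zip(alpha, pairs))
    return sorted(range(len(sim)), key=score, reverse=True)


# Hypothetical toy data: EC annotations and 2-D descriptors for six training enzymes.
EC = ["1.1.1.1", "1.1.1.2", "1.1.3.1", "2.7.1.1", "2.7.1.2", "2.7.4.1"]
DESC = [(0.0, 0.0), (0.3, 0.1), (1.0, 0.4), (4.0, 4.0), (4.2, 3.9), (5.0, 4.6)]
SIM = [[rbf(u, v) for v in DESC] for u in DESC]

if __name__ == "__main__":
    pairs, alpha = fit_pairwise_krr(SIM, EC)
    new_query = (0.2, 0.0)  # unseen toy query descriptor
    query_sims = [rbf(new_query, v) for v in DESC]
    print([EC[i] for i in rank_database(query_sims, SIM, pairs, alpha)])
```

The Kronecker kernel lets a pair score depend jointly on how similar the query is to training queries and how similar the candidate is to training targets, which is what allows annotated training data to reshape a purely similarity-based ranking. Solving the full pairwise system scales cubically in the number of pairs, so anything beyond toy sizes would need a more efficient solver.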


Intertemporal Connections Between Query Suggestions and Search Engine Results for Politics Related Queries

This short paper deals with the combination and comparison of two data s...

Learning to Rank Scientific Documents from the Crowd

Finding related published articles is an important task in any science, ...

Gene Similarity-based Approaches for Determining Core-Genes of Chloroplasts

In computational biology and bioinformatics, the manner to understand ev...

Efficient Approximation Algorithms for String Kernel Based Sequence Classification

Sequence classification algorithms, such as SVM, require a definition of...

SIG-DB: leveraging homomorphic encryption to Securely Interrogate privately held Genomic DataBases

Genomic data are becoming increasingly valuable as we develop methods to...

On the Effects of Low-Quality Training Data on Information Extraction from Clinical Reports

In the last five years there has been a flurry of work on information ex...

Image Collation: Matching illustrations in manuscripts

Illustrations are an essential transmission instrument. For an historian...
