An Adapter Architecture for Heterogeneous Data Processing in Bioinformatics Pipelines

03/06/2022
by Dulani Meedeniya, et al.

Bioinformatics is a growing field that spans computer science and biology. A range of bioinformatics data processing tools exists at present, and these tools take inputs and produce outputs in varying formats depending on the algorithms and processes they use. Because one tool's output format may not be accepted by the next, the tools often cannot be chained into a pipeline, which calls for a generic bioinformatics data format converter. Though such converters exist, most are limited to text conversions and offer little additional functionality. Conversion functions of this kind also lend themselves to parallel execution, which can increase overall throughput. This paper proposes a solution that provides these conversion functions together with utility functions, and that achieves high throughput through parallelism. One utility function of the system requires storing bioinformatics data locally; in addition to supporting this, the system reduces the size of stored data by 40% on average. Evaluation of the system on a data set of 7,000,000 gene entries showed a maximum retrieval time of 400 ms.
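The adapter idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the registry names (`READERS`, `WRITERS`), the functions `convert` and `convert_many`, and the choice of FASTA as input and tab-separated text as output are all assumptions made for illustration. Each format contributes a reader or a writer around a shared in-memory record type, so new formats can be added without touching existing ones, and independent inputs are converted concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Shared intermediate representation (an assumption of this sketch):
# each record is an (identifier, sequence) pair.
Record = Tuple[str, str]


def parse_fasta(text: str) -> List[Record]:
    """Read FASTA text into records; sequence lines are concatenated."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_tsv(records: List[Record]) -> str:
    """Write records as one 'identifier<TAB>sequence' line each."""
    return "\n".join(f"{rid}\t{seq}" for rid, seq in records)


# Hypothetical adapter registries: one reader or writer per format name.
READERS: Dict[str, Callable[[str], List[Record]]] = {"fasta": parse_fasta}
WRITERS: Dict[str, Callable[[List[Record]], str]] = {"tsv": to_tsv}


def convert(text: str, src: str, dst: str) -> str:
    """Convert one input by pairing the source reader with the target writer."""
    return WRITERS[dst](READERS[src](text))


def convert_many(texts: List[str], src: str, dst: str, workers: int = 4) -> List[str]:
    """Convert independent inputs concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: convert(t, src, dst), texts))
```

A thread pool keeps the sketch simple; for CPU-bound parsing in Python, a process pool would be the more likely choice for real throughput gains.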
