Scaling pattern mining through non-overlapping variable partitioning

12/10/2022
by Leonardo Alexandre, et al.

Biclustering algorithms play a central role in the biotechnological and biomedical domains. The patterns they extract support the discovery of putative regulatory modules, which are essential to understanding diseases, aiding therapy research, and advancing biological knowledge. However, given the NP-hard nature of the biclustering task, algorithms with optimality guarantees tend to scale poorly on high-dimensional data. To address this, we propose a pipeline for clustering-based vertical partitioning that takes into consideration both parallelization and cross-partition pattern merging needs. Given a specific type of pattern coherence, the clusters are built based on the likelihood that variables form patterns of that type. The patterns extracted per cluster are then merged into a final set of closed patterns. This approach is evaluated using five published datasets. Results show that, on some of the tested data, execution times improve by a statistically significant margin when variables are clustered based on their likelihood to form specific types of patterns, as opposed to partitions based on dissimilarity or randomness. This work offers a first step toward understanding the efficiency impact of vertical partitioning criteria along the different stages of pattern mining and biclustering algorithms. Availability: All the code is freely available at https://github.com/JupitersMight/pattern_merge under the MIT license.
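
To make the partition, mine, and merge idea concrete, the following is a minimal sketch on a toy binary matrix. It is not the authors' implementation and does not use the pattern_merge code: the function names (`mine_partition`, `merge_partitions`, and the helpers) are hypothetical, the partitions are fixed by hand rather than built by likelihood-based clustering, the mining step is brute force, and a "pattern" is simplified to a set of columns that all equal 1 on a shared set of rows.

```python
from itertools import combinations


def rows_supporting(data, cols):
    """Rows in which every column in `cols` equals 1."""
    return frozenset(r for r, row in enumerate(data) if all(row[c] for c in cols))


def close_columns(data, rows, candidate_cols):
    """Columns among `candidate_cols` that equal 1 in every row of `rows`."""
    return frozenset(c for c in candidate_cols if all(data[r][c] for r in rows))


def mine_partition(data, part, min_rows):
    """Brute-force closed patterns restricted to the columns in `part`."""
    found = set()
    for k in range(1, len(part) + 1):
        for cols in combinations(part, k):
            rows = rows_supporting(data, cols)
            if len(rows) >= min_rows:
                # Close the column set within this partition only.
                found.add((close_columns(data, rows, part), rows))
    return found


def merge_partitions(data, per_partition, min_rows):
    """Merge per-partition patterns by intersecting their row supports,
    then close each surviving row set over all columns."""
    all_cols = range(len(data[0]))
    supports = set()
    for patterns in per_partition:
        new = {rows for _, rows in patterns}
        crossed = {a & b for a in supports for b in new if len(a & b) >= min_rows}
        supports |= new | crossed
    return {(close_columns(data, rows, all_cols), rows) for rows in supports}


# Toy data: 4 observations (rows) by 4 variables (columns).
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
partitions = [[0, 1], [2, 3]]  # hand-picked, non-overlapping column groups
min_rows = 2

# Each partition could be mined in parallel; here they are mined in sequence.
per_partition = [mine_partition(data, part, min_rows) for part in partitions]
merged = merge_partitions(data, per_partition, min_rows)

for cols, rows in sorted(merged, key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(cols), sorted(rows))
```

On this toy input the merged set matches what the same brute-force miner finds on the unpartitioned matrix, including the pattern that spans both partitions. Each partition only enumerates subsets of its own columns, which is where the potential saving comes from; how the columns are grouped then determines how much merging work remains.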

