Non-Exhaustive, Overlapping Co-Clustering: An Extended Analysis

04/24/2020
by Joyce Jiyoung Whang, et al.

The goal of co-clustering is to simultaneously cluster the rows and the columns of a two-dimensional data matrix. A number of co-clustering techniques have been proposed, including information-theoretic co-clustering and the minimum sum-squared residue co-clustering method. However, most existing co-clustering algorithms are designed to find pairwise disjoint and exhaustive co-clusters, while many real-world datasets contain not only a large overlap between co-clusters but also outliers that should not belong to any co-cluster. In this paper, we formulate the problem of Non-Exhaustive, Overlapping Co-Clustering, where both row and column clusters are allowed to overlap with each other and outliers in each dimension of the data matrix are not assigned to any cluster. To solve this problem, we propose intuitive objective functions and develop an efficient iterative algorithm, which we call the NEO-CC algorithm. We theoretically show that the NEO-CC algorithm monotonically decreases the proposed objective functions. Experimental results show that the NEO-CC algorithm effectively captures the underlying co-clustering structure of real-world data and thus outperforms state-of-the-art clustering and co-clustering methods. This manuscript includes an extended analysis of [21].
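
To make the problem setting concrete, the following is a minimal sketch of what a non-exhaustive, overlapping co-cluster assignment can look like when it is encoded as binary membership matrices. It is an illustration of the problem statement only, not the NEO-CC algorithm or its objective functions. The matrices `U` and `V`, the toy values, and the helper names `summarize` and `co_cluster_block` are all assumptions introduced here.

```python
import numpy as np

# Hypothetical toy assignment (not from the paper): 5 rows, 4 columns,
# 2 row clusters and 2 column clusters.
# U[i, k] == 1 means row i belongs to row cluster k.
U = np.array([
    [1, 0],  # row 0: row cluster 0 only
    [1, 1],  # row 1: belongs to both row clusters (overlap)
    [0, 1],  # row 2: row cluster 1 only
    [0, 1],  # row 3: row cluster 1 only
    [0, 0],  # row 4: belongs to no cluster (outlier)
])

# V[j, l] == 1 means column j belongs to column cluster l.
V = np.array([
    [1, 0],  # column 0: column cluster 0 only
    [1, 1],  # column 1: overlap
    [0, 1],  # column 2: column cluster 1 only
    [0, 0],  # column 3: outlier
])


def summarize(assign):
    """Return indices that are in several clusters and indices in none."""
    memberships = assign.sum(axis=1)
    return {
        "overlapping": np.flatnonzero(memberships > 1).tolist(),
        "outliers": np.flatnonzero(memberships == 0).tolist(),
    }


def co_cluster_block(X, U, V, k, l):
    """Extract the submatrix of X for row cluster k and column cluster l."""
    rows = U[:, k].astype(bool)
    cols = V[:, l].astype(bool)
    return X[np.ix_(rows, cols)]


# Placeholder data matrix, used only to show how a block is selected.
X = np.arange(20).reshape(5, 4)
row_summary = summarize(U)
col_summary = summarize(V)
block = co_cluster_block(X, U, V, 0, 0)
```

In a disjoint and exhaustive co-clustering, every row of `U` and `V` would sum to exactly one. Here, rows that sum to more than one represent overlap, and rows that sum to zero represent outliers left unassigned.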

