Pair-Wise Cluster Analysis

by   David R. Hardoon, et al.
Uppsala universitet
Apple Inc

This paper studies the problem of learning clusters which are consistently present in different (continuously valued) representations of observed data. Our setup differs slightly from the standard approach of (co-) clustering as we use the fact that some form of `labeling' becomes available in this setup: a cluster is only interesting if it has a counterpart in the alternative representation. The contribution of this paper is twofold: (i) the problem setting is explored and an analysis in terms of the PAC-Bayesian theorem is presented, (ii) a practical kernel-based algorithm is derived exploiting the inherent relation to Canonical Correlation Analysis (CCA), as well as its extension to multiple views. A content based information retrieval (CBIR) case study is presented on the multi-lingual aligned Europal document dataset which supports the above findings.


page 2

page 11


An experimental comparison of label selection methods for hierarchical document clusters

The focus of this paper is on the evaluation of sixteen labeling methods...

Multiple kernel learning for integrative consensus clustering of genomic datasets

Diverse applications - particularly in tumour subtyping - have demonstra...

Robust Bayesian Cluster Enumeration

A major challenge in cluster analysis is that the number of data cluster...

A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering

We formulate weighted graph clustering as a prediction problem: given a ...

Document Clustering Evaluation: Divergence from a Random Baseline

Divergence from a random baseline is a technique for the evaluation of d...

Please sign up or login with your details

Forgot password? Click here to reset