Unsupervised hierarchical clustering using the learning dynamics of RBMs

02/03/2023
by Aurelien Decelle, et al.

Real-world datasets are often complex and, to some degree, hierarchical, with groups and sub-groups of data sharing common characteristics at different levels of abstraction. Understanding and uncovering the hidden structure of these datasets is an important task with many practical applications. To address this challenge, we present a new and general method for building relational data trees by exploiting the learning dynamics of the Restricted Boltzmann Machine (RBM). Our method is based on a mean-field approach derived from the Plefka expansion and developed in the context of disordered systems. It is designed to be easily interpretable. We tested our method on an artificially created hierarchical dataset and on three different real-world datasets (images of digits, mutations in the human genome, and a homologous family of proteins). The method automatically identifies the hierarchical structure of the data. This could be useful in the study of homologous protein sequences, where the relationships between proteins are critical for understanding their function and evolution.
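
To make the idea of reading a tree off the learning dynamics more concrete, here is a minimal sketch, not the authors' implementation. The abstract does not state the equations, so the sketch assumes a binary (0/1) RBM and uses the naive (first-order) mean-field fixed-point iteration as a stand-in for the Plefka-derived equations. The "checkpoints" are hypothetical: rather than training a model, a single weight pattern is scaled by a growing strength to mimic weights increasing during training. All function names, the toy data, and the tolerances are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def mean_field_fixed_points(data, W, a, b, n_iter=500, tol=1e-8):
    """Iterate the naive mean-field equations of a 0/1 RBM from each data row.

    W has shape (n_visible, n_hidden); a and b are the visible and hidden
    biases. Returns the visible magnetizations reached from each row.
    """
    m_v = np.asarray(data, dtype=float)
    for _ in range(n_iter):
        m_h = sigmoid(b + m_v @ W)
        m_v_new = sigmoid(a + m_h @ W.T)
        converged = np.max(np.abs(m_v_new - m_v)) < tol
        m_v = m_v_new
        if converged:
            break
    return m_v


def group_by_fixed_point(m_v, atol=1e-2):
    """Give the same label to rows that ended at (numerically) the same point."""
    labels = np.empty(len(m_v), dtype=int)
    representatives = []
    for i, row in enumerate(m_v):
        for k, rep in enumerate(representatives):
            if np.max(np.abs(row - rep)) < atol:
                labels[i] = k
                break
        else:
            representatives.append(row)
            labels[i] = len(representatives) - 1
    return labels


# Hypothetical checkpoints (assumption): one hidden unit whose weight pattern
# grows in strength, with biases chosen so that m = 0.5 is always a solution.
pattern = np.array([1.0, 1.0, -1.0, -1.0])


def toy_checkpoint(strength):
    W = strength * pattern[:, None]   # shape (4, 1)
    a = -0.5 * W.sum(axis=1)          # visible biases, shape (4,)
    b = -0.5 * W.sum(axis=0)          # hidden biases, shape (1,)
    return W, a, b


# Toy data (assumption): two noisy copies of each of two prototypes.
data = np.array([[1, 1, 0, 0],
                 [1, 1, 1, 0],
                 [0, 0, 1, 1],
                 [0, 1, 1, 1]])

# One column of labels per checkpoint; each row is a data point's path
# from the root (earliest checkpoint) towards the leaves (latest one).
paths = np.stack(
    [group_by_fixed_point(mean_field_fixed_points(data, *toy_checkpoint(s)))
     for s in (1.0, 3.0, 6.0)],
    axis=1,
)
print(paths)
```

In this toy setup, all four points fall into a single fixed point at the weakest strength and separate into two groups once the weights are strong enough. Reading each row of `paths` from left to right gives the branch a data point follows. A real application would replace `toy_checkpoint` with parameters saved during actual RBM training.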


