Rk-means: Fast Clustering for Relational Data

10/11/2019
by Ryan Curtin, et al.

Conventional machine learning algorithms cannot be applied until a data matrix is available to process. When the data matrix has to be obtained from a relational database via a feature extraction query, the computation cost can be prohibitive, as the data matrix may be (much) larger than the total input relation size. This paper introduces Rk-means, the relational k-means algorithm, for clustering relational data tuples without having to access the full data matrix. As such, we avoid having to run the expensive feature extraction query and store its output. Our algorithm leverages the underlying structure of relational data. It constructs a small grid coreset of the data matrix, which is then used for cluster construction. This gives a constant-factor approximation for the k-means objective, with asymptotic runtime improvements over the standard approach of first running the database query and then clustering. Empirical results show orders-of-magnitude speedups, and Rk-means can run faster on the database than even just computing the data matrix.
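The grid-coreset idea can be illustrated with a minimal sketch. It is not the paper's implementation and makes several assumptions: the grid is built by clustering each feature dimension separately into `kappa` centers (a hypothetical parameter name), each grid cell is weighted by the number of rows that map to it, and a plain weighted Lloyd iteration is then run on the cells. For readability the sketch works on a tiny materialized list of rows, whereas the point of Rk-means is to obtain the same per-dimension statistics and cell counts from the database without materializing the data matrix.

```python
import random
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def weighted_kmeans(points, weights, k, iters=25, seed=0):
    """Plain weighted Lloyd iterations; returns a list of k center tuples."""
    rng = random.Random(seed)
    # Initialise from k distinct input points (requires k <= #distinct points).
    centers = rng.sample(sorted(set(points)), k)
    dims = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dims for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            mass[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Move each center to its weighted mean; keep empty clusters in place.
        centers = [
            tuple(v / mass[j] for v in sums[j]) if mass[j] > 0 else centers[j]
            for j in range(k)
        ]
    return centers


def grid_coreset(rows, kappa):
    """Assumed construction: per-dimension 1-D clustering, then weighted grid cells."""
    dims = len(rows[0])
    per_dim = []
    for d in range(dims):
        column = [(r[d],) for r in rows]
        per_dim.append(weighted_kmeans(column, [1] * len(rows), kappa))
    cells = Counter()
    for r in rows:
        # Snap every coordinate to its nearest 1-D center in that dimension.
        cell = tuple(
            min(per_dim[d], key=lambda c: (r[d] - c[0]) ** 2)[0]
            for d in range(dims)
        )
        cells[cell] += 1
    points = list(cells)
    return points, [cells[p] for p in points]


# Toy data: two synthetic 2-D blobs standing in for the output of a join.
rng = random.Random(1)
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
rows += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]

grid_points, grid_weights = grid_coreset(rows, kappa=3)
centers = weighted_kmeans(grid_points, grid_weights, k=2)
```

Here the final clustering step touches at most `kappa ** dims` weighted grid cells instead of all 100 rows, which is where a saving of this kind would come from when the data matrix is much larger than the grid.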

research
11/15/2019

Learning Models over Relational Data: A Brief Tutorial

This tutorial overviews the state of the art in learning models over rel...
research
04/25/2013

An implementation of the relational k-means algorithm

A C# implementation of a generalized k-means variant called relational k...
research
08/01/2020

Relational Algorithms for k-means Clustering

The majority of learning tasks faced by data scientists involve relation...
research
10/09/2022

Coresets for Relational Data and The Applications

A coreset is a small set that can approximately preserve the structure o...
research
11/16/2019

Taming Reasoning in Temporal Probabilistic Relational Models

Evidence often grounds temporal probabilistic relational models over tim...
research
08/18/2020

The Relational Data Borg is Learning

This paper overviews an approach that addresses machine learning over re...
research
01/10/2020

Multi-layer Optimizations for End-to-End Data Analytics

We consider the problem of training machine learning models over multi-r...
