One-Shot Coresets: The Case of k-Clustering

11/27/2017
by Olivier Bachem, et al.

Scaling clustering algorithms to massive data sets is a challenging task. Recently, several successful approaches based on data summarization methods, such as coresets and sketches, have been proposed. While these techniques provide small summaries with provable quality guarantees, they are inherently problem-dependent: the practitioner has to commit to a fixed clustering objective before even exploring the data. Can one instead construct small data summaries for a wide range of clustering problems simultaneously? In this work, we answer this question affirmatively by proposing an efficient algorithm that constructs such one-shot summaries for k-clustering problems while retaining strong theoretical guarantees.
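
For readers unfamiliar with the term, the guarantee usually meant by "coreset" can be sketched as follows. This is the standard textbook form of the definition, not a formula taken from the abstract: the symbols X, C, w, Q, d, and \varepsilon are assumed notation, and the exact family of objectives covered by the one-shot construction is specified only in the full text.

```latex
% Assumed notation: X is the data set, Q a set of k candidate centers,
% d a distance function (e.g. squared Euclidean distance for k-means),
% and (C, w) a weighted summary with weights w(c) >= 0.
\[
  \operatorname{cost}_d(X, Q) = \sum_{x \in X} \min_{q \in Q} d(x, q),
  \qquad
  \operatorname{cost}_d(C, Q) = \sum_{c \in C} w(c) \min_{q \in Q} d(c, q).
\]
% (C, w) is an \varepsilon-coreset of X for d if, for every Q with |Q| = k,
\[
  \bigl| \operatorname{cost}_d(X, Q) - \operatorname{cost}_d(C, Q) \bigr|
  \le \varepsilon \, \operatorname{cost}_d(X, Q).
\]
```

Read this way, a one-shot summary would be a single pair (C, w) for which the second inequality holds simultaneously for every distance function d in some family, rather than for one objective fixed in advance.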


research
04/17/2021

Multi-Perspective Abstractive Answer Summarization

Community Question Answering (CQA) forums such as Stack Overflow and Yah...
research
08/15/2017

Automatic Summarization of Online Debates

Debate summarization is one of the novel and challenging research areas ...
research
05/01/2014

VSCAN: An Enhanced Video Summarization using Density-based Spatial Clustering

In this paper, we present VSCAN, a novel approach for generating static ...
research
06/16/2023

Adversarially robust clustering with optimality guarantees

We consider the problem of clustering data points coming from sub-Gaussi...
research
01/30/2023

Optimal Decision Trees For Interpretable Clustering with Constraints

Constrained clustering is a semi-supervised task that employs a limited ...
research
05/27/2022

Guided Exploration of Data Summaries

Data summarization is the process of producing interpretable and represe...
research
03/19/2017

Practical Coreset Constructions for Machine Learning

We investigate coresets - succinct, small summaries of large data sets -...
