Nested Mini-Batch K-Means

02/09/2016
by James Newling, et al.

A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) by using the distance bounding approach of Elkan (2003). We argue that, when incorporating distance bounds into a mini-batch algorithm, already used data should preferentially be reused. To this end we propose using nested mini-batches, whereby data in a mini-batch at iteration t is automatically reused at iteration t+1. Using nested mini-batches presents two difficulties. The first is that unbalanced use of data can bias estimates, which we resolve by ensuring that each data sample contributes exactly once to centroids. The second is in choosing mini-batch sizes, which we address by balancing premature fine-tuning of centroids with redundancy-induced slow-down. Experiments show that the resulting nmbatch algorithm is very effective, often arriving within 1% of the empirical minimum 100 times earlier than the standard mini-batch algorithm.
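The sketch below illustrates the nesting idea in plain NumPy: batches are prefixes of a fixed random permutation, so every sample used at iteration t is reused at iteration t+1, and when a sample's assignment changes its old contribution is subtracted before the new one is added, so each sample contributes exactly once to the centroid sums. The function name nested_minibatch_kmeans, the fixed doubling factor grow, and the omission of Elkan-style distance bounds are illustrative assumptions, not the paper's nmbatch implementation (which grows the batch adaptively and uses the bounds to skip most distance computations).

import numpy as np

def nested_minibatch_kmeans(X, k, n_iters=20, b0=64, grow=2.0, seed=0):
    """Illustrative sketch of nested mini-batch k-means (not the paper's nmbatch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    X = X[rng.permutation(n)]                 # fix one permutation; batches are its prefixes

    centroids = X[rng.choice(n, size=k, replace=False)].copy()
    sums = np.zeros((k, d))                   # running per-cluster sums of assigned samples
    counts = np.zeros(k, dtype=int)           # running per-cluster counts
    assign = -np.ones(n, dtype=int)           # -1 means "not yet in any batch"

    b = min(b0, n)
    for _ in range(n_iters):
        batch = X[:b]                          # nested: batch at t is reused at t+1
        dists = ((batch[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        new_assign = dists.argmin(1)

        for i in range(b):
            old, new = assign[i], new_assign[i]
            if old == new:
                continue
            if old >= 0:                       # remove the sample's previous contribution
                sums[old] -= X[i]
                counts[old] -= 1
            sums[new] += X[i]                  # each sample contributes exactly once
            counts[new] += 1
            assign[i] = new

        nonempty = counts > 0
        centroids[nonempty] = sums[nonempty] / counts[nonempty, None]

        b = min(int(b * grow), n)              # simplified fixed-factor growth schedule

    return centroids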
