Dataset Distillation Meets Provable Subset Selection

by   Murad Tukan, et al.

Deep learning has grown tremendously over recent years, yielding state-of-the-art results in various fields. However, training such models requires huge amounts of data, increasing the computational time and cost. To address this, dataset distillation was proposed to compress a large training dataset into a smaller synthetic one that retains its performance – this is usually done by (1) uniformly initializing a synthetic set and (2) iteratively updating/learning this set according to a predefined loss by uniformly sampling instances from the full data. In this paper, we improve both phases of dataset distillation: (1) we present a provable, sampling-based approach for initializing the distilled set by identifying important and removing redundant points in the data, and (2) we further merge the idea of data subset selection with dataset distillation, by training the distilled set on “important” sampled points during the training procedure instead of randomly sampling the next batch. To do so, we define the notion of importance based on the relative contribution of instances with respect to two different loss functions, i.e., one for the initialization phase (a kernel fitting function for kernel ridge regression and K-means based loss function for any other distillation method), and the relative cross-entropy loss (or any other predefined loss) function for the training phase. Finally, we provide experimental results showing how our method can latch on to existing dataset distillation techniques and improve their performance.


Data Distillation for Text Classification

Deep learning techniques have achieved great success in many fields, whi...

On the Size and Approximation Error of Distilled Sets

Dataset Distillation is the task of synthesizing small datasets from lar...

Dataset Meta-Learning from Kernel Ridge-Regression

One of the most fundamental aspects of any machine learning algorithm is...

Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning

Methods for carefully selecting or generating a small set of training da...

Dataset Distillation

Model distillation aims to distill the knowledge of a complex model into...

A generalized meta-loss function for distillation and learning using privileged information for classification and regression

Learning using privileged information and distillation are powerful mach...

Bayesian Sampling Bias Correction: Training with the Right Loss Function

We derive a family of loss functions to train models in the presence of ...

Please sign up or login with your details

Forgot password? Click here to reset