DC-BENCH: Dataset Condensation Benchmark

by Justin Cui, et al.

Dataset Condensation is a newly emerging technique that aims to learn a tiny dataset capturing the rich information encoded in the original dataset. As the datasets that contemporary machine learning models rely on grow increasingly large, condensation methods have become a prominent direction for accelerating network training and reducing data storage. Although numerous methods have been proposed in this rapidly growing field, evaluating and comparing different condensation methods is non-trivial and remains an open issue. The quality of a condensed dataset is often overshadowed by other critical factors that contribute to end performance, such as data augmentation and model architecture. The lack of a systematic way to evaluate and compare condensation methods not only hinders our understanding of existing techniques, but also discourages practical use of the synthesized datasets. This work provides the first large-scale standardized benchmark on Dataset Condensation. It consists of a suite of evaluations that comprehensively reflect the generalizability and effectiveness of condensation methods through the lens of their generated datasets. Leveraging this benchmark, we conduct a large-scale study of current condensation methods and report many insightful findings that open up new possibilities for future development. The benchmark library, including evaluators, baseline methods, and generated datasets, is open-sourced to facilitate future research and application.


