repliclust: Synthetic Data for Cluster Analysis

03/24/2023
by Michael J. Zellinger, et al.

We present repliclust (from repli-cate and clust-er), a Python package for generating synthetic data sets with clusters. Our approach is based on data set archetypes, high-level geometric descriptions from which the user can create many different data sets, each possessing the desired geometric characteristics. The architecture of our software is modular and object-oriented, decomposing data generation into algorithms for placing cluster centers, sampling cluster shapes, selecting the number of data points for each cluster, and assigning probability distributions to clusters. The project webpage, repliclust.org, provides a concise user guide and thorough documentation.
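The four-stage decomposition described above can be illustrated with a minimal NumPy sketch. This is not the repliclust API: the function name `generate_dataset`, its parameters, and every sampling rule (uniform centers, random rotations, Dirichlet-weighted cluster sizes, a random choice between normal and Student's t distributions) are assumptions made purely for illustration.

```python
import numpy as np


def generate_dataset(n_clusters=4, dim=2, n_samples=400, seed=0):
    """Toy four-stage cluster generator (hypothetical; not the repliclust API)."""
    if n_samples < n_clusters:
        raise ValueError("need at least one data point per cluster")
    rng = np.random.default_rng(seed)

    # Stage 1: place cluster centers (assumed rule: uniform in a box).
    centers = rng.uniform(-10.0, 10.0, size=(n_clusters, dim))

    # Stage 2: sample a shape per cluster, here a random orthonormal
    # basis (from a QR decomposition) plus a scale along each axis.
    bases = [
        np.linalg.qr(rng.standard_normal((dim, dim)))[0]
        for _ in range(n_clusters)
    ]
    axis_scales = rng.uniform(0.5, 2.0, size=(n_clusters, dim))

    # Stage 3: select the number of points per cluster. Reserving one
    # point per cluster guarantees that no cluster is empty.
    proportions = rng.dirichlet(np.ones(n_clusters))
    sizes = rng.multinomial(n_samples - n_clusters, proportions) + 1

    # Stage 4: assign a probability distribution to each cluster.
    families = rng.choice(["normal", "student_t"], size=n_clusters)

    blocks = []
    for k in range(n_clusters):
        if families[k] == "normal":
            z = rng.standard_normal((sizes[k], dim))
        else:
            z = rng.standard_t(df=5, size=(sizes[k], dim))
        # Stretch along the axes, rotate, then shift to the center.
        blocks.append(centers[k] + (z * axis_scales[k]) @ bases[k].T)

    X = np.vstack(blocks)
    y = np.repeat(np.arange(n_clusters), sizes)
    return X, y
```

Calling `X, y = generate_dataset(n_clusters=4, dim=2, n_samples=400, seed=0)` returns a data matrix and integer cluster labels. Keeping the four stages separate means any one rule can be swapped without touching the others, which is the kind of modularity the abstract describes.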


