Benne: A Modular and Self-Optimizing Algorithm for Data Stream Clustering

09/09/2023
by Zhengru Wang, et al.

Data stream clustering is a critical operation in many real-world applications, from the Internet of Things (IoT) to social media and financial systems. This paper introduces Benne, a modular and highly configurable data stream clustering algorithm designed to balance clustering accuracy against computational efficiency. Benne separates four key design dimensions: the summarizing data structure, the window model that handles data temporality, the outlier detection mechanism, and the refinement strategy that improves cluster quality. This separation makes it possible to isolate the impact of each design choice on the algorithm's performance, and it lets the algorithm adapt to a wide range of application contexts. We analyze each of these design dimensions, along with the challenges and opportunities it presents. We then evaluate Benne's performance under diverse configurations and benchmark it against state-of-the-art data stream clustering algorithms. Our empirical results show that Benne matches or surpasses competing algorithms in clustering accuracy, processing throughput, and adaptability to varying data stream characteristics, making it a valuable tool for both practitioners and researchers in data stream mining.
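To make the four-dimension decomposition concrete, the following is a minimal Python sketch of a stream clustering pipeline with one pluggable component per dimension. It is not Benne's implementation: every name in it (`MicroCluster`, `Window`, `summarize`, `drop_sparse`, `weighted_kmeans`, `StreamClusterer`) and every parameter (`radius`, `min_pts`, `k`, `maxlen`) is hypothetical. The concrete choices are also assumptions made for illustration: distance-threshold micro-clusters as the summary, a landmark or count-based sliding window, removal of sparse micro-clusters as outlier detection, and weighted k-means as refinement.

```python
# Hypothetical sketch, one pluggable component per dimension; NOT Benne's actual implementation.
from collections import deque
from dataclasses import dataclass
from functools import partial
from typing import Callable, List, Optional, Tuple
import math
import random

Point = Tuple[float, ...]


@dataclass
class MicroCluster:
    """Summary of nearby points: point count and per-dimension linear sum."""
    n: int
    ls: List[float]

    def centroid(self) -> Point:
        return tuple(s / self.n for s in self.ls)

    def absorb(self, p: Point) -> None:
        self.n += 1
        self.ls = [s + x for s, x in zip(self.ls, p)]


# Dimension 1, summarizing structure: join the nearest micro-cluster within `radius`, else open one.
def summarize(points: List[Point], radius: float) -> List[MicroCluster]:
    mcs: List[MicroCluster] = []
    for p in points:
        best = min(mcs, key=lambda m: math.dist(m.centroid(), p), default=None)
        if best is not None and math.dist(best.centroid(), p) <= radius:
            best.absorb(p)
        else:
            mcs.append(MicroCluster(1, list(p)))
    return mcs


# Dimension 2, window model: maxlen=None is a landmark window, maxlen=w a count-based sliding one.
class Window:
    def __init__(self, maxlen: Optional[int] = None) -> None:
        self.buf: deque = deque(maxlen=maxlen)

    def push(self, p: Point) -> None:
        self.buf.append(p)

    def points(self) -> List[Point]:
        return list(self.buf)


# Dimension 3, outlier detection: micro-clusters with fewer than `min_pts` points are noise.
def drop_sparse(mcs: List[MicroCluster], min_pts: int) -> List[MicroCluster]:
    return [m for m in mcs if m.n >= min_pts]


# Dimension 4, refinement: k-means over centroids, each weighted by its micro-cluster's count.
def weighted_kmeans(mcs: List[MicroCluster], k: int, iters: int = 10, seed: int = 0) -> List[Point]:
    cents = [m.centroid() for m in mcs]
    centers = random.Random(seed).sample(cents, min(k, len(cents)))
    for _ in range(iters):
        groups: List[List[MicroCluster]] = [[] for _ in centers]
        for m, c in zip(mcs, cents):
            j = min(range(len(centers)), key=lambda i: math.dist(centers[i], c))
            groups[j].append(m)
        for i, g in enumerate(groups):
            if g:  # an empty group keeps its previous center
                total = sum(m.n for m in g)
                centers[i] = tuple(sum(m.ls[d] for m in g) / total
                                   for d in range(len(g[0].ls)))
    return centers


@dataclass
class StreamClusterer:
    """Composes one choice per dimension; re-summarizes the window on demand."""
    window: Window
    summarize: Callable[[List[Point]], List[MicroCluster]]
    detect_outliers: Callable[[List[MicroCluster]], List[MicroCluster]]
    refine: Callable[[List[MicroCluster]], List[Point]]

    def insert(self, p: Point) -> None:
        self.window.push(p)

    def clusters(self) -> List[Point]:
        return self.refine(self.detect_outliers(self.summarize(self.window.points())))


if __name__ == "__main__":
    blob_a = [(0.1 * (i % 3), 0.1 * (i % 2)) for i in range(30)]
    blob_b = [(5 + 0.1 * (i % 3), 5 + 0.1 * (i % 2)) for i in range(30)]
    clf = StreamClusterer(
        window=Window(),  # swap in Window(maxlen=10) to change only the window model
        summarize=partial(summarize, radius=2.0),
        detect_outliers=partial(drop_sparse, min_pts=3),
        refine=partial(weighted_kmeans, k=2),
    )
    for p in blob_a + blob_b + [(20.0, 20.0)]:  # (20, 20) is a lone outlier
        clf.insert(p)
    print(sorted(clf.clusters()))  # approximately [(0.1, 0.05), (5.1, 5.05)]
```

Because each dimension is a separate argument, replacing `Window()` with `Window(maxlen=10)` in the demo changes only the window model. The earlier blob then falls out of the window, and a single center near (5.1, 5.05) remains. For brevity, this sketch rebuilds the summary from the window on every `clusters()` call. A practical stream clustering algorithm would instead maintain its summaries incrementally as points arrive.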
