OEBench: Investigating Open Environment Challenges in Real-World Relational Data Streams

08/29/2023
by Yiqun Diao, et al.

Extracting timely insights from relational data streams is an active research topic. This type of data stream can present unique challenges, such as distribution drifts, outliers, emerging classes, and changing features, which have recently been described as open environment challenges for machine learning. Although incremental learning for data streams has been studied, existing evaluations are mostly conducted on manually partitioned datasets. Thus, a natural question is what those open environment challenges look like in real-world relational data streams and how existing incremental learning algorithms perform on real datasets. To fill this gap, we develop an Open Environment Benchmark named OEBench to evaluate open environment challenges in relational data streams. Specifically, we investigate 55 real-world relational data streams and establish that open environment scenarios are indeed widespread in real-world datasets, which presents significant challenges for stream learning algorithms. Through benchmarks with existing incremental learning algorithms, we find that increased data quantity may not consistently improve model accuracy in open environment scenarios, where machine learning models can be significantly compromised by missing values, distribution shifts, or anomalies in real-world data streams. Current techniques are insufficient to mitigate the challenges posed by open environments. More research is needed to address real-world open environment challenges. All datasets and code are open-sourced at https://github.com/sjtudyq/OEBench.
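
To make the claim about data quantity and distribution shift concrete, the following is a minimal illustrative sketch and not code from OEBench. It assumes a synthetic one-feature binary stream whose class means swap abruptly at a chosen index, a toy incremental nearest-centroid learner, and a test-then-train (prequential) evaluation loop. The names make_stream, RunningCentroid, and prequential_accuracy are hypothetical and exist only for this example.

```python
import random


def make_stream(n, drift_at, seed=0):
    """Yield (x, y) pairs; the two class means swap at index `drift_at`."""
    rng = random.Random(seed)
    for i in range(n):
        y = rng.randint(0, 1)
        # Before the drift: class 0 is centred at -1 and class 1 at +1.
        # After the drift the centres are swapped (abrupt distribution drift).
        sign = 1.0 if i < drift_at else -1.0
        mean = sign * (1.0 if y == 1 else -1.0)
        yield rng.gauss(mean, 0.5), y


class RunningCentroid:
    """Toy incremental nearest-centroid classifier for one numeric feature."""

    def __init__(self):
        self.count = [0, 0]
        self.mean = [0.0, 0.0]

    def predict(self, x):
        # Fall back to class 0 until both classes have been observed.
        if 0 in self.count:
            return 0
        return 0 if abs(x - self.mean[0]) <= abs(x - self.mean[1]) else 1

    def learn(self, x, y):
        # Running-mean update; every past sample keeps equal weight.
        self.count[y] += 1
        self.mean[y] += (x - self.mean[y]) / self.count[y]


def prequential_accuracy(stream, model, window):
    """Test-then-train: predict each item first, then learn from it.

    Returns one accuracy value per complete window of `window` items.
    """
    accs, correct, seen = [], 0, 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn(x, y)
        seen += 1
        if seen == window:
            accs.append(correct / window)
            correct, seen = 0, 0
    return accs


accs = prequential_accuracy(
    make_stream(4000, drift_at=2000), RunningCentroid(), window=500
)
for i, acc in enumerate(accs):
    print(f"window {i}: accuracy {acc:.3f}")
```

In this synthetic setup the learner is accurate before the drift point and its accuracy drops sharply afterwards, even though it keeps receiving more labelled data, because the running means are dominated by samples from the old distribution. Real streams such as those in the benchmark are more complex; this sketch only illustrates the evaluation idea.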


