Online feature selection for rapid, low-overhead learning in networked systems

10/28/2020
by Xiaoxuan Wang, et al.

Data-driven functions for operation and management often require measurements collected through monitoring for model training and prediction. The number of data sources can be very large, so continuously extracting and collecting this data, and training and updating the machine-learning models, incurs significant communication and computing overhead. We present an online algorithm, called OSFS, that selects a small feature set from a large number of available data sources, which allows for rapid, low-overhead, and effective learning and prediction. OSFS is instantiated with a feature ranking algorithm and applies the concept of a stable feature set, which we introduce in the paper. We perform an extensive experimental evaluation of our method on data from an in-house testbed. We find that OSFS requires several hundred measurements to reduce the number of data sources by two orders of magnitude, and that models trained on the reduced set achieve acceptable prediction accuracy. While our method is heuristic and can be improved in many ways, the results clearly suggest that many learning tasks do not require a lengthy monitoring phase and expensive offline training.
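
The abstract does not spell out how OSFS ranks features or how a stable feature set is defined, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a hypothetical ranker based on absolute Pearson correlation with the target, and it assumes that a feature set counts as "stable" when the top-k sets computed at two consecutive checkpoints overlap by at least a chosen fraction. The names `abs_correlation_rank`, `select_stable_features`, `batch_size`, and `min_overlap` are illustrative and do not come from the paper.

```python
import random
from typing import Callable, Iterable, List, Optional, Sequence, Set, Tuple

Sample = Tuple[Sequence[float], float]  # (feature vector, target value)


def abs_correlation_rank(rows: List[Sequence[float]], targets: List[float]) -> List[int]:
    """Assumed ranker: order feature indices by |Pearson correlation| with the target."""
    n = len(rows)
    mean_y = sum(targets) / n
    var_y = sum((y - mean_y) ** 2 for y in targets)
    scores = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean_x = sum(col) / n
        var_x = sum((x - mean_x) ** 2 for x in col)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(col, targets))
        denom = (var_x * var_y) ** 0.5
        # Constant columns carry no information; give them a score of zero.
        scores.append(abs(cov) / denom if denom > 0 else 0.0)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select_stable_features(
    stream: Iterable[Sample],
    k: int,
    batch_size: int = 50,
    min_overlap: float = 0.9,
    max_samples: int = 1000,
    rank: Callable[[List[Sequence[float]], List[float]], List[int]] = abs_correlation_rank,
) -> Tuple[List[int], int]:
    """Read samples one at a time and stop once the top-k set stops changing.

    Returns the selected feature indices and the number of samples consumed.
    """
    rows: List[Sequence[float]] = []
    targets: List[float] = []
    previous: Optional[Set[int]] = None
    current: Set[int] = set()
    for x, y in stream:
        rows.append(x)
        targets.append(y)
        if len(rows) % batch_size != 0:
            continue  # only re-rank at checkpoints to keep overhead low
        current = set(rank(rows, targets)[:k])
        # Assumed stability criterion: consecutive top-k sets overlap enough.
        if previous is not None and len(current & previous) / k >= min_overlap:
            break
        previous = current
        if len(rows) >= max_samples:
            break  # give up waiting for stability; return the latest ranking
    return sorted(current), len(rows)


def synthetic_stream(n_samples: int, n_features: int = 100, seed: int = 0) -> Iterable[Sample]:
    """Synthetic data: only features 3 and 7 influence the target."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = [rng.gauss(0, 1) for _ in range(n_features)]
        yield x, 3.0 * x[3] - 3.0 * x[7] + rng.gauss(0, 0.1)


features, used = select_stable_features(synthetic_stream(1000), k=2)
print(features, used)
```

In this sketch, re-ranking happens only every `batch_size` samples, and collection stops as soon as the assumed stability criterion is met, which mirrors the abstract's point that a short monitoring phase can suffice. A different ranker can be passed through the `rank` parameter.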

research
12/15/2021

Online Feature Selection for Efficient Learning in Networked Systems

Current AI/ML methods for data-driven engineering use models that are mo...
research
09/07/2021

Predicting students' performance in online courses using multiple data sources

Data-driven decision making is serving and transforming education. We ap...
research
12/06/2022

Loss Adapted Plasticity in Deep Neural Networks to Learn from Data with Unreliable Sources

When data is streaming from multiple sources, conventional training meth...
research
06/11/2018

Aggregating Predictions on Multiple Non-disclosed Datasets using Conformal Prediction

Conformal Prediction is a machine learning methodology that produces val...
research
12/06/2020

SoK: Training Machine Learning Models over Multiple Sources with Privacy Preservation

Nowadays, gathering high-quality training data from multiple data contro...
research
05/12/2018

Do Outliers Ruin Collaboration?

We consider the problem of learning a binary classifier from n different...
research
04/12/2023

Towards Solving the Challenge of Minimal Overhead Monitoring

The examination of performance changes or the performance behavior of a ...