Concurrent Classifier Error Detection (CCED) in Large Scale Machine Learning Systems

06/02/2023
by Pedro Reviriego, et al.

The complexity of Machine Learning (ML) systems increases each year, with current implementations of large language models or text-to-image generators having billions of parameters and requiring billions of arithmetic operations. As these systems are widely used, ensuring their reliable operation is becoming a design requirement. Traditional error detection mechanisms introduce circuit or time redundancy that significantly impacts system performance. An alternative is the use of Concurrent Error Detection (CED) schemes, which operate in parallel with the system and exploit its properties to detect errors. CED is attractive for large ML systems because it can potentially reduce the cost of error detection. In this paper, we introduce Concurrent Classifier Error Detection (CCED), a scheme that implements CED in ML systems by using a concurrent ML classifier to detect errors. CCED identifies a set of check signals in the main ML system and feeds them to a concurrent ML classifier that is trained to detect errors. The proposed CCED scheme has been implemented and evaluated on two widely used large-scale ML models: Contrastive Language Image Pretraining (CLIP), used for image classification, and Bidirectional Encoder Representations from Transformers (BERT), used for natural language applications. The results show that more than 95 percent of the errors are detected when using a simple Random Forest classifier that is an order of magnitude simpler than CLIP or BERT. These results illustrate the potential of CCED to implement error detection in large-scale ML models.
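
The data flow described in the abstract (check signals tapped from the main system, fed to a small classifier trained to flag errors) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the check signals are synthetic Gaussian values, a fault is modeled as one signal overwritten with a large-magnitude value, and a one-feature threshold rule (a decision stump) stands in for the Random Forest used in the paper so that the example needs only the Python standard library. The names `NUM_SIGNALS`, `sample_check_signals`, `train_threshold`, and `evaluate` are hypothetical.

```python
import random

NUM_SIGNALS = 8  # hypothetical number of check signals tapped from the main model


def sample_check_signals(rng, faulty):
    """Return one synthetic check-signal vector.

    Fault-free signals are unit Gaussians. A fault overwrites one signal
    with a large-magnitude value, a crude stand-in for a bit flip in a
    high-order bit. Both choices are assumptions, not taken from the paper.
    """
    signals = [rng.gauss(0.0, 1.0) for _ in range(NUM_SIGNALS)]
    if faulty:
        position = rng.randrange(NUM_SIGNALS)
        signals[position] = rng.choice([-1.0, 1.0]) * rng.uniform(50.0, 1000.0)
    return signals


def feature(signals):
    """Collapse a check-signal vector to one feature: its largest magnitude."""
    return max(abs(s) for s in signals)


def train_threshold(samples, labels):
    """Fit a one-feature threshold rule (a stand-in for the Random Forest).

    Tries the midpoint between each pair of adjacent sorted feature values
    and keeps the one with the fewest training mistakes.
    """
    feats = [feature(s) for s in samples]
    ordered = sorted(feats)
    candidates = [(a + b) / 2.0 for a, b in zip(ordered, ordered[1:])]

    def mistakes(threshold):
        return sum((f > threshold) != y for f, y in zip(feats, labels))

    return min(candidates, key=mistakes)


def detect(threshold, signals):
    """Concurrent check: flag an error when the feature exceeds the threshold."""
    return feature(signals) > threshold


def evaluate(seed=0, n_train=1000, n_test=1000):
    """Train on synthetic data and return (detection rate, false-alarm rate)."""
    rng = random.Random(seed)

    def make(n):
        labels = [i % 2 == 1 for i in range(n)]  # alternate clean / faulty
        return [sample_check_signals(rng, y) for y in labels], labels

    train_x, train_y = make(n_train)
    test_x, test_y = make(n_test)
    threshold = train_threshold(train_x, train_y)

    flags = [detect(threshold, s) for s in test_x]
    n_faulty = sum(test_y)
    detected = sum(f and y for f, y in zip(flags, test_y))
    false_alarms = sum(f and not y for f, y in zip(flags, test_y))
    return detected / n_faulty, false_alarms / (len(test_y) - n_faulty)


if __name__ == "__main__":
    detection_rate, false_alarm_rate = evaluate()
    print(f"detection rate: {detection_rate:.3f}")
    print(f"false-alarm rate: {false_alarm_rate:.3f}")
```

The synthetic faults here are far easier to separate than errors in a real model, so the rates this sketch prints say nothing about CLIP or BERT. It only shows the structure of the scheme: the detector sees the check signals rather than the full computation, which is why it can be much smaller than the system it monitors.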

