On Automatic Feasibility Study for Machine Learning Application Development with ease.ml/snoopy

by Cedric Renggli, et al.

In our experience working with domain experts who use today's AutoML systems, a common problem is what we call Unrealistic Expectation: users have access only to very noisy or challenging datasets, yet are expected to achieve startlingly high accuracy with ML. Consequently, many computationally expensive AutoML runs and labour-intensive ML development processes are predestined to fail from the beginning. In traditional software engineering, this problem is addressed via a feasibility study, an indispensable step before developing any software system. In this paper we present ease.ml/snoopy with the goal of performing an automatic feasibility study before building ML applications. A user provides inputs in the form of a dataset and a quality target (e.g., expected accuracy > 0.8) and the system returns its deduction on whether this target is achievable using ML given the input data. We formulate this problem as estimating the irreducible error of the underlying task, also known as the Bayes error. The key contribution of this work is the study of this problem from a systems and empirical perspective: we (1) propose practical "compromises" that enable the application of Bayes error estimators and (2) develop an evaluation framework that compares different estimators empirically on real-world data. We then systematically explore the design space by evaluating a range of estimators, reporting not only the improvements of our proposed estimator but also the limitations of both our method and existing estimators.
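To make the idea of a feasibility check concrete, the following is a minimal sketch and not the ease.ml/snoopy implementation. It assumes one classical estimator, the Cover and Hart bound, which turns a 1-nearest-neighbour error rate into a lower bound on the Bayes error. The bound holds asymptotically, so on a finite sample the result is only a rough indication. The function names (`loo_1nn_error`, `cover_hart_lower_bound`, `is_target_plausible`) are hypothetical and chosen for illustration.

```python
import numpy as np


def loo_1nn_error(X, y):
    """Leave-one-out 1-NN error rate (brute force, Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances via the expansion |a-b|^2.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # A point must not be counted as its own nearest neighbour.
    np.fill_diagonal(d2, np.inf)
    nearest = np.argmin(d2, axis=1)
    return float(np.mean(y[nearest] != y))


def cover_hart_lower_bound(err_1nn, num_classes):
    """Lower bound on the Bayes error implied by a 1-NN error rate.

    Inverts the asymptotic Cover-Hart inequality
    R_1NN <= R* (2 - C/(C-1) R*) for R*.
    """
    c = float(num_classes)
    inner = max(0.0, 1.0 - c / (c - 1.0) * err_1nn)
    return float((c - 1.0) / c * (1.0 - np.sqrt(inner)))


def is_target_plausible(X, y, target_accuracy):
    """Return (plausible, estimated lower bound on the Bayes error).

    Assumption: a target is treated as plausible when it does not
    exceed one minus the estimated Bayes error lower bound.
    """
    num_classes = len(np.unique(y))
    ber_lb = cover_hart_lower_bound(loo_1nn_error(X, y), num_classes)
    return bool(target_accuracy <= 1.0 - ber_lb), ber_lb


if __name__ == "__main__":
    # Toy data with heavy label noise: 40% of labels are flipped.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    clean = (X[:, 0] > 0).astype(int)
    flip = rng.random(200) < 0.4
    y = np.where(flip, 1 - clean, clean)
    print(is_target_plausible(X, y, target_accuracy=0.8))
```

A check of this kind is cheap compared with a full AutoML run, which is the motivation for running it first. In practice the quality of the estimate depends heavily on the feature representation in which distances are computed.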




Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment

Continuous integration is an indispensable step of modern software engin...

Ease.ml/meter: Quantitative Overfitting Management for Human-in-the-loop ML Application Development

Simplifying machine learning (ML) application development, including dis...

Quantitative Overfitting Management for Human-in-the-loop ML Application Development with ease.ml/meter

Simplifying machine learning (ML) application development, including dis...

Technology Readiness Levels for Machine Learning Systems

The development and deployment of machine learning systems can be execut...

Evaluating Bayes Error Estimators on Real-World Datasets with FeeBee

The Bayes error rate (BER) is a fundamental concept in machine learning ...

Collaborative Machine Learning Model Building with Families Using Co-ML

Existing novice-friendly machine learning (ML) modeling tools center aro...

Data Budgeting for Machine Learning

Data is the fuel powering AI and creates tremendous value for many domai...
