Low-Level Augmented Bayesian Optimization for Finding the Best Cloud VM

by Chin-Jung Hsu, et al.

With the advent of big data applications, which tend to have long execution times, choosing the right cloud VM to run these applications has significant performance as well as economic implications. For example, in our large-scale empirical study of 107 different workloads on three popular big data systems, we found that a wrong choice can lead to a 20 times slowdown or an increase in cost by 10 times. Bayesian optimization is a technique for optimizing expensive (black-box) functions. Previous attempts to apply it to VM selection have used only instance-level information (such as the number of cores and memory size), which is not sufficient to represent the search space. In this work, we discover that this may lead to the fragility problem: the search either incurs a high cost or finds only a sub-optimal solution. The central insight of this paper is to use low-level performance information to augment the process of Bayesian optimization. Our novel low-level augmented Bayesian optimization is rarely worse than current practices and often performs much better (in 46 of 107 cases). Further, it significantly reduces the search cost in nearly half of our case studies. Based on this work, we conclude that it is often insufficient to use general-purpose off-the-shelf methods for configuring cloud instances without augmenting those methods with essential systems knowledge such as CPU utilization, working memory size, and I/O wait time.
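To make the baseline concrete, the following is a minimal sketch of Bayesian optimization over a small VM catalog using instance-level features only, which is the setting the abstract describes as prior practice. It is not the authors' implementation and does not show the low-level augmentation, since the abstract does not state how those metrics enter the model. The catalog, the VM names, the cost table, and the fixed kernel hyperparameters are all hypothetical, and the "measurement" is a lookup in synthetic numbers rather than a real workload run.

```python
import math

# Hypothetical catalog: VM name -> instance-level features (vCPUs, memory in GiB).
CATALOG = {
    "vm-a": (2, 4), "vm-b": (2, 16), "vm-c": (4, 8), "vm-d": (4, 32),
    "vm-e": (8, 16), "vm-f": (8, 64), "vm-g": (16, 32), "vm-h": (16, 128),
}

# Synthetic stand-in for "run the workload and record its cost"; not real data.
SYNTHETIC_COST = {
    "vm-a": 9.0, "vm-b": 7.5, "vm-c": 6.0, "vm-d": 4.0,
    "vm-e": 4.5, "vm-f": 3.0, "vm-g": 5.0, "vm-h": 6.5,
}

def features(name):
    cpus, mem = CATALOG[name]
    return (math.log2(cpus), math.log2(mem))  # log scale evens out the spacing

def rbf(a, b, length=1.5):
    """Squared-exponential kernel with a fixed, illustrative length scale."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior(X, y, x_new, noise=1e-3):
    """Gaussian-process posterior mean and std at x_new (constant mean, unit prior variance)."""
    mean_y = sum(y) / len(y)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [rbf(a, x_new) for a in X]
    alpha = solve(K, [v - mean_y for v in y])
    w = solve(K, k_star)
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    var = max(1.0 - sum(k * wi for k, wi in zip(k_star, w)), 1e-12)
    return mu, math.sqrt(var)

def expected_improvement(mu, sigma, best):
    """Expected improvement for minimization."""
    imp = best - mu
    z = imp / sigma
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return imp * cdf + sigma * pdf

def search(budget=5, start=("vm-a", "vm-h")):
    """Measure the start VMs, then repeatedly measure the candidate with the highest EI."""
    measured = {name: SYNTHETIC_COST[name] for name in start}
    while len(measured) < budget:
        names = list(measured)
        X = [features(n) for n in names]
        y = [measured[n] for n in names]
        best = min(y)
        scores = {}
        for n in CATALOG:
            if n not in measured:
                mu, sigma = gp_posterior(X, y, features(n))
                scores[n] = expected_improvement(mu, sigma, best)
        pick = max(scores, key=scores.get)
        measured[pick] = SYNTHETIC_COST[pick]
    return min(measured, key=measured.get), measured

if __name__ == "__main__":
    best_vm, history = search(budget=5)
    print(best_vm, history)
```

Each entry added to `measured` stands for one paid workload run, so `budget` plays the role of the search cost discussed above. A surrogate that sees only `(vCPUs, memory)` has no way to tell a CPU-bound workload from an I/O-bound one, which is the gap the low-level metrics are meant to fill.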


