Benchmarking and Performance Modelling of MapReduce Communication Pattern

05/23/2020
by   Sheriffo Ceesay, et al.
0

Understanding and predicting the performance of big data applications running in the cloud or on-premises could help minimise the overall cost of operations and provide opportunities in efforts to identify performance bottlenecks. The complexity of the low-level internals of big data frameworks and the ubiquity of application and workload configuration parameters makes it challenging and expensive to come up with comprehensive performance modelling solutions. In this paper, instead of focusing on a wide range of configurable parameters, we studied the low-level internals of the MapReduce communication pattern and used a minimal set of performance drivers to develop a set of phase level parametric models for approximating the execution time of a given application on a given cluster. Models can be used to infer the performance of unseen applications and approximate their performance when an arbitrary dataset is used as input. Our approach is validated by running empirical experiments in two setups. On average the error rate in both setups is plus or minus 10 the measured values.

READ FULL TEXT

page 1

page 7

research
08/27/2021

Machine Learning for Performance Prediction of Spark Cloud Applications

Big data applications and analytics are employed in many sectors for a v...
research
11/04/2021

Auto Tuning of Hadoop and Spark parameters

Data of the order of terabytes, petabytes, or beyond is known as Big Dat...
research
12/28/2017

Low-Level Augmented Bayesian Optimization for Finding the Best Cloud VM

With the advent of big data applications, which tends to have longer exe...
research
08/26/2018

Data Motifs: A Lens Towards Fully Understanding Big Data and AI Workloads

The complexity and diversity of big data and AI workloads make understan...
research
10/23/2017

Communication Efficient Checking of Big Data Operations

We propose fast probabilistic algorithms with low (i.e., sublinear in th...
research
08/17/2018

Learning-based Automatic Parameter Tuning for Big Data Analytics Frameworks

Big data analytics frameworks (BDAFs) have been widely used for data pro...
research
11/21/2017

HybridTune: Spatio-temporal Data and Model Driven Performance Diagnosis for Big Data Systems

With tremendous growing interests in Big Data systems, analyzing and fac...

Please sign up or login with your details

Forgot password? Click here to reset