A Frequency Scaling based Performance Indicator Framework for Big Data Systems

11/27/2018
by Chen Yang, et al.

Identifying performance bottlenecks is important for big data systems. However, popular indicators such as resource utilization are often misleading and cannot be compared with one another. This paper proposes a novel indicator framework in which the impact of different indicators can be compared directly, so that performance bottlenecks can be identified and analyzed efficiently. It describes a methodology for constructing the indicators from the change in performance under CPU frequency scaling. Spark serves as the example big data system, and two typical SQL benchmarks serve as the workloads for evaluating the proposed method. Experimental results show that the proposed method is more accurate than the resource utilization method and easier to implement than white-box methods. In addition, analysis with the proposed indicators leads to some interesting findings and valuable performance optimization suggestions for big data systems.
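The abstract does not state how the indicator is computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a simple model in which job runtime splits into a part that scales inversely with CPU frequency and a part that does not, T(f) = a / f + b, and reports the CPU-dependent share of runtime at a reference frequency. The model, the function names `fit_frequency_model` and `cpu_share`, and the choice of an ordinary least-squares fit are all assumptions made for illustration.

```python
def fit_frequency_model(freqs_ghz, runtimes_s):
    """Least-squares fit of the assumed model T(f) = a / f + b.

    freqs_ghz  : CPU frequencies at which the workload was run
    runtimes_s : measured runtime at each frequency
    Returns (a, b): a is the frequency-dependent coefficient,
    b is the frequency-independent time (I/O, network, waiting, ...).
    """
    if len(freqs_ghz) != len(runtimes_s):
        raise ValueError("freqs_ghz and runtimes_s must have the same length")
    xs = [1.0 / f for f in freqs_ghz]  # regress runtime on 1/f
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(runtimes_s) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need measurements at two or more distinct frequencies")
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, runtimes_s))
    a = sxt / sxx
    b = mean_t - a * mean_x
    return a, b


def cpu_share(freqs_ghz, runtimes_s, ref_freq_ghz):
    """Fraction of runtime at ref_freq_ghz attributed to the CPU-dependent term."""
    a, b = fit_frequency_model(freqs_ghz, runtimes_s)
    cpu_time = a / ref_freq_ghz
    return cpu_time / (cpu_time + b)
```

Under this assumed model, a share close to 1 would suggest the workload is CPU-bound at the reference frequency, while a share close to 0 would suggest the bottleneck lies elsewhere. Because the result is a fraction of runtime, it can be compared across workloads in a way that raw utilization percentages cannot.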


