RHEEMix in the Data Jungle: A Cross-Platform Query Optimizer

05/09/2018
by Sebastian Kruse, et al.

In pursuit of efficient and scalable data analytics, the insight that "one size does not fit all" has given rise to a plethora of specialized data processing platforms, and today's complex data analytics are moving beyond the limits of a single platform. To cope with these new requirements, we present a cross-platform optimizer that allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to efficiently plan data movement among subtasks and platforms; and (iii) an efficient plan enumeration algorithm, based on a novel enumeration algebra. We extensively evaluate our optimizer on diverse real tasks. The results show that our optimizer is capable of selecting the most efficient platform combination for a given task, freeing data analysts from the need to choose and orchestrate platforms. In particular, our optimizer allows certain tasks to run more than one order of magnitude faster than on state-of-the-art platforms, such as Spark.


research
05/29/2018

Building your Cross-Platform Application with RHEEM

Today, organizations typically perform tedious and costly tasks to juggl...
research
04/07/2020

Modularis: Modular Data Analytics for Hardware, Software, and Platform Heterogeneity

Today's data analytics displays an overwhelming diversity along many dim...
research
10/29/2021

Application-Platform Co-Design for Serverless Data Processing

"Application-platform co-design" refers to the phenomenon of new platfor...
research
07/25/2023

Smartpick: Workload Prediction for Serverless-enabled Scalable Data Analytics Systems

Many data analytic systems have adopted a newly emerging compute resourc...
research
12/02/2019

Lambada: Interactive Data Analytics on Cold Data using Serverless Cloud Infrastructure

The promise of ultimate elasticity and operational simplicity of serverl...
research
02/01/2019

OODIDA: On-board/Off-board Distributed Data Analytics for Connected Vehicles

Connected vehicles may produce gigabytes of data per hour, which makes c...
research
12/01/2020

A Scalable and Dependable Data Analytics Platform for Water Infrastructure Monitoring

With weather becoming more extreme both in terms of longer dry periods a...
