Resource Allocation in Serverless Query Processing

by Simon Kassing, et al.
ETH Zurich

Data lakes hold a growing amount of cold data that is accessed infrequently yet still requires interactive response times. Serverless functions are seen as a way to address this use case since they offer an appealing alternative to maintaining (and paying for) a fixed infrastructure. Recent research has analyzed the potential of serverless for data processing. In this paper, we expand on such work by examining how to allocate serverless resources (the number and size of the functions) to data processing tasks. We formulate a general model that roughly estimates completion time and financial cost, and we use it to augment an existing serverless data processing system with an advisory tool. The tool automatically identifies configurations that strike a good balance between time and cost, which we define as being close to the "knee" of their Pareto frontier. The model takes into account key aspects of serverless: start-up, computation, network transfers, and overhead as a function of the input sizes and intermediate result exchanges. Using (micro)benchmarks and parts of TPC-H, we show that this advisor is capable of pinpointing configurations desirable to the user. Moreover, we identify and discuss several aspects of data processing on serverless that affect efficiency. An automated tool for configuring resources lowers the barrier to using serverless for data processing, and a better allocation, rather than over-provisioning, can widen the narrow window in which serverless is cost-effective.
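The idea of estimating time and cost per configuration and then picking a point near the knee of the Pareto frontier can be illustrated with a minimal sketch. This is not the paper's model or its advisor: every constant below (start-up delay, throughput, network rate, price) is a made-up placeholder, the linear formulas are assumptions, and the knee is approximated by one common heuristic (the frontier point closest to the ideal corner after normalizing both axes). All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical placeholder constants, NOT measurements from the paper.
STARTUP_S = 0.5                     # assumed fixed start-up delay (s)
COMPUTE_GB_PER_S_PER_GB_MEM = 0.05  # assumed throughput per GB of function memory
NET_GB_PER_S = 0.08                 # assumed per-function network rate
OVERHEAD_S_PER_WORKER = 0.01        # assumed coordination overhead per function
PRICE_PER_GB_S = 0.0000166667       # assumed price per GB-second


@dataclass(frozen=True)
class Config:
    workers: int      # number of functions
    memory_gb: float  # size of each function


@dataclass(frozen=True)
class Estimate:
    config: Config
    seconds: float
    dollars: float


def estimate(cfg: Config, input_gb: float, exchange_gb: float) -> Estimate:
    """Rough time/cost estimate: start-up + compute + network + overhead."""
    per_worker_gb = input_gb / cfg.workers
    compute = per_worker_gb / (COMPUTE_GB_PER_S_PER_GB_MEM * cfg.memory_gb)
    network = (per_worker_gb + exchange_gb / cfg.workers) / NET_GB_PER_S
    overhead = OVERHEAD_S_PER_WORKER * cfg.workers
    seconds = STARTUP_S + compute + network + overhead
    dollars = seconds * cfg.workers * cfg.memory_gb * PRICE_PER_GB_S
    return Estimate(cfg, seconds, dollars)


def pareto_frontier(estimates: List[Estimate]) -> List[Estimate]:
    """Keep estimates that no other estimate beats on both time and cost."""
    frontier: List[Estimate] = []
    best_cost = float("inf")
    # Scan from fastest to slowest; a point survives only if it is cheaper
    # than every faster point seen so far.
    for e in sorted(estimates, key=lambda e: (e.seconds, e.dollars)):
        if e.dollars < best_cost:
            frontier.append(e)
            best_cost = e.dollars
    return frontier


def knee(frontier: List[Estimate]) -> Estimate:
    """Heuristic knee: frontier point nearest the normalized ideal (0, 0)."""
    if not frontier:
        raise ValueError("frontier is empty")
    t_lo = min(e.seconds for e in frontier)
    t_hi = max(e.seconds for e in frontier)
    c_lo = min(e.dollars for e in frontier)
    c_hi = max(e.dollars for e in frontier)

    def norm(v: float, lo: float, hi: float) -> float:
        return 0.0 if hi == lo else (v - lo) / (hi - lo)

    return min(
        frontier,
        key=lambda e: math.hypot(norm(e.seconds, t_lo, t_hi),
                                 norm(e.dollars, c_lo, c_hi)),
    )


def advise(input_gb: float, exchange_gb: float,
           worker_options: List[int],
           memory_options: List[float]) -> Estimate:
    """Enumerate a grid of configurations and return the knee estimate."""
    candidates = [estimate(Config(w, m), input_gb, exchange_gb)
                  for w in worker_options for m in memory_options]
    return knee(pareto_frontier(candidates))
```

A call such as `advise(10.0, 1.0, [1, 2, 4, 8, 16], [1.0, 2.0, 4.0])` would return one balanced configuration under these assumed constants; with real measurements the constants and formulas would need to be calibrated to the platform.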




