Benchmarking Distributed Stream Processing Engines

02/23/2018
by   Jeyhun Karimov, et al.
0

Over the last years, stream data processing has been gaining attention both in industry and in academia due to its wide range of applications. To fulfill the need for scalable and efficient stream analytics, numerous open source stream data processing systems (SDPSs) have been developed, with high throughput and low latency being their key performance targets. In this paper, we propose a framework to evaluate the performance of three SDPSs, namely Apache Storm, Apache Spark, and Apache Flink. Our evaluation focuses in particular on measuring the throughput and latency of windowed operations. For this benchmark, we design workloads based on real-life, industrial use-cases. The main contribution of this work is threefold. First, we give a definition of latency and throughput for stateful operators. Second, we completely separate the system under test and driver, so that the measurement results are closer to actual system performance under real conditions. Third, we build the first driver to test the actual sustainable performance of a system under test. Our detailed evaluation highlights that there is no single winner, but rather, each system excels in individual use-cases.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset