PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development

by   Jia Zou, et al.

This paper describes PlinyCompute, a system for development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database style optimization to figure out how to stage distributed computations. However, in the small, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and associated memory management system that has been designed from the ground-up for high performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM), and hence must at least partially cede performance-critical concerns such as memory management (including layout and de/allocation) and virtual method/function dispatch to the JVM. This hybrid approach---declarative in the large, trusting the programmer's ability to utilize PC object model efficiently in the small---results in a system that is ideal for the development of reusable, data-intensive tools and libraries. Through extensive benchmarking, we show that implementing complex objects manipulation and non-trivial, library-style computations on top of PlinyCompute can result in a speedup of 2x to more than 50x or more compared to equivalent implementations on Spark.


page 1

page 2

page 3

page 4


HPS: A C++11 High Performance Serialization Library

Data serialization is a common and crucial component in high performance...

DynaSOAr: A Parallel Memory Allocator for Object-oriented Programming on GPUs with Efficient Memory Access

Object-oriented programming has long been regarded as too inefficient fo...

High Performance Cluster Computing for MapReduce

MapReduce is a technique used to vastly improve distributed processing o...


Matrix libraries often focus on achieving high performance for problems ...

Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks

Recently, due to rapid development of information and communication tech...

HPTMT: Operator-Based Architecture for Scalable High-Performance Data-Intensive Frameworks

Data-intensive applications impact many domains, and their steadily incr...

Please sign up or login with your details

Forgot password? Click here to reset