PaC-trees: Supporting Parallel and Compressed Purely-Functional Collections

04/12/2022
by Laxman Dhulipala, et al.

Many modern programming languages are shifting toward a functional style for collection interfaces such as sets, maps, and sequences. Functional interfaces offer many advantages, including being safe for parallelism and providing simple and lightweight snapshots. However, existing high-performance functional interfaces such as PAM, which are based on balanced purely-functional trees, incur large space overheads for large-scale data analysis due to storing every element in a separate node in a tree. This paper presents PaC-trees, a purely-functional data structure supporting functional interfaces for sets, maps, and sequences that provides a significant reduction in space over existing approaches. A PaC-tree is a balanced binary search tree which blocks the leaves and compresses the blocks using arrays. We provide novel techniques for compressing and uncompressing the blocks which yield practical parallel functional algorithms for a broad set of operations on PaC-trees such as union, intersection, filter, reduction, and range queries which are both theoretically and practically efficient. Using PaC-trees we designed CPAM, a C++ library that implements the full functionality of PAM, while offering significant extra functionality for compression. CPAM consistently matches or outperforms PAM on a set of microbenchmarks on sets, maps, and sequences while using about a quarter of the space. On applications including inverted indices, 2D range queries, and 1D interval queries, CPAM is competitive with or faster than PAM, while using 2.1–7.8x less space. For static and streaming graph processing, CPAM offers 1.6x faster batch updates while using 1.3–2.6x less space than the state-of-the-art graph processing system Aspen.
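
The abstract says that leaf blocks are stored compressed in arrays but does not name an encoding here. As a rough illustration only, the following sketch packs one sorted block of integer keys into a byte array using difference encoding with variable-length byte codes. Both the encoding choice and the names `put_varint`, `get_varint`, `encode_block`, and `decode_block` are assumptions made for this example; they are not CPAM's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration; not taken from CPAM.

// Append v as a variable-length byte code: 7 payload bits per byte,
// with the high bit set on every byte except the last.
inline void put_varint(std::vector<uint8_t>& out, uint64_t v) {
  while (v >= 0x80) {
    out.push_back(static_cast<uint8_t>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<uint8_t>(v));
}

// Read one varint starting at pos and advance pos past it.
inline uint64_t get_varint(const std::vector<uint8_t>& in, std::size_t& pos) {
  uint64_t v = 0;
  int shift = 0;
  while (in[pos] & 0x80) {
    v |= static_cast<uint64_t>(in[pos++] & 0x7F) << shift;
    shift += 7;
  }
  v |= static_cast<uint64_t>(in[pos++]) << shift;
  return v;
}

// Compress one block of keys (assumed sorted ascending) into a byte array:
// the element count, then each key as the gap from its predecessor.
inline std::vector<uint8_t> encode_block(const std::vector<uint64_t>& keys) {
  std::vector<uint8_t> bytes;
  put_varint(bytes, keys.size());
  uint64_t prev = 0;
  for (uint64_t k : keys) {
    put_varint(bytes, k - prev);  // small gaps need only one or two bytes
    prev = k;
  }
  return bytes;
}

// Uncompress a block back into its sorted keys.
inline std::vector<uint64_t> decode_block(const std::vector<uint8_t>& bytes) {
  std::size_t pos = 0;
  const uint64_t n = get_varint(bytes, pos);
  std::vector<uint64_t> keys;
  keys.reserve(n);
  uint64_t prev = 0;
  for (uint64_t i = 0; i < n; ++i) {
    prev += get_varint(bytes, pos);
    keys.push_back(prev);
  }
  return keys;
}
```

In this sketch a block is immutable once encoded, so an update would decode the block, modify the keys, and encode a new array, which keeps older versions of the block intact for snapshots. Calling `decode_block(encode_block(keys))` on a sorted vector returns the original keys.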
