Toward a Better Understanding and Evaluation of Tree Structures on Flash SSDs

06/08/2020
by Diego Didona, et al.
Solid-state drives (SSDs) are extensively used to deploy persistent data stores, as they provide low-latency random access, high write throughput, high data density, and low cost. Tree-based data structures are widely used to build persistent data stores, and they form the backbone of many of the data management systems used in production and research today. In this paper, we show that benchmarking a persistent tree-based data structure on an SSD is a complex process that is prone to subtle pitfalls, which can lead to an inaccurate performance assessment. At a high level, these pitfalls stem from the interaction of complex software running on complex hardware. On the one hand, tree structures implement internal operations that have nontrivial effects on performance. On the other hand, SSDs employ firmware logic to deal with the idiosyncrasies of the underlying flash memory, which are well known to lead to complex performance dynamics. We identify seven benchmarking pitfalls using RocksDB and WiredTiger, two widespread implementations of an LSM-Tree and a B+Tree, respectively. We show that such pitfalls can lead to incorrect measurements of key performance indicators, hinder the reproducibility and representativeness of the results, and lead to suboptimal deployments in production environments. We also provide guidelines on how to avoid these pitfalls to obtain more reliable performance measurements and to perform more thorough and fair comparisons among different design points.

