A Model and Survey of Distributed Data-Intensive Systems

by Alessandro Margara, et al.

Data is a precious resource in today's society and is generated at an unprecedented and constantly growing pace. The need to store, analyze, and make data promptly available to a multitude of users introduces formidable challenges in modern software platforms. These challenges have radically transformed all research fields that gravitate around data management and processing, with the introduction of distributed data-intensive systems that offer new programming models and implementation strategies to handle data characteristics such as volume, velocity, heterogeneity, and distribution. Each data-intensive system brings its specific choices in terms of data model, usage assumptions, synchronization, processing strategy, deployment, and guarantees in terms of consistency, fault tolerance, and ordering. Yet the problems data-intensive systems face and the solutions they propose frequently overlap. This paper proposes a unifying model that dissects the core functionalities of data-intensive systems and precisely discusses alternative design and implementation strategies, pointing out their assumptions and implications. The model offers a common ground to understand and compare highly heterogeneous solutions, with the potential of fostering cross-fertilization across research communities and advancing the field. We apply our model by classifying tens of systems, an exercise that guides observations on the current state of the field and on open research directions.




