Rethinking Abstractions for Big Data: Why, Where, How, and What

by Mary Hall et al.

Big data refers to data sets so large and complex that, under existing approaches, they exceed the capacity and capability of current compute platforms, systems software, analytical tools, and human understanding. Numerous lessons on the scalability of big data can already be drawn from the asymptotic analysis of algorithms and from the high-performance computing (HPC) and applications communities. However, scale is only one aspect of current big data trends; fundamentally, current and emerging problems in big data arise from unprecedented complexity: in the structure of the data and how to analyze it, in dealing with unreliability and redundancy, in addressing the human factors of comprehending complex data sets, in formulating meaningful analyses, and in managing the dense, power-hungry data centers that house big data. The computer science response to complexity is to find the right abstractions, those that hide as much incidental detail as possible while revealing the essence of the problem being addressed. The "big data challenge" has disrupted computer science by stressing, to their very limits, the familiar abstractions that define the relevant subfields of data analysis, data management, and the underlying parallel systems. As a result, isolated abstractions in a traditional software stack, or standard algorithmic and analytical techniques, no longer expose enough of these challenges, and attempts to address the complexity either oversimplify the problem or demand low-level management of details. The authors believe that the abstractions for big data need to be rethought, and that this reorganization must evolve and be sustained through continued cross-disciplinary collaboration.


