DFOGraph: An I/O- and Communication-Efficient System for Distributed Fully-out-of-Core Graph Processing

01/18/2021
by Jiping Yu, et al.

As graph-structured data continues to grow, graph processing systems must both scale out and scale up to handle extreme-scale datasets. Existing distributed out-of-core solutions make this possible, but their performance is limited by excessive I/O and communication costs. We present DFOGraph, a distributed fully-out-of-core graph processing system that combines multiple techniques to enable I/O- and communication-efficient processing. DFOGraph builds on a two-level column-oriented partitioning scheme with adaptive compressed representations, which allows fine-grained selective computation and communication so that the system issues only the necessary disk and network requests. Our evaluation shows that DFOGraph achieves performance comparable to GridGraph and FlashGraph (>2.52x and 1.06x) on a single machine and significantly outperforms Chaos and HybridGraph (>12.94x and >10.82x) when scaling out.
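
To make the idea of a two-level column-oriented partition with adaptive compressed representations more concrete, here is a minimal Python sketch. It is not DFOGraph's implementation, and the abstract does not specify these details. The sketch assumes that "column-oriented" means grouping edges by destination vertex, that the two levels are the owning node and a batch within that node, and that "adaptive" means choosing between a CSR layout and a DCSR-like layout by the fraction of sources that have edges. The names `partition_two_level`, `encode_adaptive`, and `threshold` are hypothetical.

```python
from collections import defaultdict


def partition_two_level(edges, num_vertices, num_nodes, batches_per_node):
    """Group edges by destination: first by owning node, then by batch.

    Assumption: vertices are split into contiguous, equal-sized ranges.
    """
    per_node = -(-num_vertices // num_nodes)        # ceiling division
    per_batch = -(-per_node // batches_per_node)    # ceiling division
    parts = defaultdict(list)
    for src, dst in edges:
        node = dst // per_node                      # level 1: owning node
        batch = (dst % per_node) // per_batch       # level 2: batch in node
        parts[(node, batch)].append((src, dst))
    return dict(parts)


def encode_adaptive(block, num_sources, threshold=0.5):
    """Encode one edge block as CSR (dense) or a DCSR-like form (sparse).

    Assumption: the choice depends only on the fraction of sources that
    have at least one edge in this block; `threshold` is illustrative.
    """
    by_src = defaultdict(list)
    for src, dst in block:
        by_src[src].append(dst)
    active = sorted(by_src)

    offsets, targets = [0], []
    if len(active) >= threshold * num_sources:
        # Dense case: one offset entry per source, even for empty sources.
        for s in range(num_sources):
            targets.extend(sorted(by_src.get(s, [])))
            offsets.append(len(targets))
        return ("csr", offsets, targets)

    # Sparse case: store only the sources that actually have edges.
    for s in active:
        targets.extend(sorted(by_src[s]))
        offsets.append(len(targets))
    return ("dcsr", active, offsets, targets)


edges = [(0, 1), (0, 5), (2, 6), (3, 7), (1, 2)]
parts = partition_two_level(edges, num_vertices=8, num_nodes=2, batches_per_node=2)
sparse_block = encode_adaptive(parts[(1, 1)], num_sources=8)
dense_block = encode_adaptive([(0, 1), (1, 1), (2, 0), (3, 1)], num_sources=4)
```

In this sketch, a block in which few sources are active skips the per-source offset array entirely. This is one way an out-of-core system could avoid reading index entries for vertices that have no edges in a given block.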
