An Efficient and Balanced Graph Partition Algorithm for the Subgraph-Centric Programming Model on Large-scale Power-law Graphs

by Shuai Zhang, et al.

The subgraph-centric programming model is a promising approach that has been adopted by many state-of-the-art distributed graph computing frameworks. The graph partition algorithm plays an important role in the overall performance of these frameworks. However, traditional graph partition algorithms have significant difficulty processing large-scale power-law graphs. The major problem is the communication bottleneck found in many subgraph-centric frameworks. Detailed analysis indicates that this bottleneck is caused by a huge communication volume or by extreme message imbalance among the partitioned subgraphs. Traditional partition algorithms do not consider both factors at the same time, especially on power-law graphs. In this paper, we propose a novel efficient and balanced greedy graph partition algorithm (EBG) that grants appropriate weights to both overall communication cost reduction and communication balance. We observe that the number of replicated vertices and the balance of edge and vertex assignment strongly influence the communication patterns of distributed subgraph-centric frameworks, which in turn affect overall performance. Based on this insight, we design an evaluation function that uses the proportion of replicated vertices and the balance of edge and vertex assignments as its key parameters. Experiments show that EBG reduces the replication factor and communication by 32.3% and 24.3%, respectively. In the subgraph-centric framework, it reduces the running time on power-law graphs by an average of 25.4% relative to the partition algorithm it is compared against. Our results indicate that EBG has great potential for improving the performance of subgraph-centric frameworks for parallel processing of large-scale power-law graphs.
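To make the idea of such an evaluation function concrete, the following is a minimal sketch of a greedy edge partitioner that scores each candidate part by the number of new vertex replicas it would create plus weighted edge-load and vertex-load terms. This is an illustration only: the abstract does not give EBG's actual formula, so the scoring terms, the weights `alpha` and `beta`, and the function name `greedy_partition` are all assumptions.

```python
def greedy_partition(edges, num_parts, alpha=1.0, beta=1.0):
    """Greedily assign each edge to one part (a vertex-cut partition).

    Hypothetical scoring, not taken from the paper: for every part, add
    the number of endpoints that would become new replicas to the part's
    normalized edge load and vertex load, then pick the lowest score.
    Returns (assignment, replication_factor).
    """
    if not edges:
        return [], 0.0

    vertices = {v for edge in edges for v in edge}
    avg_edges = len(edges) / num_parts      # ideal edges per part
    avg_verts = len(vertices) / num_parts   # ideal vertices per part

    part_vertices = [set() for _ in range(num_parts)]
    part_edges = [0] * num_parts
    assignment = []

    for u, v in edges:
        def score(p):
            # Endpoints not yet present in part p would be replicated there.
            new_replicas = (u not in part_vertices[p]) + (v not in part_vertices[p])
            edge_load = part_edges[p] / avg_edges
            vertex_load = len(part_vertices[p]) / avg_verts
            return new_replicas + alpha * edge_load + beta * vertex_load

        best = min(range(num_parts), key=score)  # ties go to the lowest index
        part_vertices[best].update((u, v))
        part_edges[best] += 1
        assignment.append(best)

    # Replication factor: average number of parts holding a copy of a vertex.
    total_copies = sum(len(vs) for vs in part_vertices)
    return assignment, total_copies / len(vertices)


if __name__ == "__main__":
    # A small star-like graph: vertex 0 plays the role of a high-degree hub.
    demo_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4)]
    parts, rf = greedy_partition(demo_edges, num_parts=2)
    print("assignment:", parts)
    print("replication factor:", rf)
```

Raising `alpha` or `beta` in this sketch trades a higher replication factor for more even edge or vertex loads, which mirrors the balance between communication volume and communication balance that the abstract describes.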



DRONE: a Distributed Subgraph-Centric Framework for Processing Large Scale Power-law Graphs

Nowadays, in the big data era, social networks, graph databases, knowled...

Distributed Algorithms for Subgraph-Centric Graph Platforms

Graph analytics for large scale graphs has gained interest in recent yea...

DRONE: a Distributed gRaph cOmputiNg Engine

Nowadays, in big data era, social networks, graph database, knowledge gr...

A Partition-centric Distributed Algorithm for Identifying Euler Circuits in Large Graphs

Finding the Eulerian circuit in graphs is a classic problem, but inadequ...

Composing Optimization Techniques for Vertex-Centric Graph Processing via Communication Channels

Pregel's vertex-centric model allows us to implement many interesting gr...

Fast and Robust Distributed Subgraph Enumeration

We study the classic subgraph enumeration problem under distributed sett...

Sparse Allreduce: Efficient Scalable Communication for Power-Law Data

Many large datasets exhibit power-law statistics: The web graph, social ...
