DiSLR: Distributed Sampling with Limited Redundancy For Triangle Counting in Graph Streams

02/12/2018
by Kijung Shin, et al.

Given a web-scale graph that grows over time, how should its edges be stored and processed on multiple machines for rapid and accurate estimation of the count of triangles? The count of triangles (i.e., cliques of size three) has proven useful in many applications, including anomaly detection, community detection, and link recommendation. For triangle counting in large and dynamic graphs, recent work has focused largely on streaming algorithms and distributed algorithms. To achieve the advantages of both approaches, we propose DiSLR, a distributed streaming algorithm that estimates the counts of global triangles and local triangles associated with each node. Making one pass over the input stream, DiSLR carefully processes and stores the edges across multiple machines so that the redundant use of computational and storage resources is minimized. Compared to its best competitors, DiSLR is (a) Accurate: giving up to 39X smaller estimation error, (b) Fast: up to 10.4X faster, scaling linearly with the number of edges in the input stream, and (c) Theoretically sound: yielding unbiased estimates with variances decreasing faster as the number of machines is scaled up.
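To make the two estimated quantities concrete, here is a minimal Python sketch of exact, single-machine triangle counting over an edge stream. It is not DiSLR: it stores every edge, uses no sampling and no distribution across machines, and the function name `count_triangles` and the edge-tuple input format are assumptions made for illustration. It only shows what "global" and "local" triangle counts mean.

```python
from collections import defaultdict


def count_triangles(edge_stream):
    """Exact one-pass count of global and local triangles.

    Illustrative baseline only (hypothetical helper, not the DiSLR algorithm):
    it keeps every edge in memory, which a streaming estimator avoids.
    """
    neighbors = defaultdict(set)    # adjacency sets of the edges seen so far
    global_count = 0                # number of triangles in the whole graph
    local_counts = defaultdict(int) # number of triangles touching each node

    for u, v in edge_stream:
        if u == v or v in neighbors[u]:
            continue  # ignore self-loops and duplicate edges
        # Each common neighbor w of u and v closes one new triangle {u, v, w}.
        for w in neighbors[u] & neighbors[v]:
            global_count += 1
            local_counts[u] += 1
            local_counts[v] += 1
            local_counts[w] += 1
        neighbors[u].add(v)
        neighbors[v].add(u)

    return global_count, dict(local_counts)


if __name__ == "__main__":
    stream = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
    print(count_triangles(stream))
```

On the small stream in the usage example, the triangles are {1, 2, 3} and {2, 3, 4}, so the global count is 2 and nodes 2 and 3 each take part in two triangles.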


research
09/10/2017

WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams

If we cannot store all edges in a graph stream, which edges should we st...
research
11/22/2018

REPT: A Streaming Algorithm of Approximating Global and Local Triangle Counts in Parallel

Recently, considerable efforts have been devoted to approximately comput...
research
03/29/2020

How the Degeneracy Helps for Triangle Counting in Graph Streams

We revisit the well-studied problem of triangle count estimation in grap...
research
11/13/2022

Reinforcement Learning Enhanced Weighted Sampling for Accurate Subgraph Counting on Fully Dynamic Graph Streams

As the popularity of graph data increases, there is a growing need to co...
research
10/07/2018

Graphlet Count Estimation via Convolutional Neural Networks

Graphlets are defined as k-node connected induced subgraph patterns. For...
research
02/28/2022

Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching

Triangle count and local clustering coefficient are two core metrics for...
research
01/06/2022

BFS based distributed algorithm for parallel local directed sub-graph enumeration

Estimating the frequency of sub-graphs is of importance for many tasks, ...
