Sequential Stratified Regeneration: MCMC for Large State Spaces with an Application to Subgraph Counting Estimation

12/07/2020
by Carlos H. C. Teixeira, et al.

This work considers the general task of estimating the sum of a bounded function over the edges of a graph that is unknown a priori, where graph vertices and edges are built on-the-fly by an algorithm and the resulting graph is too large to be kept in memory or on disk. Prior work proposes Markov Chain Monte Carlo (MCMC) methods that simultaneously sample and generate the graph, eliminating the need for storage. Unfortunately, these existing methods do not scale to massive real-world graphs. In this paper, we introduce Ripple, an MCMC-based estimator that achieves unprecedented scalability in this task by stratifying the state space of the Markov chain with a new technique that we call ordered sequential stratified Markov regenerations. We show that the Ripple estimator is consistent, highly parallelizable, and scales well. In particular, applying Ripple to the task of estimating connected induced subgraph counts on large graphs, we empirically demonstrate that Ripple is accurate and can estimate counts of subgraphs with up to 12 nodes, a scale previously considered unreachable not only by prior MCMC-based methods but also by other sampling approaches. For instance, in this target application we present results for Markov chain state spaces as large as 10^43 states, on which Ripple computes estimates in under 4 hours on average.
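The abstract does not spell out Ripple's algorithm. As a point of reference only, the sketch below shows the plain regenerative (tour-based) random-walk estimator that regeneration-based MCMC methods of this kind build on; it is not Ripple's ordered sequential stratified scheme. It assumes the graph is connected and undirected, is exposed only through a hypothetical `neighbors(v)` callback (so nothing is stored, matching the "built on-the-fly" setting), and that the bounded function `f` is symmetric. The names `tour_estimate`, `neighbors`, `f`, and `x0` are illustrative. By the renewal-reward theorem, a simple random walk started at `x0` and stopped at its first return to `x0` collects an expected reward of (1/deg(x0)) times the sum of `f` over directed edges, so the sum over undirected edges equals deg(x0)/2 times the expected tour reward.

```python
import random
from typing import Callable, Hashable, List

Node = Hashable


def tour_estimate(
    neighbors: Callable[[Node], List[Node]],
    f: Callable[[Node, Node], float],
    x0: Node,
    num_tours: int,
    seed: int = 0,
) -> float:
    """Estimate the sum of f(u, v) over undirected edges {u, v} via random-walk tours.

    The graph is never stored: it is explored only through `neighbors`,
    which must describe a connected, undirected graph. `f` is assumed
    symmetric, i.e. f(u, v) == f(v, u).
    """
    rng = random.Random(seed)
    deg0 = len(neighbors(x0))
    total_reward = 0.0
    for _ in range(num_tours):
        # One tour: walk from x0 until the chain first regenerates at x0.
        u = x0
        while True:
            v = rng.choice(neighbors(u))  # simple random-walk step
            total_reward += f(u, v)       # reward on the traversed edge
            u = v
            if u == x0:
                break
    # Renewal-reward: E[tour reward] = (sum of f over directed edges) / deg(x0),
    # and each undirected edge appears twice as a directed edge.
    return (deg0 / 2.0) * (total_reward / num_tours)


def cycle_neighbors(v: int) -> List[int]:
    """Hypothetical on-the-fly graph: a 6-node cycle (6 edges)."""
    return [(v - 1) % 6, (v + 1) % 6]


if __name__ == "__main__":
    # With f == 1 the estimator targets the edge count, here 6.
    print(tour_estimate(cycle_neighbors, lambda u, v: 1.0, x0=0, num_tours=20000))
```

Tours are independent, so they parallelize trivially. The catch is that the expected tour length is 2|E|/deg(x0), which becomes astronomically large for state spaces on the order of 10^43 states; this is presumably the kind of scalability barrier the abstract attributes to prior MCMC-based methods.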


research
02/01/2019

Non-Markovian Monte Carlo on Directed Graphs

Markov Chain Monte Carlo (MCMC) has been the de facto technique for samp...
research
04/05/2023

A large deviation principle for the empirical measures of Metropolis-Hastings chains

To sample from a given target distribution, Markov chain Monte Carlo (MC...
research
01/26/2020

Improved mixing time for k-subgraph sampling

Understanding the local structure of a graph provides valuable insights ...
research
09/18/2018

State-Dependent Kernel Selection for Conditional Sampling of Graphs

This paper introduces new efficient algorithms for two problems: samplin...
research
08/18/2019

StreamNet: A DAG System with Streaming Graph Computing

To achieve high throughput in the POW based blockchain systems, a series...
research
09/27/2018

Fast and Scalable Position-Based Layout Synthesis

The arrangement of objects into a layout can be challenging for non-expe...
research
05/25/2021

Convergence criteria for sampling random graphs with specified degree sequences

The configuration model is a standard tool for generating random graphs ...
