Concentration Bounds for Co-occurrence Matrices of Markov Chains

by Jiezhong Qiu, et al.

Co-occurrence statistics of sequential data are common and important signals in machine learning, providing rich correlation and clustering information about the underlying object space. We give the first bound on the convergence rate of estimating the co-occurrence matrix of a regular (aperiodic and irreducible) finite Markov chain from a single random trajectory. Our work is motivated by the analysis of the well-known graph learning algorithm DeepWalk by [Qiu et al. WSDM '18], who studied the convergence (in probability) of the co-occurrence matrix computed from random walks on undirected graphs in the limit but left the convergence rate as an open problem. We prove a Chernoff-type bound for sums of matrix-valued random variables sampled via an ergodic Markov chain, generalizing the regular undirected graph case studied by [Garg et al. STOC '18]. Using this Chernoff-type bound, we show that, given a regular Markov chain with n states and mixing time τ, a trajectory of length O(τ(log n + log τ)/ϵ²) suffices to obtain an estimator of the co-occurrence matrix with error bound ϵ. We conduct several experiments, and the results are consistent with the exponentially fast convergence rate predicted by the theoretical analysis. Our result gives the first sample complexity analysis in graph representation learning.
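To make the estimation problem concrete, here is a minimal sketch, not the authors' code. It rests on several assumptions not stated in this abstract. First, the target matrix for a window size T is taken to be C = (1/T) Σ_{r=1..T} (ΠP^r + (ΠP^r)^T)/2, where Π = diag(π) and π is the stationary distribution (a DeepWalk-style definition). Second, the estimator counts ordered pairs (v_i, v_{i+r}), r = 1..T, in both orientations along one trajectory and normalizes by 2T(L − T). Third, the error is measured in spectral norm. The function names and the 3-state chain are hypothetical illustrations.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution pi (pi @ P = pi) from the left eigenvector for eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(vals - 1.0))
    pi = np.real(vecs[:, idx])
    return pi / pi.sum()  # normalizing also fixes an arbitrary sign


def expected_cooccurrence(P, T):
    """Assumed target: C = (1/T) * sum_{r=1..T} (Pi P^r + (Pi P^r)^T) / 2."""
    n = P.shape[0]
    Pi = np.diag(stationary_distribution(P))
    C = np.zeros((n, n))
    Pr = np.eye(n)
    for _ in range(T):
        Pr = Pr @ P
        M = Pi @ Pr
        C += (M + M.T) / 2.0
    return C / T


def sample_trajectory(P, L, rng, start=0):
    """One trajectory of length L from a fixed start state (no burn-in)."""
    n = P.shape[0]
    cum = np.cumsum(P, axis=1)
    u = rng.random(L)
    path = np.empty(L, dtype=int)
    path[0] = start
    for t in range(1, L):
        nxt = int(np.searchsorted(cum[path[t - 1]], u[t]))
        path[t] = min(nxt, n - 1)  # guard against round-off in the last cumsum entry
    return path


def empirical_cooccurrence(path, n, T):
    """Symmetrized counts of pairs (v_i, v_{i+r}), r = 1..T, normalized to sum to 1."""
    L = len(path)
    C = np.zeros((n, n))
    for r in range(1, T + 1):
        src = path[: L - T]
        dst = path[r : L - T + r]
        np.add.at(C, (src, dst), 1.0)
        np.add.at(C, (dst, src), 1.0)
    return C / (2.0 * T * (L - T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical regular (irreducible, aperiodic) 3-state chain.
    P = np.array([[0.5, 0.3, 0.2],
                  [0.2, 0.6, 0.2],
                  [0.3, 0.3, 0.4]])
    T = 3
    C_true = expected_cooccurrence(P, T)
    for L in (1_000, 10_000, 100_000):
        C_hat = empirical_cooccurrence(sample_trajectory(P, L, rng), P.shape[0], T)
        print(L, np.linalg.norm(C_hat - C_true, 2))
```

Running the script prints the spectral-norm error for increasing trajectory lengths L, which should shrink as L grows. This matches the qualitative claim that a single trajectory suffices. The sketch does not reproduce the O(τ(log n + log τ)/ϵ²) constant, which depends on the paper's analysis.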




A New Berry-Esseen Theorem for Expander Walks

We prove that the sum of t boolean-valued random variables sampled by a ...

High-precision Estimation of Random Walks in Small Space

In this paper, we provide a deterministic Õ(log N)-space algorithm for e...

Geometric Bounds on the Fastest Mixing Markov Chain

In the Fastest Mixing Markov Chain problem, we are given a graph G = (V,...

Comparing the Switch and Curveball Markov Chains for Sampling Binary Matrices with Fixed Marginals

The Curveball algorithm is a variation on well-known switch-based Markov...

Utilizing Network Structure to Bound the Convergence Rate in Markov Chain Monte Carlo Algorithms

We consider the problem of estimating the measure of subsets in very lar...

A Unified Markov Chain Approach to Analysing Randomised Search Heuristics

The convergence, convergence rate and expected hitting time play fundame...

Step-by-Step Community Detection for Volume-Regular Graphs

Spectral techniques have proved amongst the most effective approaches to...
