Estimating the Size of a Large Network and its Communities from a Random Sample

10/26/2016
by Lin Chen, et al.

Most real-world networks are too large to be measured or studied directly, and there is substantial interest in estimating global network properties from smaller sub-samples. One of the most important global properties is the number of vertices (nodes) in the network. Estimating the number of vertices in a large network is a major challenge in computer science, epidemiology, demography, and intelligence analysis. In this paper we consider a population random graph G = (V, E) drawn from the stochastic block model (SBM) with K communities (blocks). A sample is obtained by randomly choosing a subset W ⊆ V and letting G(W) be the subgraph of G induced by the vertices in W. In addition to G(W), we observe the total degree of each sampled vertex and its block membership. Given this partial information, we propose an efficient PopULation Size Estimation algorithm, called PULSE, that accurately estimates the size of the whole population as well as the size of each community. To support our theoretical analysis, we perform an exhaustive set of experiments to study the effects of sample size, K, and the SBM parameters on the accuracy of the estimates. The experimental results also demonstrate that PULSE significantly outperforms a widely used method called the network scale-up estimator in a wide variety of scenarios. We conclude with extensions and directions for future work.
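
The observation model described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PULSE algorithm, which the abstract does not specify. The function names (`sample_sbm`, `observe`, `scale_up_estimate`), the block sizes, and the edge probabilities are all hypothetical. The last function is a simple scale-up-style moment estimator, included as an assumed baseline: under uniform sampling without replacement, each neighbor of a sampled vertex lies in W with probability (|W| - 1)/(N - 1), so the ratio of total degrees to induced degrees gives a rough estimate of N. It may differ from the exact network scale-up estimator the paper compares against.

```python
import random
from itertools import combinations


def sample_sbm(block_sizes, p_matrix, rng):
    """Draw an undirected SBM graph; return (block labels, adjacency sets)."""
    blocks = [k for k, size in enumerate(block_sizes) for _ in range(size)]
    adj = [set() for _ in blocks]
    for u, v in combinations(range(len(blocks)), 2):
        # Edge probability depends only on the blocks of the two endpoints.
        if rng.random() < p_matrix[blocks[u]][blocks[v]]:
            adj[u].add(v)
            adj[v].add(u)
    return blocks, adj


def observe(blocks, adj, sample_size, rng):
    """Sample W uniformly without replacement and return what is observed:
    the induced subgraph G(W), total degrees, and block memberships."""
    sampled = set(rng.sample(range(len(blocks)), sample_size))
    induced = {u: adj[u] & sampled for u in sampled}
    total_degree = {u: len(adj[u]) for u in sampled}
    membership = {u: blocks[u] for u in sampled}
    return induced, total_degree, membership


def scale_up_estimate(induced, total_degree):
    """Assumed scale-up-style estimate of N (not PULSE).
    Returns None if the induced subgraph has no edges."""
    n = len(induced)
    sum_total = sum(total_degree.values())
    sum_induced = sum(len(nbrs) for nbrs in induced.values())
    if sum_induced == 0:
        return None
    return 1 + (n - 1) * sum_total / sum_induced


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical parameters: K = 2 blocks, denser within than between.
    blocks, adj = sample_sbm([300, 200], [[0.10, 0.02], [0.02, 0.10]], rng)
    induced, total_degree, membership = observe(blocks, adj, 100, rng)
    print("true N:", len(blocks))
    print("scale-up-style estimate:", scale_up_estimate(induced, total_degree))
```

The sketch ignores block membership when estimating N. Estimating the size of each community, as the abstract says PULSE does, would need the `membership` labels as well.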

