Efficient Centrality Maximization with Rademacher Averages

by Leonardo Pellegrina, et al.

The identification of the set of k most central nodes of a graph, or centrality maximization, is a key task in network analysis, with applications ranging from finding communities in social and biological networks to understanding which seed nodes are important for diffusing information in a graph. As the exact computation of centrality measures does not scale to modern-sized networks, the most practical solution is to resort to rigorous, but efficiently computable, randomized approximations. In this work we present CentRA, the first algorithm based on progressive sampling to compute high-quality approximations of the set of k most central nodes. CentRA is based on a novel approach to efficiently estimate Monte Carlo Rademacher Averages, a powerful tool from statistical learning theory for computing sharp data-dependent approximation bounds. We then study the sample complexity of centrality maximization using the VC-dimension, another key concept from statistical learning theory. We show that the number of random samples required to compute high-quality approximations scales with finer characteristics of the graph, such as its vertex diameter, or of the centrality of interest, significantly improving on looser bounds derived from standard techniques. We apply CentRA to analyze large real-world networks, showing that it significantly outperforms the state-of-the-art approximation algorithm in terms of number of samples, running time, and accuracy.
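To make the notion of a Monte Carlo Rademacher Average concrete, here is a minimal sketch of the textbook estimator on a small, explicitly enumerated family of functions. It is not CentRA's estimator, which the abstract describes as a novel and more efficient approach. The function name `mc_rademacher_average`, the `values` layout, and the toy data are assumptions made for illustration only.

```python
import random

def mc_rademacher_average(values, n_trials=100, seed=0):
    """Textbook Monte Carlo estimate of the empirical Rademacher average.

    values[f][i] is the value of function f on the i-th random sample
    (hypothetical layout; e.g. 1.0 if a node lies on the i-th sampled path).
    """
    rng = random.Random(seed)
    m = len(values[0])  # number of samples
    total = 0.0
    for _ in range(n_trials):
        # Draw one vector of independent +1/-1 Rademacher signs.
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # Supremum over the (finite) family of the signed sample mean.
        total += max(
            sum(s * v for s, v in zip(sigma, row)) / m for row in values
        )
    # Average the suprema over all trials.
    return total / n_trials

# Toy family: three functions evaluated on four samples (made-up values).
toy_values = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
estimate = mc_rademacher_average(toy_values, n_trials=200, seed=1)
```

Enumerating the family explicitly is only feasible for small families. For sets of k nodes the family grows combinatorially, which is presumably why an efficient estimation approach is needed.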




SILVAN: Estimating Betweenness Centralities with Progressive Sampling and Non-uniform Rademacher Bounds

Betweenness centrality is a popular centrality measure with applications...

ONBRA: Rigorous Estimation of the Temporal Betweenness Centrality in Temporal Networks

In network analysis, the betweenness centrality of a node informally cap...

Importance Sample-based Approximation Algorithm for Cost-aware Targeted Viral Marketing

Cost-aware Targeted Viral Marketing (CTVM), a generalization of Influenc...

On approximating the temporal betweenness centrality through sampling

We present a collection of sampling-based algorithms for approximating t...

PRESTO: Simple and Scalable Sampling Techniques for the Rigorous Approximation of Temporal Motif Counts

The identification and counting of small graph patterns, called network ...

Combinatorial Trace Method for Network Immunization

Immunizing a subset of nodes in a network - enabling them to identify an...

Scaling Expected Force: Efficient Identification of Key Nodes in Network-based Epidemic Models

Centrality measures are fundamental tools of network analysis as they hi...
