HyperLogLog (HLL) Security: Inflating Cardinality Estimates

by Pedro Reviriego, et al.

Counting the number of distinct elements in a set is needed in many applications, for example to track the number of unique users of Internet services or the number of distinct flows on a network. In many cases an estimate is sufficient and the exact value is not needed, so many cardinality estimation algorithms that significantly reduce memory and computation requirements have been proposed. Among them, HyperLogLog has been widely adopted in both software and hardware implementations. The security of HyperLogLog has recently been studied, showing that an attacker can create a set of elements that produces a cardinality estimate much smaller than the real cardinality of the set. Such a set can be used, for example, to evade detection systems that rely on HyperLogLog. In this paper, the security of HyperLogLog is considered from the opposite angle: the attacker wants to create a small set that, when inserted into a HyperLogLog sketch, produces a large cardinality estimate. Such a set can be used to trigger false alarms in detection systems that use HyperLogLog but, more interestingly, it could potentially be used to inflate the number of visits to websites or the number of hits of online advertisements. Our analysis shows that an attacker can create a set that produces any arbitrary cardinality estimate using only as many elements as there are registers in the HyperLogLog implementation. This has been validated in two commercial implementations of HyperLogLog: Presto and Redis. Based on those results, we also consider the protection of HyperLogLog against such an attack.
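The inflation idea can be illustrated with a minimal sketch. It is not the procedure from the paper: it assumes a toy HyperLogLog with 16 registers, SHA-256 truncated to 64 bits as a stand-in hash, a raw estimator without range corrections, and an attacker who knows the hash function and simply searches for one element per register whose hash has many leading zeros. All names here (`ToyHLL`, `craft_inflating_set`, the `fake-` prefix) are hypothetical.

```python
import hashlib

P = 4                  # index bits (toy value, assumed for illustration)
M = 1 << P             # number of registers: 16
HASH_BITS = 64         # width of the truncated hash


def hash64(item: str) -> int:
    # Stand-in hash: first 8 bytes of SHA-256 (real systems use other hashes).
    return int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")


def index_and_rank(item: str) -> tuple[int, int]:
    h = hash64(item)
    idx = h >> (HASH_BITS - P)                    # top P bits pick the register
    rest = h & ((1 << (HASH_BITS - P)) - 1)       # remaining 60 bits
    rank = (HASH_BITS - P) - rest.bit_length() + 1  # position of leftmost 1 bit
    return idx, rank


class ToyHLL:
    def __init__(self) -> None:
        self.registers = [0] * M

    def add(self, item: str) -> None:
        idx, rank = index_and_rank(item)
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        alpha = 0.673  # standard bias constant for m = 16
        # Raw estimate only; small and large range corrections are omitted.
        return alpha * M * M / sum(2.0 ** -r for r in self.registers)


def craft_inflating_set(target_rank: int) -> list[str]:
    # Brute-force search: keep one candidate per register with rank >= target.
    chosen: dict[int, str] = {}
    counter = 0
    while len(chosen) < M:
        candidate = f"fake-{counter}"
        idx, rank = index_and_rank(candidate)
        if rank >= target_rank and idx not in chosen:
            chosen[idx] = candidate
        counter += 1
    return list(chosen.values())


attack_set = craft_inflating_set(target_rank=10)
hll = ToyHLL()
for item in attack_set:
    hll.add(item)
print(len(attack_set), round(hll.estimate()))
```

Under these assumptions, 16 inserted elements push every register to at least 10, so the raw estimate is at least 0.673 · 16 · 2^10, roughly 11,000. Raising `target_rank` raises the estimate further at the cost of a longer search.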




