Locally Uniform Hashing

08/27/2023
by Ioana O. Bercea, et al.

Hashing is a common technique in data processing, with a strong impact on the time and resources spent on computation. Hashing also affects the applicability of theoretical results, which often assume access to (unrealistic) uniform, fully-random hash functions. In this paper, we are concerned with designing hash functions that are practical and come with strong theoretical guarantees on their performance. To this end, we present tornado tabulation hashing, which is simple, fast, and exhibits a certain full, local randomness property that provably makes diverse algorithms perform almost as if (abstract) fully-random hashing were used. Examples include classic linear probing, the widely used HyperLogLog algorithm of Flajolet, Fusy, Gandouet, and Meunier [AOFA'07] for counting distinct elements, and the one-permutation hashing of Li, Owen, and Zhang [NIPS'12] for large-scale machine learning. We also provide a very efficient solution to the classical problem of obtaining fully-random hashing on a fixed (but unknown to the hash function) set of n keys using O(n) space. As a consequence, we get more efficient implementations of the splitting trick of Dietzfelbinger and Rink [ICALP'09] and the succinct-space uniform hashing of Pagh and Pagh [SICOMP'08]. Tornado tabulation hashing is based on a simple method to systematically break dependencies in tabulation-based hashing techniques.
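
For readers unfamiliar with tabulation-based hashing, the following is a minimal sketch of plain simple tabulation, the basic scheme this family of techniques starts from. It is not the tornado tabulation construction of the paper. The function name, parameter names, and default sizes (four 8-bit characters, 32-bit output) are assumptions chosen for illustration.

```python
import random


def make_simple_tabulation(num_chars=4, char_bits=8, out_bits=32, seed=None):
    """Return a hash function built from num_chars random lookup tables.

    Illustrative sketch of plain simple tabulation only; all names and
    default sizes here are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    table_size = 1 << char_bits
    mask = table_size - 1
    # One table of independent random out_bits-bit values per character position.
    tables = [
        [rng.getrandbits(out_bits) for _ in range(table_size)]
        for _ in range(num_chars)
    ]

    def h(key):
        if not 0 <= key < (1 << (num_chars * char_bits)):
            raise ValueError("key out of range for this tabulation scheme")
        value = 0
        for i in range(num_chars):
            # Extract the i-th character of the key and XOR in its table entry.
            char = (key >> (i * char_bits)) & mask
            value ^= tables[i][char]
        return value

    return h


h = make_simple_tabulation(seed=1)
# Four keys whose characters form a "rectangle": every table entry they
# touch is used an even number of times, so their hash values XOR to zero.
dependency = h(0) ^ h(1) ^ h(256) ^ h(257)
```

With plain simple tabulation, `dependency` is always 0 regardless of the random tables, because each looked-up entry cancels in the XOR. This is one concrete example of the kind of dependency among hash values that tabulation-based schemes can exhibit.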

research
08/19/2020

The Power of Hashing with Mersenne Primes

The classic way of computing a k-universal hash function is to use a ran...
research
02/07/2021

Additive Feature Hashing

The hashing trick is a machine learning technique used to encode categor...
research
08/21/2018

Composite Hashing for Data Stream Sketches

In rapid and massive data streams, it is often not possible to estimate ...
research
11/23/2017

Practical Hash Functions for Similarity Estimation and Dimensionality Reduction

Hashing is a basic tool for dimensionality reduction employed in several...
research
12/26/2018

Towards a Theoretical Understanding of Hashing-Based Neural Nets

Parameter reduction has been an important topic in deep learning due to ...
research
07/16/2018

A Lyra2 FPGA Core for Lyra2REv2-Based Cryptocurrencies

Lyra2REv2 is a hashing algorithm that consists of a chain of individual ...
research
12/23/2018

AnchorHash: A Scalable Consistent Hash

Consistent hashing (CH) is a central building block in many networking a...
