A High Throughput Parallel Hash Table on FPGA using XOR-based Memory

by Ruizhi Zhang, et al.

A hash table is a fundamental data structure for fast search and retrieval of data, and a key component in complex graph analytics and AI/ML applications. State-of-the-art parallel hash table implementations either make simplifying assumptions, such as supporting only a subset of hash table operations, or employ optimizations whose performance is highly data dependent and, in the worst case, can be similar to that of a sequential implementation. In contrast, in this work we develop a dynamic hash table that supports all hash table queries (search, insert, delete, and update) and sustains p parallel queries (p > 1) per clock cycle via p processing engines (PEs) even in the worst case, i.e., the performance is data agnostic. We achieve this by implementing novel XOR-based multi-ported block memories on FPGAs. Additionally, we develop a technique to reduce the memory requirement of the hash table when the ratio of search to insert/update/delete queries is known beforehand. We implement our design on state-of-the-art FPGA devices. Our design scales to 16 PEs and supports throughput of up to 5926 MOPS. It matches the throughput of the state-of-the-art hash table design, FASTHash, which supports only search and insert operations. Compared with the best FPGA design that supports the same set of operations, our hash table achieves up to 12.3x speedup.
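
The abstract does not describe how the XOR-based multi-ported memory is built, so the following is only a minimal software sketch of the general XOR-encoding idea for multi-write-port memories, not the authors' design. It assumes one bank per write port, where each port writes only to its own bank and a read XORs all banks at the requested address. The class name `XorMultiPortMemory`, its methods, and the rule that two ports may not write the same address in one cycle are all illustrative assumptions.

```python
from functools import reduce
from operator import xor


class XorMultiPortMemory:
    """Toy model of an XOR-encoded memory with several write ports.

    Assumption: one bank per write port. The logical value at an address
    is the XOR of that address across all banks.
    """

    def __init__(self, num_write_ports, depth):
        self.banks = [[0] * depth for _ in range(num_write_ports)]

    def read(self, addr):
        # Decode: XOR the word at `addr` from every bank.
        return reduce(xor, (bank[addr] for bank in self.banks), 0)

    def write_cycle(self, writes):
        """Apply one clock cycle of writes.

        `writes` maps a write-port index to an (addr, value) pair.
        Same-address writes within a cycle are rejected (an assumption
        of this sketch; real designs define their own conflict policy).
        """
        addrs = [addr for addr, _ in writes.values()]
        if len(set(addrs)) != len(addrs):
            raise ValueError("two ports wrote the same address in one cycle")

        # Stage all encoded words from the pre-cycle state, then commit,
        # so every port sees the same old contents.
        staged = []
        for port, (addr, value) in writes.items():
            others = reduce(
                xor,
                (bank[addr] for i, bank in enumerate(self.banks) if i != port),
                0,
            )
            # Store value XOR (other banks), so XOR over all banks == value.
            staged.append((port, addr, value ^ others))
        for port, addr, encoded in staged:
            self.banks[port][addr] = encoded


if __name__ == "__main__":
    mem = XorMultiPortMemory(num_write_ports=4, depth=16)
    # Four ports write four different addresses in the same cycle.
    mem.write_cycle({0: (3, 0xAB), 1: (5, 0xCD), 2: (7, 0x12), 3: (9, 0x34)})
    # A different port later overwrites address 3.
    mem.write_cycle({2: (3, 0xFF)})
    print(hex(mem.read(3)), hex(mem.read(5)))
```

Because each port touches only its own bank, no bank ever needs more than one write per cycle, which is what lets p writers proceed in parallel in this model. A hardware version would additionally need replicated banks to supply the reads used for encoding and for the read ports; that replication is omitted here.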




A Dynamic Hash Table for the GPU

We design and implement a fully concurrent dynamic hash table for GPUs w...

SIMD-Optimized Search Over Sorted Data

Applications often require a fast, single-threaded search algorithm over...

GraphTango: A Hybrid Representation Format for Efficient Streaming Graph Updates and Analysis

Streaming graph processing involves performing updates and analytics on ...

Two Dimensional Router: Design and Implementation

Higher dimensional classification has attracted more attentions with inc...

High Performance Architecture for Flow-Table Lookup in SDN on FPGA

We propose Range-based Ternary Search Tree (RTST), a tree-based approach...

Storing a Trie with Compact and Predictable Space

This paper proposed a storing approach for trie structures, called coord...

Quick NAT: High performance NAT system on commodity platforms

NAT gateway is an important network system in today's IPv4 network when ...
