Raw Filtering of JSON Data on FPGAs

by Tobias Hahn, et al.

Many Big Data applications process data streams in semi-structured formats such as JSON. A disadvantage of such formats is that an application may spend a significant amount of processing time just on unselectively parsing all data. To mitigate this issue, the concept of raw filtering has been proposed, with the idea of removing data from a stream prior to the costly parsing stage. However, as accurate filtering of raw data is often only possible after the data has been parsed, raw filters are designed to be approximate, in the sense of allowing false positives, so that they can be implemented efficiently. In contrast to previously proposed CPU-based raw filtering techniques, which are restricted to string matching, we present FPGA-based primitives for filtering strings, numbers, and number ranges. In addition, we propose a primitive that respects the basic structure of JSON data and can be used to further increase the accuracy of the introduced raw filters. The proposed raw filter primitives are designed to be composed according to the filter expression of a given query. Thus, complex raw filters can be created for FPGAs that enable a drastic decrease in the number of generated false positives, particularly for IoT workloads. As there is a trade-off between accuracy and resource consumption, we evaluate primitives as well as composed raw filters using different queries from the RiotBench benchmark. Our results show that up to 94.3% of the data can be filtered without producing any observed false positives using only a few hundred LUTs.
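The raw filtering idea described above can be illustrated with a minimal software sketch. This is only a CPU-side analogy in Python, not the FPGA primitives of the paper. The names `raw_filter`, `run_query`, and `RECORDS` are hypothetical, and the sketch assumes that string values appear unescaped in the raw bytes, so that a substring test never drops a matching record.

```python
import json


def raw_filter(record: bytes, needle: bytes) -> bool:
    """Substring test on unparsed bytes.

    Approximate: it may accept records that do not satisfy the query
    (false positives), because it ignores which key the value belongs to.
    """
    return needle in record


def run_query(records, key, value):
    """Return records where obj[key] == value, parsing only raw-filter survivors."""
    # Assumption: the value is serialized in the stream exactly as json.dumps
    # would write it (for example b'"temperature"'), with no escape sequences.
    needle = json.dumps(value).encode("utf-8")
    parsed = 0
    hits = []
    for rec in records:
        if not raw_filter(rec, needle):
            continue  # dropped before the costly parsing stage
        parsed += 1
        obj = json.loads(rec)
        if obj.get(key) == value:  # exact check removes false positives
            hits.append(obj)
    return hits, parsed


# Hypothetical sample stream, made up for illustration.
RECORDS = [
    b'{"sensor": "temperature", "value": 21.5}',
    b'{"sensor": "humidity", "value": 40}',
    b'{"sensor": "humidity", "note": "temperature", "value": 41}',
    b'{"sensor": "light", "value": 300}',
]

hits, parsed = run_query(RECORDS, "sensor", "temperature")
```

In this sample, the third record passes the raw filter although its `sensor` field does not match. It is a false positive that only the exact check after parsing removes, which is the kind of case a structure-aware primitive is meant to reduce.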



