Skew-Oblivious Data Routing for Data-Intensive Applications on FPGAs with HLS

by Xinyu Chen, et al.

FPGAs are emerging as computing infrastructure for accelerating applications in datacenters, and high-level synthesis (HLS) tools have been proposed to ease FPGA programming. Even with HLS, however, irregular data-intensive applications require explicit optimizations. A common one is to instantiate multiple processing elements (PEs), each owning a private BRAM-based buffer, so that multiple data items can be processed per cycle. Data routing dynamically dispatches each data item to its designated PE. This avoids the buffer replication that statically assigning data to PEs incurs, and hence saves BRAM. However, when the dataset is skewed, workload imbalance among the PEs severely degrades performance. In this paper, we propose a skew-oblivious data routing architecture that allocates secondary PEs and schedules them at run time to share the workload of overloaded PEs. We further integrate the proposed architecture into a framework called Ditto to minimize the development effort for applications that require skew handling. We evaluate Ditto on five commonly used applications: histogram building, data partitioning, PageRank, heavy-hitter detection, and HyperLogLog. The results demonstrate that the generated implementations are robust to skewed datasets and outperform state-of-the-art designs in both throughput and BRAM usage efficiency.
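To make the trade-off between data routing and skew concrete, here is a minimal Python sketch of histogram building, one of the five evaluated applications. It runs under a deliberately simplified cycle model that is an assumption of this sketch, not the paper's design:

- Each cycle delivers up to one key per primary PE.
- Every PE updates one bin per cycle.
- A batch takes as many cycles as its most loaded PE needs.

The names `route`, `ceil_div` and `simulate` are hypothetical and are not Ditto's API. The same applies to modulo routing, greedy per-batch lending of secondary PEs and the round-robin split of a PE's queue. Together they only illustrate the idea of secondary PEs sharing an overloaded PE's work. They are not the authors' scheduler.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def route(key: int, num_pes: int) -> int:
    """Hypothetical routing function: key k is owned by primary PE k % num_pes."""
    return key % num_pes


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def simulate(keys: Sequence[int], num_pes: int, num_secondary: int,
             num_bins: int) -> Tuple[int, List[int]]:
    """Histogram building in a simplified cycle model (an assumption).

    Returns (total_cycles, merged_histogram). Keys must lie in [0, num_bins).
    """
    # With routing, primary PE i stores only bins i, i + num_pes, ...: no replication.
    part = ceil_div(num_bins, num_pes)
    primary = [[0] * part for _ in range(num_pes)]
    # Simplification: secondary buffers span all bins, so a secondary PE can
    # be lent to a different primary PE in every batch.
    secondary = [[0] * num_bins for _ in range(num_secondary)]
    cycles = 0
    for start in range(0, len(keys), num_pes):
        # Dispatch the batch to the PEs that own each key.
        queues: List[List[int]] = [[] for _ in range(num_pes)]
        for k in keys[start:start + num_pes]:
            queues[route(k, num_pes)].append(k)
        helpers: List[List[int]] = [[] for _ in range(num_pes)]

        def load(i: int) -> int:
            # Cycles PE i needs for its queue when sharing it with its helpers.
            return ceil_div(len(queues[i]), 1 + len(helpers[i]))

        # Greedily lend each secondary PE to the currently most loaded PE.
        for s in range(num_secondary):
            target = max(range(num_pes), key=load)
            if load(target) <= 1:
                break  # no PE is overloaded any more
            helpers[target].append(s)
        for i, q in enumerate(queues):
            n_workers = 1 + len(helpers[i])
            for j, k in enumerate(q):
                w = j % n_workers  # round-robin over the primary PE and its helpers
                if w == 0:
                    primary[i][k // num_pes] += 1
                else:
                    secondary[helpers[i][w - 1]][k] += 1
        cycles += max(load(i) for i in range(num_pes))
    # Merge partial histograms: local bin j of primary PE i is key j * num_pes + i.
    hist = [0] * num_bins
    for i, buf in enumerate(primary):
        for j, count in enumerate(buf):
            if j * num_pes + i < num_bins:
                hist[j * num_pes + i] += count
    for buf in secondary:
        for k, count in enumerate(buf):
            hist[k] += count
    return cycles, hist


print(simulate(list(range(64)), 4, 0, 64)[0])  # uniform keys: 16 cycles
print(simulate([0] * 64, 4, 0, 8)[0])          # all keys on PE 0: 64 cycles
print(simulate([0] * 64, 4, 3, 8)[0])          # 3 secondary PEs lent: 16 cycles

rng = random.Random(0)
skewed = rng.choices(range(64), weights=[64] + [1] * 63, k=4096)
print(simulate(skewed, 8, 0, 64)[0], simulate(skewed, 8, 8, 64)[0])
```

In the fully skewed case every key in a batch lands on PE 0, so the batch is serialized. Lending three secondary PEs brings the cost back to one cycle per batch, matching the uniform case. The merged histogram is unchanged because the secondary PEs' partial counts are summed at the end. Giving secondary PEs full-range buffers keeps the sketch short, but it reintroduces replication for those PEs. A hardware design would have to budget that BRAM more carefully.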


