High Level Synthesis Implementation of a Three-dimensional Systolic Array Architecture for Matrix Multiplications on Intel Stratix 10 FPGAs
In this paper, we consider the HLS implementation of a three-dimensional systolic array architecture for matrix multiplication that targets specific characteristics of Intel Stratix 10 FPGAs. The goal is to produce designs that achieve high floating-point throughput by using most of the DSPs at high frequencies while avoiding congestion of the routing fabric. The investigated three-dimensional systolic array architecture is able to produce hardware designs that use 99% of the DSPs and achieve performance above 3 TFLOPS.
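To make the idea of a three-dimensional arrangement of processing elements more concrete, the following is a minimal functional sketch in Python. It is not the paper's HLS code and it is not cycle-accurate: the function name `matmul_3d_tiles` and the grid sizes `pi`, `pj`, `pk` are hypothetical, chosen only for illustration. The sketch assumes a grid of `pi x pj x pk` multipliers in which the third dimension forms a short dot-product chain, so that each step of the grid contributes `pk` products to each of `pi x pj` output elements.

```python
def matmul_3d_tiles(A, B, pi=2, pj=2, pk=2):
    """Functional model of a pi x pj x pk grid of multipliers (illustrative only)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    # Walk over the problem in tiles that match the assumed grid shape.
    for i0 in range(0, n, pi):
        for j0 in range(0, p, pj):
            for k0 in range(0, m, pk):
                # One grid step: in hardware, every (i, j, k) position below
                # would be a separate multiplier working in parallel.
                for i in range(i0, min(i0 + pi, n)):
                    for j in range(j0, min(j0 + pj, p)):
                        # The third dimension reduces pk products into one
                        # partial sum, like a chained dot product.
                        partial = 0.0
                        for k in range(k0, min(k0 + pk, m)):
                            partial += A[i][k] * B[k][j]
                        C[i][j] += partial
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = matmul_3d_tiles(A, B)
```

In this model, enlarging `pk` adds multipliers without adding accumulators for output elements, which is one plausible reason a third dimension helps when the aim is to occupy nearly all DSPs; the actual mapping used in the paper may differ.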