Accelerator Codesign as Non-Linear Optimization

by Nirmal Prajapati, et al.

We propose an optimization approach for determining both hardware and software parameters for the efficient implementation of a family of applications, dense stencil computations, on programmable GPGPUs. We first introduce a simple analytical model for the silicon area usage of accelerator architectures and a workload characterization of stencil computations. We combine this characterization with a parametric execution time model and formulate a mathematical optimization problem that maximizes a common objective function of all the hardware and software parameters. The solution to this problem therefore "solves" the codesign problem: it simultaneously chooses the hardware and software parameters that optimize total performance. We validate this approach by proposing architectural variants of the NVIDIA Maxwell GTX-980 (respectively, Titan X) specifically tuned to a predetermined workload of four common 2D stencils (Heat, Jacobi, Laplacian, and Gradient) and two 3D ones (Heat and Laplacian). Our model predicts that performance would potentially improve by 28% […] through changes to hardware parameters such as coarse- and fine-grained parallelism, that is, the number of streaming multiprocessors and the number of compute cores each contains. We propose a set of Pareto-optimal design points to exploit the trade-off between performance and silicon area, and show that by additionally eliminating GPU caches we can get a further 2-fold improvement.
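To make the shape of such a codesign problem concrete, here is a minimal sketch in Python. It is not the authors' model: it assumes a toy linear area model, a toy roofline-style execution time model for a 2D 5-point stencil, and a shared-memory constraint on the tile size, and it replaces the paper's non-linear optimization with brute-force enumeration over a small discrete design space. All constants and names (`Design`, `area`, `exec_time`, `best_design`, `pareto_front`, the area coefficients, the bandwidth figure) are hypothetical illustration values. The point is only that hardware parameters (SM count, cores per SM) and a software parameter (tile width) are chosen jointly against one objective, and that the same models yield an area/performance Pareto front.

```python
import math
from dataclasses import dataclass
from itertools import product

# All constants below are hypothetical illustration values, NOT taken from the paper.
AREA_FIXED = 100.0         # mm^2: uncore (memory controllers, I/O, L2, ...)
AREA_SM_BASE = 4.0         # mm^2 per SM: scheduler, register file, shared memory
AREA_PER_CORE = 0.05       # mm^2 per compute core
SHARED_MEM_BYTES = 48 * 1024
WORD_BYTES = 4             # single-precision grid values

SM_CHOICES = [8, 12, 16, 20, 24, 28, 32]
CORE_CHOICES = [64, 128, 192, 256]
TILE_CHOICES = [16, 32, 64, 128]


@dataclass(frozen=True)
class Design:
    n_sm: int          # hardware: number of streaming multiprocessors
    cores_per_sm: int  # hardware: compute cores per SM
    tile: int          # software: width of the square tile each SM works on


def area(d: Design) -> float:
    """Toy linear area model: fixed uncore plus per-SM and per-core costs."""
    return AREA_FIXED + d.n_sm * (AREA_SM_BASE + d.cores_per_sm * AREA_PER_CORE)


def fits_shared_memory(d: Design) -> bool:
    """Software constraint: a tile plus its 1-point halo must fit in shared memory."""
    return (d.tile + 2) ** 2 * WORD_BYTES <= SHARED_MEM_BYTES


def exec_time(d: Design, n: int = 4096, steps: int = 100,
              flops_per_point: int = 5, words_per_cycle_per_sm: float = 16.0) -> float:
    """Toy time model (arbitrary units) for a 2D 5-point stencil on an n x n grid.

    A tile costs the max of its compute time and its memory time (roofline style);
    tiles are spread over the SMs in waves.
    """
    tiles = math.ceil(n / d.tile) ** 2
    waves = math.ceil(tiles / d.n_sm)
    compute = d.tile ** 2 * flops_per_point / d.cores_per_sm
    memory = (d.tile + 2) ** 2 / words_per_cycle_per_sm
    return steps * waves * max(compute, memory)


def all_designs():
    """Every (hardware, software) combination that satisfies the tile constraint."""
    for n_sm, cores, tile in product(SM_CHOICES, CORE_CHOICES, TILE_CHOICES):
        d = Design(n_sm, cores, tile)
        if fits_shared_memory(d):
            yield d


def best_design(area_budget: float):
    """Jointly pick hardware and software parameters: minimize time under an area budget."""
    candidates = [d for d in all_designs() if area(d) <= area_budget]
    return min(candidates, key=exec_time) if candidates else None


def pareto_front():
    """Designs not dominated in (area, time), both minimized; one per objective point."""
    points = sorted(((area(d), exec_time(d), d) for d in all_designs()),
                    key=lambda p: (p[0], p[1]))
    front, best_time = [], math.inf
    for a, t, d in points:
        # Keep d only if it is strictly faster than every design already seen,
        # all of which have smaller or equal area.
        if t < best_time:
            front.append((a, t, d))
            best_time = t
    return front


if __name__ == "__main__":
    for a, t, d in pareto_front():
        print(f"area={a:7.1f}  time={t:12.1f}  {d}")
    print("best under 300 mm^2:", best_design(300.0))
```

In this toy model, adding cores beyond the point where a tile becomes memory-bound buys no speed but still costs area, so the front favors more SMs over wider SMs once that point is reached. Enumeration keeps the sketch dependency-free; a real codesign study would pass the same objective and constraints to a proper (non-linear, mixed-integer) solver.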
