maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs

01/27/2015
by Andrew Lavin, et al.

This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures. The design combines ideas from cuda-convnet2 with the Maxas SGEMM assembly code. We address only the forward propagation (FPROP) operation of the network, but we believe that the same techniques used here will be effective for backward propagation (BPROP) as well.
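To make the target operation concrete, here is a minimal sketch of a direct-convolution FPROP kernel in CUDA, assuming NCHW input layout, KCRS filter layout, unit stride, and no padding. This is an illustrative baseline only, not the maxDNN kernel: maxDNN replaces this style of loop with hand-scheduled Maxwell SASS derived from the Maxas SGEMM code. The kernel name and launch configuration below are hypothetical.

```cuda
// Hypothetical direct-convolution FPROP sketch, NOT the maxDNN kernel.
// Assumes NCHW input (N x C x H x W), KCRS filters (K x C x R x S),
// unit stride, no padding; output is N x K x P x Q, P = H-R+1, Q = W-S+1.
__global__ void conv_fprop_naive(const float* __restrict__ input,
                                 const float* __restrict__ filters,
                                 float* __restrict__ output,
                                 int N, int C, int H, int W,
                                 int K, int R, int S)
{
    const int P = H - R + 1;  // output height
    const int Q = W - S + 1;  // output width
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= N * K * P * Q) return;

    // Decode one output element (n, k, p, q) from the flat thread index.
    const int q = idx % Q;
    const int p = (idx / Q) % P;
    const int k = (idx / (Q * P)) % K;
    const int n = idx / (Q * P * K);

    // Accumulate over input channels and the R x S filter window:
    // 2*C*R*S FLOPs per output element (one multiply, one add each step).
    float acc = 0.0f;
    for (int c = 0; c < C; ++c)
        for (int r = 0; r < R; ++r)
            for (int s = 0; s < S; ++s)
                acc += input[((n * C + c) * H + (p + r)) * W + (q + s)]
                     * filters[((k * C + c) * R + r) * S + s];

    output[idx] = acc;  // laid out as ((n*K + k)*P + p)*Q + q
}

// Example launch: one thread per output element.
// int total = N * K * P * Q;
// conv_fprop_naive<<<(total + 255) / 256, 256>>>(d_in, d_flt, d_out,
//                                                N, C, H, W, K, R, S);
```

Counting 2*N*K*P*Q*C*R*S FLOPs per forward pass, computational efficiency is that count divided by the product of kernel runtime and the GPU's peak single-precision FLOP rate; the 96.3% figure means maxDNN keeps Maxwell's arithmetic units busy nearly all of the time.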

Related research

03/30/2021
cuConv: A CUDA Implementation of Convolution for CNN Inference
Convolutions are the core operation of deep learning applications based ...

01/25/2016
Very Efficient Training of Convolutional Neural Networks using Fast Fourier Transform and Overlap-and-Add
Convolutional neural networks (CNNs) are currently state-of-the-art for ...

09/08/2022
Kernel-Segregated Transpose Convolution Operation
Transpose convolution has shown prominence in many deep learning applica...

03/27/2018
Diagonalwise Refactorization: An Efficient Training Method for Depthwise Convolutions
Depthwise convolutions provide significant performance benefits owing to...

01/23/2023
A Structural Approach to the Design of Domain Specific Neural Network Architectures
This is a master's thesis concerning the theoretical ideas of geometric ...

07/16/2018
Computationally Efficient Approaches for Image Style Transfer
In this work, we have investigated various style transfer approaches and...

12/15/2014
Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification
We present highly efficient algorithms for performing forward and backwa...
