maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs

01/27/2015
by Andrew Lavin, et al.

This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures. The design combines ideas from cuda-convnet2 with the Maxas SGEMM assembly code. We address only the forward propagation (FPROP) operation of the network, but we believe that the same techniques used here will be effective for backward propagation (BPROP) as well.
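To make the target operation concrete, here is a minimal sketch of a direct-convolution FPROP kernel in CUDA, assuming NCHW input layout, KCRS filter layout, unit stride, and no padding. This is an illustrative baseline only, not the maxDNN kernel: maxDNN replaces this style of loop with hand-scheduled Maxwell SASS derived from the Maxas SGEMM code. The kernel name and launch configuration below are hypothetical.

```cuda
// Hypothetical direct-convolution FPROP sketch, NOT the maxDNN kernel.
// Assumes NCHW input (N x C x H x W), KCRS filters (K x C x R x S),
// unit stride, no padding; output is N x K x P x Q, P = H-R+1, Q = W-S+1.
__global__ void conv_fprop_naive(const float* __restrict__ input,
                                 const float* __restrict__ filters,
                                 float* __restrict__ output,
                                 int N, int C, int H, int W,
                                 int K, int R, int S)
{
    const int P = H - R + 1;  // output height
    const int Q = W - S + 1;  // output width
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= N * K * P * Q) return;

    // Decode one output element (n, k, p, q) from the flat thread index.
    const int q = idx % Q;
    const int p = (idx / Q) % P;
    const int k = (idx / (Q * P)) % K;
    const int n = idx / (Q * P * K);

    // Accumulate over input channels and the R x S filter window:
    // 2*C*R*S FLOPs per output element (one multiply, one add each step).
    float acc = 0.0f;
    for (int c = 0; c < C; ++c)
        for (int r = 0; r < R; ++r)
            for (int s = 0; s < S; ++s)
                acc += input[((n * C + c) * H + (p + r)) * W + (q + s)]
                     * filters[((k * C + c) * R + r) * S + s];

    output[idx] = acc;  // laid out as ((n*K + k)*P + p)*Q + q
}

// Example launch: one thread per output element.
// int total = N * K * P * Q;
// conv_fprop_naive<<<(total + 255) / 256, 256>>>(d_in, d_flt, d_out,
//                                                N, C, H, W, K, R, S);
```

Counting 2*N*K*P*Q*C*R*S FLOPs per forward pass, computational efficiency is that count divided by the product of kernel runtime and the GPU's peak single-precision FLOP rate; the 96.3% figure means maxDNN keeps Maxwell's arithmetic units busy nearly all of the time.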

Related research

03/30/2021
cuConv: A CUDA Implementation of Convolution for CNN Inference
Convolutions are the core operation of deep learning applications based ...

01/25/2016
Very Efficient Training of Convolutional Neural Networks using Fast Fourier Transform and Overlap-and-Add
Convolutional neural networks (CNNs) are currently state-of-the-art for ...

09/08/2022
Kernel-Segregated Transpose Convolution Operation
Transpose convolution has shown prominence in many deep learning applica...

03/27/2018
Diagonalwise Refactorization: An Efficient Training Method for Depthwise Convolutions
Depthwise convolutions provide significant performance benefits owing to...

01/23/2023
A Structural Approach to the Design of Domain Specific Neural Network Architectures
This is a master's thesis concerning the theoretical ideas of geometric ...

07/16/2018
Computationally Efficient Approaches for Image Style Transfer
In this work, we have investigated various style transfer approaches and...

12/15/2014
Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification
We present highly efficient algorithms for performing forward and backwa...
