Deep Tensor Convolution on Multicores

11/20/2016
by David Budden, et al.

Deep convolutional neural networks (ConvNets) with 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved the performance of video and volumetric image analysis, but have been limited in size by the low memory ceiling of GPU hardware. Existing CPU implementations overcome this constraint but are impractically slow. Here we extend and optimize the faster Winograd class of convolutional algorithms to the N-dimensional case, specifically for CPU hardware. First, we remove the need to hand-craft algorithms by exploiting the relaxed constraints and cheap sparse access of CPU memory. Second, we maximize CPU utilization and multicore scalability by transforming data matrices to be cache-aware, integer multiples of AVX vector widths. Treating 2-dimensional ConvNets as a special (and the least beneficial) case of our approach, we demonstrate a 5- to 25-fold improvement in throughput compared to the previous state of the art.
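The abstract does not spell out the algorithm, so the following is only an illustrative sketch of what a Winograd-class convolution looks like in its smallest standard form: the 1-dimensional F(2,3) minimal filtering algorithm, which produces two outputs of a 3-tap filter from a 4-sample input tile using four multiplications instead of six. It is not the paper's N-dimensional, AVX-tuned CPU implementation; the function names `winograd_f23` and `direct_f23` are hypothetical and chosen for this example.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap correlation over a 4-sample tile, via F(2,3).

    d: sequence of 4 input samples, g: sequence of 3 filter taps.
    Uses 4 elementwise multiplications (the direct method needs 6).
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (G g); can be precomputed once per filter.
    u = (g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2)
    # Input transform (B^T d); additions and subtractions only.
    v = (d0 - d2, d1 + d2, d2 - d1, d1 - d3)
    # Elementwise product in the transformed domain.
    m = [ui * vi for ui, vi in zip(u, v)]
    # Output transform (A^T m).
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


def direct_f23(d, g):
    """Reference: the same two outputs computed directly."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

Comparing `winograd_f23(d, g)` with `direct_f23(d, g)` on the same tile is a quick sanity check. In a ConvNet the filter transform is typically amortized over many input tiles, which is where the savings in multiplications come from.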

research
06/30/2020

SParSH-AMG: A library for hybrid CPU-GPU algebraic multigrid and preconditioned iterative methods

Hybrid CPU-GPU algorithms for Algebraic Multigrid methods (AMG) to effic...
research
04/19/2020

Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms

The widely-adopted practice is to train deep learning models with specia...
research
05/21/2018

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for ...
research
04/26/2021

Capstan: A Vector RDA for Sparsity

This paper proposes Capstan: a scalable, parallel-patterns-based, reconf...
research
12/22/2022

Accelerating CNN inference on long vector architectures via co-design

CPU-based inference can be an alternative to off-chip accelerators, and ...
research
04/16/2017

In-Datacenter Performance Analysis of a Tensor Processing Unit

Many architects believe that major improvements in cost-energy-performan...
research
03/18/2019

PZnet: Efficient 3D ConvNet Inference on Manycore CPUs

Convolutional nets have been shown to achieve state-of-the-art accuracy ...
