ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers

07/07/2023
by   Gamze İslamoğlu, et al.

Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains such as computer vision and audio processing. However, the efficient hardware acceleration of transformer models poses new challenges due to their high arithmetic intensities, large memory requirements, and complex dataflow dependencies. In this work, we propose ITA, a novel accelerator architecture for transformers and related models that targets efficient inference on embedded systems by exploiting 8-bit quantization and an innovative softmax implementation that operates exclusively on integer values. By computing on-the-fly in streaming mode, our softmax implementation minimizes data movement and energy consumption. ITA achieves competitive energy efficiency with respect to state-of-the-art transformer accelerators with 16.9 TOPS/W, while outperforming them in area efficiency with 5.93 TOPS/mm^2 in 22 nm fully-depleted silicon-on-insulator technology at 0.8 V.
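The paper's exact streaming datapath is not reproduced here, but the following minimal NumPy sketch illustrates the general idea of an integer-only softmax as described in the abstract: the row maximum is subtracted so all exponents are non-positive, the exponential is replaced by a shift-friendly power-of-two approximation, and the result is renormalized to 8-bit probabilities. The bit-widths, the base-2 substitution, and the function name integer_softmax are illustrative assumptions, not ITA's actual hardware algorithm.

```python
# Illustrative integer-only softmax sketch (NOT the ITA datapath).
# Assumptions: int8 attention scores, a base-2 approximation of exp()
# realized with bit shifts, and Q16 fixed-point intermediates.
import numpy as np

def integer_softmax(x_q: np.ndarray, out_bits: int = 8) -> np.ndarray:
    """Approximate softmax over int8 scores using only integer arithmetic."""
    x = x_q.astype(np.int32)
    # Subtract the row maximum so every exponent is <= 0 (numerically safe).
    x_shifted = x - x.max(axis=-1, keepdims=True)       # values in [-255, 0]
    # Replace exp(x) with a power of two: use the negative score as a
    # right-shift amount. This changes the softmax temperature by ln 2
    # and is purely illustrative of a shift-friendly exponential.
    shift = np.clip(-x_shifted, 0, 31)
    exp_approx = (1 << 16) >> shift                      # Q16 "exponentials"
    denom = exp_approx.sum(axis=-1, keepdims=True)
    # Normalize and requantize to unsigned out_bits-wide probabilities.
    probs = (exp_approx * ((1 << out_bits) - 1)) // denom
    return probs.astype(np.uint8)

if __name__ == "__main__":
    scores = np.random.randint(-128, 128, size=(4, 16), dtype=np.int8)
    print(integer_softmax(scores))
```

In hardware, the maximum tracking and the accumulation of the denominator can be performed as score values stream in, which is what allows the on-the-fly, data-movement-minimizing evaluation claimed in the abstract.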


Related research

03/22/2023
TRON: Transformer Neural Network Acceleration with Non-Coherent Silicon Photonics
Transformer neural networks are rapidly being integrated into state-of-t...

04/08/2023
SwiftTron: An Efficient Hardware Accelerator for Quantized Transformers
Transformers' compute-intensive operations pose enormous challenges for ...

05/09/2022
Row-wise Accelerator for Vision Transformer
Following the success of the natural language processing, the transforme...

03/16/2021
Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers
Transformers have transformed the field of natural language processing. ...

12/14/2015
Origami: A 803 GOp/s/W Convolutional Network Accelerator
An ever increasing number of computer vision and image/video processing ...

02/20/2023
Optical Transformers
The rapidly increasing size of deep-learning models has caused renewed a...

11/14/2022
BiViT: Extremely Compressed Binary Vision Transformer
Model binarization can significantly compress model size, reduce energy ...
