L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks

by Shuang Wu, et al.

Batch Normalization (BN) has proven quite effective at accelerating and improving the training of deep neural networks (DNNs). However, BN brings additional computation, consumes more memory, and generally slows down the training process by a large margin, which aggravates the training effort. Furthermore, the nonlinear square and root operations in BN impede low bit-width quantization techniques, which draw much attention in the deep learning hardware community. In this work, we propose an L1-norm BN (L1BN) with only linear operations in both the forward and the backward propagations during training. L1BN is shown to be approximately equivalent to the original L2-norm BN (L2BN) up to a multiplicative scaling factor. Experiments on various convolutional neural networks (CNNs) and generative adversarial networks (GANs) reveal that L1BN maintains almost the same accuracies and convergence rates as L2BN, but with higher computational efficiency. On an FPGA platform, the proposed signum and absolute operations in L1BN achieve a 1.5× speedup and save 50% of the power consumption, respectively, compared with the original costly square and root operations. This hardware-friendly normalization method not only surpasses L2BN in speed, but also simplifies the hardware design of ASIC accelerators with higher energy efficiency. Last but not least, L1BN promises fully quantized training of DNNs, which is crucial to future adaptive terminal devices.
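The claimed equivalence between L1BN and L2BN can be illustrated with a small sketch. It is not the authors' implementation: the function names, the `scale` parameter, and the `eps` placement are assumptions made for illustration. It also assumes roughly Gaussian activations, for which the standard deviation equals the mean absolute deviation times sqrt(pi/2), so that constant serves as the scaling factor here.

```python
import math
import random


def l2_batch_norm(xs, eps=1e-5):
    """Conventional BN statistic for one feature: divide by the standard deviation."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)  # needs a square ...
    return [(x - mu) / math.sqrt(var + eps) for x in xs]  # ... and a root


def l1_batch_norm(xs, eps=1e-5, scale=1.0):
    """L1-norm variant: divide by the mean absolute deviation (no square, no root).

    `scale` is a hypothetical knob added for this comparison only.
    """
    mu = sum(xs) / len(xs)
    mad = sum(abs(x - mu) for x in xs) / len(xs)
    return [(x - mu) / (scale * mad + eps) for x in xs]


# Assumption: for Gaussian data, std = sqrt(pi / 2) * mean absolute deviation.
GAUSSIAN_SCALE = math.sqrt(math.pi / 2)

rng = random.Random(0)
batch = [rng.gauss(3.0, 2.0) for _ in range(10000)]

l2_out = l2_batch_norm(batch)
l1_out = l1_batch_norm(batch, scale=GAUSSIAN_SCALE)

# Largest element-wise difference between the two normalized batches.
max_gap = max(abs(a - b) for a, b in zip(l2_out, l1_out))
print(f"max |L2BN - scaled L1BN| = {max_gap:.4f}")
```

In a trained network the constant factor could presumably be absorbed by the learnable scale that follows normalization, so it is applied explicitly here only to make the two outputs directly comparable.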




Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers

Deep neural network (DNN) quantization converting floating-point (FP) da...

FLightNNs: Lightweight Quantized Deep Neural Networks for Fast and Accurate Inference

To improve the throughput and energy efficiency of Deep Neural Networks ...

An Efficient FPGA-Based Accelerator for Swin Transformer

Since introduced, Swin Transformer has achieved remarkable results in th...

CATERPILLAR: Coarse Grain Reconfigurable Architecture for Accelerating the Training of Deep Neural Networks

Accelerating the inference of a trained DNN is a well studied subject. I...

LightNorm: Area and Energy-Efficient Batch Normalization Hardware for On-Device DNN Training

When training early-stage deep neural networks (DNNs), generating interm...

Towards Efficient Full 8-bit Integer DNN Online Training on Resource-limited Devices without Batch Normalization

Huge computational costs brought by convolution and batch normalization ...

Batch Normalization Sampling

Deep Neural Networks (DNNs) thrive in recent years in which Batch Normal...
