AdaSTE: An Adaptive Straight-Through Estimator to Train Binary Neural Networks

12/06/2021
by Huu Le, et al.

We propose a new algorithm for training deep neural networks (DNNs) with binary weights. In particular, we first cast the problem of training binary neural networks (BiNNs) as a bilevel optimization instance and subsequently construct flexible relaxations of this bilevel program. The resulting training method shares its algorithmic simplicity with several existing approaches for training BiNNs, in particular with the straight-through gradient estimator successfully employed in BinaryConnect and subsequent methods. In fact, our method can be interpreted as an adaptive variant of the original straight-through estimator that conditionally, but not always, acts like a linear mapping in the backward pass of error propagation. Experimental results demonstrate that our new algorithm offers favorable performance compared to existing approaches.
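Since the abstract hinges on the straight-through estimator (STE), a minimal PyTorch sketch of the classic STE from BinaryConnect may help fix ideas. The class name BinarizeSTE and the usage snippet below are illustrative, not code from the paper; the adaptive, conditionally linear backward rule that AdaSTE derives from the bilevel relaxation is given only in the full paper and is not reproduced here.

    import torch

    class BinarizeSTE(torch.autograd.Function):
        # Forward: binarize the latent real-valued weights with sign(w).
        # Backward: pass the incoming gradient straight through, masked
        # to |w| <= 1 (the clipping rule used in BinaryConnect).
        @staticmethod
        def forward(ctx, w):
            ctx.save_for_backward(w)
            return torch.sign(w)

        @staticmethod
        def backward(ctx, grad_output):
            (w,) = ctx.saved_tensors
            mask = (w.abs() <= 1).to(grad_output.dtype)
            return grad_output * mask

    # Usage: gradients flow to the real-valued latent weights w.
    w = torch.randn(8, requires_grad=True)
    x = torch.randn(8)
    loss = (BinarizeSTE.apply(w) * x).sum()
    loss.backward()  # w.grad equals x where |w| <= 1, and 0 elsewhere

Where the vanilla STE applies this fixed identity-with-clipping rule unconditionally, the abstract indicates that AdaSTE adapts the backward mapping so that it only conditionally behaves like the linear pass-through above.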


