Deep Learning as a Mixed Convex-Combinatorial Optimization Problem

10/31/2017
by Abram L. Friesen, et al.

As neural networks grow deeper and wider, learning networks with hard-threshold activations is becoming increasingly important, both for network quantization, which can drastically reduce time and energy requirements, and for creating large integrated systems of deep networks, which may have non-differentiable components and must avoid vanishing and exploding gradients for effective learning. However, since gradient descent is not applicable to hard-threshold functions, it is not clear how to learn them in a principled way. We address this problem by observing that setting targets for hard-threshold hidden units in order to minimize loss is a discrete optimization problem, and can be solved as such. The discrete optimization goal is to find a set of targets such that each unit, including the output, has a linearly separable problem to solve. Given these targets, the network decomposes into individual perceptrons, which can then be learned with standard convex approaches. Based on this, we develop a recursive mini-batch algorithm for learning deep hard-threshold networks that includes the popular but poorly justified straight-through estimator as a special case. Empirically, we show that our algorithm improves classification accuracy in a number of settings, including for AlexNet and ResNet-18 on ImageNet, when compared to the straight-through estimator.
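The abstract notes that the popular straight-through estimator arises as a special case of the proposed target-setting algorithm. As a point of reference, below is a minimal sketch of a saturating straight-through estimator for a sign (hard-threshold) activation in PyTorch; the class name StraightThroughSign and the clipping region |x| <= 1 are illustrative assumptions, not code from the paper.

```python
import torch

class StraightThroughSign(torch.autograd.Function):
    """Hard-threshold (sign) activation trained with a straight-through estimator.

    Illustrative sketch only, not the authors' implementation.
    """

    @staticmethod
    def forward(ctx, x):
        # Forward pass uses the true hard threshold, which has zero gradient
        # almost everywhere and is therefore unusable with plain backprop.
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Straight-through backward pass: treat the threshold as the identity,
        # here additionally zeroed outside |x| <= 1 (the "saturating" variant).
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

# Usage: apply in place of a differentiable activation, e.g.
# h = StraightThroughSign.apply(pre_activation)
```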

