Spurious Local Minima Are Common for Deep Neural Networks with Piecewise Linear Activations

02/25/2021
by Bo Liu et al.

In this paper, it is shown theoretically that spurious local minima are common for deep fully-connected networks and convolutional neural networks (CNNs) with piecewise linear activation functions, given datasets that cannot be fitted by linear models. A motivating example explains why such minima exist: each output neuron of a deep fully-connected network or CNN with piecewise linear activations produces a continuous piecewise linear (CPWL) output, and different pieces of the CPWL output can fit disjoint groups of data samples when the empirical risk is minimized. Since fitting the data with different CPWL functions usually yields different levels of empirical risk, spurious local minima are prevalent. This result is proved in a general setting with any continuous loss function. The main proof technique is to represent a CPWL function as a maximization over minimizations of linear pieces; deep ReLU networks are then constructed to produce these linear pieces and to implement the maximization and minimization operations.
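The max-min construction the abstract alludes to can be illustrated concretely. Below is a minimal NumPy sketch, not taken from the paper; the hat-function example and helper names are illustrative assumptions. It shows that pairwise max and min are exact ReLU computations, so a CPWL function written as a max over mins of its linear pieces is computable by a ReLU network.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Pairwise max and min are exact ReLU gadgets:
#   max(a, b) = a + relu(b - a)
#   min(a, b) = a - relu(a - b)
def relu_max(a, b):
    return a + relu(b - a)

def relu_min(a, b):
    return a - relu(a - b)

# Hypothetical CPWL target (not from the paper): a "hat" function
#   f(x) = max(0, min(x, 2 - x))
# i.e. a max over mins of the linear pieces
#   l1(x) = 0, l2(x) = x, l3(x) = 2 - x.
x = np.linspace(-1.0, 3.0, 9)
f_maxmin = relu_max(np.zeros_like(x), relu_min(x, 2.0 - x))
f_direct = np.maximum(0.0, np.minimum(x, 2.0 - x))
assert np.allclose(f_maxmin, f_direct)
```

Nesting these pairwise gadgets depth-wise lets a constructed ReLU network realize an arbitrary CPWL output, which is the flavor of representation the proof builds on.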


Related research

03/27/2020
Piecewise linear activations substantially shape the loss surfaces of neural networks
Understanding the loss surface of a neural network is fundamentally impo...

12/28/2018
Over-Parameterized Deep Neural Networks Have No Strict Local Minima For Any Continuous Activations
In this paper, we study the loss surface of the over-parameterized fully...

11/20/2021
SPINE: Soft Piecewise Interpretable Neural Equations
ReLU fully-connected networks are ubiquitous but uninterpretable because...

10/02/2020
No Spurious Local Minima: on the Optimization Landscapes of Wide and Deep Neural Networks
Empirical studies suggest that wide neural networks are comparably easy ...

10/30/2018
Piecewise Strong Convexity of Neural Networks
We study the loss surface of a fully connected neural network with ReLU ...

02/23/2022
On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems
We study the loss landscape of training problems for deep artificial neu...

10/23/2020
On the Number of Linear Functions Composing Deep Neural Network: Towards a Refined Definition of Neural Networks Complexity
The classical approach to measure the expressive power of deep neural ne...
