Spurious Local Minima Are Common for Deep Neural Networks with Piecewise Linear Activations

02/25/2021
by Bo Liu et al.

In this paper, it is shown theoretically that spurious local minima are common for deep fully-connected networks and convolutional neural networks (CNNs) with piecewise linear activation functions, given datasets that cannot be fitted by linear models. A motivating example explains why such minima exist: each output neuron of a deep fully-connected network or CNN with piecewise linear activations produces a continuous piecewise linear (CPWL) output, and different pieces of the CPWL output can fit disjoint groups of data samples when the empirical risk is minimized. Since fitting the data with different CPWL functions usually yields different levels of empirical risk, spurious local minima are prevalent. This result is proved in a general setting with any continuous loss function. The main proof technique is to represent a CPWL function as a maximization over minimizations of linear pieces; deep ReLU networks are then constructed to produce these linear pieces and to implement the maximization and minimization operations.
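The max-min construction the abstract alludes to can be illustrated concretely. Below is a minimal NumPy sketch, not taken from the paper; the hat-function example and helper names are illustrative assumptions. It shows that pairwise max and min are exact ReLU computations, so a CPWL function written as a max over mins of its linear pieces is computable by a ReLU network.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Pairwise max and min are exact ReLU gadgets:
#   max(a, b) = a + relu(b - a)
#   min(a, b) = a - relu(a - b)
def relu_max(a, b):
    return a + relu(b - a)

def relu_min(a, b):
    return a - relu(a - b)

# Hypothetical CPWL target (not from the paper): a "hat" function
#   f(x) = max(0, min(x, 2 - x))
# i.e. a max over mins of the linear pieces
#   l1(x) = 0, l2(x) = x, l3(x) = 2 - x.
x = np.linspace(-1.0, 3.0, 9)
f_maxmin = relu_max(np.zeros_like(x), relu_min(x, 2.0 - x))
f_direct = np.maximum(0.0, np.minimum(x, 2.0 - x))
assert np.allclose(f_maxmin, f_direct)
```

Nesting these pairwise gadgets depth-wise lets a constructed ReLU network realize an arbitrary CPWL output, which is the flavor of representation the proof builds on.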


Related research

03/27/2020
Piecewise linear activations substantially shape the loss surfaces of neural networks
Understanding the loss surface of a neural network is fundamentally impo...

12/28/2018
Over-Parameterized Deep Neural Networks Have No Strict Local Minima For Any Continuous Activations
In this paper, we study the loss surface of the over-parameterized fully...

11/20/2021
SPINE: Soft Piecewise Interpretable Neural Equations
ReLU fully-connected networks are ubiquitous but uninterpretable because...

10/02/2020
No Spurious Local Minima: on the Optimization Landscapes of Wide and Deep Neural Networks
Empirical studies suggest that wide neural networks are comparably easy ...

10/30/2018
Piecewise Strong Convexity of Neural Networks
We study the loss surface of a fully connected neural network with ReLU ...

02/23/2022
On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems
We study the loss landscape of training problems for deep artificial neu...

10/23/2020
On the Number of Linear Functions Composing Deep Neural Network: Towards a Refined Definition of Neural Networks Complexity
The classical approach to measure the expressive power of deep neural ne...
