Wide and Deep Neural Networks Achieve Optimality for Classification

While neural networks are used for classification tasks across domains, a long-standing open problem in machine learning is determining whether neural networks trained using standard procedures are optimal for classification, i.e., whether such models minimize the probability of misclassification for arbitrary data distributions. In this work, we identify and construct an explicit set of neural network classifiers that achieve optimality. Since effective neural networks in practice are typically both wide and deep, we analyze infinitely wide networks that are also infinitely deep. In particular, using the recent connection between infinitely wide neural networks and Neural Tangent Kernels, we provide explicit activation functions that can be used to construct networks that achieve optimality. Interestingly, these activation functions are simple and easy to implement, yet differ from commonly used activations such as ReLU or sigmoid. More generally, we create a taxonomy of infinitely wide and deep networks and show that these models implement one of three well-known classifiers depending on the activation function used: (1) 1-nearest neighbor (model predictions are given by the label of the nearest training example); (2) majority vote (model predictions are given by the label of the class with greatest representation in the training set); or (3) singular kernel classifiers (a set of classifiers containing those that achieve optimality). Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
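As a rough illustration of the three classifier families named above, the following Python sketch implements each one for binary labels in {-1, +1}. It is not the paper's construction: the function names, the toy data, and the specific singular kernel K(x, x') = ||x - x'||^(-alpha) with a free exponent alpha are assumptions chosen for illustration, and the kernel induced by an actual infinitely wide and deep network may differ.

```python
import numpy as np


def one_nearest_neighbor(X_train, y_train, x):
    """Predict the label of the training example closest to x."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return int(y_train[np.argmin(dists)])


def majority_vote(y_train):
    """Predict the most common training label, ignoring the query point.

    Labels are assumed to be -1 or +1; ties are broken toward +1.
    """
    return 1 if np.sum(y_train) >= 0 else -1


def singular_kernel_classifier(X_train, y_train, x, alpha=1.0):
    """Sign of a kernel-weighted vote with weights ||x - x_i||^(-alpha).

    The kernel diverges as x approaches a training point, so the classifier
    reproduces the training labels exactly. The exponent alpha is a
    hypothetical parameter introduced for this sketch.
    """
    dists = np.linalg.norm(X_train - x, axis=1)
    exact = dists == 0
    if np.any(exact):
        # x coincides with a training point: its label dominates the vote.
        return 1 if np.sum(y_train[exact]) >= 0 else -1
    weights = dists ** (-alpha)
    return 1 if np.dot(weights, y_train) >= 0 else -1


# Toy data (assumed for illustration): two positive and three negative points.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
y_train = np.array([1, 1, -1, -1, -1])
x_query = np.array([2.4, 0.0])

pred_1nn = one_nearest_neighbor(X_train, y_train, x_query)
pred_majority = majority_vote(y_train)
pred_singular_small = singular_kernel_classifier(X_train, y_train, x_query, alpha=1.0)
pred_singular_large = singular_kernel_classifier(X_train, y_train, x_query, alpha=4.0)
```

On this toy data the query point lies closest to a positive example while the negative class is the majority. With a small exponent the singular kernel vote agrees with majority vote, and with a larger exponent it agrees with 1-nearest neighbor, which gives some intuition for why singular kernel classifiers sit between the other two regimes.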




Ensemble of Convolutional Neural Networks Trained with Different Activation Functions

Activation functions play a vital role in the training of Convolutional ...

Activation function dependence of the storage capacity of treelike neural networks

The expressive power of artificial neural networks crucially depends on ...

Most Activation Functions Can Win the Lottery Without Excessive Depth

The strong lottery ticket hypothesis has highlighted the potential for t...

Review and Comparison of Commonly Used Activation Functions for Deep Neural Networks

The primary neural networks decision-making units are activation functio...

DANTE: Deep AlterNations for Training nEural networks

We present DANTE, a novel method for training neural networks using the ...

Polynomial Networks in Deep Classifiers

Deep neural networks have been the driving force behind the success in c...

The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

Do neural networks, trained on well-understood algorithmic tasks, reliab...
