MGiaD: Multigrid in all dimensions. Efficiency and robustness by coarsening in resolution and channel dimensions
Current state-of-the-art deep neural networks for image classification are made up of 10 - 100 million learnable weights and are therefore inherently prone to overfitting. The complexity of the weight count can be seen as a function of the number of channels, the spatial extent of the input and the number of layers of the network. Due to the use of convolutional layers the scaling of weight complexity is usually linear with regards to the resolution dimensions, but remains quadratic with respect to the number of channels. Active research in recent years in terms of using multigrid inspired ideas in deep neural networks have shown that on one hand a significant number of weights can be saved by appropriate weight sharing and on the other that a hierarchical structure in the channel dimension can improve the weight complexity to linear. In this work, we combine these multigrid ideas to introduce a joint framework of multigrid inspired architectures, that exploit multigrid structures in all relevant dimensions to achieve linear weight complexity scaling and drastically reduced weight counts. Our experiments show that this structured reduction in weight count is able to reduce overfitting and thus shows improved performance over state-of-the-art ResNet architectures on typical image classification benchmarks at lower network complexity.
READ FULL TEXT