Latent State Models of Training Dynamics

08/18/2023
by Michael Y. Hu, et al.

The impact of randomness on model training is poorly understood. How do differences in data order and initialization actually manifest in the model, such that some training runs outperform others or converge faster? Furthermore, how can we interpret the resulting training dynamics and the phase transitions that characterize different trajectories? To understand the effect of randomness on the dynamics and outcomes of neural network training, we train models multiple times with different random seeds and compute a variety of metrics throughout training, such as the L_2 norm, mean, and variance of the neural network's weights. We then fit a hidden Markov model (HMM) over the resulting sequences of metrics. The HMM represents training as a stochastic process of transitions between latent states, providing an intuitive overview of significant changes during training. Using our method, we produce a low-dimensional, discrete representation of training dynamics on grokking tasks, image classification, and masked language modeling. We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence.
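The pipeline the abstract describes is straightforward to prototype. Below is a minimal sketch, assuming hmmlearn's GaussianHMM and a toy PyTorch model; the metric set, the choice of 4 hidden states, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Sketch (not the authors' code): record per-step weight metrics across
# several seeded runs, then fit an HMM over the metric sequences.
import numpy as np
import torch
import torch.nn as nn
from hmmlearn.hmm import GaussianHMM

def weight_metrics(model: nn.Module) -> list[float]:
    """L2 norm, mean, and variance of all model weights."""
    w = torch.cat([p.detach().flatten() for p in model.parameters()])
    return [w.norm(2).item(), w.mean().item(), w.var().item()]

def run_training(seed: int, steps: int = 100) -> np.ndarray:
    """Train a toy regression model, logging one metric vector per step."""
    torch.manual_seed(seed)
    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    trajectory = []
    for _ in range(steps):
        x = torch.randn(32, 10)
        y = x.sum(dim=1, keepdim=True)  # toy target
        loss = ((model(x) - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        trajectory.append(weight_metrics(model))
    return np.array(trajectory)

# One metric trajectory per random seed; hmmlearn expects the
# concatenated sequences plus per-sequence lengths.
trajectories = [run_training(seed) for seed in range(5)]
X = np.concatenate(trajectories)
lengths = [len(t) for t in trajectories]

# Fit an HMM whose discrete latent states summarize phases of training.
# The number of states (4 here) is a modeling choice.
hmm = GaussianHMM(n_components=4, covariance_type="diag", n_iter=100)
hmm.fit(X, lengths)
states = hmm.predict(trajectories[0])  # latent-state path for run 0
```

Decoding each run into its most likely state path (as in the last line) is what yields the low-dimensional, discrete view of training: phase transitions show up as state changes, and states visited by slow-converging runs but not fast ones are candidate "detour" states.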
