An Ensemble of Knowledge Sharing Models for Dynamic Hand Gesture Recognition
The focus of this paper is dynamic gesture recognition in the context of the interaction between humans and machines. We propose a model consisting of two sub-networks, a transformer and an ordered-neuron long-short-term-memory (ON-LSTM) based recurrent neural network (RNN). Each sub-network is trained to perform the task of gesture recognition using only skeleton joints. Since each sub-network extracts different types of features due to the difference in architecture, the knowledge can be shared between the sub-networks. Through knowledge distillation, the features and predictions from each sub-network are fused together into a new fusion classifier. In addition, a cyclical learning rate can be used to generate a series of models that are combined in an ensemble, in order to yield a more generalizable prediction. The proposed ensemble of knowledge-sharing models exhibits an overall accuracy of 86.11 using only skeleton information, as tested using the Dynamic Hand Gesture-14/28 dataset
READ FULL TEXT