Training Strategies for Improved Lip-reading

09/03/2022
by Pingchuan Ma, et al.

Several training strategies and temporal models have recently been proposed for isolated-word lip-reading in a series of independent works. However, the potential of combining the best strategies, and the impact of each of them, has not been explored. In this paper, we systematically investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies, such as self-distillation and the use of word boundary indicators. Our results show that Time Masking (TM) is the most important augmentation, followed by mixup, and that Densely-Connected Temporal Convolutional Networks (DC-TCN) are the best temporal model for lip-reading of isolated words. Using self-distillation and word boundary indicators is also beneficial, but to a lesser extent. A combination of all the above methods results in a classification accuracy of 93.4%, surpassing the current state-of-the-art performance on the LRW dataset. The performance can be further improved to 94.1%. An error analysis of the various training strategies reveals that the performance improves by increasing the classification accuracy of hard-to-recognise words.
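To make the two most impactful augmentations concrete, below is a minimal sketch of how Time Masking and mixup can be applied to a lip-reading clip and its label. This is an illustrative assumption, not the paper's implementation: the function names, the mask-fill strategy (mean frame value) and the Beta parameter are hypothetical choices; the clip shape (frames, height, width) follows common lip-reading pipelines.

```python
import numpy as np

def time_mask(frames, max_mask_len=10, rng=None):
    """Time Masking: replace a random contiguous span of frames
    with the clip's mean value, forcing the model not to rely on
    any single temporal segment. `frames` has shape (T, H, W)."""
    rng = rng or np.random.default_rng()
    T = frames.shape[0]
    mask_len = int(rng.integers(0, max_mask_len + 1))
    if mask_len == 0:
        return frames.copy()
    start = int(rng.integers(0, T - mask_len + 1))
    out = frames.copy()
    out[start:start + mask_len] = frames.mean()
    return out

def mixup(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Mixup: blend two clips and their one-hot labels with a
    weight lam drawn from a Beta(alpha, alpha) distribution."""
    rng = rng or np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

In practice both augmentations would be applied on the fly inside the training data loader, with mixup operating on pairs of clips drawn from the same batch.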
