Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention

12/28/2020
by Hang Chen, et al.

In this paper, we propose a novel deep learning architecture to improve word-level lip-reading. First, we introduce multi-scale processing into spatial feature extraction for lip-reading. Specifically, we propose hierarchical pyramidal convolution (HPConv) to replace the standard convolution in the original module, which improves the model's ability to discover fine-grained lip movements. Second, we merge information across all time steps of the sequence using self-attention, so that the model pays more attention to the relevant frames. These two advantages are combined to further enhance the model's classification power. Experiments on the Lip Reading in the Wild (LRW) dataset show that our proposed model achieves 86.83% accuracy, improving on the current state of the art. We also conducted extensive experiments to better understand the behavior of the proposed model.
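To illustrate how self-attention can merge information from all time steps, here is a minimal sketch of scaled dot-product self-attention over a sequence of per-frame feature vectors. It is not the authors' implementation: the function names, the toy input, and the choice to use the frame features directly as queries, keys, and values (with no learned projections) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_attention(frames):
    """Scaled dot-product self-attention over the time axis.

    Assumption: queries, keys, and values are the frame features
    themselves (no learned projections), to keep the sketch small.
    """
    scale = math.sqrt(len(frames[0]))
    attended = []
    for query in frames:
        # One attention weight per time step, summing to 1.
        weights = softmax([dot(query, key) / scale for key in frames])
        # Each output frame is a weighted mix of every frame in the sequence.
        mixed = [
            sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(len(query))
        ]
        attended.append(mixed)
    return attended


# Hypothetical toy input: 4 frames with 3 features each.
frames = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
attended = self_attention(frames)
```

Each row of `attended` is a convex combination of all input frames, so frames similar to the query contribute more to its output. A trained model would typically add learned query, key, and value projections before this step.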

research
03/13/2020

Mutual Information Maximization for Effective Lip Reading

Lip reading has received an increasing research interest in recent years...
research
04/06/2021

Hyperspectral and LiDAR data classification based on linear self-attention

An efficient linear self-attention fusion model is proposed in this pape...
research
12/04/2021

Multi-scale Graph Convolutional Networks with Self-Attention

Graph convolutional networks (GCNs) have achieved remarkable learning ab...
research
11/15/2021

Searching for TrioNet: Combining Convolution with Local and Global Self-Attention

Recently, self-attention operators have shown superior performance as a ...
research
04/08/2023

Word-level Persian Lipreading Dataset

Lip-reading has made impressive progress in recent years, driven by adva...
research
08/31/2021

SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory

Lip reading, aiming to recognize spoken sentences according to the given...
research
02/13/2022

DEEPCHORUS: A Hybrid Model of Multi-scale Convolution and Self-attention for Chorus Detection

Chorus detection is a challenging problem in musical signal processing a...
