Automatic recognition of child speech for robotic applications in noisy environments

by Samuel Fernando, et al.

Automatic speech recognition (ASR) allows a natural and intuitive interface for robotic educational applications for children. However, there are a number of challenges to overcome before such an interface can operate robustly in realistic settings, including the intrinsic difficulties of recognising child speech and the high levels of background noise often present in classrooms. As part of the EU EASEL project, we have made several contributions to address these challenges, implementing our own ASR module for use in robotics applications. We use the latest deep neural network algorithms, which provide a leap in performance over the traditional Gaussian mixture model (GMM) approach, and apply data augmentation methods to improve robustness to noise and speaker variation. We provide a close integration between the ASR module and the rest of the dialogue system, allowing the ASR to receive in real time the language models relevant to the current section of the dialogue, which greatly improves accuracy. We integrated our ASR module into an interactive, multimodal system that uses a small humanoid robot to help children learn about exercise and energy. The system was installed at a public museum event as part of a research study in which 320 children (aged 3 to 14) interacted with the robot, with our ASR achieving 90% accuracy for fluent and near-fluent speech.
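
The abstract does not say which data augmentation methods were applied, so the following is only an illustrative sketch of one common form of noise augmentation: mixing a clean utterance with background noise at a chosen signal-to-noise ratio (SNR). The function names `mix_at_snr` and `measured_snr_db` are hypothetical and are not taken from the paper, and the signals are plain lists of float samples rather than real recordings.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB.

    Illustrative assumption: both inputs are lists of float samples at the
    same sample rate. This is not the paper's augmentation pipeline.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must be non-silent")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)); solve for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixed):
    """Recover the SNR of a mixture, given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixed)]
    speech_power = sum(s * s for s in speech) / len(speech)
    residual_power = sum(r * r for r in residual) / len(residual)
    return 10 * math.log10(speech_power / residual_power)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-ins: a 440 Hz tone for speech, Gaussian noise for the room.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    noise = [random.gauss(0.0, 1.0) for _ in range(1600)]
    mixed = mix_at_snr(speech, noise, snr_db=10.0)
    print(round(measured_snr_db(speech, mixed), 3))
```

In a training setup of this kind, each clean utterance would typically be mixed with several noise clips at several SNR values, so that the acoustic model sees the same speech under varied conditions.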




Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults

Recent advancements in Automatic Speech Recognition (ASR) systems, exemp...

Convoifilter: A case study of doing cocktail party speech recognition

This paper presents an end-to-end model designed to improve automatic sp...

Romanian Speech Recognition Experiments from the ROBIN Project

One of the fundamental functionalities for accepting a socially assistiv...

The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines

Automatic speech recognition (ASR) has been significantly advanced with ...

'Beach' to 'Bitch': Inadvertent Unsafe Transcription of Kids' Content on YouTube

Over the last few years, YouTube Kids has emerged as one of the highly c...

Data augmentation using prosody and false starts to recognize non-native children's speech

This paper describes AaltoASR's speech recognition system for the INTERS...

ChildBot: Multi-Robot Perception and Interaction with Children

In this paper we present an integrated robotic system capable of partici...
