Utilizing Whisper to Enhance Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

09/18/2023
by Ryandhimas E. Zezario, et al.

Automated assessment of speech intelligibility in hearing aid (HA) devices is of great importance. Our previous work introduced a non-intrusive multi-branched speech intelligibility prediction model called MBI-Net, which achieved top performance in the Clarity Prediction Challenge 2022. Building on the promising results of MBI-Net, we aim to further enhance its performance by leveraging Whisper embeddings to enrich the acoustic features. In this study, we propose two improved models, MBI-Net+ and MBI-Net++. MBI-Net+ keeps the same model architecture as MBI-Net but replaces the self-supervised learning (SSL) speech embeddings with Whisper embeddings as its cross-domain features. MBI-Net++ adopts a more elaborate design: it adds an auxiliary task that predicts frame-level and utterance-level scores of the objective speech intelligibility metric HASPI (Hearing Aid Speech Perception Index), and it is trained with multi-task learning. Experimental results confirm that both MBI-Net+ and MBI-Net++ achieve better prediction performance than MBI-Net across multiple metrics, and that MBI-Net++ outperforms MBI-Net+.
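
To make the multi-task idea concrete, the following is a minimal sketch of how a primary intelligibility loss might be combined with auxiliary frame-level and utterance-level HASPI losses. It is not the authors' implementation: the function name, the squared-error form, the weights, the averaging of frame predictions into an utterance-level estimate, and the use of the utterance-level HASPI label as the target for every frame are all assumptions made for illustration.

```python
from typing import Sequence


def multitask_loss(
    intell_pred: float,
    intell_true: float,
    haspi_frame_preds: Sequence[float],
    haspi_true: float,
    w_intell: float = 1.0,   # hypothetical weight for the primary task
    w_haspi: float = 1.0,    # hypothetical weight for the auxiliary HASPI task
    w_frame: float = 1.0,    # hypothetical weight for the frame-level term
) -> float:
    """Combine intelligibility and HASPI squared errors into one scalar loss."""
    if not haspi_frame_preds:
        raise ValueError("at least one frame-level prediction is required")

    # Primary task: utterance-level intelligibility error.
    intell_err = (intell_pred - intell_true) ** 2

    # Assumption: the utterance-level HASPI estimate is the mean of the
    # frame-level predictions.
    n_frames = len(haspi_frame_preds)
    haspi_utt_pred = sum(haspi_frame_preds) / n_frames
    haspi_utt_err = (haspi_utt_pred - haspi_true) ** 2

    # Assumption: each frame is compared against the utterance-level label.
    haspi_frame_err = sum((f - haspi_true) ** 2 for f in haspi_frame_preds) / n_frames

    return w_intell * intell_err + w_haspi * (haspi_utt_err + w_frame * haspi_frame_err)
```

In this sketch the frame-level term penalizes predictions that fluctuate around the label even when their mean is correct, which is one plausible reason to use both levels of supervision.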

