Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages

05/30/2023
by Phat Do, et al.

We train a MOS prediction model based on wav2vec 2.0 using the open-access data sets BVCC and SOMOS. Our test with neural TTS data in the low-resource language (LRL) West Frisian shows that pre-training on BVCC before fine-tuning on SOMOS leads to the best accuracy for both fine-tuned and zero-shot prediction. Further fine-tuning experiments show that using more than 30 percent of the total data does not lead to significant improvements. In addition, fine-tuning with data from a single listener shows promising system-level accuracy, supporting the viability of one-participant pilot tests. These findings can all assist the resource-conscious development of TTS for LRLs by progressing towards better zero-shot MOS prediction and informing the design of listening tests, especially in early-stage evaluation.

