Unified Mandarin TTS Front-end Based on Distilled BERT Model

12/31/2020
by Yang Zhang, et al.

The front-end module in a typical Mandarin text-to-speech (TTS) system is composed of a long pipeline of text-processing components, which requires extensive effort to build and is prone to a large accumulative model size and cascading errors. In this paper, a pre-trained language model (PLM) based model is proposed to simultaneously tackle the two most important tasks in the TTS front-end, i.e., prosodic structure prediction (PSP) and grapheme-to-phoneme (G2P) conversion. We use a pre-trained Chinese BERT[1] as the text encoder and employ a multi-task learning technique to adapt it to the two TTS front-end tasks. The BERT encoder is then distilled into a smaller model with a knowledge distillation technique called TinyBERT[2], making the whole model 25% the size of benchmark pipeline models while maintaining competitive performance on both tasks. With the proposed method, we are able to run the whole TTS front-end module in a light and unified manner, which is friendlier to deployment on mobile devices.
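To make the setup concrete, here is a minimal sketch of the two ideas in the abstract, multi-task heads on a shared BERT encoder and a TinyBERT-style distillation loss, written in PyTorch with the Hugging Face transformers library. The class name UnifiedFrontend, the label counts, the equal task weighting, and the simplified distillation terms are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class UnifiedFrontend(nn.Module):
    """Shared Chinese BERT encoder with two token-level heads:
    PSP as boundary-tag sequence labeling and G2P as pinyin
    classification for polyphonic characters (sizes illustrative)."""

    def __init__(self, encoder_name="bert-base-chinese",
                 num_psp_labels=4, num_pinyin_labels=1500):
        super().__init__()
        self.encoder = BertModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.psp_head = nn.Linear(hidden, num_psp_labels)
        self.g2p_head = nn.Linear(hidden, num_pinyin_labels)

    def forward(self, input_ids, attention_mask):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        return self.psp_head(states), self.g2p_head(states)

def multitask_loss(psp_logits, g2p_logits, psp_labels, g2p_labels):
    # Equal-weight sum of the two task losses; label value -100
    # masks tokens that carry no target for a given task.
    ce = nn.CrossEntropyLoss(ignore_index=-100)
    return (ce(psp_logits.flatten(0, 1), psp_labels.flatten())
            + ce(g2p_logits.flatten(0, 1), g2p_labels.flatten()))

def distill_loss(student_states, teacher_states,
                 student_logits, teacher_logits, temperature=1.0):
    # TinyBERT-style distillation, simplified: an MSE term matching
    # student hidden states to the teacher's (TinyBERT also matches
    # attention maps and uses a projection when the student is
    # narrower), plus soft cross-entropy on the task logits.
    layer_loss = nn.functional.mse_loss(student_states, teacher_states)
    soft_targets = torch.softmax(teacher_logits / temperature, dim=-1)
    log_probs = torch.log_softmax(student_logits / temperature, dim=-1)
    pred_loss = -(soft_targets * log_probs).sum(-1).mean() * temperature ** 2
    return layer_loss + pred_loss
```

Sharing one encoder is what collapses the pipeline into a single model; the distilled student keeps the same two heads, which is why the size reduction applies to the whole front-end rather than to a single component.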


