In this study, we investigate whether speech symbols, learned through de...
This paper proposes a method for extracting a lightweight subset from a ...
We propose HumanDiffusion, a diffusion model trained from humans' percep...
We examine the speech modeling potential of generative spoken language m...
We propose ChatGPT-EDSS, an empathetic dialogue speech synthesis (EDSS) ...
We present CALLS, a Japanese speech corpus that considers phone calls in...
We present JNV (Japanese Nonverbal Vocalizations) corpus, a corpus of Ja...
We present a large-scale in-the-wild Japanese laughter corpus and a laug...
One way of expressing an environmental sound is using vocal imitations, ...
While neural text-to-speech (TTS) has achieved human-like natural synthe...
We construct a corpus of Japanese a cappella vocal ensembles (jaCappella...
We present a multi-speaker Japanese audiobook text-to-speech (TTS) syste...
This paper proposes a method for selecting training data for text-to-spe...
In this paper, we propose a method for intermediating multiple speakers'...
We propose a training method for spontaneous speech synthesis models tha...
We propose a method for synthesizing environmental sounds from visually ...
We present a comprehensive empirical study for personalized spontaneous ...
Although several methods of environmental sound synthesis have been prop...
We present an emotion recognition system for nonverbal vocalizations (NV...
We propose an end-to-end empathetic dialogue speech synthesis (DSS) mode...
This paper presents a speaking-rate-controllable HiFi-GAN neural vocoder...
We present the UTokyo-SaruLab mean opinion score (MOS) prediction system...
We present STUDIES, a new speech corpus for developing a voice agent tha...
This paper proposes visual-text to speech (vTTS), a method for synthesiz...
We present a self-supervised speech restoration method without paired sp...
In this paper, we propose a method to generate personalized filled pause...
In this paper, we construct a Japanese audiobook speech corpus called "J...
In this paper, we construct a new Japanese speech corpus called "JTubeSp...
This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS)...
Incremental text-to-speech (TTS) synthesis generates utterances in small...
In this paper, we propose a new framework for environmental sound synthe...
We propose a conditional generative adversarial network (GAN) incorporat...
Text-to-speech (TTS) synthesis, a technique for artificially generating ...
In this paper, we construct a new Japanese speech corpus for speech-base...
Environmental sound synthesis is a technique for generating a natural en...
This paper presents a free Japanese singing voice corpus that can be use...
In this paper, we propose computationally efficient and high-quality met...
Thanks to developments in machine learning techniques, it has become pos...
We propose the HumanGAN, a generative adversarial network (GAN) incorpor...
Synthesizing and converting environmental sounds have the potential for ...
Thanks to improvements in machine learning techniques, including deep le...
This paper presents a new voice impersonation attack using voice convers...
This paper proposes novel algorithms for speaker embedding using subject...
This paper proposes a generative moment matching network (GMMN)-based po...
This paper presents a deep neural network (DNN)-based phase reconstructi...
In this paper, we address a multichannel audio source separation task an...
Thanks to improvements in machine learning techniques including deep lea...
A method for statistical parametric speech synthesis incorporating gener...
This paper presents sampling-based speech parameter generation using mom...
Voice conversion (VC) using sequence-to-sequence learning of context pos...