In this study, we investigate whether speech symbols, learned through de...
This paper proposes a method for extracting a lightweight subset from a ...
We propose HumanDiffusion, a diffusion model trained from humans' percep...
We examine the speech modeling potential of generative spoken language m...
We propose ChatGPT-EDSS, an empathetic dialogue speech synthesis (EDSS) ...
We present CALLS, a Japanese speech corpus that considers phone calls in...
We present JNV (Japanese Nonverbal Vocalizations) corpus, a corpus of Ja...
We present a large-scale in-the-wild Japanese laughter corpus and a laug...
One way of expressing an environmental sound is using vocal imitations, ...
While neural text-to-speech (TTS) has achieved human-like natural synthe...
We construct a corpus of Japanese a cappella vocal ensembles (jaCappella...
We present a multi-speaker Japanese audiobook text-to-speech (TTS) syste...
This paper proposes a method for selecting training data for text-to-spe...
In this paper, we propose a method for intermediating multiple speakers'...
We propose a training method for spontaneous speech synthesis models tha...
We propose a method for synthesizing environmental sounds from visually ...
We present a comprehensive empirical study for personalized spontaneous ...
Although several methods of environmental sound synthesis have been prop...
We present an emotion recognition system for nonverbal vocalizations (NV...
We propose an end-to-end empathetic dialogue speech synthesis (DSS) mode...
This paper presents a speaking-rate-controllable HiFi-GAN neural vocoder...
We present the UTokyo-SaruLab mean opinion score (MOS) prediction system...
We present STUDIES, a new speech corpus for developing a voice agent tha...
This paper proposes visual-text to speech (vTTS), a method for synthesiz...
We present a self-supervised speech restoration method without paired sp...
In this paper, we propose a method to generate personalized filled pause...
In this paper, we construct a Japanese audiobook speech corpus called "J...
In this paper, we construct a new Japanese speech corpus called "JTubeSp...
This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS)...
Incremental text-to-speech (TTS) synthesis generates utterances in small...
In this paper, we propose a new framework for environmental sound synthe...
We propose a conditional generative adversarial network (GAN) incorporat...
Text-to-speech (TTS) synthesis, a technique for artificially generating ...
In this paper, we construct a new Japanese speech corpus for speech-base...
Environmental sound synthesis is a technique for generating a natural en...
This paper presents a free Japanese singing voice corpus that can be use...
In this paper, we propose computationally efficient and high-quality met...
Thanks to developments in machine learning techniques, it has become pos...
We propose the HumanGAN, a generative adversarial network (GAN) incorpor...
Synthesizing and converting environmental sounds have the potential for ...
Thanks to improvements in machine learning techniques, including deep le...
This paper presents a new voice impersonation attack using voice convers...
This paper proposes novel algorithms for speaker embedding using subject...
This paper proposes a generative moment matching network (GMMN)-based po...
This paper presents a deep neural network (DNN)-based phase reconstructi...
In this paper, we address a multichannel audio source separation task an...
Thanks to improvements in machine learning techniques including deep lea...
A method for statistical parametric speech synthesis incorporating gener...
This paper presents sampling-based speech parameter generation using mom...
Voice conversion (VC) using sequence-to-sequence learning of context pos...