Yasunori Ohishi

research

∙ 08/23/2023

Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement

We proposed Audio Difference Captioning (ADC) as a new extension task of...

Daiki Takeuchi, et al.

research

∙ 05/23/2023

Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation

Self-supervised learning general-purpose audio representations have demo...

Daisuke Niizumi, et al.

research

∙ 03/01/2023

First-shot anomaly sound detection for machine condition monitoring: A domain generalization baseline

This paper provides a baseline system for First-shot-compliant unsupervi...

Noboru Harada, et al.

research

∙ 10/26/2022

Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input

Masked Autoencoders is a simple yet powerful self-supervised learning me...

Daisuke Niizumi, et al.

research

∙ 07/25/2022

ConceptBeam: Concept Driven Target Speech Extraction

We propose a novel framework for target speech extraction based on seman...

Yasunori Ohishi, et al.

research

∙ 07/20/2022

Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval

The amount of audio data available on public websites is growing rapidly...

Daiki Takeuchi, et al.

research

∙ 05/17/2022

Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model

Many application studies rely on audio DNN models pre-trained on a large...

Daisuke Niizumi, et al.

research

∙ 04/26/2022

Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation

Recent general-purpose audio representations show state-of-the-art perfo...

Daisuke Niizumi, et al.

research

∙ 04/15/2022

BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations

Pre-trained models are essential as feature extractors in modern machine...

Daisuke Niizumi, et al.

research

∙ 04/08/2022

SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning

In many situations, we would like to hear desired sound events (SEs) whi...

Marc Delcroix, et al.

research

∙ 02/18/2022

Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion

We tackle a challenging task: multi-view and multi-modal event detection...

Masahiro Yasuda, et al.

research

∙ 02/18/2022

Echo-aware Adaptation of Sound Event Localization and Detection in Unknown Environments

Our goal is to develop a sound event localization and detection (SELD) s...

Masahiro Yasuda, et al.

research

∙ 03/11/2021

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

Inspired by the recent progress in self-supervised learning for computer...

Daisuke Niizumi, et al.

research

∙ 12/14/2020

Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval

The goal of audio captioning is to translate input audio into its descri...

Yuma Koizumi, et al.

research

∙ 09/24/2020

Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning

The system we used for Task 6 (Automated Audio Captioning) of the Detecti...

Daiki Takeuchi, et al.

research

∙ 07/01/2020

The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation

This technical report describes the system participating to the Detectio...

Yuma Koizumi, et al.

research

∙ 04/09/2019

Crossmodal Voice Conversion

Humans are able to imagine a person's voice from the person's appearance...

Hirokazu Kameoka, et al.

