Puyuan Peng

research

∙ 09/19/2023

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

Audio-visual representation learning aims to develop systems with human-...

0 Yuan Tseng, et al. ∙

research

∙ 06/27/2023

Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos

To realize human-robot collaboration, robots need to execute actions for...

0 Chiori Hori, et al. ∙

research

∙ 05/19/2023

Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model

In this paper, we show that representations capturing syllabic units eme...

0 Puyuan Peng, et al. ∙

research

∙ 05/18/2023

Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization

We investigate the emergent abilities of the recently proposed web-scale...

0 Puyuan Peng, et al. ∙

research

∙ 11/03/2022

Zero-shot Video Moment Retrieval With Off-the-Shelf Models

For the majority of the machine learning community, the expensive nature...

0 Anuj Diwan, et al. ∙

research

∙ 03/30/2022

MAE-AST: Masked Autoencoding Audio Spectrogram Transformer

In this paper, we propose a simple yet powerful improvement over the rec...

22 Alan Baade, et al. ∙

research

∙ 03/28/2022

Word Discovery in Visually Grounded, Self-Supervised Speech Models

We present a method for visually-grounded spoken term discovery. After t...

5 Puyuan Peng, et al. ∙

research

∙ 02/07/2022

Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling

In this paper, we describe our submissions to the ZeroSpeech 2021 Challe...

0 Puyuan Peng, et al. ∙

research

∙ 09/16/2021

Fast-Slow Transformer for Visually Grounding Speech

We present Fast-Slow Transformer for Visually Grounding Speech, or FaST-...

0 Puyuan Peng, et al. ∙

research

∙ 12/03/2020

A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings

We propose a new unsupervised model for mapping a variable-duration spee...

0 Puyuan Peng, et al. ∙

