LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework

08/20/2023
by Zihan Zhao, et al.

Large Language Models (LLMs) perform well across many domains and tasks, but they still handle multimodal input poorly, especially in Spoken Question Answering (SQA), a task that requires precise alignment and deep interaction between speech and text features. To address SQA with LLMs, we first curated LibriSQA, a free-form and open-ended dataset built from LibriSpeech. Part I uses natural conversational formats; Part II contains multiple-choice questions followed by answers and analytical segments. Together, the two parts include 107k SQA pairs covering various topics. Because few speech-text LLMs exist, we propose a lightweight, end-to-end framework for performing SQA on LibriSQA and obtain significant results. By recasting automatic speech recognition (ASR) in the SQA format, we further show that the framework can handle ASR tasks. Our empirical findings support the ability of LLMs to align and comprehend multimodal information, paving the way for universal multimodal LLMs. The dataset and demo can be found at https://github.com/ZihanZhaoSJTU/LibriSQA.
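The idea of casting ASR in the SQA format can be illustrated with a minimal sketch: an (audio, transcript) pair becomes a question-answer example whose answer is the transcript. This is only an illustration under stated assumptions, not the authors' implementation; the `SQAExample` type, the `asr_to_sqa` helper, the file name, and the prompt wording are all hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass

# Hypothetical prompt: the abstract does not state the wording actually used.
ASR_QUESTION = "What does the speaker say in this audio?"


@dataclass
class SQAExample:
    """One spoken question answering example (illustrative structure only)."""
    audio_path: str  # path to the speech clip
    question: str    # text question about the speech
    answer: str      # free-form text answer


def asr_to_sqa(audio_path: str, transcript: str,
               question: str = ASR_QUESTION) -> SQAExample:
    """Wrap an ASR (audio, transcript) pair as an SQA example.

    The transcript, with surrounding whitespace removed, becomes the answer
    to a fixed transcription question.
    """
    return SQAExample(audio_path=audio_path,
                      question=question,
                      answer=transcript.strip())


if __name__ == "__main__":
    # Hypothetical file name and transcript, for demonstration only.
    example = asr_to_sqa("clip_0001.flac", "  hello world ")
    print(example.question)
    print(example.answer)
```

Under this framing, the same speech-plus-question interface could serve both tasks, with only the question and the target answer changing.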


