Scene-Aware Prompt for Multi-modal Dialogue Understanding and Generation

07/05/2022
by   Bin Li, et al.
This paper describes Team LingJing's experiments in NLPCC-2022-Shared-Task-4, Multi-modal Dialogue Understanding and Generation (MDUG). The MDUG task comprises two phases: multi-modal context understanding and response generation. To fully leverage visual information for both scene understanding and dialogue generation, we propose a scene-aware prompt for the MDUG task. Specifically, we adopt a multi-task strategy to jointly model scene-level and session-level multi-modal understanding. Visual captions are used to capture scene information, while fixed-type templated prompts built from the scene- and session-aware labels further improve dialogue generation performance. Extensive experimental results show that the proposed method achieves state-of-the-art (SOTA) performance compared with other competitive methods, ranking 1st in all three subtasks of the MDUG competition.
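As a rough illustration of the idea, a fixed-type templated prompt can be assembled by concatenating the predicted scene- and session-aware labels, the visual caption, and the dialogue history. The sketch below is a minimal assumption of what such a template might look like; the actual template wording, label names, and function are illustrative and not taken from the paper.

```python
# Hypothetical sketch of a fixed-type templated, scene-aware prompt.
# Template wording and label names are illustrative assumptions.

def build_scene_aware_prompt(caption, scene_label, session_label, history):
    """Assemble a generation prompt from visual and dialogue context."""
    lines = [
        f"Scene: {scene_label}",      # predicted scene-level label
        f"Session: {session_label}",  # predicted session-level label
        f"Caption: {caption}",        # caption describing the visual context
        "Dialogue:",
    ]
    lines += [f"- {turn}" for turn in history]
    lines.append("Response:")
    return "\n".join(lines)

prompt = build_scene_aware_prompt(
    caption="Two people talk in a kitchen.",
    scene_label="indoor",
    session_label="casual chat",
    history=["A: What are you making?", "B: Just some pasta."],
)
```

The resulting prompt string would then be fed to the generation model as its conditioning context.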


