Hierarchical Knowledge Distillation for Dialogue Sequence Labeling

by Shota Orihashi, et al.

This paper presents a novel knowledge distillation method for dialogue sequence labeling. Dialogue sequence labeling is a supervised learning task that estimates a label for each utterance in a target dialogue document, and is useful for many applications such as dialogue act estimation. Accurate labeling is often achieved by a hierarchically structured large model consisting of utterance-level and dialogue-level networks that capture the contexts within an utterance and between utterances, respectively. However, due to its large model size, such a model cannot be deployed on resource-constrained devices. To overcome this difficulty, we focus on knowledge distillation, which trains a small model by distilling the knowledge of a large, high-performance teacher model. Our key idea is to distill the knowledge while preserving the complex contexts captured by the teacher model. To this end, the proposed method, hierarchical knowledge distillation, trains the small model to mimic the teacher model's output at each level of the hierarchy, distilling not only the probability distribution over label classification but also the knowledge of the utterance-level and dialogue-level contexts learned by the teacher model. Experiments on dialogue act estimation and call scene segmentation demonstrate the effectiveness of the proposed method.
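The distillation objective described above can be sketched as a weighted sum of three terms: a KL divergence between the softened teacher and student label distributions, plus mean-squared errors between the utterance-level and dialogue-level representations of the two models. The function name `hierarchical_kd_loss`, the dictionary keys, and the weights `alpha`/`beta` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_kd_loss(student, teacher, T=2.0, alpha=0.5, beta=0.25):
    """Hierarchical distillation loss (illustrative sketch).

    `student` / `teacher` are dicts with:
      - "logits":   (batch, utterances, classes) label logits
      - "utt_repr": utterance-level hidden representations
      - "dlg_repr": dialogue-level hidden representations
    The weights alpha/beta and temperature T are hypothetical hyperparameters.
    """
    # label level: KL divergence between temperature-softened distributions
    p_t = softmax(teacher["logits"] / T)
    p_s = softmax(student["logits"] / T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1).mean()

    # utterance level and dialogue level: match the teacher's hidden outputs
    utt_mse = np.mean((student["utt_repr"] - teacher["utt_repr"]) ** 2)
    dlg_mse = np.mean((student["dlg_repr"] - teacher["dlg_repr"]) ** 2)

    # T**2 rescales the KL gradient as in standard knowledge distillation
    return alpha * (T ** 2) * kl + beta * utt_mse + (1 - alpha - beta) * dlg_mse
```

When the student's outputs equal the teacher's at every level, all three terms vanish and the loss is zero; in training, the MSE terms push the student's intermediate representations toward the teacher's, which is what distinguishes this from distilling the label distribution alone.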




