USNID: A Framework for Unsupervised and Semi-supervised New Intent Discovery

04/16/2023
by Hanlei Zhang, et al.

New intent discovery is of great value to natural language processing: it enables a better understanding of user needs and more user-friendly services. However, most existing methods struggle to capture the complicated semantics of discrete text representations when little or no labeled data is available as prior knowledge. To tackle this problem, we propose USNID, a novel framework for unsupervised and semi-supervised new intent discovery built on three key techniques. First, it makes full use of unsupervised or semi-supervised data to mine shallow semantic similarity relations and provide well-initialized representations for clustering. Second, it introduces a centroid-guided clustering mechanism that addresses inconsistent cluster allocation and provides high-quality self-supervised targets for representation learning. Third, it captures high-level semantics in unsupervised or semi-supervised data to discover fine-grained intent-wise clusters by optimizing both cluster-level and instance-level objectives. We also propose an effective method for estimating the number of clusters in open-world scenarios, where the number of new intents is not known beforehand. USNID achieves new state-of-the-art results on several intent benchmark datasets in both unsupervised and semi-supervised new intent discovery, and its performance remains robust across different cluster numbers.
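The cluster allocation inconsistency mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out how the centroid-guided mechanism works, so the code below assumes one plausible reading: each clustering round is initialized from the previous round's centroids, so that cluster index k keeps referring to the same group and the pseudo-labels stay usable as training targets. All function names and the toy 2-D points are hypothetical and stand in for learned sentence representations; this is not the authors' implementation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members.

    An empty cluster keeps its old centroid.
    """
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centroids.append(
                tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, init_centroids, n_iter=20):
    """Plain k-means started from the given centroids (no random restarts)."""
    centroids = list(init_centroids)
    for _ in range(n_iter):
        new_centroids = update(points, assign(points, centroids), centroids)
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return assign(points, centroids), centroids


# Toy "representations" at two training epochs (assumed data): the same six
# utterances, whose embeddings drift slightly after a representation update.
points_epoch1 = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                 (9.0, 9.0), (10.0, 9.0), (9.0, 10.0)]
points_epoch2 = [(x + 0.5, y + 0.5) for x, y in points_epoch1]

labels_epoch1, centroids_epoch1 = kmeans(points_epoch1, [(0, 0), (10, 10)])

# Warm start: reuse the epoch-1 centroids, so cluster ids carry over.
labels_warm, _ = kmeans(points_epoch2, centroids_epoch1)

# Cold start with a different initial order: same partition, permuted ids.
labels_cold, _ = kmeans(points_epoch2, [(10, 10), (0, 0)])

print(labels_epoch1, labels_warm, labels_cold)
```

In this toy setting the warm-started run reproduces the epoch-1 labels, while the cold-started run finds the same two groups under swapped indices. That swap is the kind of inconsistency that would make pseudo-labels from successive rounds incompatible as classification targets.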

