Snippext: Semi-supervised Opinion Mining with Augmented Data

02/07/2020
by Zhengjie Miao, et al.

Online services are interested in solutions to opinion mining, the problem of extracting aspects, opinions, and sentiments from text. One way to mine opinions is to build on the recent success of pre-trained language models, which can be fine-tuned to obtain high-quality extractions from reviews. However, fine-tuning language models still requires a non-trivial amount of training data. In this paper, we study how to significantly reduce the amount of labeled training data required to fine-tune language models for opinion mining. We describe Snippext, an opinion mining system built on a language model that is fine-tuned through semi-supervised learning with augmented data. A novelty of Snippext is its two-pronged approach to achieving state-of-the-art (SOTA) performance with little labeled training data: (1) data augmentation, which automatically generates more labeled training data from existing examples, and (2) a semi-supervised learning technique that leverages the massive amount of unlabeled data in addition to the limited amount of labeled data. We show with extensive experiments that Snippext performs comparably to, and can even exceed, previous SOTA results on several opinion mining tasks with only half the training data required. Furthermore, it achieves new SOTA results when all training data are used. Compared with a baseline pipeline, we found that Snippext extracts significantly more fine-grained opinions, which enables new opportunities for downstream applications.
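To make the two prongs concrete, here is a minimal, generic sketch in Python. It is not Snippext's actual method: the abstract does not say which augmentation operators or semi-supervised algorithm the system uses. The synonym table, the BIO-style tags (`B-AS` for aspect, `B-OP` for opinion), the function names, and the 0.9 confidence threshold are all assumptions made for illustration. The first function creates a new labeled example by replacing a token outside any tagged span; the second keeps only confidently predicted unlabeled examples as pseudo-labeled data.

```python
import random

# Hypothetical synonym table (an assumption for this sketch); a real system
# would draw replacements from a lexicon or from embedding neighbors.
SYNONYMS = {"really": ["truly", "very"], "was": ["seemed"], "staff": ["team"]}


def augment(tokens, labels, rng):
    """Replace one token tagged "O" with a synonym, keeping labels aligned.

    Tokens inside aspect/opinion spans are left untouched so that the
    original labels remain valid for the new example.
    """
    candidates = [
        i for i, (tok, lab) in enumerate(zip(tokens, labels))
        if lab == "O" and tok in SYNONYMS
    ]
    if not candidates:
        return list(tokens), list(labels)
    i = rng.choice(candidates)
    new_tokens = list(tokens)
    new_tokens[i] = rng.choice(SYNONYMS[tokens[i]])
    return new_tokens, list(labels)


def pseudo_label(predict_proba, unlabeled, threshold=0.9):
    """Keep unlabeled texts whose top predicted class clears the threshold.

    predict_proba is any callable mapping a text to a dict of
    label -> probability (a stand-in for a fine-tuned model).
    """
    selected = []
    for text in unlabeled:
        probs = predict_proba(text)
        label, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= threshold:
            selected.append((text, label))
    return selected


# Example: one labeled sentence yields a second labeled sentence.
tokens = ["The", "staff", "was", "really", "friendly"]
labels = ["O", "B-AS", "O", "O", "B-OP"]
aug_tokens, aug_labels = augment(tokens, labels, random.Random(0))
```

In this sketch only "was" or "really" can be replaced, because "staff" and "friendly" carry aspect and opinion tags. Restricting edits to untagged tokens is one simple way to keep the generated labels trustworthy.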

research
07/24/2023

Opinion Mining Using Population-tuned Generative Language Models

We present a novel method for mining opinions from text collections usin...
research
04/25/2020

MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification

This paper presents MixText, a semi-supervised learning method for text ...
research
02/01/2022

A Semi-Supervised Deep Clustering Pipeline for Mining Intentions From Texts

Mining the latent intentions from large volumes of natural language inpu...
research
10/25/2022

Semi-Supervised Learning Based on Reference Model for Low-resource TTS

Most previous neural text-to-speech (TTS) methods are mainly based on su...
research
05/17/2023

CLIP-GCD: Simple Language Guided Generalized Category Discovery

Generalized Category Discovery (GCD) requires a model to both classify k...
research
05/09/2022

Few-shot Mining of Naturally Occurring Inputs and Outputs

Creating labeled natural language training data is expensive and require...
research
05/19/2023

Self-Agreement: A Framework for Fine-tuning Language Models to Find Agreement among Diverse Opinions

Finding an agreement among diverse opinions is a challenging topic in mu...