A Pitfall of Learning from User-generated Data: In-depth Analysis of Subjective Class Problem

03/24/2020
by Kei Nemoto, et al.

Research on supervised learning algorithms implicitly assumes that training data is labeled by domain experts, or at least by semi-professional labelers accessible through crowdsourcing services such as Amazon Mechanical Turk. With the advent of the Internet, data has become abundant, and a large number of machine-learning-based systems are now trained on user-generated data, using categorical data as true labels. However, little work has been done on supervised learning with user-defined labels, where users are not necessarily experts and might be motivated to provide incorrect labels in order to improve their own utility from the system. In this article, we propose two types of classes in user-defined labels, subjective classes and objective classes, and show that objective classes are as reliable as if they were provided by domain experts, whereas subjective classes are subject to bias and manipulation by the user. We define this as the subjective class issue and provide a framework for detecting subjective labels in a dataset without querying an oracle. Using this framework, data mining practitioners can detect a subjective class at an early stage of their projects and avoid wasting time and resources by tackling the subjective class problem with traditional machine learning techniques.

