Leveraging Distributional Semantics for Multi-Label Learning

09/18/2017
by Rahul Wadbude, et al.

We present a novel and scalable label embedding framework for large-scale multi-label learning, called ExMLDS (Extreme Multi-Label Learning using Distributional Semantics). Our approach draws on ideas from distributional semantics, specifically the Skip Gram Negative Sampling (SGNS) approach, which is widely used to learn word embeddings for natural language processing tasks. Learning such embeddings can be reduced to a certain matrix factorization. Our approach is novel in that it highlights connections between label embedding methods used for multi-label learning and paragraph/document embedding methods commonly used for learning representations of text data. The framework can also be easily extended to incorporate auxiliary information such as label-label correlations, which is crucial when many labels are missing from the training data. We demonstrate the effectiveness of our approach through an extensive set of experiments on a variety of benchmark datasets, and show that the proposed learning methods perform favorably compared to several baselines and state-of-the-art methods for large-scale multi-label learning. To facilitate end-to-end learning, we develop a joint learning algorithm that learns the embeddings together with a regression model that predicts these embeddings from input features, via efficient gradient-based methods.
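
The reduction of SGNS-style embedding learning to a matrix factorization can be illustrated with a small sketch. It relies on the known result that SGNS implicitly factorizes a shifted pointwise mutual information (PMI) matrix, and it is not necessarily the exact objective used in the paper. The function name `sppmi_embeddings`, the parameters `Y`, `dim`, and `k`, and the choice of `Y @ Y.T` (number of shared labels) as the co-occurrence matrix are all assumptions made for illustration.

```python
import numpy as np

def sppmi_embeddings(Y, dim=2, k=1.0):
    """Hypothetical helper: embed the rows (instances) of a binary label
    matrix Y by factorizing a shifted positive PMI (SPPMI) matrix."""
    Y = np.asarray(Y, dtype=float)
    # Assumed co-occurrence: number of labels shared by each pair of instances.
    C = Y @ Y.T
    total = C.sum()
    row = C.sum(axis=1, keepdims=True)
    col = C.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(C * total / (row * col))
    # Pairs that never co-occur get -inf so they are clipped to zero below.
    pmi = np.where(C > 0, pmi, -np.inf)
    # Shift by log(k), where k plays the role of the number of negative
    # samples in SGNS, and keep only the positive part.
    sppmi = np.maximum(pmi - np.log(k), 0.0)
    # A truncated SVD gives a rank-`dim` factorization of the SPPMI matrix.
    U, S, _ = np.linalg.svd(sppmi)
    return U[:, :dim] * np.sqrt(S[:dim])

# Toy data (made up): instances 0 and 1 share one label set, 2 and 3 another.
Y = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
E = sppmi_embeddings(Y, dim=2)
```

In this toy example, instances with identical label sets receive identical embeddings, while instances with disjoint label sets are mapped to different points. A regression model from input features to the rows of `E` would be the natural next step, which the abstract describes as being learned jointly.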


