LSHTC: A Benchmark for Large-Scale Text Classification

03/30/2015
by Ioannis Partalas, et al.

LSHTC is a series of challenges that aims to assess the performance of classification systems on large-scale classification with a large number of classes (up to hundreds of thousands). This paper describes the datasets that have been released throughout the LSHTC series. It details how the datasets were constructed, how the tracks were designed, and which evaluation measures we implemented, and it gives a quick overview of the results. All of these datasets are available online, and runs may still be submitted to the online server of the challenges.

research
01/13/2022

Datasheet for the Pile

This datasheet describes the Pile, a 825 GiB dataset of human-authored t...
research
06/21/2016

An empirical study on large scale text classification with skip-gram embeddings

We investigate the integration of word embeddings as classification feat...
research
05/05/2021

How do Voices from Past Speech Synthesis Challenges Compare Today?

Shared challenges provide a venue for comparing systems trained on commo...
research
05/04/2022

Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification

In this paper, we ask the research question of whether all the datasets ...
research
09/20/2023

A Large-scale Dataset for Audio-Language Representation Learning

The AI community has made significant strides in developing powerful fou...
research
02/01/2021

Search-Based Software Re-Modularization: A Case Study at Adyen

Deciding what constitutes a single module, what classes belong to which ...
research
07/09/2020

Pollen13K: A Large Scale Microscope Pollen Grain Image Dataset

Pollen grain classification has a remarkable role in many fields from me...
