UnibucKernel: Geolocating Swiss German Jodels Using Ensemble Learning

02/18/2021
by Mihaela Gaman, et al.

In this work, we describe our approach to the Social Media Variety Geolocation task featured in the 2021 VarDial Evaluation Campaign. We focus on the second subtask, which is based on a data set of approximately 30 thousand Swiss German Jodels. The dialect identification task is to accurately predict the latitude and longitude of test samples. We frame the task as a double regression problem, employing an XGBoost meta-learner that combines a variety of machine learning approaches to predict both latitude and longitude. The models included in our ensemble range from simple regression techniques, such as Support Vector Regression, to deep neural models, such as a hybrid neural network and a neural transformer. To minimize the prediction error, we approach the problem from a few different perspectives and consider various types of features, from low-level character n-grams to high-level BERT embeddings. The XGBoost ensemble resulting from combining these methods achieves a median distance of 23.6 km on the test data, which places us third in the ranking, at a difference of 6.05 km and 2.9 km from the first-place and second-place submissions, respectively.
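To make the reported metric concrete, the following is a minimal sketch of how a median distance between predicted and true coordinates could be computed. It assumes the distance is the great-circle (haversine) distance on a sphere with a mean Earth radius of 6371 km; the abstract does not state the exact formula, and the names `haversine_km` and `median_distance_km` are hypothetical helpers introduced only for illustration.

```python
import math
from statistics import median

# Assumption of this sketch: mean Earth radius in kilometres.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_distance_km(predicted, actual):
    """Median haversine distance over paired (lat, lon) tuples.

    `predicted` would hold the two regression outputs (latitude and
    longitude) for each test sample; `actual` holds the true coordinates.
    """
    distances = [
        haversine_km(p_lat, p_lon, a_lat, a_lon)
        for (p_lat, p_lon), (a_lat, a_lon) in zip(predicted, actual)
    ]
    return median(distances)
```

Under this reading, each sample contributes one distance, so the two regression outputs are scored jointly rather than per coordinate, and the median makes the score robust to a few badly misplaced predictions.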
