MQDD: Pre-training of Multimodal Question Duplicity Detection for Software Engineering Domain

03/26/2022
by   Jan Pašek, et al.

This work proposes a new pipeline that leverages data collected from the Stack Overflow website to pre-train a multimodal model for searching for duplicates on question answering websites. Our multimodal model is trained on question descriptions and source code in multiple programming languages. We design two new learning objectives to improve duplicate detection capabilities. The result of this work is a mature, fine-tuned Multimodal Question Duplicity Detection (MQDD) model, ready to be integrated into a Stack Overflow search system, where it can help users find answers to questions that have already been answered. Alongside the MQDD model, we release two datasets related to the software engineering domain. The first, the Stack Overflow Dataset (SOD), is a massive corpus of paired questions and answers. The second, the Stack Overflow Duplicity Dataset (SODD), contains data for training duplicate detection models.


research
03/19/2021

Attention-based model for predicting question relatedness on Stack Overflow

Stack Overflow is one of the most popular Programming Community-based Qu...
research
11/02/2019

How to Pre-Train Your Model? Comparison of Different Pre-Training Models for Biomedical Question Answering

Using deep learning models on small scale datasets would result in overf...
research
06/20/2023

Software Engineers' Questions and Answers on Stack Exchange

There exists a large number of research works analyzing questions and an...
research
07/12/2017

Quasar: Datasets for Question Answering by Search and Reading

We present two new large-scale datasets aimed at evaluating systems desi...
research
08/04/2023

Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions

Over the last decade, Q&A platforms have played a crucial role in how ...
research
04/17/2020

An Annotated Dataset of Stack Overflow Post Edits

To improve software engineering, software repositories have been mined f...
research
06/11/2019

Contextual Documentation Referencing on Stack Overflow

Software engineering is knowledge-intensive and requires software develo...
