Diverse Title Generation for Stack Overflow Posts with Multiple Sampling Enhanced Transformer

08/24/2022
by Fengji Zhang, et al.

Stack Overflow is one of the most popular programming communities, where developers seek help with the problems they encounter. However, inexperienced developers who fail to describe their problems clearly find it hard to attract sufficient attention and get the answers they anticipate. We propose M_3NSCT5, a novel approach that automatically generates multiple post titles from given code snippets. Developers may use the generated titles to find closely related posts and to complete their problem descriptions. M_3NSCT5 employs CodeT5 as its backbone, a pre-trained Transformer model with excellent language understanding and generation ability. The same code snippet can be aligned with different titles under varying contexts; to alleviate this ambiguity, we propose the maximal marginal multiple nucleus sampling strategy, which generates multiple high-quality and diverse title candidates at a time for developers to choose from. To validate the effectiveness of M_3NSCT5, we build a large-scale dataset of 890,000 question posts covering eight programming languages. Automatic evaluation on the BLEU and ROUGE metrics demonstrates the superiority of M_3NSCT5 over six state-of-the-art baseline models. A human evaluation with trustworthy results further demonstrates the great potential of our approach for real-world application.
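
The abstract names the maximal marginal multiple nucleus sampling strategy but does not spell out the algorithm, so the following is only a rough sketch of the two general ideas the name suggests: nucleus (top-p) sampling to draw candidates, and a greedy maximal-marginal-relevance style selection to keep candidates that are both high-scoring and dissimilar from each other. The function names, the Jaccard word-overlap similarity, the `lam` trade-off weight, and the toy scores are all assumptions made for illustration and are not taken from the paper.

```python
import random


def nucleus_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches top_p, then renormalize (standard top-p filtering)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total >= top_p:
            break
    return {token: p / total for token, p in kept}


def sample_token(probs, top_p, rng):
    """Draw one token from the renormalized nucleus."""
    filtered = nucleus_filter(probs, top_p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def jaccard(a, b):
    """Word-level Jaccard similarity (an assumed stand-in similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def mmr_select(candidates, scores, k, lam=0.5):
    """Greedily pick k candidates, trading off each candidate's score
    against its maximum similarity to the candidates already picked."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr(i):
            redundancy = max(
                (jaccard(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]


if __name__ == "__main__":
    rng = random.Random(0)
    next_token_probs = {"sort": 0.5, "reverse": 0.3, "parse": 0.2}  # toy values
    print(sample_token(next_token_probs, top_p=0.7, rng=rng))

    titles = [
        "how to sort a list in python",
        "how to sort a list in python 3",
        "reverse a string in java",
    ]
    title_scores = [0.9, 0.85, 0.6]  # hypothetical model scores
    print(mmr_select(titles, title_scores, k=2))
```

In this toy run the near-duplicate second title is skipped in favor of the lower-scoring but dissimilar third one, which is the kind of diversity the abstract describes as desirable when one code snippet admits several plausible titles.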


research
02/20/2022

SOTitle: A Transformer-based Post Title Generation Approach for Stack Overflow

On Stack Overflow, developers can not only browse question posts to solv...
research
10/28/2022

I Know What You Are Searching For: Code Snippet Recommendation from Stack Overflow Posts

Stack Overflow has been heavily used by software developers to seek prog...
research
05/20/2020

Generating Question Titles for Stack Overflow from Mined Code Snippets

Stack Overflow has been heavily used by software developers as a popular...
research
02/01/2021

Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow

As a popular Q&A site for programming, Stack Overflow is a treasure fo...
research
09/27/2021

Improving Stack Overflow question title generation with copying enhanced CodeBERT model and bi-modal information

Context: Stack Overflow is very helpful for software developers who are ...
research
04/22/2023

An Empirical Study on Using Large Language Models for Multi-Intent Comment Generation

Code comment generation aims at generating natural language descriptions...
research
05/18/2019

Microblog Hashtag Generation via Encoding Conversation Contexts

Automatic hashtag annotation plays an important role in content understa...
