Learning to Answer Semantic Queries over Code

by Surya Prakash Sahu, et al.

During software development, developers need answers to queries about semantic aspects of code. Although neural approaches to extractive question answering have been studied widely for natural language, the problem of answering semantic queries over code with neural networks has not yet been explored. This is mainly because no existing dataset provides extractive question-answer pairs over code that involve complex concepts and long chains of reasoning. We bridge this gap by building a new, curated dataset called CodeQueries and proposing a neural question-answering methodology over code. We build on state-of-the-art pre-trained models of code to predict answer spans and supporting-fact spans. Given a query and a piece of code, only part of the code may be relevant to answering the query. We first experiment in an ideal setting where only the relevant code is given to the model, and show that our models do well. We then experiment under three pragmatic considerations: (1) scaling to large code, (2) learning from a limited number of examples, and (3) robustness to minor syntax errors in code. Our results show that while a neural model can be resilient to minor syntax errors in code, increasing code size, the presence of code that is not relevant to the query, and a reduced number of training examples all limit model performance. We are releasing our data and models to facilitate future work on the proposed problem of answering semantic queries over code.
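The abstract says the models predict answer and supporting-fact spans over code. One common way to obtain spans from a pre-trained encoder is token classification with B/I/O labels (Begin, Inside, Outside), followed by grouping the labels into contiguous spans. The sketch below assumes that formulation; the function name `decode_bio_spans` and the whitespace-joined token output are hypothetical illustrations, not taken from the paper or its released code.

```python
from typing import List, Tuple


def decode_bio_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group per-token B/I/O labels into (start, end_exclusive, text) spans.

    Hypothetical helper: assumes one tag per token. An "I" tag that does not
    continue an open span is treated as the start of a new span.
    """
    spans = []
    start = None  # index where the currently open span began, if any
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # Close any open span, then open a new one at this token.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":
            # An "O" token ends the open span, if there is one.
            if start is not None:
                spans.append((start, i))
                start = None
        # Otherwise the tag is "I" inside an open span: keep extending it.
    if start is not None:
        spans.append((start, len(tags)))
    return [(s, e, " ".join(tokens[s:e])) for s, e in spans]


if __name__ == "__main__":
    toks = ["if", "x", "==", "None", ":", "pass"]
    print(decode_bio_spans(toks, ["O", "B", "I", "I", "O", "O"]))
```

Under this assumption, the same decoder would simply be run twice, once on the answer-tag sequence and once on the supporting-fact-tag sequence.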


CS1QA: A Dataset for Assisting Code-based Question Answering in an Introductory Programming Course

We introduce CS1QA, a dataset for code-based question answering in the p...

CodeQA: A Question Answering Dataset for Source Code Comprehension

We propose CodeQA, a free-form question answering dataset for the purpos...

Generative Long-form Question Answering: Relevance, Faithfulness and Succinctness

In this thesis, we investigated the relevance, faithfulness, and succinc...

Automated Crossword Solving

We present the Berkeley Crossword Solver, a state-of-the-art approach fo...

NS3: Neuro-Symbolic Semantic Code Search

Semantic code search is the task of retrieving a code snippet given a te...

Answering Count Queries with Explanatory Evidence

A challenging case in web search and question answering are count querie...

PUG: A Framework and Practical Implementation for Why & Why-Not Provenance (extended version)

Explaining why an answer is (or is not) returned by a query is important...
