Large-scale information retrieval in software engineering – an experience report from industrial application

08/22/2023
by Michael Unterkalmsteiner, et al.

Software engineering activities are information-intensive. Research proposes Information Retrieval (IR) techniques to support engineers in their daily tasks, such as establishing and maintaining traceability links, fault identification, and software maintenance. We describe an engineering task, test case selection, and illustrate our problem analysis and solution discovery process. The objective of the study is to understand to what extent IR techniques (one potential solution) can be applied to test case selection and provide decision support in a large-scale, industrial setting. We analyze, in the context of the studied company, how test case selection is performed, and design a series of experiments evaluating the performance of different IR techniques. Each experiment provides lessons learned from implementation, execution, and results, which feed into its successor. The three experiments led to the following observations: 1) there is a lack of research on scalable parameter optimization of IR techniques for software engineering problems; 2) scaling IR techniques to industry data is challenging, in particular for latent semantic analysis; 3) the IR context poses constraints on the empirical evaluation of IR techniques, requiring more research on developing valid statistical approaches. We believe that our experiences in conducting a series of IR experiments with industry-grade data are valuable for peer researchers, helping them avoid the pitfalls we encountered. Furthermore, we identified challenges that need to be addressed in order to bridge the gap between laboratory IR experiments and real applications of IR in industry.
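The abstract does not say which IR techniques were compared. As a minimal sketch of the general idea only, assuming a plain TF-IDF vector space model (not necessarily one used in the study), the snippet below ranks test cases by cosine similarity between their descriptions and a textual change description. The test case identifiers, texts, the query, and the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_test_cases`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Compute sparse TF-IDF vectors for tokenized documents.

    Uses raw term frequency and a smoothed IDF: log((1 + N) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_test_cases(query: str, test_cases: dict[str, str]) -> list[tuple[str, float]]:
    """Rank test cases by similarity to the query, most similar first."""
    names = list(test_cases)
    docs = [tokenize(test_cases[name]) for name in names]
    vectors, idf = tfidf_vectors(docs)
    # Weight the query with the corpus IDF; terms unseen in the corpus are dropped.
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(name, cosine(q_vec, vec)) for name, vec in zip(names, vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical test suite and change description.
    suite = {
        "TC-1": "verify login with valid password",
        "TC-2": "check report export to pdf",
        "TC-3": "verify password reset email is sent",
    }
    for name, score in rank_test_cases("password validation changed in login form", suite):
        print(f"{name}: {score:.3f}")
```

Here TC-1 shares both "login" and "password" with the query and ranks first, while TC-2 shares no terms and scores 0. Latent semantic analysis would additionally project such term-document vectors onto a lower-rank space via a truncated singular value decomposition of the term-document matrix, which is one reason it becomes costly on large industrial corpora.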

research
04/10/2018

Protocol and Tools for Conducting Agile Software Engineering Research in an Industrial-Academic Setting: A Preliminary Study

Conducting empirical research in software engineering industry is a proc...
research
06/13/2019

An IR-based Approach Towards Automated Integration of Geo-spatial Datasets in Map-based Software Systems

Data is arguably the most valuable asset of the modern world. In this er...
research
03/26/2022

Tutorial: Modern Theoretical Tools for Understanding and Designing Next-generation Information Retrieval System

In the relatively short history of machine learning, the subtle balance ...
research
04/18/2023

BIM-GPT: a Prompt-Based Virtual Assistant Framework for BIM Information Retrieval

Efficient information retrieval (IR) from building information models (B...
research
12/03/2020

Nonintrusive reduced order model for parametric solutions of inertia relief problems

The Inertia Relief (IR) technique is widely used by industry and produce...
research
06/09/2022

When Traceability Goes Awry: an Industrial Experience Report

The concept of traceability between artifacts is considered an enabler f...
research
05/30/2023

The Information Retrieval Experiment Platform

We integrate ir_datasets, ir_measures, and PyTerrier with TIRA in the In...