Making Sense of Failure Logs in an Industrial DevOps Environment

01/09/2023
by Muhammad Abbas, et al.

Processing and reviewing nightly test execution failure logs for large industrial systems is a tedious activity. Moreover, multiple failures in a test execution session might share a single root or common cause, so reviewing each log individually can involve redundant effort. This paper presents LogGrouper, an approach that automatically groups failure logs to aid root/common cause analysis and to enable each log group to be processed as a batch. LogGrouper uses state-of-the-art natural language processing and clustering approaches to achieve meaningful log grouping. The approach is evaluated in an industrial setting, both qualitatively and quantitatively. The quantitative results show that LogGrouper produces good-quality groupings according to two clustering quality metrics, the Silhouette Coefficient and the Calinski-Harabasz Index. The qualitative evaluation shows that experts perceive the groups as useful and see them as an initial pointer for root cause analysis and failure assignment.
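The abstract does not name the specific text representation or clustering algorithm that LogGrouper uses, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a plain bag-of-words representation with numbers masked, cosine distance, and a simple leader-style grouping with a hypothetical `threshold` parameter; all function names and the sample log lines are illustrative. The Silhouette Coefficient is computed directly from its definition; the Calinski-Harabasz Index is left out for brevity.

```python
import math
import re
from collections import Counter


def vectorize(log):
    """Bag-of-words counts; numbers are masked so IDs and timings do not split groups."""
    tokens = re.findall(r"[a-z]+|\d+", log.lower())
    return Counter("<num>" if t.isdigit() else t for t in tokens)


def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def group_logs(logs, threshold=0.5):
    """Leader-style grouping (an assumption): join the first group whose
    leader is within `threshold`, otherwise start a new group."""
    vectors = [vectorize(log) for log in logs]
    leaders, labels = [], []
    for v in vectors:
        for gid, leader in enumerate(leaders):
            if cosine_distance(v, leader) <= threshold:
                labels.append(gid)
                break
        else:
            leaders.append(v)
            labels.append(len(leaders) - 1)
    return vectors, labels


def silhouette(vectors, labels):
    """Mean silhouette coefficient; members of singleton groups score 0."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two groups")
    scores = []
    for i, vi in enumerate(vectors):
        dists = {}
        for j, vj in enumerate(vectors):
            if i != j:
                dists.setdefault(labels[j], []).append(cosine_distance(vi, vj))
        own = dists.pop(labels[i], None)
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                            # mean intra-group distance
        b = min(sum(d) / len(d) for d in dists.values())   # nearest other group
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical failure log lines, invented for illustration only.
logs = [
    "Timeout after 30 s waiting for brake controller response",
    "Timeout after 45 s waiting for brake controller response",
    "Assertion failed: expected door state CLOSED but was OPEN",
    "Assertion failed: expected door state OPEN but was CLOSED",
]
vectors, labels = group_logs(logs)
score = silhouette(vectors, labels)
print(labels, score)
```

A silhouette value near 1 indicates that logs sit close to their own group and far from the others, which is the sense in which the metric measures grouping quality. In practice, a library implementation such as scikit-learn's would likely replace the hand-written metric.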


