Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories

03/24/2021
by   Daan Hommersom, et al.
12

The lack of comprehensive sources of accurate vulnerability data represents a critical obstacle to studying and understanding software vulnerabilities (and their corrections). In this paper, we present an approach that combines heuristics stemming from practical experience and machine-learning (ML) - specifically, natural language processing (NLP) - to address this problem. Our method consists of three phases. First, an advisory record containing key information about a vulnerability is extracted from an advisory (expressed in natural language). Second, using heuristics, a subset of candidate fix commits is obtained from the source code repository of the affected project by filtering out commits that are known to be irrelevant for the task at hand. Finally, for each such candidate commit, our method builds a numerical feature vector reflecting the characteristics of the commit that are relevant to predicting its match with the advisory at hand. The feature vectors are then exploited for building a final ranked list of candidate fixing commits. The score attributed by the ML model to each feature is kept visible to the users, allowing them to interpret of the predictions. We evaluated our approach using a prototype implementation named Prospector on a manually curated data set that comprises 2,391 known fix commits corresponding to 1,248 public vulnerability advisories. When considering the top-10 commits in the ranked results, our implementation could successfully identify at least one fix commit for up to 84.03 a fix commit on the first position for 65.06 conclusion, our method reduces considerably the effort needed to search OSS repositories for the commits that fix known vulnerabilities.

READ FULL TEXT
research
07/06/2018

A Practical Approach to the Automatic Classification of Security-Relevant Commits

The lack of reliable sources of detailed information on the vulnerabilit...
research
05/07/2021

Detecting Security Fixes in Open-Source Repositories using Static Code Analyzers

The sources of reliable, code-level information about vulnerabilities th...
research
03/17/2018

Cost-aware Vulnerability Prediction: the HARMLESS Approach

Society needs more secure software. But predicting vulnerabilities is di...
research
03/06/2022

Vulnerability Detection in Open Source Software: An Introduction

This paper is an introductory discussion on the cause of open source sof...
research
11/10/2022

Semantic Learning and Emulation Based Cross-platform Binary Vulnerability Seeker

Clone detection is widely exploited for software vulnerability search. T...
research
03/21/2021

Automated Software Vulnerability Assessment with Concept Drift

Software Engineering researchers are increasingly using Natural Language...
research
11/16/2021

CVSS-BERT: Explainable Natural Language Processing to Determine the Severity of a Computer Security Vulnerability from its Description

When a new computer security vulnerability is publicly disclosed, only a...

Please sign up or login with your details

Forgot password? Click here to reset