VulLibGen: Identifying Vulnerable Third-Party Libraries via Generative Pre-Trained Model

by Tianyu Chen, et al.

To avoid potential risks posed by vulnerabilities in third-party libraries, security researchers maintain vulnerability databases (e.g., NVD) containing vulnerability reports, each of which records the description of a vulnerability and the name list of libraries affected by the vulnerability (a.k.a. vulnerable libraries). However, recent studies on about 200,000 vulnerability reports in NVD show that 53.3% of these reports do not include the name list of vulnerable libraries, and 59.82% of the included name lists of vulnerable libraries are incomplete or incorrect. To address the preceding issue, in this paper, we propose the first generative approach, named VulLibGen, to generate the name list of vulnerable libraries (out of all the existing libraries) for a given vulnerability by utilizing recent advances in Large Language Models (LLMs), in order to achieve high accuracy. VulLibGen takes only the description of a vulnerability as input and achieves high identification accuracy based on LLMs' prior knowledge of all the existing libraries. VulLibGen also includes an input augmentation technique to help identify zero-shot vulnerable libraries (those not occurring during training) and a post-processing technique to help address VulLibGen's hallucinations. We evaluate VulLibGen against three state-of-the-art/practice approaches (LightXML, Chronos, and VulLibMiner) that identify vulnerable libraries on an open-source dataset (VulLib). Our evaluation results show that VulLibGen can accurately identify vulnerable libraries with an average F1 score of 0.626, while the state-of-the-art/practice approaches achieve only 0.561. The post-processing technique helps VulLibGen achieve an average improvement of F1@1 by 9.3%, while the input augmentation technique helps VulLibGen achieve an average improvement of F1@1 by 39% in identifying zero-shot libraries.
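The abstract does not say how the post-processing step works. As a rough illustration of the general idea (mapping a possibly hallucinated generated name back onto a library that actually exists), the following minimal sketch assumes a nearest-name match by string similarity. The catalog `KNOWN_LIBRARIES` and the helper `snap_to_known_library` are hypothetical names introduced here for illustration; they are not taken from the paper.

```python
from difflib import SequenceMatcher

# Hypothetical catalog of existing library names (illustrative only;
# a real catalog would cover all libraries in the ecosystem).
KNOWN_LIBRARIES = [
    "org.apache.logging.log4j:log4j-core",
    "org.apache.struts:struts2-core",
    "com.fasterxml.jackson.core:jackson-databind",
]


def snap_to_known_library(generated, known=KNOWN_LIBRARIES):
    """Return the known library name most similar to the generated one.

    Assumption: similarity is measured with difflib's ratio; the paper's
    actual post-processing may use a different matching strategy.
    """
    if generated in known:
        # The generated name already exists, so nothing needs correcting.
        return generated
    # Otherwise pick the existing name with the highest similarity score.
    return max(
        known,
        key=lambda name: SequenceMatcher(None, generated, name).ratio(),
    )
```

With this sketch, a slightly malformed output such as `org.apache.logging.log4j:log4j-cor` is mapped to the closest catalog entry, so the final answer is always the name of an existing library.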




