DeepVulSeeker: A Novel Vulnerability Identification Framework via Code Graph Structure and Pre-training Mechanism
Software vulnerabilities can cause severe harm to a computing system. They can lead to system crashes, privacy leakage, or even physical damage. Correctly identifying vulnerabilities in large volumes of software code in a timely manner remains an essential prerequisite for patching them. Unfortunately, current vulnerability identification methods, both classic and deep-learning-based, have several critical drawbacks that prevent them from meeting the present-day demands of the software industry. To overcome these drawbacks, in this paper we propose DeepVulSeeker, a novel, fully automated vulnerability identification framework that leverages both code graph structures and semantic features with the help of recently developed Graph Representation Self-Attention and pre-training mechanisms. Our experiments show that DeepVulSeeker not only reaches an accuracy as high as 0.99 on traditional CWE datasets, but also outperforms all other existing methods on two highly complex datasets. We also evaluated DeepVulSeeker through three case studies and found that it is able to understand the implications of the vulnerabilities. We have fully implemented DeepVulSeeker and open-sourced it for follow-up research.