Code Compliance Assessment as a Learning Problem

09/10/2022
by Neela Sawant, et al.

Manual code reviews and static code analyzers are the traditional mechanisms for verifying whether source code complies with coding policies. However, these mechanisms are hard to scale. We formulate code compliance assessment as a machine learning (ML) problem that takes as input a natural language policy and code and predicts whether the code is compliant, non-compliant, or irrelevant to the policy. This can help scale compliance classification and search for policies that traditional mechanisms do not cover. We explore key research questions on ML model formulation, training data, and evaluation setup. The core idea is to obtain a joint code-text embedding space that preserves compliance relationships through the vector distance between code and policy embeddings. As there is no task-specific data, we re-interpret and filter commonly available software datasets, and add pre-training and pre-finetuning tasks that reduce the semantic gap. We benchmarked our approach on two listings of coding policies (CWE and CBP). This is a zero-shot evaluation, as none of the policies occur in the training set. On CWE and CBP respectively, our tool Policy2Code achieves classification accuracies of (59 … (0.05, 0.21), compared to CodeBERT with classification accuracies of (37 … and MRR of (0.02, 0.02). In a user study, 24 … accepted compared to 7 …
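The abstract evaluates search in the joint embedding space with MRR (mean reciprocal rank). A minimal sketch of that idea follows, assuming code and policy embeddings have already been computed and that cosine similarity stands in for the "vector distance" (the paper's exact distance is not stated here). For each policy, candidate code snippets are ranked by similarity, and the reciprocal rank of the first relevant snippet is averaged over policies. The function names and the toy 2-D vectors are hypothetical illustrations, not Policy2Code's API or data.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_snippets(policy_vec: Sequence[float],
                  snippet_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Hypothetical helper: snippet ids sorted from most to least similar to the policy."""
    return sorted(snippet_vecs,
                  key=lambda sid: cosine_similarity(policy_vec, snippet_vecs[sid]),
                  reverse=True)

def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    """1 / (1-based position of the first relevant id), or 0.0 if none is retrieved."""
    for position, sid in enumerate(ranking, start=1):
        if sid in relevant:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(queries: List[Tuple[Sequence[float], Set[str]]],
                         snippet_vecs: Dict[str, Sequence[float]]) -> float:
    """Average reciprocal rank over (policy embedding, relevant snippet ids) queries."""
    if not queries:
        return 0.0
    total = sum(reciprocal_rank(rank_snippets(vec, snippet_vecs), rel)
                for vec, rel in queries)
    return total / len(queries)

if __name__ == "__main__":
    # Made-up toy embeddings, for illustration only.
    snippets = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.7, 0.7]}
    queries = [([1.0, 0.0], {"c"}),  # relevant snippet ranked 2nd -> 0.5
               ([0.0, 1.0], {"b"})]  # relevant snippet ranked 1st -> 1.0
    print(mean_reciprocal_rank(queries, snippets))  # 0.75
```

Because MRR only rewards the position of the first relevant hit, an MRR near 0.02 means relevant code typically appears around rank 50 or lower, while 0.21 corresponds to roughly rank 5 on average.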



research
10/18/2017

An End-To-End Machine Learning Pipeline That Ensures Fairness Policies

In consequential real-world applications, machine learning (ML) based sy...
research
06/15/2022

NatGen: Generative pre-training by "Naturalizing" source code

Pre-trained Generative Language models (e.g. PLBART, CodeT5, SPT-Code) f...
research
09/22/2018

The Privacy Policy Landscape After the GDPR

Every new privacy regulation brings along the question of whether it res...
research
08/28/2022

Measuring design compliance using neural language models – an automotive case study

As the modern vehicle becomes more software-defined, it is beginning to ...
research
08/19/2021

Checking Security Compliance between Models and Code

The verification that planned security mechanisms are actually implement...
research
01/24/2022

Text and Code Embeddings by Contrastive Pre-Training

Text embeddings are useful features in many applications such as semanti...
research
09/08/2021

Cross-Policy Compliance Detection via Question Answering

Policy compliance detection is the task of ensuring that a scenario conf...
