Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators

12/12/2022
by Cristina Improta, et al.

AI-based code generators are an emerging solution for automatically writing programs from natural language descriptions, using deep neural networks for Neural Machine Translation (NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice relies on automatic metrics, which compute the textual similarity of generated code against ground-truth references. However, it is not clear which metrics to use and which are most suitable for specific contexts. This practical experience report analyzes a large set of output similarity metrics on offensive code generators. We apply the metrics to two state-of-the-art NMT models, using two datasets containing offensive assembly and Python code paired with English-language descriptions. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations.
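To make the "output similarity" idea concrete, the following is a minimal sketch of one such metric: a sentence-level BLEU score comparing a generated code snippet against a ground-truth reference, token by token. This is an illustrative stand-in, not the paper's actual evaluation pipeline; the smoothing scheme and tokenization here are simplifying assumptions (real evaluations typically use established implementations such as NLTK or SacreBLEU).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU in [0, 1].

    Uniform n-gram weights, add-one smoothing, and the standard brevity
    penalty. A sketch of the kind of output-similarity metric studied in
    the paper, not a drop-in replacement for a standard implementation.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clipped n-gram overlap: each candidate n-gram counts at most
        # as often as it appears in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(1, sum(cand.values()))
        # Add-one smoothing so a single empty n-gram order does not
        # zero out the whole geometric mean.
        precisions.append((overlap + 1) / (total + 1))
    log_prec = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty discourages overly short candidates.
    bp = (1.0 if len(candidate) >= len(reference)
          else math.exp(1 - len(reference) / max(1, len(candidate))))
    return bp * math.exp(log_prec)

# Hypothetical example: an assembly snippet and its reference.
ref = "xor eax , eax".split()
gen = "xor ebx , ebx".split()
print(round(bleu(ref, ref), 2))  # identical code scores 1.0
print(bleu(gen, ref) < 1.0)      # a near-miss scores strictly lower
```

Note that a metric like this rewards surface-level token overlap: two functionally equivalent exploits written differently can score low, while syntactically similar but non-working code can score high, which is exactly the kind of mismatch with human judgment the paper investigates.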


Related research

02/08/2022 · Can We Generate Shellcodes via Natural Language? An Empirical Study
Writing software exploits is an important practice for offensive securit...

04/27/2021 · Shellcode_IA32: A Dataset for Automatic Shellcode Generation
We take the first step to address the task of automatically generating s...

06/08/2023 · Enhancing Robustness of AI Offensive Code Generators via Data Augmentation
In this work, we present a method to add perturbations to the code descr...

09/01/2021 · EVIL: Exploiting Software via Natural Language
Writing exploits for security assessment is a challenging task. The writ...

02/15/2023 · Studying the effect of AI Code Generators on Supporting Novice Learners in Introductory Programming
AI code generators like OpenAI Codex have the potential to assist novice...

06/15/2021 · Code to Comment Translation: A Comparative Study on Model Effectiveness Errors
Automated source code summarization is a popular software engineering re...

03/06/2023 · xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
The ability to solve problems is a hallmark of intelligence and has been...
