Security Implications of Large Language Model Code Assistants: A User Study

08/20/2022
by Gustavo Sandoval, et al.

Advances in deep learning have led to the emergence of Large Language Models (LLMs) such as OpenAI Codex, which powers GitHub Copilot. LLMs have been fine-tuned and packaged so that programmers can use them in an Integrated Development Environment (IDE) to write code. An emerging line of work assesses the quality of code written with the help of these LLMs, with security studies warning that LLMs have no fundamental understanding of the code they write and are therefore more likely to make mistakes that may be exploitable. We thus conducted a user study (N=58) to assess the security of code written by student programmers when guided by LLMs. Half of the students in our study had the help of the LLM and the other half did not. The students were asked to write code in C that performed operations over a singly linked list, including node operations such as inserting, updating, removing, and combining, among others. Although the students who had the help of an LLM were more likely to write functional code, we observed no generalizable impact on security; the security impacts were localized to individual functions. We also investigate systematic stylistic differences between unaided and LLM-assisted code, finding that LLM code is more repetitive. This repetition may amplify the effect of any vulnerable code that is repeated, and it also has implications for source code attribution.
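To make the task concrete, the following is a minimal sketch of two singly linked list operations in C, with comments marking where memory-safety mistakes typically arise in code of this kind. The node layout, the field names, the constant `ITEM_NAME_LEN`, and the functions `list_insert_head`, `list_remove`, and `list_length` are all hypothetical; the abstract does not give the study's actual data structure or function signatures.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout; not taken from the study. */
#define ITEM_NAME_LEN 32

typedef struct node {
    char name[ITEM_NAME_LEN];
    int quantity;
    struct node *next;
} node;

/* Insert a new node at the head of the list.
 * Returns 0 on success, -1 on invalid arguments or allocation failure. */
int list_insert_head(node **head, const char *name, int quantity)
{
    if (head == NULL || name == NULL)
        return -1;

    node *n = malloc(sizeof *n);
    if (n == NULL)              /* skipping this check risks a NULL dereference */
        return -1;

    /* Bounded copy with explicit termination; an unbounded strcpy here
     * would overflow the fixed-size buffer on long input. */
    strncpy(n->name, name, ITEM_NAME_LEN - 1);
    n->name[ITEM_NAME_LEN - 1] = '\0';

    n->quantity = quantity;
    n->next = *head;
    *head = n;
    return 0;
}

/* Remove the first node whose name matches.
 * Returns 0 if a node was removed, -1 otherwise. */
int list_remove(node **head, const char *name)
{
    if (head == NULL || name == NULL)
        return -1;

    /* Walk a pointer to the link so the head needs no special case. */
    for (node **link = head; *link != NULL; link = &(*link)->next) {
        if (strcmp((*link)->name, name) == 0) {
            node *victim = *link;
            *link = victim->next;   /* unlink before freeing to avoid use-after-free */
            free(victim);
            return 0;
        }
    }
    return -1;
}

/* Count the nodes in the list. */
size_t list_length(const node *head)
{
    size_t count = 0;
    for (; head != NULL; head = head->next)
        count++;
    return count;
}
```

Each function is small, so a missing NULL check or an unbounded copy affects only that function, which is one plausible way for security impacts to end up localized to individual functions.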

research
08/12/2023

Copilot Security: A User Study

Code generation tools driven by artificial intelligence have recently be...
research
07/17/2023

In-IDE Generation-based Information Support with a Large Language Model

Understanding code is challenging, especially when working in new and co...
research
06/07/2023

StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code

Code LLMs are being rapidly deployed and there is evidence that they can...
research
07/07/2021

Evaluating Large Language Models Trained on Code

We introduce Codex, a GPT language model fine-tuned on publicly availabl...
research
07/10/2023

Calculating Originality of LLM Assisted Source Code

The ease of using a Large Language Model (LLM) to answer a wide variety ...
research
11/04/2022

Experiences from Using Code Explanations Generated by Large Language Models in a Web Software Development E-Book

Advances in natural language processing have resulted in large language ...
research
06/13/2022

VSC-WebGPU: A Selenium-based VS Code Extension For Local Edit And Cloud Compilation on WebGPU

With the rapid development of information transmission, Software as a Se...
