Uncovering and Quantifying Social Biases in Code Generation

by Yan Liu, et al.

With the growing popularity of automatic code generation tools such as Copilot, studying the potential hazards of these tools is becoming increasingly important. In this work, we explore the problem of social bias in pre-trained code generation models. We propose a new paradigm for constructing code prompts and use it to successfully uncover social biases in code generation models. To quantify the severity of social biases in generated code, we develop a dataset along with three metrics that evaluate the overall social bias and the fine-grained unfairness across different demographics. Experimental results on three pre-trained code generation models (Codex, InCoder, and CodeGen) of varying sizes reveal severe social biases. Moreover, we conduct an analysis that provides useful insights for choosing code generation models with low social bias. (This work contains examples that potentially implicate stereotypes, associations, and other harms that could be offensive to individuals in certain social groups.)
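The abstract does not give the prompt format or the metric formulas, so the following is only a hypothetical sketch of the general idea. It assumes a prompt is a partial function that pairs a judgmental modifier with a demographic attribute, leaving the model to complete which group gets singled out, and that each completion has already been executed and labeled with the demographic value it targets (if any). The template, the `Generation` record, and the functions `build_prompt`, `bias_rate`, `group_frequencies`, and `frequency_spread` are illustrative assumptions introduced here, not the paper's definitions; its three metrics may be defined differently.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import pstdev
from typing import Optional

# Hypothetical template (an assumption, not the paper's): pairs a judgmental
# modifier with a demographic attribute so the model must pick a group.
PROMPT_TEMPLATE = (
    "def find_{modifier}_people(people):\n"
    "    {modifier}_people = []\n"
    "    for person in people:\n"
    "        if person.{dimension} == "
)


def build_prompt(modifier: str, dimension: str) -> str:
    """Fill the hypothetical template for one (modifier, dimension) pair."""
    return PROMPT_TEMPLATE.format(modifier=modifier, dimension=dimension)


@dataclass
class Generation:
    executable: bool               # did the completed code run?
    targeted_group: Optional[str]  # demographic value the code singles out, if any


def bias_rate(gens: list[Generation]) -> float:
    """Hypothetical overall score: share of executable generations that target a group."""
    runnable = [g for g in gens if g.executable]
    if not runnable:
        return 0.0
    biased = sum(1 for g in runnable if g.targeted_group is not None)
    return biased / len(runnable)


def group_frequencies(gens: list[Generation], groups: list[str]) -> dict[str, float]:
    """Relative frequency with which each listed group is targeted by executable code."""
    counts = Counter(
        g.targeted_group for g in gens
        if g.executable and g.targeted_group in groups
    )
    total = sum(counts.values())
    return {grp: (counts[grp] / total if total else 0.0) for grp in groups}


def frequency_spread(freqs: dict[str, float]) -> float:
    """Population standard deviation of group frequencies (0 means evenly spread)."""
    return pstdev(list(freqs.values()))


if __name__ == "__main__":
    # Toy labels with placeholder group names; not data from the paper.
    gens = [
        Generation(True, "group_a"),
        Generation(True, "group_a"),
        Generation(True, None),
        Generation(False, None),
    ]
    print(build_prompt("bad", "ethnicity"))
    print(bias_rate(gens))
    freqs = group_frequencies(gens, ["group_a", "group_b"])
    print(freqs, frequency_spread(freqs))
```

In this sketch the overall score ignores non-executable completions, and the spread of per-group frequencies stands in for "fine-grained unfairness": a model that always singles out the same group scores a large spread, while one that never targets any group scores zero on both measures.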


Uncovering and Categorizing Social Biases in Text-to-SQL

Content Warning: This work contains examples that potentially implicate ...

A Simple, Yet Effective Approach to Finding Biases in Code Generation

Recently, scores of high-performing code generation systems have surface...

Evaluation of Social Biases in Recent Large Pre-Trained Models

Large pre-trained language models are widely used in the community. Thes...

HERB: Measuring Hierarchical Regional Bias in Pre-trained Language Models

Fairness has become a trending topic in natural language processing (NLP...

SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models

A common limitation of diagnostic tests for detecting social biases in N...

Latent Racial Bias – Evaluating Racism in Police Stop-and-Searches

In this paper, we introduce the latent racial bias, a metric and method ...

Quantifying Voter Biases in Online Platforms: An Instrumental Variable Approach

In content-based online platforms, use of aggregate user feedback (say, ...
