Greener yet Powerful: Taming Large Code Generation Models with Quantization

03/09/2023
by   Xiaokai Wei, et al.

ML-powered code generation aims to assist developers in writing code more productively by intelligently generating code blocks from natural language prompts. Recently, large pretrained deep learning models have substantially pushed the boundary of code generation and achieved impressive performance. Despite their great power, however, the huge number of model parameters poses a significant barrier to adopting them in a regular software development environment, where a developer might use a standard laptop or mid-size server to write code. Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as carbon footprint. Model compression is a promising approach to addressing these challenges, and several techniques have been proposed to compress large pretrained models typically used for vision or textual data. Among the available compression techniques, we identify quantization as the most applicable to the code generation task because it does not require significant retraining cost. Since quantization represents model parameters with lower-bit integers (e.g., int8), both model size and runtime latency benefit from the integer representation. We extensively study the impact of quantized models on code generation tasks across three dimensions: (i) resource usage and carbon footprint, (ii) accuracy, and (iii) robustness. Through systematic experiments, we find a quantization recipe that can run even a 6B-parameter model on a regular laptop without significant degradation in accuracy or robustness. We further find that the recipe is readily applicable to the code summarization task as well.
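To make the int8 idea above concrete, the sketch below applies PyTorch's dynamic post-training quantization to a pretrained code generation model loaded with Hugging Face Transformers. This is a minimal illustration under stated assumptions, not the paper's exact recipe: the checkpoint name, prompt, and generation settings are illustrative placeholders, and only the linear layers are quantized.

```python
# Minimal sketch: int8 dynamic post-training quantization of a code
# generation model. Checkpoint and prompt are illustrative assumptions,
# not necessarily those used in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/codegen-6B-mono"  # assumption: any CodeGen-style checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Replace the linear layers with int8 dynamically quantized versions.
# Weights are stored as 8-bit integers, shrinking the memory footprint
# and typically reducing CPU inference latency.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

prompt = "# write a function that reverses a string\ndef"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = quantized_model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Dynamic quantization is attractive here precisely because it is post-training: no retraining or calibration dataset is needed, which matches the low-retraining-cost motivation in the abstract.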

