East: Efficient and Accurate Secure Transformer Framework for Inference

by Yuanchao Ding, et al.

The Transformer has been deployed successfully in practical applications such as ChatGPT. However, users' inputs are exposed to the model provider during the service. As privacy concerns grow, privacy-preserving Transformer inference is in demand for such services. Secure protocols for non-linear functions are crucial to privacy-preserving Transformer inference but have not been well studied, so designing practical ones is hard yet significant to model performance. In this work, we propose East, a framework that enables efficient and accurate secure Transformer inference. First, we propose a new oblivious piecewise polynomial evaluation algorithm and apply it to the activation functions, which reduces the runtime and communication of GELU by over 1.5× and 2.5×, respectively, compared to prior art. Second, the secure protocols for softmax and layer normalization are carefully designed to faithfully maintain the desired functionality. Third, several optimizations are applied to enhance the overall efficiency. We apply East to BERT, and the results show that inference accuracy remains consistent with plaintext inference without fine-tuning. Compared to Iron, we achieve about 1.8× lower communication and about 1.2× lower runtime.
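To make the idea of piecewise polynomial evaluation concrete, the following plaintext sketch approximates GELU with one quadratic per interval and selects the active piece through 0/1 indicator values, so every piece is evaluated regardless of the input. It is not East's protocol: the cutoff of 4.0, the eight segments, the quadratic degree, and the helper names (`quadratic_through`, `build_pieces`, `piecewise_gelu`) are all assumptions chosen for illustration, and no secret sharing or secure comparison is performed here.

```python
import math


def gelu(x):
    """Exact GELU via the Gaussian error function (reference only)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def quadratic_through(f, a, b):
    """Return (c0, c1, c2) of the quadratic matching f at a, the midpoint, and b."""
    m = 0.5 * (a + b)
    h = m - a
    fa, fm, fb = f(a), f(m), f(b)
    d1 = (fm - fa) / h                   # first divided difference f[a, m]
    d2 = ((fb - fm) / h - d1) / (2 * h)  # second divided difference f[a, m, b]
    # Expand fa + d1*(x - a) + d2*(x - a)*(x - m) into monomial form.
    return (fa - d1 * a + d2 * a * m, d1 - d2 * (a + m), d2)


def build_pieces(limit=4.0, segments=8):
    """Assumed layout: 0 below -limit, x above +limit, quadratics in between."""
    width = 2 * limit / segments
    edges = [-limit + i * width for i in range(segments + 1)]
    pieces = [(-math.inf, edges[0], (0.0, 0.0, 0.0))]
    for a, b in zip(edges, edges[1:]):
        pieces.append((a, b, quadratic_through(gelu, a, b)))
    pieces.append((edges[-1], math.inf, (0.0, 1.0, 0.0)))
    return pieces


def piecewise_gelu(x, pieces):
    """Evaluate every piece and keep the one whose indicator is 1.

    In a secure setting the indicators would come from secure comparisons
    on secret-shared values; here they are ordinary plaintext comparisons.
    """
    result = 0.0
    for lo, hi, (c0, c1, c2) in pieces:
        selected = 1.0 if lo <= x < hi else 0.0
        result += selected * (c0 + x * (c1 + x * c2))  # Horner evaluation
    return result


if __name__ == "__main__":
    pieces = build_pieces()
    for x in (-3.0, -0.5, 0.0, 1.25, 5.0):
        print(x, gelu(x), piecewise_gelu(x, pieces))
```

Evaluating all pieces and multiplying by indicators keeps the control flow independent of the input, which is the property an oblivious evaluation needs. A real protocol would additionally work over fixed-point shares rather than floats.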




PUMA: Secure Inference of LLaMA-7B in Five Minutes

With ChatGPT as a representative, tons of companies have began to provid...

THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption

As more and more pre-trained language models adopt on-cloud deployment, ...

Bicoptor: Two-round Secure Three-party Non-linear Computation without Preprocessing for Privacy-preserving Machine Learning

The overhead of non-linear functions dominates the performance of the se...

Compact: Approximating Complex Activation Functions for Secure Computation

Secure multi-party computation (MPC) techniques can be used to provide d...

Towards Secure and Practical Machine Learning via Secret Sharing and Random Permutation

With the increasing demands for privacy protection, privacy-preserving m...

NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference

Non-linear operations such as GELU, Layer normalization, and Softmax are...

ENSEI: Efficient Secure Inference via Frequency-Domain Homomorphic Convolution for Privacy-Preserving Visual Recognition

In this work, we propose ENSEI, a secure inference (SI) framework based ...
