Optimized Vectorization Implementation of CRYSTALS-Dilithium

06/03/2023
by Jieyu Zheng, et al.

CRYSTALS-Dilithium is a lattice-based signature scheme to be standardized by NIST as the primary post-quantum signature algorithm. In this work, we make a thorough study of optimizing Dilithium implementations using the Advanced Vector Extensions (AVX) instruction sets, specifically AVX2 and the more recent AVX-512. We first present an improved parallel small polynomial multiplication with tailored early evaluation (PSPM-TEE) to further speed up the signing procedure, which yields a speedup of 5%-6% over the original PSPM Dilithium implementation. We then present a tailored reduction method that is simpler and faster than Montgomery reduction. Our optimized AVX2 implementation is 3%-8% faster than the state-of-the-art Dilithium AVX2 software. Finally, we propose what is, to our knowledge, the first fully and highly vectorized implementation of Dilithium using AVX-512. This is achieved by carefully vectorizing most of Dilithium's functions with AVX-512 instructions to improve both time and space efficiency. With all of these optimizations, our AVX-512 implementation improves performance by 37.3%/50.7%/39.7% in key generation, 34.1%/37.1%/42.7% in signing, and 38.1%/38.7%/40.7% in verification for the parameter sets Dilithium2/3/5, respectively. To the best of our knowledge, our AVX-512 implementation has the best performance for Dilithium on the Intel x64 CPU platform to date.
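
For context on the baseline that the tailored reduction is compared against, the following is a minimal scalar sketch of signed Montgomery reduction for Dilithium's modulus q = 8380417, written in the style commonly used in Dilithium software. It is not the tailored reduction proposed in the paper, which the abstract does not describe. The names `Q`, `QINV`, and `montgomery_reduce` are illustrative choices and do not come from the abstract.

```c
#include <stdint.h>

#define Q    8380417   /* Dilithium modulus: 2^23 - 2^13 + 1 */
#define QINV 58728449  /* q^(-1) mod 2^32 */

/*
 * Signed Montgomery reduction (illustrative baseline, not the paper's method).
 * For -2^31 * Q <= a <= 2^31 * Q, returns r with r = a * 2^(-32) (mod Q)
 * and -Q < r < Q.
 */
int32_t montgomery_reduce(int64_t a)
{
    /* t = a * q^(-1) mod 2^32, interpreted as a signed 32-bit value */
    int32_t t = (int32_t)((uint32_t)a * (uint32_t)QINV);

    /* a - t*Q is divisible by 2^32, so the shift is an exact division */
    return (int32_t)((a - (int64_t)t * Q) >> 32);
}
```

The function needs two multiplications, one subtraction, and a shift per coefficient, which is the per-element cost that a simpler reduction would aim to lower in vectorized code.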
