Distributed Memory Techniques for Classical Simulation of Quantum Circuits
In this paper we describe, implement, and benchmark distributed memory simulations of quantum circuits on the MSU Laconia Top500 supercomputer. Using hybrid OpenMP and MPI parallelization, we first implement a distributed matrix-vector multiplication with one-dimensional partitioning and discuss the shortcomings of this method, which stem from the exponential memory requirements of simulating quantum computers. We then describe a more efficient method that stores only the 2^n amplitudes of the n-qubit state vector |ψ⟩ and optimize its single-node performance. In our multi-node implementation, we use a single amplitude communication protocol that maximizes the number of qubits that can be simulated and minimizes the ratio of qubits that require communication to those that do not, and we present an algorithm for efficiently determining communication pairs among processors. We simulate up to 30 qubits on a single node and 33 qubits with the state vector partitioned across 64 nodes. Lastly, we discuss the advantages and disadvantages of our communication scheme, propose potential improvements, and describe other optimizations, such as storing the state vector non-sequentially in memory to map communication requirements to idle qubits in the circuit.
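As a rough illustration of the state-vector approach and of communication pairing, the sketch below applies a single-qubit gate directly to the 2^n stored amplitudes and computes which rank a processor would exchange data with. It is not taken from the paper's implementation: the function names, the convention that qubit 0 is the least significant index bit, and the contiguous block partitioning across a power-of-two number of ranks are all assumptions made for this example.

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector, in place.

    Assumes qubit 0 is the least significant bit of the amplitude index.
    No 2^n x 2^n matrix is ever built; only amplitude pairs are updated.
    """
    stride = 1 << target
    for base in range(0, len(state), 2 * stride):
        for offset in range(stride):
            i0 = base + offset   # index with the target bit equal to 0
            i1 = i0 + stride     # same index with the target bit equal to 1
            a0, a1 = state[i0], state[i1]
            state[i0] = gate[0][0] * a0 + gate[0][1] * a1
            state[i1] = gate[1][0] * a0 + gate[1][1] * a1


def partner_rank(rank, target, local_qubits):
    """Return the rank holding the paired amplitudes, or None if local.

    Hypothetical helper: assumes each rank stores a contiguous block of
    2**local_qubits amplitudes, so qubits with index >= local_qubits are
    encoded in the rank number itself.
    """
    if target < local_qubits:
        return None  # both amplitudes of every pair live on this rank
    return rank ^ (1 << (target - local_qubits))


# Three-qubit register initialized to |000>.
n = 3
state = [0j] * (1 << n)
state[0] = 1 + 0j

h = 1 / math.sqrt(2)
hadamard = [[h, h], [h, -h]]
apply_single_qubit_gate(state, hadamard, 0)

# 33 qubits over 64 ranks leaves 27 local qubits per rank under this layout.
local_partner = partner_rank(5, 2, 27)    # gate on a local qubit
remote_partner = partner_rank(5, 29, 27)  # gate on a distributed qubit
```

Under these assumptions, a gate on any of the lowest 27 qubits needs no communication, while a gate on one of the top 6 qubits pairs each rank with exactly one other rank, found by flipping a single bit of the rank number.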