Parallelization of the FFT on SO(3)
In this paper, we design, implement, and test a work-optimal parallelization of Kostelec and Rockmore's well-known fast Fourier transform and its inverse on the three-dimensional rotation group SO(3). To this end, we first briefly review the sequential algorithms. For the design and implementation of the parallel algorithms, we use Foster's well-known PCAM method and the OpenMP standard. The parallelization itself is based on symmetries of the underlying basis functions and on a geometric approach in which the resulting index range is transformed so that distinct work packages can be distributed efficiently to the computation nodes. The practical benefit of the parallel algorithms is demonstrated in a benchmark test that assesses speedup and efficiency on a system with 64 cores. Here, for the first time, we present positive results for the full transforms at bandwidth 512, which is critical in terms of both accuracy and memory. Using all 64 available cores, the speedup for the largest considered bandwidths 128, 256, and 512 amounted to 29.57, 36.86, and 34.36 in the forward transform, and 24.57, 26.69, and 24.25 in the inverse transform, respectively.
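The reported speedups can be put into perspective by converting them to parallel efficiency. The following is a minimal sketch, assuming the usual definition of efficiency as speedup divided by the number of cores (E = S / p); the paper's own efficiency definition may differ, and the function and variable names are illustrative only.

```python
# Speedups reported in the abstract for 64 cores, keyed by bandwidth.
CORES = 64
forward = {128: 29.57, 256: 36.86, 512: 34.36}
inverse = {128: 24.57, 256: 26.69, 512: 24.25}


def efficiency(speedup, cores=CORES):
    """Parallel efficiency under the assumed definition E = S / p."""
    return speedup / cores


def efficiency_table(speedups, cores=CORES):
    """Map each bandwidth to its efficiency, rounded to three decimals."""
    return {b: round(efficiency(s, cores), 3) for b, s in speedups.items()}


if __name__ == "__main__":
    for name, data in (("forward", forward), ("inverse", inverse)):
        for bandwidth, eff in efficiency_table(data).items():
            print(f"{name:8s} B={bandwidth:4d} efficiency={eff:.3f}")
```

Under this assumed definition, the forward transform reaches an efficiency of roughly 46% to 58% on 64 cores, and the inverse transform roughly 38% to 42%.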