Automatic Generation of Sparse Tensor Kernels with Workspaces
Recent advances in compiler theory describe how to compile sparse tensor algebra. Prior work, however, does not describe how to generate efficient code that takes advantage of temporary workspaces. Such workspaces are often used to hand-optimize important kernels such as sparse matrix multiplication and the matricized tensor times Khatri-Rao product. Without this capability, compilers and code generators cannot automatically generate efficient kernels for many important tensor algebra expressions. We describe a compiler optimization called operator splitting that breaks up tensor sub-computations by introducing workspaces. Our case studies demonstrate that operator splitting applies to a surprisingly general range of kernels, and our results show that it increases the performance of important generated tensor kernels to match that of hand-optimized code.
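To make the role of a workspace concrete, the following is a minimal sketch of sparse matrix multiplication with a dense temporary row, in the style commonly used for hand-written kernels. It is an illustration under stated assumptions, not the code the compiler generates: the function name `spgemm_csr`, the CSR array names (`pos`, `crd`, `val`), and the choice of a dense per-row workspace are all hypothetical choices made for this example.

```python
def spgemm_csr(n_rows, n_cols, a_pos, a_crd, a_val, b_pos, b_crd, b_val):
    """Compute C = A @ B for CSR operands and return C in CSR form.

    n_rows is the number of rows of A; n_cols is the number of columns of B.
    All names here are hypothetical and chosen only for illustration.
    """
    workspace = [0.0] * n_cols     # dense temporary holding one row of C
    occupied = [False] * n_cols    # marks which workspace slots are in use
    c_pos, c_crd, c_val = [0], [], []

    for i in range(n_rows):
        touched = []               # column indices written for this row
        for pa in range(a_pos[i], a_pos[i + 1]):
            k = a_crd[pa]
            # Scatter the scaled row k of B into the dense workspace.
            for pb in range(b_pos[k], b_pos[k + 1]):
                j = b_crd[pb]
                if not occupied[j]:
                    occupied[j] = True
                    touched.append(j)
                workspace[j] += a_val[pa] * b_val[pb]

        # Gather the workspace into the sparse result and reset it.
        touched.sort()
        for j in touched:
            c_crd.append(j)
            c_val.append(workspace[j])
            workspace[j] = 0.0
            occupied[j] = False
        c_pos.append(len(c_crd))

    return c_pos, c_crd, c_val


# A = [[1, 0, 2], [0, 3, 0]] and B = [[0, 4], [5, 0], [6, 7]] in CSR form.
c_pos, c_crd, c_val = spgemm_csr(
    2, 2,
    [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0],
    [0, 1, 2, 4], [1, 0, 0, 1], [4.0, 5.0, 6.0, 7.0],
)
```

The dense workspace lets each partial product be accumulated with a random-access write, so the inner loop never has to insert into or merge sparse result rows. The cost is one temporary array whose length equals the number of result columns.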