Unified Sparse Formats for Tensor Algebra Compilers
This paper shows how to build a sparse tensor algebra compiler that is agnostic to tensor formats (data layouts). We develop an interface that describes formats in terms of their capabilities and properties, and show how to build a modular code generator where new formats can be added as plugins. We then describe six implementations of the interface that compose to form the dense, CSR/CSF, COO, DIA, ELL, and HASH tensor formats and countless variants thereof. With these implementations at hand, our code generator can generate code for any tensor algebra expression on any combination of the aforementioned formats. To demonstrate our modular code generator design, we have implemented it in the open-source taco tensor algebra compiler. Our evaluation shows that we get better performance by supporting more formats specialized to different tensor structures, and our plugins makes it easy to add new formats. For example, when data is provided in the COO format, computing a single matrix-vector multiplication with COO is up to 3.6× faster than with CSR. Furthermore, DIA is specialized to tensor convolutions and stencil operations and therefore performs up to 22 the importance of support for many formats, we show that the best vector format for matrix-vector multiplication varies with input sparsities, from hash maps to sparse vectors to dense vectors. Finally, we show that the performance of generated code for these formats is competitive with hand-optimized implementations.
READ FULL TEXT