ACM Transactions on Mathematical Software (TOMS) | 2019

Spin Summations

 
 
 

Abstract


In addition to tensor contractions, one of the most pronounced computational bottlenecks in the nonorthogonally spin-adapted forms of the quantum chemistry methods CCSDT and CCSDTQ, and their approximate forms—including CCSD(T) and CCSDT(Q)—are spin summations. At a first sight, spin summations are operations similar to tensor transpositions, but a closer look reveals additional challenges to high-performance calculations, including temporal locality and scattered memory accesses. This article explores a sequence of algorithmic solutions for spin summations, each exploiting individual properties of either the underlying hardware (e.g., caches, vectorization) or the problem itself (e.g., factorizability). The final algorithm combines the advantages of all the solutions while avoiding their drawbacks; this algorithm achieves high performance through parallelization and vectorization, and by exploiting the temporal locality inherent to spin summations. Combined, these optimizations result in speedups between 2.4× and 5.5× over the NCC quantum chemistry software package. In addition to such a performance boost, our algorithm can perform the spin summations in-place, thus reducing the memory footprint by 2× over an out-of-place variant.

Volume 45
Pages 1 - 22
DOI 10.1145/3301319
Language English
Journal ACM Transactions on Mathematical Software (TOMS)

Full Text