The new code is faster, more accurate and requires less memory when solving tridiagonal systems
This research was funded by the Human Brain Project and the BSC/UPC GPU Center of Excellence
Barcelona Supercomputing Center (BSC) researchers have developed a code called cuThomasBatch that is 3× faster in double precision and 4× faster in single precision than its cuSPARSE counterpart, the gtsvStridedBatch routine, when solving a relatively large number of tridiagonal systems. This code has been integrated into the cuSPARSE library as part of the new routine gtsvInterleavedBatch. cuSPARSE gives developers access to the computational resources of NVIDIA graphics processing units (GPUs) for sparse linear algebra operations and is implemented on top of the NVIDIA® CUDA® runtime. As part of the CUDA SDK, cuSPARSE is a popular and widely known software tool in the high-performance computing (HPC) community.
This code solves a large number (a batch) of tridiagonal systems, a task required in numerous scientific simulations such as computational fluid dynamics (CFD), ocean modelling and the simulation of the human brain, to mention just a few. The implementation minimizes resource requirements (memory and number of threads) while achieving high scalability on NVIDIA GPUs.
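To illustrate what solving a batch of tridiagonal systems involves, here is a minimal sketch of the classic Thomas algorithm applied to several independent systems. It is not the cuThomasBatch source code: it is a plain Python illustration that runs on the CPU. The function name `thomas_batch_interleaved` and the interleaved storage layout (element i of system s stored at index i * batch + s) are assumptions made for the example, the latter suggested by the name of the gtsvInterleavedBatch routine. On a GPU, each iteration of the outer loop would typically be handled by a separate thread.

```python
def thomas_batch_interleaved(a, b, c, d, n, batch):
    """Solve `batch` independent tridiagonal systems of size `n`.

    Assumed layout (illustrative): element i of system s lives at
    index i * batch + s in every array.
      a: sub-diagonal   (entries for i == 0 are ignored)
      b: main diagonal
      c: super-diagonal (entries for i == n - 1 are ignored)
      d: right-hand side
    Returns the solutions in the same layout. No pivoting is done,
    so the systems are assumed to be well conditioned
    (for example, diagonally dominant).
    """
    cp = [0.0] * (n * batch)  # modified super-diagonal
    x = [0.0] * (n * batch)   # modified right-hand side, then the solution

    for s in range(batch):  # one system per iteration
        # Forward elimination
        k = s
        cp[k] = c[k] / b[k]
        x[k] = d[k] / b[k]
        for i in range(1, n):
            k = i * batch + s
            prev = k - batch
            m = b[k] - a[k] * cp[prev]
            cp[k] = c[k] / m
            x[k] = (d[k] - a[k] * x[prev]) / m

        # Back substitution
        for i in range(n - 2, -1, -1):
            k = i * batch + s
            x[k] -= cp[k] * x[k + batch]

    return x


# Two 3x3 systems stored interleaved:
#   system 0: diagonals (1, 2, 1), exact solution (1, 2, 3)
#   system 1: diagonals (1, 4, 1), exact solution (1, 1, 1)
a = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
b = [2.0, 4.0, 2.0, 4.0, 2.0, 4.0]
c = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
d = [4.0, 5.0, 8.0, 6.0, 8.0, 5.0]
solution = thomas_batch_interleaved(a, b, c, d, n=3, batch=2)
print(solution)
```

With this layout, neighbouring systems keep their i-th elements next to each other in memory, which is the kind of access pattern that tends to suit GPU threads working on different systems in lockstep.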
“We are glad to see that cuThomasBatch is faster when computing a high number of tridiagonal systems, as well as being more accurate and requiring much less memory,” says Pedro Valero, a researcher in BSC’s Programming Models group and collaborator with the Accelerators and Communications for High Performance Computing team.
The work is described in the conference paper entitled “NVIDIA GPUs scalability to solve multiple batch tridiagonal system implementation of cuThomasBatch”, whose first author is Pedro Valero, followed by fellow BSC researchers Ivan Martinez-Perez, Raül Sirvent, Xavier Martorell and Antonio J. Peña. It was presented as a poster at the GPU Technology Conference 2018 and is also a result of the Human Brain Project and the BSC/UPC GPU Center of Excellence.
BSC/UPC is a recognized GPU Computing Pioneer
The Barcelona Supercomputing Center (BSC), in association with the Universitat Politècnica de Catalunya (UPC), was named an NVIDIA GPU Center of Excellence in 2011, and this close collaboration continues to make significant contributions to the supercomputing and research community.
The centre develops multi-GPU and cluster-aware programming environments for GPUs that promote unified resource management. It is also laying the groundwork for the forthcoming Exascale supercomputing era with GPU acceleration, using the task-based StarSs programming model and its OmpSs implementation, integrated with CUDA and OpenACC.
Among other activities, the BSC/UPC GPU Center of Excellence sponsors the ninth edition of the Programming and Tuning Massively Parallel Systems + Artificial Intelligence summer school (PUMPS+AI). This course is aimed at enriching the skills of researchers, graduate students and teachers with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources, such as GPU accelerators.