Toward Communication Optimization in CGYRO Fusion Application
Edward D'Azevedo, Reuben Budiardja
Abstract
CGYRO is a new electromagnetic gyrokinetic solver for the study of turbulence in plasma fusion (tokamak) devices. One of the most expensive kernels in CGYRO is the data rearrangement required to evaluate FFTs on the GPU. The kernel first performs multiple independent matrix transpose operations locally. MPI all-to-all communication is then required to redistribute the data. A similar rearrangement is needed after the FFT. Several techniques, such as an optimized transpose with array padding, asynchronous MPI point-to-point communication, and MPI derived data types, are explored to optimize this kernel on Titan.
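
The rearrangement pattern described above (local transposes followed by an all-to-all exchange) can be illustrated with a minimal single-process sketch. This is not CGYRO code and does not use MPI: the ranks are simulated as Python lists, all function names (`local_pack`, `all_to_all`, `unpack`, `distributed_transpose`) are hypothetical, and the matrix dimensions are assumed to be divisible by the number of ranks.

```python
def local_pack(local_rows, nranks):
    """Split one rank's row block into nranks column blocks and transpose each.

    Assumes the number of columns is divisible by nranks.
    """
    ncols = len(local_rows[0])
    width = ncols // nranks
    send = []
    for q in range(nranks):
        cols = range(q * width, (q + 1) * width)
        # Transposed block: one output row per column destined for rank q.
        send.append([[row[c] for row in local_rows] for c in cols])
    return send


def all_to_all(send_bufs):
    """Simulated all-to-all: rank p receives block p from every rank q."""
    n = len(send_bufs)
    return [[send_bufs[q][p] for q in range(n)] for p in range(n)]


def unpack(recv_blocks):
    """Join the blocks received from each source rank, row by row."""
    nrows = len(recv_blocks[0])
    return [sum((blk[r] for blk in recv_blocks), []) for r in range(nrows)]


def distributed_transpose(global_matrix, nranks):
    """Return the per-rank pieces of the transposed matrix.

    The input is distributed by rows; the output is distributed by rows of
    the transpose. Assumes the row count is divisible by nranks.
    """
    h = len(global_matrix) // nranks
    local = [global_matrix[p * h:(p + 1) * h] for p in range(nranks)]
    send = [local_pack(rows, nranks) for rows in local]  # local transposes
    recv = all_to_all(send)                              # redistribution
    return [unpack(blocks) for blocks in recv]
```

In a real MPI code the `all_to_all` step would correspond to a collective such as `MPI_Alltoall`, or to the asynchronous point-to-point alternative mentioned above. The sketch only shows which data each rank must send and receive.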