Toward Communication Optimization in CGYRO Fusion Application

Edward D'Azevedo, Reuben Budiardja

Abstract

CGYRO is a new electromagnetic gyrokinetic solver for studying turbulence in plasma fusion (tokamak) devices. One of the most expensive kernels in CGYRO is the data rearrangement required to evaluate FFTs on the GPU. The kernel first performs multiple independent matrix transpose operations locally; MPI all-to-all communication is then required to redistribute the data. A similar rearrangement is needed after the FFT. Several techniques, including an optimized transpose with array padding, asynchronous MPI point-to-point communication, and MPI derived data types, are explored to optimize this kernel on Titan.
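
The rearrangement pattern described above (a local transpose followed by an all-to-all exchange) can be illustrated with a minimal sketch. This is not CGYRO's implementation: it simulates the MPI ranks as entries of a Python list rather than calling MPI, it assumes a 2D array distributed by rows whose dimensions divide evenly by the number of ranks, and the names `local_pack`, `all_to_all`, and `redistribute` are hypothetical.

```python
def local_pack(block, nranks):
    """Transpose one rank's local block and split it into one chunk per destination rank."""
    ncols = len(block[0])
    cols_per_rank = ncols // nranks  # assumption: ncols divisible by nranks
    transposed = [list(col) for col in zip(*block)]  # shape: ncols x local_rows
    return [transposed[r * cols_per_rank:(r + 1) * cols_per_rank]
            for r in range(nranks)]


def all_to_all(send):
    """Simulated all-to-all exchange: recv[dst][src] = send[src][dst]."""
    n = len(send)
    return [[send[src][dst] for src in range(n)] for dst in range(n)]


def redistribute(global_rows, nranks):
    """Go from a row-distributed array to a column-distributed one.

    Returns one local array per rank; each holds complete columns of the
    global array, so a 1D transform along that dimension would be local.
    """
    rows_per_rank = len(global_rows) // nranks  # assumption: divisible
    local_blocks = [global_rows[r * rows_per_rank:(r + 1) * rows_per_rank]
                    for r in range(nranks)]
    send = [local_pack(block, nranks) for block in local_blocks]
    recv = all_to_all(send)
    result = []
    for dst in range(nranks):
        cols_per_rank = len(recv[dst][0])
        # Concatenate the pieces from every source rank, in rank order,
        # to rebuild each complete column.
        local = [sum((recv[dst][src][c] for src in range(nranks)), [])
                 for c in range(cols_per_rank)]
        result.append(local)
    return result


# Example: a 4 x 4 array with entries 10*i + j, spread over 2 simulated ranks.
grid = [[10 * i + j for j in range(4)] for i in range(4)]
per_rank = redistribute(grid, 2)
# per_rank[0] holds columns 0 and 1; per_rank[1] holds columns 2 and 3.
```

In a real MPI code the packing step is where array padding or derived data types would come into play, and the exchange could be replaced by asynchronous point-to-point messages; the sketch only shows which data ends up on which rank.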