Fast Methods 1: Automatic Tuning for Parallel FFTs
Daisuke Takahashi
2013-03-28
09:30:00 - 09:55:00
101, Mathematics Research Center Building (orig. New Math. Bldg.)
This talk presents an automatic performance tuning method for parallel fast Fourier transforms (FFTs). A blocking algorithm for parallel FFTs uses cache memory effectively. Since the optimal depth of recursion may depend on the problem size, a method is proposed to determine the optimal block size, that is, the one that minimizes the number of cache misses. Automatic tuning of all-to-all communication is also implemented. Performance results for parallel FFTs with automatic performance tuning on clusters of multi-core processors are reported.
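The abstract does not say how the block size is selected, so the following is only a minimal sketch of one common approach: an empirical search that times a blocked kernel for each candidate block size and keeps the fastest. It is not the speaker's method. The blocked matrix transpose stands in for the data-reordering step of a blocked FFT, wall-clock time is used as a proxy for cache misses, and the names `blocked_transpose` and `tune_block_size` are hypothetical.

```python
import time


def blocked_transpose(src, n, block):
    """Transpose an n x n row-major matrix (flat list), one tile at a time."""
    dst = [0] * (n * n)
    for ii in range(0, n, block):          # tile row origin
        for jj in range(0, n, block):      # tile column origin
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    dst[j * n + i] = src[i * n + j]
    return dst


def tune_block_size(n, candidates, repeats=3):
    """Return the candidate block size with the lowest best-of-`repeats` time.

    Assumption: run time is a usable proxy for the number of cache misses.
    """
    src = list(range(n * n))
    best_block, best_time = None, float("inf")
    for block in candidates:
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_transpose(src, n, block)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block


if __name__ == "__main__":
    # The chosen value depends on the machine and on the problem size n.
    print(tune_block_size(256, [4, 8, 16, 32, 64]))
```

In pure Python the interpreter overhead dominates, so the timings here will not reflect real cache behavior; only the search structure carries over to a compiled kernel. Because the best block size may change with the problem size, such a search would be repeated (or its result cached) for each size of interest.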