Next: , Previous: MPI data distribution, Up: Distributed-memory FFTW with MPI


6.5 Multi-dimensional MPI DFT of Real Data

FFTW's MPI interface also supports multi-dimensional DFTs of real data, similar to the serial r2c and c2r interfaces. (Parallel one-dimensional real-data DFTs are not currently supported; you must use a complex transform and set the imaginary parts of the inputs to zero.)

The key points to understand for r2c and c2r MPI transforms (compared to the MPI complex DFTs or the serial r2c/c2r transforms), are:

For example suppose we are performing an out-of-place r2c transform of L × M × N real data [padded to L × M × 2(N/2+1)], resulting in L × M × N/2+1 complex data. Similar to the example in 2d MPI example, we might do something like:

     #include <fftw3-mpi.h>
     
     int main(int argc, char **argv)
     {
         const ptrdiff_t L = ..., M = ..., N = ...;
         fftw_plan plan;
         double *rin;
         fftw_complex *cout;
         ptrdiff_t alloc_local, local_n0, local_0_start, i, j, k;
     
         MPI_Init(&argc, &argv);
         fftw_mpi_init();
     
         /* get local data size and allocate */
         alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD,
                                              &local_n0, &local_0_start);
         rin = fftw_alloc_real(2 * alloc_local);
         cout = fftw_alloc_complex(alloc_local);
     
         /* create plan for in-place r2c DFT */
         plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD,
                                         FFTW_MEASURE);
     
         /* initialize rin to some function my_func(x,y,z) */
         for (i = 0; i < local_n0; ++i)
            for (j = 0; j < M; ++j)
              for (k = 0; k < N; ++k)
            rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start+i, j, k);
     
         /* compute transforms as many times as desired */
         fftw_execute(plan);
     
         fftw_destroy_plan(plan);
     
         MPI_Finalize();
     }

Note that we allocated rin using fftw_alloc_real with an argument of 2 * alloc_local: since alloc_local is the number of complex values to allocate, the number of real values is twice as many. The rin array is then local_n0 × M × 2(N/2+1) in row-major order, so its (i,j,k) element is at the index (i*M + j) * (2*(N/2+1)) + k (see Multi-dimensional Array Format).