12.1.2. Interpreting -log_summary Output: Parallel Performance


We next discuss performance summaries for parallel programs, as shown in Figures 19 and 20, which present the combined output generated by the -log_summary option. The program that generated this data is ${PETSC_DIR}/src/sles/examples/ex21.c. The code loads a matrix and right-hand-side vector from a binary file and then solves the resulting linear system; the program then repeats this process for a second linear system. This particular case was run on four processors of an IBM SP, using restarted GMRES and the block Jacobi preconditioner, where each block was solved with ILU.

Figure 19 presents an overall performance summary, including times, floating-point operations, computational rates, and message-passing activity (such as the number and size of messages sent and collective operations). Summaries for various user-defined stages of monitoring (as discussed in Section Profiling Multiple Sections of Code) are also given. Information about the various phases of computation then follows (shown separately here in Figure 20). Finally, a summary of memory usage and object creation and destruction is presented.


Figure 19: Profiling a PETSc Program: Part I - Overall Summary

We next focus on the summaries for the various phases of the computation, as given in the table within Figure 20. The summary for each phase presents the maximum times and flop rates over all processors, as well as the ratio of maximum to minimum times and flop rates over all processors. A ratio of approximately 1 indicates that computations within a given phase are well balanced among the processors; as the ratio increases, the balance becomes increasingly poor. Also, the total computational rate (in units of MFlops/sec) is given for each phase in the final column of the phase summary table.
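The max-to-min ratio described above can be reproduced by hand from per-processor measurements. The following is a minimal sketch, not PETSc code: the helper name max_min_ratio and the convention of returning 0 when the smallest value is zero are assumptions made only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PETSc): ratio of the largest to the
   smallest per-processor value (time or flop rate) for one phase.
   Requires n >= 1. A result near 1 suggests a well-balanced phase. */
double max_min_ratio(const double *values, size_t n)
{
    double max = values[0];
    double min = values[0];
    for (size_t i = 1; i < n; i++) {
        if (values[i] > max) max = values[i];
        if (values[i] < min) min = values[i];
    }
    /* Assumed convention: report 0 when the minimum is not positive,
       to avoid dividing by zero. */
    return (min > 0.0) ? max / min : 0.0;
}
```

For instance, if three of four processors spend 2 seconds in a phase and the fourth spends 4 seconds, the ratio is 2, which points to a load imbalance in that phase.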

Note: Total computational rates below 1 MFlops/sec are listed as 0 in this column of the phase summary table. Additional statistics for each phase include the total number of messages sent, the average message length, and the number of global reductions.
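To see why a small rate shows up as 0, consider the sketch below. It assumes that the total rate is the flop count summed over all processors divided by the maximum time over all processors, and that the table lists only the integer part; both points are assumptions for illustration, and total_mflops_listed is a hypothetical name rather than a PETSc routine.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PETSc). Assumes the listed total rate is
   (sum of flops over processors) / (max time over processors), expressed in
   MFlops/sec and truncated to an integer. */
long total_mflops_listed(const double *flops, const double *times, size_t n)
{
    double flop_sum = 0.0;
    double time_max = 0.0;
    for (size_t i = 0; i < n; i++) {
        flop_sum += flops[i];
        if (times[i] > time_max) time_max = times[i];
    }
    if (time_max <= 0.0) return 0;   /* no measurable time: report 0 */
    /* Truncation maps any rate below 1 MFlops/sec to 0. */
    return (long)(flop_sum / time_max / 1.0e6);
}
```

Under these assumptions, four processors that each perform 1.0e5 flops in a phase lasting 1 second give 0.4 MFlops/sec, which would be listed as 0.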


Figure 20: Profiling a PETSc Program: Part II - Phase Summaries

As discussed in the preceding section, the performance summaries for higher-level PETSc routines include the statistics for the lower-level routines that they call. For example, the communication within the matrix-vector product routine MatMult() consists of vector scatter operations, as given by the routines VecScatterBegin() and VecScatterEnd().
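The inclusive accounting described here can be pictured with a small model. This is only a sketch and does not reflect how PETSc stores its logging data: the Event struct, its fields, and inclusive_time are hypothetical, and the events are assumed to form a tree (no cycles).

```c
#include <stddef.h>

/* Hypothetical model of nested logging events (not PETSc's data structure). */
typedef struct {
    const char *name;   /* routine name, e.g. "MatMult" */
    double self_time;   /* time spent in the routine itself */
    int parent;         /* index of the enclosing event, or -1 for none */
} Event;

/* Inclusive time of event idx: its own time plus the inclusive time of
   every event nested inside it. Assumes the parent links form a tree. */
double inclusive_time(const Event *ev, size_t n, size_t idx)
{
    double total = ev[idx].self_time;
    for (size_t i = 0; i < n; i++) {
        if (ev[i].parent == (int)idx) {
            total += inclusive_time(ev, n, i);
        }
    }
    return total;
}
```

With made-up numbers, a MatMult event with 1.0 seconds of its own work that encloses a 0.25-second VecScatterBegin and a 0.5-second VecScatterEnd would be reported with 1.75 seconds.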

The final data presented are the percentages of the various statistics (time (%T), flops (%F), messages (%M), average message length (%L), and reductions (%R)) for each event relative to the total computation and to any user-defined stages (discussed in Section Profiling Multiple Sections of Code). These statistics can aid in optimizing performance, since they indicate the sections of code that could benefit from various kinds of tuning. Chapter Hints for Performance Tuning gives suggestions about achieving good performance with PETSc codes.
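Each percentage is simply an event's share of the corresponding total, whether that total covers the whole computation or a single user-defined stage. A minimal sketch follows; percent_of_total is a hypothetical helper, and returning 0 for an empty total is an assumed convention.

```c
/* Hypothetical helper (not part of PETSc): an event's share of a total,
   as a percentage. The total may be for the whole run or for one stage. */
double percent_of_total(double event_value, double total_value)
{
    /* Assumed convention: report 0 when the total is not positive. */
    return (total_value > 0.0) ? 100.0 * event_value / total_value : 0.0;
}
```

For example, an event that accounts for 25 seconds of a 100-second run has a time share of 25 percent; the same event measured against a 50-second stage has a 50 percent share of that stage.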

