12.1.1. Interpreting -log_summary Output: The Basics


As shown in Figure 7 (in Part I), the option -log_summary activates printing of profile data to standard output at the conclusion of a program. Profiling data can also be printed at any time within a program by calling PLogPrintSummary().
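As a minimal sketch (not taken from the manual), the program below illustrates both approaches. We assume here that PLogPrintSummary() takes a communicator and an output stream, as in some 2.x releases; its exact calling sequence has varied, so consult the manual page for your version.

    /* Minimal profiling sketch; error checking omitted for brevity.
       Run with:  mpirun -np 1 ex1 -log_summary                       */
    #include "petsc.h"

    int main(int argc, char **argv)
    {
      PetscInitialize(&argc, &argv, (char *)0, (char *)0);

      /* ... create objects, assemble, and solve a linear system ... */

      /* Optionally print the profile summary mid-run; the calling
         sequence assumed here is PLogPrintSummary(MPI_Comm, FILE *). */
      PLogPrintSummary(PETSC_COMM_WORLD, stdout);

      /* With -log_summary, the full summary is printed here as well. */
      PetscFinalize();
      return 0;
    }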

We print performance data for each routine, organized by PETSc components, followed by any user-defined events (discussed in Section Profiling Application Codes). For each routine, the output data include the maximum time and floating point operation (flop) rate over all processors. Information about parallel performance is also included, as discussed in the following section.

For the purpose of PETSc floating point operation counting, we define one flop as one operation of any of the following types: multiplication, division, addition, or subtraction. For example, one VecAXPY() operation, which computes y = alpha*x + y for vectors of length N, requires 2N flops (consisting of N multiplications and N additions). Bear in mind that flop rates present only a limited view of performance, since memory loads and stores are often the true performance bottleneck.
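As a concrete illustration (the numbers here are invented for the example, not taken from Figure 7): a VecAXPY() on vectors with N = 1,000,000 entries performs 2 x 10^6 flops; if it completes in 0.004 seconds, the reported rate is (2 x 10^6) / 0.004 = 5 x 10^8 flops/sec, or 500 Mflops/sec.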

For simplicity, the remainder of this discussion focuses on interpreting profile data for the SLES component, which provides the linear solvers at the heart of the PETSc package. Recall the hierarchical organization of the PETSc library, as shown in Figure 1 and sketched below. Each SLES solver is composed of a PC (preconditioner) and KSP (Krylov subspace) component, which are in turn built on top of the Mat (matrix) and Vec (vector) modules. Thus, operations in the SLES module are composed of lower-level operations in these components. Note also that the nonlinear solvers component, SNES, is built on top of the SLES module, and the timestepping component, TS, is in turn built on top of SNES.
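Schematically, this layering (a paraphrase of Figure 1, not a verbatim reproduction) is:

    TS  (timestepping)
      SNES  (nonlinear solvers)
        SLES  (linear solvers)
          KSP (Krylov methods)   PC (preconditioners)
            Mat (matrices)   Vec (vectors)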

We briefly discuss interpretation of the sample output in Figure 7, which was generated by solving a linear system on one processor using restarted GMRES and ILU preconditioning. The linear solvers in SLES consist of two basic phases, SLESSetUp() and SLESSolve(), each of which consists of a variety of actions, depending on the particular solution technique. For the case of using the PCILU preconditioner and KSPGMRES Krylov subspace method, the breakdown of PETSc routines is listed below. As indicated by the levels of indentation, the operations in SLESSetUp() include all of the operations within PCSetUp(), which in turn include MatILUFactor(), and so on.
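A rough sketch of that breakdown (illustrative only; the exact routines invoked depend on the PETSc version and the options selected):

    SLESSetUp
       PCSetUp
          MatILUFactor
             MatLUFactorSymbolic
             MatLUFactorNumeric
    SLESSolve
       KSPSolve
          PCApply
             MatSolve
          MatMult
          VecAXPY, VecNorm, VecMDot, ...   (GMRES vector operations)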

The summaries printed via -log_summary reflect this routine hierarchy. For example, the performance summary for a high-level routine such as SLESSolve() includes all of the operations accumulated in the lower-level components that make up the routine; the time reported for SLESSolve() thus subsumes the time spent in MatMult(), MatSolve(), and the vector operations it calls, so the times of nested routines should not be added to those of their parents.

Admittedly, the output of -log_summary does not currently present this hierarchy of PETSc operations in a completely explicit form, primarily because we have not determined a clean and uniform way to do so throughout the library; improvements may follow. However, for a particular problem, the user should generally have an idea of the basic operations that its implementation requires (e.g., which operations are performed when using GMRES and ILU, as described above), so that interpreting the -log_summary data should be relatively straightforward.

