The performance of a code can be affected by a variety of factors,
including the cache behavior, other users on the machine, etc.
Below we briefly describe some common problems and possibilities for
overcoming them.
- Problem too large for physical memory size: When timing a program, one
should always leave at least a ten percent margin between the total
memory a process is using and the physical size of
the machine's memory. One way to estimate the amount of
memory used by a given process is with the UNIX ps command.
Also, the PETSc option -log_summary prints the amount of
memory used by the basic PETSc objects, thus providing a lower
bound on the memory used. Another useful option is -trmalloc_log,
which reports all memory, including any Fortran arrays in an
application code.
- Effects of other users: If other users are running
jobs on the same physical processor nodes on which a program is being profiled,
the timing results are essentially meaningless.
- Overhead of timing routines on certain machines: On certain machines,
even calling the system clock in order to time routines is
slow; this skews all of the flop rates and timing results. The file
${PETSC_DIR}/src/benchmarks/PetscTime.c contains a
simple test problem that approximates the amount of time
required to get the current time in a running program. On good
systems this should be on the order of 1.e-6 seconds or less.
- Problem too large for good cache performance: Certain machines
with lower memory bandwidths (slow memory access) attempt to
compensate by having a very large cache. Thus, if a significant
portion of an application's data fits within the cache, the program will achieve very
good performance; if the data is too large for the cache, the performance can degrade markedly.
To analyze whether this situation affects a particular code, one can
try plotting the total flop rate as a function of problem
size. If the flop rate drops sharply beyond some problem size, then the
problem is probably too large for the cache.
- Inconsistent timings: Inconsistent timings are likely due to other
users on the machine, thrashing (using more virtual memory than available
physical memory), or paging in of the initial executable.
The section Accurate Profiling: Overcoming the Overhead of Paging
provides information on overcoming paging
overhead when profiling a code. We have found that on all systems, if you
follow all of the advice above, your timings will be consistent to within a variation
of less than five percent.