When developing large codes, one often has a code that runs correctly (or is at least believed to); a change to the code then alters the results for some unknown reason. Often, even determining the precise point at which the old and new codes diverge is a major pain. In other cases, a code generates different results when run on different numbers of processors, although in exact arithmetic the same answer is expected. (Of course, this assumes that exactly the same solver and parameters are used in the two cases.)
PETSc provides some support for determining exactly where in the code
the computations lead to different results. First, compile both programs
with different names. Next, start
both programs as a single MPI job. This procedure depends on the particular
MPI implementation being used.
For example, when using MPICH on workstations,
procgroup files can be used to specify the processors on which the job is
to be run. Thus, to run two programs, old and new,
each on two processors, one should create the procgroup file with the
following contents:

    local 0
    workstation1 1 /home/bsmith/old
    workstation2 1 /home/bsmith/new
    workstation3 1 /home/bsmith/new

(Of course, workstation1, etc. can be the same machine.) Then, one can execute the command

    mpirun -p4pg <procgroup_filename> old -compare <tolerance> [your_program_options]

Note that the same runtime options must be used for the two programs. The first time an inner product or norm detects an inconsistency larger than <tolerance>, PETSc will generate an error. The usual runtime options -start_in_debugger and -on_error_attach_debugger may be used. The user can also place the commands

    PetscCompareDouble()
    PetscCompareScalar()
    PetscCompareInt()

in portions of the application code to check for consistency between the two versions.
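
The precise test that -compare applies is not spelled out here. As a rough illustration only, the following C sketch assumes the check is an absolute difference against <tolerance> and reports the first point at which two recorded sequences of norms disagree. The helper first_divergence() and its arguments are hypothetical names introduced for this example; they are not part of PETSc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a PETSc routine).
 * old_vals and new_vals hold the same n quantities (for example, residual
 * norms) recorded by the old and new programs, in the order computed.
 * Returns the index of the first entry whose absolute difference exceeds
 * tol, or -1 if all entries agree to within tol.
 * Assumption: "inconsistency" means absolute difference, not relative. */
static long first_divergence(const double *old_vals, const double *new_vals,
                             size_t n, double tol)
{
    assert(tol >= 0.0); /* a negative tolerance is meaningless */

    for (size_t i = 0; i < n; ++i) {
        double diff = old_vals[i] - new_vals[i];
        if (diff < 0.0) {
            diff = -diff; /* absolute value without needing libm */
        }
        if (diff > tol) {
            return (long)i; /* first point where the two runs diverge */
        }
    }
    return -1; /* no divergence detected */
}
```

Calling first_divergence() on two arrays of norms with a tight tolerance returns the index of the first step that disagrees, which mirrors the idea of stopping at the first inconsistency rather than at the final, visibly different answer.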