PBS accounting statistics tools

This page describes pbsacct, a very simple accounting statistics tool for the Portable Batch System (PBS version 2.2).

The latest version of this software may be downloaded from here.

pbsacct

Usage:

   pbsacct files
where files are daily records (such as 20000705) located in $PBSHOME/server_priv/accounting/ (PBSHOME is usually /var/spool/pbs).

A sample output is:

 
# pbsacct 200006??
 
Portable Batch System accounting statistics
-------------------------------------------
 
A total of 30 accounting files will be processed.
First record is dated 06/01/2000, last record is dated 06/30/2000.
 
                                             Average Average
Username   #jobs CPU-days Wall-days   Efcy.   #nodes  q-days
--------   ----- -------- ---------   -----  ------- -------
   TOTAL     237  1415.31   1578.68   0.897     6.80    3.50
user0001      12   278.67    301.06   0.926    12.00    5.98
user0002      29   226.96    244.98   0.926     4.71    4.04
user0003      52   221.65    271.37   0.817    10.83    3.21
user0004      26   201.35    204.27   0.986     5.66    4.94
user0005      13   130.26    151.13   0.862     8.69    3.44
user0006      38   112.23    114.87   0.977     3.22    2.33
user0007      18   109.85    117.90   0.932     7.53    4.75
user0008      14    75.43     85.88   0.878     8.86    2.58
user0009       8    38.88     41.63   0.934     6.72    2.62
user0010       4    12.05     12.36   0.975     3.97    3.25
user0011       4     5.88     31.12   0.189     6.40    7.30
user0012       3     1.47      1.48   0.991     2.08    2.61
user0013       5     0.37      0.37   0.986     1.00    1.36
user0014      10     0.26      0.27   0.973     1.00    0.87
user0015       1     0.00      0.00   0.797     1.00    2.84      

The usernames have been made anonymous. We prefer to count CPU- and wall-time in days rather than hours or seconds.

Note that PBS records only the CPU-time spent on the Master-node of a parallel job. Processes spawned on the other nodes by, e.g., MPI are outside the control of PBS, and no accounting of the Slave nodes is currently performed. The total CPU-time is therefore estimated as the CPU-time on the Master times the number of nodes. The only reliable measure of resource usage is the Wall-time times the number of nodes.
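The estimate described above can be sketched in Python as follows. This is only an illustration: the function names and the sample job are hypothetical and are not taken from the pbsacct source, and times are assumed to be in seconds.

```python
SECONDS_PER_DAY = 86400

def estimated_cpu_days(master_cpu_seconds, nodes):
    """Estimate total CPU-time as Master-node CPU-time times node count."""
    return master_cpu_seconds * nodes / SECONDS_PER_DAY

def node_wall_days(wall_seconds, nodes):
    """Wall-time times node count, the measure described as reliable."""
    return wall_seconds * nodes / SECONDS_PER_DAY

# Hypothetical job: 4 nodes, 20 hours of CPU-time on the Master,
# 24 hours of wall-time.
cpu_days = estimated_cpu_days(20 * 3600, 4)
wall_days = node_wall_days(24 * 3600, 4)
print(f"{cpu_days:.2f} CPU-days, {wall_days:.2f} wall-days")
```

Because the Slave nodes are not measured, the CPU figure is an extrapolation from a single node, whereas the wall-time figure depends only on values that PBS itself records.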

The column "Efcy." is the ratio of CPU-time to wall-time. A low value means that the jobs spent a long time in waiting states, likely because of I/O, or because of parallel processes waiting for network communication. A low efficiency may therefore indicate that some users' jobs need to be analyzed for possible improvements.
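As a check on the definition, the ratio can be recomputed from the CPU-days and Wall-days columns of the sample output. The Python sketch below uses a hypothetical helper name; returning 0.0 when no wall-time was recorded is an assumption of this sketch, not documented pbsacct behaviour.

```python
def efficiency(cpu_days, wall_days):
    """Ratio of CPU-time to wall-time.

    ASSUMPTION: return 0.0 when no wall-time was recorded.
    """
    return cpu_days / wall_days if wall_days > 0 else 0.0

# Figures taken from the TOTAL and user0011 rows of the sample output.
total_efcy = efficiency(1415.31, 1578.68)
low_efcy = efficiency(5.88, 31.12)
print(f"{total_efcy:.3f} {low_efcy:.3f}")
```

Both values round to the figures printed in the "Efcy." column (0.897 and 0.189). The user0011 row is the kind of case where a closer look at the jobs may be worthwhile.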

The column "Average #nodes" is a weighted average of the number of nodes used in parallel by the user's jobs.
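The weights used for this average are not stated here. The Python sketch below assumes that each job is weighted by its wall-time, which is only one plausible choice; the function name and the sample jobs are hypothetical.

```python
def weighted_average_nodes(jobs):
    """Weighted average node count for a list of (nodes, wall_seconds) pairs.

    ASSUMPTION: each job is weighted by its wall-time. The weighting
    actually used by pbsacct is not stated in this document.
    """
    total_weight = sum(wall for _, wall in jobs)
    if total_weight == 0:
        return 0.0
    return sum(nodes * wall for nodes, wall in jobs) / total_weight

# Hypothetical jobs: a long 8-node job and a short 2-node job.
jobs = [(8, 30000), (2, 10000)]
avg_nodes = weighted_average_nodes(jobs)
print(f"{avg_nodes:.2f}")
```

With this weighting, long jobs dominate the average, so a user who mostly runs long 8-node jobs shows a value near 8 even after a few short test runs on fewer nodes.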

The column "Average q-days" is the average number of days that the jobs spent in the queue while being eligible to run. This shows how difficult it is for jobs to get CPU-time on this system.
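A minimal Python sketch of this average is shown below. It assumes that each job's record supplies the time at which the job became eligible to run and the time at which it started, both in seconds since the epoch. The names etime and start follow the usual PBS accounting attribute names, but whether pbsacct uses exactly these fields is an assumption.

```python
SECONDS_PER_DAY = 86400

def queue_days(etime, start):
    """Days between becoming eligible to run (etime) and starting."""
    return max(start - etime, 0) / SECONDS_PER_DAY

def average_queue_days(jobs):
    """Plain average of queue_days over a list of (etime, start) pairs."""
    if not jobs:
        return 0.0
    return sum(queue_days(etime, start) for etime, start in jobs) / len(jobs)

# Hypothetical jobs: one waited a full day, the other half a day.
waits = [(0, 86400), (1000, 1000 + 43200)]
avg_q_days = average_queue_days(waits)
print(f"{avg_q_days:.2f}")
```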

pbsreportmonth

The script pbsreportmonth is a convenient way to automatically generate a monthly report for the previous month. It may be run on the first day of every month using crontab with a line like this:
   0 2 1 * * (cd Report-directory; /usr/local/bin/pbsreportmonth)
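How pbsreportmonth selects the files for the previous month is not shown here. One way to build a file pattern such as 200006?? from the current date is sketched below in Python; the function name is hypothetical and the approach is an assumption, not the script's actual logic.

```python
import datetime

def previous_month_pattern(today):
    """Return a file pattern such as '200006??' for the month before today."""
    first_of_month = today.replace(day=1)
    # Stepping back one day from the first of the month always lands
    # in the previous month, including across a year boundary.
    last_of_previous = first_of_month - datetime.timedelta(days=1)
    return last_of_previous.strftime("%Y%m") + "??"

pattern = previous_month_pattern(datetime.date(2000, 7, 1))
print(pattern)
```

The resulting pattern matches the daily accounting file names used in the pbsacct example above.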

The accounting report may be mailed to the administrators by uncommenting some lines at the end of the script.

pbsjobs

The helper script pbsjobs processes the raw accounting files, looking for records with an "E" in the second field, which marks a job that has Ended. The script extracts the fields of interest and prints one line of relevant information for each job. This list is then summarized by the pbsacct script.

The PBS server writes the accounting records in the module src/server/accounting.c, which is where the meaning of the various accounting fields can be found. The fields are also documented in the PBS External Reference Specification; see the chapter on Batch Server Functions.
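The record selection described above can be illustrated with a short Python sketch. It assumes that each record is a single line of semicolon-separated fields (timestamp, record type, job id, then space-separated name=value pairs), and that the attributes user, resources_used.cput and resources_used.walltime are present. The sample records are invented for illustration, and the function names are hypothetical rather than taken from pbsjobs.

```python
def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def parse_end_record(line):
    """Return a dict for an 'E' record, or None for any other record."""
    fields = line.rstrip("\n").split(";", 3)
    if len(fields) < 4 or fields[1] != "E":
        return None
    # ASSUMPTION: attributes are space-separated name=value pairs.
    attrs = dict(item.split("=", 1) for item in fields[3].split() if "=" in item)
    return {
        "jobid": fields[2],
        "user": attrs.get("user"),
        "cpu_seconds": hms_to_seconds(attrs.get("resources_used.cput", "0:0:0")),
        "wall_seconds": hms_to_seconds(attrs.get("resources_used.walltime", "0:0:0")),
    }

# Invented sample records, not taken from a real accounting file.
end_line = ("07/05/2000 14:03:12;E;1234.server;user=user0001 queue=batch "
            "resources_used.cput=01:30:00 resources_used.walltime=02:00:00")
start_line = "07/05/2000 12:03:12;S;1234.server;user=user0001 queue=batch"

job = parse_end_record(end_line)
skipped = parse_end_record(start_line)
print(job["user"], job["cpu_seconds"], job["wall_seconds"], skipped)
```

Values that contain spaces, such as some job names, would need more careful handling than this simple split provides.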


Author: Ole Holm Nielsen
Address: Department of Physics, Technical University of Denmark,
Building 307, DK-2800 Lyngby, Denmark.
E-mail: Ole.H.Nielsen@fysik.dtu.dk