Nexus_duroc_ring

Files:

Description:

This program shows how to use "globus_duroc_runtime" to set up simple ring communication among a set of processes, and then use this ring to transfer a simple message using Nexus.
This example is identical to "globus_myjob_ring", except that the set of processes can be managed by different resource managers. We therefore cannot use myjob to establish the communication, and must use "duroc" instead.

Some definitions when using duroc:

A set of processes working together to achieve a common task is called a "job".
A subset of those processes located on the same machine is grouped in a "subjob". (More precisely, the processes of a subjob share the same resource manager rather than the same machine. See the duroc documentation for a precise description.)
"Duroc" assigns a unique address to each subjob.
The number of processes participating in a subjob is called the "size" of the subjob.
Each process in a subjob is assigned a unique position in that subjob, called its "rank".
The process of rank 0 in a subjob is called the "master".
Note that process ranks in a subjob are numbered contiguously from 0 to (subjob size - 1). No such assumption should be made about subjob addresses. The only assumption one can make about a subjob address is that it is unique and uniform in the job. (Uniform: a subjob's address is the same when viewed from any other subjob.)
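
The definitions above can be summarized as a small data model. The following is only an illustrative sketch in Python, not part of the duroc API: the names Subjob, master and process_ids are hypothetical, and the addresses and sizes are the ones that appear in the example output further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subjob:
    """Hypothetical model of a subjob; not a duroc data structure."""
    address: int  # unique and uniform in the job; values need not be contiguous
    size: int     # number of processes participating in the subjob

    def ranks(self):
        # Ranks are contiguous: 0 .. size - 1.
        return range(self.size)

    def master(self):
        # The master is the process of rank 0.
        return (self.address, 0)


def process_ids(subjobs):
    """Return an (address, rank) pair for every process in the job."""
    return [(s.address, r) for s in subjobs for r in s.ranks()]


# Addresses and sizes taken from the example output of this README.
job = [Subjob(address=1, size=2), Subjob(address=2, size=5)]
```

A process is identified here by the pair (subjob address, rank), since a rank alone is only unique inside one subjob.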


In this example, we first create a communication ring inside each subjob ("small ring").

We then open the small rings to connect them all together.

The "master" with the smallest address initiates the communication by sending a message in its own small ring. The message goes around that small ring, then on to the next ring, and so forth, until it comes back to the master that initiated the communication.
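
The path of the message can be sketched with a short simulation. This is a Python illustration under stated assumptions, not the program's actual code: it assumes the message travels through increasing ranks inside a small ring, and that the small rings are visited in increasing address order. The function name ring_route is hypothetical.

```python
def ring_route(sizes):
    """Return the order in which processes receive the message.

    sizes maps each subjob address to its subjob size.
    Each process is written as an (address, rank) pair.
    Assumption: ranks are visited in increasing order, and small rings
    are visited in increasing address order.
    """
    hops = []
    for address in sorted(sizes):          # smallest address comes first
        for rank in range(sizes[address]):
            hops.append((address, rank))
    initiator = hops[0]                    # master with the smallest address
    # The initiator sends first, so it is the last process to receive.
    return hops[1:] + [initiator]


# Two subjobs, as in the example below: address 1 (size 2), address 2 (size 5).
route = ring_route({2: 5, 1: 2})
```

With these sizes the message is received seven times in total, the last time by the master that initiated the communication.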

How to run this program:

In order to let globus set up a correct communication environment for all the processes of the job, this program should be started using "duroc".

Among the several ways to do so, the easiest is probably to use globusrun, as in the example below:

globusrun -f spec
where spec is a file containing the following RSL specification:


+(
   &( directory = /your/directory/Examples/nexus_duroc_ring )
    ( executable = nexus_duroc_ring )
    ( stdout = j1_my_std_out )
    ( stderr = j1_my_std_err )
    ( count = 2 )
    ( arguments = "Test me")
    ( label= my_subjob1 )
    ( resourceManagerContact = "machineone.mcs.anl.gov:8713:/C=US/O=Globus/O=Argonne National Laboratory/OU=MCS/CN=machineone.mcs.anl.gov-fork" )
 )
 (
   &( directory = /your/directory/Examples/nexus_duroc_ring )
    ( executable = nexus_duroc_ring )
    ( stdout = j2_my_std_out )
    ( stderr = j2_my_std_err )
    ( count = 5 )
    ( arguments = "Test me" )
    ( label= my_subjob2 )
    ( resourceManagerContact = "machinetwo.mcs.anl.gov:8713:/C=US/O=Globus/O=Argonne National Laboratory/OU=MCS/CN=machinetwo.mcs.anl.gov-fork" )
 )


It will create a job containing 2 subjobs (labeled my_subjob1 and my_subjob2 in this example).

(See the globusrun documentation for more information.)
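
When a job has many subjobs, a specification of this shape can be generated rather than typed by hand. The sketch below is a Python illustration only: the helpers subjob_clause and multirequest are hypothetical, it uses only the attributes shown in the example above, and it refuses values that contain double quotes instead of guessing how they should be escaped.

```python
SAFE = set("abcdefghijklmnopqrstuvwxyz"
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789/._-")


def subjob_clause(attrs):
    """Format one subjob as '&( name = value )' followed by more relations."""
    lines = []
    for name, value in attrs:
        text = str(value)
        if '"' in text:
            # Escaping rules are outside the scope of this sketch.
            raise ValueError("double quotes in values are not handled here")
        if not set(text) <= SAFE:
            text = '"' + text + '"'   # quote values with blanks, ':' or '='
        lines.append("( %s = %s )" % (name, text))
    return "&" + "\n ".join(lines)


def multirequest(clauses):
    """Join subjob clauses into one '+( ... )( ... )' request."""
    return "+" + "".join("(\n" + clause + "\n)\n" for clause in clauses)


def ring_subjob(count, label, contact):
    # Directory, executable and message are the ones from the example above.
    return subjob_clause([
        ("directory", "/your/directory/Examples/nexus_duroc_ring"),
        ("executable", "nexus_duroc_ring"),
        ("count", count),
        ("arguments", "Test me"),
        ("label", label),
        ("resourceManagerContact", contact),
    ])


spec = multirequest([
    ring_subjob(2, "my_subjob1", "machineone.mcs.anl.gov:8713:/C=US/O=Globus/O=Argonne National Laboratory/OU=MCS/CN=machineone.mcs.anl.gov-fork"),
    ring_subjob(5, "my_subjob2", "machinetwo.mcs.anl.gov:8713:/C=US/O=Globus/O=Argonne National Laboratory/OU=MCS/CN=machinetwo.mcs.anl.gov-fork"),
])
```

The resulting string can be written to the spec file. The stdout and stderr attributes are omitted here for brevity and can be added to the attribute list in the same way.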

The message is the first and only argument of the program. If the message contains blanks, it should be enclosed in double quotes.

This command should output the following text:

making globus_duroc request: +(&("directory" = "/usr/globus/Examples/nexus_duroc_ring" )("executable" = "nexus_duroc_ring" )("stdout" = "j1_my_std_out" )("stderr" = "j1_my_std_err" )("count" = "2" )("arguments" = "Test me" )("label" = "my_subjob1" )("resourceManagerContact" = "machineone.mcs.anl.gov:8713:/C=US/O=Globus/O=Argonne National Laboratory/OU=MCS/CN=pitcairn.mcs.anl.gov-fork" ))(&("directory" = "/usr/globus/Examples/nexus_duroc_ring" )("executable" = "nexus_duroc_ring" )("stdout" = "j2_my_std_out" )("stderr" = "j2_my_std_err" )("count" = "5" )("arguments" = "Test me" )("label" = "my_subjob2" )("resourceManagerContact" = "machinetwo.mcs.anl.gov:8713:/C=US/O=Globus/O=Argonne National Laboratory/OU=MCS/CN=tuva.mcs.anl.gov-fork" ))
duroc request status: 0
duroc job contact: "1"
duroc subrequests' status:
    subrequest 0: 0
    subrequest 1: 0
releasing barrier in automatic mode...
waiting for job termination

To see the result of the execution, check the files j1_my_std_out and j2_my_std_out (and j1_my_std_err and j2_my_std_err) in the execution directory (in our example: /usr/globus/Examples/nexus_duroc_ring/).

Taken together, j1_my_std_out and j2_my_std_out should look like this (the order of the messages might be different: the order is preserved only within the output of a single process):


	Job of rank 4 
	Total number of jobs (subjob size): 5
4: Send my startpoint to process 3 in my subjob
4: Received startpoint 
4: Got a NEXUS remote service call
4: Message lenght : 7
4: Message received: Test me
4: Message size to send 7
4: test complete.

	Job of rank 3 
	Total number of jobs (subjob size): 5
3: Received startpoint 
3: Send my startpoint to process 2 in my subjob
3: Got a NEXUS remote service call
3: Message lenght : 7
3: Message received: Test me
3: Message size to send 7
3: test complete.

	Job of rank 2 
	Total number of jobs (subjob size): 5
2: Send my startpoint to process 1 in my subjob
2: Received startpoint 
2: Got a NEXUS remote service call
2: Message lenght : 7
2: Message received: Test me
2: Message size to send 7
2: test complete.

	Job of rank 1 
	Total number of jobs (subjob size): 5
1: Received startpoint 
1: Send my startpoint to process 0 in my subjob
1: Got a NEXUS remote service call
1: Message lenght : 7
1: Message received: Test me
1: Message size to send 7
1: test complete.

	Job of rank 0 
	Total number of jobs (subjob size): 5

	SubJob of address 2 
	Total number of Subjobs (including me): 2
	And I am NOT the first job master

0: Send my startpoint to process 4 in my subjob
0: Received startpoint 
0:I am a master and there is other subjobs ("small rings"):I want to lilnk them
0:Waiting for inter job message (Startpoint)
0: INTER Received startpoint 
0: INTER Send my startpoint to 1
0: Got a NEXUS remote service call
0: Message lenght : 7
0: Message received: Test me
0: Message size to send 7
0: test complete.

	Job of rank 0 
	Total number of jobs (subjob size): 2

	SubJob of address 1 
	Total number of Subjobs (including me): 2
	And I am the first job master

0: Send my startpoint to process 1 in my subjob
0: Received startpoint 
I am a master and there is other subjobs ("small rings"):I want to lilnk them
0: INTER Send my startpoint to 2
Now waiting for inter job message (Startpoint)
0: INTER Received startpoint 
Message to send around: Test me
0: Message size to send 7
0: Got a NEXUS remote service call
0: Message lenght : 7
0: Message received: Test me
0: Received the correct message !
0: test complete.

	Job of rank 1 
	Total number of jobs (subjob size): 2
1: Received startpoint 
1: Send my startpoint to process 0 in my subjob
1: Got a NEXUS remote service call
1: Message lenght : 7
1: Message received: Test me
1: Message size to send 7
1: test complete.
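
Because the order of the lines can vary between runs, it may be easier to check the output with a small script than by eye. The following Python sketch is an assumption-laden helper, not part of the example program: it relies only on the "Message received:" line format shown above, and the name received_messages is hypothetical.

```python
import re

# Matches lines such as "4: Message received: Test me".
RECEIVED = re.compile(r"^(\d+): Message received: (.*)$")


def received_messages(text):
    """Return a (rank, message) pair for each 'Message received' line."""
    pairs = []
    for line in text.splitlines():
        match = RECEIVED.match(line.strip())
        if match:
            pairs.append((int(match.group(1)), match.group(2)))
    return pairs


# A few lines copied from the output above.
sample = """\
4: Got a NEXUS remote service call
4: Message lenght : 7
4: Message received: Test me
4: test complete.
0: Message received: Test me
0: Received the correct message !
"""

pairs = received_messages(sample)
all_correct = all(message == "Test me" for _, message in pairs)
```

Applied to the contents of the output file of one subjob, the number of pairs returned should equal the size of that subjob if every process received the message once.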

Note:
- Although this program does not use threads, we use the reentrant versions of the "libc" functions supplied with globus: globus_libc_*. You should use them in threaded programs.

Instructional Goals: