In this example, we first create a communication ring inside each subjob ("small ring").
We then open the small rings to connect them all together.
The "master" with the smallest address initiates the communication by sending
a message in its own small ring. The message goes around that small ring, then on to the next ring, and so forth, until it comes back to the master that initiated the communication.
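The path of the message can be sketched with a small simulation. This is only an illustration, not the code of nexus_duroc_ring: the function name ring_route is hypothetical, and it assumes that the message travels in increasing rank order inside a subjob, starting at the master (rank 0), and that subjobs are visited in increasing address order.

```python
def ring_route(subjob_sizes):
    """Return the (address, rank) hops of one trip around the big ring.

    subjob_sizes maps a subjob address to its number of processes.
    Assumption: inside a subjob the message travels in increasing rank
    order starting at the master (rank 0), and subjobs are visited in
    increasing address order.
    """
    addresses = sorted(subjob_sizes)
    route = []
    for address in addresses:
        for rank in range(subjob_sizes[address]):
            route.append((address, rank))
    # The last hop returns the message to the master that initiated it,
    # i.e. rank 0 of the subjob with the smallest address.
    route.append((addresses[0], 0))
    return route


if __name__ == "__main__":
    # Two subjobs: address 1 with 2 processes, address 2 with 5 processes.
    for address, rank in ring_route({1: 2, 2: 5}):
        print(f"subjob {address}, rank {rank}")
```

With two subjobs of 2 and 5 processes, the route visits all 7 processes once and ends back at the first master.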
Among the several ways to run the example, the easiest is probably to use globusrun, as shown below:
globusrun -f spec
where spec is a file containing the following RSL specification:
+
( &( directory = /your/directory/Examples/nexus_duroc_ring )
   ( executable = nexus_duroc_ring )
   ( stdout = my_std_out )
   ( stderr = my_std_err )
   ( count = 2 )
   ( arguments = "Test me" )
   ( label = my_subjob1 )
   ( resourceManagerContact = "machineone.mcs.anl.gov:8713:/C=US/O=Globus/O=Argonne National Laboratory/OU=MCS/CN=machineone.mcs.anl.gov-fork" )
)
( &( directory = /your/directory/Examples/nexus_duroc_ring )
   ( executable = nexus_duroc_ring )
   ( stdout = my_std_out )
   ( stderr = my_std_err )
   ( count = 5 )
   ( arguments = "Test me" )
   ( label = my_subjob2 )
   ( resourceManagerContact = "machinetwo.mcs.anl.gov:8713:/C=US/O=Globus/O=Argonne National Laboratory/OU=MCS/CN=machinetwo.mcs.anl.gov-fork" )
)
This creates a job containing 2 subjobs (labeled my_subjob1 and my_subjob2 in this example), with 2 and 5 processes respectively.
The message is the first and only argument of the program. If the message contains blanks, it should be double quoted.
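If the spec file is generated by a script, the quoting rule can be applied automatically. The following is a minimal sketch, not part of Globus: the helper names rsl_value, rsl_subjob and rsl_multirequest are hypothetical, and the rule used here (quote every value that contains anything other than letters, digits, `_`, `.`, `/` or `-`) is a conservative assumption rather than the exact RSL grammar. Values that themselves contain a double quote are rejected instead of escaped.

```python
import re


def rsl_value(value):
    """Return the value bare if it is a simple token, else double-quoted."""
    text = str(value)
    if '"' in text:
        # Escaping embedded quotes is deliberately left out of this sketch.
        raise ValueError("values containing double quotes are not handled")
    if re.fullmatch(r"[A-Za-z0-9_./-]+", text):
        return text
    return f'"{text}"'


def rsl_subjob(attributes):
    """Render one subjob as '&( name = value )( name = value )...'."""
    relations = "".join(
        f"( {name} = {rsl_value(value)} )" for name, value in attributes.items()
    )
    return "&" + relations


def rsl_multirequest(subjobs):
    """Render a multi-request: '+' followed by one parenthesized subjob each."""
    return "+" + "".join(f"( {rsl_subjob(subjob)} )" for subjob in subjobs)


if __name__ == "__main__":
    common = {
        "directory": "/your/directory/Examples/nexus_duroc_ring",
        "executable": "nexus_duroc_ring",
        "stdout": "my_std_out",
        "stderr": "my_std_err",
        "arguments": "Test me",  # contains a blank, so it gets quoted
    }
    print(rsl_multirequest([
        {**common, "count": 2, "label": "my_subjob1"},
        {**common, "count": 5, "label": "my_subjob2"},
    ]))
```

The resourceManagerContact attribute is omitted from the usage example for brevity; a real request needs one per subjob, as in the specification above.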
This command should output the following text:
making globus_duroc request: +(&("directory" = "/usr/globus/Examples/nexus_duroc_ring" )("executable" = "nexus_duroc_ring" )("stdout" = "j1_my_std_out" )("stderr" = "j1_my_std_err" )("count" = "2" )("arguments" = "Test me" )("label" = "my_subjob1" )("resourceManagerContact" = "machineone.mcs.anl.gov:8713:/C=US/O=Globus/O=Argonne National Laboratory/OU=MCS/CN=pitcairn.mcs.anl.gov-fork" ))(&("directory" = "/usr/globus/Examples/nexus_duroc_ring" )("executable" = "nexus_duroc_ring" )("stdout" = "j2_my_std_out" )("stderr" = "j2_my_std_err" )("count" = "5" )("arguments" = "Test me" )("label" = "my_subjob2" )("resourceManagerContact" = "machinetwo.mcs.anl.gov:8713:/C=US/O=Globus/O=Argonne National Laboratory/OU=MCS/CN=tuva.mcs.anl.gov-fork" ))
duroc request status: 0
duroc job contact: "1"
duroc subrequests' status:
subrequest 0: 0
subrequest 1: 0
releasing barrier in automatic mode...
waiting for job termination
To see the result of the execution, check the files j1_my_std_out and j2_my_std_out (and j1_my_std_err and j2_my_std_err) in the execution directory (in our example: /usr/globus/Examples/nexus_duroc_ring/).
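The check can also be scripted. The sketch below is an illustration under one assumption taken from the sample output of this example: every process that finishes prints a line of the form "<rank>: test complete.". The function names completed_ranks and missing_ranks are hypothetical.

```python
import re
from pathlib import Path


def completed_ranks(output_text):
    """Return the set of ranks that printed '<rank>: test complete.'."""
    return {int(rank) for rank in re.findall(r"(\d+): test complete\.", output_text)}


def missing_ranks(output_text, count):
    """Return the ranks in 0..count-1 that did not report completion."""
    return sorted(set(range(count)) - completed_ranks(output_text))


if __name__ == "__main__":
    # File names and process counts follow the two subjobs of this example.
    for name, count in [("j1_my_std_out", 2), ("j2_my_std_out", 5)]:
        path = Path(name)
        if not path.exists():
            print(f"{name}: file not found")
            continue
        missing = missing_ranks(path.read_text(), count)
        print(f"{name}: {'all ranks completed' if not missing else f'missing ranks {missing}'}")
```

Run it from the execution directory; it only reads the two stdout files and does not modify them.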
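The output files should look like this (the order of the messages might be different: the order is preserved only inside a job). The listing with subjob size 5 comes from my_subjob2 (j2_my_std_out); the listing with subjob size 2 comes from my_subjob1 (j1_my_std_out):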
Job of rank 4
Total number of jobs (subjob size): 5
4: Send my startpoint to process 3 in my subjob
4: Received startpoint
4: Got a NEXUS remote service call
4: Message lenght : 7
4: Message received: Test me
4: Message size to send 7
4: test complete.
Job of rank 3
Total number of jobs (subjob size): 5
3: Received startpoint
3: Send my startpoint to process 2 in my subjob
3: Got a NEXUS remote service call
3: Message lenght : 7
3: Message received: Test me
3: Message size to send 7
3: test complete.
Job of rank 2
Total number of jobs (subjob size): 5
2: Send my startpoint to process 1 in my subjob
2: Received startpoint
2: Got a NEXUS remote service call
2: Message lenght : 7
2: Message received: Test me
2: Message size to send 7
2: test complete.
Job of rank 1
Total number of jobs (subjob size): 5
1: Received startpoint
1: Send my startpoint to process 0 in my subjob
1: Got a NEXUS remote service call
1: Message lenght : 7
1: Message received: Test me
1: Message size to send 7
1: test complete.
Job of rank 0
Total number of jobs (subjob size): 5
SubJob of address 2
Total number of Subjobs (including me): 2
And I am NOT the first job master
0: Send my startpoint to process 4 in my subjob
0: Received startpoint
0:I am a master and there is other subjobs ("small rings"):I want to lilnk them
0:Waiting for inter job message (Startpoint)
0: INTER Received startpoint
0: INTER Send my startpoint to 1
0: Got a NEXUS remote service call
0: Message lenght : 7
0: Message received: Test me
0: Message size to send 7
0: test complete.

Job of rank 0
Total number of jobs (subjob size): 2
SubJob of address 1
Total number of Subjobs (including me): 2
And I am the first job master
0: Send my startpoint to process 1 in my subjob
0: Received startpoint
I am a master and there is other subjobs ("small rings"):I want to lilnk them
0: INTER Send my startpoint to 2
Now waiting for inter job message (Startpoint)
0: INTER Received startpoint
Message to send around: Test me
0: Message size to send 7
0: Got a NEXUS remote service call
0: Message lenght : 7
0: Message received: Test me
0: Received the correct message !
0: test complete.
Job of rank 1
Total number of jobs (subjob size): 2
1: Received startpoint
1: Send my startpoint to process 0 in my subjob
1: Got a NEXUS remote service call
1: Message lenght : 7
1: Message received: Test me
1: Message size to send 7
1: test complete.

Note: