This distributed directory module may be used alone or in conjunction with Zoltan's load balancing capability and memory and communication services. The user should note that external names (subroutines, etc.) prefaced by Zoltan_DD_ are reserved when using this module.
The user initially creates an empty distributed directory using Zoltan_DD_Create. Then global ID (GID) information is added to the directory using Zoltan_DD_Update. The directory maintains the GID's basic information: local ID (optional), partition (optional), arbitrary user data (optional), and the current data owner. Zoltan_DD_Update is also called after data migration or refinements. Zoltan_DD_Find returns the directory information for a list of GIDs. A selected list of GIDs may be removed from the directory by Zoltan_DD_Remove. When the user has finished using the directory, its memory is returned to the system by Zoltan_DD_Destroy.
An object is known by its GID. Hashing provides very fast lookup of the information associated with a GID in a two-step process. First, a hash of the GID yields the number of the processor owning the directory entry for that GID. The directory entry owner remains constant even if the object (GID) migrates over time. Second, a different hash of the GID looks up the associated information in the directory processor's hash table. The user may optionally register their own (first) hash function to take advantage of their knowledge of their GID naming scheme and the GID's neighboring processors. See the documentation for Zoltan_DD_Set_Hash_Fn for more information. If no user hash function is registered, Zoltan's Zoltan_Hash will be used. This module's design was strongly influenced by the paper "Communication Support for Adaptive Computation" by Pinar and Hendrickson.
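The two-step lookup described above can be sketched as follows. This is a minimal illustration, not Zoltan's implementation: the function names dd_owner_of and dd_table_slot and both hash formulas are assumptions chosen for brevity, and GIDs are modeled as plain arrays of unsigned int.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1 (hypothetical stand-in for the first hash): map a GID to the
   processor that owns its directory entry. The result depends only on the
   GID and nproc, so it stays the same when the object itself migrates. */
unsigned int dd_owner_of(const unsigned int *gid, int len, unsigned int nproc)
{
    unsigned int h = 2166136261u;        /* arbitrary non-zero seed */
    assert(gid != NULL && len > 0 && nproc > 0);
    for (int i = 0; i < len; i++) {
        h ^= gid[i];                     /* mix in one GID word */
        h *= 16777619u;                  /* unsigned wrap-around is well defined */
    }
    return h % nproc;
}

/* Step 2 (hypothetical stand-in for the second hash): a different formula
   selects the slot in the directory processor's local hash table. */
unsigned int dd_table_slot(const unsigned int *gid, int len,
                           unsigned int table_length)
{
    unsigned int h = 0;
    assert(gid != NULL && len > 0 && table_length > 0);
    for (int i = 0; i < len; i++)
        h = h * 31u + gid[i];
    return h % table_length;
}
```

Using two unrelated formulas matters: if both steps used the same hash, all GIDs arriving at one directory processor would tend to fall into a small subset of its table slots.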
Some users number their GIDs by giving the first "n" GIDs to processor 0, the next "n" GIDs to processor 1, and so forth. The function Zoltan_DD_Set_Neighbor_Hash_Fn1 will provide efficient directory communication when these GIDs stay close to their origin. The function Zoltan_DD_Set_Neighbor_Hash_Fn2 allows a range of GIDs to be specified for each processor, for more flexibility. The source code for DD_Set_Neighbor_Hash_Fn1 and DD_Set_Neighbor_Hash_Fn2 provides examples of how a user can create their own "hash" functions that take advantage of their own GID naming convention.
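As an illustration of the first numbering scheme, the sketch below maps a single-word GID to a processor by integer division. The name block_hash, its argument list, and the modulo fallback for GIDs beyond the last block are assumptions made for this example; they are not taken from the Zoltan source.

```c
#include <assert.h>

/* Hypothetical block "hash": GIDs [0, n) go to processor 0, [n, 2n) to
   processor 1, and so on, where n is block_size. */
unsigned int block_hash(unsigned int gid, unsigned int block_size,
                        unsigned int nproc)
{
    assert(block_size > 0 && nproc > 0);
    unsigned int p = gid / block_size;
    /* Assumed fallback: GIDs past the last block are spread by modulo so
       the result is always a valid processor number. */
    return (p < nproc) ? p : gid % nproc;
}
```

With block_size = 100 and nproc = 4, GIDs 0 through 99 map to processor 0 and GIDs 300 through 399 map to processor 3, so a directory entry lives on the processor where its object originated.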
The routine Zoltan_DD_Print will print the contents of the directory. The companion routine Zoltan_DD_Stats prints out a summary of the hash table size, number of linked lists, and the length of the longest linked list. This may be useful when the user creates their own hash functions.
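To show what such a summary measures, the following sketch computes the same three quantities (table size, number of non-empty linked lists, longest list) for a set of single-word keys and a candidate hash function. The structure chain_stats and the functions chain_stats_compute and mod_hash are hypothetical names for this example and are not part of the Zoltan API.

```c
#include <assert.h>
#include <stdlib.h>

/* Summary figures for a chained hash table (illustrative names only). */
struct chain_stats {
    int table_length;    /* number of buckets */
    int nonempty_lists;  /* buckets holding at least one key */
    int longest_list;    /* length of the longest chain */
};

/* Example hash to evaluate: plain modulo. */
unsigned int mod_hash(unsigned int key, unsigned int table_length)
{
    return key % table_length;
}

/* Counts chain lengths for the given keys without building the lists.
   Returns 0 on success, -1 if temporary storage cannot be allocated. */
int chain_stats_compute(const unsigned int *keys, int nkeys, int table_length,
                        unsigned int (*hash)(unsigned int, unsigned int),
                        struct chain_stats *out)
{
    assert(keys != NULL && nkeys >= 0 && table_length > 0);
    assert(hash != NULL && out != NULL);

    int *len = calloc((size_t)table_length, sizeof *len);
    if (len == NULL)
        return -1;

    out->table_length = table_length;
    out->nonempty_lists = 0;
    out->longest_list = 0;

    for (int i = 0; i < nkeys; i++) {
        unsigned int b = hash(keys[i], (unsigned int)table_length);
        if (len[b]++ == 0)
            out->nonempty_lists++;      /* first key in this bucket */
        if (len[b] > out->longest_list)
            out->longest_list = len[b];
    }
    free(len);
    return 0;
}
```

A longest list that is much larger than nkeys / table_length suggests that the hash function clusters keys and that lookups in those buckets approach a linear search.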
The C++ interface to this utility is defined in the header file zoltan_dd_cpp.h as the class Zoltan_DD. A single Zoltan_DD object represents a distributed directory.
A Fortran90 interface is not yet available.
Source code location: | Utilities/DDirectory |
C Function prototypes file: | Utilities/DDirectory/zoltan_dd.h |
C++ class definition: | Utilities/DDirectory/zoltan_dd_cpp.h |
Library name: | libzoltan_dd.a |
Other libraries used by this library: | libmpi.a, libzoltan_mem.a, libzoltan_comm.a |
Routines: | Zoltan_DD_Create: Allocates memory and initializes the directory. |
Data Structures:
struct Zoltan_DD_Struct: State & storage used by all DD routines. Users should not modify any internal values in this structure. Users should only pass the address of this structure to the other routines in this package. |
The Zoltan_DD_Struct must be passed to all other distributed directory routines. The MPI Comm argument designates the processors used for the distributed directory. The MPI Comm argument is duplicated and stored for later use.
The user can set the debug level argument of Zoltan_DD_Create to determine the module's response to multiple updates for any GID within one update cycle. If the argument is set to 0, multiple updates are not reported, and the last update determines the directory information. If the argument is set to 1, an error is returned if the multiple updates represent different owners for the same GID. If the debug level is 2, an error is returned and an error message is generated if multiple updates represent different owners for the same GID. If the level is 3, an error is returned and an error message is generated for any multiple update, even if the updates represent the same owner for a GID.
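The four debug levels can be summarized as a small decision function. This is only a restatement of the rules in the paragraph above under one assumption: at levels 1 and 2, multiple updates that agree on the owner are treated as not reported. The enum dup_action and the function on_multiple_update are hypothetical names and do not appear in the Zoltan source.

```c
#include <assert.h>

/* Possible responses to a second update of the same GID in one cycle. */
enum dup_action {
    DUP_SILENT,            /* accept quietly; the last update wins */
    DUP_ERROR,             /* return an error code only */
    DUP_ERROR_AND_MESSAGE  /* return an error code and print a message */
};

/* same_owner is non-zero when the repeated updates name the same owner. */
enum dup_action on_multiple_update(int debug_level, int same_owner)
{
    assert(debug_level >= 0 && debug_level <= 3);
    switch (debug_level) {
    case 0:
        return DUP_SILENT;
    case 1:
        return same_owner ? DUP_SILENT : DUP_ERROR;
    case 2:
        return same_owner ? DUP_SILENT : DUP_ERROR_AND_MESSAGE;
    default: /* level 3: every multiple update is reported */
        return DUP_ERROR_AND_MESSAGE;
    }
}
```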
Arguments: | |
dd | Structure maintains directory state and hash table. |
comm | MPI comm duplicated and stored specifying directory processors. |
num_gid_entries | Length of GID. |
num_lid_entries | Length of local ID or zero to ignore local IDs. |
user_length | Length of user defined data field (optional, may be zero). |
table_length | Length of hash table (zero forces default value). |
debug_level | Legal values range in [0,3]. Sets response to various error conditions where 3 is the most verbose. |
Returned Value: | |
int | Error code. |
In the C++ interface, the distributed directory is represented by a Zoltan_DD object. It is created when the Zoltan_DD constructor executes. There are two constructors. The first one listed above uses parameters to initialize the distributed directory. The second constructor does not, but it can subsequently be initialized with a call to Zoltan_DD::Create().
Arguments: | |
from | The existing directory structure which will be copied to the new one. |
Returned Value: | |
struct Zoltan_DD_Struct * | The newly created directory structure. |
Arguments: | |
to | A pointer to a pointer to the target structure. The structure will be destroyed and the pointer set to NULL before proceeding with the copy. |
from | A pointer to the source structure. The contents of this structure will be copied to the target structure. |
Returned Value: | |
int | Error code. |
Arguments: | |
dd | Directory structure to be deallocated. |
Returned Value: | |
void | NONE |
There is no explicit Destroy method in the C++ Zoltan_DD class. The object is deallocated when its destructor is called.
The user can set the debug level argument in Zoltan_DD_Create
to determine the module's response to multiple updates for any GID
within one update cycle.
Arguments: | |
dd | Distributed directory structure state information. |
gid | List of GIDs to update (in). |
lid | List of corresponding local IDs (optional) (in). |
user | List of corresponding user data (optional) (in). |
partition | List of corresponding partitions (optional) (in). |
count | Number of GIDs in update list. |
Returned Value: | |
int | Error code. |
Arguments: | |
dd | Distributed directory structure state information. |
gid | List of GIDs whose information is requested. |
lid | Corresponding list of local IDs (optional) (out). |
data | Corresponding list of user data (optional) (out). |
partition | Corresponding list of partitions (optional) (out). |
count | Count of GIDs in above list. |
owner | Corresponding list of data owners (out). |
Returned Value: | |
int | Error code. |
Arguments: | |
dd | Distributed directory structure state information. |
gid | List of GIDs to eliminate from the directory. |
count | Number of GIDs to be removed. |
Returned Value: | |
int | Error code. |
Experienced users may elect to create their own hash function based on
their knowledge of their GID naming scheme. The user's hash
function must have calling arguments compatible with Zoltan_Hash.
Suppose a user has defined a hash function, myhash, as
unsigned int myhash(ZOLTAN_ID_PTR gid, int length, unsigned int naverage)
{
    /* GID length assumed to be 1 ; naverage = total_GIDS/nproc */
    return *gid / naverage ;
}
Then the call to register this hash function is:
Zoltan_DD_Set_Hash_Fn (dd, myhash) ;
NOTE: This hash function might group the gid's directory information near the gid's owning processor's neighborhood, for an appropriate naming scheme.
Arguments: | |
dd | Distributed directory structure state information. |
hash | Name of user's hash function. |
Returned Value: | |
void | NONE |
Arguments: | |
dd | Distributed directory structure state information. |
Returned Value: | |
void | NONE |
Arguments: | |
dd | Distributed directory structure state information. |
size | Number of consecutive GIDs associated with a processor. |
Returned Value: | |
int | Error code. |
Arguments: | |
dd | Distributed directory structure state information. |
proc | List of processor IDs, one for each corresponding low, high pair. |
low | List of low GID limits corresponding to proc list. |
high | List of high GID limits corresponding to proc list. |
n | Number of elements in the above lists. Should be number of processors! |
Returned Value: | |
int | Error code. |
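The proc, low, high, and n arguments listed above describe a table of GID ranges, one per processor. A lookup over such a table might look like the sketch below. The structure gid_range, the function range_hash, the inclusive treatment of the high limit, and the modulo fallback for GIDs outside every range are all assumptions for illustration and are not taken from the Zoltan source.

```c
#include <assert.h>
#include <stddef.h>

/* One entry per processor: GIDs in [low, high] belong to proc
   (limits are assumed to be inclusive here). */
struct gid_range {
    unsigned int proc;
    unsigned int low;
    unsigned int high;
};

/* Hypothetical range "hash": return the processor whose range contains gid.
   GIDs outside every range fall back to gid % nproc (an assumed policy). */
unsigned int range_hash(const struct gid_range *ranges, int n,
                        unsigned int gid, unsigned int nproc)
{
    assert(ranges != NULL && n > 0 && nproc > 0);
    for (int i = 0; i < n; i++) {
        if (gid >= ranges[i].low && gid <= ranges[i].high)
            return ranges[i].proc;
    }
    return gid % nproc;
}
```

The linear scan is adequate when n equals the number of processors and is small; for large processor counts with sorted, non-overlapping ranges, a binary search would be the natural replacement.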
Arguments: | |
dd | Distributed directory structure state information. |
Returned Value: | |
int | Error code. |
User's Notes
Because Zoltan places no restrictions on the content or length of GIDs, hashing does not guarantee a balanced distribution of objects in the distributed directory. Note also, the worst case behavior of a hash table lookup is very bad (essentially becoming a linear search). Fortunately, the average behavior is very good! The user may specify their own hash function via Zoltan_DD_Set_Hash_Fn to improve performance.
This software module is built on top of the Zoltan Communications functions for efficiency. Improvements to the communications library will automatically benefit the distributed directory.
FUTURE:
The C99 capability for variable length arrays would significantly simplify many of these routines. (It eliminates the malloc/free calls for temporary storage, which helps prevent memory leaks.) Other C99 features may also improve code readability. The "inline" capability can potentially improve performance.
The distributed directory should be implemented via threads. However, MPI is not yet fully thread aware.