MOAB: Mesh Oriented datABase  (version 5.4.1)
moabDG.h
00001 /*! \page developerguide Developer's Guide
00002  
00003   \subpage dg-contents
00004  
00005   \subpage dg-figures
00006 
00007 */
00008 
00009 /*!  \page dg-figures List of Figures
00010 
00011     \ref figure1
00012 
00013     \ref figure2
00014 
00015     \ref figure3
00016 */
00017 
00018 
00019 /*!  \page dg-contents Table of Contents
00020 
00021   \ref sequence
00022 
00023   \ref manager
00024 
00025   \ref s-mesh
00026 
00027   \ref sets
00028 
00029   \ref impl-error-handling
00030 
00031     \ref dgfiveone
00032 
00033     \ref dgfivetwo
00034 
00035     \ref dgfivethree
00036 
00037     \ref dgfivefour
00038 
00039     \ref dgfivefive
00040 
00041     \ref dgfivesix
00042 
00043   \section sequence  1. EntitySequence & SequenceData
00044 
00045   \subsection figure1 Figure 1: EntitySequences For One SequenceData
00046   \image html figure1.jpg
00047 
00048   \ref dg-figures "List of Figures"
00049 
00050 The <I>SequenceData</I> class manages a set of arrays of per-entity values. Each
00051 <I>SequenceData</I> has a start and end handle denoting the block of entities for which
00052 the arrays contain data. The arrays managed by a <I>SequenceData</I> instance are
00053 divided into three groups:
00054 
00055 - Type-specific data (connectivity, coordinates, etc.): zero or more arrays.
00056 - Adjacency data: zero or one array.
00057 - Dense tag data: zero or more arrays.
00058 .
00059 
00060 The abstract <I>EntitySequence</I> class represents a non-strict subset of a <I>SequenceData</I>.
00061 It contains a pointer to a <I>SequenceData</I> and the start and end handles to indicate
00062 the subset of the referenced <I>SequenceData</I>. The <I>EntitySequence</I> class is
00063 used to represent the regions of valid (or allocated) handles in a <I>SequenceData</I>.
00064 A <I>SequenceData</I> is expected to be referenced by one or more <I>EntitySequence</I>
00065 instances.
00066 
00067 Initial <I>EntitySequence</I> and <I>SequenceData</I> pairs are typically created in one
00068 of two configurations. When reading from a file, a <I>SequenceData</I> will be created
00069 to represent all of a single type of entity contained in a file. As all entries in the <I>SequenceData</I> correspond to valid handles (entities read from the file), a single
00070 <I>EntitySequence</I> instance corresponding to the entire <I>SequenceData</I> is initially
00071 created. The second configuration arises when allocating a single entity. If no
00072 entities have been allocated yet, a new <I>SequenceData</I> must be created to store
00073 the entity data. It is created with a constant size (e.g. 4k entities). The new
00074 <I>EntitySequence</I> corresponds to only the first entity in the <I>SequenceData</I>: the
00075 one allocated entity. As subsequent entities are allocated, the <I>EntitySequence</I>
00076 is extended to cover more of the corresponding <I>SequenceData</I>.
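
The two initial configurations described above can be sketched as follows. This is a minimal illustration only: <I>Data</I>, <I>Seq</I>, <I>covering</I>, <I>first_of</I> and <I>append</I> are hypothetical stand-ins rather than the MOAB classes or their methods, and returning 0 for "no room left" is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstddef>

using Handle = unsigned long;

// Hypothetical stand-in for SequenceData: only the handle block, no arrays.
struct Data {
    Handle start;
    Handle end;  // inclusive
};

// Hypothetical stand-in for EntitySequence: a sub-range of one Data block.
struct Seq {
    const Data* data;
    Handle start;
    Handle end;  // inclusive

    // File-read configuration: the sequence covers the whole data block.
    static Seq covering(const Data& d) { return Seq{&d, d.start, d.end}; }

    // Single-entity configuration: only the first handle is allocated.
    static Seq first_of(const Data& d) { return Seq{&d, d.start, d.start}; }

    std::size_t size() const { return static_cast<std::size_t>(end - start + 1); }

    // Extend the sequence by one handle if the data block has room.
    // Returns the new handle, or 0 (assumed invalid here) if the block is full.
    Handle append() {
        assert(data->start <= start && end <= data->end);
        if (end == data->end) return 0;
        return ++end;
    }
};
```

For a block of 4096 handles, <I>Seq::first_of</I> yields a sequence of size one, and each call to <I>append</I> grows it by one handle until the block is exhausted.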
00077 
00078 Concrete subclasses of the <I>EntitySequence</I> class are responsible for representing
00079 specific types of entities using the array storage provided by the
00080 <I>SequenceData</I> class. They also handle allocating <I>SequenceData</I> instances with
00081 appropriate arrays for storing a particular type of entity. Each concrete subclass
00082 typically provides two constructors corresponding to the two initial allocation
00083 configurations described in the previous paragraph. <I>EntitySequence</I> implementations
00084 also provide a split method, which is a type of factory method. It
00085 modifies the called sequence and creates a new sequence such that the range of
00086 entities represented by the original sequence is split.
00087 
00088 The <I>VertexSequence</I> class provides an <I>EntitySequence</I> for storing vertex
00089 data. It references a SequenceData containing three arrays of doubles
00090 for storing the blocked vertex coordinate data. The <I>ElementSequence</I> class
00091 extends the <I>EntitySequence</I> interface with element-specific functionality. The
00092 <I>UnstructuredElemSeq</I> class is the concrete implementation of <I>ElementSequence</I>
00093 used to represent unstructured elements, polygons, and polyhedra. <I>MeshSetSequence</I>
00094 is the <I>EntitySequence</I> used for storing entity sets.
00095 
00096 Each <I>EntitySequence</I> implementation also provides an implementation of
00097 the <I>values_per_entity</I> method. This value is used to determine if an existing
00098 <I>SequenceData</I> that has available entities is suitable for storing a particular
00099 entity. For example, <I>UnstructuredElemSeq</I> returns the number of nodes per element
00100 from <I>values_per_entity</I>. When allocating a new element with a specific
00101 number of nodes, this value is used to determine if that element may be stored
00102 in a specific <I>SequenceData</I>. For vertices, this value is always zero. This could
00103 be changed to the number of coordinates per vertex, allowing representation of
00104 mixed-dimension data. However, API changes would be required to utilize such
00105 a feature. Sequences for which the corresponding data cannot be used to store
00106 new entities (e.g. structured mesh discussed in a later section) will return -1 or
00107 some other invalid value.
00108 
00109  \ref dg-contents "Top"
00110 
00111   \section manager 2. TypeSequenceManager & SequenceManager
00112 
00113 The <I>TypeSequenceManager</I> class maintains an organized set of <I>EntitySequence</I>
00114 instances and corresponding <I>SequenceData</I> instances. It is used to manage
00115 all such instances for entities of a single <I>EntityType</I>. <I>TypeSequenceManager</I>
00116 enforces the following four rules on its contained data:
00117 
00118 -# No two <I>SequenceData</I> instances may overlap.  
00119 -# No two <I>EntitySequence</I> instances may overlap.
00120 -# Every <I>EntitySequence</I> must be a subset of a <I>SequenceData</I>.
00121 -# Any pair of <I>EntitySequence</I> instances referencing the same <I>SequenceData</I> must be separated by at least one unallocated handle.
00122 .
00123 
00124   \subsection figure2 Figure 2: SequenceManager and Related Classes
00125   \image html figure2.jpg
00126 
00127   \ref dg-figures "List of Figures"
00128 
00129 The first three rules are required for the validity of the data model. The
00130 fourth rule avoids unnecessary inefficiency. It is implemented by merging such
00131 adjacent sequences. In some cases, other classes (e.g. <I>SequenceManager</I>) can
00132 modify an <I>EntitySequence</I> such that the fourth rule is violated. In such cases,
00133 the <I>TypeSequenceManager::notify_prepended</I> or <I>TypeSequenceManager::notify_appended</I>
00134 method must be called to maintain the integrity of the data<sup>1</sup>. The above rules
00135 (including the fourth) are assumed in many other methods of the <I>TypeSequenceManager</I>
00136 class, such that those methods will fail or behave unexpectedly if the managed
00137 data does not conform to the rules.
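
The merging implied by the fourth rule can be sketched as below. This is an illustration under stated assumptions, not the <I>TypeSequenceManager</I> implementation: <I>Seq</I>, <I>data_id</I> and <I>merge_adjacent</I> are hypothetical names, and the input is assumed to be sorted by handle and already free of overlaps (rule two).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Handle = unsigned long;

// Hypothetical stand-in: data_id identifies the SequenceData a sequence refers to.
struct Seq {
    int data_id;
    Handle start;
    Handle end;  // inclusive
};

// Merges neighbouring sequences that share a data block and have no
// unallocated handle between them, so that rule four holds on the result.
std::vector<Seq> merge_adjacent(const std::vector<Seq>& sorted) {
    std::vector<Seq> out;
    for (const Seq& s : sorted) {
        assert(s.start <= s.end);
        assert(out.empty() || out.back().end < s.start);  // rule two: no overlap
        const bool adjacent = !out.empty() && out.back().data_id == s.data_id &&
                              out.back().end + 1 == s.start;
        if (adjacent)
            out.back().end = s.end;  // grow the previous sequence
        else
            out.push_back(s);
    }
    return out;
}
```

Sequences in different data blocks are never merged, even when their handles are contiguous, because each sequence must remain a subset of a single <I>SequenceData</I> (rule three).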
00138 
00139 <I>TypeSequenceManager</I> contains three principal data structures. The first is
00140 a <I>std::set</I> of <I>EntitySequence</I> pointers sorted using a custom comparison
00141 operator that queries the start and end handles of the referenced sequences. The
00142 comparison operation is defined as: <I>a->end_handle() < b->start_handle()</I>.
00143 This method of comparison has the advantage that a sequence corresponding to
00144 a specific handle can be located by searching the set for a “sequence” beginning
00145 and ending with the search value. The <I>lower_bound</I> and <I>find</I> methods provided
00146 by the standard library are guaranteed to return the sequence, if it exists. Using
00147 such a comparison operator will result in undefined behavior if the set contains
00148 overlapping sequences. This is acceptable, as rule two above prohibits such
00149 a configuration. However, some care must be taken in writing and modifying
00150 methods in <I>TypeSequenceManager</I> so as to avoid having overlapping sequences
00151 as a transitory state of some operation.
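
The lookup trick described above can be sketched with a plain <I>std::set</I>. The names <I>Seq</I>, <I>SeqBefore</I>, <I>insert_sequence</I> and <I>find_sequence</I> are hypothetical, and the struct only carries the two handles; the sketch assumes, as rule two requires, that the stored ranges never overlap.

```cpp
#include <cassert>
#include <set>

using Handle = unsigned long;

// Hypothetical stand-in for an EntitySequence: an inclusive handle range.
struct Seq {
    Handle start;
    Handle end;
};

// 'a' orders before 'b' only if a lies entirely before b. A range that
// overlaps another is neither before nor after it, so the set treats the
// two as equivalent; that equivalence is what makes the lookup work.
struct SeqBefore {
    bool operator()(const Seq* a, const Seq* b) const { return a->end < b->start; }
};

using SeqSet = std::set<const Seq*, SeqBefore>;

// Adds a sequence; the caller must ensure it overlaps nothing already stored.
void insert_sequence(SeqSet& seqs, const Seq* s) {
    assert(s->start <= s->end);
    seqs.insert(s);
}

// Returns the sequence containing 'h', or nullptr if there is none.
const Seq* find_sequence(const SeqSet& seqs, Handle h) {
    Seq probe{h, h};  // a "sequence" beginning and ending with the search value
    SeqSet::const_iterator it = seqs.find(&probe);
    return it == seqs.end() ? nullptr : *it;
}
```

A handle that falls in a gap between two sequences compares as "after" the first and "before" the second, so <I>find</I> reports no match.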
00152 
00153 The second important data member of <I>TypeSequenceManager</I> is a pointer
00154 to the last referenced <I>EntitySequence</I>. This “cached” value is used to speed up
00155 searches by entity handle. This pointer is never null unless the set of sequences is empty.
00156 This rule is maintained to avoid unnecessary branches in fast query paths. In
00157 cases where the last referenced sequence is deleted, <I>TypeSequenceManager</I> will
00158 typically assign an arbitrary sequence (e.g. the first one) to the last referenced
00159 pointer. (Note: this cached value might cause problems for threading models; it
00160 should probably be different for each thread.)
00161 
00162 The third data member of <I>TypeSequenceManager</I> is a <I>std::set</I> of <I>SequenceData</I>
00163 instances that are not completely covered by an <I>EntitySequence</I> instance<sup>2</sup>.
00164 This set is searched when allocating new handles. <I>TypeSequenceManager</I> also
00165 embeds in each <I>SequenceData</I> instance a reference to the first corresponding
00166 <I>EntitySequence</I> so that it may be located quickly from only the <I>SequenceData</I>
00167 pointer.
00168 
00169 The <I>SequenceManager</I> class contains an array of <I>TypeSequenceManager</I> instances,
00170 one for each <I>EntityType</I>. It also provides all type-specific operations
00171 such as allocating the correct <I>EntitySequence</I> subtype for a given <I>EntityType</I>.
00172 
00173 <sup>1</sup>This source of potential error can be eliminated with changes to the entity set representation.
00174 This is discussed in a later section.
00175 
00176 <sup>2</sup>Given rule four for the data managed by a <I>TypeSequenceManager</I>, any
00177 <I>SequenceData</I> for which all handles are allocated will be referenced by exactly one <I>EntitySequence</I>.
00178 
00179   \ref dg-contents "Top"
00180 
00181  \section s-mesh 3. Structured Mesh
00182 
00183 Structured mesh storage is implemented using subclasses of <I>SequenceData</I>:
00184 <I>ScdElementData</I> and <I>ScdVertexData</I>. The <I>StructuredElementSeq</I> class is
00185 used to access the structured element connectivity. A standard <I>VertexSequence</I>
00186 instance is used to access the ScdVertexData because the vertex data storage
00187 is the same as for unstructured mesh.
00188 
00189   \ref dg-contents "Top"
00190 
00191   \section sets 4. Entity Sets
00192 
00193 - MeshSetSequence
00194 
00195 The <I>MeshSetSequence</I> class is the same as most other subclasses of <I>EntitySequence</I>
00196 in that it utilizes SequenceData to store its data. A single array in the <I>SequenceData</I>
00197 is used to store instances of the MeshSet class, one per allocated <I>EntityHandle</I>.
00198 <I>SequenceData</I> allocates all of its managed arrays using malloc and free as
00199 simple arrays of bytes. <I>MeshSetSequence</I> does in-place construction and
00200 destruction of <I>MeshSet</I> instances within that array. This is similar to what is
00201 done by <I>std::vector</I> and other container classes that may own more storage
00202 than is required at a given time for contained objects.
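
In-place construction inside a malloc-allocated byte array can be sketched as follows. <I>Item</I> and <I>RawArray</I> are hypothetical stand-ins for <I>MeshSet</I> and the array owned by a <I>SequenceData</I>; the instance counter exists only to make the construct/destroy pairing visible, and the sketch assumes a non-zero capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for MeshSet; 'live' counts constructed instances.
struct Item {
    static int live;
    int value;
    explicit Item(int v) : value(v) { ++live; }
    ~Item() { --live; }
};
int Item::live = 0;

// Raw storage for 'capacity' Items. Allocation does not construct any Item.
struct RawArray {
    unsigned char* bytes;
    std::size_t capacity;

    explicit RawArray(std::size_t n)
        : bytes(static_cast<unsigned char*>(std::malloc(n * sizeof(Item)))), capacity(n) {
        assert(n > 0 && bytes != nullptr);
    }
    ~RawArray() { std::free(bytes); }  // caller destroys live Items first
    RawArray(const RawArray&) = delete;
    RawArray& operator=(const RawArray&) = delete;

    // Address of slot i, whether or not an Item currently lives there.
    Item* slot(std::size_t i) {
        assert(i < capacity);
        return reinterpret_cast<Item*>(bytes + i * sizeof(Item));
    }

    // Placement new: run the constructor on storage that is already allocated.
    Item* construct(std::size_t i, int v) { return new (slot(i)) Item(v); }

    // Explicit destructor call: the object ends, the storage stays allocated.
    void destroy(std::size_t i) { slot(i)->~Item(); }
};
```

Only slots that correspond to allocated handles hold constructed objects; the remaining bytes stay uninitialized until a handle is allocated for them.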
00203 
00204 - MeshSet
00205 
00206   \subsection figure3 Figure 3: SequenceManager and Related Classes
00207   \image html figure3.jpg
00208 
00209   \ref dg-figures "List of Figures"
00210 
00211 The <I>MeshSet</I> class is used to represent a single entity set instance in MOAB.
00212 The class is optimized to minimize storage; further possible improvements in
00213 storage size are discussed later.
00214 
00215 Figure 3 shows the memory layout of an instance of the <I>MeshSet</I> class.
00216 The flags member holds the set creation bit flags: <I>MESHSET_TRACK_OWNER</I>,
00217 <I>MESHSET_SET</I>, and <I>MESHSET_ORDERED</I>. The presence of the <I>MESHSET_TRACK_OWNER</I> flag
00218 indicates that reverse links from the contained entities back to the owning set
00219 should be maintained in the adjacency list of each entity. The <I>MESHSET_SET</I>
00220 and <I>MESHSET_ORDERED</I> bits are mutually exclusive, and as such most code only
00221 tests for <I>MESHSET_ORDERED</I>, meaning that in practice the <I>MESHSET_SET</I> bit is
00222 ignored. <I>MESHSET_ORDERED</I> indicates that the set may contain duplicate handles
00223 and that the order that the handles are added to the set should be preserved.
00224 In practice, such sets are stored as a simple list of handles. <I>MESHSET_SET</I> (or in
00225 practice, the lack of <I>MESHSET_ORDERED</I>) indicates that the order of the handles
00226 need not be preserved and that the set may not contain duplicate handles. Such
00227 sets are stored in a sorted range-compacted format similar to that of the Range
00228 class.
00229 
00230 The memory for the contents, parent, and child lists is handled in
00231 the same way for each list. The data in the class is composed of a 2-bit ‘size’ field and two
00232 values, where the two values may either be two handles or two pointers. The size
00233 bit-fields are grouped together to reduce the required amount of memory. If the
00234 numerical value of the 2-bit size field is 0 then the corresponding list is empty.
00235 If the 2-bit size field is either 1 or 2, then the contents of the corresponding list
00236 are stored directly in the corresponding two data fields of the MeshSet object.
00237 If the 2-bit size field has a value of 3 (11 binary), then the corresponding two
00238 data fields store the begin and end pointers of an external array of handles.
00239 The number of handles in the external array can be obtained by taking the
00240 difference of the start and end pointers. Note that unlike <I>std::vector</I>, we
00241 do not store both an allocated and used size. We store only the ‘used’ size
00242 and call std::realloc whenever the used size is modified, thus we rely on the
00243 std::malloc implementation in the standard C library to track ‘allocated’ size
00244 for us. In practice this performs well but does not return memory to the ‘system’
00245 when lists shrink (unless they shrink to zero). This overall scheme could exhibit
00246 poor performance if the size of one of the data lists in the set frequently changes
00247 between less than two and more than two handles, as this will result in frequent
00248 releasing and re-allocating of the memory for the corresponding array.
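
The storage scheme for one such list can be sketched as below. <I>CompactList</I> and its members are hypothetical names and cover a single list with append only; the actual <I>MeshSet</I> groups the three size fields together and also supports removal. As in the text, only the used size is kept and <I>std::realloc</I> is called on every growth.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

using Handle = unsigned long;

// Hypothetical single list. 'count' plays the role of the 2-bit size field:
// 0..2 means that many handles are stored inline in hnd[]; 3 means the
// handles live in an external array spanning [ptr[0], ptr[1]).
struct CompactList {
    unsigned count : 2;
    union {
        Handle hnd[2];
        Handle* ptr[2];
    };

    CompactList() : count(0) { hnd[0] = 0; hnd[1] = 0; }
    ~CompactList() { if (count == 3) std::free(ptr[0]); }
    CompactList(const CompactList&) = delete;
    CompactList& operator=(const CompactList&) = delete;

    std::size_t size() const {
        return count < 3 ? count : static_cast<std::size_t>(ptr[1] - ptr[0]);
    }
    const Handle* data() const { return count < 3 ? hnd : ptr[0]; }

    void push_back(Handle h) {
        if (count < 2) {  // still fits inline
            hnd[count] = h;
            count = count + 1;
            return;
        }
        const std::size_t n = size();
        Handle* arr = nullptr;
        if (count == 2) {  // third handle: spill the inline values to the heap
            arr = static_cast<Handle*>(std::malloc((n + 1) * sizeof(Handle)));
            assert(arr != nullptr);
            arr[0] = hnd[0];
            arr[1] = hnd[1];
        } else {  // already external: grow by exactly one element
            arr = static_cast<Handle*>(std::realloc(ptr[0], (n + 1) * sizeof(Handle)));
            assert(arr != nullptr);
        }
        arr[n] = h;
        ptr[0] = arr;
        ptr[1] = arr + n + 1;
        count = 3;
    }
};
```

A list that oscillates around two handles would repeatedly cross the inline/external boundary, which is the allocation churn the paragraph above warns about.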
00249 
00250 If the <I>MESHSET_ORDERED</I> flag is not present, then the set contents list (parent
00251 and child lists are unaffected) is stored in a range-compacted format. In this
00252 format the number of handles stored in the array is always a multiple of two.
00253 Each consecutive pair of handles indicates the start and end, inclusive, of a range
00254 of handles contained in the set. All such handle range pairs are stored in sorted
00255 order and do not overlap. Nor is the end handle of one range ever one less than
00256 the start handle of the next. All such ‘adjacent’ range pairs are merged into a
00257 single pair. The code for insertion and removal of handles from range-formatted
00258 set content lists is fairly complex. The implementation will guarantee that a
00259 given call to insert entities into a range or remove entities from a range is never
00260 worse than O(ln n) + O(m + n), where ‘n’ is the number of handles to insert
00261 and ‘m’ is the number of handles already contained in the set. So it is generally
00262 much more efficient to build Ranges of handles to insert (and remove) and call
00263 MOAB to insert (or remove) the entire list at once rather than making many
00264 calls to insert (or remove) one or a few handles from the contents of a set.
00265 The set storage could probably be further minimized by allowing up to six
00266 handles in one of the lists to be elided. That is, as there are six potential ‘slots’
00267 in the MeshSet object then if two of the lists are empty it should be possible to
00268 store up to six values of the remaining list directly in the MeshSet object.
00269 However, the additional runtime cost of such complexity could easily outweigh
00270 any storage advantage. Further investigation into this has not been done because
00271 the primary motivation for the storage optimization was to support binary trees.
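
The range-compacted format described above can be sketched with two small helpers. <I>compact_ranges</I> and <I>contains</I> are hypothetical names, the pairs are held in a <I>std::vector</I> rather than in the set's own array, and the sketch covers only building and querying, not the in-place insertion and removal that the text calls fairly complex.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Handle = unsigned long;

// Converts sorted handles (duplicates allowed) into inclusive [start, end]
// pairs. Contiguous runs are merged, so the end of one pair is never one
// less than the start of the next.
std::vector<Handle> compact_ranges(const std::vector<Handle>& sorted) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    std::vector<Handle> pairs;
    for (Handle h : sorted) {
        if (!pairs.empty() && h <= pairs.back() + 1) {
            pairs.back() = h;  // duplicate or adjacent: extend the current range
        } else {
            pairs.push_back(h);  // start of a new range
            pairs.push_back(h);  // end of that range (one handle so far)
        }
    }
    return pairs;
}

// Membership test by binary search over the flattened pair array.
bool contains(const std::vector<Handle>& pairs, Handle h) {
    assert(pairs.size() % 2 == 0);
    std::vector<Handle>::const_iterator it = std::lower_bound(pairs.begin(), pairs.end(), h);
    if (it == pairs.end()) return false;
    const std::size_t i = static_cast<std::size_t>(it - pairs.begin());
    // Odd index: 'it' is the end of a range that h falls inside.
    // Even index: 'it' is the start of a range, so h must equal it.
    return (i % 2 == 1) || *it == h;
}
```

Seven handles forming three runs are stored as six values here, and the saving grows with the length of each run.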
00272 
00273 Another possible optimization of storage would be to remove the <I>MeshSet</I>
00274 object entirely and instead store the data in a ‘blocked’ format. The corresponding
00275 <I>SequenceData</I> would contain four arrays: flags, parents, children, and
00276 contents instead of a single array of <I>MeshSet</I> objects. If this were done then
00277 no storage need ever be allocated for parent or child links if none of the sets
00278 in a <I>SequenceData</I> has parent or child links. The effectiveness of the storage
00279 reduction would depend greatly on how sets get grouped into <I>SequenceDatas</I>.
00280 This alternate storage scheme might also allow for better cache utilization as it
00281 would group like data together. It is often the case that application code that
00282 is querying the contents of one set will query the contents of many but never
00283 query the parents or children of any set. Or that an application will query only
00284 parent or child links of a set without ever querying other set properties. The
00285 downside of this solution is that it makes the implementation a little less modular
00286 and maintainable because the existing logic contained in the <I>MeshSet</I> class
00287 would need to be spread throughout the <I>MeshSetSequence</I> class.
00288 
00289   \ref dg-contents "Top"
00290 
00291  \section impl-error-handling 5. Implementation of Error Handling
00292 
00293 When a certain error occurs, a MOAB routine can return an enum type ErrorCode (defined in src/moab/Types.hpp)
00294 to its callers. Since MOAB 4.8, the existing error handling model has been completely redesigned to better set
00295 and check errors.
00296 
00297  \subsection dgfiveone 5.1. Existing Error Handling Model
00298 
00299 To keep track of detailed information about errors, a class Error (defined in src/moab/Error.hpp) is used to
00300 store corresponding error messages. Some locally defined macros call Error::set_last_error() to report a new
00301 error message, before a non-success error code is returned. At any time, the user may call Core::get_last_error()
00302 to retrieve the latest error message from the Error class instance held by MOAB.
00303 
00304 Limitations:
00305 - The Error class can only store the last error message that is being set. When an error originates from a lower
00306 level in a call hierarchy, upper level callers might continue to report more error messages. As a result, previously
00307 reported error messages get overwritten and they will no longer be available to the user.
00308 - There is no useful stack trace for the user to find out where an error first occurs, making MOAB difficult to debug.
00309 
00310  \subsection dgfivetwo 5.2. Enhanced Error Handling Model
00311 
00312 The error handling model of PETSc (http://www.mcs.anl.gov/petsc/) has good support for stack traces, and our design
00313 borrows many ideas from it. The new features of the enhanced model include:
00314 - Lightweight and easy to use with a macro-based interface
00315 - Generate a stack trace starting from the first non-success error
00316 - Support C++ style streams to build up error messages rather than C style sprintf:
00317 \code
00318 MB_SET_ERR(MB_FAILURE, "Failed to iterate over tag on " << NUM_VTX << " vertices");
00319 \endcode
00320 - Have preprocessor-like functionality such that we can do something like:
00321 \code
00322 result = MOABRoutine(...); MB_CHK_SET_ERR(result, "Error message to set if result is not MB_SUCCESS");
00323 \endcode
00324 - Define and handle globally fatal errors or per-processor specific errors.
00325 
00326 The public include file for error handling is src/moab/ErrorHandler.hpp; the source code for the error
00327 handling is in src/ErrorHandler.cpp.
00328 
00329 \subsection dgfivethree 5.3. Error Handler
00330 
00331 The error handling function MBError() only calls one default error handler, MBTraceBackErrorHandler(), which tries to print
00332 out a stack trace. In the future, we need to provide a user callback that is invoked before a complete abort: something
00333 like a UserTeardown function pointer with a context, so that the user can destroy and free essential handles before
00334 an MPI abort.
00335 
00336 The arguments to MBTraceBackErrorHandler() are the line number where the error occurred, the function in which the error was detected,
00337 the file in which the error was detected, the source directory, the error message, and the error type indicating whether the
00338 error message should be printed.
00339 \code
00340 ErrorCode MBTraceBackErrorHandler(int line, const char* func, const char* file, const char* dir, const char* err_msg, ErrorType err_type);
00341 \endcode
00342 This handler will print out a line of stack trace, indicating line number, function name, directory and file name. If MB_ERROR_TYPE_EXISTING
00343 is passed as the error type, the error message will not be printed.
00344 
00345 \subsection dgfivefour 5.4. Simplified Interface
00346 
00347 The simplified C/C++ macro-based interface consists of the following three basic macros:
00348 \code
00349 MB_SET_ERR(Error code, "Error message");
00350 MB_CHK_ERR(Error code);
00351 MB_CHK_SET_ERR(Error code, "Error message");
00352 \endcode
00353 
00354 The macro MB_SET_ERR(err_code, err_msg) is given by
00355 \code
00356 std::ostringstream err_ostr;
00357 err_ostr << err_msg;
00358 return MBError(__LINE__, __func__, __FILENAME__, __SDIR__, err_code, err_ostr.str().c_str(), MB_ERROR_TYPE_NEW_LOCAL);
00359 \endcode
00360 It calls the error handler with the current function name and location: line number, file and directory, plus an error code,
00361 an error message and an error type. With an embedded std::ostringstream object, it supports C++ style streams to build up error
00362 messages. The error type is MB_ERROR_TYPE_NEW_LOCAL/MB_ERROR_TYPE_NEW_GLOBAL on detection of the initial error and MB_ERROR_TYPE_EXISTING
00363 for any additional calls. This is so that the detailed error information is only printed once instead of for all levels of returned errors.
00364 
00365 The macro MB_CHK_ERR(err_code) is defined by
00366 \code
00367 if (MB_SUCCESS != err_code)
00368   return MBError(__LINE__, __func__, __FILENAME__, __SDIR__, err_code, "", MB_ERROR_TYPE_EXISTING);
00369 \endcode
00370 It checks the error code and, if it is not MB_SUCCESS, calls the error handler with error type MB_ERROR_TYPE_EXISTING and returns.
00371 
00372 The MB_CHK_SET_ERR(err_code, err_msg) is defined by
00373 \code
00374 if (MB_SUCCESS != err_code)
00375   MB_SET_ERR(err_code, err_msg);
00376 \endcode
00377 It checks the error code and, if it is not MB_SUCCESS, calls MB_SET_ERR() to set a new error.
00378 
00379 To obtain correct line numbers in the stack trace, we recommend putting MB_CHK_ERR() and MB_CHK_SET_ERR() on the same line as a MOAB routine.
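
How a new error and its propagation build up a trace can be sketched with simplified stand-ins. Everything prefixed with SK_ or sk_, the <I>g_trace</I> vector, and the two example routines are hypothetical; they are not the definitions in src/moab/ErrorHandler.hpp. The stand-in handler records one line per stack level in a vector instead of printing it, and attaches the message only at the point of origin.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical error codes for this sketch only.
enum SketchCode { SK_SUCCESS, SK_FAILURE };

// One entry per stack level that saw the error, innermost first.
std::vector<std::string> g_trace;

// Stand-in handler: records the location, plus the message for a new error.
SketchCode sk_error(int line, const char* func, SketchCode code,
                    const std::string& msg, bool is_new) {
    assert(func != nullptr);
    std::ostringstream os;
    os << func << "() line " << line;
    if (is_new) os << ": " << msg;
    g_trace.push_back(os.str());
    return code;
}

// Set a new error; 'msg' may be a chain of stream insertions.
#define SK_SET_ERR(code, msg)                                             \
    do {                                                                  \
        std::ostringstream sk_ostr;                                       \
        sk_ostr << msg;                                                   \
        return sk_error(__LINE__, __func__, (code), sk_ostr.str(), true); \
    } while (false)

// Propagate an existing error without repeating its message.
#define SK_CHK_ERR(code)                                            \
    do {                                                            \
        if (SK_SUCCESS != (code))                                   \
            return sk_error(__LINE__, __func__, (code), "", false); \
    } while (false)

// Lower level routine: detects the error and sets it.
SketchCode read_tag(int num_vtx) {
    if (num_vtx < 0)
        SK_SET_ERR(SK_FAILURE, "Failed to iterate over tag on " << num_vtx << " vertices");
    return SK_SUCCESS;
}

// Upper level routine: check placed on the same line as the call.
SketchCode load_mesh(int num_vtx) {
    SketchCode rval = read_tag(num_vtx); SK_CHK_ERR(rval);
    return SK_SUCCESS;
}
```

Calling <I>load_mesh</I> with a negative count leaves two entries in <I>g_trace</I>: the first carries the message from <I>read_tag</I>, the second only the location in <I>load_mesh</I>.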
00380 
00381 In addition to the basic macros mentioned above, there are some variations, such as (for more information, refer to src/moab/ErrorHandler.hpp):
00382 - MB_SET_GLB_ERR() to set a globally fatal error (for all processors)
00383 - MB_SET_ERR_RET() for functions that return void type
00384 - MB_SET_ERR_RET_VAL() for functions that return any data type
00385 - MB_SET_ERR_CONT() to continue execution instead of returning from current function
00386 These macros should be used appropriately in MOAB source code depending on the context.
00387 
00388 \subsection dgfivefive 5.5. Embedded Parallel Functionality
00389 
00390 We define a global MPI rank with which to prefix the output, as most systems have mechanisms for separating output by rank anyway.
00391 For the error handler, we can pass error type MB_ERROR_TYPE_NEW_GLOBAL for globally fatal errors and MB_ERROR_TYPE_NEW_LOCAL for
00392 per-processor relevant errors.
00393 
00394 Note that if the error handler uses std::cout to print error messages and stack traces on each processor, the output can become messy.
00395 This is a known MPI issue with std::cout, and the existing DebugOutput class has solved this issue with buffered lines. A new class
00396 ErrorOutput (implemented similarly to DebugOutput) is used by the error handler to print each line prefixed with the MPI rank.
00397 
00398 \subsection dgfivesix 5.6. Handle Non-error Conditions
00399 
00400 Note that ErrorCode is sometimes used to return a non-error condition (some internal error code that can be handled, or even expected,
00401 e.g. MB_TAG_NOT_FOUND). Therefore, MB_SET_ERR() should be appropriately placed to report an error to the caller. Before it is used, we need to
00402 carefully decide whether that error is intentional. For example, a lower level MOAB routine that could return MB_TAG_NOT_FOUND should probably not
00403 set an error on it, since the caller might expect to get that error code. In this case, the lower level routine just returns MB_TAG_NOT_FOUND as a
00404 condition, and no error is set. It is then up to the upper level callers to decide whether it should be a true error or not.
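
The division of responsibility described above can be sketched as follows. The enum, the <I>TagTable</I> map, and the three functions are hypothetical and do not reflect the MOAB tag API; the point is only that the lower level routine returns the condition silently while each caller decides what it means.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical codes and tag storage for this sketch only.
enum SketchCode { SK_SUCCESS, SK_TAG_NOT_FOUND, SK_FAILURE };
using TagTable = std::map<std::string, int>;

// Lower level: a missing tag is returned as a condition; no error is set.
SketchCode get_tag(const TagTable& tags, const std::string& name, int& value) {
    TagTable::const_iterator it = tags.find(name);
    if (it == tags.end()) return SK_TAG_NOT_FOUND;
    value = it->second;
    return SK_SUCCESS;
}

// Caller for which absence is expected: fall back to a default, no error.
int get_tag_or_default(const TagTable& tags, const std::string& name, int fallback) {
    int value = fallback;
    SketchCode rval = get_tag(tags, name, value);
    assert(rval == SK_SUCCESS || rval == SK_TAG_NOT_FOUND);
    (void)rval;  // silence unused-variable warnings when asserts are disabled
    return value;
}

// Caller for which absence is a true error: this is where an error message
// would be set (the place an MB_SET_ERR-style macro belongs).
SketchCode require_tag(const TagTable& tags, const std::string& name, int& value,
                       std::string& err_msg) {
    SketchCode rval = get_tag(tags, name, value);
    if (rval == SK_TAG_NOT_FOUND) {
        err_msg = "Required tag '" + name + "' not found";
        return SK_FAILURE;
    }
    return rval;
}
```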
00405 
00406   \ref dg-contents "Top"
00407 */
00408 