/*
 * ParCommGraph.hpp
 *
 * Used to set up communication between 2 distributed meshes, in which one mesh was migrated
 * from the other (one example is an atmosphere mesh migrated to coupler PEs).
 *
 * There are 3 communicators in play, one for each mesh, and one for the joint
 * communicator that spans both sets of processes; to send mesh or tag data we use the
 * joint communicator, with nonblocking MPI_Isend sends and blocking MPI_Recv receives.
 *
 * Several methods are available to migrate meshes: trivial, using a graph partitioner (Zoltan
 * PHG), and using a geometric partitioner (Zoltan RCB).
 *
 * The two sides are represented by their MPI groups, not by their communicators, because
 * the groups are always defined, irrespective of which tasks they are on. Communicators can be
 * MPI_COMM_NULL, while MPI groups are always defined.
 *
 * Some of the methods in here are executed over the sender communicator, some over the
 * receiver communicator. The two can switch places: what was the sender becomes the receiver and
 * vice versa.
 *
 * The name "graph" is in the sense of a bipartite graph, in which we can separate sender and
 * receiver tasks.
 *
 * The info stored in the ParCommGraph helps in migrating fields (MOAB tags) from component to the
 * coupler and back.
 *
 * So initially the ParCommGraph assists in mesh migration (from component to coupler), and then
 * it is used to migrate tag data from component to coupler and back from coupler to component.
 *
 * The same class is used after intersection (which is done on the coupler PEs between 2 different
 * component migrated meshes), where it alters the communication pattern between the original
 * component PEs and the coupler PEs.
 *
 * We added a new way to send tags between 2 models; the first application of the new method is to
 * send a tag from the atm dynamics model (spectral elements, with np x np tags defined on each
 * element, according to the GLOBAL_DOFS tag associated to each element) towards the atm physics
 * model, which is just a point cloud of vertices distributed differently to the physics model PEs;
 * matching is done using the GLOBAL_ID tag on vertices. Right now, we assume that the models are
 * on different PEs, but the joint communicator covers both, and that the ids of the tasks are with
 * respect to the joint communicator.
 *
 */
#ifndef SRC_PARALLEL_MOAB_PARCOMMGRAPH_HPP_
#define SRC_PARALLEL_MOAB_PARCOMMGRAPH_HPP_

#include "moab_mpi.h"
#include "moab/Interface.hpp"
#include "moab/ParallelComm.hpp"
#include <map>
#include <set>
#include <string>
#include <vector>

namespace moab
{

class ParCommGraph
{
  public:
    enum TypeGraph
    {
        INITIAL_MIGRATE,
        COVERAGE,
        DOF_BASED
    };
    virtual ~ParCommGraph();

    /**
     * \brief collective constructor; will be called on all sender tasks and receiver tasks
     * \param[in] joincomm joint MPI communicator that covers both sender and receiver MPI groups
     * \param[in] group1   MPI group formed with the sender tasks (the sender usually loads the mesh
     *                     in a migrate scenario)
     * \param[in] group2   MPI group formed with the receiver tasks (the receiver usually receives
     *                     the mesh in a migrate scenario)
     * \param[in] coid1    sender component unique identifier in a coupled application (in climate
     *                     simulations it could be the Atmosphere Comp id = 5, or the Ocean Comp id = 17)
     * \param[in] coid2    receiver component unique identifier in a coupled application (it is
     *                     usually the coupler, 2, in E3SM)
     *
     * This graph is formed on the sender and receiver tasks, and, in principle, holds info
     * about how the local entities are distributed on the other side.
     *
     * Its major role is to help in the migration of data, from component to the coupler and
     * vice-versa; each local entity has a corresponding task (or tasks) on the other side, to which
     * the data needs to be sent.
     *
     * Important data stored in the ParCommGraph, immediately after it is created:
     *   - all sender and receiver task ids, with respect to the joint communicator
     *   - local rank in the sender and receiver group (-1 if not part of the respective group)
     *   - rank in the joint communicator (starting from 0)
     */
    ParCommGraph( MPI_Comm joincomm, MPI_Group group1, MPI_Group group2, int coid1, int coid2 );
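
    /* Usage sketch (illustrative only, not part of the class): one possible way to build the two
       MPI groups and construct a ParCommGraph on every task of the joint communicator. The rank
       lists and the component ids (5 for the atmosphere component, 2 for the coupler, as in E3SM)
       are placeholder values.

         MPI_Comm jcomm = MPI_COMM_WORLD;   // joint communicator covering both components
         MPI_Group jgroup, atmGroup, cplGroup;
         MPI_Comm_group( jcomm, &jgroup );

         int atmRanks[2] = { 0, 1 };        // hypothetical sender (component) tasks
         int cplRanks[2] = { 2, 3 };        // hypothetical receiver (coupler) tasks
         MPI_Group_incl( jgroup, 2, atmRanks, &atmGroup );
         MPI_Group_incl( jgroup, 2, cplRanks, &cplGroup );

         // collective over the joint communicator; 5 and 2 are example component ids
         ParCommGraph* cgraph = new ParCommGraph( jcomm, atmGroup, cplGroup, 5, 2 );
    */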

    /**
     * \brief copy constructor will copy only the senders, receivers, compid1, etc
     */
    ParCommGraph( const ParCommGraph& );

    /**
       \brief Based on the number of elements on each task in group 1, compute a trivial partition
       for group 2

       <B>Operations:</B> called on every receiver task; decides how all elements are distributed

       Note: establishes how many elements are sent from each task in group 1 to tasks in group 2.
       This call is usually made on a root / master process, and constructs local maps that are
       member data, which contain the communication graph in both directions, as well as the number
       of elements migrated/exchanged between each sender/receiver pair.

       \param[in]  numElemsPerTaskInGroup1 (std::vector<int> &)  number of elements on each sender
       task
     */

    ErrorCode compute_trivial_partition( std::vector< int >& numElemsPerTaskInGroup1 );
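
    /* Illustrative sketch (not the class's actual algorithm): one way a "trivial" partition can be
       computed, splitting the global element count as evenly as possible over the receiver tasks,
       so that each receiver gets a contiguous chunk of the senders' elements. All names and numbers
       below are hypothetical.

         std::vector< int > elemsPerSender = { 40, 35, 25 };  // elements on sender tasks 0, 1, 2
         int numReceivers = 2;
         int total        = 0;
         for( size_t i = 0; i < elemsPerSender.size(); i++ )
             total += elemsPerSender[i];
         // receiver r is assigned the global element interval [ r*total/numReceivers, (r+1)*total/numReceivers )
         std::map< int, std::map< int, int > > sizes;  // sizes[sender][receiver] = number of elements sent
         int senderOffset = 0;
         for( size_t s = 0; s < elemsPerSender.size(); s++ )
         {
             for( int r = 0; r < numReceivers; r++ )
             {
                 int rStart = ( r * total ) / numReceivers;
                 int rEnd   = ( ( r + 1 ) * total ) / numReceivers;
                 int lo     = ( senderOffset > rStart ) ? senderOffset : rStart;
                 int hi     = ( senderOffset + elemsPerSender[s] < rEnd ) ? senderOffset + elemsPerSender[s] : rEnd;
                 if( hi > lo ) sizes[(int)s][r] = hi - lo;  // overlap of the sender's and receiver's intervals
             }
             senderOffset += elemsPerSender[s];
         }
         // with the numbers above: sender 0 -> receiver 0: 40 elements; sender 1 -> receiver 0: 10,
         // sender 1 -> receiver 1: 25; sender 2 -> receiver 1: 25
    */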

    /**
       \brief pack information about the receivers' view of the graph, for future sending to the
       receiver root

       <B>Operations:</B> Local, called on the root process of the senders group

       \param[out] packed_recv_array
         packed data that will be sent to the root of the receivers and distributed from there;
         it contains, for each receiver, concatenated:
           receiver 1 task, number of senders for receiver 1, then the sender tasks for receiver 1,
           receiver 2 task, number of senders for receiver 2, then the sender tasks for receiver 2, etc.
         Note: only the root of the senders computes this and sends it over to the receiver root,
         which distributes it to each receiver. We do not pack the sizes of the data to be sent,
         only the senders for each of the receivers (this could be of size O(n^2), where n is the
         number of tasks, but in general it should be more like O(n): each sender sends to a finite
         number of receivers, and each receiver receives from a finite number of senders). We need
         this info to decide how to set up the send/receive waiting game for non-blocking
         communication.
     */
    ErrorCode pack_receivers_graph( std::vector< int >& packed_recv_array );
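
    /* Illustrative sketch of the packing described above (hypothetical values): for a graph where
       receiver task 2 gets data from senders 0 and 1, and receiver task 3 gets data only from
       sender 1, the packed array and a loop that decodes it would look like this.

         std::vector< int > packed_recv_array = { 2, 2, 0, 1,   // receiver 2, 2 senders: 0, 1
                                                  3, 1, 1 };    // receiver 3, 1 sender : 1
         size_t k = 0;
         while( k < packed_recv_array.size() )
         {
             int receiverTask = packed_recv_array[k++];
             int numSenders   = packed_recv_array[k++];
             for( int i = 0; i < numSenders; i++ )
             {
                 int senderTask = packed_recv_array[k++];
                 // receiverTask will receive a (nonblocking-sent) message from senderTask
             }
         }
    */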

    // get methods for private data
    bool is_root_sender()
    {
        return rootSender;
    }

    bool is_root_receiver()
    {
        return rootReceiver;
    }

    int sender( int index )
    {
        return senderTasks[index];
    }

    int receiver( int index )
    {
        return receiverTasks[index];
    }

    int get_component_id1()
    {
        return compid1;
    }
    int get_component_id2()
    {
        return compid2;
    }

    int get_context_id()
    {
        return context_id;
    }
    void set_context_id( int other_id )
    {
        context_id = other_id;
    }

    EntityHandle get_cover_set()
    {
        return cover_set;
    }
    void set_cover_set( EntityHandle cover )
    {
        cover_set = cover;
    }

    // return local graph for a specific task
    ErrorCode split_owned_range( int sender_rank, Range& owned );

    ErrorCode split_owned_range( Range& owned );

    ErrorCode send_graph( MPI_Comm jcomm );

    ErrorCode send_graph_partition( ParallelComm* pco, MPI_Comm jcomm );

    ErrorCode send_mesh_parts( MPI_Comm jcomm, ParallelComm* pco, Range& owned );

    // this is called on the receiver side
    ErrorCode receive_comm_graph( MPI_Comm jcomm, ParallelComm* pco, std::vector< int >& pack_array );

    ErrorCode receive_mesh( MPI_Comm jcomm, ParallelComm* pco, EntityHandle local_set,
                            std::vector< int >& senders_local );

    ErrorCode release_send_buffers();
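
    /* Illustrative call sequence (a sketch based on the declarations above, not a verbatim excerpt
       of any MOAB driver): during mesh migration the sender tasks post nonblocking sends of the
       graph and of the mesh parts, the receiver tasks receive the graph and then the mesh, and the
       senders release their buffers once all requests have completed. 'cgraph', 'pco', 'jcomm',
       'owned', 'localSet' and 'sendersLocal' are placeholder names.

         // sender side: the root sends the comm graph root-to-root; every sender sends its mesh parts
         if( cgraph->is_root_sender() ) cgraph->send_graph( jcomm );
         cgraph->send_mesh_parts( jcomm, pco, owned );

         // receiver side: get the graph, decode which senders talk to this task, then receive the mesh
         std::vector< int > packArray;
         cgraph->receive_comm_graph( jcomm, pco, packArray );
         std::vector< int > sendersLocal;  // filled by decoding packArray, in the layout described
                                           // for pack_receivers_graph
         cgraph->receive_mesh( jcomm, pco, localSet, sendersLocal );

         // sender side, after all nonblocking requests have been waited for:
         cgraph->release_send_buffers();
    */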

    ErrorCode send_tag_values( MPI_Comm jcomm, ParallelComm* pco, Range& owned, std::vector< Tag >& tag_handles );

    ErrorCode receive_tag_values( MPI_Comm jcomm, ParallelComm* pco, Range& owned, std::vector< Tag >& tag_handles );
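
    /* Usage sketch (illustrative; 'mb', 'cgraph', tag names and ranges are placeholders): once the
       graph has been settled, field data stored in MOAB tags can be pushed from the owned entities
       on one side and received into the owned entities on the other side with matching calls.

         std::vector< Tag > tagHandles( 1 );
         mb->tag_get_handle( "a2oTbot", tagHandles[0] );  // hypothetical, already-defined tag

         // on the sending component's tasks:
         cgraph->send_tag_values( jcomm, pco, ownedOnSender, tagHandles );

         // on the receiving component's tasks (e.g. the coupler):
         cgraph->receive_tag_values( jcomm, pco, ownedOnReceiver, tagHandles );
    */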

    // getter methods
    const std::vector< int >& senders()
    {
        return senderTasks;
    }  // returns a const reference to the sender tasks in the joint comm
    const std::vector< int >& receivers()
    {
        return receiverTasks;
    }

    ErrorCode settle_send_graph( TupleList& TLcovIDs );

    // this will set after_cov_rec_sizes
    void SetReceivingAfterCoverage(
        std::map< int, std::set< int > >& idsFromProcs );  // makes sense only on receivers, right now after coverage

    // strideComp is np x np, or 1, in our cases
    // will fill up ordered lists for the corresponding IDs on the other component
    // will form back-and-forth information, from the ordered list of IDs, to valuesComp
    void settle_comm_by_ids( int comp, TupleList& TLBackToComp, std::vector< int >& valuesComp );
    // new partition calculation
    ErrorCode compute_partition( ParallelComm* pco, Range& owned, int met );

    // dump local information about the graph
    ErrorCode dump_comm_information( std::string prefix, int is_send );

  private:
    /**
       \brief find the ranks of a group with respect to an encompassing communicator

       <B>Operations:</B> Local, usually called on the root process of the group

       \param[in]  group (MPI_Group) the group whose ranks are sought
       \param[in]  join  (MPI_Comm)  the encompassing (joint) communicator
       \param[out] ranks (std::vector<int>) ranks with respect to the joint communicator
    */
    void find_group_ranks( MPI_Group group, MPI_Comm join, std::vector< int >& ranks );
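
    /* Illustrative sketch (not necessarily the exact body of find_group_ranks): a standard way to
       express the ranks of a group with respect to a joint communicator is to translate them into
       the communicator's own group with MPI_Group_translate_ranks.

         int grpSize;
         MPI_Group_size( group, &grpSize );
         std::vector< int > inGroup( grpSize ), inJoin( grpSize );
         for( int i = 0; i < grpSize; i++ )
             inGroup[i] = i;  // ranks 0..grpSize-1 inside 'group'
         MPI_Group joinGroup;
         MPI_Comm_group( join, &joinGroup );
         MPI_Group_translate_ranks( group, grpSize, inGroup.data(), joinGroup, inJoin.data() );
         MPI_Group_free( &joinGroup );
         // inJoin[i] is now the rank, in the joint communicator, of task i of 'group'
    */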

    MPI_Comm comm;
    std::vector< int > senderTasks;    // the sender tasks, in the joint comm
    std::vector< int > receiverTasks;  // all the receiver tasks, in the joint comm
    bool rootSender;
    bool rootReceiver;
    int rankInGroup1, rankInGroup2;  // group 1 is the sender group, group 2 is the receiver group
    int rankInJoin, joinSize;
    int compid1, compid2;
    int context_id;          // used to identify the other component for intersection
    EntityHandle cover_set;  // initialized only if this is the receiver ParCommGraph, in
                             // CoverageGraph

    // communication graph from group1 to group2;
    // graph[task1] = vec1; // vec1 is an stl vector of tasks in group2
    std::map< int, std::vector< int > > recv_graph;  // for each receiver task, the sender tasks it
                                                     // receives from (the actual communication graph)
    std::map< int, std::vector< int > >
        recv_sizes;  // for each receiver task, how many elements it receives from each of its senders
    std::map< int, std::vector< int > >
        sender_graph;  // for each sender task, the receiver tasks it sends to (the actual communication graph)
    std::map< int, std::vector< int > >
        sender_sizes;  // for each sender task, how many elements it actually sends to each receiver

    std::vector< ParallelComm::Buffer* > localSendBuffs;  // stores the pointers to the send Buffers;
                                                          // they are released only after all MPI
                                                          // requests have been waited for
    int* comm_graph;  // stores the communication graph on the sender master, sent by a nonblocking
                      // send to the master receiver; the first integer is the size of the graph,
                      // the rest is the packed graph, for the trivial partition

    // these are now used to store the ranges to be sent from the current sender to each receiver in
    // the joint comm
    std::map< int, Range > split_ranges;

    std::vector< MPI_Request > sendReqs;  // there will be multiple requests, 2 for the comm graph, 2 for each Buffer;
                                          // there are as many buffers as sender_graph[rankInJoin].size()

    // active on both receiver and sender sides
    std::vector< int > corr_tasks;  // subset of senderTasks, in the joint comm, on the sender side;
                                    // subset of receiverTasks on the receiver side
    std::vector< int > corr_sizes;  // how many primary entities correspond to the other side;
                                    // so what we know is that the local range corresponds to remote
                                    // ranges of size corr_sizes[i] on tasks corr_tasks[i]

    // these are now used after coverage, as a quick fix; they will also be populated by
    // iMOAB_CoverageGraph
    TypeGraph graph_type;  // INITIAL_MIGRATE by default; changed when the coverage or DOF-based
                           // communication pattern is settled, so that involved_IDs_map is used
    std::map< int, std::vector< int > > involved_IDs_map;  // replaces the separate send and recv ID maps
    // used only for the third method: DOF_BASED
    std::map< int, std::vector< int > >
        map_index;  // from index in involved[] to index in values[] of the tag, for each corr task
    std::map< int, std::vector< int > > map_ptr;  // lmap[ie], lmap[ie+1]: pointers into map_index[corrTask]
};

}  // namespace moab
#endif /* SRC_PARALLEL_MOAB_PARCOMMGRAPH_HPP_ */