SuperLU_DIST
4.0
superlu_dist on CPU and GPU clusters
Implements parallel symbolic factorization.
Functions
float symbfact_dist (int nprocs_num, int nprocs_symb, SuperMatrix *A, int_t *perm_c, int_t *perm_r, int_t *sizes, int_t *fstVtxSep, Pslu_freeable_t *Pslu_freeable, MPI_Comm *num_comm, MPI_Comm *symb_comm, mem_usage_t *symb_mem_usage)
Implements parallel symbolic factorization.
-- Parallel symbolic factorization routine (version 2.3) --
Lawrence Berkeley National Lab, Univ. of California Berkeley - July 2003
INRIA France - January 2004
Laura Grigori

November 1, 2007
February 20, 2008
October 15, 2008
The function symbfact_dist implements the parallel symbolic factorization algorithm described in the paper:
Parallel Symbolic Factorization for Sparse LU with Static Pivoting, Laura Grigori, James W. Demmel and Xiaoye S. Li, Pages 1289-1314, SIAM Journal on Scientific Computing, Volume 29, Issue 3.
float symbfact_dist (int              nprocs_num,
                     int              nprocs_symb,
                     SuperMatrix     *A,
                     int_t           *perm_c,
                     int_t           *perm_r,
                     int_t           *sizes,
                     int_t           *fstVtxSep,
                     Pslu_freeable_t *Pslu_freeable,
                     MPI_Comm        *num_comm,
                     MPI_Comm        *symb_comm,
                     mem_usage_t     *symb_mem_usage)
Purpose
=======
symbfact_dist() performs symbolic factorization of matrix A suitable for performing the supernodal Gaussian elimination with no pivoting. This routine computes the structure of one column of L and one row of U at a time. It uses:
  o distributed input matrix
  o supernodes
  o symmetric structure pruning
Arguments
=========
nprocs_num (input) int Number of processors on which SuperLU_DIST is executed and over which the input matrix is distributed.
nprocs_symb (input) int Number of processors on which the symbolic factorization is performed. It is equal to the number of independent domains identified by the graph partitioning algorithm executed previously and has to be a power of 2. It corresponds to the number of leaves in the separator tree.
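The power-of-2 requirement on nprocs_symb can be checked before the call. The following is a minimal sketch; both helper names are hypothetical and are not part of SuperLU_DIST.

```c
#include <stdbool.h>

/* Hypothetical helper (not a SuperLU_DIST routine): returns true when n is a
 * positive power of two, the condition documented for nprocs_symb. */
static bool is_power_of_two(int n)
{
    /* A power of two has exactly one bit set, so n & (n - 1) is zero. */
    return n > 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: largest power of two not exceeding n, for n >= 1.
 * One possible way to pick a candidate nprocs_symb from nprocs_num. */
static int largest_power_of_two_le(int n)
{
    int p = 1;
    while (p <= n / 2)   /* comparing against n / 2 avoids overflow of p * 2 */
        p *= 2;
    return p;
}
```

For example, with nprocs_num = 6 the second helper yields 4, which satisfies the power-of-2 condition. Whether that choice also matches the number of domains produced by the ordering step depends on how the ordering was run.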
A (input) SuperMatrix* Matrix A in A*X=B, of dimension (A->nrow, A->ncol). The number of linear equations is A->nrow. Matrix A is distributed in NRformat_loc format. Matrix A is not yet permuted by perm_c.
perm_c (input) int_t* Column permutation vector of size A->ncol, which defines the permutation matrix Pc; perm_c[i] = j means column i of A is in position j in A*Pc.
perm_r (input) int_t* Row permutation vector of size A->nrow, which defines the permutation matrix Pr; perm_r[i] = j means row i of A is in position j in Pr*A.
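The convention "perm[i] = j means index i moves to position j" can be illustrated with a short sketch. The helper names are hypothetical, and plain int stands in for SuperLU_DIST's int_t to keep the example self-contained.

```c
#include <stddef.h>

/* Illustrative only: relabel the column indices of a sparse row so that
 * original column i becomes column perm_c[i], as in A*Pc. */
static void apply_col_perm(const int *perm_c, int *colind, size_t nnz)
{
    for (size_t k = 0; k < nnz; ++k)
        colind[k] = perm_c[colind[k]];
}

/* Illustrative only: build the inverse permutation, so that iperm[j] = i
 * whenever perm[i] = j (which original index ends up in position j). */
static void invert_perm(const int *perm, int *iperm, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        iperm[perm[i]] = (int)i;
}
```

With perm_c = {2, 0, 1}, a row whose nonzeros sit in columns {0, 2, 1, 0} is relabeled to columns {2, 1, 0, 2}. The same relabeling applied to row indices with perm_r gives the row positions in Pr*A.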
sizes (input) int_t* Contains the number of vertices in each separator.
fstVtxSep (input) int_t* Contains first vertex for each separator.
Pslu_freeable (output) Pslu_freeable_t* Returns the local L and U structure, and global to local information on the indexing of the vertices. Contains all the information necessary for performing the data distribution towards the numeric factorization.
num_comm (input) MPI_Comm* Communicator for numerical factorization
symb_comm (input) MPI_Comm* Communicator for symbolic factorization
symb_mem_usage (input) mem_usage_t * Statistics on memory usage.
Return value
============
  < 0, number of bytes allocated on return from the symbolic factorization.
  > 0, number of bytes allocated when out of memory.
Sketch of the algorithm
=======================
Distribute the vertices on the processors using a subtree-to-subcube algorithm.
Redistribute the structure of the input matrix A according to the subtree-to-subcube mapping computed previously for the symbolic factorization routine. This implies in particular a redistribution from nprocs_num processors to nprocs_symb processors.
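The idea behind a subtree-to-subcube mapping can be sketched as follows. This is an illustration under stated assumptions, not the routine's actual data layout: the separator tree is assumed to be a complete binary tree stored in heap order (root at index 0, children of node k at 2k+1 and 2k+2), and the names proc_range and subtree_to_subcube are hypothetical. The root separator is shared by all processes, each child subtree receives half of its parent's processes, and each leaf domain ends up on a single process.

```c
#include <assert.h>

/* Range of processes [first, first + count) assigned to one tree node. */
typedef struct { int first; int count; } proc_range;

/* Hypothetical sketch: fill owner[0 .. 2*nprocs-2] for a complete binary
 * separator tree with nprocs leaves, stored in heap order (an assumption
 * made for this example only). */
static void subtree_to_subcube(int nprocs, proc_range *owner)
{
    /* The number of leaves must be a power of two. */
    assert(nprocs > 0 && (nprocs & (nprocs - 1)) == 0);

    int nnodes = 2 * nprocs - 1;
    owner[0].first = 0;          /* the root separator is shared by everyone */
    owner[0].count = nprocs;

    /* Every internal node splits its process range evenly between children. */
    for (int k = 0; 2 * k + 2 < nnodes; ++k) {
        int half = owner[k].count / 2;
        owner[2 * k + 1].first = owner[k].first;
        owner[2 * k + 1].count = half;
        owner[2 * k + 2].first = owner[k].first + half;
        owner[2 * k + 2].count = half;
    }
}
```

With 4 processes the tree has 7 nodes: the root is owned by processes 0 to 3, its two children by {0, 1} and {2, 3}, and the four leaves by one process each.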
Perform symbolic factorization guided by the separator tree provided by a graph partitioning algorithm. The symbolic factorization uses a combined left-looking, right-looking approach.
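To make concrete what a symbolic factorization computes, the sketch below predicts fill-in for a small matrix without pivoting. It is a deliberately naive, sequential, right-looking version on a dense boolean pattern. It does not use supernodes, symmetric pruning, the separator tree, or the combined left-looking, right-looking scheme of symbfact_dist. The names N_MAX and symbolic_lu_fill are hypothetical.

```c
#include <stdbool.h>

#define N_MAX 16  /* illustrative bound on the matrix dimension */

/* Naive symbolic LU without pivoting on a dense boolean pattern.
 * On entry pat[i][j] is true where A has a nonzero; on exit it also marks
 * every fill-in position, so the lower triangle is the structure of L and
 * the upper triangle the structure of U. Returns the number of fill-ins. */
static int symbolic_lu_fill(int n, bool pat[N_MAX][N_MAX])
{
    int fill = 0;
    for (int k = 0; k < n; ++k) {
        for (int i = k + 1; i < n; ++i) {
            if (!pat[i][k])
                continue;               /* row i is not updated by pivot k */
            for (int j = k + 1; j < n; ++j) {
                /* Eliminating with pivot k makes (i, j) nonzero whenever
                 * both (i, k) and (k, j) are nonzero. */
                if (pat[k][j] && !pat[i][j]) {
                    pat[i][j] = true;
                    ++fill;
                }
            }
        }
    }
    return fill;
}
```

On a 4x4 "arrow" matrix with a full first row and column, the first elimination step fills the entire trailing 3x3 block (6 new entries). If the full row and column are placed last instead, no fill occurs, which is why the column ordering passed in perm_c matters.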