# Extensibility in PETSc

It’s important that modern computational science and engineering software not only handle multiphysics and multiscale simulation [3], but also support higher-level analysis, including risk-aware and optimal design, uncertainty analysis, and stochastic simulations. In addition to deeper software stacks and interdisciplinary teams, this requirement necessitates specialization in each application area (e.g., boundary conditions, geometry, subgrid closures, analysis techniques, data sources, and inherent uncertainty/bias), or *extensibility* (and customization) in the terminology of software engineering [1]. Original authors can no longer foresee all use cases for their software. Below are examples from the Portable Extensible Toolkit for Scientific computation (PETSc) library for scalable solutions of partial differential equations (PDEs), illustrating the leverage gained by developing simple, composable pieces instead of monolithic software frameworks.

### Dynamic Implementation Registration

At the highest level, PETSc facilitates extensibility by providing a mechanism to dynamically register implementations of class objects associated with vectors (Vec), matrices (Mat), Krylov methods (KSP), preconditioners (PC), nonlinear solvers (SNES), time integrators (TS), and mesh management objects (DM). This registration mechanism is used for all implementations provided by PETSc. Moreover, the end user has access to the registration API for plugins, enabling seamless use of custom implementations defined within the user’s software stack (e.g., a highly-optimized, matrix-free matrix-vector product implementation) in a fashion identical to any native PETSc implementation.

### Efficient Solvers through Composition

Determining the optimal solver for a given set of equations and boundary conditions appears impossible [2, 4], as its configuration also depends on the initial guess, machine architecture, required accuracy, and even problem evolution. Therefore, the most sensible software design allows for dynamically (runtime) configurable solvers. In PETSc, the user creates a solver (linear, nonlinear, timestepping, optimization), provides data defining the problem (system matrix and right-hand side, residual and Jacobian callbacks, cost function, etc.), and in some cases offers details of the data layout (topology, geometry, grid hierarchy). The solver itself is then assembled via command line options. This process allows trial of all appropriate solver combinations without repeated editing and compiling of the user’s code. Shell scripts are often used to test hundreds of solvers and solver options.

Our strategy is to assemble complex, tailored solvers from simple, generic parts using command line options. For example, we can divide a system into parts by field or by subdomain, and combine solvers additively, multiplicatively, or with a Schur complement. We can apply these analysis and synthesis actions hierarchically to successive divisions, and similarly introduce hierarchy through coarsening or refinement. The following example demonstrates the flexibility of PETSc solvers on a problem of scientific interest and reinforces the aforementioned ideas.

Magma dynamics describes the flow of molten rock (melt) through a deformable porous host rock, which can be modeled by the two-phase formulation [6]

\[\nabla p - \nabla \left( \zeta(\phi) \nabla\cdot \vec{v}_S \right) - \nabla\cdot \left( 2\eta(\phi) \dot{\mathbf{\epsilon}}_S \right) = \vec{f},\]

\[\nabla\cdot \left( -\frac{K(\phi)}{\mu} \nabla p + \vec{v}_S \right) = g, \tag1\]

\[\frac{\partial\phi}{\partial t} - \nabla\cdot (1 - \phi) \vec{v}_S = 0, \]

where the equations correspond to the conservation of momentum, mass, and evolution of rock porosity \((\phi)\). In \((1),\) \(\vec{v}_S\) is the solid velocity, \(\dot{\mathbf{\epsilon}}_S\) is the deviatoric strain-rate tensor, and \(p\) is the fluid pressure. The rock permeability \(K\), shear viscosity \(\eta\), and bulk viscosity \(\zeta\) are all nonlinear functions of \(\phi\). A typical method for solving this differential algebraic equation (DAE) involves an explicit timestep for the advection of \(\phi\), then a nonlinear solver for \(\vec{v}_S\) and \(p\). Researchers in [5] propose to solve the nonlinear system using Newton’s method with a block preconditioner for the linearized system

\[\left( \begin{array}{ccc} A & 0 \\

0 & M+L \\ \end{array} \right)\]

where \(\textrm{A}\) is the divergence of the deviatoric stress and compaction from the first equation, \(\textrm{M}\) is the pressure mass matrix, and \(\textrm{L}\) is the pressure Laplacian. We can easily implement this solver using algebraic multigrid (AMG) to solve \(\textrm{A}\) and the additive Schwarz method (ASM) to solve \(\textrm{M} + \textrm{L}\), with the following command line arguments:

-snes_type newtonls -snes_mf_operator

-pc_type fieldsplit -pc_fieldsplit_0_fields 0 -pc_fieldsplit_1_fields 1

-pc_fieldsplit_type additive -pc_fieldsplit_diag_use_amat 0

-fieldsplit_0_pc_type gamg -fieldsplit_1_pc_type asm

Here we use the matrix-free action of the Jacobian in a Krylov solver. Fieldsplit provides decomposition into the discrete \(\vec{v}_S\) and \(p\) fields required by the \(2 \times 2\) block preconditioner. We then construct a preconditioner using the diagonal blocks of a preconditioning matrix.

We can extend this formulation to solve a nonlinear system over all three fields in order to make the explicit update consistent with the conservation equations following that update, simply by including the porosity in our splitting:

-pc_type fieldsplit -pc_fieldsplit_0_fields 0,2 -pc_fieldsplit_1_fields 1

and Newton’s method as a level smoother:

-snes_type fas -snes_fas_type full

-fas_levels_snes_type newtonls -fas_levels_snes_max_it 2 -fas_levels_snes_mf

-fas_levels_pc_type fieldsplit -fas_levels_pc_fieldsplit_0_fields 0,2

-fas_levels_pc_fieldsplit_1_fields 1

-fas_levels_pc_fieldsplit_type schur -fas_levels_pc_fieldsplit_schur_precondition a11

-fas_levels_pc_fieldsplit_schur_factorization_type full

-fas_levels_fieldsplit_0_pc_type gamg -fas_levels_fieldsplit_pressure_pc_type gamg

-fas_coarse_snes_type newtonls -fas_coarse_snes_mf_operator

-fas_coarse_pc_type fieldsplit -fas_coarse_pc_fieldsplit_0_fields 0,2

-fas_coarse_pc_fieldsplit_1_fields 1

-fas_coarse_pc_fieldsplit_type schur -fas_coarse_pc_fieldsplit_schur_precondition a11

-fas_coarse_pc_fieldsplit_schur_factorization_type full

-fas_coarse_fieldsplit_0_pc_type lu -fas_coarse_fieldsplit_pressure_pc_type lu

We also switched to a Schur complement factorization and pull off-diagonal blocks from the original Jacobian matrix. In simpler flow regimes, we can relax the fine grid solver to nonlinear Gauss-Seidel to improve performance:

-fas_levels_snes_type ngs -fas_levels_snes_max_it 10

### Simulation-Specific Customization

A given simulation framework will invariably require custom methods for visualizing/monitoring residuals, determining convergence of a linear/nonlinear solver, selecting a suitable time-step, and so on. Such requirements are supported by the respective PETSc object because they allow users to register custom implementations for these actions.

### Scalable Customizable Communication

Any linear algebra/solver package, including PETSc, will fail to encompass the full range of parallel communication patterns that users desire. Thus, unless users intend to write low-level code, such as MPI, some sort of interface for unstructured parallel communication is necessary. In PETSc, parallel communication is abstracted in the VecScatter object, a mapping from a source serial or parallel vector to a target vector, perhaps with different parallel layout. The mapping is specified by identifying an index in the source vector with another index in the target.

For instance, suppose that we would like to gather all the values from a parallel vector on one process, say for visualization or output. We can do this by creating a VecScatter that maps all indices in the vector to a serial vector, as in the code shown below:

VecGetSize(global, &N);

N = rank ? 0 : N;

VecCreateSeq(PETSC_COMM_SELF, N, &serial);

ISCreateStride(PETSC_COMM_SELF, N, 0, 1, &is);

VecScatterCreate(global, is, serial, is, &scatter);

The code is available concisely in PETSc as follows:

VecScatterCreateToZero(global, &scatter, &serial);

Now we can redistribute the values into a serial vector:

VecScatterBegin(scatter, global, serial, INSERT_VALUES, SCATTER_FORWARD);

VecScatterEnd(scatter, global, serial, INSERT_VALUES, SCATTER_FORWARD);

We can reverse the mapping, sending the values in a serial vector to all processes, using SCATTER_REVERSE. If we wanted to send the distributed vector to all processes, we would simply remove the line that changes N to 0 for all processes but rank 0, which is encapsulated in the VecScatterCreateToAll() function.

We also make use of VecScatter internally, and scatter the appropriate values from the input vector on to each process when a parallel sparse matrix vector product is performed. Not only is this conceptually simpler, but it leverages architecture-specific optimizations unavailable with raw MPI, such as preference for MPI_Alltoall() over MPI_ISend/Recv(), or segregation of on-node and off-node communication. However, some interface limitations have been identified.

Since VecScatter is tied to the Vec class, it can only move floating point values. The PetscSF object, on the other hand, keeps the one-sided specification of data mapping while allowing storage of arbitrary types in raw arrays. The basic operations (broadcast and reduce) have been augmented with the ability to gather values instead of combine them, FetchAndOp for scan operations, and composition of two PetscSF maps, as well as inversion. Thus, PetscSF has exposed new opportunities for reuse and optimization inside PETSc.

With PETSc, designing and refactoring software using best practices for extensible library development increases scientific and engineering value most effectively. As we demonstrate above, doing so greatly enhances the usability, productivity, and capability of the software.

**References**

[1] Brown. J, Knepley M.G., & Smith, B. (2015). Run-time extensibility and librarization of simulation software. *IEEE Computing in Science and Engineering, 17*, 38-45.

[2] Greenbaum, A., Pták, V. & Zdeněk, S. (1996). Any nonincreasing convergence curve is possible for GMRES. *SIAM Journal on Matrix Analysis and Applications, 17*, 465-469.

[3] Keyes, D.E., McInnes, L.C., Woodward, C., Gropp, W., Myra, E., Pernice, M….Wohlmuth, B. (2013). Multiphysics simulations: Challenges and opportunities. *International Journal of High Performance Computing Applications, 27*, 4-83.

[4] Nachtigal, N.M., Reddy, S.C., & Trefethen, L.N. (1992). How fast are nonsymmetric matrix iterations? *SIAM Journal on Matrix Analysis and Applications, 13*, 778-795.

[5] Rhebergen, S., Wells, G.N., Katz, R. F., & Wathen, A.J. (2014). Analysis of block-preconditioners for models of coupled magma/mantle dynamics. *SIAM J. Sci. Comput., 36*, A1960–A1977.

[6] Takei, Y., & Katz, R.F. (2013). Consequences of viscous anisotropy in a deforming, two-phase aggregate. Part 1. Governing equations and linearised analysis. *J. Fluid Mech., 734*, 424-455.