| June 01, 2022

Large-scale PDE-constrained Optimization for Ice Sheet Model Initialization

Ice sheet modeling contributes to the accurate projection of future sea level rise — a growing threat that will greatly impact coastal populations and infrastructure. One can model ice sheets as extremely viscous shear-thinning fluids that are driven by gravity and whose dynamics strongly depend on the ice temperature and boundary conditions at the ice bed. A critical step in ice sheet modeling is initialization: the estimation of unknown and poorly known parameter fields like basal friction and bed topography, and of the initial temperature in the ice sheet interior. Researchers typically compute these parameter fields by solving a partial differential equation (PDE)-constrained optimization problem to minimize the mismatch between the available observations (e.g., surface ice velocity) and the corresponding quantities that the ice flow model computes. The optimization problem takes the following form:

\[\textrm{Find} \ p \ \textrm{that minimizes} \ \mathcal{J}, \qquad \mathcal{J}(u,p):= \mathcal{M}_C(u,u_{\textrm{obs}}) + \mathcal{R}(p), \qquad \textrm{such that} \ \mathcal{F}(u,T;\, p) = 0.\]

Here, \(p\) is the parameter field (e.g., basal friction coefficient) and \(u\) and \(T\) are respectively the ice velocity and temperature fields that solve the ice flow model, such that \(\mathcal{F}(u, T;p) = 0\). \(\mathcal{J}\) is a scalar objective functional that is composed of two terms: (i) A misfit term \(\mathcal{M}_C\) that penalizes the difference between \(u\) and the observed surface velocity \(u_{\textrm{obs}}\), and more heavily weighs discrepancies with observations that have small covariance \(C\), and (ii) the regularization term \(\mathcal{R}\) that helps ensure well-posedness of the optimization problem and avoids overfitting the data.

In our optimization, the constraint \(\mathcal{F}\) represents the “Blatter-Pattyn” approximation of the Stokes equations for the ice velocity, coupled with a steady-state enthalpy model for ice temperature. Most ice sheet codes compute the initial temperature by “spinning up” the model for several thousands of years until it is close to steady state, and do not include the enthalpy model as a constraint. A drawback of this approach—which our formulation overcomes—is that the model’s thermomechanical state after the spin-up is no longer consistent with observations. Figure 1 illustrates an initialization of the Greenland ice sheet that was performed via an ice sheet model called MPAS-Albany Land Ice (MALI) [4, 8].

Figure 1. This ice sheet initial state was obtained by inverting for the basal friction and matching the observed ice surface velocity. The initialization involves the computation of a steady-state ice temperature field that is self-consistent with the velocity. 1a. Modeled and observed surface velocity in meters/year. 1b. Estimated basal friction in kilopascal-years/meter. 1c. Modeled steady state temperature in Kelvin; ice is removed in some areas to show the basal temperature. The solution field and parameter field respectively have 14 million and 300,000 degrees of freedom. Figure courtesy of the author.

Computational Challenges

Solving the aforementioned large-scale PDE-constrained optimization problem is quite challenging because of its size and highly nonlinear constraints. In the case of high-resolution simulations of an entire ice sheet, the parameter fields’ dimension is on the order of \(10^5\)-\(10^6\), whereas the solution typically has \(10^7\)-\(10^8\) unknowns. Key ingredients for the performance of PDE-constrained optimization are as follows:

Efficient and robust derivative-based optimization methods
Fast and robust solvers for the forward problem
Tools that efficiently compute first and second derivatives of the objective functional \(\mathcal{J}\) and the constraint \(\mathcal{F}\) with respect to the solution \((u,T)\) and parameters \(p\).

Optimization Methods. We implement reduced-space trust-region methods in the Trilinos Rapid Optimization Library to solve the PDE-constrained optimization problem. The term “reduced space” refers to the fact that the optimizer sees the objective \(\mathcal{J}\) as a function of the parameter \(p\); for any guess of the parameter \(p\), one must solve the constraint to obtain the corresponding solution \(u\) and compute the objective functional. Trust-region methods build an approximation of the functional on a “trusted” region of the parameter space around the current parameter guess; they also require first and second total derivatives of the functional with respect to \(p\). We employ efficient, adjoint-based methods to compute total derivatives of \(\mathcal{J}\) with respect to \(p\); these methods involve the solution of one (in the case of first derivatives) or two (in the case of second derivatives) additional linear adjoint problems. During the optimization, we must properly choose the dot product that defines the gradients. Specifically, the Fréchet gradient \(\mathcal{G}\) depends on the dot product \((\cdot, \cdot)\) via the relation \(\frac{d \mathcal{J}}{dp}v = (\mathcal{G}, v)\). For discretized problems, one would typically choose to take the \(l^2\) dot product; but when the parameter is a finite element field, it is often better to define the dot product based on the Hessian of the regularization term: \((x, y) = x^T (\partial_{pp} \mathcal{R})^{-1} y\) (if the Hessian is non-singular). This choice reduces the dependence of \(\frac{d \mathcal{J}}{dp}\) on the specific mesh, which is particularly important for nonuniform meshes.

Solution of Forward Problems. During initialization, our process computes the forward problem solutions many times with parameters that span a wide range of values. The solver must therefore be both fast and robust. Relaxed Newton methods with line search have proven fairly robust for our problem, and the few convergence failures typically do not significantly affect the optimization algorithm’s performance/convergence. Most of the models’ computational cost is due to the finite element assembly—computation of the constraint \(\mathcal{F}\) and its Jacobian \(\partial_{u,T} \mathcal{F}\)—and the solution of the linear systems at each Newton iteration. Our approach maximizes performance in assembly by (i) using memoization techniques to store and reuse computations, (ii) reducing communications across message passing interface ranks, and (iii) exposing parallelism. We solve the linear system using a Krylov method (the generalized minimal residual method) that is preconditioned with either a multigrid preconditioner [9] or a bi-level Schwarz preconditioner [3]. The Kokkos programming model enables performance portability. Figure 2 depicts the total time per nonlinear solve in a weak scalability study for the Antarctic ice sheet [10].

Figure 2. Total solve time for a series of increasingly higher-resolution Antarctic ice sheet simulations that are executed in a weak scalability study on four architectures. The chart provides the mean error bar, median, and lower/upper quartiles of each case; outliers are marked as circles. Figure courtesy of Jerry Watkins.

Computation of Partial Derivatives. To solve PDE-constrained optimization problems, we must compute partial derivatives of the PDE discrete residual and objective functional with respect to the solution and parameter fields; we do so via automatic differentiation (AD). MALI utilizes C++ templates and operator overloading for AD through the Trilinos package Sacado [7]. It computes second derivatives (Hessians) as applied to vector \(v\):

\[\partial_{pp} \mathcal{J}(p,\cdot) v = \partial_{\varepsilon} (\partial_{p} \mathcal{J}(p + \,\varepsilon v,\cdot) |_{\varepsilon=0}.\]

In this case, \(\partial_{p p} \mathcal{J}\) is a second-order tensor — a matrix. Explicitly assembling the matrix when defining a new dot product for the optimization is useful. When the matrix’s pattern matrix is known (\(p\) is a finite-element field in our case, so the matrix has a typical finite-element pattern), we can efficiently compute the matrix via a coloring and seeding algorithm in the Trilinos package Zoltan2. Doing so requires a small number of Hessian-vector evaluations, equal to the number of colors in the matrix pattern.

Initialization Consistent with Climate Forcings

Thus far, our initialization does not account for internal ice sheet transients that are caused by the time-dependent response to climate forcing and dynamic changes in ice thickness. As soon as we run the ice sheet model forward in time, it will exhibit large nonphysical transients that typically last several decades and drive the ice sheet farther away from its optimal initialized state. The ice thickness is governed by the equation

\[\dot H = -\textrm{div}(\bar{u} H) + a,\]

where \(H\) is the ice thickness, \(\bar{u}\) is the depth-integrated velocity, and \(a\) is a source term that includes snow accumulation and melting. We improve our initialization process by simultaneously inverting for the basal friction field and the (often highly uncertain) bed topography field. We also add two misfit terms; one penalizes the difference between the flux divergence \(\textrm{div}(\bar{u} H)\) and the observed apparent mass balance \((a_{\textrm{obs}} - \dot H_{\textrm{obs}})\), and the other penalizes differences between the estimated and observed bed topography [6]. Figure 3 displays results for the Humboldt Glacier in Greenland, thereby demonstrating the approach’s effectiveness at matching the observed thickness rate of change.

Figure 3. Initialization of the Humboldt Glacier in Greenland. 3a. Flux divergence and ice surface velocity in meters/year for the standard initialization. 3b. Flux divergence and ice surface velocity in meters/year for the initialization with added bed topography control and matching climate forcing. 3c. Observed apparent mass balance and ice surface velocity in meters/year. Figure courtesy of the author.

Final Considerations

We conclude with a few considerations about the modeling choices that pertain to initialization, the limitations of this approach, and possible alternatives.

The choice of parameters for which to invert (e.g., basal friction and bed topography) as well as the misfit and regularization terms significantly impacts the results of ice sheet initialization. While certain methods (e.g., the discrepancy principle) and “best practices” can help, ultimately these modeling choices are typically based on expert judgment, trial and error, and available computational resources.
Our initialization approaches compute a steady-state temperature field as part of the optimization. One limitation of this approach—as well as the traditional “spinning up” method—is that it incorrectly assumes that the ice is at thermal equilibrium. We can mitigate this issue by incorporating internal temperature data (e.g., data from boreholes, combined observations of the thawed versus frozen extent of the ice bed interface, and radar attenuation) in our assimilation process.
All of our results are deterministic, and we account for uncertainties in the data by weighting the misfit terms on the basis of data uncertainty (data with high covariance are trusted and therefore weighted less than those with smaller covariance). A more rigorous approach to account for uncertainty involves the performance of Bayesian inversion [1, 5]. However, this technique is extremely challenging from a computational point of view due to the “curse of dimensionality.”
In order to compute an initial state that is consistent with observed rates of thickness change, we allow for adjustments in bed topography. Such adjustments reduce nonphysical transients that would otherwise appear during the first few decades of forward model integration if the initial state is not consistent with climate forcings. We must strike a balance between keeping the bed topography close to the observations—which are typically noisy and relatively sparse—and accurately matching the observed thickness rates of change. Doing so is a modeling choice, especially because the uncertainties on observed thickness transients and climate forcings are not well known.
Here we adopted a snapshot optimization, which assumes that all of the data are available at the same time. This is not the case in practice; in fact, some of the observations that we utilized were collected over several years, thus introducing another error—in addition to the observational uncertainties—that is hard to quantify. Transient optimization approaches, while more expensive, can mitigate this issue by incorporating data only when available [2].

Mauro Perego presented this research during a minisymposium at the 2021 SIAM Conference on Computational Science and Engineering, which took place virtually last year. This article updates and expands upon that content.

Acknowledgments: This work is the result of a collaborative effort with Luca Bertagna, Kim Liegeois, Eric Phipps, Irina Tezaur, Ray Tuminaro, and Jerry Watkins of Sandia National Laboratories; Trevor Hillebrand, Matthew Hoffman, and Stephen Price of Los Alamos National Laboratory; Georg Stadler of New York University; Alexander Heinlein of Delft University of Technology in the Netherlands; and Alessandro Barone of University Campus Bio-Medico in Italy. The SciDAC projects FASTMath and ProSPect—funded by the U.S. Department of Energy (DOE) Office of Science’s Advanced Scientific Computing Research program and Biological and Environmental Research program—provided support. Our research used resources from the National Energy Research Scientific Computing Center, which is a DOE Office of Science User Facility operated under contract no. DE-AC02-05CH11231, as well as the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the DOE Office of Science under contract no. DE-AC05-00OR22725.

Disclaimer: Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525. This article describes objective technical results and analysis. Any subjective views or opinions do not necessarily represent the views of the U.S. Department of Energy or the U.S. Government.

References
[1] Brinkerhoff, D.J. (2022). Variational inference at glacier scale. J. Comput. Phys., 459, 111095.
[2] Goldberg, D.N., Heimbach, P., Joughin, I., & Smith, B. (2015). Committed retreat of Smith, Pope, and Kohler Glaciers over the next 30 years inferred by transient model calibration. Cryosph., 9(6), 2429-2446.
[3] Heinlein, A., Perego, M., & Rajamanickam, S. (2022). FROSch preconditioners for land ice simulations of Greenland and Antarctica. SIAM J. Sci. Comput., 44(2), B339-B367.
[4] Hoffman, M.J., Perego, M., Price, S.F., Lipscomb, W.H., Zhang, T., Jacobsen, D., … Bertagna, L. (2018). MPAS-Albany Land Ice (MALI): A variable-resolution ice sheet model for Earth system modeling using Voronoi grids. Geosci. Model Dev., 11(9), 3747-3780.
[5] Isaac, T., Petra, N., Stadler, G., & Ghattas, O. (2015). Scalable and efficient algorithms for the propagation of uncertainty from data through inference to prediction for large-scale problems, with application to flow of the Antarctic ice sheet. J. Comput. Phys., 296, 348-368.
[6] Perego, M., Price, S., & Stadler, G. (2014). Optimal initial conditions for coupling ice sheet models to Earth system models. J. Geophys. Res. Earth Surf., 119(9), 1894-1917.
[7] Phipps, E., & Pawlowski, R. (2012). Efficient expression templates for operator overloading-based automatic differentiation. In S. Forth, P. Hovland, E. Phipps, J. Utke, & A. Walther (Eds.), Recent advances in algorithmic differentiation (Vol. 87) (pp. 309-319). Berlin/Heidelberg, Germany: Springer.
[8] Tezaur, I.K., Perego, M., Salinger, A.G., Tuminaro, R.S., & Price, S.F. (2015). Albany/FELIX: A parallel, scalable and robust, finite element, first-order Stokes approximation ice sheet solver built for advanced analysis. Geosci. Model Dev., 8(4), 1197-1220.
[9] Tuminaro, R., Perego, M., Tezaur, I., Salinger, A., & Price, S. (2016). A matrix dependent/algebraic multigrid approach for extruded meshes with applications to ice sheet modeling. SIAM J. Sci. Comput., 38(5), C504-C532.
[10] Watkins, J. Carlson, M., Shan, K., Tezaur, I., Perego, M., Bertagna, L., … Price, S.F. (2022). Performance portable ice-sheet modeling with MALI. Preprint, arXiv:2204.04321.

Mauro Perego works at the Center for Computing Research at Sandia National Laboratories. His research mainly focuses on ice sheet models and touches on several aspects of scientific computing, including the discretization and solution of nonlinear partial differential equations, numerical optimization, and scientific machine learning.