By Paul Davis
New computing architectures promise to deliver more accurate weather predictions faster than ever before. But achieving dramatic reductions in wall-clock time using these new machines demands “disruptive algorithms,” according to Beth Wingate of the University of Exeter, who gave an invited talk at the 2017 SIAM Conference on Computational Science and Engineering (CSE17), held in Atlanta, Ga., this February.
Wingate and her colleagues are among the vanguard of computational scientists developing parallel-in-time algorithms. These algorithms augment the classic paradigm of spatially parallel computing—assigning each of a multitude of processors to compute simultaneously the governing model equations at one or more distinct nodes of the spatial grid—with the intuitively disruptive idea of simultaneous computation at multiple points in time. How indeed could time evolve in any manner but serially, one moment after another?
Suppose each point of a temporal grid were the starting time for a distinct initial-value problem with its own dedicated processor(s). One could solve that ensemble of problems simultaneously—parallel in time—matching the solutions sequentially at the temporal grid points by adjusting the initial value of the outgoing solution to match the final value of its incoming predecessor.
Although some fearful devils could hide in the details of such matching (more about them later), real run times might be cut dramatically. A long sequence of tiny time steps could be folded over on itself many times, potentially reducing the wall-clock time required for the computation. Of course, the overhead of matching at the temporal folds and other particulars could eat up those real-time savings if handled carelessly.
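For a linear test problem, the fold-over idea can be sketched in a few lines: every coarse slice is integrated independently from a placeholder initial value, and a cheap serial sweep afterward rescales the slices so each one's start matches its predecessor's end. This is a minimal illustration under stated assumptions, not any production scheme; the rescaling trick works only because the equation is linear, and all parameter values are arbitrary.

```python
import math

lam = -1.0          # linear test problem dy/dt = lam * y on [0, T]
T, N, M = 2.0, 4, 100
dT = T / N          # width of each coarse slice

def solve_slice(u0):
    """Fine forward-Euler integration across one coarse slice."""
    h = dT / M
    u = u0
    for _ in range(M):
        u = u + h * lam * u
    return u

# Phase 1 (parallelizable): every slice is integrated independently from
# a placeholder initial value of 1.0; each result is that slice's growth factor.
ratios = [solve_slice(1.0) for _ in range(N)]

# Phase 2 (cheap serial sweep): chain the growth factors so that each
# slice's initial value matches its predecessor's final value.
u = 1.0
for r in ratios:
    u *= r

print(u, math.exp(lam * T))   # matched endpoint vs. exact value, both near 0.135
```

The expensive fine integrations in phase 1 are mutually independent, so in practice each slice would go to its own processor; only the trivial multiplication sweep is serial.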
But thoughtfully constructed algorithms that are parallel in time do have the power to cope with the multiple time scales inherent in models for climate and weather prediction; indeed, multiple time scales are a challenge common to many large-scale computational models. Phenomena that oscillate rapidly, such as ocean vortices, demand small time steps to accurately track their motion. Dissipative phenomena, like cooling, may need a small time step to track an initial rapid decline in temperature but a much larger step as the rate of cooling decreases.
Although these disparate time scales may only present themselves at a few locations within the geographic region being modeled, the constraint of the smallest time step can impose itself much more widely. In a further twist of the tiny-time-step dagger, finer spatial resolutions necessitate even smaller time steps in order to maintain numerical stability.
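The stability constraint is the familiar CFL-type restriction for explicit schemes: roughly, the time step cannot exceed the grid spacing divided by the fastest signal speed in the model. A toy calculation (the signal speed and grid spacings are arbitrary illustrative numbers, not values from any weather code) shows how each spatial refinement tightens the time-step limit in proportion:

```python
c = 300.0        # illustrative signal speed, m/s
T = 3600.0       # one simulated hour, in seconds

# CFL-type limit for an explicit scheme: dt_max ~ dx / c, so refining
# the spatial grid tenfold forces ten times as many time steps.
for dx in (10_000.0, 1_000.0, 100.0):   # coarse to fine grid spacing, m
    dt_max = dx / c
    n_steps = T / dt_max
    print(f"dx = {dx:8.0f} m  ->  dt <= {dt_max:6.2f} s  ({n_steps:7.0f} steps)")
```

Serial time stepping offers no way around this growth in step count, which is exactly the cost that time parallelism tries to fold away.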
The practical outcome is discouraging. As Wingate observed, despite the newest machines' promise of vastly more processors and greater concurrency—the potential to do millions of calculations at the same time—the total clock time needed to compute current weather and climate models may not decline significantly; finer spatial grids and better resolution of oscillatory phenomena will force more, smaller time steps to model the same period of time. While the results may offer better spatial and temporal accuracy, they might not reach an investigator's desk any sooner, nor demand less energy to compute.
Mathematical and computational ingenuity are essential to realizing the promise that parallel-in-time algorithms will reduce overall execution time by simultaneously taking many small time steps.
In 1964, Jürg Nievergelt suggested the first explicitly parallel approach to solving an initial-value problem for an ordinary differential equation, though he was obviously unaware of the challenges and opportunities posed by today's most advanced architectures. He divided the required time interval into a series of coarse subintervals to permit a parallel step followed by a serial one: first compute in parallel a family of solutions for different initial values at the beginning of each subinterval, then sequentially interpolate solutions from one subinterval to the next to construct the solution over the original time interval.
The short time steps necessary for either numerical stability or accurate resolution of oscillations are still required within each coarse subinterval. Solving the subinterval problems simultaneously—in parallel—can save significant time if sequential matching at the grid points is not too costly.
These parallel-in-time approaches are not just intuitively clever. They are provably accurate; e.g., each repetition of the propagate-and-solve cycle of the parareal method increases its order of accuracy by one. A firm analytic foundation for these parareal-type methods is key to the approach that Wingate and colleagues such as Terry Haut of Lawrence Livermore National Laboratory use to reduce overall run time for a set of one-dimensional shallow water equations, a standard test problem for potential weather and climate algorithms.
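The propagate-and-correct cycle that underlies parareal can be written down compactly. The sketch below is an illustration on the scalar test equation dy/dt = λy, not the shallow-water solver described here; it uses a single large forward-Euler step as the coarse propagator G and many small steps as the fine propagator F, with all parameter values chosen for illustration only.

```python
import math

lam = -1.0           # test equation dy/dt = lam * y, exact solution exp(lam * t)
T, N = 2.0, 10       # time horizon split into N coarse slices
dT = T / N
M = 50               # fine steps per coarse slice

def G(u):
    """Coarse propagator: a single forward-Euler step of size dT."""
    return u + dT * lam * u

def F(u):
    """Fine propagator: M forward-Euler steps of size dT / M."""
    h = dT / M
    for _ in range(M):
        u = u + h * lam * u
    return u

# Iteration 0: one cheap serial coarse sweep supplies initial guesses
# at every slice boundary.
U = [1.0]
for n in range(N):
    U.append(G(U[n]))

for k in range(5):   # parareal iterations
    # The N fine solves are mutually independent; in a real code each
    # would run on its own processor(s). This is the parallel part.
    Fu = [F(U[n]) for n in range(N)]
    Gu = [G(U[n]) for n in range(N)]
    # Serial correction sweep: propagate coarsely, corrected by the
    # fine-minus-coarse mismatch from the previous iterate.
    V = [U[0]]
    for n in range(N):
        V.append(G(V[n]) + Fu[n] - Gu[n])
    U = V

print(U[-1], math.exp(lam * T))   # both close to 0.135
```

Each pass through the loop applies the update U[n+1] = G(U[n]) + F(U_old[n]) - G(U_old[n]); after enough iterations the slice boundary values converge to those of the serial fine solution, but the costly fine work has been done concurrently.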
Asymptotically, as the ratio of slow to fast frequencies approaches zero, the fast frequencies can be swept out of the problem. Wingate represents that process with a time average of a version of the underlying nonlinear operator; she emphasizes that the operator, not the solution, is averaged. Roughly speaking, her team's version of parareal uses the time-averaged operator/asymptotic approximation to provide the coarse time grid solution and an exponential integrator for the fine grid solution that corrects the coarse grid values. This algorithm can use coarse time steps as much as 50 times larger than the explicit stability limit allows, and 10 times larger than those of the next best parallel method. More importantly, this approach accommodates the dynamic variations in time scales that are common in climate models; the ratio of slow to fast frequencies is not always small. A moving, variable-width time-averaging window handles these situations.
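The effect that the averaging exploits is elementary: the mean of a fast oscillation over a fixed window shrinks as the frequency grows, so averaging suppresses the fast dynamics while leaving the slow ones largely intact. A toy numerical check, with pure sinusoids standing in for the fast waves and an arbitrary window width and set of frequencies:

```python
import math

def window_average(omega, eta, samples=100_000):
    """Midpoint-rule mean of sin(omega * t) over the window [0, eta]."""
    h = eta / samples
    return sum(math.sin(omega * (i + 0.5) * h) for i in range(samples)) * h / eta

# The slow component survives averaging; the fast ones are nearly swept out.
for omega in (1.0, 10.0, 100.0):
    print(f"omega = {omega:6.1f}  ->  mean over window = {window_average(omega, 1.0):+.4f}")
```

Exact calculus confirms the trend: the mean of sin(ωt) over [0, η] is (1 − cos ωη)/(ωη), which decays like 1/ω as the frequency grows.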
Wingate’s student Adam Peddle has both analytic and computational evidence (presented in a minisymposium at CSE17) of a “Goldilocks sweet spot” in the number of parareal iterations versus the ratio of the width of the time-averaging window to the coarse grid spacing. When frequency ratios are not small, Peddle’s results show an optimum number of parareal iterations, a point where “the averaging mitigates the oscillatory stiffness.”
Wingate envisions an evolutionary integration of time parallelism into new machine architectures. “The immediate need is to get current models running on the new machines,” she said. “Time parallelism will be in the background.” Then new models, relatively simple at first, will become available to scientists studying simpler, more fundamental problems. Today’s young researchers will carry those models forward with novel algorithms on new machines, thereby advancing the methods that underpin our most important applications.
Wingate’s CSE17 presentation is available from SIAM either as slides with synchronized audio or as a PDF of the slides only.
Acknowledgments: SIAM News would like to acknowledge Beth Wingate for her review of this article.
Gander, M.J. (2015). 50 years of time parallel time integration. In T. Carraro, M. Geiger, S. Körkel, & R. Rannacher (Eds.), Multiple Shooting and Time Domain Decomposition Methods. Cham, Switzerland: Springer International Publishing.
Haut, T., & Wingate, B. (2014). An asymptotic parallel-in-time method for highly oscillatory PDEs. SIAM J. Sci. Comput., 36(2), A693–A713.
Paul Davis is professor emeritus of mathematical sciences at Worcester Polytechnic Institute.