| November 02, 2015

Applying Mathematics to Data Assimilation Methods

In part I of this article in the October issue, the authors described the use of data assimilation in numerical weather prediction.

The subject of data assimilation has been driven primarily by practitioners working in the geophysical sciences. However, the potential for application in all realms of science and engineering cannot be overstated. For this reason, the subject is ripe for development by the mathematics community [6]. The primary benefits of a discipline of this type are three-fold: (i) systematic development leads to clarity about the right questions to ask and distinguishes between generic algorithmic and mathematical questions, and application-specific ones; (ii) it leads to the possibility of importing algorithmic innovation from the computational mathematics community; and (iii) it allows for the exchange of ideas between different application areas through a common language. Of course, this perspective is not news to most SIAM News readers, but the value of mathematics as the language of science and engineering is always worth re-emphasizing.

A central transferable idea in this article is that, in many areas of applied mathematics, the data model and the mathematical model should be considered in conjunction. Thinking about a scientific or engineering problem in this way from the very start is certainly a nontraditional way of thinking, but we argue that it is, in many areas, the right viewpoint. In the context of data assimilation we thus consider a combined model for the signal with a model for the observation process. For expository purposes we consider a discrete time signal \(V_J = {\{\upsilon_l\}}^J_{l=0},\) given by

\[\upsilon_{j + 1} = \Psi(\upsilon_j) +\xi _j.\]

Here the model noise \(\{\xi_\ell\}^{J-1}_{\ell = 0}\) represents stochastic forcing to a deterministic evolution given by \(\Psi(\cdot)\); this stochastic forcing may or may not be included in the use of the model, depending on the setting. The mathematical model for the signal may have many centuries of intellectual development behind it (for example, in numerical weather prediction, or NWP) or may be the product of more recent application-driven needs (for example, in traffic flow). The level of confidence in the purely deterministic signal model will affect whether or not it is appropriate to include model noise in it. The observations \(Y_J = {\{y_l\}}^J_{l=1}\) are assumed to be given by

\[y_{j+1} = h(\upsilon_{j+1}) +\eta _{j+1}.\]

This equation typically will model the use of data acquisition instruments, which will, of course, be application specific. Here the observational noise \({\{\eta_\ell\}}^J_{\ell = 1}\) is almost always present because very few observing instruments are perfect. We can state two formulations of the data assimilation problem. The first is to find information about \(\upsilon_j\) given \(Y_j = {\{y_l\}}^J_{l=1},\) and to update this information sequentially as \(j \rightarrow j + 1;\) this is known as filtering. The second is to find information about \(V_J\) given \(Y_J\) for some given J; this is known as smoothing. Smoothing is more computationally demanding than filtering because it operates in a state space of dimension \(J + 1\) times that of filtering’s state space. While in the fully probabilistic model described below filtering and smoothing theoretically lead to the same result at \(j = J,\) current computational implementations of smoothing in the form of 4DVAR (standing for four dimensions—three space plus time—and a cost functional to be minimized) and filtering in the form of ensemble Kalman filters (EnKFs) often demonstrate that smoothing is more informed by the data than filtering is. However, EnKFs deliver an estimate for forecast uncertainties and do not require the computation of adjoint operators (and can thus be seen as derivative-free minimization methods). Merging the advantages of 4DVAR with those of EnKFs is currently a very active area of research in NWP.

Another important distinction is between deterministic and probabilistic methods. Deterministic methods for smoothing can be formulated through optimization as attempting to find the model and observational noise sequences that provide the best fit to the overall mathematical/data model. This leads to the 4DVAR objective function:

\[J(V_J) := \sum_{j=0}^{J-1} \bigl(|C^{-\frac{1}{2}} (v_{j + 1} - \Psi (vj))|^2 + | \Gamma^{-\frac{1}{2}} (y_{j+1} - h(vj + 1))|^2 \bigr),\]

which will typically be augmented with a regularization term for the initial condition, as discussed earlier. The covariance matrices \(C\) and \(\Gamma\) weight the relative confidence both in the mathematical model and the data. There are many variants on the above equation and, in particular, the singular limit \(C \rightarrow 0,\) where the model is thought to be noise free \((\xi_j \equiv 0),\) and hence optimization is over \(\upsilon_0\) only, is widely used. Deterministic methods for filtering also can be expressed in terms of optimization; 3DVAR type methods have the form:

\[\upsilon_{j+1} = \textrm{argmin}_\upsilon J_{j(\upsilon)}, J_j(\upsilon) := |C_j^{-\frac{1}{2}} (\upsilon - \Psi (\upsilon_j))|^2 + | \Gamma^{-\frac{1}{2}} (y_{j+1} - h(\upsilon))|^2.\]

As with smoothing, the choice of covariances \(C_j\) and \(\Gamma\) leads to a variety of different methods. A key question is whether such methods can reproduce accurate estimates of the true signal, even when initialized incorrectly. Figure 1 shows the output of a filter for the Navier-Stokes equation, and demonstrates how, in this case, the true signal (denoted by \(u\)) is recovered by the filter (denoted by \(m\)); three different Fourier components are shown, together with the \(L^2\) norm of the relative error. In all four figures synchronization between filter and truth may be observed. A rigorous dynamical systems perspective on synchronization was established in [4] in the noise-free model and data case. EnKF methods employ \(N\) copies of the above iterated minimization in parallel, and the covariance \(C_j\) is estimated empirically from an ensemble of forecasts. Such methods provide a transition from deterministic to probabilistic data assimilation techniques in that the ensemble information may be used as a surrogate for model uncertainty. Furthermore, such methods also lead to complex interaction between the different ensemble members and hence to interesting and challenging problems in random dynamical systems [7]. There is a great deal of opportunity for new research in this area.

Figure 1. Shows accurate filtering for the 2D Navier-Stokes equation. u denotes the true solution and m denotes the filter. The truth and filter are compared in several Fourier components, and in relative L² error. Several initializations of the filter are shown, indexed by (k). Figure credit: Kody Law.

More generally, probabilistic filtering methods concern approximation of the sequence of probability measures \(\mu_j(\cdot)={\mathbb P}(v_j \in \cdot | Y_j).\) There are various approaches, but the most prevalent for low-dimensional applications are SMC methods that attempt to approximate the probability distributions \(\mu_j\) by weighted sums of Dirac measures. This can be very hard to do in problems where the state space dimension is large or where the data is very informative [14, 15]. EnKF methods partially address the need to tackle such problems by employing linear regression during each data assimilation step, but rigorous analysis justifying their accuracy in practical scenarios (fixed, small ensemble size) is very much lacking and very much required. An interesting connection between probabilistic filtering and optimal transportation theory [12] provides an important conceptual foundation for the analysis of these problems.

The smoothing distribution requires study of the probability measure \(\mu(\cdot)={\mathbb P}(V_j|Y_j).\) This measure is on a space of dimension \(J + 1\) times that of the space where each measure \(\mu_j\) from filtering lives. As a consequence, it can be very difficult to study this probability measure accurately and efficiently. Monte Carlo Markov chain (MCMC) methods can be used in some cases, but they are primarily for model problems in benchmarking mode [8]; there remains a significant number of challenging questions in numerical analysis and statistics concerning how to make these methods accurate and efficient for high-dimensional applications [3].

Data assimilation is at a very exciting juncture for mathematical scientists. There are a plethora of applications in which dynamical models are confronted with significant data sets. The question of how to merge the dynamical model with the data to either estimate model states or model parameters—or to estimate both—is thus very timely.

In addition to the legacy applications in the geophysical sciences [11], for which data assimilation remains key, new areas include traffic flow [16, 5], neuroscience [1], personalized medicine [13], and power grids [2]. Furthermore, the subject has been application-led to date. The opportunity for mathematical scientists to systematize the field, develop and import new ideas and algorithms, and export these ideas into application domains old and new is a great one. The recent texts [9, 10, 12] provide introductions to the mathematical underpinnings of data assimilation. The field is one that will only grow in importance over the next few decades and is an ideal one for younger researchers in the mathematical sciences.

This article is based, in part, on a lecture delivered by Andrew Stuart at the 2015 SIAM Conference on Applications of Dynamical Systems, held in Snowbird, Utah.

Acknowledgments: Andrew Stuart is grateful to EPSRC, ERC, and ONR for financial support that led to the research underpinning this article. Sebastian Reich acknowledges support under the DFG Collaborative Research Center SFB1114: Scaling Cascades in Complex Systems.

References
[1] Abarbanel, H. (2013). Predicting the Future: Completing Models of Observed Complex Systems. New York, NY: Springer.

[2] Backhaus, S., & Chertkov, M. (2013). Getting a grip on the electrical grid. Physics Today, 66(5), 42-48.

[3] Cotter, S.L., Roberts, G.O., Stuart, A.M., & White, D. (2013). MCMC methods for functions: Modifying old algorithms to make them faster. Stat. Sci., 28, 424-446.

[4] Hayden, K., Olson, E., & Titi, E. (2011). Discrete data assimilation in the Lorenz and 2d Navier-Stokes equations. Phys. D, 240(18), 1416-1425.

[5] Herrera, J.C., & Bayen, A.M. (2010). Incorporating of Lagrangian measurements in freeway traffic state estimation. Transportation Res. Part B, 44, 460-481.

[6] Ide K. & Jones, C. (2007). Special issue on the mathematics of data assimilation, Phys. D, 230, vii-viii.

[7] Kelly, D.T.B., Law, K.J.H., & Stuart, A.M. (2014). Well-posedness and accuracy of the ensemble Kalman filter in discrete and continuous time. Nonlinearity, 27(10), 2579.

[8] Law, K.J.H., & Stuart, A.M. (2012). Evaluating data assimilation algorithms. Mon. Weather Rev., 140, 3757-3782.

[9] Law, K.J.H., Stuart, A.M., & Zygalakis, K.C. (2015). Data Assimilation: A Mathematical Introduction. New York, NY: Springer.

[10] Majda, A., & Harlim, J. (2012). Filtering Complex Turbulent Systems. New York: Cambridge University Press.

[11] Oliver, D., Reynolds, A., & Liu, N. (2008). Inverse Theory for Petroleum Reservoir Characterization and History Matching. Cambridge: Cambridge University Press.

[12] Reich, S., & Cotter, C. (2015). Probabilistic Forecasting and Bayesian Data Assimilation. New York: Cambridge University Press.

[13] Sedigh-Sarvestani, M., Albers, D.J., & Gluckman, B.J. (2012). Data assimilation of glucose dynamics for use in the intensive care unit. Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE, 5437-5440.

[14] Snyder, T., Bengtsson, T., Bickel, P., & Anderson, J. (2008). Obstacles to high-dimensional particle filtering, Mon. Weather Rev., 136, 4629-4640.

[15] Van Leeuwen, P.J., Cheng, Y., & Reich, S. (2015). Nonlinear Data Assimilation, Frontiers in Applied Dynamical Systems: Reviews and Tutorials, Volume 2. Berlin: Springer-Verlag.

[16] Work, D.B., Blandin, S., Tossavainen, O.P., Piccoli, B., & Bayen, A.M. (2010). A traffic model for velocity data assimilation, Applied Mathematics Research eXpress, 1, 1-35.

Sebastian Reich is a professor of Numerische Mathematik at the Institut für Mathematik at Universität Potsdam (Germany) and in the Department of Mathematics and Statistics at the University of Reading (UK). Andrew Stuart is a professor of mathematics at the Mathematics Institute at Warwick University (UK).