By Sebastian Reich and Andrew Stuart
Part II of this article was published in the November issue of SIAM News.
The seamless integration of large data sets into sophisticated computational models poses one of the central challenges for the mathematical sciences in the 21st century. When the computational model is based on dynamical systems and the data set is time-ordered, the process of combining models and data is called data assimilation. The assimilation of data into computational models serves a wide spectrum of purposes, ranging from model calibration and model comparison to the validation of novel model design principles.
Historically, the rise of numerical weather prediction (NWP) in the 1950s played a major role in germinating the field of data assimilation. The computational models that were employed subsequently demanded algorithms for determining initial model states from available observations. Such a task falls naturally within the realm of ill-posed inverse problems, with the important caveat that Tikhonov-type regularizations have to be consistent with the underlying model dynamics; indeed, it was discovered that forecast skill could be dramatically improved by explicitly including the NWP models in the data assimilation cycle. The data assimilation technique associated with this viewpoint is still widely used in operational weather forecasting, and, collectively, the methods are referred to by the acronym 4DVAR (standing for four dimensions, three of space plus one of time, and a cost functional to be minimized). The 4DVAR methodology fits into the framework of Tikhonov regularized inverse problems where the regularization term on the initial condition is balanced by a term reflecting faithful reproduction of the model dynamics.
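To make the Tikhonov structure concrete, one common way of writing a 4DVAR-type cost functional is sketched below. The notation is an assumption introduced here for illustration and does not come from the article: $u_0$ is the unknown initial state, $m_0$ a prior (background) guess with covariance $C_0$, $\Psi$ the model map over one observation interval, $h$ the observation operator, $y_j$ the observation at time $j$, and $\Gamma$ the observation error covariance.

```latex
\mathcal{J}(u_0)
  = \frac{1}{2}\,\bigl\| C_0^{-1/2}\,(u_0 - m_0) \bigr\|^2
  + \frac{1}{2}\sum_{j=1}^{N}
      \bigl\| \Gamma^{-1/2}\,\bigl( y_j - h(\Psi^{j}(u_0)) \bigr) \bigr\|^2
```

In this form the first term regularizes the initial condition, while the model dynamics enter the second term through the iterated map $\Psi^{j}$, so that only initial states whose model trajectories stay close to the data over the whole time window receive a small cost.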
A second class of algorithms widely used by the NWP community consists of the Kalman filter-type methods emerging from the control community [6, 7]. Kalman’s work has been enormously influential, constituting an early systematic development of a methodology to combine models and data for dynamical systems; it is applicable to linear problems subject to additive Gaussian noise.
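For readers who prefer code to formulas, a single forecast-and-update cycle of the classic Kalman filter for a linear model with additive Gaussian noise might be sketched as follows. All names (`kalman_step`, `M`, `Sigma`, `H`, `Gamma`) are illustrative assumptions, not notation taken from the article.

```python
import numpy as np

def kalman_step(m, C, y, M, Sigma, H, Gamma):
    """One Kalman filter cycle for the (assumed) linear Gaussian model
        v_{j+1} = M v_j + xi,   xi  ~ N(0, Sigma)
        y_{j+1} = H v_{j+1} + eta, eta ~ N(0, Gamma).
    m, C are the current mean and covariance; y is the new observation."""
    # Forecast: push mean and covariance through the linear model.
    m_hat = M @ m
    C_hat = M @ C @ M.T + Sigma
    # Kalman gain K = C_hat H^T (H C_hat H^T + Gamma)^{-1},
    # computed with a linear solve instead of an explicit inverse.
    S = H @ C_hat @ H.T + Gamma
    K = np.linalg.solve(S, H @ C_hat).T
    # Analysis: correct the forecast using the observed innovation.
    m_new = m_hat + K @ (y - H @ m_hat)
    C_new = (np.eye(len(m)) - K @ H) @ C_hat
    return m_new, C_new
```

With equal forecast and observation variances, the update lands halfway between forecast and data and halves the variance, which is a quick sanity check on the gain.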
An early suggestion to use the Kalman filter in the solution of linear PDEs arising in the atmospheric sciences is . Early extensions of the classic Kalman filter to nonlinear systems include the extended Kalman filter. However, computational expense, together with the strong nonlinearity of atmosphere-ocean dynamics, prevented an operational implementation of the extended Kalman filter. Instead, operational weather centers implemented a greatly simplified version of the Kalman update equations by cycling the 3DVAR methodology [9, 11]. In this method, the data are incorporated sequentially, one observation time at a time, so that each optimization is carried out over the three space dimensions only. Structurally, this cycled 3DVAR looks like an extended Kalman filter-type update, but with a fixed covariance structure to weight the reliability of the model against that of the data.
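The structural similarity to a Kalman update can be seen in a minimal sketch of cycled 3DVAR, assuming a linear observation operator so that the minimizer at each time has a closed form. The function names, the fixed background covariance `B`, and the model map `psi` are hypothetical and chosen only for illustration; operational systems minimize the cost functional iteratively rather than forming a gain matrix.

```python
import numpy as np

def fixed_gain(B, H, Gamma):
    """Gain K = B H^T (H B H^T + Gamma)^{-1} built from a FIXED
    (symmetric) background covariance B; it is computed once and reused."""
    S = H @ B @ H.T + Gamma
    return np.linalg.solve(S, H @ B).T

def cycled_3dvar(v0, observations, psi, H, K):
    """Alternate a (possibly nonlinear) model forecast psi with a
    fixed-gain correction towards each observation in turn."""
    v = np.asarray(v0, dtype=float)
    analyses = []
    for y in observations:
        v_hat = psi(v)                    # forecast with the full model
        v = v_hat + K @ (y - H @ v_hat)   # analysis with the fixed gain
        analyses.append(v.copy())
    return np.array(analyses)
```

The only difference from a Kalman-type filter in this sketch is that `K` never changes: no covariance is propagated through the model, which is what makes the method cheap.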
Steadily increasing computer power eventually allowed for an extension of this cycling approach by combining it with ensemble prediction, which became prevalent in the NWP community in the 1980s. In this approach, rather than making a single best weather forecast, an ensemble of forecasts is made and their variability is used to weight the reliability of the model in comparison with the reliability of the data. Current ensemble-based data assimilation methodologies rely on linear regression to combine forecast uncertainties and observations and are collectively termed ensemble Kalman filters (EnKFs). Employing EnKFs in operational data assimilation systems has led to improved forecast skill compared to the simplified 3DVAR approach (see Figure 1). While EnKFs provide an elegant extension of the classic Kalman filter to the large-scale, non-Gaussian, and nonlinear NWP models in use today, the underlying linear regression ansatz also places limitations on their ability to predict, for example, extreme meteorological events.
Figure 1. Improvement of forecast skills for temperature in the southern hemisphere through the use of an ensemble Kalman filter data assimilation system at the German Meteorological Service (DWD). Image credit: Roland Potthast (German Meteorological Service & University of Reading).
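The ensemble-based update described above can be illustrated with a minimal sketch of one EnKF analysis step. It uses the perturbed-observation variant, which is only one of several EnKF formulations and is chosen here for brevity; the function name, the array layout, and the linear observation operator `H` are assumptions made for this example.

```python
import numpy as np

def enkf_analysis(ensemble, y, H, Gamma, rng):
    """Perturbed-observation EnKF analysis step.
    ensemble: array of shape (N, d), one forecast state per row.
    y: observation of shape (k,); H: (k, d); Gamma: (k, k)."""
    N = ensemble.shape[0]
    # Empirical forecast covariance from the ensemble spread.
    anomalies = ensemble - ensemble.mean(axis=0)
    C = anomalies.T @ anomalies / (N - 1)
    # Kalman-type gain with C replacing the exact covariance.
    S = H @ C @ H.T + Gamma
    K = np.linalg.solve(S, H @ C).T
    # Each member assimilates its own noisy copy of the observation.
    noise = rng.multivariate_normal(np.zeros(len(y)), Gamma, size=N)
    innovations = (y + noise) - ensemble @ H.T
    return ensemble + innovations @ K.T
```

A call such as `enkf_analysis(forecasts, y, H, Gamma, np.random.default_rng(0))` returns an updated ensemble of the same shape. Because the correction is linear in the innovation, the sketch also shows where the linear regression ansatz enters.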
Current research activities in data assimilation for NWP focus on expanding the range of observational systems (see Figure 2) and on merging 4DVAR with ensemble prediction systems on the one hand, and sequential Monte Carlo methods with EnKFs on the other. Practical challenges for such extensions arise, for example, from the relatively small affordable ensemble sizes (on the order of 100) and the presence of spatially and temporally correlated data and model errors that cannot be easily represented by standard stochastic processes.
Figure 2. Range of observational systems that deliver data to numerical weather prediction systems. Image credit: Roland Potthast (German Meteorological Service & University of Reading).
The field of petroleum reservoir simulation has also led to innovation in the development of data assimilation methods; in that context, there is a strong focus on combined model state and parameter estimation. Because reservoir modeling parameters (e.g., permeability) are often hugely uncertain, data assimilation and uncertainty quantification become even more challenging in petroleum reservoir engineering.
This article is based, in part, on a lecture delivered by Stuart at the 2015 SIAM Conference on Applications of Dynamical Systems, held in Snowbird, Utah. In Part II, published in the next issue, the authors explain why data assimilation is ripe for development by the mathematics community.
Acknowledgments: Andrew Stuart is grateful to EPSRC, ERC, and ONR for financial support that led to the research underpinning this article. Sebastian Reich acknowledges support under the DFG Collaborative Research Center SFB1114: Scaling Cascades in Complex Systems.
[1] A. Bennett, Inverse Modeling of the Ocean and Atmosphere, Cambridge University Press, New York, 2002.
[2] H.W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Kluwer, Dordrecht, The Netherlands, 1996.
[3] G. Evensen, Data Assimilation: The Ensemble Kalman Filter, Springer, New York, 2006.
[4] M. Ghil, S.E. Cohn, J. Tavantzis, K. Bube, and E. Isaacson, Application of estimation theory to numerical weather prediction, in Dynamic Meteorology: Data Assimilation Methods, L. Bengtsson, M. Ghil, and E. Källén, eds., Springer, New York, 139-224, 1981.
[5] A.H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, Waltham, MA, 1970.
[6] R. Kalman, A new approach to linear filtering and prediction problems, J. Basic Engineering, 82:1 (1960), 35-45.
[7] R. Kalman and R. Bucy, New results in linear filtering and prediction theory, J. Basic Engineering, 83:3 (1961), 95-108.
[8] E. Kalnay, Atmospheric Modeling, Data Assimilation and Predictability, Cambridge University Press, New York, 2003.
[9] A.C. Lorenc, S.P. Ballard, R.S. Bell, N.B. Ingleby, P.L.F. Andrews, D.M. Barker, J.R. Bray, A.M. Clayton, T. Dalby, D. Li, T.J. Payne, and F.W. Saunders, The Met. Office global three-dimensional variational data assimilation scheme, Quart. J. Royal Met. Soc., 126:570 (2000), 2991-3012.
[10] D. Oliver, A. Reynolds, and N. Liu, Inverse Theory for Petroleum Reservoir Characterization and History Matching, Cambridge University Press, New York, 2008.
[11] D.F. Parrish and J.C. Derber, The National Meteorological Center's spectral statistical-interpolation analysis system, Mon. Weather Rev., 120:8 (1992), 1747-1763.
[12] O. Talagrand and P. Courtier, Variational assimilation of meteorological observations with the adjoint vorticity equation, I: Theory, Quart. J. Royal Met. Soc., 113 (1987), 1311-1328.