# Lagrangian Data Assimilation in Ocean Modeling

In making predictions or estimations of the state of a system, in our case the ocean, uncertainty arises from many sources. There are errors in the model, as it cannot reflect reality fully, but also in the observations, because of instrument inaccuracies, human error, and the processing of the information. It is particularly important to consider observational errors in ocean studies, in which so much of the data collection is indirect: Model variables, such as fluid velocity, are estimated from measurements of sea-surface height; gliders measure temperature and density, but their locations are not known exactly.

Data assimilation is the procedure through which the “best estimate” of the state of the system is obtained from all the information available, including model computations and data analyses. In its most comprehensive expression, it is based on Bayes’ theorem and gives a full probability distribution function for the system state. It therefore captures all the uncertainty in the estimate or prediction, given the input information.
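As a minimal statement of the Bayesian formulation, and using notation that is an assumption here rather than taken from the work described ($x$ for the system state, $y$ for the observations), Bayes' theorem reads:

$$
p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}
$$

In this reading, $p(x)$ is the prior distribution supplied by the model, $p(y \mid x)$ is the likelihood that encodes the observational error, and $p(x \mid y)$ is the posterior distribution of the state given the data.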

Much subsurface information in the ocean comes from instruments that, at least to some extent, “go with the flow.” *Floats* are pressurized to follow constant-density surfaces and record their position as a function of time through sonar pinging to (or from) a pre-placed receiver (or transmitter). Floats, however, can report their readings only at the end of the voyage, which may last up to a year or more. *Drifters* are instruments that move around on surface currents and are tracked through GPS. *Gliders*, the most modern observational instruments in this family, are designed to combine the strengths of floats and drifters—they use buoyancy changes to submerge and surface, and their frequent surfacings afford regular reporting of data through satellite communication.

The considerable benefit to be gained from assimilating Lagrangian data became clear when we were able to reconstruct a large eddy in a general circulation model of the Gulf of Mexico from a small number of well-placed floats. In general, we have discovered that the strategic placement of instruments can pay dividends in the form of accurate representations of the flow field. But the key float locations are strategic precisely because they are of dynamical significance in the Lagrangian flow, and it is the passage of trajectories through or near these locations that causes the assimilation methods to fail.

Some approaches and answers are becoming available, but what is worth noting here is the big picture and its message for uncertainty quantification (UQ) and data assimilation. Much is said about the “curse of dimensionality” in UQ. The difficulty lies in sampling probability density functions (PDFs) in high dimensions: When there are too many dimensions to check, you never know whether you have found the peaks of the distribution. The main way to deal with this is to assume a Gaussian approximation for the PDF and a linear approximation for the underlying model. You then have to track only the mean and covariance. These approximations form the basis of almost all methods used in high-dimensional problems, including Kalman filtering and variational methods, and they work very well provided that the assumptions of Gaussianity and linearity are good. But we know that for Lagrangian data assimilation, they completely miss the very features that carry the key information.
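To make the Gaussian and linear assumptions concrete, here is a minimal sketch of a Kalman analysis step for a single scalar state. The ensemble values, the observation, and the function name `kalman_update` are hypothetical and chosen only for illustration; they are not taken from the Gulf of Mexico study. The ensemble imitates drifter positions that have split into two groups, as might happen near a dynamically significant point in the flow.

```python
import statistics


def kalman_update(mean, var, obs, obs_var, h=1.0):
    """Scalar Kalman analysis step for a linear observation y = h * x + noise."""
    gain = var * h / (h * h * var + obs_var)
    new_mean = mean + gain * (obs - h * mean)
    new_var = (1.0 - gain * h) * var
    return new_mean, new_var


# Hypothetical ensemble of positions that has split into two groups.
ensemble = [-1.1, -1.0, -0.9, 0.9, 1.0, 1.1]

# The Gaussian approximation keeps only these two numbers.
prior_mean = statistics.mean(ensemble)
prior_var = statistics.variance(ensemble)

# Assimilate a hypothetical direct observation of the position.
post_mean, post_var = kalman_update(prior_mean, prior_var, obs=1.0, obs_var=0.1)

print(f"prior mean {prior_mean:.3f}, prior variance {prior_var:.3f}")
print(f"posterior mean {post_mean:.3f}, posterior variance {post_var:.3f}")
```

The prior mean sits between the two groups, at a position that no ensemble member occupies, so the two-peaked structure is lost before any observation is assimilated.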

Particle filters give very effective ways of sampling potentially multi-modal PDFs. But these statistical methods work effectively only in low dimensions. Our perspective is that such an approach can work in Lagrangian data assimilation because the observational data are confined to a well-defined low-dimensional subspace, namely, that part of the augmented system (the flow state together with the instrument positions) that encodes the trajectory behavior.
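The following is a minimal sketch of one update of a bootstrap particle filter in one dimension, under assumptions that are not in the original: a broad Gaussian prior, a hypothetical nonlinear observation $y = x^2$ with Gaussian noise, and multinomial resampling. It is a toy problem, not the scheme used in the work described, but it shows how weighted samples can retain a two-peaked posterior.

```python
import math
import random


def gaussian_likelihood(residual, std):
    """Unnormalized Gaussian likelihood of an observation residual."""
    return math.exp(-0.5 * (residual / std) ** 2)


def particle_update(particles, obs, obs_std, h, rng):
    """Weight particles by the likelihood of obs under y = h(x) + noise, then resample."""
    weights = [gaussian_likelihood(obs - h(x), obs_std) for x in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Multinomial resampling: draw with replacement, proportional to weight.
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    return weights, resampled


rng = random.Random(0)

# Hypothetical prior: a broad Gaussian centered at zero.
particles = [rng.gauss(0.0, 2.0) for _ in range(2000)]

# Observing y = x^2 near 1 cannot distinguish x = +1 from x = -1,
# so the posterior has two peaks.
weights, posterior = particle_update(
    particles, obs=1.0, obs_std=0.2, h=lambda x: x * x, rng=rng
)

left_fraction = sum(1 for x in posterior if x < 0) / len(posterior)
mean_abs = sum(abs(x) for x in posterior) / len(posterior)
print(f"fraction of posterior particles with x < 0: {left_fraction:.2f}")
print(f"mean |x| of posterior particles: {mean_abs:.2f}")
```

The resampled particles cluster near both $x = -1$ and $x = +1$, structure that a single mean and variance could not represent. The cost is that the number of particles needed grows rapidly with dimension, which is why confining the sampling to a low-dimensional subspace matters.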

Perhaps there is a moral here for UQ in general: You do not want to linearize genuinely nonlinear effects; after all, you are then quantifying the uncertainty in an oversimplified, and incorrect, problem. If viewed in terms of the assimilation of data, it may be tractable to capture the “nonlinear” uncertainty if it is known, a priori, to come from a well-defined, low-dimensional subspace where the data live. This is the case for Lagrangian-type data, but it may be true in other circumstances as well.

**Acknowledgments:** This article describes the joint work of Christopher K.R.T. Jones with Amit Apte (Tata Institute of Fundamental Research, India), Kayo Ide (University of Maryland), Damon McDougall (University of Warwick), Naratip Santitissadeekorn (UNC Chapel Hill), Elaine Spiller (Marquette University), Andrew Stuart (Warwick), and Guillaume Vernieres (National Aeronautics and Space Administration). The research is supported by the Office of Naval Research and the National Science Foundation.