# Quantifying Uncertainty in Numerical Analysis

My first encounter with the pairing of numerical analysis and uncertainty quantification goes back three decades. There was not yet a name to define the latter, and only few numerical analysts ventured into the world of probability. As a graduate student in the mid-1980s, I took some courses in statistics and intended to focus my Ph.D. thesis on the mathematics of experimental design, only to find my potential advisor leaving the institution. The search for a new topic and advisor ended successfully when Peter Henrici suggested that I examine the effect of round-off error on the computation of the fast Fourier transform using a probabilistic formulation. Round-off errors and the problems associated with finite precision arithmetic introduce a significant level of uncertainty in results of scientific computations. In the days before IEEE arithmetic, different computers followed different standards when it came to representing numbers in finite precision arithmetic, and it was normal for the same code to produce significantly dissimilar results when run on two different machines. The literature on probabilistic analysis of error in numerical algorithms at the time was rather limited, and definitely not considered part of mainstream numerical analysis.

My probabilistic foray ended shortly after the completion of my Ph.D., and I spent the next decade and a half working in numerical linear algebra, particularly on iterative solvers for large-scale inverse problems. Although probability did not play an active role in my research, the thought that it may be key to understanding the variability of the solution as a function of the random starting vector or the random perturbation of the data kept buzzing in my head. My initial encounter with Bayesian methodology for solving inverse problems was simultaneously exciting and frustrating: not only did it look like Bayesian methodology was the natural language to describe variability in the computed results, but it seemed that one could also use the Bayesian framework to make numerical algorithms take advantage of qualitative or partial information, or belief about the quantity to be computed. It would have been a *Eureka!* moment, except that I had no idea how to interface state-of-the-art numerical analysis and Bayesian inference. The challenge of designing computational tools for quantifying uncertainties in numerical analysis—and leveraging them to improve the computational algorithm—has since become the central theme of my research.

**Figure 1.**True and recovered figures.

**1a.**The original image.

**1b.**A blurred and noisy version of the original.

**1c.**MAP estimate of the hyperparameter.

**1d.**MAP estimate with conditionally Gaussian prior with inverse gamma hyperprior.

**1e.**MAP estimate with second-order smoothness prior.

**1f.**MAP estimate with white noise prior. Image credit: Daniela Calvetti.

Fortunately, when beginning to recast numerical analysis in a probabilistic setting, it suffices to have a few basic notions of probability and an open mind. The first step towards embedding numerical algorithms into a probabilistic framework is to model all quantities whose values are unknown as random variables, where the randomness is a way of admitting our ignorance about their values rather than a statement about the variables themselves. For example, when solving a linear system \(Ax=b\), where we assume that \(A\) is a known matrix and vector \(b\) is corrupted by some additive noise \(\epsilon\), we model \(x\) and \(b\) as random variables because we are unsure of their values. On the other hand, we probably know something about the nature of the noise and have at least some vague ideas regarding the properties of the vector \(x\) — it is very hard to know absolutely nothing!

Using Gaussian distributions \(x \sim {\mathcal{N}}(\overline{x},\Gamma)\), \(\epsilon \sim {\mathcal{N}}(0,\Sigma)\) to express what we know about \(x\) and \(\epsilon\) is a natural entry point. Since the additive noise model implies \(\epsilon = b-Ax\), if we know the value of \(x\), then \(b\) inherits the uncertainty coming from the noise contribution and can be described by a shifted Gaussian \(b \sim {\mathcal{N}}(Ax,\Sigma)\), called the likelihood. Once we consider the data \(b\), the idea or belief about \(x\) that we had a priori is likely to change in light of our observation; this leads to the posterior distribution of \(x\), which is the solution of the linear system in probabilistic terms. Bayes formula states that the posterior distribution is proportional to the product of the prior and the likelihood. If all we know about the noise and solution can be expressed in terms of Gaussian distributions, summarizing the posterior distribution by its mode amounts to computing its maximizer, the maximum a posteriori estimate \(x_{\rm MAP}\). Assuming for simplicity that \(b\) is corrupted by scaled white noise \(\Sigma = \sigma^2 I\),

\[ x_{\rm MAP} = {\rm argmax} \left\{ \exp\left( -\frac{1}{2\sigma^2} \|Ax-b\|^2 \right) \exp\left( - \frac 12 \|L(x-\overline x) \|^2 \right) \right\}, \]

where \(L\) is an invertible matrix related to \(\Gamma\). Equivalently, since \(x_{\rm MAP}\) is the minimizer of the functional

\[x_{\rm MAP} = {\rm argmin} \left\{ \|Ax-b\|^2 + \alpha \| L (x-\overline x)\|^2 \right\}\]

where \(\alpha=\sigma^2\), we can think of it in numerical linear algebra as the solution of a penalized least squares and—in the context of inverse problems—a Tikhonov regularized solution.

Do we gain anything new in the deterministic setting by recasting the solution of our problem in probabilistic terms? One advantage of describing the solution of the problems in terms of its posterior probability density is that in addition to summarizing it with a single point estimate, e.g., the mode, we can query the probability that realizations of the random variable are close to it. What is most exciting about the solution being a random variable is that there is great freedom in the assignment of the prior, which can be as specific or as loose as we believe the solution to be before considering the data. In particular, we can define prior distributions through stochastic simulations based on qualitative a priori beliefs that may be difficult to express analytically. I often think that in everyday life, we use a prior to state our opinion about events before gathering related data. Some priors are confirmed by data and others are in conflict with them. While the interpretation of the prior as a penalty term is currently very common regardless of the presence or absence of an underlying Bayesian framework, this was not the case in earlier days; a little over a decade ago, a referee commented that numerical linear algebra and Bayesian inference have absolutely nothing to do with each other!

**Figure 2.**Three realizations from three different priors used for the reconstructions in Figure 1. From left to right: white noise prior, second order smoothness prior, and conditionally Gaussian prior with inverse gamma hyperprior. Image credit: Daniela Calvetti.

After warming up with traditional smoothness priors reminiscent of Tikhonov regularization functionals, it may be tempting to advance to more complicated priors, where we retain the Gaussian formalism but assume that entries of the covariance matrix are unknown — hence modeled as random variables. For consistency, we express our beliefs about these new random variables before accounting for the data in the form of a probability density, referred to as hyperprior. Conditionally Gaussian priors are particularly well suited when looking for sparse solutions, with appropriate choices of hyperpriors. State-of-the-art numerical methods allow us to efficiently compute an estimate of the solution by an inner-outer iteration scheme. For some choices of hyperpriors, the results are remarkably similar to those obtained with sparsity-promoting regularization schemes, at a fraction of the computational cost.

Recently, I have applied these ideas to the reconstruction of brain activity from measurements of the induced magnetic field at the sensor of a magneto-encephalography device. In that context, where the number of unknowns is much larger than the number of measurements, utilization of qualitative a priori beliefs can have a major impact on the quality of the computed solution. Preconditioners for Krylov subspace iterative linear system solvers related to a priori information about the organization of neurons, first shown in Santiago Ramón y Cajal’s pioneering drawings, and belief in the sparsity of brain activity have produced highly accurate reconstructions at a very low computational cost. Powerful, state-of-the-art numerical analysis algorithms make it possible to compute the solution very efficiently. There is no question now that uncertainty is a crucial component of numerical computations; the challenge is using its quantification to design better numerical algorithms.