Differentiation With(out) a Difference
By Nicholas Higham
“The shortest and best way between two truths of the real domain often passes through the imaginary one,” mathematician Jacques Hadamard famously said. Here I will discuss an instance of this maxim that deserves to be better known.
A fundamental tension in numerical analysis is the interplay between truncation errors and rounding errors; this is particularly prevalent in the numerical approximation of derivatives. The forward difference approximation
\[f'(x) \approx \frac{f(x+h)-f(x)}{h}\]
has error \(O(h)\) for smooth \(f\), so we can make the error as small as we wish by choosing a small enough \(h\). But as numerical analysis textbooks explain, if we take \(h\) too tiny then rounding errors incurred in the \(f\)-evaluations—which are brought into prominence by the subtraction in the numerator and the small denominator—begin to dominate. The \(h\) that minimizes the combination of truncation error and rounding error is of order \(u^{1/2}\), where \(u\) is the unit roundoff (\(u \approx 10^{-16}\) for IEEE double-precision arithmetic), and it gives an error in \(f'(x)\) of order \(u^{1/2}\).
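This tradeoff is easy to see numerically. Here is a minimal sketch in Python (rather than the MATLAB used later in this article, so that the snippet is self-contained); the test function \(f(x) = \mathrm{e}^x\) and the step sizes are my choices for illustration, not part of the original experiment:

```python
import math

def forward_diff(f, x, h):
    # One-sided difference: truncation error O(h), rounding error O(u/h).
    return (f(x + h) - f(x)) / h

f, x = math.exp, 1.0   # f'(x) = e^x, so the exact derivative at x = 1 is e
exact = math.e

err_opt  = abs(forward_diff(f, x, 1e-8) - exact)   # h near u^{1/2}: ~8 correct digits
err_tiny = abs(forward_diff(f, x, 1e-17) - exact)  # h below the spacing of doubles
                                                   # near 1: x + h rounds back to x,
                                                   # so the numerator is exactly zero
```

With \(h \approx u^{1/2}\) the error is around \(10^{-8}\); with \(h = 10^{-17}\), which is smaller than half the spacing of the floating-point numbers near \(x = 1\), the computed difference is exactly zero and all information about the derivative is lost.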
[Figure: cartoon created by mathematician John de Pillis.]
However, most textbooks do not explain that when \(f\) is a real function of a real variable, we can do much better by venturing into the complex plane. Provided \(f\) is analytic, the approximation
\[f'(x) \approx \operatorname{Im} \frac{f(x+\mathrm{i}h)}{h}, \tag{1} \]
where \(\mathrm{i}\) is the imaginary unit, has error \(O(h^2)\); \(h\) can be taken arbitrarily small without rounding errors vitiating the result. Indeed, the value \(h = 10^{-100}\) is used in this approximation at the National Physical Laboratory [2]. The approximation \((1)\) is called the complex-step approximation, and the first occurrence I am aware of is in a 1998 SIAM Review paper by William Squire and George Trapp [6]. The authors were inspired by the earlier work of James Lyness and Cleve Moler [5] on the use of complex variables to approximate derivatives.
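To see where the \(O(h^2)\) error comes from, expand \(f\) in a Taylor series about \(x\) (a standard argument, sketched here for completeness). For real \(x\) and real-valued analytic \(f\),

\[f(x+\mathrm{i}h) = f(x) + \mathrm{i}hf'(x) - \frac{h^2}{2}f''(x) - \mathrm{i}\frac{h^3}{6}f'''(x) + O(h^4),\]

and taking imaginary parts and dividing by \(h\) gives

\[\operatorname{Im} \frac{f(x+\mathrm{i}h)}{h} = f'(x) - \frac{h^2}{6}f'''(x) + O(h^4).\]

No difference of nearly equal quantities appears, so there is no subtractive cancellation to amplify rounding errors as \(h \to 0\).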
Rounding errors still affect the complex-step approximation, but they do not inhibit its ability to achieve the expected accuracy, given the uncertainty in the evaluation of \(f\).
As a simple example, for the function \(f(x) = \frac{\mathrm{atan}(x)}{1+\mathrm{e}^{x^2}},\) a MATLAB experiment^{1} yields the following results:
fd =
0.274623728154858 % Accurate derivative
fd_cs =
0.274623728154858 % Complex step with h = 1e-100.
fd_fd1 =
0.250000000000000 % Forward diff. with h = 8u.
fd_fd2 =
0.274623729288578 % Forward diff. with h = sqrt(8u).
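For readers without MATLAB, the same experiment can be sketched in Python, which also has built-in complex arithmetic. The evaluation point \(x = 2\) below is my choice for illustration; the article does not state where its derivative was computed:

```python
import cmath

def f(z):
    # The article's example function, written so it accepts a complex argument.
    return cmath.atan(z) / (1 + cmath.exp(z * z))

def complex_step(f, x, h=1e-100):
    # Complex-step approximation (1): f'(x) ~ Im f(x + ih) / h.
    return f(complex(x, h)).imag / h

x = 2.0                                # illustrative evaluation point
d_tiny  = complex_step(f, x, 1e-100)   # h can be taken absurdly small...
d_small = complex_step(f, x, 1e-6)     # ...and agrees with a moderate h to
                                       # roughly the O(h^2) truncation level
```

Because no subtraction of nearly equal quantities occurs, the two step sizes agree closely, even though one of them is \(10^{-100}\).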
How does the complex-step approximation work? It basically carries out a form of approximate automatic differentiation, with \(h\) acting as a symbolic variable. For the process to work properly, it is essential that the computation of \(f\) itself not intrinsically use complex numbers, as rounding errors introduced in the imaginary parts would disturb the process.
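The “symbolic variable” viewpoint can be made concrete (a standard observation, not spelled out above): to first order in \(h\), complex arithmetic on quantities of the form \(u + \mathrm{i}hu'\) reproduces the rules of calculus. For a product, for instance,

\[(u + \mathrm{i}hu')(v + \mathrm{i}hv') = uv + \mathrm{i}h(u'v + uv') - h^2u'v',\]

so the imaginary part, divided by \(h\), carries the product-rule derivative \(u'v + uv'\), while the \(\mathrm{i}^2\) term perturbs the real part only at \(O(h^2)\). This is forward-mode automatic differentiation with dual numbers in all but name.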
Squire and Trapp’s paper has garnered over 400 citations on Google Scholar, many from papers that develop more general versions of the approximation. For example, the complex-step method is suggested in [4] for the approximation of gradients in the context of deep learning. I have used it to compute Fréchet derivatives of matrix functions [1]. It has also become popular for sensitivity analysis in engineering applications, particularly because its implementation is so simple [3].
The complex-step method is an easy-to-appreciate example of the potential gains from “going complex.” In programming environments with built-in complex arithmetic, such as MATLAB or Julia, it is trivial to give it a try.
^{1} The code used can be downloaded here.
References
[1] Al-Mohy, A.H., & Higham, N.J. (2010). The Complex Step Approximation to the Fréchet Derivative of a Matrix Function. Numer. Alg., 53, 133–148.
[2] Cox, M.G., & Harris, P.M. (2004). Numerical Analysis for Algorithm Design in Metrology. Software Support for Metrology Best Practice Guide No. 11. Teddington, U.K.: National Physical Laboratory.
[3] Giles, M.B., Duta, M.C., Mueller, J.D., & Pierce, N.A. (2003). Algorithm Developments for Discrete Adjoint Methods. AIAA J., 41(2), 198–205.
[4] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Cambridge, MA: MIT Press.
[5] Lyness, J.N., & Moler, C.B. (1967). Numerical Differentiation of Analytic Functions. SIAM J. Numer. Anal., 4(2), 202–210.
[6] Squire, W., & Trapp, G. (1998). Using Complex Variables to Estimate Derivatives of Real Functions. SIAM Rev., 40(1), 110–112.

Nicholas Higham is Royal Society Research Professor and Richardson Professor of Applied Mathematics at the University of Manchester. He is the current president of SIAM. 