# Happy to be a Mathematician: Remarks to the AAAS Section on Mathematics

*David E. Keyes of King Abdullah University of Science and Technology (KAUST) was recognized as a 2018 Fellow of the American Association for the Advancement of Science (AAAS) Section on Mathematics for his service to the mathematical sciences profession and fundamental research contributions at the interface of parallel computing and numerical analysis. The following remarks are adapted and excerpted from his induction speech at the 2019 AAAS Annual Meeting, held in Washington, D.C., this past February.*

As a mathematician, I consider myself to be one of most fortunate people in the world. In principle, anyone can access mathematics’ beauty. But some of us were blessed to be encouraged to pursue the subject in our youth and are now able to make a living by doing mathematics as a profession. While we learned to love mathematics for its beauty, many of us also love it for its utility and community.

What is more beautiful than mathematics? One finds elegance in the pleasing structure of a proof, built brick-by-brick by lemmas or stepwise by induction. In a world where falsehoods are readily blended and sold with truth, mathematics offers access to an inner sanctum of truth and a paradigm for applying logical thinking to all of life’s issues.

Beauty is also present in the magnificent structure of functions in the complex plane, so that an infinite real integral can be replaced with a set of infinitesimal contours around the poles of its integrand and evaluated based on a single term from an expansion at each pole. Equally breathtaking is the way in which an infinite polynomial—like the series for sine—converges everywhere on the real line to a number bounded in magnitude by one, something you would never guess if you camped on a large argument and watched the swinging term-by-term partial sums at the start.

Beauty is the readily visible or touchable riot of a Möbius strip or Klein bottle — one-sided surfaces embedded in higher dimensions. Another ultimate recreation for the mind is a space-filling curve, introduced as a mathematical curiosity by Giuseppe Peano in 1890 and David Hilbert in 1891. These one-dimensional curves get arbitrarily close to every point in a higher-dimensional space while recursively folding onto themselves so that the adjacent points in high-dimensional space are also near each other in arc length along the curve. Hilbert could never have guessed that over a century later, researchers would use these curves to lay out high-dimensional data in linearly-indexed computer memory to obtain proximity that is relevant to cache-based, distributed-memory computer architectures.

Speaking of computing, what about the mathematics of numerical analysis? We can now perform a quintillion—\(10^{18}\)—floating point operations per second in simulations that run for a week. After committing \(10^{24}\) rounding errors with respect to exact arithmetic (following the compromise that is floating point), we can confidently bound the error in the result to a tiny fraction of unity. But computers at that scale burn barrels of oil per hour, so let’s celebrate the beauty of algorithms that shave off full powers of the arithmetic complexity, which is the number of operations required to complete a computation.

- 1965: fast Fourier transform
- 1972: multigrid method
- 1986: fast multipole method
- 1991: sparse grids
- 1999: hierarchical matrices
- 2008: multilevel Monte Carlo methods
- 2011: randomized singular value decomposition.

What will be the next dramatic reduction in algorithmic complexity to reduce a polynomial-time task in the problem size to something linear or log-linear? I suspect that it will soon arise in the realm of deep learning. Stay tuned!

Let’s go back to hierarchical matrices, which were introduced 20 years ago by Wolfgang Hackbusch and contemporaneously by Eugene Tyrtyshnikov. \(H\)-matrices approximate a dense matrix of size \(N^2\) as a collection of blocks, almost all of which are representable through their singular value decomposition (or other cheaper means) to high accuracy with small rank. The result is a reduction of storage to \(O(rN\log N)\), where \(r\) is the effective rank of a typical small block of the original matrix, and a reduction in the cost of multiplication, inversion, and other traditional matrix operations to \(O(r^2N\log^2N)\).

Matrices with enough data sparsity to be compressed in this way are ubiquitous in practice. Hierarchical matrices are revolutionizing scientific computing and fitting unprecedentedly large problems into computer memory, which can never grow fast enough in our data-rich world.

This relates to my own research. My group is systematically introducing data sparsity into traditional linear algebra libraries and applications that rely on high-performance linear algebra. Examples of such applications include maximum likelihood estimation in the geospatial statistical prediction of weather or climate—like wind or solar insolation—to guide the placement of billions of dollars of renewable energy infrastructure.

In addition, Hessians for seismic inversion allow us to discover more oil until this renewable infrastructure is in place. Real-time adaptive optics help align mirrors of the largest land-based telescopes to correct for atmospheric turbulence. And density functional theory can design new materials. We work on these applications directly with scientists and engineers.

We also release open-source software—which has been picked up in the NVIDIA cuBLAS Library for millions of graphics processing units and the Cray Scientific and Math Libraries for hundreds of the top supercomputers—to reach a wider crowd.

The following four algorithmic premiums are important when migrating the world’s scientific software to exascale computing:

- Massive load-balanced concurrency with minimal communication between distributed memory nodes
- Massive concurrency
*within*the nodes for shared-memory cores - High arithmetic intensity, namely restriction to data that already exists in registers and caches, due to the thousand-cycle penalties of going to DRAM or off-chip
- Synchronization reduction; as we head towards
*one billion*one-gigahertz cores to achieve \(10^{18}\) operations per second, synchronizing through traditionally addictive algorithmic idioms can pace the entire computation by the slowest or most data-starved core.

Ph.D. students in my current group must aim to improve at least one of these four aspects of an algorithm that manipulates a matrix, traverses a mesh, operates on an unstructured particle swarm, or extends existing exascale algorithms to a new application. Seven of the inaugural 10 graduates’ first postdoctoral placements were funded by the U.S. Department of Energy.

So, how did I get here? I earned my bachelor’s degree in mechanical engineering and engineering physics, and wanted to work on problems related to energy resources and conversion in the oil-embargoed 1970s. I discovered that mathematical modeling was my favorite part of engineering, so I pursued a Ph.D. in applied mathematics. The first commercial parallel computers emerged around this time, which inspired my postdoctoral research in computer science. My first faculty appointment was in mechanical engineering, during which I taught graduate-level applied mathematics while beginning my research in parallel computational fluid dynamics; this has remained my main interest.

I am a computational scientist of mathematical origin from a time when computational science was not yet recognized. As a result, I had to grow into the field by starting over in multiple disciplines to learn applications, models, software, and hardware. Mathematics is a common thread and prerequisite.

I have co-originated and co-named two algorithms, both with Xiao-Chuan Cai (University of Colorado Boulder): (i) the Newton-Krylov-Schwarz algorithm, a domain-decomposed solver for nonlinear systems that arise in implicit discretizations of partial differential equations, and (ii) the Additive Schwarz preconditioned inexact Newton algorithm, a mostly asynchronous method for the same types of systems.

Like Tim Kelley (North Carolina State University), my colleague and fellow 2018 Fellow, I am heavily invested in Newton’s method. In fact, our co-authored paper on pseudo-transient continuation for nonlinear systems is among our most highly-cited articles.

This brings me back to community. Mathematicians can work alone, and some do with great success. But they can enrich their problem space and enhance their creativity by seeking out the “pathologies” of others based on applications and stresses from computer hardware.

We stand today on the threshold of the convergence of the so-called third paradigm of large-scale simulation and fourth paradigm of big data. I look forward to this convergence, as it will expand community as well as extend the imaginations and central importance of mathematicians.