| June 01, 2015

New Mathematics for Extreme-scale Computational Science?

Let’s start with the good news: Mathematics continues to be the most important contributor to any work in large-scale computational science. This is so because computational complexity becomes ever more important with faster computers. Once systems are large enough, the best algorithms will always be the ones with the best asymptotic complexity. High-quality journals, such as the SIAM Journal on Numerical Analysis and the SIAM Journal on Scientific Computing, regularly publish papers in this area that advance the research frontier.

That said, many of these novel algorithms underperform by many orders of magnitude. Contrary to the belief of some in the mathematics community, what is needed is not simply a few extra weeks for converting Matlab to Fortran and MPI. Designing efficient HPC software requires extensive creative research. The deficiencies of currently available software, as outlined in the following list, are much more fundamental.

There is nothing so practical as a good theory,¹ but a misconception about the role of rigorous theory seems to have taken root in the math community. For example, a rigorous asymptotic bound of the form \(| e | \leq C~ h^p\) has only heuristic implications if we are assessing the quality of a discretization for all finite values of \(h\) (that is, in any practical computation). Such theorems are a poor basis for comparing one discretization to another of the same or even different order, as long as the constants remain unspecified. We need more extensive quantitative theory. In its absence, systematic numerical experiments become as important as or even more important than rigorous theory.
Some areas of contemporary applied mathematics have an underdeveloped tradition in systematic algorithmic benchmarking. This starts with the lack of generally accepted standard test examples, which means that the numerical cost of an algorithm (i.e., the number of flop/s required for a specific discretization or by a specific solver) is frequently left unquantified. Consequently, relatively inefficient algorithms sometimes remain in use even when better alternatives exist.
On modern computer systems, the traditional cost metric of numerical mathematics (i.e., the number of flop/s needed to solve a problem) increasingly fails to correlate with truly relevant cost factors, such as time to solution and energy consumption. It will be necessary to quantify much more complex algorithmic characteristics, such as memory footprint and memory-access structure (e.g., cache re-use, uniformity of access, utilization of block-transfers), processor utilization, and communication and synchronization requirements. These effects must be built into better complexity models—models that are simple enough to be used, but that capture the true nature of computational cost far better than a simple count of flop/s.
For extreme-scale computational science, we need a more systematic integrated methodology that we can use to engineer algorithms. Starting from a mathematical model, we want to be able to predict a priori the performance that can be achieved; afterward, we will evaluate the actual performance with respect to the prediction, accounting for the discrepancies. This must be done on all levels of the “simulation pipeline” – the mathematical model, the discretization, the solver, its sequential implementation, and eventually its parallelization.

It is important that this be seen not simply as tweaking a given algorithm to run fast on a particular architecture, but as true co-design. In particular, the process includes the design and development of the algorithms and data structures. If it is known that, say, a 2D multigrid Poisson solver reaches \(h^2\)-discretization accuracy in fewer than 30 operations per unknown, we must be able to justify the use of a more complicated discretization and more expensive solver for the same problem class. We may have good reasons, but such algorithmic choices must be based on clear arguments that account for the accuracy achieved relative to the cost.

From a talk given by Ulrich Rüde at the Exa-Scale Mathematics Workshop in Washington, DC, August 2013.

On the implementation side, even the sequential version of an algorithm often reaches only a fraction of the peak performance of a core. We should be able to explain why this is so. It may turn out that memory or communication bandwidth is the relevant bottleneck. Generally, theory must concisely quantify the performance bounds for a given computer system, and the design process must be based on a systematic accounting for the limiting resources. To achieve this, we need realistic a priori cost predictions throughout the development process. And in general, we should be more honest in assessing parallel performance. David Bailey’s “Twelve Ways to Fool the Masses …”² are still too much in use.

The deficiencies just listed are enough to drive a multi-decade mathematical research program. But underlying the deficiencies are some great opportunities in the form of novel mathematical research directions. Here are a few of them:

With \(10^9\) parallel threads (in future extreme-scale systems) we will have to avoid all unnecessary communication and synchronization. Research is already under way in some areas, including dense linear algebra, although the problem is wide open elsewhere, e.g., for iterative solvers. New asynchronous, communication-avoiding algorithms must be designed. Lower bounds must be found on the amount of communication/synchronization necessary for a particular problem. Chaotic relaxation strategies or stochastic and nondeterministic algorithms—possibly among the key innovations needed for extreme-scale computing—could also provide greater robustness and built-in fault tolerance overall.
Extreme-scale systems will provide the computational power to move from qualitative simulation to predictive simulation, and from predictive simulation to optimization, parameter identification, and inverse problems; they will make stochastic simulations possible and allow us to better quantify uncertainties.
Extreme-scale computing will enable us to bridge the gap between the mesoscale and the human scale. “Mesoscale” refers to certain physical scales, such as a cell in a biological system, a grain of sand, or a pore in an aquifer. A living human has around \(10^{11}\) neurons and \(10^{13}\) red blood cells; a pile of sand may consist of \(10^{10}\) grains. The mesoscale is halfway between the atomic and the human scales. Mesoscale computing entails dealing with large numbers of objects, but such ensembles may become tractable on extreme-scale systems – with \(10^{18}\) flop/s we can perform \(O(10^5)\) flop/s for each human blood cell.

Thus, the extreme scale may offer new possibilities for simulation science. To exploit this capability, however, we need new methods for modeling and simulating large mesoscopic ensembles for long enough times. New algorithms must be invented, new modeling paradigms devised. We also need new techniques for validation and verification: We are not interested in accurate predictions of each individual blood cell in a human being, but the ensemble behavior must be physically meaningful and must provide insight, e.g., physiological, beyond that offered by classical techniques. Such multiphysics scenarios and multiscale modeling paradigms will gain increased momentum with the advent of extreme-scale computing and will become even more interesting research topics.

In summary, I believe that the advent of extreme-scale computing is forcing mathematical scientists to address the growing performance abyss between existing mathematical theory and the practical use of HPC systems. Tweaking codes is not enough – we must look back and perform new analyses in areas in which we have not thought deeply enough, in order to develop a new methodology for interdisciplinary algorithm and performance engineering. Beyond this, extreme-scale computing opens fascinating new opportunities in fundamental research that far surpass increased mesh resolution. Opportunities for the development of asynchronous algorithms and large-scale mesoscopic modeling are just two examples.

¹ According to Kurt Lewin (1890–1947).

² A modernized version can be found here. For the original, see this link.

Ulrich Rüde is a professor in the Department of Computer Science at the University of Erlangen-Nuremberg.