
SPPEXA: Software for Exascale Computing

By Severin Reiz and Hans-Joachim Bungartz

Figure 1. Approximation of high-dimensional problems based on sparse grids. In this case, the grids are two-dimensional. Image adapted from [1].
In 2011, the German Research Foundation (DFG) established a nationwide Priority Programme, a coordinated funding scheme open to research consortia from across Germany. The resulting program, called Software for Exascale Computing (SPPEXA), addresses fundamental research on various aspects of high-performance computing (HPC) software. SPPEXA involves 17 consortia, 39 research institutions, and 57 principal investigators who hail from the fields of computer science, mathematics, the natural sciences, engineering, and the life sciences. Funding commenced in 2013, comprised two funding phases, and ended in April 2020; overall funding from the DFG totaled 23 million euros.

When referring to exascale computing, one typically emphasizes the corresponding high-end computer systems, for several reasons. First, the Top500 list of the 500 most powerful systems worldwide, which is published twice a year and has in recent years been framed as a "race to exascale," is always celebrated as a huge competition between technological approaches, vendors, and nations. Second, repeatedly providing large-scale HPC capacities is considered a feat of scientific and economic strength. And third, despite their prices (or maybe even because of them), HPC systems are frequently easier to finance as one-time investments than scientific staff. Nevertheless, one can only exploit a computer's theoretical performance in practice with efficient and scalable algorithms and a high-performance software ecosystem. Funding agencies in the U.S. and Japan quickly reacted to this challenge with their own respective programs; SPPEXA is the DFG's response.

SPPEXA takes a holistic approach to HPC software. It aims to ensure the efficient use of current and upcoming high-end supercomputers by exploring both evolutionary and disruptive research threads. The following six research directions are considered most crucial: (i) computational algorithms, (ii) application software, (iii) system software and runtime libraries, (iv) programming, (v) software tools, and (vi) data management. Computational algorithms, such as fast linear solvers or eigensolvers, are core numerical components of many large-scale application codes, including those driven by classical simulations or oriented towards data analytics. If one cannot ensure scalability for such core routines, the battle is already nearly lost. Application software acts as the "user" of HPC systems and typically appears as legacy codes that researchers have developed over years or even decades. Increasing the software's performance via a co-design that combines algorithm and performance engineering, and that addresses both the "systems–algorithms" and "algorithms–applications/models" interfaces, is vital.
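
Why is that battle so easily lost? Amdahl's law gives the classic back-of-the-envelope answer; the short sketch below (our illustration, not an SPPEXA code) quantifies how even a minuscule sequential fraction in a core routine caps the attainable speedup:

```python
# Back-of-the-envelope illustration (ours, not SPPEXA code) of Amdahl's law:
# even a tiny serial fraction in a core routine caps the attainable speedup.

def amdahl_speedup(serial_fraction: float, num_processors: int) -> float:
    """Ideal speedup when a fixed fraction of the work stays sequential."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / num_processors)

for f in (0.01, 0.001, 0.0001):      # serial fraction of the overall work
    for p in (10**3, 10**6):         # core counts, up to exascale-like scales
        print(f"serial fraction {f:.4%}, {p:>7} cores: "
              f"speedup {amdahl_speedup(f, p):>9,.0f}x")
```

Even a sequential fraction of 0.01 percent caps the speedup near 10,000, no matter how many of a million cores one throws at the problem; this is why scalability of core algorithms comes before everything else.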

Figure 2. Performance models generated for call paths in SWEEP3D, a neutron transport simulation. Image courtesy of [1].

Performance engineering cannot succeed without progress in compilers, monitoring, code optimization, verification support, and parallelization support (like auto-tuning); this dependence underlines the collective importance of system software, runtime libraries, and tools. Programming, including programming models, is likely the research direction where the need for a balance of evolutionary research (improving and extending existing programming models, for example) and revolutionary approaches (exploring new programming models and language concepts, such as domain-specific languages) is most apparent. In addition, data management has always been relevant to HPC in terms of input/output, post-processing, and visualization; it is becoming increasingly important as more and more HPC applications are data-intensive.

Figure 3. Flow speeds in a mantle convection model that is computed with hierarchical hybrid grids. Image courtesy of [1].
To illustrate the wide spectrum of SPPEXA’s research impact, we highlight three exemplary subprojects:

Algorithms: An Exa-Scalable Two-Level Sparse Grid Approach for Higher Dimensional Problems in Plasma Physics and Beyond (EXAHD) supports Germany’s long-standing research in the use of plasma fusion as a clean, safe, and sustainable carbon-free energy source (see Figure 1). The EXAHD initiative aims to develop scalable, efficient, and fault-tolerant algorithms to run on distributed systems, thus advancing the progress of cutting-edge plasma fusion research.
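
The combination technique at the heart of such sparse grid approaches is easy to sketch: a high-dimensional solution is assembled from many small anisotropic full grids, which can be computed independently and therefore scalably. The Python sketch below is a minimal, generic illustration of that classical technique, with names of our choosing; it is not EXAHD's actual implementation:

```python
# Minimal sketch of the classical sparse grid combination technique (our
# illustration, not EXAHD's code): the sparse grid solution is assembled
# from many small anisotropic full grids that can be computed independently.
import itertools
from math import comb

def combination_grids(n, dim=2):
    """Yield (coefficient, level vector) pairs of the combination technique.

    Classical formula: u_n = sum_{q=0}^{d-1} (-1)**q * C(d-1, q)
                             * sum_{|l|_1 = n-q} u_l,
    which in 2D reduces to u_n = sum_{|l|=n} u_l - sum_{|l|=n-1} u_l.
    """
    for q in range(dim):
        coeff = (-1) ** q * comb(dim - 1, q)
        for level in itertools.product(range(1, n + 1), repeat=dim):
            if sum(level) == n - q:
                yield coeff, level

# Each component grid of level l has only 2**l_i + 1 points per direction,
# so the (independent) component solves can be distributed across a machine.
for coeff, level in combination_grids(n=4):
    points = [2 ** l + 1 for l in level]
    print(f"coefficient {coeff:+d}, levels {level}, points {points}")
```

Because the component solves are independent and each component grid is small, the scheme maps naturally onto distributed systems; moreover, a lost component can be recomputed or approximated from its neighbors, which is one basis for the fault tolerance mentioned above.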

Monitoring Tools: ExtraPeak’s research focuses on performance optimization of parallel programs by measuring and analyzing their runtime behavior (see Figure 2). Researchers utilized Scalasca, a performance-analysis toolset developed by ExtraPeak researchers, to optimize a Barnes-Hut algorithm variant for neurons; this work connected SPPEXA to the Human Brain Project, an ongoing effort by the European Union to build a cutting-edge research infrastructure for neuroscience and computing.
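
The core idea of such empirical performance modeling, as in Figure 2, is straightforward to sketch: measure a code region at several small scales, fit simple candidate models, and extrapolate. The toy below uses hypothetical data and function names of our own and is far simpler than ExtraPeak's actual tools:

```python
# Toy version (ours) of empirical performance modeling: fit candidate models
# t(p) = c0 + c1 * f(p) to runtimes measured at small core counts p,
# pick the best-fitting term, and extrapolate to larger scales.
import math

def fit_candidate(ps, ts, f):
    """Least-squares fit of t = c0 + c1 * f(p); returns (c0, c1, residual)."""
    xs = [f(p) for p in ps]
    n = len(ps)
    mx, mt = sum(xs) / n, sum(ts) / n
    c1 = (sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
          / sum((x - mx) ** 2 for x in xs))
    c0 = mt - c1 * mx
    res = sum((c0 + c1 * x - t) ** 2 for x, t in zip(xs, ts))
    return c0, c1, res

# A tiny "normal form" of candidate model terms.
candidates = {
    "sqrt(p)":   lambda p: math.sqrt(p),
    "p":         lambda p: p,
    "p*log2(p)": lambda p: p * math.log2(p),
    "p**2":      lambda p: p ** 2,
}

# Hypothetical runtimes of one call path, measured at small core counts.
ps = [2, 4, 8, 16, 32]
ts = [0.8, 1.8, 4.3, 10.7, 26.1]

name, f = min(candidates.items(),
              key=lambda kv: fit_candidate(ps, ts, kv[1])[2])
c0, c1, _ = fit_candidate(ps, ts, f)
print(f"best model:  t(p) ~= {c0:.2f} + {c1:.3f} * {name}")
print(f"extrapolated t(2**15) ~= {c0 + c1 * f(2**15):.0f} s")
```

Models fitted this way per call path, as in Figure 2, let developers spot the routines whose growth terms will dominate long before the code ever runs at full machine scale.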

Figure 4. More information about these projects and links to the code repositories are available here. Image provided by the authors.
Applications: Terra Neo addresses the challenge of understanding convection in Earth’s mantle, which is responsible for most of the planet’s geological activity, from plate tectonics to volcanoes and earthquakes (see Figure 3). Due to the models’ sheer scale and complexity, the advent of exascale computing offers a tremendous opportunity for scientists to develop a greater understanding of the mantle. To fully utilize the forthcoming resources, Terra Neo is working to design new software with optimal algorithms that permit scalable implementations on exascale architectures. To the best of our knowledge, it holds the world record for a linear solver with 10¹³ unknowns.
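
Reaching that scale requires asymptotically optimal, hierarchical solvers. As a rough illustration of the principle behind hierarchical hybrid grids, the following sketch (our own, drastically simplified to one dimension) implements a geometric multigrid V-cycle, whose cost grows only linearly with the number of unknowns:

```python
# Our drastically simplified illustration of geometric multigrid: a 1D
# V-cycle for -u'' = f with homogeneous Dirichlet boundary conditions.
# Hierarchies of this kind are what make solvers with ~10^13 unknowns
# feasible; Terra Neo's massively parallel 3D setting is far richer.
import numpy as np

def smooth(u, f, h, sweeps=2):
    """Damped Jacobi sweeps for the stencil (-1, 2, -1)/h**2."""
    for _ in range(sweeps):
        u[1:-1] += 0.67 * (0.5 * (u[:-2] + u[2:] + h * h * f[1:-1]) - u[1:-1])
    return u

def v_cycle(u, f, h):
    if len(u) <= 3:                        # coarsest grid: solve directly
        u[1:-1] = 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
        return u
    u = smooth(u, f, h)                    # pre-smoothing
    r = np.zeros_like(u)                   # residual r = f - A u
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    rc = r[::2].copy()                     # full-weighting restriction
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)
    e = np.zeros_like(u)                   # interpolate coarse correction
    e[::2] = ec
    e[1:-1:2] = 0.5 * (ec[:-1] + ec[1:])
    return smooth(u + e, f, h)             # post-smoothing

n = 2**10 + 1                              # fine-grid points
x = np.linspace(0.0, 1.0, n)
f = np.pi**2 * np.sin(np.pi * x)           # manufactured right-hand side
u = np.zeros(n)
for _ in range(8):                         # a few V-cycles suffice
    u = v_cycle(u, f, 1.0 / (n - 1))
print("max error vs. exact sin(pi x):", np.abs(u - np.sin(np.pi * x)).max())
```

Multigrid's linear complexity in the number of unknowns is precisely what makes systems of this size tractable at all; a solver whose cost grew even slightly superlinearly would be hopeless at 10¹³ unknowns.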

As the first program of its kind, SPPEXA is unique in numerous ways. It is the first strategic Priority Programme, in that it was initiated by the DFG’s Board rather than via the standard bottom-up funding process. SPPEXA also featured many more coordinated activities, such as workshops and doctoral retreats, than is typical of Priority Programmes. It has truly been a multidisciplinary endeavor that involved topics and researchers from informatics, mathematics, and many areas of science and engineering (see Figure 4). Finally, SPPEXA’s creation marked the first time that a Priority Programme was synchronized with two other national funding agencies, the Agence Nationale de la Recherche in France and the Japan Science and Technology Agency, through bi- and trilateral consortia.

Two culminating events in late 2019 demonstrated the broad impact of SPPEXA’s software-related research on all associated fields: (i) an international symposium in Dresden, Germany, at which all consortia presented their results, since published in Springer’s Lecture Notes in Computational Science and Engineering [1]; and (ii) a trilateral workshop in Tokyo, Japan, that focused on the convergence of HPC and data science. SPPEXA may have officially come to an end, but the exascale journey continues!


References
[1] Bungartz, H.-J., Reiz, S., Uekermann, B., Neumann, P., & Nagel, W.E. (Eds.). (2020). Software for Exascale Computing – SPPEXA 2016-2019. Lecture Notes in Computational Science and Engineering (Vol. 136). Cham, Switzerland: Springer.

Severin Reiz is a fourth-year Ph.D. candidate under the supervision of Hans-Joachim Bungartz at the Technical University of Munich (TUM), where he has worked as program manager of SPPEXA since December 2018. His research is on high-performance computing (HPC) and machine learning algorithms, in collaboration with George Biros at the University of Texas at Austin. Hans-Joachim Bungartz is a full professor of informatics and mathematics at TUM, dean of TUM’s Department of Informatics, TUM Graduate Dean, and a board member of the Leibniz Supercomputing Centre. He is an initiator and coordinator of the SPPEXA program, and his research is on scientific computing with a focus on HPC.
