SIAM News Blog

Testing Methods for High-performance Computing Applications

By Paul Wolfenbarger

During a minisymposium at the 2021 SIAM Conference on Computational Science and Engineering (CSE21), I held a question-and-answer session that was inspired by my article “Software as Craft.” After I asserted that modern software practice finds unit testing to be sufficient, someone wondered why this is not always true for high-performance computing (HPC) software. I admitted that my experience with HPC code had been the same, in that unit testing alone had not sufficed, though I was not sure why. This response did not seem adequate, so I decided to investigate the underlying problems in greater depth.

I interviewed the original questioner as well as the Next Generation Simulation (NGS) Initiative team about the development of a new meshing software product called Morph. I also reviewed the CTH code with which I currently work. Here I discuss the very different situations and support requirements of three codes.

Case 1: CTH

CTH is a legacy code with a moderate amount of regression testing. It was chronically underfunded for a time, and testing was not a priority. Although this attitude is currently changing, the amount of time and effort that is required to add proper testing indicates that these costs were deferred, not removed; in fact, they are likely higher than the prospective cost of performing the testing during development. The overuse of global variables and extern declarations for functions and variables means that any change requires anywhere from a few minutes to several hours to confirm that no implicit declaration or linker mismatch is in use. One easy way to understand the testing that is in place is via the Legacy Code Change Algorithm, courtesy of Michael Feathers.
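As a rough illustration of why each change demands that audit, consider the following sketch. The file and variable names are hypothetical and do not come from CTH; the point is only that a hand-written external declaration with the wrong type compiles and links without complaint, so nothing short of inspection or a test reveals the problem.

```cpp
// eos_state.cpp -- one translation unit (hypothetical names, not CTH's).
extern "C" { double pressure_scale = 1.0e5; }   // the defining declaration

// advect.cpp -- a second translation unit, written years later.
// With C linkage the symbol carries no type information, so this wrong
// hand-written declaration compiles and links cleanly; at run time the
// bytes of a double are silently reinterpreted as a float.
extern "C" float pressure_scale;

double scaled_pressure(double p) {
    return p * pressure_scale;                  // wrong value, no diagnostic
}
```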

Case 2: FLASH 

FLASH is a legacy code with regression and library-integration testing. It is a multiphysics code that regularly incorporates new physics solvers. Because each type of physics must coexist and cooperate with all of the others on demand, FLASH testing has been a lengthy and thoroughly documented effort. Because independent nonlinear solvers must work together, both unit testing and some integration testing are required to assure the team that the code is valid and yields verified solutions.
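A minimal sketch of the distinction, using made-up solver names rather than anything from FLASH, might look like the following: a unit test pins down one solver in isolation, while an integration test exercises two solvers together because their coupled behavior is invisible to either unit test alone.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical solvers standing in for two FLASH-style physics units;
// the real code and its test harness look nothing like this.
double hydro_step(double density, double dt) { return density * (1.0 - 0.1 * dt); }
double gravity_accel(double density)         { return 6.674e-11 * density; }

// Unit test: one solver, checked in isolation against a known answer.
void test_hydro_step_alone() {
    assert(std::abs(hydro_step(1.0, 0.0) - 1.0) < 1e-12);  // dt = 0 is a no-op
}

// Integration test: two solvers exercised together, because the coupled
// behavior is what a unit test of either piece cannot see.
void test_hydro_with_gravity() {
    double rho = 1.0;
    for (int i = 0; i < 10; ++i) {
        rho = hydro_step(rho, 0.01);
        double g = gravity_accel(rho);
        assert(g > 0.0 && rho > 0.0);           // coupled state stays physical
    }
}

int main() {
    test_hydro_step_alone();
    test_hydro_with_gravity();
    return 0;
}
```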

Case 3: Morph

Morph is a new code that was unit tested from the beginning. In a previous project, the Sierra ToolKit/Next Generation Platforms (STK/NGP) team developed their code from scratch and devoted a lot of effort to creating unit tests, following test-driven development, and applying established SOLID principles. This approach was so successful that management assigned many of the same developers and team leads to the NGS Initiative and insisted that they use the same techniques when developing Morph. According to a recent product owner, these codes’ most significant value is their ability to change rapidly without breakage, thanks to their large inventory of unit tests.
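To give a flavor of the approach, here is a small sketch in that spirit; the interface and class names are hypothetical and do not reflect Morph’s actual design. Writing the test against an abstraction, per the dependency-inversion part of SOLID, is what lets implementations change rapidly while the tests continue to guard behavior.

```cpp
#include <cassert>
#include <vector>

// Hypothetical geometry interface in the spirit of SOLID's dependency
// inversion; not Morph's actual API.
struct NodeMover {
    virtual ~NodeMover() = default;
    virtual double move(double coord) const = 0;
};

// A concrete smoothing rule can be replaced without touching its callers.
struct Laplacian : NodeMover {
    double move(double coord) const override { return 0.5 * coord; }
};

// The unit under test depends only on the abstraction, so it can be
// exercised with any implementation, including a trivial test double.
std::vector<double> smooth(const std::vector<double>& nodes, const NodeMover& m) {
    std::vector<double> out;
    for (double x : nodes) out.push_back(m.move(x));
    return out;
}

int main() {
    // Test written first, TDD style: the behavior is pinned down before
    // (and independently of) any particular smoothing implementation.
    struct Identity : NodeMover {
        double move(double coord) const override { return coord; }
    } identity;

    std::vector<double> nodes{1.0, 2.0, 3.0};
    assert(smooth(nodes, identity) == nodes);   // identity mover changes nothing

    Laplacian lap;
    assert(smooth(nodes, lap).front() == 0.5);  // concrete rule swaps in freely
    return 0;
}
```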

Why do STK/NGP and Morph manage to function almost entirely with unit testing, while FLASH requires combinatorial regression testing? The answer to this question lives in the codes themselves. STK/NGP handles mesh import, field data, and topology, and Morph is focused on geometry modification and meshing. FLASH is a full hydrodynamics code with 15 physics solvers listed on its website. Though the FLASH 3 release involved significant work to create a modular and extensible code base, the nature of solving so many disparate physics problems in a single code means that one must employ regression testing to reveal unexpected emergent behaviors. At the other end of the spectrum, CTH utilizes regression testing to address the fact that proper modularity is still an ongoing effort; it may eventually require the same type of integration testing as FLASH, since its physics are distantly related.
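For readers who have not worked with such suites, the following toy sketch conveys the idea behind a regression test: compare a full run against a golden baseline that was recorded from a trusted earlier run. The driver and numbers here are invented for illustration and stand in for a real simulation and its output files.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical end-to-end driver standing in for a full simulation run;
// a real regression suite would launch the code itself and compare its
// output files against stored baselines.
std::vector<double> run_simulation(int steps) {
    std::vector<double> history;
    double energy = 1.0;
    for (int i = 0; i < steps; ++i) {
        energy *= 0.99;                         // stand-in for the coupled physics
        history.push_back(energy);
    }
    return history;
}

int main() {
    // Golden values recorded from a previously trusted run. Any drift,
    // including drift caused by emergent interactions between solvers that
    // no unit test exercises in combination, fails the comparison.
    const std::vector<double> baseline = {0.99, 0.9801, 0.970299,
                                          0.96059601, 0.9509900499};
    const std::vector<double> current = run_simulation(5);
    assert(current.size() == baseline.size());
    for (std::size_t i = 0; i < baseline.size(); ++i)
        assert(std::abs(current[i] - baseline[i]) < 1e-9);
    return 0;
}
```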

How should I have answered that initial question at CSE21? I would say that some HPC codes still require integration/regression test suites due to the complexity of the physics that is involved and the high stakes of failure; furthermore, I should not have initially stated that unit testing alone could be enough. For example, STK/NGP and Morph are part of a larger code base called Sierra that provides a full set of integration/regression testing beyond the unit testing that those codes perform in their own spaces. My final takeaway is that designing a test suite is a proper software engineering task unto itself and relates directly to the verification and validation efforts that are so important in HPC.

Paul Wolfenbarger is a Principal Member of the Technical Staff at Sandia National Laboratories, where he works to facilitate software development in high-performance computing. He is interested in the history of energy policy and climate change in the context of public financing and regulation.
