Getting the Most Out of a High-End Computer
In a recent interview*, Pittsburgh Supercomputing Center co-director Michael Levine addressed a discrepancy in the performance of two of the center’s supercomputing systems. One of them, a 10-teraflop/s Cray XT3, has a processor clock speed just about two and a half times faster than the other, a 6-teraflop/s LeMieux system—yet the XT3 outperforms LeMieux by as much as ten times for some applications. How does the PSC account for the gap in the performance of the two systems in such cases—and if the XT3 does indeed perform so much better, why even bother using LeMieux?
Levine listed three elements that contribute to the discrepancy: the differences in interconnect systems, memory bandwidth, and software between the two systems. In particular, he pointed out that the LeMieux operating system is responsible for supporting each processor individually. In the XT3, by contrast, the OS is designed to support the machine as a whole: that task is mostly assigned to certain processors within the system, leaving the rest free to “concentrate” on calculating.
But that doesn’t mean that LeMieux is inferior to the XT3. LeMieux lags significantly behind in some situations, but might equal or outperform the XT3 for problems best solved by a system in which each processor operates independently.
Levine’s comments and the differences between the two supercomputers reflect a larger issue in high-performance computing: What defines performance? With terascale computing already a reality and petascale on the horizon, have flop/s-based performance metrics outlived their usefulness?
While the Top500 project still ranks the performance of supercomputing systems based on flop/s using a specialized version of the LINPACK benchmark, LINPACK creator and Top500 VIP Jack Dongarra is among the experts who no longer consider the benchmark a useful real-world test. (In fact, he says, “I was probably the first one to say that.”) The benchmark used in the Top500 ranking measures performance (in flop/s) in the solution of a dense system of linear equations. The version adopted by the Top500 project allows system users to optimize software and scale the problem to run efficiently on their machines, thus leveling the playing field somewhat.
Yet, as Dongarra points out, “at the end of the day, it’s not the speed of the computer, it’s the ability of scientists to get out of that machine what they’re trying to accomplish [that matters]. So the machines have to make it easy for scientists to gain insight into the problems they’re facing.” If a system rates high on the Top500 list for solving a dense system of linear equations but is inefficient at solving the main problems for which it was built, the performance numbers produced by the LINPACK benchmark would certainly not be the most important metric for rating the machine’s overall worth.
“LINPACK is something that has evolved over the last 32 years, and at one time it was perhaps a good thing, but systems have changed quite a bit in those 32 years,” says Dongarra. He concedes that the benchmark can still provide potentially useful information, “but it shouldn’t be used alone.”
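To make the measurement concrete, here is a minimal sketch in Python of a LINPACK-style rate calculation. It is not the Top500 benchmark code. It assumes a small random matrix, a plain Gaussian elimination with partial pivoting, and the conventional LINPACK operation count of 2n^3/3 + 2n^2. The function names `solve_dense` and `linpack_style_rate` are hypothetical, chosen only for this illustration.

```python
import random
import time


def solve_dense(a, b):
    """Solve the dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies so the caller's data is untouched
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the largest remaining entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        row_k = a[k]
        for i in range(k + 1, n):
            row_i = a[i]
            m = row_i[k] / row_k[k]
            # Entries to the left of column k+1 are never read again, so skip them.
            for j in range(k + 1, n):
                row_i[j] -= m * row_k[j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_style_rate(n, seed=0):
    """Return (flop/s, max error) for one dense solve of order n.

    Assumption: the rate uses the nominal count 2n^3/3 + 2n^2,
    not the number of operations the interpreter actually executed.
    """
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # chosen so the exact solution is all ones
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against a zero reading
    nominal_flops = 2.0 * n**3 / 3.0 + 2.0 * n**2
    max_error = max(abs(xi - 1.0) for xi in x)
    return nominal_flops / elapsed, max_error


if __name__ == "__main__":
    rate, err = linpack_style_rate(200)
    print(f"n=200: {rate / 1e6:.2f} Mflop/s, max error {err:.2e}")
```

The result is one number for one kind of problem. It says nothing about memory bandwidth, interconnect behavior, or how the same machine handles a sparse or communication-heavy application, which is the limitation Dongarra describes.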
So how should the performance of large-scale systems be assessed? Dongarra stresses balance among a computer’s components.
“You’re going to build a machine that’s going to cost hundreds of millions of dollars, and whoever’s building that system has to build in a balance in terms of the memory, the amount of disks it has, and its capability of really solving petascale problems,” he says. “And the environment is more than just the hardware. The environment itself relates to the software, to the algorithms, to the applications. There’s sort of an ecosystem in place, and one needs to understand how all of those components interact in order to give the full picture.”
Could that full picture somehow provide a new metric for performance? Maybe. If the system isn’t balanced—if the ecosystem isn’t well designed or maintained—the efficiency of the machine will suffer.
“The ecosystem depends on the components working well together. So the best system would have a matched environment where everything fits in a way that they could all relate to each other very easily,” Dongarra says. “Unfortunately, we have a situation in most cases where the hardware is out of balance—far exceeding the capabilities of the rest of the system. We struggle with that aspect—we understand that [computer systems] have tremendous capabilities, but we struggle with ways of coming up with that performance.”
Efforts to improve efficiency in large-scale systems are nothing new, and scientists are increasingly focused on addressing the imbalances between system components as petascale computing moves closer to reality. Will the next generation of high-performance systems require fresh metrics to provide meaningful performance data? That’s an issue up for debate among computer scientists as the petascale era approaches.—MS
* HPCWire, week of June 9, 2006; www.hpcwire.com/hpc/686730.html.