
Artificial Intelligence and High-Performance Computing: The Drivers of Tomorrow’s Science

By Aparna Chandramowlishwaran

Deep learning (DL)—a specific approach to artificial intelligence (AI) that is based on artificial neural networks—is now recognized as one of the most disruptive technologies of the 21st century. After researchers proposed the first neural network with electrical circuits in 1944, the popularity of neural nets waxed and waned until recently. Now DL is beginning to revolutionize entire fields and find applications in every conceivable domain of science and engineering.

This transformation was ignited by decades-long advances in high-performance computing (HPC) and the emergence of graphics processing units (GPUs), which make DL feasible in practice. As we move into the next decade, HPC and AI continue to enable and drive each other at an exponential pace. HPC is pushing the limits of AI model complexity—measured by the number of learned parameters and the depth of the neural network layers used for prediction—for increased accuracy while simultaneously reducing training time. Meanwhile, AI is transforming the way in which we conduct science. The quest for knowledge used to begin with grand theories, and physics-based models still form the core of scientific understanding. But data-driven surrogate models are starting to outperform first-principles models in specific tasks, although researchers must address several issues—such as model interpretability and generalization—before such models become commonplace in our science toolbox. Here I summarize recent trends and key challenges that scientists face when using AI to enable scientific breakthroughs.

The most unsettling facet of DL is its lack of interpretability. AI surrogates replace highly nonlinear first-principles functions with approximations, which can be orders of magnitude faster than state-of-the-art simulation codes. However, the resulting surrogate model—which has millions or even billions of parameters—is often essentially a “black box,” in which case internal representations of the model’s learned data (i.e., feature extraction) are unknown. Scientists currently interpret neural networks by observing their external interfaces, not by understanding their internal representations. As a neural net gets progressively deeper and learns more parameters, it becomes harder for experts to interpret the resulting learned model. This raises the following question: In scientific computing, do researchers really need a billion parameters, especially since it is difficult to reason about these parameters and build confidence in the resulting models? A promising approach to ameliorate this issue is the incorporation of domain knowledge—such as physical laws, conservation constraints, invariances, and symmetries—in network design and/or training. These domain- or physics-informed AI models can potentially require less data and handle more noise while maintaining or improving prediction accuracy by leveraging centuries of advances in scientific knowledge. Although black-box DL models alone are currently not replacements for first-principles models, they already perform remarkably well as preconditioners, software accelerators, and surrogates in complex scientific workflows (see Figure 1).
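To make the idea of physics-informed training concrete, the sketch below shows one common recipe: a standard data-fitting loss is augmented with a residual term that penalizes violations of a governing equation at collocation points. This is a minimal, hypothetical PyTorch illustration (not the specific method of any work cited here), using the harmonic oscillator u'' + u = 0 as a stand-in for a real physical law.

```python
# A minimal, hypothetical physics-informed training loop in PyTorch.
# The network u(t) fits sparse observations while a residual term
# penalizes violations of a known law, here u'' + u = 0.
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

t_data = torch.rand(32, 1)                       # sparse "measurement" locations
u_data = torch.sin(t_data)                       # synthetic ground truth
t_phys = torch.rand(256, 1, requires_grad=True)  # collocation points

for step in range(2000):
    opt.zero_grad()
    # Data term: match the few available observations.
    loss_data = ((net(t_data) - u_data) ** 2).mean()
    # Physics term: autograd supplies u' and u'' at the collocation points.
    u = net(t_phys)
    du = torch.autograd.grad(u.sum(), t_phys, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), t_phys, create_graph=True)[0]
    loss_phys = ((d2u + u) ** 2).mean()
    (loss_data + loss_phys).backward()
    opt.step()
```

Because the physics term supplies a training signal everywhere in the domain, not just at the measurement locations, such models can in principle get by with far fewer labeled samples.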

Figure 1. A comparison of a traditional physics solver simulation with a coupled framework that combines physical simulation and deep learning. The latter’s objective is to produce the same output as the physics solver (i.e., respect the convergence constraints) while accelerating the convergence process via deep learning. Figure courtesy of [1].
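The coupling in Figure 1 can be sketched schematically as follows. Assuming a hypothetical iterative solver and a trained surrogate (both replaced by simple stand-ins here, not the actual method of [1]), the loop warms up with a few solver iterations, lets the network jump to a near-converged field, and then hands control back to the solver so that the usual convergence criteria are still enforced:

```python
# A schematic, self-contained stand-in for the coupled loop in Figure 1.
import numpy as np

def solver_step(u):
    """One damped-Jacobi smoothing iteration (stand-in for a real solver)."""
    return 0.25 * np.roll(u, 1) + 0.5 * u + 0.25 * np.roll(u, -1)

def residual(u):
    return np.abs(u - solver_step(u)).max()

def surrogate(u):
    """Placeholder for a trained network that maps a partially converged
    field to a near-converged one; the identity map keeps this runnable."""
    return u

u = np.random.rand(128)           # initial field on a periodic grid
for _ in range(10):               # warmup: a few cheap solver iterations
    u = solver_step(u)
u = surrogate(u)                  # network predicts a near-converged field
while residual(u) > 1e-6:         # solver refines until its own criterion
    u = solver_step(u)            # is met, so constraints are respected
```

Because the final answer is always produced by the solver itself in this scheme, the surrogate can only accelerate convergence, not change the converged solution (up to the solver's tolerance).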

The core of the DL-based approach to AI is data. A unique challenge of scientific DL is that readily available data (i.e., ground truth) for training neural nets are abundant in some fields and scarce in others. In some research areas, instruments commonly generate petabytes of data from a single experiment. But in other areas that rely primarily on simulations, generating data to train deep models can be prohibitively expensive. Regardless of how one generates the data, the training process is time-consuming, even on the latest GPU hardware. Although continued advances in computer architecture will accelerate both data generation and training, they are unlikely to keep pace with AI developments. Therefore, researchers must investigate data-efficient learning for AI models. Since not all data are equally useful for a given learning task, the data’s nature can have a significant impact on the quality of the learned model. One solution is to curate application-specific datasets that can reduce the required data size to train an accurate model. However, while domain-informed AI models and application-specific training datasets can potentially advance model interpretability and reduce complexity, they may also deter generalization. Some key research directions for future exploration include understanding the tradeoffs between generalization limits and AI surrogate interpretability, associating confidence intervals with predictions and decision-making to assess model robustness, and quantifying the uncertainty of input data from different scales and disparate sources.
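One simple form of data-efficient learning is to let the current model choose which data to generate next. The sketch below, a hypothetical active-learning step rather than any prescribed method, ranks unlabeled candidates by the disagreement of a small model ensemble and sends only the most uncertain ones to the expensive simulation for labeling:

```python
# Hypothetical uncertainty-driven data selection: query the expensive
# simulation only where a small ensemble of models disagrees most.
import numpy as np

rng = np.random.default_rng(0)
candidates = rng.uniform(-3, 3, size=(1000, 2))   # unlabeled candidate inputs

# Stand-ins for an ensemble of models trained on the data gathered so far.
ensemble = [lambda x, w=w: x @ w for w in rng.normal(size=(5, 2))]

preds = np.stack([m(candidates) for m in ensemble])   # shape (5, 1000)
uncertainty = preds.std(axis=0)                       # ensemble disagreement
query = np.argsort(uncertainty)[-50:]                 # label only the top 50
# expensive_simulation(candidates[query]) would run next (hypothetical call)
```

Ensemble disagreement is only a crude uncertainty proxy, but even this kind of heuristic can shrink the number of costly simulation runs needed to reach a given accuracy.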

The portability of applications and workflows across diverse and increasingly heterogeneous hardware platforms has been a decades-long challenge in scientific computing, and finding the optimal balance between performance and portability remains an active research topic. Driven by commercial AI applications, DL frameworks like TensorFlow, PyTorch, and Keras—which are built upon highly optimized libraries and software—have risen in popularity and become the “norm” for neural net programming. They are easy to adopt and can abstract underlying evolving AI hardware, parallel algorithms, and architecture-specific optimizations. These frameworks also provide a higher level of programming abstraction, which benefits data-driven scientific computing.
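This abstraction is visible in even a few lines of code: the model definition and the computation are identical whether the tensors live on a CPU or a GPU, with the framework dispatching to the appropriate optimized kernels underneath. A minimal PyTorch example:

```python
# The same user code runs on a CPU or a GPU; the framework handles
# kernel selection and architecture-specific optimization.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
).to(device)

x = torch.randn(8, 16, device=device)
y = model(x)   # identical code on either backend
```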

However, this abstraction also introduces a new requirement for AI-integrated scientific workflows. Existing infrastructure generally does not allow for the seamless integration of the aforementioned DL libraries and frameworks with data from different sources, physical simulations, modeling codes, and in situ analysis tools. Moreover, traditional workflows are based on a human-in-the-loop design that consists of hypothesis construction, repeated experimentation, and exploration of the design space. Given growing application complexity and hardware diversity, an opportunity exists for AI-driven scientific workflows. Future AI might potentially tailor end-to-end application workflows for specialized hardware, with the ability to dynamically adapt as the data and models are refined. This can enable AI to leverage novel solutions that process data close to the source as part of application workflows, thus reducing time to solution.
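As a thought experiment, an AI-driven workflow of this kind might look like the following outer loop: a surrogate proposes the next design, the simulation evaluates it, and the surrogate is refined as data accumulate. Everything here (the simulate stand-in, the nearest-neighbor surrogate, the greedy selection rule) is hypothetical and chosen only to keep the sketch self-contained:

```python
# Hypothetical automated design loop replacing the human in the loop.
import numpy as np

rng = np.random.default_rng(1)

def simulate(x):
    """Stand-in for an expensive physics code evaluating a design x."""
    return np.sin(3 * x) + 0.1 * rng.normal()

X, y = [0.0], [simulate(0.0)]
for step in range(20):
    # Surrogate: nearest-neighbor prediction over candidate designs.
    cand = rng.uniform(-2, 2, size=256)
    pred = np.array([y[np.argmin(np.abs(np.array(X) - c))] for c in cand])
    x_next = cand[np.argmax(pred)]                 # pick the most promising design
    X.append(x_next); y.append(simulate(x_next))   # refine with the new data point
```

A practical version would balance exploration against exploitation and distribute the simulation calls across HPC resources, adapting the workflow as the models improve.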

Data cannot drive the next generation of science if it is inaccessible. Compared to other domains, the lack of open knowledge bases and curated data repositories for science applications is arguably one of the greatest factors limiting progress. There is presently no systematic way to compare the different proposed models for solving the same science problem. An open knowledge base of data and models may both accelerate the pace of model development for scientific applications and allow for reproducibility while driving the design of new AI hardware. Applications such as autonomous vehicles, gaming, social networks, and e-commerce are the primary motivators of current AI hardware. Knowledge bases that are tailored for science with exemplar applications could potentially bridge this gap, ultimately allowing science to drive new AI technologies and co-design emerging AI architectures.

Here I describe only some of the core issues and opportunities that researchers may encounter when embracing the DL revolution. DL for science is still in its infancy, and plenty of questions will present themselves in the coming decades. Only time will reveal to what extent DL will augment or replace existing techniques in solving complex, real-world scientific problems.


References
[1] Obiols-Sales, O., Vishnu, A., Malaya, N., & Chandramowlishwaran, A. (2020). CFDNet: A deep learning-based accelerator for fluid simulations. In ICS '20: Proceedings of the 34th ACM International Conference on Supercomputing.

Aparna Chandramowlishwaran is an associate professor of electrical engineering and computer science at the University of California, Irvine.
