| May 01, 2023

Scientific Uses of Automatic Differentiation

Recent progress in machine learning (ML) technology has been nothing short of spectacular. The last two years alone have seen remarkable technological advances, including models that can win art competitions by generating images from text and chatbots with near human-level proficiency. At the heart of these advances is the ability to obtain high-quality solutions to non-convex optimization problems for functions with billions—or even hundreds of billions—of parameters. Many deep and surprising mathematical questions have emerged as part of this work. Among other topics, these queries pertain to the success of overparameterized models, the relative importance of local minima in the loss landscape, and the underlying theoretical basis for empirically determined optimizers. A fundamental theory is still missing, so we will no doubt be grappling with these questions for many years.

Nevertheless, the technological developments that have enabled ML advances also present an incredible opportunity for progress in classical applied mathematics problems. In particular, the increased proficiency for systematically handling large, non-convex optimization scenarios may help solve some classical problems that have long been a challenge. We now have the chance to make substantial headway on questions that have not yet been formulated or studied because we lacked the tools to solve them. To be clear, we do not wish to oversell the state of the art; algorithms that identify the global optimum for non-convex optimization problems do not yet exist. The ML community has instead developed efficient software tools that find candidate solutions, created benchmarks to measure solution quality, and cultivated a culture of competition against these benchmarks.

Adoption of this methodology may transform exciting novel insights into important classical problems. Here, we illustrate several new opportunities that exist because of the widespread, open-source deployment of effective software tools for automatic differentiation. While the mathematical framework for automatic differentiation was established long ago [4]—dating back at least to the evolution of adjoint-based optimization in optimal control [2, 6]—ML researchers have recently designed efficient software frameworks that natively run on hardware accelerators. These frameworks have served as a core technology for the ML revolution over the last decade and inspired high-quality software libraries such as JAX, PyTorch, and TensorFlow. The technology’s key feature is the fact that the computational cost of computing derivatives of a target loss function is independent of the number of parameters; this trait makes it possible for users to implement gradient-based optimization algorithms for functions with staggering numbers of parameters.

We outline two example problems in classical applied mathematics that are inspired by these new tools: (i) The development of numerical algorithms for the solution of nonlinear partial differential equations (PDEs) and (ii) the design of materials with novel functionality. In each of these problems, we begin with a dynamical system

\[dx/dt = f_\theta(x), \tag1\]

where \(x \in R^n\), parameterized by \(\theta \in R^m\). Both \(n\) and \(m\) are typically large, reflecting a high-dimensional optimization problem in a high-dimensional dynamical system. We then attempt to solve the optimization problem, which is characterized by a loss function \(L=L(x|\theta)\). The structure of this mathematical problem bears a strong resemblance to the neural network (NN)-based optimization problems in modern ML. Yet while it is sometimes natural to parameterize arbitrary functions in \((1)\) with NNs, \(\theta\) can parameterize any aspect of the system — from interaction potentials to interpolation weights in a numerical scheme.

Nonlinear Partial Differential Equations

Our first set of examples involves algorithms for the solution of nonlinear PDEs. Researchers usually derive such algorithms analytically; we typically adopt an appropriate spatial basis (e.g., polynomial, spectral, or Chebyshev) that yields a discretization of the equations in space, then further discretize the equations in time. Doing so gives rise to an update rule for the solution at the new time step. However, these algorithms are plagued with inaccuracy or instability if the spatial grid is too coarse; as a result, one must often resolve the solution’s finest features with many grid points (which increases the computational cost).

An alternative method for the derivation of update rules involves generating high-resolution data for the solutions of a particular nonlinear PDE, then algorithmically deriving the update rule as an optimization problem. This update rule differs from the classical algorithms in that it is equation specific; moreover, it can yield accurate results in a regime that is inaccessible to classical numerical solvers. For example, Figure 1 provides solutions from a solver that is trained on two-dimensional turbulence in the Navier-Stokes equations. We use an update rule that learns equation-specific interpolants to more accurately estimate unresolved quantities instead of directly averaging them. These interpolants are parameterized via NNs with initially unknown weights. We train the weights to match the solver’s accuracy on a 10 times coarser grid and time step than the baseline classical method, with an approximate 100-fold savings in computational cost.

Figure 1. Learned interpolation (LI) achieves the same accuracy as direct simulation (DS) at an approximately 10 times higher resolution. 1a. From top to bottom row: Evolution of predicted vorticity fields for reference (DS \(2,048 \times 2,048\)), learned (LI \(64 \times 64\)), and baseline (DS \(64 \times 64\)) solvers that start from the same initial velocities. The yellow box in each square traces the evolution of a single vortex. 1b. Comparison of the vorticity correlation between predicted flows, the reference solution for our model, and direct numerical simulation solvers. 1c. Energy spectrum scaled by \(k^5\) and averaged between time steps \(10{,}000\) and \(20{,}000\), at which point all solutions have decorrelated with the reference solution. Figure courtesy of [5].

This scenario is just one example of an algorithm that we can derive with a differentiable solver. Other possibilities include the parameterization of unknown physical effects or discrete algorithms that implement turbulence models [5] or dynamic boundary layers [1]. And algorithm development only scratches the surface of the realm of possibility. For instance, a recent study used a differentiable solver to find periodic orbits in a turbulent flow that were an order of magnitude more unstable than previously known [7]; this discovery provides the first compelling evidence that one can construct the statistics of a fully developed turbulent flow with a set of unstable periodic orbits.

Molecular Dynamics

The resourceful design of functional materials has much potential for the creation of novel technologies that range from maximum efficiency solar cells to self-repairing materials. A key challenge within this process is inverse design, i.e., finding the parameter settings for the materials as a solution to a non-convex optimization problem with an exponentially large parameter space. Molecular dynamics (MD) is a common tool for dynamic materials modeling, which simulates the collective motion of particles under classical mechanical forces. By simulating materials with MD, scientists have uncovered mechanisms for viral assembly and elucidated frictional forces in nanocrystals. However, designing and optimizing materials to have desired functionalities requires more than just simulation of these systems — researchers must also optimize over these simulations to find input parameters that reliably yield functional outputs. Efficient automatic differentiation algorithms have enabled the development of fully differentiable MD simulators like JAX MD, which allow users to directly differentiate through entire dynamical trajectories. We can explore a much broader design space of functional materials as a result.

An illustrative set of examples arises from the field of self-assembly, which involves the design of interactions among particles so that they will spontaneously form complex structures if randomly thrown together. This property is critical to biological function, from organelles in the cell to embryonic development. Equilibrium self-assembly, which designs interactions to achieve desired equilibrium targets, has experienced significant progress. But biological systems do not rely solely on controlling the structure; they also depend on kinetics. Although kinetic features of self-assembly were long considered too complex for direct design, the ability to directly differentiate with MD simulations provides immediate access to the relationship between kinetics and tunable system properties. Such access makes it possible for us to create both structural features of self-assembly and kinetic pathways. For instance, we previously established quantitative control over the rate of bulk crystal formation and the transition rate between small clusters of particles [3].

Figure 2. Highly tunable transition rates in clusters of colloidal particles. 2a. Depiction of the doubly nudged elastic band method for finding the transition state between two local energy minima. Here we consider the transition between two seven-particle clusters. 2b. Absolute error as a function of target rate. Optimization achieves an error rate of 0.2 percent of the target rate. 2c. Target rates versus rates from the molecular dynamics (MD) simulation. The solid line indicates perfect agreement. Figure adapted from [3].

Figure 2 demonstrates quantitative control over the transition rate between clusters of seven unique particles that are interacting via the Morse potential. Figure 2a depicts the energy landscape of the clusters, from which we extract an approximate transition pathway (shown in black) with the doubly nudged elastic band method. We then estimate the transition rate between the two states from this approximate pathway. Since the transition rate calculation is fully differentiable, we can optimize the parameters of the interaction potential to directly tune the transition rates. Figure 2b charts the difference between the target and achieved rates in the optimization results, and Figure 2c displays the target rates versus the measured rates that stem from full MD simulations. The ability to tune transition rates provides a gateway into an entirely new materials design space; automatic differentiation in the context of dynamical systems has already changed the scope of the material properties that we can generate.

Discussion

The ML community has developed efficient software tools to search for minima in non-convex landscapes, as well as benchmarks to measure solution quality. Both of these advances represent significant opportunities in applied mathematics. Though we have focused on the use of automatic differentiation within dynamical systems, an equally important lesson from the ML revolution pertains to the construction of benchmarks and community efforts to improve them. An appropriately designed suite of benchmarks could precipitate immense progress in the computational design of materials and algorithms.

We have only explored the tip of the iceberg in the realm of automatic differentiation for dynamical systems. Current projects within our own research groups include the design of fluid microstructures to achieve target rheologies; chaotic mixers; swimmers in complex environments; catalysts that cause self-assembled materials to disassemble on cue; and models for tissue development wherein individual cells control their division and apoptosis rate through measurements of chemical and mechanical fields in their local environments. We hope that the coming years will match recent improvements in ML models with just as much progress in applied non-convex optimization problems for the sciences.

References
[1] Alieva, A., Hoyer, S., Brenner, M.P., Iaccarino, G., & Noorgard, P. (2023). Towards accelerated data-driven Rayleigh Benard convection simulations. Eur. Phys. J. E. Under review.
[2] Bryson Jr., A.E., & Ho, Y.-C. (1969). Applied optimal control: Optimization, estimation, and control. Waltham, Mass: Ginn and Company.
[3] Goodrich, C.P., King, E.M., Schoenholz, S.S., Cubuk, E.D., & Brenner, M.P. (2021). Designing self-assembling kinetics with differentiable statistical physics models. Proc. Natl. Acad. Sci., 118(10), e2024083118.
[4] Griewank, A., & Walther, A. (2008). Evaluating derivatives: Principles and techniques of algorithmic differentiation (2nd ed.). Philadelphia, PA: Society for Industrial and Applied Mathematics.
[5] Kochkov, D., Smith, J.A., Alieva, A., Wang, Q., Brenner, M.P., & Hoyer, S. (2021). Machine learning-accelerated computational fluid dynamics. Proc. Natl. Acad. Sci., 118(21), e2101784118.
[6] LeCun, Y. (1988). A theoretical framework for back-propagation. In D. Touresky, G. Hinton, & T. Sejnowski (Eds.), Proceedings of the 1988 connectionist models summer school (pp. 21-28). Pittsburgh, PA: Carnegie Mellon University.
[7] Page, J., Norgaard, P., Brenner, M.P., & Kerswell, R.R. (2022). Recurrent flow patterns as a basis for turbulence: Predicting statistics from structures. Preprint, arXiv:2212.11886.

	Michael P. Brenner is the Michael F. Cronin Professor of Applied Mathematics and Professor of Physics at Harvard University. He is also a research scientist at Google Research. Over the years, Brenner’s research has focused on understanding the limits and opportunities of mathematical modeling to better comprehend different aspects of the world.
	Ella M. King is a Ph.D. candidate in the Department of Physics at Harvard University, where she works with Brenner. She will begin a postdoctoral position as a Simons Junior Fellow at New York University in September 2023. King specializes in soft matter physics and computational materials design.