
Machine Learning and Dynamical Systems

By Qianxiao Li and Weinan E

Mathematical modeling of dynamical systems (DS) is a central goal of the quantitative sciences. Although machine learning (ML) technologies are modern inventions, the interaction of data and dynamics has a long history. One of data science's first forays into modeling dynamics arguably began with astronomy, progressing from Ptolemy's archaic but instructive geocentric model of the cosmos to Kepler's laws of planetary motion. These laws ultimately laid the empirical basis for Newton's landmark contributions. Since then, the interactions of DS and data science have matured in both breadth and depth. Recent innovations in ML—particularly deep learning (DL)—have yielded further insights into the connection between these fascinating fields. Here we introduce some recent lines of work at the intersection of ML and DS.

Three Types of Connections

To organize ideas, one can classify the interactions between ML and DS into three complementary directions: machine learning of dynamical systems, machine learning for dynamical systems, and machine learning by (or via) dynamical systems (see Figure 1 for some examples). The first direction is perhaps the most familiar, as it concerns the question of how one can obtain mathematical models from observations of dynamical processes (much like the works of Ptolemy and Kepler). A key issue involves learning such dynamics from data while still retaining physical insight. The second direction pertains to the theory and algorithms of modern ML methods that one applies to data from dynamical problems. Examples include the well-known recurrent neural networks (RNNs) and their extensions (such as long short-term memory networks (LSTMs) and gated recurrent units), as well as more complex mechanisms (such as attention) whose theoretical understanding remains quite limited. Finally, recent work in the third direction shows that deep neural networks (DNNs) are themselves akin to DS; viewing them in this way allows one to employ dynamics-based mathematical ideas and tools to advance the theory and algorithms of DL [3]. We now discuss selected works in each of these directions.

Figure 1. Examples of interactions between machine learning (ML) and dynamical systems (DS).

Machine Learning by Dynamical Systems

Despite widespread practical success, DL still requires further theoretical progress. For instance, researchers seek a succinct mathematical setting in which to view DL, one that focuses on its newly arising phenomena. One significant novelty of DL is the compositional structure of its models: stacked layers achieve complexity through repeated composition. An important unresolved question is how this compositional structure changes a model's behavior in terms of approximation, optimization, and generalization.

DS theory offers a promising framework for carrying out such an analysis. The connection between DS and these compositional structures was first publicized in 2017 [3]: one can regard deep (residual) neural networks (NNs) as finite-difference discretizations of a continuous-time DS (see Figure 2). This outlook—which is popularized in the ML literature as neural ordinary differential equations [2]—provides a convenient language with which to capture the key features of DL. Studies have explored the immediate consequences of this viewpoint for network stability [4] and training methods [5, 6]. On the optimization side, the continuous viewpoint connects DL with optimal control theory. In particular, one can regard the training of such continuous-time deep models as a form of mean field optimal control and derive the corresponding optimality conditions—such as Pontryagin's maximum principle and the Hamilton-Jacobi-Bellman equations—in the mean field setting.
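For concreteness, the following is a schematic statement of the maximum principle in this continuous-time setting. It is only a sketch in the spirit of the mean field formulations cited above [3, 5]; the notation is illustrative and technical assumptions are omitted. Training seeks weights (controls) $\theta(t)$ that minimize an expected loss subject to the dynamics:

\[
\min_{\theta}\ \mathbb{E}\Big[\Phi\big(x(T), y\big) + \int_0^T L\big(x(t), \theta(t)\big)\,dt\Big]
\quad\text{subject to}\quad \dot{x}(t) = f\big(x(t), \theta(t)\big),\ x(0) = x_0,
\]

where the expectation is over input-label pairs $(x_0, y)$ drawn from the data distribution; this expectation is what makes the problem one of mean field optimal control. With the Hamiltonian $H(x, p, \theta) = p \cdot f(x, \theta) - L(x, \theta)$, an optimal control $\theta^*$ and the associated state and co-state trajectories satisfy

\[
\dot{x}^*(t) = \nabla_p H, \qquad
\dot{p}^*(t) = -\nabla_x H, \qquad
p^*(T) = -\nabla_x \Phi\big(x^*(T), y\big),
\]
\[
\mathbb{E}\big[H\big(x^*(t), p^*(t), \theta^*(t)\big)\big] \ \geq\ \mathbb{E}\big[H\big(x^*(t), p^*(t), \theta\big)\big]
\quad \text{for all admissible } \theta \text{ and almost every } t;
\]

that is, the optimal weights maximize the expected Hamiltonian at each "time" (i.e., layer). Discretizing these conditions leads to training algorithms in the spirit of [5, 6].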

Another interesting angle is approximation theory, which focuses on how one can build a complex hypothesis space through composition or dynamics. Complexity in shallow NNs arises from the linear combination of a large number of adaptive basis functions (or neurons/nodes). DNNs appear to achieve complexity through a different mechanism, and several relevant results appear in the continuous setting [7]. However, a comprehensive understanding of the approximation properties of compositional or dynamical hypothesis spaces is currently lacking. In particular, a general characterization of what makes a function "nice" to approximate through composition or dynamics remains an interesting open question. Resolving this question might explain why the performance of deep models is so problem-dependent.

Figure 2. The continuous viewpoint of deep learning (DL). Similar to how researchers analyze solid and fluid mechanics in the continuum limit, one can take a continuum idealization of DL that regards the layer structure as a discretization of a continuous dynamical system (DS). Here, the fictitious "time" parameter serves as a continuous analogue of the layer index, and the dynamics model the composition of layers. Conversely, one can regard deep residual neural networks as discretizations of a continuous-time DS. Figure courtesy of Qianxiao Li.
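To make the discretization picture concrete, here is a minimal sketch in Python (with NumPy) of a residual network whose layers are exactly forward Euler steps of an ordinary differential equation. The layer map, step size, and dimensions below are illustrative choices, not anything prescribed by the works cited here.

import numpy as np

def f(x, W, b):
    # Velocity field defined by one "layer": a simple tanh unit here;
    # any smooth parameterized map would do.
    return np.tanh(W @ x + b)

def residual_network(x, params, h=0.1):
    # Deep residual network: x_{k+1} = x_k + h * f(x_k, theta_k).
    # This is exactly the forward Euler discretization of
    # dx/dt = f(x, theta(t)) with step size h.
    for W, b in params:
        x = x + h * f(x, W, b)
    return x

# Example: 20 residual layers acting on a 4-dimensional state.
rng = np.random.default_rng(0)
params = [(0.1 * rng.standard_normal((4, 4)), np.zeros(4)) for _ in range(20)]
x0 = rng.standard_normal(4)
print(residual_network(x0, params))

Here the step size h plays the role of a residual scaling; letting h shrink while the number of layers grows recovers the continuous-time limit that Figure 2 depicts.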

Machine Learning for Dynamical Systems

The reverse direction concerns the application of modern ML techniques to problems that involve dynamics, such as time series forecasting and sequence-to-sequence modeling for language or engineering applications. In the specific context of DL, a key mathematical problem is to elucidate the interaction between the compositional or dynamical structures in DNNs and the dynamical structures that are present in the data or data generation process. This viewpoint offers researchers the unique opportunity to couple data and model structure in the analysis.

Consider RNNs, which are among the simplest ways to model sequential relationships. Many issues nevertheless plague RNN applications in practice, especially their inability to handle long sequences. Researchers have made various improvements from a practical angle, including the now-popular LSTM. However, a complete understanding of the theory and limitations of recurrent architectures for time series modeling remains fragmented. This gap is an obstacle for practitioners, who often must rely on trial and error to select the right model architecture for the task at hand.

To address this issue, one can adopt a functional approximation framework as a general formulation in which to analyze time series modeling via NNs [8]. Within this framework, one can prove that RNNs suffer from a “curse of memory,” even in the linear setting. This finding parallels the well-known “curse of dimensionality” and shows that both approximation and optimization become exceedingly difficult when the target relationship has long memory. The concept of memory can be made mathematically precise in this context; Figure 3 provides a heuristic illustration. The framework itself applies well beyond recurrent NNs and can be used to compare and contrast different architectures for time series modeling, such as WaveNet.
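To illustrate the mechanism in the simplest possible setting, the following sketch (discrete-time and linear, as a stand-in for the continuous-time linear setting analyzed in [8]; all variable names and sizes are illustrative) computes the memory kernel of a stable linear RNN. The kernel decays exponentially, so target relationships whose memory decays more slowly are exactly the ones that such models struggle to approximate efficiently.

import numpy as np

# A discrete-time linear RNN, h_{t+1} = A h_t + B x_t, y_t = c^T h_t,
# represents the input-output relationship through its memory kernel
# rho_k = c^T A^k B: the output at time t weights the input from k steps
# in the past by rho_k. For a stable A the kernel decays exponentially.

rng = np.random.default_rng(0)
m = 8                                        # hidden dimension (illustrative)
A = rng.standard_normal((m, m))
A *= 0.9 / max(abs(np.linalg.eigvals(A)))    # rescale so the spectral radius is 0.9
B = rng.standard_normal(m)
c = rng.standard_normal(m)

kernel = []
v = B.copy()
for k in range(50):
    kernel.append(c @ v)                     # rho_k = c^T A^k B
    v = A @ v

for k in (0, 10, 20, 30, 40):
    print(k, abs(kernel[k]))                 # magnitudes shrink roughly geometrically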

Figure 3. Sequence relationships with long and short memory. In both cases, the input time series are identical: a smooth function of time that eventually stops varying. 3a. In the smooth, short-memory case, the output time series is also smooth in time and stops varying shortly after the input does; the output therefore does not depend on the input's values from the distant past. 3b. In the non-smooth, long-memory case, the output time series continues to vary irregularly long after the input has settled. One can make the concepts of smoothness and memory mathematically precise in line with this heuristic picture. Figure courtesy of Qianxiao Li.

Machine Learning of Dynamical Systems

Arguably the most studied interaction between ML and DS is learning dynamics from data. With the rapid adoption of data-driven methods in both computational and experimental sciences, this subject is becoming an increasingly important area of ML application. We focus here on two dominant approaches for building dynamical models from data. The first is a statistical approach that uses generic hypothesis spaces—e.g., sparse regression over a library of polynomial functions—and regards the recovery of a dynamical model as a regression problem [1]. One can further impart physical structure, such as symmetry and invariance, to these models. The second is a modeling approach that is more familiar in science and engineering. Here one derives a model space from physical understanding of the dynamical process; data and learning serve to fix the unknown parameters and functions in the model parameterization. The key contribution of NNs is to approximate these unknown functions — e.g., free energy, Hamiltonian, or response coefficients. Examples of this approach include inverse problems with physics-informed NNs [9] and dissipative systems via a generalized Onsager principle [10]. Although these methods differ in principle, they all lead to interesting interactions between learning and physics. The theoretical and algorithmic intricacies of these approaches thus constitute an active area of research.
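As a flavor of the statistical approach, the following sketch recovers a simple planar system by sequentially thresholded least squares over a polynomial library, in the spirit of sparse identification of nonlinear dynamics [1]. The example system, library, threshold, and other details are illustrative choices rather than those of the original paper.

import numpy as np

# Simulate data from a planar system whose right-hand side we pretend not to know:
# dx/dt = -0.1 x + 2 y,  dy/dt = -2 x - 0.1 y  (a damped rotation).
def rhs(z):
    x, y = z
    return np.array([-0.1 * x + 2.0 * y, -2.0 * x - 0.1 * y])

dt, T = 0.01, 10.0
ts = np.arange(0.0, T, dt)
Z = np.zeros((len(ts), 2))
Z[0] = [2.0, 0.0]
for i in range(len(ts) - 1):                 # simple forward Euler integration
    Z[i + 1] = Z[i] + dt * rhs(Z[i])

dZ = np.gradient(Z, dt, axis=0)              # estimate time derivatives from the data

# Candidate library of polynomial terms up to degree 2: [1, x, y, x^2, x*y, y^2].
x, y = Z[:, 0], Z[:, 1]
Theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

# Sequentially thresholded least squares: fit, zero out small coefficients, refit.
threshold = 0.05
Xi = np.linalg.lstsq(Theta, dZ, rcond=None)[0]
for _ in range(10):
    Xi[np.abs(Xi) < threshold] = 0.0
    for j in range(dZ.shape[1]):
        keep = np.abs(Xi[:, j]) >= threshold
        if keep.any():
            Xi[keep, j] = np.linalg.lstsq(Theta[:, keep], dZ[:, j], rcond=None)[0]

print(Xi)                                    # estimated coefficient matrix

The nonzero entries of Xi should appear only in the rows corresponding to the x and y library terms, close to the true coefficients; physical structure such as symmetry or invariance could be imposed by constraining the candidate library itself.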

Outlook

Much is still unknown about the interface of dynamics and learning in each of the three aforementioned directions. A worthwhile overall question is the following: Why might connecting dynamics and learning be fruitful for research in each domain? While the dynamical viewpoint of learning provides a familiar mathematical setting, it also captures certain key novelties of modern ML. In fact, it may provide an avenue for concretely exploring why and when deep NNs are better than shallow ones, or why certain recurrent architectures are better than others. Such progress will naturally inspire the design of better models that learn dynamics from data and balance approximation flexibility with the retention of physical insight. This will in turn improve our overall understanding of dynamical processes through data. Ultimately, this line of work can help facilitate the principled adoption of ML in science and engineering workflows.


References
[1] Brunton, S.L., Proctor, J.L., & Kutz, J.N. (2016). Discovering governing equations from data by sparse identification of nonlinear dynamical systems. PNAS, 113(15), 3932-3937.
[2] Chen, R.T.Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D.K. (2018). Neural ordinary differential equations. In Advances in neural information processing systems 31 (NeurIPS 2018) (pp. 6571-6583). Montreal, Canada.
[3] E, W. (2017). A proposal on machine learning via dynamical systems. Comm. Math. Stat., 5(1), 1-11.
[4] Haber, E., & Ruthotto, L. (2017). Stable architectures for deep neural networks. Inverse Prob., 34(1), 014004.
[5] Li, Q., Chen, L., Tai, C., & E, W. (2017). Maximum principle based algorithms for deep learning. J. Mach. Learn. Res., 18(1), 5998-6026.
[6] Li, Q., & Hao, S. (2018). An optimal control approach to deep learning and applications to discrete-weight neural networks. In Proceedings of the 35th international conference on machine learning (ICML 2018), Vol. 80 (pp. 2985-2994). Stockholm, Sweden.
[7] Li, Q., Lin, T., & Shen, Z. (2019). Deep learning via dynamical systems: An approximation perspective. J. Eur. Math. Soc. To appear.
[8] Li, Z., Han, J., E, W., & Li, Q. (2021). On the curse of memory in recurrent neural networks: Approximation and optimization analysis. In International conference on learning representations (ICLR 2021). Vienna, Austria.
[9] Raissi, M., Perdikaris, P., & Karniadakis, G.E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys., 378, 686-707.
[10] Yu, H., Tian, X., E, W., & Li, Q. (2020). OnsagerNet: Learning stable and interpretable dynamics using a generalized Onsager principle. Preprint, arXiv:2009.02327.

Qianxiao Li is an assistant professor in the Department of Mathematics at the National University of Singapore and a research scientist in the Institute of High Performance Computing under the Agency for Science, Technology and Research. He is interested in theory and algorithms on the interface of machine learning and dynamical systems, as well as their applications to science and engineering. Weinan E is a professor in the School of Mathematical Sciences at Peking University and a professor in the Department of Mathematics and Program in Applied and Computational Mathematics at Princeton University. His main research interests are in numerical algorithms, machine learning, and multi-scale modeling with applications to chemistry, material sciences, and fluid mechanics. In recent years, E has also begun to explore control theory and related fields.
