SIAM News Blog

Data-driven Discovery of Governing Physical Laws

Dynamical Systems and Machine Learning

By Steven L. Brunton, J. Nathan Kutz, and Joshua L. Proctor

Ordinary and partial differential equations are widely used throughout the engineering, physical, and biological sciences to describe the physical laws underlying a given system of interest. We implicitly assume that the governing equations are known and justified by first principles, such as conservation of mass or momentum, and/or empirical observations. From the Schrödinger equation of quantum mechanics to Maxwell’s equations for electromagnetic propagation, knowledge of the governing laws has enabled transformative technologies (e.g., smartphones, the internet, lasers, and satellites) that impact society. In modern applications such as neuroscience, epidemiology, and climate science, the governing equations are only partially known and exhibit strongly nonlinear, multiscale dynamics that are difficult to model. Scientific computing methods provide an enabling framework for characterizing such systems, and the SIAM community has historically made some of the most important contributions to simulation-based science, including extensive developments in finite-difference, finite-element, spectral, and reduced-order modeling methods.

The plummeting cost of sensors, computational power, and data storage in the last decade has enabled the emergence of data-driven methods across the sciences. Such vast quantities of data offer new opportunities for data-driven discovery, referred to as the fourth paradigm of science [7]. Of course, data science is not new. More than 50 years ago, John Tukey envisioned a scientific effort focused on learning from data, or data analysis [5, 16]. Eventually two cultures, centered on the concepts of machine learning and statistical learning, emerged within the community of data scientists [1]. The former focuses on prediction, while the latter concerns inference of interpretable models from data. Both methodologies have achieved significant success across many areas of big data analytics. But these traditional approaches fall short of a general goal for computationally oriented scientists: inferring a (typically nonlinear) model from observations that both correctly identifies the underlying dynamics and generalizes qualitatively and quantitatively to unmeasured parts of phase, parameter, or application space.

Nowhere are these philosophical outlooks more clearly illustrated than in the historical developments concerning planetary motion and gravitation by Johannes Kepler (1571-1630) and Sir Isaac Newton (1643-1727). Both were leading figures of the scientific revolution, which many consider to have begun with Nicolaus Copernicus’s De revolutionibus orbium coelestium (On the Revolutions of the Heavenly Spheres). This work displaced the Ptolemaic doctrine of the perfect circle that had been the dominant predictive theory for nearly 1,500 years.

Figure 1. Using Tycho Brahe’s state-of-the-art data, Johannes Kepler utilized geometrical principles in Tabulae Rudolphinae [8] to discover that planetary orbits were actually ellipses. Figure credit: [8] (left) and Creative Commons (right).
Kepler was an early big data scientist. As an assistant to Tycho Brahe, he had access to the best and most well-guarded astronomical data collected to date. Upon Brahe’s untimely death, Kepler was appointed his successor with the responsibility to complete Brahe’s unfinished work. Over the next eleven years, he laid the foundations for the laws of planetary motion, positing the elliptical nature of planetary orbits (see Figure 1). Newton built upon this work, proposing the existence of gravity in order to derive \({\bf F}=m{\bf a}\) and explain Kepler’s elliptic orbits. A cynic could argue that Newton provided nothing new in terms of prediction, just a more convoluted way to derive elliptical orbits through calculus. However, Newton used the inferred \({\bf F}=m{\bf a}\) to facilitate the development and characterization of new systems never before considered or observed. The discovery of this fundamental governing law was critical for technological development and enabled unprecedented engineering and scientific progress, such as sending a rocket to the moon.

The success of Newton’s calculus fueled the scientific revolution and led to many of the canonical models of mathematical physics, including the heat equation, wave equation, Poisson’s equation, and Navier-Stokes equations, among other significant developments. But for many modern applications, governing equations are often unknown or only partially known, and exhibit strong nonlinearities, parametric dependencies, multi-scale phenomena, intermittency, and/or transient behavior. Such systems often take the general mathematical form

\[ u_t = N (u, x, t; \mu).\tag1\]

The function \(N(\cdot)\) is an unknown right-hand side that describes the ordinary or partial differential equation in terms of \(u\), its derivatives, and the parameters \(\mu\). Our objective is to discover \(N(\cdot)\) given only time-series measurements of the system (see Figure 2). A key assumption is that the true \(N(\cdot)\) comprises only a few terms, making the model sparse in the space of all possible combinations of candidate functions. For example, Burgers’ equation \(N=- uu_x + \mu u_{xx}\) and the quantum harmonic oscillator \(N=-i \mu x^2 u + i \hbar u_{xx}/2\) each have only two terms. This is consistent with Occam’s razor: the most likely governing equation is the simplest one that works.
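Because only time-series measurements of \(u\) are assumed available, the derivative \(u_t\) on the left-hand side of \((1)\) must itself be estimated from data. As a minimal sketch (not the authors' specific implementation), a second-order finite difference applied to a synthetic decaying-exponential signal illustrates this preprocessing step:

```python
import numpy as np

# Hedged sketch: estimate u_t from sampled time-series data.
# The signal u(t) = exp(-2t), for which u_t = -2u exactly, is a
# synthetic stand-in for real measurements.
t = np.linspace(0, 5, 501)
dt = t[1] - t[0]
u = np.exp(-2 * t)            # "measured" time series

# np.gradient uses second-order central differences in the interior.
u_t = np.gradient(u, dt)

# Interior error relative to the true derivative -2u is O(dt^2).
err = np.max(np.abs(u_t[1:-1] + 2 * u[1:-1]))
```

In practice, measurement noise makes numerical differentiation the delicate step; smoothing or total-variation-regularized differentiation is often used before the regression stage.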

Figure 2. Data-driven discovery algorithm in which time-series data alone is used to construct the governing equations of the measured system. Image credit: Samuel Rudy, Steven L. Brunton, J. Nathan Kutz, and Joshua L. Proctor.

Naïve approaches to discovering \(N(\cdot)\) lead to a combinatorially large search through all possible models. To overcome this difficulty, researchers have developed methods spanning ideas from nonlinear regression to artificial neural networks [4, 6, 14, 17]. More recently, in a seminal contribution, Michael Schmidt and Hod Lipson used a genetic algorithm to distill free-form laws from measurements [13]. Notably, the most parsimonious model is selected by balancing model complexity against accuracy via Pareto analysis [10]. An alternative approach uses emerging sparse regression techniques to determine \(N(\cdot)\) without an intractable (NP-hard) combinatorial brute-force search [2]. Specifically, a library \({\bf\Theta} ({\bf U})\) of candidate linear, nonlinear, and partial derivative terms for the right-hand side is constructed. Each column of \({\bf\Theta} ({\bf U})\) contains the values of a candidate term evaluated using the collected data. In this library, one can write the dynamics as

\[{\bf U}_t = {\bf \Theta} ({\bf U}) \boldsymbol{\xi},\tag2\]

where \({\bf U}_t\) is a vector of time derivatives of the measurement data and \(\boldsymbol{\xi}\) is a sparse vector, with each nonzero entry corresponding to a functional term to be included in the dynamics. Finding the sparsest vector \(\boldsymbol{\xi}\) consistent with the measurement data is now feasible with advanced methods in sparse regression, which makes it possible to find the most parsimonious model while circumventing the combinatorial search. Moreover, this approach has found success in a wide variety of ordinary [2, 9] and partial differential equation [11] settings. One can collect the time-series measurements from either an Eulerian framework where the sensors are fixed spatially, or in a Lagrangian framework where the sensors move with the dynamics [11]. This method is part of a growing effort to leverage sparsity in dynamical systems [3, 12, 15].
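The regression step in \((2)\) can be sketched with a sequentially thresholded least-squares iteration, one simple sparsity-promoting scheme in the spirit of [2]. The one-dimensional dynamics \(u_t = 0.5u - 0.1u^3\), the polynomial library, and the threshold value below are all illustrative assumptions, not the specific choices of the cited work:

```python
import numpy as np

# Synthetic "measurements": states u and derivatives u_t generated
# from the assumed true dynamics u_t = 0.5*u - 0.1*u**3, plus noise.
rng = np.random.default_rng(0)
u = rng.uniform(-2, 2, size=500)
u_t = 0.5 * u - 0.1 * u**3 + 1e-4 * rng.standard_normal(u.size)

# Library Theta(U): candidate terms 1, u, u^2, u^3, u^4 (one column each).
Theta = np.column_stack([u**k for k in range(5)])

def stlsq(Theta, dudt, threshold=0.05, iters=10):
    """Sequentially thresholded least squares: repeatedly zero out
    small coefficients and refit on the remaining active terms."""
    xi = np.linalg.lstsq(Theta, dudt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        active = ~small
        if active.any():
            xi[active] = np.linalg.lstsq(Theta[:, active], dudt,
                                         rcond=None)[0]
    return xi

xi = stlsq(Theta, u_t)
# The surviving nonzero entries of xi single out the u and u^3 terms.
```

The threshold plays the role of a sparsity knob: sweeping it and comparing model accuracy against the number of active terms recovers the Pareto-style model selection described above.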

The initial success of these methodologies, including sparse regression and genetic algorithms, suggests that one can integrate many concepts from statistical learning with traditional scientific computing and dynamical systems theory to discover dynamical models from data. This integration of nonlinear dynamics and machine learning opens the door to principled, rather than heuristic, methods for model construction, nonlinear control strategies, and sensor placement. Additionally, these new model identification methods have transformative potential for parameterized systems and multiscale models where first-principles derivations have remained intractable, such as in neuroscience, epidemiology, and the electrical grid.

[1] Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199-231.
[2] Brunton, S.L., Proctor, J.L., & Kutz, J.N. (2016). Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15), 3932-3937.
[3] Brunton, S.L., Tu, J.H., Bright, I., & Kutz, J.N. (2014). Compressive sensing and low-rank libraries for classification of bifurcation regimes in nonlinear dynamical systems. SIAM Journal on Applied Dynamical Systems, 13(4), 1716-1732.
[4] Crutchfield, J.P., & McNamara, B.S. (1987). Equations of motion from a data series. Complex Systems, 1, 417-452.
[5] Donoho, D.L. (2015). 50 years of data science. In Tukey Centennial Workshop. Princeton, NJ: Princeton University.
[6] Gonzalez-Garcia, R., Rico-Martinez, R., & Kevrekidis, I.G. (1998). Identification of distributed parameter systems: A neural net based approach. Computers & Chemical Engineering, 22, S965-S968.
[7] Hey, T., Tansley, S., & Tolle, K.M. (Eds.) (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. (Vol. 1). Redmond, WA: Microsoft Corporation.
[8] Kepler, J. (1627). Tabulae Rudolphinae, quibus astronomicae scientiae, temporum longinquitate collapsae restauratio continetur. Ulm, Germany: Vlmae, Typis J. Saurii.
[9] Mangan, N.M., Brunton, S.L., Proctor, J.L., & Kutz, J.N. (2016). Inferring biological networks by sparse identification of nonlinear dynamics. arXiv:1605.08368. To appear in IEEE Transactions on Molecular, Biological, and Multi-Scale Communications.
[10] Pareto, V. (1964). Cours d’économie politique (Vol. 1). Geneva, Switzerland: Librairie Droz.
[11] Rudy, S., Brunton, S.L., Proctor, J.L., & Kutz, J.N. (2016). Data-driven discovery of partial differential equations. arXiv:1609.06401.
[12] Schaeffer, H., Caflisch, R., Hauck, C.D., & Osher, S. (2013). Sparse dynamics for partial differential equations. Proceedings of the National Academy of Sciences, 110(17), 6634-6639.
[13] Schmidt, M., & Lipson, H. (2009). Distilling free-form natural laws from experimental data. Science, 324(5923), 81-85.
[14] Sugihara, G., May, R., Ye, H., Hsieh, C., Deyle, E., Fogarty, M., & Munch, S. (2012). Detecting causality in complex ecosystems. Science, 338 (6106), 496-500.
[15] Tran, G., & Ward, R. (2016). Exact recovery of chaotic systems from highly corrupted data. arXiv:1607.01067.
[16] Tukey, J.W. (1962). The future of data analysis. The Annals of Mathematical Statistics, 33(1), 1-67.
[17] Voss, H.U., Kolodner, P., Abel, M., & Kurths, J. (1999). Amplitude equations from spatiotemporal binary-fluid convection data. Physical Review Letters, 83(17), 3422.

Steven L. Brunton is an assistant professor of mechanical engineering, adjunct assistant professor of applied mathematics, and a data science fellow with the eScience Institute at the University of Washington. J. Nathan Kutz is professor of applied mathematics, adjunct professor of physics and electrical engineering, and a senior data science fellow with the eScience Institute at the University of Washington. Joshua L. Proctor is an associate principal investigator with the Institute for Disease Modeling as well as affiliate assistant professor of applied mathematics and mechanical engineering at the University of Washington.
