# Color Image-driven Deep Autoencoders Predict Atmospheric Pollutant Patterns

For centuries, nature has helped scientists understand the laws that govern the physical world. While the traditional process of turning observations into physical understanding is often painstakingly slow, powerful new algorithms allow computers to learn physics more rapidly by observing images and videos.

We are leveraging this novel capability to predict spatial patterns that are influenced by wind-driven dynamics, including common geoscientific phenomena like algal blooms on the ocean’s surface; aeolian dunes; volcanic ash; wildfire smoke; and air pollution plumes that disperse liquid, gas, or dust from a specific source. Although satellites obtain images of these events on a daily basis, researchers have not yet utilized them to build predictive machine learning (ML) models.

**Figure 1.** Schematic progression of the proposed data preprocessing for machine learning (ML) training. Our work is the first step towards leveraging real-world images for plume prediction. Figure adapted from [1].

To establish a proof of concept, we trained ML models with simulated images (i.e., pixel information) rather than physical quantities, which is the equivalent of extracting training data from satellite images, in order to replicate the physics behind wind-driven spatial patterns (see Figure 1). We incorporated deposition data from thousands of simulations run with the Aeolus computational fluid dynamics model [2]. To model dispersion within the atmosphere, we relied on the three-dimensional incompressible advection-diffusion equation with sources and sinks in a Lagrangian framework. The Lagrangian particle displacement due to advection, diffusion, and settling in the three coordinate directions is

\[dx_i=\tilde{u}_i\,dt+\frac{\partial\left(v_T/S_c\right)}{\partial x_i}\,dt+\sqrt{\frac{2v_T}{S_c}}\,dW_{x_i},\tag{1}\]

where \(\tilde{u}_i\) are the wind components in the \(x\), \(y\), and \(z\) directions. Here, \(v_T\) is the eddy diffusivity, \(S_c\) is the Schmidt number, \(dW_{x_i}\) are three independent normal random variates with zero mean and variance \(dt\), and \(dt\) is the timestep of the Lagrangian particle’s advection. At any time \(t\), we can calculate the concentration from the Lagrangian particle locations at that time and the contaminant mass that is associated with each particle. The deposition flux is the product of vertical deposition velocity and concentration.
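The displacement rule in (1) can be sketched as a single explicit update per timestep. The following is a minimal NumPy illustration and not the Aeolus implementation: the function names, the uniform wind, the constant eddy diffusivity, and the grid are all hypothetical, the Schmidt number is assumed constant, and the random increments are assumed to have variance \(dt\), as is standard for a Wiener process. The second function bins particle mass into grid cells to estimate concentration.

```python
import numpy as np

def lagrangian_step(x, wind, v_T, grad_v_T, S_c, dt, rng):
    """Advance particle positions by one step of Eq. (1).

    x        : (N, 3) particle positions
    wind     : (N, 3) wind components at the particle positions
    v_T      : (N,)   eddy diffusivity at the particle positions
    grad_v_T : (N, 3) spatial gradient of v_T at the particle positions
    S_c      : Schmidt number (assumed constant in this sketch)
    dt       : timestep
    """
    # Drift: advection plus the gradient of v_T / S_c
    drift = (wind + grad_v_T / S_c) * dt
    # Random increments: zero mean, variance dt (assumption stated above)
    dW = rng.normal(0.0, np.sqrt(dt), size=x.shape)
    diffusion = np.sqrt(2.0 * v_T / S_c)[:, None] * dW
    return x + drift + diffusion

def concentration(x, mass, edges):
    """Estimate concentration by summing particle mass in each grid cell.

    Assumes a uniform grid, so every cell has the same volume.
    """
    hist, _ = np.histogramdd(x, bins=edges, weights=mass)
    cell_volume = np.prod([e[1] - e[0] for e in edges])
    return hist / cell_volume

# Hypothetical setup: 1000 particles released at the origin in a uniform wind
rng = np.random.default_rng(0)
n = 1000
x = np.zeros((n, 3))
wind = np.tile([2.0, 0.0, 0.0], (n, 1))
v_T = np.full(n, 0.5)
grad_v_T = np.zeros((n, 3))
mass = np.full(n, 1.0 / n)

for _ in range(100):
    x = lagrangian_step(x, wind, v_T, grad_v_T, S_c=0.7, dt=0.1, rng=rng)

edges = [np.linspace(-50.0, 50.0, 21)] * 3
conc = concentration(x, mass, edges)
```

With these illustrative values, the particle cloud drifts downwind at the wind speed while spreading in all three directions, and the concentration field integrates back to the total released mass as long as every particle stays inside the grid.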

Using this data, we implemented deep learning algorithms—specifically convolutional neural network (CNN)-based autoencoders—to predict the spatial patterns that are associated with plumes, which blow in many directions and originate from different locations (see Figure 2). CNNs find wide applicability in computer vision problems due to their ability to learn hidden nonlinear features in data without a formal feature extraction procedure. Instead, CNNs transform an input (e.g., an image) via a kernel or filter. Given a two-dimensional input \(x\) and kernel \(k\), we can obtain each pixel of output \(y\) with a convolutional layer:

\[y\left[i,j\right]=\left(x \ast k\right)\left[i,j\right]=\sum_n \sum_m k[n,m]\,x[i-n,j-m].\]
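The double sum above can be evaluated directly. The sketch below is a minimal NumPy illustration with a hypothetical function name; it assumes that the input is zero outside its bounds (a "full" convolution), which is one of several possible boundary conventions. Note that many deep learning libraries implement the closely related cross-correlation, which does not flip the kernel.

```python
import numpy as np

def conv2d_full(x, k):
    """Evaluate y[i, j] = sum_n sum_m k[n, m] * x[i - n, j - m].

    The input x is treated as zero outside its bounds, so the output
    has shape (H + N - 1, W + M - 1).
    """
    H, W = x.shape
    N, M = k.shape
    y = np.zeros((H + N - 1, W + M - 1))
    for n in range(N):
        for m in range(M):
            # A copy of x shifted by (n, m) and scaled by k[n, m]
            # supplies the term k[n, m] * x[i - n, j - m] for all (i, j)
            y[n:n + H, m:m + W] += k[n, m] * x
    return y

# Small example: a 3x3 input and a 2x2 diagonal-difference kernel
x = np.arange(9, dtype=float).reshape(3, 3)
k = np.array([[1.0, 0.0],
              [0.0, -1.0]])
y = conv2d_full(x, k)
```

In a CNN, the entries of `k` are the trainable coefficients; the loop above only shows how a fixed kernel produces each output pixel.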

**Figure 2.** Schematic prediction pipeline.

**2a.** Full prediction model diagram that illustrates the bottleneck model, corrector, and decoder.

**2b.** Three examples of plume pattern predictions. The vertical arrows connect the pipeline stages to actual predictions. The ground truth comprises predictions from the Aeolus model. Figure adapted from [1].

An autoencoder pairs an encoder, which compresses its input into a low-dimensional representation called the *reduced space* or *feature map*, with a decoder that reconstructs the input from that representation. Because a single-layer autoencoder with a linear activation function shares many similarities with principal component analysis (PCA), autoencoders serve as a nonlinear extension of PCA. Training a CNN involves finding the kernel coefficients that minimize the loss function, which measures the discrepancy between the CNN’s desired output and the obtained output \(y\).
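To make the connection to PCA concrete, the sketch below treats PCA as a linear encoder and decoder that share one weight matrix. It is an illustration only and not the model from the study: the synthetic data, the dimensions, and the names `encode` and `decode` are hypothetical, and the weights come from a singular value decomposition rather than from gradient-based training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples with 10 features that vary
# mostly along 3 hidden directions, plus a little noise
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

mean = X.mean(axis=0)
Xc = X - mean

# PCA through the SVD: the rows of Vt are the principal directions
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
r = 3
W = Vt[:r].T  # (10, r) shared encoder/decoder weights

def encode(x):
    """Linear encoder: project centered data onto r principal directions."""
    return (x - mean) @ W

def decode(z):
    """Linear decoder: map the reduced representation back to data space."""
    return z @ W.T + mean

Z = encode(X)          # reduced representation, shape (200, 3)
X_hat = decode(Z)      # reconstruction, shape (200, 10)
squared_error = np.sum((X - X_hat) ** 2)
```

The squared reconstruction error of this linear pair equals the sum of the squared singular values that were discarded. A nonlinear autoencoder replaces the two matrix products with learned nonlinear maps.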

We utilized an encoder to receive a color image and compress its dimensionality to 0.02 percent of the original size, then employed a decoder to recover the original image from the reduced space. Our present study seeks to predict static spatial patterns (i.e., patterns that do not vary with time), which is a simpler problem than what we hope to accomplish in the future. Moving forward, we intend to develop image-based, data-driven models that predict the spatiotemporal evolution of real-world plumes from direct camera observations.
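As a rough sense of scale for a 0.02 percent compression, the arithmetic below uses a hypothetical 500-by-500 color image; this size is an assumption for illustration and is not taken from the study.

```python
# Hypothetical image size (not from the study): 500 x 500 pixels, 3 color channels
height, width, channels = 500, 500, 3
n_inputs = height * width * channels   # values in the original image

ratio = 0.02 / 100                     # 0.02 percent as a fraction
n_latent = round(n_inputs * ratio)     # values kept in the reduced space

print(n_inputs, n_latent)
```

Under this assumption, an image with 750,000 values would be represented by about 150 values in the reduced space.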

*M. Giselle Fernández-Godino delivered a minisymposium presentation on this research at the 2023 SIAM Conference on Computational Science and Engineering (CSE23), which recently took place in Amsterdam, the Netherlands. She received funding to attend CSE23 through a SIAM Early Career Travel Award. To learn more about Early Career Travel Awards and submit an application, visit the online page.*

**References**

[1] Fernández-Godino, M.G., Lucas, D.D., & Kong, Q. (2023). Predicting wind-driven spatial deposition through simulated color images using deep autoencoders. *Sci. Rep.*, *13*, 1394.

[2] Gowardhan, A.A., McGuffin, D.L., Lucas, D.D., Neuscamman, S.J., Alvarez, O., & Glascoe, L.G. (2021). Large eddy simulations of turbulent and buoyant flows in urban and complex terrain areas using the Aeolus model. *Atmosphere*, *12*(9), 1107.

M. Giselle Fernández-Godino is a scientist in the Atmospheric, Earth, and Energy Science Division at Lawrence Livermore National Laboratory. Since the beginning of her appointment in 2020, her work has focused on the application of machine learning and uncertainty quantification approaches to the flow transport of hazardous materials and fusion energy design optimization.