SIAM News Blog

Can Machine Learning Help Manage Groundwater in California?

By Juliane Mueller

The U.S. Department of Energy (DOE) is very interested in the development and improvement of machine learning (ML) and artificial intelligence (AI) techniques for scientific applications, with the ultimate goal of advancing science and energy research. However, many scientific applications pose a variety of challenges for off-the-shelf ML models. For example, datasets from DOE’s user facilities—such as the Advanced Light Source or the Molecular Foundry—are often massive and high dimensional; they therefore require scalable ML models that can extract scientific insights. At the other extreme are the many data-poor applications in which observations of some or all relevant features are very limited and potentially noisy, yet one may still wish to use the available data to make predictions and enable decision support. This is the case for publicly available, long-term and high-frequency groundwater level measurements in California.

From 2012 to 2016, California experienced an unprecedented multi-year drought. The decreasing trend of daily groundwater measurements at an observation well in Butte County, Calif., reflects the drought’s impact, as the example time-series plot in Figure 1 illustrates. In response, California enacted the Sustainable Groundwater Management Act in 2014, which provides a framework for long-term water conservation and groundwater management.

Figure 1. Measurements of daily groundwater levels at an observation well in Butte County, Calif. (well ID 22N01E28J001M). The impact of the 2012-2016 drought on groundwater levels is clearly visible in the decreasing trend during that period. Data courtesy of the California Natural Resources Agency.

However, the future availability of and need for groundwater (rather than surface water) are hard to predict. Though mechanistic multiscale, multiphysics simulation models that estimate groundwater depths do exist, they require extensive characterization of hydrostratigraphic properties and boundary conditions for which the necessary data are often unavailable. Traditional time-series forecasting techniques—like regression methods and autoregressive integrated moving average (ARIMA) models—are not applicable due to the non-stationary behavior of groundwater levels. This motivated our research to develop deep learning (DL) approaches for prediction. DL models are advantageous because they can learn complex functional relationships between input and output variables, such as weather and groundwater levels. Once trained, they can make predictions quickly, which allows for fast scenario analysis.

Although previous research has shown that some types of DL models provide outstanding performance for certain tasks in new applications—for example, convolutional neural nets (CNNs) for image data—one usually does not know which type of DL model and architecture (defined by hyperparameters like the number of layers and nodes, batch size, etc.) will yield the best results.

To make time-series predictions of groundwater levels at different well sites, we developed an optimization algorithm that automatically finds the best hyperparameters for different types of DL models. We first formulated a bilevel optimization problem in which the upper-level objective function aims to optimize a performance metric over the hyperparameter space. Given a set of hyperparameters, the algorithm then optimizes the lower-level objective function by training the corresponding DL model via stochastic gradient descent (SGD). The solution of the lower-level problem allows us to evaluate the performance function at the upper level.
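Written out, the bilevel problem takes roughly the following form, where $\theta$ denotes the hyperparameters, $\Theta$ the hyperparameter search space, $w$ the DL model weights, $\mathcal{L}_{\mathrm{val}}$ the validation performance metric, and $\mathcal{L}_{\mathrm{train}}$ the training loss. The notation here is our own shorthand for the formulation described above, not taken from the paper:

\[
\min_{\theta \in \Theta} \; \mathcal{L}_{\mathrm{val}}\bigl(w^*(\theta), \theta\bigr)
\quad \text{subject to} \quad
w^*(\theta) \in \arg\min_{w} \; \mathcal{L}_{\mathrm{train}}(w, \theta).
\]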

For simplicity, we formulate this problem as an integer optimization problem and assume that each hyperparameter can only take on a limited number of values. But even with this assumption, the number of possible DL model architectures is so large that we cannot try all of them in practice. For example, if we optimize seven hyperparameters in a CNN with a handful of options per hyperparameter, we quickly have over 30 million combinations. Depending on the DL model size and the amount of training data, the computing time for training a single architecture may be high. Therefore, efficient and effective optimizers that can quickly find near-optimal architectures are necessary.
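To make the combinatorics concrete, assume for the sake of illustration roughly a dozen candidate values per hyperparameter; the size of the search space then follows directly from

\[
12^7 = 35{,}831{,}808 \approx 3.6 \times 10^7
\]

candidate architectures, which is far too many to train exhaustively.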

For our groundwater prediction application, we implemented a derivative-free optimizer that uses a computationally inexpensive surrogate model to map the hyperparameters to their performance. Each iteration of the algorithm solves an auxiliary optimization problem on the surrogate model in order to select the next set of hyperparameters to try. The algorithm is adaptive in the sense that each time it obtains a new hyperparameter-performance pair, the surrogate model updates and the auxiliary optimization problem is solved anew. The auxiliary optimization problem strikes a balance between search space exploration and exploitation, thus allowing us to quickly identify hyperparameters with good performance. Figure 2 depicts a sketch of the algorithm’s process.
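The following Python sketch illustrates the general idea of such a surrogate-based search with an exploration-exploitation trade-off. It is not the implementation used in the study; the toy validation_error function, the three example hyperparameters, and the weighting scheme are placeholders that stand in for training a DL model and scoring its architecture.

```python
# Minimal sketch of surrogate-based hyperparameter search with an
# exploration/exploitation trade-off. This illustrates the general idea only;
# the toy objective below stands in for "train the DL model with these
# hyperparameters and return its validation error".
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)

# Illustrative integer search space: (num_layers, nodes_per_layer, lag).
lower = np.array([1, 16, 30])
upper = np.array([6, 256, 365])

def validation_error(theta):
    """Placeholder for training a model with hyperparameters `theta`."""
    layers, nodes, lag = theta
    return (layers - 3) ** 2 + ((nodes - 128) / 64) ** 2 + ((lag - 300) / 100) ** 2

def random_points(n):
    return rng.integers(lower, upper + 1, size=(n, len(lower))).astype(float)

# Initial experimental design.
X = random_points(6)
y = np.array([validation_error(x) for x in X])

for _ in range(30):
    surrogate = RBFInterpolator(X, y, kernel="cubic")   # cheap performance model
    candidates = random_points(500)
    pred = surrogate(candidates)                        # exploitation: predicted error
    dist = np.min(np.linalg.norm(candidates[:, None, :] - X[None, :, :], axis=2),
                  axis=1)                               # exploration: distance to data
    keep = dist > 0                                     # skip already-evaluated points
    candidates, pred, dist = candidates[keep], pred[keep], dist[keep]
    # Weighted score: low predicted error and large distance are both desirable.
    w = 0.7
    score = w * (pred - pred.min()) / (np.ptp(pred) + 1e-12) \
        - (1 - w) * (dist - dist.min()) / (np.ptp(dist) + 1e-12)
    x_next = candidates[np.argmin(score)]
    X = np.vstack([X, x_next])
    y = np.append(y, validation_error(x_next))

best = X[np.argmin(y)]
print("best hyperparameters found:", best.astype(int), "validation error:", y.min())
```

The weight w controls the balance between exploiting the surrogate's predictions and exploring poorly sampled regions of the hyperparameter space; in practice this balance is often varied over the course of the search.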

Figure 2. Illustration of the adaptive sampling and updating of the surrogate model, which approximates the deep learning (DL) model’s performance in the hyperparameter space. Figure 2a shows the initial experimental design with four different hyperparameter sets, Figure 2b depicts the corresponding surrogate model, and Figures 2c and 2d show the updated surrogate model after sampling one new set of hyperparameters at a time.

We used this adaptive optimizer to find the best architectures of different DL model varieties and predict groundwater levels two years into the future. We compared the predictive performances of CNNs, long short-term memory networks (LSTMs), recurrent neural networks (RNNs), and multilayer perceptrons (MLPs). Each model utilized the same training features: daily observations of groundwater levels, temperature, precipitation, and discharge at a nearby river, as well as the week of the observation. We obtained the data from the California Natural Resources Agency and the California Data Exchange Center, and the hyperparameters that we optimized included the number of nodes, layers, and epochs; batch size; dropout rate; and a lag. The lag determines the number of past days that the algorithm uses to generate the next day’s prediction. When making future predictions for groundwater, we assumed that we had future knowledge of the other input features.
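To illustrate how the lag hyperparameter shapes the model input, the sketch below assembles training samples in which each input stacks the previous lag days of all features and the label is the next day's groundwater level. The column names and synthetic data are placeholders of our own choosing, not the study's actual preprocessing code.

```python
# Minimal sketch of building lagged training samples for the time-series models.
# The synthetic DataFrame stands in for the daily observations of groundwater
# level, temperature, precipitation, river discharge, and week of year.
import numpy as np
import pandas as pd

def make_lagged_samples(df, target_col, lag):
    """Stack the previous `lag` days of every feature into one input vector;
    the label is the target on the following day."""
    values = df.to_numpy(dtype=float)
    target = df[target_col].to_numpy(dtype=float)
    X, y = [], []
    for t in range(lag, len(df)):
        X.append(values[t - lag:t].ravel())   # lag days x all features
        y.append(target[t])                   # next day's groundwater level
    return np.array(X), np.array(y)

# Synthetic stand-in for the daily observations.
n_days = 1000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "groundwater_level": np.cumsum(rng.normal(size=n_days)),
    "temperature": 15 + 10 * np.sin(np.arange(n_days) * 2 * np.pi / 365),
    "precipitation": rng.exponential(1.0, n_days),
    "discharge": rng.exponential(5.0, n_days),
    "week": (np.arange(n_days) // 7) % 52,
})

X, y = make_lagged_samples(df, "groundwater_level", lag=300)
print(X.shape, y.shape)   # (700, 1500), (700,)
```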

The results of our groundwater prediction study yielded the following conclusions:

  1. In terms of prediction accuracy, the MLP and CNN performed best and achieved the smallest validation mean squared error (MSE).
  2. The LSTM and RNN models often failed to successfully train when the lag’s value was too large, and their MSE was high when we restricted the maximum lag. Although existing literature often uses LSTMs for time series data, this model did not perform well for our application, indicating that examining different types of DL models is worthwhile.
  3. The optimal lag for the MLP and CNN was greater than 300, suggesting that one needs a whole year of past observations to make accurate predictions into the future. Given California’s seasonal characteristics—dry summers and wet winters—this is a reasonable result.
  4. The required computing time for the optimization varied widely. The CNN was the slowest model and required more than 20 hours to try 50 different architectures on a simple laptop, while the MLP was the fastest and required an average of 4.5 hours to try 50 different architectures.
  5. The stochasticity that arises from the use of SGD to train DL models can have a large impact on prediction reliability. In our groundwater study, the MLP typically had the lowest prediction variability; one should thus account for stochasticity when optimizing DL model architectures (see the sketch after this list).
  6. Different sized architectures can provide similar predictive performances, thus demonstrating the presence of local minima. 
  7. The optimal model architecture is sensitive to and depends on the data in question. Researchers should hence always consider hyperparameter optimization when working with a new dataset.
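One simple way to account for this stochasticity, sketched below under our own assumptions, is to train a given architecture several times with different random seeds and score it by the mean and spread of its validation error. The example uses scikit-learn's MLPRegressor on toy data as a small stand-in for the DL models in the study.

```python
# Minimal sketch of scoring one architecture while accounting for SGD
# stochasticity: train several times with different seeds and aggregate
# the validation errors.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

def score_architecture(hidden_layers, X_train, y_train, X_val, y_val, n_repeats=5):
    """Train `n_repeats` times and return mean and std of the validation MSE."""
    errors = []
    for seed in range(n_repeats):
        model = MLPRegressor(hidden_layer_sizes=hidden_layers,
                             max_iter=500, random_state=seed)
        model.fit(X_train, y_train)
        errors.append(mean_squared_error(y_val, model.predict(X_val)))
    return np.mean(errors), np.std(errors)

# Toy data standing in for the lagged groundwater features.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 20))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=400)
mean_mse, std_mse = score_architecture((64, 64), X[:300], y[:300], X[300:], y[300:])
print(f"validation MSE: {mean_mse:.4f} +/- {std_mse:.4f}")
```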

Figure 3. Future predictions of groundwater levels (red) at an observation well in Butte County, Calif. These predictions were made with an MLP model that was trained five times. Good agreement exists between the predictions and the truth (blue). We only used data from 2010 through 2016 for hyperparameter optimization.

Finally, Figure 3 illustrates the groundwater level predictions that we obtained by training the MLP model five times; these results affirm the impact of stochasticity. Although the MLP model has not seen the groundwater level data from 2016 to 2018, its predictions are close to the true data. We therefore conclude that one can achieve good predictions with the right model, and that our hyperparameter optimizer is able to automatically identify the best model. More details and results are available in our paper.


Juliane Mueller presented this research during a minisymposium presentation at the 2021 SIAM Conference on Computational Science and Engineering, which took place virtually in March.  

Acknowledgments: This work was supported by Laboratory Directed Research and Development (LDRD) funding from Berkeley Lab, provided by the Director, Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231. The research was conducted through a collaboration between Berkeley Lab's Computational Research Division and the Earth and Environmental Sciences Area.


Juliane Mueller is a staff scientist in the Computational Research Division at Lawrence Berkeley National Laboratory. She received her Ph.D. in applied mathematics from Tampere University of Technology in Finland in 2012 and—after a postdoctoral appointment at Cornell University—joined Berkeley Lab in 2014 as an Alvarez Fellow in Computing Sciences. Mueller's research focuses on the development of derivative-free optimization algorithms for compute-intensive black-box problems that arise throughout the domain sciences.