| December 15, 2021

Computational and Statistical Methods for Improving Urban Mobility with Mobiliti Data

By Serges Love Teutu Talla, Isabelle Kemajou-Brown, Cy Chan, and Bin Wang

Figure 1. Map of San Francisco Bay. Public domain image.

Transportation planners and managers play a key role in the management of traffic congestion — the most prevalent transportation challenge that affects urban mobility. Despite ongoing research on modeling efforts to improve traffic efficiency, numerous cities still face a heavy influx of vehicles, congestion, and traffic accidents. Previous researchers have estimated travel time in urban traffic via a model of a transportation network system with binary subsystems [4]. Although this parameter estimation does hold for non-overlapping routes, the method fails to account for the level of congestion. Furthermore, the unimodal and identical distribution for travel times does not explain the state of traffic congestion [2, 3].

We thus develop a multi-subsystem approach in which log-likelihood (LL) functions can be intertwined, semi-intertwined, or completely separate. For instance, a triple subsystem may incorporate three dimensions such as travel time, traffic flow, and traffic color code.

The simulation of large data sets to understand urban transportation’s complexity requires computationally expensive techniques for large-scale urban networks; some simulations can take days or weeks to complete. Our goal is to reduce computation enough to perform these calculations in smaller computer clusters and ultimately reduce computational cost when predicting traffic dynamics.

Consider a triple subsystem that includes binary outputs with a non-identical distribution. We assume that a link’s travel time is log-normally distributed. We also assume that the probability of a link being congested is the parameter of a Bernoulli distribution, and the average flow rate of vehicles that enter a link during a time window is the parameter of a Poisson distribution. We therefore formulate the maximum likelihood function in the following four models:

The autonomous model provides separate LL functions wherein the parameters are all independent of each other.
The semi-intertwined 1 (SI1) model provides dependency on parameters without compromising autonomy. The travel time and vehicle flow parameters depend on binary outputs of traffic conditions and have separate LL functions.
The semi-intertwined 2 (SI2) model provides dependency on parameters with partial autonomy. The probability of success (PS) has a separate likelihood function. The travel time and vehicle flow parameters depend on binary outputs of traffic conditions and also share the same LL function.
The fully intertwined (FI) model provides dependency on parameters with no autonomy. Adding all three of the separate LL functions obtains a fully intertwined LL function for the triple subsystems.

To test our model and the corresponding maximum likelihood formulations, we used the Mobiliti simulator to generate several days’ worth of traffic data—consisting of 674 nodes and 1,097 links—over a region of downtown San Francisco. Figure 1 shows a map of the experimental region, which includes roadways of all functional classes (not just arterial roads and highways). We then performed data collection via existing methods [1]. We simulated 30 random trials of vehicle traffic (driven by the San Francisco Chained Activity Modeling Process travel demand model); for each trial, then captured the average vehicle speeds and flow rates across all links in the experimental region for every 15-minute interval within a 24-hour period. The trials ran with stochastic trip leg generation, wherein each leg’s origin and destination nodes came from a random distribution over the specified traffic analysis zones from the demand model. We employed a Bureau of Public Roads congestion model—which finds congested link traversal times based on demand flow rates and the designed capacity—to determine link speeds. The link traversal time increases with the number of vehicles that utilize the link.

To perform maximum likelihood estimation (MLE), we used the Python SciPy optimization methods TNC (truncated Newton constrained minimization, which allows upper and lower bounds using gradient information) and Nelder-Mead (a simplex algorithm for multidimensional unconstrained minimization or maximization with free derivatives) on a sample of 100 links. Our MLE results provide a vector of size 100 of each parameter: the PS, log mean travel time, log variance travel time, and average traffic flow when the LL function is maximized.

Figure 2. Log-likelihood versus number of function evaluations. 2a. Double subsystem. 2b. Triple subsystem. Figure courtesy of Serges Love Teutu Talla.

Estimation Results

PS for triple subsystem: The LL function returns the same value for the autonomous and SI1 models via the Nelder-Mead method, with a slightly higher execution time for SI1. We also obtain the same LL function value for SI1 and SI2 using TNC, with a very similar execution time. With Nelder-Mead, the mean squared errors (MSEs) for the SI1 and autonomous models are equal to each other and smaller than the MSE for FI. With TNC, the MSEs for SI1 and SI2 are equal to each other and smaller than the MSE for FI. The PS MSE for the FI model with a double subsystem is smaller with Nelder-Mead but equal with TNC. The intertwinement seems to impair the estimates’ PS with both Nelder-Mead and TNC.
Mean travel time (MTT): The LL function values for the autonomous and SI1 models are of opposite signs when using the Nelder-Mead method; the execution time for SI1 is more than ten times greater than that of the autonomous model. With Nelder-Mead, the MSE for the autonomous model is smaller than that of SI1, which in turn is smaller than the MSE of FI. With TNC, the MSEs for SI1, SI2, and FI are equal. With a double subsystem, the MTT MSE for FI is smaller with Nelder-Mead but equal with TNC. The intertwinement seems to impair the MTT estimates with Nelder-Mead, but the estimate remains unchanged with TNC.
Mean average flow: The LL function values are higher for the autonomous model than for SI1 via Nelder-Mead, and SI1’s execution time is more than two times longer than that of the autonomous model. With Nelder-Mead, the MSE for FI is smaller than those of the autonomous and SI1 models, which are equal to each other. With TNC, the MSE for SI1 is smaller than those of SI2 and FI, which are equal.

Figure 3. Time elapse versus number of function evaluations. 3a. Double subsystem. 3b. Triple subsystem. Figure courtesy of Serges Love Teutu Talla.

Computational Results

Computations for only 100 data points were very costly; this expense was a challenge because we had more than 1,000 data points. To overcome this difficulty, we assume that the MLE on a certain number of links does not depend on other links and investigate our model’s computational separability.

We created some scatterplots for the fully intertwined model to investigate the way in which the LL functions change with the number of evaluations until convergence (see Figure 2). We also graphed the execution time as a function of the number of function evaluations (see Figure 3). In each model, we consider four blocks of 25 links from the same sample of 100 links and perform MLE to gather data for different MSE values. We plug the MLEs from the blocks of 25 links into the LL function of the code for the 100 links to yield a new LL function value that is based on the estimates from the blocks of 25 links. We then compare the two LL function values and create graphs with both the TNC and Nelder-Mead methods. Furthermore, we compute and compare the MSEs via a similar approach as for the MLEs. Our results indicate that the TNC method satisfies the definition of computational separability for both SI1 and FI.

Figures 4a and 4b each present a comparison graph for MSE versus the number of evaluation functions until we achieve convergence for the double and triple subsystem respectively. In addition, the graphs compare the convergence points with TNC versus Nelder-Mead.

Figure 4. Mean squared errors versus number of function evaluations. 4a. Double subsystem. 4b. Triple subsystem. Figure courtesy of Serges Love Teutu Talla.

Conclusions

In summary, we proposed two binary models for triple subsystems: intertwined and separate. Each model is accelerated for combined subgroup computation outputs, so that they achieve a low number of evaluation functions and a reduced execution time. We hence expect our method to obtain a high-accuracy solution with parallel computing architectures. After the separability investigation, we plan to accelerate performance with parallel optimization via high-performance computing. In addition, we also demonstrated the performance and computational costs for the intertwined model of the triple subsystems and separate independent model of the triple subsystems, then compared the MLE estimates for both models.

Our work is amenable to further extensions, such as a Markov chain-based stochastic model to identify the Markov transition matrix and provide MLE estimates for any departure time during any weekday. In the future, we may also consider binary, weighted binary, and multinomial outputs for traffic conditions to estimate traffic link parameters.

Isabelle Kemajou-Brown presented this research during a minisymposium at the 2021 SIAM Conference on Computational Science and Engineering (CSE21), which took place virtually earlier this year. Serges Love Teutu Talla also presented this research during a minisymposterium at CSE21.

Acknowledgments: This work was supported in part by the U.S. Department of Energy, Office of Science, and Office of Workforce Development for Teachers and Scientists under the Visiting Faculty Program. Our research is a continuation of work that was funded by the Independent Research and Development Program of the Johns Hopkins Applied Physics Laboratory.

References
[1] Chan, C., Wang, B., Bachan, J., & Macfarlane, J. (2018). Mobiliti: Scalable transportation simulation using high-performance parallel computing. In 2018 21st international conference on intelligent transportation systems (ITSC). Maui, HI: IEEE.
[2] Guo, F., Rakha, H., & Park, S. (2010). Multistate model for travel time reliability. Transp. Res. Rec., 2188(1), 46-54.
[3] Ji, Y., Jiang, S., Du, Y., & Zhang, H.M. (2015). Estimation of bimodal urban link travel time distribution and its applications in traffic analysis. Math. Probl. Eng., 615468.
[4] Zhao, X., & Spall, J.C. (2016). Estimating travel time in urban traffic by modeling transportation network systems with binary subsystems. In 2016 American control conference (ACC) (pp. 803-808). Boston, MA: IEEE.

	Serges Love Teutu Talla is a PhD. student and graduate research assistant in the Department of Mathematics at Morgan State University. His interest research areas include actuarial mathematics, mathematical statistical models and applicable techniques, probability, optimization, and data analysis.
	Isabelle Kemajou-Brown is an associate professor of mathematics and actuarial science in the School of Computer, Mathematical and Natural Sciences at Morgan State University. Her research interests lie in the areas of stochastic analysis—ranging from theory to applications—and include the development and analysis of models in economics or finance, simulation, and estimation with real data.
	Cy Chan is a computer research scientist in the Computer Architecture Group at Lawrence Berkeley National Laboratory, where he works on techniques for modeling, simulation, and optimization for high-performance computing systems. His current research interests include scalable modeling and simulation of future transportation systems and smart electrical grid systems, as well as co-design for novel computer architectures.
	Bin Wang is a research scientist in the Energy Technologies Area at Lawrence Berkeley National Laboratory. His research interests include control and optimization techniques in the context of transportation and electric grid systems.