| March 14, 2023

Identifiability and Sensitivity Analysis for Bayesian Parameter Estimation in Systems Biology

By Nathaniel Linden, Padmini Rangamani, and Boris Kramer

An ongoing challenge in systems biology—the field of computational biology that employs mathematical modeling to study signal transduction in cells—is the estimation of model parameters to constrain a model against experimental data [9]. Despite recent advances in experimental technologies, direct measurement of these parameters remains difficult. Researchers throughout the computational sciences have developed many parameter estimation methods, including nonlinear optimization and Bayesian inference [6, 7], but these methods are not yet widespread in systems biology.

Figure 1. A comprehensive framework for parameter estimation and uncertainty quantification in systems biology. Figure adapted from [4].

Why is parameter estimation so challenging in systems biology? Models are based on rate laws that describe biochemical reaction kinetics; in many cases, unknown system compositions or reaction mechanisms lead to uncertainty in a model formulation. Furthermore, researchers often approximate complex biochemical reactions with simplified rate laws—such as Michaelis-Menten kinetics and Hill equations—which contain rational nonlinearities and introduce multiple unknown parameters in each state equation. The strong nonlinearity in the states and parameters, as well as the large parameter count relative to the number of states, collectively result in parameter spaces that are difficult to explore with estimation algorithms. In conjunction with these modeling challenges, biological experiments are time-consuming, costly, and limited by available technologies. Experimental data therefore often include only a few time points (sparsity in time), measure a subset of the states (incompleteness), and are corrupted by various noise sources. Many different parameter sets will yield model predictions that fit this uncertain data equally well.

We recently developed a comprehensive framework for parameter estimation and uncertainty quantification of systems biology models (see Figure 1) [4]. We assumed that the models consist of ordinary differential equations

\[\dot{\mathbf{x}}(t) = f(\mathbf{x}(t); \boldsymbol{\theta}),\]

where \(\mathbf{x} \in ^d\) is the state vector of biochemical concentrations, \(\boldsymbol{\theta} \in ^p\) are the model parameters, and \(f(\cdot,\cdot)\) is the possibly nonlinear function of biochemical rate laws.

The first question is thus as follows: Which parameters can be uniquely estimated from the available data? Nonlinear models and incomplete data often complicate parameter estimation because parameters can be nonidentifiable — infinitely many values of a nonidentifiable parameter lead to the same model outputs [10]. Nonidentifiable parameters may then result in flat loss functions and probability distributions that are hard to optimize or sample with estimation algorithms. In addition, parameters that weakly influence model outputs are difficult to estimate because perturbations to such parameters cause small changes in the value of a loss function or probability distribution. To exclude nonidentifiable and noninfluential parameters from estimation, we applied structural global identifiability analysis [3] and Sobol’s variance-based global sensitivity analysis [7] (see step 1 in Figure 1).

We chose to use Bayesian inference for parameter estimation rather than an optimization-based approach because Bayesian methods provide a complete accounting of uncertainties via probability distributions [6, 7]. Consequently, we aimed to learn the probability densities for the random variables that model the parameters \(\boldsymbol{\theta}\). Bayesian inference leverages Bayes’ rule

\[p(\boldsymbol{\theta}|\mathcal{Y}) = \frac{p(\boldsymbol{\theta})p(\mathcal{Y}|\boldsymbol{\theta})}{p(\mathcal{Y})}\]

to find the posterior distribution \(p(\boldsymbol{\theta}|\mathcal{Y})\), which is the probability distribution of the parameters given the available data \(\mathcal{Y}\). Before incorporating data, the prior distribution \(p(\theta)\) encodes our assumptions about the model parameters. The normalization constant \(p(\mathcal{Y})\) is often omitted, and equality is replaced with proportionality (see step 2 in Figure 1). The likelihood function \(p(\mathcal{Y}|\boldsymbol{\theta})\) measures the probability that a model will predict the given data \(\mathcal{Y}\) by simulating the model with the parameters \(\boldsymbol{\theta}\) and evaluating the data mismatch. We adapted the constrained interval unscented Kalman filter Markov chain Monte Carlo (CIUKF-MCMC) method to approximate the likelihood function for uncertain model formulations [1] and used the CIUKF [8] to ensure that the states—i.e., concentrations of biochemical species—remain nonnegative. We then utilized the affine-invariant ensemble sampler (AIES) [2] for MCMC sampling from the resulting posterior distribution, since AIES is well suited for parameter spaces with widely varying scales (see step 3 in Figure 1).

Figure 2. Marginal distributions for the four influential and identifiable parameters of the mitogen-activated protein kinase (MAPK) model in the bistable regime. The gray distributions correspond to results from the estimation of only these parameters; the blue distributions correspond to results from the estimation of all structurally globally identifiable parameters; and the green distributions correspond to results from the estimation of all model parameters. Figure adapted from [4].

To test our framework, we applied it to estimate the parameters of a model of the mitogen-activated protein kinase (MAPK) signaling pathway [5]. This pathway comprises a series of biochemical reactions responsible for various biological processes and implicated in several diseases. The MAPK model represents the dynamics of three biochemical species in the MAPK pathway via the equations

\[\frac{\textrm {d} x_1(t)}{\textrm {d}t} = k_1 \cdot (S_{1t} - x_1(t)) \cdot \left[\frac{K_1^{n_1}}{K_1^{n_1} + x_3(t)^{n_1}}\right] - k_2 \cdot x_1(t)\]

\[\frac{\textrm {d} x_2(t)}{\textrm {d}t} = k_3 \cdot (S_{2t} - x_2(t)) \cdot x_1(t) \cdot \left[1 + \frac{\alpha \cdot x_3(t)^{n_2}}{K_2^{n_2} + x_3(t)^{n_2}}\right] - k_4 \cdot x_2(t) \]

\[\frac{\textrm {d} x_3(t)}{\textrm {d}t} = k_5 \cdot (S_{3t} - x_3(t)) \cdot x_2(t) - k_6 \cdot x_3(t).\]

It incorporates 14 model parameters: \(\boldsymbol{\theta}_f = [k_1, k_2, k_3, k_4, k_5, k_6, K_1, K_2, S_{1t}, S_{2t}, S_{3t}, \alpha, n_1, n_2]^\top\). This model displays multiple stability features under different combinations of parameters and initial conditions, including limit cycle oscillations and bistability.

How does failing to eliminate nonidentifiable parameters affect parameter estimation? To answer this question, we performed a set of estimation experiments with the MAPK model that focused on the bistable regime and the influential parameters: \(k_2\), \(k_4\), \(k_5\), and \(k_6\). We repeated parameter estimation for the complete set of 12 model parameters (excluding \(n_1\) and \(n_2\)), the partially reduced set of 10 parameters after performing identifiability analysis, and the fully reduced set after performing identifiability and sensitivity analyses. We found that identifiability and sensitivity analyses are crucial when estimating the parameters with high certainty (see the marginal distributions in Figure 2).

Next, we estimated the parameters that yield oscillatory dynamics given the noisy data in Figure 3b, focusing on the identifiable and influential parameters: \(k_2\), \(k_4\), \(k_6\), and \(\alpha\). The resulting marginal distributions for the four parameters in Figure 3a demonstrate that we can estimate each parameter with relatively high certainty. The green 95-percent credible interval in Figure 3b indicates that 95 percent of model simulations in an ensemble of 30,000 simulations closely match the actual trajectory.

Figure 3. Parameter estimation results for the mitogen-activated protein kinase (MAPK) model when using the constrained interval unscented Kalman filter Markov chain Monte Carlo (CIUKF-MCMC) method on the reduced model parameters. 3a. Marginal posterior distributions. 3b. Ensemble simulation with 30,000 random posterior samples. Figure adapted from [4].

Why did we only provide data that were sampled from the limit cycle? The common thought in parameter estimation is that including more data reduces estimation uncertainty. But for the MAPK oscillations, we found that adding more data points that were sampled from the initial transient (first 30 minutes) led to greater uncertainty in parameter estimates. Specifically, we had an approximately 90 percent chance of predicting a limit cycle oscillation with the original 15 data points that were only sampled from the oscillations. However, the probability of predicting a limit cycle dropped to roughly 45 percent when we used 30 data points that were sampled from both the initial transient and oscillations. We hypothesize that data quality—in this case, focused sampling of the oscillations—is more important in this scenario than data quantity.

In summary, the proposed framework enables parameter estimation and uncertainty quantification for systems biology models in the face of nonlinearity, nonidentifiability, limited and noisy data, and uncertain model formulations. Although the omission of nonidentifiable and noninfluential parameters was crucial to successful estimation, we believe that future efforts should focus on estimation methods that can handle nonidentifiable parameters. Additionally, we are actively working to apply our framework to models with high-dimensional parameter spaces and data from in vitro live cell imaging experiments. Ultimately, this work highlights an opportunity for computational science and mathematics to transform systems biology.

Nathaniel Linden delivered a minisymposium presentation on this research at the 2022 SIAM Conference on Mathematics of Data Science (MDS22), which took place in San Diego, Ca., last year. He received funding to attend MDS22 through a SIAM Student Travel Award. To learn more about Student Travel Awards and submit an application, visit the online page.

References
[1] Galioto, N., & Gorodetsky, A.A. (2020). Bayesian system ID: Optimal management of parameter, model, and measurement uncertainty. Nonlinear Dyn., 102, 241-267.
[2] Goodman, J., & Weare, J. (2010). Ensemble samplers with affine invariance. Comm. Appl. Math. Comput. Sci., 5(1), 65-80.
[3] Hong, H., Ovchinnikov, A., Pogudin, G., & Yap, C. (2019). SIAN: Software for structural identifiability analysis of ODE models. Bioinformatics, 35(16), 2873-2874.
[4] Linden, N.J., Kramer, B., & Rangamani, P. (2022). Bayesian parameter estimation for dynamical models in systems biology. PLoS Comput. Biol., 18(10), e1010651.
[5] Nguyen, L.K., Degasperi, A., Cotter, P., & Kholodenko, B.N. (2015). DYVIPAC: An integrated analysis and visualisation framework to probe multi-dimensional biological networks. Sci. Rep., 5, 12569.
[6] Oden, J.T., Babuška, I., & Faghihi, D. (2017). Predictive computational science: Computer predictions in the presence of uncertainty. In Encyclopedia of computational mechanics (2nd ed.) (pp. 1-26). New York, NY: John Wiley & Sons, Ltd.
[7] Smith, R.C. (2013). Uncertainty quantification: Theory, implementation, and applications. In Computational Science and Engineering Series (Vol. 12). Philadelphia, PA: Society for Industrial and Applied Mathematics.
[8] Teixeira, B., Torres, L.A.B., Aguirre, L.A., & Bernstein, D.S. (2008). Unscented filtering for interval-constrained nonlinear systems. In Proceedings of the 47th IEEE Conference on Decision and Control (CDC 2008) (pp. 5116-5121). Cancún, México: Institute of Electrical and Electronics Engineers.
[9] Vittadello, S.T., & Stumpf, M.P.H. (2022). Open problems in mathematical biology. Math. Biosci., 354, 108926.
[10] Wieland, F.-G., Hauber, A.L., Rosenblatt, M., Tönsing, C., & Timmer, J. (2021). On structural and practical identifiability. Curr. Opin. Syst. Biol., 25, 60-69.

	Nathaniel Linden is a Ph.D. student in the Department of Mechanical and Aerospace Engineering at the University of California, San Diego. His research involves developing and applying computational methods for model calibration and uncertainty quantification in systems biology.
	Padmini Rangamani is a professor of mechanical and aerospace engineering at the University of California, San Diego. Her research uses mathematical and computational methods to study the complex interplay between cellular biology, mechanics, and biochemistry.
	Boris Kramer is an assistant professor of mechanical and aerospace engineering at the University of California, San Diego. His research develops reduced-order models to enable control, design, optimization, and uncertainty quantification of complex and large-scale systems.