# Data-driven Mathematical Modeling of Cancer

A significant challenge in the mathematical modeling of various cancers is the presence of numerous unknown parameters and the limited number of available data sets. To combat this issue, researchers often make assumptions about parameter values or borrow estimates from related diseases or models. Consequently, many existing mathematical models for cancer consider only a subset of immune cells and assume uniform behavior for all tumors. But given that tumor evolution is intricately tied to specific immune profiles, it is more precise to first identify immune variations within tumors and comprehend their growth mechanisms in the absence of treatments.

To address this challenge, we used digital cytometry methods to estimate the values of key variables—such as immune cell abundances—in mathematical models. These estimated immune fractions are then subject to clustering analysis, which facilitates the identification of tumor groups that have distinct immune patterns (see Figure 1). We proceed to separately estimate the model parameters for each tumor group; some parameter values come from existing literature, while others are derived from available data sources. We conduct sensitivity analyses on the nondimensionalized system to identify the most critical parameters for each tumor group and assess their impacts. Finally, we investigate the dynamics of tumors within each group and compare them with clinical data to uncover the connections between the tumor microenvironment and cancer progression.

### Digital Cytometry with TumorDecon

Tumor heterogeneity is a key challenge in treatment response prediction. Traditional methods for the characterization of tumor composition—like immunohistochemistry, flow cytometry, and mass cytometry—are all resource intensive. Computational techniques for digital cytometry address this problem by estimating cell type fractions from gene expression data [7]. A 2009 study introduced a linear model \(X\beta=y\) to estimate cell type percentages from microarray data [1], and CIBERSORT improved this method in 2015 with support vector regression [13]. Existing score-based models, such as single-sample gene set enrichment analysis and singscore, use gene set enrichments instead of a signature matrix. In 2019, researchers extended CIBERSORT to CIBERSORTx in order to address cross-platform data variations [14].

We built upon the aforementioned methods to develop TumorDecon, which includes our unique signature matrix creation method and data preprocessing techniques [3, 7]. Our signature matrix derivation process involves three main steps: (i) batch effect removal via ComBat, (ii) cell type clustering, and (iii) differential expression analysis. TumorDecon effectively addresses the issue of batch effects, which are caused by variations in lab conditions and can distort gene expression data. We evaluated our deconvolution methods on both simulated and real data sets [7].

### Mechanistic Mathematical Modeling of the Tumor Microenvironment

To understand the complex interactions in the tumor microenvironment, we employ ordinary differential equation (ODE) models that benefit from simulation ease and parameter estimation. We developed a data-driven ODE-based quantitative systems pharmacology model for colon cancer, which revealed the dominant role of immune responses—especially macrophage dynamics—across various patient clusters with distinct immune patterns [6]. Despite the challenge of accurately predicting patient responses to treatments [2], this data-driven ODE model generated reasonably accurate predictions of individual patient responses to FOLFIRI treatments (a specific chemotherapy regimen) based solely on their primary tumor gene expression profiles [4]. It is worth noting that machine learning methods primarily succeed at predicting drug responses in cell lines and patient-derived xenografts, rather than in actual patients [2].

**Figure 1.**Overview of our mathematical model for colon cancer and osteosarcoma. Figure courtesy of [9].

### Parameter Estimation and Sensitivity Analysis

By using available data to assess parameter identifiability, we can determine the extent to which we can accurately estimate the model parameter values based on available measurements. Additionally, global and local sensitivity analyses identify the most influential parameters that significantly impact model predictions [10, 12]. This information helps us focus on the parameters that most substantially affect system dynamics. We strive for accurate estimates of the values of the most sensitive parameters by referring to published studies and existing knowledge. As we iteratively refine our model, we aim to develop a more accurate and reliable representation of the tumor microenvironment and its dynamics.

### Spatiotemporal Mathematical Modeling

We have also expanded our ODE breast tumor model [11] to incorporate spatial interactions [12]. Our unique approach combines reaction-diffusion-advection (RDA) partial differential equations (PDEs) with ODEs to ultimately enable the comprehensive investigation of the immune spatiotemporal patterns that encompass a vast reaction network and free boundary growth. This mechanistic model explores tumor microenvironment spatiotemporal patterns and their influence on immune cell dynamics.

To contribute to spatiotemporal solid tumor modeling, we utilized a porous medium framework that explores the interaction between fluid dynamics and solid deformation in the context of osteosarcoma [5]. This approach is grounded in Biot theory and aims to comprehend these processes’ interrelation with tumor behavior. Incorporating the Biot equations into the model is crucial for capturing the tissue’s poroelastic behavior. Yet despite the significance of these equations in poroelasticity theory, their complexity prevents most studies from integrating them into their models.

We developed a biomechanical multiphase PDE model with a biological and mechanical component that describes the dynamics of the primary tumor microenvironment. The biological aspect is an extension of our ODE model [8] to a system of RDA equations, while the mechanical element is derived from first principles and follows the same argument that leads to the Biot equations; the coupling terms in this part consist of the fluid velocity that is associated with convection. We use the mechanical portion to model the fluid motion, which carries various types of cells and cytokines. This motion is an essential part of the continuity equation because it signifies fluid flow through porous media. The coupling term on the biological side is the additional strain of the solid tumor that is caused by an increase or decrease of cells. We adopt the RDA equations by adding the diffusion term \(D_\textrm{cell}\Delta[X_i]\) and the advection term \(b\nabla \cdot ([X_i] \kappa \nabla p)\) to the ODEs of each cell type. Consequently, we can write the equations in the following vector form:

\[\frac{\partial [\mathbf{X}]}{\partial t} - D\Delta [\mathbf{X}] + B\nabla \cdot ([\mathbf{X}] \kappa\nabla p) = \mathbf{f},\]

where \(D = \mathrm{diag}\{D_i\}\) and \(B = \mathrm{diag}\{b_i\}\) are diagonal coefficient matrices.

We utilize the continuity equation that generally takes the form \(\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{v}) = f\), where \(\rho\) is the density of the matter of interest, \(\mathbf{v}\) is the fluid velocity, and \(f\) is the external source. In the context of cancer modeling, \(\mathbf{v}\) represents the continuous motion of cells within the tumor. Specifically, we treat \(\mathbf{v}\) as the velocity of the fluids that flow through porous media and apply Darcy’s law, thus treating the tumor tissue as a porous medium and assuming that the flow carries the moving cells. In our porous media model for osteosarcoma [5], we took advantage of the full poroelasticity equations that cover Darcy’s law but still maintained a variable that describes the movement of the solid body (i.e., the tumor). We therefore add the description of fluid flow into the theory of porous media and obtain a source for the solid/mesh movement.

### Discussion

Data-driven mechanistic models allow us to simulate a wide range of scenarios to predict the reaction of cancer cells to various treatment strategies and help oncologists make more informed treatment decisions. This capability also paves the way for personalized therapies that suit the needs of individual patients and enhance the effectiveness of cancer treatments. Such a framework can even serve as a platform for the exploration and virtual testing of new therapies — an aspect that is particularly exciting because it can potentially accelerate the development of new cancer treatments. By enabling the rapid and efficient testing of new therapeutic strategies, data-driven models could help advance cancer treatment and improve outcomes for patients around the world. These approaches embody the cutting edge of cancer research and are a significant step forward in our fight against all of the complex manifestations of this disease.

Here, we introduced a computational framework that creates data-driven mathematical models for the tumor microenvironment and leverages gene expression profiles to estimate the model parameters. Unlike traditional models, this approach captures the complex nature of cancer by considering the diversity of cancer cells and their unique genetic alterations (as encoded by gene expression profiles). It also examines the dynamic interactions within the tumor microenvironment, including the interplay between cancer cells and immune cells. This type of holistic view is essential for understanding the intricate workings of cancer.

However, it is also important to acknowledge the potential limitations of our framework. Although it represents an advancement in cancer modeling, the model’s accuracy is inherently dependent on the quality and comprehensiveness of the input data — particularly the gene expression profiles. Gaps in data or variations in data quality can affect the model’s predictions. And while the model does acknowledge a wide range of cancer characteristics, biological factors that are as yet unaccounted for could still influence its accuracy. Lastly, translating these virtual predictions into clinical settings involves complex regulatory and ethical considerations that may complicate the practical application of this technology.

Though its limitations are worth nothing, our framework nonetheless provides a small yet hopefully meaningful step towards more personalized cancer treatments and introduces an informatics tool that can help practitioners navigate the complexities of this disease.

*Leili Shahriyari delivered a minisymposium presentation on this research at the 10th International Congress on Industrial and Applied Mathematics (ICIAM 2023), which took place in Tokyo, Japan, last year. She received funding to attend ICIAM 2023 through a SIAM Travel Award that was supported by U.S. National Science Foundation grant DMS-2233032. To learn more about SIAM Travel Awards and submit an application, visit the online page.*

**References**

[1] Abbas, A.R., Wolslegel, K., Seshasayee, D., Modrusan, Z., & Clark, H.F. (2009). Deconvolution of blood microarray data identifies cellular activation patterns in systemic lupus erythematosus. *PLoS One*, *4*(7), e6098.

[2] Adam, G., Rampášek, L., Safikhani, Z., Smirnov, P., Haibe-Kains, B., & Goldenberg, A. (2020). Machine learning approaches to drug response prediction: Challenges and recent progress. *NPJ Precis. Oncol.*, *4*, 19.

[3] Aronow, R.A., Akbarinejad, S., Le, T., Su, S., & Shahriyari, L. (2022). TumorDecon: A digital cytometry software. *SoftwareX*, *18*, 101072.

[4] Budithi, A., Su, S., Kirshtein, A., & Shahriyari, L. (2021). Data driven mathematical model of FOLFIRI treatment for colon cancer. *Cancers*, *13*(11), 2632.

[5] Hu, Y., Mirzaei, N.M., & Shahriyari, L. (2022). Bio-mechanical model of osteosarcoma tumor microenvironment: A porous media approach. *Cancers*, *14*(24), 6143.

[6] Kirshtein, A., Akbarinejad, S., Hao, W., Le, T., Su, S., Aronow, R.A., & Shahriyari, L. (2020). Data driven mathematical model of colon cancer progression. *J. Clin. Med.*, *9*(12), 3947.

[7] Le, T., Aronow, R.A., Kirshtein, A., & Shahriyari, L. (2020). A review of digital cytometry methods: Estimating the relative abundance of cell types in a bulk of cells. *Brief. Bioinform.*, *22*(4), bbaa219.

[8] Le, T., Su, S., Kirshtein, A., & Shahriyari, L. (2021). Data-driven mathematical model of osteosarcoma. *Cancers*, *13*(10), 2367.

[9] Le, T., Su, S., & Shahriyari, L. (2021). Investigating optimal chemotherapy options for osteosarcoma patients through a mathematical model. *Cells*, *10*(8), 2009.

[10] Mirzaei, N.M., Hao, W., & Shahriyari, L. (2023). Investigating the spatial interaction of immune cells in colon cancer. *iScience*, *26*(5), 106596.

[11] Mirzaei, N.M., Su, S., Sofia, D., Hegarty, M., Abdel-Rahman, M.H., Asadpoure, A., … Shahriyari, L. (2021). A mathematical model of breast tumor progression based on immune infiltration. *J. Pers. Med.*, *11*(10), 1031.

[12] Mirzaei, N.M., Tatarova, Z., Hao, W., Changizi, N., Asadpoure, A., Zervantonakis, I.K., … Shahriyari, L. (2022). A PDE model of breast tumor progression in MMTV-PyMT mice. *J. Pers. Med.*, *12*(5), 807.

[13] Newman, A.M., Liu, C.L., Green, M.R., Gentles, A.J., Feng, W., Xu, Y. … Alizadeh, A.A. (2015). Robust enumeration of cell subsets from tissue expression profiles. *Nat. Methods*, *12*(5), 453-457.

[14] Newman, A.M., Steen, C.B., Liu, C.L., Gentles, A.J., Chaudhuri, A.A., Scherer, F., … Alizadeh, A.A. (2019). Determining cell type abundance and expression from bulk tissues with digital cytometry. *Nat. Biotechnol.*, *37*(7), 773-782.