June 12, 2019

Bridging the Gap between Cancer Risk-factor Prevalence and Cancer Incidence

Cancer is a potentially devastating disease that severely impacts individual patients’ lives and the healthcare system as a whole. It is difficult to study risk factors for the disease because cancer is relatively rare and is diagnosed years or decades after initial exposures; researchers seldom have high-quality, individual-level exposure data linked to cancer outcomes. For example, while we would like to know the average number of cigarettes an individual smoked per day at every age, or the timing and dose of every exposure to radiation, data like these are hard, if not impossible, to collect. Instead, we are more likely to have cross-sectional, population-level information, such as the percentage of people smoking every year. If we can use mathematical models to connect this kind of population-level, risk-factor prevalence data to the number of cancers diagnosed, we may be able to better understand the underlying biological mechanisms and dynamics of the disease.

Cancer arises from an accumulation of genetic abnormalities and mutations. Although the exact biological mechanisms of different cancers vary, they usually involve the disabling of tumor suppressor genes (such as p53 or pRB) or the activation of tumor-promoting mechanisms (such as Myc or Ras). The “multi-hit hypothesis” of carcinogenesis suggests that multiple mechanisms must be activated or deactivated for uncontrolled tumor growth to occur. In the 1970s, Alfred Knudson used this hypothesis to analyze the incidence of retinoblastoma, a rare type of eye cancer that affects children under age five. The patterns of cancer diagnosis differed between children who had cancer in both eyes and those who had cancer in only one eye. Many of the children with cancer in both eyes also had a family history of this cancer, suggesting a strong genetic component. Knudson hypothesized that two “hits” to the genome caused cancer, and that children with a family history of the disease had already inherited one hit. The data supported this hypothesis, which predicted the existence of a tumor suppressor gene. Researchers later verified this gene experimentally and named its product the retinoblastoma protein (pRB) in recognition of its role in this cancer.
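To make the one-hit versus two-hit contrast concrete, here is a minimal sketch. It assumes each hit arrives at a constant rate in a single cell lineage and ignores cell numbers and growth, which is a deliberate oversimplification rather than Knudson's actual analysis. The function names and the value of `hit_rate` are hypothetical.

```python
import math

def fraction_undiagnosed_one_hit(age, hit_rate):
    """Fraction with no cancer by `age` when a single hit is enough.

    The waiting time for one hit at a constant rate is exponential.
    """
    return math.exp(-hit_rate * age)

def fraction_undiagnosed_two_hits(age, hit_rate):
    """Fraction with no cancer by `age` when two sequential hits are needed.

    The waiting time for two hits, each at the same constant rate, follows
    an Erlang(2, hit_rate) distribution with survival (1 + kt) * exp(-kt).
    """
    x = hit_rate * age
    return (1.0 + x) * math.exp(-x)

# Hypothetical rate, chosen only to make the difference visible.
hit_rate = 0.5  # hits per year
for age in [0.5, 1.0, 2.0, 4.0]:
    one = fraction_undiagnosed_one_hit(age, hit_rate)
    two = fraction_undiagnosed_two_hits(age, hit_rate)
    print(f"age {age:>3}: one-hit {one:.3f}, two-hit {two:.3f}")
```

Under these assumptions, the one-hit curve falls off immediately, while the two-hit curve stays flat at young ages and then drops; early risk grows roughly linearly with age in the first case and quadratically in the second.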

Scientists have since formalized this “multi-hit” hypothesis into a family of stochastic carcinogenesis models called multistage clonal expansion models, which treat cancer as arising from a series of rare, stochastic events. A number of initiation mutations are needed before cells enter a state of uncontrolled growth called clonal expansion. One or more of these cells may then develop a mutation that causes the cell to become malignant. A schematic of a two-stage clonal expansion model is below.
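As a rough numerical companion to the two-stage model, the sketch below evaluates a closed-form hazard for the time to the first malignant cell. It assumes all rates are constant with age and that normal cells become initiated as a Poisson process. It does not include any lag between the first malignant cell and diagnosis. The parameter names and values are illustrative assumptions and are not taken from the paper.

```python
import math

def tsce_hazard(age, initiation_rate, division_rate, death_rate, malignant_rate):
    """Hazard of the first malignant cell in a two-stage clonal expansion model.

    initiation_rate: total rate at which normal cells become initiated
        (number of normal cells times the per-cell initiation rate).
    division_rate, death_rate, malignant_rate: per-cell rates for
        initiated cells. All parameters are assumed constant in age.
    """
    net_growth = division_rate - death_rate - malignant_rate
    root = math.sqrt(net_growth**2 + 4.0 * division_rate * malignant_rate)
    p = 0.5 * (-net_growth - root)  # negative root
    q = 0.5 * (-net_growth + root)  # positive root
    # Written in terms of exp(-(q - p) * age) so that nothing overflows
    # at large ages; note that q - p equals `root`.
    decay = math.exp(-root * age)
    numerator = p * q * math.expm1(-root * age)  # p*q*(decay - 1) >= 0
    denominator = q - p * decay                  # always positive
    return (initiation_rate / division_rate) * numerator / denominator

# Illustrative parameter values (assumptions, in units of 1/year).
params = dict(initiation_rate=0.1, division_rate=9.0,
              death_rate=8.9, malignant_rate=1e-6)
for age in [20, 40, 60, 80]:
    print(f"age {age}: hazard {tsce_hazard(age, **params):.3e} per year")
```

At young ages this hazard grows linearly, like `initiation_rate * malignant_rate * age`, which is the two-hit behavior. Clonal expansion then accelerates it, and at very old ages it levels off.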

We decided to use these multistage clonal expansion models to bridge the gap between cancer risk-factor prevalence and cancer incidence, with three cancers as examples. We first considered the well-known connection between smoking and lung cancer, and found that we could model lung cancer incidence over time using only known smoking prevalence and a handful of biological parameters, even without individual-level smoking intensity patterns. Next, we examined the role of the bacterium Helicobacter pylori in stomach cancer. H. pylori is found globally in over 50 percent of adults, but infection is usually asymptomatic, which suggests that H. pylori could be part of a normal stomach microbiome. However, it is associated with gastritis and gastric cancer in some people. Because it is usually acquired in childhood and persists unless treated, we assigned a probability of having H. pylori to every birth cohort. Again, we found that known H. pylori prevalence and a few biological parameters were enough to effectively model gastric cancer. In fact, when we tested whether we could estimate H. pylori prevalence from cancer incidence alone, the trends we predicted were very similar to the observed data.
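The birth-cohort idea can be sketched as a prevalence-weighted mixture of two age-specific hazard curves. This is a simplified illustration, not the paper's fitting procedure. It assumes that exposure status is fixed for life, that cancer is rare enough to ignore differences in survival between the groups, and that the two hazard functions are known. The function names, placeholder hazards, and prevalence values are all hypothetical.

```python
def mixed_hazard(age, prevalence, hazard_unexposed, hazard_exposed):
    """Prevalence-weighted hazard for one birth cohort at a given age."""
    return ((1.0 - prevalence) * hazard_unexposed(age)
            + prevalence * hazard_exposed(age))

def rates_in_year(year, ages, prevalence_by_birth_year,
                  hazard_unexposed, hazard_exposed):
    """Age-specific rates in one calendar year.

    Each age group belongs to a different birth cohort (year - age),
    so each gets that cohort's exposure prevalence.
    """
    rates = {}
    for age in ages:
        prevalence = prevalence_by_birth_year[year - age]
        rates[age] = mixed_hazard(age, prevalence,
                                  hazard_unexposed, hazard_exposed)
    return rates

# Placeholder hazards (assumptions): any age-specific hazard could be used,
# for example one from a clonal expansion model.
def placeholder_unexposed(age):
    return 1e-8 * age**2

def placeholder_exposed(age):
    return 5e-8 * age**2

# Hypothetical prevalence that declines in later birth cohorts.
prevalence_by_birth_year = {1920: 0.6, 1940: 0.5, 1960: 0.35, 1980: 0.2}

rates_2000 = rates_in_year(2000, [20, 40, 60, 80], prevalence_by_birth_year,
                           placeholder_unexposed, placeholder_exposed)
for age, rate in rates_2000.items():
    print(f"age {age}: {rate:.3e} per person-year")
```

Because prevalence is attached to the birth cohort rather than the calendar year, a decline in exposure among younger cohorts takes decades to appear in the incidence at older ages.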

Finally, we matched the prevalence of human papillomavirus (HPV) in the oral cavity with oropharyngeal cancer, a type of head and neck cancer that is often caused by HPV. HPV is the most common sexually transmitted infection, and nearly all sexually active adults have it at some point in their lives. Infections are usually asymptomatic and clear within a couple of years. Nevertheless, HPV causes nearly every cervical cancer, 90 percent of other anal and genital cancers, and a large fraction of oropharyngeal cancers. Unlike for smoking and H. pylori, testing for oral HPV presence has only recently become possible, so we have no historical record of oral HPV prevalence. Instead, we used our multistage clonal expansion model to estimate historic HPV prevalence from the incidence of cancer. While enough sources of uncertainty are present that we do not claim to measure this past trend with high accuracy, this is the first look at possible qualitative trends in historical HPV prevalence.
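Running the model backwards can be illustrated with a toy inversion. This is only a sketch of the idea, not the estimation method used in the paper, which the article does not describe in detail. It assumes the observed rate is a simple prevalence-weighted average of a known unexposed rate and a known exposed rate; the function name and all numbers are hypothetical.

```python
def implied_prevalence(observed_rate, unexposed_rate, exposed_rate):
    """Solve observed = (1 - pi) * unexposed + pi * exposed for pi.

    The result is clipped to [0, 1], since noisy rates can push the
    raw solution outside the range of a valid prevalence.
    """
    if exposed_rate == unexposed_rate:
        raise ValueError("exposure must change the rate for prevalence "
                         "to be identifiable")
    pi = (observed_rate - unexposed_rate) / (exposed_rate - unexposed_rate)
    return min(1.0, max(0.0, pi))

# Hypothetical rates per person-year for a single age group.
unexposed_rate = 2e-5
exposed_rate = 2e-4
true_prevalence = 0.3
observed_rate = ((1.0 - true_prevalence) * unexposed_rate
                 + true_prevalence * exposed_rate)

estimate = implied_prevalence(observed_rate, unexposed_rate, exposed_rate)
print(f"recovered prevalence: {estimate:.3f}")
```

The division by the gap between the two rates shows why a strong causal link matters: when exposure barely changes the rate, small errors in observed incidence translate into large errors in the implied prevalence.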

Multistage clonal expansion models offer a way for us to better understand and cross the long temporal gap between exposure to cancer risk factors and cancer incidence. When a strong, causal pathway exists between the exposure and the cancer, we find that we can successfully model cancer incidence with only a few model parameters alongside prevalence information. We can even go backwards and suggest how prevalence may have changed in the past. Ultimately, these tools will help us understand cancer dynamics as well as likely future trends in cancer incidence.

The author presented this research during a minisymposium at the 2019 SIAM Conference on Applications of Dynamical Systems, which took place last month in Snowbird, Utah. His corresponding paper is as follows:

Brouwer, A.F., Eisenberg, M.C., & Meza, R. (2018). Case Studies of Gastric, Lung, and Oral Cancer Connect Etiologic Agent Prevalence to Cancer Incidence. Cancer Res., 78(12), 3386-97.

Andrew Brouwer is a research faculty member in the Department of Epidemiology at the University of Michigan. He has a Ph.D. in applied and interdisciplinary mathematics, an M.S. in environmental science and engineering, and an M.A. in statistics. Andrew’s research is in mathematical and statistical modeling for biology and public health, particularly with regard to models of infectious disease and cancer. Considerations of parameter identifiability and estimation are an underlying theme of his work.