SIAM News Blog

The Forensics of Emerging Diseases

By Matthew R. Francis

New diseases have always been part of humanity’s world, but today’s highly interconnected global culture facilitates the worldwide spread of epidemics more quickly than ever. To make matters worse, novel zoonotic diseases—brought about by pathogens that jump from animals to humans—pose an ongoing threat because of human practices that bring us into contact with animal species with which we do not generally interact. One such disease is currently running amok across a large part of the world: COVID-19, caused by the previously-unknown severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

When a novel infection emerges, researchers face a wide range of challenges, in addition to the obvious medical crises. Two queries are central to these challenges: Where did the disease come from and how bad will it be? Answering these questions requires a combination of detective work, sophisticated mathematics, on-the-ground ecological knowledge, and machine learning.

“It’s hard to know what we’re preparing for,” Michael Johansson, a biologist at the National Center for Emerging and Zoonotic Infectious Diseases (part of the Centers for Disease Control and Prevention), said. During his talk at the 2020 American Association for the Advancement of Science (AAAS) Annual Meeting, Johansson likened disease forecasting to hurricane prediction. In both cases, deterministic mathematical models—which require constantly-updated data—can only go so far before necessitating supplementation by other methods.

How to Be a Good (Disease) Host

Barbara Han is an ecologist at the Cary Institute of Ecosystem Studies, and her work largely focuses on identification of animal species that harbor novel diseases. During her presentation at the 2020 AAAS Annual Meeting, Han pointed out that most emerging pathogens come from mammals. In fact, over 190 identified unique zoonotic diseases have been associated with mammal species (see Figure 1).

“The primary goal is to predict which species would give rise to new zoonotic threats to humans,” Han said. “You can look at which rodents, bats, carnivores, and so forth have a high risk of transmitting something to humans, then examine their features to try to understand what separates those with the capacity to infect humans from those without it.” 

Figure 1. Prevalence of known zoonotic pathogens and their distribution among rodent species. Most classified pathogens are viruses and most species only harbor one type that is known to be harmful to humans. Figure courtesy of [2].

One complication is that reservoir species (rats, bats, monkeys, etc.) that harbor the pathogens often do not infect us directly. For instance, humans are infected with Zika virus via Aedes mosquito bites, but scientists think that the reservoir species is a type of monkey, thus making the mosquitoes disease vectors rather than reservoirs. In addition, the vector animals themselves may not be susceptible to zoonotic diseases; this means that they might harbor the pathogens and pass them to each other without getting sick, regardless of human vulnerability.

Identification of reservoir species has faced its share of successes and failures. For example, researchers have linked rodents to both Lyme disease and the plague and implicated bats in diseases such as rabies, the Nipah virus, and possibly Ebola. We know that monkeys are probably reservoirs for the Zika virus and bats almost certainly harbored the coronavirus that caused the 2003 SARS epidemic. However, the evidence is circumstantial.

Han and her colleagues identified five primary predictive factors that make species potential reservoirs for zoonotic pathogens: litter size \((\theta_1)\), the number of litters each species has per year \((\theta_2)\), maximum longevity \((\theta_3)\), population density \((\theta_4)\),  and social group size \((\theta_5)\). Physical characteristics like body size correlate with many of these properties; for instance, small rodents have many tiny offspring and often live in very high-density conditions that facilitate transmission of pathogens between animals.

During her AAAS talk, Han described these parameters’ use in a modified version of the classic SIR model for epidemics:

\[ \begin{aligned}
\frac{dS}{dt} &= b_0N-b_1N^2-\beta SI-\mu S, \\
\frac{dI}{dt} &= \beta SI- (\gamma +\mu)I, \\
\frac{dR}{dt} &= \gamma I - \mu R,
\end{aligned} \]

where \(N(t)=S+I+R\) is the total local population of a rodent species and \(S\), \(I\), and \(R\) respectively denote the number of susceptible, infected, and recovered animals. The fixed parameters follow from the species characteristics: birth rate \(b_0= \theta_1 \theta_2\), death rate \(\mu = \theta_3^{-1}\), and population density parameter \(b_1=(b_0- \mu)/ \theta_4\). The variable parameters are the recovery rate \(\gamma\), bounded through its inverse by \(1/2< \gamma^{-1}< \theta_3\), and the transmission probability \(\beta\), bounded by \(0.1\theta_5<\beta<\theta_5\); one can adjust both to suit the specific population under consideration. Setting \(b_0=b_1=\mu=0\) recovers the standard SIR model, in which the population \(N\) is constant.
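As a rough illustration of how the trait-derived parameters drive the model, the following Python sketch integrates the three equations with a fixed-step Runge-Kutta scheme. The trait values, the per-year time unit, the initial conditions, and the function names (`make_rates`, `derivatives`, `simulate`) are all assumptions made for this example and do not come from Han's work. Adding the three equations gives \(dN/dt=(b_0-\mu)N-b_1N^2\), so with \(b_1=(b_0-\mu)/\theta_4\) the total population should settle at \(\theta_4\), which the simulation lets one check.

```python
# Sketch of the modified SIR model. All trait values are invented for
# illustration, and per-year units are an assumption.

def make_rates(theta1, theta2, theta3, theta4):
    """Fixed parameters derived from species traits, as defined in the text."""
    b0 = theta1 * theta2        # birth rate: litter size times litters per year
    mu = 1.0 / theta3           # death rate: inverse of maximum longevity
    b1 = (b0 - mu) / theta4     # density-dependence parameter
    return b0, b1, mu


def derivatives(state, b0, b1, mu, beta, gamma):
    """Right-hand side of the three coupled ODEs."""
    S, I, R = state
    N = S + I + R
    dS = b0 * N - b1 * N ** 2 - beta * S * I - mu * S
    dI = beta * S * I - (gamma + mu) * I
    dR = gamma * I - mu * R
    return dS, dI, dR


def simulate(state, params, dt=0.001, years=20.0):
    """Integrate with a fixed-step fourth-order Runge-Kutta scheme."""
    def shifted(base, slope, h):
        return tuple(x + h * k for x, k in zip(base, slope))

    for _ in range(int(round(years / dt))):
        k1 = derivatives(state, *params)
        k2 = derivatives(shifted(state, k1, dt / 2), *params)
        k3 = derivatives(shifted(state, k2, dt / 2), *params)
        k4 = derivatives(shifted(state, k3, dt), *params)
        state = tuple(
            x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state


# Hypothetical rodent: 5 pups per litter, 3 litters per year, 2-year longevity,
# density parameter 100, social group size 2.
theta1, theta2, theta3, theta4, theta5 = 5.0, 3.0, 2.0, 100.0, 2.0
b0, b1, mu = make_rates(theta1, theta2, theta3, theta4)
beta = 0.5 * theta5    # chosen inside 0.1*theta5 < beta < theta5
gamma = 1.0            # 1/gamma = 1 lies inside 1/2 < 1/gamma < theta3

S_end, I_end, R_end = simulate((49.0, 1.0, 0.0), (b0, b1, mu, beta, gamma))
print("S, I, R after 20 years:", S_end, I_end, R_end)
print("total population:", S_end + I_end + R_end)
```

With these made-up numbers the infection persists: setting \(dI/dt=0\) with \(I\neq 0\) gives an endemic susceptible level of \((\gamma+\mu)/\beta\), which the long-run value of `S_end` can be compared against.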

In addition to species characteristics, the frequency of contact between humans and animals is a major factor. “You do not have a transmission event unless there is some opportunity for that pathogen to get to you,” Han said. “When humans encroach on wild lands, or prevent animals from accessing food resources so they have to forage in different places, these things all contribute to increased contact frequency with people.”

Changes in human behavior—including deforestation, widespread urbanization, disruption of traditional hunting patterns, and so on—can also introduce contact with species to which we are unaccustomed. These alterations to human-animal interactions mean that people hunt animals they typically did not in the past, or share living space with novel species. To put it another way: while bats may harbor SARS-CoV-2, a cascade of human behaviors brought our species into collision and transformed limited exposure into a global pandemic.

The Hole in the Doughnut

However, scientists can seldom immediately link a zoonotic outbreak to contact with another species, particularly when a vector or other intermediary species is involved — as with the 2003 SARS outbreak, where a civet was likely the carrier that infected the first humans. “We’re still clambering for information on primates,” Han said, referring to her own work to identify the source of the recent devastating Zika epidemic in the Americas. “Without understanding the reservoir’s biology, how can you make an accurate prediction that’s going to help you take preventative action?” 

When a paucity of information exists about a species, epidemiologists become detectives. To identify potential reservoirs for the Zika virus, Han and her collaborators tabulated 33 parameters for 364 primate species (not counting humans). These traits included those used for rodents but also comprised geographic range, metabolic rate, and other potentially useful information. At least one parameter was unknown for over a third of the primates, so the group applied an iterative method called multiple imputation by chained equations (MICE) to estimate values for these species. The relationship between parameters is not entirely random within species (for example, small species do not normally produce multiple large offspring), and similar species often possess similar characteristics. The MICE method exploits this structure by using regression to fill in the missing information in a biologically consistent manner [1].
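The core idea of chained-equation imputation can be sketched in a few lines of Python. This is a deliberately simplified, deterministic version and not the procedure used in [1]: it starts from column means, then cycles through the incomplete columns and regresses each one on all the others, whereas full MICE also adds random draws and produces several completed data sets. The function names (`solve`, `fit_linear`, `chained_imputation`) and the small trait table are invented for illustration.

```python
def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def fit_linear(rows, targets, ridge=1e-9):
    """Least-squares coefficients (intercept first) via the normal equations.

    The tiny ridge term only guards against a singular system.
    """
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    AtA = [[sum(x[i] * x[j] for x in X) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Atb = [sum(x[i] * t for x, t in zip(X, targets)) for i in range(p)]
    return solve(AtA, Atb)


def others(row, j):
    """All entries of a row except column j."""
    return [v for k, v in enumerate(row) if k != j]


def chained_imputation(data, sweeps=10):
    """Fill None entries by cycling regressions over the incomplete columns."""
    n_cols = len(data[0])
    missing = [[v is None for v in row] for row in data]
    filled = [list(row) for row in data]
    # Initial guess: the mean of the observed values in each column.
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = mean
    for _ in range(sweeps):
        for j in range(n_cols):
            if not any(m[j] for m in missing):
                continue
            # Fit on rows where column j was observed, using current guesses
            # for any missing predictor values.
            train = [r for r, m in zip(filled, missing) if not m[j]]
            w = fit_linear([others(r, j) for r in train], [r[j] for r in train])
            for r, m in zip(filled, missing):
                if m[j]:
                    r[j] = w[0] + sum(c * v for c, v in zip(w[1:], others(r, j)))
    return filled


# Invented trait table: columns might be body mass, litter size, longevity.
traits = [
    [3.0, 6.0, 2.0],
    [3.5, 5.0, None],
    [5.2, 3.0, 6.0],
    [None, 2.0, 12.0],
    [8.1, 1.0, 25.0],
    [6.0, None, 9.0],
    [4.1, 4.0, 4.0],
]
completed = chained_imputation(traits)
for row in completed:
    print([round(v, 2) for v in row])
```

Observed entries are left untouched; only the `None` cells are overwritten, and each sweep refines them using the latest estimates of the other columns.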

But species parameters are not enough; the hole in the doughnut inside the other doughnut is whether a given species is the reservoir for a zoonotic disease. To estimate this probability, Han and her team utilized Bayesian Multi-label Learning via Positive Labels (BMLPL) [3]. This method assigns binary labels to each species, specifically asking whether animals carry a given virus. The answer might be known for certain species, such as monkeys that have tested positive for the Zika or dengue virus. The group’s analysis involved six flaviviruses (the type including Zika, dengue, and West Nile) and the aforementioned 364 types of primates.

The training data for BMLPL are a matrix \(\mathbf{X}\) composed of the known parameters and those obtained via MICE. The “label” matrix \(\mathbf{Y}\) consists of 0 and 1 values for each of the six viruses under consideration, depending on whether a given species carries a particular virus. Since this information is unknown for most of the 364 species, \(\mathbf{Y}\) is a very sparse matrix. However, related species are more likely to harbor similar viruses, so the entries are not random in general. The BMLPL method takes the parameter vector \(\mathbf{x}\) for a species from \(\mathbf{X}\) and uses machine learning to predict that species’ label vector \(\mathbf{y}\), thereby filling in the unknown entries of \(\mathbf{Y}\).
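To make the layout of \(\mathbf{X}\) and \(\mathbf{Y}\) concrete, here is a small Python sketch with positive-only labels. It does not implement BMLPL; as a stand-in, it scores each unknown species-virus pair with a similarity-weighted vote over the other species, which only captures the intuition that species with similar traits tend to share viruses. The data, the Gaussian similarity, and the names `similarity` and `score_labels` are assumptions made for this example.

```python
import math


def similarity(a, b):
    """Gaussian similarity between two trait vectors (an arbitrary choice)."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)))


def score_labels(X, Y):
    """Score every species-virus pair in [0, 1].

    Y[s][v] == 1 means species s is a confirmed carrier of virus v.
    Y[s][v] == 0 means "unknown", not "confirmed negative".
    This weighted vote is a stand-in for illustration, NOT the BMLPL model.
    """
    scores = []
    for s, x in enumerate(X):
        weights = [similarity(x, other) if t != s else 0.0
                   for t, other in enumerate(X)]
        total = sum(weights)
        row = []
        for v in range(len(Y[0])):
            if Y[s][v] == 1:
                row.append(1.0)  # keep confirmed positives as they are
            else:
                vote = sum(w * Y[t][v] for t, w in enumerate(weights))
                row.append(vote / total)
        scores.append(row)
    return scores


# Invented toy data: 5 species, 2 rescaled traits, 3 viruses.
X = [
    [0.0, 0.0],
    [0.1, 0.0],
    [1.0, 1.0],
    [1.1, 0.9],
    [0.05, 0.1],
]
Y = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
]
scores = score_labels(X, Y)
for species, row in enumerate(scores):
    print(species, [round(p, 3) for p in row])
```

In this toy table the last species sits close to the first one in trait space, so it receives its highest score for the first virus even though its own row of \(\mathbf{Y}\) is empty.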

These coupled methods allowed Han and her colleagues to identify primate reservoirs for flaviviruses with 82 percent accuracy. They also found that these reservoirs were the most likely to live near or among humans. However, Han pointed out that reconstructing missing parameters can only go so far. “I think we are quickly heading towards a wall of just not having the kind of available data that we need,” she said. “There must be equal investment in generating the raw data required for useful, actionable, and accurate predictions. Without that, we cannot move forward.”

This is certainly a common refrain in the context of novel diseases, wherein we are hampered by the fact that obtaining data from noisy, real-world situations takes time. It is not enough to say that bats are the reservoir for a disease, or that Aedes mosquitoes carry Zika virus. We must know what specific human-animal interactions turn zoonotic infections into epidemics and learn how to mitigate them. Even the best detectives can only do so much; they need evidence in order to work.


References
[1] Han, B.A., Majumdar, S., Calmon, F.P., Glicksberg, B.S., Horesh, R., Kumar, A., …, Varshney, K.R. (2019). Confronting data sparsity to identify potential sources of Zika virus spillover infection among primates. Epidem., 27, 59. 
[2] Han, B.A., Schmidt, J.P., Bowden, S.E., & Drake, J.M. (2015). Rodent reservoirs of future zoonotic diseases. Proc. Natl. Acad. Sci., 112, 7039.
[3] Rai, P., Hu, C., Henao, R., & Carin, L. (2015). Large-scale Bayesian multi-label learning via topic-based label embeddings. In C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems 28 (NIPS 2015) (pp. 3222-3230). Montreal, Canada: Curran Associates, Inc.

Further Reading
Quammen, D. (2012). Spillover: Animal infections and the next human pandemic. New York, NY: W.W. Norton & Company.

Matthew R. Francis is a physicist, science writer, public speaker, educator, and frequent wearer of jaunty hats. His website is BowlerHatScience.org.
