Machine Learning’s Impact on Global Public Health

By Paul Davis

How bad is malaria in sub-Saharan Africa? Whom does malaria disable or kill? Where does it strike? Are its dangers rising or waning? Which threat is worse, malaria or, say, road injuries?

The Global Burden of Disease (GBD) study offers answers to these kinds of questions to help public health officials choose between distributing mosquito nets and installing traffic signals. Through a minitutorial at the 2016 SIAM Annual Meeting, held in Boston, MA, last July, Abraham Flaxman (Institute for Health Metrics and Evaluation at the University of Washington) introduced applied mathematicians to this important health policy tool. Then he led his audience on a deeper, hands-on dive into one of the study’s computationally sophisticated components: verbal autopsies.

For your own version of Flaxman’s overview of the GBD study, go to the GBD Compare page on the Institute for Health Metrics and Evaluation site, or see Figure 1. These so-called tree maps display by country (though the entire world appears in Figure 1) the years of life lost to disability and death from infectious disease (red), noncommunicable disease (blue), and injury (green). Technically, these maps display disability-adjusted life years (DALYs), a construct that combines years of life lost to early death with years lived with disability into a single measure.
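The article does not spell out how a DALY is computed. As a minimal sketch, assuming the standard summary definition (DALYs are years of life lost to early death plus years lived with disability), the arithmetic looks like the following. The function name, its parameters, and the numbers in the example are hypothetical; the real GBD estimates are built per age group, sex, location, and cause, with uncertainty intervals.

```python
def dalys(deaths, life_expectancy_at_death, prevalent_cases, disability_weight):
    """Illustrative DALY arithmetic for one cause in one population.

    Assumed simplification: a single average remaining life expectancy
    for all deaths and a single disability weight (0 = full health,
    1 = equivalent to death) for all prevalent cases.
    """
    years_of_life_lost = deaths * life_expectancy_at_death
    years_lived_with_disability = prevalent_cases * disability_weight
    return years_of_life_lost + years_lived_with_disability


# Hypothetical inputs: 10 deaths, each losing 30 years of expected life,
# plus 100 people living for the year with a condition weighted 0.2.
example_total = dalys(10, 30.0, 100, 0.2)
print(example_total)
```

Because both terms are measured in years, a cause that kills few people but disables many can be compared directly with one that kills many people early.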

Figure 1. This tree map displays the relative contribution worldwide to mortality and disability in 2015 from infectious disease (red), noncommunicable disease (blue), and injury (green). Image credit: Institute for Health Metrics and Evaluation (IHME). GBD Compare. Seattle, WA: IHME, University of Washington, 2016. Available from http://vizhub.healthdata.org/gbd-compare. (Accessed November 15, 2016).

Users of these online tree maps and other visuals can track changes over time for one country, region, or disease; e.g., the peak in deaths and disability due to AIDS in Africa is painfully clear. Or they can track risk factors (matters that societal health policy can affect) like obesity, which is indicated by a high Body Mass Index.

The data underlying these visual displays is an account of who died from which cause. It would be seemingly simple to gather, if you knew who died in the region of interest, say sub-Saharan Africa. And if you could see every death certificate. Assuming none of the death certificate entries were mistaken, perhaps confusing a risk factor (hypertension) with a cause (aneurysm). And presuming each certificate listed but a single cause of death, a supposition any clinician would reject out of hand.

Or perhaps you could succeed by changing tack to extract causes of death from hospital data, assuming it is valid, and somehow extend those causes to the population that died outside of hospitals throughout sub-Saharan Africa.

In the spirit of a tutorial, Flaxman paused at this discouraging juncture to “pair and share,” promoting engagement by letting audience members chat with those around them and come up with their own responses to this data challenge. Once everyone was invested in the problem, he explained verbal autopsies, the process now in use to determine cause of death.

In a verbal autopsy, a trained interviewer asks those who were with the deceased a series of structured questions, such as “Did your father have a fever during his last illness?” From the resulting narrative of the final illness, a health professional with experience in the specified location and culture can identify a cause of death.

Flaxman focuses on the use of machine learning to assign a cause of death in place of that experienced health professional. The balance of his tutorial introduced some tools of the trade via an online computational notebook, before moving from those immediate computational experiences to more subtle questions.
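To make the idea concrete, here is a minimal sketch of assigning a cause of death from yes/no interview answers. It uses a Bernoulli naive Bayes classifier purely as a stand-in: the article does not say which algorithms appeared in Flaxman’s notebook. The question names, the causes, and the four toy records are invented for illustration and are not real verbal autopsy data.

```python
import math
from collections import Counter


def train_naive_bayes(records, alpha=1.0):
    """Fit a Bernoulli naive Bayes model.

    records: list of (answers, cause) pairs, where answers maps a
    question name to True/False. alpha is Laplace smoothing.
    """
    cause_counts = Counter(cause for _, cause in records)
    questions = sorted({q for answers, _ in records for q in answers})
    yes_counts = {cause: Counter() for cause in cause_counts}
    for answers, cause in records:
        for q in questions:
            if answers.get(q, False):
                yes_counts[cause][q] += 1
    total = len(records)
    model = {"questions": questions, "log_prior": {}, "p_yes": {}}
    for cause, n in cause_counts.items():
        model["log_prior"][cause] = math.log(n / total)
        # Smoothed probability of a "yes" to each question, given the cause.
        model["p_yes"][cause] = {
            q: (yes_counts[cause][q] + alpha) / (n + 2 * alpha)
            for q in questions
        }
    return model


def predict_cause(model, answers):
    """Return the cause with the highest posterior log-probability."""
    scores = {}
    for cause, log_prior in model["log_prior"].items():
        score = log_prior
        for q in model["questions"]:
            p = model["p_yes"][cause][q]
            score += math.log(p if answers.get(q, False) else 1.0 - p)
        scores[cause] = score
    return max(scores, key=scores.get)


# Toy, invented training interviews (hypothetical questions and causes).
toy_records = [
    ({"fever": True, "chills": True, "injury": False}, "malaria"),
    ({"fever": True, "chills": False, "injury": False}, "malaria"),
    ({"fever": False, "chills": False, "injury": True}, "road injury"),
    ({"fever": False, "chills": False, "injury": True}, "road injury"),
]
toy_model = train_naive_bayes(toy_records)
print(predict_cause(toy_model, {"fever": True, "chills": True, "injury": False}))
```

A real system would train on thousands of interviews whose cause of death is known from a reliable source, such as hospital records, and would have to cope with missing or uncertain answers.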

Which quantity should we optimize to accurately and usefully interpret the narrative of a verbal autopsy? How do you test the machine learning tool: by holding back an expensive subset of this “big cost data”? Are machine learning tools almost too flexible for useful generalizations in this setting?

Flaxman confesses to being “obsessed with reproducibility” in his decisions about machine learning training and testing. What metric of accuracy should we optimize to avoid being “wrong about half the time” when the tool is used out of sample? His choice on that front is built on the “cause-specific mortality fraction,” the fraction of the deceased assigned to each particular cause of death. The resulting metric is a kind of relative error that measures not the mistakes in individual cause assignments but the overall error in those fractions.
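The article gives no formula for this metric. A minimal sketch, assuming the form of “CSMF accuracy” commonly used in the verbal autopsy literature (one minus the total absolute error in the cause fractions, scaled by the worst possible error), might look like this. The function names and the toy cause lists are hypothetical.

```python
from collections import Counter


def cause_fractions(causes, all_causes):
    """Fraction of deaths assigned to each cause in all_causes."""
    counts = Counter(causes)
    n = len(causes)
    return {cause: counts[cause] / n for cause in all_causes}


def csmf_accuracy(true_causes, predicted_causes):
    """Population-level accuracy of predicted cause fractions.

    Returns 1.0 when the predicted fractions match the true ones and
    0.0 when the error is as large as it can possibly be.
    """
    all_causes = sorted(set(true_causes) | set(predicted_causes))
    true_frac = cause_fractions(true_causes, all_causes)
    pred_frac = cause_fractions(predicted_causes, all_causes)
    total_abs_error = sum(
        abs(true_frac[c] - pred_frac[c]) for c in all_causes
    )
    # Worst case: every death is assigned to the rarest true cause.
    worst_case = 2.0 * (1.0 - min(true_frac.values()))
    if worst_case == 0.0:
        # Only one cause exists, so no error in the fractions is possible.
        return 1.0
    return 1.0 - total_abs_error / worst_case


truth = ["malaria", "malaria", "road injury", "road injury"]
# Half of the individual assignments are wrong, yet the fractions match.
swapped = ["malaria", "road injury", "road injury", "malaria"]
# Every death is assigned to a single cause.
all_one_cause = ["malaria", "malaria", "malaria", "malaria"]

print(csmf_accuracy(truth, swapped))
print(csmf_accuracy(truth, all_one_cause))
```

The first toy case illustrates the distinction drawn above: a method can be wrong about half of the individual deaths and still recover the population-level fractions that policy makers need.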

Flaxman seeks to “explain why,” in the sense of identifying particular answers to the verbal autopsy’s structured questions that led to the selection of certain causes of death. He suggests that “understanding errors builds trust” in the computational methods, certainly a valuable perspective in such a diverse disciplinary environment as the GBD study. An immediate practical challenge is achieving similar accuracy when identifying causes of death using fewer questions and therefore shorter, less expensive interviews, a task known as “data-driven item reduction.”
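The article does not describe how item reduction is carried out. One simple way to picture it, offered only as a sketch, is to rank the questions by how much each answer reveals about the cause of death and keep the most informative ones. The example below uses mutual information for the ranking, which is an assumption rather than the method used in the GBD study; the questions and records are invented.

```python
import math
from collections import Counter


def mutual_information(records, question):
    """Mutual information (in bits) between a yes/no answer and the cause."""
    n = len(records)
    joint = Counter(
        (bool(answers.get(question, False)), cause)
        for answers, cause in records
    )
    answer_counts = Counter()
    cause_counts = Counter()
    for (answer, cause), k in joint.items():
        answer_counts[answer] += k
        cause_counts[cause] += k
    info = 0.0
    for (answer, cause), k in joint.items():
        # p(a, c) * log2( p(a, c) / (p(a) * p(c)) ), written with counts.
        ratio = k * n / (answer_counts[answer] * cause_counts[cause])
        info += (k / n) * math.log2(ratio)
    return info


def rank_questions(records):
    """Order questions from most to least informative about the cause."""
    questions = sorted({q for answers, _ in records for q in answers})
    return sorted(
        questions,
        key=lambda q: mutual_information(records, q),
        reverse=True,
    )


# Toy, invented interviews: "fever" separates the causes, "cough" does not.
item_records = [
    ({"fever": True, "cough": True}, "malaria"),
    ({"fever": True, "cough": False}, "malaria"),
    ({"fever": False, "cough": True}, "road injury"),
    ({"fever": False, "cough": False}, "road injury"),
]
print(rank_questions(item_records))
```

In practice any shortened questionnaire would still need to be checked on held-out interviews, since dropping questions one at a time ignores how answers work in combination.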

Flaxman’s insightful tutorial exposed a hard, practical, and immensely important contribution of machine learning to global public health. He cleverly let his audience dip their toes into puddles of questions surrounding formulation and implementation. And his litany of tough queries revealed that many significant challenges remain.

Links to a video and PDF of Flaxman’s presentation at the 2016 SIAM Annual Meeting offer further information. Students can also learn about post-baccalaureate fellowships offered by the Institute for Health Metrics and Evaluation.

Paul Davis is professor emeritus of mathematical sciences at Worcester Polytechnic Institute.