
Machine Learning Recommendations for Early Intervention in Lung and Heart Illnesses

By Jillian Kunze

Catching the symptoms of an adverse health episode early can significantly reduce the difficulty and cost of treating the condition. Unfortunately, early detection of the symptoms of conditions such as chronic obstructive pulmonary disease, asthma, COVID-19, influenza, and heart failure is not an easy task. “There are no solutions for early detection of at-home exacerbations and infections resulting from infectious and chronic lung and heart disease,” Sumanth Swaminathan said during his minisymposium presentation at the SIAM Annual Meeting, which is taking place virtually this week.

Figure 1. Vironix provides products that use disease-specific models, health data, and other resources to enable consumers to detect health deterioration early.
Swaminathan described the research that he participates in at Vironix, a company that specializes in the early detection of health deterioration events, especially heart and lung illnesses (see Figure 1). Vironix provides hardware-agnostic machine learning algorithms and disease management workflows that enable users to detect health deterioration before it becomes a major crisis. This includes both proactive, personalized support for quickly intervening at the onset of adverse health events and continuous monitoring for chronic diseases.

Vironix designs algorithms that collect a comprehensive set of health data based on information people know or can easily access from home. These algorithms use many different types of input, including disease-specific symptoms, age, gender, height, weight, other diagnoses, and risk factors. It is also helpful for the patient to measure and input signals such as their heart rate, temperature, and oxygen saturation, as well as record environmental factors, contact with others, and travel. Finally, the algorithms further incorporate evolving clinical standards and local regulations. 

“If you gather this data, you can get people a quick diagnosis at home without bothering them too much,” Swaminathan said. The consumer-level data can be used in predictive models to return accurate assessments of infection risk and health deterioration. The goal is to design a product that is not too esoteric for people to use — it should be an accessible and easy tool to use at home. 

However, two major issues appear in the development of these kinds of predictive models. The first is data access, as most of the institutions that collect health data—like hospitals and research institutions—generally gather this data at the point of care, and thus do not capture the entire disease degeneration process. The data that they do collect is also often incomplete, proprietary, and dirty. Vironix thus has a lot of intellectual property around generating data to build models. The second issue is the complex, multi-scale nature of disease degeneration, which involves multiple biological systems. “Building first-principle models that can be leveraged by consumers at home with measurable inputs is very challenging,” Swaminathan said. 

One methodology for modeling disease severity that Vironix has developed begins with using Bayesian inference to convert clinical characteristic data (relevant signs, symptoms, and health profiles obtained from research literature) into vignettes for training and validating a prediction algorithm. It was important to couple features to create a realistic patient; for example, a patient in the 0- to 14-year-old age group should not also be a heavy smoker.
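To make the idea of coupled features concrete, here is a minimal Python sketch of generating synthetic vignettes by sampling each feature conditioned on the one before it. It is not Vironix's method: the feature names, the probability tables, and the functions `sample_vignette` and `sample_vignettes` are all hypothetical, and simple conditional sampling from hand-written tables stands in for whatever Bayesian procedure the company actually uses.

```python
import random

# Hypothetical probabilities, invented purely for illustration.
AGE_GROUPS = {"0-14": 0.2, "15-64": 0.6, "65+": 0.2}
# Smoking status depends on age: children are never sampled as heavy smokers.
P_HEAVY_SMOKER_GIVEN_AGE = {"0-14": 0.0, "15-64": 0.15, "65+": 0.10}
# Severity depends on smoking status.
P_SEVERE_GIVEN_SMOKER = {True: 0.4, False: 0.1}


def sample_vignette(rng):
    """Draw one synthetic patient, sampling each feature given the previous one."""
    age = rng.choices(list(AGE_GROUPS), weights=list(AGE_GROUPS.values()))[0]
    heavy_smoker = rng.random() < P_HEAVY_SMOKER_GIVEN_AGE[age]
    severe = rng.random() < P_SEVERE_GIVEN_SMOKER[heavy_smoker]
    return {"age_group": age, "heavy_smoker": heavy_smoker, "severe": severe}


def sample_vignettes(n, seed=0):
    """Draw n vignettes reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_vignette(rng) for _ in range(n)]
```

Because the smoking probability for the youngest group is zero, a call such as `sample_vignettes(1000)` never produces a child who is also a heavy smoker, which is the kind of implausible combination the coupling is meant to rule out.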

Figure 2. Vironix’s algorithms were able to outperform clinical specialists in predicting whether health events would be severe in the cases of chronic obstructive pulmonary disease, asthma, and heart failure.

These vignettes of simulated patient data, coupled with patient-entered data gathered from Vironix software, can then help to train accurate, specific, and sensitive machine learning classification models for predicting health severity based on a patient’s current state. Swaminathan described validating the machine learning predictions using both test data and the consensus opinions of clinical specialists, the latter based on data sets that physicians labeled. The algorithms have been fairly accurate, in that they often handily outperform clinical specialists in predicting whether cases will become severe (see Figure 2). The final step in this process, then, is to deploy these prediction models in a product that general consumers can use.
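The terms accurate, specific, and sensitive have standard definitions for a binary classifier, and the short Python sketch below computes them from true and predicted severity labels. The function names and the label convention (1 for severe, 0 for not severe) are assumptions made for this example; the talk did not describe how Vironix computes its metrics.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = severe)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity (recall on severe cases), and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Fraction of truly severe cases that were flagged as severe.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of truly non-severe cases that were correctly left unflagged.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }
```

Running the same function on a model's predictions and on the specialists' labels, against a shared reference set, gives one simple way to compare the two on equal terms.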

Swaminathan’s collaborator, James Morrill of the University of Oxford, took over presenting for the second portion of the talk. He discussed another approach to forecasting adverse health episodes—especially in the context of making early interventions—that is based on time series of health data. “We want to be able to use the time series of a patient,” Morrill said. “You have this extra dimension of data across time, so how can you use this information to make better diagnoses?” He presented techniques that can compress the complex information contained in a time series into a more informative format.

There are many difficult aspects in effectively analyzing medical time series. For starters, the data is often expensive to produce. And while it is a high-dimensional data set—with perhaps 40 or 50 parameters—adverse health events do not happen very often. This scenario can cause lots of features to be erroneously generated from the time series, leading to overfitting. 

To avoid this, it is essential to only use highly relevant features in model development. Vironix again drew on doctor-labeled data to develop exacerbation models of what an adverse health event is expected to look like. These models provide a proxy for the exacerbation state that occurs when patients are ill, reducing dependence on other aspects of the time series and decreasing the number of spurious features that lead to overfitting.

Figure 3. Predictions of adverse health events (AHEs) made using time series of medical data. The horizontal axis represents time and the vertical axis represents the predicted probability of an adverse health event occurring. The green dots are the predictions made by Vironix’s machine learning algorithm, and the gray boxes show when adverse health events actually occurred.

The approach that Vironix developed involves selecting just one part of the time series and using machine learning to extract its features, then repeating that process for other segments of the time series. With any luck, this will provide enough information that by the time an adverse health event occurs, the algorithm will be able to recognize it.
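The segment-by-segment idea can be sketched in Python as a sliding window over a daily measurement series. This is only an illustration: the function `window_features`, its parameters, and the hand-picked summary statistics (mean, standard deviation, and average slope) are assumptions that stand in for the learned feature extraction described in the talk.

```python
from statistics import mean, pstdev


def window_features(series, window, step=1):
    """Summarize each length-`window` segment of `series` with a few statistics."""
    if window < 2:
        raise ValueError("window must contain at least two samples")
    features = []
    # Slide the window along the series, one segment at a time.
    for start in range(0, len(series) - window + 1, step):
        segment = series[start:start + window]
        features.append({
            "start": start,
            "mean": mean(segment),
            "std": pstdev(segment),
            # Average change per time step across the segment.
            "slope": (segment[-1] - segment[0]) / (window - 1),
        })
    return features
```

For a hypothetical resting heart rate series such as `[60, 62, 64, 66, 68, 90]` with `window=3`, the final segment has a much steeper slope than the earlier ones, which is the kind of local change a downstream classifier could pick up on.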

Figure 3 displays several example predictions produced using this method on data concerning a range of parameters, including heart rate and symptoms. Patients entered their health data in a mobile app every day, which the algorithm then interpreted to make predictions. In all of the examples in Figure 3, the algorithm correctly predicted when adverse health events would occur, but in the examples on the right side of the image it also predicted events when none ended up happening. However, given the size of the data set, it should be possible to train the model to have relatively few false positives.
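One simple way to reason about false positives in daily predictions like these is to turn the predicted probabilities into alerts with a threshold, then check whether each alert is followed by an actual event within a few days. The Python sketch below does this; the function `evaluate_alerts`, the threshold, and the `horizon` parameter are hypothetical choices for illustration and were not described in the talk.

```python
def evaluate_alerts(probs, event_days, threshold=0.5, horizon=3):
    """Classify daily alerts as true or false given the days events occurred.

    probs[d] is the predicted probability on day d. An alert on day d counts
    as true if an event occurs on any day from d through d + horizon.
    """
    events = set(event_days)
    alerts = [day for day, p in enumerate(probs) if p >= threshold]
    true_alerts = [d for d in alerts
                   if any(d <= e <= d + horizon for e in events)]
    false_alerts = [d for d in alerts if d not in true_alerts]
    # An event is missed if no alert fired on or within `horizon` days before it.
    missed = sorted(e for e in events
                    if not any(e - horizon <= d <= e for d in alerts))
    return {"true_alerts": true_alerts,
            "false_alerts": false_alerts,
            "missed_events": missed}
```

Raising the threshold generally trades false alerts for missed events, so a value would need to be chosen against labeled data rather than fixed in advance.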

To conclude, Morrill noted how helpful these kinds of approaches can be for real-life users. “Temporal exacerbation data gathered from devices and consumer-facing applications can be leveraged to identify adverse health events within a clinically relevant time frame prior to the effect’s onset,” he said. Forecasting health deterioration can make an enormous difference to patients, improving ease of care and saving lives.

Jillian Kunze is the associate editor of SIAM News.