June 01, 2021

Fighting a Pandemic with Medical Imaging and Machine Learning: Lessons Learned from COVID-19

By Michael Roberts, Derek Driggs, Ian Selby, Evis Sala, and Carola-Bibiane Schönlieb

Many researchers in applied mathematics have recognized that machine learning (ML) models can help clinicians during the ongoing COVID-19 pandemic. Models that utilize chest radiographs and computed tomography (CT) scans to detect patients with severe cases of COVID-19 have received considerable attention over the last year. However, pervasive systematic errors in existing research make it dangerous for doctors to consider these models for clinical use.

Here we identify a few of these complications and discuss avenues that researchers can explore to close the gap between model development and model deployment. In particular, we emphasise the need to (i) source high-quality data, wherein the biases are understood and appreciated; (ii) incorporate data from multiple sources (as a clinician would when making a decision); and (iii) employ a multidisciplinary team for model development.

The work, effort, and dedication of the mathematical imaging and ML communities during this pandemic have been inspirational and clearly show ML’s potential for clinical decision support. They also demonstrate the possible pitfalls of ML during a global emergency.

By focusing on image-based diagnosis and prognosis for COVID-19, we make several observations about the quick and reliable development of ML-based clinical support tools. The discussion that follows stems from a recent systematic review [12] and editorial piece [4].

Imaging and Machine Learning for the COVID-19 Pandemic

Chest imaging is a useful tool for the initial triage of patients with COVID-19 at hospitals with first-line utilization of chest X-rays (CXRs) and CT imaging. Although European and American radiological societies initially discouraged the use of CT and chest radiographs for COVID-19 diagnosis in early 2020 [1, 13], this position softened as the high false negative rate of existing tests became apparent and the pandemic began to strain the resources that are required to test patients quickly [14]. China has employed imaging exams as the primary initial diagnostic tool since the outbreak’s onset [13]. In addition, several studies indicate that the extent of opacification in the lungs of COVID-19 patients is a significant prognostic marker of mortality [2]. Figure 1 displays common presentations of COVID-19 in CT scans and chest radiographs.

Figure 1. Annotated examples of COVID-19 scans. 1a. Chest X-ray (CXR) with ground-glass opacification (GGO) in both lungs and consolidation (outlined in orange). 1b. A computed tomography (CT) scan that shows GGO (green) and consolidation (orange). 1c. A CT scan that indicates severe COVID-19 with a crazy-paving pattern. Images courtesy of [8] and inset courtesy of [5].

The COVID-19 pandemic is the first of the ML era, and pattern recognition algorithms have the potential to aid clinicians in the diagnosis and prognostication of COVID-19 via chest imaging [11]. Indeed, our systematic review [12]—which examines the entire literature from January 1, 2020 to October 3, 2020—identifies 320 published and preprint manuscripts that develop ML models using chest CT or radiographs for COVID-19 diagnosis or prognostication. Unfortunately, most papers contained systematic issues pertaining to image sourcing, quality, and documentation that introduce bias in developed models, ultimately making them unlikely to perform well in practice [4].

We now highlight some of the methodological issues in these studies and provide recommendations for the creation of ML models that incorporate imaging features and are suitable for clinical use. Fundamentally, we endorse the acquisition of high-quality data (preferably temporal) and detailed associated metadata.

Sourcing Issues

CXRs and CT scans for COVID-19 patients are commonly available in public repositories that do not independently verify the accuracy of the ground-truth labels of COVID-19 diagnosis. Researchers also frequently use a large pediatric pneumonia cohort [9] as the control “non-COVID-19” group during model development, which inadvertently trains a model to distinguish adults (COVID-19) from young children (non-COVID-19); the imaging differences in Figure 2 demonstrate this point. Furthermore, we discovered a prevalence of “Frankenstein” datasets, which are compiled from existing datasets and released under a new name. Scientists who train models with such Frankenstein datasets are unknowingly testing their models on overlapping data and thus producing optimistic performance metrics. Finally, many manuscripts utilize data from image repositories that consist of COVID-19 images from publications, preprints, or social media posts without acknowledging these images’ potential bias towards more interesting or unusual cases.
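One simple safeguard against the overlap problem described above is to compare the raw contents of the training and test images before any model is trained. The following is a minimal sketch under stated assumptions, not a method taken from the reviewed papers: the function names (`content_digest`, `load_images`, `find_leaks`) are hypothetical, and images are assumed to be available as files on disk or as raw bytes in memory.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def content_digest(data: bytes) -> str:
    """Return the SHA-256 digest of an image's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def load_images(root: Path) -> Dict[str, bytes]:
    """Read every file under `root` into memory, keyed by relative path."""
    return {
        str(p.relative_to(root)): p.read_bytes()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_leaks(
    train: Dict[str, bytes], test: Dict[str, bytes]
) -> List[Tuple[str, str]]:
    """List (test_name, train_name) pairs whose contents are byte-identical."""
    # Index the training set by digest so that each test image is one lookup.
    train_index: Dict[str, List[str]] = {}
    for name, data in train.items():
        train_index.setdefault(content_digest(data), []).append(name)

    leaks = []
    for name, data in test.items():
        for match in train_index.get(content_digest(data), []):
            leaks.append((name, match))
    return sorted(leaks)


if __name__ == "__main__":
    # Toy example: the same scan appears under two names in two "datasets".
    train = {"a.png": b"scan-1", "b.png": b"scan-2"}
    test = {"renamed.png": b"scan-1", "c.png": b"scan-3"}
    print(find_leaks(train, test))
```

A check of this kind only catches byte-identical copies. Images that were resized, re-encoded, or cropped when a dataset was recompiled would pass unnoticed, so it is best treated as a lower bound on the overlap rather than as evidence that no overlap exists.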

Figure 2. Annotated chest radiograph indicating the differences between pediatric and adult patients. Pediatric scans courtesy of [9] and adult scans courtesy of [3].

Quality Issues

Image quality can also drastically affect a model’s performance and robustness. In our review, we found that researchers rarely discuss image pre-processing steps. Image resizing is a common pre-processing step for deep learning models, but the effects of initial image resolution, input resolution, and aspect ratio adjustments on a model’s performance are unclear. We also do not know whether an image that one extracts from a publication—a “picture of a picture”—contains the same level of useful information as the original [10].
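Because resizing choices are so rarely reported, it can help to compute and store the resizing geometry explicitly alongside each image. The sketch below is illustrative only: it assumes a square network input and a "letterbox" strategy (scale the longer side to the target, then pad the shorter side), which is one common option rather than the approach of any particular study. The names `ResizeRecord` and `plan_letterbox` are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ResizeRecord:
    """Everything needed to report how one image was resized."""
    original_size: Tuple[int, int]      # (width, height) in pixels
    target_size: Tuple[int, int]        # (width, height) of the model input
    scale: float                        # factor applied to both axes
    resized_size: Tuple[int, int]       # size after scaling, before padding
    padding: Tuple[int, int, int, int]  # (left, top, right, bottom)


def plan_letterbox(width: int, height: int, target: int) -> ResizeRecord:
    """Plan an aspect-ratio-preserving resize to a target x target input."""
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height, and target must be positive")

    longest = max(width, height)
    # Scale both axes by the same factor so the aspect ratio is unchanged.
    new_w = max(1, round(width * target / longest))
    new_h = max(1, round(height * target / longest))

    # Split the leftover pixels as evenly as possible between the two sides.
    pad_w, pad_h = target - new_w, target - new_h
    left, top = pad_w // 2, pad_h // 2
    return ResizeRecord(
        original_size=(width, height),
        target_size=(target, target),
        scale=target / longest,
        resized_size=(new_w, new_h),
        padding=(left, top, pad_w - left, pad_h - top),
    )


if __name__ == "__main__":
    # A wide 2000 x 1000 image prepared for a 224 x 224 input.
    print(asdict(plan_letterbox(2000, 1000, 224)))
```

Storing such a record for every image makes it possible to report the distribution of original resolutions and scale factors, and to test later whether model performance depends on them.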

Documentation Issues

Although researchers should fully document the source and metadata for the images that contribute to model development, such documentation is lacking in most of the literature. For example, one must know whether radiographs were taken with mobile scanners and understand the distribution of scanner manufacturers, CT reconstruction kernels, and CT slice thicknesses. Without this information, scientists cannot comprehend biases in the dataset.
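As a rough illustration of what such documentation might look like in practice, the sketch below tabulates how often each value of a metadata field occurs in a dataset, with missing values counted explicitly. It assumes the metadata has already been extracted into one dictionary per image; the field names (`manufacturer`, `slice_thickness_mm`, `mobile_unit`) and the function name `summarize_metadata` are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

MISSING = "<missing>"


def summarize_metadata(
    records: Iterable[Dict[str, Any]], fields: List[str]
) -> Dict[str, Dict[str, int]]:
    """Count the values of each field across all records.

    Absent or None values are counted under "<missing>" so that gaps in
    the documentation are visible rather than silently dropped.
    """
    records = list(records)
    summary: Dict[str, Dict[str, int]] = {}
    for field in fields:
        counts: Counter = Counter()
        for record in records:
            value = record.get(field)
            counts[MISSING if value is None else str(value)] += 1
        summary[field] = dict(counts)
    return summary


if __name__ == "__main__":
    # Toy records with hypothetical field names and values.
    records = [
        {"manufacturer": "A", "slice_thickness_mm": 1.0, "mobile_unit": False},
        {"manufacturer": "A", "slice_thickness_mm": 5.0, "mobile_unit": True},
        {"manufacturer": "B"},
    ]
    fields = ["manufacturer", "slice_thickness_mm", "mobile_unit"]
    for field, counts in summarize_metadata(records, fields).items():
        print(field, counts)
```

Running the same summary separately for each class label is a quick way to see whether, for example, one scanner type or acquisition setting is concentrated in the COVID-19 group.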

Multiple Data Streams

When a polymerase chain reaction test result is unavailable or a false positive/negative is suspected, one can make a clinical diagnosis with information from multiple data sources. An ML model for use in clinical trials must endeavour to emulate this ability.

Figure 3. Activation map depicting regions of interest for a neural network that detects lung abnormalities. Red areas indicate discriminative regions that the model uses to make its predictions. Original scans courtesy of the National COVID-19 Chest Imaging Database [8].

In late February 2020, the Diamond Princess cruise ship had the largest cluster of positive COVID-19 cases outside of China. A study of 104 of these COVID-19-positive patients found that 73 percent (76 out of 104) were asymptomatic. However, 54 percent (41 out of 76) of these asymptomatic individuals displayed lung opacities on their CT scans. The converse was also true, as roughly 21 percent (six out of 28) of symptomatic patients had normal CT findings [6]. Imaging features alone are clearly not sufficient for accurate diagnosis.
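The proportions quoted from the Diamond Princess study can be recomputed directly from the counts given in the text. The short sketch below does only that; the variable names are illustrative, and the 28 symptomatic patients are obtained as 104 minus 76.

```python
def percent(part: int, whole: int) -> float:
    """Return part / whole as a percentage."""
    return 100.0 * part / whole


# Counts as reported in the text for the Diamond Princess cohort [6].
total_patients = 104
asymptomatic = 76
symptomatic = total_patients - asymptomatic  # 28
asymptomatic_with_opacities = 41
symptomatic_with_normal_ct = 6

rates = {
    "asymptomatic": percent(asymptomatic, total_patients),
    "asymptomatic with lung opacities": percent(
        asymptomatic_with_opacities, asymptomatic
    ),
    "symptomatic with normal CT": percent(
        symptomatic_with_normal_ct, symptomatic
    ),
}

for label, value in rates.items():
    print(f"{label}: {value:.1f}%")
# asymptomatic: 73.1%
# asymptomatic with lung opacities: 53.9%
# symptomatic with normal CT: 21.4%
```

Read together, the last two figures mean that a CT-only rule would flag over half of the patients who had no symptoms and would miss about one in five of those who did.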

Multidisciplinary Approach

Clinicians, ML experts, imaging specialists, mathematicians, and statisticians should all partake in the development process of a trustworthy model for clinical use. Clinical insight regarding model usability and data quality is invaluable. For instance, clinicians know that unstable patients may be primarily imaged from the front (i.e., anteroposterior) or whilst lying on their backs (supine) if they are critically unwell. Annotations that indicate this specification are commonly burnt into CXR images, and state-of-the-art models that classify lung pathologies on the widely used CheXpert dataset [7] frequently employ these annotations to inform the models (see Figure 3). Although such insights are not obvious to a non-clinician, they should heavily influence the development of ML algorithms to avoid irrelevant links between radiographs and outcomes.

Eye on the Prize

Unfortunately, most models have no viable path towards regulation and clinical use. The many hastily developed, poor-quality models in some manuscripts risk polluting the entire literature, obscuring high-quality models, and alienating clinicians who are eager to embrace ML methods. Maintaining a clear path towards the clinical adoption of algorithms throughout their development process—and working with relevant industrial partners and healthcare authorities—is crucial for ensuring model suitability for clinical implementation.

Acknowledgments: The authors thank their colleagues and partners in the AIX-COVNET collaboration for their contributions and input to this article, which is the result of many months of discussions and teamwork that began in March 2020.

References
[1] American College of Radiology. (2020, March 22). ACR recommendations for the use of chest radiography and computed tomography (CT) for suspected COVID-19 infection. Retrieved from https://www.acr.org/Advocacy-and-Economics/ACR-Position-Statements/Recommendations-for-Chest-Radiography-and-CT-for-Suspected-COVID19-Infection.
[2] Cleverley, J., Piper, J., & Jones, M.M. (2020). The role of chest radiography in confirming covid-19 pneumonia. Brit. Med. J., 370, m2426.
[3] Cohen, J.P., Morrison, P., Dao, L., Roth, K., Duong, T.Q., & Ghassemi, M. (2020). COVID-19 image data collection: Prospective predictions are the future. Preprint, arXiv:2006.11988.
[4] Driggs, D., Selby, I., Roberts, M., Gkrania-Klotsas, E., Rudd, J.H.F., Yang, G., …, Schönlieb, C.-B. (2021). Machine learning for COVID-19 diagnosis and prognostication: Lessons for amplifying the signal whilst reducing the noise. Radiol. Art. Int., e210011.
[5] Harvey, D.M. (2005, June 25). Example of wallpaper group type p3. Computer-enhanced photograph of a street pavement in Zakopane, Poland. Retrieved from https://commons.wikimedia.org/wiki/File:Wallpaper_group-p3-1.jpg.
[6] Inui, S., Fujikawa, A., Jitsu, M., Kunishima, N., Watanabe, S., Suzuki, Y., …, Uwabe, Y. (2020). Chest CT findings in cases from the cruise ship Diamond Princess with coronavirus disease (COVID-19). Radiol. Cardio. Imag., 2(2).
[7] Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., …, Ng, A.Y. (2019). CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In Proceedings of the thirty-third AAAI conference on artificial intelligence (pp. 590-597). Honolulu, HI: Association for the Advancement of Artificial Intelligence.
[8] Jacob, J., Alexander, D., Baillie, J.K., Berka, R., Bertolli, O., Blackwood, J., …, Joshi, I. (2020). Using imaging to combat a pandemic: Rationale for developing the UK national COVID-19 chest imaging database. Euro. Respir. J., 56(2).
[9] Kermany, D.S., Goldbaum, M., Cai, W., Valentim, C.C.S., Liang, H., Baxter, S.L., …, Zhang, K. (2018). Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell, 172(5), 1122-1131.
[10] Lensink, K., Parker, W., & Haber, E. (2020, July 13). Deep learning for COVID-19 diagnosis. SIAM News, 53(6), p. 1.
[11] Rajpurkar, P., Joshi, A., Pareek, A., Chen, P., Kiani, A., Irvin, J., …, Lungren, M.P. (2020). CheXpedition: Investigating generalization challenges for translation of chest X-ray algorithms to the clinical setting. Preprint, arXiv:2002.11379.
[12] Roberts, M., Driggs, D., Thorpe, M., Gilbey, J., Yeung, M., Ursprung, S., …, Schönlieb, C.-B. (2021). Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat. Mach. Int., 3, 199-217.
[13] The Royal College of Radiologists. (2020, March 12). The role of CT in patients suspected with COVID-19 infection. Retrieved from https://www.rcr.ac.uk/college/coronavirus-COVID-19-what-rcr-doing/rcr-position-role-ct-patients-suspected-COVID-19.
[14] Yang, W., Sirajuddin, A., Zhang, X., Liu, G., Teng, Z., Zhao, S., & Lu, M. (2020). The role of imaging in 2019 novel coronavirus pneumonia (COVID-19). Eur. Radiol., 30(9), 1-9.

Michael Roberts is a Senior Research Associate in Applied Mathematics at the University of Cambridge and a Senior Postdoctoral Fellow in Oncology R&D at AstraZeneca. Derek Driggs is a Ph.D. student in applied mathematics at the University of Cambridge. Ian Selby is a Clinical Research Associate at the University of Cambridge and an honorary radiology registrar at Cambridge University Hospitals NHS Foundation Trust. Evis Sala is a professor of oncological imaging at the University of Cambridge and an honorary consultant radiologist at Addenbrooke’s Hospital. Carola-Bibiane Schönlieb is a professor of applied mathematics at the University of Cambridge.