March 29, 2023

New Surrogate Marker of Chronic Kidney Disease Progression and Mortality in Medical-word Virtual Space

The kidney is an important organ that filters blood, regulates body fluids and electrolytes, and removes toxins from the body. Chronic kidney disease (CKD) comprises a heterogeneous group of disorders in which the kidneys gradually lose function over the course of several years. CKD is a risk factor for end-stage kidney disease (ESKD), in which kidney function has declined so far that dialysis or a kidney transplant is necessary for patient survival. It is also a risk factor for cardiovascular disease (CVD) and even death [6]. Various conditions can cause CKD, including diabetes mellitus, hypertension, glomerulonephritis, and aging. An estimated 37 million people in the U.S. and 15 million in Japan have CKD, and all of them face a heavy medical and economic burden [3].

Figure 1. Virtual space of medical words. The real-world data of chronic kidney disease (CKD) patients are connected to the virtual space of medical words via a natural transformation, a concept from category theory. Figure courtesy of the author.

Slowing the progression of CKD requires control of various risk factors. Accurate risk prediction can help identify high-risk patients and evaluate their conditions; it also matches individual patients with suitable therapeutic strategies [4, 5, 7]. Because existing markers do not evaluate CKD progression with sufficient accuracy, new surrogate markers that reflect the complicated pathophysiology of CKD and patients' conditions are needed. However, developing such markers from CKD pathophysiology is difficult: it requires a very large sample of CKD patients and acceptance of the assumptions that underlie the statistical methods [2]. Here, we consider the knowledge embedded in the medical literature as a candidate data source on the pathophysiology of CKD.

Category theory is a branch of modern mathematics that provides a unified language for describing mathematical structures and the relationships between them, as well as more general concepts; researchers have applied it in mathematics, computer science, and even philosophy. It can therefore link the different structures of a medical-word network and of CKD patient data. A medical-word network that represents CKD patients' conditions and prognoses could also be applicable to other diseases, such as heart disease and cancer. In this study, we used natural language processing (NLP) to build a virtual space of medical-word vectors on the basis of category theory and to develop a new surrogate marker for the renal and life prognoses of CKD patients.

For example, a category can represent the relationships between declining kidney function and comorbid conditions or outcomes, such as ESKD and death. The corresponding relationships in real CKD patient data might then be connected to those in published medical reports (see Figure 1). We employed NLP to construct a virtual space of medical words from the CKD-related literature in PubMed, in which CKD-related words form a network. The word vector \(\textit{CKD}\) was positioned near the vectors \(\textit{ESKD}\), \(\textit{CVD}\), and \(\textit{diabetes}\). Moreover, we found that vector arithmetic preserved medical meaning related to the concepts of diabetic kidney disease (DKD) and acute kidney injury (AKI): \(\textit{CKD} + \textit{diabetes} = \textit{DKD}\) and \(\textit{CKD} - \textit{chronic} + \textit{acute} = \textit{AKI}\).
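The word-vector arithmetic described above can be reproduced in miniature. The source does not name the embedding model or the similarity measure, so this sketch assumes word2vec-style vectors compared by cosine similarity, with the query words excluded from the candidate answers (a common convention when evaluating word analogies). The vocabulary, the four-dimensional vectors, and the function names `cosine` and `analogy` are hypothetical and hand-built so that the arithmetic is easy to follow; they are not the vectors learned from PubMed in the study.

```python
import numpy as np

# Hypothetical 4-dimensional toy embeddings, hand-made for illustration only.
# They are NOT the study's PubMed-trained vectors. The axes are invented for
# readability: (kidney, chronic, acute, diabetes).
TOY_VECTORS = {
    "CKD": np.array([1.0, 1.0, 0.0, 0.0]),
    "AKI": np.array([1.0, 0.0, 1.0, 0.0]),
    "DKD": np.array([1.0, 1.0, 0.0, 1.0]),
    "chronic": np.array([0.0, 1.0, 0.0, 0.0]),
    "acute": np.array([0.0, 0.0, 1.0, 0.0]),
    "diabetes": np.array([0.0, 0.0, 0.0, 1.0]),
}


def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def analogy(positive, negative, vectors):
    """Return the vocabulary word most cosine-similar to
    sum(positive vectors) - sum(negative vectors), excluding the query words."""
    query = sum(vectors[w] for w in positive) - sum(vectors[w] for w in negative)
    excluded = set(positive) | set(negative)
    candidates = [w for w in vectors if w not in excluded]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


print(analogy(["CKD", "diabetes"], [], TOY_VECTORS))        # DKD
print(analogy(["CKD", "acute"], ["chronic"], TOY_VECTORS))  # AKI
```

With real, learned embeddings the result of the arithmetic rarely coincides exactly with any single word vector; the nearest-neighbor search is what maps it back to a word.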

Figure 2. Distribution of medical words and patients. We categorized medical words and patients into groups based on medicines (orange), laboratory data (green), and comorbidities (purple). High-risk patients fell into the same group as the outcomes. Figure courtesy of the author.

Next, we transformed data from CKD patients in a three-year cohort study [1] into the virtual space and linked them to the medical-word network via category theory (see Figure 2). We used the relationship in the virtual space between a patient's vector and the vector of the primary outcome (ESKD or death), namely their inner product, as a surrogate marker of the outcome. This inner product predicted the outcome with high accuracy, with a c-statistic of 0.911. A Cox proportional hazards model showed that the risk of the outcome in the high-inner-product group was 21.92 times that in the low-inner-product group (hazard ratio 21.92; 95 percent confidence interval 14.77 to 32.51).
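The surrogate marker and its evaluation can be sketched in a few lines. This is not the authors' pipeline: the source does not describe how a patient's data are mapped into the word space, so the hypothetical `patient_vector` below simply assumes that a patient is represented by the mean of the word vectors for his or her medicines, laboratory findings, and comorbidities. The hypothetical `c_statistic` computes the concordance (equivalent to ROC AUC) for a binary outcome and ignores censoring; a time-to-event analysis would use Harrell's C instead. The hypothetical `wald_ci` applies the standard Cox-model relation between a log-hazard coefficient, its standard error, and the hazard ratio with its 95 percent Wald interval.

```python
import math

import numpy as np


def patient_vector(attribute_words, vectors):
    """Hypothetical patient embedding: the mean of the word vectors for the
    patient's attributes (e.g., medicines, laboratory findings, comorbidities).
    This mapping is an assumption, not the study's method."""
    return np.mean([vectors[w] for w in attribute_words], axis=0)


def surrogate_score(patient_vec, outcome_vec):
    """Surrogate marker: inner product of a patient vector and the outcome vector."""
    return float(np.dot(patient_vec, outcome_vec))


def c_statistic(scores, events):
    """Concordance for a binary outcome (equivalent to ROC AUC).

    Fraction of (event, non-event) patient pairs in which the patient who had
    the event has the higher score; tied scores count as one half.
    Censoring is ignored.
    """
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without the event")
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


def wald_ci(log_hr, se, z=1.96):
    """Hazard ratio and 95% Wald CI from a Cox log-hazard coefficient and its SE."""
    return math.exp(log_hr), math.exp(log_hr - z * se), math.exp(log_hr + z * se)


if __name__ == "__main__":
    # Made-up scores for four patients; 1 = ESKD or death during follow-up.
    print(c_statistic([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
    # Back-calculate the coefficient and SE from the reported HR and CI.
    log_hr = math.log(21.92)
    se = (math.log(32.51) - math.log(14.77)) / (2 * 1.96)
    print(wald_ci(log_hr, se))
```

As a consistency check, taking the reported figures at face value gives a log-hazard coefficient of ln 21.92 ≈ 3.09 and a standard error of (ln 32.51 - ln 14.77)/(2 × 1.96) ≈ 0.20; feeding these back into `wald_ci` returns an interval of about 14.8 to 32.5. This is consistent with, though not proof of, a Wald-type interval on the log-hazard scale.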

The medical-word network retained the relationships among CKD pathophysiological factors, was successfully linked to real-world data, and reflected CKD progression. By unifying medical knowledge, the network provides a new tool in the form of a novel disease model. For example, the list of medical words associated with a target disease includes related terms such as medicines, risk factors, genes, and molecules. This network can help researchers identify new therapies and research needs.

In the future, we intend to investigate whether the medical-word virtual space reflects patients' pathophysiological conditions and whether it is applicable to other diseases. To this end, we plan to conduct prospective clinical studies using a large database and to use NLP to develop a more advanced category theory model.

Eiichiro Kanda delivered a contributed presentation on this research at the 2022 SIAM Conference on Mathematics of Data Science, which took place in San Diego, Calif., last September.

References
[1] Kanda, E., Epureanu, B.I., Adachi, T., & Kashihara, N. (2023). Machine-learning-based web system for the prediction of chronic kidney disease progression and mortality. PLOS Digit. Health, 2(1), e0000188.
[2] Kanda, E., Kanno, Y., & Katsukawa, F. (2019). Identifying progressive CKD from healthy population using Bayesian network and artificial intelligence: A worksite-based cohort study. Sci. Rep., 9(1), 5082.
[3] Kanda, E., Kashihara, N., Kohsaka, S., Okami, S., & Yajima, T. (2020). Clinical and economic burden of hyperkalemia: A nationwide hospital-based cohort study in Japan. Kidney Med., 2(6), 742-752.e1.
[4] Kanda, E., Kashihara, N., Matsushita, K., Usui, T., Okada, H., Iseki, K., … Nangaku, M. (2018). Guidelines for clinical evaluation of chronic kidney disease: AMED research on regulatory science of pharmaceuticals and medical devices. Clin. Exp. Nephrol., 22(6), 1446-1475.
[5] Kanda, E., Kato, A., Masakane, I., & Kanno, Y. (2019). A new nutritional risk index for predicting mortality in hemodialysis patients: Nationwide cohort study. PLoS One, 14(3), e0214524.
[6] Kidney Disease: Improving Global Outcomes (KDIGO) CKD Work Group. (2013). KDIGO 2012 clinical practice guideline for the evaluation and management of chronic kidney disease. Kidney Int. Suppl., 3(1), 1-150.
[7] Levey, A.S., Gansevoort, R.T., Coresh, J., Inker, L.A., Heerspink, H.L., Grams, M.E., … Willis, K. (2020). Change in albuminuria and GFR as end points for clinical trials in early stages of CKD: A scientific workshop sponsored by the National Kidney Foundation in collaboration with the U.S. Food and Drug Administration and European Medicines Agency. Am. J. Kidney Dis., 75(1), 84-104.

Eiichiro Kanda is a professor of data science and nephrology in medical science at Kawasaki Medical School in Japan. He received his M.D. and Ph.D. in medicine from Tokyo Medical and Dental University in Japan, and his Master of Public Health degree in epidemiology from Emory University. Kanda conducts research and develops mathematical models to find new therapies and improve patient prognoses.