Developing Algorithm Helps Predict Genetic Disease

By Lina Sorg

Approximately one in 200 to 500 people worldwide suffers from familial hypercholesterolemia (FH), an under-diagnosed cardiovascular disorder. Patients with FH have high levels of low-density lipoprotein (LDL) cholesterol in their blood. This “bad” cholesterol causes plaque to build up on arterial walls, often resulting in blood clots, blocked arteries, heart attacks, strokes, and even premature death. Men with FH face a 50 percent chance of having a heart attack by age 50, and women have a 30 percent chance by age 60.

While early treatment has the potential to significantly reduce these odds, most people with FH don’t know they have it; estimates indicate that only 10 percent of cases are diagnosed. The average age of diagnosis is 47, often after a heart attack or other cardiac complication.

Researchers at Stanford University’s School of Medicine are developing an algorithm to identify patients with undiagnosed FH. Cardiologist Joshua Knowles and Nigam Shah, a biomedical informatics professor, are leading the effort. They accessed the health records of 120 Stanford patients with FH and a pool of patients with high LDL levels who do not have FH – true positives and true negatives, respectively. Using machine learning, big data, and advanced software, the researchers are training a computer to recognize patterns based on the medical records of true positives. The resulting algorithm, which analyzes and categorizes patterns in age, cholesterol levels, and prescriptions, will scan Stanford’s health records for undiagnosed cases of FH.

The research is part of an initiative called FIND FH (Flag, Identify, Network, Deliver), a joint undertaking of Stanford Medicine, the nonprofit FH Foundation, and Amgen, a biotechnology firm.

The FH Foundation is also pioneering a broader initiative in which the Stanford researchers are involved. Using lab and billing data from 89 million Americans with cardiovascular disease, the foundation is developing a similar algorithm that will identify patients for further screening on a national scale. This algorithm is currently focused on sensitivity rather than precision, according to FH Foundation chief technology officer Kelly Myers (read more of Myers’ explanation here). Right now the algorithm’s F1 score, a measure of accuracy that combines precision and sensitivity, is around 0.6 (for comparison, guessing heads or tails on a coin flip is correct half the time, or 0.5). Researchers have generated a preliminary map, subject to change as the algorithm evolves, that accurately pinpoints multiple regions known for high occurrences of FH, including central and southern Pennsylvania and areas of Louisiana.
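To make the F1 figure concrete, the sketch below computes precision, sensitivity (also called recall), and F1 from a set of screening counts. The counts are hypothetical, chosen only so that the result lands near 0.6; they are not the FH Foundation’s data, and the function name is illustrative.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Return (precision, sensitivity, F1) from screening counts."""
    flagged = true_pos + false_pos   # everyone the algorithm flags
    actual = true_pos + false_neg    # everyone who truly has the condition
    precision = true_pos / flagged if flagged else 0.0
    sensitivity = true_pos / actual if actual else 0.0
    if precision + sensitivity == 0:
        return precision, sensitivity, 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f1


# Hypothetical counts (assumption, not real data): the screen catches
# 90 of 100 true cases but also flags 110 people who do not have FH.
precision, sensitivity, f1 = f1_score(true_pos=90, false_pos=110, false_neg=10)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f} F1={f1:.2f}")
# prints: precision=0.45 sensitivity=0.90 F1=0.60
```

In this illustration, a screen tuned for sensitivity finds most true cases at the cost of many false alarms, which is how a modest F1 score can coexist with a useful first-pass filter.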

In the United States, testing for abnormal lipid levels is the most common form of FH diagnosis. Unknown genetic irregularities and mutations complicate genetic testing. FH is autosomal dominant, which means that a parent with the disease has a 50 percent chance of passing it to each child. Successful predictive algorithms can help spread FH awareness and increase early diagnosis, which can minimize the cost of care and allow patients to manage the condition with existing medication, diet, and exercise.

Although both algorithms require further adjustment to better handle the false positive paradox, in which false alarms outnumber true detections because the condition is rare, Knowles and Shah remain hopeful about the outcome. While machine learning techniques have not been widely used in medicine, the researchers believe such techniques have the capacity to advance healthcare. Ultimately, they hope to make the algorithms available to multiple electronic health platforms, and use them to identify patients at high risk for other genetic diseases.
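The false positive paradox follows from Bayes’ rule: when a condition is rare, even a fairly accurate screen can flag more unaffected people than affected ones. The sketch below assumes a prevalence of one in 250 (within the one-in-200-to-500 range cited earlier) together with hypothetical sensitivity and specificity values; these performance numbers are assumptions for illustration and do not describe the Stanford or FH Foundation algorithms.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a flagged patient truly has the condition."""
    true_pos = sensitivity * prevalence                # affected and flagged
    false_pos = (1 - specificity) * (1 - prevalence)   # unaffected but flagged
    return true_pos / (true_pos + false_pos)


# Assumed values: prevalence of 1 in 250, and a hypothetical screen with
# 90 percent sensitivity and 95 percent specificity.
ppv = positive_predictive_value(prevalence=1 / 250,
                                sensitivity=0.90,
                                specificity=0.95)
print(f"Share of flagged patients who truly have FH: {ppv:.1%}")
```

Under these assumptions, only about 7 percent of flagged patients would actually have FH, which is why flagged patients are referred for further screening rather than treated as diagnosed.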

Learn more about the Stanford study and Knowles’ thoughts here.

Lina Sorg is the associate editor of SIAM News.