May 29, 2019

Capabilities, Collaboration, and Cancer

Co-Design for Advanced Computing Solutions for Cancer

A combination of data science and bench science has the potential to transform the way we diagnose, treat, and prevent cancer, and to do so far faster. At the Frederick National Laboratory for Cancer Research, we are working at the intersection of high-performance computing, artificial intelligence, and cancer research to accelerate the translation of biomedical data into scientific discoveries, medical treatments, and diagnostic and prevention tools for cancer and AIDS patients.

JDACS4C: A Path to Predictive Oncology

The Frederick National Lab is collaborating with the National Cancer Institute (NCI) and the Department of Energy (DOE)—along with the Argonne, Oak Ridge, Lawrence Livermore, and Los Alamos national laboratories—on the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program. Established in 2016 to accelerate cancer research, JDACS4C uses emerging exascale computing capabilities.

Large volumes of data, combined with advanced computational methods and exascale computing technologies, provide a powerful tool for understanding cancer biology, diagnostics, prognostics, and treatment. JDACS4C employs high-performance computing and artificial intelligence to create prediction models that become more effective with each additional data set. Ultimately, machine learning will enable drug developers to better assess the activity of potential compounds and help clinicians make the best treatment decisions. In short, the program's goal is to make predictive oncology a reality for everyone who has, or is at risk of, any form of the disease.

High-Performance Computing is a Game Changer

Today's most advanced computers operate well into the petascale range. Moving from petascale to exascale represents a thousand-fold increase in computational capability: petascale systems can perform more than a quadrillion (10^15) floating-point operations per second (FLOPS), while exascale systems can execute a quintillion (10^18) FLOPS. High-performance computing reaches these enormous scales by harnessing hundreds to thousands of compute nodes working in parallel to analyze and model complex biological systems.
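To make the thousand-fold figure concrete, here is a short Python sketch that computes the exascale-to-petascale ratio and, for a purely hypothetical workload of 10^21 operations, the time required at each scale. It assumes the machine sustains its peak rate for the whole run, which real applications do not achieve, so the times are illustrative rather than benchmarks.

```python
# Worked arithmetic for the petascale-to-exascale jump.
PETAFLOPS = 10**15  # one quadrillion floating-point operations per second
EXAFLOPS = 10**18   # one quintillion floating-point operations per second

# Integer ratio of the two peak rates.
speedup = EXAFLOPS // PETAFLOPS
print(f"Exascale / petascale = {speedup}x")  # prints "Exascale / petascale = 1000x"

# Hypothetical workload size chosen only for illustration; assumes
# sustained peak throughput (real codes run well below peak).
work = 10**21
hours_peta = work / PETAFLOPS / 3600  # seconds converted to hours
hours_exa = work / EXAFLOPS / 3600
print(f"{hours_peta:.1f} h at 1 PFLOPS vs {hours_exa:.2f} h at 1 EFLOPS")
```

The same job that would occupy a petascale machine for days finishes in minutes at exascale, which is the practical meaning of the thousand-fold increase.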

These gains in computational capacity and capability should yield a much deeper understanding of how cancer systems behave biologically at multiple scales: molecular, cellular, multicellular, and tumor. Exascale computing can provide the resources needed to refine models that simulate across these scales. Such models can reveal how protein modifications, genomic aberrations, epigenomic changes, complex structural dynamics, transport, and signaling affect disease processes, including initiation, progression, and metastasis.

The NCI and DOE are jointly pioneering advances in both informatics and information technology that improve our understanding of cancer biology and its clinical applications. Because the effort couples driving scientific questions with complex computing challenges, its collaborative principle is known as "co-design" or "co-development." It is a team-oriented, iterative approach through which advances in technical capability and biological knowledge mutually inform each other. The close synergy between biological insights and technical developments creates paths that lead to further discoveries, questions, hypotheses, and computational solutions, including the use of deep learning to construct predictive models that can be validated and reproduced.

JDACS4C’s Three Pilots and CANDLE

JDACS4C leverages the DOE's pre-exascale technologies and the NCI's cancer-focused computing advances to explore three pilot areas:

Creation of preclinical predictive models
Simulation of molecular multiscale biological models of RAS-related cancers
Application of advanced computational capabilities to the Surveillance, Epidemiology and End Results (SEER) Program’s population-based cancer data.

Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) pilot programs and the Cancer Distributed Learning Environment (CANDLE).

Preclinical Pilot

Using machine learning, the preclinical pilot aims to aggregate, integrate, and analyze large, heterogeneous data sets and use the results to build predictive models. These models are continually refined after validation against experimental data. When combined with the direct study of cells and animal models, these mathematical and statistical approaches can produce hybrid models that better approximate the biology of human cancer. Together, they have the potential to improve risk identification, preclinical drug screening, and treatment selection.

Molecular Pilot

The goal of the molecular-level pilot is to (i) better understand the mechanisms and dynamics of RAS interaction with the plasma membrane and subsequent signal transmission through binding and activation of RAF kinase, and (ii) develop machine learning-powered multiscale computational tools that enable molecular dynamics simulations of realistic biological systems.

Mutant RAS proteins drive at least 30 percent of all human cancers and are implicated in the majority of pancreatic, lung, and colorectal cancers. In oncogenic RAS-driven cancers, RAS signaling is constitutively active, resulting in uncontrolled cell proliferation. Emerging high-resolution experimental approaches, predictive models, machine and deep learning, and new, more finely delineated simulations can deepen our understanding of RAS/RAF/membrane biology and reveal unrealized therapeutic opportunities.

Population (SEER) Pilot

The NCI's SEER Program collects and publishes cancer incidence and survival data from population-based cancer registries covering approximately 34 percent of the U.S. population. Leveraging high-performance computing is expected to advance population-based cancer surveillance and to yield a framework for modeling and simulation that spans from the individual patient to the population level.

CANDLE

The Cancer Distributed Learning Environment (CANDLE) is a widely accessible, open-source computing environment. It encourages the development of next-generation computer simulations that researchers can use to better understand biological processes in cancer and, ultimately, to predict which drugs will be most effective against specific cancers.

CANDLE is a deep learning framework designed to detect complex patterns in large data sets that may be invisible to researchers. Funded by the DOE, it supports the deep learning needs of all JDACS4C pilots. Using this high-performance computing environment, the DOE scientific leads tackle pilot-specific deep learning challenges, while the Frederick National Lab translates the environment for use by the broader cancer research community.

The JDACS4C collaboration continues to show how team science and multi-institution collaborations can benefit from high-performance computing and artificial intelligence, and bring new insights to cancer research. We are continually sharing advances from across the pilots with the community for application on new sets of problems and expansion of the effort’s impact.

The author presented this information during a minisymposium at the 2019 SIAM Conference on Computational Science and Engineering, which took place earlier this year in Spokane, Wash.

George Zaki is a bioinformatics manager with the Frederick National Laboratory for Cancer Research and part of the Biomedical Informatics and Data Science directorate. He works closely with the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program to enable tools, pipelines, and machine learning algorithms for effective use in cancer research.