Structure-based Drug Discovery and Ensemble Docking for SARS-CoV-2 Proteins

By Lina Sorg

Drug discovery, the process by which researchers identify and develop potential new medicines and novel drug treatments that interact with therapeutic targets, is critically important for understanding and mitigating SARS-CoV-2 and COVID-19. In particular, structure-based drug discovery and design relies upon the three-dimensional (3D) structure of the biomolecular targets. These drug targets are typically protein models with 3D architectures that contain specific binding sites at which small molecules bind. The free energy difference between the bound and unbound states, which sets the balance between reactants and products in a given reaction, determines how strongly a molecule binds. “Understanding the physics of this would be ideal because that’s what nature does to bind the small molecule,” Jeremy C. Smith of Oak Ridge National Laboratory said.
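In standard thermodynamic notation (a textbook relation, not an equation shown in the talk), the binding free energy for a protein P and a ligand L, and its link to the dissociation constant K_d, can be written as:

```latex
\Delta G_{\text{bind}} = G_{PL} - \left( G_{P} + G_{L} \right),
\qquad
\Delta G^{\circ}_{\text{bind}} = RT \ln\!\left( \frac{K_d}{c^{\circ}} \right)
```

Here R is the gas constant, T is the absolute temperature, and c° is the standard concentration (1 M). A more negative binding free energy corresponds to a smaller K_d and therefore to tighter binding.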

Figure 1. Screenshot of a molecular dynamics (MD) simulation that depicts targets’ internal motions.

The strength of chemical binding helps researchers determine whether a drug is efficacious. It is also important in ascertaining the safety of prospective drugs, as off-target binding (when a molecule binds to an unintended protein) is a major source of side effects and toxicity. During a minisymposium presentation at the 2021 SIAM Conference on Computational Science and Engineering, which is taking place virtually this week, Smith discussed his efforts to develop a supercomputer-driven pipeline for in silico drug discovery to treat COVID-19. To do so, he focused on physics-based interactions.

Smith began his presentation with a discussion about virtual screening, a computational technique that searches libraries of small molecules and identifies the structures that are most likely to bind to a drug target. “Virtual screening is making a comeback,” Smith said. It is becoming more central to current researchers’ drug discovery efforts, in part due to the increasing speed of computers and the higher proportion of targets. On average, nine out of 10 chemicals that scientists predict will bind actually fail to do so. Because many of the compounds in a random library will not be a “hit” and experts cannot perfectly predict which molecules will bind, Smith and his team instead aimed to enrich the library to yield a higher density of “hits.”
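One common way to quantify library enrichment is the enrichment factor: the hit rate among the top-ranked compounds divided by the hit rate of the whole library. The sketch below is illustrative only. The metric is standard in virtual screening but was not named in the talk, and the scores and activity labels are hypothetical.

```python
def enrichment_factor(scores, is_active, top_fraction):
    """Hit rate in the top-ranked fraction divided by the overall hit rate.

    scores: docking scores, where lower (more negative) means stronger
        predicted binding.
    is_active: parallel list of booleans (True if the compound really binds).
    top_fraction: fraction of the ranked library to inspect, e.g. 0.2.
    """
    if not scores or len(scores) != len(is_active):
        raise ValueError("scores and is_active must be non-empty and equal in length")
    n_total = len(scores)
    n_top = max(1, round(n_total * top_fraction))
    # Rank compounds from best (lowest) to worst (highest) score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    hits_top = sum(active for _, active in ranked[:n_top])
    hits_total = sum(is_active)
    if hits_total == 0:
        return 0.0
    return (hits_top / n_top) / (hits_total / n_total)


# Hypothetical toy library: 10 compounds, 2 of which are true binders.
scores = [-9.1, -8.7, -7.9, -7.5, -7.2, -6.8, -6.5, -6.1, -5.9, -5.2]
is_active = [True, False, False, True, False, False, False, False, False, False]

ef = enrichment_factor(scores, is_active, top_fraction=0.2)
print(f"Enrichment factor in the top 20%: {ef:.1f}")
```

A value above 1 means that the top of the ranked list contains a higher density of hits than a random draw from the library would.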

Increasing computing power has allowed computational scientists to account for the reality of the targets’ wiggling motion. Smith shared a molecular dynamics (MD) simulation that depicted the internal movements of the targets (see Figure 1). Binding sites therefore change their shapes over time, allowing researchers to examine the different shapes to better understand the binding process.

Smith next moved to ensemble docking, a process of structure-based drug discovery that utilizes dynamical simulations of target proteins and MD results to dock compound databases into representative protein binding-site conformations. This method accounts for the binding sites’ dynamic properties. Ensemble docking is a primary technique in COVID-19 drug discovery studies, but Smith has been utilizing the practice for several years. To orient listeners, he presented an image of a SARS-CoV-2 virion that identifies a myriad of potential targets for drugs (see Figure 2).
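The ranking step of ensemble docking can be sketched as follows. This is a minimal illustration rather than the pipeline from the talk: it assumes that every compound has already been docked into each representative conformation, that lower scores indicate stronger predicted binding, and that a compound is ranked by its best score across the ensemble (one common aggregation choice). The compound names and scores are hypothetical.

```python
def rank_by_best_conformation(score_table):
    """Rank compounds by their best docking score over all conformations.

    score_table: dict mapping a compound name to a list of docking scores
        (kcal/mol, lower is better), one score per receptor conformation.
    Returns a list of (compound, (best_score, conformation_index)) tuples,
    sorted from strongest to weakest predicted binder.
    """
    best = {}
    for compound, scores in score_table.items():
        # Index of the conformation in which this compound scores best.
        best_index = min(range(len(scores)), key=scores.__getitem__)
        best[compound] = (scores[best_index], best_index)
    return sorted(best.items(), key=lambda item: item[1][0])


# Hypothetical scores for three compounds docked into three MD snapshots.
score_table = {
    "compound_A": [-6.2, -8.4, -7.0],
    "compound_B": [-7.5, -7.1, -7.3],
    "compound_C": [-5.0, -5.8, -9.0],
}

ranking = rank_by_best_conformation(score_table)
for compound, (score, conformation) in ranking:
    print(f"{compound}: best score {score} in conformation {conformation}")
```

Docking against a single rigid structure would have missed compound_C here, since it only scores well in the third conformation.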

Figure 2. SARS-CoV-2 virion.

Before transitioning into the specifics of the supercomputing pipeline for COVID-19 drug discovery, Smith summarized the general discovery process. Early-stage drug discovery relies upon a classical mechanism that inhibits an enzyme or receptor, specifically to stop the enzyme from catalyzing a reaction or to prevent the receptor from changing its shape. After selecting viral or human protein targets that they think will have therapeutic interest, researchers use high-performance computing (HPC) methods to perform MD simulations and generate target configurations. Virtual high-throughput screening then docks compound libraries to the target configurations by means of HPC. After scientists rank the compounds, experimental testing of the top-ranked compounds commences and ultimately concludes with clinical trials.

When the SARS-CoV-2 genome became available in mid-January of 2020, Smith’s colleague at Oak Ridge built models of the SARS-CoV-2 spike protein and executed ensemble docking on the Summit supercomputer at the Oak Ridge Leadership Computing Facility. This marked one of the first instances of supercomputing in the context of COVID-19. However, Smith realized that the team needed to shorten the computational time to solution. “The way we make dynamics calculations run efficiently on a supercomputer is by employing a method called enhanced sampling of the structures,” he said. This practice involves the creation of replicas of the protein structure that are simulated at slightly different, increasing temperatures. The group then conducted a replica exchange MD simulation, which swaps temperatures between replicas to rapidly create a statistically correct ensemble of the conformational space that the proteins will sample. This approach scales very well to the whole supercomputing machine and allows the researchers to sample massive amounts of conformational space in a short time.
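The exchange step in replica exchange MD can be illustrated with the standard Metropolis swap criterion. The sketch below is a toy under stated assumptions, not the team's production code: the geometric temperature ladder, the energies, and all function names are hypothetical, and the energies are supplied directly rather than computed by an MD engine.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def temperature_ladder(t_min, t_max, n_replicas):
    """Geometrically spaced temperatures from t_min to t_max (assumed spacing)."""
    if n_replicas < 2:
        raise ValueError("need at least two replicas")
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio**i for i in range(n_replicas)]


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance: min(1, exp[(beta_i - beta_j) * (E_i - E_j)])."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_neighbor_swaps(states, temps, rng, offset=0):
    """Try to swap configurations between neighboring temperature slots.

    states[k] is a (label, potential_energy) tuple for the configuration
    currently simulated at temps[k]. Returns the new assignment.
    """
    states = list(states)
    for k in range(offset, len(states) - 1, 2):
        energy_low_t = states[k][1]
        energy_high_t = states[k + 1][1]
        p = swap_probability(energy_low_t, temps[k], energy_high_t, temps[k + 1])
        if rng.random() < p:
            states[k], states[k + 1] = states[k + 1], states[k]
    return states


temps = temperature_ladder(300.0, 400.0, 4)
# Hypothetical potential energies (kcal/mol) for four replicas.
states = [("r0", -100.0), ("r1", -105.0), ("r2", -90.0), ("r3", -95.0)]
new_states = attempt_neighbor_swaps(states, temps, random.Random(0))
print([label for label, _ in new_states])
```

In both neighboring pairs of this toy input, the hotter slot holds the lower-energy configuration, so each swap is accepted with probability 1 and the lower-energy structures move to the colder temperatures. In a real run, swap attempts alternate with stretches of MD at each temperature.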

Smith concluded his talk with some comments about the preliminary results for 23 systems that involve eight proteins in the SARS-CoV-2 proteome. He and his colleagues have ensemble docked repurposing databases to 10 configurations of each SARS-CoV-2 system via AutoDock Vina (an open-source program for molecular docking). Running the graphics processing unit (GPU) version of AutoDock on the Summit supercomputer facilitates the docking of one billion compounds in less than 24 hours, thus expediting an otherwise very tedious process and advancing understanding of SARS-CoV-2 and COVID-19.
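A quick back-of-the-envelope calculation shows what that throughput implies. The aggregate rate follows directly from the figures above. The per-GPU rate additionally assumes that all 27,648 of Summit's GPUs (4,608 nodes with six GPUs each) take part, which the talk summary does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_rate(n_compounds, wall_time_seconds):
    """Minimum average docking rate (compounds per second) to finish in time."""
    return n_compounds / wall_time_seconds


# One billion compounds in 24 hours, as quoted above.
aggregate = required_rate(1_000_000_000, SECONDS_PER_DAY)

# Assumption: the full machine is used (4,608 nodes x 6 GPUs per node).
ASSUMED_GPUS = 4608 * 6
per_gpu = aggregate / ASSUMED_GPUS

print(f"Aggregate rate: {aggregate:,.0f} compounds per second")
print(f"Per-GPU rate under the full-machine assumption: {per_gpu:.2f} compounds per second")
```

Because the run finishes in less than 24 hours, these figures are lower bounds on the actual average rates.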

Lina Sorg is the managing editor of SIAM News.