A Combinatorial Approach to Predict RNA Structure

By Karthika Swamy Cohen

Ribonucleic acid molecules, or RNA, come in many forms. Both regulatory RNAs, which regulate gene expression, and messenger RNAs, which code for proteins, contain highly structured components that are essential to their function.

Many of the coding regions in RNA contain functional structural motifs. Prediction of RNA structure is not only important for our understanding of basic biological functions, but is also crucial to understanding many medical conditions. However, information about RNA structures is lacking, since most of our knowledge comes exclusively from primary sequence data, and tools for determining secondary and tertiary structure are limited.

Knowing RNA sequence base pairings is critical to understanding its function. However, predicting a minimum free energy secondary structure, even for short sequences, does not always result in the native secondary structure.

At a minisymposium at the SIAM Annual Meeting held in Pittsburgh, Pa., earlier this month, Christine Heitsch of the Georgia Institute of Technology described mathematical methods that can be used for RNA structure prediction.

Figure 1. Predicted minimum free energy secondary structures of the four Vibrio cholerae quorum-regulating RNAs, drawn on a petri dish. Image credit: Henke J. and Bassler B. Princeton.

Heitsch began with an introduction to regulatory RNA molecules that control quorum sensing in bacteria, a cell–cell communication mechanism that the organisms use for collective regulation of gene expression.

The five “Qrr sRNAs” that control quorum sensing have a high degree of sequence similarity, but surprisingly, their structures are relatively different. The similarities in sequences imply a high degree of functional significance, Heitsch said, despite the differences in structure.

“What insights can mathematics give us about biology?” Heitsch asked. Her group wanted to understand the structural differences in these molecules using mathematics. However, they were challenged by the fact that the sequences had too many suboptimal folds. For instance, while the first sequence can form a three-arm structure, it can also form a four-arm structure. But the thermodynamics implies that the three-arm structure is more favorable.

Hence, prediction accuracy improves when suboptimal structures are considered. Currently, the method typically used is to sample structures stochastically from the Boltzmann distribution and identify the set of base pairs that dominate the low-energy secondary structures, and hence are more probable in nature. The challenge is to extract the most meaningful structural signal from a noisy Boltzmann sample.
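The idea of finding base pairs that dominate a sample can be illustrated with a minimal Python sketch. It assumes the sampled structures are already available as dot-bracket strings; the toy sample and the function names `base_pairs` and `pair_frequencies` are illustrative assumptions, not part of any tool mentioned in the talk.

```python
from collections import Counter

def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs (0-indexed) in a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)               # remember the opening position
        elif ch == ")":
            pairs.add((stack.pop(), j))   # pair it with the most recent open
    return pairs

def pair_frequencies(sample):
    """Fraction of sampled structures that contain each base pair."""
    counts = Counter()
    for structure in sample:
        counts.update(base_pairs(structure))
    return {pair: c / len(sample) for pair, c in counts.items()}

# Toy sample (made up for illustration); a real sample would be drawn
# stochastically from the Boltzmann distribution by a folding program.
sample = ["((..))", "((..))", "(....)", "......"]
freqs = pair_frequencies(sample)
```

In this toy sample the outer pair (0, 5) appears in three of four structures and the inner pair (1, 4) in two of four, so the outer pair would count as the stronger signal.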

Figure 2. Conservation strongly implies functional significance, yet minimum free energy predictions for RNA vary and native structures are not yet known.

High-probability helices correlate with native pairing and generally provide a structural signal strong enough to be identified by visual inspection. However, outside these well-determined regions, the signal is not as clear: some regions have significant competing alternatives, and it is hard to predict what the substructure might be in those regions.

Resolving these regions is important for understanding RNA structure and function, since in many cases, RNA functionality may depend on switching from one conformation to another. Current techniques identify dominant combinations of base pairs by dividing the Boltzmann sample into groups and determining a representative structure for each one. However, support for different substructures can be lost within a group or diluted across groups.

Given these pitfalls, Heitsch uses a novel combinatorial method, RNA structure profiling, which identifies the most probable combinations of base pairs across the Boltzmann ensemble. Her group uses ensemble-based approaches to predict the secondary structure of RNA molecules; the method identifies patterns in structural elements across a Boltzmann sample and is based on classifying structures as defined by features chosen from well-defined structural units called helix classes.

The first step in the approach is to create a Boltzmann sample, and then generate a list of helix classes. This is followed by determining the features of helix classes based on frequency, and finally selecting profiles based on frequency. This consolidates structural signals across profiles.
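The steps above can be sketched in Python as follows. This is a simplified illustration, not Heitsch's implementation: it assumes each sampled structure has already been reduced to a set of helix class labels, and the fixed frequency cutoffs (`feature_cutoff`, `profile_cutoff`) are hypothetical stand-ins for however the thresholds are actually chosen.

```python
from collections import Counter

def select_by_frequency(counts, total, min_fraction):
    """Keep the items that occur in at least min_fraction of the sample."""
    return {item for item, c in counts.items() if c / total >= min_fraction}

def profile_sample(sample, feature_cutoff=0.5, profile_cutoff=0.5):
    """sample: list of sets, each holding one structure's helix class labels."""
    n = len(sample)
    # Count how many sampled structures contain each helix class.
    helix_counts = Counter(h for structure in sample for h in structure)
    # Features are the frequently occurring helix classes.
    features = select_by_frequency(helix_counts, n, feature_cutoff)
    # A structure's profile is the set of features it contains.
    profile_counts = Counter(frozenset(s & features) for s in sample)
    # Selected profiles are the frequently occurring feature combinations.
    selected = select_by_frequency(profile_counts, n, profile_cutoff)
    return features, profile_counts, selected

# Toy sample with made-up helix class labels.
sample = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"}, {"A", "C"}]
features, profile_counts, selected = profile_sample(sample)
```

Here the rare helix class "D" is not a feature, so the second and third structures collapse into the same profile {A, B}, which is how signal spread over slightly different structures gets consolidated.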

Heitsch demonstrated combinatorial profiling as a straightforward, stable, and comprehensive method, which clearly separates structural signal from thermodynamic noise, and has significant implications for precision, conditioning, and potency of RNA secondary structure prediction.

Karthika Swamy Cohen is the managing editor of SIAM News.