Viruses are a major public health burden. The continued emergence of viruses such as Zika and the large number of HIV infections worldwide are only two of many examples illustrating the urgent need for new anti-viral solutions. Tackling this issue requires a better understanding of the mechanisms by which viruses form and infect their hosts.
Group, graph, and tiling theory can provide new insights into the architecture and formation of virus particles. Such insights are made possible by the highly regular geometric structures of the viral protein containers that encapsulate and thus provide protection to the viral genomes. In many viruses, these capsids resemble surface lattices with icosahedral symmetry, akin to Buckminster Fuller domes. We have developed group theoretical techniques to investigate the geometric constraints on such container architectures [7], and have shown that similar principles apply more widely in science, e.g. in fullerenes in carbon chemistry [1, 10]. We have also extended our tiling approach, Viral Tiling Theory, to assist experimentalists in understanding the geometries of self-assembling protein nanoparticles (SAPNs) [6]. The structural organisation of these particles, which are used to design malaria vaccines, follows mathematical rules similar to those underpinning the assembly of virus particles.
These structural models have paved the way for a major discovery that has fundamentally changed our understanding of virus formation. For decades researchers thought that viral genomes act like passive passengers in the formation of the viral capsids. Their impact on the formation process, if any, was attributed to electrostatics alone, triggered by the condensation of positively-charged capsid proteins on negatively-charged RNAs. Using mathematical insights, we have demonstrated that this view is not sufficient to account for the formation of single-stranded RNA viruses, a major group of viruses containing important human pathogens such as Hepatitis C, HIV, and the common cold.
The key to this discovery is rooted in combinatorial and graph-theoretical arguments exploiting viral geometry. The contact points between the encapsulated genomic RNA and the inner capsid surface act as the vertices of a polyhedron related in shape and symmetry to that of the capsid itself. The order in which the RNA-protein contacts are formed must correspond to a connected path visiting every vertex exactly once. Thus, the genome organisation in proximity to the capsid shell is topologically equivalent to a Hamiltonian path on this polyhedron (see Figures 1 and 2).
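The notion of a Hamiltonian path used here can be made concrete with a small sketch. The code below is only an illustration, not the classification procedure used in the cited work: it enumerates all directed Hamiltonian paths of a graph by depth-first backtracking. The function names and the two example polyhedra (tetrahedron and octahedron) are assumptions chosen for brevity; the polyhedra relevant to a real capsid are larger.

```python
def hamiltonian_paths(adj):
    """Yield every directed Hamiltonian path of a graph.

    `adj` maps each vertex to the set of its neighbours.
    A path and its reverse are reported as two separate paths.
    """
    n = len(adj)

    def extend(path, visited):
        if len(path) == n:          # every vertex visited exactly once
            yield tuple(path)
            return
        for nxt in sorted(adj[path[-1]]):
            if nxt not in visited:
                path.append(nxt)
                visited.add(nxt)
                yield from extend(path, visited)
                path.pop()          # backtrack
                visited.remove(nxt)

    for start in sorted(adj):
        yield from extend([start], {start})


def is_hamiltonian_path(adj, path):
    """Check that `path` follows edges and visits every vertex exactly once."""
    return (len(path) == len(adj)
            and set(path) == set(adj)
            and all(b in adj[a] for a, b in zip(path, path[1:])))


# Illustrative polyhedra (hypothetical examples, not viral geometries).
# Tetrahedron: every vertex is adjacent to every other vertex.
tetrahedron = {v: {w for w in range(4) if w != v} for v in range(4)}
# Octahedron: every vertex is adjacent to all others except its antipode.
octahedron = {v: {w for w in range(6) if w not in (v, (v + 3) % 6)}
              for v in range(6)}

print(len(list(hamiltonian_paths(tetrahedron))))  # 24, i.e. all 4! vertex orderings
print(next(hamiltonian_paths(octahedron)))
```

Brute-force enumeration of this kind is feasible only for small polyhedra, which is one reason why symmetry arguments matter when classifying paths.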
Figure 1. The Hamiltonian path representing genome organization in proximity to the MS2 capsid is shown in yellow inside a crystal structure of the capsid, together with a genomic fragment containing packaging signals (PSs) identified in [3]. The core sequence motif, the two As shown in magenta, is very sparse, demonstrating why sequence analysis alone is not sufficient to identify these PSs. Image credit: Richard Bingham, adapted from a figure by Tom Keef.
A classification of these Hamiltonian paths is a powerful tool in interrogating viral genomes for the existence of sequence-specific contacts between genomes and their capsids [3]. Such contacts are difficult to identify with bioinformatics alone due to the lack of any repeated, contiguous sequence motif of sufficient length in the genome, perhaps accounting for the long-held belief that any interactions between genome and capsid must rely entirely on electrostatics. Using our Hamiltonian paths classification, we showed that, by contrast, there is an ensemble of cryptic signals that vary around a minimal core sequence motif. These signals, which we termed packaging signals (PSs) in analogy to the single known specific contact previously identified by virologists, correspond to short, in many cases even disconnected, sequence elements presented in the context of specific types of RNA secondary structures, i.e. RNA shapes arising as a consequence of Watson-Crick base pairing (see Figure 1). A striking outcome of this work is the conclusion that genome organisation is much more constrained inside viral particles than previously appreciated. Indeed, only a very small number of the possible Hamiltonian path organisations can actually be realized by a virus particle for geometric and combinatorial reasons. This initially surprising result was subsequently found to agree closely with cryo-electron microscopy data [5] (see Figure 3), corroborating the mathematical conclusion.
Figure 2. A planar net representing the MS2 capsid architecture shows the location of the Hamiltonian path in Figure 1 with respect to the positions of the capsid building blocks, here represented as rhombs. Blue/green rhombs are in contact with PSs in the viral genome, and the yellow Hamiltonian path meets every such contact exactly once. Image credit: Richard Bingham, adapted from a figure in [5] by James Geraets.
The variation around a minimal core sequence motif in the PS ensemble of a viral genome may explain why this hidden code has so long been overlooked. It also opens up the puzzle of how this code actually functions. To address the mechanistic implications of this code and provide an explanation for this variation in the recognition motifs, we used Gillespie-type algorithms to study the assembly of a dodecahedral shell as a proxy for a viral capsid [2, 4]. We monitored the assembly of the pentagonal building blocks in the presence of hypothetical RNAs, each with 12 PSs capable of binding to these assembling units. From a biophysical point of view, the variation of the PS motifs across the genome manifests itself in differing PS affinities for capsid protein. We therefore allowed the affinities of individual PSs in a viral RNA to vary between three settings, representing weak, intermediate, and strong interactions.
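To give a flavour of what a Gillespie-type simulation involves, the following sketch applies Gillespie's direct method to a deliberately reduced toy problem: capsid protein binding to and unbinding from 12 independent PSs with three affinity classes. It is not the assembly model of [2, 4], which also tracks the formation of the shell itself. All rate constants, the mapping of affinity classes to unbinding rates, and the function name are hypothetical values chosen for illustration.

```python
import random

# Hypothetical unbinding rates for the three affinity classes (assumed values).
K_OFF = {"weak": 10.0, "intermediate": 1.0, "strong": 0.1}


def gillespie_ps_binding(k_off, k_on=1.0, protein=1.0, t_end=200.0, seed=0):
    """Gillespie direct method for independent PS binding sites.

    k_off   : list of unbinding rates, one per PS
    k_on    : binding rate constant (assumed identical for all PSs)
    protein : free protein concentration, held constant in this toy model
    Returns the fraction of time each PS spent bound.
    """
    rng = random.Random(seed)
    n = len(k_off)
    bound = [False] * n
    bound_time = [0.0] * n
    t = 0.0
    while t < t_end:
        # Propensity of the single reaction available at each site.
        rates = [k_off[i] if bound[i] else k_on * protein for i in range(n)]
        total = sum(rates)
        if total == 0.0:
            dt = t_end - t                      # no reaction can fire any more
        else:
            dt = min(rng.expovariate(total), t_end - t)
        for i in range(n):
            if bound[i]:
                bound_time[i] += dt
        t += dt
        if total == 0.0 or t >= t_end:
            break
        # Pick the site that reacts, with probability proportional to its rate.
        site = rng.choices(range(n), weights=rates)[0]
        bound[site] = not bound[site]
    return [bt / t_end for bt in bound_time]


# One hypothetical RNA: four weak, four intermediate, and four strong PSs.
classes = ["weak"] * 4 + ["intermediate"] * 4 + ["strong"] * 4
occupancy = gillespie_ps_binding([K_OFF[c] for c in classes])
for c, occ in zip(classes, occupancy):
    print(f"{c:12s} {occ:.2f}")
```

In a fuller model, the free protein concentration would change over time and binding events would be coupled to the geometry of the growing shell; the sketch only shows the event-driven sampling scheme itself.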
Figure 3. The MS2 capsid is shown from the outside (left half), and as a cross-sectional view revealing ordered RNA density (right half) based on results in [5]. Our Hamiltonian path corresponds to the outer ring of RNA density in proximity to the capsid. Image credit: Richard Bingham, adapted from a figure in [5] by James Geraets.
Interestingly, we observed significant differences in capsid yield for assembly around RNAs with distinct affinity distributions. The simulations alone could not explain why this was the case, but graph theory again was key in providing answers. We used Hamiltonian paths on the polyhedron representing all possible connections between neighboring binding sites—in this case an icosahedron—to classify different assembly scenarios. We showed that RNAs with better performing affinity distributions were organised inside the fully-assembled capsids via a very limited range of Hamiltonian paths. Translating these Hamiltonian path organisations into information regarding the geometries of the partially-formed capsids on the pathway to the complete particle showed that PSs act collectively to bias their geometries towards structures with larger numbers of protein-protein bonds. The hidden PS code thus acts as a construction manual for viral capsids, solving the viral equivalent to Levinthal’s Paradox in protein folding. Indeed, the PS code directs assembly towards a small number of efficient pathways within the vast complexity of combinatorially possible ones, hence giving the virus an advantage in the arms race against the host’s immune defenses.
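The link between a Hamiltonian path and the geometry of assembly intermediates can be illustrated with a toy calculation. The sketch below builds the icosahedron graph (one vertex per pentagonal building block of the dodecahedral shell), enumerates the Hamiltonian paths starting from one vertex, and counts the contacts between neighbouring building blocks in each intermediate. It assumes, purely for illustration, that building blocks are recruited in the order in which the path visits them and that every pair of neighbours forms exactly one bond; neither assumption is taken from the cited simulations, and all function names are hypothetical.

```python
from itertools import combinations, product

PHI = (1 + 5 ** 0.5) / 2  # golden ratio


def icosahedron_graph():
    """Adjacency sets of the icosahedron, from standard vertex coordinates."""
    verts = []
    for s1, s2 in product((1, -1), repeat=2):
        # Cyclic permutations of (0, +-1, +-PHI).
        verts.append((0.0, s1 * 1.0, s2 * PHI))
        verts.append((s1 * 1.0, s2 * PHI, 0.0))
        verts.append((s2 * PHI, 0.0, s1 * 1.0))
    adj = {i: set() for i in range(len(verts))}
    for i, j in combinations(range(len(verts)), 2):
        d2 = sum((a - b) ** 2 for a, b in zip(verts[i], verts[j]))
        if abs(d2 - 4.0) < 1e-9:  # the edge length is 2 in these coordinates
            adj[i].add(j)
            adj[j].add(i)
    return adj


def bond_profile(adj, path):
    """Cumulative number of neighbour contacts after each block is added."""
    placed, bonds, profile = set(), 0, []
    for v in path:
        bonds += len(adj[v] & placed)
        placed.add(v)
        profile.append(bonds)
    return profile


def paths_from(adj, start):
    """Yield all Hamiltonian paths beginning at `start` (backtracking)."""
    path, seen = [start], {start}

    def extend():
        if len(path) == len(adj):
            yield tuple(path)
            return
        for nxt in sorted(adj[path[-1]] - seen):
            path.append(nxt)
            seen.add(nxt)
            yield from extend()
            path.pop()
            seen.remove(nxt)

    yield from extend()


adj = icosahedron_graph()
# Score each path by the sum of its bond profile: larger means more compact intermediates.
scored = [(sum(bond_profile(adj, p)), p) for p in paths_from(adj, 0)]
best, worst = max(scored), min(scored)
print("most compact intermediates: ", bond_profile(adj, best[1]))
print("least compact intermediates:", bond_profile(adj, worst[1]))
```

Every path ends with the same 30 contacts of the complete shell, but the intermediates along the way differ in how many contacts they contain, which is the kind of distinction the Hamiltonian path classification makes visible.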
Our modeling offered another astonishing conclusion. The impact of this construction manual can only be observed if capsid protein is ramped up slowly, as in the case of a real viral infection, rather than added in a single step, as is often the case in in vitro experiments. Inspired by this mathematical result, our collaborators at the University of Leeds performed assembly experiments under the condition of such a protein ramp, leading to the first direct experimental demonstration of PS-mediated assembly [8]. In collaboration with experimentalists at the University of Leeds (in particular my Wellcome Trust co-investigator Peter Stockley) and the University of Helsinki, we have since identified PSs in a number of human viruses and jointly hold a patent exploiting this discovery in anti-viral therapy.
A recent review by Peter Prevelige, a world-leading authority on virus assembly, is entitled “Follow the Yellow Brick Road: A Paradigm Shift in Virus Assembly” [9], in reference to our graph theoretical approach; we usually depict Hamiltonian paths in yellow (as in Figures 1 and 2). Ultimately, this research demonstrates that mathematics can drive discovery in molecular biology, functioning as a key player in interdisciplinary efforts to understand how viruses form, evolve, and infect their hosts.
This article is based on an invited lecture by Reidun Twarock at the SIAM Conference on the Life Sciences, which was held in Boston this July.
[1] Dechant, P., Wardman, J., Keef, T., & Twarock, R. (2014). Viruses and fullerenes – symmetry as a common thread? Acta Cryst. A, 70, 162-167.
[2] Dykeman, E.C., Stockley, P.G., & Twarock, R. (2013). Building a viral capsid in the presence of genomic RNA. Phys. Rev. E, 87, 022717.
[3] Dykeman, E.C., Stockley, P.G., & Twarock, R. (2013). Packaging signals in two single-stranded RNA viruses imply a conserved assembly mechanism and geometry of the packaged genome. J. Mol. Biol., 425, 3235-3249.
[4] Dykeman, E.C., Stockley, P.G., & Twarock, R. (2014). Solving a Levinthal’s Paradox for Virus Assembly suggests a novel anti-viral therapy. PNAS, 111, 5361-5366.
[5] Geraets, J.A., Dykeman, E.C., Stockley, P.G., Ranson, N.A., & Twarock, R. (2015). Asymmetric genome organization in an RNA virus revealed via graph-theoretical analysis of tomographic data. PLoS Computational Biology, 11(3), e1004146.
[6] Indelicato, G., Wahome, N., Ringler, P., Müller, S.A., Nieh, M.P., Burkhard, P., & Twarock, R. (2016). Principles Governing the Self-Assembly of Coiled-Coil Protein Nanoparticles. Biophys. J., 110, 646-660.
[7] Keef, T., Wardman, J.P., Ranson, N.A., Stockley, P.G., & Twarock, R. (2013). Structural constraints on the three-dimensional geometry of simple viruses: case studies of a new predictive tool. Acta Cryst. A, 69, 140-150.
[8] Patel, N., Dykeman, E.C., Coutts, R.H.A., Lomonossoff, G., Rowlands, D.J., Phillips, S.E.V….Stockley, P.G. (2015). Revealing the density of encoded functions in a viral RNA. PNAS, 112, 2227-2232.
[9] Prevelige, P. (2016). Follow the Yellow Brick Road: A Paradigm Shift in Virus Assembly. J. Mol. Biol., 428, 416-418.
[10] Verberck, B. (2014, April). Know your onions. Nature Physics, 10, 244.