In the macromolecular world, the evolution of two building blocks (two nucleotides or two amino acids) can be interdependent in various ways, including: (i) a mutation at one site compensates a deleterious mutation at another site or (ii) mutations at two different sites are lethal only when they co-occur in the same genome. These two situations are known as “compensatory mutations” and “synthetic lethals,” respectively. Although the first category has been studied extensively, especially since the 1970s—a period of time during which prokaryotic genetics grew by leaps and bounds—the second remained unstudied until the late 1990s. Studies on yeast first placed synthetic lethals at the forefront; at the beginning of the new century, therapies against cancers relied on such relationships. Finally, in recent years, synthetic lethals were used to develop stable therapies against RNA viruses, and these studies revealed a promising method for developing vaccines against these viruses. Here, we review the current understanding of these two situations and the implications of both compensatory mutations and synthetic lethals for the elucidation of biological pathways, cancer research, evolution, and gene expression.
Keywords: covariation, compensatory mutations, synthetic lethals, drug targets, RNA viruses.
Author info
1 Paris Diderot University, Sorbonne Paris Cité University, Paris, France
2 Jacques Monod Institute, CNRS, UMR7592, Paris, France
3 Department of Functional and Adaptative Biology, CNRS, UMR8251, Paris, France
4 Atelier de Bio Informatique, 75005, Paris, France
Received: Jul 10, 2014. Accepted: Sep 26, 2014. Published: Oct 22, 2014.
Citation: Bazin C, Coupaye R, Middendorp S, Vanet A (2014) Between compensatory mutations and synthetic lethals: genetic mutations, a new challenge for tomorrow's medicine. Science Postprint 1(1): e00035. doi: 10.14340/spp.2014.10R0002
Copyright: ©2014 The Authors. Science Postprint is published by General Healthcare Inc. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs 2.1 Japan (CC BY-NC-ND 2.1 JP) License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
Funding: Our research is not supported by any funding.
Competing interests: No relevant competing interests were disclosed.
Donation message: Our team is working in silico to find a new strategy against RNA viruses such as HIV, influenza virus, Ebola virus, or hepatitis C virus. These viruses are very dangerous because they accumulate mutations that can allow them to escape drugs. To bypass this problem, we have described a bioinformatic method that computes new therapeutic targets using viral protein alignments. We would now like to confirm this concept through biological experiments and to use chemistry studies to define the future drug that will bind to the viral protein.
Patents
Vanet A, Muller-Trutwin M, Valère T, Brouillet S, Ollivier E, Marsan L, inventors. Method for identifying combinations of motifs that do not mutate simultaneously in a set of viral polypeptide sequences comprising a putative drug binding site. Patent US 7917303 B2. Mar 29 2011. Available from: http://www.lens.org/lens/patent/US_7917303_B2.
Vanet A, Muller-Trutwin M, Valère T, inventors. Method for identifying motifs and/or combinations of motifs having a boolean state of predetermined mutation in a set of sequences and its applications. Patent US 7734421 B2. Jun 8 2010. Available from: http://www.lens.org/lens/patent/US_7734421_B2.
Vanet A, Muller-Trutwin M, Valère T, inventors. Method for identifying motifs and/or combinations of motifs having a boolean state of predetermined mutation in a set of sequences and its application. Patent US 8032309 B2. Oct 4 2011. Available from: http://www.lens.org/lens/patent/US_8032309_B2.
Corresponding author: Anne Vanet
Address: Paris Diderot University, Sorbonne Paris Cité University, F-75013, Paris, France
E-mail: anne.vanet@univ-paris-diderot.fr
During the twentieth century, geneticists used logical inference and macroscopic methods to facilitate the discovery of microscopic interactions. The reductionist paradigm was, and still is, that a function must be performed by a structure. When this function is so complex that it requires several molecules to be driven, these various molecules should cooperate, either directly, by interacting to form a protein complex, or indirectly, by sharing the same metabolic or signaling pathway. In the case of protein complexes, it is easy to conceive that the residues mediating the interactions are not independent of each other: they form either pairs or groups of covariant residues, also known as epistatic interactions. We will focus on two major covariant relationships: compensatory mutations (CM) and synthetic lethals (SL). Classically, the mutation that compensates a deleterious mutation has been uncovered by the appearance of revertants. Jarvik and Botstein 1 explain that “if a revertant is not a true wild type, they may have acquired suppressors–new mutations that act so as to correct, replace, or bypass the original defect.” Figure 1 shows a physical analogy involving an electrical circuit to explain the effects of covariant relationships: a mutation causes an increase in electrical resistance, which decreases the current flow. A second mutation decreases the resistance of the other resistor and the current intensity is restored (when the light is on, the combination has a wild-type phenotype, and when it is off, the combination is deleterious). In the case of metabolic or signaling pathways, according to Jarvik and Botstein, reversion can imply a restoration of the original sequence (Figure 1B). Otherwise, a first deleterious mutation appears at random in the genome followed by an additional mutation elsewhere in the genome (second-site reversion), which will remedy the defect caused by the first mutation (Figure 1C). The latter mutation is known as a CM. 
Both mutations can be located in the same gene (intragenic CM) or in different genes (intergenic CM). In the latter case, they often involve actors in the same metabolic pathway or protein complex. Interacting proteins within complexes are studied by detecting covariant pairs, as follows: a first mutation abolishes a function, which is restored by a second mutation. Thus, the return to a wild-type phenotype is made possible either by the restoration of the mutated factor’s functionality itself, or through a change elsewhere that compensates for the abolition of the first activity. Structural CMs are also powerful tools for defining spatial interactions. All of these scenarios will be treated in extenso below.
We illustrate all possible scenarios that compare double mutations to the wild type: when the light is on, the combination produces a wild-type phenotype; when it is off, the combination is deleterious. M1 to M6 are point mutations.
A: wild-type circuit configuration.
B: revertant at the same position circuit configuration. M1 is a deleterious mutation at position X, M2 is a mutation at the same position, which restores the wild type sequence.
C: compensatory mutation pairs circuit configuration. The first mutation, M3, is deleterious alone but is compensated by the second mutation, M4, which has no effect alone and is not located at the same position.
D: synthetic lethal pair circuit configuration. M5 and M6 have no effect alone, but together are lethal.
For their part, SL pairs were finally recognized mainly through studies of Saccharomyces cerevisiae. Giaever’s team 2 has shown that no more than a thousand proteins (out of more than six thousand) perform the so-called essential functions: only 18.2% of yeast genes are absolutely essential. However, the number of supposed essential functions is larger than this, which argues for the existence of functions performed by protein complexes or by alternative pathways. These complexes and/or pathways can be borne by covariant phenomena involving SL pairs. Synthetic lethality describes a genetic interaction in which the combination of two separately non-lethal mutations results in lethality (Figure 1D). The central question involves identifying residues of several proteins that either cooperate to perform these essential functions (in the case of CMs) or that cannot co-occur due to synthetic lethality.
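The two relationships can be restated as a small decision rule on three observations: whether each single mutant, and the double mutant, keeps a wild-type phenotype. The sketch below only illustrates the definitions used in this review; the function name `classify_pair` and the labels it returns are hypothetical, and real phenotypes are quantitative rather than boolean.

```python
def classify_pair(a_alone_ok: bool, b_alone_ok: bool, both_ok: bool) -> str:
    """Classify a pair of mutations from three boolean phenotypes.

    Each argument is True when the corresponding genotype keeps a
    wild-type phenotype (the light is on in the circuit analogy).
    """
    if a_alone_ok and b_alone_ok and not both_ok:
        # Neither mutation has an effect alone, but together they are lethal.
        return "synthetic lethal"
    if both_ok and not (a_alone_ok and b_alone_ok):
        # At least one mutation is deleterious alone; the combination is not.
        return "compensatory"
    if a_alone_ok and b_alone_ok and both_ok:
        return "neutral"
    # The double mutant is deleterious and so is at least one single mutant.
    return "deleterious"


# M3 is deleterious alone, M4 is silent alone, together they are wild type.
print(classify_pair(False, True, True))   # compensatory
# M5 and M6 are silent alone, lethal together.
print(classify_pair(True, True, False))   # synthetic lethal
```

Under this rule, two mutations that are each deleterious alone but wild type together are also labeled compensatory, which is a simplification chosen for the sketch.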
Originally CM and SL were studied using genetic, biochemical, and evolutionary biology methods. In the genomic era, bioinformatics can be used to detect them in interacting networks. These historical fields share a rich vocabulary behind which are hidden the intricacies related to biological covariation: epistasis, coevolved residues, covariants, CMs, extragenic suppressors, revertants, correlated mutations, etc. We focused on these two concepts because they initially represented a tool for studying biological functions and they have now become a powerful therapeutic tool. We also advance the vision that the landscape of epistasis is richer than previously supposed because of the existence of networks involving CM and SL.
Originally, prokaryotic organisms were the reference model for studying metabolic pathways using CM. One of the first famous revertants was identified by Beckwith. It restored the sensitivity to the Lactose operon repressor 3 and then the wild-type phenotype. Following the tremendous work by Jacob and Monod, the notion of CM expanded beyond the questions of phenotype and genetic loci by introducing the concept of individual nucleotides in the DNA sequence. This moment marks the transition from classical genetics to molecular biology, which subsequently opened the way to genomics. CM could henceforth be identified with all possible molecular precision.
After decades of work on prokaryotic models, CMs were analyzed in haploid eukaryotes, yeast in particular, in order to dissect the analyzable pathways. In a covariant pair, the second mutation is called the suppressor, in reference to the revertant phenotype it causes. The second mutation can be located either in the originally mutated gene (intragenic suppressor) or in another gene (extragenic suppressor).
How should functional reversion by an extragenic suppressor be interpreted? Figure 2 illustrates the simplest case, in which two interacting proteins are necessary for a function 4. In the case of proteins that physically interact, as in a "key/lock" mechanism, if the lock is changed, the key must also be changed to retain the function. This implies that CMs have physical and/or structural interactions, and the second mutation then belongs to the class of conformational suppressors. The suppressor reflects compensating interactions between gene products that interact physically via a “lock and key” mechanism (Figure 2). Understanding this mechanism opened various domains of research, especially the study of functional biological pathways 4: CMs between proteins in the same pathway in the same organism 5-7, between pathways that were not a priori related 8, or between two different organisms such as a host and a pathogen 9, 10 have greatly advanced the comprehension of pathological mechanisms.
CMs also exist in the RNA domain 11, 12, mostly between snoRNA and rRNA, and are important in understanding ribosome structure (Figure 3A). Suppressors of ribosomal frameshifts allow a P-site realignment during translation 13. Some CMs have even been described between a protein and an RNA 14-16.
A: As explained in 11, compensatory mutations in one-half of helix VI fully restored cell growth. B: Here two point mutations alter the secondary structure, which is restored by two compensatory mutations. WT: wild-type RNA.
The concept of CM is very broad and includes all types of protein relationships within the cell. An array approach pioneered in S. cerevisiae has facilitated the development of high-resolution CM genetic mapping techniques; it has shown that CM relationships can be numerous and complex 17.
At another level, important biological fields such as antibiotic resistance have greatly benefited from our growing understanding of such compensatory mechanisms 16, 18, 19 involved in the infectivity of a pathogen 19, which have provided significant openings for development of new therapies.
Not all CMs are necessarily located in close physical proximity to the compensated mutation 20-23. When they are far from each other, the covariant pair cannot be explained by a physical interaction between residues, but rather by a functional relationship participating in the establishment and maintenance of the relationship between secondary structural elements that are distant in the primary sequence and in the three-dimensional structure of the protein 23. For example, intergenic compensation can help restore proper peptide synthesis following an initial mutation that has created a stop codon in the open reading frame. One such example was described for the first time in the 1950s: a tRNA/aminoacyl-tRNA synthetase pair in which the tRNA was charged with an amino acid that did not correspond to the anticodon it carried 24. Following this first example, many other teams have identified other mutations in the translational apparatus 25, 26.
The formation of a biological bypass is one way to overcome a deleterious mutation. Thus, the deletion of the gene for T7 ligase is compensated by other mutations in different metabolic pathways for DNA formation, which recover the original fitness 27. Astonishingly, a novel approach has been proposed that can identify a CM pair using SL genetic interaction analysis 28. This method identifies pairs of genes that are rarely linked to each other by an SL relationship, but that are each connected by many SL interactions with other genes. This approach selects pairs of CMs that repair the deleterious effect of a mutation in one of the two genes, without the need for physical interaction.
An intragenic CM is defined as a mutation that has a beneficial effect on the fitness of an organism that contains a deleterious mutation in the same gene. This secondary mutation (which compensates for the first mutation) does not seem to appear randomly in the gene sequence, but rather close to the deleterious mutation, making this system a predictable mechanism 29. Considering all revertants, the overall frequency of CM was approximately 70%, versus 30% for reverse mutation. Half of all CMs are located intragenically 30. Although this percentage is rather high, it is an underestimate because it does not include deletion-related mutations. Taking this bias into account, the rate of intragenic reversion is estimated at 78%, compared to 22% for extragenic reversion. Moreover, Poon et al. 31, based on a compilation of data from several genes in different organisms, estimate that among deleterious mutations, 20% cannot be compensated, 43% can be compensated by some CMs, and 37% are associated with at least 10 CMs 31, which illustrates the extreme complexity of the interaction networks and demonstrates that few proteins are essential by themselves. Chen et al. 32 focused on tandem-based substitution mutations and showed that CMs appear in 0.4% of single nucleotide substitutions in human germline cells. Bazykin et al. 33, who are interested in multiple amino acid variation, have shown that this phenomenon occurs rapidly via a succession of amino acid replacements at many sites, instead of via relaxed negative selection. Indeed, many different mutational trajectories can be described, but only a few are possible 34.
Determination of intracompensatory pairs of mutations has also been used to determine the location of non-coding RNA, which can function as structural, regulatory, or even catalytic RNA. Indeed, if the position of a gene in a DNA sequence can be predicted by bioinformatic analyses using translation signals and open reading frames, these specific signals cannot be used to identify non-coding RNA.
However, even if these RNAs contain no translation signals, they have a particular secondary structure that must be preserved for them to perform their role 35. Polymorphism analysis of RNA sequences and their secondary structures will highlight the maintenance of these structures via study of CM pairs 36 (Figure 3B). Several software packages have been developed to analyze deleterious-compensatory pairs of allosteric ribozymes and riboswitches 37.
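As a minimal illustration of how CM pairs can reveal a conserved RNA secondary structure, the sketch below checks whether two alignment columns stay able to base-pair while the nucleotides themselves change. It is not one of the software packages cited above; the function name `covariation_support`, the toy alignment, and the decision to count G-U wobble pairs as pairing are assumptions made for the example.

```python
# Nucleotide pairs treated as able to base-pair.
# Including G-U wobble pairs is an assumption of this sketch.
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def covariation_support(alignment, i, j):
    """Summarise how columns i and j of an RNA alignment co-vary.

    Returns the fraction of sequences in which the two positions can
    base-pair, and the number of distinct pairing combinations seen.
    """
    observed = [(seq[i], seq[j]) for seq in alignment]
    paired = [pair for pair in observed if pair in CANONICAL]
    return {
        "fraction_paired": len(paired) / len(observed),
        "distinct_pairs": len(set(paired)),
    }


# Hypothetical aligned sequences: positions 0 and 4 change together.
toy_alignment = ["GAAAC", "CAAAG", "AAAAU"]
print(covariation_support(toy_alignment, 0, 4))
```

A high paired fraction together with several distinct pairs is the signature expected when a substitution on one side of a helix has been compensated on the other side; a high fraction with a single pair only indicates conservation.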
In evolutionary biology, the ability of individuals to reproduce can be summarized as their fitness. All organisms that are able to reproduce have a specific fitness. A new mutation is usually deleterious and sometimes neutral. When the mutation is deleterious, the organism’s fitness decreases. More often than not, a secondary mutation is selected when its appearance compensates for the first mutation and restores the original fitness. External pressure can require selection for a mutation that allows an organism to survive, even if this mutation decreases the general fitness of the organism. Indeed, for 60 years (since humans began using antibiotics to treat infectious diseases), a strong selective pressure has been applied to infectious organisms such as bacteria; later, this pressure was extended to viruses, fungi, insects, and plants through the use of antiviral and antifungal therapies, insecticides, and herbicides. A first mutation, which allows these organisms to resist the treatment, appeared quickly, but in most cases it decreased fitness. Pathogenic organisms possess several mechanisms for recovering their original fitness 38: intragenic compensatory mechanisms have been described above, but we would like to emphasize a study on rifampicin resistance mutations in Pseudomonas aeruginosa. In this work, it was shown that two deleterious mutations, when associated in the same genomic background, often yielded improved fitness and compensated for each other’s costs 39. The same conclusion was reached in studies of other organisms such as HIV 40 and Candida glabrata 41. In these cases, CMs allowed recovery of higher fitness. Corbett-Detig 42 explained that fitness epistasis is widespread within natural populations in which many different alleles are present in the same species.
The material necessary for reproduction is segregated within species, and does not necessarily require the emergence of genetically incompatible mutations, as previously thought. This segregation pressure is only released when divergent lineages hybridize.
What impact do these interactions have on the evolution of species? Phylogeneticists describe coevolution as an example of microadaptation: a CM is an adaptation to the first mutation to appear. This adaptation will be positively selected if the final fitness of the double-mutated organism is higher than the fitness of the wild-type organism 43. If so, how does natural selection influence genetic equilibrium, and how can scientists better understand species’ genetic polymorphisms? Do only neutral mutations accumulate, or can we identify mutations selected to compensate for original deleterious mutations? Coevolution can result from parallel or convergent changes with a very slow substitution rate. Such changes were first described as neutral mutations 44, but were also presented as deleterious mutations, revealing a more highly constrained evolutionary landscape 45, 46.
Unfortunately, this controversy remains unresolved 47. We are of the opinion that if a mutation appears that is deleterious in a given genetic background, the organism disappears. However, in another background the same mutation might not be deleterious and would appear neutral because its CM already exists in this genomic context 48. Genomic screening can differentiate neutral intragenic polymorphism from CM by selecting some revertants, as discussed by Jarvik and Botstein 1.
Kondrashov's team 48 has shown that about 90% of amino acid substitutions have a neutral or beneficial impact only in the genetic background in which they occur, and could be deleterious elsewhere.
Epistasis is not just about compensation, however complex it may be. In “The Genetics of Natural Populations,” published in 1946, Dobzhansky 49 revealed an SL phenotype via crossing and recombination of chromosomes in Drosophila pseudoobscura. When detected by crossbreeding, this phenomenon is termed “synthetic” because it cannot appear in nature, but can only be detected in the laboratory. Except for Lucchesi in 1967 50, who discovered an SL system similar to the one discovered in 1922 by Bridges in Drosophila melanogaster, studies of SL essentially vanished, partially because of the unclear importance of the results, and partially because of the difficulty of detecting SL pairs. In addition, SL can be defined in multiple ways, though with essentially the same meaning. First, when both parts of an SL pair are detected in two different genes, they are called extragenic or intergenic SLs. Second, SLs can form a group of mutations belonging to the same invariance group: alone, these positions are not invariant sites, but they become so when grouped. In this situation, the essential function is not performed by a single amino acid but rather by several (Figure 1D and Figure 4).
The model explains that if binding and function are located on the same amino acids, this site makes an excellent therapeutic target.
N, S, T, Q, K+, R+, −H+, E−, L, and G: amino acids
In the 1990s, Kayser et al. 51 discovered SL gene pairs by studying the genes involved in yeast secretory pathways. Subsequently, several different techniques were developed, such as colony-sectoring assays 52, cell synchronization 53, SL screens 54, and studies of double mutants 55, allowing the discovery of new functions not performed by a single gene but by a pair; these techniques were developed almost exclusively in yeast. Indeed, as already explained, only 18.5% of S. cerevisiae genes were shown to be essential, but essential functions are far more numerous in eukaryotic cells. The experimental accessibility of budding yeast makes it a test-bed for technology development and application. There are hundreds of yeast examples where these techniques showed the importance of SL pairs in specific biological pathways, including morphogenesis 52, 56, 57, the N-end rule pathway 58, a new subclass of nucleoporins 59, 60, the spliceosome assembly pathway 61, replication 62-64, protein translocation 65-67, the cell cycle 53, the cytoskeleton 55, telomere size regulation 68, transcription 69, translation 70, mitosis 71, fission in Schizosaccharomyces pombe 72-74, and budding of S. cerevisiae 75.
SL pairs have also been described in the RNA realm, such as the U2 and U6 snRNA pair, which is involved in selection of pre-mRNA splice sites 76.
In the 2000s, a turning point was reached when the complete sequence of the yeast genome 77 became available, allowing an international consortium to generate a collection of 6,000 deletions 78 in the 6,000 predicted genes of the yeast genome. This opened up new horizons for the global analysis of SL genetic interactions using a method called the synthetic genetic array (SGA). In this screen, the query mutation is crossed to an ordered array of ~5,000 viable gene deletion mutants. Following a series of replica-pinning steps, the meiotic progeny harboring both mutations can be scored for fitness defects 79, 80. This new technique allows the discovery of genes that are in an SL relationship with a specific query gene. Next, the entire SL networks for kinetochores 81, chromosome segregation 82, S. cerevisiae budding 83, and many others were compiled in the DRYGIN database 84. The same SL screening method was developed for S. pombe 85. This new technology allowed the development of the reporter synthetic genetic array (R-SGA), where the mutated query gene is replaced by a reporter gene downstream of a promoter of interest 86, 87, introduced into the same SGA gene deletion collection. This technique allowed the discovery of new promoter trans-acting regulators, where a mutated regulator can regulate the query promoter differently than the WT regulator. The results of S. pombe SGA were compared to the results for S. cerevisiae, leading to the conclusion that 29% of the genetic interactions are common to both species 88. An alternative technique is synthetic lethality analysis by microarray (SLAM), in which the mutated query gene is directly introduced into the yeast knockout mutant collection. The quantitative performance of the microarray allows ranking of candidates 89. Baryshnikova 85 has described another quantitative SGA that uses colony size as a proxy for fitness. An elegant model for inferring compensatory pathways from SGA results in yeast was described by Ma et al. 27.
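To make the idea of scoring double mutants for fitness defects concrete, the sketch below compares the observed fitness of a double mutant with the fitness expected if the two mutations acted independently. The multiplicative expectation used here is a common null model for genetic interactions, but it is an assumption of this example and is not taken from the studies cited above; the function names and the thresholds `viable` and `dead` are hypothetical. Fitness values are assumed to be colony sizes normalized to the wild type.

```python
def interaction_score(f_a: float, f_b: float, f_ab: float) -> float:
    """Observed double-mutant fitness minus the multiplicative expectation.

    A strongly negative score means the double mutant grows much worse
    than expected from the two single mutants.
    """
    return f_ab - f_a * f_b


def is_candidate_sl(f_a: float, f_b: float, f_ab: float,
                    viable: float = 0.8, dead: float = 0.1) -> bool:
    """Flag a pair as a candidate SL pair.

    Both single mutants must grow almost normally while the double
    mutant barely grows. The two thresholds are arbitrary illustrations.
    """
    return f_a >= viable and f_b >= viable and f_ab <= dead


# Hypothetical normalized colony sizes for two deletions and their combination.
print(interaction_score(0.9, 0.8, 0.0))
print(is_candidate_sl(0.9, 0.8, 0.0))
```

A quantitative score of this kind allows candidates to be ranked rather than simply classified as lethal or viable.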
At the beginning of the 21st century, most SL interactions had been discovered in yeast and Caenorhabditis elegans, but few had been described in mammalian cells, likely owing to the lack of availability of efficient tools to identify such interactions in these cells. This new field of discovery of SL pairs provides new perspectives on cellular functions and facilitates discovery of new therapeutic approaches. Indeed, drugs act on pathogenic targets and on any proteins in the host body that have a function related to the target, producing significant side effects. Study of SL allows identification of new targets that may provoke less damage in cells. This method has largely been applied to cancer therapy during the first decade of the 21st century. In an SL pair, both proteins must be non-functional to produce a lethal phenotype. In a cancer cell, the first non-functional protein is the one responsible for the cancer, and if another protein in an SL relationship with this first protein can be identified, this new protein would make an excellent target for an anticancer drug. If it were so, both proteins would be non-functional, leading to cancer cell death. However, normal cells would have a wild-type first protein and a non-functional version of the second protein (the protein targeted by the drug). Thus, the drug’s effect on normal cells, to render one protein non-functional, would not change the phenotype of the cell, and thus should not induce any secondary effects. The use of SL as a target provides a therapeutic window for the treatment of cancer. In other words, if protein X is SL with a mutated protein Y that causes cancer, a proposed solution is to inhibit X, which will cause the death of the cancer cells but little or no harm to normal cells.
Thus, the objective is twofold: targeting proteins in cancer cells to destroy them and minimizing the effects of the drugs on non-cancerous cells (Figure 5).
In this example, gene B is in a SL relationship with gene A, which causes cancer. The goal is to target the product of gene B with a drug. This drug will cause no damage to wild-type (WT) cells, but will kill cancerous cells.
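The therapeutic window described here follows from a simple rule: a cell dies only when both members of the SL pair are non-functional. The sketch below encodes that rule for the gene A and gene B example; it is a boolean idealization, and the names `cell_survives` and `treat` and the dictionary layout are hypothetical.

```python
def cell_survives(a_functional: bool, b_functional: bool) -> bool:
    """In an SL pair, the cell dies only if both products are non-functional."""
    return a_functional or b_functional


def treat(cell: dict, drug_hits_b: bool) -> bool:
    """Return True if the cell survives, with or without a drug against B."""
    b_functional = cell["B"] and not drug_hits_b
    return cell_survives(cell["A"], b_functional)


normal = {"A": True, "B": True}    # wild-type cell
cancer = {"A": False, "B": True}   # gene A is mutated in the cancer cell

for name, cell in (("normal", normal), ("cancer", cancer)):
    print(name, "survives drug:", treat(cell, drug_hits_b=True))
# normal survives drug: True
# cancer survives drug: False
```

In practice the outcome is graded rather than all-or-nothing, which is why the text speaks of little or no harm to normal cells.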
Some cancers develop because of loss-of-function mutations 90, 91 in tumor suppressor genes. Mammalian genes in an SL relationship with such mutated tumor suppressor genes could provide excellent therapeutic targets 91, 92. A good example for TP53 is described in 93. As another example, Dicer1 targeting prevents retinoblastoma formation in mice by SL with combined inactivation of p53 and Rb 94. Blocking protein kinase targets produced selective SL when combined with inhibitors of phosphatidylinositol-3-kinase, a molecule that is a malignant glioma signature 95. MYC oncogene family members have been broadly implicated in human cancers, yet are considered "undruggable" because they encode transcription factors. Experiments by different research groups 96-99 have revealed a rich therapeutic space comprising more than 48 genes in an SL relationship with MYC. SL also shows promise for identifying drugs against colorectal cancer, involving the tumor suppressors BRCA1/2 100 and the DNA repair pathway enzyme poly-(ADP-ribose) polymerase 1 (PARP) 101, 102. The same BRCA genes have been studied as targets in breast cancer (reviewed in 103), where they have been shown to share an SL relationship with PARP 104. In neuroblastoma, SL approaches using an siRNA library have shown that NF-κB signaling pathways were significantly enriched, making this protein a good target in combination with topotecan, an anticancer molecule 105.
Extragenic SLs were discovered first. However, a few intragenic SLs have also been detected 106-109. The intragenic SLs have been so little studied that they have not yet been named. Rather, they are defined by the negation of the covariance concept: negative coevolved pairs, negative co-variants, and negative epistasis. The principal reason that they have not been investigated is that no role has been assigned to them. Nevertheless, the same role as the one described for extragenic SL in cancer research can be exploited. Indeed, a pair of SL mutations is defined as two mutations that are non-lethal alone but lethal when combined in one genome, and the identical definition can be used when both mutations are located in the same gene. A functional example is shown in Figure 4A. The mutations could be referred to as intra-SL pairs, and their role may be very similar to the one mentioned in the cancer section. Indeed, to identify good protein targets in cancer research, researchers do not directly target the query protein but a protein that is in an SL relationship with it. To make a parallel argument, an escape-proof protein target should comprise a group of SL positions. If the same group of positions must be mutated to resist a drug and to perform an essential function, this group defines a new “therapeutic” pocket of choice for a protein target (Figure 4B).
This approach was used by Brouillet et al. 110 to define the regions of the HIV protease that might be appropriate targets for potential drugs. For RNA viruses, the main challenge in drug development is the rapid accumulation of resistance mutations. SL can be used to identify new binding sites: even if resistance mutations appear at these sites, they render the protein inactive 110-112. These authors used this method to define new therapeutic targets on the HIV protease. Interestingly, Perryman’s team, using molecular dynamics simulations, achieved the same results 113. The positions involved in SL relationships were then studied in depth to describe the exact amino acid pairs that are implicated. Indeed, it is often possible to identify several SL pairs composed of various pairs of amino acids, but localized to the same pair of positions. Moreover, to define a complete schematic of dependent positions, an entire network involving SL and invariant positions has been constructed. All proximate pairs of SL positions that localize to the surface of proteins, as well as invariant positions in their vicinity, are shown in the graph in Figure 6 (personal data). This graph is composed of subgraphs containing positions that can be displayed on the three-dimensional structure of the protein of interest (Figure 6), at which point two binding sites for a potential drug can be drawn. Such SL pairs can now be simulated using VIRAPOPS, a forward RNA virus sequence population simulator 114, 115.
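One naive way to look for candidate intragenic SL positions in a protein alignment is to list pairs of positions that each mutate in some sequences but never mutate in the same sequence. The sketch below does only that; it is not the published method of references 110-112, and the function name `candidate_sl_pairs`, the `min_single` parameter, and the toy sequences are hypothetical. Absence of co-occurrence in a small sample is weak evidence, so a real analysis would need a statistical test and many sequences.

```python
from itertools import combinations


def candidate_sl_pairs(reference, sequences, min_single=1):
    """List position pairs that mutate separately but never together.

    reference: wild-type sequence; sequences: aligned sequences of the
    same length. A position counts as mutated in a sequence when its
    residue differs from the reference residue.
    """
    n = len(reference)
    # For each position, the set of sequence indices carrying a mutation there.
    mutated = [
        {k for k, seq in enumerate(sequences) if seq[pos] != reference[pos]}
        for pos in range(n)
    ]
    pairs = []
    for i, j in combinations(range(n), 2):
        enough_i = len(mutated[i]) >= min_single
        enough_j = len(mutated[j]) >= min_single
        never_together = not (mutated[i] & mutated[j])
        if enough_i and enough_j and never_together:
            pairs.append((i, j))
    return pairs


# Hypothetical toy data: positions 2 and 3 mutate together in the last sequence.
reference = "AAAA"
sequences = ["TAAA", "ATAA", "AATT"]
print(candidate_sl_pairs(reference, sequences))
```

Positions are zero-based in this sketch; pairs such as (2, 3), which are observed mutated together in a viable sequence, are excluded from the candidates.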
The graph is composed of 2 subgraphs derived from our computational analyses. A link between two positions indicates that these two positions are at the surface of the protein and near each other. Moreover, a red link binds two SL positions. A green link binds an SL position to an invariant position. Finally, a blue link binds two invariant positions. The numbers correspond to the HIV protease positions.
The different target sites are shown on a 3D representation of the HIV-1 homodimer (pdb:1HSG) protease. On the ribbon scheme: the red target is lining a pocket, which is necessary for flap opening. The blue target is located at the fulcrum of the protein.
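The network described above can be represented as a list of links between neighboring surface positions, each colored by the kinds of positions it joins, and the subgraphs are then the connected components of that network. The sketch below shows one way to do this under those assumptions; the function names and the position numbers in the example are hypothetical and are not the HIV protease positions from the figure.

```python
from collections import defaultdict


def link_colour(kind_a: str, kind_b: str) -> str:
    """Colour of a link, where each kind is either 'SL' or 'invariant'."""
    kinds = {kind_a, kind_b}
    if kinds == {"SL"}:
        return "red"        # two SL positions
    if kinds == {"invariant"}:
        return "blue"       # two invariant positions
    return "green"          # one SL position and one invariant position


def subgraphs(edges):
    """Return the connected components of an undirected graph."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical links between neighbouring surface positions.
edges = [(1, 2), (2, 3), (10, 11)]
print(subgraphs(edges))   # [[1, 2, 3], [10, 11]]
```

Each component groups positions that are close to one another on the protein surface, which is what makes it a candidate binding site.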
The same concept can be applied to the development of vaccines. Indeed, developing vaccines against RNA viruses is an annual public health issue. A vaccine is usually composed of different antigenic epitopes against which the human body develops antibodies that should protect against a future viral infection. Vaccine production is time consuming, and the virus can continue to mutate during this time. Thus, the patient may become infected with a virus carrying mutations different from those of the virus from which the vaccine was developed, and this virus will not necessarily be recognized by the patient’s antibodies. Stable peptide vaccines that are not sensitive to this phenomenon can be developed using SLs. An escape-proof vaccine should be composed of different peptides containing positions that are in an SL relationship 116, 117. Under these conditions, the vaccine will be effective against the non-mutated virus but not against the mutated virus; however, the latter will not be able to reproduce because it will carry mutations at positions that perform essential functions.
The constant growth of sequence databases makes it easy to identify covariant sites in silico. The correlation between two sites (nucleotides or AAs) has been formalized by Atchley et al. 118, who explain that the global correlation results from five different correlations: structural, functional, phylogenic, interaction, and stochastic. The first two types, structural and functional, were illustrated above. Phylogenic correlations are due to a common ancestor and must be treated differently from the first two. Interaction correlations describe relationships among the three previously mentioned types. Finally, stochastic correlation represents random covariation due to uneven sequences or unknown random effects.
Different methods have been used to describe structural and functional correlations. Some have used statistical measures of association such as Fisher’s exact test, Lewontin’s D’ coefficient, and the chi-squared test. Other methods have used information theory, probabilistic methods, and machine learning approaches 119. Others have used spectral clustering 120. Computational predictions could also be used to infer viral fitness from viral sequences by targeting regions that are vulnerable to selection pressure 121.
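To make these measures concrete, the following Python sketch computes three of them for a pair of alignment columns: the chi-squared statistic, Lewontin’s D’, and the mutual information used by information-theoretic methods. It is an illustration under simple assumptions (columns given as equal-length strings, no gaps, no correction for phylogeny or sample size); the function names and the toy columns are hypothetical and are not taken from the cited studies.

```python
from collections import Counter
from math import log2


def chi_squared(col_a, col_b):
    """Pearson chi-squared statistic of the residue contingency table."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    observed = Counter(zip(col_a, col_b))
    total = 0.0
    for x, nx in count_a.items():
        for y, ny in count_b.items():
            expected = nx * ny / n
            total += (observed[(x, y)] - expected) ** 2 / expected
    return total


def lewontin_d_prime(col_a, col_b, allele_a, allele_b):
    """Signed D' for one residue at each site (other residues are pooled)."""
    n = len(col_a)
    p_a = sum(x == allele_a for x in col_a) / n
    p_b = sum(y == allele_b for y in col_b) / n
    p_ab = sum(x == allele_a and y == allele_b
               for x, y in zip(col_a, col_b)) / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max if d_max > 0 else 0.0


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    joint = Counter(zip(col_a, col_b))
    return sum((c / n) * log2(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())


# Toy columns: residues at two sites across eight sequences.
linked_a, linked_b = "AAAAGGGG", "LLLLVVVV"      # perfectly covarying sites
unlinked_a, unlinked_b = "AAGGAAGG", "LVLVLVLV"  # independent sites

for a, b in ((linked_a, linked_b), (unlinked_a, unlinked_b)):
    print(chi_squared(a, b),
          lewontin_d_prime(a, b, "A", "L"),
          mutual_information(a, b))
```

On real alignments, any of these scores would still mix functional signal with phylogenic and stochastic covariation, which is why the corrections discussed in this section are needed.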
In the 2000s, an interesting technique was developed to distinguish functional covariation from covariation due to a common ancestor (phylogenic correlations). A mutation giving rise to a synonymous codon (S) cannot be selected, because selection acts at the protein level and the synthesized protein is the same with or without the mutation. In contrast, a non-synonymous mutation (A) can be selected. Thus, if synonymous and non-synonymous mutations occur in equal numbers at a given codon, that codon shows no evidence of having been selected. For covariation, the same analysis can be performed on two codons to show whether the pair gives rise to (A,A) or (S,S) codon pairs 122. Other algorithms have employed the reconstruction of phylogenic trees 123 to distinguish functional covariation from common-ancestor covariation. Unfortunately, although this type of work can efficiently select covariant pairs of positions, it cannot reveal whether they are CM or SL. However, a special dissimilarity coefficient can be used to determine this.
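The counting step behind this idea can be sketched in Python as follows. Each codon of a sequence is compared with the corresponding codon of a reference and labelled S (synonymous) or A (non-synonymous), and the labels at two codon positions are then tallied as pairs. Comparing against a single reference sequence, rather than along a reconstructed phylogeny, is a simplifying assumption of this sketch, and the function names and toy sequences are hypothetical; this is not the published algorithm.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify(ref_codon, codon):
    """Return None (unchanged), 'S' (synonymous) or 'A' (non-synonymous)."""
    if codon == ref_codon:
        return None
    same_protein = CODON_TABLE[codon] == CODON_TABLE[ref_codon]
    return "S" if same_protein else "A"


def codon_at(sequence, index):
    """Return the codon at 0-based codon position `index`."""
    return sequence[3 * index:3 * index + 3]


def pair_counts(reference, sequences, i, j):
    """Count (label_i, label_j) pairs over sequences mutated at both codons."""
    counts = Counter()
    for sequence in sequences:
        label_i = classify(codon_at(reference, i), codon_at(sequence, i))
        label_j = classify(codon_at(reference, j), codon_at(sequence, j))
        if label_i and label_j:
            counts[(label_i, label_j)] += 1
    return counts


# Toy data: reference codons CTG (Leu) and AAA (Lys).
toy_reference = "CTGAAA"
toy_sequences = [
    "CTAAAG",  # Leu, Lys: both changes synonymous
    "ATGAGA",  # Met, Arg: both changes non-synonymous
    "CTGAAA",  # identical to the reference: ignored
    "CTCAGA",  # Leu, Arg: synonymous then non-synonymous
]
toy_counts = pair_counts(toy_reference, toy_sequences, 0, 1)
print(dict(toy_counts))
```

An excess of (A,A) pairs relative to (S,S) pairs at two codons would then point toward covariation at the protein level rather than shared ancestry alone.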
Several bioinformatics methods can be used to detect CM and intergenic SL, but they are ineffective for detecting intragenic SL groups. However, as explained above, the concept used to describe new therapeutic targets against cancer could be used to identify targets in rapidly evolving organisms (e.g., RNA viruses). Each function is performed by one or several molecules, and for each function the positions involved can take part in CM and SL relationships. These data, in which CM and SL interact with each other within the same covariant network, will require further study using large-network bioinformatics analysis. Large genomic sequence datasets will yield sufficient information on mutations to discover new functional and regulatory pathways, which will give rise to new and precisely targeted therapies.
A very long story about epistasis was told during the 20th century, a story that has not yet ended. Pairs of covariant positions have allowed the discovery of whole functional pathways, the elucidation of structural and functional relationships in proteins and RNA, the development of new targets in cancer research, and the detection of regulators of genomic expression. They have also provided many defenses against pathogenic organisms, which could otherwise have bypassed medical treatments. These pairs have been found in most of the organisms studied so far. The list of scientific advances based upon studies of covariant positions is so long that one could believe the debate is closed. However, in most of the fields described in this work, one can imagine new perspectives. Much has been done in the study of SGA to find functional pathways using SL pairs, but very few studies have used the same system to detect CM pairs using the SGAM process 16. Several software packages employ knowledge of protein structural constraints to infer evolutionary relationships between organisms (e.g., HIV) 124. In fact, covariant positions impose other constraints on protein structure and can also be used to infer evolutionary relationships. The double role of CMs, first in recovering fitness after a deleterious but necessary first mutation, and second as a way to facilitate gene evolution, remains unclear. Finally, the use of intragenic SLs in the discovery of new stable pockets in protein targets, to prevent resistance to drugs and vaccines, shows great promise. This development will be most important in the RNA virus realm, where multitherapies rapidly end in failure.
We thank Anne-Rozenn Jouble for proofreading this manuscript and P. Le Chien for his encouragement since 1993. This work was conducted for Lou, 5 years old, who died of AIDS in 1997.
Bazin C: Analyzed and interpreted the data and drafted the work.
Coupaye R, Middendorp S: Acquired the data and revised and approved the work.
Vanet A: Conceived and designed the work, analyzed and interpreted the data, drafted the work, revised and approved the work.