A simple model to explain evolutionary trends of eukaryotic gene architecture and expression
Abstract
Enormous phylogenetic variation exists in the number and sizes of introns in protein-coding genes. Although some consideration has been given to the underlying role of the population-genetic environment in defining such patterns, the influence of the intracellular environment remains virtually unexplored. Drawing from observations on interactions between co-transcriptional processes involved in splicing and mRNA 3′-end formation, a mechanistic model is proposed for splice-site recognition that challenges the commonly accepted intron- and exon-definition models. Under the suggested model, splicing factors that outcompete 3′-end processing factors for access to intronic binding sites concurrently favor the recruitment of 3′-end processing factors at the pre-mRNA tail. This hypothesis sheds new light on observations such as the intron-mediated enhancement of gene expression and the negative correlation between intron length and levels of gene expression.
Introduction
The instructions for the synthesis of proteins are stored within genes in sequences called exons. In eukaryotes and in some viruses, genes often contain additional DNA sequences, called spliceosomal introns, which neither inform nor regulate the assembly of amino acid chains. Introns are removed from precursor mRNAs (pre-mRNAs) by RNA splicing, a process carried out by the spliceosome, a complex molecular machine composed of proteins and small nuclear RNAs. The spliceosome is guided by critical signals located at the intron termini (the 5′ or donor splice site and the 3′ or acceptor splice site) and in the intron body (an adenine nucleotide called the branch site, and a pyrimidine-rich stretch of C and/or U residues known as the polypyrimidine tract). Recognition of and binding to the 5′ splice site by one of the major spliceosomal components, the U1 small nuclear ribonucleoprotein (U1 snRNP), is typically necessary to trigger spliceosome assembly [1].
RNA splicing is just one of several mRNA-associated processes. Other processes, including 5′-end capping, mRNA editing, mRNA cleavage and polyadenylation, nuclear export, mRNA surveillance, and mRNA degradation, are also associated with the transcription of protein-coding genes. Together with splicing, these processes form a dynamic network of interactions whose structure is currently being elucidated [2], and whose influence on the evolution of the exon-intron gene structure remains virtually unexplored [3].
Here we offer a concise review of the elementary steps in the molecular biology of RNA splicing and its interplay with other mRNA-associated processes, and we present some reasoning on how this interplay could affect the evolution of the intron-exon structure of eukaryotic genes. More specifically, we put forward the hypothesis that gene expression and architecture in eukaryotes are influenced by antagonistic interactions between two major players: (i) the factors that regulate splicing (herein denoted as SFs, for splicing factors), and (ii) the factors that mediate the formation of the mRNA tail or 3′ end (herein denoted as CPFs, for cleavage/polyadenylation factors).
The central idea of the proposed hypothesis is that SFs and CPFs compete for access to overlapping or neighboring signal sequences along the transcription unit, particularly within introns and 3′ UTRs. This antagonistic relationship likely involves uridine-rich sequences [4, 5] and produces distinct effects. Consider, for example, the effects of competition for signal sequences within introns: when SFs efficiently access intronic binding sites, they prevent (or sterically inhibit) the local recruitment of CPFs, thereby enhancing the latter's engagement at the pre-mRNA tail. In contrast, when the engagement of SFs with intronic sequences is inefficient (e.g. due to weak splice signals), CPFs may access introns and cause a fraction of transcripts to undergo non-canonical splicing events or to remain unspliced (these events are known as alternative splicing). Intron-bound CPFs may remain inoperative (e.g. because of the proximity of U1 snRNP, see below). Alternatively, when key cleavage/polyadenylation signal sequences exist nearby, intron-bound CPFs may promote premature mRNA 3′-end formation (a process known as alternative polyadenylation; Fig. 1).
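The competition described above can be made concrete with a textbook competitive-binding calculation. The sketch below is not part of the original model; it assumes a single intronic site bound in a mutually exclusive way by SFs or CPFs, both present in excess over the site, and it represents a strong splice signal as a tighter (lower) SF dissociation constant. All numbers are hypothetical, and the share of time the site is not held by CPFs is used only as a crude proxy for CPFs left free to act at the pre-mRNA tail.

```python
def intronic_site_occupancy(sf_conc, cpf_conc, k_sf, k_cpf):
    """Return (SF occupancy, CPF occupancy) of a single intronic site.

    Toy equilibrium model: SFs and CPFs bind the site in a mutually
    exclusive way, both are in excess over the site (free ~ total), and
    concentrations and dissociation constants share the same units.
    """
    a = sf_conc / k_sf
    b = cpf_conc / k_cpf
    z = 1.0 + a + b  # statistical weights: empty + SF-bound + CPF-bound
    return a / z, b / z


# Hypothetical values: a strong splice signal is modelled as a 20-fold
# tighter SF dissociation constant than a weak one.
strong = intronic_site_occupancy(sf_conc=1.0, cpf_conc=1.0, k_sf=0.1, k_cpf=1.0)
weak = intronic_site_occupancy(sf_conc=1.0, cpf_conc=1.0, k_sf=2.0, k_cpf=1.0)
for label, (theta_sf, theta_cpf) in (("strong", strong), ("weak", weak)):
    # 1 - theta_cpf is only a crude proxy for CPFs left available at the tail.
    print(f"{label} signal: SF {theta_sf:.2f}, CPF {theta_cpf:.2f}, "
          f"proxy for tail-available CPF {1 - theta_cpf:.2f}")
```

With these inputs the strong signal raises SF occupancy and lowers CPF occupancy of the intronic site, which is the qualitative direction the hypothesis predicts.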
Several determinants influence the outcome of the competition between splicing factors and cleavage/polyadenylation factors
In the proposed model, the recruitment of competing SFs and CPFs to pre-mRNA binding sites is facilitated by a number of determinants that are able to influence the relative local concentration (or molar ratio) of these two sets of factors. Some of these determinants are described below (see also Fig. 2 and Box 1).
At the two ends of the transcription unit, the Cap-Binding Complex and the 3′-end termination signals presumably influence, in opposite ways, the local molar ratio of SFs to CPFs. The Cap-Binding Complex, which binds to 5′-capped RNA polymerase II transcripts, is known to enhance the recruitment of SFs at the mRNA 5′-end [6]. Thus, by increasing the local molar ratio between SFs and CPFs, the Cap-Binding Complex should facilitate splicing in such locations [7, 8]. In contrast, the termination signals that populate mRNA 3′-ends should enhance the recruitment of CPFs, thereby decreasing the local molar ratio of SFs to CPFs and facilitating pre-mRNA 3′-end processing.
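The opposite influences of the two transcript ends on the local SF:CPF molar ratio can be pictured as a positional profile. In the hypothetical sketch below, the Cap-Binding Complex is assumed to enrich SFs near the 5′ end and the termination signals to enrich CPFs near the 3′ end, each enrichment decaying exponentially with distance; the fold-enrichments and the 200-nt decay length are illustrative assumptions, not measured values.

```python
import math


def local_sf_cpf_ratio(x, length, cap_boost=2.0, term_boost=2.0, decay_nt=200.0):
    """Toy SF:CPF molar ratio at position x (nt from the 5' end) of a transcript.

    cap_boost: hypothetical fold-enrichment of SFs at the capped 5' end.
    term_boost: hypothetical fold-enrichment of CPFs at the 3'-end signals.
    Both enrichments are assumed to decay exponentially over decay_nt.
    """
    sf = 1.0 + (cap_boost - 1.0) * math.exp(-x / decay_nt)
    cpf = 1.0 + (term_boost - 1.0) * math.exp(-(length - x) / decay_nt)
    return sf / cpf


length = 5000
profile = {x: local_sf_cpf_ratio(x, length) for x in (0, length // 2, length)}
print({x: round(r, 2) for x, r in profile.items()})
```

Under these assumptions the ratio is highest near the cap, neutral mid-transcript, and lowest at the 3′ end, matching the expectation that splicing is favored near the 5′ end and 3′-end processing near the tail.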
Within the transcription unit, strong splicing signals, as well as the proximity of an optimally base-paired U1 snRNP to CPF binding sites (Box 2), are expected to favor splicing. A pronounced exon-intron differential in GC content may also favor splicing by guiding the recruitment of serine/arginine-rich proteins that bind AG-rich or AC-rich sequence elements [9, 10]. Serine/arginine-rich proteins typically assist the recruitment of spliceosomal components to the proximal splice sites when they bind within exons, whereas they act as splicing inhibitors when they bind within introns [11–13]. Thus, the combined presence of GC-rich exons and GC-poorer flanking introns should facilitate the specific binding of serine/arginine-rich proteins to exonic target sites, thereby enhancing splicing.
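The exon-intron GC-content differential mentioned above is straightforward to compute. The sketch assumes one simple definition (exon GC fraction minus the mean GC fraction of its two flanking introns); the studies cited may define the differential differently, and the toy sequences are hypothetical rather than taken from real genes.

```python
def gc_content(seq):
    """Fraction of G or C among unambiguous A/C/G/T/U bases."""
    counted = [b for b in seq.upper() if b in "ACGTU"]
    if not counted:
        raise ValueError("sequence has no unambiguous bases")
    return sum(b in "GC" for b in counted) / len(counted)


def exon_intron_gc_differential(exon, upstream_intron, downstream_intron):
    """Exon GC fraction minus the mean GC fraction of its flanking introns."""
    flank = (gc_content(upstream_intron) + gc_content(downstream_intron)) / 2
    return gc_content(exon) - flank


# Hypothetical toy RNA sequences
exon = "GCGGAGCCAGCG"
up = "GUAAGUAUUUAUUUUAG"
down = "GUAAGAUUUACUAAC"
diff = exon_intron_gc_differential(exon, up, down)
print(round(diff, 3))
```

A positive value indicates a GC-rich exon flanked by GC-poorer introns, the configuration that, under the argument above, should favor exonic binding of serine/arginine-rich proteins.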
While the list of determinants presented here is probably not exhaustive, it is worth noting that determinants need not be RNA elements; they may also involve the chromatin of the transcribed region, as exemplified by recent studies demonstrating that histone modifications facilitate the recruitment of SFs [14, 15].
Competing splicing factors and cleavage/polyadenylation factors promote coupled processes
Although an antagonistic relationship between SFs and CPFs is widely reported in the literature [5, 16–20], the processes of splicing and cleavage/polyadenylation are known to be coupled [21]. Together, these observations raise the obvious question: how can competing factors promote cooperative processes?
A real-life example may help reconcile these seemingly conflicting observations. Imagine a situation in which an individual performs two tasks either at the same time or sequentially. If she multitasks, each task is performed inefficiently because of competition for common energy resources. In contrast, if she performs the two tasks sequentially, there is no competition (or simultaneous engagement with common resources) and the efficiency of each task is enhanced. When considered like this, the two tasks appear to be coupled.
When applied to the intracellular environment, the proposed example may be read as follows: competition between SFs and CPFs for access to pre-mRNA binding sites hampers both splicing and mRNA 3′-end formation. In contrast, reduced antagonism between SFs and CPFs – which may occur when determinants inhibit the access of CPFs to intronic sites or of SFs to 3′-end termination signals – enhances the processes of splicing and mRNA 3′-end formation.
Under this scenario, the introduction of efficiently spliced introns into intronless transcripts is expected to reduce competition at the pre-mRNA 3′-end, and ultimately enhance transcriptional yield. Remarkably, this prediction corresponds with (and may shed light on the underlying mechanism of) a commonly observed phenomenon called “intron-mediated enhancement of gene expression” [22, 23].
Why are genes with small introns typically more highly expressed than genes with large introns?
The scenario proposed above may yield insight into the longstanding question of why genes with small introns are typically more highly expressed compared to genes with large introns [24–27] (see also Box 3). At least three explanations have been invoked to answer this question: (i) selection acts to minimize the cost of transcription [26]; (ii) selection acts against large introns in active chromosomal compartments [28]; and (iii) selection favors less complex regulation and architecture of housekeeping genes [29].
While we acknowledge the plausibility of each of these hypotheses, we propose that the negative correlation between expression level and intron size is a direct byproduct of molecular interactions. Specifically, we suggest that SFs outcompete CPFs for access to intronic binding sites more efficiently within small introns than within large introns. Under this scenario, CPFs that are effectively excluded from intronic sites in transcripts with short introns are more efficiently engaged at the pre-mRNA tail and enhance transcriptional yield.
Why is CPF recruitment to intronic binding sites less likely in short introns than in large introns? We suggest that the distance-dependent inhibitory effect of U1 snRNP on CPFs provides a feasible explanation (Box 2). While it is difficult to quantify accurately how rapidly the inhibitory effect of U1 snRNP decays with distance, it seems reasonable to propose that, all else being equal, this inhibitory effect is reduced in naturally large introns or artificially expanded small introns. In these cases, the access of CPFs to intronic binding sites would be facilitated and alternative splicing may occur [30, 31]. This would in turn reduce the recruitment of CPFs at the tail of the transcript and result in low or moderate transcriptional yield (Fig. 3).
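The link between intron length and yield proposed above can be sketched numerically. The decay law is not known, so the sketch assumes that U1 inhibition falls off exponentially with distance from the 5′ splice site, with a hypothetical 500-nt decay length loosely inspired by the ~500–1,000 nt range reported in [70], and that CPF sites are spread uniformly along the intron. The mean inhibition over an intron of length L is then (λ/L)(1 − e^(−L/λ)); CPFs escaping that inhibition are treated, as a crude proxy, as lost to the 3′ end.

```python
import math


def mean_u1_inhibition(intron_len, decay_nt=500.0):
    """Average U1 inhibition over CPF sites spread uniformly along an intron.

    Assumes inhibition exp(-d / decay_nt) at distance d from the 5' splice
    site; averaging over d in [0, L] gives (decay/L) * (1 - exp(-L/decay)).
    """
    if intron_len <= 0:
        return 1.0
    r = intron_len / decay_nt
    return -math.expm1(-r) / r


def relative_tail_yield(intron_len, decay_nt=500.0, cpf_access=1.0):
    """Crude proxy: CPFs escaping U1 inhibition bind the intron and are lost
    to the 3' end (cpf_access is a hypothetical scaling factor)."""
    escape = 1.0 - mean_u1_inhibition(intron_len, decay_nt)
    return 1.0 - cpf_access * escape


for n in (100, 500, 2000, 10000):
    print(n, round(relative_tail_yield(n), 3))
```

Any monotonically decaying inhibition profile would give the same qualitative result, a yield that declines with intron length; the exponential form is chosen only because it integrates in closed form.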
Trade-offs between the determinants of competition influence the antagonistic relationships between splicing factors and cleavage/polyadenylation factors
Under our model, if large introns contain a weak 5′ splice site and key termination signals, then CPFs are more likely to access intronic binding sites and promote premature mRNA 3′-end formation. This expectation is consistent with alternative polyadenylation events observed in the introns of humans (3,344 genes involved [32]) and Arabidopsis (2,100 genes involved [33]), which are significantly larger (medians: 3,236 nt vs. 1,552 nt in humans; 270 nt vs. 99 nt in Arabidopsis) and have weaker 5′ splice sites (but comparably strong 3′ splice sites) compared to poly(A)-free introns. Although genes with large introns undergo accurate splicing less often than genes with short introns [34], large introns can nonetheless be correctly spliced. This suggests that, besides the proximity of an optimally base-paired U1 snRNP to CPF binding sites, other variables must come into play to guarantee correct splicing.
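To put the reported medians on a common scale, the short calculation below expresses each as a fold difference between introns with intronic polyadenylation and poly(A)-free introns. It uses only the median lengths quoted above.

```python
# Median intron lengths (nt) quoted above:
# (introns with intronic polyadenylation, poly(A)-free introns)
medians = {
    "human": (3236, 1552),
    "Arabidopsis": (270, 99),
}
fold = {sp: with_pa / without_pa for sp, (with_pa, without_pa) in medians.items()}
for sp, f in fold.items():
    print(f"{sp}: {f:.2f}-fold")
# prints:
# human: 2.09-fold
# Arabidopsis: 2.73-fold
```

In both species, introns that harbor polyadenylation events are thus roughly two- to threefold longer at the median.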
Under our model, large introns that are accurately spliced should contain, or lie close to, signals that increase the local molar ratio between SFs and CPFs (Fig. 2). In remarkable agreement with this prediction, splice-site strength has been found to scale positively with intron size across numerous species [35–38]. In addition, large introns are typically located in genic regions where we expect the recruitment of SFs to be enhanced, e.g. in proximity to the Cap-Binding Complex and beside GC-richer exons. Indeed, 5′ UTR introns are on average more than twofold larger than CDS or 3′ UTR introns [39], whereas first CDS introns tend to be ~40% larger than other CDS introns [40]. Also, in several eukaryotes the GC-content differential between exons and their flanking introns is more pronounced for large introns than for short introns [41].
The intracellular and population-genetic environments are linked
Thus far we have proposed that the antagonistic relationship between SFs and CPFs may help explain a number of trends in gene expression and architecture across eukaryotes. Under our model, this antagonistic relationship is mediated by trade-offs between determinants of competition (e.g. splice-site strength, exon-intron differential GC content), whose interactions determine a dynamic equilibrium. Deviations from this equilibrium (e.g. due to weak selective pressures) are expected to perturb gene structure and generate non-canonical or alternative splicing isoforms or intronic polyadenylation events, a large number of which are presumably non-functional and thus subject to purifying selection. This latter scenario leads to an intriguing prediction: because the efficiency of natural selection decreases with increasing organism size [42], the fraction of intronic polyadenylation and alternative splicing events in eukaryotes should gradually increase from unicellular to multicellular species. While there is currently not enough information available to explore trends for intronic polyadenylation, estimates for the prevalence of alternative splicing across multiple eukaryotes appear to support this prediction (Fig. 4).
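The population-genetic step of this argument can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation, u = (1 − e^(−2s)) / (1 − e^(−4Ns)), for a diploid population of effective size N. The sketch assumes, following the reasoning cited as [42], that larger organisms tend to have smaller effective population sizes; the selection coefficient assigned to a non-functional isoform is purely hypothetical.

```python
import math


def fixation_probability(s, n):
    """Kimura's fixation probability for a new mutation (initial frequency
    1/(2n)) with selection coefficient s in a diploid population of
    effective size n: u = (1 - exp(-2s)) / (1 - exp(-4ns))."""
    if s == 0:
        return 1.0 / (2 * n)
    x = -4.0 * n * s
    if x > 700:
        # Strongly deleterious: expm1(x) ~ exp(x) would overflow a float,
        # so divide by exp(x) via multiplication by exp(-x) instead.
        return math.expm1(-2.0 * s) * math.exp(-x)
    return math.expm1(-2.0 * s) / math.expm1(x)


s = -1e-4  # hypothetical cost of producing a non-functional isoform
for n in (1e2, 1e4, 1e6):
    rel = fixation_probability(s, n) * 2 * n  # relative to a neutral mutation
    print(f"Ne = {n:.0e}: fixation probability relative to neutral = {rel:.3g}")
```

When N|s| is well below 1, the deleterious variant fixes almost as readily as a neutral one, whereas for N|s| well above 1 it is effectively purged; this is the sense in which selection against spurious isoforms is expected to weaken in lineages with small effective population sizes.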
How are splice sites recognized?
Two mechanisms have been invoked to explain how the spliceosome selects splice sites: the intron- and the exon-definition models. According to the former, introns are the spliceosome’s units of recognition and the spliceosome identifies the 5′ and 3′ splice sites flanking the intron [43]. Mutation of a splice site at one end of the intron would inhibit splicing and lead to intron retention. In exon-definition [44, 45], 5′ and 3′ splice sites are recognized in concert across an exon. In exon-definition, the spliceosomal machinery would search for and bind splice sites at the ends of an internal exon, and mutation of a splice site at one end of this exon would lead to exon skipping.
Perhaps the most compelling evidence to support the exon-definition mechanism is that mutational inactivation of the 5′ splice site of a downstream intron represses splicing of the upstream intron [46]. In an intron-defined mechanism, the disruption of a splice site would be isolated to the intron that contains the mutation. In contrast, under exon-definition, the mutation of a 5′ splice site downstream of an internal exon would inhibit recognition of the exon, producing both the observed suppression of splicing and exon skipping.
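The logic of this test can be written out as a small prediction table. The sketch encodes only the qualitative predictions stated above for the experiment cited as [46] (inactivating the 5′ splice site of the intron downstream of an internal exon) and checks which model agrees with the reported repression of upstream-intron splicing; the function and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    upstream_intron_affected: bool
    outcome: str


def predict_downstream_5ss_mutation(model: str) -> Prediction:
    """Qualitative effect of inactivating the 5' splice site of the intron
    downstream of an internal exon, as described in the text."""
    if model == "intron":
        # The defect stays confined to the mutated (downstream) intron.
        return Prediction(False, "retention of the downstream intron")
    if model == "exon":
        # The exon is no longer defined, so splicing of the upstream intron
        # is suppressed and the exon tends to be skipped.
        return Prediction(True, "suppressed upstream splicing and exon skipping")
    raise ValueError(f"unknown model: {model!r}")


observed_upstream_repressed = True  # qualitative result reported in [46]
consistent = [
    m for m in ("intron", "exon")
    if predict_downstream_5ss_mutation(m).upstream_intron_affected
    == observed_upstream_repressed
]
print(consistent)  # ['exon']
```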
Recognition of terminal exons is problematic for the exon-definition mechanism because these exons lack either a 3′ splice site (first exons) or a 5′ splice site (last exons). While the definition of first and last exons has been suggested to be mediated by the 5′ mRNA capping complex and by factors involved in mRNA maturation, respectively [44], how this mediation might occur remains unclear.
Exon-definition is commonly thought to occur in vertebrates where exon size is typically much shorter than intron size, while intron-definition is thought to occur in eukaryotes with short introns. However, exon skipping can also occur (even if infrequently) in species with typically short introns, such as Arabidopsis [47]. The opposite is true for intron retention, which is also found in species with large introns such as humans [21]. This raises some doubts as to whether intron (exon) size alone mediates intron (exon)-definition, as previously proposed [30]. In vitro studies suggest that the unit of definition may switch from intron to exon when intron size reaches ~250 nucleotides [31], but this may not always be the case [48]. At least one case of splicing has been reported where neither of the two models completely accounts for the experimental observations [49], and the evolutionary relationship between the two mechanisms of splice-site recognition is uncertain [50]. Furthermore, although particular attention has been given to the putative transition from exon definition to intron definition [51], the mechanisms that would allow the two modes to coexist (as they seem to in various species [52]) and to alternate in genes or species with both short and long introns [30] have not yet been identified.
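Applying the ~250-nt in vitro threshold [31] gene by gene shows why coexistence of the two modes is a practical problem. The sketch treats the threshold as a hard cutoff, which the text itself notes may not always hold [48], and uses a hypothetical gene whose intron lengths span both sides of it.

```python
def expected_definition_mode(intron_len, threshold_nt=250):
    """Unit of definition suggested by in vitro data: 'intron' below the
    threshold, 'exon' at or above it (a heuristic, not a rule)."""
    return "intron" if intron_len < threshold_nt else "exon"


# Hypothetical gene containing both short and long introns
gene_introns = [90, 180, 1200, 3400, 150]
modes = [expected_definition_mode(n) for n in gene_introns]
print(modes)  # ['intron', 'intron', 'exon', 'exon', 'intron']
```

Under this heuristic a single transcript would need to switch between modes from one intron to the next, which is exactly the situation for which no mechanism has yet been identified.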
U1-dependent definition
We propose a novel mechanism for the modus operandi of splicing, U1-dependent definition, in which features of both an exon and its upstream intron determine the selection of splice sites. The mechanism we put forward relies on the distance-dependent inhibitory effect that the U1 snRNP exerts on CPFs, as well as on the central role that the U1 snRNP plays in the formation of the splicing commitment complex [1, 53] and in the binding of additional SFs to 3′ splice sites [54–56].
In the following sections we illustrate how U1-dependent definition can explain changes in splice-site recognition that result from 5′ splice-site strength and gene architecture. Notably, U1-dependent definition also offers a logical explanation for the findings of Sterner et al. [57], a well-designed study that represents a cornerstone of the intron- and exon-definition models.
Internal exons
As discussed above, suboptimal splicing conditions (e.g. weak splice signals, large intron size, and a small exon-intron GC-content differential) encourage the access of CPFs to introns; this facilitates alternative splicing events such as skipping of the downstream exon. If this downstream exon is sufficiently small, however, and the 5′ splice site of the intron located downstream of this exon base-pairs optimally with U1 snRNP, then skipping is unlikely. Under these conditions, this U1 snRNP would inhibit the access of CPFs to the nearby upstream intronic sites, favoring splicing [57, 58] (Fig. 5A). If the exon is small but the 5′ splice site of the downstream intron is suboptimal, U1 snRNP binding would be less favored, and alternative splicing (or premature mRNA 3′-end formation) may be promoted (Fig. 5B). In this view, expanding the downstream exon would weaken the distance-dependent inhibitory effect of U1 snRNP on CPFs and favor alternative splicing (or premature mRNA 3′-end formation) despite the presence of an optimal downstream 5′ splice site [57] (Fig. 5C). Notably, the expansion of a small exon and/or mutational inactivation of the 5′ splice site downstream of an internal exon would disrupt splicing of the upstream intron, unless that intron is short. In such a case, the U1 snRNP bound to the 5′ splice site of the upstream intron would inhibit the access of CPFs to the intronic binding sites downstream of it, thereby favoring splicing [59, 60] (Fig. 5D).
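The four scenarios of Fig. 5 can be reproduced with a deliberately crude coverage model. The sketch is hypothetical throughout: every position in the upstream intron is treated as a potential CPF site; U1 snRNP is assumed to bind a 5′ splice site only when it is optimal; a bound U1 is assumed to inhibit CPFs in both directions (the bidirectionality the text lists as an open question) up to a fixed 500-nt reach, with no effect beyond it. The score is the fraction of upstream-intron positions protected by at least one bound U1.

```python
def protected_fraction(upstream_intron_len, exon_len, upstream_5ss_optimal,
                       downstream_5ss_optimal, reach_nt=500):
    """Fraction of upstream-intron positions within reach of a bound U1 snRNP
    (toy version of U1-dependent definition; all parameters hypothetical)."""
    if upstream_intron_len <= 0:
        raise ValueError("intron length must be positive")
    protected = 0
    for p in range(upstream_intron_len):  # nt downstream of the intron's 5' splice site
        by_upstream_u1 = upstream_5ss_optimal and p <= reach_nt
        # Distance from position p to the 5' splice site just after the exon
        d_down = (upstream_intron_len - p) + exon_len
        by_downstream_u1 = downstream_5ss_optimal and d_down <= reach_nt
        if by_upstream_u1 or by_downstream_u1:
            protected += 1
    return protected / upstream_intron_len


fig5a = protected_fraction(1000, 100, True, True)   # small exon, optimal downstream 5'ss
fig5b = protected_fraction(1000, 100, True, False)  # small exon, weak downstream 5'ss
fig5c = protected_fraction(1000, 600, True, True)   # expanded exon, optimal downstream 5'ss
fig5d = protected_fraction(300, 600, True, False)   # short upstream intron
for name, value in (("5A", fig5a), ("5B", fig5b), ("5C", fig5c), ("5D", fig5d)):
    print(name, round(value, 3))
```

Protection is highest in 5A, drops when the downstream 5′ splice site is weak (5B) or the exon is expanded (5C), and is complete when the upstream intron is short enough to be covered by its own U1 (5D). A graded rather than all-or-nothing distance effect would change the numbers but not this ordering.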
Terminal exons
Berget [44] suggests that the Cap-Binding Complex plays a role in the splicing of the first intron; from the exon perspective, this implies that the Cap-Binding Complex and the first 5′ splice site may be recognized in concert. How this would take place is yet to be shown. As for the definition of last exons, in vitro and in vivo observations are consistent with the idea that the ends of these terminal exons – a 3′ splice site and a poly(A) site – are recognized in concert. More specifically, it has been reported that while a mutated 3′ poly(A) site impairs splicing of the last intron, a mutated terminal 3′ splice site or polypyrimidine tract disfavors or inhibits polyadenylation [48, 61–64].
Taking the molecular interactions discussed above into account, we propose that first-intron splicing is assisted by the Cap-Binding Complex-enhanced recruitment of SFs at the pre-mRNA 5′-end. At the other end of the transcript, splicing of last introns may be facilitated by 3′-end termination signals that recruit CPFs and in so doing help SFs to outcompete CPFs within the last intron. Under this scenario, mutated 3′-end transcription termination signals inhibit last intron splicing because they ineffectively recruit CPFs. Similarly, mutated splicing signals of the last introns perturb the process of mRNA maturation because they intensify the antagonism between SFs and CPFs that have overlapping or neighboring binding sites in the last exon (e.g. U-rich sequences located upstream of the cleavage site [65]). These hypothetical dynamics may explain the interdependence between 3′-end processing and splicing observed by earlier studies [66, 67].
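The proposed interdependence at the 3′ end can be sketched as a limiting pool of CPFs partitioned between last-intron sites and the poly(A) region in proportion to hypothetical relative affinities. A mutated poly(A) signal is modelled as a lower tail affinity; the model, and every number in it, is an illustrative assumption rather than a measured property.

```python
def cpf_partition(intron_affinity, tail_affinity):
    """Share of a limiting CPF pool at last-intron sites vs. the 3' end,
    assuming partitioning proportional to relative affinity."""
    total = intron_affinity + tail_affinity
    return intron_affinity / total, tail_affinity / total


wild_type = cpf_partition(intron_affinity=1.0, tail_affinity=9.0)
polya_mutant = cpf_partition(intron_affinity=1.0, tail_affinity=1.0)
print("wild type: intron %.2f, tail %.2f" % wild_type)
print("poly(A) mutant: intron %.2f, tail %.2f" % polya_mutant)
```

Weakening the tail signal shifts CPFs toward the last intron, where, under the model, they would intensify competition with SFs and impair splicing of that intron.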
Additional layers of complexity may be added to the scenario described above. For example, close proximity of the last intron to 3′-end termination signals probably disfavors rather than facilitates splicing, because termination signals that efficiently recruit CPFs would increase the local molar ratio of CPFs to SFs. Observations in yeast support this scenario. Specifically, Tardiff et al. [68] measured the efficiency with which two SFs, U1 snRNP and U2 snRNP, are recruited to the 5′ splice site and the branch site, respectively, of an intron within two-exon constructs. The second exon in these constructs varies in length (from 350 to 2,300 bp) and contains a 3′ UTR segment with functional termination signals. Their results suggest that recruitment of U2 snRNP is considerably higher in constructs with longer second exons. Also, premature cleavage and polyadenylation is likely to take place in constructs with short second exons [68]. These results are consistent with the idea that 3′-end termination signals compromise the (co-transcriptional) splicing of introns that lie close to the tail of the transcript.
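The distance effect in these constructs can be sketched by letting CPF enrichment around the termination signals decay with distance and reading off the local CPF:SF ratio at the intron. The sketch assumes that the distance from the intron to the termination signals is roughly the second-exon length, and it uses a hypothetical baseline ratio, enrichment factor, and 500-nt decay length; only the 350 to 2,300 bp range comes from the study described above.

```python
import math


def local_cpf_to_sf_ratio(distance_nt, base_ratio=0.5, boost=4.0, decay_nt=500.0):
    """Toy CPF:SF ratio at an intron lying distance_nt upstream of the
    3'-end termination signals; CPF enrichment decays exponentially."""
    return base_ratio * (1.0 + (boost - 1.0) * math.exp(-distance_nt / decay_nt))


# Second-exon lengths spanning the range of the constructs described above
for exon2 in (350, 1000, 2300):
    print(exon2, round(local_cpf_to_sf_ratio(exon2), 2))
```

Shorter second exons place the intron where CPFs are enriched, raising the local CPF:SF ratio, which is consistent in direction with the lower U2 snRNP recruitment and the premature 3′-end processing reported for those constructs.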
Open questions and limitations of U1-dependent definition
Our model makes several predictions (Box 4) but leaves a number of central questions unresolved, some of which are listed below.
What are the nature and the relative locations of the CPF intronic binding sites? While we propose that U-rich sequences are suitable candidates [4, 5], other motifs may be involved. Also, do CPF intronic binding sites preferentially reside in specific portions of the intron (e.g. flanking or overlapping the polypyrimidine tract), or are they distributed throughout the intron [69]?
What is the range of the U1 snRNP inhibitory action on the access of CPFs to downstream binding sites? While a recent study shows that this action extends up to ~500–1,000 nt in humans, mouse, and Drosophila [70], estimates for other eukaryotic groups await formal investigation.
Is the inhibitory effect of U1 snRNP bidirectional as it is assumed above? And if so, is the strength of this effect comparable with that exerted on upstream binding sites?
Unfortunately, it is difficult to disentangle the effects of U1-dependent definition from those of exon- and intron-definition, as these effects coincide in many cases. This is made most clear by the previously discussed observations of Sterner et al. [57], which can be explained equally well under exon- and intron-definition and under U1-dependent definition. The development of strategies to examine the validity of U1-dependent definition is under way, and at least two additional observations make the investigation of this theoretical mechanism of definition worthwhile. First, in humans, selective constraints exist on the length of exons and their flanking introns [71]. Second, in Drosophila and humans, the length of the upstream intron is significantly more important than the length of the downstream intron in determining whether or not the encompassed exon will be skipped [31]. These observations hint at the possibility that the spliceosome recognizes an intron and the following exon as a unit. While not expected under the exon- and intron-definition models, this is consistent with U1-dependent definition (Fig. 5).
Conclusions
The ideas outlined above provide a broad view of how interacting transcriptional processes may affect the evolution of gene expression and architecture in eukaryotes. We formulate a coherent molecular scenario in which competition between splicing factors and cleavage/polyadenylation factors for access to pre-mRNA binding sites is one of the major driving forces in the selection of exons and transcription termination sites. This scenario (i) makes predictions that extend to the selective environment to which distinct eukaryotic lineages are subject [42], and (ii) integrates logically with (and extends) the proposal that the intracellular environment contributes to the physical establishment of spliceosomal introns in eukaryotic genes [3].
We have presented simple and logical connections to frame observations such as intron-mediated enhancement of gene expression, negative correlation between intron size and levels of gene expression, and exon- and intron-definition modes of splice-site recognition in the context of the proposed model. One key influential factor is the U1 snRNP, which interacts with 5′ splice sites and exerts a distance-dependent inhibitory effect on cleavage/polyadenylation factors. The proposed U1-dependent definition potentially unifies the intron- and exon-definition models and, if experimentally verified, would imply that the mechanisms underlying splice-site recognition across eukaryotes are more similar than currently thought.
Acknowledgments
We thank J. Schmitz for comments on an earlier version of this manuscript. This work was supported by a Marie Curie International Incoming Fellowship (grant 254202) awarded to F.C., MetaCyte funding from the Lilly Foundation to Indiana University, and the National Science Foundation grant EF-0827411 to M.L.
Abbreviations
CPFs | cleavage/polyadenylation factors |
SFs | splicing factors |
U1 snRNP | small nuclear ribonucleoprotein U1 |
References
Read article at publisher's site: https://doi.org/10.1002/bies.201200127
Funding
National Science Foundation: grant EF-0827411
Marie Curie International Incoming Fellowship: grant 254202
NIGMS NIH HHS: grant R01 GM036827