Alternative RNA



Alternative mRNA transcription, processing, and translation: Insights from RNA sequencing

Article in Trends in Genetics · January 2015


DOI: 10.1016/j.tig.2015.01.001



TIGS-1175; No. of Pages 12

Feature Review

Alternative mRNA transcription, processing, and translation: insights from RNA sequencing
Eleonora de Klerk and Peter A.C. ‘t Hoen
Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands

The human transcriptome comprises >80 000 protein-coding transcripts and the estimated number of proteins synthesized from these transcripts is in the range of 250 000 to 1 million. These transcripts and proteins are encoded by less than 20 000 genes, suggesting extensive regulation at the transcriptional, post-transcriptional, and translational level. Here we review how RNA sequencing (RNA-seq) technologies have increased our understanding of the mechanisms that give rise to alternative transcripts and their alternative translation. We highlight four different regulatory processes: alternative transcription initiation, alternative splicing, alternative polyadenylation, and alternative translation initiation. We discuss their transcriptome-wide distribution, their impact on protein expression, their biological relevance, and the possible molecular mechanisms leading to their alternative regulation. We conclude with a discussion of the coordination and the interdependence of these four regulatory layers.

Corresponding author: 't Hoen, P.A.C. (p.a.c.hoen@lumc.nl).
Keywords: gene expression; transcriptome; RNA sequencing; alternative polyadenylation; alternative splicing; translation.
0168-9525/ © 2015 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2015.01.001
Trends in Genetics xx (2015) 1–12

Regulatory layers defining gene expression
The diversification of cellular and organismal functions observed in higher eukaryotes cannot be explained by the sheer number of genes but is mostly due to the expression of different transcripts and proteins from the same genes. Variation in the expression of coding genes is controlled at multiple levels, from transcription to RNA processing and translation. Alternative transcripts and proteins may arise from alternative transcription initiation, alternative splicing, alternative polyadenylation (APA), and alternative translation initiation. These co- and post-transcriptional regulatory mechanisms expand the genome's coding capacity modifying protein function, stability, localization, and expression levels. In this review, we discuss how high-throughput RNA-seq has helped us to understand these four regulatory processes. We describe their transcriptome-wide abundance in mammalian cells, their impact on protein expression, their biological relevance, and the molecular mechanisms underlying these processes. Finally, we highlight how the interdependence between transcription, RNA processing, and translation restricts the number of combinations of possible alternative transcripts and proteins.

Initiation of transcription: alternative promoters
During the biogenesis of mRNAs, regulation of transcription initiation represents the first layer in the control of gene expression [1–4]. Alternative transcription initiation leads to the formation of transcripts differing in their first exon or in the length of the 5′ untranslated region (5′-UTR). The use of alternative first exons leads to transcripts with different open reading frames (ORFs) and diversifies the repertoire of encoded proteins giving rise to protein isoforms with alternative N termini [5] (Figure 1A). Alternatively, transcripts sharing the same coding region but a different 5′-UTR can be subject to differential translational regulation (Figure 1B) [6] through short upstream ORFs (uORFs) involved in translational control [7–9] or in the production of biologically relevant peptides [10–12].

The use of alternative promoters and transcription start sites (TSSs) in protein coding transcripts was established before the development of transcriptome-wide approaches, through studies based on a method called cap analysis of gene expression (CAGE) [13]. CAGE still represents the basic technology for the detection of TSSs. Recently, several high-throughput CAGE methods, such as DeepCAGE, have been developed [14]. These transcriptome-wide studies suggest that TSS use is highly tissue specific [4,15–18] and that the number of alternative TSSs differs by tissue type, with the hippocampus accounting for a larger number of TSSs than any other tissue [18,19]. To what extent alternative TSSs lead to alternative 5′ noncoding regions or translate into novel protein isoforms is virtually impossible to determine from DeepCAGE reads, which consist of 25 or 26 nucleotides. To assess the potential for novel ORFs arising from the use of alternative TSSs, it is essential to integrate DeepCAGE data with RNA-seq, ribosome profiling, and proteomics.

The FANTOM Consortium is leading most of the research in the field of promoters and TSSs. In their most recent TSS survey [4], which includes approximately 200 human primary cell types, 150 human tissues, and 250 human cancer cell lines, it was shown that on average there are four TSSs per gene, but the number of TSSs reported strictly relies on the filtering method used. An estimate of the transcriptome-wide distribution of alternative TSSs can indeed be complicated by the presence of CAGE peaks marking enhancer regions [4], 3′-UTRs

[Figure 1 panels: (A) Alternative first exons (Tpm3; TSS1 to TSS3 in proliferating and differentiated cells). (B) Alternative 5′-UTR (Cryab; TSS1 and TSS2 in proliferating and differentiated cells; uORF, stop codon, pORF). (C) Promoters and enhancers (TSS1 to TSS3; promoters P1 to P3; enhancer E; TF1 to TF3). (D) Long-range transcriptional control (Lmbr1, Rnf32, Shh; 850 kb). Key: limb-specific, brain-specific, epithelial lining-specific, floorplate-specific.]

Figure 1. Alternative transcription initiation. (A) Data from a deep cap analysis of gene expression (DeepCAGE) experiment showing alternative transcription start sites
(TSSs) used during muscle differentiation in proliferating myoblasts and differentiated myotubes [16]. In the Tpm3 gene, different promoters lead to the formation of
transcripts with different first exons. One alternative TSS (TSS3) is specifically used in differentiated cells. (B) In the Cryab gene, proliferating cells make use of an alternative
TSS to extend their 5′ untranslated region (5′-UTR). The sequence of the 5′-UTR is shown below the reference track. The extension on the 5′-UTR leads to the transcription of
a potential upstream open reading frame (uORF) starting at a canonical AUG codon and ending before the start codon of the primary ORF (pORF). (C) An illustrative
example of cell- and tissue-specific alternative TSSs regulated by the binding of transcription factors (TFs) to promoters and enhancer regions. While TF1 and TF2 bind to
promoters (P1, P2) surrounding the TSS, TF3 binds to a distal upstream sequence corresponding to an enhancer region (E), which enhances transcription from a third TSS
(TSS3). Some TFs are present in multiple tissues (TF1) whereas others are tissue specific (TF2, TF3), and their transcription can also be regulated during cell differentiation
(TF1 regulates transcription in undifferentiated cells and TF2 in differentiated cells). (D) Long-range transcriptional control mediated by enhancers. Transcriptional
regulation of the Shh gene is tightly controlled during development by enhancer regions located up to 850 kb from the gene. Whereas some enhancers are located within
the coding region of Shh, others are located in intergenic regions or within intronic regions of the Lmbr1 and Rnf32 genes. Genes are depicted as gray boxes. Known
enhancer regions in the mouse are marked in different colors according to their tissue specificity.
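
The uORF described for panel (B), an ORF that starts at a canonical AUG within the 5′-UTR and ends at a stop codon before the start codon of the pORF, can be illustrated with a minimal Python sketch. The function name `find_uorfs` and the example sequences are hypothetical and are not taken from the Cryab data; the sketch considers only AUG starts and only uORFs that terminate inside the supplied 5′-UTR, so overlapping uORFs are ignored.

```python
# Minimal sketch (assumed helper, not the authors' method): list AUG-initiated
# ORFs that both start and stop inside a 5'-UTR sequence.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(utr5: str) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each uORF fully contained in utr5.

    Coordinates are 0-based, end-exclusive; DNA input (T) is converted to RNA (U).
    """
    seq = utr5.upper().replace("T", "U")
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "AUG":
            continue
        # Walk codon by codon in the frame set by this AUG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3, seq[start:pos + 3]))
                break
    return uorfs


if __name__ == "__main__":
    # Hypothetical 5'-UTR containing one short uORF (AUG AAA UAG).
    print(find_uorfs("GGAUGAAAUAGCC"))
```

An AUG whose reading frame reaches the end of the 5′-UTR without a stop codon is not reported here; such a case would correspond to a uORF overlapping the pORF and would need the downstream coding sequence to evaluate.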

[4,20,21], coding regions (a phenomenon called exon painting [16,22,23]), and promoter-associated short RNAs (PASRs) [20]. Whereas exon painting may arise as a consequence of recapping of degradation products, many other CAGE peaks represent short capped transcripts whose functions remain largely unknown. A striking recent finding from this large TSS survey [4] is that most genes are regulated in a tissue-specific manner and only a small percentage can be considered to be truly housekeeping. The use of alternative tissue-specific TSSs seems to be regulated by the presence of enhancer regions more than by alternative core promoters. Half of all detected CpG island promoters and more than 90% of all promoters lacking both CpG islands and a TATA box exhibit cell type-restricted expression due to the presence of proximal enhancers [4].

The molecular mechanisms responsible for the choice of alternative promoters and TSSs can be divided into two categories: alteration of the chromatin state and regulation mediated by cell- and tissue-specific transcription factors (Figure 1C). Understanding the biological importance of alternative and tissue-specific TSSs requires learning how the choice of a specific TSS is made and which transcription factor and regulatory networks are involved. This can be achieved by making inferences on transcriptional networks. In a DeepCAGE time-course study on the differentiation of human monocytic leukemia cells [17], the authors predicted transcription factor binding sites around the

TSSs identified in each condition and subsequently built a network model of gene expression using motif activity response analysis. This provided important insights into the key regulators active in transcriptional control in distinct phases of differentiation. Similarly, another study [24] inferred transcriptional regulatory networks after the perturbation of specific transcription factors (PU.1, IRF8, MYB and SP1) in the same cells. This led to the discovery of target genes for each transcription factor and to the identification of de novo binding site motifs.

Many studies focusing on single genes have shown that the choice of a specific TSS has critical roles during development [25–27] and cell differentiation [28] and aberrations in alternative promoter and TSS use lead to various diseases including cancer [29,30], neuropsychiatric disorders [31], and developmental disorders [32]. Whereas some disorders are caused by epigenetic changes or genetic aberrations in the promoter region, others are caused by genetic changes in distal elements affecting long-range transcriptional regulation. The ENCODE project has shown the presence of more than 1000 long-range interactions between TSSs and distal elements within a range of 120 kb [3]. An example of such a long-range interaction is Shh [32] (Figure 1D), a gene that is spatially and temporally regulated during development. To date, ten Shh enhancers have been identified, located within a region of 1 Mb in humans and 850 kb in mice (Figure 1D). These enhancers play a key role during development, as indicated by mutations in the limb-specific enhancer that lead to various skeletal limb abnormalities.

Splicing: alternative exons
During and after transcription, almost all mRNAs are spliced. Alternatively spliced transcripts result from the differential inclusion of subsets of exons (Figure 1A and Box 1). Of the regulatory mechanisms discussed in this review, alternative splicing is the most prevalent event, affecting approximately 95% of mammalian genes [33]. RNA-seq has the potential to elucidate the number, structure, and abundance of alternative transcripts and the molecular mechanisms responsible for their formation.

Box 1. Alternative splicing events
Five major alternative splicing events are distinguished: exon skipping (also called cassette exon), use of alternative acceptor and/or donor sites, intron retention, and mutually exclusive exons. Exon skipping appears to be the most common, occurring in 38% of mouse and human genes, whereas intron retention is less common (3%) [135]. How the spliceosome recognizes alternative exons and decides which exons to include is still not fully understood. Before the advent of RNA-seq, studies revealed some general characteristics in conserved alternative cassette exons: they tend to be smaller in size compared with constitutive exons [136] and their length is divisible by three, thus maintaining the same reading frame when the alternative exon is skipped or included [137]. Non-conserved cassette exons do not show these characteristics. In addition, alternative exons seem to contain weaker splice sites (the exon–intron junctions at the 5′ and 3′ ends of introns; i.e., donor and acceptor sites), although the other primary cis-acting elements used to define the intron (the branch site and the polypyrimidine tract located upstream of the acceptor site) are generally similar to those found in constitutive exons [138].

From analysis of the transcriptomes of 15 different human cell lines [1], it appears that up to 25 different transcripts can be produced from a single gene and that up to 12 alternative transcripts may be expressed in a particular cell. Alternative transcripts are not expressed at the same level, but one transcript is usually dominant [34]. According to the latest GENCODE release [version 20 (http://www.gencodegenes.org/stats.html)], there are almost 80 000 transcript variants encoded by about 20 000 protein-coding genes in humans, an average of four transcripts per gene. A previous GENCODE release (version 7) reported an average of six transcripts per gene, while RefSeq, the University of California, Santa Cruz (UCSC), and the Collaborative Consensus Coding Sequence (CCDS) project [35] report a much lower average. These discordances suggest that variations in the number of transcripts per gene reported are due to the different methods used to annotate RNA sequences, highlighting the current limitations in fully characterizing transcriptomes.

It remains challenging to predict which transcripts are present in a specific cell type. Splice site selection depends on multiple parameters including the presence of splicing regulators, the strength of splice sites, the structure of exon–intron junctions, and the process of transcription. So far, various molecular mechanisms have been shown to regulate alternative splicing.

In addition to conserved cis elements such as the splice donor and acceptor sites, branch sites, and polypyrimidine tracts, a range of other sequence motifs are recognized by various auxiliary splicing factors. These auxiliary RNA-binding proteins (RBPs) are not part of the spliceosomal machinery but can enhance or suppress alternative splicing by interfering with it [36–39]. Various crosslinking and RNA immunoprecipitation techniques, followed by next-generation sequencing, have been developed to map RNA–protein interactions in vivo [14]. An early goal of these studies was the identification of RNA-binding sites. Many of these studies have shown that RBPs recognize short (3–7 nt) degenerate motifs, have multiple RNA-binding domains, and display variable efficiency when multiple motifs cluster together [40,41]. Moreover, many RBPs regulate the expression of other auxiliary factors. The differing cellular and temporal localization of RBPs [42,43] may explain the different dynamics regulating alternative and constitutive splicing: whereas constitutive splicing mainly occurs cotranscriptionally, alternative splicing mainly occurs post-transcriptionally [44]. For recent mechanistic models of splicing regulation through RBPs, see [45]. Alternative splicing can also be regulated in a manner totally independent of auxiliary splicing factors [46]. Splicing silencer sequences regulate alternative splicing when competing 5′ splice sites are present in the same RNA molecule (Figure 2B). The competing 5′ splice sites are equally well recognized by the U1 small nuclear ribonucleoprotein (snRNP), but silencer sequences alter the configuration in which U1 binds to the 5′ splice sites, leading to silencing of the 5′ splice site. This can change the efficiency of a splice site: weak 5′ splice sites can be recognized and used instead of stronger 5′ splice sites. RNA-seq datasets can be used to computationally identify common and tissue-specific

splicing regulatory sequences. These studies have shown that the same sequence can act as an enhancer or a silencer in different tissues, but experimental validations of these predicted regulatory sequences are needed to confirm these observations [47].

[Figure 2 panels: (A) Alternative exons (SLC25A3; muscle and brain; alternative skipped exons). (B) Splicing silencer sequence (I to III; weak 5′ss, strong 5′ss, 3′ss, U1, SSS). (C) Short- and long-range RNA structures (I and II).]

Figure 2. Alternative splicing. (A) Data from an RNA sequencing (RNA-seq) experiment showing tissue-specific alternative splicing [139]. The SLC25A3 gene is differentially spliced in brain and muscle tissues through exon skipping. (B) Alternative splicing regulated by silencer sequences. In (I) the U1 small nuclear ribonucleoprotein (snRNP) splicing factor recognizes both strong and weak 5′ splice sites (5′ss) but splicing occurs only at the strong 5′ss. In (II) a splicing silencer sequence (sss) is located downstream of the strong 5′ss. U1 binds both the weak and the strong 5′ss, but the conformation in which it binds the strong 5′ss is suboptimal for splicing; therefore, only the weak 5′ss is used for splicing. In (III) the sss is located downstream of both the weak and the strong 5′ss. U1 binds both with suboptimal conformation, but only the strong 5′ss is used for splicing. (C) Alternative splicing regulated by RNA secondary structures. Example of short- and long-range RNA secondary structures. (I) The short-range RNA secondary structure masks a strong 5′ss, leading to the recognition of a weaker 5′ss located upstream. (II) The long-range RNA secondary structure brings together a strong 5′ss and a weak 3′ss, causing the loss of a complete exon (in green) and a region of the last exon (in purple).

Alternative splicing can also be regulated by RNA secondary structures (Figure 2C). Short-range RNA secondary structures can mask primary cis elements such as the acceptor and donor sites or the polypyrimidine tract [48,49]. They have been associated with alternative splicing at alternative 5′ splice sites. For example, the RBP MBNL1 forms a secondary structure upstream of exon 5 of human TNNT2 and upstream of the fetal exon of mouse Tnnt3, blocking U2AF65 binding to the polypyrimidine tract [50,51]. Long-range secondary structures bring distant splice sites into closer proximity, facilitating alternative splicing, and are associated with weak alternative 3′ splice sites [49]. Computational studies based on RNA-seq datasets suggest that the splicing of thousands of mammalian genes is dependent on RNA structures, both short and long range [49]. Recently developed high-throughput techniques combine nuclease digestion [52] or chemical probing [53] with next-generation sequencing to provide transcriptome-wide RNA structural information. Two studies have recently shown a transcriptome-wide relationship between secondary structures and alternative splicing [54,55], by reporting the presence of strong secondary structures at 5′ splice sites that correlate with unspliced exons. The question that remains unsolved by RNA-seq studies is whether the plethora of transcript variants produced affect protein expression. This question has been recently addressed by studies using ribosome profiling, discussed further below. A general observation from transcriptome-wide studies is that alternative splicing is essential for development [56,57] and cell, tissue [58], and species specificity [59]. A plausible explanation of how alternative exons can confer such specificity is the inclusion or exclusion of binding motifs and post-translational modification sites, as shown in a study where the authors investigated the structural and functional properties of alternative exons [60].

Due to the widespread role of alternative splicing, it is unsurprising that errors in this process lead to various diseases, from neurodegenerative disorders to muscle dystrophies and cancer; we refer the reader to recent detailed reviews [61,62].

3′ End maturation: APA
Another step in mRNA processing is the process of polyadenylation [63]. The use of APA sites represents an extra regulatory layer during gene expression that results in the formation of transcripts differing in their 3′ ends. Transcripts arising from APA may differ in their coding region (if APA sites are located in a different exon or intron) (Figure 3A) or in the length of their 3′-UTRs [tandem polyadenylation sites (PASs)] (Figure 3B). The impact of APA on the regulation of gene expression can be extended


[Figure 3 panels: (A) Intronic alternative polyadenylation (Luc7l2; distal PAS, intronic PAS, alternative 3′ terminal exon). (B) Tandem alternative polyadenylation in OPMD and control samples: (I) Arih2 and (II) Ccnd1, each with proximal and distal PASs and full and truncated 3′-UTRs. Panel labels: loss/gain of miRNA binding site (miR-19); loss/gain of RBP binding sites (HuR motif: uukruuu; HNRNPL motifs: amayama, acacrav; HNRNPK motif: ccawmcc; HNRNPU motif: uguauug). (C) Polyadenylation site selection (I and II): Pol II CTD, PABPN1, CFIm, CPSF, CstF; USE, UGUU, AUUAAA (non-canonical PA signal, proximal PAS), AAUAAA (canonical PA signal, distal PAS), CA, DSE; full and truncated 3′-UTR.]

Figure 3. Alternative polyadenylation (APA). (A) Data from a poly(A)-sequencing experiment showing APA in the intron of the Luc7l2 gene [71], leading to an intronic
proximal polyadenylation site (PAS) located in a different terminal exon giving rise to transcript variants with different open reading frames (ORFs). (B) Two examples of
tandem APA in muscle tissue from a mouse model for oculopharyngeal muscular dystrophy (OPMD) [71]. In the Arih2 gene (I), both the distal and the proximal PASs can be
used in the disease state. Recognition of a proximal PAS leads to shortening of the 3′ untranslated region (3′-UTR) and loss of a miRNA binding site, causing an increase in
transcript levels. In the Ccnd1 gene (II), shortening of the 3′-UTR leads to the loss of many recognition sites for RNA-binding proteins (RBPs) that stabilize the transcript. Loss
of stability leads to a decrease in transcript level. (C) Model mechanisms regulating tandem APA. Common sequences in the 3′-UTR that regulate polyadenylation are the
upstream sequence element (USE), the UGUU sequence recognized by cleavage factor I (CFIm), the polyadenylation (PA) signal recognized by cleavage and
polyadenylation specific factor (CPSF), and the downstream sequence element (DSE) recognized by cleavage stimulation factor (CstF). CPSF and CstF are brought to the
RNA by RNA polymerase II (Pol II), together with poly(A)-binding protein nuclear 1 (PABPN1), through its C-terminal domain (CTD). Generally, CPSF recognizes the
canonical PA signal and cuts at a distal PAS, at a CA dinucleotide (I). If PABPN1 or CFIm is present at a lower concentration, the CPSF recognizes noncanonical (weaker) PA
signals (II) and cuts at proximal PASs, leading to the formation of transcripts with truncated 3′-UTRs.
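
As a rough illustration of the signal hexamers shown in panel (C), the following Python sketch scans a 3′-UTR sequence for the two hexamers drawn in the figure. The function name `find_pa_signals` and the example sequence are hypothetical. The labels "canonical" (AAUAAA) and "non-canonical" (AUUAAA) follow the panel labels of Figure 3C and are an assumption of this sketch; it performs exact string matching only and does not model USE, DSE, or UGUU elements or any protein factor.

```python
# Minimal sketch (assumed helper, not the authors' pipeline): locate the two
# polyadenylation signal hexamers drawn in Figure 3C within a 3'-UTR.
CANONICAL = "AAUAAA"  # labelled "canonical PA signal" in Figure 3C
VARIANT = "AUUAAA"    # labelled "non-canonical PA signal" in Figure 3C


def find_pa_signals(utr3: str) -> list[tuple[int, str, str]]:
    """Return (position, hexamer, label) for each hit, in 5'-to-3' order.

    Positions are 0-based; DNA input (T) is converted to RNA (U).
    """
    seq = utr3.upper().replace("T", "U")
    hits = []
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        if hexamer == CANONICAL:
            hits.append((i, hexamer, "canonical"))
        elif hexamer == VARIANT:
            hits.append((i, hexamer, "non-canonical"))
    return hits


if __name__ == "__main__":
    # Hypothetical 3'-UTR: a proximal AUUAAA followed by a distal AAUAAA.
    for hit in find_pa_signals("CCAUUAAAGGGGAAUAAACA"):
        print(hit)
```

In a tandem arrangement such as the one modelled in the caption, the first hit would correspond to the proximal signal and the last to the distal signal; which site is actually used depends on the factors described above and cannot be inferred from sequence matching alone.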

through effects on transcript localization [64], stability, and translation efficiency [65] and on the nature of the encoded protein. Numerous RNA-seq methods have contributed to our understanding of APA, ranging from RNA-seq studies able to detect overall changes in polyadenylation, to serial analysis of gene expression (SAGE)-based methods able to specifically quantify and characterize the 3′ ends of transcripts, to a series of dedicated protocols for the

accurate detection and quantification of PASs [14]. These transcriptome-wide studies have deepened our understanding of APA, providing information on newly discovered PASs, elucidating the impact of APA on gene expression, and discovering new APA regulatory mechanisms.

Although the number of alternative PASs detected differs greatly between studies [66–68], these studies contribute to the notion of the ubiquity of APA events, which involve approximately 70% of human genes. According to a study conducted on 15 human cell lines, there are on average two PASs per gene [1]. APA within the same last exon (tandem 3′-UTRs) is the most abundant type of APA [68]. Intronic APA events are reported less frequently and thousands of intronic PASs are usually suppressed [69]. APA is generally linked to changes in gene expression levels and, ultimately, to protein abundance. Studies have shown an inverse correlation between 3′-UTR length and protein expression levels [70,71]. Some human tissues (such as brain, testis, lung, and breast) are enriched for highly abundant transcripts with short 3′-UTRs, whereas others (such as heart and skeletal muscle) contain many low-abundance transcripts with long 3′-UTRs [72]. Increased expression of transcripts with shortened 3′-UTRs can be explained by loss of miRNA target sequences, loss of UPF1-binding sites, which leads to RNA decay [73], or loss of AU-rich elements (AREs), which leads to ARE-directed mRNA degradation [71]. However, there are many exceptions to the general rule, as proteins that bind to the 3′-UTR can also stabilize mRNAs [74–76].

Transcriptome-wide studies have been undertaken to elucidate the dynamics of APA regulation. In general, disruption of the polyadenylation machinery leads to loss of fidelity in the choice of PAS and shortening of the 3′-UTRs. There are numerous 3′ processing factors involved in polyadenylation; nevertheless, changes in the expression levels of a single specific factor are sufficient to influence the choice of PAS. For example, decreased levels of cleavage factor I (CFIm) 68 or poly(A)-binding protein nuclear 1 (PABPN1) lead to transcriptome-wide shortening of 3′-UTRs, corresponding to an increased preference for noncanonical polyadenylation signals (Figure 3C) [70,77,78].

Many recent transcriptome-wide studies have confirmed that distal PASs generally have a strong canonical signal motif [A(A/U)UAAA], whereas proximal PASs diverge from the canonical sequence [68,79–81]. Interestingly, tissue-specific regulated PASs can be depleted of the canonical motif. For example, APA in brain seems to be regulated by an A-rich motif starting just downstream of the PAS [82]. A-rich sequences have also been reported upstream of cleavage sites for transcripts lacking canonical motifs [83].

Numerous studies based on expressed sequence tags and microarrays have previously shown the biological relevance of APA (Box 2) [84,85]. APA profiles are tissue specific and appear to be tightly regulated during development and cell differentiation. Most of the findings achieved by recent transcriptome-wide approaches confirm at a larger scale what was previously observed. The tissue specificity of APA and the correlation between tissue and 3′-UTR length seem to be highly conserved between different species and APA profiles from different species are similar for the same tissues [80,81,86]. Modulation of APA has also been widely observed during proliferation, differentiation, and development [68,87–89].

Box 2. The biological relevance of APA
A study based on expressed sequence tags comprising 42 human tissues [140] showed that certain tissues preferentially produce mRNAs of a certain length. Brain, pancreatic islet, ear, bone marrow, and uterus showed a preference for distal PASs, leading to longer 3′-UTRs. Retina, placenta, ovary, and blood showed a preference for proximal PASs. This classification might change when considering the levels at which these mRNAs are expressed. Although most of the transcripts detected in the brain contain distal PASs, the transcripts that are highly abundant generally show a preference for proximal PASs and have short 3′-UTRs [72]. Other studies showed that the choice between a distal and a proximal PAS was modulated during differentiation and development. Progressive lengthening of 3′-UTRs was shown for most of the transcripts during cell differentiation and during embryonic development [141]. By contrast, shortening was observed during proliferation [142] and during reprogramming of somatic cells [143].

Widespread alteration of APA profiles has been observed in several diseases. Many studies have reported shortening of 3′-UTRs in cancer [90–92], linked to extensive upregulation and activation of oncogenes. However, shortening of 3′-UTRs poorly correlates with breast, lung, and colorectal cancer prognosis [93,94], suggesting that the relationship between APA and cancer is not straightforward. More recently, altered APA profiles have been linked to muscle disorders such as myotonic dystrophy [95] and oculopharyngeal muscular dystrophy [70].

From mRNA to protein: alternative translation initiation
In addition to the regulation of transcription and processing, the translation of transcripts is also tightly regulated. Regulation of translation defines not only the abundance of a protein but also its amino acid composition through the use of different start codons [96], as translation may start at uORFs or at alternative ORFs (aORFs) (Box 3 and Figure 4).

Box 3. Alternative translation initiation
uORFs are located in the 5′-UTR of a transcript. Depending on the presence or absence of stop codons and their coding frame, a uORF can overlap with the pORF or not. Overlapping and in-frame uORFs lead to N-terminal extended protein isoforms [8], whereas non-overlapping uORFs affect the translation of pORFs in various ways [144]: they can block the translation of the pORFs, reducing protein production; they can promote reinitiation of translation at downstream start codons; or they can enhance translation of the main pORFs. aORFs are located downstream of the annotated start codon. In-frame aORFs give rise to N-terminal truncated isoforms [145]. uORFs and aORFs can also be out of frame with respect to the pORFs and lead to the production of different peptides. The sequences translated in more than one reading frame are called dual coding regions [103]. We also note that uORFs and aORFs are not the only events that increase the diversity of the translated mRNAs and affect protein production. The genetic code can be read in alternative ways, leading to frameshifting, hopping, stop codon read-through, recoding, and codon reassignment [146,147], topics beyond the scope of this review.

In the past, changes in protein synthesis were measured exclusively based on proteomic approaches or estimated based on total mRNA levels. More recently, they have been
assessed via ribosome profiling [97]. Deep sequencing of RNA fragments protected by ribosomes determines the position of the ribosomes on the RNA molecule at nucleotide resolution, allowing exact characterization of the translation initiation site (TIS) and quantification of levels of translation. Ribosome profiling studies in combination with RNA-seq have assessed the extent of alternative translation initiation, provided insights into the regulatory mechanisms of this process, and shed light on how it impacts gene expression.
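
As a rough illustration of how footprint positions can be turned into candidate initiation sites, the sketch below shifts the 5′ end of each mapped read by a fixed offset to approximate the codon in the ribosomal P site and then counts reads per position. The 12-nt offset, the read threshold, the function names, and the toy coordinates are all assumptions made for illustration; in practice the offset is usually calibrated per read length and peak calling is considerably more involved.

```python
from collections import Counter

# Assumed distance (nt) from the 5' end of a footprint to the P-site codon.
P_SITE_OFFSET = 12


def p_site_counts(read_starts, offset=P_SITE_OFFSET):
    """Count footprints per inferred P-site position (0-based transcript coordinates)."""
    return Counter(start + offset for start in read_starts)


def call_peaks(counts, min_reads=3):
    """Return positions supported by at least min_reads footprints, in sorted order."""
    return sorted(pos for pos, n in counts.items() if n >= min_reads)


# Hypothetical 5' ends of footprints mapped to one transcript.
read_starts = [88, 88, 88, 88, 90, 140, 140, 140, 141]
counts = p_site_counts(read_starts)
peaks = call_peaks(counts)
print(peaks)
```

Positions returned by `call_peaks` are only candidates: as the main text notes for real data, a peak may also reflect ribosomal stalling rather than initiation.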
A common finding of many recent ribosome profiling studies is the widespread use of alternative TISs. Initiation of translation at alternative TISs may be caused by various forms of stress but is also observed under normal physiological conditions. Between 50% and 65% of transcripts contain more than one TIS [7,98,99]. Most of the detected TISs are located upstream of the annotated start codons (50–60%), leading to potential uORFs. A minority are located downstream of the annotated start codons (20%) and lead to N-terminally truncated proteins or out-of-frame ORFs. However, some ribosome profiling peaks detected as alternative TISs may represent cases of ribosomal stalling. To distinguish these from genuine TISs, proteomic data are essential. These are often difficult to obtain because the peptides are usually short and unstable. Moreover, the study of the proteome in a high-throughput fashion presents certain technical limitations, especially for low-abundance proteins, which are difficult to detect among a diverse pool of proteins [100].
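
The categories used in this paragraph can be made concrete with a small sketch that labels each detected TIS by its position and reading frame relative to the annotated start codon. The function name, the labels, and the toy coordinates are hypothetical, and the classification deliberately ignores stop codons, so an upstream TIS is reported only as a potential uORF.

```python
from collections import Counter


def classify_tis(tis_pos, annotated_start):
    """Label a TIS relative to the annotated start codon.

    Both arguments are 0-based transcript coordinates of the first
    nucleotide of the respective start codon.
    """
    if tis_pos == annotated_start:
        return "annotated start (pORF)"
    if tis_pos < annotated_start:
        return "upstream (potential uORF)"
    # Downstream TIS: same frame as the pORF means an N-terminally truncated isoform.
    if (tis_pos - annotated_start) % 3 == 0:
        return "downstream, in frame (truncated isoform)"
    return "downstream, out of frame"


# Hypothetical TIS positions detected on a transcript whose annotated start is at 120.
annotated_start = 120
detected = [30, 45, 120, 132, 140]
labels = [classify_tis(pos, annotated_start) for pos in detected]
summary = Counter(labels)
for label, n in summary.items():
    print(f"{label}: {n}")
```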
Insights into the mechanisms regulating the choice of a uORF or aORF over a primary ORF are starting to emerge. Initiation of translation at near-cognate codons and non-AUG codons, previously reported for a small number of mRNAs, appears to be common, as approximately 50% of translation is initiated at noncanonical codons [98,99]. These noncanonical start codons are enriched in uORFs. By contrast, TISs located downstream of annotated TISs comprise mainly AUG codons. The use of near-cognate and non-AUG start codons has been confirmed by mass spectrometry [101]. Interestingly, these codons are recoded to regular methionines, as all of the produced proteins seem to contain an N-terminal methionine.
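
To make the codon categories above concrete, the following sketch classifies a start codon as AUG, near-cognate, or other. It assumes the common working definition of a near-cognate codon as one that differs from AUG at exactly one position; the function name and labels are illustrative only.

```python
from itertools import product


def start_codon_class(codon):
    """Classify a start codon as 'AUG', 'near-cognate', or 'other'.

    Assumes near-cognate means exactly one mismatch to AUG.
    DNA input (T) is accepted and converted to RNA (U).
    """
    codon = codon.upper().replace("T", "U")
    if len(codon) != 3 or set(codon) - set("ACGU"):
        raise ValueError(f"not a valid codon: {codon!r}")
    mismatches = sum(a != b for a, b in zip(codon, "AUG"))
    if mismatches == 0:
        return "AUG"
    if mismatches == 1:
        return "near-cognate"
    return "other"


# Enumerate every codon that counts as near-cognate under this definition.
near_cognates = sorted(
    "".join(c) for c in product("ACGU", repeat=3)
    if start_codon_class("".join(c)) == "near-cognate"
)
print(near_cognates)
```

Under this definition there are nine near-cognate codons (three positions, three alternative bases each), including CUG and GUG.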


Figure 4. Alternative translation initiation. Alternative translation initiation sites (TISs) detected by ribosome profiling (http://www.ebi.ac.uk/ena/data/view/PRJEB7207). (A) Examples of alternative TISs leading to alternative open reading frames (aORFs) in frame (I) or out of frame (II) with the primary ORF (pORF). In the Rps20 gene (I), a switch in TIS use occurs during cell differentiation. Proliferating cells use two TISs, one corresponding to the annotated start codon and the other corresponding to an aORF, the latter of which leads to a truncated protein isoform. The alternative TIS is shown in the highlighted box. The top part (gray) shows the three possible frames and the blue bar shows the frame of the pORF. Because ribosome profiling peaks are usually displayed using only the 5′ end of each mapped read, the black line indicates the actual TIS location of the aORF, located 12 bp downstream of the mapped peak. In the Crip1 gene (II), only one transcription start site (TSS) is present (top track, deep cap analysis of gene expression (DeepCAGE) [16]) but two different TISs are used (bottom track, ribosome profiling), one corresponding to the annotated start codon and one located downstream of the annotated start codon, leading to an aORF. The alternative TIS is shown in the highlighted box. The alternative TIS corresponds to an AUG start codon that is out of frame compared with the pORF, indicating the presence of a dual coding region. (B) Examples of alternative TISs leading to an upstream ORF (uORF) in the Cryab gene. Proliferating cells use two TISs, one located in the 5′ untranslated region (5′-UTR) and one corresponding to the annotated start codon. The sequence of the 5′-UTR incorporated by the alternative TIS is shown below the reference track. Extension of the 5′-UTR leads to the translation of a uORF, with a canonical AUG codon and ending before the start codon of the pORF, negatively regulating translation.

Recent studies support the leaky scanning theory [102], according to which the choice of a downstream TIS depends on the strength of the Kozak consensus sequence. It was shown on a transcriptome-wide scale that initiation at downstream TISs usually occurs when the Kozak sequence in the annotated start codon is suboptimal. A similar mechanism applies for initiation at uORFs. uORFs are translated in parallel to their downstream primary ORFs (pORFs) if the start codon used in the uORF is a non-AUG,

but translation of pORFs is usually repressed if the uORFs contain an AUG start codon and a strong Kozak sequence [99].

Both aORFs and uORFs can give rise to ORFs with reading frames different from the pORFs, a phenomenon known as dual coding [103]. The triplet periodicity observed in ribosome profiling data enables the detection of dually decoded regions. Although the extent of dual coding observed in the human genome in ribosome profiling studies is only approximately 1%, it has been suggested that this might be an underestimate due to technical and analytical limitations (low coverage and the assumption that the two frames must be translated at the same rate) [103].

The extent to which mRNA levels explain differences in protein abundance is still debated. Although some studies have reported a poor correlation [104] (in the range of approximately 40% of protein levels explained by mRNA levels [105–108] or even less than 20% [109]), others claim a much higher correlation of up to approximately 80% [110]. Ribosome-associated RNA levels seem to be a good proxy for protein levels, as the correlations between mRNA and protein observed are between 60% and 90% [109,111]. Nevertheless, a study that compared changes at mRNA levels and ribosome-bound mRNAs showed profound uncoupling between transcription and translation in several different experiments after treatments with extracellular stimuli or during cell and tissue differentiation [112]. Therefore, it remains unclear whether regulation at the translational level has a major influence on global protein abundance or whether it is restricted to a subset of genes.

Transcription, RNA processing, and translation: interdependent processes
The molecular machineries involved in transcription and RNA processing are spatiotemporally coupled. Several reviews have extensively described cotranscriptional regulation of capping, splicing, and polyadenylation [113,114]. RNA polymerase II (Pol II) is an important player in the regulation of this coupling, as its C terminus recruits proteins involved in capping, splicing, and polyadenylation [115]. There is ample support for the coupling between transcription and splicing. Splicing predominantly occurs during transcription [1,44], as indicated by the following three observations: many introns are already spliced in chromatin-associated RNAs; there is enrichment of spliceosomal small nuclear RNAs in chromatin-associated RNAs; and exons that are spliced are enriched for epigenetic chromatin marks [116]. Nevertheless, splicing events at the 3′ end of a transcript might occur post-transcriptionally, giving a general 5′–3′ trend in splicing completion.

Transcription and splicing are coupled not simply in space and time but are also jointly responsible for the formation of alternative transcripts. The interdependence of different RNA-processing events restricts the number of combinations of alternative TSSs, exons, and PASs. Splicing and polyadenylation might be influenced not only by the transcription elongation rate but also by transcription initiation: a lower elongation rate is linked to slower splicing and polyadenylation and therefore to an increased chance of recognizing alternative exons [117] or proximal PASs [118,119] and the choice of TSS is linked to a specific splicing pattern [120,121] or to the use of specific PASs [71,122,123].

In addition to links between transcription and mRNA processing, alternative splicing and APA also appear to be interdependent. Twenty years ago, it was shown that splicing of the last intron requires definition of the last exon (at least in mammals [124]) and this occurs through the cooperation of splicing and polyadenylation factors that interact across the last exon, leading to mutual enhancement of both splicing and polyadenylation [125]. The snRNPs U1 and U2 and the U2 auxiliary factor 65 kDa subunit (U2AF65), all spliceosome components, are also part of the human pre-mRNA 3′ processing complex [126]. These spliceosome components directly interact with cleavage and polyadenylation specific factor (CPSF) and with CFIm. Splicing factors can also play a role in premature cleavage and polyadenylation, as shown by the spliceosomal factor TRAP150 [127].

Recent transcriptome-wide studies further support the links between splicing and polyadenylation. Alteration of the splicing factor hnRNP H has been shown to have widespread effects on tandem APA, with increased 3′-UTR shortening in the presence of hnRNP H and lengthening in its absence (Figure 5A, top). Changes in APA were accompanied by changes in alternative splicing. A direct link between hnRNP H and the choice of a specific PAS was shown by crosslinking immunoprecipitation sequencing (CLIP-seq) analysis, by the presence of a higher CLIP tag density next to the proximal PAS [128]. An increase in proximal PAS use was also observed after alteration of Nova, an RBP involved in alternative splicing [36].

High CLIP tag density surrounding proximal PASs has also been observed for the RBPs MBNL1 and MBNL2 (Figure 5A, bottom), which are known to regulate splicing [38], and a direct link between MBNL proteins and APA was recently explained by the competition of MBNL with CFIm68, a component of the polyadenylation machinery [95].

Whether alternative splicing is also coupled to non-tandem APA remains unclear. A few studies have specifically investigated the interdependency between intronic polyadenylation and splicing. Cryptic intronic PASs are mainly located in large introns with weak 5′ splice sites. This suggests that intronic polyadenylation can be inhibited if there are splicing enhancers that recognize the 5′ splice site, as shown for U1 [129], or enhanced in the case of suboptimal splicing [130]. The coupling observed in this case represents kinetic competition between splicing and polyadenylation [131].

Finally, coupling is not restricted to processes connected in space and time. Interdependency has also been shown between processes occurring in different subcellular compartments; for example, between APA and translation. Cytoplasmic polyadenylation element-binding protein 1 (CPEB1), which shuttles between the nucleus and the cytoplasm, has been shown to play a dual role in APA and translation [132] (Figure 5B). Interestingly, CPEB1 can also regulate alternative splicing. CPEB1 prevents recruitment of the splicing factor U2AF65 to the 3′ splice site, but simultaneously recruits the polyadenylation machinery. The RBP CPEB1 is an example of a master regulator that affects three layers of gene expression: splicing, polyadenylation, and translation.

Figure 5. Coupled regulatory mechanisms. (A) Tandem alternative polyadenylation (APA) regulated by splicing factors. The RNA-binding proteins hnRNP H and MBNL regulate APA in opposing ways. In the presence of hnRNP H (I), cleavage and polyadenylation specific factor (CPSF) binds weaker noncanonical polyadenylation (PA) signals and cuts at the proximal polyadenylation site (PAS 1) leading to shortening of the 3′ untranslated region (3′-UTR), while in its absence (II) only the canonical PA signal is recognized and cleavage occurs in the distal PAS (PAS 2). (III) MBNL masks the region upstream of weak noncanonical PA signals, blocking the binding of cleavage factor I (CFIm). This leads to binding of CFIm to a more distal UGUU sequence, followed by binding of CPSF to the distal canonical PA signal and use of the distal PAS (PAS 2). In the absence of MBNL (IV), CFIm can bind proximal UGUU regions and bring the CPSF to weaker PA signals, causing cleavage at the proximal PAS (PAS 1) and shortening of the 3′-UTR. (B) Coupling of APA and translation. In the nucleus, in the absence of cytoplasmic polyadenylation element-binding protein 1 (CPEB1) (I), CPSF binds the canonical PA signal and cleaves the RNA at a distal PAS (PAS 2). In the presence of CPEB1 (II), CPEB1 binds the cytoplasmic polyadenylation element (CPE) located upstream of weak noncanonical PA signals. CPEB1 directly interacts with CPSF, bringing it to regions proximal to the weak PA signal. This leads to their recognition by CPSF and cleavage at the proximal PAS (PAS 1). When CPEB1 shuttles to the cytoplasm, it again binds to the CPE, but this time to promote lengthening of the poly(A) tail by poly(A) polymerase (PAP), which results in increased translation efficiency. Lengthening of the poly(A) tails of transcripts bearing proximal PASs (PAS 1) (II) is enhanced by the fact that the CPE, PAP, and the polyadenylation site are in close proximity, whereas this enhancement is disrupted when the distance is greater due to the 3′-UTR lengthening in transcripts bearing a distal PAS (PAS 2).

Concluding remarks
RNA-seq technologies are elucidating the mechanisms that expand the genome’s coding capacity and are quickly redefining the concept of gene expression regulation.

Although there is a continuing increase in the number of transcripts identified, and in the understanding of the molecular mechanisms that coordinate their formation during transcription and mRNA processing, we still face technical limitations due to the short read length of next-generation sequencing data and reliance on statistical and computational approaches to reconstruct transcript structure. This represents an obstacle when trying to link different events occurring in the same RNA molecule. The only way to specifically determine the exact transcript structure for each detected RNA molecule is by sequencing full-length RNAs, an option that is currently becoming more feasible [133,134] and that is opening a new era in the field of RNA-seq.
References

1 Djebali, S. et al. (2012) Landscape of transcription in human cells. Nature 489, 101–108
2 Neph, S. et al. (2012) An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90
3 Sanyal, A. et al. (2012) The long-range interaction landscape of gene promoters. Nature 489, 109–113
4 FANTOM Consortium and RIKEN PMI and CLST (DGT) (2014) A promoter-level mammalian expression atlas. Nature 507, 462–470
5 Goossens, S. et al. (2007) Truncated isoform of mouse αT-catenin is testis-restricted in expression and function. FASEB J. 21, 647–655
6 Barbosa, C. et al. (2013) Gene expression regulation by upstream open reading frames and human disease. PLoS Genet. 9, e1003529
7 Calvo, S.E. et al. (2009) Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans. Proc. Natl. Acad. Sci. U.S.A. 106, 7507–7512
8 Fritsch, C. et al. (2012) Genome-wide search for novel human uORFs and N-terminal protein extensions using ribosomal footprinting. Genome Res. 22, 2208–2218
9 Yamashita, R. et al. (2003) Small open reading frames in 5′ untranslated regions of mRNAs. C. R. Biol. 326, 987–991
10 Slavoff, S.A. et al. (2013) Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat. Chem. Biol. 9, 59–64
11 Magny, E.G. et al. (2013) Conserved regulation of cardiac calcium uptake by peptides encoded in small open reading frames. Science 341, 1116–1120
12 Jorgensen, R.A. and Dorantes-Acosta, A.E. (2012) Conserved peptide upstream open reading frames are associated with regulatory genes in angiosperms. Front. Plant Sci. 3, 191
13 Shiraki, T. et al. (2002) Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc. Natl. Acad. Sci. U.S.A. 100, 15776–15781
14 de Klerk, E. et al. (2014) RNA sequencing: from tag-based profiling to resolving complete transcript structure. Cell. Mol. Life Sci. 71, 3537–3551
15 de Hoon, M. and Hayashizaki, Y. (2008) Deep cap analysis gene expression (CAGE): genome-wide identification of promoters, quantification of their expression, and network inference. Biotechniques 44, 627–628 630, 632
16 Hestand, M.S. et al. (2010) Tissue-specific transcript annotation and expression profiling with complementary next-generation sequencing technologies. Nucleic Acids Res. 38, e165
17 Suzuki, H. et al. (2009) The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line. Nat. Genet. 41, 553–562
18 Valen, E. et al. (2009) Genome-wide detection and analysis of hippocampus core promoters using DeepCAGE. Genome Res. 19, 255–265
19 Gustincich, S. et al. (2006) The complexity of the mammalian transcriptome. J. Physiol. 575, 321–332
20 Kapranov, P. et al. (2007) RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488
21 Andersson, R. et al. (2014) An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461
22 Affymetrix/Cold Spring Harbor Laboratory ENCODE Transcriptome Project (2009) Post-transcriptional processing generates a diversity of 5′-modified long and short RNAs. Nature 457, 1028–1032
23 Otsuka, Y. et al. (2009) Identification of a cytoplasmic complex that adds a cap onto 5′-monophosphate RNA. Mol. Cell. Biol. 29, 2155–2167
24 Vitezic, M. et al. (2010) Building promoter aware transcriptional regulatory networks using siRNA perturbation and DeepCAGE. Nucleic Acids Res. 38, 8141–8148
25 Levanon, D. and Groner, Y. (2004) Structure and regulated expression of mammalian RUNX genes. Oncogene 23, 4211–4219
26 Steinthorsdottir, V. et al. (2004) Multiple novel transcription initiation sites for NRG1. Gene 342, 97–105
27 Davis, W., Jr and Schultz, R.M. (2000) Developmental change in TATA-box utilization during preimplantation mouse development. Dev. Biol. 218, 275–283
28 Pozner, A. et al. (2007) Developmentally regulated promoter-switch transcriptionally controls Runx1 function during embryonic hematopoiesis. BMC Dev. Biol. 7, 84
29 Pedersen, I.S. et al. (2002) Promoter switch: a novel mechanism causing biallelic PEG1/MEST expression in invasive breast cancer. Hum. Mol. Genet. 11, 1449–1453
30 Agarwal, V.R. et al. (1996) Use of alternative promoters to express the aromatase cytochrome P450 (CYP19) gene in breast adipose tissues of cancer-free and breast cancer patients. J. Clin. Endocrinol. Metab. 81, 3843–3849
31 Tan, W. et al. (2007) Molecular cloning of a brain-specific, developmentally regulated neuregulin 1 (NRG1) isoform and identification of a functional promoter variant associated with schizophrenia. J. Biol. Chem. 282, 24343–24351
32 Hill, R.E. and Lettice, L.A. (2013) Alterations to the remote control of Shh gene expression cause congenital abnormalities. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 368, 20120357
33 Pan, Q. et al. (2008) Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415
34 Gonzalez-Porta, M. et al. (2013) Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene. Genome Biol. 14, R70
35 Harrow, J. et al. (2012) GENCODE: the reference human genome annotation for the ENCODE Project. Genome Res. 22, 1760–1774
36 Licatalosi, D.D. et al. (2008) HITS–CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464–469
37 Ule, J. et al. (2003) CLIP identifies Nova-regulated RNA networks in the brain. Science 302, 1212–1215
38 Wang, E.T. et al. (2012) Transcriptome-wide regulation of pre-mRNA splicing and mRNA localization by muscleblind proteins. Cell 150, 710–724
39 Lebedeva, S. et al. (2011) Transcriptome-wide analysis of regulatory interactions of the RNA-binding protein HuR. Mol. Cell 43, 340–352
40 Fu, X.D. and Ares, M., Jr (2014) Context-dependent control of alternative splicing by RNA-binding proteins. Nat. Rev. Genet. 15, 689–701
41 Zhang, C. et al. (2013) Prediction of clustered RNA-binding protein motif sites in the mammalian genome. Nucleic Acids Res. 41, 6793–6807
42 Ameur, A. et al. (2011) Total RNA sequencing reveals nascent transcription and widespread co-transcriptional splicing in the human brain. Nat. Struct. Mol. Biol. 18, 1435–1440
43 Hao, S. and Baltimore, D. (2013) RNA splicing regulates the temporal order of TNF-induced gene expression. Proc. Natl. Acad. Sci. U.S.A. 110, 11934–11939
44 Tilgner, H. et al. (2012) Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res. 22, 1616–1625
45 Witten, J.T. and Ule, J. (2011) Understanding splicing regulation through RNA splicing maps. Trends Genet. 27, 89–97
46 Yu, Y. et al. (2008) Dynamic regulation of alternative splicing by silencers that modulate 5′ splice site competition. Cell 135, 1224–1236
47 Wen, J. et al. (2010) Computational identification of tissue-specific alternative splicing elements in mouse genes from RNA-seq. Nucleic Acids Res. 38, 7895–7907
48 Shepard, P.J. and Hertel, K.J. (2008) Conserved RNA secondary structures promote alternative splicing. RNA 14, 1463–1469
49 Pervouchine, D.D. et al. (2012) Evidence for widespread association of mammalian splicing and conserved long-range RNA structures. RNA 18, 1–15
50 Warf, M.B. et al. (2009) The protein factors MBNL1 and U2AF65 bind alternative RNA structures to regulate splicing. Proc. Natl. Acad. Sci. U.S.A. 106, 9203–9208
51 Yuan, Y. et al. (2007) Muscleblind-like 1 interacts with RNA hairpins in splicing target and pathogenic RNAs. Nucleic Acids Res. 35, 5474–5486
52 Kertesz, M. et al. (2010) Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107
53 Lucks, J.B. et al. (2011) Multiplexed RNA structure characterization with selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-seq). Proc. Natl. Acad. Sci. U.S.A. 108, 11063–11068
54 Wan, Y. et al. (2014) Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505, 706–709
55 Ding, Y. et al. (2014) In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505, 696–700

56 Giudice, J. et al. (2014) Alternative splicing regulates vesicular trafficking genes in cardiomyocytes during postnatal heart development. Nat. Commun. 5, 3603
57 Kim, K.K. et al. (2013) Rbfox3-regulated alternative splicing of Numb promotes neuronal differentiation during development. J. Cell Biol. 200, 443–458
58 Pimentel, H. et al. (2014) A dynamic alternative splicing program regulates gene expression during terminal erythropoiesis. Nucleic Acids Res. 42, 4031–4042
59 Gracheva, E.O. et al. (2011) Ganglion-specific splicing of TRPV1 underlies infrared sensation in vampire bats. Nature 476, 88–91
60 Buljan, M. et al. (2012) Tissue-specific splicing of disordered segments that embed binding motifs rewires protein interaction networks. Mol. Cell 46, 871–883
61 Costa, V. et al. (2013) RNA-seq and human complex diseases: recent accomplishments and future perspectives. Eur. J. Hum. Genet. 21, 134–142
62 Pistoni, M. et al. (2010) Alternative splicing and muscular dystrophy. RNA Biol. 7, 441–452
63 Danckwardt, S. et al. (2008) 3′ End mRNA processing: molecular mechanisms and implications for health and disease. EMBO J. 27, 482–498
64 Andreassi, C. and Riccio, A. (2009) To localize or not to localize: mRNA fate is in 3′ UTR ends. Trends Cell Biol. 19, 465–474
65 Fabian, M.R. et al. (2010) Regulation of mRNA translation and stability by microRNAs. Annu. Rev. Biochem. 79, 351–379
66 Derti, A. et al. (2012) A quantitative atlas of polyadenylation in five mammals. Genome Res. 22, 1173–1183
67 Ozsolak, F. et al. (2010) Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation. Cell 143, 1018–1029
68 Shepard, P.J. et al. (2011) Complex and dynamic landscape of RNA polyadenylation revealed by PAS-seq. RNA 17, 761–772
69 Yao, C. et al. (2012) Transcriptome-wide analyses of CstF64–RNA interactions in global regulation of mRNA alternative polyadenylation. Proc. Natl. Acad. Sci. U.S.A. 109, 18773–18778
70 de Klerk, E. et al. (2012) Poly(A) binding protein nuclear 1 levels affect alternative polyadenylation. Nucleic Acids Res. 40, 9089–9101
71 Ji, Z. et al. (2011) Transcriptional activity regulates alternative cleavage and polyadenylation. Mol. Syst. Biol. 7, 534
72 Ni, T. et al. (2013) Distinct polyadenylation landscapes of diverse human tissues revealed by a modified PA-seq strategy. BMC Genomics 14, 615
73 Hogg, J.R. and Goff, S.P. (2010) Upf1 senses 3′ UTR length to potentiate mRNA decay. Cell 143, 379–389
74 Ray, D. et al. (2013) A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177
75 Gupta, I. et al. (2014) Alternative polyadenylation diversifies post-transcriptional regulation by selective RNA–protein interactions. Mol. Syst. Biol. 10, 719
76 Spies, N. et al. (2013) 3′ UTR-isoform choice has limited influence on the stability and translational efficiency of most mRNAs in mouse fibroblasts. Genome Res. 23, 2078–2090
77 Jenal, M. et al. (2012) The poly(A)-binding protein nuclear 1 suppresses alternative cleavage and polyadenylation sites. Cell 149, 538–553
78 Martin, G. et al. (2012) Genome-wide analysis of pre-mRNA 3′ end processing reveals a decisive role of human cleavage factor I in the regulation of 3′ UTR length. Cell Rep. 1, 753–763
79 Jan, C.H. et al. (2011) Formation, regulation and evolution of Caenorhabditis elegans 3′ UTRs. Nature 469, 97–101
80 Smibert, P. et al. (2012) Global patterns of tissue-specific alternative polyadenylation in Drosophila. Cell Rep. 1, 277–289
81 Ulitsky, I. et al. (2012) Extensive alternative polyadenylation during zebrafish development. Genome Res. 22, 2054–2066
82 Hafez, D. et al. (2013) Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation. Bioinformatics 29, i108–i116
83 Nunes, N.M. et al. (2010) A functional human poly(A) site requires only a potent DSE and an A-rich upstream sequence. EMBO J. 29, 1523–1536
84 Tian, B. et al. (2005) A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res. 33, 201–212
85 Yan, J. and Marr, T.G. (2005) Computational analysis of 3′-ends of ESTs shows four classes of alternative polyadenylation in human, mouse, and rat. Genome Res. 15, 369–375
86 Miura, P. et al. (2013) Widespread and extensive lengthening of 3′ UTRs in the mammalian brain. Genome Res. 23, 812–825
87 Hoque, M. et al. (2013) Analysis of alternative cleavage and polyadenylation by 3′ region extraction and deep sequencing. Nat. Methods 10, 133–139
88 Li, Y. et al. (2012) Dynamic landscape of tandem 3′ UTRs during zebrafish development. Genome Res. 22, 1899–1906
89 Mangone, M. et al. (2010) The landscape of C. elegans 3′ UTRs. Science 329, 432–435
90 Fu, Y. et al. (2011) Differential genome-wide profiling of tandem 3′ UTRs among human breast cancer and normal cells by high-throughput sequencing. Genome Res. 21, 741–747
91 Lin, Y. et al. (2012) An in-depth map of polyadenylation sites in cancer. Nucleic Acids Res. 40, 8460–8471
92 Mayr, C. and Bartel, D.P. (2009) Widespread shortening of 3′ UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell 138, 673–684
93 Lembo, A. et al. (2012) Shortening of 3′ UTRs correlates with poor prognosis in breast and lung cancer. PLoS ONE 7, e31129
94 Morris, A.R. et al. (2012) Alternative cleavage and polyadenylation during colorectal cancer development. Clin. Cancer Res. 18, 5256–5266
95 Batra, R. et al. (2014) Loss of MBNL Leads to disruption of developmentally regulated alternative polyadenylation in RNA-mediated disease. Mol. Cell 56, 311–322
96 Kochetov, A.V. (2008) Alternative translation start sites and hidden coding potential of eukaryotic mRNAs. Bioessays 30, 683–691
97 Ingolia, N.T. et al. (2012) The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nat. Protoc. 7, 1534–1550
98 Ingolia, N.T. et al. (2011) Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802
99 Lee, S. et al. (2012) Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc. Natl. Acad. Sci. U.S.A. 109, E2424–E2432
100 Wasinger, V.C. et al. (2013) Current status and advances in quantitative proteomic mass spectrometry. Int. J. Proteomics 2013, 180605
101 Menschaert, G. et al. (2013) Deep proteome coverage based on ribosome profiling aids mass spectrometry-based protein and peptide discovery and provides evidence of alternative translation products and near-cognate translation initiation events. Mol. Cell. Proteomics 12, 1780–1790
102 Kozak, M. (2005) Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene 361, 13–37
103 Michel, A.M. et al. (2012) Observation of dually decoded regions of the human genome using ribosome profiling data. Genome Res. 22, 2219–2229
104 Maier, T. et al. (2009) Correlation of mRNA and protein in complex biological samples. FEBS Lett. 583, 3966–3973
105 Lundberg, E. et al. (2010) Defining the transcriptome and proteome in three functionally different human cell lines. Mol. Syst. Biol. 6, 450
106 Schwanhausser, B. et al. (2011) Global quantification of mammalian gene expression control. Nature 473, 337–342
107 Tian, Q. et al. (2004) Integrated genomic and proteomic analyses of gene expression in mammalian cells. Mol. Cell. Proteomics 3, 960–969
108 Vogel, C. et al. (2010) Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line. Mol. Syst. Biol. 6, 400
109 Ingolia, N.T. et al. (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223
110 Li, J.J. et al. (2014) System wide analyses have underestimated protein abundances and the importance of transcription in mammals. PeerJ 2, e270
111 Wang, T. et al. (2013) Translating mRNAs strongly correlate to proteins in a multivariate manner and their translation ratios are phenotype specific. Nucleic Acids Res. 41, 4743–4754


112 Tebaldi, T. et al. (2012) Widespread uncoupling between transcriptome and translatome variations after a stimulus in mammalian cells. BMC Genomics 13, 220
113 Auboeuf, D. et al. (2005) A subset of nuclear receptor coregulators act as coupling proteins during synthesis and maturation of RNA transcripts. Mol. Cell. Biol. 25, 5307–5316
114 Bentley, D.L. (2014) Coupling mRNA processing with transcription in time and space. Nat. Rev. Genet. 15, 163–175
115 Hsin, J.P. and Manley, J.L. (2012) The RNA polymerase II CTD coordinates transcription and RNA processing. Genes Dev. 26, 2119–2137
116 Brown, S.J. et al. (2012) Chromatin and epigenetic regulation of pre-mRNA processing. Hum. Mol. Genet. 21, R90–R96
117 Dujardin, G. et al. (2013) Transcriptional elongation and alternative splicing. Biochim. Biophys. Acta 1829, 134–140
118 Hazelbaker, D.Z. et al. (2013) Kinetic competition between RNA polymerase II and Sen1-dependent transcription termination. Mol. Cell 49, 55–66
119 Pinto, P.A. et al. (2011) RNA polymerase II kinetics in polo polyadenylation signal selection. EMBO J. 30, 2431–2444
120 Benson, M.J. et al. (2012) Heterogeneous nuclear ribonucleoprotein L-like (hnRNPLL) and elongation factor, RNA polymerase II, 2 (ELL2) are regulators of mRNA processing in plasma cells. Proc. Natl. Acad. Sci. U.S.A. 109, 16252–16257
121 Huang, D.W. et al. (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57
122 Huang, Y. et al. (2012) Mediator complex regulates alternative mRNA processing via the MED23 subunit. Mol. Cell 45, 459–469
123 Nagaike, T. et al. (2011) Transcriptional activators enhance polyadenylation of mRNA precursors. Mol. Cell 41, 409–418
124 Martinson, H.G. (2011) An active role for splicing in 3′-end formation. Wiley Interdiscip. Rev. RNA 2, 459–470
125 Berget, S.M. (1995) Exon recognition in vertebrate splicing. J. Biol. Chem. 270, 2411–2414
126 Shi, Y. et al. (2009) Molecular architecture of the human pre-mRNA 3′ processing complex. Mol. Cell 33, 365–376
127 Lee, K.M. and Tarn, W.Y. (2014) TRAP150 activates splicing in composite terminal exons. Nucleic Acids Res. Published online October 17, 2014. (http://dx.doi.org/10.1093/nar/gku963)
128 Katz, Y. et al. (2010) Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015
129 Kaida, D. et al. (2010) U1 snRNP protects pre-mRNAs from premature cleavage and polyadenylation. Nature 468, 664–668
130 Tian, B. et al. (2007) Widespread mRNA polyadenylation events in introns indicate dynamic interplay between polyadenylation and splicing. Genome Res. 17, 156–165
131 Luo, W. et al. (2013) The conserved intronic cleavage and polyadenylation site of CstF-77 gene imparts control of 3′ end processing activity through feedback autoregulation and by U1 snRNP. PLoS Genet. 9, e1003613
132 Bava, F.A. et al. (2013) CPEB1 coordinates alternative 3′-UTR formation with translational regulation. Nature 495, 121–125
133 Au, K.F. et al. (2013) Characterization of the human ESC transcriptome by hybrid sequencing. Proc. Natl. Acad. Sci. U.S.A. 110, E4821–E4830
134 Sharon, D. et al. (2013) A single-molecule long-read survey of the human transcriptome. Nat. Biotechnol. 31, 1009–1014
135 Sugnet, C.W. et al. (2004) Transcriptome and genome conservation of alternative splicing events in humans and mice. Pac. Symp. Biocomput. 2004, 66–77
136 Sorek, R. et al. (2004) How prevalent is functional alternative splicing in the human genome? Trends Genet. 20, 68–71
137 Resch, A. et al. (2004) Evidence for a subpopulation of conserved alternative splicing events under selection pressure for protein reading frame preservation. Nucleic Acids Res. 32, 1261–1269
138 Sorek, R. et al. (2004) Minimal conditions for exonization of intronic sequences: 5′ splice site formation in Alu exons. Mol. Cell 14, 221–231
139 Mortazavi, A. et al. (2008) Mapping and quantifying mammalian transcriptomes by RNA-seq. Nat. Methods 5, 621–628
140 Zhang, H. et al. (2005) Biased alternative polyadenylation in human tissues. Genome Biol. 6, R100
141 Ji, Z. et al. (2009) Progressive lengthening of 3′ untranslated regions of mRNAs by alternative polyadenylation during mouse embryonic development. Proc. Natl. Acad. Sci. U.S.A. 106, 7028–7033
142 Sandberg, R. et al. (2008) Proliferating cells express mRNAs with shortened 3′ untranslated regions and fewer microRNA target sites. Science 320, 1643–1647
143 Ji, Z. and Tian, B. (2009) Reprogramming of 3′ untranslated regions of mRNAs by alternative polyadenylation in generation of pluripotent stem cells from different cell types. PLoS ONE 4, e8419
144 Wethmar, K. (2014) The regulatory potential of upstream open reading frames in eukaryotic gene expression. Wiley Interdiscip. Rev. RNA 5, 765–778
145 Vanderperre, B. et al. (2013) Direct detection of alternative open reading frames translation products in human significantly expands the proteome. PLoS ONE 8, e70698
146 Atkins, J.F. and Gesteland, R.F., eds (2010) Recoding: Expansion of Decoding Rules Enriches Gene Expression, Springer
147 Atkins, J.F. and Baranov, P.V. (2010) The distinction between recoding and codon reassignment. Genetics 185, 1535–1536
