Interrogating the transcriptome in all its diversity
Joel H Graber
• Nature Genetics, June 2000, v25, n2.
Why were so many predictions of the number of genes in a mammalian genome wrong?
Mammalian genomes contain far more transcript variants than protein variants
• Average protein products per locus = 1.7
• Average distinct transcripts per locus = 5.7
Genome Biology (2009) 10:201.
A processed, protein coding mRNA molecule includes distinct functional regions
Protein coding sequence
5’-untranslated region (5’-UTR)
3’-untranslated region (3’-UTR)
Genomic sequence
Pieces of a (eukaryotic) protein-coding gene (on the genome)
[Figure: double-stranded genomic DNA (5’ to 3’ and 3’ to 5’ strands) shown at two scales, ~1-100 Mbp and ~1-1000 kbp]
• Exons (CDS and UTR): ~10^2-10^3 bp
• Introns: ~10^2-10^5 bp
• Polyadenylation site: ~10-100 bp
• Promoter: ~10^3 bp
• Enhancers: ~10-100 bp
• Other regulatory sequences: ~10-100 bp
Alternate mRNA processing can lead to multiple transcript and/or protein products
3 transcripts, 1 protein product
Carolyn demonstrates gene regulation
Transcription control
mRNA degradation
mRNA localization
Protein degradation
Translation control
Protein = water in pool
mRNA = water in hose
DNA = water in pipes
A somewhat more formal view of regulation in the various stages of gene expression
Systematic changes to mRNA processing can significantly change the regulatory program of a cell
• Changes can be in a single gene or systemic
• Regulatory control during transcript generation
  – Transcription initiation site
  – Splicing pattern
  – 3’-processing (polyadenylation and cleavage) site
  – RNA editing
• Subsequent isoform-specific regulatory control
  – Stability
  – Translational efficiency
  – Localization
A brief history of transcript measurement
Implications of transcript variation for gene expression measurement
• Most large-scale expression studies report one level per gene per sample
  – Microarrays:
    • One reported value of expression per probeset
    • Duplicate probesets are either averaged or discarded
  – mRNAseq:
    • RPKM (reads per kilobase of transcript per million mapped reads)
• For many genes, summarization to one expression level in a given cell type is inadequate
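A minimal sketch of the RPKM arithmetic, and of what is lost when a gene is summarized to one value. The read counts, transcript lengths and library size below are invented for illustration, the function name `rpkm` is hypothetical, and pooling all reads against the longest isoform is only one assumed way of producing a gene-level number.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = transcript_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_of_reads)


# Hypothetical library size and isoforms; not data from any real experiment.
TOTAL_MAPPED_READS = 20_000_000
isoforms = {
    "short_3utr": {"reads": 400, "length_bp": 1_000},
    "long_3utr": {"reads": 400, "length_bp": 4_000},
}

# One value per isoform.
per_isoform = {
    name: rpkm(info["reads"], info["length_bp"], TOTAL_MAPPED_READS)
    for name, info in isoforms.items()
}

# One value per gene: pool every read and normalize by the longest isoform
# (an assumed summarization rule, used here only to show the effect).
pooled_reads = sum(info["reads"] for info in isoforms.values())
longest_bp = max(info["length_bp"] for info in isoforms.values())
per_gene = rpkm(pooled_reads, longest_bp, TOTAL_MAPPED_READS)

print(per_isoform)  # the two isoforms differ four-fold
print(per_gene)     # the single gene-level value hides that difference
```

With these numbers the short isoform is four times more abundant per kilobase than the long one, but the gene-level value falls between the two and reports neither.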
Every time we find a new way to measure RNA, we find previously unknown types
Mattick et al, Trends Genet 2009
Classes of alternative transcripts
• Alternative splicing
• Alternative transcript initiation sites
• Alternative cleavage and polyadenylation (3’-processing)
• Combinations of one or more of these
The cascade of alternative mRNA processing in gene regulation
mRNA processing selections during mRNA generation can have a profound effect on downstream regulation of the resulting transcript
Processing, and specifically alternative processing, is controlled by cis-elements and trans-factors
• mRNA processing signals are typically constrained in both sequence content and positioning
• Activity of a specific site is a function of the strength of its local signals and of the cell- and environment-specific concentrations and activities of trans-factors
Alternative splicing
Alternative splicing can occur in several ways
http://www.wormbook.org/
Splicing signals and interacting factors
Cis elements required for splicing
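[Figure: intron cis elements compared across vertebrates, yeast and plants, from the 5’ss through the branch point (BP) and polypyrimidine tract (YYYY10-15) to the 3’ss, with exonic splicing enhancers (ESE) and UA-rich intron regions marked. Consensus labels in the figure: GUAAGU and GUAUGU (5’ss); CURAY and UACUAAC (BP); NCAG, YAG and UGYAG (3’ss). The per-position conservation percentages are not recoverable from the extracted text.]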
5’ss – 5’ splice site (donor site)
3’ss – 3’ splice site (acceptor site)
BP – branch point (A is the branch point base)
YYYY10-15 – polypyrimidine tract
Y – pyrimidine
R – purine
N – any base
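The degenerate codes in this legend translate directly into a pattern search. The sketch below expands Y, R and N into character classes and scans a sequence for the CURAY branch point consensus. The helper name `consensus_to_regex` and the test sequence are made up for illustration; real branch point prediction also weighs position relative to the 3’ss, which this ignores.

```python
import re

# Degenerate codes from the legend, written for RNA (U rather than T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",    # purine
    "Y": "[CU]",    # pyrimidine
    "N": "[ACGU]",  # any base
}


def consensus_to_regex(consensus):
    """Turn a consensus such as 'CURAY' into a compiled regular expression."""
    return re.compile("".join(IUPAC_RNA[symbol] for symbol in consensus))


branch_point = consensus_to_regex("CURAY")

# Made-up intron fragment, not taken from any real gene.
intron_fragment = "GGACUAACUUUCCUGACAG"

# finditer reports non-overlapping matches, left to right.
matches = [(m.start(), m.group()) for m in branch_point.finditer(intron_fragment)]
print(matches)  # [(3, 'CUAAC'), (12, 'CUGAC')]
```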
PWM representations of splice site signals (mice)
Frequency of bases in each position of the splice sites
Donor sequences: 5’ splice site
Positions -4 to -1 are exonic; +1 to +8 are intronic.
pos    -4  -3  -2  -1  +1  +2  +3  +4  +5  +6  +7  +8
%A     30  40  64   9   0   0  62  68   9  17  39  24
%U     20   7  13  12   0 100   6  12   5  63  22  26
%C     30  43  12   6   0   0   2   9   2  12  21  29
%G     19   9  12  73 100   0  29  12  84   9  18  20
cons    -   -   A   G   G   U   A   A   G   U   -   -
Acceptor sequences: 3’ splice site
Positions -15 to -1 are intronic; +1 and +2 are exonic.
pos   -15 -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1  +1  +2
%A     15  10  10  15   6  15  11  19  12   3  10  25   4 100   0  22  17
%U     51  44  50  53  60  49  49  45  45  57  58  29  31   0   0   8  37
%C     19  25  31  21  24  30  33  28  36  36  28  22  65   0   0  18  22
%G     15  21  10  10  10   6   7   9   7   7   5  24   1   0 100  52  25
cons    Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   N   Y   A   G   G   -
Polypyrimidine tract (Y = U or C; N = any nucleotide)
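One common way to use such a frequency table is to score a candidate site by summed log-odds against a background. The sketch below does this for the donor (5’ splice site) frequencies given above. The uniform 25% background, the pseudocount of 1 percentage point and the function name `log_odds_score` are assumptions made for illustration, not part of the original slides.

```python
import math

# Donor-site base frequencies (percent), 12 positions:
# four exonic positions followed by eight intronic positions.
DONOR_PERCENT = {
    "A": [30, 40, 64, 9, 0, 0, 62, 68, 9, 17, 39, 24],
    "U": [20, 7, 13, 12, 0, 100, 6, 12, 5, 63, 22, 26],
    "C": [30, 43, 12, 6, 0, 0, 2, 9, 2, 12, 21, 29],
    "G": [19, 9, 12, 73, 100, 0, 29, 12, 84, 9, 18, 20],
}
BACKGROUND = 0.25      # assumed uniform base composition
PSEUDO_PERCENT = 1.0   # assumed pseudocount so 0% entries stay finite


def log_odds_score(site):
    """Sum of log2(observed / background) over the 12 positions of a site."""
    if len(site) != 12:
        raise ValueError("expected a 12-nt site (4 exonic + 8 intronic)")
    score = 0.0
    for pos, base in enumerate(site):
        column_total = sum(DONOR_PERCENT[b][pos] for b in "ACGU") + 4 * PSEUDO_PERCENT
        probability = (DONOR_PERCENT[base][pos] + PSEUDO_PERCENT) / column_total
        score += math.log2(probability / BACKGROUND)
    return score


consensus_like = "ACAGGUAAGUAC"  # most frequent base at each position
gu_disrupted = "ACAGCCAAGUAC"    # same site with the invariant GU replaced

print(round(log_odds_score(consensus_like), 2))
print(round(log_odds_score(gu_disrupted), 2))
```

The consensus-like site scores well above zero, while replacing the invariant GU drives the score sharply down, which mirrors the 100% entries at those two positions.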
Example 1: Insulin-like growth factor 1 (Igf1)
• AKA somatomedin C or mechano growth factor
• Produced primarily by the liver as an endocrine hormone
• Primary action is mediated by binding to IGF1R
• Natural activator of the AKT pathway
• A primary mediator of the effects of growth hormone
• Expression has been
  – Negatively correlated with lifespan
  – Positively correlated with body size
• Its regulatory control remains poorly understood after 30 years
IGF1 is subject to extensive alternative mRNA processing
~83,000 nt
IGF1 mRNA data indicate at least 15 transcript isoforms
Salient features of IGF1 expression
• Mature, circulating IGF1 protein is a cleavage product, coded entirely in exons 3 and 4
• Exon 5 contains an additional peptide cleavage product, with demonstrated independent functionality
• Exons 1 and 2 are mutually exclusive, and likely not the only upstream, transcript initiating exons
• Exon 5 can be skipped, included or 3’-terminal
• Exon 6’s reading frame changes depending on whether it is spliced from exon 4 or 5
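The frame change in the last point follows from simple arithmetic: the frame in which an exon is read depends on the total coding length upstream of it, modulo 3. A minimal sketch, using invented exon lengths that are NOT the real IGF1 exon sizes and a hypothetical helper name:

```python
def reading_frame_offset(upstream_coding_lengths):
    """Number of nucleotides into a codon at which the next exon begins (0, 1 or 2)."""
    return sum(upstream_coding_lengths) % 3


# Hypothetical coding lengths in nt; chosen only to illustrate the effect.
UPSTREAM_THROUGH_EXON_4 = [150, 180]
EXON_5 = 50  # any length not divisible by 3 shifts the downstream frame

offset_spliced_from_exon_4 = reading_frame_offset(UPSTREAM_THROUGH_EXON_4)
offset_spliced_from_exon_5 = reading_frame_offset(UPSTREAM_THROUGH_EXON_4 + [EXON_5])

print(offset_spliced_from_exon_4)  # 0
print(offset_spliced_from_exon_5)  # 2
```

Because the two offsets differ, the same exon 6 nucleotides would be translated into different amino acids depending on which exon precedes them.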
IGF1 has two possible terminal exons (5 and 6)
~22,000 nt
IGF1 exon 6, if included, can vary between ~200 and ~6400 nt
Alternative polyadenylation
Alternative 3’-processing can arise in several ways with varying consequences
Adapted from Yan J, et al., Genome Research. 2005; 15(3):369-75.
[Figure: components of the 3’-processing machinery. Labels: PAPOL, CPSF, hnRNP H, Symplekin; subunit sizes shown: 160, 100, 77, 73, 68, 64, 50, 30 and 25 kD]
PolyA site selection depends on sequence elements and abundance/stoichiometry of trans-factors
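[Figure: sequence elements around the polyA site, drawn 5’ to 3’. Labels: AAUAAA (PAS), UGUA, U-rich, UG-rich and G-rich elements, DSE; CSTF subunits of 64, 77 and 50 kD]
More than 80 proteins can be part of the complex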
NMF defines patterns of signals that control 3’-processing (cleavage and polyadenylation)
Example 2: Insulin-like growth factor 2 mRNA binding protein 1 (Igf2bp1)
• Contains four K homology domains and two RNA recognition motifs
• Binds to the 5’-UTR of IGF2 mRNA, regulating translation
• Can act as an oncogene if misregulated
• Evolutionarily conserved, with critical role in mRNA localization and translational control
Consequences: Igf2bp1 has transforming potential only when expressed in its truncated isoform
~50,000 nt
~6,500 nt
Mayr and Bartel, Cell 2009
Inclusion (or exclusion) of regulatory sequences in the 3’-UTR fine-tunes expression and response
• Spicher et al, Mol Cell Biol 1998
Example 3: Regulated control of polyA site selection for antibodies during B-cell maturation
Alternative transcription initiation
Alternative transcription initiation can arise in several ways with varying consequences
CAGE tags showed an unexpectedly high frequency in the 3’-UTR
3’-UTR CAGE tags occur in evolutionarily conserved contexts with a common local sequence
The definition of a gene becomes much more fluid: Ins2-IGF2
• Two genes with a spurious connection?
• One large gene with distinct, disjoint transcripts?
Cleaved 3’-UTR RNA products (uaRNAs) are often tissue-specific and can localize differentially
Next time: Details of measuring transcript differences at large scale