Interrogating the transcriptome in all its diversity Joel H Graber



DESCRIPTION

Interrogating the transcriptome in all its diversity. Joel H Graber. Why were so many predictions of the number of genes in a mammalian genome wrong? Nature Genetics, June 2000, v25, n2. Mammalian genomes contain far more transcript variants than protein variants.


Page 1: Interrogating the  transcriptome  in all its diversity

Interrogating the transcriptome in all its diversity

Joel H Graber

Page 2: Interrogating the  transcriptome  in all its diversity

• Nature Genetics, June 2000, v25, n2.

Why were so many predictions of the number of genes in a mammalian genome wrong?

Page 3: Interrogating the  transcriptome  in all its diversity

Mammalian genomes contain far more transcript variants than protein variants

• Average protein products per locus = 1.7

• Average distinct transcripts per locus = 5.7

Genome Biology (2009) 10:201.

Page 4: Interrogating the  transcriptome  in all its diversity

A processed, protein coding mRNA molecule includes distinct functional regions

5’-untranslated region (5’-UTR)

Protein coding sequence

3’-untranslated region (3’-UTR)

Genomic sequence

Page 5: Interrogating the  transcriptome  in all its diversity

Pieces of a (Eukaryotic) Protein-Coding Gene (on the genome)

[Figure: double-stranded genomic DNA (5’ to 3’ and 3’ to 5’ strands) shown at two scales, ~1-100 Mbp and ~1-1000 kbp]

• Exons (CDS & UTR): ~10^2-10^3 bp

• Introns: ~10^2-10^5 bp

• Polyadenylation site: ~10-100 bp

• Promoter: ~10^3 bp

• Enhancers: ~10-100 bp

• Other regulatory sequences: ~10-100 bp

Page 6: Interrogating the  transcriptome  in all its diversity

Alternative mRNA processing can lead to multiple transcript and/or protein products

3 transcripts, 1 protein product
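The "3 transcripts, 1 protein product" case can be sketched in a few lines of Python. The isoform names and toy sequences below are hypothetical and exist only for illustration; the sketch assumes that isoforms differing only in their UTRs share one coding sequence and therefore one protein product.

```python
# Hypothetical isoforms of a single locus, each given as
# (5'-UTR, coding sequence, 3'-UTR). The sequences are toy examples.
isoforms = {
    "tx1": ("GGC", "AUGGCCUAA", "UUUAAUAAA"),
    "tx2": ("GGCACU", "AUGGCCUAA", "UUUAAUAAA"),  # longer 5'-UTR
    "tx3": ("GGC", "AUGGCCUAA", "UUU"),           # shorter 3'-UTR
}

def count_products(isoforms):
    """Return (number of distinct transcripts, number of distinct CDSs)."""
    transcripts = {"".join(parts) for parts in isoforms.values()}
    coding_sequences = {cds for _, cds, _ in isoforms.values()}
    return len(transcripts), len(coding_sequences)

n_transcripts, n_proteins = count_products(isoforms)
print(n_transcripts, "transcripts,", n_proteins, "protein product(s)")
```

Identical coding sequences are used here as a stand-in for identical proteins; the UTR differences remain available for isoform-specific regulation.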

Page 7: Interrogating the  transcriptome  in all its diversity

Carolyn demonstrates gene regulation

Transcription control

mRNA degradation

mRNA localization

Protein degradation

Translation control

Protein = water in pool

mRNA = water in hose

DNA = water in pipes

Page 8: Interrogating the  transcriptome  in all its diversity

A somewhat more formal view of regulation in the various stages of gene expression

Page 9: Interrogating the  transcriptome  in all its diversity

Systematic changes to mRNA processing can significantly change the regulatory program of a cell

• Changes can be in a single gene or systemic

• Regulatory control during transcript generation
    – Transcription initiation site
    – Splicing pattern
    – 3’-processing (polyadenylation and cleavage) site
    – RNA editing

• Subsequent isoform-specific regulatory control
    – Stability
    – Translational efficiency
    – Localization

Page 10: Interrogating the  transcriptome  in all its diversity

A brief history of transcript measurement

Page 11: Interrogating the  transcriptome  in all its diversity

Implications of transcript variation for gene expression measurement

• Most large-scale expression studies report one level per gene per sample
    – Microarrays:
        • One reported value of expression per probeset
        • Duplicate probesets are either averaged or discarded
    – mRNAseq:
        • RPKM (reads per kilobase of transcript per million mapped reads)

• For many genes, summarization to one expression level in a given cell type is inadequate
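A minimal sketch of why one value per gene can be inadequate, using the RPKM definition above. The read counts, transcript length and library size are hypothetical, and both isoforms are assumed to have the same length to keep the arithmetic simple.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)

# Hypothetical numbers, chosen only for illustration.
TOTAL_MAPPED_READS = 10_000_000
LENGTH_BP = 2_000  # assumed identical for both isoforms

samples = {
    "sample_1": {"isoform_A": 900, "isoform_B": 100},
    "sample_2": {"isoform_A": 100, "isoform_B": 900},
}

def gene_level(counts):
    """One summarized expression value per gene (all isoform reads pooled)."""
    return rpkm(sum(counts.values()), LENGTH_BP, TOTAL_MAPPED_READS)

def isoform_level(counts):
    """One expression value per isoform."""
    return {name: rpkm(n, LENGTH_BP, TOTAL_MAPPED_READS)
            for name, n in counts.items()}

for name, counts in samples.items():
    print(name, gene_level(counts), isoform_level(counts))
```

In this toy case the gene-level value is identical in the two samples even though the dominant isoform has switched, which is exactly the information a single summarized level hides.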

Page 12: Interrogating the  transcriptome  in all its diversity

Every time we find a new way to measure RNA, we find previously unknown types

Mattick et al, Trends Genet 2009

Page 13: Interrogating the  transcriptome  in all its diversity

Classes of alternative transcripts

• Alternative splicing

• Alternative transcript initiation sites

• Alternative cleavage and polyadenylation (3’-processing)

• Combinations of one or more of these
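As a rough illustration of these classes, the sketch below compares two isoforms of the same gene by their exon coordinates. The coordinates and the function name `classify` are hypothetical, the isoforms are assumed to lie on the plus strand, and the rules are a simplification: an alternative first or last exon will usually also change the splice junctions and so be reported as a combination.

```python
def junctions(isoform):
    """Set of (donor, acceptor) coordinate pairs between consecutive exons."""
    return {(isoform[i][1], isoform[i + 1][0]) for i in range(len(isoform) - 1)}

def classify(iso_a, iso_b):
    """Compare two isoforms, each a sorted list of (start, end) exons
    on the plus strand, and name the classes of difference between them."""
    classes = set()
    if iso_a[0][0] != iso_b[0][0]:
        classes.add("alternative transcript initiation")
    if iso_a[-1][1] != iso_b[-1][1]:
        classes.add("alternative 3'-processing")
    if junctions(iso_a) != junctions(iso_b):
        classes.add("alternative splicing")
    return classes

# Hypothetical exon coordinates for illustration only.
reference = [(100, 200), (300, 400), (500, 700)]
exon_skipped = [(100, 200), (500, 700)]
short_3utr = [(100, 200), (300, 400), (500, 600)]
late_start = [(150, 200), (300, 400), (500, 700)]

for other in (exon_skipped, short_3utr, late_start):
    print(sorted(classify(reference, other)))
```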

Page 14: Interrogating the  transcriptome  in all its diversity

The cascade of alternative mRNA processing in gene regulation

mRNA processing selections during mRNA generation can have a profound effect on downstream regulation of the resulting transcript

Page 15: Interrogating the  transcriptome  in all its diversity

Processing, and specifically alternative processing, is controlled by cis-elements and trans-factors

• mRNA processing signals are typically constrained in both sequence content and positioning

• Activity of specific sites is a function of the strength of the local signals and the cell/environment-specific concentrations/activities of trans-factors

Page 16: Interrogating the  transcriptome  in all its diversity

Alternative splicing

Page 17: Interrogating the  transcriptome  in all its diversity

Alternative splicing can occur in several ways

http://www.wormbook.org/

Page 18: Interrogating the  transcriptome  in all its diversity

Splicing signals and interacting factors

Page 19: Interrogating the  transcriptome  in all its diversity

Cis elements required for splicing

[Figure: consensus splicing signals by lineage]

              5’ss      BP         3’ss
Vertebrates   GUAAGU    CURAY      NCAG
Yeast         GUAUGU    UACUAAC    YAG
Plants        GUAAGU    CURAY      UGYAG

Other labeled elements: ESE, UA-rich regions, AG and GU dinucleotides, polypyrimidine tract (YYYY, 10-15 nt)

• 5’ss – 5’ splice site (donor site)
• 3’ss – 3’ splice site (acceptor site)
• BP – branch point (A is the branch point base)
• YYYY10-15 – polypyrimidine tract
• Y – pyrimidine
• R – purine
• N – any base

Page 20: Interrogating the  transcriptome  in all its diversity

PWM representations of splice site signals (mice)

Page 21: Interrogating the  transcriptome  in all its diversity

Frequency of bases in each position of the splice sites

Donor sequences: 5’ splice site

Region                    exon |   intron
Position     -4   -3   -2   -1 |   +1   +2   +3   +4   +5   +6   +7   +8
%A           30   40   64    9 |    0    0   62   68    9   17   39   24
%U           20    7   13   12 |    0  100    6   12    5   63   22   26
%C           30   43   12    6 |    0    0    2    9    2   12   21   29
%G           19    9   12   73 |  100    0   29   12   84    9   18   20
Consensus               A    G |    G    U    A    A    G    U

Acceptor sequences: 3’ splice site

Region                                                            intron |  exon
Position   -15 -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1 |  +1  +2
%A          15  10  10  15   6  15  11  19  12   3  10  25   4 100   0 |  22  17
%U          51  44  50  53  60  49  49  45  45  57  58  29  31   0   0 |   8  37
%C          19  25  31  21  24  30  33  28  36  36  28  22  65   0   0 |  18  22
%G          15  21  10  10  10   6   7   9   7   7   5  24   1   0 100 |  52  25
Consensus    Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   N   Y   A   G |   G

Polypyrimidine tract (Y = U or C; N = any nucleotide)
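The base frequencies above are the raw material for a PWM. The sketch below turns the donor-site percentages into log-odds scores and scores a candidate 12-nt window. The uniform background (0.25 per base) and the pseudocount are assumptions made for illustration, not values from the slides, and the function names are hypothetical.

```python
import math

# Percent base frequencies at the 5' splice site, copied from the donor
# table: four exon positions followed by eight intron positions.
DONOR_PCT = {
    "A": [30, 40, 64, 9, 0, 0, 62, 68, 9, 17, 39, 24],
    "U": [20, 7, 13, 12, 0, 100, 6, 12, 5, 63, 22, 26],
    "C": [30, 43, 12, 6, 0, 0, 2, 9, 2, 12, 21, 29],
    "G": [19, 9, 12, 73, 100, 0, 29, 12, 84, 9, 18, 20],
}
BASES = "ACGU"

def build_pwm(pct, background=0.25, pseudo=1.0):
    """Log2-odds matrix; background and pseudocount are assumed values."""
    pwm = []
    for i in range(len(pct["A"])):
        total = sum(pct[b][i] for b in BASES) + 4 * pseudo
        pwm.append({b: math.log2((pct[b][i] + pseudo) / total / background)
                    for b in BASES})
    return pwm

def score(pwm, seq):
    """Sum of per-position log-odds for a window as long as the PWM."""
    seq = seq.upper().replace("T", "U")
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM length")
    return sum(column[base] for column, base in zip(pwm, seq))

def consensus(pct):
    """Most frequent base at each position (ties go to the first of ACGU)."""
    return "".join(max(BASES, key=lambda b: pct[b][i])
                   for i in range(len(pct["A"])))

pwm = build_pwm(DONOR_PCT)
print(consensus(DONOR_PCT))
print(score(pwm, "ACAGGUAAGUAC"), score(pwm, "ACAGCCAAGUAC"))
```

A window that keeps the invariant GU at the first two intron positions scores far higher than one that replaces it, which mirrors the 100% columns in the table.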

Page 22: Interrogating the  transcriptome  in all its diversity

Example 1: Insulin-like growth factor 1 (Igf1)

• AKA somatomedin C or mechano growth factor
• Produced primarily by the liver as an endocrine hormone
• Primary action is mediated by binding to IGF1R
• Natural activator of the AKT pathway
• A primary mediator of the effects of growth hormone
• Expression has been
    – Negatively correlated with lifespan
    – Positively correlated with body size
• Its regulatory control remains poorly understood after 30 years

Page 23: Interrogating the  transcriptome  in all its diversity

IGF1 is subject to extensive alternative mRNA processing

~83,000 nt

Page 24: Interrogating the  transcriptome  in all its diversity

IGF1 mRNA data indicate at least 15 transcript isoforms

Page 25: Interrogating the  transcriptome  in all its diversity

Salient features of IGF1 expression

• Mature, circulating IGF1 protein is a cleavage product, coded entirely in exons 3 and 4

• Exon 5 contains an additional peptide cleavage product, with demonstrated independent functionality

• Exons 1 and 2 are mutually exclusive, and likely not the only upstream, transcript initiating exons

• Exon 5 can be skipped, included or 3’-terminal

• Exon 6’s reading frame changes depending on whether it is spliced from exon 4 or 5
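The frame dependence of exon 6 follows from simple arithmetic: including an upstream exon whose length is not a multiple of 3 shifts the codon position at which the next exon begins. The lengths below are hypothetical placeholders, not measured IGF1 exon lengths, and the function name is illustrative.

```python
def frame_at_exon_start(upstream_coding_lengths):
    """Codon position (0, 1 or 2) of the first base of the next exon,
    given the coding lengths of all exons spliced upstream of it."""
    return sum(upstream_coding_lengths) % 3

# Hypothetical lengths for illustration only.
CODING_THROUGH_EXON_4 = 300  # coding nucleotides up to the end of exon 4
EXON_5 = 52                  # not a multiple of 3

frame_from_exon_4 = frame_at_exon_start([CODING_THROUGH_EXON_4])
frame_from_exon_5 = frame_at_exon_start([CODING_THROUGH_EXON_4, EXON_5])

print("exon 6 spliced from exon 4 starts at codon position", frame_from_exon_4)
print("exon 6 spliced from exon 5 starts at codon position", frame_from_exon_5)
```

Under these assumed lengths the same exon 6 nucleotides are read in two different frames, so they encode different peptides and reach a stop codon at different points.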

Page 26: Interrogating the  transcriptome  in all its diversity

IGF1 has two possible terminal exons (5 and 6)

~22,000 nt

Page 27: Interrogating the  transcriptome  in all its diversity

IGF1 exon 6, if included, can vary between ~200 and ~6,400 nt in length

Page 28: Interrogating the  transcriptome  in all its diversity

Alternative polyadenylation

Page 29: Interrogating the  transcriptome  in all its diversity

Alternative 3’-processing can arise in several ways with varying consequences

Adapted from Yan J, et al., Genome Research. 2005; 15(3):369-75.

Page 30: Interrogating the  transcriptome  in all its diversity

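PolyA site selection depends on sequence elements and abundance/stoichiometry of trans-factors

[Figure: 3’-processing complex assembled on the pre-mRNA, drawn 5’ to 3’]

• Sequence elements labeled: UGUA, AAUAAA, U-rich, UG-rich, G-rich

• PAS – polyadenylation signal; DSE – downstream sequence element

• Protein factors labeled: PAPOL, CPSF, CSTF, hnRNP H, Symplekin, with subunits marked 25, 30, 50, 64, 68, 73, 77, 100 and 160 kD

• Up to >80 proteins in complex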
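As a small illustration of the sequence-element side of polyA site selection, the sketch below scans a 3’-UTR for candidate PAS hexamers. AAUAAA comes from the slide; AUUAAA is added here as an assumption, being a commonly cited variant. The example sequence and function name are hypothetical, and a hexamer match alone does not establish that a site is used.

```python
# AAUAAA is the hexamer shown on the slide; AUUAAA is an assumed variant.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")

def find_pas(utr):
    """Return sorted (position, hexamer) pairs for every PAS-like hexamer.
    Positions are 0-based; DNA input (T) is converted to RNA (U)."""
    utr = utr.upper().replace("T", "U")
    hits = []
    for hexamer in PAS_HEXAMERS:
        start = utr.find(hexamer)
        while start != -1:
            hits.append((start, hexamer))
            start = utr.find(hexamer, start + 1)
    return sorted(hits)

# Hypothetical 3'-UTR fragment for illustration only.
example_utr = "GGAATAAACCCCATTAAAGG"
print(find_pas(example_utr))
```

A transcript with more than one hit has more than one candidate site, and which one is used would then depend on the surrounding elements and trans-factor levels described above.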

Page 31: Interrogating the  transcriptome  in all its diversity

NMF defines patterns of signals that control 3’-processing (cleavage and polyadenylation)

Page 32: Interrogating the  transcriptome  in all its diversity

Example 2: Insulin-like growth factor 2 mRNA binding protein 1 (Igf2bp1)

• Contains four K homology domains and two RNA recognition motifs

• Binds to the 5’-UTR of IGF2 mRNA, regulating translation

• Can act as an oncogene if misregulated

• Evolutionarily conserved, with critical role in mRNA localization and translational control

Page 33: Interrogating the  transcriptome  in all its diversity

Consequences: Igf2bp1 has transforming potential only when expressed in its truncated isoform

~50,000 nt

~6,500 nt

Mayr and Bartel, Cell 2009

AAA… AAA…

5’ 3’

Page 34: Interrogating the  transcriptome  in all its diversity

Inclusion (or exclusion) of regulatory sequences in the 3’-UTR fine-tunes expression and response

• Spicher et al, Mol Cell Biol 1998

Page 35: Interrogating the  transcriptome  in all its diversity

Example 3: Regulated control of polyA site selection for antibodies during B-cell maturation

Page 36: Interrogating the  transcriptome  in all its diversity

Alternative transcription initiation

Page 37: Interrogating the  transcriptome  in all its diversity

Alternative transcription initiation can arise in several ways with varying consequences

Page 38: Interrogating the  transcriptome  in all its diversity
Page 39: Interrogating the  transcriptome  in all its diversity
Page 40: Interrogating the  transcriptome  in all its diversity
Page 41: Interrogating the  transcriptome  in all its diversity
Page 42: Interrogating the  transcriptome  in all its diversity

CAGE tags showed an unexpectedly high frequency in the 3’-UTR

Page 43: Interrogating the  transcriptome  in all its diversity

3’-UTR CAGE tags occur in evolutionarily conserved contexts with a common local sequence

Page 44: Interrogating the  transcriptome  in all its diversity

The definition of a gene becomes much more fluid: Ins2-IGF2

• Two genes with spurious connection?

• One large gene with distinct, disjoint transcripts?

Page 45: Interrogating the  transcriptome  in all its diversity

Cleaved 3’-UTR RNA products (uaRNAs) are often tissue-specific and can localize differentially

Page 46: Interrogating the  transcriptome  in all its diversity

Next time: Details of measuring transcript differences at large scale

Page 47: Interrogating the  transcriptome  in all its diversity