Alternative splicing in human tissues
John Castle
July 20, 2007
Rosetta / Merck
Seattle, USA
Acknowledgements
Rosetta/Merck: Jason Johnson, Chris Armour, Ronghua Chen, Phil Garrett-Engele, Amit Kulkarni, Lee Lim, Chris Raymond, Jyoti Shah
Collaborators: Tom Cooper (Baylor) (data not shown), Guey-Shin Wang, Duyen Tran
Chaolin Zhang (summer intern from CSHL)
Publications cited: Spellman and Smith, 2006; Boutz et al., 2007 (Ares and Black labs); Zhu, B. et al., 2005 (Gulick lab)
Needs for alternative splicing at a pharmaceutical company
Biomarker identification
Identify transcript structures expressed in a tissue, including novel isoforms
Report regulation of known isoforms for use in pathway analysis or as drug targets

Requirements
Robust, high-confidence, high-throughput profiling
– Array patterns, amplification, sequencing
Visualization and analysis tools for profiling data
Interpretation, including isoform function and pathways
Discovery-mode junction arrays (RefSeqs only): multiple samples help; the events you don’t monitor hinder interpretation
[Figure: Gene = APP. Junction probe data (log probe intensity) by junction and tissue, with predictions from residuals (model-observed). Transcripts shown: X06989, NM_201414, NM_000484. Samples include Melanoma, Lung Carcinoma, Brain, Fetal Brain, and Brain-Amygdala. RT-PCR primers and brain-sample results are shown with scale labels 400, 300, 200, 100. Annotations: mixture with non-RefSeq; not predicted (no probe).]
Some probe strategies and limitations for microarray patterns
Exon arrays
– Easiest to design and interpret
– Miss a substantial fraction of known AS events: 5’SS, 3’SS, intron retentions
Junction arrays
– Less control over probe position
– Single-probe events are difficult to interpret (e.g. 3-nt 3’SS vs. 300-nt inserted exon), complicating validation
– More known and unknown events detected, but accordingly more difficult to interpret
– May want to use exon-to-intron junction probes as controls for “half-hybridization” in intensity-based experiments
Junction + exon arrays
– 2x expense, but more markers of same event
– Different probe lengths affect intensities (not ratios)
Splicing event arrays (mutually exclusive probes)
– Coverage of more splicing events from ESTs
– Need extra probes to estimate gene-level expression
– Can’t detect novel splicing events
– Unmonitored splicing events/mixtures more easily confound analysis
Junction + exon + mutually exclusive probes for known events
– Expensive
– Most complex data analysis
– Best balance of discovery and monitoring (one data set for both)
– Enables reciprocal analysis
All methods
– False predictions created by saturated probes, dim probes, constant probes, specific & non-specific cross-hybridization
– Splicing predictions can be confounded by strong gene-level regulation or complex isoform mixtures
– Read out differentially regulated splicing events; difficult to accurately quantitate isoforms
– More samples help
[Figure: transcript schematics for each probe strategy (exons 1 to 12, with atg start and * stop marked).]
Junction + Exon + Isoform-monitoring probes
[Figure: probe data for Brain, Fetal Brain, and Colon with scale labels 400, 300, 200, 100; includes a junction probe for a double exon drop. Transcript schematics mark atg start and * stop.]
Analysis tools for alternative splicing profiling data
Confounding factors include
• Inaccurate estimates of the gene expression
• Spurious measurements, such as from cross-hybridization, bad probes, and high background

Needed output
• A confidence value (e.g., a p-value)
• Expression changes, including both:
– Exon regulation
– Splicing event regulation (e.g., % of total gene expression)
• Several measurement types, including:
– Novel transcript structure
– Individual regulated exon (may be a minor component)
– Differentially expressed alternative splicing event
Synthetic data show how dark and saturated probes can lead to compressed ratios
These probes compress ratios and can produce erroneous values:
• Probes with high background
• Probes always dim
• Probes near saturation
If not filtered, these appear to be novel splicing predictions.
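The compression effect can be illustrated with a toy probe model. This is a minimal sketch, not the synthetic data set behind the slide: it assumes an additive background and a hard saturation ceiling, and the function names and all intensity values are hypothetical.

```python
import math

def observed_intensity(true_signal, background=0.0, saturation=float("inf")):
    """Toy probe response (assumption): additive background, hard ceiling."""
    return min(true_signal + background, saturation)

def observed_log2_ratio(signal_a, signal_b, background=0.0, saturation=float("inf")):
    """log2 ratio of sample A over sample B as the probe would report it."""
    a = observed_intensity(signal_a, background, saturation)
    b = observed_intensity(signal_b, background, saturation)
    return math.log2(a / b)

# The same true 4-fold change (log2 ratio of 2) under three probe conditions.
ideal = observed_log2_ratio(400, 100)
dim = observed_log2_ratio(40, 10, background=100)         # signal buried in background
saturated = observed_log2_ratio(40000, 10000, saturation=20000)

for name, value in [("ideal", ideal), ("dim", dim), ("saturated", saturated)]:
    print(f"{name:10s} log2 ratio = {value:.2f}")
```

If the rest of the gene's probes report the full change while one dim or saturated probe reports a compressed ratio, the difference looks like exon-specific regulation, which is why such probes need to be filtered.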
Real data showing single probe predictions with high cross-hyb potential
Gene level is consistent across samples, but relatively low.
All alternative isoforms show low expression in both samples and no regulation.
One probe shows differential expression with intensities much higher than the gene expression, suggesting cross-hybridization.
We have a 0-for-8 validation rate for these predictions.
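One way to screen for this pattern is to flag any probe whose intensity far exceeds the gene-level estimate. The sketch below is an illustration only, not the filter used in this work: the function name, the probe names, and the 4-fold threshold are assumptions.

```python
def flag_cross_hyb_suspects(probe_intensities, gene_levels, fold_threshold=4.0):
    """Return probes that exceed the gene-level estimate by fold_threshold in any sample.

    probe_intensities: dict of probe name -> list of intensities, one per sample
    gene_levels: list of gene-level intensity estimates, one per sample
    fold_threshold: hypothetical cutoff, to be tuned on validated data
    """
    suspects = []
    for probe, intensities in probe_intensities.items():
        if any(i > fold_threshold * g for i, g in zip(intensities, gene_levels)):
            suspects.append(probe)
    return suspects

# Hypothetical two-sample example: low, constant gene level and one bright probe.
gene_levels = [50, 55]
probes = {
    "exon_5": [48, 60],
    "junction_7_9": [30, 35],
    "junction_3_4": [900, 200],
}
print(flag_cross_hyb_suspects(probes, gene_levels))
```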
What transcript structure is expressed?
Large prediction, but change in isoform ratios is meaningless.
The RefSeq isoform (exon 17) is not expressed, just the alternate form (#2).
Looks like a splicing prediction, but isn’t.
Probes with high background and constant-intensity probes create false predictions.
The cDNA lacking exon 17 should be used for screening.
No alternative splicing
Example of a potential biomarker
Exon 10 is a novel cassette exon. The use of multiple probes increases the confidence of this prediction.
Exon 10 is upregulated ~10-fold.
However, the low intensities on probes monitoring exon 10 suggest the +exon 10 isoform is a minor component.
A junction probe spanning exon 9-to-11 would enable a determination of the relative abundance.
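The point about relative abundance can be made concrete with a rough calculation. The sketch assumes that inclusion and skipping probe intensities are proportional to isoform abundance with equal probe affinity, which is a strong simplification (the limitations listed earlier note that isoforms are difficult to quantitate accurately). The function name and all intensities are hypothetical.

```python
def percent_inclusion(inclusion_signal, skipping_signal):
    """Share of transcripts containing the exon, in percent (equal-affinity assumption)."""
    total = inclusion_signal + skipping_signal
    if total <= 0:
        raise ValueError("signals must sum to a positive value")
    return 100.0 * inclusion_signal / total

# Hypothetical intensities: exon 10 probes rise 10-fold (10 -> 100) while a
# 9-to-11 skipping junction probe stays bright (900) in both samples.
before = percent_inclusion(10, 900)
after = percent_inclusion(100, 900)
print(f"+exon 10 isoform: {before:.1f}% -> {after:.1f}% of transcripts")
```

Under these made-up numbers a 10-fold upregulated exon still belongs to a minor isoform, which is the distinction a skipping junction probe would let one draw.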
Validation of alternative splicing profiling of sample pairs
[Figure: scatter plot of isoform composition change, microarray vs. semi-quantitative RT-PCR (+/- 15); both axes run from -75 to 75; R = 0.90, R2 = 0.82.]
Eliminate suspect measurements, such as those with high background, from non-expressing genes, or not consistent with alternative splicing.
Validate by RT-PCR only changes in size greater than 3 nt.
Our Version 2 algorithm achieves a >80% validation rate for its calls and correlates well with RT-PCR (see the scatter plot). The algorithm is a reciprocal measurement, similar to Ule & Darnell’s and ExonHit’s, with basic additional filtering.
Version 3 (being tested) includes a p-value and additional filters.
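The idea behind a reciprocal measurement can be sketched as follows. This is a simplified illustration, not the Version 2 algorithm: it requires the inclusion and exclusion probes to move in opposite directions once the gene-level change is subtracted. The function name, the 0.5 log2 threshold, and the example intensities are assumptions.

```python
import math

def reciprocal_call(incl_a, incl_b, excl_a, excl_b, gene_a, gene_b, min_abs_log2=0.5):
    """Call a splicing change between samples A and B only if the inclusion and
    exclusion probes change reciprocally after removing the gene-level change."""
    gene_lr = math.log2(gene_b / gene_a)
    incl_lr = math.log2(incl_b / incl_a) - gene_lr
    excl_lr = math.log2(excl_b / excl_a) - gene_lr
    opposite = incl_lr * excl_lr < 0
    strong = min(abs(incl_lr), abs(excl_lr)) >= min_abs_log2
    if not (opposite and strong):
        return None
    return "more inclusion in B" if incl_lr > 0 else "more skipping in B"

# Reciprocal change at constant gene level: called.
print(reciprocal_call(100, 400, 400, 100, 500, 500))
# Every probe rises 4-fold with the gene: gene-level regulation, not splicing.
print(reciprocal_call(100, 400, 100, 400, 100, 400))
```

Requiring both probe types to agree is what guards against the single-probe artifacts described earlier; a lone bright or compressed probe cannot satisfy the reciprocal condition on its own.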
Interpretation of splicing changes
Currently in pathways:
- transcription factor to targets
Need to add systems biology at a splicing level, such as:
- splicing factor to target exons
- isoform specific nodes in pathways
- isoform specific phenotypes & functions
Disease markers? Causal? Necessary?
[Figure: KEGG MAPK Pathway]
Need: controlled vocabulary for alternative splicing!
PTB and nPTB pathway
[Figure: pathway diagram for neurons and glial cells linking miR-124 (3’ UTR hexamer), PTB, and nPTB to splicing changes, mediated through CU-rich elements, in MEF2 genes, KTN1, and additional genes.]
Identified through shRNA experiments targeting PTB and nPTB, monitored on microarrays.
Spellman and Smith, 2006; Boutz et al. Genes Dev. 2007 (Ares and Black labs)
Expression of PTB and nPTB is anti-correlated across human tissues
[Figure: PTB and nPTB gene expression (ratio to reference pool) across tissues, with brain tissues marked.]
nPTB is enriched in CNS tissues; PTB is depleted.
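Anti-correlation of this kind is commonly summarized with a Pearson correlation coefficient over the per-tissue expression ratios. The sketch below assumes that statistic (the slide does not say which was used) and runs on made-up log10 ratios, not the body atlas data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical log10 ratios to the reference pool for five tissues
# (two brain-like tissues first, then three non-CNS tissues).
nptb = [0.8, 0.6, -0.2, -0.3, -0.1]
ptb = [-0.7, -0.5, 0.1, 0.2, 0.3]
r = pearson(nptb, ptb)
print(f"r = {r:.2f}")  # strongly negative for these illustrative values
```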
PTB associated alternative splicing
[Figure: the PTB and nPTB pathway diagram shown again for neurons and glial cells: miR-124 (3’ UTR hexamer), PTB, nPTB, MEF2 genes, KTN1, and additional genes.]
Splicing changes, mediated through CU-rich elements. Identified through shRNA experiments targeting PTB and nPTB, monitored on microarrays.
Boutz et al. Genes Dev. 2007 (Ares and Black labs)
MEF2 transcripts are alternatively spliced; changes lead to increased transcriptional activity
MEF2A, MEF2C & MEF2D have a 21- or 24-nt cassette exon, labeled exon β.
Exon β inclusion leads to increased activity in all MEF2 genes.
Zhu, B. et al. J. Biol. Chem. 2005
MEF2A is alternatively spliced in normal tissues
[Figure: probe data for Brain, Adipose, Heart, and Pool.]
The exon is 10-fold upregulated in heart. A junction probe would enable a determination as to whether it is a major form.
Zhu, B. et al. J. Biol. Chem. 2005
MEF2D is alternatively spliced in normal tissues
[Figure: probe data for Brain, Adipose, and Heart.]
The exon is 21 nt long, and the associated exon probe (middle probe) performs poorly. The presence of junction probes permits the measurement.
Zhu, B. et al. J. Biol. Chem. 2005
MEF2C is alternatively spliced in normal tissues
The mutually exclusive junction probe allows an estimate of the isoform change for exon 8. Change in isoform composition, versus brain:
Heart: -80
Adipose: -82
PTB and nPTB gene expression correlate with target alternative splice changes across normal tissues
[Figure: PTB and nPTB vs. MEF2C exon β and KTN1 exon 41; KTN1 products shown as exons 40-42 and 40-41-42. PTB knock-down leads to exon drop. Boutz et al. Genes Dev. 2007.]
PTB levels correlate with KTN1 exon 41 expression
PTB associated alternative splicing, motif identification
miR-124 transcript targets
Transfect miR-124 and microarray profile (Lim et al, Nature, 2005)
Compare genes downregulated to …
… gene regulation in our body atlas
… 3’ UTR hexamers, using hypergeometric overlap statistics

Hexamer   p-value
gtgcct    1.60E-80
tgcctt    3.30E-48
agtgcc    8.50E-20
gcctta    2.90E-17
tgtgcc    4.90E-13
gccttt    1.30E-10
aagtgc    2.80E-10
atgtgc    0.0001
ccttac    0.003
miR-124 targets include PTB, which was down-regulated upon miR-124 transfection and whose 3’ UTR contains 5 hexamers, but not nPTB.
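A hypergeometric overlap p-value of the kind listed in the hexamer table can be computed as an upper-tail probability. This is a generic sketch of the statistic, not the pipeline used here; the function name and the gene counts in the example are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: genes considered (e.g. all genes with an annotated 3' UTR)
    successes:  genes whose 3' UTR contains the hexamer
    draws:      genes in the downregulated set
    k:          downregulated genes that also contain the hexamer
    """
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)

# Hypothetical counts: 10,000 genes, 800 with the hexamer in the 3' UTR,
# 200 downregulated genes, 90 of which contain the hexamer.
p = hypergeom_upper_tail(90, 10_000, 800, 200)
print(f"overlap p-value = {p:.3g}")
```

With only 16 overlapping genes expected by chance in this example (200 x 800 / 10,000), an overlap of 90 gives a very small p-value, which is the pattern behind the top-ranked hexamers.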
Pentamer enrichment adjacent to regulated cassette exons
[Figure: pre-mRNA schematic, 5’ to 3’, showing the cassette exon, its flanking exons, and the regions scanned: upstream (udif), upstream intronic fraction (uif), exon, downstream intronic fraction (dif), and duif.]
Motif enrichment in the 200-nt intronic region immediately upstream of cassette exons upregulated in frontal lobe identifies TCTCT.
Pentamers shown: TCTCT, CTCTC, CTTTC, TTTCT, CTGCT, TGCTT
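The counting step behind a pentamer enrichment analysis can be sketched as below. It is a minimal illustration under stated assumptions: the upstream intronic sequences have already been extracted, enrichment is expressed as a simple ratio of the fraction of sequences containing the motif, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

def count_kmers(sequences, k=5):
    """Count every overlapping k-mer across a list of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def fraction_with_motif(sequences, motif):
    """Fraction of sequences that contain the motif at least once."""
    return sum(motif in seq.upper() for seq in sequences) / len(sequences)

def enrichment_ratio(foreground, background, motif):
    """Foreground vs. background fraction of sequences containing the motif."""
    bg = fraction_with_motif(background, motif)
    if bg == 0:
        raise ValueError("motif absent from background; ratio undefined")
    return fraction_with_motif(foreground, motif) / bg

# Toy upstream regions (real input would be the 200-nt intronic windows).
upregulated = ["aatctctgg", "ggtctctcttt", "acgtacgta"]
other_exons = ["aatctctgg", "ggggggggg", "acgtacgta", "ttttaaaac"]
print(count_kmers(["TCTCTCT"]).most_common(2))
print(enrichment_ratio(upregulated, other_exons, "TCTCT"))
```

A significance test on the same counts (for instance the hypergeometric overlap statistic used for the 3’ UTR hexamers) would be the natural next step; the ratio alone does not account for set size.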
TCTCT enrichment in intronic regions upstream of upregulated cassette exons in human body tissues
[Figure: PTB vs. TCTCT enrichment for upregulation and downregulation; axis: Log10 ratio.]
The intronic region upstream of MEF2A exon β contains a conserved TCTCT
TCTCT-associated exons are upregulated with decreasing PTB but not nPTB expression
Interpretation of splicing changes
Currently in pathways:
- transcription factor to targets
Add: systems biology at a splicing level
- splicing factor to target exon associations
- isoform specific nodes in pathways
- isoform specific phenotypes & functions
[Figure: KEGG MAPK Pathway, annotated with PTB and MEF2C w/ exon β]
Need: controlled vocabulary for alternative splicing
Conclusions from PTB story
PTB and nPTB expression is anti-correlated across 50 human tissues.
PTB expression is anti-correlated with the expression of a set of known cassette exons.
The TCTCT pentamer is enriched in the intronic region upstream of the cassette exons.
Low PTB expression is associated with TCTCT enrichment upstream of upregulated exons; there is less correlation with nPTB.
Pathway interpretation suggested: PTB inhibits expression of cassette exons with TCTCT-rich upstream introns; nPTB is unable to inhibit at least a subset of them.
Needs for alternative splicing at a pharmaceutical company
Biomarker identification
Identify transcript structures expressed in a tissue, including novel isoforms
Report regulation of known isoforms for use in pathway analysis or as drug targets

Requirements
Robust, high-confidence, high-throughput profiling
– Array patterns, amplification, sequencing
Visualization and analysis tools for profiling data
Interpretation, including isoform function and pathways
Back-ups
TCTCT is enriched in the ~50 nt upstream of up-regulated cassette exons
[Figure: pre-mRNA schematic, 5’ to 3’, with regions upstream (udif), upstream intronic fraction (uif), exon, downstream intronic fraction (dif), and duif around the cassette exon; enrichment shown for Down and Up regulated exons.]
Conclusions from PTB story
PTB and nPTB show anti-correlated expression across 50 human tissues.
PTB and a set of known cassette exons show anti-correlated expression.
The TCTCT pentamer is enriched in the intronic region -70 nt to -30 nt upstream of the cassette exons.
Low PTB expression is associated with TCTCT enrichment upstream of upregulated exons; there is less correlation with nPTB.
Pathway interpretation suggested: PTB inhibits expression of cassette exons with TCTCT-rich upstream introns; nPTB is unable to inhibit them.
Validation of bioinformatics-predicted transcript structures
Novel exon 11 is predicted by a cross-species sequence analysis.
These microarray data suggest that the exon is expressed and regulated.
Junction probes enable monitoring of alternative 3’ and 5’ splice sites
[Figure: Sample A vs. Sample B probe data; transcript schematics for NM_123456 and Isoform 2 with a cassette exon, junction probes and exon probes, atg start and * stop.]
Some probe strategies for microarray patterns
[Figure: transcript schematics (exons 1 to 12, atg start, * stop), including an all-junctions design.]
Use of all junction probes reveals potential novel transcript structures
Examine probe intensity (brain) for probes spanning all possible junctions (e.g. exons 1-2, exons 1-3, exons 1-4, …)
Exon 3 is a known cassette exon.
Exon 8 is a known mutually exclusive exon.
Exons 14 and 17 are not previously known to be cassette exons.
[Figure: all-junctions probe intensities along the transcript, starting at exon 1 (atg).]
Real data showing several probes with high background
Bottom line: no predicted changes in alternative splicing; rather, a single isoform, of two known, is significantly upregulated.
[Figure: Sample A vs. Sample B probe data, marking probes with high predicted background, an exon predicted to cross-hyb, and a mutually exclusive exon pair (10A, 10B).]