Quantification of Splice-Forms and Variants
- Quantified 615 datasets based on the Gencode v7 annotation
- For every transcript, normalized RPKM values and number of deconvoluted reads
- Sensitivity is a function of sequencing depth
Correlation coeff. 0.87 (Pearson and Spearman)
- Discussion at the end if/what to do before uploading
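As a reminder of what a per-transcript RPKM value expresses, here is a minimal sketch using the standard RPKM definition (reads per kilobase of transcript per million mapped reads). The slides do not state how the normalization was actually done, so the function name `rpkm` and all numbers below are hypothetical toy values, not Geuvadis data.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of transcript per million mapped reads.

    read_count may be fractional when reads are deconvoluted across
    overlapping transcripts.
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical toy values (assumptions for illustration only).
counts = {"tx1": 500, "tx2": 100}      # deconvoluted reads per transcript
lengths = {"tx1": 2000, "tx2": 1000}   # transcript lengths in bp
total = 10_000_000                     # mapped reads in the dataset

values = {tx: rpkm(counts[tx], lengths[tx], total) for tx in counts}
for tx, value in values.items():
    print(tx, value)
```

Because the total number of mapped reads sits in the denominator, low-abundance transcripts only receive non-zero counts in deeper datasets, which is one way to read the sensitivity remark above.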
LoF Definitions
LoF = loss of function of a complete transcript
SNP that introduces (directly) stop codon
Indels that disrupt/shift reading frame
SNP that disrupts splice site
Larger deletions that remove 1st exon or >50% of transcript
“partial” LoF affects just some protein-coding transcripts in a locus
“full” LoF affects all protein-coding transcripts annotated
LoF scope
LoF types
[MacArthur et al. 2012]
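The "partial" versus "full" distinction can be sketched as a set comparison per locus. This is only an illustration of the definition above; the function name `lof_scope` and the transcript identifiers are hypothetical.

```python
def lof_scope(affected_transcripts, coding_transcripts):
    """Classify a LoF variant by how many protein-coding transcripts it hits.

    Returns "full" if every annotated protein-coding transcript of the locus
    is affected, "partial" if only some are, and "none" otherwise.
    """
    coding = set(coding_transcripts)
    hit = coding & set(affected_transcripts)
    if not hit:
        return "none"
    return "full" if hit == coding else "partial"


# Hypothetical locus with three annotated protein-coding transcripts.
locus = {"tx_a", "tx_b", "tx_c"}
print(lof_scope({"tx_a"}, locus))                   # partial
print(lof_scope({"tx_a", "tx_b", "tx_c"}, locus))   # full
print(lof_scope({"tx_noncoding"}, locus))           # none
```

Note that the result depends on the annotation used: adding a transcript to the locus can turn a "full" LoF into a "partial" one.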
LoF Estimates [MacArthur et al. 2012]
[Figure: LoF variant counts by type (splice, large deletion, stop, frameshift indel), shown in a single individual and across populations. Values in extraction order: 267, 565, 337, 116 and 12, 23, 38, 24.]
Compare RNA-Seq evidence to LoF predictions
predicted disruption of splice site
Large deletion
Frameshift indel
main difference Geuvadis <> 1000 Genomes: RNA-Seq vs. DNA-Seq
} directly from mappings / coverage by mappings
indirectly called from mappings
Confirmation LoF SNPs in Geuvadis
- Take phase1 samples where polymorphisms have been found by exome sequencing
- Additionally call SNPs by RNA-Seq (excessive mappings)
Sufficient coverage in DNA
Sufficient coverage in RNA
>2 million genotype calls possible in both experiments
Example: (not Geuvadis)
~5000 differences, i.e. on average >2 out of 1000 calls differ
~1000 cases where RNA is homozygous and DNA is not: could be explained by allele-specific expression
~4000 cases where DNA is homozygous and RNA is not (!!!): remove false positives (FPs) from computational or experimental artifacts (PCR artifacts?)
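The two discordance classes above can be separated with a simple per-site comparison of the DNA and RNA genotype calls. The sketch below assumes unphased genotype strings of the form "A/G"; the function names and the toy calls are hypothetical and do not reproduce the counts on the slide.

```python
from collections import Counter


def is_homozygous(genotype):
    """True if both alleles of an unphased 'A/G'-style genotype are equal."""
    first, second = genotype.split("/")
    return first == second


def classify_discordance(dna_gt, rna_gt):
    """Compare a DNA-based and an RNA-based genotype call at one site."""
    if sorted(dna_gt.split("/")) == sorted(rna_gt.split("/")):
        return "concordant"
    dna_hom, rna_hom = is_homozygous(dna_gt), is_homozygous(rna_gt)
    if rna_hom and not dna_hom:
        # Candidate for allele-specific expression.
        return "rna_hom_dna_het"
    if dna_hom and not rna_hom:
        # Suspicious: candidate false positive in the RNA calls.
        return "dna_hom_rna_het"
    return "other"


# Hypothetical (DNA call, RNA call) pairs at sites covered in both experiments.
calls = [("A/G", "A/G"), ("A/G", "A/A"), ("G/G", "A/G"),
         ("A/A", "G/G"), ("G/A", "A/G")]
summary = Counter(classify_discordance(dna, rna) for dna, rna in calls)
print(summary)
```

Sites in the "dna_hom_rna_het" class are the ones that would be inspected first for mapping or PCR artifacts.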
[Diagram: stop SNP, homozygote vs. common allele; allele fractions shown: 100%, 0% or 100%, 0% or 50%, 50%]
Allele-specific RNA Processing
[Montgomery2010 dataset]
[Figure: relative abundance distributions of the 1st and 2nd form by genotype (A/A, A/G, G/G)]
LoF and Alternative Splicing (AS)
“28.7% LoF events in a single individual affect only a subset of the known transcripts from the affected gene, emphasizing the need to consider alternative splicing”
[MacArthur et al. 2012]
(1) classification of AS influences in LoF based on a certain annotation
(2) extension of an annotation by RNA-Seq evidence
[Figure: 5’ and 3’ reading frames at splice sites (frame values 0, 1, 2); resulting frame marked "?"]
activation of latent splice sites
[Figure: splicing graph; edges are labelled with the transcripts (1-7) that support them, and alternative paths form a "bubble"]
(1) classification of AS: AStalavista
(2) AS discovery by RNA-Seq
[Figure: splicing graph; edges are labelled with the transcripts (1-7) that support them]
Novel exon junctions supported by RNA-Seq
add to graph, novel events
extend annotated CDSs
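A minimal sketch of how novel exon junctions could be separated from annotated ones before adding them to the splicing graph. It assumes one locus on one strand, exons given as (start, end) pairs sorted by coordinate, and a junction written as (donor end, acceptor start). The function names, the `min_reads` threshold and all coordinates are hypothetical; this is not the AStalavista implementation.

```python
def annotated_junctions(transcripts):
    """Collect (donor_end, acceptor_start) pairs from annotated transcripts.

    transcripts: dict mapping transcript name -> list of (start, end) exons,
    sorted by coordinate.
    """
    junctions = set()
    for exons in transcripts.values():
        for (_, donor_end), (acceptor_start, _) in zip(exons, exons[1:]):
            junctions.add((donor_end, acceptor_start))
    return junctions


def novel_junctions(observed, annotated, min_reads=2):
    """Keep RNA-Seq junctions absent from the annotation with enough support.

    observed: dict mapping (donor_end, acceptor_start) -> supporting reads.
    min_reads is an arbitrary illustrative threshold.
    """
    return {junction: reads for junction, reads in observed.items()
            if junction not in annotated and reads >= min_reads}


# Hypothetical annotation: tx2 skips the middle exon of tx1.
annotation = {
    "tx1": [(100, 200), (300, 400), (500, 600)],
    "tx2": [(100, 200), (500, 600)],
}
# Hypothetical split-read counts per junction from RNA-Seq mappings.
observed = {(200, 300): 40, (200, 500): 5, (400, 550): 7, (200, 450): 1}

known = annotated_junctions(annotation)
novel = novel_junctions(observed, known)
print(novel)
```

Each junction returned here would become a new edge in the graph, and the events it creates can then be checked for whether they extend or disrupt an annotated CDS.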
My Points
• Quantifications: do you want a normalization before uploading, or is this the responsibility of the analyzing group?
• Quantifications:
• Timeline for studies: main paper October to end of the year.
• Separate publications possible if there is sufficient material for a separate story?
• What would be the constraints for a separate publication on Geuvadis data?