Geuvadis Analysis Meeting 16/02/2012 Micha Sammeth CNAG – Barcelona


Quantification of Splice-Forms and Variants

- Quantified 615 datasets based on the Gencode v7 annotation

- For every transcript: normalized RPKM values and the number of deconvoluted reads

- Sensitivity is a function of sequencing depth

Correlation coefficient: 0.87 (Pearson and Spearman)

- To be discussed at the end: whether anything should be done before uploading, and if so what
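The slide reports normalized RPKM values and a correlation coefficient of 0.87 under both Pearson and Spearman, without saying which two quantities were correlated. As a reference, here is a minimal sketch of the standard RPKM normalization together with both correlation measures. The function names are hypothetical and this is not the pipeline used for the 615 datasets.

```python
import math
from typing import List, Sequence


def rpkm(read_count: float, transcript_length_bp: int, total_mapped_reads: float) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `rpkm(500, 2000, 20_000_000)` returns 12.5: 500 reads on a 2 kb transcript in a library of 20 million mapped reads. Spearman is less sensitive than Pearson to a few very highly expressed transcripts, so the two measures can diverge on RPKM data. Matching values (0.87 for both) suggest that the agreement is not driven only by outliers.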


LoF Definitions

LoF = loss of function of a complete transcript

LoF types:

- SNP that directly introduces a stop codon

- Indels that disrupt/shift the reading frame

- SNP that disrupts a splice site

- Larger deletions that remove the 1st exon or >50% of the transcript

LoF scope:

- "partial" LoF affects just some of the protein-coding transcripts in a locus

- "full" LoF affects all annotated protein-coding transcripts

[MacArthur et al. 2012]
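Two of the definitions above can be written as simple rules: the large-deletion criterion ("removes the 1st exon or >50% of the transcript") and the partial/full scope. The sketch below assumes that exons are given as closed intervals in transcript order and that ">50%" is measured on exonic bases. Both are assumptions, and all names are hypothetical rather than taken from MacArthur et al.

```python
from typing import Iterable, List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two closed intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def large_deletion_is_lof(exons: List[Exon], del_start: int, del_end: int) -> bool:
    """The deletion removes the 1st exon entirely or >50% of the transcript's exonic bases.

    `exons` must be listed in transcript order (5' to 3'), so exons[0] is the 1st exon.
    """
    first_start, first_end = exons[0]
    removes_first_exon = del_start <= first_start and del_end >= first_end
    total_bp = sum(end - start + 1 for start, end in exons)
    deleted_bp = sum(overlap_bp(start, end, del_start, del_end) for start, end in exons)
    return removes_first_exon or deleted_bp > 0.5 * total_bp


def lof_scope(affected: Iterable[str], protein_coding: Iterable[str]) -> Optional[str]:
    """'full' if all protein-coding transcripts of the locus are hit, 'partial' if only some, None if none."""
    coding = set(protein_coding)
    hit = set(affected) & coding  # non-coding transcripts do not count towards scope
    if not hit:
        return None
    return "full" if hit == coding else "partial"
```

For a gene with protein-coding transcripts T1 and T2, a variant that hits only T1 yields `"partial"`. This is the kind of event behind the alternative-splicing question raised later in the talk.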


LoF Estimates [MacArthur et al. 2012]

[Figure: charts of LoF variants by type (Splice, Large deletion, Stop, Frameshift indel), "across populations" and "in a single individual"; values shown: 267, 565, 337, 116 and 12, 23, 38, 24]


Compare RNA-Seq evidence to LoF predictions

- Main difference Geuvadis <> 1000 Genomes: RNA-Seq vs. DNA-Seq

- Predicted disruption of splice site, large deletion: directly from mappings / coverage by mappings

- Frameshift indel: indirectly called from mappings
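A predicted large deletion can be checked directly against "coverage by mappings". One possible check compares the mean RNA-Seq depth inside the deleted segment with the depth over the rest of the transcript. The minimal sketch below assumes per-base depth in transcript (exonic) coordinates. The function name is hypothetical, and any decision threshold would be an additional assumption.

```python
import math
from typing import Sequence


def deletion_coverage_ratio(coverage: Sequence[float], start: int, end: int) -> float:
    """Mean RNA-Seq depth inside a predicted deletion relative to the rest of the transcript.

    `coverage` holds per-base read depth along the transcript's exonic positions
    (0-based); [start, end) is the predicted deleted segment in the same coordinates.
    """
    inside = list(coverage[start:end])
    outside = list(coverage[:start]) + list(coverage[end:])
    if not inside or not outside:
        raise ValueError("deleted segment must be a proper, non-empty part of the transcript")
    mean_inside = sum(inside) / len(inside)
    mean_outside = sum(outside) / len(outside)
    if mean_outside == 0:
        return math.nan  # transcript not expressed: no RNA evidence either way
    return mean_inside / mean_outside
```

If both alleles are expressed equally, a ratio near 0 would fit a homozygous deletion and a ratio near 0.5 a heterozygous one. Allele-specific expression and uneven coverage along the transcript can blur both cases.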


Confirmation of LoF SNPs in Geuvadis

- Take Phase 1 samples in which polymorphisms have been found by exome sequencing

- Additionally call SNPs from RNA-Seq (excessive mappings)

- Sufficient coverage in DNA and in RNA: >2 million genotype calls possible in both experiments

Example (not Geuvadis):

- ~5,000 differences, i.e. on average >2 out of 1,000 calls differ

- ~1,000 cases where RNA is homozygous and DNA is not: could be explained by allele-specific expression

- ~4,000 cases where DNA is homozygous and RNA is not (!!!): remove false positives (FPs) caused by computational or experimental artifacts (PCR artifacts?)
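The comparison above tallies, at sites callable in both experiments, how often the exome and RNA-Seq genotypes agree and in which direction they disagree. At exactly 2 million calls, 5,000 differences correspond to 5,000 / 2,000,000 = 0.0025, i.e. 2.5 per 1,000. The sketch below assumes unphased genotypes written as strings like `"A/G"`. The category names are hypothetical labels for the two cases on the slide.

```python
from collections import Counter
from typing import Iterable, Tuple


def alleles(genotype: str) -> Tuple[str, str]:
    """Split an unphased genotype like 'A/G' into its two alleles, order-independent."""
    first, second = sorted(genotype.split("/"))
    return first, second


def is_homozygous(genotype: str) -> bool:
    first, second = alleles(genotype)
    return first == second


def compare_calls(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Tally DNA (exome) vs RNA-Seq genotype calls made at the same site."""
    counts: Counter = Counter()
    for dna, rna in pairs:
        if alleles(dna) == alleles(rna):
            counts["concordant"] += 1
        elif is_homozygous(rna) and not is_homozygous(dna):
            counts["rna_hom_dna_het"] += 1   # could reflect allele-specific expression
        elif is_homozygous(dna) and not is_homozygous(rna):
            counts["dna_hom_rna_het"] += 1   # suspicious: possible computational/experimental artifact
        else:
            counts["other"] += 1             # e.g. both homozygous for different alleles
    return counts


def discordance_rate(counts: Counter) -> float:
    """Fraction of compared sites at which the two calls differ."""
    total = sum(counts.values())
    return (total - counts["concordant"]) / total
```

The asymmetry matters for interpretation. Losing one allele in RNA has a biological explanation, whereas gaining an allele in RNA that is absent from DNA does not. That is why the DNA-homozygous/RNA-heterozygous class is flagged as likely artifacts.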


Allele-specific RNA Processing [Montgomery 2010 dataset]

[Figure: relative abundance distribution of the 1st and 2nd splice form by genotype (A/A, A/G, G/G); annotations: "Homozygote Common Allele", 100%, 50%, "0% or 100%", "0% or 50%"]
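The figure plots the relative abundance of two splice forms per genotype. The expected pattern can be illustrated with a simple model, which is an assumption and not something stated on the slide: both alleles are expressed equally, and each allele is processed into the 1st form with its own probability. The function names below are hypothetical.

```python
from typing import Dict


def relative_abundance(first_form_reads: float, second_form_reads: float) -> float:
    """Fraction of a gene's reads assigned to the 1st splice form."""
    return first_form_reads / (first_form_reads + second_form_reads)


def expected_first_form_fraction(genotype: str, first_form_share: Dict[str, float]) -> float:
    """Expected relative abundance of the 1st form for a genotype such as 'A/G'.

    Assumes both alleles are expressed equally and that a transcript from allele X
    is processed into the 1st form with probability first_form_share[X].
    """
    allele_1, allele_2 = genotype.split("/")
    return (first_form_share[allele_1] + first_form_share[allele_2]) / 2
```

If allele A yields only the 1st form and G only the 2nd (`{"A": 1.0, "G": 0.0}`), the model predicts 100%, 50% and 0% for A/A, A/G and G/G. Allele-specific expression in heterozygotes would shift the A/G value away from 50%.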


LoF and Alternative Splicing (AS)

“28.7% LoF events in a single individual affect only a subset of the known transcripts from the affected gene, emphasizing the need to consider alternative splicing”

[MacArthur et al. 2012]

(1) Classification of AS influences on LoF based on a given annotation

(2) Extension of an annotation by RNA-Seq evidence

[Figure: 5' and 3' reading frames (values 0, 1, 2) of exons in alternatively spliced transcripts; "?": activation of latent splice sites]
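Whether an alternative splicing event disrupts the protein depends on the reading frame. Skipping a coding exon keeps the downstream frame only if the exon's 5' and 3' frames match, which is the case exactly when its length is a multiple of 3. The same test applies when a latent splice site is used instead of the annotated one. The sketch below uses one common phase convention (bases of the current codon already consumed). The function names are hypothetical.

```python
def exon_end_phase(start_phase: int, exon_length: int) -> int:
    """Reading-frame phase at the 3' end of a coding exon, given the phase at its 5' end.

    Phase = number of bases of the current codon already consumed (0, 1 or 2).
    """
    return (start_phase + exon_length) % 3


def skipping_preserves_frame(exon_length: int) -> bool:
    """Skipping a coding exon keeps the downstream frame only if its length is a multiple of 3."""
    return exon_length % 3 == 0


def alternative_site_preserves_frame(annotated_site: int, alternative_site: int) -> bool:
    """Using an alternative (e.g. latent) splice site moves the exon boundary by the distance
    between the two sites; the downstream frame is kept only if that shift is a multiple of 3."""
    return abs(annotated_site - alternative_site) % 3 == 0
```

A frame-preserving event can still remove essential residues or introduce a stop codon in the retained sequence. These checks therefore only flag frameshifts and do not prove that the protein stays functional.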


(1) Classification of AS: AStalavista

[Figure: splicing graph of transcripts 1 to 7; edges are labeled with the sets of transcripts that use them (e.g. 1,2,3,6), splice sites are marked as donors (^) and acceptors (-), and an AS event forms a "bubble" in the graph]
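AStalavista describes each AS event by a code: the variable splice sites are numbered by position, donors are written with `^`, acceptors with `-`, and a transcript with no sites in the event is written `0` (e.g. exon skipping as `1-2^,0`). The simplified sketch below reproduces this style of code for two transcripts given as intron lists on the + strand. It is not AStalavista's graph-based bubble extraction. It assumes that the two transcripts differ by a single event, and the ordering of the two chains is a heuristic that may differ from the tool's canonical order. All names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Intron = Tuple[int, int]  # (donor, acceptor) positions on the + strand
Site = Tuple[int, str]    # (position, '^' for donor or '-' for acceptor)


def splice_sites(introns: List[Intron]) -> Set[Site]:
    """All donor and acceptor sites used by a transcript."""
    sites: Set[Site] = set()
    for donor, acceptor in introns:
        sites.add((donor, "^"))
        sites.add((acceptor, "-"))
    return sites


def pairwise_event_code(introns_a: List[Intron], introns_b: List[Intron]) -> Optional[str]:
    """AStalavista-style code for the sites that differ between two transcripts.

    Simplification: assumes a single event, so all non-shared sites form one bubble.
    Returns None for identical splice chains.
    """
    sites_a, sites_b = splice_sites(introns_a), splice_sites(introns_b)
    shared = sites_a & sites_b
    var_a, var_b = sites_a - shared, sites_b - shared
    if not var_a and not var_b:
        return None
    # Number the variable sites 1..n by genomic position.
    rank = {pos: i + 1 for i, pos in enumerate(sorted({p for p, _ in var_a | var_b}))}

    def chain(sites: Set[Site]) -> str:
        return "".join(f"{rank[p]}{kind}" for p, kind in sorted(sites)) or "0"

    # Ordering heuristic (non-empty chain first, then lexicographic): an assumption.
    codes = sorted([chain(var_a), chain(var_b)], key=lambda c: (c == "0", c))
    return ",".join(codes)
```

With introns `[(100, 200), (300, 400)]` versus `[(100, 400)]` (an exon skipped between positions 200 and 300), the sketch returns `"1-2^,0"`. An alternative donor, `[(100, 200)]` versus `[(110, 200)]`, gives `"1^,2^"`.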


(2) AS discovery by RNA-Seq

[Figure: the same splicing graph of transcripts 1 to 7, extended with novel exon junctions]

- Novel exon junctions supported by RNA-Seq

- Add them to the graph to find novel events

- Extend annotated CDSs
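The first step above, finding novel exon junctions supported by RNA-Seq, can be sketched as filtering split-read junctions against the annotation. Each remaining junction is then related to annotated sites: it may reuse two known sites (a new combination such as exon skipping) or introduce one or two unannotated sites (e.g. a latent splice site). The junction tuple layout, the `min_reads` default of 3, and the function names are all assumptions for illustration.

```python
from typing import Dict, Set, Tuple

Junction = Tuple[str, int, int]  # (chromosome, donor position, acceptor position)


def novel_junctions(observed: Dict[Junction, int], annotated: Set[Junction],
                    min_reads: int = 3) -> Dict[Junction, int]:
    """Split-read junctions absent from the annotation with at least `min_reads` supporting reads."""
    return {j: n for j, n in observed.items() if j not in annotated and n >= min_reads}


def classify_junction(junction: Junction, annotated: Set[Junction]) -> str:
    """Relate a novel junction to annotated splice sites."""
    chrom, donor, acceptor = junction
    known_donors = {(c, d) for c, d, _ in annotated}
    known_acceptors = {(c, a) for c, _, a in annotated}
    donor_known = (chrom, donor) in known_donors
    acceptor_known = (chrom, acceptor) in known_acceptors
    if donor_known and acceptor_known:
        return "known_sites"      # new combination of annotated sites, e.g. exon skipping
    if donor_known or acceptor_known:
        return "one_novel_site"   # e.g. a latent donor or acceptor
    return "both_novel"
```

Junctions that pass the filter would become new edges in the splicing graph. Re-running the event extraction on the extended graph then yields the novel AS events.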


My Points

• Quantifications: do you want a normalization before uploading, or is this the responsibility of the analyzing group?

• Quantifications:

• Timeline for studies: main paper October to end of the year.

• Are separate publications possible if there is sufficient material for a separate story?

• What would be the constraints for a separate publication on Geuvadis data?


Acknowledgements

Thasso Griebel (PhD): error models, pipelining

Paolo Ribeca (PhD), Santiago Marco: GEM mapper + conversion

Emanuele Raineri (PhD): SNP calling