31

Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Proteomics

Laurent Gatto1

http://cpu.sysbiol.cam.ac.uk

CSAMA � 27 June 2014

[email protected]

Page 2: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Outline

Proteomics and MS data

Ranges infrastructure

Application: spatial proteomics

Page 3: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Mass-spectrometry (LC-MSMS)

Precursor iondissociation

Detector

Precursor ions MS1

Fragmented ions

MSMS - MS2

AnalyserSourceSeparation

Page 4: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

MS1 and MS2 spectra

400 500 600 700 800

0.0e

+00

5.0e

+07

1.0e

+08

1.5e

+08

2.0e

+08

MS1 scan @ 21:3 min

m/z

inte

nsity 200 400 600 800

0e+

002e

+06

4e+

066e

+06

8e+

061e

+07

MS2 scan, precursor m/z 460.79

200 400 600 800 1000

0e+

001e

+06

2e+

063e

+06

4e+

065e

+06

6e+

06

MS2 scan, precursor m/z 488.8

m/z

Page 5: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

MS1 and MS2 spectra

400

600

800

1000

1200

21.10

21.15

21.20

21.25

21.30

21.35

0.0e+00

5.0e+07

1.0e+08

1.5e+08

2.0e+08

m/zretention time200

400

600

800

1000

1200

21.06

21.07

21.08

21.09

21.10

21.11

0.0e+00

5.0e+07

1.0e+08

1.5e+08

2.0e+08

m/zretention time

Page 6: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Proteomics data

I raw dataI quantitationI identi�cation

I protein database

Status package

Raw (mz*ML) X mzRmzTab X MSnbase

mgf X MSnbasemzIdentML X mzID (mzR)mzQuantML (?mzR)

Page 7: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Proteomics data

I raw dataI quantitationI identi�cation

I protein database

Status package

Raw (mz*ML) X mzRmzTab X MSnbase

mgf X MSnbasemzIdentML X mzID (mzR)mzQuantML (?mzR)

Page 8: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Example

library("MSnbase")

rx <- readMSData(f, centroided = TRUE)

rx <- addIdentificationData(rx, g)

rx <- rx[!is.na(fData(rx)$pepseq)]

plot(rx[[10]], reporters = TMT6, full=TRUE)

Page 9: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Example

0e+00

2e+05

4e+05

6e+05

8e+05

300 600 900 1200M/Z

Inte

nsity

Precursor M/Z 600.36

0e+00

2e+05

4e+05

6e+05

8e+05

126.080 126.591 127.102 127.613 128.124 128.635 129.146 129.657 130.168 130.679 131.190

Page 10: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Example

library("MSnbase")

rx <- readMSData(f, centroided = TRUE)

rx <- addIdentificationData(rx, g)

rx <- rx[!is.na(fData(rx)$pepseq)]

plot(rx[[10]], reporters = TMT6, full=TRUE)

plot(rx[[4730]], rx[[4929]])

Page 11: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Example

0 500 1000 1500

−1.

0−

0.5

0.0

0.5

1.0

m/z

inte

nsity

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

prec scan: 6975, prec mass: 545.017, prec z: 3, # common: 34

●●●

●●●

●●

●●

●●●

●●

●●

●● ●●

● ● ●

●●●

prec scan: 7198, prec mass: 545.017, prec z: 3, # common: 35

Page 12: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Example

library("MSnbase")

rx <- readMSData(f, centroided = TRUE)

rx <- addIdentificationData(rx, g)

rx <- rx[!is.na(fData(rx)$pepseq)]

plot(rx[[10]], reporters = TMT6, full=TRUE)

plot(rx[[4730]], rx[[4929]])

qt <- quantify(rx, reporters = TMT6, method = "max")

## qt <- readMSnSet(f2)

nqt <- normalise(qt, method = "vsn")

boxplot(exprs(nqt))

Page 13: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Example

● ● ●

TMT6.126 TMT6.127 TMT6.128 TMT6.129 TMT6.130 TMT6.131

1015

2025

Page 14: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

More

library("BiocInstaller")

biocLite("RforProteomics")

Page 15: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

I raw dataI quantitationI identi�cation

I protein database

Page 16: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Ranges infrastructure

Page 17: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Pbase package

library("Pbase")

p <- Proteins(db)

p <- addIdentificationData(p, id)

aa(p) ## AAStringSet

pranges(p) ## IRangesList

i <- which(acols(p)[, "EntryName"] == "EF2_HUMAN")

plot(p[i])

plot(p[i], from = 155, to = 185)

Page 18: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

100

200

300

400

500

600

700

800

NH COOH

pept

ides

230

240

250NH COOH

P13

639

W A F T L K Q F A E M Y V A K F A A K G E G Q L G P A E R A K K V E

pept

ides

Page 19: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Spatial proteomics

I The cellular sub-divisionallows cells to establish arange of distinctmicroenvironments, eachfavouring di�erentbiochemical reactions andinteractions and, therefore,allowing each compartmentto ful�l a particularfunctional role.

I Localisation andsequestration of proteinswithin subcellular niches is afundamental mechanism forthe post-translationalregulation of proteinfunction. Spatial proteomics is the systematic

study of protein localisations.

Page 20: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Spatial proteomics

Disruption of thetargeting/tra�cking process altersproper sub-cellular localisation,which in turn perturb the cellularfunctions of the proteins.

I Abnormal proteinlocalisation leading to theloss of functional e�ects indiseases (Laurila et al.2009)

I Disruption of thenuclear/cytoplasmictransport (nuclear pores)have been detected in manytypes of carcinoma cells(Kau et al. 2004).

Page 21: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Figure: Immuno�uorescence:ZFPL1, Golgi (left) and FHL2,mainly localized to actin �lamentsand focal adhesion sites. Alsodetected in the nucleus (right).(from the Human Protein Atlas)

Cell lysatefraction

centrifugation

iTRAQMS/MS

LOPIT(PCA,

PLS-DA)

label-freeMS/MS

(χ )2PCP

Pure fraction

catalogue

Invariantrich

fraction(clustering)

Subtractiveproteomics

(enrichment)

Figure: Mass spectrometry-basedapproaches based on densitygradient subcellular fractionation.

Page 22: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Cell membrane lysis

Mechanical or bu�er-induced lysis of the plasma membrane with

minimal disruption to intracellular organelles followed by subcellular

fractionation.

Page 23: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Density gradient separation

Page 24: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Quantitation by LC-MSMS

Page 25: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Data

Fraction1 Fraction2 . . . Fractionm markers

p1 q1,1 q1,2 . . . q1, m unknown

p2 q2,1 q2,2 . . . q2, m loc1

p3 q3,1 q3,2 . . . q3, m unknown

p4 q4,1 q4,2 . . . q4, m lock...

......

......

...

pn qn,1 qn,2 . . . qn, m unknown

Page 26: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

0.2

0.3

0.4

0.5

Correlation profile − ER

Fractions

1 2 4 5 7 81112

0.1

0.2

0.3

0.4

Correlation profile − Golgi

Fractions

1 2 4 5 7 81112

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Correlation profile − mit/plastid

Fractions

1 2 4 5 7 81112

0.15

0.20

0.25

0.30

0.35

Correlation profile − PM

Fractions

1 2 4 5 7 81112

0.1

0.2

0.3

0.4

0.5

0.6

Correlation profile − Vacuole

Fractions

1 2 4 5 7 81112

●●

●●

●●

●●

●●●● ●

●●

●●

●●

−10 −5 0 5

−5

05

Principal component analysis

PC1

PC

2

ERGolgimit/plastidPM

vacuolemarkerPLS−DAunknown

Figure: From Gatto et al. (2010), data from Dunkley et al. (2006).

Page 27: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

2009 vs 2013

−3 −2 −1 0 1 2 3

−3

−2

−1

01

23

PC1 (58.53%)

PC

2 (2

9.96

%)

●●

● ●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●● ●●

● ●

● ●

●●

● ●●

●●

●●

●● ●

●● ●

●●●

ER/GolgimitochondrionPMunknown

−3 −2 −1 0 1 2 3

−3

−2

−1

01

23

PC1 (58.53%)

PC

2 (2

9.96

%)

CytoskeletonERGolgiLysosomemitochondrionNucleus

PeroxisomePMProteasomeRibosome 40SRibosome 60S

Figure: pRoloc package. Semi-supervised approach Breckels et al.(2013). Data from Tan et al (2009).

Page 28: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

−6 −4 −2 0 2 4

−4

−2

02

4

PC1 (50.05%)

PC

2 (2

4.61

%)

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

● ●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●●

●●

●●

●●

● ●

●●

● ●

● ●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

● ●

●●

●●

●●

● ●

●●

●●

Actin cytoskeletonCytosolEndosomeER/GAExtracellular matrixLysosomeMitochondriaNucleus − ChromatinNucleus − NucleolusPeroxisomePlasma MembraneProteasomeRibosome 40SRibosome 60Sunknown

●●●●●●

111111

●●●●●●222222 ●

●●●3333

●●●●●

●●●4 4

4444

44

●●

●●

● ●

55

5

55

5 5

●● ●●●

● ●

● 66 666

6 6

6

6

●●●●●●●●●

● ●● 777 7 77 7

777 7

7

●●●●● ●●●

88888 8

88

●●●● ●●●

9999 999

9

1 Dynein2 Vesicles − Clathrin 3 13S condensin4 T complex5 Nucleus lamina

6 Vesicles − COPI/II7 eIF3 complex8 ARP2/3 complex9 COP9 signalosome

Page 29: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Dynamic

Figure: pRolocGUI package.

Page 30: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Dual localisation

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Fraction

Inte

nsity

1 1 2 2 4 4 5 5 7 7 8 8 11 11

●●

●● ●

● ●

●●

● ● ●

● ●

● ●

●● ●

●●

●●

● ●

● ●

●●

●●

●●

● ●

● ●

●●

●●

● ●

●●

●● ●

●●

●●

● ●

●● ●

●●

● ●

● ●●

●●

● ●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

●●

●●

AT5G61790 (ER membrane)AT4G32400 (Plastid)AT2G45470 (PM)

−6 −4 −2 0 2 4 6

−4

−2

02

4PC1 (64.36%)

PC

2 (2

2.34

%)

●●

● ●

●●

● ●

●●●

●●

●● ●

●● ●●●

●●

●●

●● ●

●●

●●

●●

●●

●●●●

●●

●●

● ●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●●

●●

●●

● ●

● ●

● ●

●●

●●

●●

●●●

●● ●

● ●

●●●

●●

●●

● ●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●●

●●●●

●●

●●

●●●●

●● ●●

●● ●

●●● ●

● ●●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●●●●

●●

●● ●●●●

●●

●●

● ●

●●

●●

●●●●●

●●

●●●●

● ●● ●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

● ●●

● ●●●

● ●

●●

●●●

●●●

●●

●●● ●●●

●●

●●

●● ●

●●

●●●

●●

● ●

●●

●●●

●●

●●●

●●

●●

●●

● ●

●● ●●●

●●●

●●●●

●●●●

●●

●●

●●

●● ●●●

●●●

●●

●● ●

●●●

●●

●●

●●

●●

●●●●

●●●

●●

●●●

● ●

●●

● ●

●●●

ER lumenER membraneGolgiMitochondrionPlastidPMRibosomeTGNunknownvacuole

●●●● ●● ● ●● ● ●

●●

●●

● ● ●●●●

Figure: Proteins may be present simultaneously in several organelles(dual localisation, tra�cking) vs. no man's land. (Gatto et al. 2014)

Page 31: Proteomics - Bioconductor · 2.0e+08 MS1 scan @ 21:3 min m/z intensity 200 400 600 800 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 MS2 scan, precursor m/z 460.79 200 400 600 800 1000 0e+00

Acknowledgement

CPU

I Lisa Breckels

I Sebastien Gibb

I Thomas Naake

I Kathryn Lilley (CCP) Computational Proteomics UnitCambridge Centre for ProteomicsCambridge System Biology CentreDepartment of BiochemistryUniversity of Cambridgehttp://cpu.sysbiol.cam.ac.uk

@lgatt0

Software Sustainability Institutehttp://software.ac.uk