SeqAn and OpenMS Integration Workshop · 2017. 5. 23. · Temesgen Dadi, Julianus Pfeuffer, Alexander Fillbrunn · The Center for Integrative Bioinformatics (CIBI)

Temesgen Dadi, Julianus Pfeuffer, Alexander Fillbrunn

The Center for Integrative

Bioinformatics (CIBI)

SeqAn and OpenMS Integration Workshop

Julianus Pfeuffer, Alexander Fillbrunn

Mass-spectrometry data analysis in KNIME

OpenMS

• OpenMS – an open-source C++ framework for computational mass spectrometry

• Jointly developed at ETH Zürich, FU Berlin, University of Tübingen

• Open source: BSD 3-clause license

• Portable: available on Windows, OSX, Linux

• Vendor-independent: supports all standard formats, and vendor formats through ProteoWizard

• OpenMS TOPP tools – The OpenMS Proteomics Pipeline tools

– Building blocks: One application for each analysis step

– All applications share identical user interfaces

– Uses PSI standard formats

• Can be integrated in various workflow systems

– Galaxy

– WS-PGRADE/gUSE

– KNIME

Kohlbacher et al., Bioinformatics (2007), 23:e191

OpenMS Tools in KNIME

• Wrapping of OpenMS tools in KNIME via GenericKNIMENodes (GKN)

• Every tool writes its CommonToolDescription (CTD) via its command line parser

• GKN generates Java source code for nodes to show up in KNIME

• Wraps C++ executables and provides file handling nodes

Installation of the OpenMS plugin

• Community-contributions update site (stable & trunk)

– Bioinformatics & NGS

• Provides > 180 OpenMS TOPP tools as Community nodes

– SILAC, iTRAQ, TMT, label-free, SWATH, SIP, …

– Search engines: OMSSA, MASCOT, X!TANDEM, MSGFplus, …

– Protein inference: FIDO

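Data Flow in Shotgun Proteomics

[Figure: pipeline from the HPLC/MS sample through raw data, signal processing, peak maps, data reduction, annotated maps, identification and differential quantification to differentially expressed proteins. Data-volume labels in the figure: 100 GB, 1 GB, 50 MB, 50 MB, 50 kB.]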

Quantification Strategies

Quantitative Proteomics

• Relative Quantification

– Labeled, in vivo: 14N/15N, SILAC

– Labeled, in vitro: iTRAQ, TMT, 16O/18O

– Label-Free: Spectral Counting, MRM, Feature-Based

• Absolute Quantification

– AQUA, SISCAPA

After: Lau et al., Proteomics, 2007, 7, 2787

Quantitative Data – LC-MS Maps

• Spectra are acquired at rates of up to dozens per second

• Stacking the spectra yields maps

• Resolution:

– Up to millions of points per spectrum

– Tens of thousands of spectra per LC run

• Huge 2D datasets of up to hundreds of GB per sample

• MS intensity follows the chromatographic concentration
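The size estimate above can be checked with a rough calculation. This is a minimal sketch: the concrete counts and the 8 bytes per data point (one 32-bit m/z value plus one 32-bit intensity) are assumptions chosen to match the orders of magnitude on the slide, not measured values.

```python
# Back-of-the-envelope size of one uncompressed LC-MS map.
# All three numbers below are illustrative assumptions.
points_per_spectrum = 1_000_000  # "up to millions of points per spectrum"
spectra_per_run = 50_000         # "tens of thousands of spectra per LC run"
bytes_per_point = 8              # assumed: float32 m/z + float32 intensity


def map_size_bytes(points, spectra, point_bytes):
    """Size of a map stored as plain (m/z, intensity) pairs."""
    return points * spectra * point_bytes


size = map_size_bytes(points_per_spectrum, spectra_per_run, bytes_per_point)
print(f"{size / 1e9:.0f} GB")  # prints "400 GB"
```

Under these assumptions a single run lands in the "hundreds of GB" range, which is why the later steps reduce peaks to features before aligning maps.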

LC-MS Data (Map)

Quantification (15 nmol/µl, 3x over-expressed, …)

Label-Free Quantification (LFQ)

• Label-free quantification is probably the most natural way of quantifying

– No labeling required, which removes a further source of error; no restriction on sample generation; cheap

– Data on different samples are acquired in different measurements, so higher reproducibility is needed

– Manual analysis is difficult

– Scales very well with the number of samples: basically no limit, and no difference in the analysis between 2 and 100 samples

LFQ – Analysis Strategy

1. Find features in all maps

2. Align maps

3. Link corresponding features

4. Identify features (example: GDAFFGMSCK)

5. Quantify (example: 1.0 : 1.2 : 0.5)
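The final quantification step can be sketched as follows. This is only an illustration: the three intensity values are hypothetical, chosen so that they reproduce the 1.0 : 1.2 : 0.5 ratio shown for the peptide GDAFFGMSCK, and normalizing to the first map is an assumption rather than something the slides prescribe.

```python
def relative_ratios(intensities, reference_index=0):
    """Express linked feature intensities relative to one reference map."""
    reference = intensities[reference_index]
    if reference <= 0:
        raise ValueError("reference intensity must be positive")
    return [round(value / reference, 2) for value in intensities]


# Hypothetical intensities of one linked feature in three aligned maps.
feature_intensities = [2.0e6, 2.4e6, 1.0e6]

ratios = relative_ratios(feature_intensities)
print(" : ".join(f"{r:.1f}" for r in ratios))  # prints "1.0 : 1.2 : 0.5"
```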

Feature-Based Alignment

• LC-MS maps can contain millions of peaks

• Retention time of peptides and metabolites can shift between experiments

• In label-free quantification, maps thus need to be aligned in order to identify corresponding features

• Alignment can be done on the raw maps (where it is usually called ‘dewarping’) or on already identified features

• The latter is simpler, as it does not require the alignment of millions of peaks, but just of tens of thousands of features

• Disadvantage: it relies on accurate feature finding
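A feature-based alignment can be sketched as a retention-time correction fitted on matched feature pairs. The example below is a simplified illustration, not the algorithm used by the OpenMS tools: it assumes features are (rt, m/z) tuples, that the shift is linear, and that the tolerances `mz_tol` and `rt_window` (hypothetical parameter names) are suitable for the data.

```python
def match_features(reference, other, mz_tol=0.01, rt_window=60.0):
    """Pair each feature in `other` with the reference feature closest in RT
    among those within the m/z tolerance and a coarse RT window."""
    pairs = []
    for rt_o, mz_o in other:
        candidates = [
            (abs(rt_r - rt_o), rt_r)
            for rt_r, mz_r in reference
            if abs(mz_r - mz_o) <= mz_tol and abs(rt_r - rt_o) <= rt_window
        ]
        if candidates:
            pairs.append((rt_o, min(candidates)[1]))
    return pairs


def fit_linear(pairs):
    """Least-squares fit rt_reference = slope * rt_other + intercept.
    Needs at least two pairs with distinct retention times."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data: the second map elutes 10 s later than the reference.
reference = [(100.0, 500.25), (200.0, 600.30), (300.0, 700.35)]
shifted = [(110.0, 500.25), (210.0, 600.30), (310.0, 700.35)]

slope, intercept = fit_linear(match_features(reference, shifted))
aligned = [(slope * rt + intercept, mz) for rt, mz in shifted]
```

Only the few matched features enter the fit, which is the practical advantage over dewarping millions of raw peaks. Real retention-time shifts are often non-linear, so a production workflow would typically use a more flexible transformation.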

Feature-Based Alignment

[Figure: an example map with ~350,000 peaks, reduced to ~700 features]

Feature Finding

• Identify all peaks belonging to one peptide

• Key idea:

– Identify suspicious regions (e.g. highest peaks)

– Fit a model to that region and identify peaks explained by it

• Extension: collect all data points close to the seed

• Refinement: remove peaks that are not consistent with the model

• Fit an optimal model for the reduced set of peaks

• Iterate this until no further improvement can be achieved
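The seed, extension and refinement loop can be illustrated on a single chromatographic trace. This is a toy sketch under strong assumptions: the model is a Gaussian elution profile with a fixed, assumed width (`expected_sigma`), only its centre and height are fitted, and the names `window` and `rel_tol` are hypothetical. Feature finders in practice also model the isotope pattern in the m/z dimension.

```python
import math


def fit_model(points):
    """Estimate apex height and centre (intensity-weighted mean RT)."""
    total = sum(i for _, i in points)
    centre = sum(rt * i for rt, i in points) / total
    height = max(i for _, i in points)
    return height, centre


def model_intensity(rt, height, centre, sigma):
    """Gaussian elution profile evaluated at one retention time."""
    return height * math.exp(-((rt - centre) ** 2) / (2 * sigma ** 2))


def find_feature(trace, window=10.0, expected_sigma=2.0, rel_tol=0.2, max_iter=20):
    # Seed: the highest data point in the trace.
    seed_rt, _ = max(trace, key=lambda p: p[1])
    # Extension: collect all data points close to the seed.
    region = [p for p in trace if abs(p[0] - seed_rt) <= window]
    for _ in range(max_iter):
        height, centre = fit_model(region)
        # Refinement: drop points that the current model does not explain.
        kept = [
            (rt, i) for rt, i in region
            if abs(i - model_intensity(rt, height, centre, expected_sigma))
            <= rel_tol * height
        ]
        if len(kept) == len(region) or not kept:
            break  # no further improvement
        region = kept
    return region, fit_model(region)


# Toy trace: a Gaussian peak at rt = 50 plus one inconsistent point at rt = 58.
trace = [(float(rt), 100.0 * math.exp(-((rt - 50) ** 2) / 8.0)) for rt in range(40, 61)]
trace[18] = (58.0, 30.0)

feature, (height, centre) = find_feature(trace)
```

In this toy run the inconsistent point is removed in the first refinement pass, and the refit on the reduced set moves the estimated centre back to about 50.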

Multiple Alignment

• Dewarp k maps onto a comparable coordinate system

• Choose one map (usually the one with the largest number of features) as the reference map (here: map 2, so T2 = 1, i.e. the identity transformation)

[Figure: maps 1, 2, …, k (axes: rt, m/z) are mapped by transformations T1, T2, …, Tk onto a common consensus map]
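Choosing the reference map and collecting corresponding features into a consensus map can be sketched as below. This is an illustrative simplification, not the OpenMS feature-linking algorithm: it assumes the maps are already aligned in retention time, that features are (rt, m/z, intensity) tuples, and that the tolerances `rt_tol` and `mz_tol` (hypothetical names) are appropriate.

```python
def choose_reference(maps):
    """Index of the map with the largest number of features."""
    return max(range(len(maps)), key=lambda k: len(maps[k]))


def link_features(maps, rt_tol=5.0, mz_tol=0.01):
    """Group, for each reference feature, the closest feature of every other
    map that lies within the RT and m/z tolerances."""
    ref = choose_reference(maps)
    consensus = []
    for rt_r, mz_r, int_r in maps[ref]:
        intensities = {ref: int_r}
        for k, features in enumerate(maps):
            if k == ref:
                continue
            hits = [
                (abs(rt - rt_r), inten)
                for rt, mz, inten in features
                if abs(rt - rt_r) <= rt_tol and abs(mz - mz_r) <= mz_tol
            ]
            if hits:
                intensities[k] = min(hits)[1]
        consensus.append({"rt": rt_r, "mz": mz_r, "intensities": intensities})
    return ref, consensus


# Toy data: three maps; the second one has the most features.
maps = [
    [(100.0, 500.25, 2.0e6)],
    [(101.0, 500.25, 2.4e6), (200.0, 600.30, 1.0e6)],
    [(99.0, 500.25, 1.0e6)],
]

ref, consensus = link_features(maps)
```

Here the second map (index 1) becomes the reference, the first consensus feature carries one intensity per map, and the second consensus feature is found only in the reference map.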

LFQ with OpenMS in KNIME

• Identification

• Feature finding and mapping

• Map alignment

• Feature linking

• Statistical analysis with R Snippets

• Visualization with KNIME plotting nodes

Preprocessing of single maps

Combining information of maps

Statistical post-processing and visualization