Visual Programming for Metabolomics KNIME Stephan Beisken


Visual Programming for Metabolomics

KNIME

Stephan Beisken


Visual Programming

• “Visual programming languages enable physicians and other computer users with little knowledge of programming to develop computer software. The physician uses a visual paradigm to "draw" the computer interface and then attaches short segments of computer code to buttons, menus, and list boxes.”

Ebell, M. H. (1993). Visual programming languages. M.D. Computing : Computers in Medical Practice, 10(5), 305–11.


Motivation

• Simplify your (working) life

• Data processing and analysis require various tools working together in sequence

• Data input and output

• Spreadsheets

• Data transformation

• Transposition, aggregation, string manipulation

• IsaCreator

• Formatting of tables


Agenda

• Introduction

• Tutorial

• Installation and Extensions

• Overview of the Workbench

• Nodes and Table Models

• Exercises

• Introductory Examples

• MassCascade

• OpenMS

• XCMS

• Slides, software, workflows, and data for takeaway


Disclaimer

• Workflows are great

• It does not have to be KNIME, there are many other solutions

• Every method that captures information in a consistent manner and enables reproducibility is great

• Transparency

• Ability to share data and ‘everything’ that was done to the data


Who is already a KNIME user?


Introduction

• KNIME: Konstanz Information Miner

• http://www.knime.org/

• Developed at University of Konstanz in Germany

• Desktop version available free of charge (open source)

• Modular platform for building and executing workflows using predefined components: nodes

• Core functionality available for tasks such as data mining, analysis, and manipulation

• Extra features and functionality available in KNIME through extensions from various groups (community) and vendors

• Written in Java based on the Eclipse SDK platform


Workflow Concepts

• Workflow execution

• Can execute complex, multi-step operations on input data

• Can be run by “non-experts” using predefined parameter templates ensuring optimal results

• Can be set up for specific measurement systems

• Can be shared across researchers


Functionality

• Data manipulation and analysis

• File & database I/O, sorting, filtering, grouping, joining, pivoting

• Data mining and machine learning

• R, WEKA, KNIME, interactive plotting

• Cheminformatics

• Conversions, similarity, clustering, (Q)SAR analysis, etc.

• Scripting integration

• R, Perl, Python, Matlab, Octave, Groovy

• Reporting and much more

• Bioinformatics, HTS & image analysis, network & text mining

• Marketing, big data and business analytics


Modules (Community Extensions)

• http://tech.knime.org/community

• Chemoinformatics

• CDK (EMBL-EBI), RDKit (Novartis), Indigo (GGA), ErlWood (Eli Lilly), Enalos (NovaMechanics)

• ChEMBL and ChEBI (EMBL-EBI)

• Bioinformatics

• OpenMS (Tübingen, ETH Zurich)

• MassCascade (EMBL-EBI)

• HCS (MPI), NGS (Konstanz), Image analysis

• Integration

• Python, Perl, R, Groovy, Matlab (MPI), PDB web services client (Vernalis), REST and SOAP web service support


Workflow Platforms


Applications


Applications cont.


Applications cont.


Applications cont.


Applications cont.

[Figure: panels labelled Calibration and Regression]


Advantages

• Intuitive to use

• No or little programming experience required

• Good for prototyping

• Lots of functionality

• Very modular and flexible

• Active community

• Extensible

• Visual feedback

Disadvantages

• Steep learning curve

• Resource greedy

• No (free) server edition

• Slower execution than standalone scripts


Installation

• Download and unzip KNIME

• No further setup required

• ./knime.ini contains arguments for launch

• Install new modules (nodes) from update sites

• Explorer and installation wizard provided

• Workflows and data are stored in a workspace

• ~/<user>/knime/workspace

• C:\Users\<user>\knime\workspace

• Preferences in: File → Preferences → KNIME


Workbench

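[Screenshot: KNIME workbench with the following areas labelled]

• Workflow editor

• Console

• Outline

• Tabs

• Node description

• Node repository

• Workflow projects

• Favorite nodes

• Public server

• Toolbar buttons: Auto-layout, Execute, Execute all nodes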


Nodes

Title

Icon

Input port(s) – on the left of icon

Output port(s) – on the right of icon

Status display (‘traffic lights’)

• Red (not ready)

• Amber (ready)

• Green (executed)

• Blue bar during execution (with percentage or flashing)

Sequence number

Right-click menu: to configure and execute the node, display the output views, edit the node, and display data for the ports

• Node: Basic processing unit of a workflow

• performs a particular task
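
The node life cycle described on this slide can be illustrated with a small Python sketch. This is a toy model only and not KNIME's API: the `Status`, `Node`, and `run_workflow` names, and the two example nodes, are assumptions made up for illustration. It shows a node moving from red to amber once configured and to green once executed, with each node's output feeding the next node's input.

```python
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    RED = "not ready"    # node still needs to be configured
    AMBER = "ready"      # configured, waiting to be executed
    GREEN = "executed"   # output is available at the output port


class Node:
    """Hypothetical node: one task, one input port, one output port."""

    def __init__(self, title: str, task: Callable[[list], list]):
        self.title = title
        self.task = task
        self.status = Status.RED
        self.output: Optional[list] = None

    def configure(self) -> None:
        self.status = Status.AMBER

    def execute(self, table: list) -> list:
        if self.status is Status.RED:
            raise RuntimeError(f"{self.title}: configure the node first")
        self.output = self.task(table)
        self.status = Status.GREEN
        return self.output


def run_workflow(nodes: List[Node], table: list) -> list:
    """Pass the table through each node in sequence."""
    for node in nodes:
        table = node.execute(table)
    return table


# Two illustrative nodes: drop non-positive rows, then sort the rest.
keep_positive = Node("Row Filter", lambda rows: [r for r in rows if r > 0])
sort_rows = Node("Sorter", sorted)
for n in (keep_positive, sort_rows):
    n.configure()

result = run_workflow([keep_positive, sort_rows], [3, -1, 2])
```

Executing a node that is still red raises an error in this sketch, which mirrors the idea that a node must be configured before it can run.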


Dialogs

• Double-click opens configuration dialogs

• Explicit column types


Tables

• Column specifications

• Table rows

• Various renderers

• Column types


Exercises: Preliminaries

• Pre-installed KNIME Desktop 2.9.1

• Workflows

• starters, xcms, openms, masscascade

• Data

• FAAH knockout LC/MS data

• ESB tomato LC/MS QC data

• ChEBI SDFile, KEGG SDFile

• Plug-Ins (more in About KNIME → Installation Details)

• R (interactive)

• Erl Wood, CDK

• OpenMS, MassCascade


Exercises: Installation

• Open your KNIME directory

• ~/Desktop/knime_2.9.1

• ./knime.exe

• Memory allocation

• ./knime.ini
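
KNIME's memory allocation is usually governed by the Java maximum heap option (`-Xmx`) in `knime.ini`, an Eclipse-style launcher file in which JVM options follow the `-vmargs` line. The following is a minimal Python sketch of changing that value. The `set_max_heap` helper and the sample file contents are hypothetical, and a real `knime.ini` contains more entries than shown here.

```python
from typing import List


def set_max_heap(ini_text: str, size: str) -> str:
    """Return ini_text with the JVM maximum heap option set to -Xmx<size>."""
    lines: List[str] = ini_text.splitlines()
    new_option = f"-Xmx{size}"
    for i, line in enumerate(lines):
        if line.strip().startswith("-Xmx"):
            lines[i] = new_option  # replace the existing heap setting
            break
    else:
        # No -Xmx entry yet: JVM options have to come after -vmargs.
        if "-vmargs" not in [entry.strip() for entry in lines]:
            lines.append("-vmargs")
        lines.append(new_option)
    return "\n".join(lines) + "\n"


# Made-up sample contents, for illustration only.
sample = "-vmargs\n-Xms256m\n-Xmx1024m\n"
updated = set_max_heap(sample, "2048m")
```

In practice the same change can be made by opening `knime.ini` in a text editor and editing the `-Xmx` line by hand; KNIME needs a restart for it to take effect.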


Exercises: Starters

• More examples available from the Examples repository


Exercises: MassCascade

https://bitbucket.org/sbeisken/masscascadeknime/wiki/ExampleWorkflows


Exercises: OpenMS

http://ftp.mi.fu-berlin.de/OpenMS/release-documentation/OpenMS_tutorial.pdf


Final Remarks

• Workflows can make exploratory or repetitive data tasks easier and save time

• Extensive data pre-processing functionality

• Extensions for statistics, machine learning, bio-, and cheminformatics

• Integration of R (XCMS) and spectrometry extensions can help you to build elaborate pipelines and share work

• Can help to organize one’s thoughts.

• It’s actually quite a bit of fun.


Resources

• KNIME Forum

• http://www.knime.org/

• KNIME Learning Hub

• http://www.knime.org/learning-hub

• Quickstart Guide

• http://tech.knime.org/files/KNIME_quickstart.pdf

• Happy to Help

[email protected]


Q&A