The MoSGrid Portal – A workflow-enabled Grid Portal for Molecular Simulations
Sandra Gesing
Center for Bioinformatics, University of Tübingen
Grant: 01IG09006
28.04.2010




Outline

• Motivation
• MoSGrid (Molecular Simulation Grid)
• The MoSGrid portal
• Domain specific workflows
• MSML (Molecular Simulation Markup Language)
• Future work


Motivation

• Numerous applications for molecular simulations and docking, e.g.
  • Materials science
  • Structural biology
  • Drug design
• Sophisticated tools and algorithms support scientists
• High-performance computing facilities are available


Motivation

Drawbacks of using molecular simulations and docking
• Usability of tools is limited
• Complexity of methods
• Lack of graphical user interfaces
• Complexity of infrastructures
• Many end users lack computer science background

⇒ Need for self-explanatory and intuitive user interfaces
⇒ A portal for molecular simulations and docking


Portals

• Single point of entry
• Possibility to customize views and tools
• Store user preferences
• No installation of software on the user’s side
• No firewall issues


Unifying Diversity

[Figure: raw nucleotide sequence listing, positions 12181 to 12781]

Slide copied from: Stuart Owen, "Workflows with Taverna"


MoSGrid

Molecular Simulation Grid (D-Grid project)

Goal
Providing users with Grid access to molecular simulation tools and docking tools via a workflow-enabled portal

• Implementation of high-performance computing
• Workflows
• Annotations of results
• Data mining
• Use of the D-Grid infrastructure


MoSGrid Partners

• Universität zu Köln
• Eberhard-Karls-Universität Tübingen
• Universität Paderborn
• Konrad-Zuse-Zentrum für Informationstechnik Berlin
• Technische Universität Dresden
• Technische Universität Dortmund
• Bayer Technology Services GmbH, Leverkusen
• Origines GmbH, Martinsried
• GETLIG&TAR, Falkensee
• BioSolveIT, Sankt Augustin
• COSMOlogic GmbH & Co. KG, Leverkusen


MoSGrid in a Nutshell

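[Diagram: the portal (WS-PGRADE) sits on the high-level middleware service level (gUSE), which reaches the Grid resources through UNICORE 6; XtreemFS serves as the cloud file system. Further labels in the figure: Workflow, Recipe, Structure, Result.]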


Credential Management

• User management based on Liferay features
  - Community management
  - Organization management
• X.509 user certificates
• SAML (Security Assertion Markup Language)
  - Minimizes credential data transfers
  - A maximum number of hops can be set for trust delegation
  - Usable for single sign-on infrastructures (e.g., Shibboleth)


Credential Management


WS-PGRADE


gUSE Architecture

gUSE (grid User Support Environment), layers from top to bottom:
• User interface: WS-PGRADE
• High-level middleware service layer: gUSE
  • Workflow engine
  • Workflow storage
  • Application repository
  • Information system
  • Logging
  • Submitters
• Grid resources middleware layer: UNICORE 6


gUSE Submitter

Interface GridService
• actionJobSubmit
• actionJobAbort
• actionJobOutput
• actionJobStatus
• actionJobResource

[Diagram: the workflow engine hands its jobs (JOB1 to JOBn) to a submitter that implements GridService.]
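The role of the GridService interface can be illustrated with a minimal sketch. The slide only names the five actions; the Python rendering, the method signatures, the `InMemorySubmitter` class and the `submit_all` helper are all assumptions made for illustration and are not the gUSE API.

```python
from abc import ABC, abstractmethod


class GridService(ABC):
    """Hypothetical rendering of the five actions named on the slide."""

    @abstractmethod
    def action_job_submit(self, job_id): ...

    @abstractmethod
    def action_job_abort(self, job_id): ...

    @abstractmethod
    def action_job_output(self, job_id): ...

    @abstractmethod
    def action_job_status(self, job_id): ...

    @abstractmethod
    def action_job_resource(self, job_id): ...


class InMemorySubmitter(GridService):
    """Toy submitter that only records job state; it contacts no middleware."""

    def __init__(self, resource="local-test"):
        self._resource = resource
        self._status = {}
        self._output = {}

    def action_job_submit(self, job_id):
        self._status[job_id] = "running"
        self._output[job_id] = ""

    def action_job_abort(self, job_id):
        self._status[job_id] = "aborted"

    def action_job_output(self, job_id):
        return self._output[job_id]

    def action_job_status(self, job_id):
        return self._status[job_id]

    def action_job_resource(self, job_id):
        return self._resource


def submit_all(submitter, job_ids):
    """Stand-in for the workflow engine: it talks only to the interface."""
    for job_id in job_ids:
        submitter.action_job_submit(job_id)


submitter = InMemorySubmitter()
submit_all(submitter, ["JOB1", "JOB2"])
submitter.action_job_abort("JOB2")
```

Because the stand-in engine depends only on the abstract interface, a submitter for a different middleware could be swapped in without changing it, which is the design idea the diagram suggests.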


gUSE Submitter for UNICORE

[Diagram: inside gUSE, the workflow engine hands its jobs (JOB1 to JOBn) to the UNICORE submitter (UCC lib), which addresses the UNICORE Atomic Services and the Uspace on the UNICORE 6 resources.]

Steps triggered by actionJobSubmit:
1 - Security
2 - Registry
3 - Submit job
4 - Upload data
5 - Start job
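The five numbered steps behind actionJobSubmit can be sketched as an ordered sequence that stops at the first failure. Only the step names come from the slide; the handler mechanism and the stop-on-failure behaviour are assumptions for illustration and do not describe the actual UCC library calls.

```python
# Step names are taken from the slide; everything else is a hypothetical sketch.
SUBMIT_STEPS = ("security", "registry", "submit job", "upload data", "start job")


def run_submission(handlers):
    """Call one handler per step in slide order and stop at the first failure.

    `handlers` maps each step name to a zero-argument callable that returns
    True on success. The function returns the list of completed steps.
    """
    completed = []
    for step in SUBMIT_STEPS:
        if not handlers[step]():
            break
        completed.append(step)
    return completed


# Stand-in handlers: every step succeeds except the data upload.
handlers = {step: (lambda: True) for step in SUBMIT_STEPS}
handlers["upload data"] = lambda: False
completed = run_submission(handlers)
```

The ordering reflects the slide: the job is registered on the resource before its input data is uploaded, and it is started only after the upload.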


ASM (Application Specific Module)
• Library for managing WS-PGRADE workflows
• Listing of users and workflows in the local repository
• Import of workflows into the user space
• Upload/download of input and output files
• Setting the parameters of a job in a workflow
• Submission of workflows
• Monitoring of workflows
• Deletion of workflows
• Usable in portlets and Java tools

⇒ Implicit use of gUSE submitter


Distributed Data Management

• XtreemFS is an object-based grid and cloud file system
• Ability to minimize data transfer
• Low latency, local availability through replication
• Grid Security Infrastructure (GSI) support


Distributed Data Management

• XtreemFS integration
  • Portlet
  • UNICORE
  • GSI support
• Data flow
  • WS-PGRADE
  • XtreemFS
  • Frontend nodes
  • Compute nodes
• UNICORE mediates data transfers

[Diagram: XtreemFS connected to the UNICORE TSI]


Domain Molecular Dynamics

• Study and simulation of molecular motion
• Provide a molecular dynamics service on multiple levels
• Direct upload of job descriptions
• Workflows and standard recipes for repeating tasks
• Analysis of relevant properties


Equilibration of Proteins

• Proteins from databases (e.g., the Protein Data Bank, PDB) do not necessarily represent a near-native conformation/configuration

• For all kinds of production runs, minimization and equilibration are indispensable prerequisites

• Eases the work of experienced users
• Lowers the hurdle for novice users


UseCase: Gromacs_EQ

[Workflow diagram, listed in figure order]

Inputs: structure (pdb/gro), topology (top/itp), EM.mdp (mdp), FULL.mdp (mdp)

• pdb2gmx → structure (pdb)
• editconf → box (pdb)
• genbox → solvated structure (pdb), adjusted topology (top/itp)
• grompp (EM.mdp) → topol.tpr, mdout.mdp
• mdrun → ener.edr, traj.trr, traj.xtc, md.log, state.cpt, SYSTEM_EM.pdb
• g_energy, xmgrace → Analysis.jpg
• grompp (FULL.mdp) → topol.tpr, mdout.mdp
• mdrun → ener.edr, traj.trr, traj.xtc, md.log, state.cpt, SYSTEM_EQ.pdb
• g_energy, xmgrace → Analysis.jpg
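The Gromacs_EQ use case is a chain of tools in which each step consumes files produced by an earlier one. A minimal sketch of that idea, assuming the dependencies read off the figure labels, models the workflow as a directed acyclic graph and derives a valid execution order with Python's standard `graphlib`. The `_em` and `_eq` suffixes are added here only to tell the repeated tools apart; nothing in this sketch reflects how WS-PGRADE stores workflows.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on. Tool names follow the
# slide; the dependency edges are an assumption read off the figure.
DEPENDENCIES = {
    "pdb2gmx": set(),
    "editconf": {"pdb2gmx"},
    "genbox": {"editconf"},
    "grompp_em": {"genbox"},
    "mdrun_em": {"grompp_em"},
    "g_energy_em": {"mdrun_em"},
    "grompp_eq": {"mdrun_em"},
    "mdrun_eq": {"grompp_eq"},
    "g_energy_eq": {"mdrun_eq"},
}


def execution_order(dependencies):
    """Return one order in which every step runs after its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
```

In this graph the energy analysis of the minimization run and the preparation of the equilibration run both depend only on the first mdrun, so a workflow engine would be free to run them in parallel.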


MD Portlet


Domain Quantum Chemistry

• Study and simulation of the electronic behavior of molecules in relation to their chemical reactivity

• Survey - MoSGrid Community
• First implementation for Gaussian
• Then support for
  • Turbomole
  • GAMESS-US
  • Further relevant QC applications


Domain Quantum Chemistry

Gaussian Jobs
• Single input file
  • Defines molecular geometry and task
• Result
  • Unstructured output
  • Platform-dependent checkpoint file
• Integrated multi-step job option
  • Not usable for generalized workflows


Domain Quantum Chemistry

First prototype
• Workflow controlled by portlet
• Three phases
  • Pre-processing
  • Job execution
  • Post-processing


Domain Quantum Chemistry

Assisted job creation
• Guiding GUI
• Most common options available

Pre-created job description
• Upload of Gaussian job description file

Monitoring of jobs

Post-processing and presentation of results

Workflows


Domain Quantum Chemistry

Pre-processing
• Portlet (GUI) supports common options
• Automatic generation of job description
• Submission of job
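Automatic generation of a job description can be sketched as assembling a Gaussian input file from a few form fields. The layout used below (optional `%chk` line, route line starting with `#`, blank line, title, blank line, charge and multiplicity, Cartesian coordinates, closing blank line) is the standard Gaussian input layout; the function name, its parameters and the water geometry are illustrative assumptions, not the portlet's code.

```python
def gaussian_input(title, route, charge, multiplicity, atoms, checkpoint=None):
    """Assemble a Gaussian input file as a string.

    `atoms` is a list of (element, x, y, z) tuples in Angstrom.
    """
    lines = []
    if checkpoint:
        lines.append(f"%chk={checkpoint}")
    lines.append(f"# {route}")
    lines.append("")                      # blank line ends the route section
    lines.append(title)
    lines.append("")                      # blank line ends the title section
    lines.append(f"{charge} {multiplicity}")
    for element, x, y, z in atoms:
        lines.append(f"{element:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")                      # blank line ends the geometry
    return "\n".join(lines) + "\n"


# Illustrative water geometry; the coordinates are example values only.
water = [
    ("O", 0.000, 0.000, 0.117),
    ("H", 0.000, 0.757, -0.469),
    ("H", 0.000, -0.757, -0.469),
]
job_text = gaussian_input("Water optimization", "HF/6-31G(d) Opt", 0, 1, water)
```

A GUI that exposes only the most common options would fill in `route`, `charge` and `multiplicity` from form fields and leave everything else to defaults.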


Domain Quantum Chemistry

Post-processing
• Parsing of result file
• Python scripts executed by portlet
• Relevant information about molecular properties
• Data saved in CSV format and accessible
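The post-processing step can be illustrated with a small parser that pulls SCF energies out of an unstructured Gaussian log and writes them as CSV. This is a sketch, not the project's script: it assumes the usual `SCF Done:` line layout of Gaussian output, and the sample log lines and energy values below are invented for the example.

```python
import csv
import io
import re

# Matches lines such as " SCF Done:  E(RHF) =  -76.01  A.U. after   10 cycles".
SCF_PATTERN = re.compile(
    r"SCF Done:\s+E\(([^)]+)\)\s+=\s+(-?\d+\.\d+)\s+A\.U\. after\s+(\d+) cycles"
)


def extract_scf_energies(log_text):
    """Return (method, energy in Hartree, cycles) for every SCF Done line."""
    return [
        (match.group(1), float(match.group(2)), int(match.group(3)))
        for match in SCF_PATTERN.finditer(log_text)
    ]


def to_csv(rows):
    """Serialize the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["method", "energy_hartree", "cycles"])
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample log fragment, used only to exercise the parser.
SAMPLE_LOG = """\
 SCF Done:  E(RHF) =  -76.0098706       A.U. after   10 cycles
 Some unrelated output line
 SCF Done:  E(RHF) =  -76.0107465       A.U. after    8 cycles
"""

rows = extract_scf_energies(SAMPLE_LOG)
csv_text = to_csv(rows)
```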


Domain Docking

• CADDSuite (Computer-aided Drug Design)


Domain Docking

• Galaxy available for local resources in Tübingen


MolDB

• Stores molecules in binary format, which allows for fast export
• Automatically creates and stores canonical SMILES, fingerprints, and functional group counts for imported molecules
• Automatically saves and restores docking and rescoring results
• DB can be filtered by all stored molecule properties before exporting molecules
• Current speed for import/export: ~100 compounds/sec.


MSML

Molecular Simulation Markup Language
• Based on CML (Chemical Markup Language)
• Common interpretation by humans and computers
• Follows the minimum information principle
• Description: http://xml-cml.org/convention/dictionary
• XSL transformation
• Used for validation purposes: validator.xml-cml.org
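Since MSML builds on CML, a plain CML molecule gives a rough idea of the kind of document involved. The sketch below writes a molecule with the standard CML elements `molecule`, `atomArray` and `atom` using Python's `xml.etree.ElementTree`. The slides do not show any MSML-specific elements, so none are included; the helper name and the water coordinates are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


def build_cml_molecule(molecule_id, atoms):
    """Return a CML <molecule> document as a string.

    `atoms` is a list of (element, x, y, z) tuples; atom ids a1, a2, ...
    are generated here and are not prescribed by the slides.
    """
    ET.register_namespace("cml", CML_NS)
    molecule = ET.Element(f"{{{CML_NS}}}molecule", {"id": molecule_id})
    atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for index, (element, x, y, z) in enumerate(atoms, start=1):
        ET.SubElement(
            atom_array,
            f"{{{CML_NS}}}atom",
            {
                "id": f"a{index}",
                "elementType": element,
                "x3": str(x),
                "y3": str(y),
                "z3": str(z),
            },
        )
    return ET.tostring(molecule, encoding="unicode")


# Illustrative water geometry; the coordinates are example values only.
cml_text = build_cml_molecule(
    "water",
    [("O", 0.0, 0.0, 0.117), ("H", 0.0, 0.757, -0.469), ("H", 0.0, -0.757, -0.469)],
)
```

Keeping the structure in such a machine-readable form is what allows the same file to be validated, transformed via XSL, and still read by a person.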


Future Work

• WS-PGRADE
  • Integration of the UNICORE IDB to offer drop-down boxes of available tools
• MD and QC portlets
  • Adaptation to the gUSE workflow engine via the ASM libraries
• CADDSuite
  • Export of workflows from Galaxy to WS-PGRADE
• MSML
  • Further development


Involved Projects

SHIWA (SHaring Interoperable Workflows for Large Scale Scientific Simulations on Available DCIs)
• EU project
• Duration: 01.07.2010 – 30.06.2012
• Tübingen participates via Galaxy workflow export

CompChem Virtual Organization
• EGEE project
• Available resources


Future Projects

SCI-BUS (SCIentific gateway Based User Support)
• EU project
• Duration: 01.10.2011 – 30.09.2014
• Pan-European resources
• Tübingen participates by extending the MoSGrid portal with an interactive molecule editor and a semantic search


Acknowledgements
• Oliver Kohlbacher
• Ákos Balaskó
• Georg Birkenheuer
• Sebastian Breuers
• Richard Grunzke
• Sonja Herres-Pawlis
• Valentina Huber
• Miklos Kozlovszky
• Jens Krüger
• István Márton
• Patrick Schäfer
• Bernd Schuller
• Johannes Schuster
• Anna Szikszay Fabri
• Klaus-Dieter Warzecha
• Martin Wewior
