RCAC Research Computing Presents: DiaGrid Overview
Tuesday, September 24, 2013
INFORMATION TECHNOLOGY AT PURDUE
Agenda
• What is DiaGrid?
• What can DiaGrid do for me today?
• How can I make DiaGrid work for me?
• User Experience (Prof. Wen Jiang, Biological Sciences)
• Q&A
• Posters
• Meet the team
What is DiaGrid?
Carol Song
Scientific Solutions
RCAC
• Science-as-a-Service
• Online applications and tools, no software download or installation
• Interactive, graphical user interface
• Access to large computing resources, workflow engine
• 50,000 HTCondor pool for high-throughput computing
• Nodes in the community clusters for parallel computation
• Instant access: no forms, no wait
• Supported infrastructure, 24x7 availability
• Feedback for tools, datasets, and other publications
• Collaboration and dissemination platform
• Group, Project, Forum, Wiki, Sharing, Publishing, etc.
What can DiaGrid do for me today?
Brian Raub
Scientific Solutions
RCAC
Tools available today
What tools can I use today?
• BLASTer
• SubmitR
• CryoEM
• GROMACSIMUM
• CESM
BLASTer
• BLAST: popular tool to scan genomes for target sequences
• Searches can contain thousands of sequences
– Split the input file, because all sequences are independent
• Greatly improves search speed with the help of HTCondor
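The splitting idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not BLASTer's actual implementation: the function names `read_fasta`, `split_records`, and `to_fasta` and the round-robin chunking are assumptions made for the example. Because each query sequence is searched independently, every chunk could be handed to a separate HTCondor job and the per-chunk results concatenated afterwards.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_records(records, n_chunks):
    """Deal records round-robin into at most n_chunks non-empty chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [records[i::n_chunks] for i in range(n_chunks)]
    return [chunk for chunk in chunks if chunk]


def to_fasta(records):
    """Serialize records back to FASTA text (one chunk per job input)."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


if __name__ == "__main__":
    sample = ">q1\nACGT\nTTGA\n>q2\nGGCC\n>q3\nATAT\n"
    for i, chunk in enumerate(split_records(read_fasta(sample), 2)):
        print(f"--- chunk {i} ---")
        print(to_fasta(chunk), end="")
```

Round-robin dealing keeps chunk sizes within one record of each other; a production splitter might instead balance by total sequence length.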
SubmitR
• Allows users to submit their R scripts for execution on the community clusters
• Supports different execution modes
– Single: one process
– Parallel: multiple processes communicating with each other
– Sweep: many isolated processes with different parameters, inputs, or both
SubmitR
• Users upload and run scripts without having to know technical details of where and how R is executing
• Supports a wide range of R libraries:
– snow/snowfall
– Rmpi
– rgdal
– Biobase
– RSQLite
• Request more libraries!
– Submit a ticket on DiaGrid to request libraries currently unavailable
CryoEM
• Analyzes images taken at cryogenic temperatures within an electron microscope to reveal the microscopic structure of samples
• First user-developed tool for DiaGrid
• Image processing is a good candidate for parallelization
CryoEM
• DiaGrid staff helped adapt CryoEM to the HUB environment
• Split tasks for image breakdown analysis (HTCondor)
• Reassembled the images for 3D visualization using MPI
GROMACSIMUM
• GROMACS: designed to perform molecular dynamics simulations
• First DiaGrid tool to modify and improve an existing open source project
– Extends GROMACS GUI and jSimMacs with new features for high-performance computing
GROMACSIMUM
• A unified interface for all GROMACS simulation tools.
• Advanced project management system.
• A powerful molecular design and 3D protein visualization tool.
• Access your models and data from anywhere in the world.
CESM
• Global climate model coupling many aspects of Earth sciences research
• First DiaGrid tool to provide access to an existing Purdue gateway
– Purdue developed the CESM web gateway and designed it to support multiple interfaces
– Provides an alternate interface to the CESM gateway service from within the HUB environment
How can I make DiaGrid work for me?
Rob Campbell
Scientific Solutions
RCAC
What are my options?
• Run an existing tool
• Use your existing code
• Create a new tool
• Let us create - or adapt - a tool for you
How can I use code I already have?
Command line? Use Rappture… (“Rapid APPlication infrastrucTURE”)
Graphical User Interface (GUI)? Enable it to run on DiaGrid…
What is Rappture?
• Toolkit - makes it easy to develop a GUI for scientific modeling code
• Describe your code’s inputs and outputs; Rappture automatically builds the GUI
• Rappture API: get input values, save results; bindings for many programming languages
• Embed Rappture in your code - or create a wrapper script around your code
• Users see standard graphical controls plus line graphs, contour plots, 3D isosurfaces, …
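The "wrapper script around your code" idea can be illustrated with a standard-library-only Python sketch. This is not the Rappture API and the XML layout below is a simplified, hypothetical description, not Rappture's actual schema; the names `DRIVER_XML`, `model`, and `run_wrapper` are made up for the example. A real tool would use the Rappture bindings for its language to read inputs and save results. The sketch only shows the flow: read declared input values, call the existing code, and write the result back for display.

```python
import xml.etree.ElementTree as ET

# Hypothetical description of one input and an empty output section.
# The element names are illustrative assumptions only.
DRIVER_XML = """
<run>
  <input>
    <number id="temperature"><current>300</current></number>
  </input>
  <output/>
</run>
"""


def model(temperature_k):
    """Stand-in for existing scientific code: thermal energy kT in eV."""
    boltzmann_ev_per_k = 8.617333e-5
    return boltzmann_ev_per_k * temperature_k


def run_wrapper(driver_text):
    """Read the input value, run the model, and record the result as XML."""
    root = ET.fromstring(driver_text)
    current = root.find("./input/number[@id='temperature']/current")
    temperature = float(current.text)

    # Store the computed value under the output section.
    result = ET.SubElement(root.find("output"), "number", id="kT")
    ET.SubElement(result, "current").text = f"{model(temperature):.6f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(run_wrapper(DRIVER_XML))
```

Keeping the scientific code in its own function (`model` here) means the wrapper can change without touching the code it wraps.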
How will my code run on DiaGrid?
• Needs cluster resources or parallel execution? Tap into HPC resources via “submit”
• Input and output files? Upload to remote server via sftp, WebDAV, “importfile”
• Pulls data from external sites or databases? We can open a connection
• Relies on a graphics card for acceleration? Will function, with a performance difference
• Requires Windows or Mac? Tools run in a Linux/X11 environment; GUI toolkits and Wine are available
What can “submit” do?
• Complex job scheduling made easy
• Long runners, parallel processing, parameter sweeps
• Splits out sweep runs (derives param. combos)
1. Gathers files
2. Transports to HPC resource
3. Schedules & watches job(s)
4. Returns results
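Deriving parameter combinations for a sweep can be sketched as a Cartesian product over the swept values. This is a minimal illustration under assumptions, not how “submit” is implemented: the function name `expand_sweep` and the parameter names are hypothetical. Each resulting dictionary would correspond to one isolated run.

```python
from itertools import product


def expand_sweep(parameters):
    """Return one dict per run, covering every combination of values.

    `parameters` maps a parameter name to the list of values to sweep.
    """
    names = list(parameters)
    value_lists = (parameters[name] for name in names)
    return [dict(zip(names, values)) for values in product(*value_lists)]


if __name__ == "__main__":
    # Hypothetical sweep: 3 temperatures x 2 pressures = 6 runs.
    sweep = {"temperature": [280, 300, 320], "pressure": [1.0, 2.0]}
    for run_id, params in enumerate(expand_sweep(sweep), start=1):
        print(run_id, params)
```

The number of runs is the product of the list lengths, so adding a swept parameter multiplies the job count.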
How can I create something new?
Use Rappture - or a familiar development environment…
C/C++ Perl Python Ruby TCL More…
C/C++ Fortran Java MATLAB Octave
Can you make a tool for me?
Yes! We can…
Adapt your existing code
Or, start with your choice of open source packages
Or, build a tool from scratch based on your specifications
User Experience
Professor Wen Jiang
Biological Sciences