SGE Training NASA LaRC ASDC Delivered May 5,6,7 2009 Chris Dwan Bioteam

Page 1

SGE Training
NASA LaRC ASDC

Delivered May 5, 6, 7 2009
Chris Dwan

Bioteam

Page 2

Bioteam Inc.
  Independent consulting shop
  Vendor/technology agnostic
  Staffed by:
    Scientists forced to learn High Performance IT
    Many years of industry & academic experience
  Our specialty: bridging the gap between Science & IT

Page 3

Session Goals

Page 4

Interactive / Small Group Goals
  1 - 2 hours
  1 - 5 people
  Users log into systems.
  Users type examples, run jobs.
  If code is available, bring it.
  If specific use cases exist, bring them.

Page 5

Selected ASDC Systems

Page 6

Selected ASDC Systems

Apple Cluster
  Online and in use at SCF since 2007
  ~40 dual processor OS X systems (80+ CPUs)
  Access through manila and corregidor

Magneto
  ~28 quad core Linux servers (100+ CPUs)
  Online and in production use since 2006

New Magneto (ORR May 15)
  Large, mixed purpose Linux cluster / file store
  176 CPUs dedicated to SCF
  576 CPUs dedicated to production
  Disk based archive: 1.1PB

Page 7

Apple Cluster Access:
  LDAP account
  manila or corregidor

Page 8

NASA LaRC Science Directorate

Picture taken 9/2/08
1.2PB usable space
Fibre connected (384+ fibre ports)
2,560 individual disk drives
  16 disks per chassis
  10 chassis per rack
  16 racks of disks
IBM Linux servers, mixed P6 and x86 CPUs to support legacy codes
Filesystem: IBM GPFS

Page 9

Operational Readiness Review
Mid May 2009

Stay Tuned

Page 13

Interactive hosts

Page 14

Sun Grid Engine

Technical Introduction

Page 15

Please do not copy, put online or redistribute

Most “grids” look like this on paper…

[Diagram labels: Private Network, Local Area Network, Portal node(s), Dedicated File services, Compute Nodes]

Page 16

… and in reality:

Page 17

… and in reality:

Page 18

… and in reality:

Page 19

Sun Grid Engine History
http://blogs.sun.com/templedf/entry/a_little_history_lesson

1996:
  Codine 4.02
  Grid Resource Director (GRD) 1.0
2000:
  SGE 5.2. Sun acquires Gridware Inc.
2001:
  SGE 5.3. Sun releases source code
  Last version called GRD
2004:
  SGE(EE) vs. SGE N1GE vs. SGE

Page 20

Sun Grid Engine References

http://gridengine.sunsource.net/
  Generally, the user manuals are awful

http://gridengine.info/
  Very useful blog run by Chris Dagdigian

My slides / examples are going to be online in-house.

Deep, in-house expertise.

Page 21

Compute Farm Logical View

[Diagram labels: User 1 … User N, Cluster Network, Distributed Resource Manager]

Page 22

Grid Engine does the following:
  Accepts work requests (jobs) from users
  Puts jobs in a pending area
  Sends jobs from the pending area to the best available machine
  Manages the job while it runs
  Returns results and logs accounting data when the job is finished
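
The steps above show up in the state column that `qstat` prints for each job. The sketch below maps the commonly seen state codes (`qw`, `hqw`, `t`, `r`, `Eqw`) onto those steps. The function name `describe_state` and the wording of the descriptions are illustrative assumptions, not Grid Engine output, and it runs without Grid Engine installed.

```shell
#!/bin/sh
# Map a qstat state code to the lifecycle step it roughly corresponds to.
# describe_state is a hypothetical helper; only the state codes come from qstat.
describe_state() {
  case "$1" in
    Eqw) echo "error: job could not be started, see qstat -j <job_id>" ;;
    hqw) echo "pending: held until a hold or dependency is released" ;;
    qw)  echo "pending: waiting for a suitable machine" ;;
    t)   echo "dispatching: being sent to the selected machine" ;;
    r)   echo "running: managed by Grid Engine on a compute node" ;;
    "")  echo "finished: no longer listed by qstat, look it up with qacct" ;;
    *)   echo "other: see the qstat man page for state $1" ;;
  esac
}

# Walk through the usual path of a job: pending, dispatched, running, gone.
for state in qw t r ""; do
  printf '%-5s %s\n' "${state:-none}" "$(describe_state "$state")"
done
```

A job that has disappeared from `qstat` has either finished or been deleted; `qacct -j <job_id>` shows the accounting record once it has been written.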

Page 23

Huh? What you need to know:
  Don’t worry about queues or specific machines.
  All you need to do when submitting a job is describe the resources your job will need to run successfully.
  Grid Engine will take care of the rest.
  The ‘default’ settings are good enough for most cases.
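
As a minimal sketch of "describe the resources", a job script can carry its requests in `#$` lines, which `qsub` reads and the shell ignores as comments. The job name, the run time and the memory figure below are made-up values, and whether `h_rt` and `mem_free` are requestable (and with what limits) depends on how the site is configured. Note that no queue or host is named anywhere.

```shell
#!/bin/sh
#$ -N resource_demo       # job name (hypothetical)
#$ -cwd                   # run in the directory the job was submitted from
#$ -j y                   # merge stderr into stdout
#$ -S /bin/sh             # shell used to run this script
#$ -l h_rt=00:10:00       # assumed request: at most 10 minutes of wall clock time
#$ -l mem_free=1G         # assumed request: a host with about 1 GB of free memory

# JOB_ID and NSLOTS are set by Grid Engine when the job runs; the defaults
# let the script also be run by hand for a quick check.
job_id="${JOB_ID:-not-under-sge}"
slots="${NSLOTS:-1}"
summary="job ${job_id} on $(uname -n) with ${slots} slot(s)"
echo "$summary"
```

Saved as, say, `resource_demo.sh`, it would be submitted with `qsub resource_demo.sh`. The same requests can be given on the command line instead, for example `qsub -l h_rt=00:10:00 resource_demo.sh`.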

Page 24

Most useful SGE commands

qsub / qdel
  Submit jobs & delete jobs
qstat & qhost
  Status info for queues, hosts and jobs
qacct
  Summary info and reports on completed jobs
qrsh
  Get an interactive shell on a cluster node
  Quickly run a command on a remote host
qmon
  Launch the X11 GUI
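
For reference, one typical invocation of each command is listed below. The script only prints the command lines, so it is safe to run on a machine without Grid Engine. The script name `myjob.sh`, the job id `12345` and the helper `sge_cheatsheet` are placeholders, not names from the ASDC systems.

```shell
#!/bin/sh
# Print typical invocations of the commands listed above; nothing is executed.
job_script="myjob.sh"   # hypothetical job script
job_id="12345"          # hypothetical job id, as reported by qsub

sge_cheatsheet() {
  cat <<EOF
qsub ${job_script}        # submit a job script
qdel ${job_id}            # delete a pending or running job
qstat                     # list your pending and running jobs
qstat -f                  # full listing, including queue instances
qhost                     # load and memory of the execution hosts
qacct -j ${job_id}        # accounting record of a finished job
qrsh                      # interactive shell on a cluster node
qrsh uname -n             # run one command on a node chosen by Grid Engine
qmon                      # start the X11 GUI
EOF
}

sge_cheatsheet
```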

Page 25

Examples

Page 26

Live Examples
  Single job
  Single job with resource requirements
  Job dependency
  Task array job
  Demand a whole compute node
  Consumable resources
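
The first four examples might look roughly like the sketch below. This is not the material used in the session: the script names, job names and the one hour limit are invented, and the `submit` wrapper and its `SGE_LIVE` switch are hypothetical helpers that print each command instead of running it unless explicitly enabled. Demanding a whole node and consumable resources rely on resources defined by the site administrators, so they are left out here.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical): prints the qsub command line unless
# SGE_LIVE=1 is set and qsub is actually installed.
submit() {
  if [ "${SGE_LIVE:-0}" = "1" ] && command -v qsub >/dev/null 2>&1; then
    qsub "$@"
  else
    echo "qsub $*"
  fi
}

# Single job, run from the current working directory.
submit -N prep -cwd prep.sh

# Single job with a resource requirement (assumed: one hour of wall clock time).
submit -N analyze -cwd -l h_rt=01:00:00 analyze.sh

# Job dependency: report stays on hold until the job named analyze has finished.
submit -N report -cwd -hold_jid analyze report.sh

# Task array job: ten tasks, each of which sees its own index in SGE_TASK_ID.
submit -N sweep -cwd -t 1-10 sweep.sh
```

Inside `sweep.sh`, each task would typically use `$SGE_TASK_ID` to pick its own input, for example a file named `input.$SGE_TASK_ID`.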
