Condor Week 2007 Windows Condor Pool at Clemson University Sebastien Goasguen School of Computing and Clemson Computing Information Technology (CCIT) May 2nd 2007

Condor Week 2007

Windows Condor Pool at Clemson University

Sebastien Goasguen

School of Computing and Clemson Computing Information Technology (CCIT)

May 2nd 2007

Clemson basics

• Public land grant institution founded in 1889
• ~13,000 undergrads and ~4,500 grads
• ~1,300 faculty members

New commitment to computing as the backbone of research and teaching

Clemson Computing DNA

• Clemson has put computing at the core of its mission
– New CIO: Jim Bottum
– New CTO: Jim Pepin
– New School of Computing, search in progress for a school director, three division leaders and, a couple of months later, six assistant professors
• Building traditional HPC from scratch
– No prior involvement in HPC support
– No trained staff for either system administration or application
• Infrastructure and hardware are there or coming
– 20,000 sq ft of raised floor, new power coming straight from the nuclear plant, 3 cents per kWh
– 10 Gbps connection being worked on, NLR 6 miles away from machine room. ~$1.5M SCLR approved by board of trustees last week
– Above 10 Tflops in the works through various sources: automotive research center (Michelin, BMW…), faculty community cluster and Provost support
• All hands on deck to build the CI campus of tomorrow

Building CI at Clemson

• The Fabric layer
– HPC resources -> Clusters
– Campus Grid -> Condor
– Sharing of resources -> OSG
• The middleware layer
– Deploy services interface to our resources -> WS
– Increase identity management capabilities for sharing -> Gridshib
• The application layer
– New environments for students
– New environments for researchers
• -> “Portal”, “Gateways”, Desktop applications, other…
• A social layer
– Raising awareness on campus
– Fulfilling expectations of faculty

Teaching CI

Where to start?

• No HPC resources
• No expertise in HPC or grids
• “What works, is reliable and free?…”
• “Let’s do a Windows Condor pool, and let’s join OSG”

Randy Martin, David Atkinson, Matt Rector

Results

• Built a pool of ~1,000 machines and got usage within 4 months
• Learned Condor installation, administration, debugging
• Experience improved our management of the Windows machines
– More efficient lab image distribution. The current in-house developed method of distribution takes days to distribute image changes to all PCs… Also need to specify machine ads on image.
– Eliminate need for 2am image refreshes on each lab PC.
• Outreach to the whole campus
• Got familiar with grid software and operation, used VDT
• Attending Condor Week.

Details

• 1085 Windows machines, 2 Linux machines (central manager and an OSG gatekeeper), Condor reporting 1563 slots
• 845 maintained by CCIT
• 241 from other campus depts
• >50 locations
• From 1 to 112 machines in one location
• Student housing, labs, library, coffee shop
• Mary Beth Kurz, first Condor user at Clemson:
– March: 215,000 hours, ~110,000 jobs
– April: 110,000 hours, ~44,000 jobs
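
The usage figures on this slide can be turned into a few derived numbers. The following is a rough Python sketch, not part of the original talk. It assumes the quoted "hours" are compute hours accumulated over the whole calendar month and treats the approximate job counts as exact; the function names are illustrative.

```python
# Figures quoted on the slide for the first Condor user at Clemson.
usage = {
    "March": {"hours": 215_000, "jobs": 110_000, "days": 31},
    "April": {"hours": 110_000, "jobs": 44_000, "days": 30},
}

def hours_per_job(month):
    """Average compute hours per job (job counts on the slide are approximate)."""
    entry = usage[month]
    return entry["hours"] / entry["jobs"]

def avg_busy_slots(month):
    """Average number of slots kept busy around the clock.

    Assumption: the quoted hours were accumulated over the full calendar month.
    """
    entry = usage[month]
    return entry["hours"] / (entry["days"] * 24)

for month in usage:
    print(month, round(hours_per_job(month), 2), round(avg_busy_slots(month)))
```

Under these assumptions a job averaged about 2 hours in March and 2.5 hours in April, and the March workload alone would have kept roughly 289 of the 1563 reported slots busy full time.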

The world before Condor

• 1800 input files
• 3 alternative genetic algorithm designs
• 50 replicates desired
• Estimated running time on 3.2 GHz machine with 1 GB RAM: 241 days

Slides from Dr. Kurz
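
To put these numbers in perspective, here is a back-of-the-envelope sketch in Python. It assumes the 241-day estimate covers every combination of input file, algorithm design, and replicate (the slide does not say so explicitly) and that the runs are independent. The figure of 200 concurrent jobs is the one reported later in the talk; the variable names are illustrative.

```python
# Back-of-the-envelope sizing of the workload described on the slide.
input_files = 1800
designs = 3
replicates = 50
serial_days = 241  # estimated on one 3.2 GHz machine with 1 GB RAM

# Every input file is run with every design, for every replicate.
total_runs = input_files * designs * replicates

# Assumption: the 241-day estimate covers all of those runs.
seconds_per_run = serial_days * 24 * 3600 / total_runs

def ideal_days(concurrent_slots):
    """Wall-clock days if runs spread evenly over the slots, ignoring overhead."""
    return serial_days / concurrent_slots

print(total_runs)
print(round(seconds_per_run, 1))
print(round(ideal_days(200), 2))
```

Under these assumptions the study is 270,000 short runs of about 77 seconds each, which is why it suits a pool of desktop machines: at 200 concurrent slots the ideal wall-clock time drops from 241 days to a little over one day, before scheduling and file-transfer overhead.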

First submit file attempt
Monday noon-ish

• Used the documentation and examples at Wisconsin Condor site and created:

    Universe   = vanilla
    Executable = main.exe
    log        = re.log
    output     = out.$(Process).out
    arguments  = 1 llllll-0
    Queue

• Forgot to specify Windows and Intel and also to transfer the output back (thanks David Atkinson)

• Got a single submit file to run 2 specific input files by mid-afternoon Tuesday

Slides from Dr. Kurz

Tuesday 6 pm – submitted 1800 jobs in a Cluster

    Universe = vanilla
    Executable = MainCondor.exe
    requirements = Arch=="INTEL" && OpSYS=="WINNT51"
    should_transfer_files = YES
    transfer_input_files = InputData/input$(Process).ft
    whenToTransferOutput = ON_EXIT
    log = run_1/re_1.log
    output = run_1/re_1.stdout
    error = run_1/re_1.err
    transfer_output_remaps = "1.out = run_1/opt1-output$(Process).out"
    arguments = 1 input$(Process)
    queue 1800

• 200 ran at a time, but that eventually got resolved

Slides from Dr. Kurz
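
The submit description on this slide hard-codes the number 1 in its file names and arguments. As a hypothetical illustration only, the Python sketch below renders the same description for other values, assuming the recurring `1` is a run index that varies between submissions (the slides do not say what it denotes). `render_submit` and the template are illustrative names, not part of Condor, and the submit commands are copied verbatim from the slide.

```python
from string import Template

# Template mirroring the submit description shown on the slide.
# "$$" is an escaped dollar sign, so "$$(Process)" renders as Condor's
# "$(Process)" macro, while "${run}" and "${count}" are filled in by Python.
SUBMIT_TEMPLATE = Template("""\
Universe = vanilla
Executable = MainCondor.exe
requirements = Arch=="INTEL" && OpSYS=="WINNT51"
should_transfer_files = YES
transfer_input_files = InputData/input$$(Process).ft
whenToTransferOutput = ON_EXIT
log = run_${run}/re_${run}.log
output = run_${run}/re_${run}.stdout
error = run_${run}/re_${run}.err
transfer_output_remaps = "${run}.out = run_${run}/opt${run}-output$$(Process).out"
arguments = ${run} input$$(Process)
queue ${count}
""")

def render_submit(run, count=1800):
    """Return submit-description text for one run index (hypothetical helper)."""
    return SUBMIT_TEMPLATE.substitute(run=run, count=count)

if __name__ == "__main__":
    print(render_submit(1))
```

Calling `render_submit(1)` reproduces the description from the slide; the text could then be written to a file and passed to `condor_submit`.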

Wednesday afternoon: Love notes

Slides from Dr. Kurz

Type of jobs being worked on

• GROMACS, molecular dynamics
• Cygwin
• Java universe
• Text mining for Medline database
• Matlab / Octave in Cygwin
• LIDAR data analysis, FDTD, Neural Networks
• OSG, nanoHUB jobs, GLOW

“Currently, we are conducting large bio text mining on the Condor. The Medline database of U.S. National Library of Medicine includes over 17 million citations of life science journals for biomedical articles back to 1950s. Our research focuses on mining relationship between ten thousands genes, chemicals and hundreds of diseases from Medline database. The Condor provides us a platform for the quick, parallel search the Medline database.”

Dr. Feng Luo, School of Computing

Open Science Grid

Join the national infrastructure

Use the national infrastructure

Contribute resources (hardware and human)

Ease of installation through VDT

Firewall Issues

• A couple of years ago, after Blaster and co., Clemson put every machine behind a firewall.
• Globus ephemeral ports closed
– Cannot send Globus job from my desktop

[Diagram labels: desktop, osggate, ssh, Condor-C]

nanoHUB Internals

Globus enabled resources

GT2 or GT4 WSRF

Gateway machine

Initializes trusted proxy

Condor-G submit

Local Virtual Machines

Condor-C submit

PBS Submit

VNC redirect

Sessions managed by InVIGO-Lite

Static set of VMs

VIOLIN Virtual Cluster

Evolution of Science Gateways for Virtual Organizations

[Figure labels, in extraction order: Remote Resources; SSH - Direct Access; Least interactions; gsissh and/or Web Services; Web Services; Social interactions; Information Age; Fully Service Oriented Architecture/Semantic Grid Technologies; Web Services; Social immersion; Interaction Age]

Next-generation: Socially immersive science gateways

Work being led by Prof. Madhavan with CCIT collaboration

Conclusions

• Clemson has made computing a priority
• Condor is the first “CI” project at Clemson
• OSG is a close second
• Condor has already impacted Clemson researchers
• Clemson hopes to contribute to the community
• NSF seems happy…

• Thanks to the Condor team!!

• Acknowledgements: Randy Martin, David Atkinson, Matt Rector, Mike Gossett, John Minor, Matt Saltzmann, Mary-Beth Kurz, Feng Luo