User Support, Campus Integration, OSG XSEDE Rob Gardner OSG Council Meeting June 25, 2015

149. Present to Council 1 page document on "Enabling Campus Resource Sharing and use of remote OSG resources in 15 minutes" - Rob Gardner, Frank

Enabling Campus Sharing & Use of OSG

● Clemson helping drive this development
● Two track strategy to integrate Palmetto resource and user community
  ○ Track 1: “light, quick”
    ■ Submit from Palmetto to OSG, and back
    ■ “Quick Connect” → OSG Connect to Palmetto via hosted Bosco service (ssh)
  ○ Track 2: “full OSG capabilities”
    ■ Full HTCondor CE, OASIS+Squid, {StashCache}

Working document: goo.gl/9aNkJs

Track 1: OSG Connect to Clemson-Palmetto

● Hosted service @ OSG Connect
● Addressed OrangeFS + Condor file locking

Current opportunistic limit on Palmetto: capped at 500 jobs due to a PBS Pro limitation that prevents Clemson users in the general pool (non-owners) from preempting OSG users. A fix is expected in the next release of PBS Pro so that OSG jobs can claim additional idle cycles on Palmetto.

submitted from login.osgconnect.net

Track 1: Submit from Clemson-Palmetto

● Download Campus Connect client from GitHub
● Minutes to submission to OSG
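
The slides do not show what a submission looks like. As a rough sketch only, assuming the job is described with a standard HTCondor submit description file, the hypothetical helper `render_submit` below generates a minimal one. The file names and the helper itself are illustrative and are not part of the Campus Connect client.

```python
def render_submit(executable, n_jobs):
    """Render a minimal HTCondor submit description as text.

    Hypothetical helper for illustration; not part of any OSG tool.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # $(Cluster) and $(Process) are standard HTCondor submit macros
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        "log = job.$(Cluster).log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


submit_text = render_submit("analyze.sh", 10)
print(submit_text)
```

The rendered text would be saved to a file and handed to whichever submit command the client provides.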

Campus Connect Client

● lightweight module to manage submission from a campus login host
● heavy lifting done at hosted schedd
● In Year 4, extend to reach, monitor, and account:
  ○ local campus allocation
  ○ XD allocation
  ○ Full integration with campus IDM & signup

Evaluating at:

Longer term: Hosted Campus CE-ssh

● Discussions to establish approach for hosted CE services on behalf of campuses short of manpower
● Quick(er) on-ramp of a campus HPC cluster without requiring local OSG expertise
● @ the campus: provide ssh access, local accounts for supported VOs
● Normal CE operations handled by OSG staff
● Possible “umbrella CE” for small campuses

152. Pay attention to "Sound Bites" that communicate the scale and reach of OSG to outside agencies/projects - Rob G, Bo, Clemmie

Open Science Grid: HTC supercomputer

● 2014 stats
  ○ 67% size of XD, 35% BlueWaters
  ○ 2.5 Million CPU hours/day
  ○ 800M hours/year
  ○ 125M/y provided opportunistic
● >1 petabyte data xfer/day
● 50+ research groups
● thousands of users
● XD service provider for XSEDE

Rudi Eigenmann, Program Director, Division of Advanced Cyberinfrastructure (ACI), NSF CISE

CASC Meeting, April 1, 2015
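
A quick arithmetic check on the quoted figures may help when turning them into sound bites. The sketch below uses only the numbers on the slide. It shows that the daily and yearly figures do not annualize to the same value. The slide does not say why; one possibility (an assumption) is that the daily figure reflects a more recent rate than the 2014 total.

```python
hours_per_year = 800e6          # "800M hours/year"
opportunistic_per_year = 125e6  # "125M/y provided opportunistic"
hours_per_day_quoted = 2.5e6    # "2.5 Million CPU hours/day"

# Share of the yearly total that was provided opportunistically
opportunistic_share = opportunistic_per_year / hours_per_year

# Average daily rate implied by the yearly total
avg_hours_per_day = hours_per_year / 365

# Yearly total implied by the quoted daily rate
annualized_from_daily = hours_per_day_quoted * 365

print(f"opportunistic share: {opportunistic_share:.1%}")
print(f"average per day from yearly total: {avg_hours_per_day:,.0f}")
print(f"annualized from daily figure: {annualized_from_daily:,.0f}")
```

On these numbers, opportunistic cycles are roughly one sixth of the yearly total.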

Lowering barriers to usability

OSG as a campus research computing cluster

★ Login host
★ Job scheduler
★ Software (modules)
★ Storage
★ Tools

Software & tools on the OSG

● Distributed software file system
● Special module command
  ○ identical software on all clusters
● Common tools & libs
● Curate on demand continuously
● HTC apps in XSEDE campus bridging yum repo

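$ switchmodules oasis    # switch to the OASIS-hosted module tree
$ module avail           # list available software
$ module load R
$ module load namd
$ module purge           # unload all loaded modules
$ switchmodules local    # switch back to the local module tree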

http://goo.gl/TlLq1M

Modules now used at most sites

All software accesses by module are monitored centrally for support purposes

User Tools

● $ tutorial
● $ connect
  ○ on login.osgconnect.net, on campus, or laptop
● $ module (software, all OASIS enabled sites)
● $ stash-cp (Stash to job, in development)

Education and Training assets

● Helpdesk with community forum and knowledge base

● GitHub seen as strategy for formal management of user documentation
  ○ Markdown tutorials → same place as code
  ○ tutorial write-ups track code samples closely
  ○ automatic conversion to HTML and upload to helpdesk (in seconds)

● Expect to announce helpdesk support.opensciencegrid.org this week

Code and Markdown managed in GitHub

Content indexed, searchable

User sees personal history of support requests to OSG, and can drill down to see full interaction history. Staff can make private notes, or link to a Jira issue for technical support tracking.

Of course, all available via email: [email protected]

Can also send a direct message (DM) on Twitter to @osgusers, which generates a ticket

Uber-like feedback is collected (except we don’t rate users :)

IPython Notebooks

● Software Carpentry includes a section on scientific programming using Python. IPython Notebook is used for instruction.
● SWC typically asks users to install IPython on laptops; this is a top source of delays and confusion.
● In our DHTC edition of SWC, we already have a multiuser server with login accounts that users retain indefinitely.
● Idea: use this framework to provide a shared IPython, establishing a common baseline for the toolchain.

IPython Notebook Service

Developed a platform to launch per-user IPython Notebook servers:

1. User visits http://ipython.osgconnect.net and logs in.

2. Server launches pre-configured IPython Docker container. Docker provides user and data isolation. Containers can be shut down and re-instantiated on demand.

3. Within moments, a newly provisioned IPython instance is available. Notebook storage is persistent and accessible via login.osgconnect.net.
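
The launch step can be sketched as follows. This is a minimal illustration, not the service's actual code: the helper `docker_run_args`, the image name `example/ipython-notebook`, the container naming scheme, and the storage path `/srv/notebooks` are all hypothetical. Only the `docker run` flags (`-d`, `--name`, `-p`, `-v`) and the default notebook port 8888 are standard.

```python
def docker_run_args(username, host_port,
                    image="example/ipython-notebook",  # hypothetical image name
                    notebook_root="/srv/notebooks"):   # hypothetical host path
    """Build a `docker run` command line for one user's notebook container."""
    return [
        "docker", "run",
        "-d",                                 # run detached
        "--name", f"ipynb-{username}",        # one named container per user
        "-p", f"{host_port}:8888",            # map a host port to the notebook port
        "-v", f"{notebook_root}/{username}:/notebooks",  # persistent per-user storage
        image,
    ]


args = docker_run_args("alice", 9001)
print(" ".join(args))
```

Mounting a per-user host directory is one way to keep notebooks after a container is shut down and re-instantiated. The slides state that storage is persistent but do not describe the mechanism.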

Education and Training activities

● Working with Tim Cartwright and Lauren Michael (ACI-REF) to support 2015 OSG User School
● UChicago-Northwestern roundtable (postponed to “Fall”)
● OSG-SWC @ Duke, October 26-29 (tentative dates)

Joint Software Carpentry & Open Science Grid Workshop at Duke University

Distributed high throughput computation is concerned with using many computing resources, potentially spread over large geographic distances and shared between organizations. These could be university research computing clusters, national leadership class HPC facilities, or public cloud resources. Incorporating these into science workflows can dramatically benefit your research program. However, getting the most out of these systems requires some knowledge and skill in scientific computation. This workshop extends basic instruction on Linux programming from the Software Carpentry series with concepts and exercises on distributed high throughput computation. Participants will use resources of the Scalable Computing Support Center as well as the Open Science Grid, a national supercomputing-scale high throughput computing facility. There will be experts on hand to answer questions about distributed high throughput computing and whether it is a good fit for your science.

Other Campus Outreach Events

● Internet2 Technology Exchange, October 4-7, Cleveland (formal decision next week)
  ○ Distributed High Throughput Computation: a Campus Roundtable Discussion (Research Track)
● Rocky Mountain Advanced Computing Consortium, HPC Symposium (Aug 11-13, Boulder)
  ○ 30 minute slot shared with XSEDE
● XSEDE15, CLUSTER15 (Campus Bridging)

OSG as XD Provider to XSEDE

OSG XD - Last 12 months

Project Name | PI | Institution | Field of Science | Allocation | Wall Hours
--- | --- | --- | --- | --- | ---
TG-IBN130001 | Donald Krieger | University of Pittsburgh | Biological Sciences | Research | 54,881,313
TG-CHE140110 | John Stubbs | University of New England | Chemistry | Research | 1,047,897
TG-DMR130036 | Emanuel Gull | University of Michigan | Materials Science | Research | 563,106
TG-PHY120014 | Qaisar Shafi | University of Delaware | Physics and astronomy | Research | 309,036
TG-CHE140098 | Paul Siders | University of Minnesota; Duluth | Chemistry | Research | 88,047
TG-CHE130091 | Paul Siders | University of Minnesota; Duluth | Chemistry | Startup | 58,086
TG-MCB140160 | David Rhee | Albert Einstein College of Medicine | Molecular and Structural Biosciences | Startup | 39,517
TG-AST140088 | Francis Halzen | University of Wisconsin-Madison | High Energy Physics | Startup | 30,850
TG-CHE140094 | John Stubbs | University of New England | Chemistry | Startup | 27,057
TG-OCE130029 | Yvonne Chan | University of Hawaii; Manoa | Ocean Sciences | Startup | 22,007
TG-IRI130016 | Joseph Cohen | University of Massachusetts; Boston | Information Robotics and Intelligent Systems | Startup | 20,401
TG-DMR140072 | Adrian Del Maestro | University of Vermont | Materials Science | Startup | 20,179
TG-OCE140013 | Yvonne Chan | University of Hawaii; Manoa | Ocean Sciences | Research | 19,861
TG-AST150012 | Gregory Snyder | Space Telescope Science Institute | Mathematical Sciences | Startup | 18,099
TG-MCB090163 | Michael Hagan | Brandeis University | Molecular and Structural Biosciences | Research | 10,676
TG-DEB140008 | Robert Toonen | University of Hawaii; Manoa | Biological Sciences | Startup | 4,147
TG-TRA130011 | John Chrispell | Indiana University of Pennsylvania | Other | Campus Champions | 1,578
TG-MCB140232 | Alan Chen | SUNY at Albany | Molecular Biosciences | Startup | 598
TG-SEE140006 | Sheila Kannappan | University of North Carolina; Chapel Hill | Physics and astronomy | Educational | 46
TG-CDA100013 | Mark Reed | University of North Carolina; Chapel Hill | Mathematical Sciences | Campus Champions | 6
TG-CCR120041 | Luca Clementi | San Diego Supercomputer Center | Computer and Information Science and Engineering | Startup | 1
Total | | | | | 57,162,509
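
As a consistency check on the table, the sketch below sums the listed wall hours and compares the result with the stated total. The listed rows come to one hour less than the stated total. That gap may come from rounding of fractional hours in the source report, though this is an assumption. The sketch also computes the share of the largest project.

```python
# Wall hours per project, in the order listed in the table
wall_hours = [
    54_881_313, 1_047_897, 563_106, 309_036, 88_047, 58_086, 39_517,
    30_850, 27_057, 22_007, 20_401, 20_179, 19_861, 18_099, 10_676,
    4_147, 1_578, 598, 46, 6, 1,
]
stated_total = 57_162_509

row_sum = sum(wall_hours)
difference = stated_total - row_sum

# Share of the stated total used by the largest project (TG-IBN130001)
top_share = wall_hours[0] / stated_total

print(f"sum of rows: {row_sum:,}")           # 57,162,508
print(f"stated total: {stated_total:,}")     # 57,162,509
print(f"difference: {difference}")           # 1
print(f"largest project share: {top_share:.1%}")
```

On these numbers a single project accounts for about 96% of the OSG XD wall hours over the period.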

OSG XD: June XRAC Meeting (Nashville)

● OSG pledges 2M CPU-hours (SUs) per quarter
● There were 199 requests for XSEDE resources, mostly for Stampede and Comet
● There were no requests for OSG resources
● Post meeting, the following were granted:
  ○ 50k SU to Kettimuthu/ANL (CS: workflow modeling)
  ○ 100k SU to Qin/Spellman (class on gene networks)
  ○ 1.39M SU to Gull/UMich (PHYS: condensed matter)
● Many NAMD requests
  ○ → start a MD-HTC activity with ACI-REF?
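
To put the post-meeting grants in context, the sketch below compares them with the quarterly pledge. It assumes that all three grants count against a single quarter's pledge, which the slide does not state.

```python
pledge_per_quarter = 2_000_000  # "2M CPU-hours (SUs) per quarter"

# Post-meeting grants as listed on the slide, in SUs
grants = {
    "Kettimuthu/ANL": 50_000,
    "Qin/Spellman": 100_000,
    "Gull/UMich": 1_390_000,
}

granted = sum(grants.values())
fraction_of_pledge = granted / pledge_per_quarter
remaining = pledge_per_quarter - granted

print(f"granted: {granted:,} SU")                     # 1,540,000 SU
print(f"fraction of pledge: {fraction_of_pledge:.0%}")
print(f"remaining: {remaining:,} SU")                 # 460,000 SU
```

Under that assumption the three grants would take up roughly three quarters of one quarter's pledge.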

Conclusions & Outlook

● “Clemson on the air”
  ○ Local submit to OSG validated at scale of 1500 jobs
  ○ Joint use of campus and OSG resources in same work environment
    ■ Model for other campuses, ACI-REF as channel
  ○ Quick Connect to share resources is functional
● HTC training materials now formally managed
● Helping users via XSEDE
  ○ plan detailed studies of common application scaling properties & potential conversion to HTC workflow

[email protected]

@osgusers