STAR Grid Activities, OSG and Beyond
D. Olson (a) for the STAR Collaboration
The STAR Grid Team: W. Betts (b), L. Didenko (b), T. Freeman (c), P. Jakl (b), L. Hajdu (b), E. Hjort (a), K. Keahey (c), J. Lauret (b), D. Olson (a), A. Rose (a), I. Sakrejda (a), A. Sim (a)
(a) LBNL, (b) BNL, (c) ANL
Olson, STAR Grid Activities, ISGC2008, 9 Apr 2008
Abstract
We will present the ongoing grid efforts of the STAR experiment within the Open Science Grid (OSG) and beyond, as well as the integration of resources in Europe, Asia and South America. STAR is a founding member of the OSG Consortium and has several functioning resources on OSG: its main facilities at BNL/RCF and LBNL/NERSC, as well as Wayne State University and the University of Birmingham. Additional resources are in the process of connecting to OSG. Many distributed resources used by STAR collaborators employ grid or grid-inspired technologies; common examples are the use of grid job submission tools with SUMS, the STAR standard workload service, and the use of data handling and transfer tools across grids. To make the most of heterogeneous resources while minimizing in-house platform support effort, we are thoroughly investigating the dynamic deployment of a reliable data analysis framework, using the STAR-validated software stack in Xen virtual machines and leveraging advanced VM technologies and research from the CEDPS project.
Contents
• Background/History
• Open Science Grid Deployments and Usage
• Other Distributed Computing Usage
• Asian Activities
• Workload Scheduling (SUMS)
• Virtualization & Cloud Computing
• Conclusion
Background/History
• STAR has participated in U.S. grid activities since the early days of the Particle Physics Data Grid (1999) and is a founding member of the Open Science Grid.
• Starting with the involvement of LBNL and BNL, activities now also include collaborators at Wayne State, MIT, Univ. of Chicago, Birmingham, São Paulo, Prague and ANL.
• Additionally
  – SUN Grid, 2007
  – MIT Xgrid, 2006+
  – Xen, Amazon EC2, 2007+
STAR Grid
[Map of STAR Grid sites: PDSF (Berkeley Lab), Brookhaven National Lab, Fermi Lab, University of Birmingham, Wayne State University]
STAR Grid: 90% of its grid resources are part of the Open Science Grid
STAR is also reaching out to other grid resources & projects
[Diagram: Amazon.com, MIT Xgrid, SunGrid, NPI (Czech Republic); themes: interoperability / outreach, virtualization, VDT extension, SRM / DPM / EGEE]
Resources used by STAR
6 main dedicated sites (STAR software fully installed)
• BNL Tier0
• NERSC/PDSF Tier1
• WSU (Wayne State University) Tier2
• BHAM (Birmingham, England) Tier2
• UIC (University of Illinois, Chicago) Tier2
Incoming
• Prague Tier2
Other resources
• FermiGrid: not STAR-dedicated; simulation production at the 10% level
• SunGrid: commercial (free for STAR); event generation at the 1-2% level
• MIT Xgrid cluster: mainly analysis; working on a Globus gatekeeper (GK) for Mac OS X
• Amazon.com EC2 cluster (Elastic Compute Cloud): event generation for now; exercise in Xen-based virtualization at the 1-2% level
BeStMan SRM (Berkeley Storage Manager)
• SRM interface with caching for data transfer
• Used for bulk data transfer as well as asynchronous data placement in the job workflow
• We expect to deploy the BeStMan-Xrootd interface
http://datagrid.lbl.gov/bestman/
OSG Usage
[Plot: OSG usage in process hours per week]
Proof of Principle: Initial Successes and Benefits from OSG
• Year 1 OSG milestone for STAR:
  – Migration of 80% or more of the simulation production to OSG-based operation
• Simulation production: 97% efficiency achieved
  – Exceeds expectations (we targeted a satisfactory level of 75% to 85% success)
• Sites used are not necessarily STAR-dedicated (FermiGrid)
  – In particular, STAR received help from Fermi resources and the FNAL team in June 2007
    • several thousand CPU hours loaned on emergency request
    • as small as it seems, this help made the difference
  – This resource loan worked and is an important proof of principle of the OSG benefit
[Figure: efficiency of job execution via the OSG infrastructure, before and after resubmission]
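The gain from resubmission can be illustrated with a simple model. This is a sketch under a stated assumption, not how STAR computed its 97% figure: each attempt is taken to succeed independently with the same hypothetical probability `p`, so a job retried up to `max_retries` times fails only if every attempt fails, giving an effective efficiency of 1 - (1 - p)^(max_retries + 1). The function names and the value p = 0.75 are illustrative.

```python
"""Illustrative model of how automatic resubmission raises job efficiency.

Assumption (not from the slides): every attempt succeeds independently with
the same probability p, and a failed job is resubmitted up to max_retries times.
"""


def effective_success(p: float, max_retries: int) -> float:
    """Probability that a job succeeds within 1 + max_retries attempts."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    # The job fails overall only if every attempt fails.
    return 1.0 - (1.0 - p) ** (max_retries + 1)


def within_target(eff: float, low: float = 0.75, high: float = 0.85) -> str:
    """Classify an efficiency against the 75%-85% band targeted for Year 1."""
    if eff < low:
        return "below target"
    if eff <= high:
        return "within target"
    return "exceeds target"


if __name__ == "__main__":
    p = 0.75  # hypothetical single-attempt success rate
    for retries in range(3):
        eff = effective_success(p, retries)
        print(f"retries={retries}: efficiency={eff:.4f} ({within_target(eff)})")
```

With p = 0.75 the model gives 0.75, 0.9375 and 0.984375 for zero, one and two retries. Real failures are often correlated (a misconfigured site tends to fail repeatedly), so actual gains can be smaller than this independence model suggests.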
Other grid/distributed activities
• Xgrid at MIT
  – Adam Kocoloski, Michael Miller, Levente Hajdu
  – Mac OS X, 50 desktops
  – Scavenging spare cycles
  – Doing STAR data analysis via SUMS, so the same user interface is used for analysis
  – Xgrid/Globus job manager in testing
• Prague, EGEE Tier2 site
  – Michal Zerola, Pavel Jakl
  – High-performance data transfer using multiple srmcp to DPM in Prague (next slide)
• SUN Grid
  – Production of STAR Geant simulations on SUN utility computing resources
Data transfer to Prague
Parallel srmcp to DPM storage element: 700 Mbps with 20 threads
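The parallel-transfer pattern can be sketched as a pool of worker threads, each running one copy at a time. This is a minimal sketch, not STAR's transfer tooling: `srmcp_copy` assumes only that a plain `srmcp <source> <destination>` command is on the PATH, the other names are hypothetical, and the copy function is injectable so the logic can be tried without a grid setup. On average, 700 Mbps spread over 20 threads is 35 Mbps per stream.

```python
"""Sketch: fan a list of files out over N parallel copy workers.

Mirrors the idea of running ~20 srmcp transfers at once into a DPM storage
element. srmcp_copy assumes a bare `srmcp <source> <destination>` invocation;
parallel_transfer and per_stream_mbps are hypothetical helper names.
"""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

CopyFn = Callable[[str, str], bool]


def srmcp_copy(source: str, destination: str) -> bool:
    """Run one srmcp transfer; True if the command exits with status 0."""
    result = subprocess.run(["srmcp", source, destination])
    return result.returncode == 0


def parallel_transfer(pairs: Iterable[Tuple[str, str]],
                      copy: CopyFn = srmcp_copy,
                      threads: int = 20) -> List[Tuple[str, bool]]:
    """Copy each (source, destination) pair using `threads` concurrent workers."""
    items = list(pairs)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order, so outcomes line up with `items`.
        outcomes = list(pool.map(lambda sd: copy(sd[0], sd[1]), items))
    return [(src, ok) for (src, _), ok in zip(items, outcomes)]


def per_stream_mbps(aggregate_mbps: float, threads: int) -> float:
    """Average bandwidth per stream, assuming the aggregate is shared evenly."""
    return aggregate_mbps / threads


if __name__ == "__main__":
    # 700 Mbps over 20 threads is on average 35 Mbps per stream.
    print(per_stream_mbps(700, 20))
```

Returning a per-file success flag lets a caller resubmit only the failed transfers rather than the whole batch.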
STAR Asian institutions
• China
  – IHEP, Beijing (2)
  – Institute of Modern Physics, Lanzhou (6)
  – USTC, Hefei (14)
  – Shanghai Institute of Applied Physics (11)
  – Tsinghua University (9)
  – Institute of Particle Physics, Wuhan (12)
• India
  – Institute of Physics, Bhubaneswar (4)
  – Indian Institute of Technology, Mumbai (5)
  – University of Jammu (15)
  – Panjab University (5)
  – University of Rajasthan (3)
  – Variable Energy Cyclotron Centre, Kolkata (14)
• Korea
  – Pusan National University (4)
  – KISTI (in progress as CS collaborator)
Asian Activities
• Many collaborators in Asia
• Planning for a Tier2-like facility at PNU
• Discussions with KISTI about a possible Tier1-like facility for the Asia region
• Eager to find ways to better interface and integrate with our Asian collaborators on computational aspects
GLORIAD
• 10 Gb/s all the way through NY
• Would allow immediate transfer of the full dataset
• Would allow transfer of ½ of the dataset in later years
  – Possibly more, depending on GLORIAD expansion
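A back-of-the-envelope estimate shows what a 10 Gb/s path means for moving data. Only the 10 Gb/s line rate comes from the slide; the dataset sizes and the `efficiency` factor are hypothetical inputs, and `transfer_hours` is an illustrative helper.

```python
"""Back-of-the-envelope transfer time over a 10 Gb/s link.

The dataset sizes and link efficiency are hypothetical inputs, not STAR
figures; only the 10 Gb/s line rate comes from the slide.
"""


def transfer_hours(dataset_tb: float, link_gbps: float = 10.0,
                   efficiency: float = 1.0) -> float:
    """Hours to move `dataset_tb` terabytes (10**12 bytes) over the link."""
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("need a positive link rate and 0 < efficiency <= 1")
    bits = dataset_tb * 1e12 * 8                      # bytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)   # usable bit rate
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 100 TB dataset; moving half of it takes half as long.
    print(f"full: {transfer_hours(100):.1f} h, half: {transfer_hours(50):.1f} h")
```

At full line rate a hypothetical 100 TB takes 8×10^14 bits / 10^10 bit/s = 80,000 s, about 22 hours; sustained throughput below line rate stretches this proportionally.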
SUMS
• STAR Unified Meta Scheduler
• A single user interface and framework for submitting to all STAR resources, local and grid flavors
• Optimizes resource utilization
• 25K jobs/day
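The "one interface, many back ends" idea can be sketched as a dispatcher that takes a single job description and routes it to a suitable resource. This is not the SUMS API: `JobRequest`, `Backend`, `dispatch`, the back-end names and the least-loaded selection rule are all hypothetical, chosen only to illustrate the pattern.

```python
"""Sketch of a meta-scheduler routing one job description to many back ends.

All names and the least-loaded policy are hypothetical illustrations.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class JobRequest:
    command: str                      # what the user wants to run
    input_files: List[str]            # data the job will read
    needs_star_software: bool = True  # requires the full STAR environment


@dataclass
class Backend:
    name: str                # e.g. a local batch queue or an OSG gatekeeper
    has_star_software: bool  # STAR software stack installed at the site
    queued_jobs: int = 0

    def submit(self, job: JobRequest) -> str:
        """Pretend to submit; return a back-end-specific job id."""
        self.queued_jobs += 1
        return f"{self.name}:{self.queued_jobs}"


def dispatch(job: JobRequest, backends: List[Backend]) -> str:
    """Send the job to the least-loaded back end that can run it."""
    eligible = [b for b in backends
                if b.has_star_software or not job.needs_star_software]
    if not eligible:
        raise RuntimeError("no back end can run this job")
    target = min(eligible, key=lambda b: b.queued_jobs)
    return target.submit(job)


if __name__ == "__main__":
    pool = [Backend("local-batch", True, 5), Backend("osg-site", True, 1),
            Backend("ec2-vm", False, 0)]
    print(dispatch(JobRequest("run_analysis", ["a.root"]), pool))
```

For scale, 25K jobs/day averages about 17 submissions per minute (25,000 / 1,440 ≈ 17.4), which is why a single front end that hides site differences matters.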
Why Xen? Virtualization?
• SIMULATION = EVENT GENERATION IS EASY…
  – We can all do it…
• BEYOND THAT, the reality
  – Complex experimental application codes
    • Developed over more than 10 years by more than 100 scientists; ~2 M lines of C++ and Fortran code
  – Require complex, customized environments
    • Rely on the right combination of compiler versions and available libraries
    • Dynamically load external libraries depending on the task to be performed
  – Environment validation
    • To ensure reproducibility and result uniformity across environments
    • Regression tests cannot be run on all OS flavors due to simple manpower considerations
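Environment validation amounts to running the same reference job in two environments and checking that the results agree. A minimal sketch, assuming each run can be reduced to named summary quantities: `compare_runs`, the quantity names and the relative tolerance are hypothetical stand-ins for whatever regression outputs a real validation uses.

```python
"""Sketch of cross-environment validation by comparing summary quantities.

Quantity names, values and the tolerance are hypothetical.
"""
import math
from typing import Dict, List


def compare_runs(reference: Dict[str, float], candidate: Dict[str, float],
                 rel_tol: float = 1e-6) -> List[str]:
    """Return a list of problems; an empty list means the runs agree."""
    problems = []
    for key in sorted(reference.keys() | candidate.keys()):
        if key not in candidate:
            problems.append(f"missing in candidate: {key}")
        elif key not in reference:
            problems.append(f"unexpected in candidate: {key}")
        elif not math.isclose(reference[key], candidate[key], rel_tol=rel_tol):
            problems.append(f"{key}: {reference[key]} != {candidate[key]}")
    return problems


if __name__ == "__main__":
    validated_host = {"n_tracks": 1520.0, "mean_pt": 0.4871}  # hypothetical
    vm_image = {"n_tracks": 1520.0, "mean_pt": 0.4871}        # hypothetical
    print(compare_runs(validated_host, vm_image) or "environments agree")
```

Shipping one validated VM image reduces this check to a single environment, instead of repeating it for every OS flavor a site might run.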
Why Xen? Virtualization?
• Solution? Use virtual machines (Xen)
  – Bring your environment with you
  – Fast to deploy, enables short-term leasing
  – Excellent enforcement, performance isolation
  – Very good security isolation
  – Minimizes the experiment team's efforts
• Activity ↔ development effort leveraged through the CEDPS SciDAC partner project
Deploying OSG Cluster as Workspaces
[Diagram: the VWS Service deploying an OSG cluster onto a set of pool nodes]
• Cluster manager can deploy gatekeeper and worker nodes in ~30 min.
• Application workload is submitted to the cluster as to any other OSG CE.
• OSG CE image as gatekeeper
• Worker node images with application environment
• Cluster can be retired after the workload finishes, freeing resources for other applications.
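The deploy, run and retire lifecycle above can be modeled as a small state machine. This is a sketch of the steps on the slide, not the Workspace Service interface: `VirtualCluster`, `State`, the node names and the round-robin job placement are hypothetical.

```python
"""Sketch of the deploy / run / retire lifecycle of a virtual OSG cluster.

Class, method and node names are hypothetical; they model the slide's steps.
"""
from enum import Enum
from typing import List


class State(Enum):
    REQUESTED = "requested"  # cluster asked for, no VMs booted yet
    DEPLOYED = "deployed"    # gatekeeper and worker VMs are running
    RETIRED = "retired"      # VMs released back to the pool


class VirtualCluster:
    """Hypothetical model of an on-demand OSG cluster built from VM images."""

    def __init__(self, workers: int) -> None:
        if workers < 1:
            raise ValueError("need at least one worker node")
        self.workers = workers
        self.state = State.REQUESTED
        self.nodes: List[str] = []

    def deploy(self) -> None:
        """Boot one gatekeeper (OSG CE image) plus the worker-node images."""
        if self.state is not State.REQUESTED:
            raise RuntimeError(f"cannot deploy from state {self.state.value}")
        self.nodes = ["gatekeeper"] + [f"worker-{i}" for i in range(self.workers)]
        self.state = State.DEPLOYED

    def run(self, jobs: int) -> List[str]:
        """Spread jobs round-robin over the worker nodes; return each job's node."""
        if self.state is not State.DEPLOYED:
            raise RuntimeError("cluster is not deployed")
        workers = self.nodes[1:]  # the gatekeeper only accepts submissions
        return [workers[j % len(workers)] for j in range(jobs)]

    def retire(self) -> None:
        """Release all VMs so the pool nodes can host other applications."""
        self.nodes = []
        self.state = State.RETIRED


if __name__ == "__main__":
    cluster = VirtualCluster(workers=3)
    cluster.deploy()
    print(cluster.run(4))
    cluster.retire()
    print(cluster.state.value)
```

Making retirement an explicit state is what lets the same pool nodes be reused for another application's cluster afterwards.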
Virtual Machine activities
• “Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.”
• Work so far:
  – Xen image with OSG 0.6.0 CE on SL 4.4
  – Xen image with OSG 0.6.0 WN on SL 4.4
  – Use Globus Workspaces to deploy gatekeeper and worker nodes on EC2
  – Can launch a 100-node cluster in ~30 min.
  – Have run Hijing event generator simulations on EC2.
  – Have prepared a Xen image with the full STAR software environment on SL 4.4, currently being validated
• Next steps:
  – Run event reconstruction of simulations on EC2 and the Teraport cloud
[Figure: accelerated display of workflow job state at NERSC PDSF, Amazon.com EC2 and WSU; Y = job number, X = job state]
VM image build/maintenance
• We are working with rPath, Inc. in an SBIR project to use rBuilder to efficiently build and maintain OS and application images.
• rBuilder, from the inventors of RPM
  – http://www.rpath.com/rbuilder
  – “rBuilder is the first and only development tool that simplifies and automates the creation of software appliances and virtual appliances. rBuilder combines powerful features with innovative packaging techniques to yield a repeatable appliance creation process.”
Near term plans
• We MUST prepare for real data production on OSG
  – And take ANY shortcut necessary to accomplish it BY 2009
    • The onset of DAQ1000, a data acquisition rate one order of magnitude higher than today, will require additional resources for real-data processing
    • Virtualization appears to us as one development that makes it easy to deploy and run a 2-million-line software framework for data mining
  – UCM job tracking (SBIR with Tech-X) is maturing
    • Essential to engage discussion on integration: we MUST monitor our application
• We have to consolidate our sites
  – More resources are available in STAR but not fully used (BHAM, UIC for example)
    • We will ramp up infrastructure support to achieve this
    • We hope to leverage OSG efforts in the US (UIC for example)
  – We have efforts in integrating Mac OS X resources from MIT
    • Initial work was uniquely started in STAR
    • Is there a path forward? Depends on priorities…
Longer term needs
• Requirements driven by demanding data processing
  – https://twiki.grid.iu.edu/twiki/bin/view/UserGroup/VOApplicationsRequirements#STAR
  – We will need to share resources efficiently
• Concerned about what happens when the LHC has ramped up data taking.
• Will there be any cycles left to be had?
• Additional
  – STAR is expanding its pool of sites
    • Interest in sites possibly shared through EGEE-OSG interoperability (especially China)
    • Hoping for help from OSG to understand policy as well as technology issues.
  – We believe virtualization is “a” path forward toward
    • Simple deployment of experimental software
    • Allowing experimental software developer teams to concentrate on science and minimal OS version support
    • Globus workload management needed
Conclusion
• STAR Grid usage is expanding geographically and functionally.
• Upgrades at STAR and RHIC are driving a significant increase in computational needs beginning next year, which means we MUST push more workload onto the grid.
• The emergence (and convergence?) of VMs, cloud computing and the grid makes a very powerful paradigm for scientific computing.
• We want (and need) greater involvement with our Asia-Pacific colleagues, which new trans-Pacific networks make possible.