Louisiana Tech Site Report
DOSAR Workshop V, September 27, 2007
Michael Bryant, Louisiana Tech University
COMPUTING IN LOUISIANA
Louisiana Tech University and LONI
Computing Locally at LTU
• At the Center for Applied Physics Studies (CAPS):
▫ Small 8-node cluster with 28 processors (60 gigaflops)
▪ Used by our local researchers and the Open Science Grid
▪ Dedicated Condor pool of both 32-bit and 64-bit (with 32-bit compatibility libraries) machines running RHEL5
• Additional resources at LTU through the Louisiana Optical Network Initiative (LONI):
▫ Intel Xeon 5 TF Linux cluster (not yet ready):
▪ 128 nodes (512 CPUs), 512 GB RAM
▪ 4.772 TF peak performance
▫ IBM Power5 AIX cluster:
▪ 13 nodes (104 CPUs), 224 GB RAM
▪ 0.851 TF peak performance
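In a Condor pool that mixes 32-bit and 64-bit machines, each job should declare which architectures it can run on. The sketch below is a minimal, hypothetical Condor submit description file illustrating that; the executable and file names are illustrative, not taken from our actual setup.

```
# Hypothetical submit file for a mixed 32-/64-bit Condor pool.
# Executable and file names are illustrative only.
universe     = vanilla
executable   = analyze.sh
# Match either 32-bit (INTEL) or 64-bit (X86_64) Linux slots
requirements = (Arch == "INTEL" || Arch == "X86_64") && (OpSys == "LINUX")
output       = job.out
error        = job.err
log          = job.log
queue
```

A 32-bit binary can run on the 64-bit nodes because they carry the compatibility libraries, which is why the requirements expression accepts both architectures.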
Louisiana Tech Researchers
• Focused on High Energy Physics, High Availability (HA) and Grid computing, and Biomedical Data Mining
▫ High Energy Physics:
▪ Fermilab (D0), CERN (ATLAS), and ILC: Dr. Lee Sawyer, Dr. Dick Greenwood (Institutional Rep.), Dr. Markus Wobisch
» Joe Steele is now at TRIUMF in Vancouver
▪ Jefferson Lab (G0 and Qweak experiments): Dr. Kathleen Johnston, Dr. Neven Simicevic, Dr. Steve Wells, Dr. Klaus Grimm
▫ HA and Grid computing:
▪ Dr. Box Leangsuksun
▪ Vishal Rampure
▪ Michael Bryant (me)
Louisiana Optical Network Initiative
- http://loni.org

The Louisiana Optical Network Initiative (LONI) is a high-speed computing and networking resource supporting scientific research and the development of new technologies, protocols, and applications to positively impact higher education and economic development in Louisiana.

• Next-generation network for research
• 40 Gb/sec bandwidth state-wide
• Connected to the National LambdaRail (NLR, 10 Gb/sec) in Baton Rouge
• Spans 6 universities and 2 health centers
LONI Computing Resources
• 1 x Dell 50 TF Intel Linux cluster housed at the state's Information Systems Building (ISB)
▫ "Queen Bee", named after Governor Kathleen Blanco, who pledged $40 million over ten years for the development and support of LONI
▫ 680 nodes (5,440 CPUs), 5,440 GB RAM
▪ Two quad-core 2.33 GHz Intel Xeon 64-bit processors and 8 GB RAM per node
▫ Measured 50.7 TF peak performance
▫ According to the June 2007 Top500 listing*, Queen Bee ranked as the 23rd fastest supercomputer in the world
• 6 x Dell 5 TF Intel Linux clusters housed at 6 LONI member institutions
▫ 128 nodes (512 CPUs), 512 GB RAM
▪ Two dual-core 2.33 GHz Xeon 64-bit processors and 4 GB RAM per node
▫ Measured 4.772 TF peak performance
• 5 x IBM Power5 575 AIX clusters housed at 5 LONI member institutions
▫ 13 nodes (104 CPUs), 224 GB RAM
▪ Eight 1.9 GHz IBM Power5 processors and 16 GB RAM per node
▫ Measured 0.851 TF peak performance

Combined total of 84 teraflops
* http://top500.org/list/2007/06/100
LONI: The Big Picture (by Chris Womack)
[Diagram: the National LambdaRail, the Louisiana Optical Network, IBM P5 supercomputers, LONI member sites, the Dell 80 TF cluster, and "NEXT ???"]
PetaShare
• Goal: enable domain scientists to focus on their primary research problem, assured that the underlying infrastructure will manage the low-level data handling issues.
• Novel approach: treat data storage resources and the tasks related to data access as first-class entities, just like computational resources and compute tasks.
• Key technologies being developed: data-aware storage systems, data-aware schedulers (i.e., Stork), and a cross-domain metadata scheme.
• Provides an additional 200 TB of disk and 400 TB of tape storage
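To make the "data-aware scheduler" idea concrete: Stork treats a data transfer as a schedulable job in its own right, described in a ClassAd-style submit file. The fragment below is an illustrative sketch based on the field names described in the Stork literature; the URLs and hosts are hypothetical, not part of the PetaShare deployment.

```
// Hypothetical Stork submit file: a data-placement job, scheduled
// and retried like a compute job. Field names follow the Stork
// papers; all URLs/hosts here are illustrative assumptions.
[
  dap_type = "transfer";
  src_url  = "file:///tmp/example.dat";
  dest_url = "gsiftp://storage.example.edu/data/example.dat";
]
```

Because the transfer is a first-class job, the scheduler can queue, monitor, and retry it independently of the computation that produced or consumes the data.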
[Map: participating institutions in the PetaShare project, connected through LONI — UNO, Tulane, LSU, ULL, and LaTech — with sample research areas including High Energy Physics, Biomedical Data Mining, Coastal Modeling, Petroleum Engineering, Synchrotron X-ray Microtomography, Computational Fluid Dynamics, Biophysics, Molecular Biology, Computational Cardiac Electrophysiology, and Geology. Sample research of the participating researchers pictured: biomechanics by Kodiyalam & Wischusen, tangible interaction by Ullmer, coastal studies by Walker, and molecular biology by Bishop.]
ACCESSING RESOURCES ON THE GRID
LONI and the Open Science Grid
OSG Compute Element: LTU_OSG
• Located here at Louisiana Tech University
• OSG 0.6.0 production site
• Using our small 8-node Linux cluster
▫ Dedicated Condor pool using 20 of the 28 CPUs
▫ 8 nodes (28 CPUs), 36 GB RAM
▪ 2 x dual 2.2 GHz Xeon 32-bit processors, 2 GB RAM per node
▪ 2 x dual 2.8 GHz Xeon 32-bit processors, 2 GB RAM per node
▪ 2 x dual 2.0 GHz Opteron 64-bit processors, 2 GB RAM per node
▪ 1 x two quad-core 2.0 GHz Xeon 64-bit processors, 16 GB RAM
▪ 1 x two quad-core 2.0 GHz Xeon 64-bit processors, 8 GB RAM
• We would like to…
▫ Expand to a Windows coLinux Condor pool
▫ Combine with the IfM and CS clusters
• Plan to move to the OSG ITB when the LONI 5 TF Linux cluster at LTU becomes available
OSG Compute Element: LTU_CCT
• Located at the Center for Computation & Technology (CCT) at Louisiana State University (LSU) in Baton Rouge, La.
• OSG 0.6.0 production site
• Using the LONI 5 TF Linux cluster at LSU ("Eric")
▫ PBS opportunistic single-processor queue
▫ Only 64 CPUs (16 nodes) available out of the 512 CPUs total
▪ 128 nodes, 512 GB RAM
▪ Two dual-core 2.33 GHz Xeon 64-bit processors and 4 GB RAM per node
▫ The 16 nodes are shared with other PBS queues
• Played a big role in the DZero reprocessing effort
▫ Dedicated access to the LONI cluster during reprocessing
▫ 384 CPUs total were used simultaneously
• Continuing to run DZero MC production at both sites
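Jobs reach Eric's opportunistic queue through ordinary PBS batch scripts. The sketch below shows the general shape of a single-processor submission; the queue name, walltime, and script names are illustrative assumptions, not the site's actual configuration.

```
#!/bin/bash
# Hypothetical PBS script for a single-processor opportunistic job.
# Queue name, walltime, and payload script are illustrative assumptions.
#PBS -q single            # assumed name for the single-processor queue
#PBS -l nodes=1:ppn=1     # one CPU on one node
#PBS -l walltime=02:00:00
#PBS -N d0_mc_job

cd "$PBS_O_WORKDIR"       # start in the directory the job was submitted from
./run_d0_mc.sh            # illustrative payload
```

A script like this would be handed to the scheduler with `qsub job.pbs`; since the queue is opportunistic and the 16 nodes are shared, jobs wait until a slot frees up.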
Reprocessing at LTU_CCT
[Plot: DZero reprocessing jobs at LTU_CCT (LONI)]
Reprocessing at LTU_CCT (cont.)
[Plot: DZero reprocessing jobs at LTU_CCT (LONI), continued]
DZero MC Production for LTU
[Charts: weekly production by site and cumulative production by site; LTU_CCT and LTU_OSG are combined. 8.5 million events total.]
CURRENT STATUS AND FUTURE PLANS
LONI OSG CEs and PanDA Scalability + High Availability
Current Status of LTU_OSG
• Upgraded to OSG 0.6.0
• Upgraded to RHEL5
• Added two new Dell Precision workstations (16 CPUs: two quad-core 2.0 GHz Xeon 64-bit processors each, with 16 GB and 8 GB RAM)
• Connected to the LONI 40 Gbps network in June (finally!)
▫ Allows us to run D0 MC again
• Running DZero MC production jobs (sent using Joel's AutoMC daemon)
• Installed standalone Athena 12.0.6 on caps10 for testing ATLAS analysis
Current Status of LTU_CCT
• Switched to the LONI 5 TF cluster (Eric) from SuperMike/Helix
• Upgraded to OSG 0.6.0
• Running DZero MC production jobs (sent using Joel's AutoMC daemon)
• Running ATLAS production test jobs
▫ Problems so far:
▪ Pacman following symlinks (/panasas/osg/app -> /panasas/osg/grid/app on the headnode)
▪ Conflict with a 32-bit Python install on the 64-bit OS (https:// not supported)
▪ OSG_APP Python path was wrong
▪ Incorrect Tier2 DQ2 URL
▫ 3 successful tests; need a few more before running full production
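The "https:// not supported" symptom above is typical of a Python interpreter built without OpenSSL support. A quick, generic way to probe a worker node's interpreter, using only the standard library and not specific to our setup, is:

```python
def https_supported() -> bool:
    """Return True if this Python build can speak SSL/TLS.

    An interpreter compiled without OpenSSL lacks the ssl module,
    so any attempt to open an https:// URL fails -- the symptom we
    hit with the mismatched 32-bit Python install on the 64-bit OS.
    """
    try:
        import ssl  # noqa: F401 -- only probing for availability
        return True
    except ImportError:
        return False

if __name__ == "__main__":
    print("https supported:", https_supported())
```

Running this on each worker node (or inside a grid job) quickly distinguishes a broken interpreter from a misconfigured application path.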
What's next?
• Create OSG CEs at each of the six LONI sites
▫ Possibly creating a LONI state-wide grid
▫ Tevfik Kosar is building a campus grid at LSU
• Begin setting up PetaShare storage at each LONI site
• PanDA scalability tests on Queen Bee
▫ Proposing to the PanDA team and the LONI allocation committee
• Involving other non-HEP projects in DOSAR using PanDA (see talk tomorrow)
• Applying HA techniques to PanDA and the Grid (see talk tomorrow)
QUESTIONS / COMMENTS?