Computing Issues for the ATLAS SWT2

What is SWT2?
- SWT2 is the U.S. ATLAS Southwestern Tier 2 Consortium
- UTA is the lead institution, along with the University of Oklahoma and Langston University
- One of five Tier 2 centers in the U.S.
- Builds, operates and maintains computing resources for U.S. ATLAS

ATLAS
- ATLAS is a particle detector being constructed as part of the Large Hadron Collider (LHC)
- The LHC is at CERN in Geneva, Switzerland
- Collides protons at 14 TeV with a 40 MHz proton bunch crossing rate
- The goal is to record events at ~100 Hz, producing 1.5 PB of raw data per year

Monte Carlo Simulations in ATLAS
- Why simulate?
  - Trigger refinement
  - Reconstruction refinement
  - Understanding new phenomena
- Four-step process: Generation, Simulation, Digitization, Reconstruction

MC Generation
- Generation step (Pythia is one example)
- Works from a description of what type of event to generate:
  - Physics process
  - Initial energy
  - Selection criteria
- The software uses Monte Carlo techniques to generate candidate events
- If events don't meet the selection criteria, more events are generated
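As a rough illustration of the accept/reject pattern described above, the sketch below keeps generating toy events until enough of them pass a selection cut. The event model, the `min_pt` threshold and all numbers are invented for illustration only; a real generator such as Pythia samples the full physics process.

```python
import random

def generate_candidate_event():
    """Toy stand-in for an event generator: returns a fake event with a
    randomly drawn transverse momentum in GeV (invented distribution)."""
    return {"pt": random.expovariate(1.0 / 20.0)}  # mean ~20 GeV, illustrative

def generate_events(n_wanted, min_pt=40.0):
    """Keep generating candidate events until n_wanted pass the selection
    criterion; candidates that fail the cut are discarded."""
    accepted = []
    n_tried = 0
    while len(accepted) < n_wanted:
        event = generate_candidate_event()
        n_tried += 1
        if event["pt"] > min_pt:   # selection criterion (hypothetical cut)
            accepted.append(event)
    print(f"accepted {len(accepted)} of {n_tried} candidates")
    return accepted

if __name__ == "__main__":
    generate_events(100)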
Simulation
- Uses generator output (particles/momenta) and a physical description of the detector (materials/environment)
- Based on GEANT
- Produces an accurate depiction of how the particles move through the various parts of the detector, losing/depositing energy in the process
- This step consumes the most CPU time

Digitization
- Introduces detector measurements
- Simulates the behavior of the detector electronics
- Input is taken from the simulation step
- Output is a description of the event as seen by the measurement systems
- Output looks like real data from the detector

Reconstruction
- Reconstructs physics objects from the raw data
- The same code is used for MC data and real data
- Input is taken from the digitization step (or from the detector)
- Output is the data physicists use for analysis

Computing
- Event simulation is computationally intensive
- The simulation step takes from a few minutes up to an hour per event
- The large need for simulated data during the lifetime of the experiment yields a large need for compute cycles

Computing (cont.)
- At full luminosity, raw data will be ~1.5 PB/year (a back-of-the-envelope check appears at the end of this document)
- Data for analysis will be an order of magnitude smaller (but retained)
- Most ATLAS physicists need access to this data
- This yields a large need for storage resources

ATLAS Tier Model
- ATLAS computing is organized in a tiered structure:
  - Tier 0: CERN
  - Tier 1: National Facility (Brookhaven National Laboratory)
  - Tier 2: Regional Facility (SWT2, NET2, AGLT2, MWT2, SLAC)
  - Tier 3: University Facility (e.g. DPCC, HPC)

Purpose of a Tier 2
- Perform Monte Carlo based simulations of collision events
- Perform reprocessing, converting raw data to a physics description
- Provide resources for user analysis

Distributed Computing in ATLAS
- Data handling is the central issue: move jobs or data?
- Production versus analysis:
  - Production is CPU intensive
  - Analysis is I/O intensive
- How to manage user access
- Grid computing

How to Satisfy the Demands
- Balance cost / performance: SMP vs. commodity processors
- SMPs
  - Great for memory-intensive, multi-threaded applications
  - Expensive
- Computing clusters
  - Less expensive
  - Less performance for multi-threaded applications
  - Can be improved by spending money on the interconnect network

UTA Computing Clusters
- DPCC (joint project between Physics and CSE)
  - Installed 2003
  - ~80 node cluster (160 processors)
  - 50 TB storage
- UTA_SWT2
  - Installed 2006
  - 160 node cluster (320 processors)
  - 16 TB storage
- UTA_CPB
  - Being installed now
  - 50 nodes (200 processors)
  - 60 TB storage

What Makes a Computing Cluster
- Head node(s): allow interactive login and compilation
- Worker nodes: provide computing cycles, may not be directly accessible
- Network: can be the most important aspect, depending on the computing model
- Storage access: NFS, global file systems
- Batch system: controls access to the worker nodes by scheduling work

Networking
- Cost vs. performance for the application; the main issue is communication latency
- Low cost (Ethernet)
  - Suitable for single-threaded applications
  - Can be used for multi-threaded applications
  - Higher latency
- High cost (Myricom, InfiniBand)
  - Low latency
  - Best suited for improving multi-threaded applications

DPCC Resources
- 80 worker nodes
  - Dual 2.4 / 2.66 GHz Xeon processors
  - 2 GB RAM
  - 60/80 GB local disk
- 50 TB RAID storage
  - 11 RAID servers (3 x 1.5 TB RAID cards)
- 1000 Mb/s Ethernet internal network

DPCC Diagram

UTA_SWT2 Resources
- Wholly operated by UTA for ATLAS
- 160 worker nodes
  - Dual 3.2 GHz Xeon EM64T processors
  - 4 GB RAM
  - 160 GB local disk
- 16 TB storage
  - DataDirect S2A3000 SAN
  - 6 I/O servers running IBRIX
- Dual internal networks
  - Management (100 Mb/s)
  - Storage (1000 Mb/s)

UTA_SWT2

UTA_CPB
- Being constructed now
- 50 worker nodes
  - Dual dual-core 2.4 GHz Opterons (2216)
  - 8 GB RAM
  - 80 GB local storage
- 60 TB storage
  - 6 I/O servers with attached storage
  - 10 Dell MD1000 storage units (7.5 TB raw)
- Single 1000 Mb/s network
- Will likely be supplemented this year with additional storage

UTA_CPB

Summary
- 520 dedicated cores + ~100 additional available
- 90 TB disk space + ~30 TB additional available
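The ~1.5 PB/year raw-data figure quoted in the "Computing (cont.)" slide can be reproduced with a short calculation. The ~100 Hz recording rate comes from the slides; the raw event size (~1.6 MB) and the live data-taking time (~10^7 seconds per year) are assumed nominal values and do not appear in the original presentation.

```python
# Back-of-the-envelope check of the ~1.5 PB/year raw-data estimate.
event_rate_hz = 100            # events recorded per second (from the slides)
event_size_bytes = 1.6e6       # assumed raw event size (~1.6 MB)
live_seconds_per_year = 1e7    # assumed data-taking time per year

raw_bytes_per_year = event_rate_hz * event_size_bytes * live_seconds_per_year
print(f"~{raw_bytes_per_year / 1e15:.1f} PB of raw data per year")  # ~1.6 PB
```

Under these assumptions the estimate comes out to roughly 1.6 PB/year, consistent with the ~1.5 PB/year figure in the slides.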