Santa Fe 6/18/03 Timothy L. Thomas 1

“UCF” Computing Capabilities at UNM HPC

Timothy L. Thomas
UNM Dept. of Physics and Astronomy

Analysis Workshop, 8/8/2002, Charles F. Maguire, Vanderbilt

Run2 Remote Site PISA Simulation Statistics

• Vanderbilt Farm: Projects 1 (hadrons), 6 (deuterons), 9 (pizero), and 10 (electrons): 10000 CPU-hours, >600 GBytes, ~1.5 person-month effort

• UNM Farm: Projects 0 (EMCal HIJING), 7 (EMCal + Muon HIJING), 9 (pizero): 33000 CPU-hours, >500 GBytes, ~1.5 person-month effort

• LLNL Farm: Projects 4 (EMCal HIJING), 5 (HBT), 23 (high pt pizero): ~14000 CPU-hours, >300 GBytes, ~1.5 person-month

• SUNYSB Farm: Projects 11, 13, 14, 17, 23 (electron working group requests): ~1500 CPU-hours, 100 GBytes (?), ~0.5 person-month

• WIS Farm: Project 8 (Phi->K+K-, Phi->e+e-): ~6000 CPU-hours, ~500 GBytes, ~1.0 person-month

Grand Totals: ~65000 CPU-hours (90 CPU-months), ~2 TBytes, ~6 person-months
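The grand totals can be cross-checked by summing the per-site figures. The sketch below takes the approximate values ("~", ">") at face value and assumes a 30-day month for the CPU-month conversion; the variable names are illustrative only.

```python
# Per-site figures as quoted above: (CPU-hours, GBytes, person-months).
# Approximate values ("~", ">") are taken at face value.
sites = {
    "Vanderbilt": (10000, 600, 1.5),
    "UNM": (33000, 500, 1.5),
    "LLNL": (14000, 300, 1.5),
    "SUNYSB": (1500, 100, 0.5),
    "WIS": (6000, 500, 1.0),
}

HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month

cpu_hours = sum(v[0] for v in sites.values())
gbytes = sum(v[1] for v in sites.values())
person_months = sum(v[2] for v in sites.values())
cpu_months = cpu_hours / HOURS_PER_MONTH

print(f"{cpu_hours} CPU-hours ({cpu_months:.0f} CPU-months), "
      f"{gbytes / 1000:.1f} TBytes, {person_months:.1f} person-months")
```

Under these assumptions the sums come to 64500 CPU-hours, 2000 GBytes and 6.0 person-months, in line with the rounded totals quoted.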

I have a 200K SU (150K LL CPU-hour) grant from the NRAC of the NSF/NCSA, with which UNM HPC (“AHPCC”) is affiliated.

Peripheral Data Vs Simulation

Simulation: Muons From Central Hijing (QM02 Project07)

Data: Centrality by Perp > 60

(Stolen from Andrew…)

Simulated Decay Muons
• QM’02 Project07 PISA files (Central HIJING)
• Closest cuts possible from PISA file to match data (PT parent > 1 GeV/c, Theta Porig Parent 155-161)
• Investigating possibility of keeping only muon and parent hits for reconstruction.
• 17100 total events distributed over Z = ±10, ±20, ±38
• More events available but only a factor for smallest error bar

Zeff ~75 cm

"(IDPART==5 || IDPART==6) && IDPARENT > 6 && IDPARENT < 13 && PTHE_PRI > 155 && PTHE_PRI < 161 && IPLANE == 1 && IARM == 2 && LASTGAP > 2002 && PTOT_PRI*sin(PTHE_PRI*acos(0)/90.) > 1."

Not in fit

(Stolen from Andrew…)
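The selection string quoted above can be read as a per-hit predicate. The following is a minimal sketch in Python, assuming each hit is available as a dict keyed by the variable names in the cut; the function name `passes_cut` and the dict layout are hypothetical, and all thresholds are copied verbatim from the slide. The factor `acos(0)/90.` equals pi/180, so the last term is the parent transverse momentum with `PTHE_PRI` given in degrees.

```python
import math


def passes_cut(hit):
    """Return True if a hit (dict keyed by cut-variable name) passes the selection."""
    # PTHE_PRI*acos(0)/90. converts degrees to radians (acos(0) = pi/2).
    theta_rad = math.radians(hit["PTHE_PRI"])
    pt_parent = hit["PTOT_PRI"] * math.sin(theta_rad)
    return (
        hit["IDPART"] in (5, 6)            # GEANT3 codes 5, 6: mu+, mu-
        and 6 < hit["IDPARENT"] < 13       # GEANT3 codes 7..12: pions and kaons
        and 155 < hit["PTHE_PRI"] < 161    # parent polar angle window, degrees
        and hit["IPLANE"] == 1
        and hit["IARM"] == 2
        and hit["LASTGAP"] > 2002          # threshold copied as printed on the slide
        and pt_parent > 1.0                # parent pT > 1 GeV/c
    )


# Hypothetical example hit, for illustration only.
example = {"IDPART": 5, "IDPARENT": 8, "PTHE_PRI": 158.0, "PTOT_PRI": 3.0,
           "IPLANE": 1, "IARM": 2, "LASTGAP": 2003}
print(passes_cut(example))
```

Chained comparisons keep the angle and parent-ID windows close to how they are written in the original string.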

Now at UNM HPC:

• PBS
• Globus 2.2.x
• Condor-G / Condor
• (GDMP)

…all supported by HPC staff.

In Progress:

A new 1.2 TB RAID 5 disk server, to host:
• AFS cache PHENIX software
• ARGO file catalog (PostgreSQL)
• Local Objectivity mirror
• Globus 2.2.x (GridFTP and more…)

Pre-QM2002 experience with globus-url-copy…

• Easily saturated UNM bandwidth limitations (as they were at that time)

• PKI infrastructure and sophisticated error-handling are a real bonus over bbftp. (One bug, known at the time, is being / has been addressed.)

(Plot at left: transfer rate in KB/sec; 10 streams used)

LLDIMU.HPC.UNM.EDU

Resources

Filtered events can be analyzed, but not ALL PRDF events.

Many triggers overlap.

Assume 90 KByte/event and 0.1 GByte/hour/CPU.

Trigger                    Lumi [nb^-1]  #Event [M]  Size [GByte]  CPU [hour]  100 CPU [day]  Signal flags (mu-mu, mu, e-mu)
ERT_electron               193           13.0        1170          11700       4.9            1
MUIDN_1D_&BBCLL1           238           34.0        3060          30600       12.8           1 1 1
MUIDN_1D&MUIDS_1D&BBCLL1   59            0.2         18            180         0.1            1
MUIDN_1D1S&BBCL1           254           4.8         432           4320        1.8            1 1 1
MUIDN_1D1S&NTCN            230           18.0        1620          16200       6.8            1
MUIDS_1D&BBCLL1            274           10.7        963           9630        4.0            1 1 1
MUIDS_1D1S&BBCLL1          293           1.3         117           1170        0.5            1
MUIDS_1D1S&NTCS            278           5.0         450           4500        1.9            1
ALL PRDF                   350           6600.0      33,000        330,000     137.5
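The per-trigger size and CPU columns follow from the two stated assumptions (90 KByte/event and 0.1 GByte/hour/CPU). A rough sketch of that arithmetic is given below, assuming 1 GByte = 10^6 KByte; the function name `resources` is illustrative. The ALL PRDF row is left out because its size column is not reproduced by the 90 KByte/event assumption.

```python
KB_PER_EVENT = 90        # stated assumption: 90 KByte/event
GB_PER_CPU_HOUR = 0.1    # stated assumption: 0.1 GByte/hour/CPU
N_CPUS = 100             # the "100 CPU" column


def resources(n_events_millions):
    """Return (size in GByte, CPU-hours, days on N_CPUS) for a trigger sample."""
    size_gb = n_events_millions * 1e6 * KB_PER_EVENT / 1e6  # 1 GByte = 1e6 KByte
    cpu_hours = size_gb / GB_PER_CPU_HOUR
    days = cpu_hours / N_CPUS / 24
    return size_gb, cpu_hours, days


# Event counts (in millions) copied from three rows of the table.
samples = {"ERT_electron": 13.0, "MUIDN_1D1S&NTCN": 18.0, "MUIDS_1D1S&NTCS": 5.0}
for trigger, n_events in samples.items():
    size_gb, cpu_hours, days = resources(n_events)
    print(f"{trigger}: {size_gb:.0f} GByte, {cpu_hours:.0f} CPU-hours, {days:.1f} days")
```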

Rough calculation of real-data processing (I/O-intensive) capabilities:

10 M events, PRDF-to-{DST+x}, both mut & mutoo; assume 3 sec/event (*1.3 for LL), 200200 KB/event.

One pass: 7 days on 50 CPUs (25 boxes), using 56% of LL local network capacity.

My 200K “SU” (~150K LL CPU hours) allocation allows for 18 of these passes (4.2 months)

3 MB/sec Internet2 connection = 1.6 TB / 12 nights (MUIDN_1D1S&NTCN)
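The estimates above can be reproduced with a few lines of arithmetic. This sketch uses the unscaled 3 sec/event (the 1.3 LL factor is not applied, since the 7-day figure matches the unscaled rate), a 30-day month, 1 GByte = 1000 MByte, and the 1620 GByte size of the MUIDN_1D1S&NTCN sample from the table; these conventions and all variable names are assumptions made for illustration.

```python
N_EVENTS = 10_000_000
SEC_PER_EVENT = 3            # unscaled; the 1.3 LL factor is not applied here
N_CPUS = 50
ALLOCATION_CPU_HOURS = 150_000

# One pass over the sample.
cpu_seconds_per_pass = N_EVENTS * SEC_PER_EVENT
cpu_hours_per_pass = cpu_seconds_per_pass / 3600
days_per_pass = cpu_hours_per_pass / N_CPUS / 24

# Number of full passes the allocation covers (integer arithmetic in seconds).
n_passes = (ALLOCATION_CPU_HOURS * 3600) // cpu_seconds_per_pass
total_months = n_passes * days_per_pass / 30   # assumption: 30-day month

# Transfer of the MUIDN_1D1S&NTCN sample over a 3 MB/sec link.
SAMPLE_GB = 1620
LINK_MB_PER_SEC = 3
N_NIGHTS = 12
transfer_hours = SAMPLE_GB * 1000 / LINK_MB_PER_SEC / 3600
hours_per_night = transfer_hours / N_NIGHTS

print(f"one pass: {cpu_hours_per_pass:.0f} CPU-hours, {days_per_pass:.1f} days on {N_CPUS} CPUs")
print(f"allocation: {n_passes} passes, about {total_months:.1f} months")
print(f"transfer: {transfer_hours:.0f} hours, {hours_per_night:.1f} hours/night over {N_NIGHTS} nights")
```

With these inputs one pass costs about 8333 CPU-hours (just under 7 days on 50 CPUs), the allocation covers 18 passes, and the transfer works out to 12.5 hours per night over 12 nights.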

(Presently) LL is most effective for CPU-intensive tasks: simulations can easily fill the 512 CPUs; e.g., QM02 Project 07.

Caveats: “LLDIMU” is a front-end machine; LL worker node environment is different from CAS/RCS node (P.Power…)
