
Possible DAQ Upgrades
DAQ1k… DAQ2k… DAQ10k!?

Tonko Ljubičić
STAR/BNL

(for the “3L Group”: Landgraf, LeVine & Ljubičić)
(Lange would fit nicely too)

– Increase the event rate into Level 3
– Increase the event rate onto storage

… but make it cheap (unlikely)

… and make it simple (unlikely)

… and do it without additional manpower (ridiculous)

… and do it while STAR is taking data (problematic)

Assumptions…

• We have the TPC (or similar), i.e. a tracking device with many channels
• We want a Level 3 trigger (based upon tracks)
• We have a good cluster finder so we save only the 2D hitpoints
• The final storage (tapes) is under RCF’s control
• Assumed requirements:
  – At least 1000 Hz Level 3 rate (central, Au+Au)
  – At least 100 Hz storage rate (central, Au+Au)

DAQ Components

• Event Builder and event buffer

• Level 3 CPU farm

• DAQ frontend (Cluster Finder, Formatter)

• Detector Frontend (FEE)

• Network interconnect:
  – Between DAQ frontend, L3, EVB
  – Between FEE and DAQ frontend

DAQ Components (cont’d) (current)

• EVB: 1 Sun, 70 MB/s, 700 GB buffer → 10 Hz central AuAu raw, 50 Hz clusters only
• L3: 48 Alphas (500 MHz) → 50 Hz central AuAu
• DAQ RB: 144 X 3 slow I960 CPUs → 50 Hz central AuAu
• TPC FEE: 100 Hz
• Network:
  – Main: Myrinet, 100 MB/s/link
  – FEE→DAQ: 1.25 Gb/s → 100 evts/s
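The rates listed for the current components can be compared directly to see which stages limit the system. The sketch below is only an illustration: it assumes the components behave as a simple serial chain, so the slowest stage sets the overall rate, and the dictionary keys and function names are made up for this example.

```python
# Sustained central Au+Au rates (Hz) of the current components,
# copied from the list above. Keys are illustrative labels only.
CURRENT_RATES_HZ = {
    "EVB (clusters only)": 50,
    "L3": 50,
    "DAQ RB": 50,
    "TPC FEE": 100,
    "FEE->DAQ link": 100,
}


def bottleneck(rates_hz):
    """Return (rate, names) of the slowest stage(s), assuming a serial chain."""
    slowest = min(rates_hz.values())
    limiting = sorted(name for name, rate in rates_hz.items() if rate == slowest)
    return slowest, limiting


def below_target(rates_hz, target_hz):
    """Return the sorted names of all stages slower than target_hz."""
    return sorted(name for name, rate in rates_hz.items() if rate < target_hz)


if __name__ == "__main__":
    rate, names = bottleneck(CURRENT_RATES_HZ)
    print(f"Chain limited to {rate} Hz by: {', '.join(names)}")
    print("Below 1000 Hz:", ", ".join(below_target(CURRENT_RATES_HZ, 1000)))
```

Under this simplified model every listed stage falls short of a 1000 Hz target, and three of them already cap the chain at 50 Hz.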

Upgrades (EVB)

• Cluster of Linux CPUs connected via Gigabit Ethernet switch to RCF
• Each has:
  – Large (and cheap) disk buffers (e.g. 4 X 120 GB IDE)
  – 512 MB memory (not that much)
  – 1 Gigabit Ethernet card (cheap)
  – 1 Myrinet card (for internal DAQ) (1 k$)
  – 1 CPU of any slow variety (not CPU-intensive)
  – Good, fast motherboard (I/O intensive)
• Need about 5-10 of them
• Advantages:
  – Scalability – adding more nodes increases rates linearly
  – Parallelism is simple – round robin on an event-by-event basis, all nodes are equal
  – Robustness – all are the same, trivial automatic recovery in case of failure
  – Cost – IDE disks are soooo much cheaper than SCSI
• Cost: 4 k$ per cluster node (nicely equipped). Now!
  – Compare to current 50 k$ for a single Sun workstation: for the cost of one Sun we get 10 X (!) the throughput!
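The round-robin scheme and the linear scaling claimed above can be sketched in a few lines. This is a minimal illustration, not the actual EVB software: the node names, the function names and the 50 MB/s per-node figure are assumptions chosen for the example, and the scaling is the ideal case in which every node contributes its full rate.

```python
from itertools import cycle


def assign_round_robin(event_ids, nodes):
    """Map each event to one EVB node, cycling through the nodes in order."""
    node_cycle = cycle(nodes)
    return {event_id: next(node_cycle) for event_id in event_ids}


def aggregate_throughput(n_nodes, per_node_mb_s):
    """Ideal linear scaling: total rate is the sum of identical node rates."""
    return n_nodes * per_node_mb_s


def nodes_for_budget(budget_k_usd, node_cost_k_usd):
    """Number of whole nodes that fit into a given budget."""
    return budget_k_usd // node_cost_k_usd


if __name__ == "__main__":
    nodes = ["evb0", "evb1", "evb2"]  # hypothetical node names
    print(assign_round_robin(range(6), nodes))
    # Recovery from a failed node: drop it and keep cycling over the rest.
    alive = [n for n in nodes if n != "evb1"]
    print(assign_round_robin(range(6, 10), alive))
    # 50 MB/s per node is an assumed figure, not a measurement.
    print(aggregate_throughput(10, 50), "MB/s with 10 nodes")
    print(nodes_for_budget(50, 4), "nodes for the price of one Sun")
```

Because all nodes are identical, removing a failed node from the list is the whole recovery step in this model.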

Upgrades (TPC FEE)

• ALICE developed a FEE chip for their own TPC (ALTRO)

• 8 channel analog/digital hybrid with ADCs and DSP on chip

• pedestal subtraction, gain correction, baseline restoration, zero-suppression and event buffering (8 buffers) on chip

• (up to) 20 MHz sampling clock
• Decoupled readout clock of (up to) 40 MHz
• Available now (?)
• Needs more evaluation but looks promising!
• Expect more details from the Berkeley guys in the near future (Bieser, Crawford)

Upgrades (DAQ frontend)

• Inputs data from detector FEE, finds clusters, formats them, calculates pedestals, buffers data, ships to L3/EVB, etc. – versatile
• Works on an M X N (2D) plane suitable for most detectors (e.g. TPC padrow is 182 X 512, SVT is 240 X 128, etc.) – “detector blind”
• Current example:
  – Intel I960HD CPU, 66 MHz internal, 33 MHz external bus: takes ~7 ms for a central Au+Au event per padrow → need speedup of ~10 X (but hope for more)
• Possible choices:
  – DSPs (“easy” to program; many, many to choose from)
  – FPGAs (tough to program, fast!, many to choose from)
  – Embedded FPGA cores or hybrids (e.g. Xilinx Virtex II Pro)
    • Combination of both FPGA & CPU
    • Versatile – many have fast links (e.g. 3.25 Gb/s!) on chip!
    • Extremely complex!
    • Expensive! (at least now…)
• A lot of R&D:
  – Evaluate possible hardware choices (above)
  – Adapt the cluster finder software to the different hardware
  – Need very specific manpower – possible cooperation with Instrumentation Division
  – Very critical item – need to start work NOW! (R&D funding)
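To make the “detector blind” idea concrete, here is a minimal sketch of a cluster finder that works on any M X N array of ADC values. It is not the STAR cluster finder: it uses a generic threshold plus 4-connected flood fill and a charge-weighted centroid, and the function name, the threshold handling and the toy data are all assumptions made for illustration.

```python
def find_clusters(adc, threshold):
    """Group above-threshold cells of a 2D ADC array into clusters.

    adc: list of M rows (e.g. pads), each a list of N values (e.g. time bins).
    threshold: non-negative cut; only cells strictly above it are used.
    Returns a list of (row_centroid, col_centroid, total_charge) tuples.
    """
    n_rows = len(adc)
    n_cols = len(adc[0]) if n_rows else 0
    seen = [[False] * n_cols for _ in range(n_rows)]
    clusters = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if seen[r0][c0] or adc[r0][c0] <= threshold:
                continue
            # Flood fill over 4-connected neighbours above threshold.
            stack = [(r0, c0)]
            seen[r0][c0] = True
            charge = row_sum = col_sum = 0
            while stack:
                r, c = stack.pop()
                q = adc[r][c]
                charge += q
                row_sum += q * r
                col_sum += q * c
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and not seen[nr][nc] and adc[nr][nc] > threshold):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            # Charge-weighted centroid gives the 2D hitpoint.
            clusters.append((row_sum / charge, col_sum / charge, charge))
    return clusters


if __name__ == "__main__":
    toy = [
        [0, 0, 0, 0, 0],
        [0, 2, 4, 2, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 6],
    ]
    print(find_clusters(toy, threshold=1))
```

The function only sees array dimensions and values, so the same code would apply to a 182 X 512 TPC padrow or a 240 X 128 SVT array; what changes between DSP, FPGA or CPU targets is how this loop is implemented, not its logic.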

DAQ Interconnects

• Complex issue, depends on:
  – Where will the Cluster Finder be? On the detector? In the DAQ room?
  – What is done in FEE vs. Cluster Finder? Does FEE zero-suppress (à la ALICE FEE) or is it left to the DAQ frontend (like now)?
  – Data aggregation and scheduling? How does one pack this data? Multiplexing scheme? Data routing?
  – How many fibers does one need? At which speed? Which topology?
  – Does one use commercially available switches/protocols (e.g. Gigabit, 10 Gb???) or custom built (like we do now)?
  – One needs to ship a Sector’s worth of data to a single L3 Node – how? Which network? Which topology?
  – Cost!?
  – Need to start thinking NOW!

Level 3 (tracking)

• This is tough:
  – Currently takes 40 ms/sector with a pretty fast CPU (500 MHz 21264 Alpha) → need to speed up at least 50 times!
  – How to get 50 X (some ideas):
    • Faster CPU in 6 years (~4 X)
    • Concentrate on primary tracks (~2 X)
    • Know the vertex (~2 X). Need vertex detector!!!
    • Tune the code (~2 X)
    • Only tracks that exit the volume, i.e. pass through the last padrow in the TPC (implied rapidity-Pt cut) (~2 X)
    • Use track hits in other detectors (EMC? TOFRPC?) as seeds (~2 X)
    • Parallelize, parallelize, parallelize!
      – e.g. each CPU node is a 4-way SMP with each CPU working on one track in parallel (~4 X)
• Could be done! (With a lot of magic wand waving…)
• Cost!? Assume 4 X 4-way SMP per sector @ 24 sectors: that’s 96 4-way SMP machines. @ 10 k$ per machine that’s ~1 M$. Doable.
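The arithmetic behind the speedup and cost estimates above can be checked with a short script. It assumes the individual gains are independent and simply multiply, which is optimistic; the factor values are the rough guesses from the list, and the dictionary keys and function names are invented for this sketch.

```python
import math

# Rough speedup guesses from the list above (dimensionless factors).
SPEEDUP_FACTORS = {
    "faster CPU in 6 years": 4,
    "primary tracks only": 2,
    "known vertex": 2,
    "tuned code": 2,
    "only tracks exiting the TPC": 2,
    "seeds from other detectors": 2,
    "4-way SMP parallelism": 4,
}

CURRENT_MS_PER_SECTOR = 40
REQUIRED_SPEEDUP = 50


def combined_speedup(factors):
    """Product of all factors, assuming they are independent."""
    return math.prod(factors.values())


def farm_cost_k_usd(machines_per_sector, sectors, cost_per_machine_k_usd):
    """Total farm cost in k$ for a fixed number of machines per sector."""
    return machines_per_sector * sectors * cost_per_machine_k_usd


if __name__ == "__main__":
    total = combined_speedup(SPEEDUP_FACTORS)
    print(f"Combined speedup if everything works: {total} X")
    print(f"Margin over the required {REQUIRED_SPEEDUP} X: {total / REQUIRED_SPEEDUP:.1f}")
    print(f"Time per sector at {REQUIRED_SPEEDUP} X: "
          f"{CURRENT_MS_PER_SECTOR / REQUIRED_SPEEDUP} ms")
    print(f"Farm cost: {farm_cost_k_usd(4, 24, 10)} k$")
```

If every idea delivered its full factor the product would be about ten times the required 50 X, which leaves room for several of them to underperform.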

Level 3 (cont’d)

• How to reduce cost and make it sweeter?
• Let’s look at Offline vs. Level 3 CPU farm similarities:
  – Both need super fast CPUs
  – A lot of them!
  – Offline needs a fast connection to the data source (i.e. HPSS tapes) but Level 3 already has (or can easily be made to have) a connection to HPSS!
• Differences:
  – Offline needs disks and a lot of memory – L3 doesn’t
  – Offline needs different code structures and perhaps OS setup
• Skin Changing Local Grid
  – Level 3 nodes “become” reconstruction nodes when not in use in DAQ (“change skin”)
  – Level 3 generally boots diskless (for L3) and this system is under complete control of the L3 Group. L3 code doesn’t even need to know that there are disks in the node!
  – Offline needs disks, and all the code (kernel/OS/reconstruction) images on those disks are under complete control of Offline.
  – Switch from the Level 3 “skin” to Reconstruction is done via a reboot command with an appropriate parameter (e.g. “boot –l3” or “boot –offl”). (The simplest, cleanest but slowest way)
• Advantages:
  – Major cost saving
• Disadvantages:
  – Can’t run the whole system at the same time (but one could run certain partitions depending on the required load!)
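The partitioning idea can be illustrated with a small planning function that decides which skin each node should boot into for a given L3 load. This is a hypothetical sketch: the function names, node names and the "l3"/"offl" labels (borrowed from the example boot parameters above) are assumptions, and nothing here actually reboots a machine.

```python
def plan_skins(nodes, l3_nodes_needed):
    """Return {node: skin}: the first l3_nodes_needed nodes run L3, the rest Offline."""
    if not 0 <= l3_nodes_needed <= len(nodes):
        raise ValueError("l3_nodes_needed must be between 0 and len(nodes)")
    return {
        node: ("l3" if index < l3_nodes_needed else "offl")
        for index, node in enumerate(nodes)
    }


def nodes_to_reboot(current, planned):
    """Return the sorted nodes whose skin differs between current and planned."""
    return sorted(node for node, skin in planned.items() if current.get(node) != skin)


if __name__ == "__main__":
    farm = ["node00", "node01", "node02", "node03"]  # hypothetical node names
    current = plan_skins(farm, 4)   # whole farm in L3 mode during data taking
    planned = plan_skins(farm, 1)   # light load: keep one node for L3
    print(planned)
    print("Reboot into the other skin:", nodes_to_reboot(current, planned))
```

Since the switch is a full reboot, only the nodes whose skin actually changes would need to be restarted.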

Summary

• EVB rates are no problem (up to 500 MB/s) for STAR-DAQ; however, the RCF side is a different issue (see M. Messer’s talk)
• Detector FEE + DAQ Frontend + Level 3 need a complete overhaul and we must start from scratch
• If we maintain any of the existing systems we cannot go above 50 Hz
• 1000 Hz (or more) into Level 3 is doable but a lot of work needs to be done to optimize it
• We need to know what we are looking for in L3 since completely general and exhaustive tracking will probably not be possible
• Most of the Level 3 cost could be shared with Offline Reconstruction if we use the Skin Changing scheme

Conclusion

• Doable
• Need R&D effort (funding, manpower) immediately for:
  – TPC FEE overhaul
  – DAQ frontend studies; hardware and software adaptations
  – Interconnect/network studies for the FEE→DAQ data transfer as well as DAQ→L3
• Need strong support from the collaboration – the effort needed is too large to be done in “our spare time”

• We should change the name to SuperSTAR
