ALICE Plenary | March 24, 2014 | Pierre Vande Vyvre 1
O2 Project: Status Report
Pierre VANDE VYVRE
O2 Project Requirements

Detector   Input to Online System   Peak Output to Local       Avg. Output to Computing
           (GByte/s)                Data Storage (GByte/s)     Center (GByte/s)
TPC        1000                     50.0                       8.0
TRD        81.5                     10.0                       1.6
ITS        40                       10.0                       1.6
Others     25                       12.5                       2.0
Total      1146.5                   82.5                       13.2
- Handle > 1 TByte/s detector input
- Produce (timely) physics results
- Online reconstruction to reduce the data volume
- Minimize "risk" for physics results
- Common hardware and software system developed by the DAQ, HLT and Offline teams
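The requirement bullets are driven by the arithmetic of the table above; a quick sanity check of the implied online compression factors (plain Python over the quoted figures):

```python
# Per-detector rates from the requirements table (GByte/s):
# (input to online system, peak output to storage, avg output to computing center)
rates = {
    "TPC":    (1000.0, 50.0, 8.0),
    "TRD":    (81.5, 10.0, 1.6),
    "ITS":    (40.0, 10.0, 1.6),
    "Others": (25.0, 12.5, 2.0),
}

total_in = sum(r[0] for r in rates.values())    # 1146.5 GB/s -> the ">1 TByte/s" input
total_peak = sum(r[1] for r in rates.values())  # 82.5 GB/s to local data storage
total_avg = sum(r[2] for r in rates.values())   # 13.2 GB/s to the computing center

print(f"overall peak compression: {total_in / total_peak:.1f}x")   # ~13.9x
print(f"TPC reduction: {rates['TPC'][0] / rates['TPC'][1]:.0f}x")  # 20x
```

The 20x TPC figure matches the "factor ~20 for TPC" quoted in the CWG6 calibration section.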
O2 Project
• PLs: P. Buncic, T. Kollegger, P. Vande Vyvre
Computing Working Group (CWG) Chair
1. Architecture S. Chapeland
2. Tools & Procedures A. Telesca
3. Dataflow T. Breitner
4. Data Model A. Gheata
5. Computing Platforms M. Kretz
6. Calibration C. Zampolli
7. Reconstruction R. Shahoyan
8. Physics Simulation A. Morsch
9. QA, DQM, Visualization B. von Haller
10. Control, Configuration, Monitoring V. Chibante
11. Software Lifecycle A. Grigoras
12. Hardware H. Engel
13. Software framework P. Hristov
Project Organization
[Diagram: the O2 CWGs contribute to the O2 Technical Design Report]
Hardware Architecture
[Diagram: detector read-out links (~2500 DDL3s in total, 10 Gb/s each) from the trigger detectors (with L0/L1 signals) and from TPC, TRD, ITS, EMC, TOF, PHO, Muon and FTP feed ~250 First Level Processors (FLPs); the FLPs forward data at 2 x 10 or 40 Gb/s over the farm network to ~1250 Event Processing Nodes (EPNs), which write to data storage via the storage network.]
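Dividing the totals by the node counts gives rough per-node figures (illustrative averages only, assuming a uniform spread over the nodes; the real load differs per detector):

```python
ddls, flps, epns = 2500, 250, 1250      # counts from the architecture slide
total_input_gbs = 1146.5                # detector input (requirements slide)
peak_output_gbs = 82.5                  # peak output to local data storage

links_per_flp = ddls / flps             # average DDL3 links per FLP
input_per_flp = total_input_gbs / flps  # average ingest per FLP
output_per_epn = peak_output_gbs / epns # average storage write rate per EPN

print(f"{links_per_flp:.0f} links/FLP")             # 10
print(f"{input_per_flp:.2f} GB/s per FLP")          # ~4.59
print(f"{output_per_epn * 1000:.0f} MB/s per EPN")  # ~66
```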
CWG1: Architecture – Status & plans
• Working on the TDR
• Architecture is converging
– Now includes asynchronous processing and offloading to other sites
– Discussing requirements and interfaces to the DCS system and its values
• Some requirements need refining to scale the system (in particular the data storage)
– Run 3 operating mode (e.g. running time per year)
– Physics simulation needs
– Will be addressed with Andrea Dainese (physics requirements)
CWG1: Architecture – O2 architecture and data flow
• Now includes the intermediate data storage
• The second global processing step may not run on the EPNs
CWG1: Architecture – Asynchronous and iterative processing (and/or offloading to other sites)
CWG2 – Tools, guidelines and procedures: Status report – achievements
• Activities started in March 2013
• Evaluation procedure completed and approved
• Proposed tools for the organization of the working groups
• Report and presentation templates created
• Tools and policies identified and assigned to CWG2 or other CWGs
• Tools evaluations:
– Issue tracking systems → JIRA proposed and accepted
– Version control systems → Git proposed and accepted
– Website creation tools → Drupal proposed and accepted
• C++ coding conventions:
– Naming and formatting → circulated and accepted
– Coding guidelines → circulated and under discussion
CWG2 – Tools, guidelines and procedures: Status report – ongoing activities and future plans
• Ongoing activities
– Tools evaluations:
• Code and API documentation (an update of the coding conventions for comments will then be needed)
• Future plans
– Policies:
• Licensing (copyright and distribution of ALICE O2 software)
– Tool to help follow the coding conventions, in collaboration with CWG11
CWG3: Dataflow – Data-flow simulation setup
• Current focus on the FLP–EPN data flow
• Implemented with OMNET++, using full TCP/IP simulation
• Heavy computing needs (weeks for some of the simulations)
– Downscaling applied for some simulations:
• Reduce network bandwidth and buffer sizes and check
• Simulate a slice of the system
• Simulation scenarios:
– Different topologies (central switch; spine-leaf)
– Different network bandwidths (10 Mb/s – 40 Gb/s)
– Different levels of detail (many-to-one, many-to-many, few nodes up to full scale)
– Different data distribution schemes (single vs. multiple time frames, level of parallelism)
CWG3: Dataflow – Simulation output
• Many parameters/metrics under investigation
– TCP/IP parameters (link sharing, congestion, router buffers, latency and throughput)
– FLP and EPN buffer requirements
• Preliminary results look promising
– Network traffic under control with available technology (e.g. 40 Gb/s)
– FLP/EPN buffer requirements reasonable (i.e. affordable)
• Issues:
– Many iterations (parameter variations) required
– Long simulation time (hours to days depending on setup/level of detail)
[Plots: simulated traffic at 40 Gbps with 250x1250 nodes and at 40 Mbps with 250x288 nodes – S. Chapeland, C. Delort]
CWG3: Dataflow – System dimensioning
• FLP–EPN network (two-layer switch design, non-blocking configuration)
• Ad-hoc programs to dimension and optimize the network (I. Legrand)
[Plots: number of switches and number of ports for the core switches vs. number of ports for the edge switches, and maximum number of connected nodes for a two-layer system, for 24-, 32-, 36- and 48-port switches]
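The non-blocking two-layer limit can be cross-checked with standard leaf-spine arithmetic (a sketch independent of the ad-hoc programs above): each p-port edge switch dedicates p/2 ports to nodes and p/2 to uplinks, and at most p edge switches fit under the core layer, so the system tops out at p²/2 nodes.

```python
def max_nodes_two_layer(ports):
    """Maximum nodes in a non-blocking two-layer network of identical
    switches: ports // 2 node-facing ports per edge switch, and up to
    `ports` edge switches under the core layer -> ports^2 / 2 nodes."""
    return ports * (ports // 2)

for p in (24, 32, 36, 48):
    print(f"{p}-port switches: up to {max_nodes_two_layer(p)} nodes")
# 24 -> 288, 32 -> 512, 36 -> 648, 48 -> 1152
```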
CWG3: Dataflow – Next steps
• Realistic data size and time distributions
• Dynamic changes in the network (e.g. H/W failure)
• Different technologies (e.g. Infiniband)
• Different data distribution algorithms (e.g. pull, traffic shaping)
• Investigate buffer usage in more detail
• Optimize network throughput, minimize overall cost
• Lab verification:
– Small-scale setup to verify simulation results
– HLT development cluster available for prototyping/tests
– ~70 nodes with ~1000 cores, 1 Gb/s Ethernet, Infiniband QDR
CWG4 – Data model: Time frame based data model
• The group proposes a time-frame-based data model to:
– Formalize the access to data types produced by both detector FEE and data processing stages by prepending a generic Multiple Data Header (MDH)
– Provide strict memory management while minimizing the need for copying data for processing purposes (data service instead of "copy around")
– Use efficient data layouts allowing fast navigation among data types and sources and usage of the data by vectorized algorithms
• Ongoing investigation and prototyping of efficient AOD formats
– Flat vs. hierarchical object structures and the impact on processing speed and data compression
– Investigation of I/O and compression and of the output of synchronous reconstruction, to be discussed with CWG7 (reconstruction)
• Future work: integration, simulation and benchmark
– Realistic raw time frame simulation (CWG8) + time frame aggregation (CWG4) + FLP-to-EPN flow (CWG3) + concurrency model and platforms (CWG5), down to EPN reconstruction
CWG4 – Data model: The new generic data block
• All data blocks produced by either FEE cards or arbitrary processing tasks on the FLPs (e.g. cluster finding) are to be described as generic MDB blocks. An MDH is foreseen to point to several correlated "events" arriving asynchronously on different links of the same FLP.
• Processing of MDB blocks is transparent to the node type (FLP, EPN)
• EPNs will process MDB blocks but are not required to produce MDBs in turn; they produce the persistent event format instead.
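As a purely illustrative sketch of the idea (the actual MDH layout is not specified on these slides; every field name and width below is an assumption), a generic header prepended to each data block could look like:

```python
import struct

# Hypothetical sketch of a generic "Multiple Data Header" (MDH) prepended
# to every data block (MDB). Field names and widths are illustrative
# assumptions -- the slides only say that a generic header describes blocks
# from both FEE and processing stages.
MDH_FORMAT = "<IIQQ"  # block size, origin/type id, heartbeat id, link mask

def make_mdb(payload: bytes, origin: int, heartbeat: int, links: int) -> bytes:
    """Prepend an MDH to a payload, producing a self-describing MDB block."""
    header = struct.pack(MDH_FORMAT, len(payload), origin, heartbeat, links)
    return header + payload

def read_mdh(block: bytes):
    """Decode the header fields without touching the payload."""
    return struct.unpack_from(MDH_FORMAT, block)

blk = make_mdb(b"\x00" * 64, origin=0x545043, heartbeat=42, links=0b1011)
size, origin, hb, links = read_mdh(blk)
print(size, hex(origin), hb, bin(links))  # 64 0x545043 42 0b1011
```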
CWG4 – Data model: The time frame data
• Time frames start and end with O2 "heartbeat" MDHs (events) and embed all data blocks collected by a given FLP. The corresponding frames will have to be aggregated on an EPN node in a folder-like structure that is easy to browse by reconstruction algorithms. The fast (synchronous) persistent reconstruction format will have to achieve the required overall compression.
• Note that the HBE summary may be attached to the "end HBE" to allow for asynchronous dispatching of blocks before the frame is fully aggregated by the FLP
CWG6: Calibration
• Minimum requirement for online calibration: safe data reduction
– Factor ~20 for TPC, relying on (first-level) calibration and (standalone or global) reconstruction
• Calibration processes should deal with the new data flow
– FLPs: see all events, for part of a detector
– EPNs: see only time frames, for all detectors
• Identification (detector by detector) of the procedures to be run on the FLPs and/or EPNs is ongoing
– Calibration input
– CPU requirements
– Memory requirements
– Detector interdependencies
– Statistics (including handling of the time frame data format)
CWG6: Calibration
• Calibrations produced asynchronously (i.e. by a process external to data taking) could be needed, especially where high statistics are required
– To be used at analysis level
• Evaluation of different scenarios: online (output needed as the data come), quasi-online (data processed with some delay with respect to data taking, but output available by the beginning of the next fill) and offline (only data reduction performed online)
– Includes evaluating the possibility of reserving dedicated machines for calibrations for which fast feedback is needed
• Equivalent of the OCDB to be defined (together with CWG3)
– Time-dependent calibrations (following time frames)
– Synchronization
CWG7: Reconstruction
• Different scenarios of the reconstruction flow (depending on the speed of the different processes):
– minimal – to ensure data storage
– maximal – physics-analysis-grade reconstruction
At the moment only rough estimates of the timing of some components are possible
• ITS
– Finalizing code for the detailed implementation of the upgrade geometry; global tracking adapted
– Work on the implementation of the ITS standalone tracker
– Preliminary scheme for clusterization and cluster data compression, to be finalized once the pixel chip architecture is defined
• TPC
– Schematic understanding of the reconstruction and calibration process
• TRD
– Interdependencies with the TPC are understood; need to verify the reliability of online tracklets
The status of calibration and reconstruction is summarized in the CWG6/7 joint "Conceptual Design Note", to be converted into a chapter of the TDR
Calibration/reconstruction flow
[Diagram: four-step calibration/reconstruction flow. Step 1 (all FLPs): raw data → clusterization and calibration. Step 2 (one EPN): TPC track finding; ITS standalone track finding/fitting and vertexing; MFT and MUON standalone track finding/fitting; TRD-seeded track finding and matching with TPC; compressed data storage. Step 3: final TPC calibration (constrained by ITS, TRD); TPC-ITS matching; matching to TOF, HMPID and calorimeters; MUON/MFT matching. Step 4: final ITS-TPC matching with outward refitting; global track inward fitting; V0 and cascade finding; event building (vertex, track, trigger association); AOD storage. Calibration inputs: MC reference TPC map adjusted for the current luminosity; average TPC map rescaled with the FIT multiplicity; PID calibrations; DCS data.]
The exact partitioning of some components between real-time, quasi-online and offline processing depends on the (as yet unknown) CPU performance of each component
Closer look at ITS-TPC-TRD – Most critical problem: TPC SCD calibration
• Current understanding of the TPC-TRD dependency:
– TRD T0 calibration is enough for TRD track finding with optimal position resolution; this calibration is performed on the FLPs using the position of the pulse start
– TPC standalone tracking with a rescaled "average" SCD map correction is sufficient for seeded track finding in the TRD
– Constraints from TRD and ITS tracks matched to the TPC are enough for the SCD fluctuation calibration (at ~200 Hz rate)
– TRD Vdrift and ExB calibration (used for PID only) is done using the finally refitted TPC (+ITS) tracks
[Diagram: raw data → TRD T0 calibration (all FLPs); TPC vdrift + track finding; ITS standalone track finding/fitting and vertexing; TRD track finding with online tracklets and seeding from TPC; final TPC calibration of SCD fluctuations (constrained by ITS, TRD) and TPC-ITS matching; TRD vdrift and ExB calibration]
CWG8: Physics Simulation
• Geant4 v10 physics validation
– Central production; validation by physics observables
– Tests with multi-threading
• Short term (within a few months)
– Performance tests with Geant4 VMC 3.00 + ALICE geometry
• Long term (next 1-2 years)
– MT tests with AliRoot; requires migration of the AliRoot VMC application
• Fast simulation framework
– Full and parameterized simulation; first prototype in autumn 2014
CWG8: Geant4 VMC 3.00
• First Geant4 VMC version providing support for the Geant4 multi-threading mode
– Beta version (3.00.b01) released on 14 March 2014
– By I. Hrivnacova, IPNO (CNRS/IN2P3, Univ. Paris-Sud), with the participation of A. Gheata, CERN (migration of G4ROOT)
• Single source code for both sequential and multi-threading modes
– VMC applications which were not migrated to MT can be built and run with the same Geant4 VMC as migrated applications
• MT mode is activated automatically when Geant4 VMC is built against the Geant4 MT libraries
• All (5) VMC examples were migrated to MT and can be run in this mode both with Geant4 native and ROOT navigation
CWG8: Geant4 VMC 3.00 (2)
• A new set of classes for ROOT I/O management, which takes care of locking critical operations (registering ROOT objects to trees etc.), is introduced in the new mtroot package: http://root.cern.ch/drupal/content/mtroot
• The instructions for migrating VMC applications to MT are available from the VMC Web site: http://root.cern.ch/drupal/content/multi-threaded-processing
• Besides MT, VMC application main programs have been added, together with CMake configuration files, which allow running VMC without dynamic loading of libraries
– This allowed evaluating the performance penalty due to dynamic loading of shared libraries in the VMC tests
– The penalty of dynamic vs. static loading of shared libraries was ~12% in sequential and ~22% in multi-threading mode
CWG9 – QA, DQM & Visualization: Status & plans
• Run 2
– Event display review and refactoring
• Status: ongoing
• Responsibility being transferred to the Warsaw group
• Meetings and demo of the new architecture + collaboration with HLT on a new communication protocol
• PhD student from Warsaw to join in April
– Proposal for the online reconstruction and calibration
• Status: started, on hold
• Preliminary architecture
CWG9 – QA, DQM & Visualization: Status and plans
• Run 3
– System requirements and system functionalities document
• Status: done
– Detector needs survey
• Status: ongoing, almost finished
– Definition of the future architecture and design
• Status: ongoing
– Prototypes and feasibility tests
• Status: to be done in the near future
– Writing of the Technical Design Report
• Status: ongoing
CWG10 – Control, Configuration and Monitoring
• Activities started in April 2013
• Software Requirements Specifications– https://twiki.cern.ch/twiki/pub/ALICE/Cwg10/CWG10SoftwareRequirementsSpecifications.pdf
• Requirements (Number of processes)– https://twiki.cern.ch/twiki/pub/ALICE/Cwg10/NumberOfProcessesEstimate.pdf
• Ongoing activities– Writing TDR content: chapter 4
• First draft almost finished
• Future plans– Continue writing TDR (Chapters 5 and 6)– Prototypes for key performance requirements
• Number of control commands, monitoring data volume, configuration distribution
CWG10 – Control, Configuration and Monitoring: Roles hierarchy
[Diagram: roles hierarchy]
CWG5: Computing Platforms – The conversion factors
• Speedup of CPU multithreading:
– A task takes n1 seconds on 1 core and n2 seconds on x cores. The speedup is n1/n2 for x cores; the factors are n1/n2 and x/1.
– With hyperthreading: n2' seconds with x' threads on x cores (x' >= 2x). This will not scale linearly, but is needed to compare against the full CPU performance.
• The factors are n1/n2' and x/1 (be careful: not x'/1, since we still use only x cores)
• Speedup of GPU vs. CPU:
– Should take into account the full CPU power (i.e. all cores, hyperthreading)
– A task on the GPU might also need CPU resources; assume this occupies y CPU cores
– The task takes n3 seconds on the GPU; the speedup is n2'/n3 and the factors are n2'/n3 and y/x (again x, not x')
• How many CPU cores does the GPU save:
– Compare to y CPU cores, since the GPU needs that many CPU resources
– The speedup is n1/n3; the GPU saves n1/n3 - y CPU cores. The factors are n1/n3, y/1, and n1/n3 - y
• Benchmarks: track finder, track fit, DGEMM (matrix multiplication – synthetic)
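Applied to the numbers on the benchmark slide that follows, this bookkeeping reproduces the quoted factors (a small script, using the Westmere and GTX580 entries as examples):

```python
def cpu_factors(n1, n2p, x):
    """Speedup bookkeeping from the slide: n1 = 1-thread time, n2p = time
    with hyperthreading (x' threads on x physical cores). The factor pair
    is (n1/n2p, x): compare against x cores, not x' threads."""
    return n1 / n2p, x

def gpu_factors(n1, n2p, n3, y):
    """GPU vs full CPU: speedup n2p/n3 at a 'cost' of y CPU cores; the GPU
    saves n1/n3 - y single-core equivalents."""
    return n2p / n3, n1 / n3 - y

# Westmere 6-core: 4735 ms on 1 thread, 506 ms with 12 threads on 6 cores
s, x = cpu_factors(4735, 506, 6)
print(f"{s:.2f} / {x}")  # 9.36 / 6, matching the slide

# GTX580 vs dual Sandy Bridge: 4526 ms 1-thread, 320 ms full CPU,
# 174 ms on the GPU using y = 3 CPU cores
g, saved = gpu_factors(4526, 320, 174, 3)
print(f"{g:.2f}, saves ~{saved:.0f} cores")  # 1.84, saves ~23 cores
```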
CWG5: Computing Platforms – Track finder

Westmere 6-core, 3.6 GHz                           Factors
  1 thread                            4735 ms
  6 threads                            853 ms      5.55 / 6
  12 threads (x = 6, x' = 12)          506 ms      9.36 / 6

Nehalem 4-core, 3.6 GHz (smaller event than the others)
  1 thread                            3921 ms
  4 threads                           1039 ms      3.77 / 4
  12 threads (x = 4, x' = 12)          816 ms      4.80 / 4

Dual Sandy Bridge, 2 x 8-core, 2 GHz
  1 thread                            4526 ms
  16 threads                           403 ms      11.1 / 16
  36 threads (x = 16, x' = 36)         320 ms      14.1 / 16

Dual AMD Magny-Cours, 2 x 12-core, 2.1 GHz
  36 threads (x = 24, x' = 36)         495 ms

3 CPU cores + GPU – all compared to the Sandy Bridge system
                                             Factor vs x' (full CPU)   Factor vs 1 (1 CPU core)
  GTX580                            174 ms   1.8 / 0.19                26 / 3 / 23
  GTX780                            151 ms   2.11 / 0.19               30 / 3 / 27
  Titan                             143 ms   2.38 / 0.19               32 / 3 / 29
  S9000                             160 ms   2.0 / 0.19                28 / 3 / 25
  S10000 (dual GPU, 6 CPU cores)     85 ms   3.79 / 0.38               54 / 6 / 48
CWG12 – Computing Hardware: FLP I/O bandwidth (H. Engel)
• I/O bus performance
• PCIe Gen2 x8
– Using the C-RORC as data generator
– ASUS ESC4000: > 3 GB/s per slot
– With TPC tracking code running on the GPUs: total I/O of 17 GB/s
• PCIe Gen3 x8
– Xilinx Virtex-7 XC7VX330T as data generator
– Supermicro X9SRE-F: ~5-6 GB/s per slot
• The current generation of I/O buses could be used for the upgrade
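For context, the measured per-slot figures can be compared with the theoretical per-direction PCIe bandwidth (standard bus parameters, not taken from the slides):

```python
# Theoretical per-direction PCIe x8 bandwidth vs. the measured figures above.
# Gen2: 5 GT/s per lane, 8b/10b encoding  -> 0.5 GB/s per lane.
# Gen3: 8 GT/s per lane, 128b/130b encoding -> ~0.985 GB/s per lane.
def pcie_x8_gbs(gen):
    rate_gt, efficiency = {2: (5.0, 8 / 10), 3: (8.0, 128 / 130)}[gen]
    return 8 * rate_gt * efficiency / 8  # lanes * GT/s * encoding / 8 bits/byte

print(f"Gen2 x8: {pcie_x8_gbs(2):.1f} GB/s theoretical, >3 GB/s measured")
print(f"Gen3 x8: {pcie_x8_gbs(3):.1f} GB/s theoretical, ~5-6 GB/s measured")
```

The measured throughputs sit plausibly below the theoretical maxima once protocol overhead is accounted for, supporting the conclusion that the current bus generation suffices.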
CWG13 – Software Framework Development
• CWG13 is just starting
• Design and development of a new, modern framework targeting Run 3
• Should work in both the offline and online environments
– Has to comply with the O2 requirements and architecture
• Based on new technologies
– ROOT 6.x, C++11
• Optimized for I/O
– New data model
• Capable of utilizing hardware accelerators
– FPGA, GPU, MIC…
• Support for concurrency in a heterogeneous and distributed environment
• Will be based on ALFA, a common software foundation developed jointly by ALICE & GSI/FAIR
[Diagram: the ALFA common software foundations underpin the O2 software framework, FairRoot, PandaRoot and CbmRoot]
CWG13 – Software Framework Development: ALICE + FAIR = ALFA
• Expected benefits
– Development cost optimization
– Better coverage and testing of the code
– Documentation, training and examples
– For ALICE: work already performed by the FairRoot team on features (e.g. the continuous read-out) which are part of the ongoing FairRoot development
– For the FAIR experiments: ALFA could be tested with real data and existing detectors before the start of the FAIR facility
• The proposed architecture will rely on a data-flow based model
O2 TDR Editorial Committee
• Members
– Latchezar Betev
– Predrag Buncic
– Sylvain Chapeland
– Frank Cliff
– Peter Hristov
– Thorsten Kollegger
– Ken Read
– Jochen Thaeder
– Barth von Haller
– Pierre Vande Vyvre
• Physics requirements chapter: Andrea Dainese
• General structure, ToC and tools defined
• Next meetings
– 1 April
– 5-7 May: TDR working days
• All the WGs are working on their respective sections of the TDR
O2 Project – Institutes
• Institutes
– FIAS, Frankfurt, Germany
– IIT, Mumbai, India
– Jammu University, Jammu, India
– IPNO, Orsay, France
– IRI, Frankfurt, Germany
– Rudjer Bošković Institute, Zagreb, Croatia
– SUP, Sao Paulo, Brazil
– University of Technology, Warsaw, Poland
– Wigner Institute, Budapest, Hungary
– CERN, Geneva, Switzerland
• Looking for more groups and people
– Need people with computing skills and from the detector groups
• Active interest from
– Creighton University, Omaha, US
– KISTI, Daejeon, Korea
– KTO Karatay University, Turkey
– Lawrence Berkeley National Lab., US
– LIPI, Bandung, Indonesia
– Oak Ridge National Laboratory, US
– University of Houston, US
– University of Texas, US
– Wayne State University, US
– King Mongkut's University of Technology Thonburi, Bangkok, Thailand
• Active interest from– Creighton University, Omaha, US– KISTI, Daejeon, Korea– KTO Karatay University, Turkey– Lawrence Berkeley National Lab., US– LIPI, Bandung, Indonesia– Oak Ridge National Laboratory, US– University of Houston, US– University of Texas, US– Wayne State University, US– King Mongkut's University of Technology Thonburi, Bangkok, Thailand