EPICS Archiving Appliance Test at ESS
J. Bobnar, S.Gysin
www.europeanspallationsource.se
November 25, 2014
Goal
• Assess the feasibility of the EPICS Archiver Appliance (AA) for the European Spallation Source.
  – Measure performance and compare to requirements
  – Propose new features for the services
http://epicsarchiverap.sourceforge.net/
Requirements: Capacity Planning
Description                                  # records  records archived  bytes/record  record/sec  bytes/sec  GB/day
Rack estimation (ESS Bilbao Ion source)         28,400             2,840          14.3        1.00     40,612    3.31
SNS (BEAUtY)                                   340,000            85,000            30        0.02     52,298    4.21
FRIB (estimates)                               200,000           200,000             8        0.20    320,000      26
SLAC: Archive appliance test: test-arch              -           102,255            30        0.03     80,406    6.47
Jaka: Medical Accelerator (BEAUtY)             150,000           150,000            30        0.22    994,205      80
LHC logging (MDB)                                    -                 -             -           -  3,625,990     292

For ESS we decided to double the capacity of SNS:

Description    # records  records archived  bytes/record  record/sec  GB/day
SNS              340,000            85,000            30        0.02    4.21
ESS (2x SNS)     680,000           170,000            30        0.02    8.42
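The table columns are related by simple arithmetic, sketched below. The function names are illustrative, and reading "GB" as 2**30 bytes is an assumption (it reproduces most of the GB/day figures in the table).

```python
# Sketch of the capacity-planning arithmetic behind the table.
# Assumption: "GB" in the table means 2**30 bytes.
GIB = 2**30
SECONDS_PER_DAY = 86_400


def bytes_per_second(records_archived, bytes_per_record, samples_per_sec):
    """Average archive rate in bytes per second."""
    return records_archived * bytes_per_record * samples_per_sec


def gb_per_day(bytes_per_sec):
    """Daily volume in GB for a sustained byte rate."""
    return bytes_per_sec * SECONDS_PER_DAY / GIB


# FRIB row: 200,000 archived records, 8 bytes/record, 0.20 samples/s each.
frib_rate = bytes_per_second(200_000, 8, 0.20)
print(round(frib_rate), round(gb_per_day(frib_rate), 1))

# SNS row uses the tabulated 52,298 bytes/s; ESS is planned at twice SNS.
sns_daily = gb_per_day(52_298)
print(round(sns_daily, 2), round(2 * sns_daily, 2))
```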
But … there will be spikes in the rate at which data is archived
– Waveforms are significantly larger (~5 kB/record)
– Post Mortem buffers:
  • ~15 GB/beam stop
  • 1 beam stop/hour = 24 beam stops/day = 360 GB/day (commissioning)
– Data on demand
  • 10 events/day
  • 1000 channels
  • ~2 MB per channel per event = 20 GB/day
– EPICS V4 data types
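The two spike estimates above can be reproduced as follows. This is only a check of the arithmetic; the variable names are illustrative and 1 GB = 1000 MB is assumed.

```python
# Post Mortem buffers during commissioning: ~15 GB per beam stop, one stop per hour.
gb_per_beam_stop = 15
beam_stops_per_day = 24
post_mortem_gb_per_day = gb_per_beam_stop * beam_stops_per_day

# Data on demand: 10 events/day, 1000 channels, ~2 MB per channel per event.
events_per_day = 10
channels = 1000
mb_per_channel_per_event = 2
# Assumption: 1 GB = 1000 MB.
on_demand_gb_per_day = events_per_day * channels * mb_per_channel_per_event / 1000

print(post_mortem_gb_per_day, on_demand_gb_per_day)  # 360 20.0
```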
Short, Medium and Long term archiving
• Examples:
  – SLAC Archiver Appliance: 1 hour, 1 day, 1 year
  – FRIB planned: 1 week, 1 month, forever
  – LHC (Timber Logging System): MDB 7 days, LDB > 20 years
  – SNS Archiving Service: no division
  – DESY: 1 month, forever
• ESS requirements:
  – Short term: 10 days (8.4 GB/day)
  – Medium term: 100 days (20% of short term = 1.9 GB/day)
  – Long term: forever (20% of medium term = 0.19 GB/day)
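A rough sizing of each storage tier follows from retention time multiplied by daily volume. The sketch below takes the daily rates exactly as stated on the slide. The tier names and the per-year figure for the unbounded long-term tier are illustrative additions.

```python
# Retention (days) and daily volume (GB/day) as stated for ESS.
tiers = {
    "short": {"days": 10, "gb_per_day": 8.4},
    "medium": {"days": 100, "gb_per_day": 1.9},
}

# Steady-state size of a tier = retention period * daily volume.
tier_size_gb = {name: t["days"] * t["gb_per_day"] for name, t in tiers.items()}

# Long term is kept forever, so only a growth rate can be given.
long_term_gb_per_year = 0.19 * 365

for name, size in tier_size_gb.items():
    print(f"{name} term: {size:.0f} GB")
print(f"long term: {long_term_gb_per_year:.1f} GB/year")
```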
Rate of retrieval
• Depends on
  – The archive rate
  – Reduction algorithm
  – Number of clients simultaneously reading data
  – Hardware
• Retrieval from short term storage
  – Not slower than 1000 points/sec
Test setup
• 2 dedicated machines on a dedicated network, both running the CODAC version of Scientific Linux 4.3
1. Archive Appliance computer:
  • Intel Xeon 8 core (16 threads) CPU
  • 16 GB RAM
  • Solid State Drive
    – Performance: ~240 MB/s for reading (random) and ~280 MB/s for writing (sequential)
2. ESS Control Box with IOC
  • 30000 scalar double-type PVs
  • 200 waveform (aSub) long-type PVs of length 1000
  • Both at 10 Hz
• Units: “number of samples per second”, N/s = number of PVs × 10 Hz
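Applied to the IOC described above, the N/s unit gives the maximum loads used in the tests. A minimal sketch, with an illustrative function name:

```python
UPDATE_RATE_HZ = 10  # both PV groups update at 10 Hz


def samples_per_second(number_of_pvs, rate_hz=UPDATE_RATE_HZ):
    """N/s = number of PVs * update rate."""
    return number_of_pvs * rate_hz


scalar_load = samples_per_second(30_000)  # all scalar PVs archived
waveform_load = samples_per_second(200)   # all waveform PVs archived
print(scalar_load, waveform_load)  # 300000 2000
```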
Test results: Scalars, JVM needs optimal setup
• Adaptive heap memory (-Xms < -Xmx)
  – 20,000 N/s -> all is well
  – 30,000 N/s -> event drop rate 0.04%
  – > 30,000 N/s -> higher drop rate
  – Performance degrader: management of the Java heap memory size by the virtual machine (CPU was at 100% all the time)
• Fixed heap size (8 GB for the engine):
  – 100,000 N/s without a problem
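A fixed heap means setting the initial size (-Xms) equal to the maximum size (-Xmx), so the JVM never resizes the heap. The helper below is a hypothetical sketch that builds those two standard JVM options. How they are passed to the engine (for example through the servlet container's Java options) depends on the deployment and is not taken from the slides.

```python
def fixed_heap_flags(heap_gb):
    """Return JVM options that pin the heap: -Xms equals -Xmx."""
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


# 8 GB fixed heap, as used for the engine in the test.
print(" ".join(fixed_heap_flags(8)))  # -Xms8g -Xmx8g
```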
Test results: Scalars
• Saving 10 seconds' worth of data (1M samples): 2 seconds
• With ETL running (transfer between short and medium term storage)
  – Between 8 and 11 seconds
  – Probable cause: the same physical drive was used for the short and medium term storage
Test results: Scalars
• Increased the sampling rate to 300,000 N/s
• Saving 10 seconds' worth of data (3M samples): between 3.5 and 4 seconds
• However:
  – Event drops at start-up
  – With ETL running, time increased by an order of magnitude, and the drop rate was very high
    • CPU time remained the same
    • IO seems to be the bottleneck
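One way to read the scalar save times is as a write throughput and as the fraction of each 10-second window spent writing. The sketch below uses only the figures reported on the scalar slides; the function names are illustrative. A fraction near or above 1 means the writer can no longer keep up, which would be consistent with the drops seen while ETL runs.

```python
WINDOW_S = 10  # each save covers 10 seconds' worth of data


def write_throughput(samples, save_seconds):
    """Samples written per second of save time."""
    return samples / save_seconds


def busy_fraction(save_seconds, window_s=WINDOW_S):
    """Share of the collection window spent saving."""
    return save_seconds / window_s


# 100,000 N/s: 1M samples saved in 2 s; 8 to 11 s with ETL running.
print(write_throughput(1_000_000, 2), busy_fraction(2))
print(busy_fraction(8), busy_fraction(11))

# 300,000 N/s: 3M samples saved in 3.5 to 4 s.
print(write_throughput(3_000_000, 4), busy_fraction(4))
```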
Test results: waveforms
• 200 PVs of length 1000 at 10 Hz
  – 2000 N/s, 1 N ≈ 8 kB
• Saving 10 seconds' worth of data
  – Between 200 and 300 milliseconds
• When ETL was running the time increased to 1 sec
• Archiving the same amount of data in a waveform is 15 times faster than in scalar PVs -> the number of PVs matters.
Test results: rate of retrieval scalars
• Data stored:
  – 100,000 N/s
  – 8 hours, 54 GB
  – Short term: 2 files for the last hour
  – Medium term: 1 file for the rest
• Retrieval rate:
  – Short intervals (minutes; fewer than 800 data points available)
    • 100 to 150 ms
  – Longer intervals (hours; more than 800 data points available)
    • 200 to 400 ms
  – Even longer intervals (1 day, 2 days)
    • 700 to 800 ms and ~1500 ms, respectively
  – No problems with large number of PVs (file fragmentation)
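The stored volume gives a rough on-disk cost per scalar sample. The sketch below assumes the 54 GB were produced entirely by 8 hours at a constant 100,000 N/s, and that 1 GB = 10**9 bytes. Neither assumption is stated on the slide.

```python
rate_n_per_s = 100_000
hours = 8
stored_bytes = 54 * 10**9  # assumption: decimal gigabytes

total_samples = rate_n_per_s * hours * 3600
bytes_per_sample = stored_bytes / total_samples

print(total_samples, bytes_per_sample)
```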
Test results: rate of retrieval waveforms
• Retrieval rate:
  – 1 hour interval (reduction: 36,000 -> 800 samples)
    • ~3500 ms
    • Every additional hour adds approximately 3000 ms
  – 1 day interval (reduction: 864,000 -> 800 samples)
    • > 1 min
• Room for improvement in the reduction algorithm and in the client
More tests planned with a longer acquisition period.
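The waveform retrieval figures fit a simple linear model: about 3500 ms for the first hour plus about 3000 ms per additional hour. The sketch below extrapolates that model to one day and also computes the reduction factors. The linear extrapolation is an assumption, not a measured result.

```python
SAMPLE_RATE_HZ = 10
TARGET_POINTS = 800  # number of points returned after reduction


def raw_samples(hours):
    """Samples per PV stored for the requested interval."""
    return hours * 3600 * SAMPLE_RATE_HZ


def reduction_factor(hours):
    """How many raw samples are collapsed into one returned point."""
    return raw_samples(hours) / TARGET_POINTS


def estimated_retrieval_ms(hours):
    """Linear model: ~3500 ms for the first hour, ~3000 ms per extra hour."""
    return 3500 + 3000 * (hours - 1)


print(raw_samples(1), reduction_factor(1), estimated_retrieval_ms(1))
print(raw_samples(24), reduction_factor(24), estimated_retrieval_ms(24))
```

Under this model a 24-hour request takes roughly 72 seconds, which agrees with the reported "> 1 min".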
Conclusion
• SNS archives 0.02 samples per second per PV. At 80,000 archived PVs that means 1600 N/s.
• One EPICS Archiver Appliance can archive 100,000 N/s, which is about 60 times more.
• To reduce retrieval time we recommend running several instances of AA and distributing the PVs among them.
• The retrieval rate (for scalars) is good and meets the requirements:
  – for the most common time intervals (i.e. 1 day or less): < 1 second.
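The headroom quoted above follows directly from the two rates. A minimal check, with illustrative variable names:

```python
sns_archived_pvs = 80_000
sns_samples_per_pv_per_s = 0.02
appliance_capacity_n_per_s = 100_000  # measured with a fixed 8 GB heap

sns_load_n_per_s = sns_archived_pvs * sns_samples_per_pv_per_s
headroom = appliance_capacity_n_per_s / sns_load_n_per_s

print(round(sns_load_n_per_s), round(headroom, 1))  # 1600 62.5
```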
We also have a list of recommendations for AA and for the AA users, to be published after completion of the tests.