EPICS Archiving Appliance Test at ESS J. Bobnar, S.Gysin www.europeanspallationsource.se November 25, 2014




Goal

• Assess the feasibility of the EPICS Archive Appliance (AA) for the European Spallation Source.
  – Measure performance and compare to requirements
  – Propose new features for the services

http://epicsarchiverap.sourceforge.net/


Requirements: Capacity Planning

Description                             | # records | records archived | bytes/record | record/sec | bytes/sec | GB/day
Rack estimation (ESS Bilbao Ion source) |    28,400 |            2,840 |         14.3 |       1.00 |    40,612 |   3.31
SNS (BEAUtY)                            |   340,000 |           85,000 |           30 |       0.02 |    52,298 |   4.21
FRIB (estimates)                        |   200,000 |          200,000 |            8 |       0.20 |   320,000 |     26
SLAC: Archive appliance test: test-arch |         - |          102,255 |           30 |       0.03 |    80,406 |   6.47
Jaka: Medical Accelerator (BEAUtY)      |   150,000 |          150,000 |           30 |       0.22 |   994,205 |     80
LHC logging (MDB)                       |         - |                - |            - |          - | 3,625,990 |    292

For ESS we decided to double the capacity of SNS:

Description  | # records | records archived | bytes/record | record/sec | GB/day
SNS          |   340,000 |           85,000 |           30 |       0.02 |   4.21
ESS (2x SNS) |   680,000 |          170,000 |           30 |       0.02 |   8.42
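
The capacity figures can be rechecked with a short calculation. This is a minimal sketch, assuming that bytes/sec is the product of records archived, bytes/record and record/sec, and that the GB/day column counts units of 2^30 bytes. That unit is inferred from the SNS row, not stated on the slide, and the function names are illustrative.

```python
GIB = 2**30  # assumption: GB/day on the slide means 2**30 bytes
SECONDS_PER_DAY = 86_400


def bytes_per_second(records_archived, bytes_per_record, records_per_second):
    """Average archive data rate in bytes per second."""
    return records_archived * bytes_per_record * records_per_second


def gb_per_day(bytes_per_sec):
    """Daily volume for a sustained byte rate, in units of 2**30 bytes."""
    return bytes_per_sec * SECONDS_PER_DAY / GIB


# FRIB (estimates): 200,000 records archived, 8 bytes/record, 0.20 record/sec
frib_rate = bytes_per_second(200_000, 8, 0.20)
print(round(frib_rate), round(gb_per_day(frib_rate)))  # 320000 26

# SNS: byte rate as listed in the table (52,298 bytes/sec)
sns_gb_per_day = gb_per_day(52_298)
print(round(sns_gb_per_day, 2))  # 4.21

# ESS: twice the SNS volume
ess_gb_per_day = 2 * sns_gb_per_day
print(round(ess_gb_per_day, 2))  # 8.42
```

The record/sec column is rounded on the slide, so recomputing bytes/sec from the rounded values will not reproduce every row exactly.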


But … there will be spikes in the rate at which data is archived

– Waveforms are significantly larger (~5 kB/record)
– Post Mortem buffers:
  • ~15 GB/beam stop
  • 1 beam stop/hour = 24 beam stops/day = 360 GB/day (commissioning)
– Data on demand
  • 10 events/day
  • 1000 channels
  • ~2 MB per channel per event = 20 GB/day
– EPICS V4 data types
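
The burst volumes follow from simple multiplication. A minimal sketch using only the slide's numbers, assuming 1 GB = 1000 MB for the data-on-demand conversion; the variable names are illustrative:

```python
# Post Mortem buffers during commissioning
BEAM_STOP_GB = 15         # ~15 GB per beam stop
BEAM_STOPS_PER_DAY = 24   # 1 beam stop per hour

# Data on demand
EVENTS_PER_DAY = 10
CHANNELS = 1000
MB_PER_CHANNEL_EVENT = 2  # ~2 MB per channel per event

post_mortem_gb_per_day = BEAM_STOP_GB * BEAM_STOPS_PER_DAY
# Assumption: decimal conversion, 1 GB = 1000 MB
on_demand_gb_per_day = EVENTS_PER_DAY * CHANNELS * MB_PER_CHANNEL_EVENT / 1000

print(post_mortem_gb_per_day)  # 360
print(on_demand_gb_per_day)    # 20.0
```

Both figures are well above the 8.42 GB/day baseline, which is why they are treated as spikes and not as part of the steady archive rate.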


Short, Medium and Long term archiving

• Examples:
  – SLAC Archiver Appliance: 1 hour, 1 day, 1 year
  – FRIB planned: 1 week, 1 month, forever
  – LHC – Timber Logging System: MDB: 7 days, LDB > 20 years
  – SNS Archiving Service: no division
  – DESY: 1 month, forever

• ESS requirements:
  – Short term: 10 days (8.4 GB/day)
  – Medium term: 100 days (20% of short term = 1.9 GB/day)
  – Long term: forever (20% of medium term = 0.19 GB/day)
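
A rough storage size per tier can be derived from the retention periods. This is a sketch only: it assumes each tier holds its retention period multiplied by its daily rate at steady state, and it uses the per-tier GB/day values as stated on the slide instead of recomputing them from the 20% ratios.

```python
# (retention in days, GB/day) as stated on the slide
TIERS = {
    "short": (10, 8.4),
    "medium": (100, 1.9),
}
LONG_TERM_GB_PER_DAY = 0.19  # kept forever, so this tier only grows


def tier_capacity_gb(days, gb_per_day):
    """Steady-state size of a tier that keeps `days` of data."""
    return days * gb_per_day


capacities = {name: tier_capacity_gb(*spec) for name, spec in TIERS.items()}
long_term_growth_gb_per_year = LONG_TERM_GB_PER_DAY * 365

for name, gb in capacities.items():
    print(f"{name} term: about {gb:.0f} GB")
print(f"long term: about {long_term_growth_gb_per_year:.1f} GB added per year")
```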


Rate of retrieval

• Depends on
  – The archive rate
  – Reduction algorithm
  – Number of clients simultaneously reading data
  – Hardware

• Retrieval from short term storage
  – Not slower than 1000 points/sec


Test setup

• 2 dedicated machines on a dedicated network, both running the CODAC version of Scientific Linux 4.3

1. Archive Appliance computer:
  • Intel Xeon 8 core (16 threads) CPU
  • 16 GB RAM
  • Solid State Drive
    – Performance: ~240 MB/s for reading (random) and ~280 MB/s for writing (sequential)

2. ESS Control Box with IOC
  • 30000 scalar double-type PVs
  • 200 waveform (aSub) long-type PVs of length 1000
  • Both at 10 Hz

• Units: “number of samples per second” – N/s = number of PVs * 10 Hz
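
The N/s unit is used throughout the results, so a small helper makes the conversion explicit. A minimal sketch; the function name is illustrative and the 10 Hz default comes from the test setup:

```python
SAMPLE_RATE_HZ = 10  # every test PV updates at 10 Hz


def samples_per_second(num_pvs, rate_hz=SAMPLE_RATE_HZ):
    """Archive load in N/s for a number of PVs updating at rate_hz."""
    return num_pvs * rate_hz


print(samples_per_second(30_000))  # 300000: all scalar PVs
print(samples_per_second(200))     # 2000: all waveform PVs
print(samples_per_second(10_000))  # 100000: a third of the scalar PVs
```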


Test results: Scalars, JVM needs optimal setup

• Adaptive heap memory (-Xms < -Xmx)
  – 20,000 N/s -> all is well
  – 30,000 N/s -> event drop rate 0.04%
  – > 30,000 N/s -> higher drop rate
  – Performance degrader: management of the Java heap memory size by the virtual machine (CPU was at 100% all the time)

• Fixed heap size (8 GB for the engine):
  – 100,000 N/s without a problem


Test results: Scalars

• Saving 10 seconds worth of data (1M samples)
  – 2 seconds

• With ETL running (transfer between short and medium term storage)
  – Between 8 and 11 seconds
  – Probable cause:
    • The same physical drive was used for the short and medium term storage


Test results: Scalars

• Increased the sampling rate to 300,000 N/s
• Saving 10 seconds worth of data (3M samples)
  – Between 3.5 and 4 seconds

• However:
  – Event drops at start up
  – With ETL running, time increased by an order of magnitude, and drop rate was very high
• CPU time remained the same
• IO seems to be the bottleneck
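
One way to read the save times is as a fraction of the 10-second window they cover. The sketch below assumes that a save time at or above the window length means storage cannot keep up with the incoming data. That is an interpretation of the measurements, not a statement from the slides, and the labels are illustrative.

```python
WINDOW_S = 10.0  # each measurement saves 10 seconds worth of data


def write_utilization(save_time_s, window_s=WINDOW_S):
    """Share of the window spent writing; 1.0 or more means falling behind."""
    return save_time_s / window_s


# Save times in seconds, taken from the scalar test results
measurements = {
    "1M samples, no ETL": 2.0,
    "1M samples, ETL running (best case)": 8.0,
    "1M samples, ETL running (worst case)": 11.0,
    "3M samples, no ETL (worst case)": 4.0,
}

for label, save_time in measurements.items():
    u = write_utilization(save_time)
    status = "keeps up" if u < 1 else "falls behind"
    print(f"{label}: {u:.0%} of the window ({status})")
```

Under this reading, an order-of-magnitude increase on the 3.5 to 4 second save time pushes the write well past the 10-second window, which fits the very high drop rate observed with ETL running.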


Test results: waveforms

• 200 PVs of length 1000 at 10 Hz
  – 2000 N/s, 1 N ≈ 8 kB

• Saving 10 seconds worth of data
  – Between 200 and 300 milliseconds

• When ETL was running the time increased to 1 sec
• Archiving the same amount of data in waveforms is 15 times faster than in scalar PVs -> the number of PVs matters.


Test results: rate of retrieval scalars

• Data stored:
  – 100,000 N/s
  – 8 hours, 54 GB
  – Short term: 2 files for the last hour
  – Medium term: 1 file for the rest

• Retrieval rate:
  – Short intervals (minutes; less than 800 data points available)
    • 100 – 150 ms
  – Longer intervals (hours; more than 800 data points available)
    • 200 – 400 ms
  – Even longer intervals (1 day, 2 days)
    • 700 – 800 ms, ~1500 ms
  – No problems with large number of PVs (file fragmentation)
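
The 800-point threshold above, together with the reduction algorithm listed earlier as a factor in retrieval rate, suggests that longer requests are reduced to roughly 800 points before they reach the client. The slides do not say which algorithm is used. The sketch below shows plain bin averaging as one possible approach; it is not the AA's actual implementation, and the function name is hypothetical.

```python
from statistics import fmean


def reduce_by_bin_mean(values, target_points=800):
    """Reduce samples to at most target_points by averaging equal-sized bins."""
    n = len(values)
    if n <= target_points:
        return list(values)  # nothing to reduce
    reduced = []
    for i in range(target_points):
        start = i * n // target_points
        end = (i + 1) * n // target_points
        reduced.append(fmean(values[start:end]))
    return reduced


# One hour of a single 10 Hz PV is 3600 s * 10 Hz = 36,000 samples
hour = [float(i) for i in range(36_000)]
reduced = reduce_by_bin_mean(hour)
print(len(reduced))  # 800
print(reduced[0])    # 22.0, the mean of samples 0..44
```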


Test results: rate of retrieval waveforms

• Retrieval rate:
  – 1 hour interval (reduction: 36000 -> 800 samples)
    • ~3500 ms
    • Every additional hour adds approximately 3000 ms
  – 1 day interval (reduction: 864000 -> 800 samples)
    • > 1 min

• Room for improvement in reduction algorithm and in the client

More tests planned with longer acquisition period.
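
The waveform retrieval times can be extrapolated with a simple linear model. This is a sketch assuming the cost stays at about 3000 ms per additional hour over the whole interval; the slides only give that figure as an approximation, and the function names are illustrative.

```python
FIRST_HOUR_MS = 3500  # ~3500 ms for a 1 hour interval
EXTRA_HOUR_MS = 3000  # approximately 3000 ms per additional hour
SAMPLE_RATE_HZ = 10


def raw_samples(hours, rate_hz=SAMPLE_RATE_HZ):
    """Samples per PV stored for the interval, before reduction to 800."""
    return hours * 3600 * rate_hz


def estimated_retrieval_ms(hours):
    """Linear estimate of waveform retrieval time for a whole number of hours."""
    if hours < 1:
        raise ValueError("the model starts at a 1 hour interval")
    return FIRST_HOUR_MS + (hours - 1) * EXTRA_HOUR_MS


print(raw_samples(1), estimated_retrieval_ms(1))    # 36000 3500
print(raw_samples(24), estimated_retrieval_ms(24))  # 864000 72500
```

The 24-hour estimate of about 72.5 seconds agrees with the measured "> 1 min" for a 1 day interval.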


Conclusion

• SNS archives 0.02 samples per second per PV. At 80,000 archived PVs that means 1600 N/s.

• One EPICS Archiver Appliance can archive 100,000 N/s, which is about 60 times more.

• To reduce retrieval time we recommend running several instances of the AA and distributing the PVs among them.

• The retrieval rate (for scalars) is good and meets the requirements:
  – for the most common time intervals (i.e. 1 day or less) < 1 second.

We also have a list of recommendations for the AA and for AA users, to be published after completion of the tests.
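
The headroom quoted in the conclusion can be checked directly. A minimal sketch using the figures from the conclusion; the variable names are illustrative:

```python
SNS_SAMPLES_PER_SEC_PER_PV = 0.02
SNS_ARCHIVED_PVS = 80_000
AA_TESTED_RATE = 100_000  # N/s archived without problems with a fixed heap

sns_rate = SNS_SAMPLES_PER_SEC_PER_PV * SNS_ARCHIVED_PVS
headroom = AA_TESTED_RATE / sns_rate

print(round(sns_rate))     # 1600
print(round(headroom, 1))  # 62.5
```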