
ATONR Database Tests – 17th Feb. Schedule and Goals & Results

Florbela Viegas, CERN ADP


Tests and Goals

1. Total ATONR unavailability
   DCS test: a 2-hour blackout; observe the disconnection and reconnection behaviour. Certify that the previously observed problems are solved.
   Muon tests: several connect and disconnect cycles. Correct the problems observed on reconnection and certify that they are solved.
2. ATONR standby database failover
   Certify that all programs connect correctly and transparently to the standby.
   The standby is write-enabled. Certify that no gaps in information are observed:
   Rainer Bartoldus will watch and record HLT; DCS will watch and certify the PVSS archive (no gaps in the archive).
3. GPN disconnection test by the TDAQ admins, to be done after the certifications above are complete.
   The behaviour of all programs should be observed.
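To make the disconnection and reconnection windows easy to cross-check against the P1 ELOG afterwards, a small stand-alone probe that timestamps every connection attempt can help. The following is only a minimal sketch, assuming the Oracle C++ interface (OCCI) is available on the test node; the account, password and the ATONR connect alias are placeholders and this is not part of the agreed test tooling.

// Minimal connectivity probe (sketch only): one attempt every 30 seconds,
// each logged with an epoch timestamp, so outages and recoveries can be
// compared with the ELOG entries. Credentials and alias are placeholders.
#include <occi.h>
#include <chrono>
#include <ctime>
#include <iostream>
#include <string>
#include <thread>

int main() {
    using namespace oracle::occi;
    const std::string user = "probe_user";       // placeholder account
    const std::string pass = "probe_password";   // placeholder password
    const std::string db   = "ATONR";            // placeholder connect alias

    Environment* env = Environment::createEnvironment(Environment::DEFAULT);
    for (int i = 0; i < 2880; ++i) {             // ~24 h at one probe per 30 s
        const std::time_t now = std::time(nullptr);
        try {
            Connection* conn = env->createConnection(user, pass, db);
            Statement*  stmt = conn->createStatement("SELECT 1 FROM dual");
            ResultSet*  rs   = stmt->executeQuery();
            rs->next();                          // one full round trip
            stmt->closeResultSet(rs);
            conn->terminateStatement(stmt);
            env->terminateConnection(conn);
            std::cout << now << " ATONR reachable" << std::endl;
        } catch (SQLException& e) {
            std::cout << now << " ATONR unreachable, ORA-"
                      << e.getErrorCode() << std::endl;
        }
        std::this_thread::sleep_for(std::chrono::seconds(30));
    }
    Environment::terminateEnvironment(env);
    return 0;
}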

Proposed Schedule

08h00 – 10h15: ATONR is shut down. No DB service.
10h15: ATONR is back up (DCS 2-hour blackout test done).
10h15 – 12h00: Muon tests. The IT DBAs will bring ATONR up and down several times as needed.
12h00 – 12h30: Full ATONR shutdown.
12h30 – 14h00: The online standby will be available.
14h00 – 14h30: The online standby will be shut down. No DB service.
14h30: ATONR will be available again. End of the DB tests.
15h00: GPN tests may start.

Assumptions: the shutdown of ATONR at 08h00 does not interfere with the Muon tests.

All state changes will be logged in the P1 ELOG.

Muon Observations

1. On the DAQ side, with a run ongoing, we saw that stopless recovery actions for the TGC RODs failed while the DB was disconnected. This is not understood, since according to our expert no parameters are loaded from the DB during this procedure. After the DB was available again, things worked fine. No such behaviour was seen for the other muon subdetectors.

2. MDT JTAG initialization: this failed while the DB was disconnected, as expected, since we retrieve parameters from there. We were, however, glad to see that the init procedure (a custom-built C++ DLL that accesses the DB and is called from PVSS) did NOT hang but handled the DB unavailability correctly; a sketch of this pattern is given after this list.

3. PVSS CtrlRDBAccess extension (the main emphasis of the test), which we use extensively for the custom MDT and TGC Oracle DB writes:

--> MDT: processes reconnected fine once the DB was back up, and the writing of data resumed without any manual intervention in all cases (alignment, B-field, temperature data, etc.). Full success.

--> TGC: unfortunately the situation is not as good. Numerous PVSS processes were left with invalid DB handles, which then prevented them from reconnecting once the DB was available again. As a consequence, any writing of data to the TGC Oracle DB, e.g. of the TGC HV conditions, failed until the control scripts in PVSS were restarted by hand; moreover, this was not detected by the current TGC DCS alarms. There is clearly work to do here.

--> Command timeout: in the past we had observed that, when the connection to the DB was lost, PVSS control scripts got into a blocking state for as long as the database was unavailable. The test of a new version of the CtrlRDBAccess extension, which we use for all PVSS writes to custom Oracle tables, was successful: a command timeout of 45 s on DB insertion was validated to work as expected (the second sketch after this list illustrates a client-side timeout of this kind). This new feature has in the meantime been implemented in all MDT cases, including the alignment.

--> Control script crashes on DB disconnect: we had observed these in the past; the issue has disappeared with the latest versions of CtrlRDBAccess we are using.
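The two sketches below only illustrate the patterns described in points 2 and 3 above. They are written against the Oracle C++ interface (OCCI) for concreteness, and every account, table, class and function name in them is hypothetical; they are neither the MDT DLL nor the CtrlRDBAccess extension itself.

// Sketch of the "fail cleanly instead of hanging" behaviour of point 2:
// a routine, as it might be exported from a DLL and called from PVSS, that
// tries to read initialization parameters and reports an error code to the
// caller when the database is unreachable. Table and column names are invented.
#include <occi.h>
#include <string>
#include <vector>

// Assumed convention for this example: returns 0 on success, the ORA error
// code on failure; never blocks indefinitely and never throws to the caller.
int loadInitParams(const std::string& user, const std::string& pass,
                   const std::string& db, const std::string& chamber,
                   std::vector<int>& params, std::string& errorText)
{
    using namespace oracle::occi;
    Environment* env = Environment::createEnvironment(Environment::DEFAULT);
    try {
        Connection* conn = env->createConnection(user, pass, db);
        Statement* stmt = conn->createStatement(
            "SELECT param_value FROM jtag_init_params WHERE chamber = :1");
        stmt->setString(1, chamber);
        ResultSet* rs = stmt->executeQuery();
        while (rs->next())
            params.push_back(rs->getInt(1));
        stmt->closeResultSet(rs);
        conn->terminateStatement(stmt);
        env->terminateConnection(conn);
        Environment::terminateEnvironment(env);
        return 0;
    } catch (SQLException& e) {
        // DB down or unreachable: hand a clear error back to the caller
        // instead of hanging, which is the behaviour observed in the test.
        errorText = "ORA-" + std::to_string(e.getErrorCode()) + ": " + e.getMessage();
        Environment::terminateEnvironment(env);
        return e.getErrorCode();
    }
}

The second sketch combines the two behaviours discussed under point 3: a write that is abandoned after a 45-second command timeout instead of blocking the calling script, and a handle that is dropped on any failure so that the next call opens a fresh connection rather than reusing an invalid one (the behaviour the MDT side showed and the TGC side was missing).

// Illustrative only; the real logic lives inside the CtrlRDBAccess extension.
#include <occi.h>
#include <chrono>
#include <future>
#include <memory>
#include <string>
#include <thread>

class ResilientWriter {
public:
    ResilientWriter(std::string user, std::string pass, std::string db)
        : user_(std::move(user)), pass_(std::move(pass)), db_(std::move(db)),
          env_(oracle::occi::Environment::createEnvironment(
              oracle::occi::Environment::THREADED_MUTEXED)) {}

    // Returns true only if the INSERT completed and committed within 45 s.
    bool write(const std::string& insertSql) {
        using namespace oracle::occi;
        if (conn_ == nullptr && !reconnect())
            return false;                      // DB still unreachable

        Connection* conn = conn_;
        auto done = std::make_shared<std::promise<bool>>();
        std::future<bool> result = done->get_future();

        // Run the potentially blocking DB call in a detached worker so that
        // the caller (in PVSS: the control script) is never stuck on it.
        std::thread([conn, insertSql, done]() {
            try {
                Statement* stmt = conn->createStatement(insertSql);
                stmt->executeUpdate();
                conn->commit();
                conn->terminateStatement(stmt);
                done->set_value(true);
            } catch (SQLException&) {
                done->set_value(false);
            }
        }).detach();

        if (result.wait_for(std::chrono::seconds(45)) != std::future_status::ready
            || result.get() == false) {
            // Timeout or ORA- error: treat the handle as invalid and drop it,
            // so the next write() reconnects instead of reusing a dead session.
            // (The abandoned session is simply leaked in this sketch.)
            conn_ = nullptr;
            return false;
        }
        return true;
    }

private:
    bool reconnect() {
        using namespace oracle::occi;
        try {
            conn_ = env_->createConnection(user_, pass_, db_);
            return true;
        } catch (SQLException&) {
            conn_ = nullptr;
            return false;
        }
    }

    std::string user_, pass_, db_;
    oracle::occi::Environment* env_;
    oracle::occi::Connection* conn_ = nullptr;
};

A caller would keep one such writer per data stream and check the return value of write(), raising an alarm after repeated failures; that detection path is exactly what was found to be missing on the TGC side.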

In total it was a very useful exercise. (Information provided by Stephanie Zimmerman.)


DCS Observations


HLT Observations
