Machine Availability and System Reliability at RHIC WAO-07 Trieste, September 24-28 2007 Fulvia Pilat


Machine Availability and System Reliability at RHIC

WAO-07 Trieste, September 24-28 2007

Fulvia Pilat

WAO-07 Fulvia Pilat

RHIC performance

Delivered per run to PHENIX.

Delivered luminosity increased by >2 orders of magnitude in 6 years.

FOM = L P^4 (luminosity times polarization to the fourth power)


Enhanced Design Parameters

Calendar time in store affects ability to project performance.

Enhanced Design Parameters (~2009)

Parameter           unit                Achieved   Enhanced design
Au-Au operations
  Energy            GeV/n               100        100
  No of bunches     …                   103        111
  Bunch intensity   10^9                1.1        1.0
  Average L         10^26 cm^-2 s^-1    12         8
p-p operations
  Energy            GeV                 100        100 (250)
  No of bunches     …                   111        111
  Bunch intensity   10^11               1.4        2.0
  Average L         10^30 cm^-2 s^-1    20         60 (150)
  Polarization P    %                   60         70

Slide annotations: goal exceeded (Au-Au average L), 3x (p-p average L), +10% (polarization).
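Reading the slide's FOM=LP4 as luminosity times polarization to the fourth power, the polarization step from 60% to 70% matters almost as much as a factor of two in luminosity. The sketch below uses only the p-p values from the table (average L 20 and 60 in units of 10^30 cm^-2 s^-1, P 60% and 70%); the function name is illustrative and not from the talk.

```python
def figure_of_merit(luminosity, polarization):
    """FOM = L * P^4, with P given as a fraction between 0 and 1."""
    return luminosity * polarization ** 4

# p-p values from the table; luminosity in units of 1e30 cm^-2 s^-1
achieved = figure_of_merit(20.0, 0.60)
enhanced = figure_of_merit(60.0, 0.70)

lumi_gain = 60.0 / 20.0            # the "3x" annotation
pol_gain = (0.70 / 0.60) ** 4      # effect of "+10%" polarization on the FOM
fom_gain = enhanced / achieved     # product of the two gains

print(f"luminosity x{lumi_gain:.1f}, polarization term x{pol_gain:.2f}, FOM x{fom_gain:.2f}")
```

Under this reading the enhanced design corresponds to a FOM gain of roughly 5.6, of which about 1.85 comes from polarization alone.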

Enhanced and RHIC-II luminosity

Electron or stochastic cooling

Time at store: trend and goal

Trend

Goal: back to mid-50% in Run-8, 60% time at store in Run-9

Outline

• Operation stats, performance
• Factors determining time at store
  • Machine development (short term investment)
  • APEX: Accelerator Physics EXperiments program (longer term investment)
  • Scheduled Maintenance (talk by Sampson today)
  • Machine set-up
  • Systems downtime and failure
  • Mode of operation: “pushing the envelope”

RHIC Retreat 2007, July 16-17
Session on Availability and Reliability

11:00 (15)  Pilat       Introduction
11:15 (25)  Ingrassia   Operations and Uptime
11:40 (20)  Kling       Turn-around time
12:00 (20)  Sampson     Maintenance models, organization
12:20 (10)              Discussion
 2:00 (15)  Ahrens      RHIC abort system
 2:15 (15)  Zhang, Wu   Pulsed power systems
 2:30 (30)  Bruno       Power supplies
 3:00 (30)  Sandberg    Electrical systems
 3:30 (30)  Zaltsman    RF: RHIC and injectors
 4:30 (15)  Oerter      Controls, hardware
 4:45 (15)  Morris      Controls, software
 5:00 (15)  Reich       Access controls
 5:15 (15)  Russo       BPM, IPM, BBQ in operations
 5:30 (15)  Tuozzolo    Cryogenic system
 5:45 (15)  Mapes       Vacuum systems

[Figure: chart with the 60% goal line; several points are labeled “M”]

Failure Flavors

Charged (threshold for log is 6 minutes or more)
  Failure hours that impact the program, charged to one OR MORE systems during a failure period. Simultaneous failures result in charged hours less than actual hours.

Actual - Severe
  Duration of a failure that impacts the program, often LONGER than the hours charged.

Actual - Mild
  Failure that does not impact the program, e.g. 1 of 10 AGS RF stations trips. Hours recorded but not “charged”.

Resets (threshold for log is less than 6 minutes)
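The gap between charged and actual hours can be illustrated with overlapping failure intervals. The sketch below is only an illustration: the failure log is hypothetical, and the talk does not say how charged hours are apportioned between systems. It shows that when failures overlap, the sum of per-system actual durations exceeds the downtime seen by the program (the union of the intervals).

```python
def merge_intervals(intervals):
    """Return the union of (start, end) intervals as a sorted, non-overlapping list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Hypothetical failure log: system -> list of (start_hour, end_hour)
failures = {
    "PS_RHIC": [(0.0, 4.0)],
    "Rf": [(2.0, 5.0)],        # overlaps the PS_RHIC failure for 2 hours
    "Controls": [(10.0, 10.5)],
}

actual_hours = {name: sum(end - start for start, end in spans)
                for name, spans in failures.items()}
all_spans = [span for spans in failures.values() for span in spans]
program_downtime = sum(end - start for start, end in merge_intervals(all_spans))
total_actual = sum(actual_hours.values())

print(f"sum of actual hours: {total_actual}, program downtime: {program_downtime}")
```

In this made-up log the systems accumulate 7.5 actual hours while the program loses 5.5 hours.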


“Top 10” Failures by Group & by Run

                          FY07          FY06          FY05          FY04
System                    RANK HOURS    RANK HOURS    RANK HOURS    RANK HOURS
PS_RHIC                   1    186.8    1    94.6     1    78.15    2    85.5
Rf                        2    106.9    6    39.9     3    67.8     3    79.6
Cryogenic                 3    92.6                   7    41       5    66
PulsedPower               4    58.8     8    33       5    43.3     8    32.1
ElectricalService         5    58.7     7    34.2                   6    34.8
Controls                  6    39.1     2    67.2     2    69.2     1    134.9
ES&FD_AtR & Experiment    7    38.1     4    49       4    46.5     9    30
AccessControls            8    36.1     5    43.9     9    32       10   21.7
QuenchProtection          9    31.6                   6    42.6     7    34
Services Water            10   23.1     11   22.8     11   20.4
HumanError                11   23       9    29.9     8    32.5

Blank cells are placed so that ranks are unique and hours decrease with rank within each year. For Services Water the two rank-11 entries (22.8 h, 20.4 h) are listed in source order; their year columns cannot be recovered from the slide text.

FY07 charged vs. actual failure hours

System                    Charged (h)  Actual Severe (h)  Actual Mild (h)  Resets (#)  Resets (h) @ 3 min per  Ratio Actual/Charged
PS_RHIC                   187          236                5                15          1                       1.26
Rf                        107          216                272              44          2                       2.02
PulsedPower               59           80                 15               70          4                       1.36
Controls                  39           70                 39               303         15                      1.79
ES&FD_AtR & Experiment    38           53                 51               33          2                       1.39
AccessControls            36           40                 25               ~0          0                       1.11
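The two derived columns of this table can be reproduced from the raw ones. The sketch below assumes that the Ratio column is Actual Severe hours divided by Charged hours and that reset hours are the reset count times 3 minutes; both readings match the tabulated values after rounding. AccessControls is left out because its reset count is given only as “~0”.

```python
# (system, charged_h, severe_h, mild_h, resets_count), copied from the table
rows = [
    ("PS_RHIC", 187, 236, 5, 15),
    ("Rf", 107, 216, 272, 44),
    ("PulsedPower", 59, 80, 15, 70),
    ("Controls", 39, 70, 39, 303),
    ("ES&FD_AtR & Experiment", 38, 53, 51, 33),
]

RESET_MINUTES = 3  # "@ 3 min per" reset, as stated in the column header

summary = {}
for system, charged, severe, mild, resets in rows:
    reset_hours = resets * RESET_MINUTES / 60   # time lost to short resets
    ratio = severe / charged                    # assumed meaning of "Actual/Charged"
    summary[system] = (reset_hours, ratio)
    print(f"{system:24s} resets {reset_hours:5.2f} h   actual/charged {ratio:.2f}")
```

Controls stands out: 303 resets add about 15 hours that never appear as charged failure time.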


Operations Planned Improvements

• Multiple failures, often simultaneous: CAS (tech support on shift, 2 now) needs help
  • Train Siemens Watch for LOTO
    Together with MCR Operators they can perform LOTO when CAS is busy
  • Get Operators into the field
    Train Operators to (only) reset “accelerator” power supplies
• OC instructed to call in help for CAS when CAS is making a repair AND another system goes down.
• OC instructed to call in help from two groups with knowledge of the equipment when the cause of a problem is not clear.


Turn around time


Input from systems

Maintenance, set-up and turn-around time, and modes of operation all affect availability, but the main factor is system failure. In Retreat presentations, please focus on the reliability of your system and think critically about ways to improve it. I would ask each of you to discuss a plan, including timelines and necessary funding, to increase your system reliability. This is an important input towards an integrated plan to improve time at store, to be discussed at the Retreat and implemented thereafter.

After the Retreat: reliability

• Review Retreat information on operations, maintenance and systems
• Prioritize actions, especially systems improvements for reliability
• Analyze aging infrastructure, systems
• Use the recently revisited “Trouble Report Committee” as input and advice on system reliability

RHIC PS Performance Stats

[Charts: Average RHIC PS Failure Hours/Week; MTBF of RHIC due to any PS Failure; MTBF of an individual PS Failure]

Leading Causes of PS Down Time in Hours

IR - Dynapowers            42.4
Main p.s.’s                36
IR p.s.’s - SCE 150’s      26.7
6000A Quench Switches      20.6
IR p.s.’s - SCE 300’s      16.2
Quench Detectors           14.6
Node Cards                 6
Correctors                 5.8   (success story)
Ground Fault               5.1
QPA’s                      4.55
New Sextupole p.s.’s       4.5
Bypass chassis             0.3
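A cumulative (Pareto-style) view of these hours shows how concentrated the downtime is. The sketch below uses only the figures listed above; the variable names are illustrative.

```python
# Downtime hours per cause, copied from the list above
downtime_hours = {
    "IR - Dynapowers": 42.4,
    "Main p.s.'s": 36.0,
    "IR p.s.'s - SCE 150's": 26.7,
    "6000A Quench Switches": 20.6,
    "IR p.s.'s - SCE 300's": 16.2,
    "Quench Detectors": 14.6,
    "Node Cards": 6.0,
    "Correctors": 5.8,
    "Ground Fault": 5.1,
    "QPA's": 4.55,
    "New Sextupole p.s.'s": 4.5,
    "Bypass chassis": 0.3,
}

total = sum(downtime_hours.values())
running = 0.0
pareto = []  # (cause, hours, cumulative share of total)
for cause, hours in sorted(downtime_hours.items(), key=lambda kv: kv[1], reverse=True):
    running += hours
    pareto.append((cause, hours, running / total))
    print(f"{cause:24s} {hours:6.2f} h   cumulative {running / total:6.1%}")
```

The listed causes add up to 182.75 hours, and the six largest account for about 86% of that.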

Power Supply System Priorities

• Bipolar 150A, 300A p.s.’s
• Phase 1 QPA’s (Quench protection assemblies)
• Main dipole and quadrupole PS
• Investigate yellow quad bus ground fault
• Improving Dynapower PS cooling
• Quench detector cleaning and fan replacements
• Air Conditioning (for air quality and temperature)

Expected MTBF in Run 8?

Run 5 = 30.79 hours
Run 7 = 14.75 hours
Remove 3 major problems from Run 7 = 40 hours
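One way to read these numbers, assuming the usual definition MTBF = operating hours / number of failures and unchanged operating hours, is to ask what share of Run 7 failure events the three major problems must represent for the MTBF to rise from 14.75 to 40 hours. This is a sketch under that assumption; the 2000-hour figure is hypothetical and only serves to make the counts concrete.

```python
def mtbf(operating_hours, failures):
    """Mean time between failures: operating hours divided by failure count."""
    return operating_hours / failures

def fraction_of_failures_removed(mtbf_before, mtbf_after):
    """Share of failure events that must disappear, at fixed operating hours."""
    return 1.0 - mtbf_before / mtbf_after

run7_mtbf = 14.75       # hours, from the slide
projected_mtbf = 40.0   # hours, from the slide
removed = fraction_of_failures_removed(run7_mtbf, projected_mtbf)

# Hypothetical operating time, NOT a figure from the talk
hours = 2000.0
failures_run7 = hours / run7_mtbf
failures_projected = hours / projected_mtbf

print(f"share of failures to remove: {removed:.1%}")
print(f"failures per {hours:.0f} h: {failures_run7:.0f} -> {failures_projected:.0f}")
```

Under these assumptions the three problems would have to account for about 63% of the Run 7 power supply failure events.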

Electrical Systems

Run   Power System Failure Hrs   Total Failure Hrs
5     15                         694
6     26*                        700
7     45                         881

* excluding arc flash event

Most Significant Causes of ES Downtime, Run-7

Location   Hours   Events     Equipment
1004 A     18.3    2          Switch & 208 Volt CB
1000 P     15.3    multiple   Switch & Circuit Breaker
914        13.8    1          Switch
929        9.5     1          Cooling Tower Fan Motor

4 areas responsible for 90% of downtime in Run-7

Electrical systems: Steps being taken

• 18 Electricians assigned to C-AD this summer vs. 6 last year
• Ongoing thermal inspection of switches
• Use of torque wrenches instituted
• Better understanding of thermal effects
• Replace 1000 P 13.8 kV switches
• Replace trip units, 1000 P substation
• Replace switchgear in 914
• Maintenance of BMMPS CB’s

Electrical systems: Steps being taken (cont’d)

• Continuation of Arc Flash Calculations
• Connecting RHIC Bard A/C Units through Isolation Transformers
• 21 New Alcove UPS’s
• 8 year Program to improve Electrical Infrastructure ($9 million)
• Open Slot for New Power Engineer

ES: Top Concerns from Last Year’s Retreat

1. Power Dips: 8 in Run-6, 6 in Run-7
2. Response to 1006 Arc Flash (almost done)
3. 1004 B CB Problem
4. AMMPS Transformer Replacement

(slide annotation: this shutdown)

Additional Steps to Improve Availability

• Increase the number of assigned electricians
• Centralize Spare Parts Location
• Increase Spares Inventory

RF system: Performance

Number of systems:      Booster: 2   AGS: 11   RHIC: 16
Charged failure hours:  Booster: 7   AGS: 39   RHIC: 65
Actual failure hours:   Severe: 216  Mild: 272

Factor affecting the system performance in RHIC RF: beam loading (total intensity more than double that of Run-4). Example: large debunching at rebucketing time, losses and beam dumps. It took time to understand and mitigate the beam loading effects.

[Figure: 07 Gold Bunch Merge]

RF - IMPROVEMENTS

• Complete system upgrade of low level RF in AGS and RHIC (unified hardware and software, modern system, better ring-to-ring synchronization)
• Window comparators to provide fast shutdown for storage systems
• New beam permit chassis to speed up the response
• Low power circulators
• New tubes
• Ongoing work on window for storage system
• Continue development of ferrite tuner for acceleration system

Abort kickers - Failure Modes

Prefires
  • One module discharges unilaterally
  • The other four fire in response ASAP
  • Not synchronized with abort gap

Unconditioned Triggers
  • All five modules discharge together
  • Not synchronized with the abort gap

Spontaneous Capacitor Discharges
  • As if a “stop charge” occurred with no associated trigger (stop charge turns off the charging mechanism)
  • Damaging if not noticed

Run 7 Prefires: 12 yellow, 18 blue, broken down by PFN module involved

[Figure: RHIC abort kickers pre-fires in Run-7 broken out by ring (blue, yellow) and by module (1-5); x-axis: time (calendar), 4-Mar to 22-Jun; y-axis: module involved]

Abort kickers: observations, improvements

• B2 and B4 use thyratron CX1575C. They will be replaced by CX3575C.
• Y5 had 7 pre-fires at the beginning, but stayed clean after 4/4.
• Y1 stayed clean during the entire run.
• Y5, B2, and B4 had 7 pre-fires each, contributing 70% of total pre-fires.

What may help?

• Condition high voltage system at higher voltage than operation level (Engineering control? Routine procedure?)
• Keep modulators on
• Pre-conditioning before beam operation
• Keep operating voltage as low as possible
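The 70% figure follows directly from the counts quoted in the talk (12 yellow and 18 blue pre-fires in Run 7, with 7 each on Y5, B2 and B4). A short check, with illustrative variable names:

```python
# Counts quoted in the talk
prefires_by_ring = {"yellow": 12, "blue": 18}
worst_modules = {"Y5": 7, "B2": 7, "B4": 7}

total = sum(prefires_by_ring.values())
worst = sum(worst_modules.values())
share = worst / total
remaining = total - worst  # pre-fires on all other modules combined

print(f"{worst} of {total} pre-fires ({share:.0%}) came from {', '.join(worst_modules)}")
print(f"{remaining} pre-fires on the remaining modules")
```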

RHIC abort kickers: R&D

• Charge up high voltage modulators on command 4 ms before beam abort to avoid pre-fire during long DC hold-up
• A preliminary study was performed in 2003
• Project cost over $2 million based on 2003 budget estimate

Cryo system: Phase III Upgrade

• New gas bearing turbine for energy removal at the cold end of the refrigerator (Run-7).
• New high efficiency vertical heat exchanger system at the cold end of refrigerator (Run-7).
• Re-configured the cold helium supply to the accelerator rings to eliminate the use of the cold circulators (Run-6).
• Modified Cold Box 5 to reduce Helium inventory, improve insulation, and reduce flow restrictions (Run-6).

Results:

• Saved an additional 1.0 MW of compressor power in Run-6.
• Reduced the liquid inventory in the refrigerator.
• Additional 1.0 MW achieved during Run-7.
• Reduced number of running compressors by 4 FS and 1 SS.

RHIC POWER HISTORY

Cryo Stumbling at the Start of Run-7: HX OBSTRUCTION

Oil contamination in HX-20 from Rotoflow oil bearing expanders
• Oil crossover happens during start-up (warm)

+ LN2 contamination on HX-20
• Extended 80K operations contaminated GHe in RHIC
• During cool-down 80K GHe returned to the refrigerator
• Poorly seated crossover valve (H409M) between CR line and Expander 6 outlet allowed LN2 to collect on HX-20

= High Recooler Return Pressure resulting in (too) high magnet temperatures.

[Figure: HX20 DP and He Flow Rate traces, annotated: Blue 4.5K wave starts; Blue recooler wave starts; Blue ready; Yellow 45K wave starts; Yellow 4.5K wave starts; warm-up attempts to clear blockage]


Running for high availability

Example: Low energy copper run (Run-5)
2 weeks of physics: choice to limit set-up time and downtime

Machine parameters (almost the same #bunches 37-41, transmission HE ~95%, LE ~85-92%, same transition set-up)
• bunch intensity: HE 41 x 4.5e9, LE 37 x 3.8e9
• beta*: HE 0.85 m, LE 3 m
• energy: HE 100 GeV/u, LE 31.2 GeV/u

Reproducibility: minimized tuning time
Minimized time between stores
Longer lumi-lifetime

Cu Run-5 high-energy run

[Figure: annotated with beta*=0.85m (0.89m), beta*=2.6m, beta*=3.0m; downtime labels: access + snowstorm; power dip + access; access + equipment failures]

time at store: 52%


Cu Run-5 low energy run

time at store: 74%

Cu Run-5 LE (week 2 - stores)

[Figure: stores in week 2, with labels: injection; access; Phobos 0 & polarity; beam experiments]

Optimization of performance and availability

• Projected performance and run plans must include optimization of the time at store if we want to achieve the 60% goal
• Limit the number of new developments during the run preparation
• Stop or reduce machine developments during physics running once potential for returns is low
• Optimal choice of lattice, beta*, bunch intensity and number of bunches (with parameters evolution during the run, more conservative or aggressive, based on optimization of delivered luminosity and time at store)
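The trade-off between delivered luminosity and time at store can be made concrete with a simple product: integrated luminosity per week is the average luminosity during stores times the fraction of calendar time at store times the seconds in a week. The sketch below is an illustration only. It assumes the table's "Average L" is a store average, takes the achieved p-p value of 20 x 10^30 cm^-2 s^-1, and tries the time-at-store fractions quoted in the talk (52%, 60%, 74%), which come from different runs and species.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600
CM2_PER_INVERSE_PB = 1e36  # 1 pb^-1 = 1e36 cm^-2

def delivered_per_week(avg_store_lumi, time_at_store):
    """Integrated luminosity per calendar week, in pb^-1.

    avg_store_lumi: average luminosity during stores, in cm^-2 s^-1
    time_at_store:  fraction of calendar time spent at store (0..1)
    """
    return avg_store_lumi * time_at_store * SECONDS_PER_WEEK / CM2_PER_INVERSE_PB

AVG_L = 20e30  # cm^-2 s^-1, achieved p-p value from the parameter table

for fraction in (0.52, 0.60, 0.74):
    print(f"time at store {fraction:.0%}: {delivered_per_week(AVG_L, fraction):.2f} pb^-1 per week")

# Relative gain from moving 52% -> 60% at unchanged store luminosity
gain = delivered_per_week(AVG_L, 0.60) / delivered_per_week(AVG_L, 0.52)
```

At fixed store luminosity the delivered luminosity scales linearly with time at store, so an aggressive parameter choice pays off only if its luminosity gain outweighs the store time it costs.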

Conclusions

• Analyzed machine availability at RHIC
• Identified the main factors determining the time at store
• Have a plan to increase availability to 60% in ~2 RHIC runs

… will report at the next WAO!