MAX (MYRRHA Accelerator eXperiment) – MAX School - Reliability Basics with Focus on the MYRRHA Linac Case Adrian Pitigoi – EA (Spain)

MAX (MYRRHA Accelerator eXperiment) – MAX School

- Reliability Basics with Focus on the MYRRHA Linac Case

Adrian Pitigoi – EA (Spain)

A. Reliability Basics – Concepts & Definitions

B. Common techniques in Reliability Analysis

C. Modeling High-power Accelerators Reliability - SNS Linac case (SNS-ORNL) - Myrrha Linac (MAX project)

A. Reliability Basics – Concepts & Definitions

Reliability / Unreliability

Reliability Analysis – objectives:
- Evaluate the failure rate of components and the overall system reliability
- Evaluate design feasibility and compare design alternatives
- Identify potential failure areas and track reliability improvement

 

Failure - the change from a functioning state to a failed state. Repair - the change from a failed state to a functioning state.

Repairing - bringing the component/system back to an “as good as new” condition.

For a repairable system, the cycle continues repeatedly with the repair-to-failure and the failure-to-repair process.

Reliability, R(t) - probability that the component/system experiences no failures during the time interval 0 to t, given that it was new or functioning at time zero. Unreliability, F(t) - probability that the component/system experiences its first failure, or has failed one or more times, during the interval 0 to t, given that it was operating or repaired to a like-new condition at time zero.

The numerical values of both reliability and unreliability are expressed as a probability from 0 to 1.

R(t) + F(t) = 1

Unreliability F(t) = 1 – R(t)

Availability / Unavailability

Availability, A(t) - probability that the component or system is operating at time t, given that it was operating at time zero. Unavailability, Q(t) - probability that the component or system is not operating at time t, given that it was operating at time zero.

Therefore, the following relationship holds true, since a component or system must be either operating or not operating at any time:

A(t) + Q(t) = 1

For repairable items: Unavailability Q(t) ≤ Unreliability F(t)

A. Reliability Basics – Concepts & Definitions

Failure Rates

Conditional Failure Rate or Failure Intensity, λ(t) - anticipated number of times an item will fail in a specified time period, given that it was as good as new at time zero and is functioning at time t. It is a calculated value that provides a measure of reliability for a product. This value is normally expressed as failures per million hours (fpmh, i.e. failures per 10^6 hours).

Basic categories of failure rates:

Mean time between failures (MTBF) - basic measure of reliability for repairable items: the expected time between two consecutive failures of a component, assembly, or system, under the condition of a constant failure rate.

It is a commonly used variable in reliability and maintainability analyses.

Ex: a component with a failure rate of 2 failures/10^6 h is expected to fail 2 times in a million-hour time period.

MTBF = 1/λ, for λ = constant.
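The example above can be checked numerically. The following is a minimal sketch, assuming a constant failure rate so that R(t) = exp(-λt); the function names and the one-year mission time are illustrative assumptions, not part of the original material.

```python
import math

def mtbf_from_rate(failure_rate_per_hour: float) -> float:
    """MTBF = 1/lambda, valid only for a constant failure rate."""
    return 1.0 / failure_rate_per_hour

def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """R(t) = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-failure_rate_per_hour * hours)

# Failure rate from the example above: 2 failures per 10^6 hours.
lam = 2.0 / 1e6
mtbf = mtbf_from_rate(lam)

# Hypothetical mission time: one year of continuous operation.
t = 8760.0
r = reliability(lam, t)
f = 1.0 - r  # unreliability, from R(t) + F(t) = 1

print(f"MTBF = {mtbf:.0f} h")
print(f"R({t:.0f} h) = {r:.4f}, F({t:.0f} h) = {f:.4f}")
```

With 2 failures per 10^6 h the MTBF comes out at about 500,000 h, which is the reciprocal relation stated above.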

Mean time to repair (MTTR) - total amount of time spent performing all corrective or preventative maintenance repairs divided by the total number of those repairs.

It is the expected span of time from a failure (or shut down) to the repair or maintenance completion.

This term is typically only used with repairable systems.

Mean time to failure (MTTF) - used for non-repairable systems.

A. Reliability Basics – Concepts & Definitions

Failure Frequencies

Failure Density f(t) of a component or system - probability per unit time that the component or system experiences its first failure at time t, given that it was operating at time zero.

Failure Rate r(t) of a component or system - probability per unit time that the component or system experiences a failure at time t, given that it was operating at time zero and has survived to time t.

Conditional Failure Intensity (Conditional Failure Rate) λ (t) - probability per unit time that the component or system experiences a failure at time t, (operating, or was repaired to be as good as new, at time zero and operating at time t).

Unconditional Failure Intensity or Failure Frequency ω(t) - probability per unit time that the component or system experiences a failure at time t, (operating at time zero).

Relationships Between Failure Parameters

r(t) vs. λ(t) - difference: the failure rate definition addresses the first failure of the component or system rather than any failure of the component or system.

λ(t) vs. ω(t) - difference: the conditional failure intensity (CFI) has the additional condition that the component or system has survived to time t.

For most reliability and availability studies the unavailability Q(t) of components and systems is very much less than 1. In such cases λ(t) ≈ ω(t), since ω(t) = λ(t)[1 − Q(t)].

A. Reliability Basics – Concepts & Definitions

Constant failure rates

If the failure rate is constant, it results in an exponential failure density distribution.
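For reference, the standard textbook relations for a constant failure rate λ are sketched below. They are assumed, not confirmed, to be the expressions this slide refers to; the symbols follow the definitions given earlier (R, F, f, r).

```latex
\begin{align}
R(t) &= e^{-\lambda t} \\
F(t) &= 1 - R(t) = 1 - e^{-\lambda t} \\
f(t) &= \frac{dF(t)}{dt} = \lambda e^{-\lambda t} \\
r(t) &= \frac{f(t)}{R(t)} = \lambda \\
\mathrm{MTTF} &= \int_0^\infty R(t)\,dt = \frac{1}{\lambda}
\end{align}
```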

Repairable and Non-repairable Items

Non-repairable items

Components or systems such as a light bulb, transistor, rocket motor, etc.

Reliability - survival probability over the item's expected life, or over a specific period of time during its life, when only one failure can occur.

The instantaneous probability of the first and only failure is called the hazard rate or failure rate, r(t).

Life values such as MTTF are used to characterize non-repairable items.

Repairable items

Reliability is the probability that failure will not occur in the time period of interest; when more than one failure can occur, reliability can be expressed as the failure rate, λ.

Reliability can be characterized by MTBF, but only under the condition of constant failure rate.

Availability, A(t), is affected by the rate of occurrence of failures (failure rate, λ) or MTBF plus maintenance time.

A(t) is the probability that an item is in an operable state at any time.

Maintenance can be corrective (repair) or preventive (reducing the likelihood of failure).

A. Reliability Basics – Concepts & Definitions

Redundancy

Redundancy - existence of two or more means, not necessarily identical, for accomplishing a given single function.

Active Redundancy (Active Standby / Hot Standby)
- All items operate simultaneously in parallel.
- No change in the failure rate of the surviving item after the failure of a companion item.

Standby Redundancy
- Alternate items are activated upon failure of the first item.
- Only one item is operating at a time to accomplish the function.

Warm Standby
- Normally active or operational, but not under load.
- The failure rate is lower due to lower stress.

Cold Standby (Passive)
- Normally not operating.
- Failure of an item forces the standby item to start operating.

k-out-of-n Systems
- Redundant system of n items in which k of the n items must function for the system to function (voting decision).

Active, Standby and Passive Redundancy

Redundant components can be fully activated (active), partially activated (standby) or switched off completely (passive).

A mix of the above activity levels is also possible.

Certain failure modes of one component (short-circuit, major leakage, etc.) could lead to system failure.

B. Common techniques in Reliability Analysis

Reliability block diagrams

Advantage: ease of reliability expression and evaluation (a common system reliability analysis tool, mission-success oriented).

A reliability block diagram shows the system reliability structure. It is made up of individual blocks and each block corresponds to a system module or function.

The blocks in either series or parallel structure can be merged into a new block with the reliability expression of the equations a), b).
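The merging of series and parallel blocks can be sketched with the standard textbook formulas: block reliabilities multiply in series, and block unreliabilities multiply in parallel. This is a minimal sketch assuming statistically independent blocks; the function names and the numerical values are hypothetical, and the formulas are assumed (not confirmed) to correspond to equations a) and b).

```python
from math import prod

def series(reliabilities):
    """Series structure: every block must work, so reliabilities multiply."""
    return prod(reliabilities)

def parallel(reliabilities):
    """Parallel structure: the system fails only if all blocks fail."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical example: two redundant blocks (R = 0.9 each) merged into
# one block, which is then in series with a third block (R = 0.95).
merged = parallel([0.9, 0.9])
system = series([merged, 0.95])

print(f"merged parallel block: {merged:.4f}")
print(f"system reliability:    {system:.4f}")
```

Merging proceeds from the innermost groups outward until a single equivalent block remains.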

B. Common techniques in Reliability Analysis

Reliability block diagrams

Five parallel-series connected modules

The merged blocks

k-out-of-n configuration
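For a k-out-of-n configuration of identical, independent items, the system reliability is a binomial sum over the number of working items. The sketch below assumes identical item reliability r and independence; the function name and the values are hypothetical.

```python
from math import comb

def k_out_of_n(k: int, n: int, r: float) -> float:
    """Probability that at least k of n identical, independent items work."""
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))

# Hypothetical example: 2-out-of-3 voting with item reliability 0.9.
r_2oo3 = k_out_of_n(2, 3, 0.9)
print(f"2-out-of-3: {r_2oo3:.4f}")

# Limiting cases: k = 1 is a pure parallel system, k = n a pure series system.
print(f"1-out-of-3: {k_out_of_n(1, 3, 0.9):.4f}")
print(f"3-out-of-3: {k_out_of_n(3, 3, 0.9):.4f}")
```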

B. Common techniques in Reliability Analysis

Fault Tree Analysis

Common tool in system safety analysis. It has been adapted in a range of reliability applications (mission-failure oriented).

A fault tree diagram is the underlying graphical model in fault tree analysis. The fault tree shows which combinations of the component failures will result in a system failure; it represents the logical relationships of ‘AND’ and ‘OR’ among diverse failure events.

The status of output/top event can be derived by the status of input events and the connections of the logical gates.

Fault tree for five modules

A fault tree diagram can describe the fault propagation in a system
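How the top event follows from the input events and the gates can be sketched with a small recursive evaluator. This is a minimal sketch, assuming independent basic events that each appear only once in the tree; the tree shape, event names and probabilities are hypothetical and do not reproduce the five-module tree of the slide.

```python
from math import prod

def evaluate(node, basic):
    """Return the failure probability of a fault tree node.

    A node is either a basic event name (str) or a tuple (gate, children),
    where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return basic[node]
    gate, children = node
    probs = [evaluate(child, basic) for child in children]
    if gate == "AND":
        # Output fails only if all inputs fail.
        return prod(probs)
    if gate == "OR":
        # Output fails if at least one input fails.
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate: {gate}")

# Hypothetical basic-event failure probabilities.
basic_events = {"A": 0.01, "B": 0.02, "C": 0.001}

# Top event = (A AND B) OR C
tree = ("OR", [("AND", ["A", "B"]), "C"])
top = evaluate(tree, basic_events)
print(f"top event probability: {top:.7f}")
```

When the same basic event appears in several branches, this gate-by-gate product is no longer exact, and the tree is usually quantified through its minimal cut sets instead.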

C. Modeling High-power Accelerators Reliability - SNS Linac case (SNS-ORNL)

- Myrrha Linac (MAX project)

1. SNS Linac Modeling

Objective - Feedback on actual SNS reliability performance, in order to develop a reliability modeling tool for MAX project

Activities:
- Selection of the accelerator to be used for modeling (SNS)
- SNS design & reliability data collection
- Development of the SNS Linac RS reliability model
- Performing reliability analysis of SNS Linac systems

Targets:
- Evaluate the SNS Linac model (model results vs. SNS operational data)
- Conclusions and recommendations on optimization and increasing reliability

Layout of the SNS Linac

2. SNS Model - INPUT DATA

SNS Design Data
- SNS main/auxiliary systems
- Number of components (by type)

Data Sources:
• SNS RAMI Static Model; SNS BlockSim model (Reliasoft)

SNS Systems and Functions
- SNS Parameters
- Systems and components
- System functions & interfaces

Data Sources:
• SNS website (http://neutrons.ornl.gov/facilities/SNS/), http://neutrons.ornl.gov/facilities/SNS/works.shtml
• SNS Parameters (doc no. SNS 100000000-PL001R13) (http://neutrons.ornl.gov/media/pubs/pdf/sns_parameters_list_june05.pdf)
• SNS Design Control Documents (DCD)

SNS BlockSim Model

SNS Reliability Data
- Number of components (by type)
- Degree of redundancy
- Failure data: λ=1/MTTF; MTTR (λ - failure rate; MTTF - Mean Time To Failure; MTTR - Mean Time To Repair)

Data Sources:
• RAMI Static Model; SNS BlockSim model

SNS Operating Status
- Component failures - cause, type of component, time to repair, etc.
- Availability data (component failures causing accelerator trips: cause, component and system concerned, duration of trip)

Data Sources:
• SNS Operation Data collection (http://status.sns.ornl.gov/beam.jsp)

3. Modeling Methodology

General Assumptions

SNS systems/components not modeled – Ring - RTBT, stripper foil, etc. (considered as not relevant for MAX project purposes)

Risk Spectrum Type 1 – repairable components reliability model (continuously monitored), used for modeling all SNS Linac components:

- Failure/repair processes – exponential distributions; failure/repair rates are constant
- It is assumed q=0

λ=1/MTTF (failure rate); µ=1/MTTR (repair rate); MTTF and MTTR data are taken from the BlockSim model.

The “Mean Unavailability” type of calculation is used to obtain the unavailability values for the basic events:

Q=λ/(λ+µ) (the long-term average unavailability Q was calculated for each basic event)
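The mean unavailability of a single basic event can be computed directly from its MTTF and MTTR. The sketch below follows the formula above; the function name and the MTTF/MTTR values are hypothetical and are not taken from the SNS data.

```python
def mean_unavailability(mttf_hours: float, mttr_hours: float) -> float:
    """Long-term average unavailability Q = lambda / (lambda + mu)."""
    lam = 1.0 / mttf_hours  # failure rate
    mu = 1.0 / mttr_hours   # repair rate
    return lam / (lam + mu)

# Hypothetical component: MTTF = 1000 h, MTTR = 10 h.
q = mean_unavailability(1000.0, 10.0)
print(f"Q = {q:.5f}, A = {1.0 - q:.5f}")
```

Algebraically, Q = λ/(λ+µ) is the same as MTTR/(MTTF+MTTR), i.e. the long-run fraction of time the component spends under repair.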

4. SNS Reliability Model - Fault Tree Model

SNS Module 1 - first modeling step: RFQ + MEBT + DTL

- Gradual development of the SNS Linac model
- In-depth understanding of the SNS design and functioning for an accurate model

4. SNS Reliability Model - Fault Tree Model

SNS Fault Tree (complete model) - graphical representation of the SNS systems functional structure describing undesired events (“system failures”) and their causes.

The fault tree consists of logical gates and basic events. A fault tree can be subdivided into several fault tree pages (bound together using transfer gates).

4. Modeling the SNS Linac

SNS Linac Fault Tree Structure - Main levels of the fault trees - major parts of the SNS accelerator (Ion Source, LEBT, RFQ, MEBT, DTL-CCL-SCL, HEBT, CONV - auxiliary systems)

4. Modeling the SNS Linac

DTL RF Fault Tree Structure

4. Modeling the SNS Linac

CCL Transmitter Fault Tree Structure

5. SNS Systems - Reliability Analysis Results

Analysis Case – Results

Q = 2.60E-01 = 0.26; Q = 26 %

A = 1 - Q = 73 % (the limit Availability – Mean Availability)

Minimal Cut-sets (MCS)

MCS Contribution
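How minimal cut sets lead to a top-event unavailability and to per-cut-set contributions can be sketched as follows. This is a minimal sketch using the min-cut upper bound, one common approximation; it is not claimed to be the algorithm used in the SNS analysis. The event names, unavailabilities and cut sets are hypothetical.

```python
from math import prod

def cut_set_unavailability(cut_set, q):
    """A minimal cut set fails when all of its basic events are failed."""
    return prod(q[event] for event in cut_set)

def top_event_upper_bound(cut_sets, q):
    """Min-cut upper bound: Q_top <= 1 - prod(1 - Q_mcs)."""
    return 1.0 - prod(1.0 - cut_set_unavailability(cs, q) for cs in cut_sets)

# Hypothetical basic-event unavailabilities (not SNS data).
q = {"PS1": 0.02, "PS2": 0.02, "KLY": 0.01, "VAC": 0.005}

# Hypothetical minimal cut sets: two single failures and one
# double failure of redundant power supplies.
cut_sets = [("KLY",), ("VAC",), ("PS1", "PS2")]

q_top = top_event_upper_bound(cut_sets, q)

# Fractional contribution of each cut set to the top-event unavailability.
contributions = {cs: cut_set_unavailability(cs, q) / q_top for cs in cut_sets}

print(f"Q_top = {q_top:.6f}")
for cs, share in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{' & '.join(cs):10s} {share:6.1%}")
```

In this toy case the single-event cut sets dominate, while the redundant pair contributes little, which is the kind of ranking an MCS contribution table conveys.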

5. SNS Systems - Reliability Analysis Results

MCS Analysis has been performed for the SNS Linac complete model (SNS ACC DOWN), or different parts (SCL, etc.) of the accelerator, with the following conclusions:

- The results show a wide range of failure modes for components/systems (wide dispersion of failures)
- The Linac (DTL-CCL-SCL) is the most affected part (Q=1.25E-01; A=87.5%)
- The highest values of unavailability:
• SCL (Q=9.85E-02; A=90%)
• DGN&C (Q=7.15E-02; A=93%)
• Front-End (Q=6.93E-02; A=93%)
- The most affected part of the SCL is the SCL RF system: Q=6.33E-02; A=94% (primarily due to power supply failures and klystron failures, but also to cooling and vacuum malfunctions)
- The most affected parts of the Front-End are the LEBT (Q=2.83E-02; A=97%) and MEBT (Q=2.82E-02; A=97%), more specifically the magnets and the vacuum systems

5. SNS Logbook Data – Accelerator trip failures

5. SNS Logbook Data – Accelerator trip failures

SNS Reliability graphics (Logbook availability and failure data)

SNS Outages (Jan-Feb, June 2012)

Accelerator trip failures frequency (by system)

Accelerator downtime contribution (by system)

Availability (Oct.2011 - June 2012)

RF system and electrical system failures are the most frequent; electrical system failures make the largest contribution to total accelerator downtime (consistent with the conclusions from the SNS RS Model runs).

5. SNS Logbook Data – Accelerator trip failures

The most affected subsystems of the SNS Linac (failures leading to accelerator trips):
- SCL-HPRF (Superconducting Linac - High Power Radiofrequency): frequency of short failures
- HVCM (High Voltage Converter Modulator): duration of trips

(in accordance with the SCL RS analysis)

Electrical subsystems contribution to the acc. downtime

RF System failures (no. & duration-hours)

5. SNS Reliability modeling – Model evaluation

SNS Reliability considerations (from past operation experience)

The reliability of the input data mix used (RAMI static model, BlockSim model) - sources: data from staff engineers, manufacturers (e.g. Titan, Varian, Maxwel), design reviews, etc.

A reliability program has been implemented at SNS, achieving a significant increase in the reliability of SNS installations in the past few years.

SNS RS Model Limitations

- SNS reliability data (MTTF; MTTR) - SNS data mix
- The reliability improvement program - not quantified/represented in the RS model
- The LEBT and DGN&C modules - relatively less developed (lack of detailed information)

Considering the reliability database used for quantification, and the fact that the reliability improvements of recent years have not been included in the model, it can be affirmed that the overall availability of the SNS Linac (A=73%) resulting from the RS model is confirmed by the availability figures from the first years of SNS operation.

Accelerator reliability Workshop in Cape Town, South Africa in April 2011 (G.Dodson talk)

The availability results obtained by MCS analysis run separately for the different SNS Linac parts (IS, RFQ, MEBT, DTL, CCL, SCL, HEBT) have matched up very well with the SNS Logbook availability records, although the global result is A=73%. This is attributable to the MTTF and MTTR values used for model quantification, which may be too conservative, and to the other limitations listed above.

6. Conclusions

The reliability results show that the most affected SNS Linac parts/systems are:
- SCL, Front-End systems (IS, LEBT, MEBT), Diagnostics & Controls
- RF systems (especially the SCL RF system)
- Power Supplies and PS Controllers

These results are in line with the records in the SNS Logbook

The reliability issue that most needs to be enforced in the linac design is the redundancy of the systems, subsystems and components most affected by failures.

Need for intelligent fail-over redundancy implementation in controllers, for compensation purposes.

Enough diagnostics have to be implemented to allow reliable functioning of the redundant solutions and to ensure the compensation function.

7. MAX Task 4.4 – Myrrha linac Reliability model

Overall approach

Fault Tree, based on SNS model + Max design

Basic Events: Component / Function failures

Undeveloped Events/Systems: Reliability targets

Reliability model: Availability / Failure frequency (Linac shutdown)

Reliability Analysis: Design Optimization

Design & reliability data base

Data Source: SNS, Max team, suppliers, conservative assumptions / reliability targets

Support systems – gen. level

7. MAX Task 4.4 – Myrrha linac Reliability (MTBF > 250 h)

Reliability challenges: Injector Switch reliability and duration

Conditions: High-reliability of injectors, reduced MTTR and possibility to perform maintenance without stopping the beam.

Injector Switch sequence:
- fault detection and first action of MPS
- few beam restart tries (w/ short pulses) by the MPS and the fault confirmation
- fault full diagnostic and acknowledgement by the control system
- dipole magnets switch
- fast beam commissioning before reaching nominal beam

Reliability analysis objective: to determine the relation MTTR-MTBF in configuration of 2 injectors, 1 operational and 1 hot standby
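The kind of MTTR-MTBF relation in question can be illustrated with the textbook two-state Markov result for a pair of identical repairable units in active (hot) redundancy. This is a minimal sketch under strong assumptions that are not taken from the MAX design: identical injectors, constant failure and repair rates, the standby injector failing at the same rate as the operating one, perfect and instantaneous switching, and repair of a failed injector while the other delivers beam. Under these assumptions the mean time to loss of both injectors is (3λ + µ)/(2λ²). The function name and all numerical values are hypothetical.

```python
def pair_mtbf(unit_mtbf_h: float, unit_mttr_h: float) -> float:
    """Mean time until both units are down, starting with both up.

    Textbook result for two identical units in active redundancy with
    repair: (3*lam + mu) / (2*lam**2), with lam = 1/MTBF, mu = 1/MTTR.
    """
    lam = 1.0 / unit_mtbf_h
    mu = 1.0 / unit_mttr_h
    return (3.0 * lam + mu) / (2.0 * lam ** 2)

# Hypothetical single-injector MTBF of 100 h, for several repair times.
unit_mtbf_h = 100.0
for unit_mttr_h in (1.0, 10.0, 50.0):
    system = pair_mtbf(unit_mtbf_h, unit_mttr_h)
    print(f"MTTR = {unit_mttr_h:5.1f} h -> pair MTBF = {system:8.1f} h")
```

The shorter the repair time relative to the injector MTBF, the larger the gain from the redundant pair, which is why reduced MTTR appears among the conditions listed above. Switching failures and the switch duration itself are ignored here and would lower the result.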

7. MAX Task 4.4 – Myrrha linac Reliability (MTBF > 250 h)

Reliability challenges: Fault tolerance/compensation function (linac fault-recovery system)

Fault compensation - special conditions for the detuning system (CTS piezo detuning of the failed cavities): a higher failure rate should be considered (lower MTBF)

Fault detection + Compensation sequence:
- Recovery data processing - Linac Control System defining new set-points (load or calculate)
- RF fields updating in the corrective cavities (by CCSs); CCS (LLRF loop + CTS)
- fast beam commissioning before reaching nominal beam

7. MAX Task 4.4 – Next steps

Development of the Myrrha Linac Reliability model, based on the SNS RS Model and considering the SNS reliability analysis results and conclusions.

Iterative process – Myrrha Linac Model to be updated during design work

Myrrha linac Risk Spectrum fault tree - currently under development

Reliability analysis to be performed, with due consideration of reliability challenges

Special attention - design of Diagnostics and Control systems (advanced)

Thank you