Reliability Handbook.pdf

  • Upload

  • View

  • Download

Embed Size (px)

Text of Reliability Handbook.pdf

  • 8/20/2019 Reliability Handbook.pdf



    John D. Campbell, editorGlobal Leader, Physical Asset Management

    PricewaterhouseCoopers LLP.



    Volume 23, Issue 6

    From downtime

    to uptime– in no time!

    From downtime

    to uptime– in no time!






  • 8/20/2019 Reliability Handbook.pdf


  • 8/20/2019 Reliability Handbook.pdf


    C c  a

    MRO Handbook is published by Clifford/Elliot Ltd., 209–3228 South Service Road., Burlington, Ontario, L7N 3H8. Telephone (905) 634-2100.Fax 1-800-268-7977. Canada Post – Canadian Publications Mail Product Sales Agreement 112534. International Standard Serial Number (ISSN) 0710-362X.

    Plant Engineering and Maintenance assumes no responsibility for the validity of the claims in items reported.*Goods & Services Tax Registration Number R101006989.

    EDITORPaul [email protected]


    George F.W. Clifford


    Joanna Malivoire



    Todd PhillipsNathan Mallet


    David Berger, P.Eng. (Alta.)


    Wilfred ListKen Bannister


    Ian Phillips


    Joanna Malivoire


    Julie Clifford


    Alistair Orr


    Corina Horsley


    Christine Zulawski



    Nicole Diemert


    Julie Bertoia


    Janice Armbrust


    Kent Milford


    A CLIFFORD/ELLIOT PUBLICATIONVolume 23, Issue 6 December 1999

    The Reliability Handbook 3

    JOHN D. CAMPBELL is a partner in PricewaterhouseCoopers and director

    of the firm’s maintenance management consulting practice. Specializing in

    maintenance and materials management, he has more than 20 years of

    worldwide experience in the assessment /implementation of strategy, man-

    agement and systems for maintenance, materials and physical asset life-

    cycle functions. He wrote the book U ptime: Strategies for Excellence in

    Maintenance Management (1995), and is co-author of Planning and Con-

    trol of Maintenance Systems: Modeling and Analysis (1999). You can reach

    him at 416-941-8448, or by e-mail at [email protected].

    ANDREW JARDINE is a professor in the Department of Mechanical and

    Industrial Engineering at the University of Toronto and principal investi-

    gator in the Department’s Condition-Based Maintenance Laboratory

    where the EXAKT software has been developed. He also serves as a seniorassociate consultant in the International Center of Excellence in Mainte-

    nance Management of PricewaterhouseCoopers. Dr. Jardine wrote the

    book Maintenance, Replacement and Reliability , first published in 1973

    and now in its sixth printing. You can reach him at 416-869-1130 ext.

    2475, or by e-mail at andrew.k. [email protected].

    JAMES PICKNELL is a principal in PricewaterhouseCoopers Mainte-

    nance Management Consulting Center of Excellence. He has more than

    twenty-one years of engineering and maintenance experience including

    international consulting in plant and facility maintenance management,

    strategy development and implementation, reliability engineering,

    spares inventories, life cycle costing and analysis, benchmarking for best

    practices, maintenance process redesign and implementation of Com-

    puterized Maintenance Management Systems (CMMS). You can reach

    him at 416-941-8360 or by e-mail at [email protected].

    A word from PEM 

    Our tradition of publishing annual, fact-filled PEM handbooks continues with this issue, The Reliability Handbook. As soonas the ink was dry on last year’s MRO Handb oo k , we went back to John Campbell and his team of experts at Pricewater-houseCoopers to see if they could provide our readers with cur rent and to-the-point information on the subject of re-

    liability-based maintenance, along with the tips they’d need to put this information into action. Well, the team at PWC camethrough with flying colours, and what you’ll see on the next 72 pages represents the cutting edge of reliability research andimplementation techniques from a firm that’s one of the world’s leading providers of this kind of information to plant pro-fessionals around the world. All of us at PEM hope you enjoy it, and use it well! — Paul Challen 

    LEN MIDDLETON is a principal consultant in PricewaterhouseCoopers’ Phys-

    ical Asset Management consulting practice. He has more than twenty years

    of professional experience in a variety of industries, including an indepen-

    dent practice of providing project management and engineering services.

    Project experience includes a variety of technical projects in existing manu-

    facturing sites and green-field sites, and projects involving bringing new

    products into an existing operating plant. You can reach him at 416 941-

    8383, ext. 62893, or by e-mail at [email protected]

    BEN STEVENS is a managing associate in Pricewaterh o u seCooper s ’

    Maintenance Management Consulting Centre of Excellence. He has

    more than thirty years of experience including the past twelve dedicated

    to the marketing, sales, development, justification and implementation

    of Computerized Maintenance Management Systems. His prior experi-ence includes the development, manufacture and implementation of

    production monitoring systems, executive-level management of mainte-

    nance, finance, administration functions, and management of re-engi-

    neering eff o rts for a major Canadian bank. You can reach him at

    416-941-8383 or by e-mail at [email protected].

    MURRAY WISEMAN is a principal consultant with Pricewaterhouse-

    Coopers, and has been in the maintenance field for more than 18 years.

    He has been a maintenance engineer in an aluminum smelting opera-

    tion, and maintenance superintendent at a large brewery. He also found-

    ed a commercial oil an alysis laboratory where he developed a

    Web-enabled Failure Modes and Effects Criticality Analysis system, incor-

    porating an expert system and links to two failure rate/ mode distribution

    databases at the Reliability Analysis Center. You can contact him at 416-

    815-5170 or by e-mail at [email protected].


  • 8/20/2019 Reliability Handbook.pdf


    The Reliability Handbook 5

    Reliability: past,

    present, futureEstablishing the historical andtheoretical framework of RCMby John D. Campbell

    Not all that long ago, equipment design and production cycles created an environment in which

    equipment maintenance was far less important than run-to-failure operation. Today, however, condition

    monitoring and the emergence of Reliability Centred Maintenance have changed the rules of the game.

    i n t ro d u c t i o n

    At the dawn of the new millenni-um, it is fitting that this editionof P E M ’s annual h a n d b o o k

    should be a discuss ion on re l i a b i l i t y management. A half a millennium ago,Galileo figured that the earth re vol vedaround the sun. A thousand years ago,e fficient wheels were made to re v o l v ea round axles on chariots. Four thou-sand years ago, the loom was the latest engineering marvel. A little closer tothe present day, maintenance manage-ment has evolved tremendously overthe past century.

    The maintenance function wasn’t even contemplated by early equipment designers, probably because of the un-complicated and robust nature of themachinery. But as we’ve moved to built-in obsolescence, we have seen a pro-g ression from preventive and plannedmaintenance after WWII, to conditionmonitoring, computerization and life

    cycle management in the 1990s. Today,evolving equipment characteristics aredictating maintenance practices, withp redominant tactics changing fro mrun-to-failure, to prevention and now toprediction. We’ve come a long way!

    Reliability management is often mis-understood. Reliability is very specific— it is the process of managing the in-terval between failures. If availability isa measure of the equipment uptime, orconversely the duration of downtime,reliability can be thought of as a mea-

    sure of the frequency of downtime.L et’s look at an example. In case 1,

     your injec tion moulding machine isdown for a 24-hour repair job in themiddle of what was supposed to be asolid five-day run. Its availability ist h e re f o re 80 perc ent (120 hr-24hr/120hr). The reliability of the ma-chine is 96 hours (96hr/1 failure).

    In case 2, your machine is down24 times for one hour each time.Its availability is st i l l 80 perc e n t  (120hr-24x1hr/120hr), but its reliabili-ty is only four hours (96/24 failures)!

    The two measures are, however,closely related:

    Availability =Reliability

    (Reliability + Maintainability)

     where maintainability is the mean timeto repair.

    One of the most robust approaches

    for managing reliability is adopting Re-liability Centred Maintenance. The re-cently-issued SAE standard for RCM is agood place to start to see if you areready to embrace this methodology. Al-though SAE defines RCM as a “logicaltechnical achieve design re-liability,” it requires the input of every-one associated with the equipment tomake the resulting maintenance pro-gram work. Even at that, we are stilldealing with a lot of uncertainty.

    Dealing with uncertainty in equip-

    ment perf o rmance is, in some ways,like dealing with people. If, for thepast three generations, your fore f a-thers lived to the ripe old age of 95,then there is a strong likelihood you will live into your 90s, too. But therea re a lot of random events that willhappen between now and then. De-spite this randomness, we can usestatistics to help us know what tasks todo (and not to do) to maximize ourlife span, and when to do them. Do ex-e rcise three times a week and don’t smoke — and by analogy, do vibrationmonitoring monthly and don’t re-build yearly.

     When we put cost into this equation, we start to enter the realm of optimiza-tion. For this, there are several model-ing techniques that have proven quiteuseful in balancing run-to-failure, timebased replacement and conditionbased maintenance. Our objective is to

    minimize costs and maximize availabili-ty and reliability.Our Physical Asset Management 

    team at PricewaterhouseCoopers ispleased to provide this introduction toreliability management and mainte-nance optimization. If you are inter-ested in a broader discussion on thesetopics, be sure to read our new bookMai n te n a nc e E x c e l le n c e : O p t im i z in g  

     E q uip m e nt Li fe C yc l e D e c is io ns , pub-lished by Marcel Dekker, New Yo r k(Spring 2000).  e

  • 8/20/2019 Reliability Handbook.pdf


    The Reliability Handbook 7

    The evolution

    of reliabilityHow RCM developed as aviable maintenance approachby Andrew K.S. Jardine

    Reliability mathematics and engineering really developed during the Second World War, through the

    design, development and use of missiles. During this period, the concept that a chain was as strong as

    its weakest link was clearly not applicable to systems that only functioned correctly if a number of sub-

    systems must first function. This resulted in “the product law for series systems,” which demonstrates

    that a highly reliable system requires very highly reliable sub-systems.

    chapter one

    To illustrate this, consider a systemthat consists of three sub-systems(A, B, and C) that must work in se-

    ries for the complete system to func-tion, as in Figure 1.

    In the system, where subsytems havereliability values of 97 percent, 95 and98 percent, respecti vel y, then the com-plete system has a reliability of 90 per-cent. (This is obtained from 0.97 x 0.95x 0.98). This is not 95 percent, which would be the case under the case underthe weakest-link theory.

    C l e a r l y, for real complex systems where the design configuration is dra-matically more complex than Figure 1,the calculation of system reliability ismore complex. To wards the end of the1940s, efforts to improve the reliabilit y 

    of systems focused on better engineer-ing design, stronger materials, andharder and smoother wearing surfaces.

    For example, General Motors ex-tended the useful life of traction motorsused in locomotives from 250,000 milesto one million miles by the use of betterinsulation, high temperature testing,and improved tapered-spherical ro ll e rbearings.

    In the 1950s, there was extensive de- velopment of reliability mathematics by statisticians, with the U.S. Depart ment 

    of Defense coordinating the re l ia bil i t y analysis of electronic systems by estab-lishing AGREE (Advisory Group onReliability of Electronic Equipment).

    The 1960s saw great interest inaerospace applications, and this result-ed in the development of re l i a b i l i t y block diagrams, such as Figure 1, forthe analysis of systems. Keen interest insystem reliability continued into the

    1970s, primarily fuelled by the nucleari ndustry. Also in the 1960s, there gre w an interest in system reliability in thecivil aviation industry, out of which re-sulted Reliability Centred Maintenance(RCM), which is covered in Chapter 3of this handbook.

    Subsequent to the early success of RCM in the aviation industry RCM hasbecome the predominant methodology  within enterprises (military, industrialetc.) in the 1970s, 80s and 90s to estab-lish reliable plant operations.

    The next millennium will see RCMcontinuing to play a significant role inestablishing maintenance programs, but a new feature will be the focus by mainte-nance professionals of closely examiningthe plans that result from an RCM analy-sis, and using procedures which enablethese plans to be optimized.

    Other sections (Chapter Five,“Opti-mizing time-based maintenance”, and

    Chapter Six,“Optimizing conditionbased maintenance”) of this handbookaddress procedures already available toassist in this thrust of maintenance opti-mization. Both of them cover tools,policies, analysis methods, and datap reparation techniques to further theimplementation of an RCM strategy.

    ReferencesReliability Enginee ring and Risk Asse ss - ment  , Henley and Komamoto, Prentice-Hall, 1981. e

    Figure 1: Series system reliability

  • 8/20/2019 Reliability Handbook.pdf


    The Reliability Handbook 9

    Take stock of

    your operationMeasuring and benchmarkingyour plant’s reliabilityby Leonard G. Middleton and Ben Stevens

    The benefits of benchmarking

    Benchmarking reinforces positive be-haviour and resource commitment. It enables faster progress towards goals

    by providing experience, through the ex-ample of the participant and impro ved“buy-in.” By using external standards of perf o rmance, the organization can stay competitive by reducing the risk of beingsurpassed by competitors’ performance,and by customers’ requirements.

    Most import a n t l y, benchmarkingmeets the information needs of stake-holders and management by focusingon critical , value-adding pro c e s s e s ,

     while achieving an integrated view of the business. Benchmarking is a pro-cess that makes demands of all its par-ticipants, but it is one of the best ways tofind practices that work well withincomparable circumstances.

    What should you measure?There is no mystique about performance

    measurement — the trick is in how touse the results to achieve the actions that are needed. This requires a number of 

    conditions to be in place — includingconsistent and reliable data, high quality analysis, a clear and persuasive presenta-

    Does your business operate using “cash-box accounting”? That’s where you count your money at the

    beginning and end of each month. If you have more when month’s end rolls around, then that’s good. If

    not, well you’ll try to do better next month — or at least until the money runs out. No truly successful

    large business operates using this model, yet maintenance is often organized and performed without

    proper measures to determine its impact on the business’s success. The use of performance measurement

    is rapidly increasing in maintenance departments around the world. This stems from a very simple

    understanding: You can’t manage what you don’t measure. Performance measurement is therefore a core

    element of maintenance management. The methodologies for capturing performance are of vital

    importance, since unreliable measurements lead to unreliable conclusions, which lead to faulty actions.

    chapter two

    Figure 2: Inputs, outputs, and measures of process effectiveness

  • 8/20/2019 Reliability Handbook.pdf


    tion of the information, and a recepti ve

     work environment (see Figure 2).In accordance with the theme that 

    maintenance optimization is targ e t e dat the company’s executive manage-ment and the boardroom, it is vital that the results are shown as a reflection of the basic business equation: Mainte-nance is a business process turning in-

    puts into usable outputs. Figure 3 shows

    the three major elements of this equa-tion — the inputs, the outputs and theconversion process within examples of performance measures.

    Input measures are the resources weallocate to the maintenance pro c e s s .Output measures are the outcomes of the maintenance process. Measures of 

    maintenance process effectiveness de-t e rmine where improvements shouldbe made. Specific measures for reliabil-ity include MTBF (reliability), MTTR (maintenance ser viceability).

    Relating performance measuresto business objectivesT h ree basic business operational sce-narios impact the focus and strategiesof maintenance. They are: 1. Excess operations capacity; 2. Constrained by operations capacity; 3. Focus on compliance to serv i c e ,quality or regulatory requirements.

    Benchmarking measures must re-flect the predominant business opera-tional scenario at the time. While somemeasures are common to all three sce-narios, the maintenance focus must align with the current focus of the or-ganisation. The operational scenariomay change due to changes in the econ-omy. For example, a strong economy or

    a reduction in interest rates could causean increase in demand of building ma-terials. As a consequence, building ma-terials industries (like gypsum

     w a l lboard and brick making) will shift from scenario 1 (excess capacity) to sce-nario 2 (constrained capacity).

    Most of the inputs in Figure 1 are

    10 The Reliability Handbook

    Figure 3: Maintenance optimising — where to start

  • 8/20/2019 Reliability Handbook.pdf


  • 8/20/2019 Reliability Handbook.pdf


    12 The Reliability Handbook

    also in the way “company X” does busi-ness — a heavier management structureand far less use of outside contractors,for example. What is clear from this

    high level benchmarking is that to pre-s e rve company X’s competitiveness inthe marketplace, something needs to bedone. Exactly what, though, re q u i re smore detailed analysis.

     As one may guess, the number of po-tential perf o rmance measures far ex-ceeds the ability (or the will) of themaintenance manager to collect, anal- yse and act on the data. An importa n t part, there f o re, of any program to im-plement perfo rmance measurement isthe thorough understanding of the few,key perf o rmance drivers. Maximum

    leverage should always take top priority,that is, you must first identify the indica-tors that show results and pro g ress in

    those areas that have the most criticalneed for improvement. As a place tostart, consider Figure 3 on page 10.

    If the business would be able to sellm o re products or servic es if theirprices were lowered, then the businessis said to be “cost-constrained.” Underthese circumstances, the maximump a y o ff is likely to come from concen-trating on controlling inputs — i.e.l abou r, materials, contractor costs, ando v e rheads. If the busin ess can pro f-itably sell all it produces, then it is

    called “production constrained,” and islikely to achieve maximum payoff fro mfocussing on maximizing outputst h rough re l i a b i l i t y, ava ilab ilit y and

    maintainability of the assets.

    Excess capacity (cost-constrained) businessesTypical maintenance financial mea-sures could include: 1. Maintenance budget versus expen-ditures (i.e. predictability of costs); 2. Maintenance expenditures cashflow (i.e. impact on ability to pay for ex-penditures); 3. Maintenance expenditures, re l a-tive to output (e.g. maintenance costsper production unit);

    4. Derivative measures of expendi-tures describing the actual activities thee x pe n d i tu res are used for: emerg enc y 

    re p a i r, condition based monitoring,c o rrecti ve planned work, shutdown work, process efficiency, cycle time per-centages, or production rate perc e n t-ages (not absolute production rate).

    Capacity-constrained businessesTypical maintenance productivity oroutput measures include: 1. Overall equipment effectiveness andeach of a piece of equipment’s compo-nents of availability, production rate, andquality rate, as primary measures;

    2. MTBF and MTTR, as secondary measures used to analyse problems withrespect to availability.

    RCM analysis (see Chapter 3)should include all operations bottle-necks and critical equipment (allowingfor impact of system or equipment re-d u n d a n c y, or parallel proces ses), inevaluating tactics.

    Compliance with requirementsThe critical success factors of the orga-nization often depend heavily on com-plia nce with a set of re q u i re m e n t sd e t e rmined by re g u l a t o ry agencies orby the customer base. Compliance may apply to the operation, such as effluent monitoring equipment. Regulated utili-ties, pharmaceutical and health careproducts, or “prestige” products are ex-amples of businesses whose marg i n sand nature are such that compliance istheir most critical operational aspect.Financial measures and output mea-

    s u res remain important, though s ec-ondary in focus.

    Typical maintenance measures with-in this operating scenario include: 1. Quality rate; 2. Availability (e.g. regulatory compli-ance requirement); 3. Equipment or system precision orrepeatability.

    RCM analys is shoul d, there f o re ,focus on the critical aspects of the com-pliance, as required by the critical out-side stakeholders.

    How well are you performing?The 10 maintenance process items of Figure 5 (above) result from the bench-marking exercise, and a subsequent analysis. They provide the broad targetsand the ability to measure progress to- wards goal achievement.

    Findings and sharing results“Best practices” are identified by bench-marking organizations, who take into ac-count constraints that may exist. Animplementation plan is developed tomake the “best practices” part of thebenchmarking org a n i z a t i o n ’s mainte-

    nance process (see Figure 6). In the spir-it of benchmarking, the results of thestudy are shared with the parti ci pa n t s.They receive a report detailing the find-ing of the benchmarking study, without recommendations specific to the bench-marking organization. This report is crit-ical, as it is the essential value they receive, for their effort expended.

    External sources of dataLegal means of getting additional datafor perf o rmance comparison re q u i re

    Figure 5: Measuring the gap

    An important part of any program to implement performance

    measurement is the thorough understanding of the key

    performance drivers — with prime emphasis given to the

    indicators that show progress in the areas that need the most

    help within a company’s maintenance operations.

  • 8/20/2019 Reliability Handbook.pdf


    14 The Reliability Handbook

    re s e a rching for secondary data (de-scribed below) and benchmarking.Benchmarking may be one or more of the following: 1. internal benchmarking; 2. benchmarking within the industry; 3. benchmarking with comparable or-ganisations not in the industry.

     After obtaining it, a cri tical issue with external data is how comparable it is. Financial data is particularly difficult to compare, because both financial ac-counting (including GAAP restrictions)and management cost accounting prac-tices vary according to the org a n i s a-tion’s objectives.

    Questions you should ask when ana-lyzing accounting data include: AreMRO materials and outside services in-cluded in the maintenance budget or

    p u rchasing budget? Are re p l a c e m e n t -in-kind projects and turn a rounds in-cluded in the maintenance budget or inthe capital budget? In calculatingequipment availability, does it considerall downtime, or just unscheduleddowntime? The answers will dependupon what the organization is trying toachieve with the calculation.

    Researching secondary dataS e c o n d a ry data is information collect-ed for other purposes. Typically these

    could be government-generated re-p o rts, industry association re p o rts, orannual re p o rts of publicly-traded com-panies. The data has a number of lim-itations. It is likely to be quantitative

     with little information available to un-derstand the context. It may not bec u rrent . Compar able per f o rm a n c em e a s u res may not be available be-cause their underlying data were not collected.

    General considerationsin benchmarkingBenchmarking is the sharing of similarinformation among benchmarking par-ticipants. Benchmarking can be compre-hensive, covering the entire businesso rganisation or focused at a part i cul arprocess or set of measures (see Figure 7).

    Benchmarking can take a number of forms. Specific, though usually qualita-tive information can be acquire dt h rough short (15 to 30 minute) tele-phone surveys. Detailed information canbe obtained through compre h e n s i v eq u est i o n n a i res, but participation thenbecomes an issue. A focused benchmark-ing study can address specific interests,but the content has to be a broad enoughto obtain participation (see Figure 8).

    Internal benchmarkingThe principal difficulty in benchmark-ing is getting the critical data neededfor comparison. Internal benchmark-

    ing addresses this issue by comparingdata from other organisations and divi-sions within the company. Informationcan be freely exchanged since it doesnot provide an advantage to a competi-t o r. Data exchanged through intern a lbenchmarking programs is typically quantitative because qualitative data re-quires considerably more analysis.

    The limitation (of internal bench-marking) is that knowledge of one’s per-f o rmance relativ e to the exter n a lcompanies remains unknown. Wit hout outside comparisons, a company is un-able to determine whether it is perform-ing at its highest possible capability.

    Industry-wide benchmarkingIn some industries, there is sharing of data through a third part y. It may be

    Figure 6: Acting on the results of a benchmarking study

    Figure 7: Benchmarking comparators

  • 8/20/2019 Reliability Handbook.pdf


    The Reliability Handbook 17

    t h rough an industry association orthrough an outside party that has devel-oped industry expertise and wants toremain a focus point of the industry (e.g. the PWC Global Fore s t ry bench-marking re p o rt). This third party en-sures that the data remains confidentialand is not directly attributed to any par-ticipating organisation. The organisation

     would also develop the q u e s t i o n n a i re ,although input from the part i c i p a n t s

     would help provide direction to en-s ure the results have the highest utility to the participants. It i s sometimes dif-ficult to get participation of all the crit-ical parties, as some companies view their operations as a strategic advan-tage over their competition. If they areindeed better than their competitorsand have nothing to learn from them,that would be true — but there is al- ways something to learn .

    Benchmarking withcomparable industries W h e re specific measures are desi re dby an organization, it is possible to per-f o rm a focused benchmarking study .Using an outside third party to main-tain confidentiality (that is, who be-longs to what data), an org a n i z a t i o ncan measure and compare specificp a rts of its process with the best prac-tices revealed by the exercise. As it isoften difficult to get information fro md i rect competitors, it is usually possi-ble to get information from other in-dustries with similar process issues andconstraints. A spin-off benefit of benchmarking with comparible indus-tries may be the discovery of new us-able ideas that may not be commonpractice in one’s own industry.

    For example, a client in the oil andgas refining industry wished to bench-

    mark electrical and instr u m e n t a t i o nmaintenance management. The partic-ipant selection criteria for other organi-zations were: 1. Electr ica l and instr u m e n t a t i o nmaintenance is critical to reliable oper-ations; 2. Production is a continuous processrequiring operating 24 hours a day, 7days a week; 3. Ramifications of unscheduleddowntime are severe and there is con-siderable eff o rt and focus by mainte-nance to avoid downtime; and 4. Maintenance is pro-active.

    The list of possible participants that met most or all of the requirements, in-cluded chemical sites, electrical powergeneration sites, waste water treatment plants, steel mills, as well as other oiland gas refining sites.

    Maintenance is an essential part of the overall business of an organization,and it must therefore take its lead fromthe objectives and direction of that or-ganization. Maintenance cannot oper-ate in isolation. The continuous

    i m p rovement loop that is key to im-p rovement i n maintenance, must bedriven by and mesh with the corpora-t i o n ’s own planning , execution andfeedback cycle.

    Disconnects frequently occur be-cause of the failure of maintenance tocorrelate from the corporate to the de-p a rtment level — for example if thecompany places a moratorium on new capital expenditures, then this must befed into the maintenance department’sequipment maintenance and re p l a c e-

    Figure 8: Stages of benchmarking

    Figure 9: The maintenance continuous improvement loop

  • 8/20/2019 Reliability Handbook.pdf


    ment strategy. Likewise, if the corporate mission is to pro-duce the highest possible quality product, then this is proba-bly not in synch with a maintenance depart m e n t ’s cost minimisation target. This type of disconnect frequently crops

    up inside the maintenance department itself; if the mainte-nance department’s mission is to be the best performing onein the business, then a strategy which excludes condition-based maintenance and reliability is unlikely to achieve theresults (see Figure 9). Similarly, if the strategy statement callsfor a 10 percent increase in reliability, then reliable and con-sistent data must be available to make the comparisons.

    Conflicting priorities for the maintenance managerIn modern industry, all maintenance departments face thesame dilemma — which of the many priorities is at the topof the list? (And dare one add “this week”?) Should the or-ganization minimize maintenance costs — or maximizep roduction throughput? Does it minimise downtime — orconcentrate on customer satisfaction? Should it spends h o rt- t e rm money on a reliability program to reduce long-t e rm costs?

    Corporate priorities are set by the senior executive andratified by the board of directors. These priorities shouldthen flow down to all parts of the organization. The mainte-nance manager’s task is to adopt those priorities, and convert them into the corresponding maintenance priorities, strate-gies and tactics which will achieve the results; then trackthem and improve on them.

    F i g u re 10 (above) shows an example of how the corpo-rate priorities can flow down through the maintenance pri-orities and strategies to the maintenance tactics that c o n t rol the everyday work of the maintenance depart m e n t .Hence if the corporate priorit y is to maximise pro d u c t 

    sales, then this can legitimately be converted into mainte-nance priorities that focus on maximizing throughput andt h e re f o re equipment re l i a b i l i t y. In turn, the maintenancestrategies will also reflect this, and could include (for ex-ample) implementing a formal reliability enhancement p rogram supported by condition-based monitoring. Out of these strategies, the daily, weekly and monthly tactics flow — providing the lists of individual tasks which then be-come the jobs that will appear on the work orders from theEAM or CMMS. The use of the work order as the “pro m p t ”to ensure that the inspections get done is widespre a d ;

     w h e re organizations frequently fail is to ensure that the fol-low-up analysi s and re p o rting is completed on a re g u l a r

    18 The Reliability Handbook

    Figure 10: Relating macro measurements with micro tasks

  • 8/20/2019 Reliability Handbook.pdf


    and timely basis. The most effective method of doing this isto set them up as weekly work order tasks which are thensubject to the same perf o rmance tracking as the pre v e n t i v eand repair work ord e r s .

    In seeking ways to improve perf o rmance, a ma inte-nance manager is confronted with many, seemingly con-flicting alternatives. Many review techniques are availableto establish where organizations stand in relation to indus-t ry standards or best maintenance practice. The best tech-niques are those which will also indicate the pay-off to bederived from improvement and there f o re the priorities.The review techniques tend to be split into the macro (cov-ering the full maintenance department and its relation tothe business) and the micro approaches (with the focus ona specific piece of equipment or a single aspect of themaintenance function).

    The leading techniques are:1. Maintenance effectiveness review: This covers the over-

    all effectiveness of the maintenance function and its relation-ship with the organization’s business strategies. These can beconducted internally or externally, and typically cover areassuch as: Maintenance strategy and communication; Maintenance organization;

    Human resources and employee empowerment; Use of maintenance tactics; Use of reliability engineering and reliability-based ap-proaches to equipment; Equipment perf o rmance monitoring and impro v e m e n t ; Information technology and management systems; Use and effectiveness of planning and scheduling; Materials management in support of maintenance opera-tions.

    2. External benchmark:This draws parallels with other or-ganizations to establish the organizations standing relative toindustry standards. Confidentiality is a key factor here, andresults are typically presented as a range of performance in-dicators and the target org a n i z a t i o n ’s ranking within that range. Some of the topics covered in benchmarking will over-lap with the maintenance effectiveness review; and additionaltopics include: Nature of business operations Current maintenance strategies and practices Planning and scheduling Inventory and stores management practices Budgeting and costing Maintenance performance and measurement  Use of CMMS and other IS tools Maintenance process re-engineering

    3. Internal comparisons: These will measure a similar set of parameters as the external benchmark, but will be drawnfrom different departments or plants. As such, they are gen-erally less expensive to undertake and, provided the data is

    consistent, can i llustrate diff e rences in the maintenancepractices among similar plants. These diff e rences then be-come the basis for shared experiences and the subsequent adoption of best practices drawn from these experiences.

    4. Best practices review: Looks at the process and operat-ing standards of the maintenance department and comparesthem against the best in the industry.

    This is generally the starting point for a maintenance pro-cess upgrade program, and will focus on areas such as: Preventive maintenance; Inventory and purchasing; Maintenance workflow; Operations involvement;

    The Reliability Handbook 19

  • 8/20/2019 Reliability Handbook.pdf


    Predictive maintenance; Reliability based maintenance; Total productive maintenance; Financial optimisation; Continuous improvement.

    5. Overall Equipment Eff e c t i v e n e s s( O E E ): A measure of a plant’s overalloperating effectiveness after deduct-ing losses due to scheduled and un-scheduled downtime, equipment p e rf o rma nce and quali ty. In eachcase, the sub-components have been

    defined meticulously, to provide oneof the few reasonably objective and

     widely-used indicators of equipment p e rf o rmance. In looking at the follow-ing summary of the results from onecompany (see Figure 11), re m e m b e rthat the category numbers are multi-plied through the calculation to de-rive the final result. Thus, Company Y 

     which achieves 90 percent or higherin each categor y (which look like pre t-ty good numbers), will only have an

    OEE of 74 percent. This means that by i n c reasing the OEE to, say, 95 perc e n t ,Company Y can increase its pro d u c-tion by 28 percent with minimal capi-tal expenditure (95-74)/74 =28).Doing this in three plants prevents thef o u rth from being built.

    These, then, are some of the highlevel indicators that serve to pro v i d emanagement with an overall compari-son of the effectiveness and compara-

    tive standing of the maintenanced e p a rtment. They are ver y useful forhighlighting the key issues at the execu-tive level, but re q u i re more detailedevaluation to generate specific actions.They also typically will re q u i re seniormanagement support and corporatefunding.

    F o rt u n a t e l y, there are many mea-s u res that can (and should) be imple-mented within the maintenanced e p a rtment which do not re q u i re ex-t e rnal approval or corporate funding.These are important to maintainers,as they can be used to stimulate a cli-mate of improvement and pro g re s s .Some of the many indicators at them i c ro level are : 1. Benefits realization assessment fol-lowing the purchase and implementa-tion of a system (EAM) or equipment against the planned results or initialcost-justification; 2. Machine reliability analysis/failurerates: targeted at individual machine orproduction lines; 3. Labour effectiveness review: mea-suring the allocation of manpower to

     jobs or categories of jobs compared to

    last year; 4. Analyses of materials usage, equip-ment availability, utilization, productivi-ty, losses, costs, etc.

     All of these indicators give useful in-formation about the maintenance busi-ness and how well tasks are beingperformed. The effective maintenancemanager will need to be able to select those that most directly contribute tothe achievement of the maintenancedepartment’s goals as well as the overallbusiness goals.  e

    20 The Reliability Handbook

    Target Company Y

    Availability 97% 90%

    xUtilization rate 97% 92%


    Process efficiency 97% 95%


    Quality 99% 94%


    Overall equipment

    effectiveness90% 74%

    Figure 11: Overall equipment effectiveness

  • 8/20/2019 Reliability Handbook.pdf


  • 8/20/2019 Reliability Handbook.pdf


    24 The Reliability Handbook

    ure of those functions; 4. Identify the fa ilure modes that cause those functional failures; 5. Identify the impacts or effects of those failures’ occurrence; 6. Use RCM logic to select appro pri-ate maintenance tactics; and 7. Document your final maintenanceprogram and refine it as you gather op-

    erating experience.These seven steps are intended to

    answer the seven questions posed in thenew SAE standard. Figure 12 (page 23)depicts the entire RCM process.

    In the first step, the RCM practition-er must decide what to analyze. It is themost critical items that require the most attention. There are many possible cri-teria and a few are suggested here: Personnel safety  Environmental compliance Production capacity  Production quality  P roduction cost (including mainte-nance costs) and; Public image.

     When a failure occurs in any system,equipment or device it may have vary-ing degrees of impact on each of thesecriteria, from “no impact” through “in-c reased risk” and “ minor impact” t o“major impact”. Each of these criteriaand impacts can be weighted. Items with the highest combined impact overall criteria should be analyzed first.

    The functions of each system are what it does — in either an active or pas-sive mode. Active functions are usually the obvious ones for which we name ourequipment. For example a motor con-trol center is used to control the opera-tion of various motors. Some systemsalso have less obvious secondary or evenprotective functions. A chemical processloop and a furnace both have a sec-o n d a ry function of containment andmay also have protective functions pro-

     vided by thermal insulating or chemicalcorrosion resistance properties.

    It is important to note that some

    systems do not perf o rm their activerole until some other event occurs, asin safety systems. This passive statemakes failures in these systems diff i-cult to spot until it’s too late. Eachfunction also has a set of operatinglimits. These parameters define “nor-mal” operation of the function. Whenthe system operates outside these “nor-mal” parameters, it is considered tohave failed. Defining functional fail-ures follows from these limits. We canexperience our sy stems failing high,l o w, on, off , open, closed, bre a c h e d ,drifting, unsteady, stuck, etc.

    Functions are often more easily de-termined for parts of an assembly than

    for the entire assembly. There are twoa p p roaches to determining functionsthat dictate the way the analysis will pro-ceed. One alternative is to look at equip-ment functions. To think of all thefailure modes it is necessary to imaginee verything that can go wrong with thisfairly high level of assembly. This ap-p roach is good for deter mining themajor failure modes but can miss someless obvious. An alternative is to look at “part” functions. This is done by dividingthe equipment into assemblies and partsmuch the same way as if you took theequipment apart. Each part has its ownfunctions and failure modes.

     A f ailure mode is “how” the systemfails to perform its function. A cylindermay be stuck in one position because of alack of lubrication by the hydraulic fluidin use. The functional failure is the fail-ure to stroke or provide linear motionbut the failure mode is the loss of lubri-cant properties of the hydraulic fluid. Of 

    course there are many possible causes forthis sort of failure that we must considerin determining the correct maintenanceaction to take to avoid the failure and itsconsequences. Causes may include: useof the wrong fluid, the absence of fluiddue to leakage, dirt in the fluid, corro-sion of the surfaces due to moisture inthe fluid, etc. Each of these can be ad-dressed by checking, changing or condi-tioning the fluid. These are maintenanceinterventions.

    Not all failures are equal. The conse-quences of failure are its effects on therest of the system, plant and operatingenvironment in which it is taking place.The failure of the cylinder above may cause excessive effluent flow to a river if it is actuating a sluice valve or weir in atreatment plant – that is, severe impact.The effects may also be as minor as fail-ing to release a “dead-man” brake on aforklift truck that is going to be used fora day of stacking pallets in a warehouse– that is, relatively minor impact. In onecase the impact is on the environ ment and in another it may be only a mainte-nance nuisance.

    By knowing the consequences of 

    each failure we can determine if thef a il ure is worthy of prevention, eff o rt sto predict it, some sort of periodic inter- vention to avoid it altogether, redesignto eliminate it or no action.

    F igu re 15 (page 26) graphically de-picts the RCM logic. RCM logic helps usto classify failures as being either hiddenor not and as having safety or environ-mental, production or maintenance im-pacts. To simplify the classical RCMlogic diagram we have shown the failureclassifying questions nearer the end of 

    Figure 13: What’s at stake when it fails?

    Figure 14: How complex? Which functions are most easily defined?

  • 8/20/2019 Reliability Handbook.pdf


    26 The Reliability Handbook

    the logic tree. Investigations of failuremodes reveal that most failures of com-plex systems made up of mechanical,electrical and hydraulic components

     will fail in some sort of random fashion– they are not predictable with any de-gree of confidence.

    Many of these failures will howeverbe detectable before they have reacheda point where the functional failure canbe deemed to have taken place. For ex-ample, failure of a booster pump to re-

    fill a res ervoir that is in use to pro videoperating head to a municipal water sys-tem may not cause a loss of system func-t io n a l it y. It can be detected before welose municipal water pressure, however,if we are watching for it – that is theessence of condition monitoring. Welook for the failure that has already hap-pened but hasn’t pro g ressed to t hepoint of degrading system functionality.By finding these failures in this “early”failed state we can avoid the conse-quences on overall functional perf o r-

    mance. Since most failures are randomin nature RCM logic first asks if it is pos-sible to detect them in time to avoid lossof the function of the system. If the an-swer is yes then the need for a “condi-tion monitoring” task is the result.

    To avoid the functional failure event  we must monitor often enough that weare confident in detecting the deteriora-tion with sufficient time to act beforethe function is lost. For example, in thecase of our booster pump we may want 

    to check for its correct operation once aday if we know that it takes a day to re-pair it and two days for the reservoir todrain. That provides at least 24 hoursbetween detection of a pump failureand its restoration to service without loss of the re s e rvoir system’s function.Optimization of these decisions is dis-cussed further in Chapter 7.

    If the failure is not detectable in suffi-cient time to avoid functional failure thenthe logic asks if it is possible to repair theitem failure mode to reduce failure rate.

    Some failures are quite pre d i c t a bl eeven if they can’t be detected early enough. For example we can safely pre-dict brake wear, belt wear, tire wear, ero-sion, etc. These failures may be difficult to detect through condition monitoringin time to avoid functional failure, orthey may be so predictable that monitor-ing for the obvious is not warranted. Why shut down equipment to monitor for belt  wear monthly if you know with confi -dence that it is not likely to appear fortwo years? You could monitor every yearbut in some cases it may be more logicalto simply replace the belts without check-ing for their condition every two years. Yes there is a risk that failure occurs early and there is also a risk that the belts willbe fine and you are replacing belts that a re not worn. These decisions are dis-cussed further in Chapter 6.

    If it is not practical to replace com-ponents or to re s t o re “as new” condi-tion through some sort of usage or

    time based action then it may be possi-ble to replace the entire equipment.Usually this makes sense if the loss of function is very critical because thisimplies an expensive sparing policy tos u p p o rt this approach. Perhaps thecost of lost production in the down-time associated with part re p l a c e m e n t is too expensive and the cost of entirereplacement spare equipment is less. Again, this sort of decision is discussedf u rther in Chapter 6.

    In the case of hidden failure modesthat are common in safety or protecti vesystems it may not be possible to moni-tor for deterioration because the systemis normally inactive. If the failure modeis random it may not make sense to re-place the component on some timedbasis because you could be replacing it  with another like component that failsimmediately upon installation.

     You simply can’t tell. In these casesRCM logic asks us to explore functionalfailure finding tests. These are tests that 

     we can perform that may cause the de- vice to become active, demonstra tingthe presence or absence of correct func-tionalit y. If such a test is not possible you

    should re-design the component or sys-tem to eliminate the hidden failure.In the case of failures that are not 

    hidden and you can’t predict with suffi-cient time to avoid functional failureand you can’t prevent failure thro ug husage or time based replacements youcan either re-design or accept the fail-ure and its consequences. In the case of safety or environmental consequences you should re-design. In the case of pro-duction related consequences you may chose to redesign or run to failure de-

    Figure 15: RCM decision logic

  • 8/20/2019 Reliability Handbook.pdf


    pending on the economics associated with the consequences. If there are noproduction consequences but there aremaintenance costs to consider youmake a similar choice. In these cases thedecision is based on economics — that is, the cost of redesign vs. the cost of ac-cepting failure consequences (like lost production, repair costs, overtime, etc.).

    Task frequency is often difficult todetermine with confidence. Chapters 6and 7 discuss this problem in detail, but for the purpose of this chapter, it is suf-ficient to recognize that failure history isa prime determinant. You should recog-nize that failures won’t happen exactly  when you predict, so you have to allow some leeway. Recognize also that the in-formation you are using to base your de-cision upon may be faulty orincomplete. To simplify the next step, which entails grouping similar tasks, it makes sense to pre-determine a numberof acceptable frequencies such as daily,

     w e e kl y, every shift, quarterl y, annually,units produced, distances traveled ornumber of operating cycles, etc. Select those that are closest to the frequencies your maintenance and operating histo-ry tells you make the most sense.

     After having run the failure modesthrough the above logic, the practition-

    er must then consolidate the tasks intoa maintenance plan for the system. Thisis the final “product” of RCM. Whenthis has been produced the maintainerand operator must continually strive toimprove the product. Task frequenciesthat are originally selected may be over-ly conservative or too long.

    If you experience too many failuresthat you think you should be pre v e n t-ing then you are probably not perform-ing your proactive maintenanceinterventions frequently enough. If younever see any of what used to be com-mon failures that have little conse-quence or your preventive costs arehigher than your costs when you did nopreventive maintenance then it is possi-ble that you are maintaining the itemtoo fre qu e ntl y. This is where optimiza-tion techniques come in. (See Chapters6 and 7 for a complete discussion.)

    The RCM “product”

    The output of RCM is a maintenanceplan. That document contains consol-idated listings with descriptions of thecondition monitoring, time or usagebased intervention and failure findingtasks, the re-design decisions and theru n - t o - f a i l u re deci sions. Thi s docu-ment is not a “plan” in the true sense.

    It does not contain typical mainte-nance planning information like taskduration, tools and test equipment,p a rts and mat eria ls re q u i re m e n t s ,trades re q u i rements and a detailed se-quence of steps.

    In a complex system there may bethousands of tasks identified. To get a“feel” for the size of the output considera typical process plant that carries sparesfor only 50 percent or so of its compo-nents. Each of those may have severalfailure modes. That plant probably car-ries some 15,000 to 20,000 individualpart numbers (stock keeping units) in itsin ventory. That means that there may besome 40,000 parts with one or more fail-ure modes and task decisions.

    F o rt u n a t e l y, the re are a lim itednumber of condition monitoring tech-niques available to us and these willcover much of the output task list, be-cause most of failure modes are ran-dom. These tasks can be grouped by 

    technique (e.g. vibration analysis) andby location (like the machine ro o m )and by sub-location on a route. It may be possible to group tens and evenh u n d reds of individual failure modesthis way so as to reduce the number of output tasks for detailed maintenanceplanning. It is necessary to watch the

    The Reliability Handbook   27

  • 8/20/2019 Reliability Handbook.pdf


    f requenc ies at whi ch the gro u p e dtasks were specified.

    Time- or usage-based tasks are alsoeasy to group together. All the replace-ment or refurbishment tasks for a singlepiece of equipment may be grouped by task frequency into a single overh a u ltask. Similarly, multiple overhauls in asingle area of a plant may be grou pedinto a single shutdown plan.

     Another way to group the outputs is by 

     who does them. Tasks assigned to opera-tors are often done using the senses of touch, sight, smell or sound. These areoften grouped logically into daily or shift checklists or inspection rounds checklists.

    In the end you should have a com-plete listing that tells you what mainte-nance must do, and when. The plannerhas the job of determining the detailsof what is needed to execute the work.

    What can RCM achieve?RCM has been a round for ap pro x i-

    mately 30 years. It began with studies of airliner failures in the 1960s, to reducethe amount of maintenance work re-quired for what was then the new gener-ation of larger wide-bodied aircraft. Asaircraft grew larger and had more partsand therefore more things to go wrong,it was evident that maintenance re-q u i rements would similarly grow andeat into flying time which was needed togenerate revenue. In the extreme, safe-

    ty could have been very expensive toachieve and could have made flying un-economical. The success of the airlinei n d u s t r y in increasing flying hours,drastically improving its safety re c o rd

    and showing the rest of the world that an almost entirely proactive mainte-nance approach is possible, all attest tothe success of RCM.

    New aircraft that had their mainte-nance determined using RCM re-q u i red fewer maintenance man-hoursper flight hour. Since the 1960s, air-

    craft safety perf o rmance has been im-p roving dramatically.

    Outside the aircraft industry, RCMhas also been used successfully. Military projects often mandate the use of RCMbecause it allows the end users to expe-rience the sort of highly reliable equip-ment perf o rmance that the airlinesexperience. The author participated ina shipbuilding project where total main-tenance workload on the ship’s crew wasreduced by almost 50 percent from that experienced on a similarly sized class of ships. At the same time the ship’s avail-ability for service was improved from 60to 70 percent through the reduced re-q u i rement for downtime for mainte-nance intervention.

    The mining industry typically findsitself operating in remote locations that are far from sources of parts and mate-rials and replacement labour. Conse-quently miners want high reliability andavailability of equipment — minimum

    downtime and maximum pro d u c t i v i t y f rom the equipment. RCM has beenhelpful in improving availability forfleets of haul trucks and other equip-ment while reducing maintenance costsfor parts and labour and planned main-tenance downtime.

    RCM has also been successful in

    28 The Reliability Handbook

    The success of the airline industry was a highly-visible

    endorsement of the success of RCM, and showed the world the

    benefits of an almost entirely proactive maintenance approach.

  • 8/20/2019 Reliability Handbook.pdf


    The Reliability Handbook   29

    chemical plants, oil refineries, gasplants, remote compressor and pump-ing stations, mineral refining and smelt-ing, steel, aluminum, pulp and papermills, tissue converting operations,food and beverage processing andbreweries. Anywhere that high reliabili-ty and availability is important is a po-

    tential application site for RCM.

    What does it take to do RCM?RCM is not a household word (oracronym). It must be learned and prac-tised to attain proficiency and to gainthe benefits that can be achieved. Imple-mentation of RCM entails: Selecting a willing practitioner team; Training them in RCM; Teaching other “stakeholders” in theplant operation and maintenance what RCM is and what it can achieve forthem, Selecting a pilot project to impro v eupon the team’s proficiency whiledemonstrating success and A roll-out of the process to other areasof the plant.

    One key to success in RCM is thedemonstration of success. Before the anal-

     ysis begins, the RCM team should deter-mine the plant baseline measures forreliability and availability as well as proac-tive maintenance program coverage andcompliance. These measures will be usedlater in comparisons of what has beenchanged and the success it is achieving.

    The team must be multi-disciplinary,

    and able to draw upon specialist knowl-edge when it’s needed. It re q u i re sknowledge of the day-to-day operationsof the plant and equipment, along withdetailed knowledge of the equipment itself. This dictates at least one operatorand one maintainer. Knowledge of planning and scheduling and overallmaintenance operations and capabili-ties is also needed to ensure that thetasks are truly doable in the plant envi-ronment, and senior l evel operationsand maintenance representation is also

    needed. Finally, detailed equipment de-sign knowledge is important to theteam . This know ledge re q u i re m e n t generates the need for an engineer orsenior technician / technologist fro mmaintenance or production, usually  with a strong background in either themechanical or electrical discipline.

    The team now numbers f ive, and ex-perience shows that this is optimum.Too many people will slow pro g re s s ,and too few means that a lot of time isspent in seeking answers to the many questions that inevitably arise.

    Initi al l y, the team will need help toget started. Training can take from a week to a month depending on the ap-p roach used. It is usually followed up

     with the pilot project. The pilot is part of the training that is used to produce areal product.

    Training for the team should takeabout a week. Training of other stake-holders can take as little as a couple of hours to a day or two depending ontheir degree of interest and “need toknow”.

    The pilot project time can vary widely depending on the complexity of theequipment or system selected for analysis. A good guideline is to allow for a monthof pilot analysis work to ensure the teamknows RCM well and is comfortable inusing it. Each failure mode can take

    about half an hour of analysis time. Usingthe process plant example from before a

     very thorough analysis of all systems en-tailing at least 40,000 items (many withmore than one failure mode) would en-tail over 20,000 man-hours (that’s nearly 10 man-years for an entire plant). When you divide that by five team members youcan expect the analysis effort to take up totwo years for an entire plant of that size.

    Can you afford it?So, what can you expect to pay fortraining, software, consulting support,and your staff’s time? You can see fro mthe example that a large process plant  will re q u i re a lot of eff o rt to analyze.That eff o rt comes at a price. Ten man- years at an average of, say, $70,000 perperson tallies up to $700,000 for yours t a ff time alone. The training for theteam and others will req u ire a coupleof weeks from a third- pa rty expert. Theexp e rt should also be retained for the

    duration of the pilot project — andt hat ’s another month.

    Several software tools exist that step you through the RCM process and store your answers and results as you producethem. Some of the software can bebought for only a few thousand dollarsfor a single user license. Some of it comes with the training in RCM andsome of it comes as part of large com-puterized maintenance management systems. Prices for these high-end sys-tems that include RCM are typically inthe hundreds of thousands of dollars.

    To assist in determining task fre-quencies it will be necessary to have anunderstanding of your plant failure his-tories or to be able to interro g a t edatabases of failure rates. Your plant f a i l u re histor y should be available to

     you already through your maintenancemanagement system. You may req u irehelp in building queries and ru n n i n greports and that may require time from your programming staff. External relia -bility databases are available although

    Figure 16: Aircraft are more reliable now than several years ago, due in part to RCM.

    Figure 17: RCM can be a lot of work!

  • 8/20/2019 Reliability Handbook.pdf


    not always easily located. Access to themmay require a user or license fee.

    RCM has experienced a great deal of success and widespread acceptance insome industries where safety and highreliability have been drivers. It has alsofailed in many other attempts.

    Reasons for Failure of RCMThere are many reasons for failure, in-cluding. but not necessarily limited to: Lack of management support andleadership; Lack of “vision” of the end result of the RCM program; No clearly stated reason(s) for doingRCM (i.e. it becomes another “programof the month”); Lack of the right resources to man thee ff o rt especially in “lean manufactur-ing” environments; A clash of RCM’s proactive underpin-nings with a traditional and highly reac-tive plant culture;

    Giving up before it’s complete; Continued errors in the process andresults that don’t stand up to practical“sanity checks” by dirt-under-the-finger-nails maintainers. The wrong teamcomposition or lack of understandingcontribute to this one; Lack of available information on theequipment/systems under analysis. Inreality this need not be a significant hur-dle, but it often stops people cold; Disappointment occurs when theRCM-generated tasks appear to be thesame as those already in the PM pro-gram that has been in use for sometime. Criticism arises that it’s a big exer-cise which is merely proving what youare already doing; Lack of measurable success early inthe RCM program. This is usually be-cause a starting set of measures wasn’t taken, a goal did not exist and no ongo-ing measurements are taken; Results don’t happen quickly enough.The impacts of doing the right type of PM often don’t happen immediately and it takes time for results to show up— typically 12 to 18 months; T h e re is no compelling reason to

    maintain the momentum or even start the program; The program runs out of funding; or The organization lacks the ability toimplement the results of the RCM anal- ysis (e.g. no functional work order sys-tem that can trigger PM work orders ona pre-determined basis).

    There are a number of solutions tothis problem. One which often works isthe use of an outside facilitator or con-sultant. A knowledgeable facilitator canhelp get the client through the process

    The Reliability Handbook 31

  • 8/20/2019 Reliability Handbook.pdf


    and help to maintain momentum. Alsoa few shortcut methods have been de-

     veloped to help companies get beyondthese problems.

    “Flavours” of RCMIn one methodological “flavour” of RCM,logic is used to test the validity of an exist-ing PM program. A drawback is that thisapproach fails to recognize what you arealready missing in your PM program. Forexample, if your current PM pro g r ammakes extensive use of vibration analysisand thermographic analysis but nothingelse, it may very adequately address fail-u re modes that result in vibrations orheat. But it will miss other failure modesthat manifest themselves in cracks, re-duction in thickness, wear, lubricant property degradation, wear metal depo-sition, surface finish or dimensional dete-rioration, etc. Clearly this program doesnot cover all possibilities.

    In another f lavour of RCM,

    criticality is used to weed out failuremodes from ever being analyzed. Ty pi-cal l y, the failure modes being ignoredeither arise in parts of equipment that is deemed to be non-critical or the fail-u re modes and their effects aredeemed to be non-critical and theRCM logic is not applied. In these casesthe program is disregarding failures be-cause they do not exceed some hurdlerate. The savings arise because analysise ff o rt is reduced. When criticality is ap-plied to the failure modes themselvesthere is relatively little risk of causing acritical problem. The disadvantage of this approach is that you spend most of the effort and cost getting to a decisionto do nothing — remember that youa re at step five of seven here. There-fore, relatively little is saved.

     When a criticality hurdle rate is ap-plied to equipment, the decisions to re-duce analysis can be made before most of the analysis is done (at step one).Some but not necessarily all of theequipment failure modes will be knownintuitively to those performing the criti-cality analysis but they will not be docu-mented at this point. Little eff o rt is

    expended and considerable pro g r a mcosts can be saved. This method has ap-peal to companies that are limiting theirbudgets for these proactive efforts.

    This latter flavour of RCM may be en-tirely acceptable if the consequences of possible failure are known with confi-dence and are acceptable in production,maintenance, cost, environmental andhuman terms. For example, many fail-u res have relatively little consequenceother than the loss of production andmanuf actu ring process disr u p t i o n ,

    along with their associated costs.Purists may argue that doing anything

    less than full RCM is irresponsible be-cause without the full analysis you are po-tentially ignoring real and critical failures,even if inadvert e n t l y. This is of courset rue and it is this very concern that prompted the SAE to develop JA-1011.

    U n f o rt u n a t e l y, it is also true that many companies suffer from one ormany of the reasons for failure de-scribed, and possibly others. Wi t h o u t the force of law, RCM standards such asSAE JA-1011 and Nowland and Heapand others are mere guidelines that may or may not be followed, dependingon the decisions made at each compa-n y. These decisions are often made by those who, although senior, lack a com-prehensive RCM knowledge.

    Knowledgeable practitioners or en-thusiastic supporters of being proacti vemay recognize that their particular plant suffers from one or more of these poten-

    tial causes of failure. We should do what  we can to mitigate po tent ia l conse-quences and avert risk. As responsible en-gineers and maintainers we may findourselves in a position that demands westart slowly and build up to full RCM.

    Simply reviewing an existing PMp rogram using RCM logic will accom-plish very little. Reviewing only criticalequipment does more and does it  w h e re it counts the most. Rev iewingcritical equipment first and then mov-ing down the criticality scale does moreagain and eventually achieves the fullobjectives of RCM.

    If perf o rming RCM i s s imply toomuch to expect realistically in your com-pan y, then an alternative approach may be the best you can do and it may bemuch more achievable. This will also re-duce risk by at least some amount and isbetter than doing nothing at all.

    Capability driven RCMIf RCM logic progresses from equipment to failure modes and then through deci-sion logic to a result, can the oppositeprocess flow not address many of the fail-ures even though they may not be clearly 

    identified? Why not reverse the logic,start with the solutions (of which thereare a finite number) and look for goodplaces to apply those solutions?

    I t ’s worth looking at the followingquestions as well:  W h a t ’s wr ong with us ing exist ingcondition monitoring techniques andextending their use to other pieces of equipment? If you can do vibrationanalysis on some equipment why not do it elsewhere ? What’s wrong with looking specifically 

    32 The Reliability Handbook

  • 8/20/2019 Reliability Handbook.pdf


  • 8/20/2019 Reliability Handbook.pdf


    ments and overhauls. In plants having a great deal of redun-d a n c y, extensive “swinging” of operat-ing equipment from A to B and back,possibly combined with equalization of running hours. Extensive testing of safety systems. The systematic capture of informationabout failures that occur and analysis of that information and the failures them-selves to determine root-causes of thefailures so that they are eliminated.

     Wh il e thi s app roa ch does not achieve the results that full RCM anal- ysis wil l accomplish, it is founded onRCM principles and will move the or-ganization in the direction of beingm o re proactive. CD-RCM is intendedto build upon early success withp roven methods t argeted where they make sense so that credibil ity isgained and the likelihood of imple-menting full RCM is enhanced.

    How do you decide? You can see that RCM is a lot of workand can be expen sive to perf o rm .T here are alternatives that are less rig-orous. You may be faced with the chal-lenge of justifying the costs associated

     with a full RCM program and be unableto say what sort of savings will arise withany confidence. This is a tough situa-tion to be faced with and each situation will have its own peculiarities and twistsand personalities to deal with.

    RCM is the most thorough and com-plete approach you can take to deter-mine the right proactive maintenanceapproaches to use in achieving high sys-tem reliability. It is expensive and timeconsuming — the results, although im-p ressive, can take time to accomplish.Often this time is sufficient that RCMdoes not exceed the hurdle rates oftencalled for in the modern world of busi-ness investments.

    Simply reviewing your existing PMprogram with an RCM approach is not really an option for a responsible man-ager — it simply risks missing too muchthat may be critical and safety or envi-ronmentally significant.

    Streamlined (or “Lite”) RCM may beappropriate for industrial environments where criticality is recognized and usedto guide the analysis eff o rts using thelimited or time re s o u rces that can bemade available. This will achieve the de-sired RCM results on a smaller but welltargeted sub-set of the failure modes onthe critical equipment and systems.

     W h e re RCM inv estment is not anoption, the final alternative is to buildup to it using CD-RCM which adds abit of logic to the old approach of get-

    ting a new technology and applying it e v e ry w h e re. In CD-RCM we take stockof what we can do now, make sure wea re applying that as widely as possibleand demonstrate success by comply-ing with the new program. After suc-cess is demonstrated you can expandthe program using that success as “ev-idence” that it works and pro d u c e sthe desired results. Eventually, RCMcan be used to ensure that the pro-gram is complete.

    RCM decision checklist You must answer a number of questionsand evaluate several alternatives tod e t e rmine if RCM is right for you.These questions are posed througho u t this chapter and are summarized herefor quick reference. 1.Can your plant or operation sell ev-erything it can produce? If the answer is yes, then high reliabilit y is import a n t and RCM should be considered, and you can skip to question five. If the an-

    The Reliability Handbook 35

  • 8/20/2019 Reliability Handbook.pdf


    swer is no then you need to focus oncost cutting measures. 2.Do you experience unacceptablesafety or environmental perform ance?If yes, then RCM is probably for you —skip to question five. 3. Do you already have an extensivep reventive maintenance program inplace? If the answer is yes you may ben-efit from RCM if its costs are unaccept-ably high. If the answer is no you may 

    still benefit from RCM if your mainte-nance costs are high compared withothers in your business. 4. Are your maintenance costs highrelative to others in your business? If 

     yes then RCM is for you — proceed toquestion 5. If not, then you probably won’t benefit from RCM, and you can stoph e re .

     At this point you have one or severalof the following: a need for high reliability;

    safety or environmental problems; an expensive and low performing PMprogram, or; no significant PM program and highoverall maintenance costs. 5. RCM is for you. You need to en-su re that your organization is ready forit. Do you already experience a “con-t rolled ” maint enance envi ro n m e n t  w h e re most work is predicta ble andplanned and where you can confident-

    ly expect that planned work, like PMand PdM, will get done when sched-uled? If yes, you pass the very basic test of readiness — your maintenance envi-ronment is under control — and can

    p roceed to question 6. RCM won’t  work well if you can’t do it in a con-t rolled environment. If not, then youneed help beyond what RCM alonecan do for you. Get help in getting your maintenance activities under con-trol f irst, before going any furt h e r.

     You need RCM and your org a n i z a-

    tion is under control already — you areready for it. Now you need to get theOK to go ahead with it. If you can sim-ply approve it then go for it. Otherwise: 6. Can you get senior management support for the investment of time andcost in RCM training and piloting androllout? If yes, then you are ready forRCM and it’s ready for you — so stophere, your decision is made. If not then you need to consider the alternatives tofull RCM — proceed to question 7. 7. Can you get senior management support for the investment of time andcost in RCM “Lite” training and pilot-ing? This investment will require about one month of your team’s time (5 per-sons) plus a consultant for the month.If yes then you should consider theRCM “Lite” method to demonstratesuccess before attempting to roll RCMout across the entire organization. Youcan stop here — your decision is made.If not then you are faced with the chal-

    lenge of proving yourself to your seniormanagement and demonstrating suc-cess with a less thorough approach that requires little up front investment anduses existing capabilities. Your re m ain-ing alternative here is CD-RCM and agradual build up of success and cre di-bility to expand on it.  e

    36 The Reliability Handbook

    When full RCM investment is not an option, there are options,

    like “RCM Lite” or Capability-driven RCM (CD-RCM) that might

    do the trick for your operation in the short run.

  • 8/20/2019 Reliability Handbook.pdf


    The Reliability Handbook   39

    The problem

    of uncertaintyWhat to do when your reliabilityplans aren’t looking so reliableby Murray Wiseman

    How many of us have been toldsince childhood that “failure isthe mother of success” and that 

    “a fall in the pit is a gain in the wit”.N o w h e re is this folk wisdom more val-ued than in a maintenance depart-ment employing the tools of re l i a b i l i t y engineering. In such an enlightenede n v i ronment, failures — an imperson-al fact of life — can be leveraged andc o n v e rted to valuable knowledge fol-lowed up with productive action. Toachieve this lofty but attainable goal in

    our own opera tions we re q u i re asound quantitative approach to main-tenance uncert a i n t y. So, let’s beginour ascent on solid ground — the eas-ily conceptualized relative fre q u e n c y histogram of past failure s .

    The four basic functionsIn this section we discover the RelativeF requency Hist ogram and the fourbasic functions: 1) the Probability Den-sity Function; 2) the Cumulative Distri-bution Function; 3) the Reliability 

    Function; and 4) the Hazard Function. Ass um e tha t a po pu lati on of 48

    items purchased and placed in serviceat the beginning of the year all fail by November.

    List the failures in order of theirf a i l u re ages as in Figure 18. Gro u pthem in convenient time segments, inthis case by month. Plot the numberof failures in each time segment as inF i g u re 19 (on page 40). The high barsin the centre of Figure19, re p re s e n t the most “popular” ( or most pro b a-ble) failure times. By adding the num-ber of failures occurring prior to

     April , that is, 14, and dividing that by the total population of items, 48, we

    may est imate that the cumulativep robability of the item failing in thefirst quarter of the year is 14/48. Thep robability that all of the items will failb e f o re November is 48/48 or 1.

    By transforming the numbers of f a i l u res into probabilities in this way,the relative frequency histogram may be converted into a mathematically m o re useful form called the pro b a b i l i-ty density function (PDF). To do so,the data are replotted such that thea rea under the curve re p resents the

    When faced with uncertainty, our instinctive, human reaction is often indecision and distress. We would

    all prefer that the timing and outcome of our actions be known with certainty. Put another way, we

    would like all problems and their solutions to be deterministic . Problems where the timing and outcome

    of an action depend on chance are said to be probabilistic or  stochastic . In maintenance, however, we

    cannot reject the latter mode, since uncertainty is unavoidable. Rather, our goal is to quantify the

    uncertainties associated with significant maintenance decisions to reveal the course of action most likely

    to attain our objective. The methods described in this chapter will not only help you deal with

    uncertainty but may, we hope, persuade you to treat it as an ally, rather than a foe.

    chapter four

    Figure 18: Monthly failure ages for 48 in-service items

  • 8/20/2019 Reliability Handbook.pdf


  • 8/20/2019 Reliability Handbook.pdf


    from extensive life testing, or 2. Hypothesize it to be a certain pa-rameterized function whose parame-ters may be estimated via statisticalsampling techniques, and conductingn u m e rous statistical confidence tests. We will adopt the latter approa ch .

    F o rt u n a t e l y, we discover from past failure observations, that the probabilit y density functions, (hence their derivedreliabilit y, cumulative distribution, andhazard functions) of real data usually fit one of a number of mathematical for-mulas, whose characteristics are alread y familiar to reliability engineers. Theseknown distributions include the expo-nential, Weibull, Lognormal, and Nor-mal distributions. They can be fully described if one can estimate the theirparameters.

    For example the Weibull CDF is:

    F(t)= 1 – e - ( t_


    ,whe re  t >_0

    The parameters ß and η can be esti-mated from the data using the methodsto be described. Through one of thecommon distributions, once we willhave confidently estimated their param-eters, we may conveniently process ourfailure and replacement data. We do soby manipulating the statistical functions

     we learned in the previous section to: a)understand our problem; and b) fore-cast failures and analyze risk to makebetter maintenance decisions. Thosedecisions will impact upon the times we

    choose to replace, re p a i r, or overh a u lma c hi n e ry as well as help us optimizemany other maintenance decisions.

    The trick is threefold: 1) to collect good data; 2) to choose the appropriatefunction to re p resent our own situa-tion, then to estimate the function pa-rameters, and finally; 3) to evaluate the

    level of confidence we may have in theresulting model. Modern softwaremakes this process easy and fun. Muchmore than a toy for engineers, though,reliability software gives us the ability to

    communicate and share with our man-agement, the common goal of business— to devise and select procedures andpolicies which minimize cost and risk

     while maintaining and even increasingproduct quality and throughput.

    The four failure rate functions orhazard functions corresponding to the

    The Reliability Handbook   41

    Figure 20: Probability density, cumulative probability and reliability functions

  • 8/20/2019 Reliability Handbook.pdf


  • 8/20/2019 Reliability Handbook.pdf


    data re q u i red for reliability manage-ment slip through our fingers as werelentlessly pursue the ever elusivec o n t rol over our plant and pro d u c-tion equipments’ maintenance costs.Data management is the first step to- w a rds su ccessf ul physical ass et man-agement. Good data embodies richexperience from which one may l e a rn — and thereby improve upon —o n e ’s current maintenance manage-

    ment process. An essential role of upper leve l managers i s to placeample computer re s o u rces and scien-

    tific methodologies into the hands of tra ined maint enance pro f e s s i o n a l s who col lect, fi lte r, and process datafor the expressed purpose of guidingtheir decisions. The intention of thischapter is to inspire dedicated trades-men, planners, engineers, and man-agers to gather , earnest ly and

    c o n t i n u o u s l y, their own maintenanceand replacement data with re n e w e da p p reciation of its high value to theiro rg a n i z a t i o n ’s bottom line.

     Without doubt, the first step in any f o rw a rd looking activity is to get goodi n f o rmation. In importance, this stepoutweighs the subsequent analysissteps which may seem trivial by com-parison. The history of humanity isone of pro g ress built o n it s experi-

    ences. Yet there have been countlessmoments when opportunity wass q u a n d e red by neglect ing to collect 

    and process readily available data.Today many maintenance depart -ments , unfor t u n a t e l y, fall into that c a te g o ry. That is why one of thei mp o rtant measurem ents used by P r i c ew a t e rhouseC oopers’ Centre of Excellence in Maintenance Manage-ment to benchmark companies re l a-

    tive to world class industry best prac-tices, is the extent to which their datais fed back to guide their maintenancedecisions, tactics, and policies.

    Here are some examples of decisionsbased on reliability data management: A maintenance planner notes three in-service failures of a component during at h ree-month period. The superinten-dent asks, to help plan for adequateavailable labour, “How many failures will we have in the next quarter?” To order spare parts and schedulemaintenance labour, how many gear-boxes will be returned to the depot foro v e rhaul for each failure mode in thenext year? An effluent treatment system requiresa re gulat o ry shutdown overhaul when-ever the contaminant level exceeds atoxic limit for more than 60 seconds ina month. What level and frequency of maintenance is re qu i red to avoid pro-duction interruptions of this kind?

     After a design modification to elimi-nate a failure mode, how many unitsmust be tested and for how long to ver-ify that the old failure mode has beeneliminated, or significantly impro v e d with 90 percent confidence.  A haul truck fleet of transmissions isroutinely overhauled at 12,000 hours as

    44 The Reliability Handbook

    Ironically, we often let the minimal data required

    for reliability management to slip through our fingers,

    as we’re busy pursuing the ever-elusive control

    over the cost of maintenance in our plants and processes.

  • 8/20/2019 Reliability Handbook.pdf


  • 8/20/2019 Reliability Handbook.pdf


    The Reliability Handbook   49


    time basedmaintenanceTools for devising a replacement

    system for your critical componentsby Andrew K.S. Jardine

    We’ll also take a look at age andblock replacement strategieson the LRU level. This enables

    the software package RelCode to be in-troduced as a tool that can be used toassist maintenance managers optimizetheir LRU maintenance decisions. We will also take a look at the optimizingcriteria of cost minimization, availabili-ty maximization, and safety re q u i re-ments.

    Enhancing reliability throughpreventive replacementThe reliability of equipment can be en-hanced by preventively replacing criticalcomponents within the equipment at ap-propriate times. Just what the best time is

    depends on the overall objective, such ascost minimization or availability maxi-mization. While the best preventive re-placement time may be the same for bothcost minimization and availability maxi-mization, this is not necessarily the case.Data first needs to be obtained and ana-lyzed before it is possible to identify thebest preventive replacement time. In thischapter several optimizing pro ce du res will be presented that can be used withease to establish optimal preventive re-placement times for critical components.

    Block replacement policiesThe block replacement policy is some-times termed the group or constant i nt e rva l policy since preventive re-placement occurs at fixed intervals of time with failure replacements occur-ring whenever necessary. The policy isillustrated in Figure 22 where Cp andCf are the total costs associated withpreventive and failure replacement re-s p e c t i v e l y. tp is the fixed interval be-tween preventive replacements. In thef i g u re, you can see that for the first cycle there is no failure, while there aretwo in the second cycle and none in thet hird or fourth. As the interval betweenp reventi ve replacements is incre a s e dt h e re will be more failures occurr i n g

    between the preventive re p l a c e m e n t s

    and the optimization is to obtain thebest balance between the investment inp reventive replacements and the con-sequences of failure. This conflictingcase is illustrated in Figure 23 for a cost minimization criterion where C(tp) isthe total cost per week associated with apolicy of preventively replacing thecomponent at fixed intervals of lengthtp, with failure replacements occurring whenever necessary. The equat ion of the total cost curve is provided in sev-eral text books including the one by D u ffuaa, Raouff and Campbell [seeref. 1]. The following problem issolved using the software package Rel-Code, which incorporates the cost model, to establish the best pre ve n t i v e

    replacement interv a l .