
TELECOMMUNICATIONS SYSTEM RELIABILITY

ENGINEERING, THEORY, AND PRACTICE

IEEE Press
445 Hoes Lane
Piscataway, NJ 08854

IEEE Press Editorial Board
John B. Anderson, Editor in Chief

R. Abhari, G. W. Arnold, F. Canavero, D. Goldof, B-M. Haemmerli, D. Jacobson, M. Lanzerotti, O. P. Malik, S. Nahavandi, T. Samad, G. Zobrist

Kenneth Moore, Director of IEEE Book and Information Services (BIS)

Technical Reviewers
Gene Strid, Vice President and Chief Technology Officer at GCI


TELECOMMUNICATIONS SYSTEM RELIABILITY

ENGINEERING, THEORY, AND PRACTICE

Mark L. Ayers

Cover Image: Bill Donnelley/WT Design

Copyright © 2012 by the Institute of Electrical and Electronics Engineers, Inc.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. All rights reserved.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993, or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data

Ayers, Mark L.
Telecommunications system reliability engineering, theory, and practice / Mark L. Ayers.

p. cm.
ISBN 978-1-118-13051-3 (hardback)
1. Telecommunication systems. I. Title.
TK5101.A89 2012
621.382–dc23
2012013009

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

CONTENTS

List of Illustrations
Preface
About the Author
Acronym List

INTRODUCTION

1 RELIABILITY THEORY
1.1 System Metrics
1.2 Statistical Distributions
1.3 System Modeling Techniques
1.4 Systems with Repair
1.5 Markov Chain Models
1.6 Practical Markov System Models
1.7 Monte Carlo Simulation Models
1.8 Repair Period Models
1.9 Equipment Sparing

2 FIBER-OPTIC NETWORKS
2.1 Terrestrial Fiber-Optic Networks
2.2 Submarine Fiber-Optic Networks

3 MICROWAVE NETWORKS
3.1 Long-Haul Microwave Networks
3.2 Short-Haul Microwave Networks
3.3 Local Area Microwave Networks

4 SATELLITE NETWORKS
4.1 Propagation
4.2 Earth Stations
4.3 VSAT Earth Stations
4.4 Earth Stations
4.5 Spacecraft
4.6 Satellite Network Topologies

5 MOBILE WIRELESS NETWORKS
5.1 Mobile Wireless Equipment
5.2 Mobile Wireless Network Systems

6 TELECOMMUNICATIONS FACILITIES
6.1 Power Systems
6.2 Heating, Ventilation, and Air Conditioning Systems

7 SOFTWARE AND FIRMWARE
7.1 Software Failure Mechanisms
7.2 Software Failure Rate Modeling
7.3 Reliability and Availability of Systems with Software Components

References
Index

LIST OF ILLUSTRATIONS

Chapter 1 Reliability Theory

Figure 1.1 Gaussian CDF and associated reliability function R(t)
Figure 1.2 Average availability for system 1 (short duration, frequent outages) and system 2 (long duration, infrequent outages)
Figure 1.3 Bathtub curve for electronic systems
Figure 1.4 Exponential distribution PDF for varying values of λ
Figure 1.5 Exponential distribution CDF for varying values of λ
Figure 1.6 Normal distribution PDF of TTR where μ = 8 h and σ = 2 h
Figure 1.7 Normal distribution CDF of TTR where μ = 8 h and σ = 2 h
Figure 1.8 Weibull distributed random variable for submarine fiber-optic cable TTR
Figure 1.9 Series and parallel reliability block diagrams
Figure 1.10 Series structure reliability block diagram
Figure 1.11 Single-thread satellite link RF chain
Figure 1.12 Parallel structure reliability block diagram
Figure 1.13 Parallel satellite RF chain system
Figure 1.14 One-for-two (1:2) redundant HPA system block diagram
Figure 1.15 Redundant Markov chain state diagram
Figure 1.16 Redundant Markov chain state diagram, identical components
Figure 1.17 Single-component Markov state transition diagram
Figure 1.18 Hot-standby redundant Markov state transition diagram
Figure 1.19 Cold-standby Markov state transition diagram
Figure 1.20 Monte Carlo system analysis algorithm
Figure 1.21 Component model
Figure 1.22 State vector algorithm flow chart
Figure 1.23 Sample state vector algorithm output
Figure 1.24 Serial component state assessment flow diagram
Figure 1.25 Parallel component state assessment flow diagram
Figure 1.26 Exponentially distributed TTR with MTTR = 8 h
Figure 1.27 Normal distributed TTR with MTTR = 8 h, variance = 2 h
Figure 1.28 Centralized warehousing and dispatch sparing approach
Figure 1.29 Territorial warehousing and dispatch sparing approach
Figure 1.30 On-site sparing approach

Chapter 2 Fiber-Optic Networks

Figure 2.1 Shallow-buried fiber-optic cable installation example in western Alaska
Figure 2.2 Terrestrial fiber-optic cable TTF model PDF and CDF
Figure 2.3 Terrestrial fiber-optic cable TTR model PDF and CDF
Figure 2.4 Monte Carlo simulation results for terrestrial fiber-optic cable
Figure 2.5 Terrestrial fiber-optic terminal functional block diagram
Figure 2.6 Unprotected fiber-optic network system block diagram
Figure 2.7 Unprotected fiber-optic network reliability block diagram
Figure 2.8 UPSR ring network topology, normal operation
Figure 2.9 UPSR ring network topology, fiber path failure
Figure 2.10 UPSR ring network topology, transceiver failure
Figure 2.11 Example SONET network topology for Monte Carlo analysis
Figure 2.12 UPSR system model rule set flow chart
Figure 2.13 UPSR system model simulation results
Figure 2.14 Submarine fiber-optic network block diagram
Figure 2.15 Submarine line terminal equipment functional block diagram
Figure 2.16 Power feed equipment operation, nominal and failure
Figure 2.17 Normal distributed submarine cable TTR model
Figure 2.18 Sample submarine system with 10 periodic repeaters
Figure 2.19 Submarine repeater RBD

Chapter 3 Microwave Networks

Figure 3.1 Long-haul microwave network tower in western Alaska
Figure 3.2 Multipath signal propagation
Figure 3.3 Multipath outage event model using uniform occurrence distribution
Figure 3.4 Multihop microwave radio link in a low-intensity rain region
Figure 3.5 Long-haul microwave radio block diagram
Figure 3.6 Microwave tower damaged by ice formation
Figure 3.7 Ice bridge infrastructure damaged by ice formation
Figure 3.8 Long-haul microwave antenna mount damaged by ice formation
Figure 3.9 Sample microwave radio block diagram
Figure 3.10 Two-hop radio transceiver system (one-for-two redundancy)
Figure 3.11 Single-thread transceiver system RBD
Figure 3.12 One-for-one redundant transceiver system RBD
Figure 3.13 One-for-two redundant transceiver system RBD
Figure 3.14 Two-hop radio link serial transceiver RBD
Figure 3.15 Microwave TRX path reliability comparison
Figure 3.16 Long-haul microwave network multiplexed baseband OC-3 interface
Figure 3.17 Single-hop long-haul microwave network block diagram
Figure 3.18 Single-hop long-haul microwave radio system model rule set
Figure 3.19 Single-hop long-haul microwave radio system availability
Figure 3.20 Single-hop long-haul microwave radio downtime distribution
Figure 3.21 Three-hop long-haul microwave availability analysis
Figure 3.22 Short-haul microwave fiber-optic ring network restoral path
Figure 3.23 Short-haul microwave cellular network backhaul application
Figure 3.24 Short-haul microwave urban structure application
Figure 3.25 Short-haul cellular backhaul microwave radio
Figure 3.26 Unlicensed short-haul commercial service microwave radio
Figure 3.27 Short-haul microwave availability for redundant and single-thread designs at varying MTTR values
Figure 3.28 Point-to-point versus local area network topology failure modes
Figure 3.29 Generic local area microwave network elements
Figure 3.30 Local area wireless network heat map coverage region
Figure 3.31 Wi-Fi access point functional block diagram
Figure 3.32 Radio design types, integrated versus split (ODU/IDU)
Figure 3.33 Sample Wi-Fi local area wireless network diagram

Chapter 4 Satellite Networks

Figure 4.1 Satellite earth station multipath condition sketch
Figure 4.2 Generalized satellite earth station equipment complement
Figure 4.3 Remote VSAT signal chain block diagram
Figure 4.4 VSAT station reliability block diagram
Figure 4.5 C-band satellite earth station constructed in Nome, Alaska
Figure 4.6 Typical earth station RF chain block diagram
Figure 4.7 Nonredundant earth station reliability block diagram
Figure 4.8 Fully redundant earth station system block diagram
Figure 4.9 One-for-two redundant Markov failure state transition diagram
Figure 4.10 Modular satellite power amplifier system block diagram
Figure 4.11 Modular SSPA MTTR distribution model
Figure 4.12 Modular SSPA system availability for three-out-of-four configuration
Figure 4.13 Modular SSPA system availability for seven-out-of-eight configuration
Figure 4.14 In-orbit spare satellite diagram
Figure 4.15 Satellite capacity restoral by in-orbit spare move
Figure 4.16 Satellite capacity restoral by ground station repointing
Figure 4.17 Hub/remote satellite network topology
Figure 4.18 Ku-band hub/remote VSAT network block diagram
Figure 4.19 Ku-band VSAT hub station block diagram
Figure 4.20 Bidirectional point-to-point satellite network block diagram

Chapter 5 Mobile Wireless Networks

Figure 5.1 GSM network block diagram
Figure 5.2 Distributed MSC network block diagram
Figure 5.3 Distributed MSC failure scenario and service continuity
Figure 5.4 Base station subsystem block diagram
Figure 5.5 Mobile wireless base station TRX configuration
Figure 5.6 Markov chain state transition diagram for BTS TRX modules
Figure 5.7 Base station overlap and probability of coverage by multiple stations
Figure 5.8 Network switching subsystem packet switching redundancy
Figure 5.9 Example GSM cellular wireless network

Chapter 6 Telecommunications Facilities

Figure 6.1 Primary power system redundancy configurations
Figure 6.2 Weibull distribution fit to transformer TTF and downtime empirical data
Figure 6.3 Single-thread generator system block diagram
Figure 6.4 Single-thread generator TTF and TTR for a village environment
Figure 6.5 Single-thread generator system availability
Figure 6.6 Cold-standby redundant generator system block diagram
Figure 6.7 Cold-standby redundant generator system availability
Figure 6.8 Load-sharing generator system block diagram
Figure 6.9 Load-sharing generator system relaxed TTR model
Figure 6.10 Load-sharing generator system availability
Figure 6.11 Modular rectifier system block diagram
Figure 6.12 1N and soft-fail rectifier design descriptions
Figure 6.13 Soft-fail rectifier system availability distribution
Figure 6.14 48 VDC battery plant block diagram
Figure 6.15 Normal distributed TTR with μ = 12 h and σ = 3 h
Figure 6.16 Availability performance versus battery capacity for single-thread and cold-standby generator systems
Figure 6.17 Fiberglass communications shelter dimensions
Figure 6.18 Room air temperature increase rate for two AC scenarios

Chapter 7 Software and Firmware

Figure 7.1 Sample hardware and software failure rate versus time curve comparison
Figure 7.2 Software reliability improvement failure rate function
Figure 7.3 Software feature addition and upgrade failure rate function
Figure 7.4 Aggregate software failure rate trajectory for reliability improvement and feature addition
Figure 7.5 Component block diagram consisting of hardware and software
Figure 7.6 Discrete hardware and software component reliability functions
Figure 7.7 Total component reliability function for hardware and software
Figure 7.8 Sample software TTR distribution
Figure 7.9 Software and hardware component availability distributions
Figure 7.10 Combined component availability including software and hardware components

PREFACE

The topic of reliability is somewhat obscure within the field of electrical (and ultimately communications) engineering. Most engineers are familiar with the concept of reliability as it relates to their automobile, electronic device, or home, but performing a rigorous mathematical analysis is not always a comfortable or familiar task. The quantitative treatment of reliability has a long-standing tradition within the field of telecommunications, dating back to the early days of Bell Laboratories.

Modern society has developed an insatiable dependence on communication technology that demands a complete understanding and analysis of system reliability. Although the technical innovations developed in modern communications are astonishing engineering marvels, the reliability analysis of these systems is sometimes treated as a cursory afterthought. Even in cases where system reliability and availability performance is treated with the highest concern, the sophistication of analysis techniques frequently lags behind the technical development itself.

The content in this book is a compilation of years of research and analysis of many different telecommunications systems. During the compilation of this research, two primary points became evident to me. First, most communications engineers understand the need for reliability and availability analysis but lack the technical skill and knowledge to execute these analyses confidently. Second, modern communications network topologies demand an approach to analysis that goes beyond the traditional reliability block diagram and exponential distribution assumptions. Modern computing platforms enable engineers to exploit analysis techniques that were not possible when the Bell Laboratories' techniques were developed and presented. This book presents techniques that use computer simulation and random variable models that were not feasible 20 years ago. I hope that readers of this book find within it a useful resource that I found absent in the academic literature during my research and analysis of communications system reliability. Although compiling the data in this book took me years, it is my desire to convey this information to the reader in a matter of hours, enabling engineers to analyze complex problems using basic tools and theories.

I would like to thank Tom Plevyak and Veli Sahin for their editing and review of this book. Their help in producing this book has been instrumental to its completion and quality.


I would also like to thank Gene Strid for his contributions to my career and to the development of this book. His mentoring spirit and attention to detail have had a significant influence on my personal development as a professional engineer. Gene's technical review of this book alone is impressive in its detail and breadth. Thank you, Gene, for everything you have done to help me remain inspired to grow and learn as an engineer and a leader.


ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporation, headquartered in Anchorage, Alaska. Mark has a broad range of telecommunications experience, including work in fiber optics, microwave radio, and satellite network designs. Mark holds a BS degree in Mathematics from the University of Alaska Anchorage and an MS degree in Electrical Engineering from the University of Alaska Fairbanks, Fairbanks, Alaska. He is a registered Professional Electrical Engineer in the State of Alaska and a Senior Member of the IEEE. Mark teaches a variety of courses as an Adjunct Faculty Member in the Engineering Department at the University of Alaska Anchorage. His primary interests are systems design, modeling, and optimization.


ACRONYM LIST

AC Alternating current
ACM Adaptive coding and modulation
AGM Absorbed glass mat
AP Access point
AuC Authentication center
BLSR Bidirectional line switched ring
BSC Base station controller
BTS Base transceiver station
BTU British thermal unit
BUC Block upconverter
CDF Cumulative distribution function
CDMA Code division multiple access
COTS Commercial off-the-shelf
CPE Customer premises equipment
CRAC Computer room air conditioner
DC Direct current
EDFA Erbium-doped fiber amplifier
EIR Equipment identity register
EIRP Equivalent isotropic radiated power
FCC Federal Communications Commission
FITs Failures in time
FMEA Failure mode and effects analysis
FPGA Field-programmable gate array
FSS Fixed satellite service
GSM Global system for mobile communications
HLR Home location register
HVAC Heating, ventilation, and air conditioning
IDU Indoor unit
IEEE Institute of Electrical and Electronics Engineers
ISM Industrial, scientific, and medical
ITU International Telecommunication Union
LHS Left-hand side
LNA Low-noise amplifier
LNB Low-noise block
LTE Line-terminating equipment
LTE Long-term evolution
MDT Mean downtime
MODEM Modulator/demodulator
MSC Mobile switching center
MTBF Mean time between failures
MTTF Mean time to failure
MTTR Mean time to repair
NASA National Aeronautics and Space Administration
NSS Network switching subsystem
OC-n Optical carrier level n
ODU Outdoor unit
PDF Probability density function
PFE Power feed equipment
PM Preventative maintenance
RBD Reliability block diagram
RF Radio frequency
RHS Right-hand side
RMA Return material authorization
RSL Received signal level
SDH Synchronous digital hierarchy
SES Severely errored second
SLA Service-level agreement
SLTE Submarine line-terminating equipment
SMS Short message service
SONET Synchronous optical network
SP Service provider
SRGM Software reliability growth model
SSPA Solid-state power amplifier
TDM Time division multiplexing
TRX Transceiver
TTF Time to failure
TTR Time to repair
UMTS Universal mobile telecommunications system
UPS Uninterruptible power supply
UPSR Unidirectional path switched ring
VLR Visitor location register
VLSI Very large-scale integration
VRLA Valve-regulated lead acid
VSAT Very-small-aperture terminal
WiFi Wireless fidelity
XPIC Cross-polarization interference cancellation

INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and impacts our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically equates emotion and perception with reliability. In many cases, even technically minded people do not have a clear quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production resulted in an increased awareness of item failure and performance and its impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further identified the need to quantify reliability and its impacts on mission success. During the Second World War, vacuum tubes were extensively used in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering, Theory, and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s continued the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. The launch of commercial communications satellites by INTELSAT and the landing on the moon by the United States proved the importance of reliability engineering as part of the system engineering process at the end of the 1960s. Semiconductor development, military applications, communications systems, biomedical research, and software-based systems in the 1980s led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today's society. Most consumers are unconsciously aware of reliability as a measure of an item's performance and overall value. Engineers and technical resources are aware of an item's reliability in a more quantitative sense, but many times this understanding is neither complete nor grounded in solid reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be based in solid theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures are used to guide the design process and ensure that the system design is compliant with system performance targets. Modern predictive reliability analysis uses statistical modeling of component failures. These statistical models are used to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows engineers to perform cost/performance trade-off analyses. The analyses can then be used to make business and technical decisions about the best design that meets target requirements.
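As a rough illustration of the trade-off loop described above, the following sketch compares two candidate designs by steady-state availability. It assumes, purely for illustration, that each unit is described by a hypothetical MTBF and MTTR, that units fail and are repaired independently, and that the redundant design stays up as long as one of its two radios works. The helpers `unit_availability`, `series`, and `parallel` are hypothetical names, not an established API.

```python
import math

HOURS_PER_YEAR = 8760.0  # 365-day year (assumption)


def unit_availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one repairable unit: MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


def series(*avails: float) -> float:
    """Every block must be up, so availabilities multiply (independence assumed)."""
    return math.prod(avails)


def parallel(*avails: float) -> float:
    """At least one block must be up: 1 minus the product of unavailabilities."""
    return 1.0 - math.prod(1.0 - a for a in avails)


# Hypothetical inputs chosen only to illustrate the comparison.
radio = unit_availability(mtbf_h=50_000.0, mttr_h=8.0)
mux = unit_availability(mtbf_h=100_000.0, mttr_h=8.0)

designs = {
    "single-thread": series(radio, mux),
    "redundant radio": series(parallel(radio, radio), mux),
}

for name, a in designs.items():
    downtime_min = (1.0 - a) * HOURS_PER_YEAR * 60.0
    print(f"{name:16s} A = {a:.6f}, expected downtime = {downtime_min:.1f} min/yr")
```

Rerunning the comparison for another topology means changing the MTBF/MTTR inputs or the arrangement of `series` and `parallel` calls. The independence assumption is the main simplification here; shared failure causes and repair-crew limits would reduce the benefit the redundant design appears to gain.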

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or the operator to measure system performance and compare that performance with expected or predicted data. Empirical data collection is particularly important in large production environments where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be used directly to analyze the predicted performance of a new system. One must be careful when using empirical data for predictive analysis because it is rare to find an existing system that exactly matches a new design.
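A minimal sketch of the comparison between empirical and predicted performance, assuming the operator keeps an outage log of start and restoration times. It computes the observed availability and mean outage duration over a reporting period and sets them against a predicted availability. The log entries, the one-year period, and the predicted value of 0.9995 are all hypothetical, and the code assumes outages do not overlap and fall entirely inside the period.

```python
from datetime import datetime

# Hypothetical outage log for one link: (outage start, service restored).
outages = [
    (datetime(2011, 2, 3, 14, 0), datetime(2011, 2, 3, 17, 30)),
    (datetime(2011, 7, 19, 2, 15), datetime(2011, 7, 19, 9, 45)),
    (datetime(2011, 11, 8, 22, 0), datetime(2011, 11, 9, 1, 0)),
]
period_start = datetime(2011, 1, 1)
period_end = datetime(2012, 1, 1)
predicted_availability = 0.9995  # hypothetical output of a predictive model


def observed_availability(log, start, end):
    """1 - (total outage time / length of the reporting period).

    Assumes outages do not overlap and lie entirely within [start, end).
    """
    period_s = (end - start).total_seconds()
    down_s = sum((restored - failed).total_seconds() for failed, restored in log)
    return 1.0 - down_s / period_s


def mean_outage_hours(log):
    """Average outage duration in hours, an empirical estimate of mean downtime."""
    return sum((r - f).total_seconds() for f, r in log) / 3600.0 / len(log)


a_obs = observed_availability(outages, period_start, period_end)
print(f"observed A = {a_obs:.5f}, predicted A = {predicted_availability:.5f}")
print(f"mean outage duration = {mean_outage_hours(outages):.2f} h")
if a_obs < predicted_availability:
    print("observed performance falls short of the prediction; revisit the model inputs")
```

With only a handful of outages the observed figure is a noisy estimate, which is one reason the text stresses large production environments where statistical behavior can actually be observed.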


One of the most significant benefits of empirical analysis and data collection is failure mode and effects analysis (FMEA). This analysis approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and is indicative of a system's "uptime," or available time for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) will sign a contract to provide a service that has a specific target probability of being available or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, the SP and the customer take on risk in signing the contract. This risk is either realized risk (the party is aware of the risk, quantified or not) or unrealized risk (the party is taking risk and is not aware that it is in jeopardy). Decisions made while assuming unrealized risk can jeopardize business. Reliability engineering of telecommunications systems serves to reduce overall risk in both realized and unrealized cases.
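The two SLA forms mentioned here, a target probability of being available and a maximum downtime over an interval, are interchangeable once the interval is fixed. The sketch below assumes a 365-day (8,760-hour) year and treats availability as the long-run fraction of time the service is up; `allowed_downtime_min` and `required_availability` are illustrative names only.

```python
HOURS_PER_YEAR = 8760.0  # 365-day year (assumption; leap years ignored)
MINUTES_PER_YEAR = HOURS_PER_YEAR * 60.0


def allowed_downtime_min(availability: float, period_min: float = MINUTES_PER_YEAR) -> float:
    """Maximum outage time (minutes) consistent with an availability target."""
    return (1.0 - availability) * period_min


def required_availability(max_downtime_min: float, period_min: float = MINUTES_PER_YEAR) -> float:
    """Availability implied by a maximum-downtime commitment over the same period."""
    return 1.0 - max_downtime_min / period_min


for target in (0.999, 0.9999, 0.99999):
    print(f"A = {target}: at most {allowed_downtime_min(target):.1f} min of downtime per year")
```

For example, a 99.99% annual target leaves roughly 52.6 minutes of outage per year. Passing a shorter `period_min`, such as a 30-day month (43,200 minutes), gives the allowance for an SLA measured monthly, which is considerably tighter in absolute terms.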

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allows managers and technical resources to design systems that achieve the required targets with minimum cost and maximum performance.

Analysis of telecommunications systems requires specialized application of reliability engineering theory and principles. Performance expectations within the field of telecommunications can range from high to extreme. Rarely do consumers of telecommunications expect less than highly available systems. This is true even of consumer services such as cable television, consumer Internet, and local telephone service. Commercial service expectations are typically higher than those in a consumer environment because the impact on a business may be significant and costly if its telecommunications services are critical to its operations, delivery of service, and ability to generate revenue. Performing detailed analyses of systems, both consumer and commercial, allows risks to be managed and costs to be controlled. These analyses allow the designer to produce a system that is carefully crafted to just meet the requirements of the customer rather than greatly exceed them or completely miss the target. In the case of operational systems, knowledge of the achievable system performance and its maintainability allows the operator to understand whether the achieved performance is within specification and to optimize maintenance and repair efforts.

This book is written with the goal of providing the reader with the knowledge and skills necessary to perform telecommunications system reliability analysis and to examine system designs with a critical eye. Telecommunications service providers frequently provide service to customers who know what they would like to purchase, whether it is wireless or terrestrial, packet or TDM. It is far less frequent that the customer understands how to specify system availability or reliability. Knowledge of the theory and practice of reliability engineering allows service providers and engineers to educate their customers regarding this important metric of network performance. Even if the reader does not perform firsthand reliability analysis, the knowledge gained by studying both the theory and the practice of reliability engineering allows the individual to make better-informed decisions about the design and operation of telecommunications systems or the purchase of telecommunications services. The truly pervasive nature of reliability as a metric in telecommunications systems requires engineers, managers, and executives to have extensive knowledge of system topologies, costs, and performance. In many cases, these system details are obtained through experience and practice. The author of this book would argue that experience without academic study, particularly in the field of reliability engineering, results in decisions that at times invoke unrealized, serious business risk.

The reader is expected to have a basic working knowledge of engineering mathematics. A college-level course in probability and statistics is of particular value to the reader. This book relies extensively on the application and use of statistical distributions and probability models. Experience with telecommunications system design and network topologies is valuable in understanding the trade-offs involved with different reliability analyses. Lastly, if the reader has an interest in developing his or her own reliability models, knowledge of MATLAB and computer programming methods is of value. All of the topics presented in this book are intended to provide sufficient depth to enable the reader either to work with them directly or to conduct minimal further research in order to obtain a complete understanding of a topic.

The previous paragraph should allow readers to identify themselves as members of a specific group. These groups can generally be classified as one of the following: engineers, managers, or executives. Engineers can use this book as a complete technical resource for building and analyzing system models. The engineer who uses this book will have the ability to develop complex, detailed statistical models of telecommunications systems that produce a variety of system metrics for business, design, and other technical decisions. Managers reading this book will derive value from the knowledge obtained about proper reliable system design, contract implications, and operational impacts. Executive readers will find value in the high-level knowledge obtained about design best practices and proper expectations for system performance.

This book is logically organized to provide two distinct sets of information: theory and applications. Chapter 1 introduces and develops the concepts and accepted theories required for system reliability analysis. This includes discussions of probability and statistics, system reliability theory, and system modeling. The remaining chapters of this book are organized by technology subject matter. Chapter 2 discusses fiber-optic networks. Both terrestrial and submarine networks are discussed, with the subtleties of each presented in detail. Chapter 3 presents reliability analysis approaches for terrestrial microwave systems. The discussion includes short-haul point-to-point, long-haul point-to-point, cellular wireless, and WiFi networks. Satellite communications networks are discussed in Chapter 4. Both teleport and VSAT network topologies are discussed, along with propagation availability calculation techniques. Chapter 5 addresses reliability concerns for mobile wireless (cellular) systems. In Chapter 6, the often underanalyzed topics of power systems and heating, ventilation, and air conditioning systems related to communications networks are analyzed. The final chapter (Chapter 7) introduces software and firmware as they relate to telecommunications system reliability. Each section presents the analysis in terms of two discrete parts: the communications equipment and the communications channel. The goal of this book is to provide the reader with sufficient knowledge to abstract and apply the concepts presented to his or her own problem statement.

The ability to blend academic theory and practical application is a rare commodityin the field of engineering Few practicing engineers have the ability to apply abstracttheory to real problems while even fewer academics have the practical experience tounderstand the engineering of ldquorealrdquo systems Telecommunications reliability engineer-ing necessitates the blend of abstract statistical theory and practical engineeringexperience Fortunately in the case of reliability engineering this blend is easilyunderstood when the information required is presented in a logical organized formatThe use of predictive andor numerical models in the design of telecommunicationssystems brings great value to system designs Acceptance of these models requires theengineer manager and executive to have enough confidence in the modelrsquos results sothat significant decisions can be made based on the results of that model The ability toplace that level of confidence in a model can only come from a fusion of reliabilityengineering academics and experience

INTRODUCTION 5

Page 2: Cover - download.e-bookshelf.de · Figure 2.8 UPSR ring network topology, normal operation Figure 2.9 UPSR ring network topology, fiber path failure Figure 2.10 UPSR ring network


Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993, or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data

Ayers, Mark L.
Telecommunications system reliability engineering, theory, and practice / Mark L. Ayers.
p. cm.
ISBN 978-1-118-13051-3 (hardback)
1. Telecommunication systems. I. Title.
TK5101.A89 2012
621.382–dc23
2012013009

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

CONTENTS

List of Illustrations
Preface
About the Author
Acronym List
INTRODUCTION

1 RELIABILITY THEORY
1.1 System Metrics
1.2 Statistical Distributions
1.3 System Modeling Techniques
1.4 Systems with Repair
1.5 Markov Chain Models
1.6 Practical Markov System Models
1.7 Monte Carlo Simulation Models
1.8 Repair Period Models
1.9 Equipment Sparing

2 FIBER-OPTIC NETWORKS
2.1 Terrestrial Fiber-Optic Networks
2.2 Submarine Fiber-Optic Networks

3 MICROWAVE NETWORKS
3.1 Long-Haul Microwave Networks
3.2 Short-Haul Microwave Networks
3.3 Local Area Microwave Networks

4 SATELLITE NETWORKS
4.1 Propagation
4.2 Earth Stations
4.3 VSAT Earth Stations
4.4 Earth Stations
4.5 Spacecraft
4.6 Satellite Network Topologies

5 MOBILE WIRELESS NETWORKS
5.1 Mobile Wireless Equipment
5.2 Mobile Wireless Network Systems

6 TELECOMMUNICATIONS FACILITIES
6.1 Power Systems
6.2 Heating, Ventilation, and Air Conditioning Systems

7 SOFTWARE AND FIRMWARE
7.1 Software Failure Mechanisms
7.2 Software Failure Rate Modeling
7.3 Reliability and Availability of Systems with Software Components

References
Index

LIST OF ILLUSTRATIONS

Chapter 1 Reliability Theory

Figure 1.1 Gaussian CDF and associated reliability function R(t)
Figure 1.2 Average availability for system 1 (short duration, frequent outages) and system 2 (long duration, infrequent outages)
Figure 1.3 Bathtub curve for electronic systems
Figure 1.4 Exponential distribution PDF for varying values of λ
Figure 1.5 Exponential distribution CDF for varying values of λ
Figure 1.6 Normal distribution PDF of TTR where μ = 8 h and σ = 2 h
Figure 1.7 Normal distribution CDF of TTR where μ = 8 h and σ = 2 h
Figure 1.8 Weibull distributed random variable for submarine fiber-optic cable TTR
Figure 1.9 Series and parallel reliability block diagrams
Figure 1.10 Series structure reliability block diagram
Figure 1.11 Single-thread satellite link RF chain
Figure 1.12 Parallel structure reliability block diagram
Figure 1.13 Parallel satellite RF chain system
Figure 1.14 One-for-two (1:2) redundant HPA system block diagram
Figure 1.15 Redundant Markov chain state diagram
Figure 1.16 Redundant Markov chain state diagram, identical components
Figure 1.17 Single-component Markov state transition diagram
Figure 1.18 Hot-standby redundant Markov state transition diagram
Figure 1.19 Cold-standby Markov state transition diagram
Figure 1.20 Monte Carlo system analysis algorithm
Figure 1.21 Component model
Figure 1.22 State vector algorithm flow chart
Figure 1.23 Sample state vector algorithm output
Figure 1.24 Serial component state assessment flow diagram
Figure 1.25 Parallel component state assessment flow diagram
Figure 1.26 Exponentially distributed TTR with MTTR = 8 h
Figure 1.27 Normal distributed TTR with MTTR = 8 h, variance = 2 h
Figure 1.28 Centralized warehousing and dispatch sparing approach
Figure 1.29 Territorial warehousing and dispatch sparing approach
Figure 1.30 On-site sparing approach

Chapter 2 Fiber-Optic Networks

Figure 2.1 Shallow-buried fiber-optic cable installation example in western Alaska
Figure 2.2 Terrestrial fiber-optic cable TTF model PDF and CDF
Figure 2.3 Terrestrial fiber-optic cable TTR model PDF and CDF
Figure 2.4 Monte Carlo simulation results for terrestrial fiber-optic cable
Figure 2.5 Terrestrial fiber-optic terminal functional block diagram
Figure 2.6 Unprotected fiber-optic network system block diagram
Figure 2.7 Unprotected fiber-optic network reliability block diagram
Figure 2.8 UPSR ring network topology, normal operation
Figure 2.9 UPSR ring network topology, fiber path failure
Figure 2.10 UPSR ring network topology, transceiver failure
Figure 2.11 Example SONET network topology for Monte Carlo analysis
Figure 2.12 UPSR system model rule set flow chart
Figure 2.13 UPSR system model simulation results
Figure 2.14 Submarine fiber-optic network block diagram
Figure 2.15 Submarine line terminal equipment functional block diagram
Figure 2.16 Power feed equipment operation, nominal and failure
Figure 2.17 Normal distributed submarine cable TTR model
Figure 2.18 Sample submarine system with 10 periodic repeaters
Figure 2.19 Submarine repeater RBD

Chapter 3 Microwave Networks

Figure 3.1 Long-haul microwave network tower in western Alaska
Figure 3.2 Multipath signal propagation
Figure 3.3 Multipath outage event model using uniform occurrence distribution
Figure 3.4 Multihop microwave radio link in a low-intensity rain region
Figure 3.5 Long-haul microwave radio block diagram
Figure 3.6 Microwave tower damaged by ice formation
Figure 3.7 Ice bridge infrastructure damaged by ice formation
Figure 3.8 Long-haul microwave antenna mount damaged by ice formation
Figure 3.9 Sample microwave radio block diagram
Figure 3.10 Two-hop radio transceiver system (one-for-two redundancy)
Figure 3.11 Single-thread transceiver system RBD
Figure 3.12 One-for-one redundant transceiver system RBD
Figure 3.13 One-for-two redundant transceiver system RBD
Figure 3.14 Two-hop radio link serial transceiver RBD
Figure 3.15 Microwave TRX path reliability comparison
Figure 3.16 Long-haul microwave network multiplexed baseband OC-3 interface
Figure 3.17 Single-hop long-haul microwave network block diagram
Figure 3.18 Single-hop long-haul microwave radio system model rule set
Figure 3.19 Single-hop long-haul microwave radio system availability
Figure 3.20 Single-hop long-haul microwave radio downtime distribution
Figure 3.21 Three-hop long-haul microwave availability analysis
Figure 3.22 Short-haul microwave fiber-optic ring network restoral path
Figure 3.23 Short-haul microwave cellular network backhaul application
Figure 3.24 Short-haul microwave urban structure application
Figure 3.25 Short-haul cellular backhaul microwave radio
Figure 3.26 Unlicensed short-haul commercial service microwave radio
Figure 3.27 Short-haul microwave availability for redundant and single-thread designs at varying MTTR values
Figure 3.28 Point-to-point versus local area network topology failure modes
Figure 3.29 Generic local area microwave network elements
Figure 3.30 Local area wireless network heat map coverage region
Figure 3.31 Wi-Fi access point functional block diagram
Figure 3.32 Radio design types, integrated versus split (ODU/IDU)
Figure 3.33 Sample Wi-Fi local area wireless network diagram

Chapter 4 Satellite Networks

Figure 4.1 Satellite earth station multipath condition sketch
Figure 4.2 Generalized satellite earth station equipment complement
Figure 4.3 Remote VSAT signal chain block diagram
Figure 4.4 VSAT station reliability block diagram
Figure 4.5 C-band satellite earth station constructed in Nome, Alaska
Figure 4.6 Typical earth station RF chain block diagram
Figure 4.7 Nonredundant earth station reliability block diagram
Figure 4.8 Fully redundant earth station system block diagram
Figure 4.9 One-for-two redundant Markov failure state transition diagram
Figure 4.10 Modular satellite power amplifier system block diagram
Figure 4.11 Modular SSPA MTTR distribution model
Figure 4.12 Modular SSPA system availability for three-out-of-four configuration
Figure 4.13 Modular SSPA system availability for seven-out-of-eight configuration
Figure 4.14 In-orbit spare satellite diagram
Figure 4.15 Satellite capacity restoral by in-orbit spare move
Figure 4.16 Satellite capacity restoral by ground station repointing
Figure 4.17 Hub/remote satellite network topology
Figure 4.18 Ku-band hub/remote VSAT network block diagram
Figure 4.19 Ku-band VSAT hub station block diagram
Figure 4.20 Bidirectional point-to-point satellite network block diagram

Chapter 5 Mobile Wireless Networks

Figure 5.1 GSM network block diagram
Figure 5.2 Distributed MSC network block diagram
Figure 5.3 Distributed MSC failure scenario and service continuity
Figure 5.4 Base station subsystem block diagram
Figure 5.5 Mobile wireless base station TRX configuration
Figure 5.6 Markov chain state transition diagram for BTS TRX modules
Figure 5.7 Base station overlap and probability of coverage by multiple stations
Figure 5.8 Network switching subsystem packet switching redundancy
Figure 5.9 Example GSM cellular wireless network

Chapter 6 Telecommunications Facilities

Figure 6.1 Primary power system redundancy configurations
Figure 6.2 Weibull distribution fit to transformer TTF and downtime empirical data
Figure 6.3 Single-thread generator system block diagram
Figure 6.4 Single-thread generator TTF and TTR for a village environment
Figure 6.5 Single-thread generator system availability
Figure 6.6 Cold-standby redundant generator system block diagram
Figure 6.7 Cold-standby redundant generator system availability
Figure 6.8 Load-sharing generator system block diagram
Figure 6.9 Load-sharing generator system relaxed TTR model
Figure 6.10 Load-sharing generator system availability
Figure 6.11 Modular rectifier system block diagram
Figure 6.12 1:N and soft-fail rectifier design descriptions
Figure 6.13 Soft-fail rectifier system availability distribution
Figure 6.14 48 VDC battery plant block diagram
Figure 6.15 Normal distributed TTR with μ = 12 h and σ = 3 h
Figure 6.16 Availability performance versus battery capacity for single-thread and cold-standby generator systems
Figure 6.17 Fiberglass communications shelter dimensions
Figure 6.18 Room air temperature increase rate for two AC scenarios

Chapter 7 Software and Firmware

Figure 7.1 Sample hardware and software failure rate versus time curve comparison
Figure 7.2 Software reliability improvement failure rate function
Figure 7.3 Software feature addition and upgrade failure rate function
Figure 7.4 Aggregate software failure rate trajectory for reliability improvement and feature addition
Figure 7.5 Component block diagram consisting of hardware and software
Figure 7.6 Discrete hardware and software component reliability functions
Figure 7.7 Total component reliability function for hardware and software
Figure 7.8 Sample software TTR distribution
Figure 7.9 Software and hardware component availability distributions
Figure 7.10 Combined component availability including software and hardware components

PREFACE

The topic of reliability is somewhat obscure within the field of electrical (and ultimately communications) engineering. Most engineers are familiar with the concept of reliability as it relates to their automobile, electronic device, or home, but performing a rigorous mathematical analysis is not always a comfortable or familiar task. The quantitative treatment of reliability has a long-standing tradition within the field of telecommunications, dating back to the early days of Bell Laboratories.

Modern society has developed an insatiable dependence on communication technology that demands a complete understanding and analysis of system reliability. Although the technical innovations developed in modern communications are astonishing engineering marvels, the reliability analysis of these systems is sometimes treated as a cursory afterthought. Even in cases where the analysis of system reliability and availability performance is treated with the highest concern, the sophistication of the analysis techniques frequently lags behind the technical development itself.

The content in this book is a compilation of years of research and analysis of many different telecommunications systems. During the compilation of this research, two primary points became evident to me. First, most communications engineers understand the need for reliability and availability analysis but lack the technical skill and knowledge to execute these analyses confidently. Second, modern communications network topologies demand an approach to analysis that goes beyond the traditional reliability block diagram and exponential distribution assumptions. Modern computing platforms enable engineers to exploit analysis techniques that were not possible when the Bell Laboratories' techniques were developed and presented. This book presents techniques that utilize computer simulation and random variable models that were not feasible 20 years ago. I hope that readers of this book find within it the useful resource that I found absent from the academic literature during my research and analysis of communications system reliability. Although compiling the data in this book took me years, it is my desire to convey this information to the reader in a matter of hours, enabling engineers to analyze complex problems using basic tools and theories.

I would like to thank Tom Plevyak and Veli Sahin for their editing and review of this book. Their help has been instrumental to its completion and quality.


I would also like to thank Gene Strid for his contributions to my career and to the development of this book. His mentoring spirit and attention to detail have had a significant influence on my personal development as a professional engineer. Gene's technical review of this book alone is impressive in its detail and breadth. Thank you, Gene, for everything you have done to help me remain inspired to grow and learn as an engineer and a leader.


ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporation, headquartered in Anchorage, Alaska. Mark has a broad range of telecommunications experience, including work on fiber-optic, microwave radio, and satellite network designs. Mark holds a B.S. degree in Mathematics from the University of Alaska Anchorage and an M.S. degree in Electrical Engineering from the University of Alaska Fairbanks, Fairbanks, Alaska. He is a registered Professional Electrical Engineer in the State of Alaska and a Senior Member of the IEEE. Mark teaches a variety of courses as an Adjunct Faculty Member in the Engineering Department at the University of Alaska Anchorage. His primary interests are systems design, modeling, and optimization.


ACRONYM LIST

AC Alternating current
ACM Adaptive coding and modulation
AGM Absorbed glass mat
AP Access point
AuC Authentication center
BLSR Bidirectional line switched ring
BSC Base station controller
BTS Base transceiver station
BTU British thermal unit
BUC Block upconverter
CDF Cumulative distribution function
CDMA Code division multiple access
COTS Commercial off the shelf
CPE Customer premises equipment
CRAC Computer room air conditioner
DC Direct current
EDFA Erbium-doped fiber amplifier
EIR Equipment identity register
EIRP Equivalent isotropic radiated power
FCC Federal Communications Commission
FITs Failures in time
FMEA Failure mode and effects analysis
FPGA Field-programmable gate array
FSS Fixed satellite service
GSM Global system for mobile communications
HLR Home location register
HVAC Heating, ventilation, and air conditioning
IDU Indoor unit
IEEE Institute of Electrical and Electronics Engineers
ISM Industrial, scientific, and medical
ITU International Telecommunication Union
LHS Left-hand side
LNA Low-noise amplifier
LNB Low-noise block
LTE Line-terminating equipment
LTE Long-term evolution
MDT Mean downtime
MODEM Modulator/demodulator
MSC Mobile switching center
MTBF Mean time between failures
MTTF Mean time to failure
MTTR Mean time to repair
NASA National Aeronautics and Space Administration
NSS Network switching subsystem
OC-n Optical carrier level n
ODU Outdoor unit
PDF Probability density function
PFE Power feed equipment
PM Preventative maintenance
RBD Reliability block diagram
RF Radio frequency
RHS Right-hand side
RMA Return material authorization
RSL Received signal level
SDH Synchronous digital hierarchy
SES Severely errored second
SLA Service-level agreement
SLTE Submarine line-terminating equipment
SMS Short message service
SONET Synchronous optical network
SP Service provider
SRGM Software reliability growth model
SSPA Solid-state power amplifier
TDM Time-division multiplexing
TRX Transceiver
TTF Time to failure
TTR Time to repair
UMTS Universal mobile telecommunications system
UPS Uninterruptible power supply
UPSR Unidirectional path switched ring
VLR Visitor location register
VLSI Very large-scale integration
VRLA Valve-regulated lead acid
VSAT Very-small-aperture terminal
WiFi Wireless fidelity
XPIC Cross-polarization interference cancellation

INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and impacts our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically equates reliability with emotion and perception. In many cases, even technically minded people do not have a clear quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production resulted in an increased awareness of item failure and performance and its impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further identified the need to quantify reliability and its impact on mission success. During the Second World War, vacuum tubes were used extensively in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering, Theory, and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s continued the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. At the end of the 1960s, the launch of commercial communications satellites by INTELSAT and the United States' moon landing proved the importance of reliability engineering as part of the systems engineering process. In the 1980s, semiconductor development, military applications, communications systems, biomedical research, and software-based systems led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today's society. Most consumers are unconsciously aware of reliability as a measure of an item's performance and overall value. Engineers and technical resources are aware of an item's reliability in a more quantitative sense, but many times this understanding is neither complete nor grounded in solid reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be based in solid theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures are used to guide the design process and ensure that the system design is compliant with system performance targets. Modern predictive reliability analysis utilizes statistical modeling of component failures. These statistical models are used to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows engineers to perform cost/performance trade-off analyses. These analyses can then be used to make business and technical decisions about the best design that meets the target requirements.
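As a minimal sketch of the kind of topology trade-off described above, the following Python fragment compares a single-thread two-hop link with the same link built from redundant transceiver pairs. It assumes independent component failures and the standard steady-state relation A = MTTF/(MTTF + MTTR); the MTTF and MTTR values are hypothetical placeholders rather than data from this book, and the function names are illustrative only.

```python
from math import prod

HOURS_PER_YEAR = 8760  # non-leap year

def steady_state_availability(mttf_h: float, mttr_h: float) -> float:
    """Long-run fraction of time a repairable component is up."""
    return mttf_h / (mttf_h + mttr_h)

def series_availability(avails):
    """Series structure: every block must be up, so A = product of A_i."""
    return prod(avails)

def parallel_availability(avails):
    """Parallel structure: at least one block must be up, so A = 1 - product of (1 - A_i)."""
    return 1.0 - prod(1.0 - a for a in avails)

# Hypothetical transceiver figures, chosen only for illustration.
a_trx = steady_state_availability(mttf_h=50_000, mttr_h=8)

# Design 1: two hops, one transceiver per hop (single thread).
single_thread = series_availability([a_trx, a_trx])

# Design 2: two hops, each hop a redundant pair of transceivers.
redundant_hop = parallel_availability([a_trx, a_trx])
redundant = series_availability([redundant_hop, redundant_hop])

for name, a in (("single-thread", single_thread), ("redundant", redundant)):
    downtime_min = (1.0 - a) * HOURS_PER_YEAR * 60
    print(f"{name:>13}: A = {a:.7f}, expected downtime = {downtime_min:.2f} min/yr")
```

The redundant design buys a large reduction in expected downtime at the cost of doubling the transceiver count. In practice that benefit is weighed against equipment and maintenance cost, and it shrinks if failures are correlated or switchover is imperfect, effects this simple calculation ignores.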

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or the operator to measure system performance and compare that performance with expected or predicted data. Empirical data collection is particularly important in large production environments where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be used directly to analyze the predicted performance of a new system. One must be careful when using empirical data for predictive analysis because it is rare to find an existing system that exactly matches a new design.
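A minimal sketch of comparing empirical data with a prediction follows. It assumes an operator has logged outage durations over an observation window; the observed availability is then the fraction of that window during which service was up. The outage durations and the predicted value are hypothetical, and a single year from one system is a small sample, which is why large production environments give more trustworthy comparisons.

```python
HOURS_PER_YEAR = 8760  # non-leap year

def observed_availability(outage_hours, observation_hours):
    """Empirical availability: 1 - (total downtime / observation window)."""
    total_down = sum(outage_hours)
    if not 0.0 <= total_down <= observation_hours:
        raise ValueError("total outage time must fit inside the observation window")
    return 1.0 - total_down / observation_hours

# Hypothetical outage log for one year of operation (durations in hours).
outages = [2.5, 0.75, 6.0]
measured = observed_availability(outages, HOURS_PER_YEAR)

predicted = 0.9995  # hypothetical value produced by a predictive model

print(f"measured  A = {measured:.5f}")
print(f"predicted A = {predicted:.5f}")
if measured < predicted:
    print("Observed performance falls short of the model; revisit its assumptions.")
```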


One of the most significant benefits of empirical analysis and data collection is failure mode and effects analysis (FMEA). This analysis approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and is indicative of a system's “uptime,” or the time it is available for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) will sign a contract to provide a service that has a specific target probability of being available or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, the SP and the customer take on risk in signing the contract. This risk may be realized risk (the party is aware of the risk, quantified or not) or unrealized risk (the party is taking risk and is not aware of being in jeopardy). Decisions made while assuming unrealized risk can jeopardize a business. Reliability engineering of telecommunications systems serves to reduce overall risk in both the realized and unrealized cases.
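As a worked illustration of why the two SLA forms are interchangeable, the sketch below converts an availability target into the maximum downtime it permits over a year, and back. It assumes a 365-day year and treats availability simply as the uptime fraction of the interval; the targets shown are common examples, not values specified in this book, and the function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored

def max_downtime_minutes(availability, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget implied by an availability target over one period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_minutes

def availability_from_downtime(downtime_minutes, period_minutes=MINUTES_PER_YEAR):
    """Availability implied by a maximum-downtime target over one period."""
    return 1.0 - downtime_minutes / period_minutes

for target in (0.999, 0.9999, 0.99999):
    print(f"A = {target:.3%}: at most {max_downtime_minutes(target):.1f} min/year down")

# Reverse direction: a 60-minute annual downtime cap expressed as availability.
print(f"60 min/year -> A = {availability_from_downtime(60):.5%}")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, so a contract quoting availability to several decimal places commits the SP to a very small downtime budget.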

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allows managers and technical resources to design systems that achieve the required targets with minimum cost and maximum performance.

Analysis of telecommunications systems requires specialized application of reliability engineering theory and principles. Performance expectations within the field of telecommunications range from high to extreme. Rarely do consumers of telecommunications expect less than highly available systems. This is true even of consumer services such as cable television, consumer Internet, and local telephone service. Commercial service expectations are typically higher than those in a consumer environment because, when telecommunications services are critical to a business's operations, service delivery, and ability to generate revenue, an outage can have a significant and costly impact. Performing detailed analyses of both consumer and commercial systems allows risks to be managed and costs to be controlled. These analyses allow the designer to produce a system that is carefully crafted to just meet the requirements of the customer rather than greatly exceed them or completely miss the target. In the case of operational systems, knowledge of the achievable system performance and its maintainability allows the operator to understand whether the achieved performance is within specification and to optimize maintenance and repair efforts.

This book is written with the goal of providing the reader with the knowledge and skills necessary to perform telecommunications system reliability analysis and to examine system designs with a critical eye. Telecommunications service providers frequently serve customers who know what they would like to purchase, whether wireless or terrestrial, packet or TDM. It is far less common for the customer to understand how to specify system availability or reliability. Knowledge of the theory and practice of reliability engineering allows service providers and engineers to educate their customers regarding this important metric of network performance. Even if the reader does not perform firsthand reliability analysis, the knowledge gained by studying both the theory and the practice of reliability engineering allows the individual to make better-informed decisions about the design and operation of telecommunications systems or the purchase of telecommunications services. The truly pervasive nature of reliability as a metric in telecommunications systems requires engineers, managers, and executives to have extensive knowledge of system topologies, costs, and performance. In many cases, these system details are obtained through experience and practice. The author of this book would argue that experience without academic study, particularly in the field of reliability engineering, results in decisions that at times invoke unrealized, serious business risk.

The reader is expected to have a basic working knowledge of engineering mathematics. A college-level course in probability and statistics is of particular value, as this book relies extensively on the application and use of statistical distributions and probability models. Experience with telecommunications system design and network topologies is valuable in understanding the trade-offs involved with different reliability analyses. Lastly, if the reader is interested in developing his or her own reliability models, knowledge of MATLAB and computer programming methods is of value. All of the topics presented in this book are intended to provide sufficient depth to enable the reader either to work with them directly or to conduct minimal further research in order to obtain a complete understanding of a topic.

The previous paragraph should allow readers to identify themselves as members of a specific group. These groups can generally be classified as one of the following: engineers, managers, or executives. Engineers can use this book as a complete technical resource for building and analyzing system models. The engineer reader who uses this book will have the ability to develop complex, detailed statistical models of telecommunications systems that produce a variety of system metrics that can be used for business, design, and other technical decisions. Managers reading this book will derive value from the knowledge obtained about proper reliable system design, contract implications, and operational impacts. Executive readers will find value in the high-level knowledge obtained about design best practices and proper expectations for system performance.

This book is logically organized to provide two distinct sets of information: theory and applications. Chapter 1 introduces and develops the concepts and accepted theories required for system reliability analysis. This includes discussions of probability and statistics, system reliability theory, and system modeling. The remaining chapters of this book are organized by technology subject matter. Chapter 2 discusses fiber-optic networks. Both terrestrial and submarine networks are discussed, with the subtleties of each presented in detail. Chapter 3 presents reliability analysis approaches for terrestrial microwave systems. The discussion includes short-haul point-to-point, long-haul point-to-point, cellular wireless, and WiFi networks. Satellite communications networks are discussed in Chapter 4. Both teleport and VSAT network topologies are discussed, along with propagation availability calculation techniques. Chapter 5 addresses reliability concerns for mobile wireless (cellular) systems. In Chapter 6, the often underanalyzed topics of power systems and heating, ventilation, and air conditioning systems related to communications networks are analyzed. The final chapter (Chapter 7) introduces software and firmware as they relate to telecommunications system reliability. Each section presents the analysis in terms of two discrete parts: the communications equipment and the communications channel. The goal of this book is to provide the reader with sufficient knowledge to abstract the concepts presented and apply them to his or her own problem statement.

The ability to blend academic theory and practical application is a rare commodity in the field of engineering. Few practicing engineers have the ability to apply abstract theory to real problems, while even fewer academics have the practical experience to understand the engineering of "real" systems. Telecommunications reliability engineering necessitates the blend of abstract statistical theory and practical engineering experience. Fortunately, in the case of reliability engineering, this blend is easily understood when the required information is presented in a logical, organized format. The use of predictive and/or numerical models in the design of telecommunications systems brings great value to system designs. Acceptance of these models requires the engineer, manager, and executive to have enough confidence in the model's results that significant decisions can be made based on them. The ability to place that level of confidence in a model can only come from a fusion of academic study of reliability engineering and practical experience.




Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data

Ayers, Mark L.
Telecommunications system reliability engineering : theory and practice / Mark L. Ayers.
p. cm.
ISBN 978-1-118-13051-3 (hardback)
1. Telecommunication systems. I. Title.
TK5101.A89 2012
621.382–dc23
2012013009

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

CONTENTS

List of Illustrations
Preface
About the Author
Acronym List
INTRODUCTION

1 RELIABILITY THEORY
1.1 System Metrics
1.2 Statistical Distributions
1.3 System Modeling Techniques
1.4 Systems with Repair
1.5 Markov Chain Models
1.6 Practical Markov System Models
1.7 Monte Carlo Simulation Models
1.8 Repair Period Models
1.9 Equipment Sparing

2 FIBER-OPTIC NETWORKS
2.1 Terrestrial Fiber-Optic Networks
2.2 Submarine Fiber-Optic Networks

3 MICROWAVE NETWORKS
3.1 Long-Haul Microwave Networks
3.2 Short-Haul Microwave Networks
3.3 Local Area Microwave Networks

4 SATELLITE NETWORKS
4.1 Propagation
4.2 Earth Stations
4.3 VSAT Earth Stations
4.4 Earth Stations
4.5 Spacecraft
4.6 Satellite Network Topologies

5 MOBILE WIRELESS NETWORKS
5.1 Mobile Wireless Equipment
5.2 Mobile Wireless Network Systems

6 TELECOMMUNICATIONS FACILITIES
6.1 Power Systems
6.2 Heating, Ventilation, and Air Conditioning Systems

7 SOFTWARE AND FIRMWARE
7.1 Software Failure Mechanisms
7.2 Software Failure Rate Modeling
7.3 Reliability and Availability of Systems with Software Components

References
Index

LIST OF ILLUSTRATIONS

Chapter 1 Reliability Theory

Figure 1.1 Gaussian CDF and associated reliability function R(t)
Figure 1.2 Average availability for system 1 (short duration, frequent outages) and system 2 (long duration, infrequent outages)
Figure 1.3 Bathtub curve for electronic systems
Figure 1.4 Exponential distribution PDF for varying values of λ
Figure 1.5 Exponential distribution CDF for varying values of λ
Figure 1.6 Normal distribution PDF of TTR where μ = 8 h and σ = 2 h
Figure 1.7 Normal distribution CDF of TTR where μ = 8 h and σ = 2 h
Figure 1.8 Weibull distributed random variable for submarine fiber-optic cable TTR
Figure 1.9 Series and parallel reliability block diagrams
Figure 1.10 Series structure reliability block diagram
Figure 1.11 Single-thread satellite link RF chain
Figure 1.12 Parallel structure reliability block diagram
Figure 1.13 Parallel satellite RF chain system
Figure 1.14 One-for-two (1:2) redundant HPA system block diagram
Figure 1.15 Redundant Markov chain state diagram
Figure 1.16 Redundant Markov chain state diagram, identical components
Figure 1.17 Single-component Markov state transition diagram
Figure 1.18 Hot-standby redundant Markov state transition diagram
Figure 1.19 Cold-standby Markov state transition diagram
Figure 1.20 Monte Carlo system analysis algorithm
Figure 1.21 Component model
Figure 1.22 State vector algorithm flow chart
Figure 1.23 Sample state vector algorithm output
Figure 1.24 Serial component state assessment flow diagram
Figure 1.25 Parallel component state assessment flow diagram
Figure 1.26 Exponentially distributed TTR with MTTR = 8 h
Figure 1.27 Normal distributed TTR with MTTR = 8 h, variance = 2 h
Figure 1.28 Centralized warehousing and dispatch sparing approach
Figure 1.29 Territorial warehousing and dispatch sparing approach
Figure 1.30 On-site sparing approach

Chapter 2 Fiber-Optic Networks

Figure 2.1 Shallow-buried fiber-optic cable installation example in western Alaska
Figure 2.2 Terrestrial fiber-optic cable TTF model PDF and CDF
Figure 2.3 Terrestrial fiber-optic cable TTR model PDF and CDF
Figure 2.4 Monte Carlo simulation results for terrestrial fiber-optic cable
Figure 2.5 Terrestrial fiber-optic terminal functional block diagram
Figure 2.6 Unprotected fiber-optic network system block diagram
Figure 2.7 Unprotected fiber-optic network reliability block diagram
Figure 2.8 UPSR ring network topology, normal operation
Figure 2.9 UPSR ring network topology, fiber path failure
Figure 2.10 UPSR ring network topology, transceiver failure
Figure 2.11 Example SONET network topology for Monte Carlo analysis
Figure 2.12 UPSR system model rule set flow chart
Figure 2.13 UPSR system model simulation results
Figure 2.14 Submarine fiber-optic network block diagram
Figure 2.15 Submarine line terminal equipment functional block diagram
Figure 2.16 Power feed equipment operation, nominal and failure
Figure 2.17 Normal distributed submarine cable TTR model
Figure 2.18 Sample submarine system with 10 periodic repeaters
Figure 2.19 Submarine repeater RBD

Chapter 3 Microwave Networks

Figure 3.1 Long-haul microwave network tower in western Alaska
Figure 3.2 Multipath signal propagation
Figure 3.3 Multipath outage event model using uniform occurrence distribution
Figure 3.4 Multihop microwave radio link in a low-intensity rain region
Figure 3.5 Long-haul microwave radio block diagram
Figure 3.6 Microwave tower damaged by ice formation
Figure 3.7 Ice bridge infrastructure damaged by ice formation
Figure 3.8 Long-haul microwave antenna mount damaged by ice formation
Figure 3.9 Sample microwave radio block diagram
Figure 3.10 Two-hop radio transceiver system (one-for-two redundancy)
Figure 3.11 Single-thread transceiver system RBD
Figure 3.12 One-for-one redundant transceiver system RBD
Figure 3.13 One-for-two redundant transceiver system RBD
Figure 3.14 Two-hop radio link serial transceiver RBD
Figure 3.15 Microwave TRX path reliability comparison
Figure 3.16 Long-haul microwave network multiplexed baseband OC-3 interface
Figure 3.17 Single-hop long-haul microwave network block diagram
Figure 3.18 Single-hop long-haul microwave radio system model rule set
Figure 3.19 Single-hop long-haul microwave radio system availability
Figure 3.20 Single-hop long-haul microwave radio downtime distribution
Figure 3.21 Three-hop long-haul microwave availability analysis
Figure 3.22 Short-haul microwave fiber optic ring network restoral path
Figure 3.23 Short-haul microwave cellular network backhaul application
Figure 3.24 Short-haul microwave urban structure application
Figure 3.25 Short-haul cellular backhaul microwave radio
Figure 3.26 Unlicensed short-haul commercial service microwave radio
Figure 3.27 Short-haul microwave availability for redundant and single-thread designs at varying MTTR values
Figure 3.28 Point-to-point versus local area network topology failure modes
Figure 3.29 Generic local area microwave network elements
Figure 3.30 Local area wireless network heat map coverage region
Figure 3.31 Wi-Fi access point functional block diagram
Figure 3.32 Radio design types, integrated versus split (ODU/IDU)
Figure 3.33 Sample Wi-Fi local area wireless network diagram

Chapter 4 Satellite Networks

Figure 4.1 Satellite earth station multipath condition sketch
Figure 4.2 Generalized satellite earth station equipment complement
Figure 4.3 Remote VSAT signal chain block diagram
Figure 4.4 VSAT station reliability block diagram
Figure 4.5 C-band satellite earth station constructed in Nome, Alaska
Figure 4.6 Typical earth station RF chain block diagram
Figure 4.7 Nonredundant earth station reliability block diagram
Figure 4.8 Fully redundant earth station system block diagram
Figure 4.9 One-for-two redundant Markov failure state transition diagram
Figure 4.10 Modular satellite power amplifier system block diagram
Figure 4.11 Modular SSPA MTTR distribution model
Figure 4.12 Modular SSPA system availability for three-out-of-four configuration
Figure 4.13 Modular SSPA system availability for seven-out-of-eight configuration
Figure 4.14 In-orbit spare satellite diagram
Figure 4.15 Satellite capacity restoral by in-orbit spare move
Figure 4.16 Satellite capacity restoral by ground station repointing
Figure 4.17 Hub/remote satellite network topology
Figure 4.18 Ku-band hub/remote VSAT network block diagram
Figure 4.19 Ku-band VSAT hub station block diagram
Figure 4.20 Bidirectional point-to-point satellite network block diagram

Chapter 5 Mobile Wireless Networks

Figure 5.1 GSM network block diagram
Figure 5.2 Distributed MSC network block diagram
Figure 5.3 Distributed MSC failure scenario and service continuity
Figure 5.4 Base station subsystem block diagram
Figure 5.5 Mobile wireless base station TRX configuration
Figure 5.6 Markov chain state transition diagram for BTS TRX modules
Figure 5.7 Base station overlap and probability of coverage by multiple stations
Figure 5.8 Network switching subsystem packet switching redundancy
Figure 5.9 Example GSM cellular wireless network

Chapter 6 Telecommunications Facilities

Figure 6.1 Primary power system redundancy configurations
Figure 6.2 Weibull distribution fit to transformer TTF and downtime empirical data
Figure 6.3 Single-thread generator system block diagram
Figure 6.4 Single-thread generator TTF and TTR for a village environment
Figure 6.5 Single-thread generator system availability
Figure 6.6 Cold-standby redundant generator system block diagram
Figure 6.7 Cold-standby redundant generator system availability
Figure 6.8 Load-sharing generator system block diagram
Figure 6.9 Load-sharing generator system relaxed TTR model
Figure 6.10 Load-sharing generator system availability
Figure 6.11 Modular rectifier system block diagram
Figure 6.12 1N and soft-fail rectifier design descriptions
Figure 6.13 Soft-fail rectifier system availability distribution
Figure 6.14 48 VDC battery plant block diagram
Figure 6.15 Normal distributed TTR with μ = 12 h and σ = 3 h
Figure 6.16 Availability performance versus battery capacity for single-thread and cold-standby generator systems
Figure 6.17 Fiberglass communications shelter dimensions
Figure 6.18 Room air temperature increase rate for two AC scenarios

Chapter 7 Software and Firmware

Figure 7.1 Sample hardware and software failure rate versus time curve comparison
Figure 7.2 Software reliability improvement failure rate function
Figure 7.3 Software feature addition and upgrade failure rate function
Figure 7.4 Aggregate software failure rate trajectory for reliability improvement and feature addition
Figure 7.5 Component block diagram consisting of hardware and software
Figure 7.6 Discrete hardware and software component reliability functions
Figure 7.7 Total component reliability function for hardware and software
Figure 7.8 Sample software TTR distribution
Figure 7.9 Software and hardware component availability distributions
Figure 7.10 Combined component availability including software and hardware components

PREFACE

The topic of reliability is somewhat obscure within the field of electrical (and ultimately communications) engineering. Most engineers are familiar with the concept of reliability as it relates to their automobile, electronic devices, or home, but performing a rigorous mathematical analysis is not always a comfortable or familiar task. The quantitative treatment of reliability has a long-standing tradition within the field of telecommunications, dating back to the early days of Bell Laboratories.

Modern society has developed an insatiable dependence on communication technology that demands a complete understanding and analysis of system reliability. Although the technical innovations developed in modern communications are astonishing engineering marvels, the reliability analysis of these systems is sometimes treated as a cursory afterthought. Even in cases where system reliability and availability performance is treated with the highest concern, the sophistication of analysis techniques frequently lags behind the technical development itself.

The content in this book is a compilation of years of research and analysis of many different telecommunications systems. During the compilation of this research, two primary points became evident to me. First, most communications engineers understand the need for reliability and availability analysis but lack the technical skill and knowledge to execute these analyses confidently. Second, modern communications network topologies demand an approach to analysis that goes beyond the traditional reliability block diagram and exponential distribution assumptions. Modern computing platforms enable engineers to exploit analysis techniques that were not possible when the Bell Laboratories techniques were developed and presented. This book presents techniques that use computer simulation and random variable models that were not feasible 20 years ago. I hope that readers of this book find within it the useful resource that I found absent from the academic literature during my research and analysis of communications system reliability. Although compiling the data in this book took me years, it is my desire to convey this information to the reader in a matter of hours, enabling engineers to analyze complex problems using basic tools and theories.

I would like to thank Tom Plevyak and Veli Sahin for their editing and review of this book. Their help has been instrumental to its completion and quality.


I would also like to thank Gene Strid for his contributions to my career and to the development of this book. His mentoring spirit and attention to detail have had a significant influence on my personal development as a professional engineer. Gene's technical review of this book alone is impressive in its detail and breadth. Thank you, Gene, for everything you have done to help me remain inspired to grow and learn as an engineer and a leader.


ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporation, headquartered in Anchorage, Alaska. Mark has a broad range of telecommunications experience, including work in fiber optics, microwave radio, and satellite network designs. Mark holds a BS degree in Mathematics from the University of Alaska Anchorage and an MS degree in Electrical Engineering from the University of Alaska Fairbanks, Fairbanks, Alaska. He is a registered Professional Electrical Engineer in the State of Alaska and a Senior Member of the IEEE. Mark teaches a variety of courses as an Adjunct Faculty Member in the Engineering Department at the University of Alaska Anchorage. His primary interests are systems design, modeling, and optimization.


ACRONYM LIST

AC Alternating current
ACM Adaptive coding and modulation
AGM Absorbed glass mat
AP Access point
AuC Authentication center
BLSR Bidirectional line switched ring
BSC Base station controller
BTS Base transceiver station
BTU British thermal unit
BUC Block upconverter
CDF Cumulative distribution function
CDMA Code division multiple access
COTS Commercial off the shelf
CPE Customer premises equipment
CRAC Computer room air conditioner
DC Direct current
EDFA Erbium-doped fiber amplifier
EIR Equipment identity register
EIRP Equivalent isotropic radiated power
FCC Federal Communications Commission
FITs Failures in time
FMEA Failure mode and effects analysis
FPGA Field-programmable gate array
FSS Fixed satellite service
GSM Global system for mobile communications
HLR Home location register
HVAC Heating, ventilation, and air conditioning
IDU Indoor unit
IEEE Institute of Electrical and Electronics Engineers
ISM Industrial, scientific, and medical
ITU International Telecommunication Union
LHS Left-hand side
LNA Low-noise amplifier
LNB Low-noise block
LTE Line-terminating equipment
LTE Long-term evolution
MDT Mean downtime
MODEM Modulator/demodulator
MSC Mobile switching center
MTBF Mean time between failures
MTTF Mean time to failure
MTTR Mean time to repair
NASA National Aeronautics and Space Administration
NSS Network switching subsystem
OC-n Optical carrier level n
ODU Outdoor unit
PDF Probability density function
PFE Power feed equipment
PM Preventative maintenance
RBD Reliability block diagram
RF Radio frequency
RHS Right-hand side
RMA Return material authorization
RSL Received signal level
SDH Synchronous digital hierarchy
SES Severely errored second
SLA Service-level agreement
SLTE Submarine line-terminating equipment
SMS Short message service
SONET Synchronous optical network
SP Service provider
SRGM Software reliability growth model
SSPA Solid-state power amplifier
TDM Time division multiplexing
TRX Transceiver
TTF Time to failure
TTR Time to repair
UMTS Universal mobile telecommunications system
UPS Uninterruptible power supply
UPSR Unidirectional path switched ring
VLR Visitor location register
VLSI Very large-scale integration
VRLA Valve-regulated lead acid
VSAT Very-small-aperture terminal
WiFi Wireless fidelity
XPIC Cross-polarization interference cancellation

INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and impacts our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically equates emotion and perception with reliability. In many cases, even technically minded people do not have a clear quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production resulted in an increased awareness of item failure and performance and its impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further identified the need to quantify reliability and its impact on mission success. During the Second World War, vacuum tubes were used extensively in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering: Theory and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s continued the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. The launch of commercial communications satellites by INTELSAT and the Moon landing by the United States at the end of the 1960s demonstrated the importance of reliability engineering as part of the system engineering process. In the 1980s, semiconductor development, military applications, communications systems, biomedical research, and software-based systems led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today's society. Most consumers are implicitly aware of reliability as a measure of an item's performance and overall value. Engineers and technical resources are aware of an item's reliability in a more quantitative sense, but many times this understanding is neither complete nor grounded in solid reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be based in solid theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures are used to guide the design process and ensure that the system design is compliant with system performance targets. Modern predictive reliability analysis uses statistical modeling of component failures. These statistical models are used to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows engineers to perform cost/performance trade-off analyses. These analyses can then be used to make business and technical decisions about the best design that meets target requirements.
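To make the trade-off described above concrete, the following Python sketch compares a single-thread design with one that duplicates a transceiver in a 1:1 arrangement. It is an illustration under stated assumptions, not the method developed in this book. Each component is assumed to have steady-state availability A = MTBF/(MTBF + MTTR). Redundant units are assumed to fail and be repaired independently, with perfect switchover. The MTBF and MTTR values are hypothetical.

```python
"""Single-thread vs 1:1 redundant availability comparison (illustrative sketch).

Assumptions (not from the text): steady-state availability A = MTBF / (MTBF + MTTR),
independent failures and repairs, perfect switchover, hypothetical MTBF/MTTR values.
"""

HOURS_PER_YEAR = 8760.0


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one repairable component."""
    return mtbf_h / (mtbf_h + mttr_h)


def series(*avails: float) -> float:
    """Series blocks: every block must be up, so availabilities multiply."""
    result = 1.0
    for a in avails:
        result *= a
    return result


def parallel(*avails: float) -> float:
    """Parallel blocks: the group is down only if every block is down."""
    all_down = 1.0
    for a in avails:
        all_down *= 1.0 - a
    return 1.0 - all_down


def annual_downtime_min(a: float) -> float:
    """Expected downtime per year, in minutes, for availability a."""
    return (1.0 - a) * HOURS_PER_YEAR * 60.0


if __name__ == "__main__":
    # Hypothetical figures for a transceiver (TRX) and a power supply.
    trx = availability(mtbf_h=50_000, mttr_h=8)
    psu = availability(mtbf_h=100_000, mttr_h=8)

    designs = {
        "single-thread": series(trx, psu),
        "1:1 TRX": series(parallel(trx, trx), psu),
    }
    for name, a in designs.items():
        print(f"{name:14s} A = {a:.7f}  downtime = {annual_downtime_min(a):6.1f} min/yr")
```

Listing each option's cost next to its predicted downtime turns this into the cost/performance comparison. Keep in mind that the independence and perfect-switchover assumptions tend to flatter the redundant design, because they ignore common-mode failures and switching faults.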

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or the operator to measure system performance and compare that performance with expected or predicted data. Empirical data collection is particularly important in large production environments where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be used directly to analyze the predicted performance of a new system. One must be careful when using empirical data for predictive analysis, because it is rare to find an existing system that exactly matches a new design.
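As a minimal sketch of comparing field data against a prediction, the Python function below takes a list of outage durations observed over a fixed window and returns point estimates of three quantities: achieved availability, mean time to repair, and mean up time between failures (total uptime divided by the number of failures). The outage records, the observation window, and the predicted availability are hypothetical values chosen for illustration.

```python
"""Achieved-availability estimates from field outage records (illustrative sketch).

The outage durations, observation window and predicted availability used in the
example are hypothetical; real values would come from operational records.
"""

from statistics import mean


def observed_metrics(outage_hours: list[float], window_hours: float) -> dict[str, float]:
    """Point estimates from a single observation window.

    availability : fraction of the window the system was up
    mttr_h       : mean outage (repair) duration
    mut_h        : mean up time between failures = total uptime / number of failures
    """
    downtime = sum(outage_hours)
    uptime = window_hours - downtime
    n = len(outage_hours)
    return {
        "availability": uptime / window_hours,
        "mttr_h": mean(outage_hours) if n else 0.0,
        # No failures in the window means no finite estimate is available.
        "mut_h": uptime / n if n else float("inf"),
    }


if __name__ == "__main__":
    outages = [3.5, 12.0, 6.25, 1.0]  # hypothetical outage durations, hours
    window = 2 * 8760.0               # two years of observation, hours
    predicted = 0.9995                # hypothetical design prediction

    m = observed_metrics(outages, window)
    print(f"observed A = {m['availability']:.5f} vs predicted A = {predicted}")
    print(f"observed MTTR = {m['mttr_h']:.2f} h, mean up time = {m['mut_h']:.0f} h")
    if m["availability"] < predicted:
        print("field performance is below prediction; revisit the model inputs")
```

With only a handful of outages in one window, these are rough point estimates. The caution about matching existing systems to new designs applies equally when the results are fed back into a predictive model.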


One of the most significant benefits of empirical analysis and data collection is failure mode and effects analysis (FMEA). This analysis approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and indicates a system's "uptime," or the time available for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) will sign a contract to provide a service that has a specific target probability of being available or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, the SP and the customer take on risk in signing the contract. This risk is sometimes realized risk (the party is aware of the risk, quantified or not) and sometimes unrealized risk (the party is taking risk and is not aware that it is in jeopardy). Decisions made while assuming unrealized risk can jeopardize a business. Reliability engineering of telecommunications systems serves to reduce overall risk in both realized and unrealized cases.
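The two forms of SLA target mentioned above are a probability of being available and a maximum downtime over an interval. Once the interval is fixed they are interchangeable: allowed downtime = (1 - A) multiplied by the interval length. The Python sketch below performs the conversion in both directions. The 365.25-day year and the 30-day month in the example are assumptions, since contracts define their own measurement periods.

```python
"""Convert between an SLA availability target and allowed downtime (sketch).

Assumption: a 365.25-day average year; real SLAs define their own measurement period.
"""

MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_min(availability: float, period_min: float = MINUTES_PER_YEAR) -> float:
    """Maximum downtime, in minutes, over the period that still meets the target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must lie in [0, 1]")
    return (1.0 - availability) * period_min


def availability_from_downtime(downtime_min: float, period_min: float = MINUTES_PER_YEAR) -> float:
    """Availability implied by a downtime cap over the same period."""
    if not 0.0 <= downtime_min <= period_min:
        raise ValueError("downtime must lie between 0 and the period length")
    return 1.0 - downtime_min / period_min


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        print(f"A = {target}: {allowed_downtime_min(target):8.2f} min/yr allowed")
    # Hypothetical monthly SLA: at most 45 minutes of downtime in a 30-day month.
    print(f"45 min per 30 days -> A = {availability_from_downtime(45, 30 * 24 * 60):.5f}")
```

Each additional "nine" reduces the allowed downtime by a factor of ten. Under this year length, that is about 526 minutes per year at 99.9% versus about 5.3 minutes per year at 99.999%.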

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allows managers and technical resources to design systems that achieve the required targets with minimum cost and maximum performance.

Analysis of telecommunications systems requires specialized application of reliability engineering theory and principles. Performance expectations within the field of telecommunications can range from high to extreme. Rarely do consumers of telecommunications expect less than highly available systems. This is true even of consumer services such as cable television, consumer Internet, and local telephone service. Commercial service expectations are typically higher than those in a consumer environment, because the impact on a business may be significant and costly if its telecommunications services are critical to its operations, delivery of service, and ability to generate revenue. Performing detailed analyses of systems, both consumer and commercial, allows risks to be managed and costs to be controlled. These analyses allow the designer to produce a system that is carefully crafted to just meet the requirements of the customer, rather than greatly exceed them or completely miss the target. In the case of operational systems, knowledge of the achievable system performance and its maintainability allows the operator to understand whether the achieved performance is within specification and to optimize maintenance and repair efforts.

This book is written with the goal of providing the reader with the knowledge and skills necessary to perform telecommunications system reliability analysis and to examine system designs with a critical eye. Telecommunications service providers frequently provide service to customers who know what they would like to purchase, whether it is wireless or terrestrial, packet or TDM. It is far less frequent that the customer understands how to specify system availability or reliability. Knowledge of the theory and practice of reliability engineering allows service providers and engineers to educate their customers regarding this important metric of network performance. Even if the reader does not perform firsthand reliability analysis, the knowledge gained by studying both the theory and the practice of reliability engineering allows the individual to make more informed, better decisions about the design and operation of telecommunications systems or the purchase of telecommunications services. The truly pervasive nature of reliability as a metric in telecommunications systems requires engineers, managers, and executives to have extensive knowledge of system topologies, costs, and performance. In many cases, these system details are obtained through experience and practice. The author of this book would argue that experience without academic study, particularly in the field of reliability engineering, results in decisions that at times carry serious, unrealized business risk.

The reader is expected to have a basic working knowledge of engineering mathematics. A college-level course in probability and statistics is of particular value, because this book relies extensively on statistical distributions and probability models. Experience with telecommunications system design and network topologies is valuable in understanding the trade-offs involved in different reliability analyses. Lastly, readers interested in developing their own reliability models will benefit from knowledge of MATLAB and computer programming methods. Each topic in this book is presented in enough depth that the reader can either work with it directly or reach a complete understanding with minimal further research.

The previous paragraph should allow readers to identify themselves as members of a specific group. These groups can generally be classified as engineers, managers, or executives. Engineers can use this book as a complete technical resource for building and analyzing system models; an engineer who works through it will be able to develop complex, detailed statistical models of telecommunications systems that produce a variety of system metrics for business, design, and other technical decisions. Managers will derive value from the knowledge obtained about proper reliable system design, contract implications, and operational impacts. Executives will find value in the high-level knowledge obtained about design best practices and proper expectations for system performance.

This book is organized to provide two distinct sets of information: theory and applications. Chapter 1 introduces and develops the concepts and accepted theories required for system reliability analysis, including probability and statistics, system reliability theory, and system modeling. The remaining chapters are organized by technology. Chapter 2 discusses fiber-optic networks; both terrestrial and submarine networks are covered, and the subtleties of each are presented in detail. Chapter 3 presents reliability analysis approaches for terrestrial microwave systems, including short-haul point-to-point, long-haul point-to-point, cellular wireless, and WiFi networks. Satellite communications networks are discussed in Chapter 4; both teleport and VSAT network topologies are covered, along with techniques for calculating propagation availability. Chapter 5 addresses reliability concerns for mobile wireless (cellular) systems. In Chapter 6, the often underanalyzed topics of power systems and heating, ventilation, and air conditioning (HVAC) systems that support communications networks are analyzed. The final chapter (Chapter 7) introduces software and firmware as they relate to telecommunications system reliability. Each section presents the analysis in two discrete parts: the communications equipment and the communications channel. The goal of this book is to give the reader sufficient knowledge to abstract the concepts presented and apply them to their own problems.

The ability to blend academic theory and practical application is a rare commodity in engineering. Few practicing engineers can apply abstract theory to real problems, and even fewer academics have the practical experience to understand the engineering of "real" systems. Telecommunications reliability engineering requires a blend of abstract statistical theory and practical engineering experience. Fortunately, in the case of reliability engineering, this blend is easily understood when the required information is presented in a logical, organized format. The use of predictive and/or numerical models in the design of telecommunications systems brings great value to system designs. Acceptance of these models requires the engineer, manager, and executive to have enough confidence in a model's results that significant decisions can be based on them. That level of confidence can only come from a fusion of reliability engineering academics and experience.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993, or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data

Ayers, Mark L.
Telecommunications system reliability engineering, theory, and practice / Mark L. Ayers.
p. cm.
ISBN 978-1-118-13051-3 (hardback)
1. Telecommunication systems. I. Title.
TK5101.A89 2012
621.382–dc23
2012013009

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

CONTENTS

List of Illustrations
Preface
About the Author
Acronym List
INTRODUCTION

1 RELIABILITY THEORY
1.1 System Metrics
1.2 Statistical Distributions
1.3 System Modeling Techniques
1.4 Systems with Repair
1.5 Markov Chain Models
1.6 Practical Markov System Models
1.7 Monte Carlo Simulation Models
1.8 Repair Period Models
1.9 Equipment Sparing

2 FIBER-OPTIC NETWORKS
2.1 Terrestrial Fiber-Optic Networks
2.2 Submarine Fiber-Optic Networks

3 MICROWAVE NETWORKS
3.1 Long-Haul Microwave Networks
3.2 Short-Haul Microwave Networks
3.3 Local Area Microwave Networks

4 SATELLITE NETWORKS
4.1 Propagation
4.2 Earth Stations
4.3 VSAT Earth Stations
4.4 Earth Stations
4.5 Spacecraft
4.6 Satellite Network Topologies

5 MOBILE WIRELESS NETWORKS
5.1 Mobile Wireless Equipment
5.2 Mobile Wireless Network Systems

6 TELECOMMUNICATIONS FACILITIES
6.1 Power Systems
6.2 Heating, Ventilation, and Air Conditioning Systems

7 SOFTWARE AND FIRMWARE
7.1 Software Failure Mechanisms
7.2 Software Failure Rate Modeling
7.3 Reliability and Availability of Systems with Software Components

References
Index

LIST OF ILLUSTRATIONS

Chapter 1 Reliability Theory

Figure 1.1 Gaussian CDF and associated reliability function R(t)
Figure 1.2 Average availability for system 1 (short duration, frequent outages) and system 2 (long duration, infrequent outages)
Figure 1.3 Bathtub curve for electronic systems
Figure 1.4 Exponential distribution PDF for varying values of λ
Figure 1.5 Exponential distribution CDF for varying values of λ
Figure 1.6 Normal distribution PDF of TTR where μ = 8 h and σ = 2 h
Figure 1.7 Normal distribution CDF of TTR where μ = 8 h and σ = 2 h
Figure 1.8 Weibull distributed random variable for submarine fiber-optic cable TTR
Figure 1.9 Series and parallel reliability block diagrams
Figure 1.10 Series structure reliability block diagram
Figure 1.11 Single-thread satellite link RF chain
Figure 1.12 Parallel structure reliability block diagram
Figure 1.13 Parallel satellite RF chain system
Figure 1.14 One-for-two (1:2) redundant HPA system block diagram
Figure 1.15 Redundant Markov chain state diagram
Figure 1.16 Redundant Markov chain state diagram, identical components
Figure 1.17 Single-component Markov state transition diagram
Figure 1.18 Hot-standby redundant Markov state transition diagram
Figure 1.19 Cold-standby Markov state transition diagram
Figure 1.20 Monte Carlo system analysis algorithm
Figure 1.21 Component model
Figure 1.22 State vector algorithm flow chart
Figure 1.23 Sample state vector algorithm output
Figure 1.24 Serial component state assessment flow diagram
Figure 1.25 Parallel component state assessment flow diagram
Figure 1.26 Exponentially distributed TTR with MTTR = 8 h
Figure 1.27 Normal distributed TTR with MTTR = 8 h, variance = 2 h
Figure 1.28 Centralized warehousing and dispatch sparing approach
Figure 1.29 Territorial warehousing and dispatch sparing approach
Figure 1.30 On-site sparing approach

Chapter 2 Fiber-Optic Networks

Figure 2.1 Shallow-buried fiber-optic cable installation example in western Alaska
Figure 2.2 Terrestrial fiber-optic cable TTF model PDF and CDF
Figure 2.3 Terrestrial fiber-optic cable TTR model PDF and CDF
Figure 2.4 Monte Carlo simulation results for terrestrial fiber-optic cable
Figure 2.5 Terrestrial fiber-optic terminal functional block diagram
Figure 2.6 Unprotected fiber-optic network system block diagram
Figure 2.7 Unprotected fiber-optic network reliability block diagram
Figure 2.8 UPSR ring network topology, normal operation
Figure 2.9 UPSR ring network topology, fiber path failure
Figure 2.10 UPSR ring network topology, transceiver failure
Figure 2.11 Example SONET network topology for Monte Carlo analysis
Figure 2.12 UPSR system model rule set flow chart
Figure 2.13 UPSR system model simulation results
Figure 2.14 Submarine fiber-optic network block diagram
Figure 2.15 Submarine line terminal equipment functional block diagram
Figure 2.16 Power feed equipment operation, nominal and failure
Figure 2.17 Normal distributed submarine cable TTR model
Figure 2.18 Sample submarine system with 10 periodic repeaters
Figure 2.19 Submarine repeater RBD

Chapter 3 Microwave Networks

Figure 3.1 Long-haul microwave network tower in western Alaska
Figure 3.2 Multipath signal propagation
Figure 3.3 Multipath outage event model using uniform occurrence distribution
Figure 3.4 Multihop microwave radio link in a low-intensity rain region
Figure 3.5 Long-haul microwave radio block diagram
Figure 3.6 Microwave tower damaged by ice formation
Figure 3.7 Ice bridge infrastructure damaged by ice formation
Figure 3.8 Long-haul microwave antenna mount damaged by ice formation
Figure 3.9 Sample microwave radio block diagram
Figure 3.10 Two-hop radio transceiver system (one-for-two redundancy)
Figure 3.11 Single-thread transceiver system RBD
Figure 3.12 One-for-one redundant transceiver system RBD
Figure 3.13 One-for-two redundant transceiver system RBD
Figure 3.14 Two-hop radio link serial transceiver RBD
Figure 3.15 Microwave TRX path reliability comparison
Figure 3.16 Long-haul microwave network multiplexed baseband OC-3 interface
Figure 3.17 Single-hop long-haul microwave network block diagram
Figure 3.18 Single-hop long-haul microwave radio system model rule set
Figure 3.19 Single-hop long-haul microwave radio system availability
Figure 3.20 Single-hop long-haul microwave radio downtime distribution
Figure 3.21 Three-hop long-haul microwave availability analysis
Figure 3.22 Short-haul microwave fiber optic ring network restoral path
Figure 3.23 Short-haul microwave cellular network backhaul application
Figure 3.24 Short-haul microwave urban structure application
Figure 3.25 Short-haul cellular backhaul microwave radio
Figure 3.26 Unlicensed short-haul commercial service microwave radio
Figure 3.27 Short-haul microwave availability for redundant and single-thread designs at varying MTTR values
Figure 3.28 Point-to-point versus local area network topology failure modes
Figure 3.29 Generic local area microwave network elements
Figure 3.30 Local area wireless network heat map coverage region
Figure 3.31 Wi-Fi access point functional block diagram
Figure 3.32 Radio design types, integrated versus split (ODU/IDU)
Figure 3.33 Sample Wi-Fi local area wireless network diagram

Chapter 4 Satellite Networks

Figure 4.1 Satellite earth station multipath condition sketch
Figure 4.2 Generalized satellite earth station equipment complement
Figure 4.3 Remote VSAT signal chain block diagram
Figure 4.4 VSAT station reliability block diagram
Figure 4.5 C-band satellite earth station constructed in Nome, Alaska
Figure 4.6 Typical earth station RF chain block diagram
Figure 4.7 Nonredundant earth station reliability block diagram
Figure 4.8 Fully redundant earth station system block diagram
Figure 4.9 One-for-two redundant Markov failure state transition diagram
Figure 4.10 Modular satellite power amplifier system block diagram
Figure 4.11 Modular SSPA MTTR distribution model
Figure 4.12 Modular SSPA system availability for three-out-of-four configuration
Figure 4.13 Modular SSPA system availability for seven-out-of-eight configuration
Figure 4.14 In-orbit spare satellite diagram
Figure 4.15 Satellite capacity restoral by in-orbit spare move
Figure 4.16 Satellite capacity restoral by ground station repointing
Figure 4.17 Hub/remote satellite network topology
Figure 4.18 Ku-band hub/remote VSAT network block diagram
Figure 4.19 Ku-band VSAT hub station block diagram
Figure 4.20 Bidirectional point-to-point satellite network block diagram

Chapter 5 Mobile Wireless Networks

Figure 5.1 GSM network block diagram
Figure 5.2 Distributed MSC network block diagram
Figure 5.3 Distributed MSC failure scenario and service continuity
Figure 5.4 Base station subsystem block diagram
Figure 5.5 Mobile wireless base station TRX configuration
Figure 5.6 Markov chain state transition diagram for BTS TRX modules
Figure 5.7 Base station overlap and probability of coverage by multiple stations
Figure 5.8 Network switching subsystem packet switching redundancy
Figure 5.9 Example GSM cellular wireless network

Chapter 6 Telecommunications Facilities

Figure 6.1 Primary power system redundancy configurations
Figure 6.2 Weibull distribution fit to transformer TTF and downtime empirical data
Figure 6.3 Single-thread generator system block diagram
Figure 6.4 Single-thread generator TTF and TTR for a village environment
Figure 6.5 Single-thread generator system availability
Figure 6.6 Cold-standby redundant generator system block diagram
Figure 6.7 Cold-standby redundant generator system availability
Figure 6.8 Load-sharing generator system block diagram
Figure 6.9 Load-sharing generator system relaxed TTR model
Figure 6.10 Load-sharing generator system availability
Figure 6.11 Modular rectifier system block diagram
Figure 6.12 1N and soft-fail rectifier design descriptions
Figure 6.13 Soft-fail rectifier system availability distribution
Figure 6.14 48 VDC battery plant block diagram
Figure 6.15 Normal distributed TTR with μ = 12 h and σ = 3 h
Figure 6.16 Availability performance versus battery capacity for single-thread and cold-standby generator systems
Figure 6.17 Fiberglass communications shelter dimensions
Figure 6.18 Room air temperature increase rate for two AC scenarios

Chapter 7 Software and Firmware

Figure 7.1 Sample hardware and software failure rate versus time curve comparison
Figure 7.2 Software reliability improvement failure rate function
Figure 7.3 Software feature addition and upgrade failure rate function
Figure 7.4 Aggregate software failure rate trajectory for reliability improvement and feature addition
Figure 7.5 Component block diagram consisting of hardware and software
Figure 7.6 Discrete hardware and software component reliability functions
Figure 7.7 Total component reliability function for hardware and software
Figure 7.8 Sample software TTR distribution
Figure 7.9 Software and hardware component availability distributions
Figure 7.10 Combined component availability including software and hardware components

PREFACE

The topic of reliability is somewhat obscure within the field of electrical (and ultimately communications) engineering. Most engineers are familiar with the concept of reliability as it relates to their automobile, electronic devices, or home, but performing a rigorous mathematical analysis is not always a comfortable or familiar task. The quantitative treatment of reliability has a long-standing tradition within the field of telecommunications, dating back to the early days of Bell Laboratories.

Modern society has developed an insatiable dependence on communication technology, and that dependence demands a complete understanding and analysis of system reliability. Although the technical innovations developed in modern communications are astonishing engineering marvels, the reliability analysis of these systems is sometimes treated as a cursory afterthought. Even where system reliability and availability performance is treated with the highest concern, the sophistication of analysis techniques frequently lags behind the technical development itself.

The content in this book is a compilation of years of research and analysis of many different telecommunications systems. During the compilation of this research, two primary points became evident to me. First, most communications engineers understand the need for reliability and availability analysis but lack the technical skill and knowledge to execute these analyses confidently. Second, modern communications network topologies demand an approach to analysis that goes beyond the traditional reliability block diagram and exponential distribution assumptions. Modern computing platforms enable engineers to exploit analysis techniques that were not possible when the Bell Laboratories' techniques were developed and presented. This book presents techniques that use computer simulation and random variable models that were not feasible 20 years ago. I hope that readers find in this book the kind of useful resource that I found absent from the academic literature during my research and analysis of communications system reliability. Although compiling the data in this book took me years, my desire is to convey this information to the reader in a matter of hours, enabling engineers to analyze complex problems using basic tools and theories.
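As a minimal illustration of the simulation approach described above (not the book's own model, which is developed in Section 1.7), the Python sketch below estimates the availability of one repairable component whose time to failure is exponential but whose time to repair is normally distributed. The function name `simulate_availability`, the MTBF of 1000 h, the repair-time mean of 8 h and standard deviation of 2 h, and the simulation horizon are all illustrative assumptions; clamping negative repair samples to zero is a simplification.

```python
import random


def simulate_availability(mtbf_h: float, ttr_mean_h: float, ttr_sd_h: float,
                          horizon_h: float, seed: int = 1) -> float:
    """Estimate availability of one repairable component by simulation.

    Assumed models (illustrative only): exponential time to failure with
    mean mtbf_h, and normal time to repair with mean ttr_mean_h and standard
    deviation ttr_sd_h, clamped at zero so no repair takes negative time.
    """
    rng = random.Random(seed)
    clock = 0.0
    uptime = 0.0
    while clock < horizon_h:
        # Up interval: draw a time to failure, truncated at the horizon.
        ttf = min(rng.expovariate(1.0 / mtbf_h), horizon_h - clock)
        uptime += ttf
        clock += ttf
        if clock >= horizon_h:
            break
        # Down interval: draw a repair time, also truncated at the horizon.
        ttr = max(0.0, rng.gauss(ttr_mean_h, ttr_sd_h))
        clock += min(ttr, horizon_h - clock)
    return uptime / horizon_h


if __name__ == "__main__":
    mtbf, mttr, sd = 1000.0, 8.0, 2.0  # hypothetical component parameters
    estimate = simulate_availability(mtbf, mttr, sd, horizon_h=1e7)
    analytic = mtbf / (mtbf + mttr)
    print(f"simulated {estimate:.5f}, analytic {analytic:.5f}")
```

For a single component that alternates between up and down states, the long-run availability depends only on the mean up and down times, so the estimate should land close to MTBF/(MTBF + MTTR), about 0.992 here. Simulation earns its keep when the topology or switching rules make a closed-form answer impractical.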

I would like to thank Tom Plevyak and Veli Sahin for their editing and review of this book. Their help in producing this book has been instrumental to its completion and quality.

I would also like to thank Gene Strid for his contributions to my career and to the development of this book. His mentoring spirit and attention to detail have had a significant influence on my personal development as a professional engineer. Gene's technical review of this book alone is impressive in its detail and breadth. Thank you, Gene, for everything you have done to help me remain inspired to grow and learn as an engineer and a leader.

ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporation, headquartered in Anchorage, Alaska. Mark has a broad range of telecommunications experience, including work in fiber optics, microwave radio, and satellite network designs. Mark holds a BS degree in Mathematics from the University of Alaska Anchorage and an MS degree in Electrical Engineering from the University of Alaska Fairbanks, Fairbanks, Alaska. He is a registered Professional Electrical Engineer in the State of Alaska and a Senior Member of the IEEE. Mark teaches a variety of courses as an Adjunct Faculty Member in the Engineering Department at the University of Alaska Anchorage. His primary interests are systems design, modeling, and optimization.

ACRONYM LIST

AC Alternating current
ACM Adaptive coding and modulation
AGM Absorbed glass mat
AP Access point
AuC Authentication center
BLSR Bidirectional line switched ring
BSC Base station controller
BTS Base transceiver station
BTU British thermal unit
BUC Block upconverter
CDF Cumulative distribution function
CDMA Code division multiple access
COTS Commercial off the shelf
CPE Customer premises equipment
CRAC Computer room air conditioner
DC Direct current
EDFA Erbium-doped fiber amplifier
EIR Equipment identity register
EIRP Equivalent isotropic radiated power
FCC Federal Communications Commission
FITs Failures in time
FMEA Failure mode and effects analysis
FPGA Field-programmable gate array
FSS Fixed satellite service
GSM Global system for mobile communications
HLR Home location register
HVAC Heating, ventilation, and air conditioning
IDU Indoor unit
IEEE Institute of Electrical and Electronics Engineers
ISM Industrial, scientific, and medical
ITU International Telecommunication Union
LHS Left-hand side
LNA Low-noise amplifier
LNB Low-noise block
LTE Line-terminating equipment
LTE Long-term evolution
MDT Mean downtime
MODEM Modulator-demodulator
MSC Mobile switching center
MTBF Mean time between failures
MTTF Mean time to failure
MTTR Mean time to repair
NASA National Aeronautics and Space Administration
NSS Network switching subsystem
OC-n Optical carrier level n
ODU Outdoor unit
PDF Probability density function
PFE Power feed equipment
PM Preventative maintenance
RBD Reliability block diagram
RF Radio frequency
RHS Right-hand side
RMA Return material authorization
RSL Received signal level
SDH Synchronous digital hierarchy
SES Severely errored second
SLA Service-level agreement
SLTE Submarine line-terminating equipment
SMS Short message service
SONET Synchronous optical network
SP Service provider
SRGM Software reliability growth model
SSPA Solid state power amplifier
TDM Time division multiplexing
TRX Transceiver
TTF Time to failure
TTR Time to repair
UMTS Universal mobile telecommunications system
UPS Uninterruptible power supply
UPSR Unidirectional path switched ring
VLR Visitor location register
VLSI Very large-scale integration
VRLA Valve-regulated lead acid
VSAT Very-small-aperture terminal
WiFi Wireless fidelity
XPIC Cross-polarization interference cancellation

INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and influences our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically judges reliability by emotion and perception. In many cases, even technically minded people do not have a clear quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production resulted in an increased awareness of item failure and performance and its impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further identified the need to quantify reliability and its impact on mission success. During the Second World War, vacuum tubes were used extensively in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering, Theory, and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s continued the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. At the end of the 1960s, the launch of commercial communications satellites by INTELSAT and the Moon landing by the United States demonstrated the importance of reliability engineering as part of the systems engineering process. In the 1980s, semiconductor development, military applications, communications systems, biomedical research, and software-based systems led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today's society. Most consumers are implicitly aware of reliability as a measure of an item's performance and overall value. Engineers and technical resources are aware of an item's reliability in a more quantitative sense, but often this understanding is neither complete nor founded on solid reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be based on solid theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures are used to guide the design process and to ensure that the system design complies with system performance targets. Modern predictive reliability analysis uses statistical models of component failures, and these models are used to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows engineers to perform cost/performance trade-off analyses. The analyses can then be used to make business and technical decisions about the best design that meets target requirements.
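The trade-off loop described above can be sketched in a few lines. The Python example below is a minimal sketch, assuming independent component failures and repairs, perfect switchover, and the steady-state availability A = MTBF/(MTBF + MTTR) for each component; the series and parallel rules are the standard reliability block diagram combinations. The component names and MTBF/MTTR figures are hypothetical, chosen only to show how a topology change (adding a redundant radio) is reanalyzed.

```python
from math import prod


def component_availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of a single repairable component."""
    return mtbf_h / (mtbf_h + mttr_h)


def series(*avail: float) -> float:
    """All blocks must be up, so the availabilities multiply."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """At least one block must be up: 1 minus the product of unavailabilities."""
    return 1.0 - prod(1.0 - a for a in avail)


# Hypothetical component figures (hours) for one radio terminal.
radio = component_availability(mtbf_h=50_000, mttr_h=8)
power = component_availability(mtbf_h=20_000, mttr_h=4)

# Candidate topologies: single-thread versus a redundant radio pair.
single_thread = series(radio, power)
redundant_radio = series(parallel(radio, radio), power)

print(f"single thread:   {single_thread:.6f}")
print(f"redundant radio: {redundant_radio:.6f}")
```

Other topologies are a matter of composing `series` and `parallel` differently; the cost of each candidate would be tabulated alongside its availability to complete the trade-off.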

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or operator to measure system performance and compare it with expected or predicted performance. Empirical data collection is particularly important in large production environments, where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be used directly to analyze the predicted performance of a new system. One must be careful when using empirical data for predictive analysis, because it is rare to find an existing system that exactly matches a new design.
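A minimal sketch of the comparison step, assuming a simple outage log that records each outage's start time and duration: the Python example computes observed availability and mean repair time over an observation window and checks them against a model prediction. The `Outage` record, the log entries, and the predicted value of 0.9995 are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class Outage:
    """One recorded service outage (hypothetical log format)."""
    start_h: float     # hours from the start of the observation window (not used below)
    duration_h: float  # hours until service was restored


def observed_availability(outages: Sequence[Outage], window_h: float) -> float:
    """Fraction of the observation window during which service was up."""
    downtime_h = sum(o.duration_h for o in outages)
    return 1.0 - downtime_h / window_h


def observed_mttr(outages: Sequence[Outage]) -> float:
    """Mean of the recorded outage durations."""
    return mean(o.duration_h for o in outages)


# Hypothetical one-year log and a hypothetical model prediction.
WINDOW_H = 8760.0
log = [Outage(1200.0, 3.5), Outage(4100.0, 1.0), Outage(7900.0, 6.5)]
predicted = 0.9995

observed = observed_availability(log, WINDOW_H)
status = "meets" if observed >= predicted else "falls short of"
print(f"observed A = {observed:.5f}, MTTR = {observed_mttr(log):.2f} h; "
      f"this {status} the predicted {predicted}")
```

With only three outages the observed figures carry large statistical uncertainty, which is why the text stresses large production environments where enough failures accumulate to be meaningful.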

One of the most significant benefits of empirical analysis and data collection is failure mode and effects analysis (FMEA). This approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and indicates a system's "uptime," the time during which it is available for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) will sign a contract to provide a service that has a specific target probability of being available or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, the SP and the customer take on risk in signing the contract. This risk is sometimes realized risk (the party is aware of the risk, whether quantified or not) and sometimes unrealized risk (the party is taking risk without being aware of it). Decisions made while assuming unrealized risk can jeopardize a business. Reliability engineering of telecommunications systems serves to reduce overall risk in both the realized and unrealized cases.
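Because an SLA may state either a probability of being available or a maximum downtime over an interval, it helps to convert between the two. The short Python sketch below does this, assuming a 365-day (8,760-hour) year; the function names are illustrative only.

```python
HOURS_PER_YEAR = 8760.0  # assumed 365-day year


def allowed_downtime_min(availability: float,
                         period_h: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime (minutes) per period consistent with a target availability."""
    return (1.0 - availability) * period_h * 60.0


def required_availability(max_downtime_min: float,
                          period_h: float = HOURS_PER_YEAR) -> float:
    """Availability implied by a maximum-downtime commitment over a period."""
    return 1.0 - max_downtime_min / (period_h * 60.0)


for target in (0.999, 0.9999, 0.99999):
    yearly = allowed_downtime_min(target)
    monthly = allowed_downtime_min(target, HOURS_PER_YEAR / 12)
    print(f"A = {target}: {yearly:8.2f} min/year, {monthly:6.2f} min/month")
```

For example, a 0.9999 ("four nines") target allows about 52.6 minutes of downtime per year. Stating the period explicitly matters, because the same availability permits proportionally less downtime over a month than over a year.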

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allow managers and technical resources to design systems that achieve the required targets at minimum cost and with maximum performance.

Analysis of telecommunications systems requires specialized application of reliabil-ity engineering theory and principles Performance expectations within the field oftelecommunications can range from high to extreme Rarely do consumers of tele-communications expect less than highly available systems This is true even of consumerservices such as cable television consumer Internet and local telephone serviceCommercial service expectations are typically higher than those in a consumer environ-ment because the impact on the business may be significant and costly if their tele-communications services are critical to their operations delivery of service and ability togenerate revenues Performing detailed analyses of systems both consumer and com-mercial allow risks to be managed and costs to be controlled These analyses allow thedesigner to produce a system that is carefully crafted to just meet the requirements of thecustomer rather than greatly exceed them or completely miss the target In the case ofoperational systems knowledge of the achievable system performance and its maintain-ability allows the operator to understand whether their achieved performance is withinspecification and to optimize maintenance and repair efforts

This book is written with the goal of providing the reader with the knowledge andskills necessary to perform telecommunications system reliability analysis and to

INTRODUCTION 3

examine system designs with a critical eye Telecommunications service providersfrequently provide service to customers who know what they would like to purchasewhether it is wireless or terrestrial packet or TDM It is far less frequent that thecustomer understands how to specify system availability or reliability Knowledge of thetheory and practice of reliability engineering allows service providers and engineers toeducate their customers regarding this important metric of network performance Evenif the reader does not perform firsthand reliability analysis the knowledge gained bystudying both the theory and the practice of reliability engineering allows the individualto make more informed better decisions about design and operation of telecommu-nications systems or the purchase of telecommunications services The truly pervasivenature of reliability as a metric in telecommunications systems requires engineersmanagers and executives to have extensive knowledge of system topologies costs andperformance In many cases these system details are obtained through experience andpractice The author of this book would argue that experience without academic studyparticularly in the field of reliability engineering results in decisions that at timesinvoke unrealized serious business risk

The reader is expected to have a basic working knowledge of engineering mathematics. A college-level course in probability and statistics is of particular value to the reader. This book relies extensively on the application and use of statistical distributions and probability models. Experience with telecommunications system design and network topologies is valuable in understanding the trade-offs involved with different reliability analyses. Lastly, if the reader has interest in developing his or her own reliability models, knowledge of MATLAB and computer programming methods is of value. All of the topics presented in this book are intended to provide sufficient depth to enable the reader to either work with them directly or conduct minimal further research in order to obtain a complete understanding of a topic.

The previous paragraph should allow readers to identify themselves as members of a specific group. These groups can generally be classified as one of the following: engineers, managers, or executives. Engineers can use this book as a complete technical resource for building and analyzing system models. The engineer reader who uses this book will have the ability to develop complex, detailed statistical models of telecommunications systems that produce a variety of system metrics that can be used for business, design, and other technical decisions. Managers reading this book will derive value from the knowledge obtained about proper reliable system design, contract implications, and operational impacts. Executive readers will find value in the high-level knowledge obtained about design best practices and proper expectations for system performance.

This book is logically organized to provide two distinct sets of information: theory and applications. Chapter 1 introduces and develops the concepts and accepted theories required for system reliability analysis. This includes discussions of probability and statistics, system reliability theory, and system modeling. The remaining chapters of this book are organized by technology subject matter. Chapter 2 discusses fiber-optic networks. Both terrestrial and submarine networks are discussed, with the subtleties of each presented in detail. Chapter 3 presents reliability analysis approaches for terrestrial microwave systems. The discussion includes short-haul point-to-point, long-haul point-to-point, cellular wireless, and WiFi networks. Satellite communications networks are discussed in Chapter 4. Both teleport and VSAT network topologies are discussed, along with propagation availability calculation techniques. Chapter 5 addresses reliability concerns for mobile wireless (cellular) systems. In Chapter 6, the often underanalyzed topics of power systems and heating, ventilation, and air conditioning systems related to communications networks are analyzed. The final chapter (Chapter 7) introduces software and firmware as they relate to telecommunications system reliability. Each section presents the analysis in terms of two discrete parts: the communications equipment and the communications channel. The goal of this book is to provide the reader with sufficient knowledge to abstract the concepts presented and apply them to their own problem statements.

The ability to blend academic theory and practical application is a rare commodity in the field of engineering. Few practicing engineers have the ability to apply abstract theory to real problems, while even fewer academics have the practical experience to understand the engineering of "real" systems. Telecommunications reliability engineering necessitates the blend of abstract statistical theory and practical engineering experience. Fortunately, in the case of reliability engineering, this blend is easily understood when the required information is presented in a logical, organized format. The use of predictive and/or numerical models in the design of telecommunications systems brings great value to system designs. Acceptance of these models requires the engineer, manager, and executive to have enough confidence in the model's results that significant decisions can be made based on them. The ability to place that level of confidence in a model can only come from a fusion of reliability engineering academics and experience.




Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data

Ayers, Mark L.
Telecommunications system reliability engineering, theory, and practice / Mark L. Ayers.

p. cm.
ISBN 978-1-118-13051-3 (hardback)
1. Telecommunication systems. I. Title.
TK5101.A89 2012
621.382–dc23
2012013009

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

CONTENTS

List of Illustrations vii

Preface xiii

About the Author xv

Acronym List xvii

INTRODUCTION 1

1 RELIABILITY THEORY 7

1.1 System Metrics 8

1.2 Statistical Distributions 18

1.3 System Modeling Techniques 25

1.4 Systems with Repair 33

1.5 Markov Chain Models 35

1.6 Practical Markov System Models 41

1.7 Monte Carlo Simulation Models 47

1.8 Repair Period Models 58

1.9 Equipment Sparing 61

2 FIBER-OPTIC NETWORKS 71

2.1 Terrestrial Fiber-Optic Networks 71

2.2 Submarine Fiber-Optic Networks 84

3 MICROWAVE NETWORKS 95

3.1 Long-Haul Microwave Networks 96

3.2 Short-Haul Microwave Networks 117

3.3 Local Area Microwave Networks 124

4 SATELLITE NETWORKS 133

4.1 Propagation 134

4.2 Earth Stations 138

4.3 VSAT Earth Stations 140

4.4 Earth Stations 143

4.5 Spacecraft 156

4.6 Satellite Network Topologies 160

5 MOBILE WIRELESS NETWORKS 171

5.1 Mobile Wireless Equipment 172

5.2 Mobile Wireless Network Systems 182

6 TELECOMMUNICATIONS FACILITIES 187

6.1 Power Systems 188

6.2 Heating, Ventilation, and Air Conditioning Systems 207

7 SOFTWARE AND FIRMWARE 213

7.1 Software Failure Mechanisms 214

7.2 Software Failure Rate Modeling 216

7.3 Reliability and Availability of Systems with Software Components 220

References 227

Index 229

vi CONTENTS

LIST OF ILLUSTRATIONS

Chapter 1 Reliability Theory

Figure 1.1 Gaussian CDF and associated reliability function R(t)
Figure 1.2 Average availability for system 1 (short duration, frequent outages) and system 2 (long duration, infrequent outages)
Figure 1.3 Bathtub curve for electronic systems
Figure 1.4 Exponential distribution PDF for varying values of λ
Figure 1.5 Exponential distribution CDF for varying values of λ
Figure 1.6 Normal distribution PDF of TTR where μ = 8 h and σ = 2 h
Figure 1.7 Normal distribution CDF of TTR where μ = 8 h and σ = 2 h
Figure 1.8 Weibull distributed random variable for submarine fiber-optic cable TTR
Figure 1.9 Series and parallel reliability block diagrams
Figure 1.10 Series structure reliability block diagram
Figure 1.11 Single-thread satellite link RF chain
Figure 1.12 Parallel structure reliability block diagram
Figure 1.13 Parallel satellite RF chain system
Figure 1.14 One-for-two (1:2) redundant HPA system block diagram
Figure 1.15 Redundant Markov chain state diagram
Figure 1.16 Redundant Markov chain state diagram, identical components
Figure 1.17 Single-component Markov state transition diagram
Figure 1.18 Hot-standby redundant Markov state transition diagram
Figure 1.19 Cold-standby Markov state transition diagram
Figure 1.20 Monte Carlo system analysis algorithm
Figure 1.21 Component model
Figure 1.22 State vector algorithm flow chart
Figure 1.23 Sample state vector algorithm output
Figure 1.24 Serial component state assessment flow diagram
Figure 1.25 Parallel component state assessment flow diagram
Figure 1.26 Exponentially distributed TTR with MTTR = 8 h
Figure 1.27 Normal distributed TTR with MTTR = 8 h, variance = 2 h
Figure 1.28 Centralized warehousing and dispatch sparing approach
Figure 1.29 Territorial warehousing and dispatch sparing approach
Figure 1.30 On-site sparing approach


Chapter 2 Fiber-Optic Networks

Figure 2.1 Shallow-buried fiber-optic cable installation example in western Alaska
Figure 2.2 Terrestrial fiber-optic cable TTF model PDF and CDF
Figure 2.3 Terrestrial fiber-optic cable TTR model PDF and CDF
Figure 2.4 Monte Carlo simulation results for terrestrial fiber-optic cable
Figure 2.5 Terrestrial fiber-optic terminal functional block diagram
Figure 2.6 Unprotected fiber-optic network system block diagram
Figure 2.7 Unprotected fiber-optic network reliability block diagram
Figure 2.8 UPSR ring network topology, normal operation
Figure 2.9 UPSR ring network topology, fiber path failure
Figure 2.10 UPSR ring network topology, transceiver failure
Figure 2.11 Example SONET network topology for Monte Carlo analysis
Figure 2.12 UPSR system model rule set flow chart
Figure 2.13 UPSR system model simulation results
Figure 2.14 Submarine fiber-optic network block diagram
Figure 2.15 Submarine line terminal equipment functional block diagram
Figure 2.16 Power feed equipment operation, nominal and failure
Figure 2.17 Normal distributed submarine cable TTR model
Figure 2.18 Sample submarine system with 10 periodic repeaters
Figure 2.19 Submarine repeater RBD

Chapter 3 Microwave Networks

Figure 3.1 Long-haul microwave network tower in western Alaska
Figure 3.2 Multipath signal propagation
Figure 3.3 Multipath outage event model using uniform occurrence distribution
Figure 3.4 Multihop microwave radio link in a low-intensity rain region
Figure 3.5 Long-haul microwave radio block diagram
Figure 3.6 Microwave tower damaged by ice formation
Figure 3.7 Ice bridge infrastructure damaged by ice formation
Figure 3.8 Long-haul microwave antenna mount damaged by ice formation
Figure 3.9 Sample microwave radio block diagram
Figure 3.10 Two-hop radio transceiver system (one-for-two redundancy)
Figure 3.11 Single-thread transceiver system RBD
Figure 3.12 One-for-one redundant transceiver system RBD
Figure 3.13 One-for-two redundant transceiver system RBD
Figure 3.14 Two-hop radio link serial transceiver RBD
Figure 3.15 Microwave TRX path reliability comparison
Figure 3.16 Long-haul microwave network multiplexed baseband OC-3 interface


Figure 3.17 Single-hop long-haul microwave network block diagram
Figure 3.18 Single-hop long-haul microwave radio system model rule set
Figure 3.19 Single-hop long-haul microwave radio system availability
Figure 3.20 Single-hop long-haul microwave radio downtime distribution
Figure 3.21 Three-hop long-haul microwave availability analysis
Figure 3.22 Short-haul microwave fiber optic ring network restoral path
Figure 3.23 Short-haul microwave cellular network backhaul application
Figure 3.24 Short-haul microwave urban structure application
Figure 3.25 Short-haul cellular backhaul microwave radio
Figure 3.26 Unlicensed short-haul commercial service microwave radio
Figure 3.27 Short-haul microwave availability for redundant and single-thread designs at varying MTTR values
Figure 3.28 Point-to-point versus local area network topology failure modes
Figure 3.29 Generic local area microwave network elements
Figure 3.30 Local area wireless network heat map coverage region
Figure 3.31 Wi-Fi access point functional block diagram
Figure 3.32 Radio design types, integrated versus split (ODU/IDU)
Figure 3.33 Sample Wi-Fi local area wireless network diagram

Chapter 4 Satellite Networks

Figure 4.1 Satellite earth station multipath condition sketch
Figure 4.2 Generalized satellite earth station equipment complement
Figure 4.3 Remote VSAT signal chain block diagram
Figure 4.4 VSAT station reliability block diagram
Figure 4.5 C-band satellite earth station constructed in Nome, Alaska
Figure 4.6 Typical earth station RF chain block diagram
Figure 4.7 Nonredundant earth station reliability block diagram
Figure 4.8 Fully redundant earth station system block diagram
Figure 4.9 One-for-two redundant Markov failure state transition diagram
Figure 4.10 Modular satellite power amplifier system block diagram
Figure 4.11 Modular SSPA MTTR distribution model
Figure 4.12 Modular SSPA system availability for three-out-of-four configuration
Figure 4.13 Modular SSPA system availability for seven-out-of-eight configuration
Figure 4.14 In-orbit spare satellite diagram
Figure 4.15 Satellite capacity restoral by in-orbit spare move
Figure 4.16 Satellite capacity restoral by ground station repointing
Figure 4.17 Hub/remote satellite network topology
Figure 4.18 Ku-band hub/remote VSAT network block diagram
Figure 4.19 Ku-band VSAT hub station block diagram
Figure 4.20 Bidirectional point-to-point satellite network block diagram


Chapter 5 Mobile Wireless Networks

Figure 5.1 GSM network block diagram
Figure 5.2 Distributed MSC network block diagram
Figure 5.3 Distributed MSC failure scenario and service continuity
Figure 5.4 Base station subsystem block diagram
Figure 5.5 Mobile wireless base station TRX configuration
Figure 5.6 Markov chain state transition diagram for BTS TRX modules
Figure 5.7 Base station overlap and probability of coverage by multiple stations
Figure 5.8 Network switching subsystem packet switching redundancy
Figure 5.9 Example GSM cellular wireless network

Chapter 6 Telecommunications Facilities

Figure 6.1 Primary power system redundancy configurations
Figure 6.2 Weibull distribution fit to transformer TTF and downtime empirical data
Figure 6.3 Single-thread generator system block diagram
Figure 6.4 Single-thread generator TTF and TTR for a village environment
Figure 6.5 Single-thread generator system availability
Figure 6.6 Cold-standby redundant generator system block diagram
Figure 6.7 Cold-standby redundant generator system availability
Figure 6.8 Load-sharing generator system block diagram
Figure 6.9 Load-sharing generator system relaxed TTR model
Figure 6.10 Load-sharing generator system availability
Figure 6.11 Modular rectifier system block diagram
Figure 6.12 1:N and soft-fail rectifier design descriptions
Figure 6.13 Soft-fail rectifier system availability distribution
Figure 6.14 48 VDC battery plant block diagram
Figure 6.15 Normal distributed TTR with μ = 12 h and σ = 3 h
Figure 6.16 Availability performance versus battery capacity for single-thread and cold-standby generator systems
Figure 6.17 Fiberglass communications shelter dimensions
Figure 6.18 Room air temperature increase rate for two AC scenarios

Chapter 7 Software and Firmware

Figure 7.1 Sample hardware and software failure rate versus time curve comparison
Figure 7.2 Software reliability improvement failure rate function
Figure 7.3 Software feature addition and upgrade failure rate function
Figure 7.4 Aggregate software failure rate trajectory for reliability improvement and feature addition


Figure 7.5 Component block diagram consisting of hardware and software
Figure 7.6 Discrete hardware and software component reliability functions
Figure 7.7 Total component reliability function for hardware and software
Figure 7.8 Sample software TTR distribution
Figure 7.9 Software and hardware component availability distributions
Figure 7.10 Combined component availability including software and hardware components


PREFACE

The topic of reliability is somewhat obscure within the field of electrical (and ultimately communications) engineering. Most engineers are familiar with the concept of reliability as it relates to their automobile, electronic device, or home, but performing a rigorous mathematical analysis is not always a comfortable or familiar task. The quantitative treatment of reliability has a long-standing tradition within the field of telecommunications, dating back to the early days of Bell Laboratories.

Modern society has developed an insatiable dependence on communication technology that demands a complete understanding and analysis of system reliability. Although the technical innovations developed in modern communications are astonishing engineering marvels, the reliability analysis of these systems is sometimes treated as a cursory afterthought. Even in cases where analysis of system reliability and availability performance is treated with the highest concern, the sophistication of analysis techniques frequently lags behind the technical development itself.

The content in this book is a compilation of years of research and analysis of many different telecommunications systems. During the compilation of this research, two primary points became evident to me. First, most communications engineers understand the need for reliability and availability analysis but lack the technical skill and knowledge to execute these analyses confidently. Second, modern communications network topologies demand an approach to analysis that goes beyond the traditional reliability block diagram and exponential distribution assumptions. Modern computing platforms enable engineers to exploit analysis techniques not possible in the days when the Bell Laboratories' techniques were developed and presented. This book presents techniques that utilize computer simulation and random variable models not feasible 20 years ago. I hope that readers of this book find within it a useful resource that I found absent in the academic literature during my research and analysis of communications system reliability. Although compilation of the data in this book took me years, it is my desire to convey this information to the reader in a matter of hours, enabling engineers to analyze complex problems using basic tools and theories.

I would like to thank Tom Plevyak and Veli Sahin for their editing and review of this book. Their help in producing this book has been instrumental to its completion and quality.


I would also like to thank Gene Strid for his contributions to my career and to the development of this book. His mentoring spirit and attention to detail have had a significant influence on my personal development as a professional engineer. Gene's technical review of this book alone is impressive in its detail and breadth. Thank you, Gene, for everything you have done to help me remain inspired to grow and learn as an engineer and a leader.


ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporation, headquartered in Anchorage, Alaska. Mark has a broad range of telecommunications experience, including work in fiber optics, microwave radio, and satellite network designs. Mark holds a BS degree in Mathematics from the University of Alaska Anchorage and an MS degree in Electrical Engineering from the University of Alaska Fairbanks, Fairbanks, Alaska. He is a registered Professional Electrical Engineer in the State of Alaska and a Senior Member of the IEEE. Mark teaches a variety of courses as an Adjunct Faculty Member in the Engineering Department at the University of Alaska Anchorage. His primary interests are systems design, modeling, and optimization.


ACRONYM LIST

AC Alternating current
ACM Adaptive coding and modulation
AGM Absorbed glass mat
AP Access point
AuC Authentication center
BLSR Bidirectional line switched ring
BSC Base station controller
BTS Base transceiver station
BTU British thermal unit
BUC Block upconverter
CDF Cumulative distribution function
CDMA Code division multiple access
COTS Commercial off the shelf
CPE Customer premises equipment
CRAC Computer room air conditioner
DC Direct current
EDFA Erbium-doped fiber amplifier
EIR Equipment identity register
EIRP Equivalent isotropic radiated power
FCC Federal Communications Commission
FITs Failures in time
FMEA Failure mode and effects analysis
FPGA Field-programmable gate array
FSS Fixed satellite service
GSM Global system for mobile communications
HLR Home location register
HVAC Heating, ventilation, and air conditioning
IDU Indoor unit
IEEE Institute of Electrical and Electronics Engineers
ISM Industrial, scientific, and medical
ITU International Telecommunication Union
LHS Left-hand side
LNA Low-noise amplifier
LNB Low-noise block
LTE Line-terminating equipment


LTE Long-term evolution
MDT Mean downtime
MODEM Modulator/demodulator
MSC Mobile switching center
MTBF Mean time between failures
MTTF Mean time to failure
MTTR Mean time to repair
NASA National Aeronautics and Space Administration
NSS Network switching subsystem
OC-n Optical carrier level n
ODU Outdoor unit
PDF Probability density function
PFE Power feed equipment
PM Preventative maintenance
RBD Reliability block diagram
RF Radio frequency
RHS Right-hand side
RMA Return material authorization
RSL Received signal level
SDH Synchronous digital hierarchy
SES Severely errored second
SLA Service-level agreement
SLTE Submarine line-terminating equipment
SMS Short message service
SONET Synchronous optical network
SP Service provider
SRGM Software reliability growth model
SSPA Solid-state power amplifier
TDM Time division multiplexing
TRX Transceiver
TTF Time to failure
TTR Time to repair
UMTS Universal mobile telecommunications system
UPS Uninterruptible power supply
UPSR Unidirectional path switched ring
VLR Visitor location register
VLSI Very large-scale integration
VRLA Valve-regulated lead acid
VSAT Very-small-aperture terminal
WiFi Wireless fidelity
XPIC Cross-polarization interference cancellation


INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and impacts our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically equates emotion and perception with reliability. In many cases, even technically minded people do not have a clear quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production resulted in an increased awareness of item failure and performance and its impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further identified the need to begin quantifying reliability and its impacts on mission success. During the Second World War, vacuum tubes were extensively used in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering, Theory, and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s continued the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. At the end of the 1960s, the launch of commercial communications satellites by INTELSAT and the landing on the Moon by the United States proved the importance of reliability engineering as part of the system engineering process. Semiconductor development, military applications, communications systems, biomedical research, and software-based systems in the 1980s led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today's society. Most consumers are unconsciously aware of reliability as a measure of an item's performance and overall value. Engineers and technical resources are aware of an item's reliability in a more quantitative sense, but many times this understanding is neither complete nor founded in solid reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be based in solid theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures are used to guide the design process and ensure that the system design is compliant with system performance targets. Modern predictive reliability analysis utilizes statistical modeling of component failures. These statistical models are used to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows engineers to perform cost/performance trade-off analyses. The analyses can then be used to make business and technical decisions about the best design that meets target requirements.
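As a minimal sketch of the trade-off loop described above, the Python below predicts steady-state availability for two hypothetical topologies: a single-thread design and one with a redundant radio. It assumes the standard steady-state relation A = MTBF/(MTBF + MTTR), independent failures and repairs, and made-up component values; the function names and numbers are illustrative only and are not the models developed later in this book.

```python
"""Sketch: compare predicted availability of two candidate topologies.

Assumptions (illustrative only): A = MTBF / (MTBF + MTTR), independent
failures and repairs, and hypothetical MTBF/MTTR values.
"""


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of a single repairable component."""
    return mtbf_h / (mtbf_h + mttr_h)


def series(*avail: float) -> float:
    """Series structure: every component must be up, so availabilities multiply."""
    result = 1.0
    for a in avail:
        result *= a
    return result


def parallel(*avail: float) -> float:
    """Parallel structure: the system is down only if every component is down."""
    unavail = 1.0
    for a in avail:
        unavail *= 1.0 - a
    return 1.0 - unavail


# Hypothetical component parameters, in hours
radio = availability(mtbf_h=50_000, mttr_h=8)
power = availability(mtbf_h=20_000, mttr_h=12)

# Candidate designs: single-thread versus a redundant (parallel) radio pair
single_thread = series(radio, power)
redundant_radio = series(parallel(radio, radio), power)

print(f"single-thread:   {single_thread:.6f}")
print(f"redundant radio: {redundant_radio:.6f}")
```

In practice the redundant design's improvement would be weighed against the cost of the extra radio; here the shared power system limits how much the redundancy can help, which is exactly the kind of insight the reanalysis step is meant to surface.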

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or the operator to measure system performance and compare that performance with expected or predicted data. Empirical data collection is particularly important in large production environments where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be directly used to analyze the predicted performance of a new system. One must be careful when using empirical data for predictive analysis, because it is rare to find an existing system that exactly matches a new design.
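A minimal sketch of the comparison step, assuming observed availability is estimated as one minus total outage time divided by the observation period. The outage log and the predicted value are hypothetical. With only a handful of events in a single year the estimate is noisy, which is why data from large populations of deployed systems is more informative.

```python
"""Sketch: compare measured availability from a (hypothetical) outage log
with a predicted value."""


def observed_availability(outage_hours: list[float], period_hours: float) -> float:
    """Fraction of the observation period during which the service was up."""
    if period_hours <= 0:
        raise ValueError("observation period must be positive")
    downtime = sum(outage_hours)
    return 1.0 - downtime / period_hours


# Hypothetical one-year log of outage durations (hours) for a deployed link
outages = [2.5, 0.75, 6.0]
HOURS_PER_YEAR = 8760.0  # assumes a 365-day year

measured = observed_availability(outages, HOURS_PER_YEAR)
predicted = 0.9995  # hypothetical value from a predictive model

print(f"measured {measured:.5f} vs predicted {predicted:.5f}")
if measured < predicted:
    print("observed performance falls short of the prediction; revisit the model")
```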


One of the most significant benefits of empirical analysis and data collection is failure mode and effects analysis (FMEA). This analysis approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and is indicative of a system's "uptime," or available time for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) will sign a contract to provide a service that has a specific target probability of being available or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, the SP and the customer take on risk in signing the contract. This risk is sometimes realized risk (the party is aware of the risk, quantified or not) or unrealized risk (the party is taking risk and is not aware that it is in jeopardy). Decisions made while assuming unrealized risk can jeopardize a business. Reliability engineering of systems in telecommunications serves to reduce overall risk in both realized and unrealized cases.
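Because an availability target and a maximum-downtime target are two views of the same metric, one converts directly into the other: allowed downtime equals (1 − A) multiplied by the length of the interval. The sketch below assumes a 365-day (8,760-hour) year; the helper names are hypothetical.

```python
"""Sketch: convert between an SLA availability target and allowed downtime."""

HOURS_PER_YEAR = 8760.0  # assumes a 365-day year


def max_downtime_minutes(availability: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Allowed downtime (minutes) over the period for a given availability target."""
    return (1.0 - availability) * period_hours * 60.0


def availability_from_downtime(downtime_minutes: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Availability implied by a maximum-downtime target over the period."""
    return 1.0 - downtime_minutes / (period_hours * 60.0)


for target in (0.999, 0.9999, 0.99999):
    print(f"{target:.3%} -> {max_downtime_minutes(target):8.2f} min/year")
```

For example, a 99.99% target allows about 52.6 minutes of downtime per year, and each additional "nine" cuts the allowance by a factor of ten.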

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allows managers and technical resources to design systems that achieve the required targets with minimum cost and maximum performance.


4 INTRODUCTION

long-haul point-to-point cellular wireless and WiFi networks Satellite communica-tions networks are discussed in Chapter 4 Both teleport and VSAT network topologiesare discussed along with propagation availability calculation techniques Chapter 5addresses reliability concerns for mobile wireless (cellular) systems In Chapter 6 theoften underanalyzed topics of power systems and heating ventilation and air con-ditioning systems related to communications networks are analyzed The final chapter(Chapter 7) introduces software and firmware as they relate to telecommunicationssystem reliability Each section presents the analysis in terms of two discrete partsThese parts are the communications equipment and the communications channel Thegoal of this book is to provide the reader with sufficient knowledge to abstract and applythe concepts presented to their own problem statement

The ability to blend academic theory and practical application is a rare commodityin the field of engineering Few practicing engineers have the ability to apply abstracttheory to real problems while even fewer academics have the practical experience tounderstand the engineering of ldquorealrdquo systems Telecommunications reliability engineer-ing necessitates the blend of abstract statistical theory and practical engineeringexperience Fortunately in the case of reliability engineering this blend is easilyunderstood when the information required is presented in a logical organized formatThe use of predictive andor numerical models in the design of telecommunicationssystems brings great value to system designs Acceptance of these models requires theengineer manager and executive to have enough confidence in the modelrsquos results sothat significant decisions can be made based on the results of that model The ability toplace that level of confidence in a model can only come from a fusion of reliabilityengineering academics and experience

INTRODUCTION 5

Page 6: Cover - download.e-bookshelf.de · Figure 2.8 UPSR ring network topology, normal operation Figure 2.9 UPSR ring network topology, fiber path failure Figure 2.10 UPSR ring network

CONTENTS

List of Illustrations
Preface
About the Author
Acronym List
INTRODUCTION

1 RELIABILITY THEORY
1.1 System Metrics
1.2 Statistical Distributions
1.3 System Modeling Techniques
1.4 Systems with Repair
1.5 Markov Chain Models
1.6 Practical Markov System Models
1.7 Monte Carlo Simulation Models
1.8 Repair Period Models
1.9 Equipment Sparing

2 FIBER-OPTIC NETWORKS
2.1 Terrestrial Fiber-Optic Networks
2.2 Submarine Fiber-Optic Networks

3 MICROWAVE NETWORKS
3.1 Long-Haul Microwave Networks
3.2 Short-Haul Microwave Networks
3.3 Local Area Microwave Networks

4 SATELLITE NETWORKS
4.1 Propagation
4.2 Earth Stations
4.3 VSAT Earth Stations
4.4 Earth Stations
4.5 Spacecraft
4.6 Satellite Network Topologies

5 MOBILE WIRELESS NETWORKS
5.1 Mobile Wireless Equipment
5.2 Mobile Wireless Network Systems

6 TELECOMMUNICATIONS FACILITIES
6.1 Power Systems
6.2 Heating, Ventilation, and Air Conditioning Systems

7 SOFTWARE AND FIRMWARE
7.1 Software Failure Mechanisms
7.2 Software Failure Rate Modeling
7.3 Reliability and Availability of Systems with Software Components

References
Index

LIST OF ILLUSTRATIONS

Chapter 1 Reliability Theory

Figure 1.1 Gaussian CDF and associated reliability function R(t)
Figure 1.2 Average availability for system 1 (short-duration, frequent outages) and system 2 (long-duration, infrequent outages)
Figure 1.3 Bathtub curve for electronic systems
Figure 1.4 Exponential distribution PDF for varying values of λ
Figure 1.5 Exponential distribution CDF for varying values of λ
Figure 1.6 Normal distribution PDF of TTR where μ = 8 h and σ = 2 h
Figure 1.7 Normal distribution CDF of TTR where μ = 8 h and σ = 2 h
Figure 1.8 Weibull distributed random variable for submarine fiber-optic cable TTR
Figure 1.9 Series and parallel reliability block diagrams
Figure 1.10 Series structure reliability block diagram
Figure 1.11 Single-thread satellite link RF chain
Figure 1.12 Parallel structure reliability block diagram
Figure 1.13 Parallel satellite RF chain system
Figure 1.14 One-for-two (1:2) redundant HPA system block diagram
Figure 1.15 Redundant Markov chain state diagram
Figure 1.16 Redundant Markov chain state diagram, identical components
Figure 1.17 Single-component Markov state transition diagram
Figure 1.18 Hot-standby redundant Markov state transition diagram
Figure 1.19 Cold-standby Markov state transition diagram
Figure 1.20 Monte Carlo system analysis algorithm
Figure 1.21 Component model
Figure 1.22 State vector algorithm flow chart
Figure 1.23 Sample state vector algorithm output
Figure 1.24 Serial component state assessment flow diagram
Figure 1.25 Parallel component state assessment flow diagram
Figure 1.26 Exponentially distributed TTR with MTTR = 8 h
Figure 1.27 Normal distributed TTR with MTTR = 8 h, variance = 2 h
Figure 1.28 Centralized warehousing and dispatch sparing approach
Figure 1.29 Territorial warehousing and dispatch sparing approach
Figure 1.30 On-site sparing approach

Chapter 2 Fiber-Optic Networks

Figure 2.1 Shallow-buried fiber-optic cable installation example in western Alaska
Figure 2.2 Terrestrial fiber-optic cable TTF model PDF and CDF
Figure 2.3 Terrestrial fiber-optic cable TTR model PDF and CDF
Figure 2.4 Monte Carlo simulation results for terrestrial fiber-optic cable
Figure 2.5 Terrestrial fiber-optic terminal functional block diagram
Figure 2.6 Unprotected fiber-optic network system block diagram
Figure 2.7 Unprotected fiber-optic network reliability block diagram
Figure 2.8 UPSR ring network topology, normal operation
Figure 2.9 UPSR ring network topology, fiber path failure
Figure 2.10 UPSR ring network topology, transceiver failure
Figure 2.11 Example SONET network topology for Monte Carlo analysis
Figure 2.12 UPSR system model rule set flow chart
Figure 2.13 UPSR system model simulation results
Figure 2.14 Submarine fiber-optic network block diagram
Figure 2.15 Submarine line terminal equipment functional block diagram
Figure 2.16 Power feed equipment operation, nominal and failure
Figure 2.17 Normal distributed submarine cable TTR model
Figure 2.18 Sample submarine system with 10 periodic repeaters
Figure 2.19 Submarine repeater RBD

Chapter 3 Microwave Networks

Figure 3.1 Long-haul microwave network tower in western Alaska
Figure 3.2 Multipath signal propagation
Figure 3.3 Multipath outage event model using uniform occurrence distribution
Figure 3.4 Multihop microwave radio link in a low-intensity rain region
Figure 3.5 Long-haul microwave radio block diagram
Figure 3.6 Microwave tower damaged by ice formation
Figure 3.7 Ice bridge infrastructure damaged by ice formation
Figure 3.8 Long-haul microwave antenna mount damaged by ice formation
Figure 3.9 Sample microwave radio block diagram
Figure 3.10 Two-hop radio transceiver system (one-for-two redundancy)
Figure 3.11 Single-thread transceiver system RBD
Figure 3.12 One-for-one redundant transceiver system RBD
Figure 3.13 One-for-two redundant transceiver system RBD
Figure 3.14 Two-hop radio link serial transceiver RBD
Figure 3.15 Microwave TRX path reliability comparison
Figure 3.16 Long-haul microwave network multiplexed baseband OC-3 interface
Figure 3.17 Single-hop long-haul microwave network block diagram
Figure 3.18 Single-hop long-haul microwave radio system model rule set
Figure 3.19 Single-hop long-haul microwave radio system availability
Figure 3.20 Single-hop long-haul microwave radio downtime distribution
Figure 3.21 Three-hop long-haul microwave availability analysis
Figure 3.22 Short-haul microwave fiber-optic ring network restoral path
Figure 3.23 Short-haul microwave cellular network backhaul application
Figure 3.24 Short-haul microwave urban structure application
Figure 3.25 Short-haul cellular backhaul microwave radio
Figure 3.26 Unlicensed short-haul commercial service microwave radio
Figure 3.27 Short-haul microwave availability for redundant and single-thread designs at varying MTTR values
Figure 3.28 Point-to-point versus local area network topology failure modes
Figure 3.29 Generic local area microwave network elements
Figure 3.30 Local area wireless network heat map coverage region
Figure 3.31 Wi-Fi access point functional block diagram
Figure 3.32 Radio design types, integrated versus split (ODU/IDU)
Figure 3.33 Sample Wi-Fi local area wireless network diagram

Chapter 4 Satellite Networks

Figure 4.1 Satellite earth station multipath condition sketch
Figure 4.2 Generalized satellite earth station equipment complement
Figure 4.3 Remote VSAT signal chain block diagram
Figure 4.4 VSAT station reliability block diagram
Figure 4.5 C-band satellite earth station constructed in Nome, Alaska
Figure 4.6 Typical earth station RF chain block diagram
Figure 4.7 Nonredundant earth station reliability block diagram
Figure 4.8 Fully redundant earth station system block diagram
Figure 4.9 One-for-two redundant Markov failure state transition diagram
Figure 4.10 Modular satellite power amplifier system block diagram
Figure 4.11 Modular SSPA MTTR distribution model
Figure 4.12 Modular SSPA system availability for three-out-of-four configuration
Figure 4.13 Modular SSPA system availability for seven-out-of-eight configuration
Figure 4.14 In-orbit spare satellite diagram
Figure 4.15 Satellite capacity restoral by in-orbit spare move
Figure 4.16 Satellite capacity restoral by ground station repointing
Figure 4.17 Hub/remote satellite network topology
Figure 4.18 Ku-band hub/remote VSAT network block diagram
Figure 4.19 Ku-band VSAT hub station block diagram
Figure 4.20 Bidirectional point-to-point satellite network block diagram

Chapter 5 Mobile Wireless Networks

Figure 5.1 GSM network block diagram
Figure 5.2 Distributed MSC network block diagram
Figure 5.3 Distributed MSC failure scenario and service continuity
Figure 5.4 Base station subsystem block diagram
Figure 5.5 Mobile wireless base station TRX configuration
Figure 5.6 Markov chain state transition diagram for BTS TRX modules
Figure 5.7 Base station overlap and probability of coverage by multiple stations
Figure 5.8 Network switching subsystem packet switching redundancy
Figure 5.9 Example GSM cellular wireless network

Chapter 6 Telecommunications Facilities

Figure 6.1 Primary power system redundancy configurations
Figure 6.2 Weibull distribution fit to transformer TTF and downtime empirical data
Figure 6.3 Single-thread generator system block diagram
Figure 6.4 Single-thread generator TTF and TTR for a village environment
Figure 6.5 Single-thread generator system availability
Figure 6.6 Cold-standby redundant generator system block diagram
Figure 6.7 Cold-standby redundant generator system availability
Figure 6.8 Load-sharing generator system block diagram
Figure 6.9 Load-sharing generator system relaxed TTR model
Figure 6.10 Load-sharing generator system availability
Figure 6.11 Modular rectifier system block diagram
Figure 6.12 1N and soft-fail rectifier design descriptions
Figure 6.13 Soft-fail rectifier system availability distribution
Figure 6.14 48 VDC battery plant block diagram
Figure 6.15 Normal distributed TTR with μ = 12 h and σ = 3 h
Figure 6.16 Availability performance versus battery capacity for single-thread and cold-standby generator systems
Figure 6.17 Fiberglass communications shelter dimensions
Figure 6.18 Room air temperature increase rate for two AC scenarios

Chapter 7 Software and Firmware

Figure 7.1 Sample hardware and software failure rate versus time curve comparison
Figure 7.2 Software reliability improvement failure rate function
Figure 7.3 Software feature addition and upgrade failure rate function
Figure 7.4 Aggregate software failure rate trajectory for reliability improvement and feature addition
Figure 7.5 Component block diagram consisting of hardware and software
Figure 7.6 Discrete hardware and software component reliability functions
Figure 7.7 Total component reliability function for hardware and software
Figure 7.8 Sample software TTR distribution
Figure 7.9 Software and hardware component availability distributions
Figure 7.10 Combined component availability including software and hardware components

PREFACE

The topic of reliability is somewhat obscure within the field of electrical (and ultimately communications) engineering. Most engineers are familiar with the concept of reliability as it relates to their automobile, electronic device, or home, but performing a rigorous mathematical analysis is not always a comfortable or familiar task. The quantitative treatment of reliability has a long-standing tradition within the field of telecommunications, dating back to the early days of Bell Laboratories.

Modern society has developed an insatiable dependence on communication technology that demands a complete understanding and analysis of system reliability. Although the technical innovations developed in modern communications are astonishing engineering marvels, the reliability analysis of these systems is sometimes treated as a cursory afterthought. Even in cases where system reliability and availability performance is treated with the highest concern, the sophistication of analysis techniques frequently lags behind the technical development itself.

The content in this book is a compilation of years of research and analysis of many different telecommunications systems. During the compilation of this research, two primary points became evident to me. First, most communications engineers understand the need for reliability and availability analysis but lack the technical skill and knowledge to execute these analyses confidently. Second, modern communications network topologies demand an approach to analysis that goes beyond the traditional reliability block diagram and exponential distribution assumptions. Modern computing platforms enable engineers to exploit analysis techniques that were not possible when the Bell Laboratories' techniques were developed and presented. This book presents techniques that use computer simulation and random variable models that were not feasible 20 years ago. I hope that readers of this book find within it a useful resource that I found absent from the academic literature during my research and analysis of communications system reliability. Although compiling the data in this book took me years, it is my desire to convey this information to the reader in a matter of hours, enabling engineers to analyze complex problems using basic tools and theories.

I would like to thank Tom Plevyak and Veli Sahin for their editing and review of this book. Their help in producing this book has been instrumental to its completion and quality.


I would also like to thank Gene Strid for his contributions to my career and to the development of this book. His mentoring spirit and attention to detail have had a significant influence on my personal development as a professional engineer. Gene's technical review of this book alone is impressive in its detail and breadth. Thank you, Gene, for everything you have done to help me remain inspired to grow and learn as an engineer and a leader.


ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporation, headquartered in Anchorage, Alaska. Mark has a broad range of telecommunications experience, including work in fiber optics, microwave radio, and satellite network designs. Mark holds a B.S. degree in Mathematics from the University of Alaska Anchorage and an M.S. degree in Electrical Engineering from the University of Alaska Fairbanks, Fairbanks, Alaska. He is a registered Professional Electrical Engineer in the State of Alaska and a Senior Member of the IEEE. Mark teaches a variety of courses as an Adjunct Faculty Member in the Engineering Department at the University of Alaska Anchorage. His primary interests are systems design, modeling, and optimization.


ACRONYM LIST

AC Alternating current
ACM Adaptive coding and modulation
AGM Absorbed glass mat
AP Access point
AuC Authentication center
BLSR Bidirectional line switched ring
BSC Base station controller
BTS Base transceiver station
BTU British thermal unit
BUC Block upconverter
CDF Cumulative distribution function
CDMA Code division multiple access
COTS Commercial off-the-shelf
CPE Customer premises equipment
CRAC Computer room air conditioner
DC Direct current
EDFA Erbium-doped fiber amplifier
EIR Equipment identity register
EIRP Equivalent isotropic radiated power
FCC Federal Communications Commission
FITs Failures in time
FMEA Failure mode and effects analysis
FPGA Field-programmable gate array
FSS Fixed-satellite service
GSM Global system for mobile communications
HLR Home location register
HVAC Heating, ventilation, and air conditioning
IDU Indoor unit
IEEE Institute of Electrical and Electronics Engineers
ISM Industrial, scientific, and medical
ITU International Telecommunication Union
LHS Left-hand side
LNA Low-noise amplifier
LNB Low-noise block
LTE Line-terminating equipment
LTE Long-term evolution
MDT Mean downtime
MODEM Modulator/demodulator
MSC Mobile switching center
MTBF Mean time between failures
MTTF Mean time to failure
MTTR Mean time to repair
NASA National Aeronautics and Space Administration
NSS Network switching subsystem
OC-n Optical carrier level n
ODU Outdoor unit
PDF Probability density function
PFE Power feed equipment
PM Preventative maintenance
RBD Reliability block diagram
RF Radio frequency
RHS Right-hand side
RMA Return material authorization
RSL Received signal level
SDH Synchronous digital hierarchy
SES Severely errored second
SLA Service-level agreement
SLTE Submarine line-terminating equipment
SMS Short message service
SONET Synchronous optical network
SP Service provider
SRGM Software reliability growth model
SSPA Solid-state power amplifier
TDM Time-division multiplexing
TRX Transceiver
TTF Time to failure
TTR Time to repair
UMTS Universal mobile telecommunications system
UPS Uninterruptible power supply
UPSR Unidirectional path switched ring
VLR Visitor location register
VLSI Very large-scale integration
VRLA Valve-regulated lead acid
VSAT Very-small-aperture terminal
WiFi Wireless fidelity
XPIC Cross-polarization interference cancellation

INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and influences our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically equates reliability with emotion and perception. In many cases, even technically minded people do not have a clear quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production resulted in an increased awareness of item failure and performance and its impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further identified the need to quantify reliability and its impact on mission success. During the Second World War, vacuum tubes were used extensively in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering, Theory and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s continued the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. The launch of commercial communications satellites by INTELSAT and the United States' landing on the Moon at the end of the 1960s demonstrated the importance of reliability engineering as part of the system engineering process. Semiconductor development, military applications, communications systems, biomedical research, and software-based systems in the 1980s led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today's society. Most consumers are implicitly aware of reliability as a measure of an item's performance and overall value. Engineers and technical resources are aware of an item's reliability in a more quantitative sense, but many times this understanding is neither complete nor grounded in solid reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be based on solid theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures are used to guide the design process and ensure that the system design complies with system performance targets. Modern predictive reliability analysis uses statistical models of component failures. These statistical models are used to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows engineers to perform cost/performance trade-off analyses. The analyses can then be used to make business and technical decisions about the best design that meets target requirements.
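To make the cost/performance trade-off concrete, the sketch below compares a single-thread design with one that duplicates a single component. It is a minimal illustration under stated assumptions, not the book's method: it assumes steady-state availability A = MTBF/(MTBF + MTTR) for each block, independent failures, and perfect, instantaneous switchover. Every MTBF/MTTR value and every function name here is hypothetical.

```python
"""Single-thread versus redundant design: a steady-state availability sketch.

Assumptions (illustrative only): independent failures, perfect and
instantaneous switchover, and A = MTBF / (MTBF + MTTR) for each block.
All MTBF/MTTR values are hypothetical.
"""
from functools import reduce

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 min; ignores leap years


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one repairable block."""
    return mtbf_h / (mtbf_h + mttr_h)


def series(*avails: float) -> float:
    """Every block must be up, so availabilities multiply."""
    return reduce(lambda acc, a: acc * a, avails, 1.0)


def parallel(*avails: float) -> float:
    """The group is down only if every block is down."""
    unavail = reduce(lambda acc, a: acc * (1.0 - a), avails, 1.0)
    return 1.0 - unavail


def downtime_min_per_year(a: float) -> float:
    """Expected annual downtime implied by availability a."""
    return (1.0 - a) * MINUTES_PER_YEAR


if __name__ == "__main__":
    radio = availability(mtbf_h=50_000.0, mttr_h=8.0)    # hypothetical radio
    power = availability(mtbf_h=100_000.0, mttr_h=12.0)  # hypothetical power plant

    designs = {
        "single-thread": series(radio, power),
        "1:1 redundant radio": series(parallel(radio, radio), power),
    }
    for name, a in designs.items():
        print(f"{name:20s} A = {a:.6f}  downtime = {downtime_min_per_year(a):6.1f} min/yr")
```

With these placeholder numbers, duplicating the radio removes most of the radio's contribution to downtime, so the unprotected power block becomes the dominant term. Re-running the comparison with other topologies or MTTR values is the kind of reanalysis the paragraph above describes.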

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or the operator to measure system performance and compare that performance with expected or predicted data. Empirical data collection is particularly important in large production environments where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be used directly to analyze the predicted performance of a new system. One must be careful when using empirical data for predictive analysis, because it is rare to find an existing system that exactly matches a new design.
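As a minimal sketch of comparing empirical data with a prediction, the code below estimates observed availability and mean repair time from an outage log and checks the result against a predicted availability. The outage durations, the one-year observation window, the predicted value, and the function names are all hypothetical placeholders rather than field data.

```python
"""Compare observed availability from an outage log with a predicted value.

All numbers below are hypothetical placeholders, not field data.
"""
from statistics import mean
from typing import Sequence


def observed_availability(outage_hours: Sequence[float], window_hours: float) -> float:
    """Fraction of the observation window during which the service was up."""
    downtime = sum(outage_hours)
    if window_hours <= 0 or not 0.0 <= downtime <= window_hours:
        raise ValueError("downtime must lie within a positive observation window")
    return (window_hours - downtime) / window_hours


def observed_mttr(outage_hours: Sequence[float]) -> float:
    """Mean duration of the recorded outages, in hours."""
    if not outage_hours:
        raise ValueError("no outages recorded")
    return mean(outage_hours)


if __name__ == "__main__":
    outages_h = [2.5, 6.0, 11.5, 4.0]  # hypothetical outage durations over one year
    window_h = 8760.0                   # one non-leap year
    predicted_a = 0.9995                # hypothetical model prediction

    a_obs = observed_availability(outages_h, window_h)
    print(f"observed A = {a_obs:.5f}, predicted A = {predicted_a:.5f}")
    print(f"observed MTTR = {observed_mttr(outages_h):.1f} h over {len(outages_h)} outages")
    if a_obs < predicted_a:
        print("observed performance falls short of the model; revisit its assumptions")
```

A handful of outages is a small sample, so a gap between observed and predicted values is a prompt to refine the model rather than proof that it is wrong.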


One of the most significant benefits of empirical analysis and data collection is failure mode and effects analysis (FMEA). This analysis approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and is indicative of a system's “uptime,” or available time for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) will sign a contract to provide a service that has a specific target probability of being available or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, the SP and the customer both take on risk in signing the contract. This risk is sometimes realized risk (the party is aware of the risk, quantified or not) or unrealized risk (the party is taking risk and is not aware that it is in jeopardy). Decisions made while assuming unrealized risk can jeopardize a business. Reliability engineering of telecommunications systems serves to reduce overall risk in both realized and unrealized cases.
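The two SLA forms mentioned above, a target availability and a maximum downtime over an interval, are two views of the same quantity. The sketch below converts between them. The 8,760-hour year (leap years ignored), the 30-day month, and the function names are illustrative assumptions, not terms from any particular contract.

```python
"""Convert between an availability target and allowed downtime.

Assumptions: a 365-day year (8,760 h) and a 30-day month (720 h).
"""

HOURS_PER_YEAR = 365 * 24
HOURS_PER_MONTH = 30 * 24


def max_downtime_hours(availability: float, interval_hours: float = HOURS_PER_YEAR) -> float:
    """Largest total downtime consistent with the availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * interval_hours


def availability_from_downtime(downtime_hours: float, interval_hours: float = HOURS_PER_YEAR) -> float:
    """Availability implied by a downtime allowance over the interval."""
    if not 0.0 <= downtime_hours <= interval_hours:
        raise ValueError("downtime must lie within the interval")
    return 1.0 - downtime_hours / interval_hours


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        per_year = max_downtime_hours(target) * 60.0
        per_month = max_downtime_hours(target, HOURS_PER_MONTH) * 60.0
        print(f"A = {target:.5f}: {per_year:7.2f} min/yr, {per_month:6.2f} min/month")
    # Availability implied by an allowance of 4 h of downtime per year.
    print(f"4 h/yr -> A = {availability_from_downtime(4.0):.6f}")
```

For example, a 99.999 percent (“five nines”) target allows only about 5.3 minutes of downtime per year, whereas 99.9 percent allows roughly 8.8 hours.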

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allows managers and technical resources to design systems that achieve the required targets with minimum cost and maximum performance.

Analysis of telecommunications systems requires specialized application of reliability engineering theory and principles. Performance expectations within the field of telecommunications can range from high to extreme. Rarely do consumers of telecommunications expect less than highly available systems. This is true even of consumer services such as cable television, consumer Internet, and local telephone service. Commercial service expectations are typically higher than those in a consumer environment, because the impact on a business may be significant and costly if its telecommunications services are critical to its operations, delivery of service, and ability to generate revenue. Performing detailed analyses of systems, both consumer and commercial, allows risks to be managed and costs to be controlled. These analyses allow the designer to produce a system that is carefully crafted to just meet the requirements of the customer rather than greatly exceed them or completely miss the target. In the case of operational systems, knowledge of the achievable system performance and its maintainability allows operators to understand whether their achieved performance is within specification and to optimize maintenance and repair efforts.

This book is written with the goal of providing the reader with the knowledge and skills necessary to perform telecommunications system reliability analysis and to examine system designs with a critical eye. Telecommunications service providers frequently provide service to customers who know what they would like to purchase, whether it is wireless or terrestrial, packet or TDM. It is far less frequent that the customer understands how to specify system availability or reliability. Knowledge of the theory and practice of reliability engineering allows service providers and engineers to educate their customers regarding this important metric of network performance. Even if readers do not perform reliability analysis firsthand, the knowledge gained by studying both the theory and the practice of reliability engineering allows them to make more informed, better decisions about the design and operation of telecommunications systems or the purchase of telecommunications services. The truly pervasive nature of reliability as a metric in telecommunications systems requires engineers, managers, and executives to have extensive knowledge of system topologies, costs, and performance. In many cases, these system details are obtained through experience and practice. The author of this book would argue that experience without academic study, particularly in the field of reliability engineering, results in decisions that at times carry serious, unrealized business risk.

The reader is expected to have a basic working knowledge of engineering mathematics. A college-level course in probability and statistics is of particular value to the reader. This book relies extensively on the application and use of statistical distributions and probability models. Experience with telecommunications system design and network topologies is valuable in understanding the trade-offs involved with different reliability analyses. Lastly, if the reader is interested in developing his or her own reliability models, knowledge of MATLAB and computer programming methods is of value. All of the topics presented in this book are intended to provide sufficient depth to enable the reader either to work with them directly or to conduct minimal further research in order to obtain a complete understanding of a topic.

The previous paragraph should allow readers to identify themselves as members of a specific group. These groups can generally be classified as one of the following: engineers, managers, or executives. Engineers can use this book as a complete technical resource for building and analyzing system models. Engineers who use this book will be able to develop complex, detailed statistical models of telecommunications systems that produce a variety of system metrics for business, design, and other technical decisions. Managers reading this book will derive value from the knowledge obtained about proper reliable system design, contract implications, and operational impacts. Executive readers will find value in the high-level knowledge obtained about design best practices and proper expectations for system performance.

This book is logically organized to provide two distinct sets of information: theory and applications. Chapter 1 introduces and develops the concepts and accepted theories required for system reliability analysis. This includes discussions of probability and statistics, system reliability theory, and system modeling. The remaining chapters of this book are organized by technology subject matter. Chapter 2 discusses fiber-optic networks. Both terrestrial and submarine networks are discussed, with the subtleties of each presented in detail. Chapter 3 presents reliability analysis approaches for terrestrial microwave systems. The discussion includes short-haul point-to-point, long-haul point-to-point, cellular wireless, and WiFi networks. Satellite communications networks are discussed in Chapter 4. Both teleport and VSAT network topologies are discussed, along with propagation availability calculation techniques. Chapter 5 addresses reliability concerns for mobile wireless (cellular) systems. In Chapter 6, the often underanalyzed topics of power systems and heating, ventilation, and air conditioning systems related to communications networks are analyzed. The final chapter (Chapter 7) introduces software and firmware as they relate to telecommunications system reliability. Each section presents the analysis in terms of two discrete parts: the communications equipment and the communications channel. The goal of this book is to provide the reader with sufficient knowledge to abstract the concepts presented and apply them to their own problem statements.

The ability to blend academic theory and practical application is a rare commodity in the field of engineering. Few practicing engineers have the ability to apply abstract theory to real problems, while even fewer academics have the practical experience to understand the engineering of “real” systems. Telecommunications reliability engineering requires a blend of abstract statistical theory and practical engineering experience. Fortunately, in the case of reliability engineering, this blend is easily understood when the required information is presented in a logical, organized format. The use of predictive and/or numerical models in the design of telecommunications systems brings great value to system designs. Acceptance of these models requires the engineer, manager, and executive to have enough confidence in a model's results that significant decisions can be based on them. The ability to place that level of confidence in a model can only come from a fusion of reliability engineering academics and experience.


4.3 VSAT Earth Stations 140
4.4 Earth Stations 143
4.5 Spacecraft 156
4.6 Satellite Network Topologies 160

5 MOBILE WIRELESS NETWORKS 171
5.1 Mobile Wireless Equipment 172
5.2 Mobile Wireless Network Systems 182

6 TELECOMMUNICATIONS FACILITIES 187
6.1 Power Systems 188
6.2 Heating, Ventilation, and Air Conditioning Systems 207

7 SOFTWARE AND FIRMWARE 213
7.1 Software Failure Mechanisms 214
7.2 Software Failure Rate Modeling 216
7.3 Reliability and Availability of Systems with Software Components 220

References 227

Index 229

LIST OF ILLUSTRATIONS

Chapter 1 Reliability Theory

Figure 1.1 Gaussian CDF and associated reliability function R(t)
Figure 1.2 Average availability for system 1 (short-duration, frequent outages) and system 2 (long-duration, infrequent outages)
Figure 1.3 Bathtub curve for electronic systems
Figure 1.4 Exponential distribution PDF for varying values of λ
Figure 1.5 Exponential distribution CDF for varying values of λ
Figure 1.6 Normal distribution PDF of TTR where μ = 8 h and σ = 2 h
Figure 1.7 Normal distribution CDF of TTR where μ = 8 h and σ = 2 h
Figure 1.8 Weibull distributed random variable for submarine fiber-optic cable TTR
Figure 1.9 Series and parallel reliability block diagrams
Figure 1.10 Series structure reliability block diagram
Figure 1.11 Single-thread satellite link RF chain
Figure 1.12 Parallel structure reliability block diagram
Figure 1.13 Parallel satellite RF chain system
Figure 1.14 One-for-two (1:2) redundant HPA system block diagram
Figure 1.15 Redundant Markov chain state diagram
Figure 1.16 Redundant Markov chain state diagram, identical components
Figure 1.17 Single-component Markov state transition diagram
Figure 1.18 Hot-standby redundant Markov state transition diagram
Figure 1.19 Cold-standby Markov state transition diagram
Figure 1.20 Monte Carlo system analysis algorithm
Figure 1.21 Component model
Figure 1.22 State vector algorithm flow chart
Figure 1.23 Sample state vector algorithm output
Figure 1.24 Serial component state assessment flow diagram
Figure 1.25 Parallel component state assessment flow diagram
Figure 1.26 Exponentially distributed TTR with MTTR = 8 h
Figure 1.27 Normal distributed TTR with MTTR = 8 h, variance = 2 h
Figure 1.28 Centralized warehousing and dispatch sparing approach
Figure 1.29 Territorial warehousing and dispatch sparing approach
Figure 1.30 On-site sparing approach

Chapter 2 Fiber-Optic Networks

Figure 2.1 Shallow-buried fiber-optic cable installation example in western Alaska
Figure 2.2 Terrestrial fiber-optic cable TTF model PDF and CDF
Figure 2.3 Terrestrial fiber-optic cable TTR model PDF and CDF
Figure 2.4 Monte Carlo simulation results for terrestrial fiber-optic cable
Figure 2.5 Terrestrial fiber-optic terminal functional block diagram
Figure 2.6 Unprotected fiber-optic network system block diagram
Figure 2.7 Unprotected fiber-optic network reliability block diagram
Figure 2.8 UPSR ring network topology, normal operation
Figure 2.9 UPSR ring network topology, fiber path failure
Figure 2.10 UPSR ring network topology, transceiver failure
Figure 2.11 Example SONET network topology for Monte Carlo analysis
Figure 2.12 UPSR system model rule set flow chart
Figure 2.13 UPSR system model simulation results
Figure 2.14 Submarine fiber-optic network block diagram
Figure 2.15 Submarine line terminal equipment functional block diagram
Figure 2.16 Power feed equipment operation, nominal and failure
Figure 2.17 Normal distributed submarine cable TTR model
Figure 2.18 Sample submarine system with 10 periodic repeaters
Figure 2.19 Submarine repeater RBD

Chapter 3 Microwave Networks

Figure 3.1 Long-haul microwave network tower in western Alaska
Figure 3.2 Multipath signal propagation
Figure 3.3 Multipath outage event model using uniform occurrence distribution
Figure 3.4 Multihop microwave radio link in a low-intensity rain region
Figure 3.5 Long-haul microwave radio block diagram
Figure 3.6 Microwave tower damaged by ice formation
Figure 3.7 Ice bridge infrastructure damaged by ice formation
Figure 3.8 Long-haul microwave antenna mount damaged by ice formation
Figure 3.9 Sample microwave radio block diagram
Figure 3.10 Two-hop radio transceiver system (one-for-two redundancy)
Figure 3.11 Single-thread transceiver system RBD
Figure 3.12 One-for-one redundant transceiver system RBD
Figure 3.13 One-for-two redundant transceiver system RBD
Figure 3.14 Two-hop radio link serial transceiver RBD
Figure 3.15 Microwave TRX path reliability comparison
Figure 3.16 Long-haul microwave network multiplexed baseband OC-3 interface
Figure 3.17 Single-hop long-haul microwave network block diagram
Figure 3.18 Single-hop long-haul microwave radio system model rule set
Figure 3.19 Single-hop long-haul microwave radio system availability
Figure 3.20 Single-hop long-haul microwave radio downtime distribution
Figure 3.21 Three-hop long-haul microwave availability analysis
Figure 3.22 Short-haul microwave fiber-optic ring network restoral path
Figure 3.23 Short-haul microwave cellular network backhaul application
Figure 3.24 Short-haul microwave urban structure application
Figure 3.25 Short-haul cellular backhaul microwave radio
Figure 3.26 Unlicensed short-haul commercial service microwave radio
Figure 3.27 Short-haul microwave availability for redundant and single-thread designs at varying MTTR values
Figure 3.28 Point-to-point versus local area network topology failure modes
Figure 3.29 Generic local area microwave network elements
Figure 3.30 Local area wireless network heat map coverage region
Figure 3.31 Wi-Fi access point functional block diagram
Figure 3.32 Radio design types, integrated versus split (ODU/IDU)
Figure 3.33 Sample Wi-Fi local area wireless network diagram

Chapter 4 Satellite Networks

Figure 4.1 Satellite earth station multipath condition sketch
Figure 4.2 Generalized satellite earth station equipment complement
Figure 4.3 Remote VSAT signal chain block diagram
Figure 4.4 VSAT station reliability block diagram
Figure 4.5 C-band satellite earth station constructed in Nome, Alaska
Figure 4.6 Typical earth station RF chain block diagram
Figure 4.7 Nonredundant earth station reliability block diagram
Figure 4.8 Fully redundant earth station system block diagram
Figure 4.9 One-for-two redundant Markov failure state transition diagram
Figure 4.10 Modular satellite power amplifier system block diagram
Figure 4.11 Modular SSPA MTTR distribution model
Figure 4.12 Modular SSPA system availability for three-out-of-four configuration
Figure 4.13 Modular SSPA system availability for seven-out-of-eight configuration
Figure 4.14 In-orbit spare satellite diagram
Figure 4.15 Satellite capacity restoral by in-orbit spare move
Figure 4.16 Satellite capacity restoral by ground station repointing
Figure 4.17 Hub/remote satellite network topology
Figure 4.18 Ku-band hub/remote VSAT network block diagram
Figure 4.19 Ku-band VSAT hub station block diagram
Figure 4.20 Bidirectional point-to-point satellite network block diagram

Chapter 5 Mobile Wireless Networks

Figure 5.1 GSM network block diagram
Figure 5.2 Distributed MSC network block diagram
Figure 5.3 Distributed MSC failure scenario and service continuity
Figure 5.4 Base station subsystem block diagram
Figure 5.5 Mobile wireless base station TRX configuration
Figure 5.6 Markov chain state transition diagram for BTS TRX modules
Figure 5.7 Base station overlap and probability of coverage by multiple stations
Figure 5.8 Network switching subsystem packet switching redundancy
Figure 5.9 Example GSM cellular wireless network

Chapter 6 Telecommunications Facilities

Figure 6.1 Primary power system redundancy configurations
Figure 6.2 Weibull distribution fit to transformer TTF and downtime empirical data
Figure 6.3 Single-thread generator system block diagram
Figure 6.4 Single-thread generator TTF and TTR for a village environment
Figure 6.5 Single-thread generator system availability
Figure 6.6 Cold-standby redundant generator system block diagram
Figure 6.7 Cold-standby redundant generator system availability
Figure 6.8 Load-sharing generator system block diagram
Figure 6.9 Load-sharing generator system relaxed TTR model
Figure 6.10 Load-sharing generator system availability
Figure 6.11 Modular rectifier system block diagram
Figure 6.12 1:N and soft-fail rectifier design descriptions
Figure 6.13 Soft-fail rectifier system availability distribution
Figure 6.14 48 VDC battery plant block diagram
Figure 6.15 Normal distributed TTR with μ = 12 h and σ = 3 h
Figure 6.16 Availability performance versus battery capacity for single-thread and cold-standby generator systems
Figure 6.17 Fiberglass communications shelter dimensions
Figure 6.18 Room air temperature increase rate for two AC scenarios

Chapter 7 Software and Firmware

Figure 7.1 Sample hardware and software failure rate versus time curve comparison
Figure 7.2 Software reliability improvement failure rate function
Figure 7.3 Software feature addition and upgrade failure rate function
Figure 7.4 Aggregate software failure rate trajectory for reliability improvement and feature addition
Figure 7.5 Component block diagram consisting of hardware and software
Figure 7.6 Discrete hardware and software component reliability functions
Figure 7.7 Total component reliability function for hardware and software
Figure 7.8 Sample software TTR distribution
Figure 7.9 Software and hardware component availability distributions
Figure 7.10 Combined component availability including software and hardware components

PREFACE

The topic of reliability is somewhat obscure within the field of electrical (and ultimately communications) engineering. Most engineers are familiar with the concept of reliability as it relates to their automobile, electronic devices, or home, but performing a rigorous mathematical analysis is not always a comfortable or familiar task. The quantitative treatment of reliability has a long-standing tradition within the field of telecommunications, dating back to the early days of Bell Laboratories.

Modern society has developed an insatiable dependence on communication technology that demands a complete understanding and analysis of system reliability. Although the technical innovations developed in modern communications are astonishing engineering marvels, the reliability analysis of these systems is sometimes treated as a cursory afterthought. Even in cases where system reliability and availability performance is treated with the highest concern, the sophistication of analysis techniques frequently lags behind the technical development itself.

The content in this book is a compilation of years of research and analysis of many different telecommunications systems. During the compilation of this research, two primary points became evident to me. First, most communications engineers understand the need for reliability and availability analysis but lack the technical skill and knowledge to execute these analyses confidently. Second, modern communications network topologies demand an approach to analysis that goes beyond the traditional reliability block diagram and exponential distribution assumptions. Modern computing platforms enable engineers to exploit analysis techniques that were not possible when the Bell Laboratories' techniques were developed and presented. This book presents techniques that use computer simulation and random variable models that were not feasible 20 years ago. I hope that readers find in this book a useful resource, one that I found absent from the academic literature during my research and analysis of communications system reliability. Although compiling the data in this book took me years, it is my desire to convey this information to the reader in a matter of hours, enabling engineers to analyze complex problems using basic tools and theories.

I would like to thank Tom Plevyak and Veli Sahin for their editing and review of this book. Their help has been instrumental to its completion and quality.

I would also like to thank Gene Strid for his contributions to my career and to the development of this book. His mentoring spirit and attention to detail have had a significant influence on my personal development as a professional engineer. Gene's technical review of this book alone is impressive in its detail and breadth. Thank you, Gene, for everything you have done to help me remain inspired to grow and learn as an engineer and a leader.

ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporation, headquartered in Anchorage, Alaska. Mark has a broad range of telecommunications experience, including work in fiber-optic, microwave radio, and satellite network designs. Mark holds a B.S. degree in Mathematics from the University of Alaska Anchorage and an M.S. degree in Electrical Engineering from the University of Alaska Fairbanks, Fairbanks, Alaska. He is a registered Professional Electrical Engineer in the State of Alaska and a Senior Member of the IEEE. Mark teaches a variety of courses as an Adjunct Faculty Member in the Engineering Department at the University of Alaska Anchorage. His primary interests are systems design, modeling, and optimization.

ACRONYM LIST

AC Alternating current
ACM Adaptive coding and modulation
AGM Absorbed glass mat
AP Access point
AuC Authentication center
BLSR Bidirectional line switched ring
BSC Base station controller
BTS Base transceiver station
BTU British thermal unit
BUC Block upconverter
CDF Cumulative distribution function
CDMA Code division multiple access
COTS Commercial off-the-shelf
CPE Customer premises equipment
CRAC Computer room air conditioner
DC Direct current
EDFA Erbium-doped fiber amplifier
EIR Equipment identity register
EIRP Equivalent isotropic radiated power
FCC Federal Communications Commission
FITs Failures in time
FMEA Failure mode and effects analysis
FPGA Field-programmable gate array
FSS Fixed satellite service
GSM Global system for mobile communications
HLR Home location register
HVAC Heating, ventilation, and air conditioning
IDU Indoor unit
IEEE Institute of Electrical and Electronics Engineers
ISM Industrial, scientific, and medical
ITU International Telecommunication Union
LHS Left-hand side
LNA Low-noise amplifier
LNB Low-noise block
LTE Line-terminating equipment
LTE Long-term evolution
MDT Mean downtime
MODEM Modulator/demodulator
MSC Mobile switching center
MTBF Mean time between failures
MTTF Mean time to failure
MTTR Mean time to repair
NASA National Aeronautics and Space Administration
NSS Network switching subsystem
OC-n Optical carrier level n
ODU Outdoor unit
PDF Probability density function
PFE Power feed equipment
PM Preventative maintenance
RBD Reliability block diagram
RF Radio frequency
RHS Right-hand side
RMA Return material authorization
RSL Received signal level
SDH Synchronous digital hierarchy
SES Severely errored second
SLA Service-level agreement
SLTE Submarine line-terminating equipment
SMS Short message service
SONET Synchronous optical network
SP Service provider
SRGM Software reliability growth model
SSPA Solid-state power amplifier
TDM Time division multiplexing
TRX Transceiver
TTF Time to failure
TTR Time to repair
UMTS Universal mobile telecommunications system
UPS Uninterruptible power supply
UPSR Unidirectional path switched ring
VLR Visitor location register
VLSI Very large-scale integration
VRLA Valve-regulated lead acid
VSAT Very-small-aperture terminal
WiFi Wireless fidelity
XPIC Cross-polarization interference cancellation

INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and influences our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically equates reliability with emotion and perception. In many cases, even technically minded people do not have a clear, quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production increased awareness of item failure and performance and of their impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further identified the need to quantify reliability and its impact on mission success. During the Second World War, vacuum tubes were used extensively in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering, Theory, and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s continued the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. At the end of the 1960s, the launch of commercial communications satellites by INTELSAT and the U.S. Moon landing demonstrated the importance of reliability engineering as part of the system engineering process. In the 1980s, semiconductor development, military applications, communications systems, biomedical research, and software-based systems led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today's society. Most consumers are intuitively aware of reliability as a measure of an item's performance and overall value. Engineers and technical staff are aware of an item's reliability in a more quantitative sense, but often this understanding is neither complete nor grounded in solid reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be grounded in solid theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures are used to guide the design process and ensure that the system design complies with system performance targets. Modern predictive reliability analysis uses statistical models of component failures, and these models are used to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows engineers to perform cost/performance trade-off analyses. These analyses can then support business and technical decisions about the best design that meets the target requirements.
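As a minimal sketch of the predictive step described above, the snippet below estimates steady-state availability from component MTBF and MTTR and combines blocks in series and in parallel, so that two candidate topologies can be compared. It assumes independent component failures, steady-state operation, and perfect switchover between parallel blocks; the component names and MTBF/MTTR values are hypothetical, not data from this book.

```python
from math import prod


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one repairable block: MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


def series(*avail: float) -> float:
    """Every block must be up, so availabilities multiply (independence assumed)."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """Any one block suffices, so the system is down only when all blocks are down."""
    return 1.0 - prod(1.0 - a for a in avail)


# Hypothetical MTBF/MTTR values in hours, chosen only for illustration.
terminal = availability(mtbf_h=100_000, mttr_h=8)
amplifier = availability(mtbf_h=50_000, mttr_h=8)

# Design A: terminal -> amplifier -> terminal, all in series (single thread).
design_a = series(terminal, amplifier, terminal)
# Design B: same chain, but with two amplifiers in parallel (1-of-2 redundancy).
design_b = series(terminal, parallel(amplifier, amplifier), terminal)

for name, a in (("single-thread", design_a), ("redundant amplifier", design_b)):
    downtime_min = (1.0 - a) * 8_760 * 60  # expected minutes down per 365-day year
    print(f"{name:20s} A = {a:.6f}  ~{downtime_min:.1f} min/year")
```

Rerunning the comparison with a different topology or component set is the cost/performance loop in miniature: the availability gained from the redundant amplifier can be weighed against the cost of the extra unit. Steady-state averages such as these hide the shape of the failure and repair distributions, which simulation-based approaches can capture.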

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or the operator to measure system performance and compare it with expected or predicted values. Empirical data collection is particularly important in large production environments, where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be used directly to analyze the predicted performance of a new system. One must be careful when using empirical data for predictive analysis, because it is rare to find an existing system that exactly matches a new design.
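The comparison between field data and prediction can be sketched as follows, assuming a simple log of outage durations over a known observation window. The function name, the log values, and the predicted availability are hypothetical; MTBF here is estimated as total up time divided by the number of failures, one common point estimate for repairable equipment.

```python
def observed_metrics(outage_durations_h: list, window_h: float) -> dict:
    """Point estimates from an outage log covering window_h hours of operation."""
    n = len(outage_durations_h)
    downtime_h = sum(outage_durations_h)
    uptime_h = window_h - downtime_h
    return {
        "availability": uptime_h / window_h,
        # Mean up time between failures: total up time divided by failure count.
        "mtbf_h": uptime_h / n if n else float("inf"),
        # Mean time to repair: total down time divided by failure count.
        "mttr_h": downtime_h / n if n else 0.0,
    }


# Hypothetical one-year (8,760 h) outage log for one network element.
outages_h = [6.0, 11.5, 4.5]
observed = observed_metrics(outages_h, window_h=8_760)

predicted_availability = 0.9990  # hypothetical output of a predictive model
gap = observed["availability"] - predicted_availability

print(f"observed A  = {observed['availability']:.5f}")
print(f"predicted A = {predicted_availability:.5f} (gap {gap:+.5f})")
print(f"MTBF ~ {observed['mtbf_h']:.0f} h, MTTR ~ {observed['mttr_h']:.1f} h")
```

With only three recorded failures, these estimates carry wide statistical uncertainty, which is one reason data pooled from a large production population is more useful for refining a model than a single element's history.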


One of the most significant benefits of empirical analysis and data collection is failure mode and effects analysis (FMEA). This analysis approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and indicates a system's “uptime,” or the time available for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) will sign a contract to provide a service that has a specific target probability of being available or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, the SP and the customer take on risk in signing the contract. This risk may be realized risk (the party is aware of the risk, whether quantified or not) or unrealized risk (the party is taking risk without being aware of it). Decisions made while assuming unrealized risk can jeopardize a business. Reliability engineering of telecommunications systems serves to reduce overall risk in both the realized and unrealized cases.
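Because an SLA may state either a target probability of being available or a maximum downtime over an interval, it helps to convert between the two. The sketch below assumes a 365-day (8,760 h) year and steady-state availability; the function names are illustrative only.

```python
HOURS_PER_YEAR = 8_760  # 365-day year; some SLAs assume 8,766 h (365.25 days)


def allowed_downtime_min(availability: float, period_h: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime, in minutes, over period_h that still meets the target."""
    return (1.0 - availability) * period_h * 60.0


def implied_availability(max_downtime_min: float, period_h: float = HOURS_PER_YEAR) -> float:
    """Availability equivalent to a maximum-downtime commitment over period_h."""
    return 1.0 - max_downtime_min / (period_h * 60.0)


for target in (0.999, 0.9999, 0.99999):
    print(f"{target * 100:.3f}% -> {allowed_downtime_min(target):7.2f} min/year")

# A hypothetical SLA allowing 4 hours (240 min) of downtime per year:
print(f"240 min/year -> A = {implied_availability(240.0):.6f}")
```

For example, a 99.99% target leaves roughly 52.6 minutes of downtime per year, while 99.999% leaves about 5.3 minutes. An annual downtime budget says nothing about how that downtime is split across outages, so a few long outages and many short ones can yield the same availability.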

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allows managers and technical staff to design systems that achieve the required targets with minimum cost and maximum performance.

Analysis of telecommunications systems requires specialized application of reliability engineering theory and principles. Performance expectations within the field of telecommunications can range from high to extreme. Rarely do consumers of telecommunications expect less than highly available systems. This is true even of consumer services such as cable television, consumer Internet, and local telephone service. Commercial service expectations are typically higher than those in a consumer environment, because the impact on a business may be significant and costly if its telecommunications services are critical to its operations, service delivery, and ability to generate revenue. Performing detailed analyses of systems, both consumer and commercial, allows risks to be managed and costs to be controlled. These analyses allow the designer to produce a system that is carefully crafted to just meet the customer's requirements, rather than greatly exceed them or miss the target entirely. In the case of operational systems, knowledge of the achievable system performance and its maintainability allows the operator to understand whether the achieved performance is within specification and to optimize maintenance and repair efforts.

This book is written with the goal of providing the reader with the knowledge and skills necessary to perform telecommunications system reliability analysis and to examine system designs with a critical eye.

Page 8: Cover - download.e-bookshelf.de · Figure 2.8 UPSR ring network topology, normal operation Figure 2.9 UPSR ring network topology, fiber path failure Figure 2.10 UPSR ring network

LIST OF ILLUSTRATIONS

Chapter 1 Reliability Theory

Figure 1.1 Gaussian CDF and associated reliability function R(t)
Figure 1.2 Average availability for system 1 (short duration, frequent outages) and system 2 (long duration, infrequent outages)
Figure 1.3 Bathtub curve for electronic systems
Figure 1.4 Exponential distribution PDF for varying values of λ
Figure 1.5 Exponential distribution CDF for varying values of λ
Figure 1.6 Normal distribution PDF of TTR where μ = 8 h and σ = 2 h
Figure 1.7 Normal distribution CDF of TTR where μ = 8 h and σ = 2 h
Figure 1.8 Weibull distributed random variable for submarine fiber-optic cable TTR
Figure 1.9 Series and parallel reliability block diagrams
Figure 1.10 Series structure reliability block diagram
Figure 1.11 Single-thread satellite link RF chain
Figure 1.12 Parallel structure reliability block diagram
Figure 1.13 Parallel satellite RF chain system
Figure 1.14 One-for-two (1:2) redundant HPA system block diagram
Figure 1.15 Redundant Markov chain state diagram
Figure 1.16 Redundant Markov chain state diagram, identical components
Figure 1.17 Single-component Markov state transition diagram
Figure 1.18 Hot-standby redundant Markov state transition diagram
Figure 1.19 Cold-standby Markov state transition diagram
Figure 1.20 Monte Carlo system analysis algorithm
Figure 1.21 Component model
Figure 1.22 State vector algorithm flow chart
Figure 1.23 Sample state vector algorithm output
Figure 1.24 Serial component state assessment flow diagram
Figure 1.25 Parallel component state assessment flow diagram
Figure 1.26 Exponentially distributed TTR with MTTR = 8 h
Figure 1.27 Normal distributed TTR with MTTR = 8 h, variance = 2 h
Figure 1.28 Centralized warehousing and dispatch sparing approach
Figure 1.29 Territorial warehousing and dispatch sparing approach
Figure 1.30 On-site sparing approach

Chapter 2 Fiber-Optic Networks

Figure 2.1 Shallow-buried fiber-optic cable installation example in western Alaska
Figure 2.2 Terrestrial fiber-optic cable TTF model PDF and CDF
Figure 2.3 Terrestrial fiber-optic cable TTR model PDF and CDF
Figure 2.4 Monte Carlo simulation results for terrestrial fiber-optic cable
Figure 2.5 Terrestrial fiber-optic terminal functional block diagram
Figure 2.6 Unprotected fiber-optic network system block diagram
Figure 2.7 Unprotected fiber-optic network reliability block diagram
Figure 2.8 UPSR ring network topology, normal operation
Figure 2.9 UPSR ring network topology, fiber path failure
Figure 2.10 UPSR ring network topology, transceiver failure
Figure 2.11 Example SONET network topology for Monte Carlo analysis
Figure 2.12 UPSR system model rule set flow chart
Figure 2.13 UPSR system model simulation results
Figure 2.14 Submarine fiber-optic network block diagram
Figure 2.15 Submarine line terminal equipment functional block diagram
Figure 2.16 Power feed equipment operation, nominal and failure
Figure 2.17 Normal distributed submarine cable TTR model
Figure 2.18 Sample submarine system with 10 periodic repeaters
Figure 2.19 Submarine repeater RBD

Chapter 3 Microwave Networks

Figure 3.1 Long-haul microwave network tower in western Alaska
Figure 3.2 Multipath signal propagation
Figure 3.3 Multipath outage event model using uniform occurrence distribution
Figure 3.4 Multihop microwave radio link in a low-intensity rain region
Figure 3.5 Long-haul microwave radio block diagram
Figure 3.6 Microwave tower damaged by ice formation
Figure 3.7 Ice bridge infrastructure damaged by ice formation
Figure 3.8 Long-haul microwave antenna mount damaged by ice formation
Figure 3.9 Sample microwave radio block diagram
Figure 3.10 Two-hop radio transceiver system (one-for-two redundancy)
Figure 3.11 Single-thread transceiver system RBD
Figure 3.12 One-for-one redundant transceiver system RBD
Figure 3.13 One-for-two redundant transceiver system RBD
Figure 3.14 Two-hop radio link serial transceiver RBD
Figure 3.15 Microwave TRX path reliability comparison
Figure 3.16 Long-haul microwave network multiplexed baseband OC-3 interface
Figure 3.17 Single-hop long-haul microwave network block diagram
Figure 3.18 Single-hop long-haul microwave radio system model rule set
Figure 3.19 Single-hop long-haul microwave radio system availability
Figure 3.20 Single-hop long-haul microwave radio downtime distribution
Figure 3.21 Three-hop long-haul microwave availability analysis
Figure 3.22 Short-haul microwave fiber-optic ring network restoral path
Figure 3.23 Short-haul microwave cellular network backhaul application
Figure 3.24 Short-haul microwave urban structure application
Figure 3.25 Short-haul cellular backhaul microwave radio
Figure 3.26 Unlicensed short-haul commercial service microwave radio
Figure 3.27 Short-haul microwave availability for redundant and single-thread designs at varying MTTR values
Figure 3.28 Point-to-point versus local area network topology failure modes
Figure 3.29 Generic local area microwave network elements
Figure 3.30 Local area wireless network heat map coverage region
Figure 3.31 Wi-Fi access point functional block diagram
Figure 3.32 Radio design types, integrated versus split (ODU/IDU)
Figure 3.33 Sample Wi-Fi local area wireless network diagram

Chapter 4 Satellite Networks

Figure 4.1 Satellite earth station multipath condition sketch
Figure 4.2 Generalized satellite earth station equipment complement
Figure 4.3 Remote VSAT signal chain block diagram
Figure 4.4 VSAT station reliability block diagram
Figure 4.5 C-band satellite earth station constructed in Nome, Alaska
Figure 4.6 Typical earth station RF chain block diagram
Figure 4.7 Nonredundant earth station reliability block diagram
Figure 4.8 Fully redundant earth station system block diagram
Figure 4.9 One-for-two redundant Markov failure state transition diagram
Figure 4.10 Modular satellite power amplifier system block diagram
Figure 4.11 Modular SSPA MTTR distribution model
Figure 4.12 Modular SSPA system availability for three-out-of-four configuration
Figure 4.13 Modular SSPA system availability for seven-out-of-eight configuration
Figure 4.14 In-orbit spare satellite diagram
Figure 4.15 Satellite capacity restoral by in-orbit spare move
Figure 4.16 Satellite capacity restoral by ground station repointing
Figure 4.17 Hub/remote satellite network topology
Figure 4.18 Ku-band hub/remote VSAT network block diagram
Figure 4.19 Ku-band VSAT hub station block diagram
Figure 4.20 Bidirectional point-to-point satellite network block diagram

Chapter 5 Mobile Wireless Networks

Figure 5.1 GSM network block diagram
Figure 5.2 Distributed MSC network block diagram
Figure 5.3 Distributed MSC failure scenario and service continuity
Figure 5.4 Base station subsystem block diagram
Figure 5.5 Mobile wireless base station TRX configuration
Figure 5.6 Markov chain state transition diagram for BTS TRX modules
Figure 5.7 Base station overlap and probability of coverage by multiple stations
Figure 5.8 Network switching subsystem packet switching redundancy
Figure 5.9 Example GSM cellular wireless network

Chapter 6 Telecommunications Facilities

Figure 6.1 Primary power system redundancy configurations
Figure 6.2 Weibull distribution fit to transformer TTF and downtime empirical data
Figure 6.3 Single-thread generator system block diagram
Figure 6.4 Single-thread generator TTF and TTR for a village environment
Figure 6.5 Single-thread generator system availability
Figure 6.6 Cold-standby redundant generator system block diagram
Figure 6.7 Cold-standby redundant generator system availability
Figure 6.8 Load-sharing generator system block diagram
Figure 6.9 Load-sharing generator system relaxed TTR model
Figure 6.10 Load-sharing generator system availability
Figure 6.11 Modular rectifier system block diagram
Figure 6.12 1N and soft-fail rectifier design descriptions
Figure 6.13 Soft-fail rectifier system availability distribution
Figure 6.14 48 VDC battery plant block diagram
Figure 6.15 Normal distributed TTR with μ = 12 h and σ = 3 h
Figure 6.16 Availability performance versus battery capacity for single-thread and cold-standby generator systems
Figure 6.17 Fiberglass communications shelter dimensions
Figure 6.18 Room air temperature increase rate for two AC scenarios

Chapter 7 Software and Firmware

Figure 7.1 Sample hardware and software failure rate versus time curve comparison
Figure 7.2 Software reliability improvement failure rate function
Figure 7.3 Software feature addition and upgrade failure rate function
Figure 7.4 Aggregate software failure rate trajectory for reliability improvement and feature addition
Figure 7.5 Component block diagram consisting of hardware and software
Figure 7.6 Discrete hardware and software component reliability functions
Figure 7.7 Total component reliability function for hardware and software
Figure 7.8 Sample software TTR distribution
Figure 7.9 Software and hardware component availability distributions
Figure 7.10 Combined component availability including software and hardware components

PREFACE

The topic of reliability is somewhat obscure within the field of electrical (and, ultimately, communications) engineering. Most engineers are familiar with the concept of reliability as it relates to their automobile, electronic devices, or home, but performing a rigorous mathematical analysis is not always a comfortable or familiar task. The quantitative treatment of reliability has a long-standing tradition within the field of telecommunications, dating back to the early days of Bell Laboratories.

Modern society has developed an insatiable dependence on communication technology that demands a complete understanding and analysis of system reliability. Although the technical innovations developed in modern communications are astonishing engineering marvels, the reliability analysis of these systems is sometimes treated as a cursory afterthought. Even where system reliability and availability performance is treated with the highest concern, the sophistication of analysis techniques frequently lags behind the technical development itself.

The content in this book is a compilation of years of research and analysis of many different telecommunications systems. During the compilation of this research, two primary points became evident to me. First, most communications engineers understand the need for reliability and availability analysis but lack the technical skill and knowledge to execute these analyses confidently. Second, modern communications network topologies demand an approach to analysis that goes beyond the traditional reliability block diagram and exponential distribution assumptions. Modern computing platforms enable engineers to exploit analysis techniques that were not possible when the Bell Laboratories’ techniques were developed and presented. This book presents techniques that use computer simulation and random variable models that were not feasible 20 years ago. I hope that readers find within this book a useful resource, one that I found absent in the academic literature during my research and analysis of communications system reliability. Although compiling the data in this book took me years, it is my desire to convey this information to the reader in a matter of hours, enabling engineers to analyze complex problems using basic tools and theories.

I would like to thank Tom Plevyak and Veli Sahin for their editing and review of this book. Their help has been instrumental to its completion and quality.


I would also like to thank Gene Strid for his contributions to my career and to the development of this book. His mentoring spirit and attention to detail have had a significant influence on my personal development as a professional engineer. Gene’s technical review of this book alone is impressive in its detail and breadth. Thank you, Gene, for everything you have done to help me remain inspired to grow and learn as an engineer and a leader.


ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporation, headquartered in Anchorage, Alaska. Mark has a broad range of telecommunications experience, including work in fiber optics, microwave radio, and satellite network designs. Mark holds a B.S. degree in Mathematics from the University of Alaska Anchorage and an M.S. degree in Electrical Engineering from the University of Alaska Fairbanks, Fairbanks, Alaska. He is a registered Professional Electrical Engineer in the State of Alaska and a Senior Member of the IEEE. Mark teaches a variety of courses as an Adjunct Faculty Member in the Engineering Department at the University of Alaska Anchorage. His primary interests are systems design, modeling, and optimization.


ACRONYM LIST

AC Alternating current
ACM Adaptive coding and modulation
AGM Absorbed glass mat
AP Access point
AuC Authentication center
BLSR Bidirectional line switched ring
BSC Base station controller
BTS Base transceiver station
BTU British thermal unit
BUC Block upconverter
CDF Cumulative distribution function
CDMA Code division multiple access
COTS Commercial off the shelf
CPE Customer premises equipment
CRAC Computer room air conditioner
DC Direct current
EDFA Erbium-doped fiber amplifier
EIR Equipment identity register
EIRP Equivalent isotropic radiated power
FCC Federal Communications Commission
FITs Failures in time
FMEA Failure mode and effects analysis
FPGA Field-programmable gate array
FSS Fixed satellite service
GSM Global system for mobile communications
HLR Home location register
HVAC Heating, ventilation, and air conditioning
IDU Indoor unit
IEEE Institute of Electrical and Electronics Engineers
ISM Industrial, scientific, and medical
ITU International Telecommunication Union
LHS Left-hand side
LNA Low-noise amplifier
LNB Low-noise block
LTE Line-terminating equipment
LTE Long-term evolution
MDT Mean downtime
MODEM Modulator/demodulator
MSC Mobile switching center
MTBF Mean time between failures
MTTF Mean time to failure
MTTR Mean time to repair
NASA National Aeronautics and Space Administration
NSS Network switching subsystem
OC-n Optical carrier level n
ODU Outdoor unit
PDF Probability density function
PFE Power feed equipment
PM Preventative maintenance
RBD Reliability block diagram
RF Radio frequency
RHS Right-hand side
RMA Return material authorization
RSL Received signal level
SDH Synchronous digital hierarchy
SES Severely errored second
SLA Service-level agreement
SLTE Submarine line-terminating equipment
SMS Short message service
SONET Synchronous optical network
SP Service provider
SRGM Software reliability growth model
SSPA Solid state power amplifier
TDM Time-division multiplexing
TRX Transceiver
TTF Time to failure
TTR Time to repair
UMTS Universal mobile telecommunications system
UPS Uninterruptible power supply
UPSR Unidirectional path switched ring
VLR Visitor location register
VLSI Very large-scale integration
VRLA Valve-regulated lead acid
VSAT Very-small-aperture terminal
WiFi Wireless fidelity
XPIC Cross-polarization interference cancellation

INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and impacts our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically equates reliability with emotion and perception. In many cases, even technically minded people do not have a clear quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production resulted in an increased awareness of item failure and performance and its impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further identified the need to quantify reliability and its impact on mission success. During the Second World War, vacuum tubes were used extensively in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering, Theory, and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s continued the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. At the end of the 1960s, the launch of commercial communications satellites by INTELSAT and the United States’ moon landing proved the importance of reliability engineering as part of the system engineering process. In the 1980s, semiconductor development, military applications, communications systems, biomedical research, and software-based systems led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today’s society. Most consumers are at least subconsciously aware of reliability as a measure of an item’s performance and overall value. Engineers and technical resources are aware of an item’s reliability in a more quantitative sense, but many times this understanding is neither complete nor founded on solid reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be based on solid theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures are used to guide the design process and to ensure that the system design complies with system performance targets. Modern predictive reliability analysis uses statistical modeling of component failures, and these statistical models are used to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows the engineer to perform cost/performance trade-off analyses. The analyses can then be used to make business and technical decisions about the best design that meets target requirements.
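A minimal sketch of such a trade-off, assuming the simplest possible models: each unit’s steady-state availability is A = MTBF/(MTBF + MTTR), units fail independently, and a one-for-one redundant pair behaves like two parallel units with perfect switchover. The MTBF, MTTR, and relative cost figures are hypothetical illustrations, not data from this book, and the helper names are invented for the example.

```python
# Sketch: compare two candidate designs on availability versus cost.
# Assumptions (illustrative only): independent failures, perfect switchover,
# A = MTBF / (MTBF + MTTR) per unit. All numbers are hypothetical.

HOURS_PER_YEAR = 8760.0


def unit_availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one repairable unit."""
    return mtbf_h / (mtbf_h + mttr_h)


def parallel_availability(a: float, n: int) -> float:
    """Availability of n independent parallel units (any one suffices)."""
    return 1.0 - (1.0 - a) ** n


def annual_downtime_min(a: float) -> float:
    """Expected downtime in minutes per year for availability a."""
    return (1.0 - a) * HOURS_PER_YEAR * 60.0


# Hypothetical transceiver: MTBF = 50,000 h, MTTR = 8 h.
a_unit = unit_availability(50_000.0, 8.0)

designs = {
    "single-thread": (a_unit, 1.0),  # (availability, relative cost)
    "1:1 redundant": (parallel_availability(a_unit, 2), 2.0),
}

for name, (a, cost) in designs.items():
    print(
        f"{name:14s} A = {a:.8f}  "
        f"downtime = {annual_downtime_min(a):8.3f} min/yr  cost = {cost:.1f}x"
    )
```

Under these assumptions redundancy shrinks unavailability from (1 − A) to (1 − A)², at roughly double the equipment cost; whether that trade is worthwhile depends on the availability target, and real designs also have to account for switchover failures and common-mode outages that this sketch ignores.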

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or the operator to measure system performance and compare it with expected or predicted data. Empirical data collection is particularly important in large production environments, where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be used directly to analyze the predicted performance of a new system. One must be careful when using empirical data for predictive analysis, because it is rare to find an existing system that exactly matches a new design.
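As a rough illustration of comparing measured and predicted performance, the sketch below computes achieved availability and point estimates of mean up time and MTTR from an outage log. The log entries, the one-year observation window, and the predicted availability are hypothetical, and the `Outage` record and `observed_metrics` helper are names invented for this example; a real comparison would also consider confidence intervals, which this sketch omits.

```python
# Sketch: achieved availability from a (hypothetical) outage log,
# compared against a (hypothetical) predicted availability.
from dataclasses import dataclass


@dataclass
class Outage:
    start_h: float     # hours since the start of the observation window
    duration_h: float  # time to repair (TTR) for this outage


def observed_metrics(outages: list[Outage], window_h: float) -> dict[str, float]:
    """Point estimates only: total downtime is summed directly from the log."""
    downtime = sum(o.duration_h for o in outages)
    uptime = window_h - downtime
    n = len(outages)
    return {
        "availability": uptime / window_h,
        # Mean up time between outages (an MTTF estimate for this link).
        "mttf_h": uptime / n if n else float("inf"),
        "mttr_h": downtime / n if n else 0.0,
    }


# One year of hypothetical field data for a single link.
log = [Outage(1200.0, 6.5), Outage(4100.0, 11.0), Outage(7300.0, 4.5)]
predicted_availability = 0.9995

m = observed_metrics(log, window_h=8760.0)
print(f"observed A = {m['availability']:.5f} (predicted {predicted_availability})")
print(f"mean up time = {m['mttf_h']:.0f} h, MTTR = {m['mttr_h']:.1f} h")
print("meets prediction" if m["availability"] >= predicted_availability else "below prediction")
```

Note that availability computed this way equals MTTF/(MTTF + MTTR) with the estimates above, so the same figures can be fed back into a predictive model to refine it.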


One of the most significant benefits of empirical analysis and data collection is failure mode and effects analysis (FMEA). This analysis approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and then using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and is indicative of a system’s “uptime,” or available time for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) will sign a contract to provide a service that has a specific target probability of being available or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, the SP and the customer take on risk in signing the contract. This risk may be realized risk (the party is aware of the risk, quantified or not) or unrealized risk (the party is taking risk and is not aware that they are in jeopardy). Decisions made while assuming unrealized risk can jeopardize a business. Reliability engineering of telecommunications systems serves to reduce overall risk in both realized and unrealized cases.
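Because an availability target and a maximum-downtime commitment are two views of the same metric, one can be converted into the other. The sketch below assumes a 365-day (8760 h) reporting year; actual SLAs may define the interval, maintenance exclusions, and rounding differently, and the function names are illustrative.

```python
# Sketch: convert between an SLA availability target and allowed downtime.
# Assumption: a 365-day (8760 h) reporting interval.

HOURS_PER_YEAR = 365 * 24  # 8760 h


def allowed_downtime_h(availability: float, interval_h: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime (hours) consistent with the availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be in [0, 1]")
    return (1.0 - availability) * interval_h


def required_availability(max_downtime_h: float, interval_h: float = HOURS_PER_YEAR) -> float:
    """Availability implied by a maximum-downtime commitment."""
    return 1.0 - max_downtime_h / interval_h


for target in (0.99, 0.999, 0.9999, 0.99999):
    minutes = allowed_downtime_h(target) * 60.0
    print(f"A = {target:<8} -> at most {minutes:8.1f} min of downtime per year")

# A 4 h/yr downtime cap corresponds to roughly 99.954% availability.
print(f"{required_availability(4.0):.5%}")
```

Each additional “nine” cuts the permitted annual downtime by a factor of ten, from about 87.6 h at 99% to about 5.3 min at 99.999%.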

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allows managers and technical resources to design systems that achieve the required targets with minimum cost and maximum performance.

Analysis of telecommunications systems requires specialized application of reliability engineering theory and principles. Performance expectations within the field of telecommunications range from high to extreme. Rarely do consumers of telecommunications expect less than highly available systems. This is true even of consumer services such as cable television, consumer Internet, and local telephone service. Commercial service expectations are typically higher than those in a consumer environment, because the impact on a business may be significant and costly if its telecommunications services are critical to its operations, service delivery, and ability to generate revenue. Performing detailed analyses of both consumer and commercial systems allows risks to be managed and costs to be controlled. These analyses allow the designer to produce a system that is carefully crafted to just meet the requirements of the customer rather than greatly exceed them or completely miss the target. In the case of operational systems, knowledge of the achievable system performance and its maintainability allows the operator to understand whether the achieved performance is within specification and to optimize maintenance and repair efforts.

This book is written with the goal of providing the reader with the knowledge and skills necessary to perform telecommunications system reliability analysis and to examine system designs with a critical eye. Telecommunications service providers frequently serve customers who know what they would like to purchase, whether it is wireless or terrestrial, packet or TDM. It is far less common for the customer to understand how to specify system availability or reliability. Knowledge of the theory and practice of reliability engineering allows service providers and engineers to educate their customers regarding this important metric of network performance. Even if the reader does not perform firsthand reliability analysis, the knowledge gained by studying both the theory and the practice of reliability engineering allows the individual to make better-informed decisions about the design and operation of telecommunications systems or the purchase of telecommunications services. The truly pervasive nature of reliability as a metric in telecommunications systems requires engineers, managers, and executives to have extensive knowledge of system topologies, costs, and performance. In many cases, these system details are obtained through experience and practice. The author of this book would argue that experience without academic study, particularly in the field of reliability engineering, results in decisions that at times invoke serious, unrealized business risk.

The reader is expected to have a basic working knowledge of engineering mathematics. A college-level course in probability and statistics is of particular value, since this book relies extensively on the application and use of statistical distributions and probability models. Experience with telecommunications system design and network topologies is valuable in understanding the trade-offs involved with different reliability analyses. Lastly, if the reader is interested in developing his or her own reliability models, knowledge of MATLAB and computer programming methods is of value. All of the topics presented in this book are intended to provide sufficient depth to enable the reader either to work with them directly or to conduct minimal further research in order to obtain a complete understanding of a topic.

The previous paragraph should allow readers to identify themselves as members of a specific group. These groups can generally be classified as engineers, managers, or executives. Engineers can use this book as a complete technical resource for building and analyzing system models. The engineer who uses this book will have the ability to develop complex, detailed statistical models of telecommunications systems that produce a variety of system metrics for business, design, and other technical decisions. Managers reading this book will derive value from the knowledge obtained about proper reliable system design, contract implications, and operational impacts. Executive readers will find value in the high-level knowledge obtained about design best practices and proper expectations for system performance.

This book is logically organized to provide two distinct sets of information: theory and applications. Chapter 1 introduces and develops the concepts and accepted theories required for system reliability analysis. This includes discussions of probability and statistics, system reliability theory, and system modeling. The remaining chapters of this book are organized by technology subject matter. Chapter 2 discusses fiber-optic networks. Both terrestrial and submarine networks are discussed, with the subtleties of each presented in detail. Chapter 3 presents reliability analysis approaches for terrestrial microwave systems. The discussion includes short-haul point-to-point, long-haul point-to-point, cellular wireless, and WiFi networks. Satellite communications networks are discussed in Chapter 4. Both teleport and VSAT network topologies are discussed, along with propagation availability calculation techniques. Chapter 5 addresses reliability concerns for mobile wireless (cellular) systems. In Chapter 6, the often underanalyzed topics of power systems and heating, ventilation, and air conditioning systems related to communications networks are analyzed. The final chapter (Chapter 7) introduces software and firmware as they relate to telecommunications system reliability. Each section presents the analysis in terms of two discrete parts: the communications equipment and the communications channel. The goal of this book is to provide the reader with sufficient knowledge to abstract the concepts presented and apply them to their own problem statement.

The ability to blend academic theory and practical application is a rare commodity in the field of engineering. Few practicing engineers have the ability to apply abstract theory to real problems, while even fewer academics have the practical experience to understand the engineering of “real” systems. Telecommunications reliability engineering necessitates the blend of abstract statistical theory and practical engineering experience. Fortunately, in the case of reliability engineering, this blend is easily understood when the information required is presented in a logical, organized format. The use of predictive and/or numerical models in the design of telecommunications systems brings great value to system designs. Acceptance of these models requires the engineer, manager, and executive to have enough confidence in the model’s results that significant decisions can be made based on them. The ability to place that level of confidence in a model can only come from a fusion of reliability engineering academics and experience.

Page 9: Cover - download.e-bookshelf.de · Figure 2.8 UPSR ring network topology, normal operation Figure 2.9 UPSR ring network topology, fiber path failure Figure 2.10 UPSR ring network

Chapter 2 Fiber-Optic Networks

Figure 21 Shallow-buried fiber-optic cable installation example inwestern Alaska

Figure 22 Terrestrial fiber-optic cable TTF model PDF and CDFFigure 23 Terrestrial fiber-optic cable TTR model PDF and CDFFigure 24 Monte Carlo simulation results for terrestrial fiber-optic

cableFigure 25 Terrestrial fiber-optic terminal functional block diagramFigure 26 Unprotected fiber-optic network system block diagramFigure 27 Unprotected fiber-optic network reliability block diagramFigure 28 UPSR ring network topology normal operationFigure 29 UPSR ring network topology fiber path failureFigure 210 UPSR ring network topology transceiver failureFigure 211 Example SONET network topology for Monte Carlo

analysisFigure 212 UPSR system model rule set flow chartFigure 213 UPSR system model simulation resultsFigure 214 Submarine fiber-optic network block diagramFigure 215 Submarine line terminal equipment functional block diagramFigure 216 Power feed equipment operation nominal and failureFigure 217 Normal distributed submarine cable TTR modelFigure 218 Sample submarine system with 10 periodic repeatersFigure 219 Submarine repeater RBD

Chapter 3 Microwave Networks

Figure 3.1 Long-haul microwave network tower in western Alaska
Figure 3.2 Multipath signal propagation
Figure 3.3 Multipath outage event model using uniform occurrence distribution
Figure 3.4 Multihop microwave radio link in a low-intensity rain region
Figure 3.5 Long-haul microwave radio block diagram
Figure 3.6 Microwave tower damaged by ice formation
Figure 3.7 Ice bridge infrastructure damaged by ice formation
Figure 3.8 Long-haul microwave antenna mount damaged by ice formation
Figure 3.9 Sample microwave radio block diagram
Figure 3.10 Two-hop radio transceiver system (one-for-two redundancy)
Figure 3.11 Single-thread transceiver system RBD
Figure 3.12 One-for-one redundant transceiver system RBD
Figure 3.13 One-for-two redundant transceiver system RBD
Figure 3.14 Two-hop radio link serial transceiver RBD
Figure 3.15 Microwave TRX path reliability comparison
Figure 3.16 Long-haul microwave network multiplexed baseband OC-3 interface
Figure 3.17 Single-hop long-haul microwave network block diagram
Figure 3.18 Single-hop long-haul microwave radio system model rule set
Figure 3.19 Single-hop long-haul microwave radio system availability
Figure 3.20 Single-hop long-haul microwave radio downtime distribution
Figure 3.21 Three-hop long-haul microwave availability analysis
Figure 3.22 Short-haul microwave fiber optic ring network restoral path
Figure 3.23 Short-haul microwave cellular network backhaul application
Figure 3.24 Short-haul microwave urban structure application
Figure 3.25 Short-haul cellular backhaul microwave radio
Figure 3.26 Unlicensed short-haul commercial service microwave radio
Figure 3.27 Short-haul microwave availability for redundant and single-thread designs at varying MTTR values
Figure 3.28 Point-to-point versus local area network topology failure modes
Figure 3.29 Generic local area microwave network elements
Figure 3.30 Local area wireless network heat map coverage region
Figure 3.31 Wi-Fi access point functional block diagram
Figure 3.32 Radio design types, integrated versus split (ODU/IDU)
Figure 3.33 Sample Wi-Fi local area wireless network diagram

Chapter 4 Satellite Networks

Figure 4.1 Satellite earth station multipath condition sketch
Figure 4.2 Generalized satellite earth station equipment complement
Figure 4.3 Remote VSAT signal chain block diagram
Figure 4.4 VSAT station reliability block diagram
Figure 4.5 C-band satellite earth station constructed in Nome, Alaska
Figure 4.6 Typical earth station RF chain block diagram
Figure 4.7 Nonredundant earth station reliability block diagram
Figure 4.8 Fully redundant earth station system block diagram
Figure 4.9 One-for-two redundant Markov failure state transition diagram
Figure 4.10 Modular satellite power amplifier system block diagram
Figure 4.11 Modular SSPA MTTR distribution model
Figure 4.12 Modular SSPA system availability for three-out-of-four configuration
Figure 4.13 Modular SSPA system availability for seven-out-of-eight configuration
Figure 4.14 In-orbit spare satellite diagram
Figure 4.15 Satellite capacity restoral by in-orbit spare move
Figure 4.16 Satellite capacity restoral by ground station repointing
Figure 4.17 Hub/remote satellite network topology
Figure 4.18 Ku-band hub/remote VSAT network block diagram
Figure 4.19 Ku-band VSAT hub station block diagram
Figure 4.20 Bidirectional point-to-point satellite network block diagram

Chapter 5 Mobile Wireless Networks

Figure 5.1 GSM network block diagram
Figure 5.2 Distributed MSC network block diagram
Figure 5.3 Distributed MSC failure scenario and service continuity
Figure 5.4 Base station subsystem block diagram
Figure 5.5 Mobile wireless base station TRX configuration
Figure 5.6 Markov chain state transition diagram for BTS TRX modules
Figure 5.7 Base station overlap and probability of coverage by multiple stations
Figure 5.8 Network switching subsystem packet switching redundancy
Figure 5.9 Example GSM cellular wireless network

Chapter 6 Telecommunications Facilities

Figure 6.1 Primary power system redundancy configurations
Figure 6.2 Weibull distribution fit to transformer TTF and downtime empirical data
Figure 6.3 Single-thread generator system block diagram
Figure 6.4 Single-thread generator TTF and TTR for a village environment
Figure 6.5 Single-thread generator system availability
Figure 6.6 Cold-standby redundant generator system block diagram
Figure 6.7 Cold-standby redundant generator system availability
Figure 6.8 Load-sharing generator system block diagram
Figure 6.9 Load-sharing generator system relaxed TTR model
Figure 6.10 Load-sharing generator system availability
Figure 6.11 Modular rectifier system block diagram
Figure 6.12 1N and soft-fail rectifier design descriptions
Figure 6.13 Soft-fail rectifier system availability distribution
Figure 6.14 48 VDC battery plant block diagram
Figure 6.15 Normal distributed TTR with μ = 12 h and σ = 3 h
Figure 6.16 Availability performance versus battery capacity for single-thread and cold-standby generator systems
Figure 6.17 Fiberglass communications shelter dimensions
Figure 6.18 Room air temperature increase rate for two AC scenarios

Chapter 7 Software and Firmware

Figure 7.1 Sample hardware and software failure rate versus time curve comparison
Figure 7.2 Software reliability improvement failure rate function
Figure 7.3 Software feature addition and upgrade failure rate function
Figure 7.4 Aggregate software failure rate trajectory for reliability improvement and feature addition
Figure 7.5 Component block diagram consisting of hardware and software
Figure 7.6 Discrete hardware and software component reliability functions
Figure 7.7 Total component reliability function for hardware and software
Figure 7.8 Sample software TTR distribution
Figure 7.9 Software and hardware component availability distributions
Figure 7.10 Combined component availability including software and hardware components

PREFACE

The topic of reliability is somewhat obscure within the field of electrical (and ultimately communications) engineering. Most engineers are familiar with the concept of reliability as it relates to their automobile, electronic device, or home, but performing a rigorous mathematical analysis is not always a comfortable or familiar task. The quantitative treatment of reliability has a long-standing tradition within the field of telecommunications, dating back to the early days of Bell Laboratories.

Modern society has developed an insatiable dependence on communication technology that demands a complete understanding and analysis of system reliability. Although the technical innovations developed in modern communications are astonishing engineering marvels, the reliability analysis of these systems is sometimes treated as a cursory afterthought. Even in cases where system reliability and availability performance is treated with the highest concern, the sophistication of analysis techniques frequently lags behind the technical development itself.

The content in this book is a compilation of years of research and analysis of many different telecommunications systems. During the compilation of this research, two primary points became evident to me. First, most communications engineers understand the need for reliability and availability analysis but lack the technical skill and knowledge to execute these analyses confidently. Second, modern communications network topologies demand an approach to analysis that goes beyond the traditional reliability block diagram and exponential distribution assumptions. Modern computing platforms enable engineers to exploit analysis techniques that were not possible when the Bell Laboratories' techniques were developed and presented. This book presents techniques that use computer simulation and random variable models that were not feasible 20 years ago. I hope that readers of this book find within it a useful resource that I found absent in the academic literature during my research and analysis of communications system reliability. Although compiling the data in this book took me years, it is my desire to convey this information to the reader in a matter of hours, enabling engineers to analyze complex problems using basic tools and theories.
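As a rough illustration of the simulation-based, non-exponential approach described above, the following is a minimal Monte Carlo sketch for a single repairable item. It is not the author's model: the function name `simulate_availability`, the Weibull time-to-failure and zero-truncated normal time-to-repair choices, the "as good as new" repair assumption, and all parameter values are illustrative assumptions.

```python
import random

def simulate_availability(scale_h, shape, ttr_mean_h, ttr_sd_h,
                          mission_h=8760.0, trials=2000, seed=1):
    """Monte Carlo estimate of one repairable item's availability (hypothetical helper).

    Illustrative assumptions: time to failure ~ Weibull(scale_h, shape),
    time to repair ~ Normal(ttr_mean_h, ttr_sd_h) truncated at zero,
    each repair restores the item to as-good-as-new, and every trial
    starts with a new item at t = 0.
    """
    rng = random.Random(seed)
    total_up = 0.0
    for _ in range(trials):
        t = 0.0
        while t < mission_h:
            ttf = rng.weibullvariate(scale_h, shape)   # next operating interval
            total_up += min(ttf, mission_h - t)        # count uptime inside the mission only
            t += ttf
            if t >= mission_h:
                break
            t += max(0.0, rng.gauss(ttr_mean_h, ttr_sd_h))  # repair interval (downtime)
    return total_up / (trials * mission_h)

# Hypothetical parameters: Weibull scale 20,000 h, shape 1.5; TTR mean 12 h, sd 3 h.
print(f"{simulate_availability(20000.0, 1.5, 12.0, 3.0):.6f}")
```

Because each trial replays failures and repairs drawn from arbitrary distributions, the same loop accepts any time-to-failure or time-to-repair model, which is the flexibility a closed-form exponential analysis lacks.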

I would like to thank Tom Plevyak and Veli Sahin for their editing and review of this book. Their help in producing this book has been instrumental to its completion and quality.

I would also like to thank Gene Strid for his contributions to my career and to the development of this book. His mentoring spirit and attention to detail have had a significant influence on my personal development as a professional engineer. Gene's technical review of this book alone is impressive in its detail and breadth. Thank you, Gene, for everything you have done to help me remain inspired to grow and learn as an engineer and a leader.

ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporation, headquartered in Anchorage, Alaska. Mark has a broad range of telecommunications experience, including work in fiber optics, microwave radio, and satellite network designs. Mark holds a BS degree in Mathematics from the University of Alaska Anchorage and an MS degree in Electrical Engineering from the University of Alaska Fairbanks, Fairbanks, Alaska. He is a registered Professional Electrical Engineer in the State of Alaska and a Senior Member of the IEEE. Mark teaches a variety of courses as an Adjunct Faculty Member in the Engineering Department at the University of Alaska Anchorage. His primary interests are systems design, modeling, and optimization.

ACRONYM LIST

AC Alternating current
ACM Adaptive coding and modulation
AGM Absorbed glass mat
AP Access point
AuC Authentication center
BLSR Bidirectional line switched ring
BSC Base station controller
BTS Base transceiver station
BTU British thermal unit
BUC Block upconverter
CDF Cumulative distribution function
CDMA Code division multiple access
COTS Commercial off the shelf
CPE Customer premises equipment
CRAC Computer room air conditioner
DC Direct current
EDFA Erbium-doped fiber amplifier
EIR Equipment identity register
EIRP Equivalent isotropic radiated power
FCC Federal Communications Commission
FITs Failures in time
FMEA Failure mode and effects analysis
FPGA Field-programmable gate array
FSS Fixed satellite service
GSM Global system for mobile communications
HLR Home location register
HVAC Heating, ventilation, and air conditioning
IDU Indoor unit
IEEE Institute of Electrical and Electronics Engineers
ISM Industrial, scientific, and medical
ITU International Telecommunication Union
LHS Left-hand side
LNA Low-noise amplifier
LNB Low-noise block
LTE Line-terminating equipment
LTE Long-term evolution
MDT Mean downtime
MODEM Modulator/demodulator
MSC Mobile switching center
MTBF Mean time between failures
MTTF Mean time to failure
MTTR Mean time to repair
NASA National Aeronautics and Space Administration
NSS Network switching subsystem
OC-n Optical carrier level n
ODU Outdoor unit
PDF Probability density function
PFE Power feed equipment
PM Preventative maintenance
RBD Reliability block diagram
RF Radio frequency
RHS Right-hand side
RMA Return material authorization
RSL Received signal level
SDH Synchronous digital hierarchy
SES Severely errored second
SLA Service-level agreement
SLTE Submarine line-terminating equipment
SMS Short message service
SONET Synchronous optical network
SP Service provider
SRGM Software reliability growth model
SSPA Solid state power amplifier
TDM Time-division multiplexing
TRX Transceiver
TTF Time to failure
TTR Time to repair
UMTS Universal mobile telecommunications system
UPS Uninterruptible power supply
UPSR Unidirectional path switched ring
VLR Visitor location register
VLSI Very large-scale integration
VRLA Valve-regulated lead acid
VSAT Very-small-aperture terminal
WiFi Wireless fidelity
XPIC Cross-polarization interference cancellation

INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and impacts our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically equates reliability with emotion and perception. In many cases, even technically minded people do not have a clear quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production resulted in an increased awareness of item failure and performance and their impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further identified the need to quantify reliability and its impact on mission success. During the Second World War, vacuum tubes were used extensively in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering, Theory, and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.
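The Weibull distribution mentioned in this history is commonly written with a scale parameter η and a shape parameter β, giving the reliability function R(t) = exp(−(t/η)^β). A shape β < 1 yields a decreasing failure rate, β = 1 reduces to the exponential distribution (constant failure rate), and β > 1 yields an increasing, wear-out failure rate. The sketch below assumes this two-parameter form; the function names and the parameter values are illustrative only.

```python
import math

def weibull_reliability(t, eta, beta):
    """R(t) = exp(-(t/eta)**beta): probability of surviving past time t."""
    return math.exp(-((t / eta) ** beta))

def weibull_hazard(t, eta, beta):
    """h(t) = (beta/eta) * (t/eta)**(beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)

# Hypothetical scale eta = 1000 h; compare shape values below, at, and above 1.
for beta in (0.5, 1.0, 3.0):
    rates = [weibull_hazard(t, 1000.0, beta) for t in (100.0, 500.0, 900.0)]
    print(f"beta={beta}: hazard at 100/500/900 h =",
          ", ".join(f"{r:.2e}" for r in rates))
```

A convenient check is that R(η) = e⁻¹ ≈ 0.368 for every shape value, so η can be read as the time by which roughly 63% of a population has failed.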

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s reinforced the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. The launch of commercial communications satellites by INTELSAT and the Moon landing by the United States at the end of the 1960s demonstrated the importance of reliability engineering as part of the system engineering process. Semiconductor development, military applications, communications systems, biomedical research, and software-based systems in the 1980s led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today's society. Most consumers are unconsciously aware of reliability as a measure of an item's performance and overall value. Engineers and technical resources are aware of an item's reliability in a more quantitative sense, but many times this understanding is neither complete nor grounded in solid reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be based in solid theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures are used to guide the design process and ensure that the system design complies with system performance targets. Modern predictive reliability analysis uses statistical modeling of component failures. These statistical models are used to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows engineers to perform cost/performance trade-off analyses. The analyses can then be used to make business and technical decisions about the best design that meets target requirements.
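To make the topology trade-off described above concrete, here is a minimal sketch using the standard reliability block diagram rules: long-run unit availability MTBF/(MTBF + MTTR), a product of availabilities for blocks in series, and one minus the product of unavailabilities for blocks in parallel. It assumes statistically independent failures and, for parallel blocks, perfect and instantaneous switchover. The function names and the MTBF/MTTR figures are hypothetical, not values from this book.

```python
def unit_availability(mtbf_h, mttr_h):
    """Long-run availability of one repairable unit: MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)

def series(*blocks):
    """Every block must be up: product of block availabilities."""
    result = 1.0
    for a in blocks:
        result *= a
    return result

def parallel(*blocks):
    """At least one block must be up: 1 minus the product of unavailabilities."""
    all_down = 1.0
    for a in blocks:
        all_down *= 1.0 - a
    return 1.0 - all_down

# Hypothetical transceiver: MTBF 50,000 h, MTTR 8 h.
a = unit_availability(50_000.0, 8.0)
single_thread = series(a, a)                         # two hops, one unit per hop
protected = series(parallel(a, a), parallel(a, a))   # two hops, 1:1 protection per hop
print(f"single-thread: {single_thread:.7f}")
print(f"1:1 protected: {protected:.7f}")
```

Rerunning the comparison with different topologies or repair times is the "change and reanalyze" loop in miniature. Each added unit also carries a cost, which is what the trade-off weighs.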

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or the operator to measure system performance and compare that performance with expected or predicted data. Empirical data collection is particularly important in large production environments, where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be used directly to analyze the predicted performance of a new system. One must be careful when using empirical data for predictive analysis, because it is rare to find an existing system that exactly matches a new design.
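One simple way to compare observed and predicted behavior, as described above, is to compute availability and mean repair time directly from an outage log. The sketch below assumes the log is a list of outage durations in hours over a known observation period. The helper names, the outage values, and the predicted figure are hypothetical.

```python
def observed_availability(outage_hours, period_hours):
    """Fraction of an observation period during which the service was up."""
    downtime = sum(outage_hours)
    if not 0.0 <= downtime <= period_hours:
        raise ValueError("total downtime must fall within the observation period")
    return 1.0 - downtime / period_hours

def observed_mttr(outage_hours):
    """Sample mean of recorded outage (repair) durations."""
    return sum(outage_hours) / len(outage_hours)

# Hypothetical one-year outage log (hours) and a hypothetical model prediction.
outages = [2.5, 0.75, 6.0]
predicted = 0.9995
observed = observed_availability(outages, 8760.0)
print(f"observed={observed:.5f}  predicted={predicted:.5f}  "
      f"gap={observed - predicted:+.5f}  MTTR={observed_mttr(outages):.2f} h")
```

With only a handful of outages, the sample estimates are noisy. A gap like this one is a prompt to revisit the model's inputs, not proof that the model is wrong.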

One of the most significant benefits of empirical analysis and data collection is failure mode and effects analysis (FMEA). This analysis approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.
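A common first step toward the failure-data analysis described above is simply ranking observed failure modes by frequency, so that systemic problems stand out. This is only one input to an FMEA. It does not include the severity or detection ratings a full FMEA would assign. The record format, site names, and failure modes below are hypothetical.

```python
from collections import Counter

# Hypothetical failure records: (site, failure mode) pairs from a trouble-ticket log.
records = [
    ("site-A", "power supply"), ("site-B", "connector corrosion"),
    ("site-A", "power supply"), ("site-C", "software lockup"),
    ("site-B", "power supply"), ("site-C", "connector corrosion"),
]

def rank_failure_modes(records):
    """Return (mode, count, share) tuples, most frequent failure mode first."""
    counts = Counter(mode for _, mode in records)
    total = sum(counts.values())
    return [(mode, n, n / total) for mode, n in counts.most_common()]

for mode, n, share in rank_failure_modes(records):
    print(f"{mode:20s} {n:3d} {share:6.1%}")
```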

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and indicates a system's “uptime,” or time available for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) will sign a contract to provide a service that has a specific target probability of being available or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, the SP and the customer take on risk in signing the contract. This risk may be realized risk (the party is aware of the risk, quantified or not) or unrealized risk (the party is taking risk and is not aware that it is in jeopardy). Decisions made while assuming unrealized risk can jeopardize a business. Reliability engineering of telecommunications systems serves to reduce overall risk in both realized and unrealized cases.
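The two SLA forms mentioned above, a target probability of being available and a maximum downtime over an interval, are two views of the same quantity. A minimal conversion sketch follows, assuming a 365-day (525,600-minute) year as the interval. The function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def max_downtime_minutes(availability, period_min=MINUTES_PER_YEAR):
    """Largest total downtime (minutes) consistent with an availability target."""
    return (1.0 - availability) * period_min

def availability_from_downtime(downtime_min, period_min=MINUTES_PER_YEAR):
    """Availability implied by a maximum-downtime commitment over the same period."""
    return 1.0 - downtime_min / period_min

for target in (0.999, 0.9999, 0.99999):
    print(f"A = {target}: at most {max_downtime_minutes(target):.1f} min/yr")
```

Each additional "nine" cuts the allowed downtime by a factor of ten: 0.999 permits about 525.6 minutes per year, while 0.99999 permits about 5.3 minutes.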

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allows managers and technical resources to design systems that achieve the required targets at minimum cost and with maximum performance.

Analysis of telecommunications systems requires specialized application of reliability engineering theory and principles. Performance expectations within the field of telecommunications range from high to extreme. Consumers of telecommunications services rarely expect less than highly available systems. This is true even of consumer services such as cable television, consumer Internet, and local telephone service. Commercial service expectations are typically higher than those in a consumer environment, because the impact on a business may be significant and costly if its telecommunications services are critical to its operations, service delivery, and ability to generate revenue. Performing detailed analyses of systems, both consumer and commercial, allows risks to be managed and costs to be controlled. These analyses allow the designer to produce a system that is carefully crafted to just meet the requirements of the customer, rather than greatly exceed them or miss the target completely. In the case of operational systems, knowledge of the achievable system performance and its maintainability allows the operator to understand whether the achieved performance is within specification and to optimize maintenance and repair efforts.
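As a small illustration of designing to "just meet" a requirement, the sketch below finds the smallest number of identical redundant units whose combined availability reaches a target. It assumes independent failures, perfect switchover, and one-of-n redundancy (any single working unit carries the full load). The helper name and the example numbers are hypothetical.

```python
def units_needed(unit_avail, target, max_units=10):
    """Smallest n such that n parallel units meet the availability target.

    Assumes independent failures, perfect switchover, and that one working
    unit carries the full load (1-of-n redundancy). Returns None if even
    max_units units fall short of the target.
    """
    for n in range(1, max_units + 1):
        if 1.0 - (1.0 - unit_avail) ** n >= target:
            return n
    return None

# Hypothetical unit availability of 0.99 against two different targets.
print(units_needed(0.99, 0.9995))   # two units suffice here
print(units_needed(0.99, 0.99999))  # a stricter target needs a third unit
```

Stopping at the first n that meets the target avoids paying for redundancy beyond the requirement. A real design would also weigh switchover failures and common-mode risks, which this sketch ignores.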

This book is written with the goal of providing the reader with the knowledge and skills necessary to perform telecommunications system reliability analysis and to examine system designs with a critical eye. Telecommunications service providers frequently provide service to customers who know what they would like to purchase, whether it is wireless or terrestrial, packet or TDM. It is far less frequent that the customer understands how to specify system availability or reliability. Knowledge of the theory and practice of reliability engineering allows service providers and engineers to educate their customers regarding this important metric of network performance. Even if the reader does not perform firsthand reliability analysis, the knowledge gained by studying both the theory and the practice of reliability engineering allows the individual to make more informed, better decisions about the design and operation of telecommunications systems or the purchase of telecommunications services. The truly pervasive nature of reliability as a metric in telecommunications systems requires engineers, managers, and executives to have extensive knowledge of system topologies, costs, and performance. In many cases, these system details are obtained through experience and practice. The author of this book would argue that experience without academic study, particularly in the field of reliability engineering, results in decisions that at times invoke unrealized, serious business risk.

The reader is expected to have a basic working knowledge of engineering mathematics. A college-level course in probability and statistics is of particular value to the reader. This book relies extensively on the application and use of statistical distributions and probability models. Experience with telecommunications system design and network topologies is valuable in understanding the trade-offs involved with different reliability analyses. Lastly, if the reader has an interest in developing his or her own reliability models, knowledge of MATLAB and computer programming methods is of value. All of the topics presented in this book are intended to provide sufficient depth to enable the reader either to work with them directly or to conduct minimal further research in order to obtain a complete understanding of a topic.

The previous paragraph should allow readers to identify themselves as members of a specific group. These groups can generally be classified as engineers, managers, or executives. Engineers can use this book as a complete technical resource for building and analyzing system models. The engineer who uses this book will have the ability to develop complex, detailed statistical models of telecommunications systems that produce a variety of system metrics for business, design, and other technical decisions. Managers reading this book will derive value from the knowledge obtained about proper reliable system design, contract implications, and operational impacts. Executive readers will find value in the high-level knowledge obtained about design best practices and proper expectations for system performance.

This book is logically organized to provide two distinct sets of information: theory and applications. Chapter 1 introduces and develops the concepts and accepted theories required for system reliability analysis. This includes discussions of probability and statistics, system reliability theory, and system modeling. The remaining chapters of this book are organized by technology subject matter. Chapter 2 discusses fiber-optic networks. Both terrestrial and submarine networks are discussed, with the subtleties of each presented in detail. Chapter 3 presents reliability analysis approaches for terrestrial microwave systems. The discussion includes short-haul point-to-point, long-haul point-to-point, cellular wireless, and WiFi networks. Satellite communications networks are discussed in Chapter 4. Both teleport and VSAT network topologies are discussed, along with propagation availability calculation techniques. Chapter 5 addresses reliability concerns for mobile wireless (cellular) systems. In Chapter 6, the often underanalyzed topics of power systems and heating, ventilation, and air conditioning systems related to communications networks are analyzed. The final chapter (Chapter 7) introduces software and firmware as they relate to telecommunications system reliability. Each section presents the analysis in terms of two discrete parts: the communications equipment and the communications channel. The goal of this book is to provide the reader with sufficient knowledge to abstract the concepts presented and apply them to the reader's own problems.

The ability to blend academic theory and practical application is a rare commodity in the field of engineering. Few practicing engineers have the ability to apply abstract theory to real problems, while even fewer academics have the practical experience to understand the engineering of “real” systems. Telecommunications reliability engineering necessitates a blend of abstract statistical theory and practical engineering experience. Fortunately, in the case of reliability engineering, this blend is easily understood when the information required is presented in a logical, organized format. The use of predictive and/or numerical models in the design of telecommunications systems brings great value to system designs. Acceptance of these models requires the engineer, manager, and executive to have enough confidence in the model's results that significant decisions can be made based on them. The ability to place that level of confidence in a model can only come from a fusion of reliability engineering academics and experience.

INTRODUCTION 5

Page 10: Cover - download.e-bookshelf.de · Figure 2.8 UPSR ring network topology, normal operation Figure 2.9 UPSR ring network topology, fiber path failure Figure 2.10 UPSR ring network

Figure 317 Single-hop long-haul microwave network block diagramFigure 318 Single-hop long-haul microwave radio system model rule setFigure 319 Single-hop long-haul microwave radio system availabilityFigure 320 Single-hop long-haul microwave radio downtime distributionFigure 321 Three-hop long-haul microwave availability analysisFigure 322 Short-haul microwave fiber optic ring network restoral pathFigure 323 Short-haul microwave cellular network backhaul

applicationFigure 324 Short-haul microwave urban structure applicationFigure 325 Short-haul cellular backhaul microwave radioFigure 326 Unlicensed short-haul commercial service microwave radioFigure 327 Short-haul microwave availability for redundant and

single-thread designs at varying MTTR valuesFigure 328 Point-to-point versus local area network topology

failure modesFigure 329 Generic local area microwave network elementsFigure 330 Local area wireless network heat map coverage regionFigure 331 Wi-Fi access point functional block diagramFigure 332 Radio design types integrated versus split (ODUIDU)Figure 333 Sample Wi-Fi local area wireless network diagram

Chapter 4 Satellite Networks

Figure 41 Satellite earth station multipath condition sketchFigure 42 Generalized satellite earth station equipment complementFigure 43 Remote VSAT signal chain block diagramFigure 44 VSAT station reliability block diagramFigure 45 C-band satellite earth station constructed in Nome AlaskaFigure 46 Typical earth station RF chain block diagramFigure 47 Nonredundant earth station reliability block diagramFigure 48 Fully redundant earth station system block diagramFigure 49 One-for-two redundant Markov failure state transition diagramFigure 410 Modular satellite power amplifier system block diagramFigure 411 Modular SSPA MTTR distribution modelFigure 412 Modular SSPA system availability for three-out-of-four

configurationFigure 413 Modular SSPA system availability for seven-out-of-eight

configurationFigure 414 In-orbit spare satellite diagramFigure 415 Satellite capacity restoral by in-orbit spare moveFigure 416 Satellite capacity restoral by ground station repointingFigure 417 Hubremote satellite network topologyFigure 418 Ku-band hubremote VSAT network block diagramFigure 419 Ku-band VSAT hub station block diagramFigure 420 Bidirectional point-to-point satellite network block diagram

LIST OF ILLUSTRATIONS ix

Chapter 5 Mobile Wireless Networks

Figure 51 GSM network block diagramFigure 52 Distributed MSC network block diagramFigure 53 Distributed MSC failure scenario and service continuityFigure 54 Base station subsystem block diagramFigure 55 Mobile wireless base station TRX configurationFigure 56 Markov chain state transition diagram for BTS TRX

modulesFigure 57 Base station overlap and probability of coverage by

multiple stationsFigure 58 Network switching subsystem packet switching redundancyFigure 59 Example GSM cellular wireless network

Chapter 6 Telecommunications Facilities

Figure 61 Primary power system redundancy configurationsFigure 62 Weibull distribution fit to transformer TTF and downtime

empirical dataFigure 63 Single-thread generator system block diagramFigure 64 Single-thread generator TTF and TTR for a village

environmentFigure 65 Single-thread generator system availabilityFigure 66 Cold-standby redundant generator system block diagramFigure 67 Cold-standby redundant generator system availabilityFigure 68 Load-sharing generator system block diagramFigure 69 Load-sharing generator system relaxed TTR modelFigure 610 Load-sharing generator system availabilityFigure 611 Modular rectifier system block diagramFigure 612 1N and soft-fail rectifier design descriptionsFigure 613 Soft-fail rectifier system availability distributionFigure 614 48 VDC battery plant block diagramFigure 615 Normal distributed TTR with mfrac14 12 h and sfrac14 3 hFigure 616 Availability performance versus battery capacity for

single-thread and cold-standby generator systemsFigure 617 Fiberglass communications shelter dimensionsFigure 618 Room air temperature increase rate for two AC scenarios

Chapter 7 Software and Firmware

Figure 71 Sample hardware and software failure rate versus time curvecomparison

Figure 72 Software reliability improvement failure rate functionFigure 73 Software feature addition and upgrade failure rate functionFigure 74 Aggregate software failure rate trajectory for reliability

improvement and feature addition

x LIST OF ILLUSTRATIONS

Figure 75 Component block diagram consisting of hardware and softwareFigure 76 Discrete hardware and software component reliability functionsFigure 77 Total component reliability function for hardware and softwareFigure 78 Sample software TTR distributionFigure 79 Software and hardware component availability distributionsFigure 710 Combined component availability including software and

hardware components

LIST OF ILLUSTRATIONS xi

PREFACE

The topic of reliability is somewhat obscure within the field of electrical (andultimately communications) engineering Most engineers are familiar with theconcept of reliability as it relates to their automobile electronic device or homebut performing a rigorous mathematical analysis is not always a comfortable orfamiliar task The quantitative treatment of reliability has a long-standing traditionwithin the field of telecommunications dating back to the early days of BellLaboratories

Modern society has developed an insatiable dependence on communicationtechnology that demands a complete understanding and analysis of system reliabilityAlthough the technical innovations developed in modern communications areastonishing engineering marvels the reliability analysis of these systems can some-times be treated as a cursory afterthought Even in cases where analysis of systemreliability and availability performance is treated with the highest concern thesophistication of analysis techniques is frequently lagging behind the technicaldevelopment itself

The content in this book is a compilation of years of research and analysis of manydifferent telecommunications systems During the compilation of this research twoprimary points became evident to me First most communications engineers understandthe need for reliability and availability analysis but lack the technical skill andknowledge to execute these analyses confidently Second modern communicationsnetwork topologies demand an approach to analysis that goes beyond the traditionalreliability block diagram and exponential distribution assumptions Modern computingplatforms enable engineers to exploit analysis techniques not possible in the days whenthe Bell Laboratoriesrsquo techniques were developed and presented This book presentstechniques that utilize computer simulation and random variable models not feasible20 years ago I hope that readers of this book find within it a useful resource that I foundabsent in the academic literatures during my research and analysis of communicationssystem reliability Although compilation of the data in this book took me years it is mydesire to convey this information to the reader in a matter of hours enabling engineers toanalyze complex problems using basic tools and theories

I would like to thank Tom Plevyak and Veli Sahin for their editing and review of this book. Their help in producing this book has been instrumental to its completion and quality.


I would also like to thank Gene Strid for his contributions to my career and to the development of this book. His mentoring spirit and attention to detail have had a significant influence on my personal development as a professional engineer. Gene's technical review of this book alone is impressive in its detail and breadth. Thank you, Gene, for everything you have done to help me remain inspired to grow and learn as an engineer and a leader.


ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporation, headquartered in Anchorage, Alaska. Mark has a broad range of telecommunications experience, including work in fiber optics, microwave radio, and satellite network designs. Mark holds a BS degree in Mathematics from the University of Alaska Anchorage and an MS degree in Electrical Engineering from the University of Alaska Fairbanks, Fairbanks, Alaska. He is a registered Professional Electrical Engineer in the State of Alaska and a Senior Member of the IEEE. Mark teaches a variety of courses as an Adjunct Faculty Member in the Engineering Department at the University of Alaska Anchorage. His primary interests are systems design, modeling, and optimization.


ACRONYM LIST

AC Alternating current
ACM Adaptive coding and modulation
AGM Absorbed glass mat
AP Access point
AuC Authentication center
BLSR Bidirectional line switched ring
BSC Base station controller
BTS Base transceiver station
BTU British thermal unit
BUC Block upconverter
CDF Cumulative distribution function
CDMA Code division multiple access
COTS Commercial off-the-shelf
CPE Customer premises equipment
CRAC Computer room air conditioner
DC Direct current
EDFA Erbium-doped fiber amplifier
EIR Equipment identity register
EIRP Equivalent isotropic radiated power
FCC Federal Communications Commission
FITs Failures in time
FMEA Failure mode and effects analysis
FPGA Field-programmable gate array
FSS Fixed satellite system
GSM Global system for mobile communications
HLR Home location register
HVAC Heating, ventilation, and air conditioning
IDU Indoor unit
IEEE Institute of Electrical and Electronics Engineers
ISM Industrial, scientific, and medical
ITU International Telecommunication Union
LHS Left-hand side
LNA Low-noise amplifier
LNB Low-noise block
LTE Line-terminating equipment


LTE Long-term evolution
MDT Mean downtime
MODEM Modulator/demodulator
MSC Mobile switching center
MTBF Mean time between failures
MTTF Mean time to failure
MTTR Mean time to repair
NASA National Aeronautics and Space Administration
NSS Network switching subsystem
OC-n Optical carrier level n
ODU Outdoor unit
PDF Probability density function
PFE Power feed equipment
PM Preventative maintenance
RBD Reliability block diagram
RF Radio frequency
RHS Right-hand side
RMA Return material authorization
RSL Received signal level
SDH Synchronous digital hierarchy
SES Severely errored second
SLA Service-level agreement
SLTE Submarine line-terminating equipment
SMS Short message service
SONET Synchronous optical network
SP Service provider
SRGM Software reliability growth model
SSPA Solid-state power amplifier
TDM Time-division multiplexing
TRX Transceiver
TTF Time to failure
TTR Time to repair
UMTS Universal mobile telecommunications system
UPS Uninterruptible power supply
UPSR Unidirectional path switched ring
VLR Visitor location register
VLSI Very large-scale integration
VRLA Valve-regulated lead acid
VSAT Very-small-aperture terminal
WiFi Wireless fidelity
XPIC Cross-polarization interference cancellation


INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and impacts our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically equates reliability with emotion and perception. In many cases, even technically minded people do not have a clear quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production resulted in an increased awareness of item failure and performance and its impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further identified the need to quantify reliability and its impact on mission success. During the Second World War, vacuum tubes were used extensively in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering: Theory and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s continued the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. The launch of commercial communications satellites by INTELSAT and the Moon landing by the United States at the end of the 1960s proved the importance of reliability engineering as part of the system engineering process. Semiconductor development, military applications, communications systems, biomedical research, and software-based systems in the 1980s led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today's society. Most consumers are unconsciously aware of reliability as a measure of an item's performance and overall value. Engineers and technical staff are aware of an item's reliability in a more quantitative sense, but many times this understanding is neither complete nor grounded in sound reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be grounded in sound theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures are used to guide the design process and ensure that the system design complies with system performance targets. Modern predictive reliability analysis utilizes statistical modeling of component failures. These statistical models are used to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows engineers to perform cost/performance trade-off analyses. These analyses can then be used to make business and technical decisions about the best design that meets target requirements.
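The following minimal Python sketch illustrates that trade-off loop under the classical reliability block diagram assumptions of independent failures and steady-state availability, A = MTTF/(MTTF + MTTR). All MTTF and MTTR figures, the availability target, and the function names are hypothetical; a real analysis would substitute measured or vendor component data.

```python
import math


def unit_availability(mttf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one repairable unit: MTTF / (MTTF + MTTR)."""
    return mttf_h / (mttf_h + mttr_h)


def series(*avail: float) -> float:
    """Every block must be up; failures are assumed independent."""
    return math.prod(avail)


def parallel(*avail: float) -> float:
    """At least one block must be up; failures are assumed independent."""
    return 1.0 - math.prod(1.0 - a for a in avail)


def smallest_redundancy(a_unit: float, target: float, max_units: int = 4):
    """Fewest identical parallel units meeting the target, as (n, availability), else None."""
    for n in range(1, max_units + 1):
        a = parallel(*([a_unit] * n))
        if a >= target:
            return n, a
    return None


# Hypothetical component data, for illustration only.
radio = unit_availability(mttf_h=50_000.0, mttr_h=8.0)
power = unit_availability(mttf_h=100_000.0, mttr_h=4.0)

print(f"single radio + power:    {series(radio, power):.7f}")
print(f"redundant radio + power: {series(parallel(radio, radio), power):.7f}")
print(smallest_redundancy(radio, target=0.99999))
```

With these inputs, duplicating the radio meets a 0.99999 target for the radio block alone, but the site as a whole stays near 0.99996 because the single power block remains in series. That is the kind of result that only surfaces when the topology is reanalyzed. `smallest_redundancy` deliberately stops at the first configuration that meets the target rather than maximizing availability.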

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or the operator to measure system performance and compare it with expected or predicted data. Empirical data collection is particularly important in large production environments where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be used directly to predict the performance of a new system. One must be careful when using empirical data for predictive analysis, however, because it is rare to find an existing system that exactly matches a new design.
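A minimal sketch of that comparison follows, assuming a hypothetical field log of (time to failure, time to repair) pairs in hours for one unit type and a hypothetical predicted availability from the design-stage model. Three observations are far too few for statistical confidence; the numbers only illustrate the bookkeeping.

```python
# Hypothetical field log for one deployed unit type:
# (time to failure, time to repair) pairs, in hours.
observations = [(4200.0, 6.0), (6100.0, 10.0), (5300.0, 8.0)]


def empirical_summary(obs):
    """Point estimates of MTTF, MTTR, and observed availability from (TTF, TTR) pairs."""
    if not obs:
        raise ValueError("need at least one (TTF, TTR) observation")
    uptime = sum(ttf for ttf, _ in obs)
    downtime = sum(ttr for _, ttr in obs)
    n = len(obs)
    return uptime / n, downtime / n, uptime / (uptime + downtime)


mttf, mttr, a_obs = empirical_summary(observations)
a_pred = 0.999  # hypothetical availability predicted by the design-stage model

print(f"MTTF {mttf:.0f} h, MTTR {mttr:.1f} h")
print(f"observed A {a_obs:.5f} vs predicted A {a_pred:.5f}")
if a_obs < a_pred:
    print("observed performance falls short of prediction; revisit model assumptions")
```

Here the observed availability (about 0.9985) falls short of the hypothetical prediction of 0.999. A discrepancy like this would prompt a review of the model's failure and repair assumptions.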


One of the most significant benefits of empirical analysis and data collection is failure mode and effects analysis (FMEA). This analysis approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and indicates a system's "uptime," or the time available for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) sign contracts to provide a service that has a specific target probability of being available or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, the SP and the customer take on risk in signing the contract. This risk is either realized risk (the party is aware of the risk, whether or not it is quantified) or unrealized risk (the party is taking risk without being aware of it). Decisions made while assuming unrealized risk can jeopardize a business. Reliability engineering of telecommunications systems serves to reduce overall risk in both realized and unrealized cases.
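The two forms of an availability target are interchangeable. The following minimal sketch converts an availability target into the maximum annual downtime it permits and back. It assumes a 365-day (8,760-hour) year, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24          # assumption: 8,760-hour (non-leap) year
MINUTES_PER_YEAR = HOURS_PER_YEAR * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum annual downtime, in minutes, permitted by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must lie in [0, 1]")
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from_downtime(minutes_per_year: float) -> float:
    """Availability implied by a maximum annual downtime budget, in minutes."""
    if not 0.0 <= minutes_per_year <= MINUTES_PER_YEAR:
        raise ValueError("downtime must lie between zero and one full year")
    return 1.0 - minutes_per_year / MINUTES_PER_YEAR


for target in (0.999, 0.9999, 0.99999):
    print(f"A = {target}: at most {downtime_minutes_per_year(target):.1f} min/yr")
```

Targets of 0.999, 0.9999, and 0.99999 correspond to roughly 526, 53, and 5.3 minutes of downtime per year, which is why each additional "nine" in an SLA is a significant commitment.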

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allow managers and technical staff to design systems that achieve the required targets with minimum cost and maximum performance.

Analysis of telecommunications systems requires specialized application of reliability engineering theory and principles. Performance expectations within the field of telecommunications range from high to extreme. Rarely do consumers of telecommunications expect less than highly available systems. This is true even of consumer services such as cable television, consumer Internet, and local telephone service. Commercial service expectations are typically higher than those in a consumer environment, because the impact on a business may be significant and costly if its telecommunications services are critical to its operations, delivery of service, and ability to generate revenue. Performing detailed analyses of systems, both consumer and commercial, allows risks to be managed and costs to be controlled. These analyses allow the designer to produce a system that is carefully crafted to just meet the requirements of the customer, rather than greatly exceed them or completely miss the target. In the case of operational systems, knowledge of the achievable system performance and its maintainability allows operators to understand whether their achieved performance is within specification and to optimize maintenance and repair efforts.

This book is written with the goal of providing the reader with the knowledge and skills necessary to perform telecommunications system reliability analysis and to examine system designs with a critical eye. Telecommunications service providers frequently serve customers who know what they would like to purchase, whether it is wireless or terrestrial, packet or TDM. It is far less frequent that the customer understands how to specify system availability or reliability. Knowledge of the theory and practice of reliability engineering allows service providers and engineers to educate their customers regarding this important metric of network performance. Even if the reader does not perform firsthand reliability analysis, the knowledge gained by studying both the theory and the practice of reliability engineering allows the individual to make better-informed decisions about the design and operation of telecommunications systems or the purchase of telecommunications services. The truly pervasive nature of reliability as a metric in telecommunications systems requires engineers, managers, and executives to have extensive knowledge of system topologies, costs, and performance. In many cases, these system details are obtained through experience and practice. The author of this book would argue that experience without academic study, particularly in the field of reliability engineering, results in decisions that at times invoke unrealized, serious business risk.

The reader is expected to have a basic working knowledge of engineering mathematics. A college-level course in probability and statistics is of particular value to the reader. This book relies extensively on the application and use of statistical distributions and probability models. Experience with telecommunications system design and network topologies is valuable in understanding the trade-offs involved with different reliability analyses. Lastly, if the reader is interested in developing his or her own reliability models, knowledge of MATLAB and computer programming methods is of value. All of the topics presented in this book are intended to provide sufficient depth to enable the reader either to work with them directly or to conduct minimal further research in order to obtain a complete understanding of a topic.

The previous paragraph should allow readers to identify themselves as members of a specific group. These groups can generally be classified as engineers, managers, or executives. Engineers can use this book as a complete technical resource for building and analyzing system models. The engineer who uses this book will have the ability to develop complex, detailed statistical models of telecommunications systems that produce a variety of system metrics for business, design, and other technical decisions. Managers reading this book will derive value from the knowledge obtained about proper reliable system design, contract implications, and operational impacts. Executive readers will find value in the high-level knowledge obtained about design best practices and proper expectations for system performance.

This book is logically organized to provide two distinct sets of information: theory and applications. Chapter 1 introduces and develops the concepts and accepted theories required for system reliability analysis. This includes discussions of probability and statistics, system reliability theory, and system modeling. The remaining chapters of this book are organized by technology subject matter. Chapter 2 discusses fiber-optic networks. Both terrestrial and submarine networks are discussed, with the subtleties of each presented in detail. Chapter 3 presents reliability analysis approaches for terrestrial microwave systems. The discussion includes short-haul point-to-point, long-haul point-to-point, cellular wireless, and WiFi networks. Satellite communications networks are discussed in Chapter 4. Both teleport and VSAT network topologies are discussed, along with propagation availability calculation techniques. Chapter 5 addresses reliability concerns for mobile wireless (cellular) systems. In Chapter 6, the often underanalyzed topics of power systems and heating, ventilation, and air conditioning systems related to communications networks are analyzed. The final chapter (Chapter 7) introduces software and firmware as they relate to telecommunications system reliability. Each section presents the analysis in terms of two discrete parts: the communications equipment and the communications channel. The goal of this book is to provide the reader with sufficient knowledge to abstract the concepts presented and apply them to their own problems.

The ability to blend academic theory and practical application is a rare commodity in the field of engineering. Few practicing engineers have the ability to apply abstract theory to real problems, while even fewer academics have the practical experience to understand the engineering of "real" systems. Telecommunications reliability engineering necessitates the blend of abstract statistical theory and practical engineering experience. Fortunately, in the case of reliability engineering, this blend is easily understood when the information required is presented in a logical, organized format. The use of predictive and/or numerical models in the design of telecommunications systems brings great value to system designs. Acceptance of these models requires the engineer, manager, and executive to have enough confidence in the model's results that significant decisions can be based on them. The ability to place that level of confidence in a model can only come from a fusion of reliability engineering academics and experience.



Chapter 5 Mobile Wireless Networks

Figure 5.1 GSM network block diagram
Figure 5.2 Distributed MSC network block diagram
Figure 5.3 Distributed MSC failure scenario and service continuity
Figure 5.4 Base station subsystem block diagram
Figure 5.5 Mobile wireless base station TRX configuration
Figure 5.6 Markov chain state transition diagram for BTS TRX modules
Figure 5.7 Base station overlap and probability of coverage by multiple stations
Figure 5.8 Network switching subsystem packet switching redundancy
Figure 5.9 Example GSM cellular wireless network

Chapter 6 Telecommunications Facilities

Figure 6.1 Primary power system redundancy configurations
Figure 6.2 Weibull distribution fit to transformer TTF and downtime empirical data
Figure 6.3 Single-thread generator system block diagram
Figure 6.4 Single-thread generator TTF and TTR for a village environment
Figure 6.5 Single-thread generator system availability
Figure 6.6 Cold-standby redundant generator system block diagram
Figure 6.7 Cold-standby redundant generator system availability
Figure 6.8 Load-sharing generator system block diagram
Figure 6.9 Load-sharing generator system relaxed TTR model
Figure 6.10 Load-sharing generator system availability
Figure 6.11 Modular rectifier system block diagram
Figure 6.12 1:N and soft-fail rectifier design descriptions
Figure 6.13 Soft-fail rectifier system availability distribution
Figure 6.14 48 VDC battery plant block diagram
Figure 6.15 Normally distributed TTR with μ = 12 h and σ = 3 h
Figure 6.16 Availability performance versus battery capacity for single-thread and cold-standby generator systems
Figure 6.17 Fiberglass communications shelter dimensions
Figure 6.18 Room air temperature increase rate for two AC scenarios

Chapter 7 Software and Firmware

Figure 7.1 Sample hardware and software failure rate versus time curve comparison
Figure 7.2 Software reliability improvement failure rate function
Figure 7.3 Software feature addition and upgrade failure rate function
Figure 7.4 Aggregate software failure rate trajectory for reliability improvement and feature addition


Figure 7.5 Component block diagram consisting of hardware and software
Figure 7.6 Discrete hardware and software component reliability functions
Figure 7.7 Total component reliability function for hardware and software
Figure 7.8 Sample software TTR distribution
Figure 7.9 Software and hardware component availability distributions
Figure 7.10 Combined component availability including software and hardware components


PREFACE

The topic of reliability is somewhat obscure within the field of electrical (andultimately communications) engineering Most engineers are familiar with theconcept of reliability as it relates to their automobile electronic device or homebut performing a rigorous mathematical analysis is not always a comfortable orfamiliar task The quantitative treatment of reliability has a long-standing traditionwithin the field of telecommunications dating back to the early days of BellLaboratories

Modern society has developed an insatiable dependence on communicationtechnology that demands a complete understanding and analysis of system reliabilityAlthough the technical innovations developed in modern communications areastonishing engineering marvels the reliability analysis of these systems can some-times be treated as a cursory afterthought Even in cases where analysis of systemreliability and availability performance is treated with the highest concern thesophistication of analysis techniques is frequently lagging behind the technicaldevelopment itself

The content in this book is a compilation of years of research and analysis of manydifferent telecommunications systems During the compilation of this research twoprimary points became evident to me First most communications engineers understandthe need for reliability and availability analysis but lack the technical skill andknowledge to execute these analyses confidently Second modern communicationsnetwork topologies demand an approach to analysis that goes beyond the traditionalreliability block diagram and exponential distribution assumptions Modern computingplatforms enable engineers to exploit analysis techniques not possible in the days whenthe Bell Laboratoriesrsquo techniques were developed and presented This book presentstechniques that utilize computer simulation and random variable models not feasible20 years ago I hope that readers of this book find within it a useful resource that I foundabsent in the academic literatures during my research and analysis of communicationssystem reliability Although compilation of the data in this book took me years it is mydesire to convey this information to the reader in a matter of hours enabling engineers toanalyze complex problems using basic tools and theories

I would like to thank Tom Plevyak and Veli Sahin for their editing and review of thisbook Their help in producing this book has been instrumental to its completion andquality

xiii

I would also like to thank Gene Strid for his contributions to my career and to thedevelopment of this book His mentoring spirit and attention to detail have had asignificant influence on my personal development as a professional engineer Genersquostechnical review of this book alone is impressive in its detail and breadth Thank youGene for everything you have done to help me remain inspired to grow and learn as anengineer and a leader

xiv PREFACE

ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporationheadquartered in Anchorage Alaska Mark has a broad range of telecommunicationsexperience including work in fiber optics microwave radio and satellite networkdesigns Mark holds a BS degree in Mathematics from the University of AlaskaAnchorage and an MS degree in Electrical Engineering from the University of AlaskaFairbanks Fairbanks Alaska He is a registered Professional Electrical Engineer in theState of Alaska and a Senior Member of the IEEE Mark teaches a variety of courses asan Adjunct Faculty Member in the Engineering Department at the University of AlaskaAnchorage His primary interests are systems design modeling and optimization

xv

ACRONYM LIST

AC Alternating currentACM Adaptive coding and modulationAGM Absorbed glass matAP Access pointAuC Authentication centerBLSR Bidirectional line switched ringBSC Base station controllerBTS Base transceiver stationBTU British thermal unitBUC Block upconverterCDF Cumulative distribution functionCDMA Code division multiple accessCOTS Commercial off the shelfCPE Customer premise equipmentCRAC Computer room air conditionerDC Direct currentEDFA Erbium-doped fiber amplifierEIR Equipment identity registerEIRP Equivalent isotropic radiated powerFCC Federal Communications CommissionFITs Failures in timeFMEA Failure mode and effects analysisFPGA Field-programmable gate arrayFSS Fixed satellite systemGSM Global system for mobile communicationsHLR Home location registerHVAC Heating ventilation and air conditioningIDU Indoor unitIEEE Institute for Electrical and Electronics EngineersISM Industrial scientific and medicalITU International Telecommunications UnionLHS Lefthand sideLNA Low-noise amplifierLNB Low-noise blockLTE Line-terminating equipment

xvii

LTE Long-term evolutionMDT Mean downtimeMODEM ModulatordemodulatorMSC Mobile switching centerMTBF Mean time between failuresMTTF Mean time to failureMTTR Mean time to repairNASA National Air and Space AdministrationNSS Network switching subsystemOC-n Optical carrier level nODU Outdoor unitPDF Probability density functionPFE Power feed equipmentPM Preventative maintenanceRBD Reliability block diagramRF Radio frequencyRHS Righthand sideRMA Return material authorizationRSL Received signal levelSDH Synchronous digital hierarchySES Severely error secondSLA Service-level agreementSLTE Submarine line-terminating equipmentSMS Short message serviceSONET Synchronous optical networkSP Service providerSRGM Software reliability growth modelSSPA Solid state power amplifierTDM Time domain multiplexingTRX TransceiverTTF Time to failureTTR Time to repairUMTS Universal mobile telecommunications systemUPS Uninterruptable power supplyUPSR Unidirectional path switched ringVLR Visitor location registerVLSI Very large-scale integrationVRLA Valve-regulated lead acidVSAT Very-small-aperture terminalWiFi Wireless fidelityXPIC Cross-polarization interference cancellation

xviii ACRONYM LIST

INTRODUCTION

The concept of reliability is pervasive It affects our attitudes and impacts our decisionson a daily basis Its importance would imply that everyone has a clear understanding ofreliability from a technical perspective Unfortunately the general public typicallyequates emotion and perception with reliability In many cases even technically mindedpeople do not have a clear quantitative understanding of reliability as a measure ofperformance

Reliability engineering is a relatively new field Although the term reliability has along history it was not until the twentieth century that reliability began to take on aquantitative meaning In the early twentieth century the concept of reliabilityengineering began to take form as the industrial revolution brought about mechanicaland electronic systems such as the automobile and the telegraph Large-scaleproduction resulted in an increased awareness of item failure and performance andits impact on business During the 1930s Wallodie Weibull began documenting hiswork on the measurement and definition of material fatigue behavior The result of hiswork is theWeibull distribution one of the most widely used statistical distributions inreliability engineering The Second World War brought about the formalization ofreliability engineering as a field of study The advent of radar and other electronic

1

Telecommunications System Reliability Engineering Theory and Practice Mark L Ayers 2012 by the Institute of Electrical and Electronics Engineers Inc Published 2012 by John Wiley amp Sons Inc

warfare systems identified further the need to begin quantifying reliability and itsimpacts on mission success During the Second World War vacuum tubes wereextensively used in many electronic systems The low reliability of early vacuum tubesled to both poor system performance and high maintenance costs The IEEE Reli-ability Society was formed in 1948 as a result of the increasing focus on reliability inelectronic systems

Following the SecondWorldWar reliability engineering began to find applicationsin both military and commercial environments System reliability was studied from alife-cycle standpoint including component design quality control and failure analysisSpace exploration in the 1960s continued the need for a life-cycle approach to reliabilityengineering The establishment of NASA and an interest in nuclear power generationbecame driving forces for the development of highly reliable components and systemsLaunching commercial communications satellites by INTELSAT and landing onmoon by the United States proved the importance of reliability engineering aspart of the system engineering process at the end of the 1960s Semiconductordevelopment military applications communications systems biomedical researchand software-based systems in the 1980s led to new work in both system designand reliability analysis Improved component design and quality control led tosignificant improvements in reliability performance Consumer awareness and com-mercial focus in the 1990s and 2000s led to the current state of reliability engineering intodayrsquos society Most consumers are unconsciously aware of reliability as a measure ofan itemrsquos performance and overall value Engineers and technical resources are aware ofan itemrsquos reliability in a more quantitative sense but many times this understanding isneither complete nor found in solid reliability engineering principles

The presentation of reliability data whether qualitative or quantitative must bebased in solid theory In many cases reliability data is used to make business andtechnical decisions with far-reaching implications Predictive analysis is typically thefirst step in the reliability engineering process Target performance measures are used toguide the design process and ensure that system design is compliant with systemperformance targets Modern predictive reliability analysis utilizes statistical modelingof component failures These statistical models are used to predict a number of expectedsystem performance measures Changing the system topology or design and reanalyzingsystem performance allows engineering to do costperformance trade-off analyses Theanalyses can then be used to make business and technical decisions about the best designthat meets target requirements

Once a design has been selected and constructed it is important to collect empiricaldata This data allows the engineer or the operator to measure system performance andcompare that performance with expected or predicted data Empirical data collection isparticularly important in large production environments where statistical behavior canbe observed These observations can be tabulated and compared with the predicted orassumed behavior refining the system model and improving future predictions anddecisions In some cases empirical data can be directly used to analyze the predictedperformance of a new system One must be careful when using empirical data forpredictive analysis because it is rare to find an existing system that exactly matches anew design

2 INTRODUCTION

One of the most significant benefits of empirical analysis and data collection isfailure mode and effects analysis (FMEA) This analysis approach allows the engineerto identify systemic problems and design flaws by observing the failure of componentsor systems using this data to improve future performance Operational models andprocesses can be adjusted based on failure data and root cause analysis

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and indicates a system’s “uptime,” or the time it is available for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) sign contracts to provide a service that has a specific target probability of being available, or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, both the SP and the customer take on risk in signing the contract. This risk is either realized risk (the party is aware of the risk, quantified or not) or unrealized risk (the party is taking risk without being aware of it). Decisions made while assuming unrealized risk can jeopardize the business. Reliability engineering of telecommunications systems serves to reduce overall risk in both the realized and unrealized cases.
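Because an SLA may be written either as a target probability of being available or as a maximum downtime over an interval, it helps to convert between the two forms. The following is a minimal Python sketch, assuming that availability is simply the fraction of the interval during which the service is up and that the interval is a 365-day year; both are illustrative assumptions, and the function names are hypothetical.

```python
# Minimal sketch: convert between an availability target and the
# equivalent maximum downtime over a stated interval.
# Assumes availability = (uptime / interval) and a 365-day year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes (non-leap year, assumed)


def max_downtime_minutes(availability: float,
                         interval_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime permitted by an availability target over the interval."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * interval_minutes


def availability_from_downtime(downtime_minutes: float,
                               interval_minutes: float = MINUTES_PER_YEAR) -> float:
    """Availability implied by a downtime budget over the interval."""
    if not 0.0 <= downtime_minutes <= interval_minutes:
        raise ValueError("downtime must lie within the interval")
    return 1.0 - downtime_minutes / interval_minutes


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        print(f"A = {target}: {max_downtime_minutes(target):.2f} min/year")
```

For example, a 99.99% availability target over a 365-day year leaves (1 - 0.9999) × 525,600 ≈ 52.6 minutes of allowable downtime.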

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allow managers and technical resources to design systems that achieve the required targets with minimum cost and maximum performance.

Analysis of telecommunications systems requires specialized application of reliability engineering theory and principles. Performance expectations within the field of telecommunications can range from high to extreme. Rarely do consumers of telecommunications expect less than highly available systems. This is true even of consumer services such as cable television, consumer Internet, and local telephone service. Commercial service expectations are typically higher than those in a consumer environment, because when telecommunications services are critical to a business’s operations, service delivery, and ability to generate revenue, an outage can be significant and costly. Performing detailed analyses of systems, both consumer and commercial, allows risks to be managed and costs to be controlled. These analyses allow the designer to produce a system that is carefully crafted to just meet the requirements of the customer, rather than greatly exceed them or completely miss the target. In the case of operational systems, knowledge of the achievable system performance and its maintainability allows the operator to understand whether the achieved performance is within specification and to optimize maintenance and repair efforts.

This book is written with the goal of providing the reader with the knowledge and skills necessary to perform telecommunications system reliability analysis and to examine system designs with a critical eye. Telecommunications service providers frequently serve customers who know what they would like to purchase, whether it is wireless or terrestrial, packet or TDM. It is far less common for the customer to understand how to specify system availability or reliability. Knowledge of the theory and practice of reliability engineering allows service providers and engineers to educate their customers regarding this important metric of network performance. Even if the reader does not perform reliability analysis firsthand, the knowledge gained by studying both the theory and the practice of reliability engineering allows the individual to make better-informed decisions about the design and operation of telecommunications systems or the purchase of telecommunications services. The truly pervasive nature of reliability as a metric in telecommunications systems requires engineers, managers, and executives to have extensive knowledge of system topologies, costs, and performance. In many cases, these system details are learned through experience and practice. The author of this book would argue that experience without academic study, particularly in the field of reliability engineering, results in decisions that at times carry serious, unrealized business risk.

The reader is expected to have a basic working knowledge of engineering mathematics. A college-level course in probability and statistics is of particular value, because this book relies extensively on the application and use of statistical distributions and probability models. Experience with telecommunications system design and network topologies is valuable in understanding the trade-offs involved in different reliability analyses. Lastly, readers interested in developing their own reliability models will benefit from knowledge of MATLAB and computer programming methods. All of the topics in this book are intended to be presented in sufficient depth to enable the reader either to work with them directly or to obtain a complete understanding of a topic with minimal further research.

The previous paragraph should allow readers to identify themselves as members of a specific group. These groups can generally be classified as engineers, managers, or executives. Engineers can use this book as a complete technical resource for building and analyzing system models. Engineers who use this book will be able to develop complex, detailed statistical models of telecommunications systems that produce a variety of system metrics for business, design, and other technical decisions. Managers reading this book will derive value from the knowledge obtained about proper reliable system design, contract implications, and operational impacts. Executive readers will find value in the high-level knowledge obtained about design best practices and proper expectations for system performance.

This book is logically organized to provide two distinct sets of information: theory and applications. Chapter 1 introduces and develops the concepts and accepted theories required for system reliability analysis, including probability and statistics, system reliability theory, and system modeling. The remaining chapters of this book are organized by technology. Chapter 2 discusses fiber-optic networks; both terrestrial and submarine networks are covered, with the subtleties of each presented in detail. Chapter 3 presents reliability analysis approaches for terrestrial microwave systems, including short-haul point-to-point, long-haul point-to-point, cellular wireless, and WiFi networks. Satellite communications networks are discussed in Chapter 4, where both teleport and VSAT network topologies are covered along with propagation availability calculation techniques. Chapter 5 addresses reliability concerns for mobile wireless (cellular) systems. In Chapter 6, the often underanalyzed topics of power systems and heating, ventilation, and air conditioning (HVAC) systems related to communications networks are analyzed. The final chapter (Chapter 7) introduces software and firmware as they relate to telecommunications system reliability. Each section presents the analysis in two discrete parts: the communications equipment and the communications channel. The goal of this book is to provide the reader with sufficient knowledge to abstract the concepts presented and apply them to their own problem statement.

The ability to blend academic theory and practical application is a rare commodity in the field of engineering. Few practicing engineers have the ability to apply abstract theory to real problems, while even fewer academics have the practical experience to understand the engineering of “real” systems. Telecommunications reliability engineering requires a blend of abstract statistical theory and practical engineering experience. Fortunately, in the case of reliability engineering, this blend is easily understood when the required information is presented in a logical, organized format. The use of predictive and/or numerical models in the design of telecommunications systems brings great value to system designs. Acceptance of these models requires the engineer, manager, and executive to have enough confidence in the model’s results that significant decisions can be based on them. That level of confidence in a model can come only from a fusion of academic study and practical experience in reliability engineering.


Figure 7.5 Component block diagram consisting of hardware and software
Figure 7.6 Discrete hardware and software component reliability functions
Figure 7.7 Total component reliability function for hardware and software
Figure 7.8 Sample software TTR distribution
Figure 7.9 Software and hardware component availability distributions
Figure 7.10 Combined component availability including software and hardware components


PREFACE

The topic of reliability is somewhat obscure within the field of electrical (and ultimately communications) engineering. Most engineers are familiar with the concept of reliability as it relates to their automobile, electronic devices, or home, but performing a rigorous mathematical analysis is not always a comfortable or familiar task. The quantitative treatment of reliability has a long-standing tradition within the field of telecommunications, dating back to the early days of Bell Laboratories.

Modern society has developed an insatiable dependence on communication technology that demands a complete understanding and analysis of system reliability. Although the technical innovations developed in modern communications are astonishing engineering marvels, the reliability analysis of these systems is sometimes treated as a cursory afterthought. Even in cases where system reliability and availability performance is treated with the highest concern, the sophistication of analysis techniques frequently lags behind the technical development itself.

The content in this book is a compilation of years of research and analysis of many different telecommunications systems. During the compilation of this research, two primary points became evident to me. First, most communications engineers understand the need for reliability and availability analysis but lack the technical skill and knowledge to execute these analyses confidently. Second, modern communications network topologies demand an approach to analysis that goes beyond the traditional reliability block diagram and exponential distribution assumptions. Modern computing platforms enable engineers to exploit analysis techniques that were not possible when the Bell Laboratories’ techniques were developed and presented. This book presents techniques that use computer simulation and random variable models that were not feasible 20 years ago. I hope that readers find within this book a useful resource of the kind I found absent in the academic literature during my research and analysis of communications system reliability. Although compiling the data in this book took me years, it is my desire to convey this information to the reader in a matter of hours, enabling engineers to analyze complex problems using basic tools and theories.

I would like to thank Tom Plevyak and Veli Sahin for their editing and review of this book. Their help has been instrumental to its completion and quality.


I would also like to thank Gene Strid for his contributions to my career and to the development of this book. His mentoring spirit and attention to detail have had a significant influence on my personal development as a professional engineer. Gene’s technical review of this book alone is impressive in its detail and breadth. Thank you, Gene, for everything you have done to help me remain inspired to grow and learn as an engineer and a leader.


ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporation, headquartered in Anchorage, Alaska. Mark has a broad range of telecommunications experience, including work in fiber optics, microwave radio, and satellite network designs. Mark holds a B.S. degree in Mathematics from the University of Alaska Anchorage and an M.S. degree in Electrical Engineering from the University of Alaska Fairbanks, Fairbanks, Alaska. He is a registered Professional Electrical Engineer in the State of Alaska and a Senior Member of the IEEE. Mark teaches a variety of courses as an Adjunct Faculty Member in the Engineering Department at the University of Alaska Anchorage. His primary interests are systems design, modeling, and optimization.


ACRONYM LIST

AC Alternating current
ACM Adaptive coding and modulation
AGM Absorbed glass mat
AP Access point
AuC Authentication center
BLSR Bidirectional line switched ring
BSC Base station controller
BTS Base transceiver station
BTU British thermal unit
BUC Block upconverter
CDF Cumulative distribution function
CDMA Code division multiple access
COTS Commercial off-the-shelf
CPE Customer premises equipment
CRAC Computer room air conditioner
DC Direct current
EDFA Erbium-doped fiber amplifier
EIR Equipment identity register
EIRP Equivalent isotropic radiated power
FCC Federal Communications Commission
FITs Failures in time
FMEA Failure mode and effects analysis
FPGA Field-programmable gate array
FSS Fixed satellite service
GSM Global system for mobile communications
HLR Home location register
HVAC Heating, ventilation, and air conditioning
IDU Indoor unit
IEEE Institute of Electrical and Electronics Engineers
ISM Industrial, scientific, and medical
ITU International Telecommunication Union
LHS Left-hand side
LNA Low-noise amplifier
LNB Low-noise block
LTE Line-terminating equipment
LTE Long-term evolution
MDT Mean downtime
MODEM Modulator/demodulator
MSC Mobile switching center
MTBF Mean time between failures
MTTF Mean time to failure
MTTR Mean time to repair
NASA National Aeronautics and Space Administration
NSS Network switching subsystem
OC-n Optical carrier level n
ODU Outdoor unit
PDF Probability density function
PFE Power feed equipment
PM Preventative maintenance
RBD Reliability block diagram
RF Radio frequency
RHS Right-hand side
RMA Return material authorization
RSL Received signal level
SDH Synchronous digital hierarchy
SES Severely errored second
SLA Service-level agreement
SLTE Submarine line-terminating equipment
SMS Short message service
SONET Synchronous optical network
SP Service provider
SRGM Software reliability growth model
SSPA Solid-state power amplifier
TDM Time division multiplexing
TRX Transceiver
TTF Time to failure
TTR Time to repair
UMTS Universal mobile telecommunications system
UPS Uninterruptible power supply
UPSR Unidirectional path switched ring
VLR Visitor location register
VLSI Very large-scale integration
VRLA Valve-regulated lead acid
VSAT Very-small-aperture terminal
WiFi Wireless fidelity
XPIC Cross-polarization interference cancellation

INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and influences our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically equates reliability with emotion and perception. In many cases, even technically minded people do not have a clear quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production resulted in an increased awareness of item failure and performance and its impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further identified the need to quantify reliability and its impact on mission success. During the Second World War, vacuum tubes were used extensively in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering, Theory, and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s continued the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. The launch of commercial communications satellites by INTELSAT and the Moon landing by the United States at the end of the 1960s demonstrated the importance of reliability engineering as part of the systems engineering process. Semiconductor development, military applications, communications systems, biomedical research, and software-based systems in the 1980s led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today’s society. Most consumers are unconsciously aware of reliability as a measure of an item’s performance and overall value. Engineers and technical resources are aware of an item’s reliability in a more quantitative sense, but often this understanding is neither complete nor founded on solid reliability engineering principles.

The presentation of reliability data whether qualitative or quantitative must bebased in solid theory In many cases reliability data is used to make business andtechnical decisions with far-reaching implications Predictive analysis is typically thefirst step in the reliability engineering process Target performance measures are used toguide the design process and ensure that system design is compliant with systemperformance targets Modern predictive reliability analysis utilizes statistical modelingof component failures These statistical models are used to predict a number of expectedsystem performance measures Changing the system topology or design and reanalyzingsystem performance allows engineering to do costperformance trade-off analyses Theanalyses can then be used to make business and technical decisions about the best designthat meets target requirements

Once a design has been selected and constructed it is important to collect empiricaldata This data allows the engineer or the operator to measure system performance andcompare that performance with expected or predicted data Empirical data collection isparticularly important in large production environments where statistical behavior canbe observed These observations can be tabulated and compared with the predicted orassumed behavior refining the system model and improving future predictions anddecisions In some cases empirical data can be directly used to analyze the predictedperformance of a new system One must be careful when using empirical data forpredictive analysis because it is rare to find an existing system that exactly matches anew design

2 INTRODUCTION

One of the most significant benefits of empirical analysis and data collection isfailure mode and effects analysis (FMEA) This analysis approach allows the engineerto identify systemic problems and design flaws by observing the failure of componentsor systems using this data to improve future performance Operational models andprocesses can be adjusted based on failure data and root cause analysis

Telecommunications systems have a long history of reliability-based design Thesedesign criteria are typically specified in terms of availability rather than reliabilityAvailability is another measure of statistical system performance and is indicative of asystemrsquos ldquouptimerdquo or available time for service delivery In many cases servicecontracts or service-level agreements (SLAs) are specified in terms of availabilityService providers (SPs) will sign a contract to provide a service that has specific targetprobability of being available or a target maximum downtime over a specific timeinterval Both of these measures are metrics of availability Without predictive andorempirical data to ensure compliancewith these targets the SP and the customer will takerisk in signing the contract This risk is sometimes realized risk (the party is aware of therisk quantified or not) or unrealized risk (the party is taking risk and is not aware thatthey are in jeopardy) Decisions made while assuming unrealized risk can jeopardizebusiness Reliability engineering of systems in telecommunications serves to reduceoverall risk in both realized and unrealized cases

Conducting business in the field of telecommunications always involves makingdecisions with financial implications Telecommunications contracts are often writtenaround SLAs in which a performance target is specified SPs must ensure that theirservice can achieve the required performance while customers must maintain realisticexpectations from the service requested Without access to a quantitative reliabilityanalysis these financial decisions are based on assumptions at best and perception atworst Rigorous reliability engineering and analysis of telecommunications systemsallows managers and technical resources to design systems that achieve the requiredtargets with minimum cost and maximum performance

Analysis of telecommunications systems requires specialized application of reliabil-ity engineering theory and principles Performance expectations within the field oftelecommunications can range from high to extreme Rarely do consumers of tele-communications expect less than highly available systems This is true even of consumerservices such as cable television consumer Internet and local telephone serviceCommercial service expectations are typically higher than those in a consumer environ-ment because the impact on the business may be significant and costly if their tele-communications services are critical to their operations delivery of service and ability togenerate revenues Performing detailed analyses of systems both consumer and com-mercial allow risks to be managed and costs to be controlled These analyses allow thedesigner to produce a system that is carefully crafted to just meet the requirements of thecustomer rather than greatly exceed them or completely miss the target In the case ofoperational systems knowledge of the achievable system performance and its maintain-ability allows the operator to understand whether their achieved performance is withinspecification and to optimize maintenance and repair efforts

This book is written with the goal of providing the reader with the knowledge andskills necessary to perform telecommunications system reliability analysis and to

INTRODUCTION 3

examine system designs with a critical eye Telecommunications service providersfrequently provide service to customers who know what they would like to purchasewhether it is wireless or terrestrial packet or TDM It is far less frequent that thecustomer understands how to specify system availability or reliability Knowledge of thetheory and practice of reliability engineering allows service providers and engineers toeducate their customers regarding this important metric of network performance Evenif the reader does not perform firsthand reliability analysis the knowledge gained bystudying both the theory and the practice of reliability engineering allows the individualto make more informed better decisions about design and operation of telecommu-nications systems or the purchase of telecommunications services The truly pervasivenature of reliability as a metric in telecommunications systems requires engineersmanagers and executives to have extensive knowledge of system topologies costs andperformance In many cases these system details are obtained through experience andpractice The author of this book would argue that experience without academic studyparticularly in the field of reliability engineering results in decisions that at timesinvoke unrealized serious business risk

The reader is expected to have a basic working knowledge of engineeringmathematics A college-level course in probability and statistics is of particular valueto the reader This book relies extensively on the application and use of statisticaldistributions and probability models Experience with telecommunications systemdesign and network topologies is valuable in understanding the trade-offs involvedwith different reliability analyses Lastly if the reader has interest in developing his orher own reliability models knowledge of MATLAB and computer programmingmethods is of value All of the topics presented in this book are intended to providesufficient depth to enable the reader to either work with them directly or conductminimal further research in order to obtain a complete understanding of a topic

The previous paragraph should allow readers to identify themselves as a member ofa specific group These groups can generally be classified as one of the followingengineers managers or executives Engineers can use this book as a complete technicalresource to be used in building and analyzing system models The engineer reader thatuses this book will have the ability to develop complex detailed statistical models oftelecommunications systems that produce a variety of system metrics that can be usedfor business design and other technical decisions Managers reading this book willderive value from the knowledge obtained about proper reliable system design contractimplications and operational impacts Executive readers will find value in the high-level knowledge obtained about design best practices and proper expectations forsystem performance

This book is logically organized to provide two distinct sets of information theoryand applications Chapter 1 introduces and develops the concepts and accepted theoriesrequired for system reliability analysis This includes discussions of probability andstatistics system reliability theory and systemmodeling The remaining chapters of thisbook are organized by technology subject matter Chapter 2 discusses fiber-opticnetworks Both terrestrial and submarine networks are discussed with the subtletiesof each presented in detail Chapter 3 presents reliability analysis approachesfor terrestrial microwave systems The discussion includes short-haul point-to-point

4 INTRODUCTION

long-haul point-to-point cellular wireless and WiFi networks Satellite communica-tions networks are discussed in Chapter 4 Both teleport and VSAT network topologiesare discussed along with propagation availability calculation techniques Chapter 5addresses reliability concerns for mobile wireless (cellular) systems In Chapter 6 theoften underanalyzed topics of power systems and heating ventilation and air con-ditioning systems related to communications networks are analyzed The final chapter(Chapter 7) introduces software and firmware as they relate to telecommunicationssystem reliability Each section presents the analysis in terms of two discrete partsThese parts are the communications equipment and the communications channel Thegoal of this book is to provide the reader with sufficient knowledge to abstract and applythe concepts presented to their own problem statement

The ability to blend academic theory and practical application is a rare commodityin the field of engineering Few practicing engineers have the ability to apply abstracttheory to real problems while even fewer academics have the practical experience tounderstand the engineering of ldquorealrdquo systems Telecommunications reliability engineer-ing necessitates the blend of abstract statistical theory and practical engineeringexperience Fortunately in the case of reliability engineering this blend is easilyunderstood when the information required is presented in a logical organized formatThe use of predictive andor numerical models in the design of telecommunicationssystems brings great value to system designs Acceptance of these models requires theengineer manager and executive to have enough confidence in the modelrsquos results sothat significant decisions can be made based on the results of that model The ability toplace that level of confidence in a model can only come from a fusion of reliabilityengineering academics and experience

INTRODUCTION 5

Page 13: Cover - download.e-bookshelf.de · Figure 2.8 UPSR ring network topology, normal operation Figure 2.9 UPSR ring network topology, fiber path failure Figure 2.10 UPSR ring network

PREFACE

The topic of reliability is somewhat obscure within the field of electrical (andultimately communications) engineering Most engineers are familiar with theconcept of reliability as it relates to their automobile electronic device or homebut performing a rigorous mathematical analysis is not always a comfortable orfamiliar task The quantitative treatment of reliability has a long-standing traditionwithin the field of telecommunications dating back to the early days of BellLaboratories

Modern society has developed an insatiable dependence on communicationtechnology that demands a complete understanding and analysis of system reliabilityAlthough the technical innovations developed in modern communications areastonishing engineering marvels the reliability analysis of these systems can some-times be treated as a cursory afterthought Even in cases where analysis of systemreliability and availability performance is treated with the highest concern thesophistication of analysis techniques is frequently lagging behind the technicaldevelopment itself

The content in this book is a compilation of years of research and analysis of manydifferent telecommunications systems During the compilation of this research twoprimary points became evident to me First most communications engineers understandthe need for reliability and availability analysis but lack the technical skill andknowledge to execute these analyses confidently Second modern communicationsnetwork topologies demand an approach to analysis that goes beyond the traditionalreliability block diagram and exponential distribution assumptions Modern computingplatforms enable engineers to exploit analysis techniques not possible in the days whenthe Bell Laboratoriesrsquo techniques were developed and presented This book presentstechniques that utilize computer simulation and random variable models not feasible20 years ago I hope that readers of this book find within it a useful resource that I foundabsent in the academic literatures during my research and analysis of communicationssystem reliability Although compilation of the data in this book took me years it is mydesire to convey this information to the reader in a matter of hours enabling engineers toanalyze complex problems using basic tools and theories

I would like to thank Tom Plevyak and Veli Sahin for their editing and review of this book. Their help in producing this book has been instrumental to its completion and quality.


I would also like to thank Gene Strid for his contributions to my career and to the development of this book. His mentoring spirit and attention to detail have had a significant influence on my personal development as a professional engineer. Gene's technical review of this book alone is impressive in its detail and breadth. Thank you, Gene, for everything you have done to help me remain inspired to grow and learn as an engineer and a leader.


ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporation, headquartered in Anchorage, Alaska. Mark has a broad range of telecommunications experience, including work in fiber optics, microwave radio, and satellite network designs. Mark holds a BS degree in Mathematics from the University of Alaska Anchorage and an MS degree in Electrical Engineering from the University of Alaska Fairbanks, Fairbanks, Alaska. He is a registered Professional Electrical Engineer in the State of Alaska and a Senior Member of the IEEE. Mark teaches a variety of courses as an Adjunct Faculty Member in the Engineering Department at the University of Alaska Anchorage. His primary interests are systems design, modeling, and optimization.


ACRONYM LIST

AC      Alternating current
ACM     Adaptive coding and modulation
AGM     Absorbed glass mat
AP      Access point
AuC     Authentication center
BLSR    Bidirectional line switched ring
BSC     Base station controller
BTS     Base transceiver station
BTU     British thermal unit
BUC     Block upconverter
CDF     Cumulative distribution function
CDMA    Code division multiple access
COTS    Commercial off-the-shelf
CPE     Customer premises equipment
CRAC    Computer room air conditioner
DC      Direct current
EDFA    Erbium-doped fiber amplifier
EIR     Equipment identity register
EIRP    Equivalent isotropic radiated power
FCC     Federal Communications Commission
FITs    Failures in time
FMEA    Failure mode and effects analysis
FPGA    Field-programmable gate array
FSS     Fixed satellite service
GSM     Global system for mobile communications
HLR     Home location register
HVAC    Heating, ventilation, and air conditioning
IDU     Indoor unit
IEEE    Institute of Electrical and Electronics Engineers
ISM     Industrial, scientific, and medical
ITU     International Telecommunication Union
LHS     Left-hand side
LNA     Low-noise amplifier
LNB     Low-noise block
LTE     Line-terminating equipment
LTE     Long-term evolution
MDT     Mean downtime
MODEM   Modulator/demodulator
MSC     Mobile switching center
MTBF    Mean time between failures
MTTF    Mean time to failure
MTTR    Mean time to repair
NASA    National Aeronautics and Space Administration
NSS     Network switching subsystem
OC-n    Optical carrier level n
ODU     Outdoor unit
PDF     Probability density function
PFE     Power feed equipment
PM      Preventative maintenance
RBD     Reliability block diagram
RF      Radio frequency
RHS     Right-hand side
RMA     Return material authorization
RSL     Received signal level
SDH     Synchronous digital hierarchy
SES     Severely errored second
SLA     Service-level agreement
SLTE    Submarine line-terminating equipment
SMS     Short message service
SONET   Synchronous optical network
SP      Service provider
SRGM    Software reliability growth model
SSPA    Solid-state power amplifier
TDM     Time-division multiplexing
TRX     Transceiver
TTF     Time to failure
TTR     Time to repair
UMTS    Universal mobile telecommunications system
UPS     Uninterruptible power supply
UPSR    Unidirectional path switched ring
VLR     Visitor location register
VLSI    Very large-scale integration
VRLA    Valve-regulated lead acid
VSAT    Very-small-aperture terminal
WiFi    Wireless fidelity
XPIC    Cross-polarization interference cancellation


INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and influences our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically equates reliability with emotion and perception. In many cases, even technically minded people do not have a clear quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production resulted in an increased awareness of item failure and performance, and of its impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further identified the need to quantify reliability and its impact on mission success. During the Second World War, vacuum tubes were used extensively in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society traces its origins to 1948, a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering, Theory, and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s continued the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. At the end of the 1960s, the launch of commercial communications satellites by INTELSAT and the Moon landing by the United States proved the importance of reliability engineering as part of the system engineering process. Semiconductor development, military applications, communications systems, biomedical research, and software-based systems in the 1980s led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today's society. Most consumers are implicitly aware of reliability as a measure of an item's performance and overall value. Engineers and technical staff are aware of an item's reliability in a more quantitative sense, but many times this understanding is neither complete nor founded on solid reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be based on solid theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures are used to guide the design process and ensure that the system design complies with system performance targets. Modern predictive reliability analysis uses statistical modeling of component failures. These statistical models are used to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows engineers to perform cost/performance trade-off analyses. These analyses can then be used to make business and technical decisions about the best design that meets target requirements.
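
As a minimal sketch of such a trade-off, the Python below compares two hypothetical designs under common textbook simplifications that are assumptions here, not the book's stated method: each element's steady-state availability is MTBF/(MTBF + MTTR), failures are independent, elements in series must all be up, and a 1+1 redundant pair is down only when both members are down (switchover assumed perfect). All MTBF and MTTR figures and the function names are invented for illustration.

```python
from math import prod

HOURS_PER_YEAR = 8760.0  # 365-day year


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of a single repairable element."""
    return mtbf_h / (mtbf_h + mttr_h)


def series(*avail: float) -> float:
    """Every element must be up (independent failures assumed)."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """Up if at least one element is up (perfect switchover assumed)."""
    return 1.0 - prod(1.0 - a for a in avail)


# Hypothetical element figures in hours: MTBF, MTTR.
terminal = availability(200_000.0, 4.0)
fiber_path = availability(8_760.0, 12.0)  # about one cut per year, 12 h to splice

# Design A: terminal, single fiber path, terminal.
design_a = series(terminal, fiber_path, terminal)
# Design B: same terminals, two diverse fiber paths in 1+1 protection.
design_b = series(terminal, parallel(fiber_path, fiber_path), terminal)

for name, a in (("A (single path)", design_a), ("B (1+1 paths)", design_b)):
    downtime_min = (1.0 - a) * HOURS_PER_YEAR * 60.0
    print(f"{name}: A = {a:.6f}, about {downtime_min:.1f} min/yr of downtime")
```

Weighing the extra cost of the second, diverse path against the reduction in expected downtime is exactly the kind of cost/performance decision described above; in real networks the independence assumption between "diverse" paths should itself be checked.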

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or the operator to measure system performance and compare it with expected or predicted performance. Empirical data collection is particularly important in large production environments, where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be used directly to predict the performance of a new system. One must be careful when using empirical data for predictive analysis, however, because it is rare to find an existing system that exactly matches a new design.
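
As a minimal sketch of comparing field data with a prediction, the Python below computes an observed availability and an observed MTTR from a hypothetical outage log. The log, the one-year observation period, the predicted value of 0.9995, and the function names are invented for illustration, and treating every logged outage as a full service outage is an assumption.

```python
def observed_availability(outage_hours: list[float], period_h: float) -> float:
    """Fraction of the observation period during which the service was up."""
    if period_h <= 0:
        raise ValueError("observation period must be positive")
    return 1.0 - sum(outage_hours) / period_h


def observed_mttr(outage_hours: list[float]) -> float:
    """Mean time to repair, estimated as the average recorded outage length."""
    if not outage_hours:
        raise ValueError("no outages recorded; MTTR cannot be estimated")
    return sum(outage_hours) / len(outage_hours)


# Hypothetical one-year outage log (hours) for one link, and a hypothetical
# predicted availability taken from a design-stage model.
PERIOD_H = 8760.0
outage_log = [3.5, 6.0, 1.5]
predicted = 0.9995

observed = observed_availability(outage_log, PERIOD_H)
print(f"observed A = {observed:.5f}, predicted A = {predicted:.5f}")
print(f"observed MTTR = {observed_mttr(outage_log):.2f} h over {len(outage_log)} outages")
if observed < predicted:
    print("Field performance is below prediction; revisit the model inputs.")
```

With only three outages from a single link, these are point estimates with wide uncertainty; larger production populations give the more trustworthy statistics the text describes.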


One of the most significant benefits of empirical analysis and data collection is failure mode and effects analysis (FMEA). This analysis approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and indicates a system's "uptime," or the time available for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) sign contracts to provide a service that has a specific target probability of being available or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, both the SP and the customer take on risk in signing the contract. This risk is either realized (the party is aware of the risk, whether quantified or not) or unrealized (the party is taking risk and is not aware of being in jeopardy). Decisions made while assuming unrealized risk can jeopardize a business. Reliability engineering of telecommunications systems serves to reduce overall risk in both realized and unrealized cases.
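
The two forms of SLA target mentioned above are interchangeable: a target probability of being available implies a maximum expected downtime over the interval, and vice versa. A minimal Python sketch of the conversion follows, assuming a 365-day year (525,600 minutes) as the interval; the function names are illustrative and not taken from any library.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year implied by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from_downtime(minutes_per_year: float) -> float:
    """Availability implied by a maximum yearly downtime allowance."""
    if not 0.0 <= minutes_per_year <= MINUTES_PER_YEAR:
        raise ValueError("downtime must be between 0 and one year")
    return 1.0 - minutes_per_year / MINUTES_PER_YEAR


for target in (0.999, 0.9999, 0.99999):
    print(f"{target:.5f} -> {downtime_minutes_per_year(target):.2f} min/yr")
```

Under this assumption, a 99.99% target allows about 52.6 minutes of downtime per year and 99.999% about 5.3 minutes, which shows how quickly each additional "nine" tightens the commitment an SP is signing.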

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allows managers and technical staff to design systems that achieve the required performance targets at minimum cost.

Analysis of telecommunications systems requires specialized application of reliability engineering theory and principles. Performance expectations within the field of telecommunications range from high to extreme. Consumers of telecommunications rarely expect anything less than highly available systems. This is true even of consumer services such as cable television, consumer Internet, and local telephone service. Commercial service expectations are typically higher than those in a consumer environment, because the impact on a business may be significant and costly if its telecommunications services are critical to its operations, delivery of service, and ability to generate revenue. Performing detailed analyses of both consumer and commercial systems allows risks to be managed and costs to be controlled. These analyses allow the designer to produce a system carefully crafted to just meet the customer's requirements rather than greatly exceed them or completely miss the target. In the case of operational systems, knowledge of the achievable system performance and its maintainability allows operators to understand whether their achieved performance is within specification and to optimize maintenance and repair efforts.

This book is written with the goal of providing the reader with the knowledge and skills necessary to perform telecommunications system reliability analysis and to examine system designs with a critical eye. Telecommunications service providers frequently serve customers who know what they would like to purchase, whether it is wireless or terrestrial, packet or TDM. It is far less common for the customer to understand how to specify system availability or reliability. Knowledge of the theory and practice of reliability engineering allows service providers and engineers to educate their customers regarding this important metric of network performance. Even if readers do not perform reliability analysis firsthand, the knowledge gained by studying both the theory and the practice of reliability engineering allows them to make better-informed decisions about the design and operation of telecommunications systems or the purchase of telecommunications services. The truly pervasive nature of reliability as a metric in telecommunications systems requires engineers, managers, and executives to have extensive knowledge of system topologies, costs, and performance. In many cases, these system details are obtained through experience and practice. The author of this book would argue that experience without academic study, particularly in the field of reliability engineering, results in decisions that at times invoke serious, unrealized business risk.

The reader is expected to have a basic working knowledge of engineering mathematics. A college-level course in probability and statistics is of particular value, because this book relies extensively on the application of statistical distributions and probability models. Experience with telecommunications system design and network topologies is valuable in understanding the trade-offs involved in different reliability analyses. Lastly, if readers are interested in developing their own reliability models, knowledge of MATLAB and computer programming methods is of value. All of the topics presented in this book are intended to provide sufficient depth to enable the reader either to work with them directly or to conduct minimal further research in order to obtain a complete understanding of a topic.

The previous paragraph should allow readers to identify themselves as members of a specific group. These groups can generally be classified as engineers, managers, or executives. Engineers can use this book as a complete technical resource for building and analyzing system models. Engineer readers who use this book will be able to develop complex, detailed statistical models of telecommunications systems that produce a variety of system metrics for business, design, and other technical decisions. Managers reading this book will derive value from the knowledge obtained about proper reliable system design, contract implications, and operational impacts. Executive readers will find value in the high-level knowledge obtained about design best practices and proper expectations for system performance.

This book is logically organized to provide two distinct sets of information: theory and applications. Chapter 1 introduces and develops the concepts and accepted theories required for system reliability analysis, including discussions of probability and statistics, system reliability theory, and system modeling. The remaining chapters are organized by technology subject matter. Chapter 2 discusses fiber-optic networks; both terrestrial and submarine networks are covered, with the subtleties of each presented in detail. Chapter 3 presents reliability analysis approaches for terrestrial microwave systems, including short-haul point-to-point, long-haul point-to-point, cellular wireless, and WiFi networks. Satellite communications networks are discussed in Chapter 4; both teleport and VSAT network topologies are covered, along with techniques for calculating propagation availability. Chapter 5 addresses reliability concerns for mobile wireless (cellular) systems. In Chapter 6, the often underanalyzed topics of power systems and heating, ventilation, and air conditioning systems for communications networks are analyzed. The final chapter (Chapter 7) introduces software and firmware as they relate to telecommunications system reliability. Each section presents the analysis in terms of two discrete parts: the communications equipment and the communications channel. The goal of this book is to provide readers with sufficient knowledge to abstract the concepts presented and apply them to their own problems.

The ability to blend academic theory and practical application is a rare commodity in the field of engineering. Few practicing engineers can apply abstract theory to real problems, while even fewer academics have the practical experience to understand the engineering of "real" systems. Telecommunications reliability engineering requires a blend of abstract statistical theory and practical engineering experience. Fortunately, in the case of reliability engineering, this blend is easily understood when the required information is presented in a logical, organized format. The use of predictive and/or numerical models in the design of telecommunications systems brings great value to system designs. Acceptance of these models requires the engineer, manager, and executive to have enough confidence in a model's results that significant decisions can be based on them. The ability to place that level of confidence in a model can come only from a fusion of reliability engineering academics and experience.


Page 14: Cover - download.e-bookshelf.de · Figure 2.8 UPSR ring network topology, normal operation Figure 2.9 UPSR ring network topology, fiber path failure Figure 2.10 UPSR ring network

I would also like to thank Gene Strid for his contributions to my career and to thedevelopment of this book His mentoring spirit and attention to detail have had asignificant influence on my personal development as a professional engineer Genersquostechnical review of this book alone is impressive in its detail and breadth Thank youGene for everything you have done to help me remain inspired to grow and learn as anengineer and a leader

xiv PREFACE

ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporationheadquartered in Anchorage Alaska Mark has a broad range of telecommunicationsexperience including work in fiber optics microwave radio and satellite networkdesigns Mark holds a BS degree in Mathematics from the University of AlaskaAnchorage and an MS degree in Electrical Engineering from the University of AlaskaFairbanks Fairbanks Alaska He is a registered Professional Electrical Engineer in theState of Alaska and a Senior Member of the IEEE Mark teaches a variety of courses asan Adjunct Faculty Member in the Engineering Department at the University of AlaskaAnchorage His primary interests are systems design modeling and optimization

xv

ACRONYM LIST

AC Alternating currentACM Adaptive coding and modulationAGM Absorbed glass matAP Access pointAuC Authentication centerBLSR Bidirectional line switched ringBSC Base station controllerBTS Base transceiver stationBTU British thermal unitBUC Block upconverterCDF Cumulative distribution functionCDMA Code division multiple accessCOTS Commercial off the shelfCPE Customer premise equipmentCRAC Computer room air conditionerDC Direct currentEDFA Erbium-doped fiber amplifierEIR Equipment identity registerEIRP Equivalent isotropic radiated powerFCC Federal Communications CommissionFITs Failures in timeFMEA Failure mode and effects analysisFPGA Field-programmable gate arrayFSS Fixed satellite systemGSM Global system for mobile communicationsHLR Home location registerHVAC Heating ventilation and air conditioningIDU Indoor unitIEEE Institute for Electrical and Electronics EngineersISM Industrial scientific and medicalITU International Telecommunications UnionLHS Lefthand sideLNA Low-noise amplifierLNB Low-noise blockLTE Line-terminating equipment

xvii

LTE Long-term evolutionMDT Mean downtimeMODEM ModulatordemodulatorMSC Mobile switching centerMTBF Mean time between failuresMTTF Mean time to failureMTTR Mean time to repairNASA National Air and Space AdministrationNSS Network switching subsystemOC-n Optical carrier level nODU Outdoor unitPDF Probability density functionPFE Power feed equipmentPM Preventative maintenanceRBD Reliability block diagramRF Radio frequencyRHS Righthand sideRMA Return material authorizationRSL Received signal levelSDH Synchronous digital hierarchySES Severely error secondSLA Service-level agreementSLTE Submarine line-terminating equipmentSMS Short message serviceSONET Synchronous optical networkSP Service providerSRGM Software reliability growth modelSSPA Solid state power amplifierTDM Time domain multiplexingTRX TransceiverTTF Time to failureTTR Time to repairUMTS Universal mobile telecommunications systemUPS Uninterruptable power supplyUPSR Unidirectional path switched ringVLR Visitor location registerVLSI Very large-scale integrationVRLA Valve-regulated lead acidVSAT Very-small-aperture terminalWiFi Wireless fidelityXPIC Cross-polarization interference cancellation

xviii ACRONYM LIST

INTRODUCTION

The concept of reliability is pervasive It affects our attitudes and impacts our decisionson a daily basis Its importance would imply that everyone has a clear understanding ofreliability from a technical perspective Unfortunately the general public typicallyequates emotion and perception with reliability In many cases even technically mindedpeople do not have a clear quantitative understanding of reliability as a measure ofperformance

Reliability engineering is a relatively new field Although the term reliability has along history it was not until the twentieth century that reliability began to take on aquantitative meaning In the early twentieth century the concept of reliabilityengineering began to take form as the industrial revolution brought about mechanicaland electronic systems such as the automobile and the telegraph Large-scaleproduction resulted in an increased awareness of item failure and performance andits impact on business During the 1930s Wallodie Weibull began documenting hiswork on the measurement and definition of material fatigue behavior The result of hiswork is theWeibull distribution one of the most widely used statistical distributions inreliability engineering The Second World War brought about the formalization ofreliability engineering as a field of study The advent of radar and other electronic

1

Telecommunications System Reliability Engineering Theory and Practice Mark L Ayers 2012 by the Institute of Electrical and Electronics Engineers Inc Published 2012 by John Wiley amp Sons Inc

warfare systems identified further the need to begin quantifying reliability and itsimpacts on mission success During the Second World War vacuum tubes wereextensively used in many electronic systems The low reliability of early vacuum tubesled to both poor system performance and high maintenance costs The IEEE Reli-ability Society was formed in 1948 as a result of the increasing focus on reliability inelectronic systems

Following the SecondWorldWar reliability engineering began to find applicationsin both military and commercial environments System reliability was studied from alife-cycle standpoint including component design quality control and failure analysisSpace exploration in the 1960s continued the need for a life-cycle approach to reliabilityengineering The establishment of NASA and an interest in nuclear power generationbecame driving forces for the development of highly reliable components and systemsLaunching commercial communications satellites by INTELSAT and landing onmoon by the United States proved the importance of reliability engineering aspart of the system engineering process at the end of the 1960s Semiconductordevelopment military applications communications systems biomedical researchand software-based systems in the 1980s led to new work in both system designand reliability analysis Improved component design and quality control led tosignificant improvements in reliability performance Consumer awareness and com-mercial focus in the 1990s and 2000s led to the current state of reliability engineering intodayrsquos society Most consumers are unconsciously aware of reliability as a measure ofan itemrsquos performance and overall value Engineers and technical resources are aware ofan itemrsquos reliability in a more quantitative sense but many times this understanding isneither complete nor found in solid reliability engineering principles

The presentation of reliability data whether qualitative or quantitative must bebased in solid theory In many cases reliability data is used to make business andtechnical decisions with far-reaching implications Predictive analysis is typically thefirst step in the reliability engineering process Target performance measures are used toguide the design process and ensure that system design is compliant with systemperformance targets Modern predictive reliability analysis utilizes statistical modelingof component failures These statistical models are used to predict a number of expectedsystem performance measures Changing the system topology or design and reanalyzingsystem performance allows engineering to do costperformance trade-off analyses Theanalyses can then be used to make business and technical decisions about the best designthat meets target requirements

Once a design has been selected and constructed it is important to collect empiricaldata This data allows the engineer or the operator to measure system performance andcompare that performance with expected or predicted data Empirical data collection isparticularly important in large production environments where statistical behavior canbe observed These observations can be tabulated and compared with the predicted orassumed behavior refining the system model and improving future predictions anddecisions In some cases empirical data can be directly used to analyze the predictedperformance of a new system One must be careful when using empirical data forpredictive analysis because it is rare to find an existing system that exactly matches anew design

2 INTRODUCTION

One of the most significant benefits of empirical analysis and data collection isfailure mode and effects analysis (FMEA) This analysis approach allows the engineerto identify systemic problems and design flaws by observing the failure of componentsor systems using this data to improve future performance Operational models andprocesses can be adjusted based on failure data and root cause analysis

Telecommunications systems have a long history of reliability-based design Thesedesign criteria are typically specified in terms of availability rather than reliabilityAvailability is another measure of statistical system performance and is indicative of asystemrsquos ldquouptimerdquo or available time for service delivery In many cases servicecontracts or service-level agreements (SLAs) are specified in terms of availabilityService providers (SPs) will sign a contract to provide a service that has specific targetprobability of being available or a target maximum downtime over a specific timeinterval Both of these measures are metrics of availability Without predictive andorempirical data to ensure compliancewith these targets the SP and the customer will takerisk in signing the contract This risk is sometimes realized risk (the party is aware of therisk quantified or not) or unrealized risk (the party is taking risk and is not aware thatthey are in jeopardy) Decisions made while assuming unrealized risk can jeopardizebusiness Reliability engineering of systems in telecommunications serves to reduceoverall risk in both realized and unrealized cases

Conducting business in the field of telecommunications always involves makingdecisions with financial implications Telecommunications contracts are often writtenaround SLAs in which a performance target is specified SPs must ensure that theirservice can achieve the required performance while customers must maintain realisticexpectations from the service requested Without access to a quantitative reliabilityanalysis these financial decisions are based on assumptions at best and perception atworst Rigorous reliability engineering and analysis of telecommunications systemsallows managers and technical resources to design systems that achieve the requiredtargets with minimum cost and maximum performance

Analysis of telecommunications systems requires specialized application of reliabil-ity engineering theory and principles Performance expectations within the field oftelecommunications can range from high to extreme Rarely do consumers of tele-communications expect less than highly available systems This is true even of consumerservices such as cable television consumer Internet and local telephone serviceCommercial service expectations are typically higher than those in a consumer environ-ment because the impact on the business may be significant and costly if their tele-communications services are critical to their operations delivery of service and ability togenerate revenues Performing detailed analyses of systems both consumer and com-mercial allow risks to be managed and costs to be controlled These analyses allow thedesigner to produce a system that is carefully crafted to just meet the requirements of thecustomer rather than greatly exceed them or completely miss the target In the case ofoperational systems knowledge of the achievable system performance and its maintain-ability allows the operator to understand whether their achieved performance is withinspecification and to optimize maintenance and repair efforts

This book is written with the goal of providing the reader with the knowledge andskills necessary to perform telecommunications system reliability analysis and to

INTRODUCTION 3

examine system designs with a critical eye Telecommunications service providersfrequently provide service to customers who know what they would like to purchasewhether it is wireless or terrestrial packet or TDM It is far less frequent that thecustomer understands how to specify system availability or reliability Knowledge of thetheory and practice of reliability engineering allows service providers and engineers toeducate their customers regarding this important metric of network performance Evenif the reader does not perform firsthand reliability analysis the knowledge gained bystudying both the theory and the practice of reliability engineering allows the individualto make more informed better decisions about design and operation of telecommu-nications systems or the purchase of telecommunications services The truly pervasivenature of reliability as a metric in telecommunications systems requires engineersmanagers and executives to have extensive knowledge of system topologies costs andperformance In many cases these system details are obtained through experience andpractice The author of this book would argue that experience without academic studyparticularly in the field of reliability engineering results in decisions that at timesinvoke unrealized serious business risk

The reader is expected to have a basic working knowledge of engineeringmathematics A college-level course in probability and statistics is of particular valueto the reader This book relies extensively on the application and use of statisticaldistributions and probability models Experience with telecommunications systemdesign and network topologies is valuable in understanding the trade-offs involvedwith different reliability analyses Lastly if the reader has interest in developing his orher own reliability models knowledge of MATLAB and computer programmingmethods is of value All of the topics presented in this book are intended to providesufficient depth to enable the reader to either work with them directly or conductminimal further research in order to obtain a complete understanding of a topic

The previous paragraph should allow readers to identify themselves as members of a specific group. These groups can generally be classified as one of the following: engineers, managers, or executives. Engineers can use this book as a complete technical resource for building and analyzing system models. The engineer who uses this book will be able to develop complex, detailed statistical models of telecommunications systems that produce a variety of system metrics for business, design, and other technical decisions. Managers reading this book will derive value from the knowledge obtained about proper reliable system design, contract implications, and operational impacts. Executive readers will find value in the high-level knowledge obtained about design best practices and proper expectations for system performance.

This book is logically organized to provide two distinct sets of information: theory and applications. Chapter 1 introduces and develops the concepts and accepted theories required for system reliability analysis. This includes discussions of probability and statistics, system reliability theory, and system modeling. The remaining chapters of this book are organized by technology subject matter. Chapter 2 discusses fiber-optic networks. Both terrestrial and submarine networks are discussed, with the subtleties of each presented in detail. Chapter 3 presents reliability analysis approaches for terrestrial microwave systems. The discussion includes short-haul point-to-point, long-haul point-to-point, cellular wireless, and WiFi networks. Satellite communications networks are discussed in Chapter 4. Both teleport and VSAT network topologies are discussed, along with propagation availability calculation techniques. Chapter 5 addresses reliability concerns for mobile wireless (cellular) systems. In Chapter 6, the often underanalyzed topics of power systems and heating, ventilation, and air conditioning (HVAC) systems related to communications networks are analyzed. The final chapter (Chapter 7) introduces software and firmware as they relate to telecommunications system reliability. Each section presents the analysis in terms of two discrete parts: the communications equipment and the communications channel. The goal of this book is to provide the reader with sufficient knowledge to abstract the concepts presented and apply them to their own problem statement.

The ability to blend academic theory and practical application is a rare commodity in the field of engineering. Few practicing engineers have the ability to apply abstract theory to real problems, while even fewer academics have the practical experience to understand the engineering of "real" systems. Telecommunications reliability engineering requires a blend of abstract statistical theory and practical engineering experience. Fortunately, in the case of reliability engineering, this blend is easily understood when the required information is presented in a logical, organized format. The use of predictive and/or numerical models in the design of telecommunications systems brings great value to system designs. Acceptance of these models requires the engineer, manager, and executive to have enough confidence in the model's results that significant decisions can be based on them. The ability to place that level of confidence in a model can only come from a fusion of academic study and practical experience in reliability engineering.


ABOUT THE AUTHOR

Mark Ayers is the Manager of RF Engineering at GCI Communications Corporation, headquartered in Anchorage, Alaska. Mark has a broad range of telecommunications experience, including work in fiber optics, microwave radio, and satellite network designs. Mark holds a BS degree in Mathematics from the University of Alaska Anchorage and an MS degree in Electrical Engineering from the University of Alaska Fairbanks, Fairbanks, Alaska. He is a registered Professional Electrical Engineer in the State of Alaska and a Senior Member of the IEEE. Mark teaches a variety of courses as an Adjunct Faculty Member in the Engineering Department at the University of Alaska Anchorage. His primary interests are systems design, modeling, and optimization.


ACRONYM LIST

AC Alternating current
ACM Adaptive coding and modulation
AGM Absorbed glass mat
AP Access point
AuC Authentication center
BLSR Bidirectional line switched ring
BSC Base station controller
BTS Base transceiver station
BTU British thermal unit
BUC Block upconverter
CDF Cumulative distribution function
CDMA Code division multiple access
COTS Commercial off the shelf
CPE Customer premises equipment
CRAC Computer room air conditioner
DC Direct current
EDFA Erbium-doped fiber amplifier
EIR Equipment identity register
EIRP Equivalent isotropic radiated power
FCC Federal Communications Commission
FITs Failures in time
FMEA Failure mode and effects analysis
FPGA Field-programmable gate array
FSS Fixed satellite service
GSM Global system for mobile communications
HLR Home location register
HVAC Heating, ventilation, and air conditioning
IDU Indoor unit
IEEE Institute of Electrical and Electronics Engineers
ISM Industrial, scientific, and medical
ITU International Telecommunication Union
LHS Left-hand side
LNA Low-noise amplifier
LNB Low-noise block
LTE Line-terminating equipment
LTE Long-term evolution
MDT Mean downtime
MODEM Modulator/demodulator
MSC Mobile switching center
MTBF Mean time between failures
MTTF Mean time to failure
MTTR Mean time to repair
NASA National Aeronautics and Space Administration
NSS Network switching subsystem
OC-n Optical carrier level n
ODU Outdoor unit
PDF Probability density function
PFE Power feed equipment
PM Preventative maintenance
RBD Reliability block diagram
RF Radio frequency
RHS Right-hand side
RMA Return material authorization
RSL Received signal level
SDH Synchronous digital hierarchy
SES Severely errored second
SLA Service-level agreement
SLTE Submarine line-terminating equipment
SMS Short message service
SONET Synchronous optical network
SP Service provider
SRGM Software reliability growth model
SSPA Solid state power amplifier
TDM Time division multiplexing
TRX Transceiver
TTF Time to failure
TTR Time to repair
UMTS Universal mobile telecommunications system
UPS Uninterruptible power supply
UPSR Unidirectional path switched ring
VLR Visitor location register
VLSI Very large-scale integration
VRLA Valve-regulated lead acid
VSAT Very-small-aperture terminal
WiFi Wireless fidelity
XPIC Cross-polarization interference cancellation

INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and influences our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically equates reliability with emotion and perception. In many cases, even technically minded people do not have a clear, quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production resulted in an increased awareness of item failure and performance and of its impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further demonstrated the need to quantify reliability and its impact on mission success. During the Second World War, vacuum tubes were used extensively in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering, Theory and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s sustained the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. At the end of the 1960s, the launch of commercial communications satellites by INTELSAT and the Moon landing by the United States proved the importance of reliability engineering as part of the system engineering process. In the 1980s, semiconductor development, military applications, communications systems, biomedical research, and software-based systems led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today's society. Most consumers are implicitly aware of reliability as a measure of an item's performance and overall value. Engineers and technical resources are aware of an item's reliability in a more quantitative sense, but many times this understanding is neither complete nor founded on solid reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be based on solid theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures are used to guide the design process and ensure that the system design complies with system performance targets. Modern predictive reliability analysis uses statistical modeling of component failures. These statistical models are used to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows engineers to perform cost/performance trade-off analyses. These analyses can then be used to make business and technical decisions about the best design that meets the target requirements.
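To make the topology trade-off concrete, here is a minimal sketch comparing two hypothetical designs. It is not the book's method; it assumes steady-state unit availability A = MTBF/(MTBF + MTTR), statistically independent failures, and perfect, instantaneous switchover in the redundant case. The function names and the MTBF/MTTR values are invented for illustration only.

```python
import math

# Hypothetical sketch. Assumptions: steady-state unit availability
# A = MTBF / (MTBF + MTTR), independent failures, and perfect,
# instantaneous switchover between redundant units.

MINUTES_PER_YEAR = 365 * 24 * 60


def unit_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a single repairable unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(*availabilities: float) -> float:
    """Series blocks: the system is up only if every block is up."""
    return math.prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Redundant blocks: the system is down only if every block is down."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    # Illustrative values only; not taken from the text.
    radio = unit_availability(mtbf_hours=50_000, mttr_hours=4)
    power = unit_availability(mtbf_hours=100_000, mttr_hours=4)

    design_a = series(radio, power)                    # single radio
    design_b = series(parallel(radio, radio), power)   # 1+1 redundant radios

    for name, a in (("A (single radio)", design_a), ("B (1+1 radios)", design_b)):
        downtime = (1.0 - a) * MINUTES_PER_YEAR
        print(f"Design {name}: A = {a:.6f}, downtime = {downtime:.1f} min/year")
```

With these made-up inputs, the redundant design cuts expected downtime from roughly 63 to roughly 21 minutes per year; whether that gain justifies a second radio is the kind of cost/performance decision described above. A real analysis would also have to account for switchover failures and common-mode failures, which this sketch ignores.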

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or the operator to measure system performance and compare that performance with expected or predicted values. Empirical data collection is particularly important in large production environments, where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be used directly to predict the performance of a new system. One must be careful when using empirical data for predictive analysis, because it is rare to find an existing system that exactly matches a new design.


One of the most significant benefits of empirical analysis and data collection is failure mode and effects analysis (FMEA). This analysis approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and indicates a system's "uptime," or the time available for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) will sign a contract to provide a service that has a specific target probability of being available or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, both the SP and the customer take on risk in signing the contract. This risk is sometimes realized risk (the party is aware of the risk, whether quantified or not) and sometimes unrealized risk (the party is taking risk without being aware of it). Decisions made while assuming unrealized risk can jeopardize a business. Reliability engineering of telecommunications systems serves to reduce overall risk in both the realized and unrealized cases.
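The two availability metrics mentioned above, a target probability of being available and a maximum downtime over an interval, are two views of the same quantity. A minimal sketch of the conversion follows, assuming a 365-day (525,600-minute) measurement interval; an actual SLA may define the interval, and what counts as downtime, differently. The function names are hypothetical.

```python
# Minimal sketch: convert between an availability target and allowed downtime.
# Assumption: the measurement interval defaults to a non-leap year of
# 365 days; real SLAs may use a month, a quarter, or another window.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes(availability: float,
                     interval_minutes: float = MINUTES_PER_YEAR) -> float:
    """Expected downtime over the interval for an availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * interval_minutes


def availability_from_downtime(downtime: float,
                               interval_minutes: float = MINUTES_PER_YEAR) -> float:
    """Availability implied by a maximum downtime over the interval."""
    if not 0.0 <= downtime <= interval_minutes:
        raise ValueError("downtime must lie within the interval")
    return 1.0 - downtime / interval_minutes


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        print(f"A = {target}: {downtime_minutes(target):.1f} min/year")
```

For example, a "four nines" (0.9999) target corresponds to about 52.6 minutes of downtime per year under the 365-day assumption.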

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allows managers and technical resources to design systems that achieve the required targets with minimum cost and maximum performance.

Analysis of telecommunications systems requires specialized application of reliability engineering theory and principles. Performance expectations within the field of telecommunications can range from high to extreme. Rarely do consumers of telecommunications expect less than highly available systems. This is true even of consumer services such as cable television, consumer Internet, and local telephone service. Commercial service expectations are typically higher than those in a consumer environment, because the impact on a business may be significant and costly if its telecommunications services are critical to its operations, delivery of service, and ability to generate revenue. Performing detailed analyses of systems, both consumer and commercial, allows risks to be managed and costs to be controlled. These analyses allow the designer to produce a system that is carefully crafted to just meet the requirements of the customer rather than greatly exceed them or completely miss the target. In the case of operational systems, knowledge of the achievable system performance and its maintainability allows the operator to understand whether the achieved performance is within specification and to optimize maintenance and repair efforts.

This book is written with the goal of providing the reader with the knowledge and skills necessary to perform telecommunications system reliability analysis and to examine system designs with a critical eye.


The previous paragraph should allow readers to identify themselves as a member ofa specific group These groups can generally be classified as one of the followingengineers managers or executives Engineers can use this book as a complete technicalresource to be used in building and analyzing system models The engineer reader thatuses this book will have the ability to develop complex detailed statistical models oftelecommunications systems that produce a variety of system metrics that can be usedfor business design and other technical decisions Managers reading this book willderive value from the knowledge obtained about proper reliable system design contractimplications and operational impacts Executive readers will find value in the high-level knowledge obtained about design best practices and proper expectations forsystem performance

This book is logically organized to provide two distinct sets of information theoryand applications Chapter 1 introduces and develops the concepts and accepted theoriesrequired for system reliability analysis This includes discussions of probability andstatistics system reliability theory and systemmodeling The remaining chapters of thisbook are organized by technology subject matter Chapter 2 discusses fiber-opticnetworks Both terrestrial and submarine networks are discussed with the subtletiesof each presented in detail Chapter 3 presents reliability analysis approachesfor terrestrial microwave systems The discussion includes short-haul point-to-point

4 INTRODUCTION

long-haul point-to-point cellular wireless and WiFi networks Satellite communica-tions networks are discussed in Chapter 4 Both teleport and VSAT network topologiesare discussed along with propagation availability calculation techniques Chapter 5addresses reliability concerns for mobile wireless (cellular) systems In Chapter 6 theoften underanalyzed topics of power systems and heating ventilation and air con-ditioning systems related to communications networks are analyzed The final chapter(Chapter 7) introduces software and firmware as they relate to telecommunicationssystem reliability Each section presents the analysis in terms of two discrete partsThese parts are the communications equipment and the communications channel Thegoal of this book is to provide the reader with sufficient knowledge to abstract and applythe concepts presented to their own problem statement

The ability to blend academic theory and practical application is a rare commodityin the field of engineering Few practicing engineers have the ability to apply abstracttheory to real problems while even fewer academics have the practical experience tounderstand the engineering of ldquorealrdquo systems Telecommunications reliability engineer-ing necessitates the blend of abstract statistical theory and practical engineeringexperience Fortunately in the case of reliability engineering this blend is easilyunderstood when the information required is presented in a logical organized formatThe use of predictive andor numerical models in the design of telecommunicationssystems brings great value to system designs Acceptance of these models requires theengineer manager and executive to have enough confidence in the modelrsquos results sothat significant decisions can be made based on the results of that model The ability toplace that level of confidence in a model can only come from a fusion of reliabilityengineering academics and experience

INTRODUCTION 5

Page 17: Cover - download.e-bookshelf.de · Figure 2.8 UPSR ring network topology, normal operation Figure 2.9 UPSR ring network topology, fiber path failure Figure 2.10 UPSR ring network

LTE     Long-term evolution
MDT     Mean downtime
MODEM   Modulator/demodulator
MSC     Mobile switching center
MTBF    Mean time between failures
MTTF    Mean time to failure
MTTR    Mean time to repair
NASA    National Aeronautics and Space Administration
NSS     Network switching subsystem
OC-n    Optical carrier level n
ODU     Outdoor unit
PDF     Probability density function
PFE     Power feed equipment
PM      Preventative maintenance
RBD     Reliability block diagram
RF      Radio frequency
RHS     Right-hand side
RMA     Return material authorization
RSL     Received signal level
SDH     Synchronous digital hierarchy
SES     Severely errored second
SLA     Service-level agreement
SLTE    Submarine line-terminating equipment
SMS     Short message service
SONET   Synchronous optical network
SP      Service provider
SRGM    Software reliability growth model
SSPA    Solid-state power amplifier
TDM     Time-division multiplexing
TRX     Transceiver
TTF     Time to failure
TTR     Time to repair
UMTS    Universal mobile telecommunications system
UPS     Uninterruptible power supply
UPSR    Unidirectional path-switched ring
VLR     Visitor location register
VLSI    Very-large-scale integration
VRLA    Valve-regulated lead-acid
VSAT    Very-small-aperture terminal
WiFi    Wireless fidelity
XPIC    Cross-polarization interference cancellation


INTRODUCTION

The concept of reliability is pervasive. It affects our attitudes and impacts our decisions on a daily basis. Its importance would imply that everyone has a clear understanding of reliability from a technical perspective. Unfortunately, the general public typically equates emotion and perception with reliability. In many cases, even technically minded people do not have a clear, quantitative understanding of reliability as a measure of performance.

Reliability engineering is a relatively new field. Although the term reliability has a long history, it was not until the twentieth century that reliability began to take on a quantitative meaning. In the early twentieth century, the concept of reliability engineering began to take form as the industrial revolution brought about mechanical and electronic systems such as the automobile and the telegraph. Large-scale production resulted in an increased awareness of item failure and performance and its impact on business. During the 1930s, Waloddi Weibull began documenting his work on the measurement and definition of material fatigue behavior. The result of his work is the Weibull distribution, one of the most widely used statistical distributions in reliability engineering. The Second World War brought about the formalization of reliability engineering as a field of study. The advent of radar and other electronic warfare systems further identified the need to quantify reliability and its impact on mission success. During the Second World War, vacuum tubes were used extensively in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Telecommunications System Reliability Engineering, Theory, and Practice, Mark L. Ayers. © 2012 by the Institute of Electrical and Electronics Engineers, Inc. Published 2012 by John Wiley & Sons, Inc.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s reinforced the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. At the end of the 1960s, the launch of commercial communications satellites by INTELSAT and the landing on the Moon by the United States proved the importance of reliability engineering as part of the system engineering process. In the 1980s, semiconductor development, military applications, communications systems, biomedical research, and software-based systems led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today's society. Most consumers are implicitly aware of reliability as a measure of an item's performance and overall value. Engineers and technical resources are aware of an item's reliability in a more quantitative sense, but this understanding is often neither complete nor grounded in sound reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be based in solid theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures are used to guide the design process and ensure that the system design complies with system performance targets. Modern predictive reliability analysis uses statistical modeling of component failures to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows engineers to perform cost/performance trade-off analyses. These analyses can then be used to make business and technical decisions about the best design that meets the target requirements.
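As a minimal sketch of this predict-and-compare loop, the Python below assumes that each component's steady-state availability (the long-run fraction of time it is up) is MTBF / (MTBF + MTTR), and that components fail independently. Blocks in series must all be up; a redundant pair is down only when both members are down. The component names, MTBF and MTTR values, and the two candidate designs are hypothetical placeholders, not data from this book.

```python
from dataclasses import dataclass
from math import prod

HOURS_PER_YEAR = 8_760  # assumes a 365-day year


@dataclass
class Component:
    name: str
    mtbf_h: float  # mean time between failures, hours
    mttr_h: float  # mean time to repair, hours

    def availability(self) -> float:
        # Long-run fraction of time a single repairable unit is up
        return self.mtbf_h / (self.mtbf_h + self.mttr_h)


def series(avails):
    # Series blocks: the system is up only if every block is up
    return prod(avails)


def parallel(avails):
    # Redundant blocks: the system is down only if every block is down
    return 1.0 - prod(1.0 - a for a in avails)


# Hypothetical component data, for illustration only
terminal = Component("terminal", mtbf_h=50_000, mttr_h=4)
fiber_path = Component("fiber path", mtbf_h=8_760, mttr_h=12)

a_term = terminal.availability()
a_fiber = fiber_path.availability()

designs = {
    # Design 1: terminal -> single fiber path -> terminal
    "single path": series([a_term, a_fiber, a_term]),
    # Design 2: same terminals, two diverse fiber paths assumed to fail independently
    "protected path": series([a_term, parallel([a_fiber, a_fiber]), a_term]),
}

for label, a in designs.items():
    downtime_min = (1.0 - a) * HOURS_PER_YEAR * 60
    print(f"{label:>14}: A = {a:.6f}, expected downtime ~ {downtime_min:.1f} min/yr")
```

Attaching a cost to each design turns this into the cost/performance comparison described above. The series/parallel rules hold only if the blocks really fail independently; fiber routes that share a conduit, for example, would violate that assumption.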

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or the operator to measure system performance and compare that performance with expected or predicted values. Empirical data collection is particularly important in large production environments, where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be used directly to predict the performance of a new system. One must be careful when using empirical data for predictive analysis, however, because it is rare to find an existing system that exactly matches a new design.
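The comparison step can be sketched as follows, assuming the operator keeps a log of outage durations over a fixed observation window. Observed availability is then the fraction of the window during which service was up, which can be set against the predicted figure. The outage records, the one-year window, and the predicted value below are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumed observation window: one 365-day year


def observed_availability(outage_minutes, window_hours):
    """Fraction of the observation window during which service was up."""
    window_min = window_hours * 60
    total_down = sum(outage_minutes)
    if not 0 <= total_down <= window_min:
        raise ValueError("total outage time must fall within the observation window")
    return 1.0 - total_down / window_min


# Hypothetical field data: outage durations (minutes) logged over one year
outages = [42.0, 7.5, 130.0, 18.0]
predicted = 0.9995  # hypothetical value taken from a predictive model

observed = observed_availability(outages, HOURS_PER_YEAR)
print(f"observed  A = {observed:.6f} ({len(outages)} outages)")
print(f"predicted A = {predicted:.6f}")
print("meets prediction" if observed >= predicted else "falls short of prediction")
```

With only four outages, the observed figure is a rough estimate at best; many more observations, of the kind available in large production environments, are needed before such a comparison says much about the accuracy of the model.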


One of the most significant benefits of empirical analysis and data collection is that it supports failure mode and effects analysis (FMEA). This analysis approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.

Telecommunications systems have a long history of reliability-based design. These design criteria are typically specified in terms of availability rather than reliability. Availability is another statistical measure of system performance and indicates a system's “uptime,” or the time during which it is available for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) will sign a contract to provide a service that has a specific target probability of being available or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, the SP and the customer take on risk in signing the contract. This risk is sometimes realized risk (the party is aware of the risk, quantified or not) and sometimes unrealized risk (the party is taking risk and is not aware of being in jeopardy). Decisions made while assuming unrealized risk can jeopardize a business. Reliability engineering of telecommunications systems serves to reduce overall risk in both the realized and unrealized cases.
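Because an SLA may state either a target availability A or a maximum downtime over an interval T, it helps to convert between the two: the permitted downtime is (1 − A) × T. The sketch below assumes a 365-day contract year by default; the function names and the monthly 45-minute example are hypothetical illustrations, not terms from any particular contract.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day contract year


def max_downtime_minutes(availability, interval_hours=HOURS_PER_YEAR):
    """Maximum downtime permitted over the interval by an availability target."""
    return (1.0 - availability) * interval_hours * 60


def availability_from_downtime(downtime_minutes, interval_hours=HOURS_PER_YEAR):
    """Availability implied by a maximum-downtime commitment over the interval."""
    return 1.0 - downtime_minutes / (interval_hours * 60)


for target in (0.999, 0.9999, 0.99999):
    print(f"A = {target:.5f} -> {max_downtime_minutes(target):7.2f} min/yr")

# Hypothetical monthly SLA: at most 45 minutes of downtime in a 30-day window
print(f"45 min per 30 days -> A = {availability_from_downtime(45, 30 * 24):.5f}")
```

For example, a four-nines target (A = 0.9999) allows roughly 52.6 minutes of downtime per 365-day year. The measurement window matters: the same availability applied to a shorter window permits proportionally less downtime within each window.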

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perception at worst. Rigorous reliability engineering and analysis of telecommunications systems allows managers and technical resources to design systems that achieve the required performance targets at minimum cost.

Analysis of telecommunications systems requires specialized application of reliability engineering theory and principles. Performance expectations within the field of telecommunications range from high to extreme. Consumers of telecommunications rarely expect anything less than highly available systems. This is true even of consumer services such as cable television, consumer Internet, and local telephone service. Commercial service expectations are typically higher than those in a consumer environment, because the impact on a business can be significant and costly if its telecommunications services are critical to its operations, service delivery, and ability to generate revenue. Performing detailed analyses of both consumer and commercial systems allows risks to be managed and costs to be controlled. These analyses allow the designer to produce a system that is carefully crafted to just meet the customer's requirements rather than greatly exceed them or completely miss the target. In the case of operational systems, knowledge of the achievable system performance and its maintainability allows the operator to determine whether the achieved performance is within specification and to optimize maintenance and repair efforts.

This book is written with the goal of providing the reader with the knowledge and skills necessary to perform telecommunications system reliability analysis and to examine system designs with a critical eye. Telecommunications service providers frequently serve customers who know what they would like to purchase, whether it is wireless or terrestrial, packet or TDM. It is far less common for the customer to understand how to specify system availability or reliability. Knowledge of the theory and practice of reliability engineering allows service providers and engineers to educate their customers about this important metric of network performance. Even if readers do not perform reliability analysis firsthand, the knowledge gained by studying both the theory and the practice of reliability engineering allows them to make more informed, better decisions about the design and operation of telecommunications systems or the purchase of telecommunications services. The truly pervasive nature of reliability as a metric in telecommunications systems requires engineers, managers, and executives to have extensive knowledge of system topologies, costs, and performance. In many cases, these system details are obtained through experience and practice. The author of this book would argue that experience without academic study, particularly in the field of reliability engineering, results in decisions that at times invoke serious, unrealized business risk.

The reader is expected to have a basic working knowledge of engineering mathematics. A college-level course in probability and statistics is of particular value, because this book relies extensively on the application of statistical distributions and probability models. Experience with telecommunications system design and network topologies is valuable in understanding the trade-offs involved in different reliability analyses. Lastly, readers interested in developing their own reliability models will benefit from knowledge of MATLAB and computer programming methods. All of the topics presented in this book are intended to provide sufficient depth to enable the reader either to work with them directly or to conduct minimal further research in order to obtain a complete understanding of a topic.

The previous paragraph should allow readers to identify themselves as members of a specific group. These groups can generally be classified as engineers, managers, or executives. Engineers can use this book as a complete technical resource for building and analyzing system models. Engineer readers will gain the ability to develop complex, detailed statistical models of telecommunications systems that produce a variety of system metrics for business, design, and other technical decisions. Managers will derive value from the knowledge obtained about proper, reliable system design, contract implications, and operational impacts. Executives will find value in the high-level knowledge obtained about design best practices and proper expectations for system performance.

This book is organized to provide two distinct sets of information: theory and applications. Chapter 1 introduces and develops the concepts and accepted theories required for system reliability analysis, including discussions of probability and statistics, system reliability theory, and system modeling. The remaining chapters are organized by technology. Chapter 2 discusses fiber-optic networks; both terrestrial and submarine networks are covered, with the subtleties of each presented in detail. Chapter 3 presents reliability analysis approaches for terrestrial microwave systems, including short-haul point-to-point, long-haul point-to-point, cellular wireless, and WiFi networks. Satellite communications networks are discussed in Chapter 4; both teleport and VSAT network topologies are covered, along with techniques for calculating propagation availability. Chapter 5 addresses reliability concerns for mobile wireless (cellular) systems. Chapter 6 examines the often underanalyzed topics of power systems and heating, ventilation, and air conditioning systems related to communications networks. The final chapter (Chapter 7) introduces software and firmware as they relate to telecommunications system reliability. Each section presents the analysis in two discrete parts: the communications equipment and the communications channel. The goal of this book is to provide the reader with sufficient knowledge to abstract the concepts presented and apply them to their own problem statements.

The ability to blend academic theory and practical application is a rare commodity in the field of engineering. Few practicing engineers can apply abstract theory to real problems, and even fewer academics have the practical experience to understand the engineering of “real” systems. Telecommunications reliability engineering requires a blend of abstract statistical theory and practical engineering experience. Fortunately, in the case of reliability engineering, this blend is easily understood when the required information is presented in a logical, organized format. The use of predictive and/or numerical models in the design of telecommunications systems brings great value to system designs. Acceptance of these models requires the engineer, manager, and executive to have enough confidence in a model's results that significant decisions can be made based on them. The ability to place that level of confidence in a model can only come from a fusion of reliability engineering academics and experience.


Page 18: Cover - download.e-bookshelf.de · Figure 2.8 UPSR ring network topology, normal operation Figure 2.9 UPSR ring network topology, fiber path failure Figure 2.10 UPSR ring network

INTRODUCTION

The concept of reliability is pervasive It affects our attitudes and impacts our decisionson a daily basis Its importance would imply that everyone has a clear understanding ofreliability from a technical perspective Unfortunately the general public typicallyequates emotion and perception with reliability In many cases even technically mindedpeople do not have a clear quantitative understanding of reliability as a measure ofperformance

Reliability engineering is a relatively new field Although the term reliability has along history it was not until the twentieth century that reliability began to take on aquantitative meaning In the early twentieth century the concept of reliabilityengineering began to take form as the industrial revolution brought about mechanicaland electronic systems such as the automobile and the telegraph Large-scaleproduction resulted in an increased awareness of item failure and performance andits impact on business During the 1930s Wallodie Weibull began documenting hiswork on the measurement and definition of material fatigue behavior The result of hiswork is theWeibull distribution one of the most widely used statistical distributions inreliability engineering The Second World War brought about the formalization ofreliability engineering as a field of study The advent of radar and other electronic

1

Telecommunications System Reliability Engineering Theory and Practice Mark L Ayers 2012 by the Institute of Electrical and Electronics Engineers Inc Published 2012 by John Wiley amp Sons Inc

warfare systems identified further the need to begin quantifying reliability and itsimpacts on mission success During the Second World War vacuum tubes wereextensively used in many electronic systems The low reliability of early vacuum tubesled to both poor system performance and high maintenance costs The IEEE Reli-ability Society was formed in 1948 as a result of the increasing focus on reliability inelectronic systems

Following the SecondWorldWar reliability engineering began to find applicationsin both military and commercial environments System reliability was studied from alife-cycle standpoint including component design quality control and failure analysisSpace exploration in the 1960s continued the need for a life-cycle approach to reliabilityengineering The establishment of NASA and an interest in nuclear power generationbecame driving forces for the development of highly reliable components and systemsLaunching commercial communications satellites by INTELSAT and landing onmoon by the United States proved the importance of reliability engineering aspart of the system engineering process at the end of the 1960s Semiconductordevelopment military applications communications systems biomedical researchand software-based systems in the 1980s led to new work in both system designand reliability analysis Improved component design and quality control led tosignificant improvements in reliability performance Consumer awareness and com-mercial focus in the 1990s and 2000s led to the current state of reliability engineering intodayrsquos society Most consumers are unconsciously aware of reliability as a measure ofan itemrsquos performance and overall value Engineers and technical resources are aware ofan itemrsquos reliability in a more quantitative sense but many times this understanding isneither complete nor found in solid reliability engineering principles

The presentation of reliability data whether qualitative or quantitative must bebased in solid theory In many cases reliability data is used to make business andtechnical decisions with far-reaching implications Predictive analysis is typically thefirst step in the reliability engineering process Target performance measures are used toguide the design process and ensure that system design is compliant with systemperformance targets Modern predictive reliability analysis utilizes statistical modelingof component failures These statistical models are used to predict a number of expectedsystem performance measures Changing the system topology or design and reanalyzingsystem performance allows engineering to do costperformance trade-off analyses Theanalyses can then be used to make business and technical decisions about the best designthat meets target requirements

Once a design has been selected and constructed it is important to collect empiricaldata This data allows the engineer or the operator to measure system performance andcompare that performance with expected or predicted data Empirical data collection isparticularly important in large production environments where statistical behavior canbe observed These observations can be tabulated and compared with the predicted orassumed behavior refining the system model and improving future predictions anddecisions In some cases empirical data can be directly used to analyze the predictedperformance of a new system One must be careful when using empirical data forpredictive analysis because it is rare to find an existing system that exactly matches anew design

2 INTRODUCTION

One of the most significant benefits of empirical analysis and data collection isfailure mode and effects analysis (FMEA) This analysis approach allows the engineerto identify systemic problems and design flaws by observing the failure of componentsor systems using this data to improve future performance Operational models andprocesses can be adjusted based on failure data and root cause analysis

Telecommunications systems have a long history of reliability-based design Thesedesign criteria are typically specified in terms of availability rather than reliabilityAvailability is another measure of statistical system performance and is indicative of asystemrsquos ldquouptimerdquo or available time for service delivery In many cases servicecontracts or service-level agreements (SLAs) are specified in terms of availabilityService providers (SPs) will sign a contract to provide a service that has specific targetprobability of being available or a target maximum downtime over a specific timeinterval Both of these measures are metrics of availability Without predictive andorempirical data to ensure compliancewith these targets the SP and the customer will takerisk in signing the contract This risk is sometimes realized risk (the party is aware of therisk quantified or not) or unrealized risk (the party is taking risk and is not aware thatthey are in jeopardy) Decisions made while assuming unrealized risk can jeopardizebusiness Reliability engineering of systems in telecommunications serves to reduceoverall risk in both realized and unrealized cases

Conducting business in the field of telecommunications always involves makingdecisions with financial implications Telecommunications contracts are often writtenaround SLAs in which a performance target is specified SPs must ensure that theirservice can achieve the required performance while customers must maintain realisticexpectations from the service requested Without access to a quantitative reliabilityanalysis these financial decisions are based on assumptions at best and perception atworst Rigorous reliability engineering and analysis of telecommunications systemsallows managers and technical resources to design systems that achieve the requiredtargets with minimum cost and maximum performance

Analysis of telecommunications systems requires specialized application of reliabil-ity engineering theory and principles Performance expectations within the field oftelecommunications can range from high to extreme Rarely do consumers of tele-communications expect less than highly available systems This is true even of consumerservices such as cable television consumer Internet and local telephone serviceCommercial service expectations are typically higher than those in a consumer environ-ment because the impact on the business may be significant and costly if their tele-communications services are critical to their operations delivery of service and ability togenerate revenues Performing detailed analyses of systems both consumer and com-mercial allow risks to be managed and costs to be controlled These analyses allow thedesigner to produce a system that is carefully crafted to just meet the requirements of thecustomer rather than greatly exceed them or completely miss the target In the case ofoperational systems knowledge of the achievable system performance and its maintain-ability allows the operator to understand whether their achieved performance is withinspecification and to optimize maintenance and repair efforts

This book is written with the goal of providing the reader with the knowledge andskills necessary to perform telecommunications system reliability analysis and to

INTRODUCTION 3

examine system designs with a critical eye Telecommunications service providersfrequently provide service to customers who know what they would like to purchasewhether it is wireless or terrestrial packet or TDM It is far less frequent that thecustomer understands how to specify system availability or reliability Knowledge of thetheory and practice of reliability engineering allows service providers and engineers toeducate their customers regarding this important metric of network performance Evenif the reader does not perform firsthand reliability analysis the knowledge gained bystudying both the theory and the practice of reliability engineering allows the individualto make more informed better decisions about design and operation of telecommu-nications systems or the purchase of telecommunications services The truly pervasivenature of reliability as a metric in telecommunications systems requires engineersmanagers and executives to have extensive knowledge of system topologies costs andperformance In many cases these system details are obtained through experience andpractice The author of this book would argue that experience without academic studyparticularly in the field of reliability engineering results in decisions that at timesinvoke unrealized serious business risk

The reader is expected to have a basic working knowledge of engineeringmathematics A college-level course in probability and statistics is of particular valueto the reader This book relies extensively on the application and use of statisticaldistributions and probability models Experience with telecommunications systemdesign and network topologies is valuable in understanding the trade-offs involvedwith different reliability analyses Lastly if the reader has interest in developing his orher own reliability models knowledge of MATLAB and computer programmingmethods is of value All of the topics presented in this book are intended to providesufficient depth to enable the reader to either work with them directly or conductminimal further research in order to obtain a complete understanding of a topic

The previous paragraph should allow readers to identify themselves as members of a specific group. These groups can generally be classified as engineers, managers, or executives. Engineers can use this book as a complete technical resource for building and analyzing system models. The engineer who uses this book will be able to develop complex, detailed statistical models of telecommunications systems that produce a variety of system metrics for business, design, and other technical decisions. Managers reading this book will derive value from the knowledge obtained about proper reliable system design, contract implications, and operational impacts. Executive readers will find value in the high-level knowledge obtained about design best practices and proper expectations for system performance.

This book is logically organized to provide two distinct sets of information: theory and applications. Chapter 1 introduces and develops the concepts and accepted theories required for system reliability analysis, including probability and statistics, system reliability theory, and system modeling. The remaining chapters of this book are organized by technology. Chapter 2 discusses fiber-optic networks; both terrestrial and submarine networks are covered, and the subtleties of each are presented in detail. Chapter 3 presents reliability analysis approaches for terrestrial microwave systems. The discussion includes short-haul point-to-point,

4 INTRODUCTION

long-haul point-to-point, cellular wireless, and WiFi networks. Satellite communications networks are discussed in Chapter 4; both teleport and VSAT network topologies are covered, along with techniques for calculating propagation availability. Chapter 5 addresses reliability concerns for mobile wireless (cellular) systems. In Chapter 6, the often underanalyzed topics of power systems and heating, ventilation, and air conditioning systems supporting communications networks are analyzed. The final chapter (Chapter 7) introduces software and firmware as they relate to telecommunications system reliability. Each section presents the analysis in two discrete parts: the communications equipment and the communications channel. The goal of this book is to provide the reader with sufficient knowledge to abstract the concepts presented and apply them to his or her own problems.

The ability to blend academic theory and practical application is a rare commodity in the field of engineering. Few practicing engineers have the ability to apply abstract theory to real problems, while even fewer academics have the practical experience to understand the engineering of "real" systems. Telecommunications reliability engineering requires a blend of abstract statistical theory and practical engineering experience. Fortunately, in the case of reliability engineering, this blend is easily understood when the required information is presented in a logical, organized format. The use of predictive and/or numerical models in the design of telecommunications systems brings great value to system designs. Acceptance of these models requires the engineer, manager, and executive to have enough confidence in a model's results that significant decisions can be based on them. That level of confidence in a model can only come from a fusion of reliability engineering academics and experience.


warfare systems further identified the need to begin quantifying reliability and its impact on mission success. During the Second World War, vacuum tubes were used extensively in many electronic systems. The low reliability of early vacuum tubes led to both poor system performance and high maintenance costs. The IEEE Reliability Society was formed in 1948 as a result of the increasing focus on reliability in electronic systems.

Following the Second World War, reliability engineering began to find applications in both military and commercial environments. System reliability was studied from a life-cycle standpoint, including component design, quality control, and failure analysis. Space exploration in the 1960s continued the need for a life-cycle approach to reliability engineering. The establishment of NASA and an interest in nuclear power generation became driving forces for the development of highly reliable components and systems. At the end of the 1960s, the launch of commercial communications satellites by INTELSAT and the Moon landing by the United States demonstrated the importance of reliability engineering as part of the system engineering process. In the 1980s, semiconductor development, military applications, communications systems, biomedical research, and software-based systems led to new work in both system design and reliability analysis. Improved component design and quality control led to significant improvements in reliability performance. Consumer awareness and commercial focus in the 1990s and 2000s led to the current state of reliability engineering in today's society. Most consumers are at least implicitly aware of reliability as a measure of an item's performance and overall value. Engineers and technical resources are aware of an item's reliability in a more quantitative sense, but this understanding is often neither complete nor grounded in solid reliability engineering principles.

The presentation of reliability data, whether qualitative or quantitative, must be grounded in solid theory. In many cases, reliability data is used to make business and technical decisions with far-reaching implications. Predictive analysis is typically the first step in the reliability engineering process. Target performance measures guide the design process and ensure that the system design complies with system performance targets. Modern predictive reliability analysis uses statistical models of component failures to predict a number of expected system performance measures. Changing the system topology or design and reanalyzing system performance allows engineers to perform cost/performance trade-off analyses. These analyses can then be used to make business and technical decisions about the best design that meets the target requirements.
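The predict, compare, and redesign loop described above can be sketched in a few lines of Python. This is a minimal illustration under textbook assumptions rather than a model taken from this book: each element's steady-state availability is taken as MTBF / (MTBF + MTTR), elements in series must all be up, and a redundant pair fails only when both members are down (independent failures, ideal switchover). The element names, MTBF/MTTR figures, relative costs, and the 0.9999 target are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from math import prod


def element_availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one repairable element: MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


def series(*avail: float) -> float:
    """Every element must be up (independent failures assumed)."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """Down only if every redundant element is down (ideal switchover assumed)."""
    return 1.0 - prod(1.0 - a for a in avail)


@dataclass
class Design:
    name: str
    cost: float          # hypothetical relative cost
    availability: float


# Hypothetical element parameters, in hours
radio = element_availability(mtbf_h=100_000, mttr_h=8)
power = element_availability(mtbf_h=50_000, mttr_h=4)

candidates = [
    Design("single radio, single power", 1.0, series(radio, power)),
    Design("1+1 radio, single power", 1.6, series(parallel(radio, radio), power)),
    Design("1+1 radio, 1+1 power", 2.0,
           series(parallel(radio, radio), parallel(power, power))),
]

TARGET = 0.9999  # hypothetical availability requirement


def cheapest_compliant(designs, target):
    """Return the lowest-cost design whose availability meets the target, or None."""
    compliant = [d for d in designs if d.availability >= target]
    return min(compliant, key=lambda d: d.cost) if compliant else None


for d in candidates:
    print(f"{d.name:28s} A = {d.availability:.6f}  cost = {d.cost}")
best = cheapest_compliant(candidates, TARGET)
print("selected:", best.name if best else "no compliant design")
```

With these assumed numbers the unprotected design misses the target, and the cheapest compliant option protects only the radio; duplicating the power plant as well raises availability further, but at a cost the target does not require.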

Once a design has been selected and constructed, it is important to collect empirical data. This data allows the engineer or the operator to measure system performance and compare it with expected or predicted values. Empirical data collection is particularly important in large production environments, where statistical behavior can be observed. These observations can be tabulated and compared with the predicted or assumed behavior, refining the system model and improving future predictions and decisions. In some cases, empirical data can be used directly to predict the performance of a new system. One must be careful when using empirical data for predictive analysis, however, because it is rare to find an existing system that exactly matches a new design.
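A minimal sketch of comparing field data with a prediction, assuming a constant failure rate (exponential model) so that MTBF can be estimated as total up time divided by the number of observed failures. The site names, hours, failure counts, and the predicted MTBF are hypothetical. With only a handful of failures such a point estimate is highly uncertain, which is one reason large production populations are valuable.

```python
from dataclasses import dataclass


@dataclass
class FieldRecord:
    """Hypothetical field data for one deployed unit over an observation window."""
    unit_id: str
    up_hours: float    # time in service during the window
    down_hours: float  # outage time during the window
    failures: int      # failures observed during the window


def estimated_mtbf(records: list[FieldRecord]) -> float:
    """Point estimate of MTBF: total up time divided by total observed failures.

    Assumes a constant failure rate and comparable units.
    Returns infinity when no failures were observed.
    """
    total_up = sum(r.up_hours for r in records)
    total_failures = sum(r.failures for r in records)
    return total_up / total_failures if total_failures else float("inf")


def observed_availability(records: list[FieldRecord]) -> float:
    """Fraction of the total observed time during which the units were up."""
    total_up = sum(r.up_hours for r in records)
    total_down = sum(r.down_hours for r in records)
    return total_up / (total_up + total_down)


# Hypothetical one-year (8760 h) observations of three deployed units
records = [
    FieldRecord("site-A", up_hours=8754, down_hours=6, failures=1),
    FieldRecord("site-B", up_hours=8760, down_hours=0, failures=0),
    FieldRecord("site-C", up_hours=8748, down_hours=12, failures=2),
]
PREDICTED_MTBF_H = 20_000  # hypothetical output of a predictive model

mtbf_hat = estimated_mtbf(records)
print(f"estimated MTBF: {mtbf_hat:,.0f} h (predicted {PREDICTED_MTBF_H:,} h)")
print(f"observed availability: {observed_availability(records):.6f}")
print(f"observed/predicted MTBF ratio: {mtbf_hat / PREDICTED_MTBF_H:.2f}")
```

Here the field estimate falls well short of the prediction, which would prompt a review of the model inputs or of the deployed design before the model is reused.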


One of the most significant benefits of empirical analysis and data collection is failure mode and effects analysis (FMEA). This approach allows the engineer to identify systemic problems and design flaws by observing the failure of components or systems and using this data to improve future performance. Operational models and processes can be adjusted based on failure data and root cause analysis.
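One widely used way to rank the failure modes captured in an FMEA worksheet is the risk priority number, RPN = S × O × D, where severity, occurrence, and detection are each scored on an ordinal scale (commonly 1 to 10). The sketch below illustrates that general convention and is not necessarily the method used in this book; the components, failure modes, and scores are hypothetical, and in practice the occurrence score would be informed by the empirical failure data discussed above.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a hypothetical FMEA worksheet (scores on a 1-10 scale)."""
    component: str
    mode: str
    severity: int    # impact on service if the failure occurs
    occurrence: int  # how often it occurs, ideally informed by field failure data
    detection: int   # 10 = very unlikely to be detected before it affects service

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


worksheet = [
    FailureMode("rectifier", "output loss", severity=9, occurrence=3, detection=2),
    FailureMode("fiber span", "cable cut", severity=8, occurrence=5, detection=1),
    FailureMode("radio ODU", "gradual Tx power drift",
                severity=5, occurrence=4, detection=7),
]

# Rank failure modes so corrective action targets the highest-risk items first
for fm in sorted(worksheet, key=lambda f: f.rpn, reverse=True):
    print(f"{fm.rpn:4d}  {fm.component:11s} {fm.mode}")
```

With these assumed scores the hard-to-detect drift outranks the more severe but readily detected failures, which is the kind of insight the ranking is meant to surface. Because RPN multiplies ordinal scores, it is best treated as a prioritization aid rather than a quantitative risk measure.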

Telecommunications systems have a long history of reliability-based design. Their design criteria are typically specified in terms of availability rather than reliability. Availability is another measure of statistical system performance and indicates a system's "uptime," or the time it is available for service delivery. In many cases, service contracts or service-level agreements (SLAs) are specified in terms of availability. Service providers (SPs) sign contracts to provide a service that has a specific target probability of being available, or a target maximum downtime over a specific time interval. Both of these measures are metrics of availability. Without predictive and/or empirical data to ensure compliance with these targets, both the SP and the customer take on risk in signing the contract. This risk is sometimes realized risk (the party is aware of the risk, whether quantified or not) and sometimes unrealized risk (the party is taking a risk without being aware of its exposure). Decisions made while assuming unrealized risk can jeopardize a business. Reliability engineering of telecommunications systems serves to reduce overall risk in both the realized and unrealized cases.
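Because an SLA may state either a target availability or a maximum downtime over an interval, it is often useful to convert between the two. A minimal sketch, assuming a 365-day year and steady-state availability, and ignoring any maintenance windows or exclusions a particular contract might define; the predicted value and SLA target in the last lines are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per year implied by a steady-state availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from_downtime(downtime_minutes: float,
                               period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Availability implied by a downtime allowance over a given period."""
    return 1.0 - downtime_minutes / period_minutes


for a in (0.999, 0.9999, 0.99999):
    print(f"A = {a}: about {annual_downtime_minutes(a):.1f} min/yr of downtime")

# Compare a hypothetical predicted availability with a hypothetical SLA target
predicted, sla_target = 0.99985, 0.9999
status = "meets" if predicted >= sla_target else "does not meet"
print(f"predicted A = {predicted} {status} the SLA target of {sla_target}")
```

For example, 0.9999 availability corresponds to roughly 52.6 minutes of downtime per year, and each additional "nine" cuts the allowance by a factor of ten.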

Conducting business in the field of telecommunications always involves making decisions with financial implications. Telecommunications contracts are often written around SLAs in which a performance target is specified. SPs must ensure that their service can achieve the required performance, while customers must maintain realistic expectations of the service requested. Without access to a quantitative reliability analysis, these financial decisions are based on assumptions at best and perceptions at worst. Rigorous reliability engineering and analysis of telecommunications systems allow managers and technical resources to design systems that achieve the required targets at minimum cost and with maximum performance.

Analysis of telecommunications systems requires specialized application of reliability engineering theory and principles. Performance expectations in telecommunications range from high to extreme. Consumers of telecommunications rarely expect anything less than highly available systems; this is true even of consumer services such as cable television, consumer Internet, and local telephone service. Commercial service expectations are typically higher than those in a consumer environment, because the impact on a business may be significant and costly if its telecommunications services are critical to its operations, delivery of service, and ability to generate revenue. Performing detailed analyses of systems, both consumer and commercial, allows risks to be managed and costs to be controlled. These analyses allow the designer to produce a system that is carefully crafted to just meet the customer's requirements rather than greatly exceed them or completely miss the target. In the case of operational systems, knowledge of the achievable system performance and its maintainability allows the operator to determine whether the achieved performance is within specification and to optimize maintenance and repair efforts.

This book is written with the goal of providing the reader with the knowledge and skills necessary to perform telecommunications system reliability analysis and to examine system designs with a critical eye.
