
9

Assessing and Maturing Technology Readiness Levels

Ryan M. Mackey
NASA Jet Propulsion Laboratory, California Institute of Technology, USA

Overview

Having considered architecture and design of an SHM system, we next turn to maturity assessment of specific SHM technologies. Like the concept of SHM itself, many of these technologies will be new, untested, and immature. Many of them are needed in the short term, and must be matured from concept to flight-proven applications within 5 to 10 years. Furthermore, the technologies themselves are often inherently complex and difficult to validate. This chapter focuses on these issues of SHM development, and aims to facilitate technology maturation and certification.

9.1 Introduction

Technology maturity is an important factor in the selection of any given technology, and this is doubly true for SHM technologies, which are inherently more difficult to assess and mature. Much of the difficulty is rooted in the fact that SHM technologies are commonly associated with complex software and modeling components. To understand the challenges in more detail, we begin by considering equivalent efforts to bound and understand software and information technologies in general, and then expand these conventions to include the broader scope of SHM. Previous studies of technology readiness levels, or TRLs, for autonomy-related information technology provide a solid foundation to evaluate SHM maturity.

One of the key issues hindering maturation is that rigorous testing of SHM generally involves rare situations that are risky or impossible to simulate. One anticipated and highly visible function of SHM in space applications is automatic crew escape in case of catastrophic vehicle failure, a situation that is difficult to test. Virtually all SHM functions are triggered by failures, events that are random, unusual, and often poorly understood if they are known at all. The expense and low reflight rate associated with spaceflight make in-space validation of SHM technologies a daunting prospect. However, as we will explore, a careful analysis of the underlying difficulties with SHM maturation leads us to promising alternate strategies, substantially reducing dependence on in-flight testing and permitting cost-effective approaches in many cases.

This chapter concludes with a brief example demonstrating how experiment design can be adjusted to meet specific needs of SHM maturation. In this discussion, we illustrate specific challenges of SHM maturation for the National Aeronautics and Space Administration (NASA), how these needs were met using an aircraft testbed and associated facilities, where further validation effort remains, and how the experiment was adapted to address as many validation needs as possible. Finally, we will compare this experience to other categories of SHM functions, in an effort to propose optimal means of technology maturation on an individual basis.

9.2 Motivating Maturity Assessment

SHM uses technologies that cover any approach to sense, interpret, predict, correct, and optimize the health of a given physical system. SHM technologies have been deployed in aerospace systems since at least the 1950s, starting with technologies to manage redundant hardware, processes to ensure high-quality components, the development of reliability estimation methods, and methods to analyze failures from remote system telemetry. In the 1960s, abort systems for failing launch vehicles evolved, along with a variety of contingency operations and plans for human flight, and new fault detection and response techniques that used computers. By the 1970s the Shuttle deployed dissimilar redundancy with a backup flight system, and the Viking and Voyager projects developed autonomous fault protection methods, including techniques to test these methods with fault injections. The 1980s and 1990s saw the development of new modeling and analytical methods derived from artificial intelligence techniques and directed graph technologies. In the 2000s, new SHM technologies continued to develop. Thus technology maturation has been and remains an ongoing theme for the deployment of SHM. There is also a new and significant focus on SHM for spaceflight as a means to improve system safety and, by consequence, overall mission capability.

NASA's recent mission experiences – both successful and unsuccessful – highlight the need for new SHM technologies, as well as improvement in related fields of software, modeling, and methods of certification. For example, the cost, time, and difficulty inherent in returning the Space Shuttle to flight, responding to fault modes that went unappreciated for two decades of service, highlight the importance and the complexity of SHM. This need is acknowledged in new NASA programs such as Project Constellation, where SHM has been identified as an important need for mission safety and assurance.

While some SHM technologies were designed into NASA systems (typically, redundancy, various fault protection algorithms, and ground quality measures), many operational SHM developments within NASA have been retrofitted, gradually improving monitoring capabilities within existing systems rather than implementing a systems-level capability. However, there have been isolated episodes of groundbreaking SHM development and maturation efforts. Two well-known examples are the Remote Agent Experiment (RA/X) (Nayak et al., 1999), flown on the Deep Space One (DS-1) spacecraft in 1999, followed by the Autonomous Sciencecraft Experiment (ASE) on Earth Observing One (EO-1), which began in 2004 (Chien et al., 2004). These experiments demonstrated the feasibility of on-board reasoning technologies to handle mission execution as well as mission replanning, functions that are also central to SHM. Another example is the Shuttle Advanced Health Monitoring System (AHMS) (Jue and Kuck, 2002; Fiorucci et al., 2000), which added new monitoring capabilities to the Space Shuttle Main Engines (SSMEs), including for the first time detection and correction of sensor data inside the control loop. In the aircraft industry, the Boeing 777 (Ramohalli, 1992) and its on-board model-based diagnostic reasoner have a long and successful history. More recently, the Boeing C-17 transport aircraft includes a pair of on-board and ground-based reasoners, and the Lockheed Martin Joint Strike Fighter (JSF) is the first aircraft to attempt a thorough application of prognostic technologies.


Given the apparent interest, value, and body of research in SHM, it is remarkable that many SHM technologies remain at a relatively immature level. A closer analysis of the examples listed above reveals that maturation of SHM is unusually difficult. For example, the estimated cost to certify the Shuttle AHMS is in the realm of $100 million, and the C-17's health monitoring falls well below its anticipated performance in terms of false alarms. The Remote Agent Experiment has yet to be adapted to any "normal" spacecraft, despite being demonstrated almost a decade prior to this writing. For these reasons, some SHM technologies appear to have lagged behind progress in other aerospace sciences, leaving us with significant unanswered questions about the capabilities of next-generation spacecraft. These discrepancies are due in large part to inherent difficulties in assessing and maturing SHM technologies. However, maturing any technology is a complicated task, and what few standards exist are prone to abuse.

9.3 Review of Technology Readiness Levels

To manage technology maturity, we first need to define methods to measure it. The technology readiness level (TRL) is a commonly used yardstick of technology maturity, which is understood to be a separate concept from the maturity of any specific application. TRL is an abstract scale, arbitrarily chosen to range between 1 and 9, that reflects the development high-water mark of a given technology. An example of NASA TRL guidelines (Mankins, 1995) is given in Figure 9.1.

While NASA was the first agency to apply TRLs, the concept has been adopted by aerospace in general. There are few significant differences between the NASA and broader industry definitions.

The figure spans the development phases from Basic Technology Research, through Research to Prove Feasibility, Technology Development, Technology Demonstration, and System or Subsystem Development, to System Test, Launch and Operations, with the nine levels defined as follows:

Level 1: Basic principles observed and reported
Level 2: Technology concept and/or application formulated
Level 3: Analytical and experimental critical function and/or characteristic proof of concept
Level 4: Component and/or breadboard validation in a laboratory environment
Level 5: Component and/or breadboard validation in a relevant environment
Level 6: System/subsystem model or prototype demonstration in a relevant environment (ground or space)
Level 7: System prototype demonstration in a space environment
Level 8: Actual system completed and "flight qualified" through test and demonstration (ground or space)
Level 9: Actual system "flight proven" through successful mission operations

Figure 9.1 NASA general TRL guidelines


For example, the US Department of Defense guidelines (2005) are virtually identical to NASA's from TRL 1 through 7. One contrast is an increased focus on TRL 8 and 9, concerning evaluation and deployment of the completed system, which is of greater import to mission- or customer-oriented developers and less significant to research and development institutions such as NASA. Another significant difference is that the NASA operational environment is usually the spaceflight environment, whereas aerospace in general has several possible environments. If one broadens the NASA TRL guidelines to consider operational environments other than space, TRL definitions are essentially consistent across the various branches of aerospace.

Since comprehensive SHM integrated at the system level has never been demonstrated in safety-critical aviation or in space, with the exceptions of only a select few technologies and functions (such as fault protection in deep-space probes), we are most concerned with the intermediate TRLs, between about TRL 3 and 7. This includes the range of technology development after the formulation of the method, from first laboratory experiments and prototypes, up to and including first-flight validation.

There are a few focal points contained within the TRL guidelines that merit special consideration. The most significant element is the relevant environment, in which technologies must be tested to reach TRL 5. A relevant environment is a carefully arranged laboratory environment that accurately simulates the key ("relevant") difficulties and stresses of operation in space. It must sufficiently stress the technology to permit an accurate, defensible estimate of how the technology will perform in its intended application. Because of this definition, the relevant environment differs and is entirely dependent on the technology itself. For the case of SHM, we can improve this loose definition with some specific guidelines, which we will expand upon in the next section.

Two other important points are the precise meaning of validation and consequently the performance model of the technology. Both terms appear at TRL 4, the first stage at which a prototype of the technology must exist, and must be demonstrated to function correctly. Validation, in the context of technological maturity, means obtaining sufficient knowledge of a technology to predict its behavior over a significant range of operating conditions and inputs. The "performance model" is the embodiment of this prediction, meaning performance can be reduced to a formula based on one or more inputs. As TRL advances from 4 through 7, the performance model is expanded and proven on increasingly large "envelopes" of inputs, from the laboratory through actual flight environments. Without this model, without defining boundaries for which the model is usable (i.e., "valid"), and without demonstrating the model's accuracy within those boundaries, a technology cannot be said to have been validated.
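A minimal sketch may make the idea concrete. The fragment below is purely illustrative and not drawn from any flight project: it treats a hypothetical fault detector's performance model as a predicted detection probability over a validated envelope of signal-to-noise ratios. All names, the logistic relationship, and the numbers are assumptions invented for the example.

    import math
    from dataclasses import dataclass

    @dataclass
    class PerformanceModel:
        # Hypothetical performance model for a fault-detection technology: it predicts
        # behavior from one input (signal-to-noise ratio) and records the envelope of
        # inputs over which that prediction has actually been validated by test.
        snr_min: float           # lower bound of the validated envelope
        snr_max: float           # upper bound of the validated envelope
        false_alarm_rate: float  # per-sample rate observed during testing

        def detection_probability(self, snr: float) -> float:
            if not (self.snr_min <= snr <= self.snr_max):
                raise ValueError("outside the validated envelope; prediction is not defensible")
            # Illustrative logistic relationship, as if fitted to laboratory test data
            return 1.0 / (1.0 + math.exp(-2.0 * (snr - 3.0)))

    model = PerformanceModel(snr_min=1.0, snr_max=10.0, false_alarm_rate=1e-4)
    print(model.detection_probability(5.0))    # inside the envelope: a usable prediction
    # model.detection_probability(0.5)         # outside the envelope: raises ValueError

In these terms, advancing from TRL 4 toward TRL 7 corresponds to widening the validated envelope and demonstrating that the observed detection probability and false alarm rate remain consistent with the model over that larger envelope.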

Questions of what constitutes the relevant environment, and how to describe the performance model, are significantly easier for hardware technologies. Specifically, the relevant environment is defined in terms of tangible dimensions where physical devices are concerned – temperature, vibration, radiation, and so forth – whereas software-enabled technology invites a host of new environmental criteria, such as data interchange and operating system. We should note that nearly all aerospace technologies have some software components, and few technologies can be unambiguously classified as hardware or software only.

Many SHM technologies are largely or completely software based, or more accurately can be described as information technologies (IT), though the full spectrum of SHM includes both extremes. For example, a new type of structural damage sensor without on-board processing would be strictly hardware, while an algorithm to correlate faults using a computer-aided design schematic of the system would be strictly information technology. The bulk of SHM technologies lies somewhere in between, with considerable emphasis on information technology.

For this reason, we will also examine special issues for information technology maturation. A revised list of TRL guidelines from Mackey et al. (2003), centered on the all-important 3–7 range, is as follows:


TRL 1: Identified/invented and documented a useful IT with a qualitative estimate of expected benefit. Basic functional relationships of a potential application formulated and shown to be compatible with reasonable processing constraints.

TRL 2: Completed a breakdown of IT into its underlying components and analyzed requirements and interactions with other systems. Defined and documented the relevant IT execution environment. Preliminary design assessment confirmed compatibility with the expected IT environment.

TRL 3: Key components of IT prototyped to prove scientific feasibility. Successful preliminary tests of critical functions demonstrated and documented. Experiments with small representative data sets conducted. IT development environment and development tools required to complete prototype defined and documented.

TRL 4: Prototype completed on laboratory hardware and tested in a realistic environment simulation. Experiments conducted with full-scale problems or data sets in a laboratory environment and results of tests documented. IT development environment completed as needed for the prototype. A model of IT performance, adequate for prediction of performance in the intended space environment, must be documented as a result of these tests.

TRL 5: Prototype refined into a system and tested on simulated or flight-equivalent hardware. Interaction environment, including interfaces to other systems, defined and included in the testing environment. Rigorous stress testing completed in multiple realistic environments and documented. Performance of the IT in the relevant environment must be documented and shown to be consistent with its performance model.

TRL 6: System ported from breadboard hardware testbeds to flight hardware and tested with other systems in realistic simulated environment scenarios. IT tested in complete relevant execution environment. Engineering feasibility fully demonstrated.

TRL 7: IT validated in the intended service environment. Adequate documentation prepared for transfer from developers to full operations engineering process team.

TRL 8: Development environment completed and validated. Approved by an independent verification and validation (IV&V) team.

TRL 9: Documentation of the IT completed, approved, and issued. Operational limits of the software are understood, documented, and consistent with the operational mission requirements.

The revised TRL guidelines above place special emphasis on the relevant environment and the performance model. SHM requires similar tailoring, although there are a few additional factors unique to SHM that must be addressed. We will investigate these issues in the following section.

9.4 Special Needs of SHM

To understand the special needs of SHM technologies, let us next consider what is required and what is typical of SHM, and therefore what abilities SHM technology must demonstrate in the course of its development. SHM differs from other technologies in several important aspects, all of which bear upon the process of technology maturation.

SHM exists to treat or prevent unusual, unexpected, and hazardous behavior of systems. Because of this fact, SHM needs are always intertwined with the system to which it is applied. This also has strong implications for the way SHM performance is evaluated. In the ideal case, a system that functioned perfectly would have no need of SHM. More realistically, requirements placed upon SHM – reliability, false alarm rate, latency, and so on – depend on the frequency, severity, and consequences of the failure modes it must manage. In aerospace, one frequently hears of SHM "buying its way" onto the system, meaning that SHM must demonstrably be the best solution to a given problem. Since SHM tends to be highly complex, it must justify itself with substantial improvements to system performance. These performance parameters define the performance model, but they are often abstract and subjective measurements – such as false alarm rate – as compared to the performance model for a hardware technology.

By necessity, SHM must be a system-level implementation. There are numerous SHM functions that for our purposes can be broadly classified into detecting, understanding, and reacting to abnormal conditions. While a significant fraction of SHM is hardware, specifically sensors and computing hardware at a minimum, many needed SHM technologies are software technologies. Furthermore, with few exceptions, the sensors and computing hardware required for SHM are identical to sensors and computers needed for control. The exceptions include sensors that are only used by SHM, such as strain gages or wear detectors. However, unlike the hardware, much of the SHM software is SHM specific. This emphasis on software drives technology maturation to a large degree.

As discussed elsewhere in this text, the "understanding" components of SHM are generally referred to as reasoners, meaning software that interprets information to hypothesize about the state of the system. Reasoners, and many other SHM components besides, are dependent upon models. These models are generally separable from the reasoners themselves, but are often at least roughly equal in complexity to the reasoner algorithms. While the model may not need "development" as it is usually not a distinct technology, models do require certification.
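To illustrate that separation in the simplest possible terms, the sketch below (hypothetical symptom and fault names, not any flight reasoner) keeps the reasoning algorithm generic while all system knowledge lives in a separable model that could be regenerated or re-certified without touching the algorithm.

    # The reasoner is a generic algorithm; system knowledge lives in a separable model.
    FAULT_MODEL = {
        # symptom -> candidate failure modes (hypothetical content, normally derived
        # from engineering models or captured test data)
        "fuel_pressure_low": ["fuel_pump_degraded", "feed_line_leak"],
        "vibration_high":    ["fuel_pump_degraded", "rotor_imbalance"],
        "turbine_temp_high": ["cooling_loss", "sensor_bias"],
    }

    def diagnose(symptoms, model):
        # Generic reasoner: intersect the candidate sets implied by each observed symptom.
        candidates = None
        for symptom in symptoms:
            modes = set(model.get(symptom, []))
            candidates = modes if candidates is None else candidates & modes
        return candidates or set()

    print(diagnose(["fuel_pressure_low", "vibration_high"], FAULT_MODEL))
    # {'fuel_pump_degraded'} -- the model, not the algorithm, carries the system knowledge

In this arrangement the model, rather than the algorithm, is what changes as the system design evolves, which is why certification of models becomes a question in its own right.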

Any discussion of SHM technology also invariably leads toward automation and autonomy. The majority of SHM functions can be described as either providing some form of active response to off-nominal conditions, or partially or fully replacing the judgment of human pilots, maintainers, or controllers. Either of these processes is a step toward automation by definition. Autonomy brings special challenges because, in general, tasks that are traditionally performed by people are difficult to define in rigorous terms. Just as a conventional "autopilot" is in no manner equivalent to an actual pilot, SHM technologies that replace functions such as inspections and certification for flight require a substantial leap in capability and complexity.

Finally, there is a fundamental difficulty with any SHM technology, whether it is hardware or otherwise, regardless of function or relevant environment. This difficulty is the simple fact that SHM is intended to deal with system failures, situations that are typically unusual and unwanted, as well as expensive and dangerous. As a result, SHM as a class requires a rethink of technology maturation practices. In an ideal SHM maturation effort, we would begin with a complete understanding of all anticipated states of the system, both nominal and faulty. We would be able to describe these states in terms of measurable quantities and environmental conditions that lead to such effects. We would have past examples of each of these states for laboratory testing of our technologies, followed by a full-scale series of tests. We would then be able to examine the technology's performance in the situations for which it was intended.

In general, few of these desires can be fully met, and validation ultimately rests on our limited ability to test the technology. A distilled summary of SHM-specific testing difficulties is given below:

1. Denumerability of states: Since the relevant environment for SHM generally has many dimensions, it becomes difficult or impossible to test technologies in every plausible set of conditions. This applies to both the number of states defining test cases and the SHM technology or architecture's internal representation of states. For example, a monitoring technology designed to sense failures in a system with 10 different known failure modes cannot usually be tested for every combination of faults, since there are over 1000 combinations (a short counting sketch follows this list).

2. Defining nominal: Implicit in the assumption that one can define failures is the converse, namely, that a well-defined nominal state exists. The nominal state may be difficult to establish, due to manufacturing variation between identical spacecraft and components, effects of normal wear and usage, and other long-term effects of the environment. There may be uniqueness due to specific payloads, configurations, or interfaces with launch vehicles. Establishing nominal is particularly difficult if technology and the intended application are developed concurrently, as is often the case for space systems. Many properties of nominal operation are only evident after a long period of operational experience.

3. Inability to perform full-scale test of faults: Certification of a failure-reactive system – not to mention operator confidence in the system – requires unit testing as well as full-scale tests of the assembled system. This is not practical due to the danger and expense. Even component testing of actual faults is difficult, particularly in such areas as propulsion and structures. It is also assumed that the SHM system will improve with flight experience, implying that SHM is not fully complete, mature, or tested before the first flight.

4. Access to data and implementation schedule: The SHM system cannot be built without a thorough understanding of system behavior. Needed domain knowledge includes test data and engineering models wherever possible. If SHM is developed concurrently, it will necessarily evolve as system trades are made and a more detailed description becomes available. These data requirements are much more stringent than for most other system components. High-quality data is rarely available without a determined data capture effort.

5. Certification of algorithms vs. models: Many SHM algorithms use embedded models, and these models are often updated to include new information. This raises the important question of how to certify SHM, as performance is dependent upon both the algorithm and the model. In many cases, the model is more difficult to create than the algorithm itself.
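To make the counting claim in item 1 concrete, a few lines of Python (a toy illustration with made-up fault names) enumerate the combinations for a system with 10 known failure modes:

    from itertools import combinations

    failure_modes = [f"fault_{i}" for i in range(10)]   # 10 known failure modes

    # Every non-empty subset of simultaneously active faults is a distinct condition
    # to consider: 2**10 - 1 = 1023 combinations, already impractical to test exhaustively.
    fault_combinations = [c for r in range(1, 11) for c in combinations(failure_modes, r)]
    print(len(fault_combinations))   # 1023

Adding even a handful of operating-mode or environmental dimensions multiplies this count further, which is why exhaustive testing is treated here as out of reach.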

In legacy systems, designers have reacted to these difficulties by keeping the more complex SHM technologies, generally model-based reasoners, out of the safety-critical path. To date, in all aircraft examples, SHM reasoners have been entirely decoupled from flight controls, only affecting maintenance cycles and other ground-based activities. In spaceflight, only straightforward SHM functions (often intricate rule-based systems that reduce the complexity of the underlying behavioral model to simplified rule sets that enable critical decision-making) have been embedded in control systems, except where absolutely necessary to achieve mission needs, such as the on-board "behaviors" governing locomotion of the Mars Exploration Rovers (MER). To enable more sophisticated and hence more capable automated SHM reasoning, we must therefore find an alternate solution to these problems.

9.5 Mitigation Approaches

While the difficulties listed above seem nearly intractable, particularly for space applications, it is important to keep in mind that the difficulty may be greatly reduced if an elegant system is chosen for development – a simpler or otherwise more tractable system than the eventual target application, yet one that exhibits the right interfaces and operating complexity. From the standpoint of TRL, we need only perform validation in the relevant environment. We require only a single example that is suitably detailed and relevant to the broader class of missions. It is therefore reasonable to assume that we can find a case where these problems are not so severe.

Before we examine how to apply the traditional TRL scale to SHM technologies, let us first think about ways to address these concerns. There are several promising approaches to solve the problems listed above. A few of the more important ones are listed below:

1. Design for data capture: Few programs currently realize the long-term value of system data, including that gathered during assembly and initial tests, but such information is invaluable to SHM. Additional sensors are rarely required; a program must only perform the incremental (but substantial) effort of collecting and managing data through all phases of design, integration, and checkout. Furthermore, past experience with aircraft shows that SHM can help find and correct faults during vehicle assembly and test, offsetting the additional cost.


2. Prototype SHM using mission or vehicle analogs: Often a meaningful surrogate system can be found that has higher reflight rates and greater flexibility in terms of fault injection and mission variation than the target application. Mission analogs are also capable of adding "flight realism" that may not be present in even the best simulations.

3. Work toward analytic requirements for flight-critical software certification: Since not every fault can be anticipated, let alone invoked for testing purposes, it is not practical to insist that SHM software be tested for every possible combination of input conditions, as is the standard practice for flight-critical code. A more realistic approach would be to independently verify that the models used are correct and operable using iterative "pathfinder" techniques, and that the SHM algorithms will not inject dangerous inputs to the control system whether or not a correct model is loaded (an illustrative sketch of such a guard appears at the end of this section). More cost-effective approaches, such as risk-based testing, are also preferred to requirements based on institutional inertia. Highly critical, reflexive SHM, such as emergency escape mechanisms, should be separable and testable independently from other SHM functions.

4. Take maximum advantage of autocoding techniques and model-based engineering (MBE) approaches: Since SHM is a reasoning process and is therefore model dependent, accuracy, completeness, and interoperability of models are paramount. Autocoding, automatic model abstraction, and model checking will greatly reduce the effort needed to certify models and the overall SHM system. This benefit accumulates as models are improved over the system's lifecycle.

Table 9.1 Matching approaches to maturation difficulties

Difficulty of full-scale testing – Design for data capture: gradual testing as system is assembled. Analog missions: full-scale tests of equivalent system. Negotiated flight-critical requirements: more realistic testing requirements. Autocoding and MBE: organize unit testing. Meta-environmental models: provide end-to-end simulation at various levels of detail.

Poor access to data – Design for data capture: capture data and domain knowledge during assembly and test. Analog missions: access to extensive data of analogous system. Meta-environmental models: playback and integration of captured data, synthesis with simulated data.

Imprecise definition of nominal – Design for data capture: mechanisms to gather nominal data from system as it is built. Analog missions: realistic flight testing, including nominal variation. Meta-environmental models: collect known variation within nominal.

Algorithm vs. models – Design for data capture: test models against actual data as system is constructed. Analog missions: test ability to generate and include model updates. Negotiated flight-critical requirements: permit certification of models without recertifying entire ISHM. Autocoding and MBE: automated generation and checking of models.

State explosion – Negotiated flight-critical requirements: permit separation of ISHM to allow abstraction, independent certification. Autocoding and MBE: allow optimal model testing using exhaustive or branch/bound methods. Meta-environmental models: provide and maintain a meaningful abstraction of state space.

5. Construct and maintain a centralized "meta-environment" combining system, subsystem, and component models into a unified simulation: A unified simulation provides three primary benefits for our purposes. First, it maintains data and domain knowledge. Second, it serves as a platform to test and certify SHM technologies. Third, it provides a rigorous description of the "relevant environment" for different SHM components, by encapsulating the various interfaces and operating conditions found in different parts of the system. This closes the circle of system knowledge that began with data gathering above.

The five approaches above are not an exhaustive list, but do cover the major difficulties that face SHM. Table 9.1 summarizes specific responses to the various challenges.
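As a purely illustrative companion to approach 3, the sketch below shows one way an SHM output could be prevented from injecting dangerous inputs into a control system even when its model is wrong. The command names, the allowed set, and the rate limit are all invented for the example.

    # Hypothetical guard between an SHM reasoner and the control system: whatever the
    # reasoner recommends, only pre-certified commands pass, and only at a bounded rate.
    ALLOWED_COMMANDS = {"no_action", "switch_to_redundant_unit", "enter_safe_mode"}
    MAX_ACTIVE_COMMANDS_PER_WINDOW = 2

    class ShmCommandGuard:
        def __init__(self):
            self.issued_in_window = 0   # reset by the caller at each window boundary (omitted)

        def filter(self, recommended_command: str) -> str:
            if recommended_command not in ALLOWED_COMMANDS:
                return "no_action"                        # unrecognized output: suppress it
            if self.issued_in_window >= MAX_ACTIVE_COMMANDS_PER_WINDOW:
                return "no_action"                        # rate limit reached: suppress it
            if recommended_command != "no_action":
                self.issued_in_window += 1
            return recommended_command

    guard = ShmCommandGuard()
    print(guard.filter("switch_to_redundant_unit"))   # pre-certified: passed through
    print(guard.filter("vent_all_propellant"))        # not pre-certified: suppressed

Because the guard's behavior does not depend on the correctness of the SHM models, it can be verified exhaustively on its own, which is the spirit of keeping highly critical, reflexive functions separable and independently testable.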

9.6 TRLs for SHM

Having examined the difficulties posed by SHM as well as some solutions, we now revisit the notion of TRLs. The TRLs are intended as a guideline for technology development, and should explicitly reference the most difficult achievements at every stage. The slight modifications below are intended to highlight issues of particular importance to SHM technologies. These additions can be treated as additional guidance for technology evaluators, and as progress items for technology developers.

TRL 1: Identified/invented and documented a useful SHM technology with a qualitative estimate of expected benefit. Basic functional relationships of a potential application formulated and shown to be compatible with reasonable SHM architectures and testing requirements.

TRL 2: Completed a breakdown of the SHM technology into its underlying functions and components, and analyzed requirements and interactions with other systems. Defined and documented requirements for operation, interfaces, and relevant mission phases. Preliminary design assessment confirmed compatibility with the expected SHM architecture.

TRL 3: Major functions of SHM technology prototyped to prove scientific feasibility. Successful preliminary tests of critical functions demonstrated and documented, leading to a preliminary performance estimate. Experiments with small representative data sets conducted. Execution environment and development tools required to conduct these tests, such as modeling tools, defined and documented.

At these low levels of maturity, the key additional ingredient is an understanding of the SHM architecture for which the technology is intended, and the modeling approach, if any, needed for the technology to function. The architecture defines usage, inputs, outputs, issues of timing, and performance requirements of the technology, and is thus an integral part of the relevant environment. The modeling approach must also be evaluated for completeness and feasibility along with the SHM algorithms themselves.

TRL 4: Prototype completed on laboratory hardware and tested in a realistic environment simulation. Experiments conducted with full-scale problems or data sets in a laboratory environment and results of tests documented. Development of SHM infrastructure completed as needed for the prototype. A model of SHM technology performance, adequate for prediction of performance in the intended application, must be documented as a result of these tests.

TRL 5: Prototype refined into a system and tested on simulated or flight-equivalent hardware. Interaction environment, including interfaces to other systems, defined and included in the testing environment. Rigorous stress testing completed in multiple realistic environments and documented. Performance of the technology in the relevant environment must be documented and shown to be consistent with its performance model.


TRL 6: System ported from breadboard hardware testbeds to flight hardware and tested, along with all other needed components, in realistic simulated environment scenarios. SHM technology tested in complete relevant execution environment. Engineering feasibility fully demonstrated.

At mid-levels of maturity, the most important issue is defining and testing in the "relevant environment." The relevant environment in this case means a prototype or skeleton SHM architecture, conforming to the envisioned final application. Sensors, computing hardware, message passing, etc., are all defined by that architecture. Stress testing, for purposes of SHM, means injection of faults – either simulated or real – that are considered limiting cases in terms of sensitivity, timing, or severity. Stress testing also should include overlapping or concurrent faults. Finally, stress testing must include the full spectrum of nominal behavior, including long-duration or borderline cases as appropriate (a small sketch of such a scenario set follows this list). As part of performance model validation, all technologies, not just SHM technologies, should be tested to failure and beyond.

TRL 7: SHM technology validated in the intended service environment. Adequate documentation prepared for transfer from developers to full operations engineering process team.

TRL 8: Development environment completed and validated. Approved by an independent verification and validation (IV&V) team.

TRL 9: Documentation of the SHM technology completed, approved, and issued. Operational limits of SHM are understood, documented, and consistent with the operational mission requirements.

The latest stages of maturity are characterized by completion of documentation and tools, sufficient to permit successful integration and testing of the technology by independent developers. Since many of the SHM technology's idiosyncrasies are captured in the SHM architecture, which is defined and tested at lower TRLs, there are few unusual requirements at this stage of development. It is, however, important to keep in mind that verification and validation methods, as well as metrics used to establish operational limits, may be unique to SHM technologies. Independent evaluators must be familiar with the performance model, defined and improved through TRLs 4–6, and must determine whether it adequately addresses their understanding of performance limits. If new metrics must be included, the technology cannot progress beyond TRL 7 until the performance model has been improved.
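Referring back to the stress-testing note under TRL 6, a toy enumeration like the one below (fault names and nominal cases are invented) is one way to make the required scenario coverage explicit: the nominal spectrum, single limiting faults, and overlapping faults.

    from itertools import combinations

    single_faults = ["sensor_bias", "fuel_pump_degraded", "cooling_loss"]       # hypothetical limiting cases
    nominal_stress_cases = ["long_duration_cruise", "rapid_throttle_transients"]  # borderline nominal behavior

    scenarios = []
    scenarios += [("nominal", case) for case in nominal_stress_cases]
    scenarios += [("single_fault", fault) for fault in single_faults]
    scenarios += [("concurrent_faults", pair) for pair in combinations(single_faults, 2)]

    for kind, detail in scenarios:
        print(kind, detail)   # 2 nominal cases, 3 single faults, 3 concurrent-fault pairs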

9.7 A Sample Maturation Effort

To illustrate this process, let us now consider an example SHM technology maturation effort, recognizing the special issues of SHM and responding accordingly. In 2005, NASA Jet Propulsion Laboratory, NASA Dryden Flight Research Center, and NASA Ames Research Center jointly undertook an effort to mature two specific SHM technologies (Mackey et al., 2006). This effort eventually settled on a Dryden F/A-18 aircraft as both the host system and the data source, as the F/A-18's engines were monitored in the experiment. The two technologies in question were BEAM (Beacon-based Exception Analysis for Multimissions) (James et al., 2001) and IMS (Inductive Monitoring System) (Iverson, 2004). Both are software technologies designed to process raw sensor data and sense anomalies, including system novelty and incipient faults. It is important to understand how these technologies perform, what their strengths and weaknesses are, and what applications are most suitable. These characteristics must be captured in the performance model of the technologies, and the model either populated or verified by experiment.
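Neither BEAM nor IMS is reproduced here, but a heavily simplified sketch of the general class of technique – learning an envelope of nominal sensor behavior from data and flagging departures from it – conveys what such a detector does. The channel count, the scoring rule, and the synthetic data below are invented for illustration.

    import numpy as np

    def learn_nominal_envelope(nominal_data):
        # Learn a per-channel mean and spread from captured nominal data
        # (e.g., engine sensor histories from many nominal flights).
        mean = nominal_data.mean(axis=0)
        std = nominal_data.std(axis=0) + 1e-9
        return mean, std

    def anomaly_score(sample, mean, std):
        # Largest per-channel deviation from nominal, in standard deviations.
        return float(np.max(np.abs((sample - mean) / std)))

    rng = np.random.default_rng(0)
    nominal_history = rng.normal(size=(5000, 8))      # 8 hypothetical engine channels
    mean, std = learn_nominal_envelope(nominal_history)

    print(anomaly_score(rng.normal(size=8), mean, std))   # nominal-like sample: small score
    print(anomaly_score(np.full(8, 6.0), mean, std))      # far outside nominal: large score

The performance model for a detector of this kind would then relate quantities such as detection latency and false alarm rate to the threshold placed on the score and to the range of nominal variation seen in training.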

Prior to this effort, both technologies were at a moderate level of maturity, having been assembled as functional prototypes in the laboratory environment (TRL 4) and demonstrated on hardware-in-the-loop simulators complete with a characterization of performance (TRL 5). There remained, however, a significant gulf between their maturity and readiness for spaceflight.

The specific maturity issues facing this project are common to many SHM development efforts. Researchers needed to answer the following questions:


• How does performance – specifically false alarm performance – change in a real system, with environmental variation, system noise, real operator inputs, etc.?
• How difficult is it to develop processes to certify, install, and use the technology in an actual operational system?
• What is the performance impact and cost of interfaces between the system and the software? How much customization is needed?
• Are there unexpected effects or consequences from the operating environment?
• How does the technology react to inevitable surprises, both of environment and the system itself?
• In addition to functioning correctly, does the technology provide useful results in an operating system?
• Can performance be verified for a large number of tests and a large volume of input data?

These needs are complicated by at least the first three common difficulties of SHM listed above: full-scale testing is difficult and expensive, proper validation requires an inordinate amount of data, and the concept of "nominal" in an operating system is far more complicated than in the laboratory.

To address these issues, the development team made several decisions regarding experiment design. The first was to employ a low-cost but fully operational surrogate testbed, namely a NASA F/A-18 aircraft, as a spacecraft analog. This approach provided flight realism and efficient access to almost daily flights, each providing large quantities of data. Next, the researchers chose an operating concept where the detector was always running whenever the experiment computer was powered, thereby allowing the experiment to take advantage of every flight of the aircraft without interference. This required some "ruggedization" of the experiment software to operate without carefully scripted inputs or in-flight verification that the software was operating, but these changes ultimately led to a more credible and robust experiment. Finally, in addition to sensor data from the aircraft engines – the system monitored by the SHM experiment – the researchers also carefully selected a number of additional parameters from the aircraft controls to provide context. Stored alongside the raw engine data, these parameters allowed the researchers to replay each flight in a flight simulator, and also provided sufficient detail to review each flight with pilots and maintainers afterward. This later proved invaluable in verifying the existence, and in some cases the root causes, of minor unexpected anomalies that occurred during flight. Without this information, a thorough validation would have been impossible.
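A minimal sketch of that logging decision – recording the monitored engine channels and the aircraft-state context in one time-indexed record so a flight can later be replayed and reviewed – might look like the following; the channel names and values are hypothetical, not the experiment's actual parameter list.

    import csv
    import time

    ENGINE_CHANNELS = ["n1_rpm", "egt_degC", "fuel_flow"]            # monitored system (hypothetical)
    CONTEXT_CHANNELS = ["throttle_position", "altitude_ft", "mach"]  # aircraft-state context (hypothetical)

    def record_sample(writer, engine_values, context_values):
        # One row per sample: timestamp, monitored data, and the operating context
        # needed to replay and interpret the flight afterward.
        writer.writerow([time.time()] + list(engine_values) + list(context_values))

    with open("flight_log.csv", "w", newline="") as log_file:
        writer = csv.writer(log_file)
        writer.writerow(["t"] + ENGINE_CHANNELS + CONTEXT_CHANNELS)
        record_sample(writer, [92.1, 610.0, 0.82], [0.65, 10500.0, 0.78])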

This project was intended to demonstrate TRL 6 for both technologies. For a proper validation, the testbed must contain all necessary elements of the relevant environment. In this experiment, these elements were as follows:

SHM architecture: detector receives sensor data in engineering units; asynchronous or synchronous detection capability; must operate with no feedback from higher-level reasoners
Hardware: performance sensors, environmental sensors, state variables; real-time data sampling at rates of 10 Hz and up
Interfaces: IEEE 1553 data bus
Computing hardware: flight-qualified processor (Geode 686 at 300 MHz)
Operating system: flight-like (VxWorks targeted but dropped)
Scenarios: wide range of nominal conditions and simulated faults
Other flight issues: on-board power, expected sensor anomalies, pilot override, computer hardware errors, model data errors
Development environment: automatic model generators and model testing tools

This list of requirements is typical for SHM validation. The requirements above can be met by an aircraft as easily as a spacecraft, with the exception of the specific fault behavior. Significantly, these SHM algorithms can be expected to perform identically in atmosphere or microgravity, provided the testbed accurately describes a spacecraft computing and sensor environment. For these two technologies, the only change between this F/A-18 test and a spaceflight test would be in the models used by the algorithms. During this validation experiment, researchers also exercised the development tools to generate models from captured data, and included their performance as part of the experiment.

Through 6 months of designing and conducting this experiment, all of these validation needs have been met with two exceptions, limited by the hardware available and the project budget. The first was the software operating system; Linux-RT and VxWorks were both considered but not adopted due to schedule and cost limitations. The second was the scope of simulated faults, which was limited to unusual pilot inputs – specifically, manual shutdown of engines – and the few unexpected events that happened to take place during the test flights. Both of these shortcomings must be answered before claiming a complete validation, but they can be easily solved. The first simply requires purchasing and incorporating a proven spaceflight operating system. The team can address the second through more creative use of flight controls, inclusion of failed (redundant) hardware on board, or playback of captured fault data over the 1553 bus during flight.

Using the F/A-18 as a surrogate system conferred several dramatic advantages. First, numerous flights were available for testing – 18 flights were performed in the space of a single month, and the team was able to effect several model updates and code changes between flights. Second, this particular aircraft also comes with realistic simulators, both pure software and hardware-in-the-loop, permitting rigorous checkout of flight software prior to flight experiments. Third, the F/A-18 is far more amenable to modification and testing in degraded mode than an actual spacecraft, such as experiments with one of two engines deliberately failed in flight. From the standpoint of SHM performance, the F/A-18 experiment allowed estimation of algorithm sensitivity, latency, fault coverage, suitability to different sensor types, model-building requirements, processor and memory requirements, and adequacy of the associated development tools. Combined with the follow-on research listed above, these are all of the ingredients necessary to enunciate and validate the "performance model" of the SHM technologies, and therefore legitimately claim TRL 6. As a result of this relatively inexpensive and quick experiment, some of the technologies studied are now being infused into NASA ground systems, as well as being readied for an in-space demonstration (TRL 7) on a small spacecraft. In every way, this approach was a success.

This "surrogate spacecraft" approach, as it stands today, is one of many approaches recommended for SHM technologies in general – failure detection, diagnostic reasoning, prognostics, planners, etc. The major element missing from the example above is one of feedback – in this system, there was no direct route from the experiment computer back to the flight controls (nor was this necessary for the technologies involved). However, the aircraft paradigm is not limited in this regard. Other aircraft experiments, such as the Intelligent Flight Control System project, have included complete feedback. There are similar opportunities available on unmanned aerial vehicles (UAVs), among others, where experimental technology intrusion into active controls is of lesser concern from a pilot safety standpoint.

9.8 Conclusion

The value of TRL is in the ability to extrapolate whether or not a technology can be applied to any given application, and estimate the effort required, from experiences with different applications. TRL is therefore particularly important for emerging and complex technologies. SHM has both of these characteristics. TRL is, however, only valuable if it is applied consistently and can be supported with detailed performance information about the technology in question.

The central concept of TRL is the performance model. As one develops and matures a technology, it is essential to create, update, and verify the performance model at every stage. In essence, this performance model is the developer's "radius of confidence" in the technology, describing how well one can predict technology performance in a totally new application, given the characteristics of that application. The technology developer should have a good understanding of the technology limitations and should be able to describe them in the context of the performance model. If the model is constructed with care, describes operating conditions in a manner that can be understood by other developers, has been thoroughly tested "at the corners" of performance, and shown to be consistent with anticipated performance, then it is a simple matter for another developer to evaluate the technology's suitability.

High TRL can also be described as "trust" in a technology's behavior – specifically, trust that the technology will function as expected after being handed from its inventors to an entirely different development team, based on a comprehensive series of tests. The issue of trust takes on an entirely new significance in the context of SHM. Given the low flight rates of spacecraft in the foreseeable future, virtually every response of SHM to a major fault mode is likely to be the first and only time that a fault is ever encountered. For this reason, trust is essential, and at the same time difficult to earn.

For SHM to be successful, and to have any real impact on system performance, this trust has to be earned. We therefore must be aware of the special difficulties pertaining to SHM. We must understand the details of TRL to make certain that advanced TRLs are justified. We must also explore alternate forms of technology maturation. TRL is a particularly important tool for SHM, simply because no single project will ever be able to afford the time or effort to exhaustively prove SHM. Developers must look to past performance on simpler systems to build trust in SHM.

The aerospace industry is now reaching for a new generation of air and space vehicles, cognizant of the need for safety, reliability, and sustainability. SHM is a key component in reaching these goals, but the SHM system must be ready – on board, functional, and operating with the confidence of mission controllers on the first flight. This is a difficult, but not an impossible, task. Careful attention to the architecture, interfaces, and environment that govern SHM; thorough testing of SHM components and end-to-end functions on surrogate vehicles; systematic capture and usage of operational data as components are developed and integrated; and coordinated engineering, modeling, and simulation are all achievable. If applied conscientiously, these approaches will be sufficient to meet the challenge.

Bibliography

Chien, S., Sherwood, R., Tran, D. et al. (2004) The EO-1 autonomous science agent. Proceedings of the 2004 International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS-2004), July.

Fiorucci, T., Lakin, D., and Reynolds, T. (2000) Advanced engine health management applications of the SSME real-time vibration monitoring system. Proceedings of the 36th AIAA/ASME/SAE/ASEE Joint Propulsion Conference, July.

Iverson, D. (2004) Inductive system health monitoring. Proceedings of the 2004 International Multiconference in Computer Science and Computer Engineering, June.

James, M., Mackey, R., Park, H., and Zak, M. (2001) BEAM: technology for autonomous self-analysis. Proceedings of the 2001 IEEE Aerospace Conference, March.

Jue, F. and Kuck, F. (2002) Space Shuttle Main Engine (SSME) options for the future Shuttle. Proceedings of the 38th AIAA/ASME/SAE/ASEE Joint Propulsion Conference, July.

Mackey, R., Some, R., and Aljabri, A. (2003) Readiness levels for spacecraft information technologies. Proceedings of the 2003 IEEE Aerospace Conference, March.

Mackey, R., Iverson, D., Pisanich, G. et al. (2006) Integrated System Health Management (ISHM) Technology Demonstration Project Final Report. NASA Technical Memorandum NASA/TM-2006-213482, February.

Mankins, J. (1995) Technology Readiness Levels. NASA White Paper, April.

Nayak, P., Kurien, J., Dorais, G. et al. (1999) Validating the DS1 remote agent experiment. Proceedings of the 5th International Symposium on Artificial Intelligence, Robotics and Automation in Space (iSAIRAS-99), June.

Ramohalli, G. (1992) The Honeywell On-board Diagnostic and Maintenance System for the Boeing 777. Proceedings of the 11th IEEE/AIAA Digital Avionics Systems Conference, October.

US Department of Defense (2005) Technology Readiness Assessment (TRA) Deskbook, May.