Proceedings of the 2005 IEEE International Engineering Management Conference, St. John's, Newfoundland & Labrador, Canada, Sept. 11-13, 2005.


Information-based Risk Assessment Software Architecture

Rattikorn Hewett
Department of Computer Science
Texas Tech University
302 Pine Street
Abilene, TX 79601

Abstract − Effective system management requires a thorough understanding of risks. As more systems depend on software to provide their functionalities, the need to assess the contribution of software to the risks of such systems becomes inevitable. This paper presents a framework for developing a risk assessment tool that aims to assist users in identifying hazards and risks associated with software and their impacts on its entire system and environment. Our approach employs several modules based on relevant information about the system conditions and environments, a holistic view of a specific application system, hardware fault models, and a library of software component risk profiles. We describe the framework architecture that integrates these information-based modules with inference and task modules to support automated reasoning for risk assessment and analysis. The paper discusses our ongoing preliminary research including the detailed architecture of the proposed framework, its components and utilization, and future directions of this work.

Keywords: risk assessment and risk analysis tools, software safety, information and knowledge-based systems.

I. INTRODUCTION

Effective system management requires a thorough understanding of risks [4]. Performing hazard and risk analyses on a complex system is a laborious and time-consuming process that requires extensive knowledge about the system being designed. Due to the scope and complexity of a system, even an experienced and knowledgeable engineer may overlook some issues, as no one is likely to understand every aspect of its behavior. There is a growing need for tools that help both novice and experienced system managers and engineers expedite risk assessment of large engineering systems.

Today more and more systems (e.g., infrastructures, instrument or mission controls) depend on software to provide system functionality. As the role of software increases, the ability to assess the contribution of software to the risks associated with an entire system becomes increasingly necessary for both the development and the management of the system. It is well known that developing large, defect-free software is infeasible [12]. Verification and validation of software are not adequate to guarantee the safety of the system. Software may behave correctly according to its specifications, yet it may endanger lives or result in unacceptable damage. Understanding the safety implications of software under development can help system managers make informed decisions within uncertain system environments. It can also help software developers avoid design flaws and undesirable consequences of software. Many software development tools focus on developing a complete system efficiently; in general, however, they do not directly support risk and hazard analysis [2, 9, 12].

Techniques that specifically address risks associated with software are lacking [3, 7, 9, 12]. Although many risk assessment techniques have been successfully applied to various safety-related systems, they often omit detailed software properties or exclude application perspectives of the entire system. Furthermore, most existing tools require manual hazard analysis. Failures of software are usually due to design errors rather than ageing; thus, we cannot perform software risk assessment in exactly the same way as hardware (or physical system) risk assessment. While techniques for risk assessment of physical or hardware systems are well developed and widely used, techniques for assessing risks associated with software in computer-based systems are in an early stage of development [4, 6, 8, 9, 11, 12]. Software risk assessment must take the application context of the system into account. Currently, there are no risk assessment models capable of characterizing software "failures" in an application-independent way [3].

This paper presents a framework for developing a risk assessment tool that assists users in identifying hazards and risks associated with software and their impacts on its entire system and environment. Section II describes characteristics of software risks together with related work and background. Section III gives an overall architecture of the proposed risk assessment tool, and Section IV illustrates a scenario of how the proposed tool can be used. Section V concludes the paper and discusses our ongoing and future research directions.

II. RISKS ASSOCIATED WITH SOFTWARE

Risk assessment provides information about the safety aspects of the system under investigation. While risk represents a measure of the probability and severity of adverse effects, safety signifies a value judgment on the acceptability of risks (to ensure that the system will not harm humans, property or the environment) [4]. Safety can only be assured by considering risks associated with every part and variability of the system including its interfaces, connections, environments, external conditions, software modules, and hardware/physical components. (Unless specified, we use the terms component, device, module and subsystem interchangeably for an entity unit.)

0-7803-9139-X/05/$20.00 ©2005 IEEE.

Risks associated with software can involve direct or indirect consequences of software faults, errors and failures. According to Storey [12], faults are defects in a system (e.g., a program with an incorrect initialization or incorrect calculation). Faults may or may not lead to errors (deviations from the intended operation of the system), which may or may not lead to failures (deviations from the required performance of the system). For example, a software bug (fault) in a routine that does not support system functional requirements would not cause an error in operation controls. On the other hand, a software bug may produce an error that could cause a system malfunction. However, if the affected area of the software is never executed, the bug will have no effect on the system and a possible error or failure of the system may never occur. Similarly, faulty operation of hardware or a physical device will only cause an error, which could then affect the functioning of the system, if the device is used. Additionally, a well-designed system may be able to continue to perform its required functions even when a few errors are present. For example, one could connect system components in such a way that an error in one does not shut down the entire system.
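The fault/error/failure chain above can be illustrated with a small sketch (hypothetical code, not from the paper): a fault in a routine that is never executed produces no error or failure.

```python
# Hypothetical sketch of the fault/error/failure chain described above.
# The fault (an incorrect calibration offset) exists in the code, but it
# only becomes an error -- and possibly a failure -- if the faulty path runs.

def scale_reading(raw, use_legacy=False):
    """Convert a raw sensor reading to engineering units."""
    if use_legacy:
        offset = 10  # FAULT: the correct calibration offset is 0
    else:
        offset = 0
    return raw + offset  # an ERROR occurs only if the wrong offset is used

# The faulty branch is never executed, so no error or failure is observed:
assert scale_reading(5.0) == 5.0
# Executing the faulty branch turns the latent fault into an error:
assert scale_reading(5.0, use_legacy=True) != 5.0
```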

Although software and hardware/physical faults and failures appear to have similar effects on a system, they are very different in nature and behavior. Fig. 1 gives a comparison between software and hardware characteristics that are relevant to risk assessment.

Failure modes of hardware are relatively well defined and can often be assessed without application contexts. Software failures, on the contrary, are hard to define and predict. This is mainly because software behaviors are dependent on computational capacities and characteristics of environments dictated by software design, and are subject to emergent behaviors, which are hard to detect. Hardware component failures have a random nature. The failures are typically due to physical limitations (e.g., wear and tear), whereas software failures are due to design and specification flaws and thus their behaviors are not random but deterministic [3, 12].

In the development of safety-critical systems, component isolation is a desirable goal to promote safety by facilitating effective component testing and the separation of concerns and failures. While this may be achievable in physical devices, obtaining meaningful isolation between software modules is not always possible. Some programming environments and languages (e.g., conventional assembly code, C) do not facilitate the construction of well-isolated software modules.

Characteristics       | Hardware/Physical Components                       | Software Components
Failure modes         | Well defined, understood                           | Not well defined
Limitations           | Physical constraints, e.g., wear and tear, ageing  | Computational constraints, e.g., time, space
Failure behaviors     | Random                                             | Deterministic/systematic
Failure causes        | Physical limitations                               | Design flaws
Component isolation   | Possible                                           | Not always possible

Fig. 1. Some risk-related characteristic comparisons.

Even with powerful languages, such as Ada, that provide such support, complete isolation between software modules can never be guaranteed, because it would require a perfect compiler and the elimination of the dependence of software execution on hardware operations, neither of which is possible [12].

Risk and safety are related to reliability. System reliability refers to the probability of the system functioning correctly over a period of time in given operating conditions [9, 12]. Reliability is necessary but not sufficient for system safety. The random nature of hardware/physical component failures makes them subject to statistical prediction. There exist reliability models, based on analysis of past observations on similar hardware or physical devices, to predict their failure rates. Knowledge of failure modes and failure rates is often well understood and useful for performing risk assessment involving a particular physical component and its effect on the rest of the system.
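For hardware components with random failures, a standard reliability model (not given in the paper; shown here as an illustrative assumption) is the exponential model R(t) = exp(-λt), where the failure rate λ is estimated from past observations on similar devices:

```python
import math

# Illustrative exponential reliability model for a hardware component.
# The failure rate (lambda) would be estimated from historical failure
# data on similar devices, as the text describes.

def reliability(failure_rate, t):
    """Probability the component operates without failure through time t."""
    return math.exp(-failure_rate * t)

def mttf(failure_rate):
    """Mean time to failure for a constant failure rate."""
    return 1.0 / failure_rate

lam = 1e-4  # failures per hour (assumed, illustrative value)
print(round(reliability(lam, 1000), 4))  # survival probability over 1000 h: 0.9048
print(mttf(lam))                         # mean time to failure: 10000.0 hours
```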

As for software, while some researchers think that software faults are systematic, not random in nature, and thus not susceptible to statistical analysis, others justify their use of statistical analysis to assess software reliability by arguing the randomness of the locations of unknown software faults in code. However, reliability and risk assessment of software are very different. Software reliability is concerned with whether code execution will deviate from its specification, whereas software risk is concerned with whether the actions resulting from code execution will lead to hazardous events [3]. Software reliability does not discriminate between degrees of severity of the consequences of different types of faults, whereas software risk assessment needs to differentiate the consequences of failures associated with the system.

III. ARCHITECTURE OVERVIEW

We present a framework for developing a risk assessment tool that assists users in identifying hazards and risks associated with software and their impacts on its entire system and environment. Our approach aims to increase the efficiency and scalability of risk assessment by (1) employing and/or extending existing risk assessment techniques (e.g., failure modes and effects analysis, hazard and operability studies, event tree and fault tree analysis), (2) utilizing relevant information about the system in guiding and controlling the assessment process (e.g., heuristics based on the likelihood of known faults), and (3) abstracting domain-independent information about system components (including software parts) that can be used in multiple application contexts and multiple systems. Unlike most existing risk assessment tools, we aim to specifically assess risks associated with software and its impact on the system in which the software is used. Our approach intends to automate, as much as possible, the process of hazard analysis and risk assessment. In order to attain efficiency and help system designers most effectively, we aim to provide an interactive process for tradeoff analysis. A completely automated tool, though useful, would be limited in use and the benefits it can offer.


Fig. 2. An overall architecture of the risk assessment framework.

Fig. 2 shows an overview of the architecture of the proposed framework. For clarity, we will refer to the tool employing the proposed architecture as RELIANCE, and we will use the term "system" to refer to the application system to be examined for risk assessment. RELIANCE's architecture consists of three basic components: Inference Modules, System Modules, and Risk Assessment Task Modules. The three parts are grouped in the left, right and top (enclosed by an oval) parts of Fig. 2, respectively. The architecture is data/event driven. The data to/from external environments (shown in oval shapes at the bottom left of Fig. 2), including instrumental devices, sensors, actuators and human operators or users, interact with RELIANCE via a user/system interface, while some data can be accessed directly by a reasoning module during problem solving. We now describe each component in more detail.

A. Inference Modules

The inference modules include an inference engine that provides basic reasoning mechanisms (e.g., forward or backward inference) and a working module that maintains the current state of problem-solving or reasoning tasks. In each reasoning cycle, RELIANCE would use recent events (e.g., data or changes of the system or the problem-solving state) to trigger applicable reasoning operations from various risk assessment tasks and select the most appropriate one for execution, which in turn would produce new events initiating a new reasoning cycle. In selecting a rule to be executed, our proposed architecture may employ a simple rule-based control mechanism using specified priorities, or meta-rules [1, 10] to represent control knowledge, or it may employ a blackboard-based control mechanism that rates each applicable rule according to how well it fits an explicit control plan [5].
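The data-driven reasoning cycle described above might be sketched as a minimal forward-chaining rule engine with priority-based conflict resolution (a hypothetical illustration; RELIANCE's actual control mechanisms are not specified at this level of detail):

```python
# Minimal forward-chaining inference cycle with priority-based rule
# selection, sketching the data/event-driven control loop described above.

def run_cycle(facts, rules):
    """Fire applicable rules (highest priority first) until quiescence."""
    facts = set(facts)
    fired = []
    while True:
        # Collect rules whose conditions hold and whose conclusion is new.
        applicable = [r for r in rules
                      if r["if"] <= facts and r["then"] not in facts]
        if not applicable:
            return facts, fired
        # Conflict resolution: select the highest-priority applicable rule.
        rule = max(applicable, key=lambda r: r["priority"])
        facts.add(rule["then"])  # the new event triggers a new cycle
        fired.append(rule["name"])

# Illustrative rules; names and facts are invented for the sketch.
rules = [
    {"name": "R1", "if": {"sensor_fault"}, "then": "bad_reading", "priority": 2},
    {"name": "R2", "if": {"bad_reading"}, "then": "hazard_flagged", "priority": 1},
]
facts, fired = run_cycle({"sensor_fault"}, rules)
print(fired)  # ['R1', 'R2']
```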

B. System Modules

Risk assessment must take the application context into account. This requires a holistic view of connected relationships relevant to a system as a whole. The information content and the representation of the system are at the heart of the proposed framework. To provide an integrated approach to risk assessment, RELIANCE includes several modules based on relevant information about the system being examined. These information-based modules can be classified into two categories: generic and domain-specific.

The generic modules represent context-independent information about the system components, including relevant findings commonly known from past experience. As shown at the bottom right of Fig. 2, the generic modules include two separate parts: a library of software component risk profiles (e.g., risks of arithmetic calculation routines include values out of range, incorrect formula, and floating point implementation causing overflow/underflow) and the fault models of commonly used hardware or physical components (e.g., sensors, memory devices, processors, transistors). These generic modules can contain different abstraction levels of information about the system components. They can also be applied to multiple application systems or multiple contexts within a system, in the same manner as we reuse software and hardware/physical components in various systems and contexts.
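A library entry for a software component risk profile might look like the following sketch (the schema and field names are our illustrative assumptions, not from the paper; the arithmetic-routine risks mirror those listed above):

```python
# Illustrative schema for the library of software component risk profiles.

risk_profile_library = {
    "arithmetic_calculation_routine": {
        "known_risks": [
            "value out of range",
            "incorrect formula",
            "floating point overflow/underflow",
        ],
        # Generic profiles are reusable across applications and contexts.
        "applicable_contexts": ["control loops", "unit conversion"],
    },
}

def risks_for(component_type):
    """Look up generic, context-independent risks for a component type."""
    profile = risk_profile_library.get(component_type)
    return profile["known_risks"] if profile else []

print(risks_for("arithmetic_calculation_routine")[0])  # value out of range
```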

The domain-specific modules represent all aspects of the system being examined. There are four model types: physical system, hardware system, software system and system environment models, as shown at the top right of Fig. 2. The models explicitly represent the different parts of the system that contribute to the system's functioning, performance and risks. For example, the system environment model may include external conditions (e.g., power outage) and human actions that could affect the system. To assess risks properly, it is crucial to represent interactions among components within each model and also interfaces among components across these models. This is particularly important for some hardware and software components that are tightly connected, because software execution depends on hardware actions; therefore, even correct software could still be disrupted by faults and limitations (e.g., memory space, available processing time) introduced by hardware.

C. Risk Assessment Task Modules

As shown in Fig. 2, the proposed architecture supports several risk assessment task modules including hazard identification, cause analysis, effect analysis, risk estimation and cost/benefit tradeoff analysis. Each task module uses domain-independent reasoning skills along with relevant domain knowledge about a specific application system to perform the task. For example, the cause analysis module searches for all possible causes of a given system hazard (a situation that can

cause harm) by instantiating the hazard in a working module, searching for the hazardous attribute or component in a relevant specific system model, and traversing the system connection (e.g., physical connection, and logical, data or control flow) upstream to the next component that could produce the hazard. Next, it instantiates the found component in the working module together with all possible fault hypotheses obtained either from a corresponding fault model or from the library of software component risk profiles. The process repeats for each component interacting or connecting with the given hazardous attribute or component until all possibilities have been exhausted or a terminal condition is satisfied. The working module then contains a (partial) solution of the cause analysis. Note that the reasoning steps required in the cause analysis module are general in that they are applicable to any application system under investigation. However, its use is context dependent as it incorporates information from the specific system model into the assessment. Similarly, other task modules are also domain-independent. Thus, the risk assessment task modules are domain-independent but task-specific.

[Fig. 2 diagram: the Risk Assessment Task Modules (Hazard Identification; Cause Analysis; Effect Analysis; Risk Estimation, probabilistic and qualitative; Cost/Benefit Tradeoff Analysis) sit above the Inference Modules (Reasoning Mechanisms; System & Environment Conditions; Problem-Solving States) and the System Modules, which comprise Domain-specific Modules (Software, Physical and Hardware System Models; System Environment Models) and Generic Modules (Library of Software Component Risk Profiles; Hardware Component Fault Models). External entities (Sensors/Actuators, External Devices, Human Operators, Users) interact via the User/System Interface.]
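The cause-analysis traversal described above can be sketched as a backward search over the system connection graph, attaching fault hypotheses from the generic models to each upstream component (a hypothetical illustration of the reasoning steps, not the paper's implementation; the component names are invented):

```python
# Sketch of the cause analysis module: traverse system connections
# upstream from a hazard and collect fault hypotheses for each component.

# Illustrative system model: component -> components immediately upstream.
upstream = {
    "actuator": ["controller"],
    "controller": ["sensor", "power_supply"],
    "sensor": [],
    "power_supply": [],
}

# Illustrative generic fault models / software risk profiles.
fault_hypotheses = {
    "controller": ["software design flaw", "timing overrun"],
    "sensor": ["stuck-at reading", "calibration drift"],
    "power_supply": ["voltage drop"],
}

def cause_analysis(hazard_component):
    """Return {component: fault hypotheses} for each upstream component."""
    causes, frontier, seen = {}, [hazard_component], set()
    while frontier:  # repeat until all possibilities are exhausted
        comp = frontier.pop()
        if comp in seen:
            continue
        seen.add(comp)
        causes[comp] = fault_hypotheses.get(comp, [])
        frontier.extend(upstream.get(comp, []))  # move one step upstream
    return causes

result = cause_analysis("actuator")
print(sorted(result))  # ['actuator', 'controller', 'power_supply', 'sensor']
```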

A variety of risk assessment techniques exist, including failure modes and effects analysis (FMEA), hazard and operability (HAZOP) studies, fault tree analysis (FTA) and event tree analysis (ETA) [12]. Recent work has applied these techniques to safety-critical and computer-based systems [7, 8, 11]. However, they do not specifically address scalability and efficiency issues. In the context of our reasoning mechanisms, we can implement these techniques using operations based on backward reasoning (to identify causes, e.g., FTA), or forward reasoning (to identify effects, e.g., ETA), or a combination of the two (e.g., FMEA, HAZOP). Thus, existing risk assessment techniques can be easily incorporated into the proposed framework.
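As a concrete instance of the backward-reasoning style mentioned above, a simple fault tree (FTA) evaluation can be sketched as follows, computing a top-event probability from AND/OR gates over basic events (an illustrative example under the usual independence assumptions; the event names and probabilities are invented):

```python
# Minimal fault tree evaluation: AND gates multiply probabilities,
# OR gates combine as 1 - prod(1 - p), assuming independent basic events.

def ft_prob(node, basic):
    """Probability of a fault tree node; node is a name or (gate, children)."""
    if isinstance(node, str):  # basic event
        return basic[node]
    gate, children = node
    probs = [ft_prob(c, basic) for c in children]
    if gate == "AND":
        p = 1.0
        for q in probs:
            p *= q
        return p
    if gate == "OR":
        p = 1.0
        for q in probs:
            p *= (1.0 - q)
        return 1.0 - p
    raise ValueError(gate)

# Top event: incorrect actuation if (software flaw AND watchdog fails)
# OR the sensor fails.  Probabilities are illustrative only.
tree = ("OR", [("AND", ["sw_flaw", "watchdog_fail"]), "sensor_fail"])
basic = {"sw_flaw": 0.01, "watchdog_fail": 0.1, "sensor_fail": 0.005}
print(round(ft_prob(tree, basic), 6))  # 0.005995
```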

IV. ILLUSTRATIONS

In this section we give two examples to show how we might use a risk assessment tool developed from the proposed framework. The first example gives a simple illustration of a risk assessment task on a physical device, and the second demonstrates how the tool can be utilized to assist decision-making in the development of large evolving software systems. Developing the system modules is beyond the scope of this paper; for the purpose of these illustrations, we assume that the information models in the system modules exist.

A. Hazard Analysis

The techniques to assess system safety include hazard analysis and risk analysis. The former identifies situations in which the system could cause harm to humans or the environment, whereas the latter assesses their importance in order to take appropriate actions (e.g., removing the problems or mitigating their effects). Hazard analysis is an important part of risk assessment. The HAZOP study is a common method of analyzing hazards. The study starts by identifying the relevant interconnections between system components. These interconnections may be in the form of a logical flow of system functioning, or a flow of physical (e.g., electricity, signal data) or abstract means (e.g., control flow). Each has properties measured by the values of corresponding variables (or attributes). The process systematically assesses possible deviations of the attribute values by using "guide words" (e.g., more, less) to define different types of deviation. Each attribute of each interaction within the system is investigated by determining the effect of each relevant guide word. Partial example outcomes of the HAZOP study of a sensor (modified from [12]) are shown in Fig. 3. The sensor input and output interact with the rest of the system, with voltage and current as the attributes of interest. The cause column identifies possible hazards while the effect column helps determine whether they are actually hazardous situations. When the HAZOP study is applied to software, we can consider additional attributes (e.g., data value, data rate, response time) and guide words (e.g., incorrect, out of range) [7, 8, 11, 12].

Interconnection | Attribute      | Guide word | Cause                    | Effect                            | Recommendation
Sensor input    | Input voltage  | No         | Regulator or cable fault | Missing signal, system shuts down |
                |                | More       | Regulator fault          | Sensor damage                     | Over-voltage protection
                |                | Less       | Regulator fault          | Incorrect reading                 | Add voltage monitoring
                | Sensor current | More       | Sensor fault             | .........                         | .........
                |                | Less       | Sensor fault             | ..........                        | .........
Sensor output   | Out voltage    | .........  | .........                | .........                         | .........

Fig. 3. Partial results of the HAZOP study on a sensor.

The proposed framework facilitates semi-automation of the HAZOP process by employing appropriate inference reasoning mechanisms (e.g., backward and forward reasoning) directly or by using risk assessment task modules (e.g., cause analysis). The main difference between the two is that the latter includes control of the reasoning process. For example, Fig. 4 shows the semi-automated HAZOP procedure, which can be implemented in the proposed framework. RELIANCE can use a system model to identify the hazards and their severity by reasoning forward to identify the consequences of a given condition of "system function" (including component failures and deviations from normal operations). It can then use the generic fault model and the library of risk profiles together with the system model to identify root causes by reasoning backward from a hazardous event to identify all possible causes. Because users may have particular interests and insights about certain parts of the system, a completely automated tool would be limited in use and the benefits it can offer.

Input: a system component S
Output: hazards related to S

For each relevant interaction with S
  For each attribute involved in the interaction
    For each guide word appropriate to the attribute do
    Begin
      Use the guide word to generate a deviation from the expected attribute value;
      Identify causes (i.e., FTA, which is supported by the cause analysis task module);
      Identify consequences (i.e., ETA, which is supported by the effect analysis task reasoning module);
      Report results;
      Recommend protection or appropriate actions, if applicable.
    End

Fig. 4. Hazard analysis process.
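The Fig. 4 procedure might be rendered as the following sketch, where guide words generate candidate deviations and stubbed task modules stand in for the cause and effect analyses (the function names and data structures are our illustrative assumptions):

```python
# Sketch of the semi-automated HAZOP loop of Fig. 4.  The cause/effect
# analysis calls are stubs standing in for the task modules.

GUIDE_WORDS = ["no", "more", "less"]  # extendable, e.g. with "incorrect"

def analyze_causes(deviation):        # stub for the cause analysis (FTA)
    return [f"possible cause of '{deviation}'"]

def analyze_effects(deviation):       # stub for the effect analysis (ETA)
    return [f"possible effect of '{deviation}'"]

def hazop(component, interactions):
    """interactions: {interaction_name: [attribute, ...]} for the component."""
    report = []
    for interaction, attributes in interactions.items():
        for attribute in attributes:
            for word in GUIDE_WORDS:
                deviation = f"{word} {attribute}"  # generated deviation
                report.append({
                    "component": component,
                    "interaction": interaction,
                    "deviation": deviation,
                    "causes": analyze_causes(deviation),
                    "effects": analyze_effects(deviation),
                })
    return report

rows = hazop("sensor", {"sensor input": ["input voltage"]})
print(len(rows))             # one row per guide word: 3
print(rows[0]["deviation"])  # no input voltage
```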

B. Evolutionary Software Development

Consider the following scenario. An aerospace engineer wants to develop an autonomous software system for navigation control of an exploration vehicle on the surface of Mars. Upon testing the implemented software, he finds that there are situations in which the mission could fail even if the controller functions correctly (e.g., speeding while turning sharply could make the vehicle tip over). He has to modify the vehicle and the software to accommodate these situations. Meanwhile, he is asked to add a new capability to the software: controlling an arm attached to the vehicle for sample collection. He then starts another cycle of software development to create another module for this additional capability. After testing that the arm control module works correctly, he integrates the two pieces of software and finds that there could be action conflicts (e.g., moving the vehicle for navigation vs. stopping for sample collection), competing resources (e.g., vision for navigation vs. for sample collection), and control tradeoffs (e.g., speed vs. precision) that could result in a mission failure. Fixing the problem would require re-coding, if not a redesign, of either or both pieces of software. This is clearly a time-consuming and costly process.

The ability to assess risks of software while it is being designed (as opposed to while it is in operation) can help reduce wasted resources and prevent unacceptable risks due to safety failures caused by the use of software. The earlier in the software development process we can identify risks, the lower the cost of consequent damages and the more likely it is that the problems can be fixed in a principled fashion rather than by patching. As a result, we will have a software system that is easier to maintain in the long run. A risk assessment tool can be used to help the system designer assess risks based on the design of software components intended to provide additional capabilities. Fig. 5 shows a software development process that employs a risk assessment tool such as RELIANCE to support incremental and evolutionary development of large and complex software systems.

V. CONCLUSIONS

We present a framework for developing a risk assessment tool that aims to help engineers and managers expedite their efforts in identifying hazards and risks associated with software and their impacts on an entire system. The architecture of the framework integrates inference mechanisms with generic task modules and information-based modules containing various system knowledge models to support automated risk assessment and analysis.

Fig. 5. Use of a risk assessment tool in software development.

The proposed framework has a desirable property in that it provides flexible use of knowledge models (e.g., the same system model can be used for cause analysis as well as effect analysis) and supports reuse of abstract generic information about system components (e.g., a risk profile of a simple programmable component can be reused in multiple applications). The latter is useful for incremental risk assessment and reconfiguration of system design. Results are preliminary and much work remains to be done. Our ongoing and future research includes development of the framework and information-based modules for a specific application system, identification of heuristics to control model search for efficient risk analysis, and development of semi-automated techniques for cost/benefit tradeoff analysis to facilitate decision-making processes.

REFERENCES

[1] J. Durkin, Expert Systems: Design and Development, Englewood Cliffs, NJ: Prentice-Hall, 1994.
[2] C. Garrett and G. Apostolakis, "Automated hazard analysis of digital control systems," Reliability Engineering and System Safety, vol. 77, pp. 1-17, 2002.
[3] C. Garrett and G. Apostolakis, "Context in the risk assessment of digital systems," Risk Analysis, vol. 19, no. 1, pp. 23-32, 1999.
[4] Y. Haimes, Risk Modeling, Assessment and Management, Hoboken, NJ: John Wiley & Sons, 2004.
[5] B. Hayes-Roth, "A blackboard architecture for control," Artificial Intelligence, vol. 26, pp. 251-321, 1985.
[6] R. Hewett and R. Seker, "A risk assessment model of embedded software systems," in Proceedings of the IEEE/NASA Workshop on Software Engineering, June 2005.
[7] T. Kletz, Computer Control and Human Error, Rugby: Institute of Chemical Engineers, 1995.
[8] J. Lawrence and J. Gallagher, "A proposal for performing software safety hazard analysis," Reliability Engineering and System Safety, vol. 55, pp. 267-281, 1997.
[9] N. Leveson, Safeware: System Safety and Computers, Reading, MA: Addison-Wesley, 1995.
[10] T. Rauma, "Diagnosis information in meta-rule adaptive fuzzy systems," in Proceedings of the Euromicro Conference, pp. 564-569, 1997.
[11] J. Reese and N. Leveson, "Software deviation analysis," in Proceedings of the International Conference on Software Engineering, Boston, MA, 1994.
[12] N. Storey, Safety-Critical Computer Systems, Harlow, England: Pearson Prentice Hall, 1996.

[Fig. 5 diagram: a design of a new software component with a new capability feeds Hazard Identification, followed by Cause Analysis and Effect Analysis, then Risk Estimation; together with the cost/benefits of the new capability, Tradeoff Analysis yields a decision whether to add the new component.]
