[IEEE 2005 IEEE International Joint Conference on Neural Networks, 2005. - Montreal, QC, Canada (July 31-Aug. 4, 2005)] Proceedings. 2005 IEEE International Joint Conference on Neural Networks

Rule Extraction as a Formal Method for the Verification and Validation of Neural Networks

Brian J. Taylor, Marjorie A. Darrah
Institute for Scientific Research, Inc.

Fairmont, WV 26555-2720
E-mail: [email protected], [email protected]

Abstract-The term formal method refers to the use of techniques from formal logic and discrete math in the specification, design, and construction of computer systems and software. These techniques enable the formalization of software for development and testing so that it may be verified and validated in a more thorough way. Although not specifically identified in the literature as a verification and validation (V&V) formal method technique, neural network rule extraction fits the basic definition by using techniques from formal logic to formalize neural network software so that it may be examined more completely. This paper identifies several areas where rule extraction can be an effective tool for the V&V of neural networks.

I. INTRODUCTION

One example of an adaptive neural network used in a safety- and mission-critical system comes from the Intelligent Flight Control (IFC) program. This program is a collaborative effort of the NASA Dryden Flight Research Center, the NASA Ames Research Center, Boeing Phantom Works, and the Institute for Scientific Research, Inc. (ISR). This program seeks to flight-demonstrate, on board an F-15 aircraft, a research flight control system that can adapt to accommodate changing aerodynamic conditions, including loss of aircraft surfaces or improper surface control. The first two generations of the IFC system have utilized neural networks. Since this experiment is flown with a human pilot, the system is both safety- and mission-critical.

Safety- and mission-critical uses of neural networks are not limited to flight control systems. Neural networks are used in vehicle health monitoring, power generation and transmission control systems, fault detection and identification in industrial processes, and medical diagnosis systems [1]. They have the potential for use in space exploration as part of a decision and control process for planetary rovers or for improving autonomous systems. Yet while neural networks are slowly being used in a few fields requiring high assurance, the limitations associated with their certification restrict their widespread acceptance.

Consider the IFC program's use of an adaptive neural network. Since the flight control system is a research, and not commercial, system, the certification process to approve the adaptive neural network is not as rigorous as it would be for Federal Aviation Administration (FAA) approval.

Existing NASA standards, documentation and testing procedures, and a physical isolation of the neural network software from critical aircraft systems have thus far been enough to qualify as adequate assurance. However, as these adaptive control systems show promise, it is hoped such systems can eventually be given greater roles and adopted in commercial airliners to improve their safety. The FAA review procedures for adaptive system use in a civilian aircraft will likely be very stringent and inhibit their use.

Developers of neural network systems have in general been cautious about expanding their use into safety- and mission-critical domains due to the complexities and uncertainties associated with these complex, adaptive software systems [2]. Since adaptive neural networks are beginning to be used within high-assurance systems, like the IFC program, the NASA Independent Verification and Validation (IV&V) facility has encouraged research in the area of neural network V&V to answer the question: How can we be sure that any system that includes neural network technology is going to behave in a known, consistent and correct manner?

Common formal method techniques like model checking and theorem proving appear unable to completely address the V&V requirements of neural networks. Model checking starts from an initial state and repeatedly applies the transition relation to search all reachable states for a property violation, while remembering explored states to avoid looping [3]. Model checking seems less applicable when the state space is infinite or extremely large, a possibility with an adaptive neural network. In [4], the idea of using model checking to verify properties of recurrent neural networks is discussed. The system presented in that paper, by the author's own conclusion, was clearly undecidable and therefore could not be automated.

Theorem proving is the use of logical induction over the execution steps of the program to prove system requirements. System requirements are translated into complex mathematical equations and solved by verification experts to prove the system is accurate [5]. Like model checking, theorem proving in the traditional sense does not seem to be applied to adaptive neural networks. Rather than an approach where the proof of requirements is done by logical induction over the structure of the program, the approaches for adaptive neural networks deal with proving convergence and stability. Lyapunov or stochastic methods can be used and take the place of theorem proving for neural networks.

A formal approach that seems more suited for neural network V&V is rule extraction. Rule extraction has been researched by the neural network scientific community for at least the past two decades and fits the basic definition for a formal method because it uses techniques from formal logic to formalize neural network software so that it may be examined more completely. In essence, it changes a black box system into a white box system by translating the internal knowledge of a neural network into a set of symbolic rules. These rules have the potential for several uses in the V&V and certification of neural networks, many of which are discussed in the following sections.

A. Rule Extraction Overview

By design, neural networks change while training on a data set. After training, some networks are fixed while others are allowed to adapt during operation. It is a challenge to understand how the network will handle additional input. Testing can give some level of confidence but may not provide a satisfactory level in safety- or mission-critical cases.

A solution, rule extraction, is the process of developing natural language-like syntax that describes the behavior of a neural network [6]. These techniques can convert the neural network structure to a propositional if-then format, offering the possibility of requirements traceability. Without this representation, it becomes difficult to identify design aspects of the neural network since the internal knowledge does not undergo traditional design.

The same techniques used to map rules from the neural network in rule extraction can also be used in two additional ways: rule initialization and rule insertion.

Rule initialization [7] is the process of giving a neural network some pre-system knowledge, possibly through early training or configuration. A system developer may have improved confidence if the starting condition of the neural network is known, which may lead to a constrained path of adaptation.

Rule insertion [8] is the method of moving symbolic rules back into a neural network, forcing the knowledge to incorporate rule modifications or additional rules. A system developer can use this scheme to exert a condition or reinforce conditions within an adaptive neural network. Examples of this include restricting the neural network to a region of the input space or instructing it to deliberately forget some data it has already seen.

B. Rule Formats

There are several main rule formats. Rule extraction algorithms will generate rules of either conjunctive form or subset selection form, commonly referred to as M-of-N rules, named for the primary rule extraction algorithm that makes use of the form. All rules follow the natural language syntactical if-then propositional form.

Conjunctive rules follow the format:

IF condition 1 AND condition 2 AND condition 3 THEN RESULT

Here the RESULT can be of a binary value (TRUE/FALSE or YES/NO), a classification value (RED/WHITE/BLUE), or a real number value (0.18).

The condition can be either discrete (flower is RED, ORANGE or YELLOW) or continuous (0.25 < diameter < 0.6). The rule extraction algorithm will search through the structure of the network, and/or the contents of a network's training data, and narrow down values across each input looking for the antecedents (conditions) that make up the rules.

Subset rules, or M-of-N rules, follow the format:

IF (M of the following N antecedents are TRUE) THEN RESULT

Craven and Shavlik explain that the M-of-N rule format provides more concise rule sets in contrast to the potentially lengthy conjunctive rule format [9]. This can be especially true when a network uses several input parameters.
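For illustration, the two rule formats above can be encoded and evaluated directly. The following Python sketch is not part of any published rule extraction algorithm; the record fields and antecedent names are hypothetical, chosen to echo the flower examples in the text.

```python
# Sketch of the two rule formats described above. The Python encoding
# and all field names (colour, diameter, scent) are illustrative only.

def eval_conjunctive(conditions, sample):
    """IF condition 1 AND condition 2 AND ... THEN RESULT: all antecedents must hold."""
    return all(cond(sample) for cond in conditions)

def eval_m_of_n(m, conditions, sample):
    """IF (M of the following N antecedents are TRUE) THEN RESULT."""
    return sum(1 for cond in conditions if cond(sample)) >= m

# Antecedents over a flower-measurement record, echoing the discrete
# colour test and continuous diameter interval given in the text.
is_warm_colour = lambda s: s["colour"] in {"RED", "ORANGE", "YELLOW"}
diameter_in_range = lambda s: 0.25 < s["diameter"] < 0.6
has_scent = lambda s: s["scent"]

sample = {"colour": "RED", "diameter": 0.4, "scent": False}

conjunctive = eval_conjunctive([is_warm_colour, diameter_in_range], sample)
m_of_n = eval_m_of_n(2, [is_warm_colour, diameter_in_range, has_scent], sample)
print(conjunctive, m_of_n)  # both conjunctive antecedents hold; 2 of 3 hold
```

The M-of-N form fires here even though one antecedent fails, which is precisely why it can compress a family of similar conjunctive rules into one statement.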

C. Rule Extraction Categories

Andrews [7] identifies three categories for rule extraction procedures: decompositional, pedagogical, and eclectic.

Decompositional rule extraction involves the extraction of rules from a network in a neuron-by-neuron series of steps [10]. This process can be tedious and result in large and complex descriptions. The drawbacks to decompositional extraction are time and computational limitations. The advantage of decompositional techniques is that they do seem to offer the prospect of generating a complete set of rules for the neural network.

Pedagogical rule extraction [11] is the extraction of a network description by treating the entire network as a black box. In this approach, inputs and outputs are matched to each other and the rule extraction algorithm is a machine-learner approach. The decompositional approaches can produce intermediary rules that are defined for internal connections of a network, possibly between the input layer and the first hidden layer. Pedagogical approaches do not result in these intermediary terms. Pedagogical approaches can be faster than the decompositional, but they are somewhat less likely to accurately capture all of the valid rules describing a network's contents.
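A minimal sketch of the pedagogical idea follows: the network is queried purely as a black box, and a rule is fitted to the observed input/output pairs. Published pedagogical algorithms are far more sophisticated; here the "network" is a stand-in function and the fitted rule is a single threshold, both invented for illustration.

```python
# Pedagogical-style sketch: query the network only through its
# input/output behavior, then fit a one-antecedent threshold rule.
# The black-box "network" below is a stand-in, not a real trained net.

def network(x):
    # Stand-in for a trained network's decision function.
    return 1 if x > 0.35 else 0

def extract_threshold_rule(samples):
    """Pick the threshold between observed samples that best separates
    the black box's 0/1 outputs, yielding IF x > t THEN 1."""
    labelled = sorted((x, network(x)) for x in samples)
    best_t, best_err = None, len(labelled) + 1
    for i in range(len(labelled) - 1):
        t = (labelled[i][0] + labelled[i + 1][0]) / 2
        err = sum((x > t) != bool(y) for x, y in labelled)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

samples = [i / 10 for i in range(11)]  # probe inputs 0.0, 0.1, ..., 1.0
t = extract_threshold_rule(samples)
print(f"IF x > {t:.2f} THEN 1")
```

Note that nothing internal to the network is inspected, which is exactly why pedagogical methods produce no intermediary rules and why their fidelity depends on how the input space is sampled.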

The eclectic approach is merely the use of those techniques that incorporate some of a decompositional approach with some of a pedagogical approach. The Rule-Extraction-As-Learning (REAL) method [9], for example, is designed such that it can use either technique.

II. RULES FOR V&V

For purposes of V&V, rule extraction is used to model the knowledge that the neural network has gained while training or adapting [2, 12]. The rules give insight into the workings of the neural network and may also be used to check against basic system requirements. The rules extracted are generally represented by a set of if-then statements that may be examined by a human. If the neural network is fixed after training, then the extracted rules model the way the neural network will handle other data that is processed, with a confidence level based upon the rule extraction technique used. If the neural network is a real-time adaptive neural network, then rule extraction can be done for one point in time to establish what the system looks like at that instant. Repeated application of rule extraction will yield an understanding of the progression of the network during adaptation.
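The repeated-extraction idea can be sketched as taking rule snapshots at intervals and diffing successive snapshots. Everything below is illustrative: the single adapting weight stands in for online learning, and the rule strings are invented.

```python
# Sketch of repeated rule extraction over an adapting network: extract
# a rule snapshot at each monitoring step and diff successive snapshots
# to follow the progression of the network's knowledge.

def extract_rules(weight):
    # Toy extraction: the "network" is a single threshold unit whose
    # decision boundary moves as the weight adapts.
    return {f"IF x > {0.5 / weight:.2f} THEN fire"}

snapshots = []
weight = 1.0
for step in range(3):
    snapshots.append(extract_rules(weight))
    weight += 0.25  # stand-in for online adaptation between snapshots

for before, after in zip(snapshots, snapshots[1:]):
    print("dropped:", before - after, "added:", after - before)
```

The diff of consecutive rule sets is what gives an analyst the "progression" view: unchanged rules indicate stable knowledge, while churn localizes where adaptation is occurring.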

Rule extraction, rule initialization, and rule insertion can all be used for V&V purposes throughout the development life cycle, including the activities of Concept, Requirements, Design, Implementation, and Testing.

Requirements for adaptive systems are difficult to write because the system changes during training and operation, and developers may be at a loss as to how to define the intended system knowledge. If a rule-like syntax is used to create knowledge requirements, then extracted rules from the neural network structure can be used for direct comparison against the requirements.
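When requirements and extracted rules share a syntax, the comparison reduces to set operations, as in the following sketch. The rule strings are hypothetical flight-control examples, not requirements from the IFC program.

```python
# Sketch: with knowledge requirements written in the same rule-like
# syntax as extracted rules, requirement coverage is set arithmetic.
# All rule strings here are invented for illustration.

requirements = {
    "IF airspeed < 140 THEN flaps = EXTENDED",
    "IF altitude < 500 THEN gear = DOWN",
}
extracted = {
    "IF airspeed < 140 THEN flaps = EXTENDED",
    "IF bank_angle > 60 THEN warn = TRUE",
}

satisfied = requirements & extracted    # requirements evidenced in the network
missing = requirements - extracted      # requirements with no matching rule
unexpected = extracted - requirements   # learned behavior needing review

print(len(satisfied), len(missing), len(unexpected))
```

In practice the match would need to tolerate syntactic variation (reordered antecedents, overlapping intervals), so exact string equality is only the simplest possible comparator.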

The extracted rules can also undergo design team review and analysis to detect improper network behaviors or missing knowledge. A system analyst might be able to ascertain novel learning behaviors that had not been previously recognized. By translating these features into comprehensible logical statements, the analyst can gain not only a better understanding of the network's internal knowledge, but perhaps of the input domain as well.

Uses of rule extraction algorithms in several specific areas of V&V are discussed below. Specific examples showing rule extraction in V&V can be found in [10].

A. Symbolic Rules for Knowledge Requirements

As mentioned before, creating adaptive system requirements can be difficult. The initial requirements for an adaptive system may be intentionally incomplete (hence the need to include an adaptive system). The requirements should address the two aspects of the system shown in Fig. 1, namely control and knowledge.

The control type requirements are the type that would typically be generated for any software system. The knowledge type requirements are more difficult to write and must fully address the adaptive nature of the system and how it can evolve over time.

Fig. 1. Requirements for Adaptive Systems

Early in the process of developing adaptive system requirements, the adaptive behavior must be clearly defined and documented. Project documents at early stages should contain high-level goals for the adaptive system behavior and knowledge. These high-level goals should be stated in early documents, such as the Project Plan, and then be traceable through Systems Requirements, Software and Interface Requirements, and Design documents. The high-level goals are a representation of the problem to be solved, possibly in the form of informal descriptions.

From the high-level goals, the system level requirements can be developed. One approach to developing requirements related to knowledge in neural networks, and other adaptive systems, is to model, or describe, the knowledge to be acquired in the form of rules [13]. Kurd's model is summarized in the next section. Experts, or those involved in development, can translate knowledge requirements into symbolic rules labeled as initial knowledge.

In Fig. 2, the top row shows the typical requirements developed for the system. The second row of boxes shows how this initial knowledge can lead to refined knowledge as the system is trained, and then either to intermediate knowledge given to the adaptive system before deployment or to final knowledge if the system is fixed. These two rows represent the difference between the control and knowledge type requirements.

Fig. 2. Requirements for Adaptive Systems (Control and Knowledge)

Fig. 1 contents: Control (how the adaptive system acquires knowledge and acts on the knowledge) and Knowledge (what knowledge the adaptive system is to acquire).


B. Rule Insertion for Hazard Mitigation

Kurd, Kelly, and Austin [13] describe a model that uses rule initialization, extraction, and insertion, ties hazard analysis into the development of the neural network's knowledge, and specifically addresses neural networks developed for safety-critical applications.

The following information is summarized from [13]. The authors describe a process for developing neural networks considering the safety criteria that must be enforced to justify the safety of the neural network in operation.

Fig. 3 illustrates the development and safety life cycle. The three main levels and the major stages in the diagram are described below.

Fig. 3. The Safety Life Cycle for Artificial Neural Networks [13]

Symbolic Level: This level is associated with the symbolic information. At this level the gathering and processing of initial knowledge occurs, as well as the evaluation of extracted knowledge gathered post-learning.

Translation Level: This level is where the symbolic knowledge and the neural architectures are combined or separated through the use of rule insertion and rule extraction techniques.

Neural Learning Level: This level uses neural learning to refine the symbolic knowledge through the training of the neural network using learning algorithms.

The major stages that outline the process follow the "W" model in the development life cycle. These stages are:

* Determination of requirements: These requirements describe the problem to be solved in informal terms and are intentionally incomplete.

* Sub-initial knowledge: Knowledge given by domain experts during a Preliminary Hazard Identification (PHI) is translated into logical rules.

* Initial knowledge: The rules that describe the sub-initial knowledge are converted into symbolic forms during Functional Hazard Analysis (FHA). These forms are compatible for translation into the neural network structure.

* Dynamic learning: Suitable learning algorithms and training sets are used to refine the initial symbolic knowledge and add new rules to reduce the error in the output. This may result in topological changes to the neural network.

* Refined symbolic knowledge: Knowledge refined by the learning is extracted using appropriate algorithms. This results in a new set of rules that can be analyzed.

* Static learning: Refined symbolic knowledge may be modified by domain experts and re-inserted into the neural network. This further refines the knowledge in the neural network but does not allow topological or architectural changes.

* Knowledge extraction: This can be performed at any time during static learning to get a modified rule set.

C. Rule Extraction for Traceability

Neural network rule insertion/extraction can facilitate requirements traceability of top-level system requirements, in the form of knowledge requirements, to software requirements in the form of more specific rule bases.

For the neural network knowledge, the traceability from source code to design specifications is more difficult. As the neural network learns and adapts, the changes that occur in the internal parameters and structure act like source code modifications. However, implementation traceability can be achieved by running rule extraction algorithms on a trained neural network. The extracted rules can be compared directly against the design rules, or the extracted rules can be compared to higher-level software and system knowledge rules.

The refinement of requirements into software rules and design rules is presented here and shown in Fig. 4.

Fig. 4. Use of Rule Extraction to Develop System Knowledge

One approach to developing the functional requirement related to the neural network knowledge may be to model, or describe the knowledge to be acquired, in the form of rules. This would allow the neural network requirements to have a sufficient level of detail so they can be traced throughout the life cycle documentation. The following three-step process outlines how this can be achieved.

Step 1. Translate System Specifications dealing with the function, basic knowledge, and constraints of the neural network into initial symbolic information in the form of a rule base for the system. These rules can also include information from the PHI and the FHA to address what the neural network is not allowed to do or learn. These initial rules should be translated into a format that can be inserted into the neural network (architecture specific).

Step 2. Refine initial rules and translate them into software requirements. The process of refining the rules may include prototyping the neural network, inserting the initial symbolic knowledge, training the neural network to allow the rules to adapt to the training sets, and extracting the rules.

Step 3. Translate refined rules into a format (neural network architecture specific) that can be inserted into the neural network.

For projects that lack well-defined knowledge requirements, another method for performing traceability uses machine learners on training data in conjunction with rule extraction techniques on trained neural networks. After the training sets have been developed to reflect requirements, the neural network is trained and a set of rules is extracted. Machine-learning techniques, like a decision tree, are then used on the same training set to produce a second set of rules. The neural network extracted rules can be compared against the machine learning results, enabling a kind of traceability between design intent (the machine learning rules) and implementation (the neural network rules).
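This cross-check can be sketched in a few lines. Below, a one-split decision stump stands in for the decision tree, the training data is a toy labelled set, and the "extracted" network rule is simply asserted; all names and values are illustrative.

```python
# Sketch of the traceability check above: derive a rule from the
# training data with a simple machine learner (a decision stump as a
# stand-in for a decision tree), then compare it with the rule set
# extracted from the trained network. The "network rule" is asserted
# here for illustration rather than extracted from a real network.

training = [(0.1, 0), (0.2, 0), (0.3, 0), (0.6, 1), (0.8, 1)]

def stump_rule(data):
    """Fit the single threshold on the labelled data with fewest errors."""
    xs = sorted(x for x, _ in data)
    best = min(
        (sum((x > t) != bool(y) for x, y in data), t)
        for t in ((a + b) / 2 for a, b in zip(xs, xs[1:]))
    )
    return f"IF x > {best[1]:.2f} THEN 1"

network_rules = {"IF x > 0.45 THEN 1"}  # pretend output of rule extraction
learner_rules = {stump_rule(training)}

print("consistent" if network_rules == learner_rules else "divergent")
```

Agreement between the two rule sets ties the network's learned behavior back to the intent embodied in the training data; divergence flags either poor training or an inadequate training set.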

D. Rule Extraction for Testing

Rule extraction can be used in several ways to improve the testing V&V activities for trained neural networks, whether they are fixed prior to system usage or just given a basic set of knowledge before deployment and online adaptation. Proof of correct operation (and thus correct learning) can be evidenced through knowledge evaluation criteria achieved via extracted rules. These rules can be assessed with their own pass/fail criteria like the number of rules, complexity of the rules (e.g., number of antecedents), or rule size (e.g., graphical dimensions, too large or too small).

Extracted symbolic rules can direct test data generation and highlight specific areas within an input domain for testing. An example is the use of symbolic rules to act as a form of data generation. If the rules are of a mathematical equation form, randomly generating inputs for the antecedent part of the rule allows computation of a consequent. Consequents can be collected, combined with the randomly generated antecedents, and formed into test data sets.
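The antecedent-sampling idea can be sketched as follows. The rule itself, relating a diameter to a computed area, is invented for illustration; only the mechanism (sample the antecedent interval, compute the consequent, collect pairs) follows the text.

```python
# Sketch of rule-driven test data generation: randomly sample the
# antecedent range of a mathematical-form rule and compute its
# consequent, collecting (input, expected output) pairs as test data.
# The example rule is hypothetical:
#   IF 0.25 < diameter < 0.6 THEN area = pi/4 * diameter**2

import math
import random

random.seed(7)  # deterministic sampling for repeatable test sets

def generate_tests(n):
    tests = []
    for _ in range(n):
        d = random.uniform(0.25, 0.6)            # sample the antecedent interval
        tests.append((d, math.pi / 4 * d * d))   # compute the consequent
    return tests

data = generate_tests(5)
print(len(data), all(0.25 < d < 0.6 for d, _ in data))
```

Each generated pair exercises the network inside the region the rule claims to describe, so disagreement between network output and computed consequent localizes a fidelity problem to that rule.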

System testing can be difficult due to a lack of well-specified knowledge requirements. If there are only a few system requirements related to knowledge, the system test design should pay special attention to boundary data. In this circumstance, extracted symbolic rules describing the neural network can guide the generation of data near the system knowledge boundaries.

If the neural network uses real-time V&V in the form of an operational monitor, the symbolic rules of the neural network can aid in testing the monitor. The symbolic rules expose the behavior of the neural network, which in turn facilitates selection of input domain regimes to exercise the monitor.

Assertion testing is another area within testing that is possible once rules are extracted. An assertion is a statement describing a specific condition against the system. Assertions can be stated that represent system or software requirements and possible hazards.

Once an assertion is created, it can be checked against the extracted symbolic rules, allowing domain experts to analyze the rules and identify discrepancies from the intended design. Testers can use assertions to determine if certain scenarios, like specific input combinations or possible outputs, can occur given the neural network knowledge.
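One way to mechanize this check is sketched below: each extracted rule is a (condition, output) pair, and a hazard assertion names an output that must never occur under a given scenario. The rules, field names, and hazard are hypothetical landing-gear examples, not drawn from any fielded system.

```python
# Sketch of assertion testing over extracted rules. Each rule is a
# (condition, output) pair; an assertion asks whether any rule can
# fire on a scenario and yield a forbidden output. All rules and
# field names here are invented for illustration.

rules = [
    (lambda s: s["altitude"] < 500, {"gear": "DOWN"}),
    (lambda s: s["airspeed"] > 250, {"gear": "UP"}),
]

def assertion_violated(scenario, forbidden):
    """True if any extracted rule fires on the scenario and produces
    the forbidden output."""
    return any(cond(scenario) and out == forbidden for cond, out in rules)

# Hazard assertion: the gear must never be UP below 500 ft.
low_fast = {"altitude": 400, "airspeed": 300}
print(assertion_violated(low_fast, {"gear": "UP"}))  # the airspeed rule conflicts
```

A True result tells the tester that the network's learned knowledge admits a scenario, low and fast here, in which two rules prescribe conflicting gear states, exactly the kind of discrepancy assertion testing is meant to surface.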

E. Rule Insertion/Refinement for System Stability and Fault Tolerance

Online adaptive neural networks continue to learn and change during system operation. In some cases, a neural network may be pre-trained prior to deployment and additional learning directs the network away from this original knowledge foundation. If the neural network is to be initialized with some a priori knowledge that represents the design specification, then this knowledge can be written in rule format, and techniques used for rule insertion can be repeated during operation to steer the neural network back to this basic set of knowledge.

This method provides a sense of system stability because the network can be allowed to adapt to whatever the operational environment encounters, but the network can also be reminded of important core knowledge.

Fault recovery could also be accomplished via symbolic rules, through rule refinements, rule re-insertion, or some combination. If a system is flagged as erroneous, basic knowledge can be re-inserted back into the network to try to recover from the problem. Another option is the complete reloading of a neural network with knowledge that is deemed acceptable from a previous system state or from the bare knowledge given at system delivery and installation.

F. Rule Extraction for Operational V&V

A prevalent technique to evaluate correct system behavior is the use of operational monitors, sometimes referred to as run-time monitors. These separate software modules exist as oracles, continuously judging the neural network performance based upon some pre-defined criteria established by the system designers. These observations lead to a system specific decision-making process or control, especially if a problem is found.

One possible operational monitor design is the real-time extraction of rules from a neural network for comparison against a set of acceptable rule conditions. Allowed conditions can be based on system requirements, hazard limits, or perhaps boundary conditions.

The feasibility of this approach is still under investigation. Rule extraction offline is well documented and easily accomplished. Rule extraction on a complex, continually adapting neural network at high computation speeds remains to be established.
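The monitor loop itself is straightforward once extraction is available; the open question is the cost of the extraction step. The sketch below stubs out extraction with a drifting threshold rule and shows the comparison against an allowed rule envelope; the allowed set and rule strings are invented.

```python
# Sketch of the operational-monitor design above: periodically extract
# rules from the running network and flag any rule falling outside the
# allowed envelope. Real-time extraction is stubbed with a drifting
# threshold; the allowed rule set is illustrative.

allowed = {
    "IF x > 0.40 THEN fire",
    "IF x > 0.45 THEN fire",
    "IF x > 0.50 THEN fire",
}

def extract_rules(t):
    # Stub for real-time extraction; the decision boundary drifts
    # as the network adapts between monitoring cycles.
    return {f"IF x > {0.40 + 0.05 * t:.2f} THEN fire"}

alarms = []
for t in range(4):                   # four monitoring cycles
    rules = extract_rules(t)
    if not rules <= allowed:         # any rule outside the allowed set?
        alarms.append((t, rules - allowed))

print(alarms)  # drift crosses the allowed envelope on the final cycle
```

In a real deployment the per-cycle cost of extract_rules is the bottleneck the text identifies: the comparison is cheap, but extracting a faithful rule set from a large adapting network at monitor rates is the unestablished part.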

III. CONCLUSIONS

Neural networks lack the ability to explain how they reached a specific output. This is one of the main reasons neural networks are not trusted. In most applications users want to know the reasoning behind the conclusion of the learning system or expert system.

Rule extraction algorithms provide a means for either partially or completely decompiling a trained neural network. This is seen as a promising vehicle for at least indirectly achieving the required goal of enabling a comparison to be made between the extracted rules and the software specifications.

One of the remaining issues with rule extraction is the lack of tools to assist with extracting and using rules. Naïve rule extraction could result in the creation of large sets of rules that are too unwieldy for human comprehension. Automated tools, like the one mentioned in [10], could reduce that problem.

Rule extraction from neural networks may have greater utility for fixed neural networks than for dynamic neural networks. Fixed neural networks proceed through the steps of training and testing until they reach an acceptable error threshold, and only then are they used within a system. The knowledge of the domain is considered embedded inside the weights and connections of the network. If the network is no longer encouraged to adapt, the symbolic rules extracted to describe it can be a useful tool to validate that network at the time of extraction.

With a dynamic neural network, symbolic rule extraction may be required at intermediate stages in the learning. At some intermediate points, symbolic rules would be extracted and passed through an oracle or system monitor to confirm that the network is still "correct." It may be that the maximum benefit for dynamic systems lies with rule insertion or rule initialization.
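The intermediate-stage checking described above might be interleaved with online learning roughly as follows; `extract_rules` and `oracle_ok` are stand-ins for a real extraction algorithm and a requirements-based monitor, and the drift behavior is a toy assumption:

```python
def adapt_with_checks(updates, period, extract_rules, oracle_ok):
    """Run `updates` learning steps, auditing extracted rules every `period` steps.

    Returns the list of step indices at which the oracle rejected the
    network's extracted rules (candidates for rollback or rule re-insertion).
    """
    rejected_at = []
    for step in range(1, updates + 1):
        # ... one online weight update would happen here ...
        if step % period == 0:
            rules = extract_rules(step)
            if not oracle_ok(rules):
                rejected_at.append(step)
    return rejected_at

# Toy stand-ins: the "network" drifts out of spec after step 30.
bad = adapt_with_checks(
    updates=50,
    period=10,
    extract_rules=lambda step: {"drift": step > 30},
    oracle_ok=lambda rules: not rules["drift"],
)
```

The extraction period would have to be chosen against the computational-speed concern raised earlier: too frequent and extraction cannot keep up, too sparse and drift goes unnoticed.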

ACKNOWLEDGMENT

This research was sponsored by NASA Goddard Research Center through the NASA Independent Verification & Validation Facility, under Research Grant NAG5-12069.

REFERENCES

[1] Lisboa, P. 2001. Industrial Use of Safety-Related Artificial Neural Networks. Health and Safety Executive Contract Research Report 327.

[2] Taylor, Brian J., Marjorie Darrah, Laura Pullum, et al. 2004. Methods and Procedures for the Independent Verification and Validation of Neural Networks. Technical report prepared by the Institute for Scientific Research, Inc. for the NASA Independent Verification and Validation Facility under grant NAG5-12069.

[3] Pecheur, Charles. 2000. Verification and Validation of Autonomy Software at NASA. NASA/TM-2000-209602.

[4] Rodrigues, Pedro, Jose Felix Costa, and Hava T. Siegelmann. 2001. Verifying Properties of Neural Networks. IWANN (1) 2001: 158-165.

[5] Pecheur, Charles, and Stacy Nelson. 2002. V&V of Advanced Systems at NASA.

[6] Tickle, Alan B., R. Andrews, M. Golea, and J. Diederich. 1998. The truth is in there: directions and challenges in extracting rules from trained artificial neural networks.

[7] Andrews, Robert, J. Diederich, and A. B. Tickle. 1995. A Survey and Critique of Techniques for Extracting Rules from Trained Artificial Neural Networks. Knowledge-Based Systems 8:373-389.

[8] Tan, A. 1997. Cascade ARTMAP: Integrating Neural Computation and Symbolic Knowledge Processing. IEEE Transactions on Neural Networks 8(2):237-250.

[9] Craven, Mark, and J. W. Shavlik. 1994. Using Sampling and Queries to Extract Rules from Trained Neural Networks. In Proceedings of the 11th International Conference on Machine Learning, 37-45.

[10] Darrah, Marjorie, Brian Taylor, and Michael Webb. 2005. A Geometric Rule Extraction Approach Used for Verification and Validation of a Safety Critical Application. 2005 Florida Artificial Intelligence Research Society Conference, Clearwater Beach, FL, May 16-18, 2005.

[11] Nayak, R., R. Hayward, and J. Diederich. 1997. "Connectionist Knowledge Base Representation by Generic Rules from Trained Feedforward Neural Networks," Proceedings of the Connectionist Systems for Knowledge Representation and Deduction Workshop, Townsville, Australia, July 13-19.

[12] Taylor, B. J., Marjorie Darrah, and Spiro Skias. 2004. Weaving It All Together - A Methodology for the Verification and Validation of Adaptive Neural Networks. NIPS-2004 Workshop on Verification, Validation, and Testing of Learning Systems, Whistler, British Columbia, December 2004.

[13] Kurd, Zeshan, and Tim Kelly. 2003. Safety Lifecycle for Developing Safety Critical Artificial Neural Networks. In Proceedings of the 22nd International Conference on Computer Safety, Reliability, and Security (SAFECOMP'03), 23-26 September 2003.
