
A Constraint-Based Genetic Algorithm Approach for Rule Induction

Abstract

Data mining aims to discover hidden, valuable knowledge in databases. Existing genetic algorithms (GAs) designed for rule induction evaluate rules via a fitness function. The major drawbacks of applying GAs to rule induction are computational inefficiency, limited accuracy, and limited expressiveness of the rules. In this paper we propose a constraint-based genetic algorithm (CBGA) approach to reveal more accurate and significant prediction rules. This approach allows constraints to be specified as relationships among attributes, according to predefined requirements, the user's preferences, or partial knowledge, in the form of a constraint network. Constraint-based reasoning is employed to produce valid chromosomes, using constraint propagation to ensure that the genes comply with the predefined constraint network. The proposed approach is compared with a regular GA on a medical data set, and the CBGA is shown to be more computationally efficient and to produce more accurate predictions.

Keywords:

Constraint-Based Reasoning; Genetic Algorithms; Rule Induction

1. INTRODUCTION

Revealing valuable knowledge hidden in corporate data has become increasingly critical for enterprise decision making. As more data is collected and accumulated, extensive data analysis is difficult without effective and efficient data mining methods. In addition to statistical and other machine learning methods, recently developed or improved data mining methods such as Bayesian networks [7], frequent patterns [30], decision or regression trees [13,14], and evolutionary algorithms [2,4,8,11] have drawn increasing attention from academia and industry.

Rule induction is one of the most common forms of knowledge discovery. It is a method for discovering a set of "If / Then" rules that can be used for classification or estimation. That is, rule induction converts data into a rule-based representation that can be used either as a knowledge base for decision support or as an easily understood description of system behavior. In principle, rule induction can search for all interesting patterns in a data set. Most rule induction methods discover rules using local heuristic search techniques. Other rule induction methods employ global search techniques, such as the genetic algorithm (GA) [17], which evaluates the entire rule set via a fitness function rather than evaluating the impact of adding or removing one condition to or from a rule. However, the GA's major drawback is its heavy computational load when the search space is huge. In general, rule induction methods are designed to produce a rule set that predicts the expected outcomes as accurately as possible. However, revealing novel or interesting knowledge has recently become an important research issue in data mining. These concerns can introduce additional constraints on rule discovery and thereby additional computational overhead. In regular GA operations, constraint validation is performed after a candidate chromosome is produced, so several iterations may be required to obtain a valid chromosome. One way to reduce the computational load is to prevent invalid chromosomes from being produced in the first place, thereby accelerating the evolution process. In other words, efficiency can be improved by embedding a well-designed constraint mechanism into the chromosome-encoding scheme.

In this research we propose a novel approach that integrates constraint-based reasoning with GAs to discover rule sets. Constraint-based reasoning is a process that incorporates various inference techniques, including local propagation, backtrack-free search, and tree-structured reduction. The constraint-based reasoning mechanism is used to push constraints, along with data insights, into rule set construction. This research applies hybrid techniques from local propagation and tree search to ensure local consistency before the search continues in the GA process. Local propagation reduces the search space by eliminating gene values that cannot satisfy the predefined constraints. The approach allows constraints to be specified as relationships among attributes, according to predefined requirements, user preferences, or partial knowledge, in the form of a constraint network. In essence, it provides a chromosome-filtering mechanism that operates before a chromosome is generated or evaluated. Thus insignificant or irrelevant rules can be excluded in advance via the constraint network.

Propositional logic [11] is a popular representation in rule induction systems. However, our proposed approach allows first-order logic to formulate knowledge in the form of linear inequalities. This enhances the expressive power of the rule set, allowing it to model data behavior more effectively.

The remainder of this paper is organized as follows. Section 2 reviews previous research and related techniques. Section 3 introduces the proposed rule induction system, and Section 4 details the design of the CBGA. Section 5 presents the experiments and results with medical data records, followed by the conclusions in Section 6.

2. Literature Review

2.1 Genetic Algorithm for Rule Induction


Rule induction methods may be categorized as either tree-based or non-tree-based methods. Quinlan [29] introduced techniques to transform an induced decision tree into a set of production rules. Frequently cited decision tree induction methods include the C4.5 [29], CART [3], and GOTA [16] algorithms.

Michalski et al. [24] proposed the AQ15 algorithm to generate a disjunctive set of classification rules. The CN2 rule induction algorithm also uses a modified AQ algorithm, combined with a top-down beam search procedure [6]. It adopts entropy as its search heuristic and can only generate an ordered list of rules. The Basic Exclusion Algorithm (BEXA) is another rule induction method, proposed by Theron and Cloete [30]. It follows a general-to-specific search procedure in which disjunctions of conjunctions are allowed, and every conjunction is evaluated using the Laplace error estimate. More recently, Witten and Frank [31] described covering algorithms for discovering rule sets in conjunctive form.

GAs have been successfully applied to rule discovery in data mining. Greene and Smith [15] and Noda et al. [26] proposed algorithms that use one-rule-per-individual encoding. In this approach a chromosome usually corresponds to a linear string of rule conditions, where each condition is often an attribute-value pair. Although this encoding is simpler and syntactically shorter, the fitness of a single rule is not necessarily the best indicator of the quality of the discovered rule set. The several-rules-per-individual approach [9,20] has the advantage of evaluating the rule set as a whole, taking rule interactions into account. However, it makes the chromosome encoding more complicated and syntactically longer, and usually requires more complex genetic operators.

Hu [19] proposed a Genetic Programming (GP) approach in which a program is represented by a tree with rule conditions and/or attribute values in the leaf nodes and functions in the internal nodes. The challenge is that a tree can grow in size and shape very dynamically, so an efficient pruning algorithm is required to remove unsatisfactory parts of the tree and avoid infeasible solutions. Bojarczuk et al. [2] proposed a constrained-syntax GP approach to build a decision model, with an emphasis on discovering comprehensible knowledge. The constrained-syntax mechanism was applied during tree building to verify the relationship between operators and operand data types.

To discover high-level prediction rules, Freitas [11] applied first-order relationships such as "Salary > Age" by checking an Attribute Compatibility Table (ACT) during the discovery process with GA-Nuggets. The ACT was claimed to be especially effective for knowledge representation. However, more complicated attribute relationships, such as linear or non-linear quantitative relationships among multiple attributes, were not addressed by the ACT.

2.2 Constraint Satisfaction in Genetic Algorithms

Constraint satisfaction involves finding values for problem variables that satisfy a set of constraints on acceptable solutions. The constraint satisfaction problem (CSP) is NP-complete in general, so no efficient general-purpose solution method is known. A number of approaches have been developed for solving CSPs. Some adopt constraint propagation to reduce the solution space. Others use backtracking to search directly for possible solutions. Still others combine these two techniques, pairing tree search with consistency algorithms to determine one or more feasible solutions efficiently. Nadel [25] compared the performance of several algorithms, including 'generate and test', 'simple backtracking', 'forward checking', 'partial lookahead', 'full lookahead', and 'really full lookahead'. The major difference among these algorithms is the degree of consistency enforced at each node during tree search. Except for 'generate and test', all of them are hybrid techniques: whenever a new value is assigned to a variable, the domains of all unassigned variables are filtered so that only values consistent with the new assignment remain. If the domain of any uninstantiated variable becomes empty, a contradiction is recognized and backtracking occurs. Freuder [12] showed that if a CSP has a tree-structured constraint graph, then once arc consistency is enforced and the variables are instantiated in a suitable order, it can be solved without any backtracking; that is, solutions can be retrieved in a backtrack-free manner. Dechter and Pearl [10] combined this result with the notion of directional consistency to generate backtrack-free solutions more efficiently.
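To make the hybrid idea concrete, here is a minimal sketch of backtracking with forward checking, one of the algorithms in Nadel's comparison. It assumes finite domains and binary constraints given as Python predicates; the function names and the constraint format are illustrative choices, not taken from any of the cited systems.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A binary constraint: (variable_a, variable_b, predicate(value_a, value_b) -> bool)
Constraint = Tuple[str, str, Callable[[object, object], bool]]


def consistent(var: str, val: object, other: str, other_val: object,
               constraints: List[Constraint]) -> bool:
    """True if var=val and other=other_val violate no constraint linking them."""
    for a, b, pred in constraints:
        if a == var and b == other and not pred(val, other_val):
            return False
        if a == other and b == var and not pred(other_val, val):
            return False
    return True


def forward_check(order: List[str], domains: Dict[str, List[object]],
                  constraints: List[Constraint],
                  assignment: Optional[Dict[str, object]] = None
                  ) -> Optional[Dict[str, object]]:
    """Backtracking search that filters future domains after every assignment."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(order):
        return dict(assignment)
    depth = len(assignment)
    var = order[depth]
    for val in domains[var]:
        # Keep only the values of uninstantiated variables consistent with var=val.
        filtered: Dict[str, List[object]] = {}
        for other in order[depth + 1:]:
            filtered[other] = [v for v in domains[other]
                               if consistent(var, val, other, v, constraints)]
            if not filtered[other]:
                break  # empty domain: contradiction, try the next value
        else:
            assignment[var] = val
            result = forward_check(order, {**domains, **filtered},
                                   constraints, assignment)
            if result is not None:
                return result
            del assignment[var]  # backtrack
    return None


# Usage: x < y < z over {1, 2, 3}
solution = forward_check(
    ["x", "y", "z"],
    {"x": [1, 2, 3], "y": [1, 2, 3], "z": [1, 2, 3]},
    [("x", "y", lambda a, b: a < b), ("y", "z", lambda a, b: a < b)],
)
print(solution)  # {'x': 1, 'y': 2, 'z': 3}
```

Because future domains are pruned as soon as a value is assigned, a dead end is detected one level early instead of after a full assignment has been generated and tested.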

Handling constraints to reduce the search space is an important research issue in many areas of artificial intelligence. A GA maintains a set of chromosomes (solutions), called a population, which consists of parents and offspring. As the evolution process proceeds, the best N chromosomes in the current population are selected as parents. Offspring are produced by genetic operators and selected according to a filtering criterion, usually expressed as a fitness function together with some predefined constraints. The GA evolves over several generations until the stopping criteria are met. However, valid chromosomes are usually produced by trial and error: a candidate chromosome is produced and then tested against the filtering criterion. A GA may therefore require considerable computation, especially when the filtering criteria are complicated or strict. To resolve this problem, an effective chromosome construction process can be applied in the initialization, crossover, and mutation stages.
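The trial-and-error behavior described above can be illustrated with a small sketch: random chromosomes are drawn and discarded until one satisfies the filtering criterion, and the rejected candidates are counted. The gene domains and the constraint `qty_exceeds_freq` are hypothetical, chosen only to mimic the attribute style used later in this paper.

```python
import random
from typing import Callable, Dict, List, Tuple

Chromosome = Dict[str, int]


def generate_and_test(domains: Dict[str, List[int]],
                      is_valid: Callable[[Chromosome], bool],
                      rng: random.Random,
                      max_tries: int = 10_000) -> Tuple[Chromosome, int]:
    """Draw random chromosomes until one passes the filter.

    Returns the valid chromosome and the number of rejected candidates.
    """
    for rejected in range(max_tries):
        candidate = {gene: rng.choice(values) for gene, values in domains.items()}
        if is_valid(candidate):
            return candidate, rejected
    raise RuntimeError("no valid chromosome found within max_tries")


def qty_exceeds_freq(c: Chromosome) -> bool:
    # Hypothetical constraint in the style of "Qty > 1.2 * Freq".
    return c["Qty"] > 1.2 * c["Freq"]


domains = {"Qty": [1, 2, 3, 4, 5], "Freq": [1, 2, 3, 4, 5]}
chromosome, wasted = generate_and_test(domains, qty_exceeds_freq, random.Random(0))
```

Only 10 of the 25 (Qty, Freq) combinations satisfy this constraint, so on average about 1.5 candidates are rejected for each valid chromosome; tighter or more numerous constraints push the rejection rate, and therefore the wasted computation, higher.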


Garofalakis [13] pushed model constraints into the mining process to specify the expected tree size and accuracy. In other words, constraints can express the trade-off between model accuracy and computational efficiency in tree building or tree pruning. In related work, CADSYN [23] adopted a constraint satisfaction algorithm for case adaptation in case-based reasoning (CBR), and Purvis [28] applied a repair-based constraint satisfaction algorithm to aid case adaptation. By pushing constraints into the case adaptation process, the constraint satisfaction algorithm helps a CBR system produce acceptable solution cases more efficiently for problems with discrete constraints.

Barnier and Brisset [1] developed a hybrid of genetic algorithms and constraint satisfaction techniques, which was used to solve optimization problems in vehicle routing and radio link frequency assignment. This approach applied a GA to reduce the search space of a large CSP, rather than applying constraint satisfaction techniques to improve the GA's computational efficiency. Kowalczyk [21] proposed using constraint satisfaction principles to support a GA in handling constraints. However, few studies have applied constraint-based reasoning to address GA computational inefficiency, or have considered how the user's knowledge can be represented and processed within a constraint network.

3. The Proposed Rule Induction System

We developed a rule induction system that consists of three modules: the user interface, the symbol manager, and the constraint-based GA (CBGA). As shown in Fig. 1, the user interface module allows users to perform the following operations: loading a constraint program, adding or retracting constraints, controlling the GA's parameter settings, and monitoring the best solutions.

The constraint program is a set of first-order logic sentences (atomic, compound, or quantified) about a many-sorted universe of discourse that includes integers, real numbers, and arbitrary application-specific sorts. The details can be found in Lai [22].

The symbol manager examines the syntax of the first-order logic sentences in the constraint program and translates them into a constraint network for further processing.

In the CBGA module, constraint-based reasoning filters each gene value and processes both the initial GA population and the regular populations. To speed up the reasoning process, both variable ordering [27] and backtrack-free search [12] are adopted in the CBGA to derive contradiction-free chromosomes.

(INSERT Fig. 1. The Conceptual Diagram of the Proposed Rule Induction System)

As shown in Fig. 2, each chromosome Ci of a regular population of size N is processed by constraint-based reasoning in sequence. The genes in a chromosome can be viewed as a subset of variables, and the range of values of each gene gij is restricted via constraint-based reasoning. The system thus transforms human knowledge, such as expert experience or common sense, into a constraint network that leads to more significant rule sets. The effort of searching for valid chromosomes is reduced because the fitness evaluation procedure does not have to be run on every candidate chromosome.

(INSERT Fig. 2. The Framework of Constraint-Based Preprocessing for GA Operators)

Fig. 3 illustrates the details of the chromosome construction process. In essence, the CBGA applies local propagation for chromosome filtering. The basic idea of local propagation [12] is to use the information local to a constraint to validate gene values. Local propagation also restricts the valid ranges of the genes referenced in the constraint. The satisfactory gene values (i.e., the satisfactory universe) are propagated through the network, enabling other constraints to restrict the valid ranges of the remaining genes.

Given the restricted valid range denoted by the satisfactory universe SGj, the valuation process for gene gij is then activated to examine the inferred gene value. If the inferred value is inconsistent with the constraint network, it is replaced by a new value g'ij randomly selected from SGj. By repeatedly applying local propagation and this valuation process to chromosome Ci in the sequence gi1, gi2, …, gim, the resulting chromosome C'i satisfies the constraint network. Local propagation thus offers an efficient way to guide the GA toward the best chromosome, because the search is confined to the region already filtered by the constraint network.
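The sequential screening of Figs. 2 and 3 can be sketched as follows. This is a simplified illustration under stated assumptions: genes have discrete domains, constraints are Python predicates over the genes they reference, and the satisfactory universe SGj is computed by brute-force filtering against the genes already fixed, rather than by the symbolic propagation of the authors' inference engine. All names and the two example constraints are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Assignment = Dict[str, object]
# A constraint: (genes it references, predicate over an assignment of those genes)
Constraint = Tuple[Tuple[str, ...], Callable[[Assignment], bool]]


def satisfactory_universe(gene: str, partial: Assignment,
                          domains: Dict[str, List[object]],
                          constraints: List[Constraint]) -> List[object]:
    """Values of `gene` consistent with every constraint whose genes are all assigned."""
    universe = []
    for value in domains[gene]:
        trial = {**partial, gene: value}
        if all(pred(trial) for genes, pred in constraints
               if all(g in trial for g in genes)):
            universe.append(value)
    return universe


def screen_chromosome(chromosome: Assignment, order: List[str],
                      domains: Dict[str, List[object]],
                      constraints: List[Constraint],
                      rng: random.Random) -> Assignment:
    """Visit genes in order; keep a value that lies in SGj, else resample from SGj."""
    repaired: Assignment = {}
    for gene in order:
        universe = satisfactory_universe(gene, repaired, domains, constraints)
        if not universe:
            # A tree-structured network with a suitable ordering avoids this case;
            # a general network would need backtracking here.
            raise ValueError(f"empty satisfactory universe for {gene}")
        value = chromosome[gene]
        repaired[gene] = value if value in universe else rng.choice(universe)
    return repaired


# Hypothetical constraints: "Cho=Low implies Qty=3" and "Qty > 1.2 * Freq".
domains = {"Cho": ["Low", "High"], "Qty": [1, 2, 3], "Freq": [1, 2, 3]}
constraints = [
    (("Cho", "Qty"), lambda a: a["Cho"] != "Low" or a["Qty"] == 3),
    (("Qty", "Freq"), lambda a: a["Qty"] > 1.2 * a["Freq"]),
]
offspring = {"Cho": "Low", "Qty": 1, "Freq": 2}
fixed = screen_chromosome(offspring, ["Cho", "Qty", "Freq"], domains,
                          constraints, random.Random(1))
print(fixed)  # {'Cho': 'Low', 'Qty': 3, 'Freq': 2}
```

Here only the Qty gene violates the network, so it is resampled from its satisfactory universe {3}; the Cho and Freq values already lie in their universes and are kept unchanged.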

(INSERT Fig. 3. Detailed Illustration of Chromosome Screening)

4. Design of the CBGA for Rule Induction


4.1 The Medical Data Set

A synthetic medical database contains patient information, including age, sex, blood pressure (BP), cholesterol (Cho) status, Na and K values, and the quantity (Qty) and frequency (Freq) of taking a drug. The prediction attribute is one of five drug types: Drug A, Drug B, Drug C, Drug D, and Drug E.

This data set assumes that physicians prefer certain prescriptions, described as follows:

Drug A and Drug B are used if the blood pressure is high. If the blood pressure is low and cholesterol is high, then Drug C is used. If cholesterol is low and Drug D is being used, the suggested quantity is up to 3 units. If the blood pressure or cholesterol is high, Drug E is used under the condition that "the frequency is less than twice a day."

4.2 Knowledge Representation in First-Order Logic

The medical preferences above can be treated as predefined rules and represented by a constraint program. Such rules could be represented in propositional logic, which allows a data mining algorithm to run more efficiently. However, propositional logic has limited expressive power for modeling data behavior. For example, a condition relating two different attributes, such as "Na > K", cannot be expressed. First-order logic overcomes this limitation, but it usually requires more computational effort to search for possible relationships between pairs of attributes.

In the CBGA, the relationship between any two attributes can be formulated in first-order logic, except for relationships between quantitative and qualitative attributes. For instance, age, with a numerical value (1~100), cannot be meaningfully compared with the BP status, which takes the symbolic values {High, Normal, Low}. To account for the different numerical ranges of attributes, we propose a weighted linear form such as "Attribute i <= w * Attribute j" to reveal data insights not available from conventional classification techniques such as decision trees.

In addition to relationships between pairs of attributes, rule sentences based on first-order logic that model common human knowledge can easily be formulated in a constraint program. The program is then translated into a constraint network to be processed by the constraint-based inference engine developed in our earlier work [18,22]. In this example, the predefined rules can be described using the following universally quantified sentences.

ALL X: DrugA(X) and X.BP=High;
ALL X: DrugB(X) and X.BP=High;
ALL X: DrugC(X) and X.BP=Low and X.Cho=High;
ALL X: DrugD(X) and X.Cho=Low implies X.Qty=3;
ALL X: DrugE(X) and (X.Cho=High or X.BP=High) implies X.Freq<=2;
Domain CompareVar =:= {Age, Na, K, Qty, Freq};
ALL X, Y: CompareVar(X) and CompareVar(Y) and X<>Y.
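As an informal illustration only, the two implication sentences above (for Drug D and Drug E) and the CompareVar declaration can be read as predicates over a single patient record. The sketch below is a hypothetical hand translation into Python, not the constraint language of Lai [22] or the output of the symbol manager; the record field names follow the attribute abbreviations used in this section, and the "Drug" field is an assumed label for the drug type.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Record = Dict[str, object]


def drug_d_ok(r: Record) -> bool:
    # ALL X: DrugD(X) and X.Cho=Low implies X.Qty=3
    premise = r["Drug"] == "D" and r["Cho"] == "Low"
    return (not premise) or r["Qty"] == 3


def drug_e_ok(r: Record) -> bool:
    # ALL X: DrugE(X) and (X.Cho=High or X.BP=High) implies X.Freq<=2
    premise = r["Drug"] == "E" and (r["Cho"] == "High" or r["BP"] == "High")
    return (not premise) or r["Freq"] <= 2


# Domain CompareVar =:= {Age, Na, K, Qty, Freq}
COMPARE_VARS = ["Age", "Na", "K", "Qty", "Freq"]


def compare_pairs() -> List[Tuple[str, str]]:
    """ALL X, Y: CompareVar(X) and CompareVar(Y) and X<>Y, as ordered pairs."""
    return list(permutations(COMPARE_VARS, 2))


record = {"Drug": "D", "Cho": "Low", "BP": "Normal", "Qty": 2, "Freq": 1}
print(drug_d_ok(record), drug_e_ok(record), len(compare_pairs()))  # False True 20
```

The 20 ordered attribute pairs are the candidate (Attribute i, Attribute j) combinations that a weighted condition such as "Attribute i <= w * Attribute j" could draw from.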

4.3 Chromosome Encoding

In this method a rule set is encoded as a chromosome using the several-rules-per-individual approach. Each drug type has its own rule set, which contains at least one rule and describes the conditions associated with that drug type. The rule sets are typically represented in the following format:

("IF cond1,1 AND … AND cond1,n THEN drug=A" or "IF cond2,1 AND … AND cond2,n THEN drug=A" or "IF cond3,1 AND … AND cond3,n THEN drug=A") and

("IF cond4,1 AND … AND cond4,n THEN drug=B" or "IF cond5,1 AND … AND cond5,n THEN drug=B") … and

("IF condm-1,1 AND … AND condm-1,n THEN drug=E" or "IF condm,1 AND … AND condm,n THEN drug=E")

For example, the rule set for Drug A can be encoded as the following rules, shown in Table 1:

IF (Age>=25) AND (Sex="M") AND (K>=0.03) THEN Drug A, or IF (BP=High) AND (Freq>=3) AND (Na<=1.2*K) THEN Drug A

(INSERT Table 1: The Rule Representation for Drug A)

Table 2 presents the predefined range (domain) of each user-defined gene in one rule. The gene named "Attribute enabled/disabled" indicates whether an attribute is used in the condition part of a rule.

(INSERT Table 2: The Detail Specification of Gene Encoding for One Rule)
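To show how such an encoding might be evaluated, the sketch below models one rule as a list of condition genes, each carrying an enabled flag in the spirit of the "Attribute enabled/disabled" gene, plus an optional weighted relation of the form "Attribute i <= w * Attribute j"; a rule set predicts its drug if any of its rules fires. The class names, the operator table, and the patient record are assumptions for illustration, since the exact fields of Table 2 are not reproduced here.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
OPS: Dict[str, Callable[[object, object], bool]] = {
    "<=": operator.le, ">=": operator.ge, "=": operator.eq,
    "<": operator.lt, ">": operator.gt,
}


@dataclass
class Condition:
    attribute: str
    op: str
    value: object
    enabled: bool = True  # disabled conditions are ignored

    def holds(self, r: Record) -> bool:
        return OPS[self.op](r[self.attribute], self.value)


@dataclass
class Relation:
    """Weighted first-order condition: left <op> w * right."""
    left: str
    op: str
    w: float
    right: str

    def holds(self, r: Record) -> bool:
        return OPS[self.op](r[self.left], self.w * r[self.right])


@dataclass
class Rule:
    conditions: List[Condition]
    relation: Optional[Relation] = None

    def fires(self, r: Record) -> bool:
        ok = all(c.holds(r) for c in self.conditions if c.enabled)
        return ok and (self.relation is None or self.relation.holds(r))


def rule_set_fires(rules: List[Rule], r: Record) -> bool:
    """A rule set (e.g. for Drug A) predicts its drug if any rule fires."""
    return any(rule.fires(r) for rule in rules)


# The two Drug A rules from Table 1.
drug_a = [
    Rule([Condition("Age", ">=", 25), Condition("Sex", "=", "M"),
          Condition("K", ">=", 0.03)]),
    Rule([Condition("BP", "=", "High"), Condition("Freq", ">=", 3)],
         Relation("Na", "<=", 1.2, "K")),
]
patient = {"Age": 30, "Sex": "F", "K": 0.5, "BP": "High", "Freq": 3, "Na": 0.4}
print(rule_set_fires(drug_a, patient))  # True (second rule: 0.4 <= 1.2 * 0.5)
```

Disabling a condition, for example setting `enabled=False` on the Sex condition, removes it from the rule without changing the chromosome's length, which keeps crossover positions aligned.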

4.4 Initial Population


To generate valid chromosomes from the start, the initial population is derived through an individual screening process using constraint-based reasoning. Whenever a gene is randomly instantiated from its satisfactory universe, constraint propagation restricts the potential values of the remaining genes so that the constraint network remains consistent. In this way, every chromosome constructed by the CBGA is guaranteed to be valid.

4.5 Fitness Function

When using a rule to classify a given patient record, four types of results can be observed for the prediction model. These include:

True positive: this rule predicts that the patient uses a given drug and the patient does use it;

False positive: this rule predicts that the patient uses a given drug but the patient does not use it;

True negative: this rule predicts that the patient does not use a given drug, and the patient indeed does not use it;

False negative: this rule predicts that the patient does not use a given drug, but the patient does use it.

In our approach, the fitness function is defined as the number of errors, that is, the number of false positive plus false negative cases. The formal condition that specifies the valid rules for Drug k can be stated as follows.

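$$\left(\bigvee_{j=1}^{l_k} R_{kj}\right) \wedge \neg\left(\bigvee_{j=1}^{l_i} R_{ij}\right),$$

where $i = 1, \ldots, k-1, k+1, \ldots, n$ and $l_1 + l_2 + \cdots + l_n \le n \cdot m$. Here $R_{ij}$ denotes the $j$-th rule associated with Drug $i$, $l_i$ is the number of rules for Drug $i$, and $\neg$ denotes logical negation.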
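Read operationally, the condition above says that a record is predicted to use Drug k when at least one of Drug k's rules fires and no rule of any other drug fires; the fitness then counts false positives and false negatives. The sketch below assumes that rules are Python predicates over a record and that each record carries its true drug in a hypothetical "Drug" field. The paper does not state how errors are aggregated across drugs, so summing over all drugs is an assumption here.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], bool]
RuleSets = Dict[str, List[Rule]]


def predicts(k: str, rule_sets: RuleSets, r: Record) -> bool:
    """(OR_j R_kj) AND NOT (OR_j R_ij) for every other drug i != k."""
    own = any(rule(r) for rule in rule_sets[k])
    others = any(rule(r) for i, rules in rule_sets.items() if i != k
                 for rule in rules)
    return own and not others


def confusion(k: str, rule_sets: RuleSets,
              records: List[Record]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for Drug k."""
    tp = fp = tn = fn = 0
    for r in records:
        predicted, actual = predicts(k, rule_sets, r), r["Drug"] == k
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def fitness(rule_sets: RuleSets, records: List[Record]) -> int:
    """Number of errors (FP + FN), summed over all drugs; lower is better."""
    total = 0
    for k in rule_sets:
        _, fp, _, fn = confusion(k, rule_sets, records)
        total += fp + fn
    return total


rule_sets = {
    "A": [lambda r: r["BP"] == "High"],
    "C": [lambda r: r["BP"] == "Low" and r["Cho"] == "High"],
}
records = [
    {"BP": "High", "Cho": "Low", "Drug": "A"},
    {"BP": "Low", "Cho": "High", "Drug": "C"},
    {"BP": "Low", "Cho": "Low", "Drug": "A"},  # missed by both rule sets
]
print(fitness(rule_sets, records))  # 1
```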

4.6 Genetic Operators

To exchange information between different rules, uniform crossover swaps gene values between parents to generate new valid offspring. Mutation replaces a gene's value with a value randomly selected from the predefined range for that gene type.

For example, each gene value of the offspring below is determined by uniform crossover according to a flag that indicates whether that gene is swapped. The detailed operations are illustrated as follows.

Step 1: the parents are defined as follows.

Parent 1

Age<=10 Sex=F BP=High Cho=Low Na<=0.67 K>=0.3 Qty=2 Freq=2 Age>0.5*Qty

Parent 2

Age>=20 Sex=M BP=Normal Cho=High Na<=0.3 K>=0.5 Qty<3 Freq=5 Qty>1.2*Freq

Step 2: the offspring are obtained by swapping the gene values wherever the flag is 1.

Flags 1 0 0 1 1 0 1 0 0

Offspring 1

Age>=20 Sex=F BP=High Cho=High Na<=0.3 K>=0.3 Qty<3 Freq=2 Age>0.5*Qty

Offspring 2

Age<=10 Sex=M BP=Normal Cho=Low Na<=0.67 K>=0.5 Qty=2 Freq=5 Qty>1.2*Freq

Step 3: mutation occurs at the BP gene of Offspring 1, and its value is replaced by the randomly assigned value 'Low', as follows.

Offspring 1

Age>=20 Sex=F BP=Low Cho=High Na<=0.3 K>=0.3 Qty<3 Freq=2 Age>0.5*Qty

Offspring 2

Age<=10 Sex=M BP=Normal Cho=Low Na<=0.67 K>=0.5 Qty=2 Freq=5 Qty>1.2*Freq
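The crossover and mutation steps of this example can be reproduced with the following sketch. Genes are kept as strings purely for illustration, a flag of 1 means the gene is swapped between the parents, and mutation draws a replacement from a hypothetical per-gene list of alternatives; in the CBGA the resulting offspring would then be screened against the constraint network.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[str]


def uniform_crossover(p1: Chromosome, p2: Chromosome,
                      flags: Sequence[int]) -> Tuple[Chromosome, Chromosome]:
    """Swap genes between parents wherever the flag is 1."""
    c1, c2 = list(p1), list(p2)
    for i, flag in enumerate(flags):
        if flag:
            c1[i], c2[i] = p2[i], p1[i]
    return c1, c2


def mutate(chrom: Chromosome, index: int, choices: Sequence[str],
           rng: random.Random) -> Chromosome:
    """Replace one gene with a different value drawn from its predefined range."""
    out = list(chrom)
    # Assumption: the current value is excluded so that mutation always changes the gene.
    out[index] = rng.choice([c for c in choices if c != chrom[index]])
    return out


parent1 = ["Age<=10", "Sex=F", "BP=High", "Cho=Low", "Na<=0.67",
           "K>=0.3", "Qty=2", "Freq=2", "Age>0.5*Qty"]
parent2 = ["Age>=20", "Sex=M", "BP=Normal", "Cho=High", "Na<=0.3",
           "K>=0.5", "Qty<3", "Freq=5", "Qty>1.2*Freq"]
flags = [1, 0, 0, 1, 1, 0, 1, 0, 0]

child1, child2 = uniform_crossover(parent1, parent2, flags)
# Mutate the BP gene (index 2) of offspring 1.
child1 = mutate(child1, 2, ["BP=High", "BP=Normal", "BP=Low"], random.Random(0))
print(child2)
```

Because the mutated value is drawn at random, the sketch may produce BP=Normal rather than the BP=Low shown in the example; either way the remaining genes of Offspring 1 follow the flags exactly.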

5. Experimental Results and Discussions

This research used 600 synthetic medical data records. Each record contains eight variables: age, sex, blood pressure, cholesterol level, Na and K levels, and the quantity and frequency of taking a given drug.

Of these records, two thirds were used as training examples and the remaining third as test examples, and model evaluation was based on three-fold cross-validation. The GA population size was set to 100 chromosomes, the crossover rate was 0.6, and the mutation rate was set between 0.01 and 0.05. Training proceeded until either a fixed number of generations or a fixed run time was reached.
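The evaluation protocol can be sketched as a three-fold split in which each fold holds out one third of the records for testing and trains on the remaining two thirds, together with the accuracy ratio defined later in this section. The shuffling seed and the round-robin fold assignment are assumptions; the paper does not state how the folds were formed.

```python
import random
from typing import List, Tuple


def three_fold_splits(n_records: int,
                      seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) for each of three folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::3] for i in range(3)]
    splits = []
    for k in range(3):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, test))
    return splits


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Correctly classified examples divided by the total number of examples."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


splits = three_fold_splits(600)
print([(len(train), len(test)) for train, test in splits])  # [(400, 200), (400, 200), (400, 200)]
```

Averaging the per-fold training and test accuracies over the three splits gives averaged figures of the kind reported in Tables 3 and 4.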

Fig. 4 shows the results for up to 200 generations. The X-axis denotes the GA generation and the Y-axis the average error over the three folds of cross-validation. Fig. 5 compares the regular GA and the CBGA under a fixed run time. The results indicate that the CBGA performs better, requiring fewer generations and less time.


(INSERT Fig. 4. Experimental Results for Fixed Generation with Mutation Rate 0.01)

(INSERT Fig. 5. Experimental Results for Fixed Computation Time with Mutation Rate 0.01)

The accuracy is the ratio of the number of correctly classified training (or test) examples to the total number of training (or test) examples. The CBGA achieved an average accuracy of 81.07% on the training data and 79.5% on the test data, both higher than those of the regular GA. The detailed results are shown in Table 3.

(INSERT Table 3: The Accuracy for Three-Fold Cross Validation (Mutation Rate=0.01))

We also applied the same three-fold cross-validation with a mutation rate of 0.05. The results, shown in Figs. 6 and 7 and Table 4, again show a greater improvement for the CBGA than for the regular GA.

(INSERT Fig. 6. Experimental Results for Fixed Number of Generations with Mutation Rate 0.05)

(INSERT Fig. 7. Experimental Results for Fixed Computation Time with Mutation Rate 0.05)

Fig. 8 shows the average accuracy for the two mutation rates. For the regular GA, the higher mutation rate resulted in higher accuracy, whereas for the CBGA it resulted in lower accuracy on both the training and test data. An increase in the mutation rate can plausibly improve the accuracy of a regular GA, but the opposite occurred for the CBGA. One possible reason is that a higher mutation rate produces more disordered chromosomes that may require more computation during evolution. Because the number of generations was fixed, handling more invalid chromosomes may have reduced the CBGA's effectiveness in rule induction. Further experiments are needed to verify this hypothesis.

(INSERT Table 4: The Accuracy for Three-Fold Cross Validation (Mutation Rate=0.05))


(INSERT Fig. 8. The Comparative Accuracy vs. Two Different Mutation Rates for the Regular GA and CBGA)

(INSERT Fig. 9. The Comparative Time vs. Two Different Mutation Rates for the Regular GA and CBGA)

6. Conclusions

We introduced the CBGA approach, which hybridizes constraint-based reasoning with a genetic algorithm for rule induction. Incorporating user-control information into the mining process is not straightforward and typically requires a novel algorithm design. This approach infers rule sets by pushing partial knowledge into chromosome construction, guiding the search toward more meaningful rules. Furthermore, computational efficiency is improved by using the constraint network to prevent the production of invalid chromosomes.

Compared with a regular GA, the CBGA achieved higher predictive accuracy and required less computation time for rule induction on a medical data set. The rule sets discovered by the CBGA also captured more significant knowledge in accordance with the user's preferences.

The proposed CBGA is generic and problem independent. It is flexible, incorporating user information or domain knowledge into the rule induction process via the expressive power of first-order logic. Specially designed genetic operators or chromosome representations are not needed to interact with the constraint-based reasoning. Most importantly, regular data mining methods construct predictive models based on the data behavior, yet data quality can never be completely assured before the mining results are available, and even then model reliability is difficult to verify. The CBGA mitigates this problem by allowing domain experts to input professional knowledge or constraints that prevent anomalies caused by poor data quality.

At this stage the CBGA can reveal rule sets containing conditions of the form "Attribute i <= w * Attribute j"; in future work this format can be extended to more complicated multivariate linear or nonlinear inequalities. Further investigation of the CBGA using more real-world data sets or data sets from benchmark databanks is required to demonstrate its generalization capability.

ACKNOWLEDGEMENTS

This research was partially supported by the National Science Council, Taiwan, Republic of China, under contract number NSC90-2745-P-155-003.

The Hybrid of Association Rule Algorithms and Genetic Algorithms for Tree Induction: An Example of Predicting Students Learning Performance

Abstract

Revealing valuable knowledge hidden in corporate data has become increasingly critical for enterprise decision making. As more data is collected and accumulated, extensive data analysis is difficult without effective and efficient data mining methods. This paper proposes a hybrid approach that combines an association rule algorithm with genetic algorithms (GAs) to discover a classification tree. The association rule algorithm is adopted to obtain useful clues that allow the GA to carry out its search more efficiently. Specifically, before the evolutionary process begins, the association rule algorithm identifies the input variables most associated with the outcome variable, and these insights are converted into the GA's seeding chromosomes. The proposed approach is evaluated against a regular genetic algorithm in predicting students' course performance.

Keywords:

Genetic Algorithms; Association Rule; Classification Trees; Student Course Performance

1. Introduction

Revealing valuable knowledge hidden in corporate data has become increasingly critical for enterprise decision making. As more data is collected and accumulated, extensive data analysis is difficult without effective and efficient data mining methods.

Tree induction is one of the most common methods of knowledge discovery. It is a method for discovering a tree-like pattern that can be used for classification or estimation. Frequently cited tree induction methods such as C4.5 (Quinlan 1993), CART (Breiman et al., 1984), and QUEST (Loh & Shih, 1997) are not evolutionary approaches. An ideal tree induction technique must address aspects such as model comprehensibility and interestingness, attribute selection, and learning efficiency and effectiveness. Genetic algorithms (GAs), one of the most widely used evolutionary computation techniques, have attracted increasing attention for their flexibility and expressiveness in problem representation as well as their fast search capability for knowledge discovery. In the past, genetic algorithms were mostly employed to enhance the learning process of data mining algorithms such as neural nets or fuzzy expert systems, rather than to discover models or patterns themselves. When used for discovery, genetic algorithms perform a guided search for good models in the solution space. While genetic algorithms are an interesting approach for discovering hidden valuable knowledge, they must cope with computational efficiency issues when handling large volumes of data.

Generally, tree induction methods are used to automatically produce rule sets that predict the expected outcomes as accurately as possible. However, revealing novel or interesting knowledge has recently become an important research issue in data mining. These attempts may impose additional constraints on tree discovery and thereby additional computational overhead. In regular GA operations, constraint validation is performed after a candidate chromosome is produced, so several iterations may be required to obtain a valid chromosome (i.e., pattern). One way to reduce the computational load is to obtain associated information, such as relevant attributes or attribute values, before the initial chromosomes are generated, thereby improving the efficiency and effectiveness of evolution. This can potentially be done by applying association rule algorithms to find clues related to the classification values.

This research proposes a novel approach that integrates an association rule algorithm with a GA to discover a classification tree. An association rule algorithm, also known as market basket analysis, is used for attribute selection, so that the related input variables can be determined before the GA’s evolution proceeds.

The Apriori algorithm is a popular association rule technique for discovering attribute relationships, which are then converted into the initial GA population. The proposed method attempts to enhance the GA’s search performance by gaining important clues leading to the final patterns. A prototype system based on the AGA (Association-based GA) approach was developed to predict the learning performance of college students. The application data, consisting of student learning profiles and related course information, were obtained from a university in Taiwan.

The remainder of this paper is organized as follows. In Section 2, previous research and related techniques are reviewed. The detailed AGA procedures are then introduced in Section 3. Section 4 presents the experiments and results with the House-Votes and student learning performance data sets, followed by the discussion and conclusions.

2. The Literature Review


2.1. Genetic Algorithm for Rule Induction

Current rule induction systems typically fall into two categories: “divide and conquer” (Quinlan 1993) and “separate and conquer” (Clark & Niblett, 1989). The former recursively partitions the instance space until regions of roughly uniform class membership are obtained. The latter induces one rule at a time and separates out the instances it covers before inducing the next rule. Rule induction methods may also be categorized as tree-based or non-tree-based (Abdullah, 1999). Often-mentioned decision tree induction methods include the C4.5 (Quinlan 1993), CART (Breiman et al., 1984), and GOTA (Hartmann et al., 1982) algorithms. Both decision trees and rules can be described as disjunctive normal form (DNF) models. Decision trees are generated from data in a top-down, general-to-specific direction (Chidanand & Sholom, 1997). Each path to a terminal node is represented as a rule consisting of a conjunction of the tests on the path’s internal nodes. These ordered rules are mutually exclusive (Clark & Boswell, 1991). Quinlan (1993) introduced techniques to transform an induced decision tree into a set of production rules.
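To illustrate how each root-to-leaf path becomes a conjunctive rule, here is a small sketch over a hypothetical nested-dict tree. The tree layout and the `outlook`/`humidity` attributes are illustrative assumptions, not taken from the source. Because every instance follows exactly one path, the resulting rules are mutually exclusive.

```python
from typing import Union

# Hypothetical nested-dict tree: an internal node tests one attribute and maps
# each attribute value to a subtree; a leaf is just a class label string.
Tree = Union[str, dict]


def tree_to_rules(tree: Tree, path: tuple = ()) -> list:
    """Return one (conjunction_of_tests, class_label) pair per root-to-leaf path."""
    if isinstance(tree, str):  # terminal node: the accumulated tests form one rule
        return [(path, tree)]
    rules = []
    for value, subtree in tree["branches"].items():
        test = (tree["attr"], value)  # branch test "attr = value"
        rules.extend(tree_to_rules(subtree, path + (test,)))
    return rules


example = {
    "attr": "outlook",
    "branches": {
        "sunny": {"attr": "humidity", "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
    },
}

for conds, label in tree_to_rules(example):
    print(" AND ".join(f"{a}={v}" for a, v in conds), "->", label)
# outlook=sunny AND humidity=high -> no
# outlook=sunny AND humidity=normal -> yes
# outlook=overcast -> yes
```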

Michalski et al. (1986) proposed the AQ15 algorithm to generate a disjunctive set of classification rules. The CN2 rule induction algorithm also uses a modified AQ algorithm involving a top-down beam search procedure (Clark & Niblett, 1989). It adopts entropy as its search heuristic and is only able to generate an ordered list of rules. The Basic Exclusion Algorithm (BEXA) is another rule induction method, proposed by Theron & Cloete (1996). It follows a general-to-specific search procedure in which disjunctive conjunctions are allowed. Every conjunction is evaluated using the Laplace error estimate. More recently, Witten and Frank (1999) described covering algorithms for discovering rule sets in conjunctive form.

GAs have been successfully applied to rule discovery in the data mining literature. Some techniques use one-rule-per-individual encoding, as proposed by Greene & Smith (1993) and Noda et al. (1999). In the one-rule-per-individual approach, a chromosome is usually a linear string of rule conditions, where each condition is often an attribute-value pair, representing a rule or a rule set. Although this encoding is simpler and syntactically shorter, the fitness of a single rule is not necessarily the best indicator of the quality of the discovered rule set. The several-rules-per-individual approach (De Jong et al., 1993; Janikow, 1993) has the advantage of evaluating the rule set as a whole, taking rule interactions into account. However, this approach makes the chromosome encoding more complicated and syntactically longer, which usually requires more complex genetic operators.

Hu (1998) proposed a Genetic Programming (GP) approach in which a program can be represented by a tree with rule conditions and/or attribute values in the leaf nodes and functions in the internal nodes. The challenge is that such a tree can grow in size and change shape very dynamically, so an efficient tree-pruning algorithm is required to prune unsatisfactory parts of a tree and avoid infeasible solutions. Bojarczuk et al. (2001) proposed a constrained-syntax GP approach to build a decision model, with particular emphasis on discovering comprehensible knowledge. The constrained-syntax mechanism verifies the relationship between operators and the data types of their operands during tree building.

To discover high-level prediction rules, Freitas (1999) applied first-order relationships such as “Salary > Age” in GA-Nuggets by checking an Attribute Compatibility Table (ACT) during the discovery process. ACT was claimed to be particularly effective for its knowledge representation capability. By extending the use of ACT, our proposed approach allows other complicated attribute relationships, such as linear or nonlinear quantitative relationships among multiple attributes. This mechanism is intended to help reduce the search space during the GA’s evolution process.

2.2. Classification Trees

Among data mining techniques, the decision tree is one of the most commonly used methods for knowledge discovery. A decision tree discovers rules and relationships by systematically breaking down and subdividing the information contained in data (Chou, 1991). Decision trees are easy to understand and have a simple top-down structure in which a decision is made at each node. The nodes at the bottom of the resulting tree provide the final outcome, which is either a discrete or a continuous value. When the outcome is discrete, a classification tree is developed (Hunt, 1993); when the outcome is numerical and continuous, a regression tree is developed (Bala & De Jong, 1996).

Classification is a critical type of prediction problem. Classification aims to examine the features of a newly presented object and assign it to one of a predefined set of classes (Michael & Gordon, 1997). Classification trees are used to predict the membership of cases or objects in the classes of a categorical dependent variable from their measurements on one or more predictor variables. At each node, a classification tree induction algorithm selects the attribute and split point according to a criterion such as Quinlan’s information gain (ID3) or gain ratio (C4.5), or QUEST’s statistics-based approach (Quinlan, 1986; 1992; Loh & Shih, 1997).
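The two entropy-based split criteria named above can be sketched as follows for a single categorical attribute. This is a minimal illustration of the standard definitions (gain ratio divides information gain by the entropy of the partition sizes); QUEST’s statistical test is not covered, and the function names are our own.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain_and_ratio(values, labels):
    """Information gain (ID3) and gain ratio (C4.5) of splitting on one categorical attribute."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition sizes themselves
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# A perfectly informative attribute: each value maps to a single class.
print(info_gain_and_ratio(["a", "a", "b", "b"], ["y", "y", "n", "n"]))  # (1.0, 1.0)
```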

2.3. Association Rules Algorithms for Attributes Selection

Many algorithms can be used to discover association rules from data in order to identify patterns of behavior. One of the most widely used and best known is the Apriori algorithm, described in detail in Agrawal et al. (1993) and Agrawal & Srikant (1994). For instance, an association rule algorithm can produce a rule such as: “When people buy Bankers Trust, they also buy Dow Chemical 20 percent of the time.”

Given minimum support and confidence levels, an association rule algorithm can quickly produce rules from a data set by discovering so-called frequent itemsets. Each rule has two measures, support and confidence. Support (or prevalence) measures how often items occur together, as a percentage of all records. Confidence (or predictability) measures how strongly one item depends on another; for a rule X -> Y, it is the fraction of the records containing X that also contain Y.
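A minimal sketch of the two measures, using the standard definitions supp(X) = fraction of transactions containing X and conf(X -> Y) = supp(X and Y) / supp(X). The baskets below are hypothetical, chosen so that the Bankers Trust example yields a confidence of 20 percent.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """conf(X -> Y) = supp(X and Y together) / supp(X)."""
    s_x = support(transactions, antecedent)
    if s_x == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / s_x


# Hypothetical baskets: Bankers Trust appears in 5, with Dow Chemical in 1 of them.
baskets = [
    {"BankersTrust", "DowChemical"},
    {"BankersTrust"},
    {"BankersTrust", "IBM"},
    {"BankersTrust"},
    {"BankersTrust"},
    {"DowChemical"},
]
print(round(support(baskets, {"BankersTrust", "DowChemical"}), 3))  # 0.167
print(round(confidence(baskets, {"BankersTrust"}, {"DowChemical"}), 3))  # 0.2
```

Apriori’s contribution is to find all itemsets above the minimum support efficiently; the measures themselves are just these two ratios.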

Because association rule algorithms derive associations among data items efficiently, recent data mining research on classification tree construction has adopted this mechanism for knowledge preprocessing. For example, the Apriori algorithm was applied to produce association rules that were converted into the initial population of a GP (Niimi & Tazaki, 2000; Niimi & Tazaki, 2001). Improved learning efficiency was demonstrated for this method compared with a GP that did not use association rule algorithms. However, the handling of multivariate classification problems and improvements in learning accuracy were not addressed in these studies.

3. The Hybrid of Association Rule Algorithms and Genetic Algorithms (AGA)

The proposed AGA approach consists of three modules, shown in Fig. 1: Association Rule Mining, Tree Initialization, and Classification Tree Mining.


Fig. 1. The Conceptual Framework of AGA. [Diagram: Data Set → Association Rule Mining (Apriori algorithm on categorical attributes, producing rules A1->C1, A2->C2, …, An->Ck) → Tree Initialization (rules combined with continuous attributes) → Classification Tree Mining (search for the optimal tree with GA).]

The association rule mining module generates association rules with the Apriori algorithm. In this research, the items used to construct the association rules are derived from the categorical attributes. An association rule here is an implication of the form X -> Y, where X is a conjunction of conditions and Y is a class. The rule X -> Y must satisfy user-specified minimum support and minimum confidence levels. The tree initialization module constructs potential candidate classification trees for better predictive accuracy. Figure 2 illustrates a classification tree with n nodes. Each node involves two types of attributes, categorical and continuous: the categorical attributes supply part of the conjunction of conditions, and the continuous attributes supply the rest in the form of inequalities. The formal form of a tree node is

A and B -> C

where A denotes the antecedent of the association rule; B is the conjunction of inequalities in which the continuous attributes, relational operators, and splitting values are determined by the GA; and C is the classification result obtained directly from the association rule.

Fig. 2. The Illustration for the Classification Tree

For example, suppose x1, x2, and x3 are categorical attributes and x4 and x5 are continuous attributes. Assume that the association rule “x1=5 and x3=3 -> Ck” is selected for a tree node, which is then specified as follows.

Nodei: IF x1=5 and x3=3 and x4<=10 and x5>1 Then Class Ck;
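One way to hold a node of the form “A and B -> C” in code is sketched below. The `TreeNode` class, its field names, and the dict-based record format are illustrative assumptions; the source does not specify a data structure.

```python
import operator
from dataclasses import dataclass

# Relational operators the GA may choose for a continuous attribute (Step (b)).
OPS = {"<=": operator.le, ">": operator.gt}


@dataclass
class TreeNode:
    antecedent: dict    # "A": categorical conditions copied from an association rule
    inequalities: list  # "B": (continuous attribute, operator, split value) triples
    label: str          # "C": class taken from the rule's consequent

    def matches(self, record: dict) -> bool:
        """True if `record` satisfies the conjunction A and B."""
        if any(record.get(attr) != value for attr, value in self.antecedent.items()):
            return False
        return all(OPS[op](record[attr], split) for attr, op, split in self.inequalities)


# Node_i from the example: IF x1=5 and x3=3 and x4<=10 and x5>1 Then Class Ck
node_i = TreeNode({"x1": 5, "x3": 3}, [("x4", "<=", 10), ("x5", ">", 1)], "Ck")
print(node_i.matches({"x1": 5, "x2": 2, "x3": 3, "x4": 8.0, "x5": 2.5}))   # True
print(node_i.matches({"x1": 5, "x2": 2, "x3": 3, "x4": 12.0, "x5": 2.5}))  # False
```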

The execution steps of the classification tree initialization module are as follows.

Step (a): For each node, the antecedent condition is taken from one association rule randomly selected from the entire set of association rules generated in advance. That is, the “A” part of the tree node’s formal form is determined.

Step (b): By applying the GA, the relational operator (<= or >) and the splitting point are determined for each continuous attribute. Therefore, the “B” part of the tree node’s formal form is determined.

Step (c): The classification result of a tree node comes directly from the consequent of the selected association rule. A tree of n nodes is generated by repeating Steps (a) and (b) n times, where n is determined automatically by the GA.
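Steps (a)-(c) can be sketched as a random initializer. This is a hypothetical sketch: drawing initial split points uniformly from each continuous attribute’s observed range and drawing n between 1 and 5 are our assumptions, not details given in the source; the population size of 100 follows Table 1.

```python
import random


def init_tree(rules, continuous_ranges, n_nodes, rng):
    """Build one candidate tree of `n_nodes` nodes following Steps (a)-(c).

    rules:             list of (antecedent_dict, class_label) pairs mined in advance
    continuous_ranges: {attribute: (low, high)} used to draw initial split points
    """
    tree = []
    for _ in range(n_nodes):
        antecedent, label = rng.choice(rules)  # Step (a): random association rule
        inequalities = [                       # Step (b): random operator + split point
            (attr, rng.choice(["<=", ">"]), rng.uniform(low, high))
            for attr, (low, high) in continuous_ranges.items()
        ]
        tree.append((antecedent, inequalities, label))  # Step (c): class from the rule
    return tree


# Hypothetical mined rules and attribute ranges.
rules = [({"x1": 5, "x3": 3}, "Ck"), ({"x2": 1}, "Cj")]
rng = random.Random(42)
population = [init_tree(rules, {"x4": (0, 20), "x5": (0, 5)}, rng.randint(1, 5), rng)
              for _ in range(100)]
```

Seeding every node from a mined rule is what distinguishes this initializer from a simple GA, which would draw the categorical conditions at random as well.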

The classification tree mining module applies the GA to search for a superior classification tree among the potential candidates generated by the tree initialization module. Fig. 3 shows the proposed AGA approach for classification tree mining. Each step is detailed as follows.

Fig. 3. The Conceptual Diagram of the Proposed Classification Tree Mining.

Step (a): Chromosome Encoding -- The tree nodes can be represented in the form “Node1, Node2, …, Noden”, where n is the total number of tree nodes and is determined automatically by the GA.

Step (b): The GA Initialization -- The potential chromosomes are generated at the start; that is, the initial population is obtained from the tree initialization module.

Step (c): Fitness Evaluation -- Calculate the fitness value of each chromosome in the current population. The fitness function is defined as the total number of misclassifications, which the GA seeks to minimize.

Step (d): Stop Condition Met -- If the specified stopping condition is satisfied, then the entire process is terminated, and the optimal classification tree is confirmed; otherwise, the GA operations are continued.

Step (e): GA Operators -- Each GA operation consists of chromosome selection, crossover, and mutation, producing the offspring generation according to the GA parameter settings.

Step (f): Chromosome Decoding -- The best chromosome is then decoded into the optimal classification tree.
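The mining loop in Steps (a)-(f) can be sketched as follows. This is a hypothetical sketch, not the authors’ implementation: the first-matching-node decision order, the 1 / (1 + errors) mapping that turns the misclassification count into a roulette-wheel weight, the cut-point crossover, the operator-flip/Gaussian-nudge mutation, and a fixed generation count as the stop condition are all assumptions. The default crossover and mutation rates follow Table 1.

```python
import operator
import random

OPS = {"<=": operator.le, ">": operator.gt}

# A tree is a list of nodes; a node is (antecedent_dict, [(attr, op, split)], label).


def predict(tree, record, default=None):
    """Assumption: nodes are tried in order and the first matching node decides."""
    for antecedent, inequalities, label in tree:
        if all(record.get(a) == v for a, v in antecedent.items()) and all(
            OPS[op](record[a], t) for a, op, t in inequalities
        ):
            return label
    return default  # no node matched: counted as a misclassification below


def misclassifications(tree, data):
    """Step (c) fitness: number of misclassified (record, label) pairs; lower is better."""
    return sum(predict(tree, x) != y for x, y in data)


def roulette_select(population, errors, rng):
    """Roulette wheel selection; fewer errors means a larger slice of the wheel."""
    weights = [1.0 / (1 + e) for e in errors]
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(p1, p2, rng, rate=0.6):
    """One-point crossover on node lists; independent cut points let n vary."""
    if rng.random() >= rate or len(p1) < 2 or len(p2) < 2:
        return list(p1), list(p2)
    c1, c2 = rng.randrange(1, len(p1)), rng.randrange(1, len(p2))
    return p1[:c1] + p2[c2:], p2[:c2] + p1[c1:]


def mutate(tree, rng, rate=0.01, scale=1.0):
    """With probability `rate` per inequality, flip its operator or nudge its split."""
    mutated = []
    for antecedent, inequalities, label in tree:
        new_ineq = []
        for a, op, t in inequalities:
            if rng.random() < rate:
                if rng.random() < 0.5:
                    op = ">" if op == "<=" else "<="
                else:
                    t = t + rng.gauss(0.0, scale)
            new_ineq.append((a, op, t))
        mutated.append((antecedent, new_ineq, label))
    return mutated


def evolve(population, data, generations, rng):
    """Steps (c)-(f): evaluate, select, recombine, mutate; return the best tree."""
    for _ in range(generations):
        errors = [misclassifications(t, data) for t in population]
        offspring = []
        while len(offspring) < len(population):
            a = roulette_select(population, errors, rng)
            b = roulette_select(population, errors, rng)
            for child in crossover(a, b, rng):
                offspring.append(mutate(child, rng))
        population = offspring[: len(population)]
    return min(population, key=lambda t: misclassifications(t, data))
```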


4. The Experiments and Results

Before mining the student learning performance data set, the House-Votes data set from the UCI repository (Blake & Merz, 1998) was used to validate the proposed approach. This data set was derived from the 1984 United States Congressional Voting Records database. For comparison, a simple GA (SGA) was applied to both data sets.

Generally, the association rules extracted by the Apriori algorithm vary with the specified support and confidence values, and different rule sets may affect AGA’s learning performance differently. Therefore, this research experimented with different sets of minimum support and confidence values for both the House-Votes and student learning performance prediction problems. The classification trees generated by the SGA and AGA approaches were evaluated with five-fold cross validation; that is, each training stage used 4/5 of the data records, and the remaining 1/5 were used for testing. The GA parameter settings for both applications are summarized in Table 1.
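The five-fold evaluation described above can be sketched as below. The source does not say whether folds were shuffled or stratified by class; this sketch assumes a plain random shuffle with a fixed seed, so it is one possible realization rather than the exact protocol.

```python
import random


def five_fold_splits(records, k=5, seed=0):
    """Yield (train, test) pairs: each fold tests on ~1/5 and trains on the other ~4/5."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = [records[j] for j in folds[i]]
        train = [records[j] for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# With the 435 House-Votes records, every fold has 348 training and 87 test records.
for train, test in five_fold_splits(list(range(435))):
    print(len(train), len(test))
```

Reported accuracies are then averages of the five test-fold results.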

Table 1. The GA Parameter Settings

Item | Value
Population size | 100
Generations | 100/200/300/400/500
Crossover rate | 0.6
Mutation rate | 0.01
Selection method | Roulette wheel
Training time (House-Votes) | 1.5 minutes*
Training time (Student Learning Performance) | 0.8 minutes*

* The hardware platform is a Pentium III 1.0 GHz with 256 MB RAM.

The Application of House-Votes Data Set

The 435 collected House-Votes records consist of 16 categorical input attributes whose values are ‘Y’, ‘N’, or ‘?’. Instead of excluding records containing ‘?’ values from training, this research denotes the ‘?’ values by ‘Others’. The output is a categorical attribute with two classes (Democrat, Republican); 267 records are Democrats and 168 are Republicans. The 16 categorical attributes were fed into the Apriori algorithm to produce the association rules. The best AGA learning performance was obtained after several trials with different minimum support and confidence values. The training and testing results are summarized in Table 2 and depicted in Figs. 4 and 5. These results are based on a minimum support value of 20 and a minimum confidence value of 100. The derived association rule sets consist of 1743 rules for the ‘Democrat’ output category and 2215 rules for the ‘Republican’ output category. To examine the learning progress of the two approaches in more detail, their learning behavior was recorded during the runs. Figs. 6 and 7 depict the learning progress monitored over generations and over time, respectively. Table 3 presents the detailed tree-node notation for one of the better classification trees derived. For this classification tree, the accuracy rates in the training and testing stages reach 98.28% and 100.00%, respectively.
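To make the preprocessing concrete, here is a hypothetical sketch of turning one House-Votes record into an Apriori transaction. Representing items as “attribute=value” strings and adding the class as its own item (so that rules of the form X -> class can be mined) are our assumptions; the ‘?’ to ‘Others’ replacement follows the text.

```python
def to_transaction(record, class_attr="class"):
    """Turn one House-Votes-style record into a set of "attribute=value" items.

    A missing vote ('?') becomes the value 'Others' instead of causing the record
    to be dropped. The class is added as its own item so that rules of the form
    X -> class can be mined.
    """
    items = {
        f"{attr}={'Others' if value == '?' else value}"
        for attr, value in record.items()
        if attr != class_attr
    }
    items.add(f"{class_attr}={record[class_attr]}")
    return items


record = {"handicapped-infants": "N", "water-project-cost-sharing": "?", "class": "Republican"}
print(sorted(to_transaction(record)))
# ['class=Republican', 'handicapped-infants=N', 'water-project-cost-sharing=Others']
```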

Table 2. The 5-Fold Training/Testing Results for SGA and AGA (House-Votes Data)

Gen. | SGA Train | SGA Test | AGA Train | AGA Test
100 | 95.98% | 94.25% | 97.93% | 95.63%
200 | 97.36% | 94.71% | 98.28% | 95.17%
300 | 97.82% | 94.71% | 98.51% | 95.40%
400 | 97.87% | 94.71% | 98.56% | 95.40%
500 | 97.99% | 94.71% | 98.62% | 95.40%

Fig. 4. Training Results with Various Generations

Fig. 5. Testing Results with Various Generations

Fig. 6. The Learning Progress over Generations (based on 5-fold average)

Fig. 7. The Learning Progress over Time (based on 5-fold average)

Table 3. The Notation for Each Tree Node

The Application of Student Learning Performance Data Set

The Data Description

To better monitor students’ learning performance, a university in Taiwan designed a pre-warning system that requires each course instructor to enter each student’s up-to-date learning performance grade one week after the mid-term exam. The mid-term rating has three levels, ‘A’, ‘B’, and ‘C’, where ‘A’ means EXCELLENT, ‘B’ means O.K., and ‘C’ means POOR. Generally, the more ‘C’s a student receives, the higher the probability that the student will fail the course. However, relying purely on this mid-term grading information is not sufficient to determine whether the student will pass the course at the end of the semester. Therefore, other supporting information was collected, such as the course difficulty, the instructor’s grading track record, and the student profile information. Each data record contains 5 categorical and 2 continuous input attributes. The output is one categorical attribute, the class ‘pass’ or ‘fail’. The notation for the variables used in this prediction model is specified in Table 4. A total of 410 records (146 passed and 264 failed) from 48 freshman and sophomore Engineering School students who had dropped out were collected for analysis.

Table 4. The Variables Used in the Model

Variable | Description | Data Type
X1 | Department Code (5 categories) | categorical
X2 | Gender (Male/Female) | categorical
X3 | Mid-Term Rating (3 levels) | categorical
X4 | Course Credits (1-3 credits) | categorical
X5 | Course Type (Required/Optional) | categorical
X6 | Course Difficulty | continuous
X7 | Instructor’s Track Record of Flunk Ratio | continuous

The best AGA learning performance was obtained after several trials with different minimum support and confidence values. The training and testing performance are summarized in Table 5 and depicted in Figs. 8 and 9. These results are based on a minimum support value of 5 and a minimum confidence value of 100. The derived association rule sets consist of 100 rules for the “Pass” output category and 58 rules for the “Fail” output category. To examine the learning progress of the two approaches in more detail, their learning behavior was recorded during the runs. Figs. 10 and 11 depict the learning progress monitored over generations and over time, respectively. Table 6 presents the detailed tree-node notation for one of the superior classification trees derived. For this classification tree, the accuracy rates in the training and testing stages reach 79.27% and 80.49%, respectively.

Table 5. The 5-Fold Training/Testing Results for SGA and AGA (Student Learning Performance data)

Gen. | SGA Train | SGA Test | AGA Train | AGA Test
100 | 76.34% | 69.76% | 76.89% | 73.17%
200 | 76.59% | 69.76% | 77.68% | 73.17%
300 | 77.20% | 69.76% | 78.29% | 73.17%
400 | 77.32% | 69.51% | 78.78% | 73.41%
500 | 77.56% | 70.00% | 79.15% | 73.66%

5. Discussion

According to the results above, AGA achieves better learning performance than SGA in terms of both computational efficiency and accuracy. Through the association rule process, partial knowledge is extracted and transformed into seeding chromosomes. According to Fig. 6, the initial average numbers of errors for SGA and AGA are 230 and 50, respectively, on the House-Votes data. In Fig. 10, the initial average numbers of errors for SGA and AGA are 127 and 113, respectively, on the student learning performance data. This improvement in initial learning performance can be attributed to the derived association rules, which are transformed into the GA’s seeding chromosomes.

According to Fig. 6, in the training stage SGA needs 500 generations to reach the performance that AGA reaches in only 40 generations. This outcome can be attributed to the adoption of the Apriori algorithm, which substantially reduces the GA search space. AGA also consistently outperforms SGA over generations. In terms of computation time, AGA takes 0.19 minutes to reach a learning performance that takes SGA at least 1.5 minutes.

As shown in Fig. 10, in the training stage SGA needs 500 generations to reach the performance that AGA reaches in 100 generations. AGA also consistently outperforms SGA over generations. In terms of computation time, AGA takes 0.17 minutes to reach a learning performance that takes SGA 0.80 minutes.

6. Conclusions and Future Development

We have introduced the AGA approach, which hybridizes the Apriori algorithm and the genetic algorithm for classification tree induction. Incorporating associated knowledge related to the classification results is crucial for improving evolutionary-based mining tasks. By employing the association rule algorithm to acquire partial knowledge from data, our proposed approach can induce a classification tree more effectively and efficiently by converting the derived association rules into the GA’s seeding chromosomes.

Compared with SGA, AGA achieved higher predictive accuracy and required less computation time for classification tree induction in experiments on a UCI benchmark data set as well as the student learning performance data set.

Predicting a student’s course performance from student/course profile data together with mid-term rating information is a novel way to help both students and the university learn about a student’s likely course performance before it is too late to recover. According to the experimental results, AGA appears to be a feasible way to provide an acceptable solution for predicting a student’s course performance, with nearly 80% classification accuracy for the best tree derived (the average five-fold testing accuracy was 73.66%).

In addition, the classification trees discovered by AGA not only achieve higher predictive accuracy and computational efficiency but may also yield knowledge that is more transparent or significant to users.

The proposed AGA is generic and problem-independent. Besides integrating association rule algorithms for knowledge preprocessing, AGA can flexibly incorporate user information or domain knowledge into the tree induction process through the expressive power of first-order logic. The proposed approach is applicable not only to binary classification problems but also to multi-category classification problems.

Currently, the AGA approach can reveal tree splitting nodes that may allow complex, rule-set-like discriminating formats such as “Attribute_i <= w * Attribute_j”; such relationships could be extended in the future to express more complicated multivariate inequalities in either linear or nonlinear form.

Mining Three-Dimensional Anthropometric Body Surface Scanning Data for Hypertension Detection: An Evolutionary-Based Classification Approach

Abstract

Hypertension is a major disease and one of the top ten causes of death in Taiwan. Exploring 3D anthropometric data together with existing subject medical profiles using data mining techniques has become an important research issue for medical decision support. This research attempts to construct a prediction model for hypertension using anthropometric body surface scanning data. It adopts classification trees to reveal the correlation between a subject’s three-dimensional (3D) scanning data and hypertension, using a hybrid of the association rule algorithm and genetic algorithms (GAs). The association rule algorithm is adopted to obtain useful clues with which the GA can carry out its search more efficiently. The proposed approach was tested and compared with a regular genetic algorithm in predicting a subject’s hypertension. Better computational efficiency and more accurate prediction results from the proposed approach are demonstrated.

Keywords: Hypertension; Anthropometric Data; Genetic Algorithms; Association Rule; Classification Trees

1. Introduction

Hypertension is a major disease and one of the top ten causes of death in Taiwan. It can also lead to other major causes of death in Taiwan, such as cardiovascular diseases, and is considered a factor in Syndrome X, which has been investigated for years in epidemiologic studies (Srinivasan, 1993; Mykkanen et al., 1997; Chen et al., 2000; Jeppesen et al., 2000). Although early identification of this disease is receiving increasing attention in clinical research, investigating factors for prevention and intervention is also crucial for preventive medicine. Modifiable factors for reducing the risk of the disease, such as life-style variables and body measurements, are of particular interest to public health professionals.

Owing to the recent development of a new 3D scanning technology, which has many advantages over the old system of anthropometric measurement (Coombes et al., 1991; Jones et al., 1994) using tape measures, anthropometers (special measuring rulers) (Kroemer, 1989), and other similar instruments, the Department of Health Management of Chang Gung Medical Center in Taiwan is able to collect 3D anthropometric body surface scanning data easily and accurately (Lin et al., 2002). Unlike other anthropometric databases that contain only geometric and demographic data on human subjects, each scanned body data set in the Chang Gung Medical Center database is linked to the subject’s health record and clinical record through the medical center computer.

Therefore, exploring these 3D data together with existing subject medical profiles using data mining techniques has become an important research issue for medical decision support (Jones & Rioux, 1997; Meaney & Farrer, 1986). Traditional approaches usually applied statistical techniques to determine the correlation between a target disease and the corresponding anthropometric data.

Tree induction is one of the most common methods of knowledge discovery. It is a method for discovering a tree-like pattern that can be used for classification or estimation. Some of the often-mentioned tree induction methods, such as C4.5 (Quinlan 1993), CART (Breiman et al., 1984), and QUEST (Loh & Shih, 1997), are not evolutionary-based approaches. An ideal tree induction technique has to carefully address aspects such as model comprehensibility and interestingness, attribute selection, and learning efficiency and effectiveness. The genetic algorithm (GA), one of the frequently used evolutionary computation techniques, has attracted increasing attention for its superior flexibility and expressiveness of problem representation as well as its fast search capability for knowledge discovery. In the past, genetic algorithms were mostly employed to enhance the learning process of data mining algorithms such as neural nets or fuzzy expert systems, rather than to discover models or patterns directly. That is, genetic algorithms act as a method for performing a guided search for good models in the solution space. While genetic algorithms are an interesting approach for discovering hidden valuable knowledge, they have to cope with computational efficiency when handling large volumes of data.

Generally, tree induction methods are used to automatically produce rule sets that predict the expected outcomes as accurately as possible. However, the emphasis on revealing novel or interesting knowledge has recently become a research issue in data mining. Such efforts may impose additional tree discovery constraints and thereby add computational overhead. In regular GA operations, constraint validation is performed after a candidate chromosome is produced, so several iterations may be required to obtain a valid chromosome (i.e., pattern). One way to reduce this computational load is to obtain associated information, such as relevant attributes or attribute values, before the initial chromosomes are generated, thereby making the evolution more efficient and effective. Potentially, this can be done by applying association rule algorithms to find clues related to the classification results.

This research proposes a novel approach that integrates an association rule algorithm with a GA to discover classification trees. An association rule algorithm, also known as market basket analysis, is used for attribute selection, so that the related input variables can be determined before the GA’s evolution proceeds.

The Apriori algorithm is a popular association rule technique for discovering attribute relationships, which are then converted into the initial GA population. The proposed method attempts to enhance the GA’s search performance by gaining important clues leading to the final patterns. A prototype system based on the AGA (Association-based GA) approach was developed to predict hypertension. The application data, consisting of 3D anthropometric data and related profile information, were obtained from Chang Gung Memorial Hospital, Taiwan.

In addition, for comparison with a commonly used classification tree technique, C4.5 is applied to the same data sets.

The remainder of this paper is organized as follows. In Section 2, previous research works and related techniques are reviewed. The detailed AGA procedures are then introduced in Section 3. Section 4 presents the experiments and results with medical data sets followed by discussion and conclusions.

2. The Background

2.1. Instruments and Procedures

The Chang Gung Whole Body 3D Laser Scanner scans a cylindrical volume 1.9 meters high and 1.0 meter in diameter. These dimensions accommodate the vast majority of human subjects. A platform structure supports the subject and provides alignment for the towers. The system is built to withstand shipping and repeated use without alignment or adjustment. The standard scanning apparel included light gray cotton biker shorts for both men and women and a gray sports bra for women. Latex caps were used to cover the hair on subjects’ heads. Each subject was measured in three different scanning postures. Automatic landmark recognition (ALR) technology was used to automatically extract anatomical landmarks from the 3D body scan data. The landmarks were then placed on the subject. More than 30 of the resulting measurements were new anthropometric factors that traditional measurements did not provide. However, not all the collected anthropometric factors were adopted in this research. The factors included are body mass index (BMI), left arm volume (LAV), trunk surface area (TSA), weight (W), waist circumference (WC), waist-hip ratio (WHR), and waist width (WW). Other factors and the corresponding illustrations are detailed in Table 4. The scanner scans the human body from head to feet with laser light in the cohorizontal plane around the whole body. The computer processes 3D data at a rate of 60 laser beams per second. For a body height of 180 cm, when the Gemini 10085 scanner is set to a 4-mm vertical scanning resolution, it takes 7.5 s to scan the whole body; at a vertical scanning resolution of 2.5 mm, it takes 12 s. Subjects were also asked to provide demographic data such as age, ethnic group, sex, area of residence, education level, present occupation, and family income. Related hospital health records and clinical records were obtained for each subject, if available.
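The two scan times quoted for the Gemini 10085 scanner are consistent with a simple calculation, sketched below under our own assumption (not stated by the source) that each of the 60 laser beams per second yields one horizontal profile and that one profile is taken per vertical resolution step: 1800 mm / 4 mm = 450 profiles, or 7.5 s, and 1800 mm / 2.5 mm = 720 profiles, or 12 s.

```python
def scan_time_seconds(body_height_mm, vertical_resolution_mm, profiles_per_second=60):
    """Whole-body scan time, assuming one horizontal laser profile per vertical step."""
    n_profiles = body_height_mm / vertical_resolution_mm
    return n_profiles / profiles_per_second


print(scan_time_seconds(1800, 4))    # 7.5
print(scan_time_seconds(1800, 2.5))  # 12.0
```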

The 3D laser scanning system is based on optical triangulation of reflected photo profiles of an incident, cross-sectional plane of laser light that travels around the segment. The profile is collected by a digitizer camera and used to characterize the 3D spatial surface geometry. The markers developed for use with the ALR algorithm are flat, adhesive-backed disks with a concentric circular, high-contrast pattern consisting of a 6-mm diameter black center circle and a 12-mm diameter outer white annulus.

Anthropometric measurements were performed to determine BMI and WHR. Data were coded by computer. The results were correlated with data on blood pressure, blood glucose, lipid, and uric acid levels. The health index (HI) was determined using the following equation:

$$\mathrm{HI} = \frac{\text{body weight} \times 2 \times \text{waist profile area}}{\text{body height}^{2} \times \left(\text{breast profile area} + \text{hip profile area}\right)}$$
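The health index can be computed directly once the profile areas are known. The sketch below assumes that the operator lost between “breast profile area” and “hip profile area” in the extracted equation is an addition, and it leaves units to the caller because the source does not state them; `health_index` is a hypothetical name.

```python
def health_index(weight: float, height: float, waist_area: float,
                 breast_area: float, hip_area: float) -> float:
    """HI = (2 * weight * waist_area) / (height**2 * (breast_area + hip_area)).

    ASSUMPTION: the denominator sums the breast and hip profile areas (the
    operator was lost in extraction). Units are whatever the caller supplies.
    """
    denominator = height ** 2 * (breast_area + hip_area)
    if denominator <= 0:
        raise ValueError("height and profile areas must be positive")
    return 2 * weight * waist_area / denominator


print(health_index(weight=50, height=2, waist_area=3, breast_area=1, hip_area=2))  # 25.0
```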



Current rule induction systems typically fall into two categories: “divide and conquer” (Quinlan 1993) and “separate and conquer” (Clark & Niblett, 1989). The former recursively partitions the instance space until regions of roughly uniform class membership are obtained. The latter induces one rule at a time and separates out the instances it covers before inducing the next rule. Rule induction methods may also be categorized as either tree-based or non-tree-based (Abdullah, 1999). Frequently mentioned decision tree induction methods include the C4.5 (Quinlan 1993), CART (Breiman et al., 1984) and GOTA (Hartmann et al., 1982) algorithms. Both decision trees and rules can be described as disjunctive normal form (DNF) models. Decision trees are generated from data in a top-down, general-to-specific direction (Chidanand & Sholom, 1997). Each path to a terminal node is represented as a rule consisting of a conjunction of tests on the path’s internal nodes. These rules are mutually exclusive (Clark & Boswell, 1991). Quinlan (1993) introduced techniques to transform an induced decision tree into a set of production rules.
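Reading each root-to-leaf path as a conjunction is what makes a decision tree a DNF model, and it also shows why the resulting rules are mutually exclusive: any two paths disagree on at least one test. A minimal sketch, using a hypothetical toy tree rather than one induced in this study:

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    test: str            # condition checked at this internal node
    if_true: "Tree"
    if_false: "Tree"


Tree = Union[Leaf, Node]
Rule = Tuple[List[str], str]


def tree_to_rules(tree: Tree, path: Tuple[str, ...] = ()) -> List[Rule]:
    """Turn each root-to-leaf path into one conjunctive rule (DNF overall)."""
    if isinstance(tree, Leaf):
        return [(list(path), tree.label)]
    return (tree_to_rules(tree.if_true, path + (tree.test,))
            + tree_to_rules(tree.if_false, path + (f"NOT ({tree.test})",)))


# Hypothetical two-level tree, not one induced in this study.
toy = Node("FHHT = Yes", Leaf("Abnormal"),
           Node("BMI > 28.9", Leaf("Abnormal"), Leaf("Normal")))
for conditions, label in tree_to_rules(toy):
    print("IF " + " AND ".join(conditions) + " THEN " + label)
```

This prints three rules, one per leaf; the second and third share the prefix `NOT (FHHT = Yes)` and differ only in the BMI test, which is exactly what keeps them from overlapping.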

GAs have been successfully applied to rule discovery in the data mining literature. Some techniques use the Michigan approach (one-rule-per-individual encoding), proposed by Greene & Smith (1993) and Noda et al. (1999). In the Michigan approach, a chromosome is usually a linear string of rule conditions, each often an attribute-value pair, that represents a rule. Although this encoding is simpler and syntactically shorter, the fitness of a single rule is not necessarily the best indicator of the quality of the discovered rule set. The Pittsburgh approach (several-rules-per-individual encoding) (De Jong et al., 1993; Janikow, 1993) has the advantage of evaluating the rule set as a whole, taking rule interactions into account. However, this approach makes the chromosome encoding more complicated and syntactically longer, which usually requires more complex genetic operators.
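The practical difference between the two encodings shows up mainly in how fitness is computed. In the sketch below, the toy data and the particular fitness measures (precision for a single rule, accuracy for a rule set) are assumptions chosen for illustration: a Michigan individual is one rule scored on its own, while a Pittsburgh individual is an ordered rule set scored as a whole.

```python
from typing import Dict, List, Tuple

Condition = Tuple[str, str]          # attribute-value pair
Rule = Tuple[List[Condition], str]   # (conjunction of conditions, predicted class)
Record = Dict[str, str]


def matches(conditions: List[Condition], record: Record) -> bool:
    return all(record.get(attr) == value for attr, value in conditions)


def michigan_fitness(rule: Rule, data: List[Tuple[Record, str]]) -> float:
    """One rule per individual: here, the rule's precision on the records it covers."""
    conditions, predicted = rule
    covered = [label for record, label in data if matches(conditions, record)]
    return sum(label == predicted for label in covered) / len(covered) if covered else 0.0


def pittsburgh_fitness(rules: List[Rule], data: List[Tuple[Record, str]],
                       default: str) -> float:
    """Several rules per individual: accuracy of the ordered rule set as a whole."""
    correct = 0
    for record, label in data:
        predicted = next((cls for conds, cls in rules if matches(conds, record)), default)
        correct += predicted == label
    return correct / len(data)


# Toy records, not study data.
data = [({"sex": "F", "fh": "yes"}, "abn"), ({"sex": "F", "fh": "no"}, "norm"),
        ({"sex": "M", "fh": "yes"}, "abn"), ({"sex": "M", "fh": "no"}, "norm")]
r1 = ([("fh", "yes")], "abn")
r2 = ([("sex", "F")], "abn")
print(michigan_fitness(r1, data), michigan_fitness(r2, data))  # 1.0 0.5
print(pittsburgh_fitness([r2, r1], data, default="norm"))      # 0.75
```

Rule `r2` looks mediocre in isolation and also lowers the accuracy of the set when it is placed first; only the rule-set encoding can see that interaction.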

In order to discover high-level prediction rules, Freitas (1999) used first-order relationships such as “Salary > Age”, checked against an Attribute Compatibility Table (ACT) during the discovery process in GA-Nuggets. ACT was reported to be particularly effective for its knowledge representation capability. Extending the use of ACT, our proposed approach allows more complex attribute relationships, such as linear or non-linear quantitative relationships among multiple attributes. This mechanism is intended to help reduce the search space during the GA’s evolution process.
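One way to read the ACT mechanism is as a lookup that the rule generator consults before proposing a first-order condition, so that only meaningful attribute pairs are ever compared. The sketch below extends that idea to weighted comparisons of the form A <= w * B. The table contents, the weight range, and the function names are all hypothetical; this is not GA-Nuggets’ actual implementation.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical Attribute Compatibility Table: unordered attribute pairs that may
# be compared in a first-order condition. Pairs not listed are incompatible.
ACT = {
    frozenset({"W", "BMI"}): True,
    frozenset({"BMI", "Age"}): True,
    frozenset({"TC", "WW"}): False,
}


def compatible(a: str, b: str) -> bool:
    return ACT.get(frozenset({a, b}), False)


def random_relational_condition(attributes: List[str], rng: random.Random
                                ) -> Optional[Tuple[str, str, float, str]]:
    """Propose a condition (a, '<=', w, b), read as a <= w * b, from compatible pairs only."""
    pairs = [(a, b) for i, a in enumerate(attributes)
             for b in attributes[i + 1:] if compatible(a, b)]
    if not pairs:
        return None  # no pair may be compared, so no condition is generated
    a, b = rng.choice(pairs)
    w = round(rng.uniform(0.5, 3.0), 4)  # hypothetical weight range
    return (a, "<=", w, b)


rng = random.Random(0)
print(random_relational_condition(["W", "BMI", "Age", "TC", "WW"], rng))
print(random_relational_condition(["TC", "WW"], rng))  # None
```

Because incompatible pairs are filtered before sampling, the GA never spends evaluations on conditions such as comparing cholesterol with waist width.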



Among data mining techniques, the decision tree is one of the most commonly used methods for knowledge discovery. A decision tree discovers rules and relationships by systematically breaking down and subdividing the information contained in data (Chou, 1991). Decision trees are easy to understand and have a simple top-down structure in which a decision is made at each node. The nodes at the bottom of the resulting tree provide the final outcome, which is either a discrete or a continuous value. When the outcome is discrete, a classification tree is developed (Hunt, 1993); when the outcome is numerical and continuous, a regression tree is developed (Bala & De Jong, 1996).

Classification is an important type of prediction problem. Classification aims to examine the features of a newly presented object and assign it to one of a predefined set of classes (Michael & Gordon, 1997). Classification trees are used to predict the membership of cases or objects in the classes of a categorical dependent variable from their measurements on one or more predictor variables. During classification tree induction, the attribute and split point at a given node are often selected by a criterion such as Quinlan’s information gain (ID3) or gain ratio (C4.5), or by QUEST’s statistics-based approach (Quinlan, 1986; 1993; Loh & Shih, 1997).
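For reference, ID3’s information gain and C4.5’s gain ratio can be computed as follows for a categorical attribute. This is a minimal sketch on toy labels; QUEST’s statistical test for selecting split variables is not shown.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(items: Sequence[Hashable]) -> float:
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """ID3: class entropy minus the weighted class entropy after splitting on `values`."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5: information gain divided by the split information (entropy of the partition)."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


labels = ["abn", "abn", "norm", "norm"]
print(information_gain(["a", "a", "b", "b"], labels))  # 1.0 (perfect split)
print(information_gain(["a", "b", "a", "b"], labels))  # 0.0 (uninformative)
```

Dividing by the split information is what keeps C4.5 from favoring attributes with many distinct values, which ID3’s raw gain tends to do.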


Many algorithms can be used to discover association rules from data in order to identify patterns of behavior. One of the best known and most widely used is the apriori algorithm, described in detail by Agrawal et al. (1993) and Agrawal & Srikant (1994). For instance, an association rule algorithm can produce a rule such as the following:

When gender is female and age is greater than 55, the subject has hypertension 20 percent of the time.
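A rule like this is read in terms of support (how often the items occur together) and confidence (how often the consequent holds when the antecedent does). The compact apriori sketch below, run on six made-up records, shows how frequent itemsets are grown level by level and how a confidence of 20 percent arises. It is an illustration under those assumptions, not the implementation used in this research.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions) is >= min_support."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(itemset <= t for t in tx) / n

    level = {frozenset([item]) for t in tx for item in t}
    level = {s for s in level if support(s) >= min_support}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: all (k-1)-subsets of a candidate must be frequent (apriori property).
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


def confidence(frequent: Dict[FrozenSet[str], float],
               antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Six made-up records, not data from this study.
records = ([{"sex=F", "age>55", "HT=yes"}]
           + [{"sex=F", "age>55", "HT=no"}] * 4
           + [{"sex=M", "age>55", "HT=yes"}])
freq = apriori(records, min_support=0.1)
print(round(confidence(freq, {"sex=F", "age>55"}, {"HT=yes"}), 2))  # 0.2
```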


Because association rule algorithms derive associations among data items efficiently, recent data mining research on classification tree construction has adopted this mechanism for knowledge preprocessing. For example, the apriori algorithm was applied to produce association rules that could be converted into the initial population of a genetic programming (GP) system (Niimi & Tazaki, 2000; Niimi & Tazaki, 2001). Improved learning efficiency was demonstrated in comparison with a GP that did not use association rule algorithms. However, the handling of multivariate classification problems and improvements in learning accuracy were not addressed in these studies.

3. The Hybrid of Association Rule Algorithms and Genetic Algorithms (AGA)

4. The Experiments and Results

Before mining the hypertension data set, we validated the proposed approach on the heart disease data set from the Cleveland Clinic Foundation in the UCI repository (Blake & Merz, 1998). For comparison, a simple GA (SGA) was applied to both data sets.

The association rules extracted by the apriori algorithm vary with the specified minimum support and confidence values, and different sets of extracted rules may affect AGA’s learning performance differently. This research therefore experimented with different combinations of minimum support and confidence values on both data sets. The classification trees generated by the SGA and AGA approaches were evaluated with five-fold cross-validation: each training run used four-fifths of the data records, and the remaining one-fifth was used for testing. The GA parameter settings for both applications are summarized in Table 1. In addition, the same data sets were also learned with the C4.5 technique.

The Application of Hypertension Data Set

The Data Description

This study collected data on 1152 subjects from the Department of Health Examination, Chang Gung Memorial Hospital, from July 2000 to July 2001. 489 subjects have hypertension (abnormal); 650 are without hypertension (normal). All subjects were local residents of Taiwan with no history of systemic disease. Standardized 3D anthropometry scanning protocols were performed, and the data were collected by trained staff. Subjects were instructed to fast for 12 to 14 hr prior to testing, and compliance with fasting was determined by interview on the morning of the examination. Height (to 0.1 cm) and weight (to 0.1 kg) were measured according to specified protocols (Berenson et al., 1979). BMI [weight (kg)/height (m)²] was used as an indicator of obesity in this study. Blood pressure was measured on the right arm of seated, relaxed subjects. For both systolic and diastolic blood pressure, the average of six replicate mercury readings taken by two randomly assigned, trained nurses was used in the analyses. Blood pressure levels were classified according to the 1999 World Health Organization-International Society of Hypertension guidelines (Guidelines, 1999). Hypertension was defined as a systolic blood pressure (SBP) of 140 mmHg or greater and/or a diastolic blood pressure (DBP) of 90 mmHg or greater. High-normal blood pressure was defined as an SBP of 130–139 mmHg or a DBP of 85–89 mmHg. Normal blood pressure was defined as an SBP of 120–129 mmHg and a DBP of 80–84 mmHg; optimal blood pressure, as an SBP of less than 120 mmHg and a DBP of less than 80 mmHg. When SBP and DBP fell into different categories, the higher category was applied.
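The classification rule above, including the convention that the higher category wins when SBP and DBP disagree, fits in a small function. This is a minimal sketch assuming readings in mmHg; the function and category names are ours.

```python
CATEGORIES = ("optimal", "normal", "high-normal", "hypertension")


def _category_index(value: float, cutoffs: tuple) -> int:
    """Number of category lower bounds that `value` reaches (0 = optimal)."""
    return sum(value >= c for c in cutoffs)


def classify_bp(sbp: float, dbp: float) -> str:
    """Classify blood pressure with the cut-offs stated in the text (mmHg).

    SBP lower bounds: 120 (normal), 130 (high-normal), 140 (hypertension).
    DBP lower bounds:  80 (normal),  85 (high-normal),  90 (hypertension).
    When the two readings fall into different categories, the higher one applies.
    """
    s = _category_index(sbp, (120, 130, 140))
    d = _category_index(dbp, (80, 85, 90))
    return CATEGORIES[max(s, d)]


print(classify_bp(118, 78))  # optimal
print(classify_bp(128, 86))  # high-normal (the DBP category is higher)
print(classify_bp(145, 70))  # hypertension
```

In this study’s binary output, the “abnormal” class corresponds to the hypertension category and “normal” to the rest.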

Each data record contains 5 categorical attributes and 11 continuous attributes for the input part. The output part is one categorical attribute whose value is either normal or abnormal.

The best AGA learning performance was obtained after several trials with different combinations of minimum support and confidence values. The training and testing performances are summarized in Table 5 and depicted in Figs. 8 and 9. These results are based on a minimum support value of 8 and a minimum confidence value of 100. The derived association rule sets consist of 75 rules for the “normal” output category and 51 rules for the “abnormal” output category. To examine the learning progress of the two approaches in more detail, their learning tracks were recorded in sessions; Figs. 10 and 11 depict the learning progress monitored over generations and over time. Table 6 presents the notation for each node of one of the best classification trees derived. With this classification tree, the accuracy rates for the training and testing stages reach 98.37% and 97.84%, respectively. Records that match no rule are labeled “Unknown” and counted as misclassifications. The 5-fold testing accuracy of C4.5 is 88.1%.
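The five-fold protocol used in these experiments (train on four-fifths of the records, test on the remaining fifth, and rotate) can be sketched as below. Plain shuffling without class stratification is an assumption, since the text does not say how the folds were drawn; `k_fold_splits` is a hypothetical helper.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(n_records: int, k: int = 5, seed: int = 0
                  ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Each record is tested exactly once; every training set holds the other
    k-1 folds (about 4/5 of the data when k = 5). Folds are not stratified.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


for train, test in k_fold_splits(100):
    print(len(train), len(test))  # 80 20 on each of the five lines
```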

Table 4. The Variables Used in Hypertension Detection Model

| Variable Type | Variable | Description | Data Type |
|---|---|---|---|
| Demographic Data | Age | Age | Continuous |
| | Sex | Sex (Male/Female) | Category |
| Biochemical Tests | TC | Total Cholesterol | Continuous |
| | UT | Urine Turbidity (Level 1, 2, or 3) | Category |
| Three-Dimensional Anthropometry | BMI | Body Mass Index | Continuous |
| | LAV | Left_Arm_Volume | Continuous |
| | TSA | Trunk Surface Area | Continuous |
| | W | Weight | Continuous |
| | WC | Waist Circumference | Continuous |
| | WHR | Waist-Hip Ratio | Continuous |
| | WW | Waist Width | Continuous |
| Risk Factors | Diet | Dietary Pattern (Yes/No) | Category |
| | SMOK | Number of Cigarettes per Day | Continuous |
| | WINE | Number of Cups of Wine per Day | Continuous |
| Family History of Disease | FHHT | Family Hypertension History (Yes/No) | Category |
| | FHSK | Family Paralysis History (Yes/No) | Category |

To compare the prediction accuracy of classification trees built from the entire hypertension data set with that of trees built from the same data set excluding the 3D information, we conducted another experiment that constructed classification trees from data without 3D information. The results in Table 7 indicate that the model based on data without 3D information has inferior prediction accuracy; the best testing result in Table 7 is 82.11%.

5. Discussion

According to the results above, AGA achieves better learning performance than SGA in terms of both computational efficiency and accuracy. By applying the association rule process, partial knowledge is extracted and transformed into seeding chromosomes. According to Fig. 6, the initial average accuracies of SGA and AGA on the heart disease data are 60% and 80%, respectively. In Fig. 10, the initial average accuracies of SGA and AGA on the 3D data are 43% and 80%, respectively. This improvement in initial learning performance can be attributed to the derived association rules, which are transformed into the GA’s seeding chromosomes.

Table 6. The Notation for Each Tree Node for a Given Data Fold (Hypertension Data)

| Tree Node | Number of Records Matched | Rule Illustration |
|---|---|---|
| Node 1 | 17 | IF Sex = Female AND UT = 2 AND FHHT = No AND FHSK = No AND Age <= 64 AND WHR <= 2.16152 AND LAV > 2065.04 AND TC > 189 AND WINE <= 57 THEN Normal |
| Node 2 | 503 | IF W <= 90.8105 AND TSA <= 7620.91 AND W <= 2.56355 * BMI THEN Normal |
| Node 3 | 27 | IF FHHT = No AND BMI > 28.9181 AND TSA > 4601.92 THEN Abnormal |
| Node 4 | 1 | IF FHHT = Yes AND Age > 77 AND WHR <= 1.43129 AND WINE > 34 THEN Abnormal |
| Node 5 | 358 | IF WW > 20.517 AND BMI <= 2.5251 * Age THEN Abnormal ELSE Unknown |
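Table 6 can be read as an ordered decision list. The sketch below encodes the five node rules with the thresholds copied from the table, including the attribute-relationship tests such as W <= 2.56355 * BMI, and counts an “Unknown” prediction as a misclassification, as the text specifies. Evaluating the nodes in table order (first match wins) is an assumption, and the example records are made up.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]

# (node number, test, predicted class); thresholds copied from Table 6.
RULES: List[Tuple[int, Callable[[Record], bool], str]] = [
    (1, lambda r: (r["Sex"] == "Female" and r["UT"] == 2 and r["FHHT"] == "No"
                   and r["FHSK"] == "No" and r["Age"] <= 64 and r["WHR"] <= 2.16152
                   and r["LAV"] > 2065.04 and r["TC"] > 189 and r["WINE"] <= 57), "Normal"),
    (2, lambda r: (r["W"] <= 90.8105 and r["TSA"] <= 7620.91
                   and r["W"] <= 2.56355 * r["BMI"]), "Normal"),
    (3, lambda r: r["FHHT"] == "No" and r["BMI"] > 28.9181 and r["TSA"] > 4601.92, "Abnormal"),
    (4, lambda r: (r["FHHT"] == "Yes" and r["Age"] > 77 and r["WHR"] <= 1.43129
                   and r["WINE"] > 34), "Abnormal"),
    (5, lambda r: r["WW"] > 20.517 and r["BMI"] <= 2.5251 * r["Age"], "Abnormal"),
]


def predict(record: Record) -> str:
    """First matching node wins (assumed order); no match falls through to Unknown."""
    for _node, test, label in RULES:
        if test(record):
            return label
    return "Unknown"


def accuracy(records: List[Record], labels: List[str]) -> float:
    # "Unknown" never equals a true label, so unmatched records count as errors.
    return sum(predict(r) == y for r, y in zip(records, labels)) / len(labels)


# Made-up example records (not study data).
base = {"Sex": "Male", "UT": 1, "FHHT": "No", "FHSK": "No", "Age": 50, "WHR": 0.9,
        "LAV": 2500.0, "TC": 200, "WINE": 0, "W": 70.0, "TSA": 5000.0,
        "BMI": 24.0, "WW": 30.0}
r1 = dict(base)                   # fails W <= 2.56355 * BMI, so node 5 fires
r2 = dict(base, W=60.0)           # node 2 fires
r3 = dict(base, W=95.0, WW=15.0)  # no node fires
print([predict(r) for r in (r1, r2, r3)])  # ['Abnormal', 'Normal', 'Unknown']
print(accuracy([r1, r2, r3], ["Abnormal", "Normal", "Normal"]))
```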

According to Fig. 6, in the training stage SGA needs 500 generations to reach a performance that AGA reaches in only 245 generations. This outcome can be attributed to the adoption of the apriori algorithm, which substantially reduces the GA search space. AGA also consistently outperforms SGA over the generations. In terms of computation time, AGA takes 0.4 minutes to reach a learning performance that SGA needs at least 0.8 minutes to reach.

As shown in Fig. 10, in the training stage SGA needs 500 generations to reach a performance that AGA reaches in 50 generations, and AGA consistently outperforms SGA over the generations. In terms of computation time, AGA takes 0.1 minutes to reach a learning performance that SGA needs 1.5 minutes to reach.

In 5-fold cross-validation testing on the heart disease data, SGA’s accuracy ranges from 77.1% to 78.1% and AGA’s from 80.46% to 81.48% across 5 different generation types. Both SGA and AGA outperform C4.5, whose 5-fold cross-validation testing accuracy is 76.8%.

In 5-fold cross-validation testing on the hypertension (with 3D) data, SGA’s accuracy ranges from 82.64% to 83.51% and AGA’s from 85.77% to 92.71% across 5 different generation types. Only AGA can outperform C4.5, whose 5-fold cross-validation testing accuracy is 88.1%, and it does so only in the upper part of its range.

The results offer clues for preventive medicine that are both distinct from and consistent with current knowledge of hypertension. Body shape played a major role in predicting hypertension in subjects without a family history of hypertension but with a very high BMI (> 28.9) and a relatively large trunk surface area (> 4601.9 cm²). The data also indicate that hypertension is associated with a larger waist width (> 20.5 cm) combined with a small body mass index relative to age. The mechanism behind the relationship between three-dimensional body measures and hypertension is still imperfectly understood. Nevertheless, the findings suggest that a potential set of indicators that could extend current knowledge in diagnostic decision support systems in medicine is close at hand.

6. Conclusions and Future Development

We have introduced the AGA approach, which hybridizes the apriori algorithm and the genetic algorithm for classification tree induction. The results of predicting hypertension from anthropometric and 3D measurements are promising and innovative in the field of biomedical sciences. Specifically, the significant predictors of hypertension are Age, FHHT, WHR, WINE, WW, W, BMI, TSA, and TC.

Incorporating the associated knowledge related to the classification results is crucial for improving evolutionary-based mining tasks. By employing the association rule algorithm to acquire partial knowledge from data, our proposed approach is able to more effectively and efficiently induce a classification tree by converting the derived association rules into the GA’s seeding chromosomes.

Compared with SGA, AGA achieves higher predictive accuracy and requires less computation time for classification tree induction, as shown in experiments on a UCI benchmark data set and on the hypertension data set.

Predicting a subject’s hypertension from 3D anthropometry data and other related profile information is a novel approach to medical decision support. Without the 3D information, the remaining hypertension data yield less accurate prediction models. The experimental results show that AGA is a feasible way to predict a subject’s hypertension with about 90% classification accuracy. They also indicate that the models derived for both the heart disease and 3D data sets can outperform those constructed by the C4.5 approach.

In addition, the classification trees discovered by AGA not only achieve higher predictive accuracy and computational efficiency, but may also produce knowledge that is more transparent or significant to the user.

The proposed AGA is generic and problem independent. Besides integrating association rule algorithms for knowledge preprocessing, AGA can flexibly incorporate user information or domain knowledge into the tree induction process via the expressive power of first-order logic. The proposed approach is applicable to both binary and multi-category classification problems.

Currently, the AGA approach can reveal tree splitting nodes in complex, rule-like discriminating forms such as the relationship “Attribute_i <= w ∗ Attribute_j”, which could be extended in the future to express more complicated multivariate inequalities in either linear or nonlinear form.

A Constraint-Based Evolutionary Classification Tree (CECT) for Financial Performance Prediction

Abstract

Most evolutionary computation methods for discovering hidden valuable knowledge from large volumes of data have to deal with computational efficiency, prediction accuracy, and the expressiveness of the derived models. In this paper we propose a constraint-based evolutionary classification tree (CECT) approach that combines constraint-based reasoning and evolutionary techniques to generate useful patterns from data more effectively. Constraint-based reasoning is employed to reduce the solution-irrelevant search space by filtering out invalid chromosomes. The CECT approach allows problem constraints to be specified as relationships among attributes according to predefined requirements, the user’s preferences, or partial knowledge in the form of a constraint network. In addition, an association rule algorithm is employed before the evolutionary process to identify the input variables most associated with the outcome variable. These insights are then converted into partial knowledge that can be translated into a constraint network. The proposed approach is tested and compared with a regular genetic algorithm (GA) in predicting corporate financial performance using data from the Taiwan Economy Journal (TEJ).

Keywords: Constraint-Based Reasoning; Genetic Algorithms; Rule Induction; Classification Trees; Financial Performance; Association Rule

1. Introduction

Revealing valuable knowledge hidden in corporate data is becoming more critical for enterprise decision making. As more data is collected and accumulated, extensive data analysis is not easy without effective and efficient data mining methods. In addition to statistical and other machine learning methods, recently developed novel or improved data mining methods such as Bayesian networks (Cooper, 1991), frequent patterns (Srikant, 1996), decision or regression trees (Gehrke, 1999; Garofalakis, 2000), and evolutionary computation algorithms (Bojarczuk, 2001; Carvalho, 1999; Correa, 2001; Freitas, 1999) have drawn increasing attention from academia and industry.

Rule induction is one of the most common methods of knowledge discovery. It is a method for discovering a set of "If/Then" rules that can be used for classification or estimation. That is, rule induction converts data into a rule-based representation that can be used either as a knowledge base for decision support or as an easily understood description of the system’s behavior. An ideal rule induction technique has to carefully address aspects such as model comprehensibility and interestingness, attribute selection, and learning efficiency and effectiveness. Genetic algorithms (GAs), among the most frequently used evolutionary computation techniques, have attracted increasing attention for their flexible and expressive problem representation and their fast search capability in knowledge discovery. In the past, genetic algorithms were mostly employed to enhance the learning process of data mining algorithms such as neural nets or fuzzy expert systems, rather than to discover models or patterns directly. That is, genetic algorithms acted as a method for performing a guided search for good models in the solution space. While genetic algorithms are an interesting approach to optimizing models, they impose a heavy computational load when the search space is huge.

Generally, rule induction methods are used to automatically produce rule sets that predict the expected outcomes as accurately as possible. However, revealing novel or interesting knowledge has recently become a research issue in data mining. Such efforts may impose additional rule discovery constraints and thereby add computational overhead. In regular GA operations, constraint validation takes place after a candidate chromosome has been produced, so several iterations may be required to obtain a valid chromosome. One way to reduce this computational load is to prevent invalid chromosomes from being produced in the first place, thereby improving the efficiency and effectiveness of the evolution process. Potentially, this can be done by embedding a well-designed constraint mechanism into the chromosome-encoding scheme.
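The difference between validating after generation and preventing invalid chromosomes can be seen on a toy two-gene chromosome. In the sketch below, the gene domains and the constraint x + y <= 20 are hypothetical. Generate-and-test may need many draws per valid chromosome, whereas filtering each gene’s domain against the constraint before sampling yields a valid chromosome on the first draw.

```python
import random
from typing import Tuple

DOMAIN = range(100)  # hypothetical gene domain for both genes


def valid(x: int, y: int) -> bool:
    return x + y <= 20  # hypothetical constraint between the two genes


def generate_and_test(rng: random.Random) -> Tuple[Tuple[int, int], int]:
    """Draw whole chromosomes until one satisfies the constraint; return it and the draw count."""
    draws = 0
    while True:
        draws += 1
        x, y = rng.choice(DOMAIN), rng.choice(DOMAIN)
        if valid(x, y):
            return (x, y), draws


def construct_valid(rng: random.Random) -> Tuple[int, int]:
    """Build a chromosome gene by gene, sampling only values consistent with the constraint."""
    xs = [x for x in DOMAIN if any(valid(x, y) for y in DOMAIN)]  # x values with some support
    x = rng.choice(xs)
    ys = [y for y in DOMAIN if valid(x, y)]  # propagate the chosen x into y's domain
    return x, rng.choice(ys)


rng = random.Random(42)
draws = [generate_and_test(rng)[1] for _ in range(200)]
print(sum(draws) / len(draws))  # average draws needed per valid chromosome
print(construct_valid(rng))     # always valid on the first attempt
```

Only 231 of the 10,000 possible gene pairs satisfy the toy constraint, so generate-and-test wastes most of its draws; the constructive version pays instead for a one-time domain filter.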

In this research we propose a novel approach that integrates an association rule algorithm and constraint-based reasoning with GAs to discover classification trees. The association rule algorithm, also known as market basket analysis, is used for attribute selection, so that the relevant input variables can be determined before the GA’s evolution begins. Constraint-based reasoning is a process that incorporates various inference techniques, including local propagation, backtrack-free search and tree-structured reduction. The constraint-based reasoning mechanism is used to push constraints, along with data insights, into the rule set construction. This research applies a hybrid of local propagation and tree search to ensure local consistency before continuing the search in the GA process. Local propagation reduces the search space by removing candidate gene values that cannot satisfy the predefined constraints. This approach allows constraints to be specified as relationships among attributes according to predefined requirements, user preferences, or partial knowledge in the form of a constraint network. In essence, it provides a chromosome-filtering mechanism that operates before a chromosome is generated or evaluated. Thus insignificant or irrelevant rules can be precluded in advance via the constraint network.
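Local propagation is often implemented as arc consistency. The sketch below uses the standard AC-3 algorithm (our choice; the text does not name a specific propagation algorithm) to prune gene domains before any chromosome is built. The variables, domains, and constraints are hypothetical, and at most one constraint per pair of variables is assumed.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Set, Tuple

Var = str
Pred = Callable[[Hashable, Hashable], bool]


def ac3(domains: Dict[Var, Set[Hashable]],
        constraints: Dict[Tuple[Var, Var], Pred]) -> bool:
    """Prune, in place, every value that has no supporting value across some constraint.

    constraints[(x, y)] is a predicate on (value_of_x, value_of_y); at most one
    constraint per unordered pair is assumed. Returns False if a domain empties.
    """
    arcs: Dict[Tuple[Var, Var], Pred] = {}
    for (x, y), pred in constraints.items():
        arcs[(x, y)] = pred
        arcs[(y, x)] = lambda vy, vx, p=pred: p(vx, vy)  # same constraint, reversed
    queue = deque(arcs)
    while queue:
        x, y = queue.popleft()
        test = arcs[(x, y)]
        unsupported = {vx for vx in domains[x]
                       if not any(test(vx, vy) for vy in domains[y])}
        if unsupported:
            domains[x] -= unsupported
            if not domains[x]:
                return False
            # x shrank, so arcs pointing at x (other than from y) must be rechecked.
            queue.extend(arc for arc in arcs if arc[1] == x and arc[0] != y)
    return True


doms = {"X": {1, 2, 3}, "Y": {1, 2, 3}, "Z": {1, 2, 3}}
print(ac3(doms, {("X", "Y"): lambda a, b: a < b, ("Y", "Z"): lambda a, b: a < b}), doms)
# True {'X': {1}, 'Y': {2}, 'Z': {3}}
```

When propagation empties a domain, no valid chromosome exists under the given constraints, and the GA can detect this before spending any evaluations.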

Propositional logic is a popular representation in rule induction systems. Our proposed approach extends first-order logic to formulate knowledge in the form of linear inequalities. This enhances the expressive power of the rule set to model data behavior more effectively. A prototype system based on the CECT (Constraint-based Evolutionary Classification Tree) approach is developed to predict corporate financial performance using TEJ finance data from 2001.

The remainder of this paper is organized as follows. In Section 2, previous research works and related techniques are reviewed. The detailed CECT procedures are then introduced in Section 3. Section 4 presents the experiments and results with financial data sets followed by discussion and conclusions.

2. The Literature Review

2.1. Genetic Algorithm for Rule Induction

Current rule induction systems typically fall into two categories: “divide and conquer” (Quinlan 1993) and “separate and conquer” (Clark & Niblett, 1989). The former recursively partitions the instance space until regions of roughly uniform class membership are obtained. The latter induces one rule at a time and separates out the instances it covers before inducing the next rule. Rule induction methods may also be categorized as either tree-based or non-tree-based (Abdullah, 1999). Frequently mentioned decision tree induction methods include the C4.5 (Quinlan 1993), CART (Breiman et al., 1984) and GOTA (Hartmann et al., 1982) algorithms. Both decision trees and rules can be described as disjunctive normal form (DNF) models. Decision trees are generated from data in a top-down, general-to-specific direction (Chidanand & Sholom, 1997). Each path to a terminal node is represented as a rule consisting of a conjunction of tests on the path’s internal nodes. These rules are mutually exclusive (Clark & Boswell, 1991). Quinlan (1993) introduced techniques to transform an induced decision tree into a set of production rules.

Michalski et al. (1986) proposed the AQ15 algorithm to generate a disjunctive set of classification rules. The CN2 rule induction algorithm also uses a modified AQ algorithm that involves a top-down beam search procedure (Clark & Niblett, 1989). It adopts entropy as its search heuristic and can only generate an ordered list of rules. The Basic Exclusion Algorithm (BEXA) is another rule induction method, proposed by Theron & Cloete (1996). It follows a general-to-specific search procedure in which disjunctive conjunctions are allowed. Every conjunction is evaluated using the Laplace error estimate. More recently, Witten and Frank (1999) described covering algorithms for discovering rule sets in conjunctive form.
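The Laplace estimate mentioned for BEXA corrects a rule’s observed accuracy for small coverage. One common form, for a conjunction covering n examples of which n_c belong to the predicted class among k classes, is error = 1 - (n_c + 1) / (n + k). A one-function sketch of that form (the exact variant used inside BEXA may differ):

```python
def laplace_error(n_covered: int, n_correct: int, n_classes: int) -> float:
    """Laplace error estimate: 1 - (n_correct + 1) / (n_covered + n_classes)."""
    if not 0 <= n_correct <= n_covered or n_classes < 1:
        raise ValueError("need 0 <= n_correct <= n_covered and n_classes >= 1")
    return 1.0 - (n_correct + 1) / (n_covered + n_classes)


# A perfect conjunction covering only 2 examples is penalized more than one covering 20.
print(laplace_error(2, 2, 2))    # 0.25
print(laplace_error(20, 20, 2))  # about 0.045
```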

GAs have been successfully applied to rule discovery in the data mining literature. Some techniques use one-rule-per-individual encoding, proposed by Greene & Smith (1993) and Noda et al. (1999). In the one-rule-per-individual approach, a chromosome is usually a linear string of rule conditions, each often an attribute-value pair, that represents a rule. Although this encoding is simpler and syntactically shorter, the fitness of a single rule is not necessarily the best indicator of the quality of the discovered rule set. The several-rules-per-individual approach (De Jong et al., 1993; Janikow, 1993) has the advantage of evaluating the rule set as a whole, taking rule interactions into account. However, this approach makes the chromosome encoding more complicated and syntactically longer, which usually requires more complex genetic operators.

Hu (1998) proposed a Genetic Programming (GP) approach in which a program is represented by a tree with rule conditions and/or attribute values in the leaf nodes and functions in the internal nodes. The challenge is that a tree can grow in size and change shape very dynamically. Thus, an efficient tree-pruning algorithm is required to prune unsatisfactory parts of a tree and avoid infeasible solutions. Bojarczuk et al. (2001) proposed a constrained-syntax GP approach to build a decision model, with particular emphasis on discovering comprehensible knowledge. The constrained-syntax mechanism verifies the relationship between operators and the data types of their operands during tree building.
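The constrained-syntax idea can be illustrated by a type check run on each candidate tree: every function declares the data types of its operands and its result, and trees that violate these declarations are rejected. The function set, terminal types, and tuple-based tree encoding below are hypothetical and are not taken from Bojarczuk et al.

```python
from typing import Optional, Union

# Hypothetical function set: operator -> (operand types, result type).
SIGNATURES = {
    "AND": (("bool", "bool"), "bool"),
    "OR": (("bool", "bool"), "bool"),
    ">": (("num", "num"), "bool"),
    "=": (("cat", "cat"), "bool"),
}
# Hypothetical terminals: attributes and categorical constants with their types.
TERMINALS = {"Age": "num", "BMI": "num", "Sex": "cat", "Female": "cat"}

Tree = Union[tuple, str, int, float]


def type_of(tree: Tree) -> Optional[str]:
    """Return the tree's result type, or None if any node violates the syntax constraints."""
    if isinstance(tree, (int, float)):
        return "num"  # numeric constant
    if isinstance(tree, str):
        return TERMINALS.get(tree)
    op, *args = tree
    if op not in SIGNATURES:
        return None
    operand_types, result = SIGNATURES[op]
    if len(args) != len(operand_types):
        return None
    if any(type_of(a) != t for a, t in zip(args, operand_types)):
        return None
    return result


print(type_of(("AND", (">", "Age", 55), ("=", "Sex", "Female"))))  # bool
print(type_of((">", "Sex", 55)))                                    # None
```

A candidate rule antecedent is accepted only if its tree type-checks to `bool`, so ill-typed offspring can be discarded or repaired before fitness evaluation.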

In order to discover high-level prediction rules, Freitas (1999) used first-order relationships such as “Salary > Age”, checked against an Attribute Compatibility Table (ACT) during the discovery process in GA-Nuggets. ACT was reported to be especially effective for its knowledge representation capability. Extending the use of ACT, our proposed approach allows more complex attribute relationships, such as linear or non-linear quantitative relationships among multiple attributes. This mechanism is intended to help reduce the search space during the GA’s evolution process.


Among data mining techniques, the decision tree is one of the most commonly used methods for knowledge discovery. A decision tree discovers rules and relationships by systematically breaking down and subdividing the information contained in data (Chou, 1991). Decision trees are easy to understand and have a simple top-down structure in which a decision is made at each node. The nodes at the bottom of the resulting tree provide the final outcome, which is either a discrete or a continuous value. When the outcome is discrete, a classification tree is developed (Hunt, 1993); when the outcome is numerical and continuous, a regression tree is developed (Bala & De Jong, 1996).

Classification is an important type of prediction problem. Classification aims to examine the features of a newly presented object and assign it to one of a predefined set of classes (Michael & Gordon, 1997). Classification trees are used to predict the membership of cases or objects in the classes of a categorical dependent variable from their measurements on one or more predictor variables. During classification tree induction, the attribute and split point at a given node are often selected by a criterion such as Quinlan’s information gain (ID3) or gain ratio (C4.5), or by QUEST’s statistics-based approach (Quinlan, 1986; 1992; Loh & Shih, 1997).


Many algorithms can be used to discover association rules from data in order to identify patterns of behavior. One of the best known and most widely used is the apriori algorithm, described in detail by Agrawal et al. (1993) and Agrawal & Srikant (1994). For instance, an association rule algorithm can produce a rule such as the following: When people buy Bankers Trust, they also buy Dow Chemical 20 percent of the time.


Because association rule algorithms derive associations among data items efficiently, recent data mining research on classification tree construction has adopted this mechanism for knowledge preprocessing. For example, the apriori algorithm was applied to produce association rules that could be converted into the initial population of a genetic programming (GP) system (Niimi & Tazaki, 2000; Niimi & Tazaki, 2001). Improved learning efficiency was demonstrated in comparison with a GP that did not use association rule algorithms. However, the handling of multivariate classification problems and improvements in learning accuracy were not addressed in these studies.

2.4. Constraint Satisfaction in Genetic Algorithms


Constraint satisfaction involves finding values for problem variables that satisfy the constraints on acceptable solutions. The constraint satisfaction problem (CSP) is NP-complete in general, so efficient general-purpose solution methods are usually lacking. A number of different approaches have been developed for solving CSPs. Some adopt constraint propagation to reduce the solution space. Others use backtracking to search directly for possible solutions. Still others combine the two, using tree search together with consistency algorithms to find one or more feasible solutions efficiently. Nadel (1988) compared the performance of several algorithms, including “generate and test”, “simple backtracking”, “forward checking”, “partial lookahead”, “full lookahead”, and “really full lookahead.” The major difference between these algorithms is the degree of consistency enforced at each node during the tree-search process. Forward checking and the lookahead variants combine search with consistency checking: whenever a new value is assigned to a variable, the domains of all unassigned variables are filtered, leaving only values consistent with the assignments already made. If the domain of any uninstantiated variable becomes empty, a contradiction is recognized and backtracking occurs. Freuder (1982) showed that if a CSP has a tree-structured constraint graph, it can be solved without any backtracking once a sufficient level of consistency has been established; that is, solutions can be retrieved in a backtrack-free manner. Dechter & Pearl (1988) used this theory, together with the notion of directional consistency, to generate backtrack-free solutions more efficiently.
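A compact backtracking search with forward checking, one of the hybrid techniques in Nadel’s comparison, is sketched below: after each assignment the domains of the unassigned variables are filtered, and an emptied domain triggers backtracking. The toy problem is hypothetical, and variables are assigned in a fixed order for simplicity.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Constraints = Dict[Tuple[str, str], Callable[[Hashable, Hashable], bool]]


def compatible(var: str, val: Hashable, other: str, other_val: Hashable,
               constraints: Constraints) -> bool:
    """Check every constraint between `var` and `other`, in either direction."""
    if (var, other) in constraints and not constraints[(var, other)](val, other_val):
        return False
    if (other, var) in constraints and not constraints[(other, var)](other_val, val):
        return False
    return True


def forward_checking(variables: List[str], domains: Dict[str, List[Hashable]],
                     constraints: Constraints,
                     assignment: Optional[Dict[str, Hashable]] = None
                     ) -> Optional[Dict[str, Hashable]]:
    assignment = dict(assignment or {})
    unassigned = [v for v in variables if v not in assignment]
    if not unassigned:
        return assignment
    var, rest = unassigned[0], unassigned[1:]
    for val in domains[var]:
        # Filter each unassigned variable's domain against var = val.
        filtered = {o: [ov for ov in domains[o] if compatible(var, val, o, ov, constraints)]
                    for o in rest}
        if all(filtered.values()):  # no domain wiped out, so go deeper
            result = forward_checking(variables, {**domains, **filtered},
                                      constraints, {**assignment, var: val})
            if result is not None:
                return result
        # Otherwise (or if the deeper search failed) try the next value: backtrack.
    return None


lt = lambda a, b: a < b
print(forward_checking(["A", "B", "C"], {v: [1, 2, 3] for v in "ABC"},
                       {("A", "B"): lt, ("B", "C"): lt}))  # {'A': 1, 'B': 2, 'C': 3}
```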

Handling constraints to reduce the search space is an important research issue in many areas of artificial intelligence. A GA maintains a set of chromosomes (candidate solutions) called a population, which consists of parents and offspring. As evolution proceeds, the best N chromosomes in the current population are selected as parents. Genetic operators are applied to produce offspring, which are then screened against a filtering criterion, usually expressed as a fitness function together with some predefined constraints. The GA evolves over generations until the stopping criteria are met. However, valid chromosomes are usually produced by trial and error: a candidate chromosome is generated and then tested against the filtering criteria. A GA may therefore require considerable computation, especially when the filtering criteria are complicated or strict. To resolve this problem, an effective chromosome construction process can be applied at the initialization, crossover, and mutation stages.
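The trial-and-error cost described above shows up clearly in a small sketch that draws random chromosomes until one passes the filter. The gene count, the domain, and the strictly-increasing criterion are hypothetical choices made only to show how quickly the number of rejected candidates grows.

```python
import random
from typing import Callable, List, Tuple


def generate_and_test(num_genes: int, domain: List[int],
                      is_valid: Callable[[List[int]], bool],
                      rng: random.Random,
                      max_trials: int = 100_000) -> Tuple[List[int], int]:
    """Trial-and-error chromosome generation: draw, test, and retry on failure.

    Returns the first valid chromosome and the number of candidates drawn.
    """
    for trial in range(1, max_trials + 1):
        candidate = [rng.choice(domain) for _ in range(num_genes)]
        if is_valid(candidate):
            return candidate, trial
    raise RuntimeError("no valid chromosome within max_trials")


def strictly_increasing(genes: List[int]) -> bool:
    # Hypothetical filtering criterion: each gene must exceed the previous one.
    return all(a < b for a, b in zip(genes, genes[1:]))


chromosome, trials = generate_and_test(6, list(range(10)), strictly_increasing,
                                       random.Random(0))
print(chromosome, trials)
```

Only C(10, 6) = 210 of the 10^6 possible six-gene strings are strictly increasing, so the loop needs about 10^6 / 210 ≈ 4,760 draws on average; constructing chromosomes that satisfy the constraints by design avoids this waste.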

Garofalakis (2000) pushed model constraints, such as the expected tree size and accuracy, into the decision tree mining process. In other words, constraints can be used to express the trade-off between model accuracy and the computational efficiency of the tree-building or tree-pruning process. In related work, CADSYN (Maher, 1993) adopted a constraint satisfaction algorithm for case adaptation in case-based reasoning, and Purvis (1995) applied a repair-based constraint satisfaction algorithm to aid case adaptation. By pushing constraints into the case adaptation process, a constraint satisfaction algorithm helps a case-based reasoning system produce better solution cases more efficiently for problems with discrete constraints.

Barnier & Brisset (1998) developed a hybrid of a genetic algorithm and constraint satisfaction techniques and used it to solve vehicle routing and radio link frequency assignment optimization problems. Their approach applies a GA to reduce the search space of a large CSP, rather than applying constraint satisfaction techniques to improve the GA's computational efficiency. Kowalczyk (1996) proposed using constraint satisfaction principles to support a GA in handling constraints. However, few studies have applied constraint-based reasoning to address the GA's computational inefficiency, or examined how a user's knowledge can be represented and processed within a constraint network.

3. The Proposed Constraint-based Evolutionary Classification Tree (CECT) Approach

The proposed CECT approach consists of three modules: the user interface, the symbol manager, and the constraint-based GA (CBGA). As shown in Fig. 1, the user interface module allows users to perform the following operations: loading a constraint program, adding or retracting constraints, controlling the GA's parameter settings, and monitoring the best solutions.

The constraint program is a set of first-order logic sentences (atomic, compound, or quantified) about a many-sorted universe of discourse that includes integers, real numbers, and arbitrary application-specific sorts. Details can be found in Lai (1992).


Fig. 1. The Conceptual Diagram of the Proposed CECT System (diagram components: User; User Interface; Constraint Program; Symbol Manager; Constraint Network; Constraint-Based GA with (a) GA Initialization, (b) Chromosome Filtering using Constraint-Based Reasoning, (c) Fitness Evaluation, (d) Stop Condition Met (yes: Return the Best Solution; no: continue), and (e) GA Operators: Selection, Crossover, Mutation)

Fig. 2 shows the knowledge preprocessing used to construct the constraint program. Three types of data sources, namely GA parameter settings, human knowledge, and data sets, are converted into constraint programs. Each gene in the GA corresponds to an object of the constraint program, and the range of each gene can be viewed as a domain constraint on that object. The predefined hard constraints are represented by first-order logic sentences.

Human knowledge specifies user preferences or partial expert experience. For example, a statement such as “people with high blood pressure cannot take certain drugs” can be treated as a type of expert knowledge and translated into a user-defined constraint in the form of a first-order logic sentence.

The association rule mining module generates association rules with the Apriori algorithm. In this research, the derived association rules must satisfy user-specified minimum support and minimum confidence constraints. The resulting constraints have the form:

$A_1 \wedge \cdots \wedge A_k \rightarrow C_i$, where $A_1, A_2, \ldots, A_k$ are the conjoined conditions, each of the form $g_j = v_j$ ($v_j$ being a value from the domain of gene $g_j$), and $C_i$ is the classification result.
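One possible in-memory representation of such a rule is sketched below. The text does not specify how a rule becomes a seeding chromosome, so `seed_from_rule` (fix the genes named in the rule, draw the others at random) is a hypothetical scheme, and the gene names and values are illustrative.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AssociationRule:
    """A rule A1 ^ ... ^ Ak -> C, each condition fixing one gene to one value."""
    conditions: Dict[str, str]  # gene name -> required value (g = v)
    outcome: str                # classification result C

    def matches(self, record: Dict[str, str]) -> bool:
        # The antecedent is a conjunction, so every condition must hold.
        return all(record.get(g) == v for g, v in self.conditions.items())


def seed_from_rule(rule: AssociationRule, gene_domains: Dict[str, List[str]],
                   rng: random.Random) -> Dict[str, str]:
    """Hypothetical seeding scheme: genes named in the rule take the rule's
    values, and every other gene is drawn at random from its domain."""
    return {g: rule.conditions[g] if g in rule.conditions else rng.choice(dom)
            for g, dom in gene_domains.items()}


# Hypothetical genes and values, for illustration only.
rule = AssociationRule({"g1": "yes", "g2": "high"}, "class_1")
domains = {"g1": ["yes", "no"], "g2": ["low", "high"], "g3": ["yes", "no"]}
print(rule.matches({"g1": "yes", "g2": "high", "g3": "no"}))  # True
print(seed_from_rule(rule, domains, random.Random(1)))
```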


Fig. 2. The Knowledge Preprocessing for Constraint Program Construction

The symbol manager checks the syntax of the first-order logic sentences in the constraint program and translates them into a constraint network for further processing.

In the CBGA module, constraint-based reasoning filters each gene value, both in the initial population and in every subsequent population. Each module is described below.

Module (a) -- GA Initialization: To ensure that the starting chromosomes are valid, the initial population is derived through the chromosome filtering process, which uses constraint-based reasoning to produce valid candidates.

Module (b) -- Chromosome Filtering: During chromosome construction, each gene that has been determined may affect the values available to the remaining genes. Whenever a gene is randomly instantiated from its satisfactory universe, constraint propagation restricts the possible values of the remaining genes so that the constraint network stays consistent. In this way, the chromosomes constructed by the CBGA are guaranteed to be valid, and both hard and soft constraints can be examined and solved during chromosome construction.
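A minimal sketch of this constructive filtering is given below, assuming integer gene domains and binary constraints written as Python predicates (a representation of our own, not the CECT constraint language). Each gene is drawn at random from its current satisfactory universe, the choice is propagated to the genes not yet instantiated (forward checking), and the sketch backs up to an earlier choice if some remaining domain becomes empty.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# (gene_a, gene_b, predicate on (value_a, value_b))
Constraint = Tuple[str, str, Callable[[int, int], bool]]


def construct_chromosome(genes: List[str], domains: Dict[str, List[int]],
                         constraints: List[Constraint],
                         rng: random.Random) -> Optional[Dict[str, int]]:
    """Instantiate genes in order, each drawn at random from its satisfactory
    universe, and propagate every choice to the genes not yet instantiated."""

    def compatible(gene: str, val: int, other: str, w: int) -> bool:
        for a, b, pred in constraints:
            if (a, b) == (gene, other) and not pred(val, w):
                return False
            if (a, b) == (other, gene) and not pred(w, val):
                return False
        return True

    def extend(i: int, doms: Dict[str, List[int]],
               chrom: Dict[str, int]) -> Optional[Dict[str, int]]:
        if i == len(genes):
            return chrom
        gene, rest = genes[i], genes[i + 1:]
        candidates = list(doms[gene])  # current satisfactory universe
        rng.shuffle(candidates)
        for val in candidates:
            # Forward propagation: prune the remaining domains against gene = val.
            pruned = dict(doms)
            for other in rest:
                pruned[other] = [w for w in doms[other]
                                 if compatible(gene, val, other, w)]
            if all(pruned[other] for other in rest):
                result = extend(i + 1, pruned, {**chrom, gene: val})
                if result is not None:
                    return result
        return None  # no consistent value: the caller tries its next candidate

    return extend(0, {g: list(domains[g]) for g in genes}, {})


# Illustrative network: g1 < g2 < g3 over the domain 0..4.
genes = ["g1", "g2", "g3"]
domains = {g: list(range(5)) for g in genes}
constraints = [("g1", "g2", lambda a, b: a < b), ("g2", "g3", lambda a, b: a < b)]
print(construct_chromosome(genes, domains, constraints, random.Random(7)))
```

Every chromosome returned satisfies all constraints, so no candidate has to be generated, evaluated, and then discarded.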

Module (c) -- Fitness Evaluation: Calculate the fitness value for each chromosome in the current population.

Module (d) -- Stop Condition Met: If a specified stopping condition is satisfied, then terminate the CBGA and return the best solution; otherwise, continue the GA operations.

Module (e) -- GA Operators: Genetic operators such as crossover and mutation generate new offspring. Each offspring produced by the CBGA is validated by the chromosome filtering operation to ensure that only valid individuals are passed on for further processing.
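The interplay of Modules (a) to (e) can be sketched as a compact loop. The population size, crossover rate, mutation rate, and roulette-wheel selection follow the GA settings reported for the experiments; everything else is an assumption made to keep the sketch self-contained: a bit-string chromosome instead of a classification-tree encoding, a toy constraint “at most `max_ones` genes equal 1” enforced by `filter_chromosome`, a weighted-sum fitness, one-point crossover, and a fixed generation budget as the stop condition.

```python
import random
from typing import List, Sequence, Tuple


def filter_chromosome(bits: List[int], max_ones: int, rng: random.Random) -> List[int]:
    """Stand-in for Module (b): scan genes in order and restrict each to its
    satisfactory universe under the toy constraint 'at most max_ones ones'."""
    out, ones = [], 0
    for b in bits:
        allowed = [0, 1] if ones < max_ones else [0]
        v = b if b in allowed else rng.choice(allowed)
        out.append(v)
        ones += v
    return out


def roulette(pop: List[List[int]], fits: Sequence[float], rng: random.Random) -> List[int]:
    """Roulette-wheel selection: pick an individual with probability proportional to fitness."""
    total = sum(fits)
    if total == 0:
        return rng.choice(pop)
    r, acc = rng.uniform(0, total), 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return ind
    return pop[-1]


def cbga(weights: List[float], max_ones: int, pop_size: int = 100, generations: int = 100,
         pc: float = 0.6, pm: float = 0.01, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n = len(weights)

    def fitness(c: List[int]) -> float:
        return sum(w for w, b in zip(weights, c) if b)

    # (a) Initialization: random chromosomes passed through (b) filtering.
    pop = [filter_chromosome([rng.randint(0, 1) for _ in range(n)], max_ones, rng)
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # (d) stop condition: fixed generation budget
        fits = [fitness(c) for c in pop]  # (c) fitness evaluation
        offspring: List[List[int]] = []
        while len(offspring) < pop_size:  # (e) genetic operators
            p1, p2 = roulette(pop, fits, rng), roulette(pop, fits, rng)
            if rng.random() < pc:  # one-point crossover
                cut = rng.randint(1, n - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                mutated = [1 - b if rng.random() < pm else b for b in child]
                offspring.append(filter_chromosome(mutated, max_ones, rng))  # (b) again
        pop = offspring[:pop_size]
        best = max(pop + [best], key=fitness)  # keep the best solution found so far
    return best, fitness(best)


best, value = cbga([5, 1, 4, 2, 3, 8], max_ones=2)
print(best, value)
```

Because filtering runs after initialization and after every crossover and mutation, fitness is only ever computed for chromosomes that already satisfy the constraint.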

To speed up the reasoning process, both the variable ordering (Purdom, 1983) and backtrack-free search methods (Freuder, 1982) are adopted in the CBGA to derive contradiction-free chromosomes.

As shown in Fig. 3, each chromosome Ci of a population of size N is processed in turn by constraint-based reasoning. The genes of a chromosome can be viewed as a subset of the network's variables, and constraint-based reasoning restricts the range of values each gene gij may take. The system thus transforms human knowledge, such as expert experience or common sense, into a constraint network that leads to more significant rule sets. Because invalid candidates are eliminated during construction, the effort spent searching for valid chromosomes is reduced without having to run the fitness evaluation procedure on every candidate chromosome.

Fig. 4 illustrates the chromosome construction process in detail. In essence, the CECT system applies local propagation for chromosome filtering. The basic idea of local propagation (Freuder, 1982) is to use the information local to a constraint to validate gene values; local propagation also restricts the valid range of each gene referenced in the constraint. The satisfactory gene values (i.e., the satisfactory universe) are propagated through the network, allowing other constraints to restrict the valid ranges of the remaining genes. Given the restricted valid range denoted by the satisfactory universe SGj, the valuation process for gene gij then examines the inferred gene value: if that value is inconsistent with the constraint network, it is replaced by a new value g’ij randomly selected from SGj. By repeatedly applying local propagation and this valuation process to chromosome Ci in the sequence gi1, gi2, … , gim, the new chromosome C’i satisfies the constraint network. Local propagation thus offers an efficient way to guide the GA toward the best chromosome by restricting the search to the space already filtered by the constraint network.
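The sequential valuation step can be sketched as a repair pass over an existing chromosome. This is a simplified reading, not the CECT implementation: constraints are assumed to be binary predicates linking a gene to an earlier gene in the scan order, so SGj is computed only from genes already processed, and a pass that finds an empty SGj reports a contradiction instead of backtracking.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# (k, j, pred): pred(value of gene k, value of gene j), with k < j in scan order.
Constraint = Tuple[int, int, Callable[[int, int], bool]]


def repair_chromosome(chromosome: Sequence[int], domains: Sequence[Sequence[int]],
                      constraints: List[Constraint],
                      rng: random.Random) -> Optional[List[int]]:
    """Scan genes in order; keep each inferred value if it lies in the
    satisfactory universe SG_j, otherwise replace it with a random member."""
    repaired: List[int] = []
    for j, value in enumerate(chromosome):
        # SG_j: values of gene j consistent with every gene already fixed.
        sg = [v for v in domains[j]
              if all(pred(repaired[k], v) for k, jj, pred in constraints
                     if jj == j and k < j)]
        if not sg:
            return None  # contradiction: no consistent value for gene j
        repaired.append(value if value in sg else rng.choice(sg))
    return repaired


# Illustrative constraints: gene 1 differs from gene 0; gene 2 exceeds gene 0.
domains = [list(range(10))] * 3
constraints = [(0, 1, lambda a, b: b != a), (0, 2, lambda a, b: b > a)]
print(repair_chromosome([1, 2, 3], domains, constraints, random.Random(3)))  # [1, 2, 3]
print(repair_chromosome([4, 4, 2], domains, constraints, random.Random(3)))
```

A chromosome that is already consistent passes through unchanged, so crossover and mutation results keep as much of their inherited material as the constraints allow.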

Fig. 3. The Framework of Constraint-Based Preprocessing for GA Operators

Fig. 4. The Detailed Illustration of Chromosome Screening

4. The Experiments and Results

Before analyzing the TEJ data sets, we validated the proposed approach on the credit screening data set from the UCI repository (Blake & Merz, 1998). Two other approaches, a simple GA (SGA) and the Apriori algorithm combined with a GA (AGA), were used for comparison with our proposed approach, denoted ACECT (i.e., the Apriori algorithm with CECT).

The association rules extracted by the Apriori algorithm vary with the chosen minimum support and confidence values, and different rule sets may affect CECT's learning performance differently. We therefore experimented with different combinations of minimum support and confidence values for both the credit screening and the financial performance prediction problems. The classification trees generated by each of the three approaches were evaluated with five-fold cross validation: each training stage used 4/5 of the data records, and the remaining 1/5 were used for testing. The GA parameter settings for both applications are summarized in Table 1.
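The five-fold protocol can be sketched as an index generator. Shuffling the records before splitting them into folds is our assumption; the text only states the 4/5 to 1/5 train/test ratio.

```python
import random
from typing import Iterator, List, Tuple


def five_fold_splits(n_records: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each fold serves once as the 1/5 test
    set while the other four folds (4/5 of the records) form the training set."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)  # assumption: shuffle before folding
    folds = [idx[k::5] for k in range(5)]
    for k in range(5):
        train = [i for m in range(5) if m != k for i in folds[m]]
        yield train, folds[k]


# 690 credit records minus the 37 incomplete ones leaves 653.
for train, test in five_fold_splits(653):
    print(len(train), len(test))  # 522 131 for three folds, then 523 130 twice
```

Each record appears in exactly one test fold, so averaging the five test accuracies uses every record once for evaluation.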


Application to Credit Screening Data Sets

The 690 credit screening data records consist of nine categorical and six continuous input attributes and one categorical output attribute. The data of the nine categorical attributes are fed into the Apriori algorithm to produce the association rules. The 37 incomplete records were excluded from all learning processes. After several trials with different minimum support and confidence values, the best learning performance for AGA and ACECT was obtained. The training and testing results are summarized in Table 2 and depicted in Figs. 5 and 6. These results are based on a minimum support value of 3 and a minimum confidence value of 100. The derived association rule sets consist of 480 rules for the “Approve” output category and 4096 rules for the “Disapprove” output category. To examine the learning progress of the three approaches in more detail, the learning behavior was tracked in sessions. Figs. 7 and 8 depict the learning progress monitored over generations and over time. The optimal classification tree derived and its detailed notation are given in Fig. 9 and Table 3.


Table 1. GA Parameter Settings

Item                                                Value
Population size                                     100
Generations                                         100/200/300/400/500
Crossover rate                                      0.6
Mutation rate                                       0.01
Selection method                                    Roulette wheel
Training time (credit screening)                    1.4 minutes*
Training time (financial performance prediction)    1.7 minutes*

* The hardware platform is a Pentium III 800 MHz with 512 MB RAM.

Table 2. The 5-Fold Learning Results for SGA, AGA, and ACECT

Application to Financial Performance Data Sets

Financial ratios are commonly used to measure a corporation's financial performance. In recent years, considerable research has analyzed the predictive power of financial ratios as factors influencing corporate stock market behavior. Financial ratios such as Current Ratio, Receivables Turnover, Times Interest Earned, and Capital have been used for bankruptcy prediction (Shah & Murtaza, 2000) and financial distress prediction (Coats & Fant, 1991; Ganesalingam, 2001). This research applies the CECT approach to construct a classification model that predicts corporate financial performance from various financial ratios. The notation for the variables (i.e., the eight input variables) is given in Table 4.

The dependent variable is categorical, labeled either “Good” or “Bad” according to the Tobin’s Q value. Tobin’s Q is a measure for evaluating a corporation’s financial performance (Ciccolo & Fromm, 1979; Lindenberg & Ross, 1981): the higher a corporation’s Tobin’s Q, the better its financial performance, and the lower its Tobin’s Q, the weaker its financial performance.

Table 4. The Various Financial Ratios Used in the Model

Variable   Description                      Data type
X1         Industry type (22 types)         Categorical
X2         Credit rating (1-10 rating)      Categorical
X3         Employee size (1-4 level)        Categorical
X4         Capital scope (1-4 level)        Categorical
X5         Current ratio                    Continuous
X6         Debt ratio                       Continuous
X7         Times interest earned            Continuous
X8         Receivables turnover             Continuous

This research labels the dependent variable “Good” if Tobin’s Q > 1 and “Bad” if Tobin’s Q ≤ 1. The data were drawn from the Taiwan Economic Journal (TEJ) database, a standard source of financial market data. A total of 502 financial data records of companies listed on the Taiwan stock market for the whole of 2001 were collected. Each record includes eight input financial ratios and one output, Tobin’s Q, which is converted into “Good” or “Bad” before the learning process is executed. Among the data records, 181 are “Good” and 329 are “Bad”.
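The labeling rule reduces to a one-line threshold, sketched below together with a class tally. The sample Q values are made up for illustration and are not TEJ data.

```python
from collections import Counter
from typing import Iterable


def label_tobins_q(q: float) -> str:
    """Class label used in this study: "Good" if Tobin's Q > 1, else "Bad"."""
    return "Good" if q > 1 else "Bad"


def class_counts(qs: Iterable[float]) -> Counter:
    # Tally labels, e.g. to check class balance before training.
    return Counter(label_tobins_q(q) for q in qs)


print(label_tobins_q(1.0), label_tobins_q(1.25))  # Bad Good
print(class_counts([0.8, 1.0, 1.3, 2.1]))
```

Note that a Tobin's Q of exactly 1 falls into the “Bad” class, matching the ≤ 1 boundary stated above.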

After several trials with different minimum support and confidence values, the best ACECT learning performance was obtained. The training and testing performance are summarized in Table 5 and depicted in Figs. 10 and 11. These results are based on a minimum support value of 7 and a minimum confidence value of 100. The derived association rule sets consist of 12 rules for the “Good” output category and 14 rules for the “Bad” output category. To examine the learning progress of the three approaches in more detail, the learning behavior was tracked in sessions. Figs. 12 and 13 depict the learning progress monitored over generations and over time. The details of the optimal results are given in Appendix A.

Table 5. The Summarized Learning Performance for SGA, AGA, and ACECT


5. Discussion

According to the results above, ACECT achieves better learning performance than SGA and AGA in terms of both computational efficiency and accuracy. As Figs. 5 and 6 show, testing performance deteriorates for models trained for more generations, mainly because of over-training.

By applying association rule mining, partial knowledge is extracted and transformed into seeding chromosomes. The initial training results of both AGA and ACECT show significantly higher accuracy and efficiency than those of SGA.

As Fig. 7 shows, in the training stage both AGA and ACECT approach relative convergence within 20 generations, whereas SGA requires 160 generations to reach a similar result. Similarly, as Fig. 12 shows, both AGA and ACECT approach relative convergence within 50 generations, whereas SGA requires 500 generations to reach a similar result. These outcomes can be attributed to the use of the Apriori algorithm, which substantially reduces the GA search space.

In the ACECT approach, the derived partial knowledge is not only encoded as seeding chromosomes but also converted into the constraint network. As the learning-progress figures show, ACECT's margin over AGA is smaller than its margin over SGA. Nevertheless, the improvement of ACECT over AGA demonstrates its effectiveness on both applications.

6. Conclusions and Future Development

We have introduced the CECT approach, which hybridizes constraint-based reasoning with a genetic algorithm for classification tree induction. Incorporating partial knowledge or user-control information into the mining process is not straightforward and typically requires the design of novel approaches. By employing an association rule algorithm to acquire partial knowledge from data, our approach induces a classification tree by pushing that partial knowledge into chromosome construction. Most importantly, adopting a constraint-based reasoning mechanism in the GA process filters out invalid chromosomes, so feasible solutions can be derived more efficiently.

Compared with SGA and AGA, ACECT achieves higher predictive accuracy and requires less computation time for classification tree induction on both a benchmark data set and a real financial data set. Beyond these gains in accuracy and efficiency, the classification trees discovered by ACECT may also yield knowledge that is more transparent or significant to users.

The proposed CECT is generic and problem independent. It can flexibly incorporate user information or domain knowledge into a tree induction process through the expressive power of first-order logic. In addition, no proprietary genetic operator or chromosome representation has to be designed to interact with constraint-based reasoning. Beyond integrating association rule algorithms for knowledge preprocessing, the CECT approach offers a potential way for domain experts to input professional knowledge or constraints and thereby reveal further interesting knowledge. The approach is applicable not only to binary but also to multi-category classification problems, although the experiments reported here are binary classification problems.

Currently, the CECT approach can produce tree splitting nodes with complex, rule-like discriminating forms such as the relationship “Attribute_i ≤ w × Attribute_j”. In the future, this can be extended to express more complex multivariate inequalities in either linear or nonlinear form.
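A splitting node of the form “Attribute_i ≤ w × Attribute_j” can be represented and evaluated as sketched below. The class names and the example tree (attributes X5 and X6 from Table 4, the weight 0.5, and the leaf labels) are hypothetical and are not results from this study.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class RatioSplit:
    """Hypothetical split node testing attribute_i <= w * attribute_j."""
    attr_i: str
    attr_j: str
    w: float

    def goes_left(self, record: Dict[str, float]) -> bool:
        return record[self.attr_i] <= self.w * record[self.attr_j]


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    split: RatioSplit
    left: "Tree"   # branch taken when the inequality holds
    right: "Tree"  # branch taken otherwise


Tree = Union[Node, Leaf]


def classify(tree: Tree, record: Dict[str, float]) -> str:
    # Follow the splits from the root until a leaf is reached.
    while isinstance(tree, Node):
        tree = tree.left if tree.split.goes_left(record) else tree.right
    return tree.label


# Illustrative tree only; the attributes, weight and labels are not study results.
tree = Node(RatioSplit("X6", "X5", 0.5), Leaf("Good"), Leaf("Bad"))
print(classify(tree, {"X5": 2.0, "X6": 0.7}))  # Good, since 0.7 <= 0.5 * 2.0
```

Unlike a univariate split such as “X6 ≤ 0.7”, the threshold here moves with the value of a second attribute, which is what makes these nodes more expressive and also harder to invert in goal-seeking analysis.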

Improving Financial Performance by Exploring the Financial Ratios

Prediction models map input values to one or more outcomes. When a model is complex, it is not easy to determine the inputs that best approximate an expected output; this type of research is usually called parameter design. Because the classification tree constructed by the CECT approach can use multivariate splits, finding input values that match an expected outcome is relatively difficult. A mechanism that supports “what-if” and “goal-seeking” analysis could help financial managers explore which financial ratios are most likely or least likely to be adjusted to improve corporate financial performance. In addition to the CECT approach, this research is currently adopting another optimization technique to support the “goal-seeking” function. We believe that such information has high strategic value for corporate financial management.


The Hybrid of Apriori Algorithm and Constraint-Based Genetic Algorithm for Aircraft Electronic Ballasts Troubleshooting

Abstract

The maintenance of aircraft components is crucial for avoiding aircraft accidents and aviation fatalities. Reliable and effective maintenance support is vital to airline operations and flight safety. Sharing previous repair experience through state-of-the-art computer technology can improve aircraft maintenance productivity. This research proposes a hybrid of the Apriori algorithm and a constraint-based genetic algorithm (ACBGA) to discover a classification tree for electronic ballast troubleshooting. The constraint-based genetic algorithm (CBGA) reduces the solution-irrelevant search space by filtering out invalid chromosomes. The Apriori algorithm is used to identify the input itemsets most associated with the outcome itemsets before CBGA is executed for tree induction. Compared with a simple GA and with the Apriori algorithm plus a simple GA, the proposed approach discovers the troubleshooting classification tree with superior computational efficiency and prediction accuracy.

Keywords: Constraint-Based Reasoning; Genetic Algorithms; Classification Trees; Aircraft Electronic Ballast; Apriori Algorithm

1. Introduction

Every airplane in operation throughout the world requires appropriate maintenance to assure flight safety. When an aircraft fault emerges, fault diagnosis and troubleshooting must be carried out promptly and effectively. An airplane contains many electronic components; among them, the electronic ballast is a common component that controls the cabin fluorescent lamps. The electronic ballast plays an important role in providing proper lighting for passengers and flight crews during a flight. Unstable cabin lighting, such as flashing and ON/OFF problems, degrades flight quality. An airplane usually has hundreds of electronic ballasts mounted on panels such as the light deflector of a fluorescent lamp fixture. When an electronic ballast is abnormal, it has to be removed and sent to the accessory shop for further investigation.

Revealing valuable knowledge hidden in corporate data is becoming more critical for enterprise decision making. As more data are collected and accumulated, extensive data analysis is difficult without effective and efficient data mining methods. The maintenance records of electronic ballasts generally contain information about the number of defective units found, the procedures taken, and the inspection or repair status. These records were stored and used to help mechanics identify faults and determine whether components needed repair or replacement, because previous similar solutions may provide valuable troubleshooting clues for new faults.

Rule induction is one of the most common methods of knowledge discovery. It discovers a set of "If/Then" rules that can be used for classification or estimation. An ideal rule induction technique has to address aspects such as model comprehensibility and interestingness, attribute selection, and learning efficiency and effectiveness. The genetic algorithm (GA), one of the most frequently used evolutionary computation techniques, has attracted increasing attention for the flexibility and expressiveness of its problem representation and for its fast search capability in knowledge discovery. In the past, genetic algorithms were mostly employed to enhance the learning process of other data mining algorithms, such as neural networks or fuzzy expert systems, rather than to discover models or patterns directly. In that role, a genetic algorithm performs a guided search for suitable patterns in the solution space. Although genetic algorithms are a feasible approach for discovering patterns, they impose a heavy computational load when the search space is huge.

Generally, rule induction methods are used to automatically produce rule sets that predict the expected outcomes as accurately as possible. However, revealing crucial or interesting knowledge has recently become a research issue in data mining. Such attempts may impose additional rule discovery constraints and thereby add computational overhead. In regular GA operations, constraints are validated only after a candidate chromosome has been produced, so several iterations may be required to obtain a valid chromosome. One way to reduce the computational load is to prevent invalid chromosomes from being produced in the first place, thereby improving the efficiency and effectiveness of the evolution process. Potentially, this can be done by embedding a well-designed constraint mechanism into the chromosome-encoding scheme.

In this research we propose a novel approach that integrates the Apriori algorithm and a constraint-based genetic algorithm (ACBGA) to discover a troubleshooting classification tree. The Apriori algorithm, one of the most commonly used association rule algorithms, is used for attribute selection, so that the relevant input attributes can be determined before the GA's evolution begins. Constraint-based reasoning is used to push constraints, along with insights from the data, into rule set construction. This research applies tree search and forward checking (Haralick & Elliott, 1980; Brailsford et al., 1999) to remove from the search space, during evolution, those gene values that cannot meet the predefined constraints. The approach allows constraints to be specified as relationships among attributes according to predefined requirements, user preferences, or partial knowledge in the form of a constraint network. In essence, it provides a chromosome-filtering mechanism that operates before a chromosome is generated and evaluated, so insignificant or irrelevant rules can be precluded in advance by the constraint network.

A prototype troubleshooting system based on the ACBGA approach was developed using the aircraft maintenance records of electronic ballasts provided by a major airline company in Taiwan.

2. The Background

2.1. The Aircraft Electronic Ballasts

The aircraft electronic ballasts used to drive fluorescent lamps can be mounted on a panel such as the light deflector of a fluorescent lamp fixture. Fluorescent lamps initially require a high voltage to strike the lamp arc and subsequently a constant current to maintain it. Usually a connector at one end of the unit carries all switching and power connections. As shown in Fig. 1, the electronic ballast operates from control lines of 115 VAC/400 Hz aircraft power. When operating power is supplied, the electronic ballast starts and operates two rapid-start fluorescent lamps, or a single lamp, in the passenger cabin of various commercial aircraft, such as the Boeing 747-400, 737-300, 737-400, and 747-500. Two control lines connect the ballast set and the control panel for the ON/OFF and BRIGHT/DIM modes; DIM mode is used at night when the cabin crew want to decrease the level of ambient light.

To diagnose which component has malfunctioned, mechanics usually measure the alternating current in BRIGHT mode and in DIM mode when the electronic ballast is turned on and off. Checking light stability and illumination status also helps inform the maintenance decision. The maintenance record attributes used for troubleshooting are described in Table 1.

Each maintenance record contains seven attributes identified as highly related to abnormal electronic ballasts. All attribute values are categorical. Each category of the outcome attribute represents a different set of replacement parts. For instance, category C1 denotes replacement of a transformer (T101 on the printed circuit board) and a capacitor (C307 on the printed circuit board). Category C2 denotes replacement of an integrated circuit (U300), a transistor (Q301), and a fuse (F401), all on the printed circuit board.


Fig. 1. The Operational Setup for Electronic Ballast (diagram components: lamp set of fluorescent lamps, ballast set of electronic ballasts, control lines, control panel)

Table 1. The Record Description

Input attribute                                                             Data type     Range
Alternating current in BRIGHT mode when the electronic ballast turns on     Categorical   15 intervals (amp)
Alternating current in DIM mode when the electronic ballast turns on        Categorical   11 intervals (amp)
Alternating current in BRIGHT mode when the electronic ballast turns off    Categorical   16 intervals (amp)
Alternating current in DIM mode when the electronic ballast turns off       Categorical   16 intervals (amp)
Is the light unstable when the electronic ballast turns on?                 Categorical   0 and 1
Is it not illuminated when the electronic ballast turns on?                 Categorical   0 and 1

Outcome attribute                                                           Data type     Range
Replacement parts                                                           Categorical   C1, C2, …, C10


Rule induction methods can be categorized as either tree-based or non-tree-based (Abdullah, 1999). Frequently cited decision tree induction methods include C4.5 (Quinlan, 1993), CART (Breiman et al., 1984), and GOTA (Hartmann et al., 1982). Both decision trees and rule sets can be described as disjunctive normal form (DNF) models. Decision trees are generated from data in a top-down, general-to-specific direction (Chidanand & Sholom, 1997). Each path to a terminal node is represented as a rule consisting of the conjunction of the tests at the path's internal nodes.
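The path-to-rule correspondence can be sketched for a tree with categorical tests. The nested-tuple tree representation, attribute names, and class labels are hypothetical; each root-to-leaf path becomes one conjunctive rule, and the rule set as a whole is a disjunction of these conjunctions.

```python
from typing import Dict, List, Tuple, Union

# A tree is a class label (leaf) or a pair (attribute, {attribute value: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]
Rule = Tuple[List[Tuple[str, str]], str]


def tree_to_rules(tree: Tree, path: Tuple[Tuple[str, str], ...] = ()) -> List[Rule]:
    """Turn every root-to-leaf path into an IF-THEN rule whose antecedent is
    the conjunction of the tests on that path."""
    if isinstance(tree, str):
        return [(list(path), tree)]
    attribute, branches = tree
    rules: List[Rule] = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, path + ((attribute, value),)))
    return rules


# Hypothetical tree over two binary attributes.
tree = ("unstable", {"1": "class_a",
                     "0": ("not_lit", {"1": "class_b", "0": "class_c"})})
for conditions, label in tree_to_rules(tree):
    print("IF " + " AND ".join(f"{a} = {v}" for a, v in conditions) + f" THEN {label}")
```

Because the branches at each node are mutually exclusive, exactly one of the generated rules fires for any record that reaches a leaf.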

Michalski et al. (1986) proposed the AQ15 algorithm to generate a disjunctive set of classification rules. The CN2 rule induction algorithm also uses a modified AQ algorithm that involves a top-down beam search procedure (Clark & Niblett, 1989). The Basic Exclusion Algorithm (BEXA), proposed by Theron & Cloete (1996), is another rule induction method; it follows a general-to-specific search procedure in which disjunctive conjunctions are allowed.

GAs have been applied successfully to rule discovery in data mining. Greene & Smith (1993) and Noda et al. (1999) proposed techniques that use a one-rule-per-individual encoding. In this approach, a chromosome is usually a linear string of rule conditions, each often an attribute-value pair, representing a rule or a rule set. However, the more complicated and syntactically longer the chromosome encoding, the more complex the genetic operators that are required.

Hu (1998) proposed a genetic programming (GP) approach in which a program is represented by a tree with rule conditions and/or attribute values in the leaf nodes and functions in the internal nodes. The challenge is that the tree's size and shape can change very dynamically, so an efficient tree-pruning algorithm is needed to readjust unsatisfactory parts of a tree and avoid infeasible solutions. Bojarczuk et al. (2001) proposed a constrained-syntax GP approach to build a decision model, with particular emphasis on discovering comprehensible knowledge.

To discover high-level prediction rules, Freitas (1999) applied first-order logic relationships such as “Salary > Age” by checking an Attribute Compatibility Table (ACT) during the discovery process in GA-Nuggets. The ACT was claimed to be especially effective because of its knowledge representation capability. Extending the use of the ACT, our proposed approach allows more complicated relationships among attributes, such as linear or nonlinear quantitative relationships. This mechanism helps reduce the search space during the GA's evolution process.

2.3. Classification Trees

Among data mining techniques, the decision tree is one of the most commonly used methods for knowledge discovery. A decision tree discovers rules and relationships by systematically breaking down and subdividing the information contained in data (Chou, 1991). Decision trees are easy to understand and have a simple top-down structure in which a decision is made at each node. The nodes at the bottom of the resulting tree provide the final outcome, which may be a discrete or a continuous value. When the outcome is discrete, a classification tree is developed (Hunt, 1993); when it is continuous, a regression tree is developed (Bala & De Jong, 1996).

Classification is a critical type of prediction problem. Classification aims to examine the features of a newly presented object and assign it to one of a predefined set of classes (Michael & Gordon, 1997). Classification trees are used to predict the membership of cases or objects in the classes of a categorical dependent variable from their measurements on one or more predictor variables. During classification tree induction, the algorithm selects the attribute and split point at each node according to a criterion such as Quinlan’s information gain (ID3), the gain ratio (C4.5), or QUEST’s statistics-based approach (Quinlan, 1986; 1992; Loh & Shih, 1997).
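As a concrete illustration of the first two criteria, the following Python sketch computes ID3’s information gain and C4.5’s gain ratio for a categorical attribute. Records are assumed to be dictionaries mapping attribute names to values; the function names are illustrative, and QUEST’s statistical tests are not covered.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(records, attr, target):
    """ID3 criterion: class entropy minus the weighted entropy after splitting on attr."""
    n = len(records)
    remainder = 0.0
    for value in {r[attr] for r in records}:
        subset = [r[target] for r in records if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in records]) - remainder


def gain_ratio(records, attr, target):
    """C4.5 criterion: information gain normalised by the split information of attr."""
    split_info = entropy([r[attr] for r in records])
    return information_gain(records, attr, target) / split_info if split_info else 0.0
```

An attribute that perfectly separates a balanced binary class yields a gain of 1 bit, while a constant attribute yields 0.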

2.4. Apriori Algorithms for Attribute Selection

Many algorithms can be used to discover association rules from data in order to identify patterns of behavior. One of the most frequently used approaches is the apriori algorithm, detailed in (Agrawal et al., 1993; Agrawal & Srikant, 1994). For instance, the apriori algorithm can produce a rule such as: IF component A is abnormal AND component B is normal THEN replace UNIT-1, with a 70 percent probability of being true.

Given minimum support and confidence levels, the apriori algorithm produces rules from a data set by discovering so-called frequent itemsets. A rule has two measures, called confidence and support. Support (or prevalence) measures how often items occur together, as a percentage of the total records. Confidence (or predictability) measures how much a particular item depends on another. Because the apriori algorithm derives associations among data items efficiently, recent data mining research on classification tree construction has adopted this mechanism for knowledge preprocessing. For example, the apriori algorithm was applied to produce association rules that can be converted into the initial population of a genetic programming (GP) system (Niimi & Tazaki, 2000; Niimi & Tazaki, 2001). The proposed method demonstrated improved learning efficiency compared with a GP that did not use association rules. However, these studies did not address multivariate classification problems or improvements in learning accuracy.
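The two measures can be sketched in a few lines of Python, with each record treated as a set of “attribute=value” items. The records below are hypothetical and only echo the shape of the example rule above; they are not data from this study.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """supp(antecedent ∪ consequent) / supp(antecedent); 0.0 if the antecedent never occurs."""
    base = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / base if base else 0.0


# Hypothetical maintenance records, each a set of "attribute=value" items.
records = [
    {"A=abnormal", "B=normal", "action=replace UNIT-1"},
    {"A=abnormal", "B=normal", "action=replace UNIT-1"},
    {"A=abnormal", "B=normal", "action=repair"},
    {"A=normal", "B=normal", "action=repair"},
]
lhs = {"A=abnormal", "B=normal"}
rhs = {"action=replace UNIT-1"}
print(support(lhs | rhs, records), confidence(lhs, rhs, records))
```

Here the rule holds in two of the three records that match its antecedent, so its confidence is 2/3 while its support is 2/4.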

2.5. Constraint Satisfaction in Genetic Algorithms

Constraint satisfaction involves finding values for problem variables that satisfy the constraints on acceptable solutions. Because the constraint satisfaction problem (CSP) is NP-complete in general, suitable solution methods are usually lacking. A number of different approaches have been developed for solving CSPs. Some adopted constraint propagation (Bowen et al., 1990; Dechter & Pearl, 1988; Lai, 1992) to reduce the search space. Others used backtracking to search directly for solutions (Nadel, 1988). Still others combined the two, using tree search together with consistency algorithms, to find one or more feasible solutions efficiently (Purdom, 1983; Freuder, 1982). Nadel (1988) compared the performance of several algorithms, including “generate and test”, “simple backtracking”, “forward checking”, “partial lookahead”, “full lookahead”, and “really full lookahead”. These algorithms differ mainly in the degree of consistency enforced at each node during tree search. Except for “generate and test”, the others are hybrid techniques. In forward checking and the lookahead variants, whenever a value is assigned to a variable, the domains of the unassigned variables are filtered so that only values consistent with the assignments made so far remain. If the domain of any unassigned variable becomes empty, a contradiction is recognized and backtracking occurs. Freuder (1982) showed that if a CSP has a tree-structured constraint graph, then, once arc consistency has been enforced, it can be solved without any backtracking; that is, solutions can be retrieved in a backtrack-free manner. Dechter & Pearl (1988) combined this result with the notion of directional consistency to generate backtrack-free solutions more efficiently.
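The interplay of assignment, domain filtering, and backtracking described above can be illustrated with a small Python sketch of backtracking search with forward checking over binary constraints. It is a generic textbook formulation, not code from any of the cited works; the function names and the constraint representation are assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Domains = Dict[str, List[str]]
# A binary constraint: (var_a, var_b, allowed(value_a, value_b)).
Constraint = Tuple[str, str, Callable[[str, str], bool]]


def forward_check(var: str, value: str, domains: Domains,
                  constraints: List[Constraint], assigned: Dict[str, str]) -> Optional[Domains]:
    """Tentatively set var=value and prune the domains of unassigned neighbours.
    Returns the pruned domains, or None if some domain becomes empty."""
    pruned = {v: list(d) for v, d in domains.items()}
    pruned[var] = [value]
    for a, b, ok in constraints:
        if a == var and b not in assigned:
            pruned[b] = [y for y in pruned[b] if ok(value, y)]
            if not pruned[b]:
                return None
        elif b == var and a not in assigned:
            pruned[a] = [x for x in pruned[a] if ok(x, value)]
            if not pruned[a]:
                return None
    return pruned


def solve(variables: List[str], domains: Domains, constraints: List[Constraint],
          assigned: Optional[Dict[str, str]] = None) -> Optional[Dict[str, str]]:
    """Depth-first backtracking search with forward checking."""
    assigned = dict(assigned or {})
    if len(assigned) == len(variables):
        return assigned
    var = next(v for v in variables if v not in assigned)
    for value in domains[var]:
        pruned = forward_check(var, value, domains, constraints, assigned)
        if pruned is None:
            continue  # a future domain was wiped out: try the next value
        assigned[var] = value
        result = solve(variables, pruned, constraints, assigned)
        if result is not None:
            return result
        del assigned[var]
    return None  # no value works here: backtrack
```

For example, colouring a triangle of three mutually different variables succeeds with three colours and returns None with two, the failure being detected by domain wipe-outs rather than by testing complete assignments.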

Dealing with constraints to reduce the search space has become a crucial issue in artificial intelligence research. GAs maintain a set of chromosomes (solutions), called a population, which consists of parents and offspring. As the evolution process proceeds, the best N chromosomes in the current population are selected as parents. Offspring produced by the genetic operators are then selected according to filtering criteria, usually expressed as a fitness function together with some predefined constraints. The GA evolves over generations until the stopping criteria are met. However, valid chromosomes are usually produced by trial and error: a candidate chromosome is produced and then tested against the filtering criteria. Therefore a GA may require considerable computation, especially when the filtering criteria are complicated or strict. To resolve this problem, an effective chromosome construction process can be applied at each of the initialization, crossover, and mutation stages.

Garofalakis (2000) proposed a model-constraint-based algorithm that specifies the expected tree size and accuracy within the mining process. In other words, constraints can be used to express the trade-off between model accuracy and the computational efficiency associated with the tree-building or tree-pruning process. In similar work, CADSYN (Maher, 1993) adopted a constraint satisfaction algorithm for case adaptation in case-based reasoning. Purvis (1995) also applied a repair-based constraint satisfaction algorithm to aid case adaptation. By pushing constraints into the case adaptation process, the constraint satisfaction algorithm helps a case-based reasoning system produce suitable solution cases more efficiently for problems with discrete constraints. Barnier & Brisset (1998) developed a hybrid system combining a genetic algorithm with constraint satisfaction techniques, which was used to solve vehicle routing and radio link frequency assignment optimization problems. Basically, this approach applied a GA to reduce the search space of a large CSP, rather than applying constraint satisfaction techniques to improve the GA’s computational efficiency. Kowalczyk (1996) discussed the concept of using constraint satisfaction principles to support a GA in handling constraints. However, few studies have applied constraint-based reasoning to address the computational inefficiency of GAs, or have examined how the user’s knowledge can be represented and processed within a constraint network. This research applies constraint-based reasoning to help a GA screen chromosomes more efficiently during initialization and the subsequent genetic operations.

3. The Hybrid of Association Rule Algorithms and Constraint-Based Genetic Algorithms (ACBGA)

The proposed ACBGA approach consists of five modules. As shown in Fig. 2, these modules are:

Rule Association; GA Initialization; Fitness Evaluation; Chromosome Screening; and GA Operation.

The Rule Association module produces association rules by executing the apriori algorithm. In this research, the data items of categorical attributes are used to construct the association rules. Each association rule here is an implication of the form X→Y, where X is a conjunction of conditions and Y is the classification result. The rule X→Y must satisfy the user-defined minimum support and minimum confidence levels.
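A minimal Python sketch of this module’s output is shown below. It assumes records are dictionaries and, for brevity, enumerates candidate antecedents exhaustively up to max_len conditions instead of using apriori’s level-wise candidate pruning; on small categorical data it keeps the same rules that an apriori run restricted to class consequents and antecedents of at most max_len conditions would keep. All names and the toy records are hypothetical.

```python
from itertools import combinations


def class_rules(records, target, min_support, min_confidence, max_len=2):
    """Return (antecedent, class, support, confidence) for every rule X -> target=y
    whose antecedent has at most max_len attribute=value conditions."""
    n = len(records)
    attrs = [a for a in records[0] if a != target]
    rules = []
    for size in range(1, max_len + 1):
        for subset in combinations(attrs, size):
            # Candidate antecedents: value combinations that actually occur.
            for cond in {tuple((a, r[a]) for a in subset) for r in records}:
                covered = [r for r in records if all(r[a] == v for a, v in cond)]
                for label in {r[target] for r in covered}:
                    hits = sum(1 for r in covered if r[target] == label)
                    supp, conf = hits / n, hits / len(covered)
                    if supp >= min_support and conf >= min_confidence:
                        rules.append((dict(cond), label, supp, conf))
    return rules


# Hypothetical toy records with two categorical attributes and a class Y.
records = [
    {"X1": "a", "X2": "d", "Y": "yes"},
    {"X1": "a", "X2": "d", "Y": "yes"},
    {"X1": "b", "X2": "e", "Y": "yes"},
    {"X1": "c", "X2": "e", "Y": "no"},
    {"X1": "c", "X2": "e", "Y": "no"},
]
rules = class_rules(records, "Y", min_support=0.2, min_confidence=1.0)
```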

Fig. 3 illustrates a classification tree with n nodes. Each node involves two types of attributes: categorical and continuous. Categorical attributes are used to construct the partial combination of conditions, while continuous attributes are used to construct the inequality relationships. The formal form of a tree node is expressed as follows:

A and B → C;

where A denotes the antecedent part of the association rule; B is the conjunction of inequality functions in which the continuous attributes, relational operators, and splitting values are determined by the GA; and C is the classification result obtained directly from the association rule.

Fig. 2. The Conceptual Diagram of ACBGA (components shown: Data Set; Apriori Algorithm; Rule Association, producing rules X1->Y1, ..., Xn->Yk; GA Initialization; Fitness Evaluation; Stop Condition Met?; GA Operation with Selection, Crossover, and Mutation; Chromosome Screening using Constraint-Based Reasoning over the Constraint Network; output: Optimal Classification Tree)

Fig. 3. The Illustration for the Classification Tree

For example, suppose X1, X2, and X3 are categorical attributes and X4 and X5 are continuous attributes. Assume that the association rule “X1=5 and X3=3 → Ck” is selected first and then combined with the conjunction of inequalities “X4<=10 and X5>1”, randomly determined by the GA, to form the following tree node:

Nodei: IF X1=5 and X3=3 and X4<=10 and X5>1 THEN the result is Ck;

The GA Initialization module generates a chromosome that is equivalent to a classification tree. The tree nodes can be represented in the form “Node1, Node2, … , Noden”, where n is the total number of nodes in the classification tree and is determined automatically by the GA. The steps for encoding a classification tree as a chromosome are as follows.

Step (a): For each node, the antecedent condition is obtained from one association rule randomly selected from the entire set of association rules generated in advance. That is, the “A” part of the tree node’s formal form is determined.

Step (b): By applying the GA, a relational operator (<= or >) and a splitting point are determined for each continuous attribute. Therefore the “B” part of the tree node’s formal form is determined.

Step (c): The classification result of the tree node comes directly from the consequent of the selected association rule, which gives the “C” part. These steps are repeated n times to generate all tree nodes, where n is determined automatically by the GA.
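The node form “A and B → C” and Steps (a) to (c) can be sketched in Python as follows. This is an illustration under simplifying assumptions: every continuous attribute receives an inequality, the GA’s choices of operator, split point, and node count are replaced by plain random draws, and the class TreeNode and function random_tree are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TreeNode:
    antecedent: Dict[str, object]               # "A": categorical conditions from a rule
    inequalities: List[Tuple[str, str, float]]  # "B": (continuous attribute, "<=" or ">", split)
    label: str                                  # "C": consequent of the same rule

    def matches(self, record: Dict[str, object]) -> bool:
        """True if the record satisfies every condition in A and B."""
        if any(record[a] != v for a, v in self.antecedent.items()):
            return False
        for attr, op, split in self.inequalities:
            value = record[attr]
            if (op == "<=" and value > split) or (op == ">" and value <= split):
                return False
        return True


def random_tree(rules, continuous_ranges, max_nodes, rng):
    """Build 1..max_nodes nodes. rules: list of (antecedent, class) pairs;
    continuous_ranges: continuous attribute -> (low, high)."""
    nodes = []
    for _ in range(rng.randint(1, max_nodes)):   # stand-in for the GA-chosen n
        antecedent, label = rng.choice(rules)     # Steps (a) and (c)
        inequalities = [(attr, rng.choice(["<=", ">"]), rng.uniform(low, high))
                        for attr, (low, high) in continuous_ranges.items()]  # Step (b)
        nodes.append(TreeNode(dict(antecedent), inequalities, label))
    return nodes


# The example node from the text.
node = TreeNode({"X1": 5, "X3": 3}, [("X4", "<=", 10.0), ("X5", ">", 1.0)], "Ck")
```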

The association rules for each classification result are used not only to construct the candidate classification trees for GA initialization, but also to filter out insignificant or irrelevant attributes in the rules. This filtering is done by applying a hybrid of tree search and the forward checking algorithm during the GA operation. In forward checking, when a value is assigned to the current gene, any value of the remaining genes that conflicts with the current gene is removed from the pool of potential candidates. That is, whenever a new gene is considered, all of its values remaining in the pool are guaranteed to be consistent with the constraint network and the past gene assignments. We illustrate this reasoning process with the following example.

Assume that the data set consists of three attributes, X1, X2, and Y, where Y is a binary classification result. The domain of each attribute is as follows:

$$X_1 \in \{a, b, c\}, \qquad X_2 \in \{d, e\}, \qquad Y \in \{\text{yes}, \text{no}\}$$

By applying the apriori algorithm with the given minimum support and confidence, the following association rules are inferred:

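$$
\begin{aligned}
&X_1 = a \wedge X_2 = d \rightarrow Y = \text{yes}, \quad X_1 = b \wedge X_2 = d \rightarrow Y = \text{yes}, \quad X_1 = b \wedge X_2 = e \rightarrow Y = \text{yes},\\
&X_1 = a \rightarrow Y = \text{no}, \quad X_1 = c \wedge X_2 = e \rightarrow Y = \text{no}
\end{aligned}
$$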

These association rules for the different classification results can be translated into a set of constraints (i.e., the constraint network) that confines the generation of feasible relationships among attributes. In solving a regular CSP with a given set of constraints, a constraint processing technique (i.e., constraint-based reasoning) is applied to determine whether all of the constraints can be satisfied. To handle a dynamic environment, application programs in a constraint-based language consist simply of declarative sets of constraints, and the necessary processing is performed by general-purpose reasoning techniques.

In this example, the association rules can be translated into a constraint program whose syntax is based on a constraint programming language (Bowen, 1990; Lai, 1992). Such a program is a set of first-order logic sentences about a many-sorted universe of discourse that includes integers, real numbers, and arbitrary application-specific sorts. Here, the association rules are formulated as the following constraint program:

Domain DomY =:= {yes, no};
Domain DomX1 =:= {a, b, c};
Domain DomX2 =:= {d, e};
Relation assorule1(DomY, DomX1, DomX2) =:= {(yes, a, d), (yes, b, d), (yes, b, e)};
Relation assorule2(DomY, DomX1, DomX2) =:= {(no, a, d), (no, a, e), (no, c, e)};
assorule1(Y, X1, X2);
assorule2(Y, X1, X2);

The domain of each attribute is defined first. The domains “DomX1”, “DomX2”, and “DomY” consist of application-specific symbols, and the operator “=:=” means “is defined as”. The relation symbols “assorule1” and “assorule2” describe the combinations of attribute values associated with the classification results “yes” and “no”, respectively. Further details about the constraint program can be found in (Bowen, 1990; Lai, 1992).
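The translation from rules to relations can be checked mechanically. The Python sketch below expands each rule over the attributes it does not mention and groups the resulting (Y, X1, X2) tuples by class; for the example rules it reproduces assorule1 and assorule2. The names DOMAINS, RULES, and rules_to_relations are hypothetical.

```python
from itertools import product

DOMAINS = {"X1": ["a", "b", "c"], "X2": ["d", "e"]}

# The example association rules: (antecedent conditions, class label).
RULES = [
    ({"X1": "a", "X2": "d"}, "yes"),
    ({"X1": "b", "X2": "d"}, "yes"),
    ({"X1": "b", "X2": "e"}, "yes"),
    ({"X1": "a"}, "no"),
    ({"X1": "c", "X2": "e"}, "no"),
]


def rules_to_relations(rules, domains, attrs=("X1", "X2")):
    """Expand each rule into every (Y, X1, X2) tuple it covers, grouped by class."""
    relations = {}
    for antecedent, label in rules:
        # An attribute fixed by the rule contributes one value; others range over the domain.
        choices = [[antecedent[a]] if a in antecedent else domains[a] for a in attrs]
        for combo in product(*choices):
            relations.setdefault(label, set()).add((label, *combo))
    return relations


relations = rules_to_relations(RULES, DOMAINS)
```

The rule “X1 = a → Y = no”, which does not mention X2, expands into both (no, a, d) and (no, a, e).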


To improve computational efficiency in the search for the optimal classification tree, the domain of each attribute must be reduced effectively. Fig. 4 illustrates how the search space is reduced by combining tree search with constraint propagation. Assume that nodei is used to predict the classification result “yes”. Then the satisfying values “a” and “b” for attribute X1 can be inferred by constraint propagation with the relation “assorule1”. That is, the infeasible value “c” is removed from the domain of X1 to reduce the search space.

Fig. 4. The Hybrid of Tree Search and Constraint Propagation (search tree: root Y ∈ {yes, no}; under Y = yes, X1 ∈ {a, b}, with X2 ∈ {d} for a and X2 ∈ {d, e} for b; under Y = no, X1 ∈ {a, c}, with X2 ∈ {d, e} for a and X2 ∈ {e} for c)
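The pruning in Fig. 4 amounts to projecting the relation tuples that remain consistent with the current assignment. In the sketch below, the two relations are assumed to act as alternatives (one per class value), and propagate is a hypothetical helper rather than part of the constraint language cited above.

```python
VARS = ("Y", "X1", "X2")
ASSORULE1 = {("yes", "a", "d"), ("yes", "b", "d"), ("yes", "b", "e")}
ASSORULE2 = {("no", "a", "d"), ("no", "a", "e"), ("no", "c", "e")}


def propagate(relations, assigned):
    """Return the remaining domain of every unassigned variable: the values that
    appear in at least one relation tuple agreeing with the current assignment."""
    consistent = [t for rel in relations for t in rel
                  if all(t[VARS.index(v)] == val for v, val in assigned.items())]
    return {v: sorted({t[i] for t in consistent})
            for i, v in enumerate(VARS) if v not in assigned}


relations = [ASSORULE1, ASSORULE2]
print(propagate(relations, {"Y": "yes"}))  # X1 keeps only "a" and "b"
```

Fixing Y = yes removes “c” from the domain of X1, and then fixing X1 = a leaves only “d” for X2, matching the leftmost branch of Fig. 4.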

The Fitness Evaluation module calculates the fitness value of each chromosome in the current population. The fitness function is defined as the total number of misclassifications, so lower values are better. If the specified stopping condition is satisfied, the entire process terminates and the optimal classification tree is confirmed; otherwise, the GA operations continue.
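A minimal sketch of this fitness function follows. It assumes, as the yes/no chain of nodes in Fig. 5 suggests, that a record is assigned the class of the first node whose rule fires, and that a record matched by no node counts as a misclassification; the nodes use equality tests only to keep the example short. Node, classify, and fitness are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Optional


class Node(NamedTuple):
    conditions: Dict[str, str]  # attribute -> required value (equality tests only)
    label: str

    def matches(self, record: Dict[str, str]) -> bool:
        return all(record.get(a) == v for a, v in self.conditions.items())


def classify(nodes: List[Node], record: Dict[str, str]) -> Optional[str]:
    """Try the nodes in order; the first matching node assigns its class."""
    for node in nodes:
        if node.matches(record):
            return node.label
    return None  # no node fired; counted as a misclassification below


def fitness(nodes: List[Node], records: List[Dict[str, str]], target: str) -> int:
    """Total number of misclassified records (lower is better)."""
    return sum(1 for r in records if classify(nodes, r) != r[target])
```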

Each GA operation consists of chromosome selection, crossover, and mutation, which produce the offspring generation according to the GA parameter settings. The offspring produced by ACBGA must be validated by the chromosome filtering operation to ensure that only valid or potentially useful individuals are passed on for further processing.
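The three GA operations can be sketched with the settings listed in Table 2 (uniform crossover, roulette-wheel selection, crossover rate 0.6, mutation rate 0.01). Because the fitness here is an error count, the sketch assumes roulette slices proportional to 1 / (1 + errors); how the count is converted is not stated, so this mapping, like the function names, is an assumption.

```python
import random


def roulette_select(population, errors, rng):
    """Roulette-wheel selection; slice sizes 1 / (1 + errors) are an assumed conversion."""
    weights = [1.0 / (1 + e) for e in errors]
    return rng.choices(population, weights=weights, k=1)[0]


def uniform_crossover(p1, p2, rng, rate=0.6):
    """With probability rate, swap each gene position independently with probability 0.5."""
    if rng.random() >= rate:
        return list(p1), list(p2)
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            g1, g2 = g2, g1
        c1.append(g1)
        c2.append(g2)
    return c1, c2


def mutate(chromosome, gene_domains, rng, rate=0.01):
    """Replace each gene, with probability rate, by a random value from its domain."""
    return [rng.choice(gene_domains[j]) if rng.random() < rate else g
            for j, g in enumerate(chromosome)]
```

In ACBGA, the offspring returned by these operators would still pass through chromosome screening before fitness evaluation.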

As shown in Fig. 5, each classification tree containing n nodes can be represented by a chromosome. Each node represents a rule that determines one classification result. The rule is a conjunction of conditions in the form of equality or inequality relations. Encoding a condition requires three genes, specifying “the enable/disable status”, “the equality/inequality operator”, and “the splitting value”. When there are k attributes, the number of genes for each nodei is 3*k+1 (denoted m in Fig. 5), where the additional gene (gi1) indicates the classification result. Therefore a chromosome consists of n*(3*k+1) genes in total. The constraint network contains the information extracted from the association rules, and the valuation scope of each gene gij is confined via constraint-based reasoning. The search effort for potential chromosomes can thus be reduced without having to invoke the fitness evaluation procedure for each candidate chromosome.

Fig. 5. The Framework of Constraint-Based Preprocessing for GA Operators (diagram: chromosome C, with genes g11 ... g1m for node 1 through gn1 ... gnm for node n, passes through Chromosome Screening, i.e., chromosome filtering using constraint-based reasoning over the constraint network, to yield chromosome C' with genes g'11 ... g'nm; each node either assigns its class g'i1 or, if its rule does not fire, passes the record on to the next node)
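The gene layout can be made concrete with a small decoding sketch. The order of genes within a node (class gene first, then an enable flag, an operator, and a splitting value per attribute) is an assumption consistent with gi1 being the class gene; decode and genes_per_node are hypothetical names. For instance, k = 6 attributes and n = 50 nodes would give 50 × 19 = 950 genes.

```python
def genes_per_node(k):
    """One class gene plus (enable, operator, splitting value) for each of k attributes."""
    return 3 * k + 1


def decode(chromosome, n, k):
    """Split a flat gene list into n nodes. Assumed per-node layout:
    [class, on_1, op_1, value_1, ..., on_k, op_k, value_k]."""
    m = genes_per_node(k)
    if len(chromosome) != n * m:
        raise ValueError(f"expected {n * m} genes, got {len(chromosome)}")
    nodes = []
    for i in range(n):
        genes = chromosome[i * m:(i + 1) * m]
        triples = [genes[1 + 3 * j:4 + 3 * j] for j in range(k)]
        # Keep only the enabled conditions as (attribute index, operator, value).
        active = [(j, op, value) for j, (on, op, value) in enumerate(triples) if on]
        nodes.append((genes[0], active))
    return nodes
```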

Fig. 6 illustrates the chromosome construction process in detail. In essence, the ACBGA system applies forward checking for chromosome filtering. In forward checking, when a value is assigned to the current gene, any value of the remaining genes that conflicts with the current gene is removed from the pool of potential candidates. In addition, the valid gene values are propagated through the network, enabling other constraints to restrict the valid value sets of the remaining genes. That is, whenever a new gene is considered, all of its values remaining in the pool are guaranteed to be consistent with the constraint network and the past gene assignments. Given the restricted valid range, denoted by SGj, the valuation process for gene gij then examines the current gene value: if gij belongs to SGj, it is kept (g’ij ← gij); otherwise, g’ij is replaced by a value randomly selected from SGj.

By repeatedly applying forward checking and the valuation process to chromosome C in the sequence g11, g12, … , gnm, the new chromosome C’ satisfies the constraint network. As a result, constraint-based reasoning offers an efficient way to guide the GA toward the optimal chromosome, because the GA searches only the space already filtered by the constraint network.
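The screening loop of Fig. 6 can be sketched on the running example. To stay short, the sketch assumes each node is reduced to three genes (class, X1, X2) rather than the full enable/operator/value encoding, and the allowed tuples are taken to be the union of assorule1 and assorule2; screen_node and screen_chromosome are hypothetical names.

```python
import random

# Allowed (Y, X1, X2) tuples: the union of assorule1 and assorule2.
RELATION = {
    ("yes", "a", "d"), ("yes", "b", "d"), ("yes", "b", "e"),
    ("no", "a", "d"), ("no", "a", "e"), ("no", "c", "e"),
}


def screen_node(genes, relation, rng):
    """Repair one node's genes left to right (class gene first).

    S_Gj is the set of values for gene j that still appear in some allowed
    tuple agreeing with the genes already fixed; a consistent gene is kept,
    an inconsistent one is replaced by a random member of S_Gj."""
    repaired = []
    for j, gene in enumerate(genes):
        alive = [t for t in relation if list(t[:j]) == repaired]
        valid = sorted({t[j] for t in alive})  # S_Gj, never empty by construction
        repaired.append(gene if gene in valid else rng.choice(valid))
    return repaired


def screen_chromosome(chromosome, relation, rng):
    """Apply screen_node to every node of chromosome C to obtain C'."""
    return [screen_node(node, relation, rng) for node in chromosome]
```

A node such as (yes, c, e) keeps its class gene, has “c” replaced by “a” or “b”, and ends up as an allowed tuple without any fitness evaluation being spent on the invalid original.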

Fig. 6. The Detail Illustration for Chromosome Screening (flowchart: loop over nodes i = 1, ..., n and genes j = 1, ..., m of chromosome C; Ci = classification result (gi1); forward checking over the constraint network and the domain ranges of the genes gives the satisfied gene values SGj; if gij ∈ SGj then g'ij ← gij, otherwise g'ij ← a value selected randomly from SGj; the result is chromosome C')

4. The Experiments and Results

Before experimenting with the electronic ballast data, the car evaluation data set from the UCI repository (Blake & Merz, 1998) was used to validate the proposed approach. In addition, two other approaches, a simple GA (SGA) and the apriori algorithm with a GA (AGA), were compared against the ACBGA approach.

In general, the association rules extracted by the apriori algorithm vary with the chosen support and confidence values, and different rule sets may affect ACBGA’s learning performance differently. Suitable values could be found by trying different combinations of minimum support and confidence. Instead, this research first computed, for each classification result, the ratio of the number of records with that result to the total number of records:

$$\text{Ratio}_i = \frac{\text{number of data records whose classification result is } \text{Class}_i}{\text{number of all data records}}, \qquad i = 1, \dots, n,$$

where n is the total number of classification results. The support level for each classification result is then defined as

$$\text{SupportDiscount}_j \times \text{Ratio}_i, \qquad \text{SupportDiscount}_j = j \times 10\%, \qquad j = 1, \dots, 10.$$

That is, the support level for Classi is increased incrementally from 10% to 100% of Ratioi. Although a different SupportDiscountj could be applied to each Ratioi, for simplicity this research applied the same SupportDiscountj to every Ratioi. Similarly, the confidence level for each classification result is first set to 100% and then lowered step by step if the learned pattern is unsatisfactory.
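The support schedule can be computed directly. The sketch below is a minimal illustration with hypothetical function names; the class labels in the usage are made up, and the same discount is applied to every class, as in this research.

```python
from collections import Counter


def class_ratios(labels):
    """Ratio_i: share of records whose classification result is class i."""
    total = len(labels)
    return {c: count / total for c, count in Counter(labels).items()}


def min_support_levels(ratios, j):
    """Per-class support SupportDiscount_j * Ratio_i, with SupportDiscount_j = j * 10%."""
    if not 1 <= j <= 10:
        raise ValueError("j must be in 1..10")
    discount = j * 0.10
    return {c: discount * r for c, r in ratios.items()}


ratios = class_ratios(["unacc"] * 6 + ["acc"] * 4)
schedule = [min_support_levels(ratios, j) for j in range(1, 11)]
```

Scaling by each class’s own ratio keeps rare classes from being starved of rules, since a single global threshold tuned for the majority class would filter out most rules for small classes.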

The classification trees generated by each of the three approaches were evaluated with five-fold cross validation. That is, each training stage used 4/5 of the data records, and the remaining 1/5 were used for the testing stage. The GA parameter settings for both applications are summarized in Table 2.
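Five-fold cross validation can be sketched as an index split. The shuffling seed and the interleaved fold assignment below are assumptions; any partition into five equal parts would serve.

```python
import random


def five_fold_splits(n_records, seed=0):
    """Yield (train_indices, test_indices) for 5-fold cross validation:
    each fold is tested once while the other four folds (about 4/5) train."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::5] for k in range(5)]
    for k in range(5):
        train = [i for f, fold in enumerate(folds) if f != k for i in fold]
        yield train, folds[k]


splits = list(five_fold_splits(250))
```

Reported figures are then averages over the five (train, test) pairs, which is what “based on 5-fold average” refers to in the tables and figures.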


Table 2. GA Parameter Settings

| Item | Value |
|---|---|
| Population size | 100 |
| Generations | 100/200/300/400/500 |
| Crossover rate | 0.6 |
| Mutation rate | 0.01 |
| Crossover method | Uniform |
| Selection method | Roulette wheel |
| Maximum tree node number (UCI Car Evaluation data) | 50 |
| Maximum tree node number (Electronic Ballast data) | 30 |
| Average training time (UCI Car Evaluation data) | 1.9 minutes* |
| Average training time (Electronic Ballast data) | 0.37 minutes* |

(*) The hardware platform is a Pentium III 1.0 GHz with 512 MB RAM.

The Car Evaluation Problem

The 1728 car evaluation records consist of six categorical input attributes and one categorical output attribute. Data from the six categorical attributes were fed into the apriori algorithm to produce the association rules. The final training and testing results are summarized in Table 3 and depicted in Figs. 7 and 8. These results are based on SupportDiscount (=10%) and confidence (=100%). The derived association rule sets consist of 62 rules for the “unacc”, 83 for the “acc”, 155 for the “good”, and 159 for the “v-good” output categories. To obtain more details about the learning progress of the three approaches, the learning behavior was recorded in sessions. Figs. 9 and 10 depict the learning progress monitored over generations and time. Appendix A presents the details of each tree node for one of the better classification trees derived. With this classification tree, the accuracy rates for the training and testing stages reach 93.2% and 91.91%, respectively.

Table 3. The Summarized Learning Performance for SGA, AGA, and ACBGA (based on 5-fold average)

Fig. 7. Training Results with Various Generations (based on 5-fold average)

Fig. 8. Testing Results with Various Generations (based on 5-fold average)

Fig. 9. Training Results with Various Generations (based on 5-fold average)

Fig. 10. Testing Results with Various Generations (based on 5-fold average)

The Electronic Ballast Problem

Two hundred and fifty electronic ballast maintenance records for Boeing 747-400 aircraft, from the accessory shop of a major airline in Taiwan, were used to construct the troubleshooting model. The training and testing performance are summarized in Table 4 and depicted in Figs. 11 and 12. These results are also based on SupportDiscount (=10%) and confidence (=100%). The derived association rule sets consist of 157 rules for the “C1”, 62 for the “C2”, 62 for the “C3”, 117 for the “C4”, 187 for the “C5”, 107 for the “C6”, 119 for the “C7”, 178 for the “C8”, 177 for the “C9”, and 196 for the “C10” output categories. To obtain more details about the learning progress of the three approaches, the learning behavior was recorded in sessions. Figs. 13 and 14 depict the learning progress monitored over generations and time. Appendix B presents the details of each tree node for one of the superior classification trees derived. With this classification tree, the accuracy rates for the training and testing stages reach 84.5% and 84%, respectively.

Table 4. The Summarized Learning Performance for SGA, AGA, and ACBGA (based on 5-fold average)

| Generations | SGA Train | SGA Test | AGA Train | AGA Test | ACBGA Train | ACBGA Test |
|---|---|---|---|---|---|---|
| 100 | 58.30% | 56.40% | 76.90% | 76.40% | 83.30% | 79.60% |
| 200 | 67.60% | 63.20% | 78.60% | 75.20% | 84.10% | 79.60% |
| 300 | 68.80% | 64.00% | 79.30% | 74.80% | 84.50% | 80.40% |
| 400 | 71.60% | 68.00% | 79.30% | 74.80% | 85.00% | 80.80% |
| 500 | 72.40% | 68.00% | 80.00% | 74.80% | 85.10% | 80.40% |

Fig. 11. Training Results with Various Generations (based on 5-fold average)

Fig. 12. Testing Results with Various Generations (based on 5-fold average)

Fig. 13. The Learning Progress over Generations (based on 5-fold average)

5. Discussion

According to the results above, ACBGA achieves better learning performance than SGA and AGA in terms of both computational efficiency and accuracy on both the car evaluation and electronic ballast data sets. By employing the apriori algorithm, partial knowledge is extracted and transformed into seeding chromosomes. The effect can be seen in the initial training accuracy, which improves greatly, from 50% to 67% for the car evaluation data (Fig. 9) and from 21% to 54% for the electronic ballast data (Fig. 13).

Fig. 14. The Learning Progress over Time (based on 5-fold average)

As displayed in Fig. 9, the ACBGA approach converges at around 300 generations, while AGA and SGA are still progressing at 500 generations with inferior learning accuracy on the car evaluation data. This pattern is even more pronounced for the electronic ballast data. As depicted in Figs. 10 and 14, the testing performance on both data sets follows patterns similar to those in the training stages.

For the ACBGA approach, the derived partial knowledge is not only encoded as seeding chromosomes but also converted into the constraint network. As the learning-progress figures show, ACBGA’s margin over AGA is smaller than its margin over SGA. Nevertheless, the improvement of ACBGA over AGA demonstrates its learning effectiveness on both application data sets.

6. Conclusions

Aircraft maintenance is now recognized as one of the most important airline activities, both for improving flight safety and for gaining worldwide competitive strength. Sharing repair experience through state-of-the-art computer technology helps improve the productivity of aircraft maintenance. This study has proposed the ACBGA approach to aid aircraft electronic ballast maintenance. The ACBGA approach can be employed by maintenance mechanics in the accessory shop to help them obtain the knowledge, skills, and experience required for effective electronic ballast repair and maintenance. In addition to the electronic ballast, there are millions of other components embedded in the aircraft system. Most of them need short repair times to reduce opportunity costs, because inefficient aircraft maintenance services can lead to flight delays, cancellations, or even flight accidents.

The ACBGA approach hybridizes constraint-based reasoning with a genetic algorithm for classification tree induction. Incorporating partial knowledge or user-control information into the mining process is not straightforward and typically requires the design of novel approaches. By employing an association rule algorithm to acquire partial knowledge from data in advance, our proposed approach induces a classification tree by pushing this partial knowledge into chromosome construction. Most importantly, incorporating a constraint-based reasoning mechanism into the GA process filters out invalid chromosomes, so feasible solutions can be derived more efficiently.

Compared with SGA and AGA, ACBGA achieves higher predictive accuracy and requires less computation time to construct a classification tree for electronic ballast troubleshooting. In addition, the classification trees discovered by ACBGA may also yield knowledge that is more transparent or significant to users.

The proposed ACBGA is generic and problem independent. It does not require any proprietary genetic operators or chromosome representations to interact with constraint-based reasoning. Besides integrating with the apriori algorithm for knowledge preprocessing, the ACBGA approach offers a way for domain experts to supply professional knowledge or constraints that help reveal further crucial or interesting knowledge. The approach is applicable not only to binary classification problems but also to multi-category classification problems.

Currently, the ACBGA approach can determine tree splitting nodes that allow complex, rule-set-like discriminating formats such as the relationship “Attributei <= w ∗ Attributej”. In the future, this format can be extended to express more complicated multivariate linear or nonlinear inequalities. In addition, settings such as the maximum number of tree nodes allowed, the minimum (or maximum) number of data records in a tree node, and the granularity of SupportDiscount can be investigated further in terms of their impact on the learning time required and the resulting learning effectiveness.

Intelligent Aircraft Maintenance Support System Using Genetic Algorithms and Case-Based Reasoning

Abstract

The maintenance of aircraft components is crucial for avoiding aircraft accidents and aviation fatalities. To provide reliable and effective maintenance support, it is important for airline companies to use previous repair experience with the aid of advanced decision support technology. Case-Based Reasoning (CBR) is a machine learning method that adapts previous similar cases to solve current problems. To retrieve similar aircraft maintenance cases effectively, this research proposes a CBR system to aid electronic ballast fault diagnosis for Boeing 747-400 airplanes. By employing a genetic algorithm (GA) to enhance dynamic feature weighting as well as the design of non-linear similarity functions, the proposed CBR system achieves better learning performance than systems with either equal/varied weights or linear similarity functions.

Keywords: Aircraft Maintenance, Electronic Ballast, Case-Based Reasoning, Genetic Algorithms

1. INTRODUCTION

Airplanes in operation throughout the world call for appropriate maintenance to assure flight safety and quality. When an aircraft component fault emerges, fault diagnosis and troubleshooting must be carried out promptly and effectively. An airplane contains many electronic components, among which the electronic ballast is a common component that controls the cabin fluorescent lamps. The electronic ballast plays an important role in providing proper lighting for passengers and flight crews during a flight. Unstable cabin lighting, such as flashing and ON/OFF problems, is a common problem in airplanes. An airplane usually has hundreds of electronic ballasts mounted in panels such as the light deflector of a fluorescent lamp fixture. When an electronic ballast is abnormal, it has to be removed and sent to the accessory shop for further investigation.

The maintenance records of electronic ballasts generally contain information about the number of defective units found, the procedures taken, and the inspection or repair status. These records are stored and used to help mechanics identify faults and determine which components need repair or replacement, because previous similar solutions may provide valuable troubleshooting clues for new faults.

Like reasoning by analogy, CBR is a machine learning method that adapts previous similar cases to solve current problems. CBR shows significant promise for improving the effectiveness of complex and unstructured decision making. It is a problem-solving technique that resembles the decision-making process used in many real-world applications. This study considers CBR an appropriate approach to aid aircraft mechanics in dealing with the electronic ballast maintenance problem. Basically, CBR systems make inferences by analogy, drawing on similar experiences to solve problems. Similarity measurements between pairs of features play a central role in CBR (Kolodner, 1992). However, designing an appropriate case-matching process for the retrieval step is still challenging. For the effective retrieval of previous similar cases, this research develops a CBR system in which GA mechanisms enhance dynamic feature weighting as well as the design of non-linear similarity functions. The GA is an optimization technique inspired by biological evolution (Holland, 1975). Based on the concept of natural evolution, a GA breeds a population of new answers from the old ones using a methodology based on survival of the fittest. In this research, the GA is used to determine not only the fittest non-linear similarity functions but also the optimal feature weights.

Using GA mechanisms to enhance the case retrieval process, a CBR system is developed to aid fault diagnosis of the electronic ballasts of Boeing 747-400 airplanes. Three hundred electronic ballast maintenance records of Boeing 747-400 airplanes were gathered from the accessory shop of one major airline in Taiwan. The results demonstrated that the approach with non-linear similarity functions and dynamic weights achieved better learning performance than approaches with either linear similarity functions or equal/varied weights.

2. LITERATURE REVIEW

2.1 Case-Based Reasoning

CBR is a relatively new method in artificial intelligence (AI). It is a general problem-solving method that takes advantage of the knowledge gained from experience and attempts to adapt previous similar solutions to solve a particular current problem. As shown in Figure 1, CBR can be conceptually described by a CBR cycle that comprises several activities (Dhar & Strin, 1997): (A) retrieving similar cases from the case base, (B) matching the input and retrieved cases, (C) adapting the solutions suggested by retrieved similar cases to better fit the new problem, and (D) retaining the new solution once it has been confirmed or validated.

A CBR system gains an understanding of a problem by collecting and analyzing case feature values. In a CBR system, the retrieval of similar cases relies on a similarity metric that computes the distance between pairs of case features. Generally, the similarity metric and the feature weights are key to CBR performance (Kim & Shin, 2000). A CBR system can be ineffective at retrieving similar cases if its case-matching mechanism is not appropriately designed.

For aircraft maintenance problems, CBR is a promising approach for retrieving similar cases to diagnose faults as well as to provide appropriate repair solutions. Several studies have applied CBR to problems in the airline industry. Richard (1997) developed CBR diagnostic software for aircraft maintenance. Magaldi (1994) proposed applying CBR to aircraft troubleshooting on the flight line. Other CBR applications include flight condition monitoring and fault diagnosis for aircraft engines (Vingerhoeds et al., 1995), service parts diagnosis for improving service productivity (Hiromitsu et al., 1994), and data mining for predicting aircraft component replacement (Sylvain et al., 1999).

Figure 1. A CBR Cycle (diagram: Input Cases and the Case Base feed Matching and Retrieval, which yields Retrieved Cases; the cycle activities are labeled A to D)

Most of these CBR systems use an n-dimensional vector space to measure the similarity distance between input and retrieved cases. For example, Sylvain et al. (1999) adopted the nearest neighbor method. However, few studies have attempted to employ dynamic weighting with non-linear similarity functions to develop fault diagnosis models for aircraft maintenance.

2.2 Genetic Algorithms for Feature Weighting

In general, feature weights can be used to denote the relevance of case features to a particular problem. Wettschereck et al. (1997) made an empirical evaluation of feature weighting methods and concluded that they have a substantially higher learning rate than unweighted k-nearest neighbor methods. Kohavi et al. (1995) observed that feature weighting methods perform better than feature selection methods. Langley and Iba (1993) pointed out that when some features are irrelevant to the prediction task, appropriate feature weights can substantially increase the learning rate.

Several studies have applied GAs to determine the most suitable feature weights. A GA models the processes of genetic evolution and natural selection. A GA procedure usually consists of a population of chromosomes, a ‘fitness’ evaluation function, and three basic genetic operators: ‘reproduction’, ‘crossover’, and ‘mutation’. Initially, chromosomes in the form of binary strings are generated randomly as candidate solutions to the problem being addressed. A fitness value representing the goodness of each candidate solution is then computed for every chromosome through the fitness function. Chromosomes with higher fitness values are selected to generate better offspring for the new population through the genetic operators. Conceptually, the unfit are eliminated, and the fit survive to contribute genetic material to subsequent generations.
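The generic GA loop described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names (`roulette_select`, `crossover`, `mutate`, `run_ga`), the single-point crossover, the parameter values, and the toy "count the 1-bits" fitness are all assumptions chosen for illustration.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Reproduction: pick one chromosome with probability proportional to fitness."""
    total = sum(fitnesses)
    if total == 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for chrom, fit in zip(population, fitnesses):
        running += fit
        if running > pick:
            return chrom
    return population[-1]

def crossover(a, b, rng, p_cross):
    """Single-point crossover, applied with probability p_cross."""
    if rng.random() < p_cross and len(a) > 1:
        point = rng.randint(1, len(a) - 1)
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]

def mutate(chrom, rng, p_mut):
    """Flip each bit independently with probability p_mut."""
    return [1 - g if rng.random() < p_mut else g for g in chrom]

def run_ga(fitness, n_bits, pop_size=20, generations=50,
           p_cross=0.7, p_mut=0.01, seed=0):
    """Evolve binary strings and return the fittest chromosome seen."""
    rng = random.Random(seed)
    # Random binary strings serve as the initial candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        fits = [fitness(c) for c in population]
        offspring = []
        while len(offspring) < pop_size:
            # Fitter parents are chosen more often.
            a = roulette_select(population, fits, rng)
            b = roulette_select(population, fits, rng)
            c1, c2 = crossover(a, b, rng, p_cross)
            offspring += [mutate(c1, rng, p_mut), mutate(c2, rng, p_mut)]
        population = offspring[:pop_size]
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best

# Hypothetical toy fitness: the number of 1-bits in the chromosome.
best = run_ga(sum, n_bits=16)
print(best, sum(best))
```

Replacing the toy fitness with a retrieval-accuracy measure would turn the same loop into a feature-weight search of the kind the cited studies describe.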

Wilson and Martinez (1996) proposed a GA-based weighting approach that performed better than the unweighted k-nearest neighbor method. For large-scale feature selection, Siedlecki and Sklansky (1989) introduced a GA-based 0-1 weighting process. Kelly and Davis (1991) proposed a GA-based weighted k-NN approach (GA-WK-NN) that had lower error rates than the standard k-NN. Brill et al. (1992) demonstrated fast feature selection using GAs for neural network classifiers.

Although the above studies used GA mechanisms to determine feature weights for case retrieval, few have applied a GA to determine the feature weights and the corresponding non-linear similarity functions simultaneously. This paper applies GA mechanisms to determine both the optimal feature weights and the most appropriate non-linear similarity functions for case features. A CBR system is developed to diagnose faulty electronic ballast accessories for Boeing 747-400 airplanes.

3. METHODOLOGY

3.1 Linear Similarity

From the case base, a CBR system retrieves an old case that is similar to the input case. As shown in Figure 2, the retrieval process compares all feature values of the retrieved case with those of the input case, where $f_i^I$ and $f_i^R$ are the values of feature i in the input and retrieved cases, respectively. There are many evaluation functions for measuring the degree of similarity. One numerical function, based on the standard Euclidean distance metric, is shown in formula (Eq.1), where $W_i$ is the weight of the ith feature. The feature weights are usually statically assigned a set of fixed values known in advance, or all set equal to 1 if no priorities have been determined.

Figure 2. Feature Values (features 1 to n of the retrieved case, $f_1^R, f_2^R, \ldots, f_n^R$, aligned with those of the input case, $f_1^I, f_2^I, \ldots, f_n^I$)

$$\sum_{i=1}^{n} W_i \times \left(f_i^I - f_i^R\right)^2 \qquad \text{(Eq.1)}$$
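Formula (Eq.1) translates directly into code. The sketch below assumes the features are passed as equal-length numeric lists, with categorical flags coded as 0 or 1; the function name and the example values are hypothetical. As in the formula, no square root is taken, and a smaller value means a more similar case.

```python
from typing import Sequence

def linear_distance(weights: Sequence[float],
                    f_input: Sequence[float],
                    f_retrieved: Sequence[float]) -> float:
    """Weighted distance of Eq.1: sum over i of W_i * (f_i^I - f_i^R)^2."""
    if not (len(weights) == len(f_input) == len(f_retrieved)):
        raise ValueError("weights and feature vectors must have the same length")
    return sum(w * (fi - fr) ** 2
               for w, fi, fr in zip(weights, f_input, f_retrieved))

# Hypothetical cases: four currents in amps, then two 0/1 symptom flags.
input_case = [0.8, 0.4, 0.0, 0.0, 1, 0]
retrieved_case = [0.6, 0.4, 0.1, 0.0, 1, 1]
equal_weights = [1.0] * 6  # the "all weights equal to 1" setting
print(linear_distance(equal_weights, input_case, retrieved_case))
```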

3.2 Non-Linear Similarity

Based on formula (Eq.1), this study proposes a non-linear similarity approach. The linear and non-linear similarity approaches differ only in the definition of the distance function. In the non-linear approach, the term $(f_i^I - f_i^R)^2$ is replaced by the distance measurement $[(f_i^I - f_i^R)^2]^k$, as shown in formula (Eq.2).

$$\sum_{i=1}^{n} W_i \times \left[\left(f_i^I - f_i^R\right)^2\right]^k \qquad \text{(Eq.2)}$$

where k is the exponent applied to the squared Euclidean distance term between the corresponding input and retrieved feature values. A GA mechanism is proposed to compute the optimal k value for each case feature, with k chosen from 1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4, and 5. Figure 3 depicts the curves $y = x^k$ for $x \in [0, 1]$ with these values of k.

Figure 3. The Illustration for Linear and Non-Linear Functions (curves of $y = x^k$ for $x \in [0, 1]$ with k = 1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4, and 5)
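A sketch of formula (Eq.2) follows, assuming one exponent per feature (the text computes an optimal k for each case feature) drawn from the nine candidate values listed above. The names and sample numbers are hypothetical. The loop prints $y = x^k$ at x = 0.25 to show the effect plotted in Figure 3: for a squared difference below 1, an exponent below 1 enlarges it and an exponent above 1 shrinks it.

```python
from typing import Sequence

# Candidate exponents k given in the text.
CANDIDATE_K = [1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4, 5]

def nonlinear_distance(weights: Sequence[float],
                       exponents: Sequence[float],
                       f_input: Sequence[float],
                       f_retrieved: Sequence[float]) -> float:
    """Eq.2 with a per-feature exponent: sum of W_i * ((f_i^I - f_i^R)^2)^k_i."""
    return sum(w * ((fi - fr) ** 2) ** k
               for w, k, fi, fr in zip(weights, exponents, f_input, f_retrieved))

# y = x^k on [0, 1]: k < 1 bends the curve above y = x, k > 1 below it.
for k in CANDIDATE_K:
    print(f"k = {k:.3g}: 0.25 ** k = {0.25 ** k:.4f}")
```

With every exponent set to 1, the function reduces to the linear distance of formula (Eq.1).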

3.3 Static Feature Weighting

In addition to the linear or non-linear type of similarity function, the feature weights $W_i$ also influence the distance metric. Feature weighting can be either static or dynamic. The static weighting approach assigns fixed feature weights to all case features throughout the entire retrieval process. In static feature weighting, the weights can be either identical or varied. The weights are usually assigned fixed values known in advance, or all set equal to 1 if no priorities have been determined. For varied feature weighting, this study proposes another GA mechanism to determine the most appropriate weight for each feature.

3.4 Dynamic Feature Weighting

For the dynamic weighting approach, feature weights are determined according to the context of each input case. As shown in Figure 4, a given input case is compared with m retrieved cases in the case base. Here i = 1 to n, where n is the total number of features in a case, and j = 1 to m, where m is the total number of retrieved cases in the case base. $f_{ij}^R$ is the ith feature value of the jth retrieved case, and $f_i^I$ is the ith feature value of the input case. $O_j^R$ is the outcome feature value of the jth retrieved case, and $O^I$ is the outcome feature value of the input case.

Figure 4. The Denotation of Features and Outcome Feature Values (the input case has features $f_1^I, \ldots, f_n^I$ and outcome $O^I$; the jth retrieved case, for j = 1 to m, has features $f_{1j}^R, \ldots, f_{nj}^R$ and outcome $O_j^R$)

Assume that the outcome feature is categorical with p categories. For categorical input features, the weights are computed using formula (Eq.3).

$$W_i = \frac{\max_{t} L_{it}}{E_i} \qquad \text{(Eq.3)}$$

where i = 1 to n, and n is the number of case features; t = 1 to p, and p is the number of categories of the outcome feature. $E_i$ is the number of retrieved cases for which $f_{ij}^R$ equals $f_i^I$, and $L_{it}$ is the number of retrieved cases for which $f_{ij}^R$ equals $f_i^I$ and $O_j^R$ is the tth category. For features with continuous values, the weights cannot be generated in this way unless the feature values are discretized in advance. Although there are various ways to discretize, this study proposes another GA mechanism to discretize the continuous feature values. For the ith feature, a GA procedure computes an optimal value, say $A_i$, that forms a range centered on $f_i^I$. Let $K_i$ denote the number of cases whose $f_{ij}^R$ lies between $(f_i^I - A_i)$ and $(f_i^I + A_i)$. $E_i$ in formula (Eq.3) is then replaced by $K_i$, and the feature weights are computed as shown in formula (Eq.4).

$$W_i = \frac{\max_{t} L_{it}}{K_i} \qquad \text{(Eq.4)}$$
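The dynamic weights of formulas (Eq.3) and (Eq.4) can be computed for each input case as in the sketch below. It rests on assumptions the text does not settle: $L_{it}$ for a continuous feature is counted over the same cases that fall within $[f_i^I - A_i, f_i^I + A_i]$, a weight of 0 is returned when no retrieved case matches, and the function names, the value $A_1 = 0.1$, and the example data are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence

def feature_weight(input_value, retrieved_values: Sequence,
                   retrieved_outcomes: Sequence[str],
                   half_width: Optional[float] = None) -> float:
    """W_i = max_t L_it / E_i (Eq.3) or max_t L_it / K_i (Eq.4).

    half_width is None for a categorical feature (exact match, count E_i);
    otherwise it is A_i and a case matches inside the range (count K_i).
    Assumption: L_it is counted over the same matching cases.
    """
    if half_width is None:
        matches = [o for v, o in zip(retrieved_values, retrieved_outcomes)
                   if v == input_value]
    else:
        lo, hi = input_value - half_width, input_value + half_width
        matches = [o for v, o in zip(retrieved_values, retrieved_outcomes)
                   if lo <= v <= hi]
    if not matches:
        return 0.0  # assumption: no matching case gives weight 0
    return max(Counter(matches).values()) / len(matches)

def dynamic_weights(input_case: Sequence, retrieved_cases: List[Sequence],
                    retrieved_outcomes: Sequence[str],
                    half_widths: Sequence[Optional[float]]) -> List[float]:
    """One weight per feature, recomputed for every input case."""
    return [feature_weight(input_case[i], [case[i] for case in retrieved_cases],
                           retrieved_outcomes, a_i)
            for i, a_i in enumerate(half_widths)]

input_case = [0.8, 1]                      # one current reading, one 0/1 flag
retrieved = [[0.72, 1], [0.85, 0], [1.5, 1], [0.78, 1]]
outcomes = ["C1", "C2", "C1", "C1"]
half_widths = [0.1, None]                  # hypothetical A_1; flag is categorical
print(dynamic_weights(input_case, retrieved, outcomes, half_widths))
```

For the current reading, three cases fall in the range and two of them share category C1, so its weight is 2/3; all three cases matching the flag are C1, so that weight is 1.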

Based on formulas (Eq.3) and (Eq.4), each input case has a corresponding set of feature weights in this dynamic weighting approach.

3.5 The Experiment Design

Since both the feature weights and similarity measurements between pairs of features play a vital role in case retrieval, this research investigated the CBR performance by observing the effects resulting from the combinations of different feature weighting approaches and similarity functions.

As indicated in Figure 5, there are six approaches that combine different types of similarity functions and feature weighting methods. These are the Linear Similarity Function with Equal Weights (Approach A); Linear Similarity Function with Varied Weights (Approach B); Non-Linear Similarity Function with Equal Weights (Approach C); Non-Linear Similarity Function with Varied Weights (Approach D); Linear Similarity Function with Dynamic Weights (Approach E); and Non-Linear Similarity Function with Dynamic Weights (Approach F).

The three feature weighting approaches differ as follows. In the equal weights approach, all feature weights are set equal to 1. In the varied weights approach, a single set of feature weights is determined by the proposed GA procedure. In the dynamic weights approach, each input case has its own set of feature weights, determined dynamically from that input case.
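To make the six combinations concrete, the sketch below maps approaches (A) to (F) to a similarity function and a weighting method, and shows where the weights enter retrieval. It assumes that the outcome of the single most similar case (1-nearest neighbor) is returned as the prediction, which the text does not state. With all exponents equal to 1 the distance follows formula (Eq.1); otherwise it follows formula (Eq.2). The names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (similarity function, weighting method) for each approach in the text.
APPROACHES: Dict[str, Tuple[str, str]] = {
    "A": ("linear", "equal"),
    "B": ("linear", "varied"),
    "C": ("non-linear", "equal"),
    "D": ("non-linear", "varied"),
    "E": ("linear", "dynamic"),
    "F": ("non-linear", "dynamic"),
}

def retrieve_outcome(input_case: Sequence[float],
                     case_base: List[Sequence[float]],
                     outcomes: List[str],
                     weights: Sequence[float],
                     exponents: Sequence[float]) -> str:
    """Return the outcome of the closest case (1-NN, an assumption).

    Equal weights: [1, 1, ...]. Varied weights: one GA-tuned vector reused
    for every input. Dynamic weights: a vector recomputed for each input case.
    """
    def distance(case: Sequence[float]) -> float:
        return sum(w * ((fi - fr) ** 2) ** k
                   for w, k, fi, fr in zip(weights, exponents, input_case, case))
    nearest = min(range(len(case_base)), key=lambda j: distance(case_base[j]))
    return outcomes[nearest]

case_base = [[0.8, 1], [0.2, 0]]
outcomes = ["C1", "C2"]
# Equal weights with linear exponents, i.e. the approach (A) setting.
print(retrieve_outcome([0.7, 1], case_base, outcomes, [1, 1], [1, 1]))
```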

Figure 5. The Combinations of Similarity Functions and Feature Weighting Methods (a grid of similarity function, linear or non-linear, against feature weighting, equal, varied, or dynamic weights, whose cells are approaches (A) to (F))

4. THE EXPERIMENT AND RESULTS

4.1 Case Description

The aircraft electronic ballasts that drive fluorescent lamps can be mounted on a panel such as the light deflector of a fluorescent lamp fixture. Fluorescent lamps initially require a high voltage to strike the lamp arc and then a constant current to keep operating. Usually there is a connector at one end of the unit for routing all switching and power connections. As shown in Figure 6, the electronic ballast operates from 115 VAC/400 Hz aircraft power via control lines. When operating power is supplied, the electronic ballast starts and operates two rapid-start fluorescent lamps, or a single lamp, in the passenger cabin of various commercial aircraft, such as the Boeing 747-400, 737-300, 737-400, and 747-500. Two control lines connect the ballast set and the control panel for the ON/OFF and BRIGHT/DIM modes; DIM mode is used at night when the cabin crew want to lower the ambient light level in the cabin.

Figure 6. The Operational Setup for the Electronic Ballast (diagram: a control panel connected through control lines to a ballast set of electronic ballasts, which drives a lamp set of fluorescent lamps)

Three hundred electronic ballast maintenance records for the Boeing 747-400 from the accessory shop of one major airline in Taiwan were used to construct the troubleshooting system. Each maintenance case contains seven features: six input features identified as highly related to abnormal electronic ballast operation, and one outcome feature. As listed in Table 1, these features are either continuous or categorical. The outcome feature is the category of the set of replaced parts. For instance, category C1 denotes the replacement of a transformer (labeled T101 on the printed circuit board) and a capacitor (labeled C307). Category C2 denotes the replacement of an integrated circuit (labeled U300), a transistor (labeled Q301), and a fuse (labeled F401). Each category of the outcome feature represents a different set of replaced parts.

4.2 The GA Implementation

According to the experiment design, this study implements three GA procedures to determine (1) the optimal exponent k in the non-linear similarity functions; (2) the most appropriate set of varied weights for static feature weighting; and (3) the sets of feature weights for dynamic feature weighting. Developing a GA program requires several steps: chromosome encoding, fitness function specification, and internal control parameter specification. The details of each step are described below for the three GA applications in turn.

Table 1. The Case Description

| Feature | Data Type | Range |
|---|---|---|
| Input: Alternating Current on Bright Mode When Electronic Ballast Turns On | Continuous | 0 to 2 A |
| Input: Alternating Current on DIM Mode When Electronic Ballast Turns On | Continuous | 0 to 2 A |
| Input: Alternating Current on Bright Mode When Electronic Ballast Turns Off | Continuous | 0 to 2 A |
| Input: Alternating Current on DIM Mode When Electronic Ballast Turns Off | Continuous | 0 to 2 A |
| Input: Is Light Unstable When Electronic Ballast Turns On | Categorical | 0 and 1 |
| Input: Is It Not Illuminated When Electronic Ballast Turns On | Categorical | 0 and 1 |
| Outcome: Components Replacement | Categorical | C1, C2, …, C10 |

Non-Linear Similarity

Chromosomes encode the exponent k of each non-linear similarity function. Because a case has six input features, each chromosome consists of six genes encoding the exponents of the six corresponding non-linear functions. Each chromosome is assigned a fitness value based on formula (Eq.5). The population size was set to 50, selection used the roulette wheel method, crossover was uniform with a probability of 0.5, the mutation probability was 0.06, and the learning process stopped after 10,000 generations.
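The first GA procedure can be sketched as follows, under assumptions the text does not state: each gene stores an index into the nine candidate exponents; the fitness of formula (Eq.5) is treated as the training error rate, with $C_j = 1$ marking a misclassified training case, since the quantity is minimized and reported as a mean error; each training case is predicted by its nearest other training case (leave-one-out, 1-nearest neighbor) with equal weights; and roulette-wheel selection scores each chromosome by 1 minus its error rate, because the wheel favors larger values. The population size and the crossover and mutation probabilities default to the values reported here, while the generation count is reduced from 10,000 to 30 and the data are a hypothetical toy set so the example runs quickly.

```python
import random
from typing import List, Tuple

CANDIDATE_K = [1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4, 5]

def predict(query, cases, outcomes, exponents, skip=None):
    """1-NN outcome with equal weights and per-feature exponents (Eq.2)."""
    best_j, best_d = None, float("inf")
    for j, case in enumerate(cases):
        if j == skip:  # leave the query case itself out
            continue
        d = sum(((fi - fr) ** 2) ** k for k, fi, fr in zip(exponents, query, case))
        if d < best_d:
            best_j, best_d = j, d
    return outcomes[best_j]

def error_rate(genes, cases, outcomes) -> float:
    """Eq.5 as an error rate: (1/q) * sum of C_j, C_j = 1 if case j is misclassified."""
    exponents = [CANDIDATE_K[g] for g in genes]
    wrong = sum(predict(case, cases, outcomes, exponents, skip=j) != outcomes[j]
                for j, case in enumerate(cases))
    return wrong / len(cases)

def roulette(pop, scores, rng):
    """Select with probability proportional to score (uniform if all are 0)."""
    if sum(scores) == 0:
        return rng.choice(pop)
    return rng.choices(pop, weights=scores, k=1)[0]

def evolve_exponents(cases, outcomes, pop_size=50, generations=30,
                     p_cross=0.5, p_mut=0.06, seed=0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n_genes = len(cases[0])
    pop = [[rng.randrange(len(CANDIDATE_K)) for _ in range(n_genes)]
           for _ in range(pop_size)]
    best, best_err = pop[0][:], float("inf")
    for _ in range(generations):
        errors = [error_rate(g, cases, outcomes) for g in pop]
        for g, e in zip(pop, errors):
            if e < best_err:
                best, best_err = g[:], e
        scores = [1.0 - e for e in errors]
        nxt = []
        while len(nxt) < pop_size:
            a, b = roulette(pop, scores, rng)[:], roulette(pop, scores, rng)[:]
            if rng.random() < p_cross:  # uniform crossover: swap each gene with prob. 0.5
                for i in range(n_genes):
                    if rng.random() < 0.5:
                        a[i], b[i] = b[i], a[i]
            for child in (a, b):
                for i in range(n_genes):
                    if rng.random() < p_mut:  # mutation: draw a new exponent index
                        child[i] = rng.randrange(len(CANDIDATE_K))
                nxt.append(child)
        pop = nxt[:pop_size]
    return best, best_err

# Hypothetical toy training set: feature 0 separates the classes, feature 1 is noise.
cases = [[0.1, 0.9], [0.2, 0.1], [0.15, 0.5], [0.9, 0.8], [0.8, 0.2], [0.85, 0.5]]
outcomes = ["C1", "C1", "C1", "C2", "C2", "C2"]
genes, err = evolve_exponents(cases, outcomes)
print([CANDIDATE_K[g] for g in genes], err)
```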

$$\text{Minimize} \quad \textit{fitness} = \frac{1}{q}\sum_{j=1}^{q} C_j \qquad \text{(Eq.5)}$$

where j = 1 to q, and q is the number of training cases. $C_j$ is set to 1 if the predicted outcome feature differs from the actual outcome feature for the jth training case, and to 0 otherwise, so the fitness is the error rate on the training cases.

Varied Weights

Chromosomes encode a set of feature weights whose values range over [0, 1]. The fitness function is again defined by formula (Eq.5). As for the GA parameters, the population size was set to 50, the probability of mutation was 0.06, the probability of crossover was 0.5, and the learning process stopped after 10,000 generations.

Dynamic Weights

Chromosomes encode the values $A_i$ that form a range centered on $f_i^I$ for the continuous features. The fitness of each chromosome in the population is again calculated by formula (Eq.5). As for the GA parameters, the mutation rate was 0.009, and the other settings were the same as those used for the varied weights.

4.3 The Results

The case base is divided into training and testing sets in a 2:1 ratio; that is, 200 maintenance cases of Boeing 747-400 electronic ballasts are used for training and the remaining 100 cases for testing. All approaches are evaluated with 3-fold cross-validation, and the results are listed in Table 2. Approach (F), with non-linear similarity functions and dynamic weights, performs best, with a mean error (ME) of 0.193 for training and 0.180 for testing.
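The evaluation protocol can be sketched as follows. The text does not say how the folds were formed or how fold results were combined, so this sketch assumes a seeded random shuffle into three folds of 100 cases and computes the mean error for a fold as the fraction of misclassified cases; averaging the three fold values into one ME would be a further assumption. The function names are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple

def three_fold_splits(n_cases: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each fold serves once as the test set."""
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::3] for f in range(3)]
    for f in range(3):
        train = [i for g in range(3) if g != f for i in folds[g]]
        yield train, folds[f]

def mean_error(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of cases whose predicted outcome differs from the actual one."""
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)

for train, test in three_fold_splits(300):
    print(len(train), len(test))  # 200 training and 100 testing cases per fold
```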

Table 2. The Mean Errors of Different Approaches

| Approach | Mean Error (training) | Mean Error (testing) |
|---|---|---|
| (A) Linear Similarity Function with Equal Weights | 0.240 | 0.223 |
| (B) Linear Similarity Function with Varied Weights | 0.213 | 0.220 |
| (C) Non-Linear Similarity Function with Equal Weights | 0.207 | 0.210 |
| (D) Non-Linear Similarity Function with Varied Weights | 0.200 | 0.203 |
| (E) Linear Similarity Function with Dynamic Weights | 0.233 | 0.230 |
| (F) Non-Linear Similarity Function with Dynamic Weights | 0.193 | 0.180 |

Looking more closely at the results, approach (A), with a linear similarity function and equal weights, has the worst training result (ME = 0.240). There is no obvious difference among the testing results of approaches (A), (B), and (E), all of which adopt linear similarity functions. Among the approaches adopting non-linear similarity functions, however, approach (F) with dynamic weights outperforms both approach (D) with varied weights and approach (C) with equal weights. This suggests that both non-linear similarity functions and dynamic weighting are important for a CBR system to retrieve relevant previous cases effectively.

5. CONCLUSIONS

An inefficient aircraft maintenance service may lead to flight delays, cancellations, or even accidents. Aircraft maintenance is therefore one of the most important activities for airlines seeking to improve flight safety and gain worldwide competitive strength. To improve maintenance productivity, this research developed a CBR system with GA mechanisms to enhance the retrieval of similar aircraft electronic ballast maintenance cases. Three GA procedures are proposed to determine the optimal non-linear similarity functions, the varied feature weights, and the dynamic feature weights, respectively. The experimental results demonstrated that the approach adopting both non-linear similarity functions and dynamic weights performs better than approaches with either linear similarity functions or equal/varied weights.

In addition to the electronic ballast, numerous other components are embedded in an aircraft system, and the proposed method could also be applied to them to shorten repair times and lower maintenance costs. Aircraft preventive maintenance is another important issue. In the future, such a troubleshooting component might be embedded into an aircraft preventive maintenance system based on historical data from flight data recorders (FDR), to help ensure safer and more comfortable flights.
