Department of Computer Science and Engineering, National Institute of Technology, Warangal – 506004

Learning Bayesian Classifiers Using Differential Evolution Algorithm for Variable Ordering

Project Guide: Dr. S. G. Sanjeevi (Head of the Department) – Associate Professor
12/31/2011

Shruti B – 8772 Mouli C R K – 8792 Divya B V – 8773

CONTENTS

1. Introduction
   a. Bayesian Network
   b. K2 Algorithm
   c. Learning Variable Ordering (VO)
2. Previous Experiments
   a. Evolutionary Algorithms (EAs)
   b. VOGA (Variable Ordering Genetic Algorithm)
      i. What is VOGA?
      ii. How is it implemented?
      iii. Experiment
3. Scope
   a. Differential Evolution
      i. Algorithm
4. Conclusion
5. References


INTRODUCTION

Bayesian Network:

A Bayesian Network (G) has a directed acyclic graph (DAG) structure. Each node in the graph

corresponds to a discrete random variable in the domain. An edge Y → X in the graph describes a

parent and child relation in which Y is the parent and X is the child. All parents of X constitute the parent

set of X, which is denoted by π_X. In addition to the graph, each node has a conditional probability

table (CPT) specifying the probability of each possible state of the node given each possible combination

of states of its parents. If a node has no parents, the table gives the marginal probabilities of the

node.
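To make this representation concrete, the following is a minimal sketch in Python of how a node, its parent set and its CPT could be stored. The node names, states and probability values are hypothetical and chosen only for illustration; this is not an implementation from the project.

# Minimal sketch of Bayesian Network nodes with CPTs (hypothetical example).
# Node "X" has parent set {Y}; both variables are binary with states 0 and 1.
node_X = {
    "parents": ["Y"],                    # the parent set pi_X
    "states": [0, 1],
    "cpt": {                             # P(X | Y): one distribution per parent configuration
        (0,): {0: 0.9, 1: 0.1},          # P(X | Y = 0)
        (1,): {0: 0.3, 1: 0.7},          # P(X | Y = 1)
    },
}

# A root node (no parents) only needs its marginal distribution.
node_Y = {
    "parents": [],
    "states": [0, 1],
    "cpt": {(): {0: 0.6, 1: 0.4}},       # P(Y)
}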

In the process of learning BNs from data, the BN variables represent the dataset attributes (or features).

When using algorithms based on heuristic search, the initial order of the dataset attributes may be an

important issue. Some of these algorithms depend on this ordering to determine the direction of the arcs, such

that an earlier attribute (in an ordered list) is a possible parent only of the later ones.

Instead of encoding a joint probability distribution over a set of random variables, a Bayesian Classifier

(BC) aims at correctly predicting the value of a designated discrete class variable given a vector of

attributes (predictors). Bayesian Network learning methods may be used to induce a BC, and that is the approach taken in

this work. The BN learning algorithm applied in our experiments is based on the K2 algorithm, which

constructs a BN from data and uses a heuristic search for doing so.

K2 Algorithm:

The K2 algorithm constructs a BN from data using a heuristic search. It receives as input a complete

database and a variable ordering (VO). Given these inputs, the K2 algorithm searches for the BN structure that best represents the database. The algorithm is commonly applied because of its low computational cost and the good results it yields when an adequate VO is supplied.

The attributes preorder assumption is used to reduce the number of possible structures to be learned.

In this sense, K2 uses an ordered list (containing all the attributes including the class), which asserts that

only the attributes positioned before a given attribute A may be parents of A. Hence, the first attribute

in the list has no parent, i.e. it is a root node in the BN.


The algorithm uses a greedy method to search for the best structure. It begins as if every node had no

parent. Then, beginning with the second attribute from the ordered list (the first one is a root node), the

possible parents are tested and those that maximize the score of the whole structure are added to the network. This process is repeated for all attributes in order to obtain the best possible structure. The K2 metric used to test each possible parent set for each variable is defined by the following equation.

g(x_i, π_{x_i}) = ∏_{j=1}^{q_i} [ (r_i − 1)! / (N_{ij} + r_i − 1)! ] · ∏_{k=1}^{r_i} N_{ijk}!

where each attribute x_i has r_i possible values {v_{i1}, ..., v_{i r_i}}; D is a dataset with m objects; each attribute x_i has a set of parents π_{x_i}, and q_i is the number of distinct instantiations of π_{x_i}; N_{ijk} is the number of objects in D in which x_i has the value v_{ik} and π_{x_i} is instantiated as w_{ij} (w_{ij} represents the j-th instantiation of π_{x_i} relative to D); finally, N_{ij} = ∑_{k=1}^{r_i} N_{ijk}.

With the best structure already defined, the network conditional probabilities are determined. It is done

using Bayesian estimation, with the (previously defined) network structure taken as given.

When dataset D has a distinguished class variable, K2 may be used as a BC learning algorithm. This is

exactly our assumption.
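To illustrate the ideas above, the sketch below implements the K2 metric in log form (log-factorials avoid numerical overflow) together with the greedy parent search for one fixed ordering. It is a minimal reading of the algorithm as described here, not the implementation used in our experiments; the function names and the max_parents limit are our own assumptions.

from itertools import product
from math import lgamma

def k2_score(i, parents, data, arities):
    """Log of the K2 metric g(x_i, pi_xi) for attribute i with the given parent set."""
    r_i = arities[i]
    score = 0.0
    for pa in product(*[range(arities[p]) for p in parents]):
        counts = [0] * r_i                           # N_ijk for this parent instantiation
        for row in data:
            if all(row[p] == v for p, v in zip(parents, pa)):
                counts[row[i]] += 1
        n_ij = sum(counts)                           # N_ij
        score += lgamma(r_i) - lgamma(n_ij + r_i)    # log[(r_i - 1)! / (N_ij + r_i - 1)!]
        score += sum(lgamma(c + 1) for c in counts)  # log[prod_k N_ijk!]
    return score

def k2(order, data, arities, max_parents=3):
    """Greedy K2 search: parents of a variable come only from earlier variables in `order`."""
    structure = {}
    for pos, x in enumerate(order):
        parents, best = [], k2_score(x, [], data, arities)
        improved = True
        while improved and len(parents) < max_parents:
            improved = False
            candidates = [c for c in order[:pos] if c not in parents]
            scored = [(k2_score(x, parents + [c], data, arities), c) for c in candidates]
            if scored and max(scored)[0] > best:
                best, c = max(scored)
                parents.append(c)
                improved = True
        structure[x] = (parents, best)
    return structure

Summing the per-variable log scores gives a log version of the overall g value used to compare structures.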

Learning Variable Ordering (VO):

Learning a Bayesian Network (BN) from data has become an active research topic over the last decade.

The search space for a BN with n variables has an exponential dimension. Therefore, finding the BN

structure that best represents the dependences among the variables is not a trivial task. This is an NP-complete problem; thus it is hard to identify the best solution for every application problem. Trying to

reduce the search space of this process, some restrictions are usually imposed and often the algorithms

obtain good results with acceptable computational effort. A very common restriction when learning a

BN is the definition of a prior variable ordering (VO). The same situation happens when trying to

learn a Bayesian Classifier (BC) from data. We present a genetic algorithm, namely VOGA (Variable Ordering Genetic Algorithm), to optimize the process of learning a BC from data by identifying a suitable VO. In general, genetic algorithms are capable of identifying and exploiting characteristics of the problem environment and of converging to globally optimal, or near-optimal, solutions. Therefore, genetic algorithms are considered an efficient search and


optimization tool for many different types of problems. Several works propose hybrid GA/Bayes methods

using a GA to define an adequate VO:

• One work presented a genetic algorithm to search for the best variable ordering; each element of the population is a possible ordering and its fitness function is the K2 metric.

• Another implemented a GA for the problem of permuting variables in BN learning and inference.

• A third considers a subset of the dependence/independence relations to obtain the variable ordering; this process is guided by genetic algorithms and simulated annealing.

Although a number of works deal with this issue, most of them are designed to learn unrestricted BNs. Our hybrid GA/Bayes approach (VOGA), on the other hand, is devoted to learning Bayesian Classifiers

from data. In this sense, the class variable may play an interesting role in the variable ordering

definition.


PREVIOUS EXPERIMENTS

Genetic algorithms such as VOGA and VOGA+ have been used to optimize the process of learning a BC from data by identifying a suitable VO. In these genetic algorithms, each element of the population is a possible ordering and its fitness is the K2 metric (g value). Evolutionary algorithms with canonical crossover and mutation have also been used to find an appropriate VO.

Evolutionary Algorithms (EAs):

EAs are computational models that solve a given problem by maintaining a changing population of

chromosomes, each with its own level of ‘fitness’. A fitness function is used to measure the quality of

each chromosome. Genetic algorithms are the most popular EA models; Differential Evolution

algorithms are also a class of Evolutionary Algorithms.

VOGA (Variable Ordering Genetic Algorithm):

What is VOGA?

The main idea in the proposed method is to use a GA and the class variable information to optimize the

variable ordering (VO) which will be used as an input to learn a BC from data. In this sense, we fix the

class variable as the first one in the VO. Subsequently, the GA is used to find the best ordering for the remaining variables. Our method uses a GA in which the chromosomes represent possible variable orderings. Each variable's identification (ID) is encoded as an integer. Therefore, each chromosome has (n − 1) genes, where n is the number of variables (including the class variable), and each gene is instantiated with a variable ID. Thus, each possible ordering forms a chromosome. The fitness function is given by the Bayesian score (g function) defined in the K2 algorithm.
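A minimal sketch of this encoding and fitness evaluation, reusing the k2 helper sketched earlier and assuming integer variable IDs 0 .. n−1 with ID 0 reserved for the class variable (both assumptions of ours):

import random

def random_chromosome(n_vars, class_id=0):
    """A chromosome is a permutation of the (n - 1) non-class variable IDs."""
    genes = [v for v in range(n_vars) if v != class_id]
    random.shuffle(genes)
    return genes

def fitness(chromosome, data, arities, class_id=0):
    """Fitness = Bayesian score (g) of the classifier learned by K2 with the class fixed first."""
    order = [class_id] + chromosome          # the class variable is always first in the VO
    structure = k2(order, data, arities)     # greedy K2 sketch from the previous section
    return sum(score for _, score in structure.values())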

How is it implemented?

VOGA generates a random initial population. Each chromosome is evaluated by the K2 algorithm whose

function g is used as the fitness function. The best chromosomes are selected, and the next generation is generated using crossover and mutation operators. The process is repeated, and for each generation the best ordering is stored. If there is no improvement for 10 generations, the algorithm stops and returns the best ordering found. The flow chart below summarizes the whole process (see also the code sketch that follows this paragraph).
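A minimal sketch of this loop, reusing the helpers above. The population size, the elitist selection scheme, and the particular order-based crossover and swap mutation are our own assumptions, since the text only states that canonical operators are used:

import random

def order_crossover(p1, p2):
    """Order crossover (OX): keep a slice of p1, fill the remaining genes in p2's order."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b + 1] = p1[a:b + 1]
    rest = [g for g in p2 if g not in child]
    for i in range(len(child)):
        if child[i] is None:
            child[i] = rest.pop(0)
    return child

def swap_mutation(chrom, rate=0.1):
    """With probability `rate`, swap two genes of the chromosome."""
    chrom = chrom[:]
    if random.random() < rate:
        i, j = random.sample(range(len(chrom)), 2)
        chrom[i], chrom[j] = chrom[j], chrom[i]
    return chrom

def voga(data, arities, n_vars, pop_size=20, patience=10, class_id=0):
    """Return the best variable ordering found by the GA (class variable first)."""
    population = [random_chromosome(n_vars, class_id) for _ in range(pop_size)]
    best, best_fit, stale = None, float("-inf"), 0
    while stale < patience:                                   # stop after 10 generations without improvement
        scored = sorted(population, key=lambda c: fitness(c, data, arities, class_id), reverse=True)
        top_fit = fitness(scored[0], data, arities, class_id)
        if top_fit > best_fit:
            best, best_fit, stale = scored[0], top_fit, 0
        else:
            stale += 1
        parents = scored[:pop_size // 2]                      # selection: keep the best half
        children = [swap_mutation(order_crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - len(parents))]  # crossover and mutation
        population = parents + children
    return [class_id] + best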


Flow chart: Start → Read data → Initial population generation → Chromosome evaluation → Selection → Crossover and mutation → Chromosome evaluation → Stop? (if yes, return the best VO and end; otherwise repeat from Selection).

In addition to the aforementioned VOGA algorithm, a slightly different version was implemented,

namely VOGA+, in which the initial population is not randomly generated. In VOGA+, more information

about the class variable is used to optimize the initial population and, therefore, to obtain better BC structures (mainly in domains with many attributes).

In order to define the VO of the initial population chromosomes, the χ2 (chi-squared) statistical test is

performed using each variable jointly with the class variable (for this reason, VOGA+ can only be applied

in a classification context, where there is a distinguished variable, namely class variable). Thus, the

strength of the dependence relationship between each variable and the class can be measured.

Subsequently, the variables are sorted in decreasing order of their χ2 scores. The first variable in

the ordered list has the highest χ2 score, i.e. it is the most dependent upon the class. Obviously, the

relation between the χ2 statistical test and the best VO may not hold strictly, but experiments show that

good results can be achieved using this heuristic.

Having defined the VO given by the χ2 statistical test, all initial population chromosomes are defined using

this VO (all chromosomes are identical).
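A small sketch of this initialization step in pure Python (no statistics library assumed; the helper names are ours). It ranks the non-class attributes by their χ2 statistic against the class and returns the ordering used to build every chromosome of VOGA+'s initial population:

from collections import Counter

def chi_squared(attr_values, class_values):
    """Chi-squared statistic of the contingency table between one attribute and the class."""
    n = len(class_values)
    joint = Counter(zip(attr_values, class_values))
    attr_tot = Counter(attr_values)
    class_tot = Counter(class_values)
    chi2 = 0.0
    for a in attr_tot:
        for c in class_tot:
            expected = attr_tot[a] * class_tot[c] / n
            observed = joint.get((a, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def chi_squared_ordering(data, n_vars, class_id=0):
    """Non-class variables sorted by decreasing dependence on the class variable."""
    class_col = [row[class_id] for row in data]
    scores = {v: chi_squared([row[v] for row in data], class_col)
              for v in range(n_vars) if v != class_id}
    return sorted(scores, key=scores.get, reverse=True)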



Experiment:

Seven domains were used in our simulations. Two well-known Bayesian Network domains (Engine Fuel System and Asia) and five benchmark problems from the U. C. Irvine repository were used in the VO and classification tasks, namely Balance, Breast-w, Congressional Voting Records (Voting), Vehicle and

Iris. The following table summarizes the data set features.

Data  Asia    Balance  Breast-w  Engine  Iris  Vehicle  Voting
AT    8       5        10        9       5     19       17
IN    15000   625      683       15000   150   846      232
CL    2       3        2         2       3     4        2

Dataset description: dataset name (Data), number of attributes plus class (AT), number of instances (IN) and number of classes (CL).

The experiments were conducted following the steps below.

1. Initially, the datasets were used as input to the K2 algorithm. The VO was the original one given in the file. The Bayesian score (g) obtained for each dataset was stored.

2. The same datasets used in step 1 were then used as input to VOGA and VOGA+. The Bayesian score (g) obtained for each dataset and the number of generations necessary to reach the solution were stored.

Results achieved in steps 1 and 2 are presented in the following tables respectively.

Data    Asia    Balance  Breast-w  Engine  Iris   Vehicle  Voting
K2      -33610  -4457    -8159     -33809  -2026  -10357   -1749
VOGA    -33610  -4457    -8159     -33755  -2026  -10006   -1727
VOGA+   -33608  -4457    -8159     -33755  -2026  -9956    -1724

Bayesian score (g function) of each achieved Bayesian Network structure; for each dataset, the best result is the highest (least negative) score.

Analyzing the results presented in the table above, it is possible to infer that, as far as the Bayesian score (g

function) is concerned, in all performed experiments, VOGA produced results at least as good as the


ones produced by K2 and in 3 out of the 7 datasets VOGA improved the results obtained using K2. In

addition, VOGA+ performed at least as well as VOGA and in 3 out of the 7 datasets VOGA+ improved the

results obtained using VOGA.

Another interesting issue revealed in the score table above is that datasets with a higher number of attributes, namely Vehicle (19 attributes) and Voting (17 attributes), favored the proposed method (VOGA), mainly when

using the enhanced version VOGA+.

Data    Asia  Balance  Breast-w  Engine  Iris  Vehicle  Voting
VOGA    11    11       11        13      11    11       38
VOGA+   19    11       11        12      11    15       6

Number of generations needed until convergence.

As far as the number of generations is concerned, in 4 (Balance, Breast-w, Engine and Iris) out of the 7

datasets VOGA and VOGA+ presented (mostly) the same results. The other 3 datasets (Asia, Vehicle and

Voting) revealed that, when the number of generations was not the same for VOGA and VOGA+, the

Bayesian score obtained by the latter was always better.


SCOPE

Replacing the Genetic Algorithm with the Differential Evolution (DE) algorithm for better convergence and, possibly, a better Variable Ordering.

Differential Evolution:

A basic variant of the DE algorithm works by having a population of candidate solutions (called agents).

These agents are moved around in the search-space by using simple mathematical formulae to combine

the positions of existing agents from the population. If the new position of an agent is an improvement

it is accepted and forms part of the population; otherwise, the new position is simply discarded.

Algorithm:

• Let x designate a candidate solution (agent) in the population. The basic DE algorithm can then be described as follows:

• Initialize all agents x with random positions in the search-space.

• Until a termination criterion is met (e.g. number of iterations performed, or adequate fitness reached), repeat the following:

  • For each agent x in the population do:

    • Pick three agents a, b and c from the population at random; they must be distinct from each other as well as from agent x.

    • Pick a random index R ∈ {1, ..., n} (n being the dimensionality of the problem to be optimized).

    • Compute the agent's potentially new position y = (y_1, ..., y_n) as follows:

      • For each index i, pick a uniformly distributed random number r_i ∈ (0, 1).

      • If r_i < CR or i = R, then set y_i = a_i + F (b_i − c_i); otherwise set y_i = x_i.


    • If f(y) < f(x), then replace the agent x in the population with the improved candidate solution, that is, replace x with y in the population.

• Pick the agent from the population that has the highest fitness (or lowest cost) and return it as the best found candidate solution.

Note that F ∈ [0, 2] is called the differential weight and CR ∈ [0, 1] is called the crossover probability; both these parameters are selectable by the practitioner, along with the population size.
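A minimal sketch of this basic DE variant on a generic real-valued cost function. The sphere function used in the example, the bounds and the parameter values are placeholders for illustration only:

import random

def differential_evolution(cost, n, bounds=(-5.0, 5.0),
                           pop_size=20, F=0.8, CR=0.9, iterations=200):
    """Basic DE (rand/1/bin) as described above; `cost` maps a list of n floats to a number."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(iterations):                        # termination criterion: fixed iteration budget
        for idx, x in enumerate(pop):
            a, b, c = random.sample([p for j, p in enumerate(pop) if j != idx], 3)
            R = random.randrange(n)                    # index always taken from the mutant vector
            y = [a[i] + F * (b[i] - c[i]) if (random.random() < CR or i == R) else x[i]
                 for i in range(n)]
            if cost(y) < cost(x):                      # keep the improved candidate solution
                pop[idx] = y
    return min(pop, key=cost)                          # lowest-cost agent found

# Example usage: minimize the sphere function f(v) = sum(v_i^2), whose minimum is the origin.
best = differential_evolution(lambda v: sum(x * x for x in v), n=5)
print(best)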

CONCLUSION

We plan to carry out experiments that use differential evolution to find a suitable variable ordering and, possibly, to extend the results to general Bayesian networks.
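One possible way to connect DE, which operates on real-valued vectors, to the variable-ordering problem is the "random key" decoding sketched below. This is not taken from the source; it is one option we may explore, in which the non-class variables are ordered by the agent's component values while the class variable stays first:

def decode_ordering(agent, class_id=0):
    """Decode a real-valued DE agent into a variable ordering (random-key decoding).

    `agent` holds one real value per non-class variable; smaller values come earlier.
    This decoding is an assumption of ours, not part of the original VOGA work.
    """
    non_class = [v for v in range(len(agent) + 1) if v != class_id]
    keys = dict(zip(non_class, agent))
    return [class_id] + sorted(non_class, key=keys.get)

# The K2-based fitness sketched earlier could then score decode_ordering(agent), so that
# DE's cost function becomes the negative Bayesian score of the decoded ordering.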


REFERENCES

SANTOS, E. B.; HRUSCHKA JR.; EBECKEN. Evolutionary Algorithm using Random Multi-point Crossover Operator for Learning Bayesian Network Structures. In: 9th International Conference on Machine Learning and Applications, 2010.

SANTOS, E. B.; HRUSCHKA JR., E. R. VOGA: Variable Ordering Genetic Algorithm for Learning Bayesian Classifiers. In: 6th International Conference on Hybrid Intelligent Systems (HIS 2006), 2006, Auckland.