Introduction
Bayesian Classifiers
Naive Bayes Classifier
TAN and BAN
Semi-Naive Bayesian Classifiers
Multidimen. Bayesian Classifiers: Bayesian Chain Classifiers
Hierarchical Classification
Applications
References
Bayesian Classifiers
Probabilistic Graphical Models
L. Enrique Sucar, INAOE
(INAOE) 1 / 56
Outline
1 Introduction
2 Bayesian Classifiers
3 Naive Bayes Classifier
4 TAN and BAN
5 Semi-Naive Bayesian Classifiers
6 Multidimen. Bayesian Classifiers: Bayesian Chain Classifiers
7 Hierarchical Classification
8 Applications
9 References
Introduction
Classification
Classification consists in assigning classes or labels to objects. There are two basic types of classification problems:
Unsupervised: in this case the classes are unknown, so the problem consists in dividing a set of objects into n groups or clusters, so that a class is assigned to each different group. It is also known as clustering.
Supervised: the possible classes or labels are known a priori, and the problem consists in finding a function or rule that assigns each object to one of the classes.
Probabilistic Classification
• Supervised classification consists in assigning to a particular object, described by its attributes $A_1, A_2, \ldots, A_n$, one of m classes, $C = \{c_1, c_2, \ldots, c_m\}$, such that the probability of the class given the attributes is maximized:
$$\arg\max_{C} P(C \mid A_1, A_2, \ldots, A_n) \tag{1}$$
• If we denote the set of attributes as $A = \{A_1, A_2, \ldots, A_n\}$:
$$\arg\max_{C} P(C \mid A)$$
Classifier Evaluation
Accuracy: it refers to how well a classifier predicts the correct class for unseen examples (that is, those not considered for learning the classifier).
Classification time: how long it takes the classification process to predict the class, once the classifier has been trained.
Training time: how much time is required to learn the classifier from data.
Memory requirements: how much space in terms of memory is required to store the classifier parameters.
Clarity: if the classifier is easily understood by a person.
Class Imbalance
• In general we want to maximize the classification accuracy; however, this is only optimal if the cost of a wrong classification is the same for all the classes
• When there is imbalance in the costs of misclassification, we must instead minimize the expected cost (EC). For two classes, this is given by:
$$EC = FN \times P(+)\,C(- \mid +) + FP \times P(-)\,C(+ \mid -) \tag{2}$$
where FN is the false negative rate (the fraction of positives classified as negative), FP is the false positive rate (the fraction of negatives classified as positive), P(+) is the probability of positive, P(−) is the probability of negative, C(− | +) is the cost of classifying a positive as negative, and C(+ | −) is the cost of classifying a negative as positive. Each error rate is therefore weighted by the probability of the true class on which it is measured.
Bayesian Classifiers
Bayes Classifier (I)
• Applying Bayes rule:
$$P(C \mid A_1, A_2, \ldots, A_n) = \frac{P(C)\,P(A_1, A_2, \ldots, A_n \mid C)}{P(A_1, A_2, \ldots, A_n)} \tag{3}$$
• Which can be written more compactly as:
$$P(C \mid A) = \frac{P(C)\,P(A \mid C)}{P(A)} \tag{4}$$
Bayes Classifier (II)
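• The classification problem can be formulated as:
$$\arg\max_{C} \left[ P(C \mid A) = \frac{P(C)\,P(A \mid C)}{P(A)} \right] \tag{5}$$
• Equivalently:
  • $\arg\max_{C} \left[ P(C)\,P(A \mid C) \right]$
  • $\arg\max_{C} \left[ \log\left( P(C)\,P(A \mid C) \right) \right]$
  • $\arg\max_{C} \left[ \log P(C) + \log P(A \mid C) \right]$
Note that the probability of the attributes, P(A), does not vary with respect to the class, so it can be considered as a constant for the maximization. The logarithm is monotonically increasing, so taking it does not change which class attains the maximum.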
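A minimal sketch of the last equivalent form, assuming the prior and the likelihood of the observed attributes are already available for each class. The class names and probability values are hypothetical. The second part illustrates one practical reason for working with sums of logs: a product of many small probabilities can underflow in floating point.

```python
import math

def argmax_class(priors, likelihoods):
    """Return the class maximizing log P(C) + log P(A | C), plus all scores."""
    scores = {c: math.log(priors[c]) + math.log(likelihoods[c]) for c in priors}
    return max(scores, key=scores.get), scores

# Hypothetical values of P(C) and P(A | C) for two classes.
priors = {"c1": 0.7, "c2": 0.3}
likelihoods = {"c1": 0.02, "c2": 0.10}
best, scores = argmax_class(priors, likelihoods)

# The same decision is obtained from the plain product P(C) P(A | C).
best_by_product = max(priors, key=lambda c: priors[c] * likelihoods[c])

# With many factors the product underflows to 0.0, the log-sum does not.
factors = [0.01] * 200
product = math.prod(factors)
log_sum = sum(math.log(p) for p in factors)
```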
Complexity
• The direct application of the Bayes rule results in a computationally expensive problem
• The number of parameters in the likelihood term, $P(A_1, A_2, \ldots, A_n \mid C)$, increases exponentially with the number of attributes
• An alternative is to consider some independence properties as in graphical models, in particular that all attributes are independent given the class, resulting in the Naive Bayesian Classifier
Naive Bayes Classifier
Naive Bayes
• The naive or simple Bayesian classifier (NBC) is based on the assumption that all the attributes are independent given the class variable:
$$P(C \mid A_1, A_2, \ldots, A_n) = \frac{P(C)\,P(A_1 \mid C)\,P(A_2 \mid C) \cdots P(A_n \mid C)}{P(A)} \tag{6}$$
where P(A) can be considered, as mentioned before, a normalization constant.
Complexity
• The naive Bayes formulation drastically reduces the complexity of the Bayesian classifier, as in this case we only require the prior probability (one dimensional vector) of the class, and the n conditional probabilities of each attribute given the class (two dimensional matrices)
• The space requirement is reduced from exponential to linear in the number of attributes
• The calculation of the posterior is greatly simplified, as to estimate it (unnormalized) only n multiplications are required
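The difference in space can be made concrete with a rough count of table entries. This is only a sketch under simplifying assumptions that are not in the slides: all n attributes are discrete with the same number of values, and the functions (hypothetical names) count stored table entries rather than free parameters.

```python
def full_bayes_table_size(n_attrs, n_values, n_classes):
    """Entries for P(C) plus the joint table P(A1, ..., An | C)."""
    return n_classes + n_classes * n_values ** n_attrs

def naive_bayes_table_size(n_attrs, n_values, n_classes):
    """Entries for P(C) plus n matrices P(Aj | C) of size n_values x n_classes."""
    return n_classes + n_attrs * n_values * n_classes

# Example: 10 attributes with 3 values each, 2 classes.
full_size = full_bayes_table_size(10, 3, 2)
naive_size = naive_bayes_table_size(10, 3, 2)
```

For these illustrative sizes the full table needs 118100 entries and the naive Bayes tables need 62.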
Graphical Model
Parameter Learning
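• The probabilities can be estimated from data using, for instance, maximum likelihood estimation
• The prior probabilities of the class variable, C, are given by:
$$P(c_i) \sim N_i / N \tag{7}$$
where $N_i$ is the number of training examples of class $c_i$ and N is the total number of examples
• The conditional probabilities of each attribute, $A_j$, can be estimated as:
$$P(A_{jk} \mid c_i) \sim N_{jki} / N_i \tag{8}$$
where $N_{jki}$ is the number of examples of class $c_i$ in which attribute $A_j$ takes its k-th value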
Inference
• The posterior probability can be obtained just by multiplying the prior by the likelihood for each attribute
• Given the values for m attributes, $a_1, \ldots, a_m$, for each class $c_i$, the posterior is proportional to:
$$P(c_i \mid a_1, \ldots, a_m) \sim P(c_i)\,P(a_1 \mid c_i) \cdots P(a_m \mid c_i) \tag{9}$$
• The class $c_k$ that maximizes the previous equation will be selected
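Learning by counting (Equations 7 and 8) and inference by multiplication (Equation 9) can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: the function names and the toy data are hypothetical, attributes are assumed discrete, and no smoothing is applied, so a value never seen with a class gets probability zero.

```python
from collections import Counter, defaultdict

def train_nbc(rows, labels):
    """Estimate P(c_i) = N_i / N and P(A_j = k | c_i) = N_jki / N_i by counting."""
    class_counts = Counter(labels)
    priors = {c: n_c / len(labels) for c, n_c in class_counts.items()}
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row, c in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, c)][value] += 1

    def cond(j, value, c):
        return value_counts[(j, c)][value] / class_counts[c]

    return priors, cond

def posterior(priors, cond, values):
    """Multiply prior and per-attribute likelihoods, then normalize over classes."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(values):
            score *= cond(j, value, c)
        scores[c] = score
    total = sum(scores.values())  # assumed non-zero for the query
    return {c: s / total for c, s in scores.items()}

# Hypothetical toy data: two attributes, two classes.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("a", "x"), ("b", "y")]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
priors, cond = train_nbc(rows, labels)
post = posterior(priors, cond, ("a", "x"))
predicted = max(post, key=post.get)
```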
Example - Classifier for Golf
Outlook    Temperature  Humidity  Windy  Play
sunny      high         high      false  no
sunny      high         high      true   no
overcast   high         high      false  yes
rain       medium       high      false  yes
rain       low          normal    false  yes
rain       low          normal    true   no
overcast   low          normal    true   yes
sunny      medium       high      false  no
sunny      low          normal    false  yes
rain       medium       normal    false  yes
sunny      medium       normal    true   yes
overcast   medium       high      true   yes
overcast   high         normal    false  yes
rain       medium       high      true   no
Example - NBC for Golf
Example - inference for Golf
Given: Outlook=rain, Temperature=high, Humidity=normal, Windy=false
P(Play = yes | rain, high, normal, false) = k × 0.64 × 0.33 × 0.22 × 0.67 × 0.67 = k × 0.021
P(Play = no | rain, high, normal, false) = k × 0.36 × 0.40 × 0.40 × 0.20 × 0.40 = k × 0.0046
k = 1/(0.021 + 0.0046) ≈ 39
P(Play = yes | rain, high, normal, false) = 0.82
P(Play = no | rain, high, normal, false) = 0.18
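The figures in this example can be checked by counting directly over the 14 golf examples. The sketch below re-enters the table by hand and uses unrounded frequencies, so intermediate products differ slightly from the two-decimal factors above; the function name is hypothetical.

```python
from collections import Counter

# (Outlook, Temperature, Humidity, Windy, Play), copied from the golf table.
GOLF = [
    ("sunny", "high", "high", "false", "no"),
    ("sunny", "high", "high", "true", "no"),
    ("overcast", "high", "high", "false", "yes"),
    ("rain", "medium", "high", "false", "yes"),
    ("rain", "low", "normal", "false", "yes"),
    ("rain", "low", "normal", "true", "no"),
    ("overcast", "low", "normal", "true", "yes"),
    ("sunny", "medium", "high", "false", "no"),
    ("sunny", "low", "normal", "false", "yes"),
    ("rain", "medium", "normal", "false", "yes"),
    ("sunny", "medium", "normal", "true", "yes"),
    ("overcast", "medium", "high", "true", "yes"),
    ("overcast", "high", "normal", "false", "yes"),
    ("rain", "medium", "high", "true", "no"),
]

def golf_posterior(query):
    """Naive Bayes posterior of Play given (Outlook, Temperature, Humidity, Windy)."""
    class_counts = Counter(row[-1] for row in GOLF)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / len(GOLF)                      # prior P(Play = c)
        for j, value in enumerate(query):
            n_jkc = sum(1 for row in GOLF if row[-1] == c and row[j] == value)
            score *= n_jkc / n_c                     # P(A_j = value | Play = c)
        scores[c] = score
    k = 1 / sum(scores.values())                     # normalization constant
    return {c: k * s for c, s in scores.items()}

post = golf_posterior(("rain", "high", "normal", "false"))
```

Rounded to two decimals, this gives 0.82 for Play = yes and 0.18 for Play = no.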
Analysis
• Advantages:
  • The low number of required parameters, which reduces the memory requirements and facilitates learning them from data.
  • The low computational cost for inference (estimating the posteriors) and learning.
  • The relatively good performance (classification precision) in many domains.
  • A simple and intuitive model.
• Limitations:
  • In some domains, the performance is reduced given that the conditional independence assumption is not valid.
  • If there are continuous attributes, these need to be discretized (or consider alternative models such as the linear discriminator).
TAN and BAN
TAN
• The Tree augmented Bayesian Classifier, or TAN, incorporates some dependencies between the attributes by building a directed tree among the attribute variables
• The n attributes form a graph which is restricted to a directed tree that represents the dependency relations between the attributes
BAN
• The Bayesian Network augmented Bayesian Classifier, or BAN, considers that the dependency structure among the attributes constitutes a directed acyclic graph (DAG)
Inference and Learning
• The TAN and BAN classifiers can be considered as particular cases of a more general model, that is, Bayesian networks
• Techniques for inference and for learning Bayesian networks can therefore be applied to obtain the posterior probabilities (inference) and the model (learning) for the TAN and BAN classifiers
Semi-Naive Bayesian Classifiers
Semi-Naive Bayesian Classifiers (SNBC)
• Another alternative to deal with dependent attributes is to transform the basic structure of a naive Bayesian classifier, while maintaining a star or tree-structured network
• The basic idea of the SNBC is to eliminate or join attributes which are not independent given the class
• This is analogous to feature selection in machine learning
Structural Improvement
• There are two alternative operations to modify the structure of an NBC, considering that we start from a full structure: (i) node elimination, and (ii) node combination
Node elimination and combination
• Node elimination consists in simply eliminating an attribute, $A_i$, from the model. This could be because it is not relevant for the class ($A_i$ and C are independent), or because the attribute $A_i$ and another attribute, $A_j$, are not independent given the class
• Node combination consists in merging two attributes, $A_i$ and $A_j$, into a new attribute $A_k$, such that $A_k$ has as possible values the cross product of the values of $A_i$ and $A_j$ (assuming discrete attributes). For example, if $A_i = \{a, b, c\}$ and $A_j = \{1, 2\}$, then $A_k = \{a1, a2, b1, b2, c1, c2\}$. This is an alternative when two attributes are not independent given the class
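Node combination amounts to a cross product of value sets plus a rewrite of the data columns. A small sketch follows, with hypothetical function names; the convention of placing the merged column last is an assumption of this example only.

```python
from itertools import product

def combine_values(values_i, values_j):
    """Possible values of the merged attribute: cross product of both value sets."""
    return [f"{a}{b}" for a, b in product(values_i, values_j)]

def merge_columns(rows, i, j):
    """Replace columns i and j of every row by a single combined column (placed last)."""
    merged = []
    for row in rows:
        rest = [v for idx, v in enumerate(row) if idx not in (i, j)]
        merged.append(tuple(rest + [f"{row[i]}{row[j]}"]))
    return merged

# The example from the slide: Ai = {a, b, c}, Aj = {1, 2}.
a_k = combine_values(["a", "b", "c"], [1, 2])

# Hypothetical data rows with three attributes; attributes 0 and 2 are merged.
rows = [("a", "u", 1), ("c", "v", 2)]
merged_rows = merge_columns(rows, 0, 2)
```

Note that the merged attribute has |Ai| × |Aj| values, so repeated combination increases the size of the corresponding conditional probability table.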
Node Insertion
• Node insertion consists in adding a new attribute that makes two dependent attributes independent
• This new attribute is a kind of virtual or hidden node in the model, for which we do not have any data
• An alternative for estimating the parameters of hidden variables in Bayesian networks, such as in this case, is based on the Expectation-Maximization (EM) procedure
Multidimen. Bayesian Classifiers
Multidimensional classification
• Several important problems need to predict several classes simultaneously
• For example: text classification, where a document can be assigned to several topics; gene classification, as a gene may have different functions; image annotation, as an image may include several objects
Definition
• The multi-dimensional classification problem corresponds to searching for a function h that assigns to each instance, represented by a vector of m features $\mathbf{X} = (X_1, \ldots, X_m)$, a vector of d class values $\mathbf{C} = (C_1, \ldots, C_d)$:
$$\arg\max_{c_1, \ldots, c_d} P(C_1 = c_1, \ldots, C_d = c_d \mid \mathbf{X}) \tag{10}$$
Multi-label classification
• Multi-label classification is a particular case of multi-dimensional classification, where all class variables are binary
• Two basic approaches:
  • Binary relevance approaches transform the multi-label classification problem into d independent binary classification problems, one for each class variable, $C_1, \ldots, C_d$
  • The label power-set approach transforms the multi-label classification problem into a single-class scenario by defining a new compound class variable whose possible values are all the possible combinations of values of the original classes
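Both approaches are transformations of the label vectors before any classifier is trained. The sketch below shows only that transformation step on hypothetical data; the function names are illustrative, and the power-set version enumerates only the combinations that actually occur in the data (out of up to 2^d possible ones).

```python
def binary_relevance(label_vectors):
    """Split d-label vectors into d independent binary target lists."""
    d = len(label_vectors[0])
    return [[labels[i] for labels in label_vectors] for i in range(d)]

def label_powerset(label_vectors):
    """Map each distinct label combination to a single compound class id."""
    mapping = {}
    compound = []
    for labels in label_vectors:
        key = tuple(labels)
        # A combination seen for the first time receives the next free id.
        compound.append(mapping.setdefault(key, len(mapping)))
    return compound, mapping

# Hypothetical label vectors for four instances and d = 3 binary classes.
Y = [(1, 0, 1), (0, 0, 1), (1, 0, 1), (1, 1, 0)]
targets_per_class = binary_relevance(Y)
compound_targets, combo_ids = label_powerset(Y)
```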
Multidimensional Bayesian Network Classifiers
• A multidimensional Bayesian network classifier (MBC) is a Bayesian network with a particular structure
• The set V of variables is partitioned into two sets: $V_C = \{C_1, \ldots, C_d\}$, d ≥ 1, of class variables and $V_X = \{X_1, \ldots, X_m\}$, m ≥ 1, of feature variables (d + m = n)
• The set A of arcs is also partitioned, into three sets, $A_C$, $A_X$, $A_{CX}$, such that $A_C \subseteq V_C \times V_C$ is composed of the arcs between the class variables, $A_X \subseteq V_X \times V_X$ is composed of the arcs between the feature variables, and finally, $A_{CX} \subseteq V_C \times V_X$ is composed of the arcs from the class variables to the feature variables
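The structural restriction can be expressed as a check on each arc. A minimal sketch, assuming arcs are given as (parent, child) pairs; the function name and the small network are hypothetical. Arcs from a feature to a class variable fall outside the three sets and are rejected.

```python
def partition_arcs(arcs, class_vars, feature_vars):
    """Split the arcs of an MBC into (A_C, A_X, A_CX); reject any other arc."""
    class_vars, feature_vars = set(class_vars), set(feature_vars)
    a_c, a_x, a_cx = [], [], []
    for parent, child in arcs:
        if parent in class_vars and child in class_vars:
            a_c.append((parent, child))        # class -> class
        elif parent in feature_vars and child in feature_vars:
            a_x.append((parent, child))        # feature -> feature
        elif parent in class_vars and child in feature_vars:
            a_cx.append((parent, child))       # class -> feature
        else:
            raise ValueError(f"arc {parent} -> {child} is not allowed in an MBC")
    return a_c, a_x, a_cx

# Hypothetical MBC with d = 2 class variables and m = 3 feature variables.
arcs = [("C1", "C2"), ("C1", "X1"), ("C2", "X2"), ("C2", "X3"), ("X1", "X2")]
a_c, a_x, a_cx = partition_arcs(arcs, ["C1", "C2"], ["X1", "X2", "X3"])
```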
MDBNC - Graphical Model
Inference
• The problem of obtaining the classification of an instance with an MBC, that is, the most likely combination of classes, corresponds to the MPE (Most Probable Explanation) or abduction problem
• This is a complex problem with a high computational cost
Multidimen. Bayesian Classifiers Bayesian Chain Classifiers
Chain Classifiers
• Chain classifiers are an alternative method for multi-label classification that incorporates class dependencies, while keeping the computational efficiency of the binary relevance approach
• A chain classifier consists of d base binary classifiers which are linked in a chain, such that each classifier incorporates the classes predicted by the previous classifiers as additional attributes
• The feature vector for each binary classifier, $L_i$, is extended with the labels (0/1) for all previous classifiers in the chain
Chain Classifiers - example
Learning and Inference
• Each classifier in the chain is trained to learn the association of label $l_i$ given the features augmented with all previous class labels in the chain
• For classification, it starts at $L_1$, and propagates the predicted classes along the chain, such that for $L_i \in L$ (where $L = \{L_1, L_2, \ldots, L_d\}$) it predicts $P(L_i \mid \mathbf{X}, L_1, L_2, \ldots, L_{i-1})$
• The class vector is determined by combining the outputs of all the binary classifiers
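The training and prediction steps above can be sketched with a very small base classifier. Everything here is illustrative: `BinaryNB` is a hypothetical naive Bayes for binary features with Laplace smoothing (the smoothing is an assumption of this sketch, not something stated in the slides), and the chain order is simply the order of the labels.

```python
import math

class BinaryNB:
    """Tiny naive Bayes for 0/1 features and a 0/1 label (Laplace-smoothed)."""

    def fit(self, rows, targets):
        n_features = len(rows[0])
        self.total = len(rows)
        self.class_count = {0: 0, 1: 0}
        self.ones = {0: [0] * n_features, 1: [0] * n_features}
        for row, t in zip(rows, targets):
            self.class_count[t] += 1
            for j, v in enumerate(row):
                self.ones[t][j] += v
        return self

    def predict(self, row):
        best, best_score = None, -math.inf
        for c in (0, 1):
            score = math.log((self.class_count[c] + 1) / (self.total + 2))
            for j, v in enumerate(row):
                p_one = (self.ones[c][j] + 1) / (self.class_count[c] + 2)
                score += math.log(p_one if v else 1 - p_one)
            if score > best_score:
                best, best_score = c, score
        return best

def fit_chain(rows, label_vectors):
    """Train classifier i on the features plus the true labels 1..i-1."""
    chain = []
    for i in range(len(label_vectors[0])):
        extended = [list(x) + list(y[:i]) for x, y in zip(rows, label_vectors)]
        chain.append(BinaryNB().fit(extended, [y[i] for y in label_vectors]))
    return chain

def predict_chain(chain, row):
    """Propagate the predicted labels along the chain as extra attributes."""
    predicted = []
    for clf in chain:
        predicted.append(clf.predict(list(row) + predicted))
    return predicted

# Hypothetical data: label 1 copies feature 0, label 2 copies label 1.
X = [(1, 0), (1, 1), (0, 0), (0, 1)] * 2
Y = [(x[0], x[0]) for x in X]
chain = fit_chain(X, Y)
labels_a = predict_chain(chain, (1, 0))
labels_b = predict_chain(chain, (0, 1))
```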
Bayesian chain classifiers
• Bayesian chain classifiers are a type of chain classifier under a probabilistic framework:
$$\arg\max_{C_1, \ldots, C_d} P(C_1 \mid C_2, \ldots, C_d, \mathbf{X}) \cdots P(C_d \mid \mathbf{X}) \tag{11}$$
• If we consider the dependency relations between the class variables, and represent these relations as a directed acyclic graph (DAG), then we can simplify the previous equation:
$$\arg\max_{C_1, \ldots, C_d} \prod_{i=1}^{d} P(C_i \mid Pa(C_i), \mathbf{X}) \tag{12}$$
• A further simplification is obtained by assuming that the most probable joint combination of classes can be approximated by concatenating the individual most probable classes:
$$\begin{aligned} &\arg\max_{C_1} P(C_1 \mid Pa(C_1), \mathbf{X}) \\ &\arg\max_{C_2} P(C_2 \mid Pa(C_2), \mathbf{X}) \\ &\qquad \vdots \\ &\arg\max_{C_d} P(C_d \mid Pa(C_d), \mathbf{X}) \end{aligned}$$
• This last approximation corresponds to a Bayesian chain classifier
• For the base classifier we can use any of the Bayesian classifiers presented in the previous sections, for instance an NBC
BCC
Hierarchical Classification
Hierarchical Classification
• Hierarchical classification is a type of multidimensional classification in which the classes are ordered in a predefined structure, typically a tree, or in general a directed acyclic graph (DAG)
• In hierarchical classification, an example that belongs to a certain class automatically belongs to all its superclasses
• Hierarchical classification has applications in several areas, such as text categorization, protein function prediction, and object recognition
Hierarchical Classification - example
Basic approaches
• The global approach builds a classifier to predict all the classes at once; this becomes too complex computationally for large hierarchies.
• The local approaches train several classifiers and combine their outputs:
  • Local Classifier per hierarchy Level, which trains one multi-class classifier for each level of the class hierarchy
  • Local binary Classifier per Node, in which a binary classifier is built for each node (class) in the hierarchy, except the root node
  • Local Classifier per Parent Node (LCPN), where a multi-class classifier is trained for each non-leaf node to predict its child nodes
• Local methods commonly use a top-down approach for classification
Types of Methods
Hierarchical Structure
Chained path evaluation
• Chained Path Evaluation (CPE) analyzes each possible path from the root to a leaf node in the hierarchy, taking into account the level of the predicted labels to give a score to each path, and finally returns the one with the best score
• Additionally, it considers the relations of each node with its ancestors in the hierarchy, based on chain classifiers
Training
• A local classifier is trained for each node, $C_i$, in the hierarchy, except the leaf nodes, to classify its child nodes
• The classifier for each node, $C_i$, for instance a naive Bayes classifier, is trained considering examples from all its child nodes, as well as some examples of its sibling nodes in the hierarchy
• To consider the relation with other nodes in the hierarchy, the class predicted by the parent (tree structure) or parents (DAG) is included as an additional attribute in each local classifier
Classification
• In the classification phase, the probabilities for each class for all local classifiers are obtained based on the input data
• The score for each path in the hierarchy is calculated by a weighted sum of the log of the probabilities of all local classifiers in the path:
$$score = \sum_{i=0}^{n} w_{C_i} \times \log\left( P(C_i \mid X_i, pa(C_i)) \right) \tag{13}$$
• The purpose of these weights is to give more importance to the upper levels of the hierarchy
• Once the scores for all the paths are obtained, the path with the highest score will be selected as the set of classes corresponding to a given instance
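The scoring step of Equation 13 can be sketched as follows, assuming the local classifiers have already produced a probability for every node. The hierarchy, the probabilities and the weights are all hypothetical; the slides do not specify how the weights are computed, so here they are just given per node, with larger values for the upper level.

```python
import math

def path_score(path, probs, weights):
    """Weighted sum of log-probabilities of the nodes along one path."""
    return sum(weights[node] * math.log(probs[node]) for node in path)

def best_path(paths, probs, weights):
    """Return the root-to-leaf path with the highest score."""
    return max(paths, key=lambda p: path_score(p, probs, weights))

# Hypothetical two-level hierarchy: root -> {A, B}, A -> {A1, A2}, B -> {B1}.
paths = [("A", "A1"), ("A", "A2"), ("B", "B1")]
# Hypothetical outputs of the local classifiers, P(node | features, parent).
probs = {"A": 0.6, "B": 0.4, "A1": 0.3, "A2": 0.7, "B1": 1.0}
# Hypothetical weights: first level counts fully, second level half.
weights = {"A": 1.0, "B": 1.0, "A1": 0.5, "A2": 0.5, "B1": 0.5}

scores = {p: path_score(p, probs, weights) for p in paths}
selected = best_path(paths, probs, weights)
```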
CPE - Classification Example
Applications
Visual skin detection
• Skin detection is a useful pre-processing stage for many applications in computer vision, such as person detection and gesture recognition, among others
• A simple and very fast way to obtain an approximate classification of pixels in an image as skin or not-skin is based on the color attributes of each pixel
• Usually, pixels in a digital image are represented as the combination of three basic (primary) colors: Red (R), Green (G) and Blue (B), in what is known as the RGB model. There are alternative color models, such as HSV, YIQ, etc.
SNBC
• An alternative is to consider a semi-naive Bayesian classifier and select the best attributes from the different color models for skin classification by eliminating or joining attributes
• An initial NBC was learned from data: examples of skin and not-skin pixels taken from several images. This initial classifier obtained 94% accuracy when applied to other (test) images.
• The classifier was then optimized. Starting from the full NBC with 9 attributes, the method applies the variable elimination and combination stages until the simplest classifier with maximum accuracy is obtained. With this final model, accuracy improved to 98%.
Optimization
Example
HIV drug selection
• The Human Immunodeficiency Virus (HIV) is the causative agent of AIDS, a condition in which progressive failure of the immune system allows opportunistic life-threatening infections to occur
• To combat HIV infection, several antiretroviral (ARV) drugs have been developed; they belong to different drug classes that affect specific steps in the viral replication cycle. It is important to select the best drug combination according to the virus' mutations in a patient.
Multilabel Classification
• Selecting the best group of antiretroviral drugs for a patient can be seen as an instance of a multi-label classification problem
• This particular problem can be accurately modeled with a multi-dimensional Bayesian network classifier
• This model can be learned from data and then used for selecting the set of drugs according to the features (virus mutations)
MBC for HIV drug selection
References
Book
Sucar, L.E.: Probabilistic Graphical Models. Springer (2015), Chapter 4
Additional Reading (1)
Bielza, C., Li, G., Larrañaga, P.: Multi-Dimensional Classification with Bayesian Networks. International Journal of Approximate Reasoning (2011)
Borchani, H., Bielza, C., Toro, C., Larrañaga, P.: Predicting human immunodeficiency virus inhibitors using multi-dimensional Bayesian network classifiers. Artificial Intelligence in Medicine 57, 219–229 (2013)
Cheng, J., Greiner, R.: Comparing Bayesian Network Classifiers. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 101–108 (1999)
Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian Network Classifiers. Machine Learning 29, 131–163 (1997)
Additional Reading (2)
Kwoh, C.K., Gillies, D.F.: Using Hidden Nodes in Bayesian Networks. Artificial Intelligence 88, Elsevier, Essex, UK, 1–38 (1996)
Martinez, M., Sucar, L.E.: Learning an optimal naive Bayes classifier. International Conference on Pattern Recognition (ICPR), Vol. 3, 1236–1239 (2006)
Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Proceedings ECML/PKDD, 254–269 (2009)
Ramírez, M., Sucar, L.E., Morales, E.: Path Evaluation for Hierarchical Multi-label Classification. Proceedings of the Twenty-Seventh International Florida Artificial Intelligence Research Society Conference (FLAIRS), 502–507 (2014)
Additional Reading (3)
Silla Jr., C.N., Freitas, A.A.: A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery 22(1-2), 31–72 (2011)
Sucar, L.E., Bielza, C., Morales, E., Hernandez, P., Zaragoza, J., Larrañaga, P.: Multi-label Classification with Bayesian Network-based Chain Classifiers. Pattern Recognition Letters 41, 14–22 (2014)
Tsoumakas, G., Katakis, I.: Multi-Label Classification: An Overview. International Journal of Data Warehousing and Mining 3, 1–13 (2007)
van der Gaag, L.C., de Waal, P.R.: Multi-dimensional Bayesian Network Classifiers. Third European Conference on Probabilistic Graphic Models, 107–114, Prague, Czech Republic (2006)