Data Mining Classification: Alternative Techniques
Lecture Notes for Chapter 5
Introduction to Data Mining by Tan, Steinbach, Kumar
Rule-Based Classifier
Classify records by using a collection of if…then rules
Rule: (Condition) → y
where Condition is a conjunction of attribute tests and y is the class label
LHS: rule antecedent or condition
RHS: rule consequent
Examples of classification rules:
(Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
(Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No
Rule-based Classifier (Example)
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Application of Rule-Based Classifier
A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
The rule R1 covers a hawk => Bird
The rule R3 covers the grizzly bear => Mammal
Rule Coverage and Accuracy
Coverage of a rule: fraction of records that satisfy the antecedent of the rule
Accuracy of a rule: fraction of the records satisfying the antecedent that also satisfy the consequent
Example: on the table below, (Status=Single) → No has Coverage = 40% and Accuracy = 50%
Tid  Refund  Marital Status  Taxable Income  Class
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
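To make the quoted numbers concrete, here is a minimal Python sketch (mine, not part of the original notes): it encodes the table above as a list of dicts and checks coverage and accuracy for (Status=Single) → No.

records = [
    {"Refund": "Yes", "Status": "Single",   "Income": 125, "Class": "No"},
    {"Refund": "No",  "Status": "Married",  "Income": 100, "Class": "No"},
    {"Refund": "No",  "Status": "Single",   "Income": 70,  "Class": "No"},
    {"Refund": "Yes", "Status": "Married",  "Income": 120, "Class": "No"},
    {"Refund": "No",  "Status": "Divorced", "Income": 95,  "Class": "Yes"},
    {"Refund": "No",  "Status": "Married",  "Income": 60,  "Class": "No"},
    {"Refund": "Yes", "Status": "Divorced", "Income": 220, "Class": "No"},
    {"Refund": "No",  "Status": "Single",   "Income": 85,  "Class": "Yes"},
    {"Refund": "No",  "Status": "Married",  "Income": 75,  "Class": "No"},
    {"Refund": "No",  "Status": "Single",   "Income": 90,  "Class": "Yes"},
]

# records satisfying the antecedent (Status=Single)
covered = [r for r in records if r["Status"] == "Single"]
print(len(covered) / len(records))                              # coverage: 0.4
print(sum(r["Class"] == "No" for r in covered) / len(covered))  # accuracy: 0.5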
How does a Rule-based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
A lemur triggers rule R3, so it is classified as a mammal
A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules
Characteristics of Rule-Based Classifier
Mutually exclusive rules: the rules are independent of each other; every record is covered by at most one rule
Exhaustive rules: the classifier accounts for every possible combination of attribute values; each record is covered by at least one rule
From Decision Trees To Rules
Rules are mutually exclusive and exhaustive
The rule set contains as much information as the tree
Rules Can Be Simplified
Initial rule: (Refund=No) ∧ (Status=Married) → No
Simplified rule: (Status=Married) → No
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Effect of Rule Simplification
Rules are no longer mutuallyly exclusive: a record may trigger more than one rule
Solution: use an ordered rule set, or an unordered rule set with voting schemes
Rules are no longer exhaustive: a record may not trigger any rule
Solution: use a default class
Ordered Rule Set
Rules are rank ordered according to their priority; an ordered rule set is known as a decision list
When a test record is presented to the classifier:
It is assigned the class label of the highest-ranked rule it triggers
If none of the rules fire, it is assigned the default class
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Rule Ordering Schemes
Rule-based ordering: individual rules are ranked based on their quality
Class-based ordering: rules that belong to the same class appear together
Rule-based Ordering
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No
Class-based Ordering
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
(Refund=No, Marital Status={Married}) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
Building Classification Rules
Direct method: extract rules directly from data (e.g., RIPPER, CN2, Holte's 1R)
Indirect method: extract rules from other classification models such as decision trees and neural networks (e.g., C4.5rules)
Direct Method: Sequential Covering
1. Start from an empty rule
2. Grow a rule using the Learn-One-Rule function
3. Remove training records covered by the rule
4. Repeat steps 2 and 3 until the stopping criterion is met
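A high-level sketch of this loop in Python (my illustration; learn_one_rule() is a hypothetical helper whose internals, rule growing, the notes cover next):

def sequential_covering(records, target_class):
    rules, remaining = [], list(records)
    while any(r["Class"] == target_class for r in remaining):  # stopping criterion
        rule = learn_one_rule(remaining, target_class)         # grow a single rule
        if rule is None:                                       # no acceptable rule found
            break
        rules.append(rule)
        # remove training records covered by the rule
        remaining = [r for r in remaining if not rule.covers(r)]
    return rules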
Example of Sequential Covering
Aspects of Sequential Covering
Rule Growing
Instance Elimination
Rule Evaluation
Stopping Criterion
Rule Pruning
Rule Growing
Two common strategies:
[Figure: (a) General-to-specific growing starts from the empty rule { } => Yes and adds candidate conjuncts such as Refund=No, Status=Single, Status=Divorced, Status=Married, Income>80K, tracking the Yes/No counts each candidate covers. (b) Specific-to-general growing starts from specific rules such as Refund=No, Status=Single, Income=85K (Class=Yes) and Refund=No, Status=Single, Income=90K (Class=Yes) and generalizes them to Refund=No, Status=Single (Class=Yes).]
Rule Growing (Examples)
CN2 Algorithm:
Start from an empty conjunct: {}
Add conjuncts that minimize the entropy measure: {A}, {A,B}, …
Determine the rule consequent by taking the majority class of the instances covered by the rule
RIPPER Algorithm:
Start from an empty rule: {} => class
Add conjuncts that maximize FOIL's information gain measure:
R0: {} => class (initial rule)
R1: {A} => class (rule after adding a conjunct)
Gain(R0, R1) = t × [ log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) ]
where t: number of positive instances covered by both R0 and R1
p0: number of positive instances covered by R0
n0: number of negative instances covered by R0
p1: number of positive instances covered by R1
n1: number of negative instances covered by R1
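The gain measure translates directly into code; a minimal sketch (mine, not the notes'):

import math

def foil_gain(p0, n0, p1, n1):
    # FOIL's information gain for extending R0 (covers p0/n0) to R1 (covers p1/n1);
    # when R1 refines R0, t (positives covered by both) equals p1
    t = p1
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))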
Instance Elimination
Why do we need to eliminate instances? Otherwise, the next rule is identical to the previous rule
Why do we remove positive instances? To ensure that the next rule is different
Why do we remove negative instances? To prevent underestimating the accuracy of the rule (compare rules R2 and R3 in the figure below)
[Figure: a scatter of positive (+) and negative (−) training instances with three candidate rules R1, R2, and R3 drawn as regions; it illustrates why the instances covered by an accepted rule (R1) must be removed before evaluating R2 and R3.]
Rule Evaluation
Metrics:
Accuracy = nc / n
Laplace = (nc + 1) / (n + k)
M-estimate = (nc + k·p) / (n + k)
n: number of instances covered by the rule
nc: number of instances of the predicted class covered by the rule
k: number of classes
p: prior probability of the predicted class
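The same three metrics in code form, a minimal sketch following the formulas above:

def rule_accuracy(nc, n):
    return nc / n

def laplace(nc, n, k):
    return (nc + 1) / (n + k)

def m_estimate(nc, n, k, p):
    return (nc + k * p) / (n + k)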
Stopping Criterion and Rule Pruning
Stopping criterion: compute the gain; if the gain is not significant, discard the new rule
Rule pruning: similar to post-pruning of decision trees
Reduced Error Pruning: remove one of the conjuncts in the rule and compare the error rate on the validation set before and after pruning; if the error improves, prune the conjunct
Summary of Direct Method
1. Grow a single rule
2. Remove instances covered by the rule
3. Prune the rule (if necessary)
4. Add the rule to the current rule set
5. Repeat
Direct Method: RIPPER
For a 2-class problem, choose one of the classes as the positive class and the other as the negative class
Learn rules for the positive class; the negative class becomes the default class
For a multi-class problem:
Order the classes according to increasing class prevalence (fraction of instances that belong to a particular class)
Learn the rule set for the smallest class first, treating the rest as the negative class
Repeat with the next smallest class as the positive class
Direct Method: RIPPER
Growing a rule:
Start from an empty rule
Add conjuncts as long as they improve FOIL's information gain
Stop when the rule no longer covers negative examples
Prune the rule immediately using incremental reduced error pruning
Measure for pruning: v = (p − n) / (p + n)
p: number of positive examples covered by the rule in the validation set
n: number of negative examples covered by the rule in the validation set
Pruning method: delete any final sequence of conditions that maximizes v
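A minimal sketch of that pruning step (my illustration; covers() and is_positive() are hypothetical helpers for rule matching and labeling):

def prune_rule(conjuncts, validation, covers, is_positive):
    # try keeping every prefix of the conjunct list, i.e., deleting a final
    # sequence of conditions, and retain the prefix that maximizes v
    best, best_v = conjuncts, float("-inf")
    for k in range(len(conjuncts), 0, -1):
        prefix = conjuncts[:k]
        covered = [r for r in validation if covers(prefix, r)]
        if not covered:
            continue
        p = sum(is_positive(r) for r in covered)
        n = len(covered) - p
        v = (p - n) / (p + n)
        if v > best_v:
            best, best_v = prefix, v
    return best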
Direct Method: RIPPER
Building a rule set:
Use the sequential covering algorithm: find the best rule that covers the current set of positive examples, then eliminate both the positive and negative examples covered by the rule
Each time a rule is added to the rule set, compute the new description length; stop adding new rules when the new description length is d bits longer than the smallest description length obtained so far
Direct Method: RIPPER
Optimize the rule set: for each rule r in the rule set R, consider two alternative rules:
Replacement rule (r*): grow a new rule from scratch
Revised rule (r′): add conjuncts to extend the rule r
Compare the rule set containing r against the rule sets containing r* and r′
Choose the rule set that minimizes the MDL principle
Repeat rule generation and rule optimization for the remaining positive examples
Indirect Methods
[Figure: a decision tree that splits on P, then on Q and R, with + and − leaf labels.]
Rule Set:
r1: (P=No, Q=No) ==> −
r2: (P=No, Q=Yes) ==> +
r3: (P=Yes, R=No) ==> +
r4: (P=Yes, R=Yes, Q=No) ==> −
r5: (P=Yes, R=Yes, Q=Yes) ==> +
Indirect Method: C4.5rules
Extract rules from an unpruned decision tree
For each rule r: A → y, consider an alternative rule r′: A′ → y, where A′ is obtained by removing one of the conjuncts in A
Compare the pessimistic error rate for r against all the r′
Prune if one of the r′ has a lower pessimistic error rate
Repeat until we can no longer improve the generalization error
Indirect Method: C4.5rules
Instead of ordering the rules, order subsets of rules (class ordering)
Each subset is a collection of rules with the same rule consequent (class)
Compute the description length of each subset: Description length = L(error) + g × L(model), where g is a parameter that takes into account the presence of redundant attributes in a rule set (default value = 0.5)
Example
Name           Give Birth  Lay Eggs  Can Fly  Live in Water  Have Legs  Class
human          yes         no        no       no             yes        mammals
python         no          yes       no       no             no         reptiles
salmon         no          yes       no       yes            no         fishes
whale          yes         no        no       yes            no         mammals
frog           no          yes       no       sometimes      yes        amphibians
komodo         no          yes       no       no             yes        reptiles
bat            yes         no        yes      no             yes        mammals
pigeon         no          yes       yes      no             yes        birds
cat            yes         no        no       no             yes        mammals
leopard shark  yes         no        no       yes            no         fishes
turtle         no          yes       no       sometimes      yes        reptiles
penguin        no          yes       no       sometimes      yes        birds
porcupine      yes         no        no       no             yes        mammals
eel            no          yes       no       yes            no         fishes
salamander     no          yes       no       sometimes      yes        amphibians
gila monster   no          yes       no       no             yes        reptiles
platypus       no          yes       no       no             yes        mammals
owl            no          yes       yes      no             yes        birds
dolphin        yes         no        no       yes            no         mammals
eagle          no          yes       yes      no             yes        birds
C4.5 versus C4.5rules versus RIPPER
C4.5rules:
(Give Birth=No, Can Fly=Yes) → Birds
(Give Birth=No, Live in Water=Yes) → Fishes
(Give Birth=Yes) → Mammals
(Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
( ) → Amphibians
RIPPER:
(Live in Water=Yes) → Fishes
(Have Legs=No) → Reptiles
(Give Birth=No, Can Fly=No, Live In Water=No) → Reptiles
(Can Fly=Yes, Give Birth=No) → Birds
( ) → Mammals
C4.5 versus C4.5rules versus RIPPER
Confusion matrices (rows = ACTUAL CLASS, columns = PREDICTED CLASS):

C4.5:
            Amphibians  Fishes  Reptiles  Birds  Mammals
Amphibians  2           0       0         0      0
Fishes      0           2       0         0      1
Reptiles    1           0       3         0      0
Birds       1           0       0         3      0
Mammals     0           0       1         0      6

C4.5rules:
            Amphibians  Fishes  Reptiles  Birds  Mammals
Amphibians  2           0       0         0      0
Fishes      0           2       0         0      1
Reptiles    1           0       3         0      0
Birds       1           0       0         3      0
Mammals     0           0       1         0      6

RIPPER:
            Amphibians  Fishes  Reptiles  Birds  Mammals
Amphibians  0           0       0         0      2
Fishes      0           3       0         0      0
Reptiles    0           0       3         0      1
Birds       0           0       1         2      1
Mammals     0           2       1         0      4
Advantages of Rule-Based Classifiers
As highly expressive as decision trees
Easy to interpret
Easy to generate
Can classify new instances rapidly
Performance comparable to decision trees
Instance-Based Classifiers
Store the training records
Use the training records to predict the class label of unseen cases
Instance-Based Classifiers
Examples:
Rote-learner: memorizes the entire training data and performs classification only if the attributes of a record exactly match one of the training examples
Nearest neighbor: uses the k closest points (nearest neighbors) to perform classification
Nearest Neighbor Classifiers
Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck
Nearest-Neighbor Classifiers
Requires three things:
The set of stored records
A distance metric to compute the distance between records
The value of k, the number of nearest neighbors to retrieve
To classify an unknown record:
Compute the distance to the training records
Identify the k nearest neighbors
Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
Definition of Nearest Neighbor
The k-nearest neighbors of a record x are the data points that have the k smallest distances to x
1 nearest-neighbor: Voronoi diagram
Nearest Neighbor Classification
Compute the distance between two points, e.g., Euclidean distance:
d(p, q) = sqrt( Σi (pi − qi)² )
Determine the class from the nearest neighbor list:
Take the majority vote of class labels among the k nearest neighbors
Optionally weigh the votes according to distance, e.g., weight factor w = 1/d²
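Both steps fit in a short sketch (my illustration, not the notes'):

import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_predict(train, x, k=3, weighted=False):
    # train: list of (feature_vector, class_label) pairs
    neighbors = sorted(train, key=lambda pair: euclidean(pair[0], x))[:k]
    votes = Counter()
    for features, label in neighbors:
        d = euclidean(features, x)
        # plain vote of 1, or distance-weighted vote w = 1/d^2
        votes[label] += 1.0 / (d * d) if (weighted and d > 0) else 1.0
    return votes.most_common(1)[0][0]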
Nearest Neighbor Classification
Choosing the value of k:
If k is too small, the classifier is sensitive to noise points
If k is too large, the neighborhood may include points from other classes
Nearest Neighbor Classification
Scaling issues: attributes may have to be scaled to prevent the distance measure from being dominated by one of the attributes
Example:
height of a person may vary from 1.5m to 1.8m
weight of a person may vary from 90lb to 300lb
income of a person may vary from $10K to $1M
Nearest Neighbor Classification
Problems with the Euclidean measure:
High-dimensional data: curse of dimensionality
Can produce counter-intuitive results; for example, both of the following pairs of binary vectors are the same distance apart:
1 1 1 1 1 1 1 1 1 1 1 0  vs  0 1 1 1 1 1 1 1 1 1 1 1   d = 1.4142
1 0 0 0 0 0 0 0 0 0 0 0  vs  0 0 0 0 0 0 0 0 0 0 0 1   d = 1.4142
Solution: normalize the vectors to unit length
Nearest Neighbor Classification
k-NN classifiers are lazy learners: they do not build models explicitly
Unlike eager learners such as decision tree induction and rule-based systems
Classifying unknown records is relatively expensive
Example: PEBLS
PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg)
Works with both continuous and nominal features
For nominal features, the distance between two nominal values is computed using the modified value difference metric (MVDM)
Each record is assigned a weight factor
Number of nearest neighbors: k = 1
Example: PEBLS
Distance between nominal attribute values:
d(V1, V2) = Σi | n1i/n1 − n2i/n2 |
where nji is the number of class-i records with attribute value Vj and nj is the total number of records with value Vj. Using the class counts from the tax table shown earlier:
d(Single, Married) = | 2/4 − 0/4 | + | 2/4 − 4/4 | = 1
d(Single, Divorced) = | 2/4 − 1/2 | + | 2/4 − 1/2 | = 0
d(Married, Divorced) = | 0/4 − 1/2 | + | 4/4 − 1/2 | = 1
d(Refund=Yes, Refund=No) = | 0/3 − 3/7 | + | 3/3 − 4/7 | = 6/7

Class  Marital Status
       Single  Married  Divorced
Yes    2       0        1
No     2       4        1

Class  Refund
       Yes  No
Yes    0    3
No     3    4
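A minimal sketch (mine, not from the notes) that reproduces these MVDM distances; counts[v][c] is assumed to hold the number of class-c records with attribute value v:

def mvdm(counts, v1, v2):
    # d(V1, V2) = sum_i | n1i/n1 - n2i/n2 |
    n1 = sum(counts[v1].values())
    n2 = sum(counts[v2].values())
    classes = set(counts[v1]) | set(counts[v2])
    return sum(abs(counts[v1].get(c, 0) / n1 - counts[v2].get(c, 0) / n2)
               for c in classes)

marital = {"Single":   {"Yes": 2, "No": 2},
           "Married":  {"Yes": 0, "No": 4},
           "Divorced": {"Yes": 1, "No": 1}}
print(mvdm(marital, "Single", "Married"))    # 1.0
print(mvdm(marital, "Single", "Divorced"))   # 0.0
print(mvdm(marital, "Married", "Divorced"))  # 1.0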
Example: PEBLS
Distance between record X and record Y:
Δ(X, Y) = wX wY Σi d(Xi, Yi)²
where wX ≅ 1 if X makes accurate predictions most of the time, and wX > 1 if X is not reliable for making predictions
Tid  Refund  Marital Status  Taxable Income  Cheat
X    Yes     Single          125K            No
Y    No      Married         100K            No
Bayes Classifier
A probabilistic framework for solving classification problems
Conditional probability: P(C | A) = P(A, C) / P(A) and P(A | C) = P(A, C) / P(C)
Bayes theorem: P(C | A) = P(A | C) P(C) / P(A)
Example of Bayes Theorem
Given:
A doctor knows that meningitis causes stiff neck 50% of the time
The prior probability of any patient having meningitis is 1/50,000
The prior probability of any patient having stiff neck is 1/20
If a patient has stiff neck, what's the probability he/she has meningitis?
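Working it out with Bayes theorem (M = meningitis, S = stiff neck):

P(M | S) = P(S | M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002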
Bayesian Classifiers
Consider each attribute and class label as random variables
Given a record with attributes (A1, A2, …, An), the goal is to predict class C
Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An)
Can we estimate P(C | A1, A2, …, An) directly from data?
Bayesian Classifiers
Approach: compute the posterior probability P(C | A1, A2, …, An) for all values of C using Bayes theorem:
P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)
Choose the value of C that maximizes P(C | A1, A2, …, An)
This is equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C)
How to estimate P(A1, A2, …, An | C)?
Naïve Bayes Classifier
Assume independence among the attributes Ai when the class is given:
P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)
We can then estimate P(Ai | Cj) for all Ai and Cj
A new point is classified to Cj if P(Cj) Πi P(Ai | Cj) is maximal
How to Estimate Probabilities from Data?
Class priors: P(Cj) = Nj / N, e.g., P(No) = 7/10, P(Yes) = 3/10
For discrete attributes: P(Ai | Ck) = |Aik| / Nk, where |Aik| is the number of instances that have attribute value Ai and belong to class Ck, and Nk is the number of instances in class Ck
Examples: P(Status=Married | No) = 4/7, P(Refund=Yes | Yes) = 0
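A minimal counting sketch (mine, not the notes') that reproduces these estimates, reusing the records list from the coverage example earlier:

def cond_prob(records, attr, value, cls):
    # P(attr=value | Class=cls), estimated by counting
    in_class = [r for r in records if r["Class"] == cls]
    return sum(r[attr] == value for r in in_class) / len(in_class)

prior_no = sum(r["Class"] == "No" for r in records) / len(records)  # 7/10
print(cond_prob(records, "Status", "Married", "No"))                # 4/7
print(cond_prob(records, "Refund", "Yes", "Yes"))                   # 0.0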
How to Estimate Probabilities from Data?
For continuous attributes:
Discretize the range into bins: one ordinal attribute per bin (violates the independence assumption)
Two-way split: (A < v) or (A > v); choose only one of the two splits as the new attribute
Probability density estimation: assume the attribute follows a normal distribution, use the data to estimate the parameters of the distribution (e.g., mean and standard deviation), and once the probability distribution is known, use it to estimate the conditional probability P(Ai | c)
How to Estimate Probabilities from Data?
Normal distribution: P(Ai | cj) = (1 / sqrt(2π σij²)) exp( −(Ai − μij)² / (2 σij²) ), one for each (Ai, cj) pair
For (Income, Class=No): sample mean = 110, sample variance = 2975
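Plugging Income = 120K into this density gives roughly 0.0072, the value used on the next slide; a quick check (my sketch):

import math

def gaussian(x, mean, var):
    # normal density with the given mean and variance
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(gaussian(120, 110, 2975))  # ~0.0072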
Example of Naïve Bayes Classifier
Given a test record: X = (Refund=No, Married, Income=120K)
P(X | Class=No) = P(Refund=No | Class=No) × P(Married | Class=No) × P(Income=120K | Class=No) = 4/7 × 4/7 × 0.0072 = 0.0024
P(X | Class=Yes) = P(Refund=No | Class=Yes) × P(Married | Class=Yes) × P(Income=120K | Class=Yes) = 1 × 0 × 1.2×10⁻⁹ = 0
Since P(X|No) P(No) > P(X|Yes) P(Yes), we have P(No|X) > P(Yes|X) => Class = No
Naïve Bayes Classifier
If one of the conditional probabilities is zero, the entire expression becomes zero
Alternative probability estimates:
Original: P(Ai | C) = Nic / Nc
Laplace: P(Ai | C) = (Nic + 1) / (Nc + c)
m-estimate: P(Ai | C) = (Nic + m p) / (Nc + m)
c: number of classes, p: prior probability, m: parameter
Example of Naïve Bayes Classifier
A: attributes, M: mammals, N: non-mammals
For the test record A = (Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no) shown below:
P(A|M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
P(A|N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042
P(A|M) P(M) = 0.06 × 7/20 = 0.021
P(A|N) P(N) = 0.0042 × 13/20 = 0.0027
P(A|M) P(M) > P(A|N) P(N) => Mammals
Name           Give Birth  Lay Eggs  Can Fly  Live in Water  Have Legs  Class
human          yes         no        no       no             yes        mammals
python         no          yes       no       no             no         non-mammals
salmon         no          yes       no       yes            no         non-mammals
whale          yes         no        no       yes            no         mammals
frog           no          yes       no       sometimes      yes        non-mammals
komodo         no          yes       no       no             yes        non-mammals
bat            yes         no        yes      no             yes        mammals
pigeon         no          yes       yes      no             yes        non-mammals
cat            yes         no        no       no             yes        mammals
leopard shark  yes         no        no       yes            no         non-mammals
turtle         no          yes       no       sometimes      yes        non-mammals
penguin        no          yes       no       sometimes      yes        non-mammals
porcupine      yes         no        no       no             yes        mammals
eel            no          yes       no       yes            no         non-mammals
salamander     no          yes       no       sometimes      yes        non-mammals
gila monster   no          yes       no       no             yes        non-mammals
platypus       no          yes       no       no             yes        mammals
owl            no          yes       yes      no             yes        non-mammals
dolphin        yes         no        no       yes            no         mammals
eagle          no          yes       yes      no             yes        non-mammals
Test record:
Name   Give Birth  Lay Eggs  Can Fly  Live in Water  Have Legs  Class
human  yes         no        no       yes            no         ?
Naïve Bayes (Summary)
Robust to isolated noise points
Handles missing values by ignoring the instance during probability estimate calculations
Robust to irrelevant attributes
Independence assumption may not hold for some attributes; use other techniques such as Bayesian Belief Networks (BBN)
Artificial Neural Networks (ANN)
Output Y is 1 if at least two of the three inputs are equal to 1.
[Figure: a black box with input attributes X1, X2, X3 and output Y.]
Artificial Neural Networks (ANN)
[Figure: the black box opened up: input nodes X1, X2, X3 feed an output node Σ with link weights 0.3 each and threshold t = 0.4, producing Y.]
Artificial Neural Networks (ANN)
The model is an assembly of inter-connected nodes and weighted links
The output node sums each of its input values according to the weights of its links
The sum is compared against a threshold t
Perceptron model: Y = I( Σi wi Xi − t > 0 ), or equivalently Y = sign( Σi wi Xi − t )
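The worked example above, in code (my sketch): weights of 0.3 on each input and threshold t = 0.4 implement "Y is 1 if at least two inputs are 1".

def perceptron(x, w=(0.3, 0.3, 0.3), t=0.4):
    # Y = 1 if the weighted input sum exceeds the threshold t, else 0
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s - t > 0 else 0

print(perceptron((1, 1, 0)))  # 1: two inputs are 1, sum 0.6 > 0.4
print(perceptron((1, 0, 0)))  # 0: one input is 1, sum 0.3 < 0.4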
[Figure: the same perceptron with generic link weights w1, w2, w3 and threshold t.]
General Structure of ANN
Training an ANN means learning the weights of the neurons
[Figure: general structure of an ANN with an input layer (x1 … x5), a hidden layer, and an output layer (y); each neuron i combines its inputs I1, I2, I3 via weights wi1, wi2, wi3 into Si, then applies an activation function g(Si) with threshold t to produce output Oi.]
Algorithm for Learning ANN
Initialize the weights (w0, w1, …, wk)
Adjust the weights so that the output of the ANN is consistent with the class labels of the training examples
Objective function: E = Σi [ Yi − f(wi, Xi) ]²
Find the weights wi that minimize the objective function, e.g., via the backpropagation algorithm (see lecture notes)
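For a single linear unit, this minimization can be sketched as plain stochastic gradient descent (my illustration with an assumed data layout; backpropagation extends the same update through hidden layers):

def train_step(weights, examples, lr=0.01):
    # one pass of stochastic gradient descent on E = sum (y - w.x)^2
    for x, y in examples:
        output = sum(w * xi for w, xi in zip(weights, x))
        error = y - output
        # move each weight in the direction that reduces the squared error
        weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights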
Support Vector Machines
Find a linear hyperplane (decision boundary) that will separate the data
Support Vector Machines
One possible solution: hyperplane B1
Support Vector Machines
Another possible solution: hyperplane B2
Support Vector Machines
Other possible solutions
Support Vector Machines
Which one is better, B1 or B2?
How do you define "better"?
Support Vector Machines
Find the hyperplane that maximizes the margin => B1 is better than B2
[Figure: hyperplanes B1 and B2 with their margin boundaries b11, b12 and b21, b22; B1 has the larger margin.]
Support Vector Machines
For boundary B1: the decision boundary is w·x + b = 0, with margin hyperplanes b11: w·x + b = 1 and b12: w·x + b = −1
f(x) = 1 if w·x + b ≥ 1, and f(x) = −1 if w·x + b ≤ −1
Margin = 2 / ‖w‖²
Support Vector Machines
We want to maximize: Margin = 2 / ‖w‖²
This is equivalent to minimizing: L(w) = ‖w‖² / 2
But subject to the following constraints: f(xi) = 1 if w·xi + b ≥ 1, and f(xi) = −1 if w·xi + b ≤ −1
This is a constrained optimization problem
Numerical approaches (e.g., quadratic programming) can solve it
Support Vector Machines
What if the problem is not linearly separable?
Support Vector Machines
What if the problem is not linearly separable?
Introduce slack variables ξi and minimize: L(w) = ‖w‖² / 2 + C Σi ξi
Subject to: f(xi) = 1 if w·xi + b ≥ 1 − ξi, and f(xi) = −1 if w·xi + b ≤ −1 + ξi
Nonlinear Support Vector Machines
What if the decision boundary is not linear?
Nonlinear Support Vector Machines
Transform the data into a higher dimensional space
Ensemble Methods
Construct a set of classifiers from the training data
Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers
General Idea
Step 1: Create multiple data sets D1, D2, …, Dt−1, Dt from the original training data D
Step 2: Build multiple classifiers C1, C2, …, Ct−1, Ct, one per data set
Step 3: Combine the classifiers into a single classifier C*
Why does it work?
Suppose there are 25 base classifiers, each with error rate ε = 0.35, and assume the classifiers are independent
The ensemble makes a wrong prediction only if more than half of the base classifiers err:
P(wrong) = Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^{25−i} ≈ 0.06
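Evaluating the sum directly confirms the figure (my sketch):

from math import comb

eps = 0.35
p_wrong = sum(comb(25, i) * eps ** i * (1 - eps) ** (25 - i)
              for i in range(13, 26))
print(p_wrong)  # ~0.06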
Examples of Ensemble Methods
How to generate an ensemble of classifiers?
Bagging
Boosting
Bagging
Sampling with replacement
Build a classifier on each bootstrap sample
Each record has probability 1 − (1 − 1/n)^n (≈ 0.632 for large n) of being selected for a given bootstrap sample
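A minimal bagging sketch (mine; train_classifier() is a hypothetical stand-in for any base learner):

import random

def bagging(records, rounds=10):
    # draw `rounds` bootstrap samples and train one base classifier on each
    n = len(records)
    ensemble = []
    for _ in range(rounds):
        sample = [random.choice(records) for _ in range(n)]  # sampling with replacement
        ensemble.append(train_classifier(sample))
    return ensemble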
Boosting
An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records
Initially, all N records are assigned equal weights
Unlike bagging, the weights may change at the end of each boosting round
Boosting
Records that are wrongly classified will have their weights increased
Records that are classified correctly will have their weights decreased
Example 4 is hard to classify: its weight is increased, so it is more likely to be chosen again in subsequent rounds
Example: AdaBoost
Base classifiers: C1, C2, …, CT
Error rate: εi = (1/N) Σ_{j=1}^{N} wj δ( Ci(xj) ≠ yj )
Importance of a classifier: αi = (1/2) ln( (1 − εi) / εi )
Example: AdaBoost
Weight update: w_j^(i+1) = (w_j^(i) / Zi) × exp(−αi) if Ci(xj) = yj, and (w_j^(i) / Zi) × exp(+αi) if Ci(xj) ≠ yj, where Zi is a normalization factor
If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated
Classification: C*(x) = argmax_y Σ_{i=1}^{T} αi δ( Ci(x) = y )
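The importance and weight-update formulas in code form, a minimal sketch (mine, not the notes'):

import math

def alpha_from_error(eps):
    # importance of a classifier with weighted error rate eps
    return 0.5 * math.log((1 - eps) / eps)

def update_weights(weights, correct, alpha):
    # decrease the weights of correctly classified records, increase the others,
    # then renormalize by Z so the weights sum to 1
    new = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(weights, correct)]
    z = sum(new)
    return [w / z for w in new]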
Illustrating AdaBoost
[Figure: Boosting Round 1: data points labeled + and − with boundary B1, instance weights such as 0.0094 and 0.4623, and α = 1.9459.]
Illustrating AdaBoost
[Figure: Boosting Rounds 1–3: boundaries B1 (α = 1.9459), B2 (α = 2.9323), and B3 (α = 3.8744); instance weights evolve across rounds (e.g., 0.0094, 0.3037, 0.0009, 0.0422, 0.0276, 0.1819, 0.0038), and the Overall panel shows the combined classifier.]