EMNLP’02 11/11/2002
• ML: Classical methods from AI
– Decision-tree induction
– Exemplar-based learning
– Rule induction
– TBEDL (Transformation-Based Error-Driven Learning)
Decision Trees
• Decision trees represent the rules underlying training data as hierarchical sequential structures that recursively partition the data.
• They have been used by many research communities (Pattern Recognition, Statistics, ML, etc.) for data exploration, with purposes including description, classification, and generalization.
• From a machine-learning perspective: decision trees are n-ary branching trees that represent classification rules for classifying the objects of a certain domain into a set of mutually exclusive classes.
• Acquisition: Top-Down Induction of Decision Trees (TDIDT)
• Systems: CART (Breiman et al. 84), ID3, C4.5, C5.0 (Quinlan 86, 93, 98), ASSISTANT, ASSISTANT-R (Cestnik et al. 87; Kononenko et al. 95)
An Example

[Figure: a generic n-ary decision tree. Internal nodes test attributes A1, A2, A3, A5; branches carry attribute values v1-v7; leaves assign classes C1, C2, C3.]

[Figure: a decision tree for a toy domain. Nodes test COLOR (red, blue), SHAPE (circle, triang) and SIZE (small, big); leaves are labelled pos/neg.]
Learning Decision Trees

Training: Training Set + TDIDT → DT
Test: Example + DT → Class
General Induction Algorithm

function TDIDT (X: set-of-examples; A: set-of-features)
  var tree1, tree2: decision-tree;
      X': set-of-examples; A': set-of-features
  end-var
  if stopping_criterion(X) then
    tree1 := create_leaf_tree(X)
  else
    amax := feature_selection(X, A);
    tree1 := create_tree(X, amax);
    for-all val in values(amax) do
      X' := select_examples(X, amax, val);
      A' := A \ {amax};
      tree2 := TDIDT(X', A');
      tree1 := add_branch(tree1, tree2, val)
    end-for
  end-if
  return tree1
end-function
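The TDIDT pseudocode above can be rendered as a minimal, runnable sketch. The names `tdidt` and `classify`, the `(feature_dict, label)` example encoding, and the default first-feature selection are assumptions for illustration; the real feature-selection criterion is left pluggable.

```python
from collections import Counter

def tdidt(examples, features, select=None):
    """Recursively build a decision tree from (feature_dict, label) pairs.

    Returns a leaf label when the examples are pure or no features remain,
    otherwise a (feature, {value: subtree}) node.
    """
    labels = [y for _, y in examples]
    # Stopping criterion: one class left, or no features to split on.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Feature selection: plug in information gain, gain ratio, etc.
    best = (select or (lambda xs, fs: fs[0]))(examples, features)
    remaining = [f for f in features if f != best]
    node = {}
    for val in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == val]
        node[val] = tdidt(subset, remaining, select)
    return (best, node)

def classify(tree, x):
    """Follow the branches matching x until a leaf label is reached."""
    while isinstance(tree, tuple):
        feat, branches = tree
        tree = branches[x[feat]]
    return tree
```

For example, three toy examples in the COLOR/SIZE domain of the earlier figure yield a tree that classifies a small red object as pos.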
Feature Selection Criteria

• Functions derived from Information Theory:
– Information Gain, Gain Ratio (Quinlan 86)
• Functions derived from distance measures:
– Gini Diversity Index (Breiman et al. 84)
– RLM (López de Mántaras 91)
• Statistically based:
– Chi-square test (Sestito & Dillon 94)
– Symmetrical Tau (Zhou & Dillon 91)
• RELIEFF-IG: variant of RELIEFF (Kononenko 94)
Information Gain (Quinlan 79)

[Figure: definition of the information-gain criterion.]
Information Gain (2) (Quinlan 79)

[Figure: information gain, continued.]
Gain Ratio (Quinlan 86)

[Figure: definition of the gain-ratio criterion.]
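As a concrete reading of the two criteria named above, here is a small sketch of information gain and gain ratio over nominal features; the `(feature_dict, label)` example encoding is assumed for illustration.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """H(S) = -sum_c p(c) log2 p(c) over the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, feature):
    """Information gain: H(S) - sum_v |S_v|/|S| * H(S_v)."""
    n = len(examples)
    gain = entropy([y for _, y in examples])
    split = {}
    for x, y in examples:
        split.setdefault(x[feature], []).append(y)
    for subset in split.values():
        gain -= len(subset) / n * entropy(subset)
    return gain

def gain_ratio(examples, feature):
    """Gain divided by the split information, penalising
    features with many values (Quinlan's correction)."""
    split_info = entropy([x[feature] for x, _ in examples])
    return information_gain(examples, feature) / split_info if split_info else 0.0
```

On a toy set where feature a perfectly determines the class and feature b is noise, a scores gain 1.0 and b scores 0.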
RELIEF (Kira & Rendell, 1992)

[Figure: the RELIEF algorithm.]
RELIEFF (Kononenko, 1994)

[Figure: the RELIEFF algorithm.]
RELIEFF-IG (Màrquez, 1999)

• RELIEFF, except that the distance measure used for finding the nearest hits/misses does not treat all attributes equally: it weights the attributes according to the IG measure.
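A minimal sketch of the underlying RELIEF idea (nearest hit/miss weight updates) may make the RELIEFF-IG variant easier to follow. This is a simplified one-nearest-neighbour version with 0/1 diffs for nominal attributes, not the authors' implementation; all names are illustrative.

```python
import random

def relief(examples, features, m=100, seed=0):
    """RELIEF-style attribute weighting (after Kira & Rendell, 1992).

    For m sampled instances, find the nearest hit (same class) and
    nearest miss (other class); reward attributes that differ on the
    miss and penalise those that differ on the hit.
    """
    rng = random.Random(seed)
    diff = lambda a, x1, x2: 0.0 if x1[a] == x2[a] else 1.0
    dist = lambda x1, x2: sum(diff(a, x1, x2) for a in features)
    w = {a: 0.0 for a in features}
    for _ in range(m):
        x, y = rng.choice(examples)
        hits = [e for e in examples if e[1] == y and e[0] is not x]
        misses = [e for e in examples if e[1] != y]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda e: dist(x, e[0]))[0]
        miss = min(misses, key=lambda e: dist(x, e[0]))[0]
        for a in features:
            w[a] += (diff(a, x, miss) - diff(a, x, hit)) / m
    return w
```

A relevant attribute (one that tracks the class) ends up with a much higher weight than an irrelevant one. RELIEFF extends this to k nearest hits/misses and noisy, multi-class data; RELIEFF-IG additionally IG-weights the distance itself.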
Extensions of DTs (Murthy 95)

• (Pre-/post-) pruning
• Minimizing the effect of the greedy approach: lookahead
• Non-linear splits
• Combination of multiple models
• etc.
Decision Trees and NLP

• Speech processing (Bahl et al. 89; Bakiri & Dietterich 99)
• POS tagging (Cardie 93; Schmid 94b; Magerman 95; Màrquez & Rodríguez 95, 97; Màrquez et al. 00)
• Word sense disambiguation (Brown et al. 91; Cardie 93; Mooney 96)
• Parsing (Magerman 95, 96; Haruno et al. 98, 99)
• Text categorization (Lewis & Ringuette 94; Weiss et al. 99)
• Text summarization (Mani & Bloedorn 98)
• Dialogue act tagging (Samuel et al. 98)
Decision Trees and NLP

• Noun phrase coreference (Aone & Benett 95; McCarthy & Lehnert 95)
• Discourse analysis in information extraction (Soderland & Lehnert 94)
• Cue phrase identification in text and speech (Litman 94; Siegel & McKeown 94)
• Verb classification in machine translation (Tanaka 96; Siegel 97)
• More recent applications of DTs to NLP combine them in a boosting framework (we will see this in the following sessions)
Example: POS Tagging using DTs

He was shot in the hand as he chased
the robbers in the back street

(The Wall Street Journal Corpus)

[In the slide, three words of the sentence are marked as ambiguous between tags: NN/VB, JJ/VB, and NN/VB.]
POS Tagging using Decision Trees (Màrquez, PhD 1999)

[Diagram: raw text → morphological analysis → disambiguation algorithm → tagged text. The disambiguation algorithm consults a language model; here the language model is a base of decision trees, and the tree base is exploited by three taggers: RTT, STT, and RELAX.]
DT-based Language Modelling

[Figure: the "preposition-adverb" tree. Along the depicted path, nodes test Word Form ("As"/"as"), tag(+1) = RB, and tag(+2) = IN; the class distribution is refined from P(IN)=0.81, P(RB)=0.19 near the root, through P(IN)=0.83, P(RB)=0.17 and P(IN)=0.13, P(RB)=0.87, down to a leaf with P(IN)=0.013, P(RB)=0.987.]

Statistical interpretation:

P̂( RB | word="As/as" & tag(+1)=RB & tag(+2)=IN ) = 0.987
P̂( IN | word="As/as" & tag(+1)=RB & tag(+2)=IN ) = 0.013
DT-based Language Modelling: the "preposition-adverb" tree

[Figure: the same tree, with the path highlighted by the collocations it captures.]

Collocations:
"as_RB much_RB as_IN"
"as_RB well_RB as_IN"
"as_RB soon_RB as_IN"
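The statistical interpretation above can be mimicked in code: walk the tree along the branch matching the context and read off the class distribution of the deepest matching node, falling back to an ancestor's distribution when a branch is missing (a crude form of smoothing). The `(attribute, branches, distribution)` node encoding is an assumption for illustration.

```python
def tree_probability(node, context):
    """Return the class distribution at the deepest node whose tests
    the context matches. Nodes are (attribute, {value: child}, dist);
    leaves are (None, None, dist)."""
    attr, branches, dist = node
    while attr is not None:
        child = branches.get(context.get(attr))
        if child is None:
            break  # unseen value: back off to this node's distribution
        attr, branches, dist = child
    return dist

# A fragment of the "preposition-adverb" tree from the slides:
leaf = (None, None, {"IN": 0.013, "RB": 0.987})
t2 = ("tag(+2)", {"IN": leaf}, {"IN": 0.13, "RB": 0.87})
t1 = ("tag(+1)", {"RB": t2}, {"IN": 0.83, "RB": 0.17})
root = ("word", {"as": t1, "As": t1}, {"IN": 0.81, "RB": 0.19})
```

Querying the full context word="as", tag(+1)=RB, tag(+2)=IN reaches the leaf with P(RB)=0.987, matching the statistical interpretation on the slide.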
Language Modelling using DTs

• Algorithm: Top-Down Induction of Decision Trees (TDIDT). Supervised learning.
– CART (Breiman et al. 84), C4.5 (Quinlan 95), etc.
• Attributes: local context, a (-3, +2) window of tokens
• Particular implementation:
– Branch merging
– CART post-pruning
– Smoothing
– Attributes with many values
– Several functions for attribute selection
Minimizing the effect of over-fitting, data fragmentation and sparseness.
• Granularity? Ambiguity-class level
– adjective-noun, adjective-noun-verb, etc.
Model Evaluation: the Wall Street Journal (WSJ) annotated corpus

• 1,170,000 words
• Tagset size: 45 tags
• Noise: 2-3% of mistagged words
• 49,000 word-form frequency lexicon
– Manual filtering of the 200 most frequent entries
– 36.4% ambiguous words
– 2.44 (1.52) average tags per word
• 243 ambiguity classes
Model Evaluation: the Wall Street Journal (WSJ) annotated corpus

Number of ambiguity classes that cover x% of the training corpus:

  coverage              50%  60%  70%  80%  90%  95%  99%  100%
  # ambiguity classes     8   11   14   19   37   58  113   243

Arity of the classification problems:

  arity                 2-tags  3-tags  4-tags  5-tags  6-tags
  # ambiguity classes      103      90      35      12       3
12 Ambiguity Classes

They cover 57.90% of the ambiguous occurrences!

Experimental setting: 10-fold cross validation
N-fold Cross Validation

Divide the training set S into a partition of n equal-size disjoint subsets: s1, s2, ..., sn
for i := 1 to n do
  learn and test a classifier using:
    training_set := U sj, for all j different from i
    validation_set := si
end-for
return the average accuracy over the n experiments

Which is a good value for n? (2, 10, ...)
Extreme case (n = size of the training set): leave-one-out
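The procedure above can be sketched as a few lines of Python, assuming examples are (features, label) pairs and the learner is passed in as a function; `majority_baseline` is a hypothetical stand-in learner used only for demonstration.

```python
from collections import Counter

def cross_validate(examples, n, train_and_test):
    """N-fold cross validation: partition into n disjoint folds, train on
    n-1 of them, evaluate on the held-out fold, average the accuracies."""
    folds = [examples[i::n] for i in range(n)]  # n roughly equal-size subsets
    scores = []
    for i in range(n):
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        scores.append(train_and_test(training, folds[i]))
    return sum(scores) / n

def majority_baseline(training, validation):
    """Trivial learner: always predict the majority class of the training fold."""
    label = Counter(y for _, y in training).most_common(1)[0][0]
    return sum(y == label for _, y in validation) / len(validation)
```

Setting n to the number of examples gives the leave-one-out extreme mentioned above.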
Size: Number of Nodes

[Bar chart: number of nodes — basic algorithm: 22,095; after branch merging: 10,674; after pruning: 5,715.]

Average size reduction: 51.7% (merging), 46.5% (pruning); 74.1% in total.
Accuracy

[Bar chart: % error rate — lower bound: 28.83; basic algorithm: 8.49; merging: 8.36; pruning: 8.30.]

(At least) no loss in accuracy.
Feature Selection Criteria

[Bar chart: average error rate per feature-selection criterion — seven criteria cluster between 8.24% and 8.69% (8.24, 8.31, 8.35, 8.40, 8.52, 8.58, 8.69), with outliers at 8.9%, 11.63% and 17.24%.]

The clustered criteria are statistically equivalent.
DT-based POS Taggers

Tree base = statistical component:
– RTT: Reductionistic Tree-based Tagger (Màrquez & Rodríguez 97)
– STT: Statistical Tree-based Tagger (Màrquez & Rodríguez 99)
Tree base = compatibility constraints:
– RELAX: Relaxation-Labelling-based tagger (Màrquez & Padró 97)
RTT (Màrquez & Rodríguez 97)

[Diagram: raw text → morphological analysis → disambiguation → tagged text. Disambiguation iterates Classify → Update → Filter against the language model until a stopping condition is met ("stop?": yes/no).]
STT (Màrquez & Rodríguez 99)

N-grams (trigrams)

Contextual probabilities: P(tk | Ck), approximated by the estimate P̃(tk | Ck) obtained from the decision tree T_ACk(tk; Ck) for the ambiguity class ACk.
STT (Màrquez & Rodríguez 99)

[Diagram: raw text → morphological analysis → disambiguation with the Viterbi algorithm → tagged text. The language model supplies lexical probabilities plus tree-based contextual probabilities.]
STT+ (Màrquez & Rodríguez 99)

[Diagram: as in STT, but the language model combines lexical probabilities, N-grams, and tree-based contextual probabilities for the Viterbi algorithm.]
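A hedged sketch of the Viterbi decoding step these taggers rely on, using a first-order (bigram) contextual model for brevity; in STT the contextual scores would come from the tree base, and in STT+ from trees combined with n-grams. All function names are illustrative.

```python
def viterbi(words, tags, lexical_p, contextual_p):
    """Most likely tag sequence under lexical scores P(w|t) and
    contextual scores P(t|previous tag)."""
    # best[t] = (probability of the best path ending in tag t, that path)
    best = {t: (lexical_p(words[0], t), [t]) for t in tags}
    for w in words[1:]:
        best = {t: max(((score * contextual_p(t, prev) * lexical_p(w, t),
                         path + [t])
                        for prev, (score, path) in best.items()),
                       key=lambda sp: sp[0])
                for t in tags}
    return max(best.values(), key=lambda sp: sp[0])[1]
```

With a toy lexicon and context table, "he runs" decodes to PRP VBZ because the contextual score of a verb after a pronoun dominates.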
RELAX (Màrquez & Padró 97)

[Diagram: raw text → morphological analysis → disambiguation by relaxation labelling (Padró 96) → tagged text. The language model is a set of constraints drawn from linguistic rules, N-grams, and decision trees.]
RELAX: Translating Trees into Constraints (Màrquez & Padró 97)

[Figure: the "preposition-adverb" tree; each root-to-leaf path is translated into constraints.]

Compatibility values: estimated using Mutual Information.

Positive constraint:  2.37 (RB) (0 "as" "As") (1 RB) (2 IN)
Negative constraint: -5.81 (IN) (0 "as" "As") (1 RB) (2 IN)
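One plausible way to flesh out the tree-to-constraint translation: enumerate root-to-leaf paths and weight each (class, path) pair by pointwise mutual information, log2(P(class|path) / P(class)). This is an assumption about how the compatibility values are computed, not the paper's exact procedure, and the `(attribute, branches, distribution)` node encoding is likewise invented for illustration.

```python
from math import log2

def tree_to_constraints(node, priors, conditions=()):
    """Flatten a decision tree into weighted constraints: one constraint
    per (leaf, class), whose conditions are the root-to-leaf tests and
    whose weight is the PMI of the class with the path (an assumption)."""
    attr, branches, dist = node
    if attr is None:  # leaf: emit one constraint per class
        return [(log2(p / priors[c]), c, conditions)
                for c, p in dist.items() if p > 0]
    out = []
    for value, child in branches.items():
        out.extend(tree_to_constraints(child, priors,
                                       conditions + ((attr, value),)))
    return out
```

Under this weighting, the RB outcome of the "as ... RB ... IN" path gets a positive compatibility value and the IN outcome a strongly negative one, matching the sign pattern of the slide's two constraints.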
Experimental Evaluation: using the WSJ annotated corpus

• Training set: 1,121,776 words
• Test set: 51,990 words
• Closed vocabulary assumption
• Base of 194 trees
– Covering 99.5% of the ambiguous occurrences
– Storage requirement: 565 KB
– Acquisition time: 12 CPU-hours (Common LISP / Sparc10 workstation)
Experimental Evaluation: RTT results

• 67.52% error reduction with respect to MFT
• Accuracy = 94.45% (ambiguous words), 97.29% (overall)
• Comparable to the best state-of-the-art automatic POS taggers
• Recall = 98.22%, Precision = 95.73% (1.08 tags/word)

+ RTT allows trading off precision against recall.
Experimental Evaluation: STT and STT+ results

• STT results are comparable to those of RTT.
+ STT allows the incorporation of N-gram information, which alleviates some problems of sparseness and of coherence of the resulting tag sequence.
• STT+ results are better than those of both RTT and STT.
Experimental Evaluation: including trees into RELAX

• Translation of 44 representative trees, covering 84% of the examples, into 8,473 constraints
• Addition of:
– bigrams (2,808 binary constraints)
– trigrams (52,161 ternary constraints)
– linguistically motivated manual constraints (20)
Accuracy of RELAX

          MFT     B      T      BT     C      BC     TC     BTC
Ambig.    85.31  91.35  91.82  91.92  91.96  92.72  92.82  92.55
Overall   94.66  96.86  97.03  97.06  97.08  97.36  97.39  97.29

MFT = baseline, B = bigrams, T = trigrams, C = "tree constraints"

          H      BH     TH     BTH    CH     BCH    TCH    BTCH
Ambig.    86.41  91.88  92.04  92.32  91.97  92.76  92.98  92.71
Overall   95.06  97.05  97.11  97.21  97.08  97.37  97.45  97.35

H = set of 20 hand-written linguistic rules
Decision Trees: Summary

• Advantages
– Acquire symbolic knowledge in an understandable way
– Very well-studied ML algorithms and variants
– Can be easily translated into rules
– Availability of off-the-shelf software: C4.5, C5.0, etc.
– Can be easily integrated into an ensemble
Decision Trees: Summary

• Drawbacks
– Computationally expensive when scaling to large natural language domains: training examples, features, etc.
– Data sparseness and data fragmentation: the problem of small disjuncts => probability estimation
– DTs are a model with high variance (unstable)
– Tendency to overfit the training data: pruning is necessary
– Require quite a big effort in tuning the model