The Greedy Prepend Algorithm for Decision List Induction
Deniz Yuret
Michael de la Maza
Overview
• Decision Lists
• Greedy Prepend Algorithm
• Opus search and UCI problems
• Version space search and secondary structure prediction
• Limited look-ahead search and Turkish morphology disambiguation
Introduction to Decision Lists
• Prototypical machine learning problem:
– Decide democrat or republican for 435 representatives based on 16 votes.
Class Name: 2 (democrat, republican)
1. handicapped-infants: 2 (y,n)
2. water-project-cost-sharing: 2 (y,n)
3. adoption-of-the-budget-resolution: 2 (y,n)
4. physician-fee-freeze: 2 (y,n)
5. el-salvador-aid: 2 (y,n)
6. religious-groups-in-schools: 2 (y,n)
…
16. export-administration-act-south-africa: 2 (y,n)
Introduction to Decision Lists
• A decision list for this problem:
1. If adoption-of-the-budget-resolution = y and anti-satellite-test-ban = n and water-project-cost-sharing = y then democrat
2. If physician-fee-freeze = y then republican
3. If TRUE then democrat
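A minimal Python sketch of how such a decision list is applied: the rules are tried in order and the first one whose conditions all hold decides the class. The representation (a list of condition-dict and class pairs), the function name classify, and the sample instance are assumptions made for illustration, not taken from the slides.

```python
# A decision list is an ordered list of (conditions, label) rules.
# The first rule whose conditions all hold decides the class.
DECISION_LIST = [
    ({"adoption-of-the-budget-resolution": "y",
      "anti-satellite-test-ban": "n",
      "water-project-cost-sharing": "y"}, "democrat"),
    ({"physician-fee-freeze": "y"}, "republican"),
    ({}, "democrat"),  # "If TRUE": an empty condition set matches everything
]


def classify(dlist, instance):
    """Return the label of the first rule that matches the instance."""
    for conditions, label in dlist:
        if all(instance.get(attr) == val for attr, val in conditions.items()):
            return label
    raise ValueError("decision list has no default rule")


# Hypothetical representative: fails rule 1, matches rule 2.
votes = {"physician-fee-freeze": "y", "adoption-of-the-budget-resolution": "n"}
print(classify(DECISION_LIST, votes))  # republican
```

Because rules are checked top to bottom, a rule near the front overrides everything behind it, which is why the position where a new rule is added matters during induction.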
Alternative Representations
• Decision trees:
• CNF:
• DNF:
• For 0 < k < n and n > 2, k-CNF(n) ∪ k-DNF(n) is a subset of k-DL(n)
• For 0 < k < n and n > 2, k-DT(n) is a subset of k-CNF(n) ∩ k-DNF(n)
• k-DT(n) is a subset of k-DL(n)
(Rivest, 1987)
Decision List Induction
• Start with an empty decision list or a default rule.
• Keep adding the best rule that covers the unclassified and misclassified cases.
Design Decisions:
• Where to add the new rules (front, back)
• Criteria for best rule
• Search algorithm for best rule
The Greedy Prepend Algorithm
GPA(data)
  dlist = NIL
  default-class = most-common-class(data)
  rule = [ if true then default-class ]
  while gain(rule, dlist, data) > 0
    do dlist = prepend(rule, dlist)
       rule = max-gain-rule(dlist, data)
  return dlist
The Greedy Prepend Algorithm
• Starts with a default rule that picks the most common class
• Prepends subsequent rules to the front of the decision list
• The best rule is the one with maximum gain (increase in number of correctly classified instances)
• Several search algorithms implemented
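The loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: instances are dicts of attribute values paired with a class label, a rule is a tuple of (attribute, value) conditions plus a class, and max_gain_rule is a naive exhaustive search over conjunctions of at most two conditions, standing in for the search algorithms covered later in the talk. It also assumes a non-empty data set with at least one attribute.

```python
from collections import Counter
from itertools import combinations


def matches(conds, x):
    """True if instance x satisfies every (attribute, value) condition."""
    return all(x.get(a) == v for a, v in conds)


def classify(dlist, x):
    """First matching rule wins; None if the list is still empty."""
    for conds, label in dlist:
        if matches(conds, x):
            return label
    return None


def gain(rule, dlist, data):
    """Increase in correctly classified instances if rule is prepended."""
    conds, label = rule
    g = 0
    for x, y in data:
        if matches(conds, x):  # only covered instances can change
            g += (label == y) - (classify(dlist, x) == y)
    return g


def max_gain_rule(dlist, data, max_conds=2):
    """Naive exhaustive search over small conjunctions (an assumption)."""
    pairs = sorted({(a, v) for x, _ in data for a, v in x.items()})
    labels = sorted({y for _, y in data})
    candidates = [(conds, label)
                  for k in range(1, max_conds + 1)
                  for conds in combinations(pairs, k)
                  for label in labels]
    return max(candidates, key=lambda r: gain(r, dlist, data))


def gpa(data):
    dlist = []
    default = Counter(y for _, y in data).most_common(1)[0][0]
    rule = ((), default)  # "if true then default-class"
    while gain(rule, dlist, data) > 0:
        dlist.insert(0, rule)  # prepend
        rule = max_gain_rule(dlist, data)
    return dlist


# Toy data (hypothetical): class is "rep" exactly when a = n and b = y.
data = [({"a": "y", "b": "n"}, "dem"), ({"a": "y", "b": "y"}, "dem"),
        ({"a": "n", "b": "y"}, "rep"), ({"a": "n", "b": "n"}, "dem"),
        ({"a": "n", "b": "y"}, "rep")]
for conds, label in gpa(data):
    print(conds, "->", label)
```

Each prepended rule strictly increases the number of correctly classified training instances, so the loop terminates after at most as many iterations as there are instances.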
Rule Search
• The default rule predicts all instances to belong to the most common category
[Figure: the training set (+ and - instances) partitioned with respect to the base rule into correct assignments and false assignments]
Rule Search
• At each step add the maximum gain rule
[Figure: the training set (+ and - instances) partitioned with respect to the decision list and with respect to the next rule]
Opus Search: Simple tree
Opus Search: Fixed order tree
Opus Search: Optimal pruning
GPA-Opus on UCI Problems
A Generic Prediction Algorithm: Sequence to Structure
Sequence:  MRRWFHPNITGVEAENLLLTRGVDGSFLARPSKSNPGD
Structure: ?????????????????????????????????????? (initially unknown)
Structure: ----H-----HHHHHHHHHH------EEEEE------- (after predicting each position in turn)
GPA Rules
• The first three rules of the sequence-to-structure decision list reach 58.86% accuracy (out of 66.36% overall)
GPA Rule 1
• Everything => Loop
GPA Rule 2
HELIX
L4 L3 L2 L1 0 R1 R2 R3 R4
* * !GLY !GLY !ASN !GLY !PRO !PRO !PRO
!PRO !GLY !PRO
!PRO
!SER
(Non-polar or large)
GPA Rule 3
STRAND
L4 L3 L2 L1 0 R1 R2 R3 R4
!LEU !ALA !ASP !ALA CYS !PRO !ARG !LEU !LEU
!LEU
!GLN
!ASP ILE !GLN !MET !MET
!GLU
!GLY LEU !GLU
!PRO PHE !LYS
TRP !PRO
TYR
(Non-polar and not charged)
VAL
(Non-polar)
GPA-Opus not feasible for secondary structure prediction
• 9 positions
• 20 possible amino-acids per position
• Size of rule space:
– With only pos=val type attributes: 21^9
– If we include disjunctions: 2^180
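The two counts can be checked with a few lines of Python. The reading of the figures is an assumption: 21 is taken to mean 20 amino acids plus "unconstrained" at each of the 9 positions, and 180 is taken to mean the 9 × 20 possible pos=val attributes, any subset of which may appear in a rule.

```python
positions, amino_acids = 9, 20

# pos=val attributes only: each position is one amino acid or unconstrained
# (assumed reading of the slide's 21^9).
simple = (amino_acids + 1) ** positions

# With disjunctions: any subset of the 9 * 20 = 180 pos=val attributes
# (assumed reading of the slide's 2^180).
disjunctive = 2 ** (positions * amino_acids)

print(simple)                 # 794280046581
print(f"{disjunctive:.2e}")   # roughly 1.5e54
```

Even the smaller space has close to 10^12 rules, which is why an exhaustive search is replaced by a randomized one here.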
GPA Version Space Search
Searching for a candidate rule:
• Pick a random instance
• If the instance is currently misclassified and candidate rule corrects it: generalize candidate rule to include instance
• If the instance is currently correct and candidate rule changes classification: specialize candidate rule to exclude instance
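One way to picture a single step of this search is the Python sketch below. The rule representation (one set of allowed residues per window position), the helper names, and the choice to specialize by dropping the residue at one random position are all assumptions made for illustration; the slides do not specify how generalization and specialization are carried out.

```python
import random


def covers(allowed, window):
    """allowed[i] is the set of residues the rule accepts at position i."""
    return all(res in allowed[i] for i, res in enumerate(window))


def generalize(allowed, window):
    """Smallest change that makes the rule cover the window."""
    return [s | {res} for s, res in zip(allowed, window)]


def specialize(allowed, window, rng):
    """Exclude the window by forbidding its residue at one random position."""
    i = rng.randrange(len(window))
    return [s - {window[i]} if j == i else s for j, s in enumerate(allowed)]


def search_step(allowed, label, window, truth, current, rng):
    """current is the class the existing decision list assigns to window."""
    if current != truth and label == truth and not covers(allowed, window):
        return generalize(allowed, window)       # rule would correct it
    if current == truth and label != truth and covers(allowed, window):
        return specialize(allowed, window, rng)  # rule would break it
    return allowed


rng = random.Random(0)
rule = [{"A"}, {"L"}, {"A"}]  # toy 3-position window instead of 9
rule = search_step(rule, "H", "ALG", truth="H", current="-", rng=rng)
print(covers(rule, "ALG"))  # True
rule = search_step(rule, "H", "ALA", truth="-", current="-", rng=rng)
print(covers(rule, "ALA"))  # False
```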
GPA Secondary Structure Prediction Results
• PhD 72.3
• NNSSP 71.7
• GPA 69.2
• DSC 69.1
• Predator 69.0
Morphological Analyzer for Turkish
masalı
• masal+Noun+A3sg+Pnon+Acc (= the story)
• masal+Noun+A3sg+P3sg+Nom (= his story)
• masa+Noun+A3sg+Pnon+Nom^DB+Adj+With (= with tables)
• Oflazer, K. (1994). Two-level description of Turkish morphology. Literary and Linguistic Computing.
• Oflazer, K., Hakkani-Tür, D. Z., and Tür, G. (1999). Design for a Turkish treebank. EACL’99.
• Beesley, K. R. and Karttunen, L. (2003). Finite State Morphology. CSLI Publications.
Features, IGs and Tags
• 126 unique features
• 9129 unique IGs
• ∞ unique tags
• 11084 distinct tags observed in 1M word training corpus
masa+Noun+A3sg+Pnon+Nom^DB+Adj+With
[Figure: the parse above annotated with its stem, features, inflectional groups (IGs), derivational boundary, and tag]
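The structure of such a parse can be made concrete with a small Python sketch. It assumes that the stem is everything before the first "+", that "^DB" marks each derivational boundary, and that features inside an inflectional group are joined by "+"; the function name split_parse is hypothetical.

```python
def split_parse(parse):
    """Split a parse into its stem and a list of inflectional groups (IGs).

    Assumes the stem contains no "+" and that IGs are separated by "^DB+".
    """
    stem, _, rest = parse.partition("+")
    igs = [ig.split("+") for ig in rest.split("^DB+")]
    return stem, igs


stem, igs = split_parse("masa+Noun+A3sg+Pnon+Nom^DB+Adj+With")
print(stem)  # masa
print(igs)   # [['Noun', 'A3sg', 'Pnon', 'Nom'], ['Adj', 'With']]
```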
Morphological disambiguation
• Task: pick correct parse given context
1. masal+Noun+A3sg+Pnon+Acc
2. masal+Noun+A3sg+P3sg+Nom
3. masa+Noun+A3sg+Pnon+Nom^DB+Adj+With
– Uzun masalı anlat (Tell the long story)
– Uzun masalı bitti (His long story ended)
– Uzun masalı oda (Room with long table)
Key Idea
Build a separate classifier for each feature.
GPA on Morphological Disambiguation
1. If (W = çok) and (R1 = +DA) then W has +Det
2. If (L1 = pek) then W has +Det
3. If (W = +AzI) then W does not have +Det
4. If (W = çok) then W does not have +Det
5. If TRUE then W has +Det
Examples (W = çok):
• “pek çok alanda” (decided by rule 1)
• “pek çok insan” (decided by rule 2)
• “insan çok daha” (decided by rule 4)
GPA-Opus not feasible
Attributes for a five word window:
• The exact word string (e.g. W=Ali'nin)
• The lowercase version (e.g. W=ali'nin)
• All suffixes (e.g. W=+n, W=+In, W=+nIn, W=+'nIn, etc.)
• Character types (e.g. Ali'nin would be described with W=UPPER-FIRST, W=LOWER-MID, W=APOS-MID, W=LOWER-LAST)
Average 40 features per instance.
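A rough Python sketch of how attributes of this kind might be generated for one word of the window. It is a simplification under stated assumptions: suffixes are plain lowercase strings rather than the capitalized forms shown in the examples (W=+In), the suffix length is capped at a hypothetical max_suffix, Python's default lower() is used (which does not apply Turkish casing for dotted and dotless i), and the function names are made up for illustration.

```python
def char_type(ch):
    """Coarse character class used for the FIRST / MID / LAST attributes."""
    if ch.isupper():
        return "UPPER"
    if ch.islower():
        return "LOWER"
    if ch.isdigit():
        return "DIGIT"
    if ch == "'":
        return "APOS"
    return "OTHER"


def word_attributes(word, pos="W", max_suffix=4):
    """Return the set of attribute strings for a non-empty word at position pos."""
    lower = word.lower()
    attrs = {f"{pos}={word}", f"{pos}={lower}"}
    # Plain-string suffixes up to max_suffix characters (a simplification).
    for n in range(1, min(max_suffix, len(lower)) + 1):
        attrs.add(f"{pos}=+{lower[-n:]}")
    # Character types of the first, middle, and last characters.
    attrs.add(f"{pos}={char_type(word[0])}-FIRST")
    attrs.add(f"{pos}={char_type(word[-1])}-LAST")
    for ch in word[1:-1]:
        attrs.add(f"{pos}={char_type(ch)}-MID")
    return attrs


for attr in sorted(word_attributes("Ali'nin")):
    print(attr)
```

Calling the function once per position (for example pos="L1" or pos="R1") and taking the union would give the attribute set for the whole window.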
GPA limited look-ahead search
• New rules are restricted to adding one new feature to existing rules in the decision list
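A small Python sketch of how this restriction shrinks the candidate set. The representation (rules as a frozenset of feature strings plus a class) and the decision to try every class for each extended condition set are assumptions; the slide only states that a new rule adds one feature to a rule already in the list.

```python
def lookahead_candidates(dlist, features, labels):
    """Yield candidate rules: an existing rule's conditions plus one feature."""
    seen = set()
    for conds, _ in dlist:
        for feature in features:
            if feature in conds:
                continue  # must add a feature the rule does not already use
            new_conds = frozenset(conds) | {feature}
            for label in labels:
                if (new_conds, label) not in seen:
                    seen.add((new_conds, label))
                    yield new_conds, label


# Hypothetical state: one learned rule in front of the default rule.
dlist = [(frozenset({"W=çok"}), "-Det"), (frozenset(), "+Det")]
features = ["W=çok", "R1=+DA", "L1=pek"]
candidates = list(lookahead_candidates(dlist, features, ["+Det", "-Det"]))
print(len(candidates))  # 10
```

With only the default rule in the list, the candidates are exactly the single-feature rules; longer conjunctions become reachable only after a shorter prefix has already been added.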
GPA Turkish morphological disambiguation results
• Test corpus: 1000 words, hand tagged
• Accuracy: 95.87% (conf. int: 94.57-97.08)
• Better than the training data!?
Contributions and Future Work
• Established GPA as a competitive alternative to SVMs, C4.5, etc.
• Need theory on why the best-gain rule does well.
• Need to study robustness to irrelevant or redundant attributes.
• Need to speed up the application of the resulting decision lists (convert to FSM?)