2/20/2008 Prof. Hilfinger CS164 Lecture 12 1
Earley’s Algorithm: General Context-Free Parsing
Lecture 12, P. N. Hilfinger
Parsing General Context-Free Grammars
• Shift-reduce parsing works for most practical applications.
• However, one must sometimes munge the grammar, though not as much as with LL(1).
• It cannot handle ambiguity, nor situations where resolving an ambiguity requires looking arbitrarily far ahead.
• Today, we’ll look at a method that can: Earley’s Algorithm.
• In fact, shift-reduce parsing is a highly optimized special case of this algorithm.
Earley’s Algorithm: Basic Idea
• Scan tokens left to right.
• At each point, keep track of all possible subtrees that could include the current point in the input, based on everything seen so far.
• At the end of the input, if there is a tree rooted at the start symbol that spans the whole input, we’ve found a parse (possibly many).
Some Notation
• If the input is $s = s_1 s_2 \ldots s_n$, then “position $k$” in the input is just after $s_k$ and before $s_{k+1}$, with position 0 at the beginning and position $n$ at the end.
• At each input position $k$, compute a set of items, where each item has the form $A \to \alpha \bullet \beta,\ m$, where $A \to \alpha\beta$ is a production and $0 \le m \le k$.
• Together, the items in the set describe all subtrees of possible parse trees that begin or end at position $k$ or have a child that does.
Meaning of an Item
• An item $A \to \alpha \bullet \beta,\ m$ at position $k$ means:
1. The input between positions $m$ and $k$ matches $\alpha$.
2. Depending on what $s_{k+1} \ldots s_n$ is, there might be a subtree formed from production $A \to \alpha\beta$ in the (or a) parse tree for the entire string.
3. So when $\beta$ is empty, the item means that there is a possible handle $\alpha$ for $A$ that ends at $k$.
• That leaves the problem of figuring out which items to put in each set.
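One way to make the item notation concrete is a small record type. The sketch below is a hypothetical Python encoding (the class `Item` and its field and method names are our own, not from the lecture): an item stores the production A → αβ as a left-hand side plus the full right-hand side, the dot position (the length of α), and the start position m. An item is complete exactly when β is empty.

```python
from typing import NamedTuple, Optional, Tuple


class Item(NamedTuple):
    """Hypothetical encoding of the item A -> alpha . beta, m."""
    lhs: str              # A
    rhs: Tuple[str, ...]  # alpha followed by beta (the whole right-hand side)
    dot: int              # len(alpha): how many rhs symbols have been matched
    start: int            # m: the input position where this subtree begins

    def next_symbol(self) -> Optional[str]:
        """The first symbol of beta, or None when beta is empty."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self) -> bool:
        """True when beta is empty, i.e. a possible handle for A ends here."""
        return self.dot == len(self.rhs)

    def advance(self) -> "Item":
        """Move the dot one symbol to the right."""
        return self._replace(dot=self.dot + 1)

    def __str__(self) -> str:
        alpha = " ".join(self.rhs[: self.dot])
        beta = " ".join(self.rhs[self.dot :])
        return f"{self.lhs} → {alpha}•{beta}, {self.start}"


item = Item("E", ("E", "+", "T"), 1, 0)
print(item)                                    # E → E•+ T, 0
print(item.next_symbol())                      # +
print(item.advance().advance().is_complete())  # True
```

A `NamedTuple` is used so items are immutable and hashable, which lets each item set be an ordinary Python `set`.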
Example
• Grammar: E → E + T | T,   T → T * int | int
• Input, with positions marked between tokens:
0 int 1 + 2 int 3 * 4 int 5
• At position 0, we expect to see an E to our right, formed from one of E’s productions.
• Plus, since an E can start with a T, we won’t be surprised by a T formed from one of its productions.
Example: Getting Started
Item set 0 (position 0, before int):
Start with items for start symbol E:
  E → •T, 0
  E → •E + T, 0
and (since E can start with T), also add items for T:
  T → •int, 0
  T → •T * int, 0
Closure Items
• Whenever we have an item $B \to \alpha \bullet A \beta,\ j$ in item set $m$, it indicates that a substring producing $A$ might start at this position.
• That’s what the item $A \to \bullet \gamma,\ m$ means, so we also add those items (for each production $A \to \gamma$) to item set $m$.
• These are called closure items.
• Other items are kernel items.
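A minimal sketch of the closure step, assuming a hypothetical encoding of the example grammar as a Python dict from each nonterminal to its right-hand sides (symbols that are not keys count as terminals). The function `closure` and the `Item` record are illustrative names, not from the lecture. It keeps adding A → •γ, m items until none are new; applied to the two start items for E, it reproduces item set 0 from the example.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

# Hypothetical encoding of E -> E + T | T,  T -> T * int | int.
GRAMMAR: Grammar = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}


class Item(NamedTuple):
    lhs: str              # A
    rhs: Tuple[str, ...]  # full right-hand side
    dot: int              # position of the dot in rhs
    start: int            # start position m


def closure(items: Set[Item], m: int, grammar: Grammar) -> Set[Item]:
    """For every B -> alpha . A beta, j in set m, add A -> . gamma, m."""
    result = set(items)
    work = list(items)
    while work:
        item = work.pop()
        if item.dot < len(item.rhs):
            sym = item.rhs[item.dot]
            # Terminals have no entry in the grammar, so nothing is added for them.
            for gamma in grammar.get(sym, []):
                new = Item(sym, gamma, 0, m)
                if new not in result:
                    result.add(new)
                    work.append(new)
    return result


kernel = {Item("E", rhs, 0, 0) for rhs in GRAMMAR["E"]}
set0 = closure(kernel, 0, GRAMMAR)
print(len(set0))  # 4
```

The worklist plus the membership check makes the loop terminate even for left-recursive rules such as T → T * int, which would otherwise predict themselves forever.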
Example: Computing next item set
Item set 0: E → •T, 0   E → •E + T, 0   T → •int, 0   T → •T * int, 0
Scan int to get item set 1:
  T → int•, 0
  T → T•* int, 0
  E → T•, 0
  E → E•+ T, 0
Next input symbol: +
Computing next item set
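• For each item of the form $A \to \alpha \bullet c \beta,\ k$ in item set $m$, where $c = s_{m+1}$ is the next input symbol, insert $A \to \alpha c \bullet \beta,\ k$ in item set $m+1$.
• For each complete item $A \to \gamma \bullet,\ k$ in item set $m+1$, and each item $B \to \alpha \bullet A \beta,\ j$ back in item set $k$, add item $B \to \alpha A \bullet \beta,\ j$ to item set $m+1$. (When creating a parse tree, the $A$ in this new item will have children $\gamma$, as denoted by dashed red arrows in our examples.)
• Finally, add the closure items for item set $m+1$, repeating the last two steps until no new items appear.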
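The scan, complete, and closure rules can be sketched together as a recognizer that builds every item set. This is an illustrative Python version under stated assumptions: the grammar dict, `Item`, and `earley_sets` are hypothetical names, and the completion step assumes the grammar has no ε-productions (so a complete item in set m always started in an earlier, already finished set). On the example input it should reproduce the item sets worked out by hand in the following slides.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

# Hypothetical encoding of E -> E + T | T,  T -> T * int | int.
GRAMMAR: Grammar = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}


class Item(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int


def earley_sets(tokens: List[str], grammar: Grammar, start: str) -> List[Set[Item]]:
    """Build item sets 0..n. Assumes no epsilon productions."""
    sets: List[Set[Item]] = [set() for _ in range(len(tokens) + 1)]

    def fill(m: int, kernel: Set[Item]) -> None:
        """Add kernel items to set m, then apply completion and closure until nothing is new."""
        sets[m] |= kernel
        work = list(kernel)
        while work:
            item = work.pop()
            if item.dot == len(item.rhs):
                # Completion: A -> gamma ., k advances each B -> alpha . A beta, j in set k.
                new_items = [b._replace(dot=b.dot + 1) for b in sets[item.start]
                             if b.dot < len(b.rhs) and b.rhs[b.dot] == item.lhs]
            else:
                # Closure: add A -> . gamma, m for a nonterminal A after the dot.
                sym = item.rhs[item.dot]
                new_items = [Item(sym, gamma, 0, m) for gamma in grammar.get(sym, [])]
            for new in new_items:
                if new not in sets[m]:
                    sets[m].add(new)
                    work.append(new)

    fill(0, {Item(start, rhs, 0, 0) for rhs in grammar[start]})
    for m, c in enumerate(tokens):
        # Scan: c = s_{m+1}; move the dot over c in every item of set m expecting it.
        fill(m + 1, {it._replace(dot=it.dot + 1) for it in sets[m]
                     if it.dot < len(it.rhs) and it.rhs[it.dot] == c})
    return sets


sets = earley_sets("int + int * int".split(), GRAMMAR, "E")
print([len(s) for s in sets])  # [4, 4, 3, 4, 1, 4]
```

If some scan step finds no item expecting the next token, the next set stays empty and every later set is empty too, which is how this sketch signals a syntax error.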
Continuing the Example, Set 2
Item set 1: T → int•, 0   T → T•* int, 0   E → T•, 0   E → E•+ T, 0
Scan + to get item set 2:
  E → E +•T, 0
  T → •T * int, 2   (closure items)
  T → •int, 2       (closure items)
Next input symbol: int
Continuing the Example, Set 3
Item set 2: E → E +•T, 0   T → •T * int, 2   T → •int, 2
Scan int to get item set 3:
  T → int•, 2
  T → T•* int, 2
  E → E + T•, 0
  E → E•+ T, 0   (from item set 0)
Next input symbol: *
Continuing the Example, Sets 4 & 5
Item set 3: T → int•, 2   T → T•* int, 2   E → E + T•, 0   E → E•+ T, 0
Scan * to get item set 4:
  T → T *•int, 2
Scan int to get item set 5:
  T → T * int•, 2
  T → T•* int, 2
  E → E + T•, 0
  E → E•+ T, 0
ACCEPT!
Accepting the String
• In the last item set, we have a completed item for the start symbol that started in set 0.
• That means “the input between 0 and end matches an entire production for the start symbol,” so the string parses correctly.
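The acceptance test is a one-line check on the last item set. A sketch, with item set 5 from the example transcribed by hand into a hypothetical `Item` record (start symbol E); `accepts` is an illustrative name, not from the lecture.

```python
from typing import Iterable, NamedTuple, Tuple


class Item(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int


def accepts(last_set: Iterable[Item], start_symbol: str) -> bool:
    """True if the last set holds a completed start-symbol item that began at position 0."""
    return any(it.lhs == start_symbol and it.dot == len(it.rhs) and it.start == 0
               for it in last_set)


# Item set 5 of the example, transcribed by hand.
set5 = {
    Item("T", ("T", "*", "int"), 3, 2),  # T → T * int•, 2
    Item("T", ("T", "*", "int"), 1, 2),  # T → T•* int, 2
    Item("E", ("E", "+", "T"), 3, 0),    # E → E + T•, 0
    Item("E", ("E", "+", "T"), 1, 0),    # E → E•+ T, 0
}
print(accepts(set5, "E"))  # True
```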
Retrieving a Parse Tree or Derivation
• Start with a completed item in the last set that produces the whole input (one of the form S → …•, 0 for start symbol S).
• Follow the red arrows to find how to expand that symbol.
• Work backwards through the sets to find the expansions of the other nonterminals.
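A sketch of tree retrieval, assuming we record back-pointers while building the sets instead of searching backwards afterwards: each item remembers the children matched by the symbols before its dot (a stand-in for the red arrows), and completion attaches the finished subtree as a new child. All names are hypothetical, only the first derivation found for each item is kept (so an ambiguous grammar yields just one of its trees), and the grammar is assumed to have no ε-productions. Trees are nested tuples (symbol, child, …).

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

GRAMMAR: Grammar = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}


class Item(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int


def parse(tokens: List[str], grammar: Grammar, start: str) -> Optional[tuple]:
    """Return one parse tree as nested tuples, or None. Assumes no epsilon productions."""
    # kids[m][item]: children matched so far by the rhs symbols before the dot.
    kids: List[Dict[Item, tuple]] = [{} for _ in range(len(tokens) + 1)]

    def add(m: int, item: Item, children: tuple, work: List[Item]) -> None:
        if item not in kids[m]:  # keep only the first derivation found
            kids[m][item] = children
            work.append(item)

    def fill(m: int, work: List[Item]) -> None:
        while work:
            item = work.pop()
            if item.dot == len(item.rhs):
                # Completion: the finished A-subtree becomes the next child of each waiting item.
                subtree = (item.lhs,) + kids[m][item]
                for back, ch in kids[item.start].items():
                    if back.dot < len(back.rhs) and back.rhs[back.dot] == item.lhs:
                        add(m, back._replace(dot=back.dot + 1), ch + (subtree,), work)
            else:
                # Closure: predicted items start with no children.
                sym = item.rhs[item.dot]
                for gamma in grammar.get(sym, []):
                    add(m, Item(sym, gamma, 0, m), (), work)

    work: List[Item] = []
    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0), (), work)
    fill(0, work)
    for m, tok in enumerate(tokens):
        work = []
        for item, ch in kids[m].items():  # scan: the token becomes a leaf child
            if item.dot < len(item.rhs) and item.rhs[item.dot] == tok:
                add(m + 1, item._replace(dot=item.dot + 1), ch + (tok,), work)
        fill(m + 1, work)
    for item, ch in kids[-1].items():
        if item.lhs == start and item.dot == len(item.rhs) and item.start == 0:
            return (start,) + ch
    return None


print(parse("int + int * int".split(), GRAMMAR, "E"))
# ('E', ('E', ('T', 'int')), '+', ('T', ('T', 'int'), '*', 'int'))
```

This is the same tree the next three slides recover by hand, working backwards from E → E + T•, 0 in item set 5.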
Getting a Tree from our Example (I)
Item set 5: T → T * int•, 2   T → T•* int, 2   E → E + T•, 0 (start here)   E → E•+ T, 0
Tree so far: E(E + T(T * int))
To find out how to expand the inner T, go back to chart 3 (before * int).
Getting a Tree from our Example (II)
Item set 3: T → int•, 2   T → T•* int, 2   E → E + T•, 0   E → E•+ T, 0
Tree so far: E(E + T(T(int) * int))
To find out how to expand the left E, go back to chart 1 (before +).
Figuring out Where to Look
• In the last slide, we had to figure out where to look for the derivation of the E in E + T.
• We used the items T → T * int•, 2 and T → int•, 2 to get the T in E + T, both of which tell us that the T started after item set #2.
• And since + is a terminal, we then have to go back one more, to item set #1.
Getting a Tree from our Example (III)
Item set 1: T → int•, 0   T → T•* int, 0   E → T•, 0 (start here)   E → E•+ T, 0
Complete tree: E(E(T(int)) + T(T(int) * int))
An Ambiguous Grammar (I)
• Grammar: E → E + E | E * E | int
• Input:
0 int 1 + 2 int 3 * 4 int 5
Item set 0: E → •int, 0   E → •E + E, 0   E → •E * E, 0
Scan int to get item set 1: E → int•, 0   E → E•+ E, 0   E → E•* E, 0
An Ambiguous Grammar (II)
Item set 1: E → int•, 0   E → E•+ E, 0   E → E•* E, 0
Scan + to get item set 2: E → E +•E, 0   E → •int, 2   E → •E + E, 2   E → •E * E, 2
Scan int to get item set 3: E → int•, 2   E → E•+ E, 2   E → E•* E, 2   E → E + E•, 0   E → E•+ E, 0   E → E•* E, 0
An Ambiguous Grammar (III)
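Item set 3: E → int•, 2   E → E•+ E, 2   E → E•* E, 2   E → E + E•, 0   E → E•+ E, 0   E → E•* E, 0
Scan * to get item set 4: E → E *•E, 2   E → E *•E, 0   E → •int, 4   E → •E + E, 4   E → •E * E, 4
Scan int to get item set 5, which includes: E → int•, 4   E → E * E•, 2   E → E * E•, 0   E → E•+ E, 4   E → E•* E, 4   E → E + E•, 0
There are two ways to produce the E starting at 0 (E → E * E•, 0 and E → E + E•, 0), reflecting the ambiguity.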
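Running the same kind of recognizer on the ambiguous grammar shows the two completed E items side by side. This sketch repeats a hypothetical `earley_sets` recognizer (ε-free grammars only, names our own) so that it stands alone, then lists the completed E items in the last set that started at 0.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

# Hypothetical encoding of E -> E + E | E * E | int.
AMBIGUOUS: Grammar = {"E": [("E", "+", "E"), ("E", "*", "E"), ("int",)]}


class Item(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int


def earley_sets(tokens: List[str], grammar: Grammar, start: str) -> List[Set[Item]]:
    """Scan/complete/close recognizer; assumes no epsilon productions."""
    sets: List[Set[Item]] = [set() for _ in range(len(tokens) + 1)]

    def fill(m: int, kernel: Set[Item]) -> None:
        sets[m] |= kernel
        work = list(kernel)
        while work:
            item = work.pop()
            if item.dot == len(item.rhs):  # completion
                new_items = [b._replace(dot=b.dot + 1) for b in sets[item.start]
                             if b.dot < len(b.rhs) and b.rhs[b.dot] == item.lhs]
            else:  # closure
                sym = item.rhs[item.dot]
                new_items = [Item(sym, g, 0, m) for g in grammar.get(sym, [])]
            for new in new_items:
                if new not in sets[m]:
                    sets[m].add(new)
                    work.append(new)

    fill(0, {Item(start, rhs, 0, 0) for rhs in grammar[start]})
    for m, c in enumerate(tokens):  # scan
        fill(m + 1, {it._replace(dot=it.dot + 1) for it in sets[m]
                     if it.dot < len(it.rhs) and it.rhs[it.dot] == c})
    return sets


last = earley_sets("int + int * int".split(), AMBIGUOUS, "E")[-1]
complete_from_0 = [it for it in last
                   if it.lhs == "E" and it.dot == len(it.rhs) and it.start == 0]
for it in sorted(complete_from_0):
    print(" ".join(it.rhs))  # prints "E * E", then "E + E"
```

Each of the two items is the root of a different parse tree: E → E * E•, 0 groups the input as (int + int) * int, and E → E + E•, 0 as int + (int * int).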
Just for Fun…
Grammar: E → E E | ε
Grammar is ferociously ambiguous: it produces the empty string in an infinite number of ways!
Item set 0 (empty input):
  E → •, 0
  E → •E E, 0
  E → E•E, 0
  E → E E•, 0   (!!!)
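ε-productions need care: a complete item like E → •, m sits in the same set as the items it should advance, and some of those may be added only after it has been processed. One simple (if slow) fix, sketched below as an assumption rather than the lecture’s method, is to re-sweep the whole set until a full pass adds nothing. The names are hypothetical; with the grammar E → E E | ε and empty input the sketch yields the four items of item set 0 listed above.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]


class Item(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int


def successors(item: Item, m: int, sets: List[Set[Item]], grammar: Grammar) -> List[Item]:
    """Items that completion or closure derive from `item`, which sits in set m."""
    if item.dot == len(item.rhs):
        # Completion; item.start may equal m when the item matched the empty string.
        return [b._replace(dot=b.dot + 1) for b in list(sets[item.start])
                if b.dot < len(b.rhs) and b.rhs[b.dot] == item.lhs]
    sym = item.rhs[item.dot]
    return [Item(sym, gamma, 0, m) for gamma in grammar.get(sym, [])]


def earley_sets(tokens: List[str], grammar: Grammar, start: str) -> List[Set[Item]]:
    sets: List[Set[Item]] = [set() for _ in range(len(tokens) + 1)]
    for m in range(len(tokens) + 1):
        if m == 0:
            sets[0] = {Item(start, rhs, 0, 0) for rhs in grammar[start]}
        else:
            c = tokens[m - 1]  # scan s_m
            sets[m] = {it._replace(dot=it.dot + 1) for it in sets[m - 1]
                       if it.dot < len(it.rhs) and it.rhs[it.dot] == c}
        # Sweep the whole set repeatedly until a pass adds nothing, so an epsilon
        # completion also reaches items that were added after it was first seen.
        changed = True
        while changed:
            changed = False
            for item in list(sets[m]):
                for new in successors(item, m, sets, grammar):
                    if new not in sets[m]:
                        sets[m].add(new)
                        changed = True
    return sets


GRAMMAR: Grammar = {"E": [("E", "E"), ()]}  # E -> E E | epsilon
set0 = earley_sets([], GRAMMAR, "E")[0]
print(len(set0))  # 4
print(any(it.lhs == "E" and it.dot == len(it.rhs) and it.start == 0 for it in set0))  # True
```

The item set stays finite (four items) even though the grammar derives the empty string in infinitely many ways; the items summarize all of those derivations at once.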
Relationship to LR Shift-Reduce Parsing
• With an LR(1) grammar, we never get item sets in which two items have the same production, with the dot in the same place, but different starting positions.
• So, ignoring the starting positions, there is only a finite number of possible item sets.
• These are the states in the shift-reduce parser.
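The observation about starting positions can be checked mechanically: drop the start position from every item and see whether two items collapse into one. The sketch below (hypothetical names; the ε-free recognizer is repeated so it stands alone) finds no collisions for the example grammar E → E + T | T, T → T * int | int, but does find them for the ambiguous grammar, e.g. E → E *•E, 2 and E → E *•E, 0 in item set 4.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

EXAMPLE: Grammar = {"E": [("E", "+", "T"), ("T",)], "T": [("T", "*", "int"), ("int",)]}
AMBIGUOUS: Grammar = {"E": [("E", "+", "E"), ("E", "*", "E"), ("int",)]}


class Item(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int


def earley_sets(tokens: List[str], grammar: Grammar, start: str) -> List[Set[Item]]:
    """Scan/complete/close recognizer; assumes no epsilon productions."""
    sets: List[Set[Item]] = [set() for _ in range(len(tokens) + 1)]

    def fill(m: int, kernel: Set[Item]) -> None:
        sets[m] |= kernel
        work = list(kernel)
        while work:
            item = work.pop()
            if item.dot == len(item.rhs):  # completion
                new_items = [b._replace(dot=b.dot + 1) for b in sets[item.start]
                             if b.dot < len(b.rhs) and b.rhs[b.dot] == item.lhs]
            else:  # closure
                sym = item.rhs[item.dot]
                new_items = [Item(sym, g, 0, m) for g in grammar.get(sym, [])]
            for new in new_items:
                if new not in sets[m]:
                    sets[m].add(new)
                    work.append(new)

    fill(0, {Item(start, rhs, 0, 0) for rhs in grammar[start]})
    for m, c in enumerate(tokens):  # scan
        fill(m + 1, {it._replace(dot=it.dot + 1) for it in sets[m]
                     if it.dot < len(it.rhs) and it.rhs[it.dot] == c})
    return sets


def core(item_set: Set[Item]) -> Set[Tuple[str, Tuple[str, ...], int]]:
    """Drop start positions, keeping only (production, dot position)."""
    return {(it.lhs, it.rhs, it.dot) for it in item_set}


def has_collision(item_set: Set[Item]) -> bool:
    """True if two items in the set differ only in their start position."""
    return len(core(item_set)) < len(item_set)


tokens = "int + int * int".split()
example_sets = earley_sets(tokens, EXAMPLE, "E")
ambiguous_sets = earley_sets(tokens, AMBIGUOUS, "E")
print([has_collision(s) for s in example_sets])  # all False
print([m for m, s in enumerate(ambiguous_sets) if has_collision(s)])
```

When no set has a collision, each item set is determined by its core alone, and the cores are drawn from the finitely many (production, dot) pairs of the grammar, which is what makes a finite table of parser states possible.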