Monadic Compositional Parsing with Context Using Maltese as a Case Study
Gordon J. Pace
September 2004
A Users’ Perspective. For the technical details, read the paper!
Combinator-Based Programming
A Recipe
- Provide a class/type of objects you will be talking about;
- Provide a few basic objects of that type;
- Provide a few combinators to combine objects of the type into more complex ones.
Mix well and put on low heat for 35 min.
type Parser a = …
A basic type: a parser which consumes a text stream, returning an object of type a.
Some basic parsers:
fail :: Parser a
  A parser which always fails.
return :: a -> Parser a
  Given a value, this parser leaves the input stream untouched, returning the given value.
one :: Parser Char
  A parser which returns the first character on the input stream.
And some combinators, which produce new parsers out of old ones:
(<+>) :: Parser a -> Parser a -> Parser a
  Non-deterministic choice between two parsers.
mthen :: Parser a -> Parser a -> Parser a
  Sequential composition of two parsers.*
* Actually, I’m cheating by somewhat simplifying the type …
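To make the basic parsers concrete, here is a minimal sketch under one loud assumption: the slides leave `Parser a` unspecified, so this uses the textbook list-of-successes representation, which may not be what the paper does. `failP` stands in for the slides' `fail` (renamed to avoid the Prelude's `fail`), and the monad's `>>=` presumably plays the role of `mthen`.

```haskell
import Control.Monad (ap, liftM)

-- Assumption: a list-of-successes representation. Each result pairs a parsed
-- value with the input that is still unconsumed.
newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure x = Parser (\s -> [(x, s)])   -- the slides' `return`
  (<*>)  = ap

instance Monad Parser where
  -- Sequential composition: feed each result of p into f.
  Parser p >>= f = Parser (\s -> concat [runParser (f x) rest | (x, rest) <- p s])

-- The slides' `fail`, renamed (hypothetical name) to avoid the Prelude's `fail`.
failP :: Parser a
failP = Parser (const [])

-- Return the first character of the input, failing on empty input.
one :: Parser Char
one = Parser step
  where
    step []       = []
    step (c : cs) = [(c, cs)]

-- Non-deterministic choice: keep the results of both alternatives.
(<+>) :: Parser a -> Parser a -> Parser a
p <+> q = Parser (\s -> runParser p s ++ runParser q s)
```

In GHCi, `runParser (one <+> return 'z') "ab"` gives `[('a',"b"),('z',"ab")]`: both alternatives survive, each with its own remaining input.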
New parsers are easy to derive
p1 <*> p2 =
  do
    x1 <- p1
    x2 <- p2
    return (x1,x2)
Parse a pair of objects in sequence
matchSat :: (Char -> Bool) -> Parser Char
matchSat cond =
  do
    x <- one
    if cond x
      then return x
      else fail
Parse a character, if it satisfies a given condition
matchChar c = matchSat (c==)
Parse a particular character
matchString "" = return ""
matchString (c:cs) =
  do
    matchChar c
    matchString cs
    return (c:cs)
Parse a particular string
star :: Parser a -> Parser [a]
star p =
  return []
  <+>
  do
    (x,xs) <- p <*> star p
    return (x:xs)
Kleene star
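The derived parsers can be tried out with a small self-contained sketch. It assumes a list-of-successes `Parser` (the slides do not give the real definition) and uses `failP` for the slides' `fail`. `star` is written with plain do-notation instead of the slides' pairing operator, because `<*>` is already taken by the Prelude in this sketch.

```haskell
import Control.Monad (ap, liftM)

-- Assumption: list-of-successes parser, not necessarily the paper's type.
newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure x = Parser (\s -> [(x, s)])
  (<*>)  = ap

instance Monad Parser where
  Parser p >>= f = Parser (\s -> concat [runParser (f x) rest | (x, rest) <- p s])

failP :: Parser a
failP = Parser (const [])

one :: Parser Char
one = Parser step
  where
    step []       = []
    step (c : cs) = [(c, cs)]

(<+>) :: Parser a -> Parser a -> Parser a
p <+> q = Parser (\s -> runParser p s ++ runParser q s)

-- Parse a character, if it satisfies a given condition.
matchSat :: (Char -> Bool) -> Parser Char
matchSat cond = do
  x <- one
  if cond x then return x else failP

-- Parse a particular character.
matchChar :: Char -> Parser Char
matchChar c = matchSat (c ==)

-- Parse a particular string.
matchString :: String -> Parser String
matchString ""       = return ""
matchString (c : cs) = do
  _ <- matchChar c
  _ <- matchString cs
  return (c : cs)

-- Kleene star: zero or more repetitions, shortest match listed first.
star :: Parser a -> Parser [a]
star p = return [] <+> do
  x  <- p
  xs <- star p
  return (x : xs)
```

Because choice is non-deterministic, `runParser (star (matchChar 'a')) "aab"` returns every possible split: `[("","aab"),("a","ab"),("aa","b")]`.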
p1 <**> p2 =
  do
    w1 <- p1
    plus (parseSat isSpace)
    w2 <- p2
    return (w1,w2)
Space-separated words
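A runnable sketch of the space-separated combinator follows. Several things here are assumptions: the list-of-successes `Parser`, the reading of `plus` as "one or more" (the slides use it without defining it), and the reading of `parseSat` as the earlier `matchSat`. The helpers `word` and `parses` are hypothetical names added for the example.

```haskell
import Control.Monad (ap, liftM)
import Data.Char (isAlpha, isSpace)

-- Assumption: list-of-successes parser, not necessarily the paper's type.
newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure x = Parser (\s -> [(x, s)])
  (<*>)  = ap

instance Monad Parser where
  Parser p >>= f = Parser (\s -> concat [runParser (f x) rest | (x, rest) <- p s])

(<+>) :: Parser a -> Parser a -> Parser a
p <+> q = Parser (\s -> runParser p s ++ runParser q s)

matchSat :: (Char -> Bool) -> Parser Char
matchSat cond = Parser step
  where
    step (c : cs) | cond c = [(c, cs)]
    step _                 = []

star :: Parser a -> Parser [a]
star p = return [] <+> plus p

-- Assumed meaning of the slides' `plus`: one or more repetitions.
plus :: Parser a -> Parser [a]
plus p = do
  x  <- p
  xs <- star p
  return (x : xs)

-- Two parsers separated by at least one whitespace character.
(<**>) :: Parser a -> Parser b -> Parser (a, b)
p1 <**> p2 = do
  w1 <- p1
  _  <- plus (matchSat isSpace)
  w2 <- p2
  return (w1, w2)

-- Hypothetical helpers: a word of letters, and "keep only complete parses".
word :: Parser String
word = plus (matchSat isAlpha)

parses :: Parser a -> String -> [a]
parses p s = [x | (x, "") <- runParser p s]
```

With these definitions, `parses (word <**> word) "il kelb"` yields `[("il","kelb")]`, and the same input without a space yields no complete parse.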
Context: The Maltese Article
1. The basic form of the definite article is il-.
2. Before nouns starting with a sun letter (xemxin), the l of the article changes to match the first letter of the noun.
3. Before nouns starting with 2 or 3 consonants, the first being x or s, the article drops its initial i, and an initial i is added to the noun.
4. Before nouns starting with a vowel, the article drops its initial i.
5. The initial i is also dropped when the preceding word ends with a vowel.
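The five rules can be read as a small decision procedure. The sketch below generates the article for a noun rather than parsing it, purely to show how the rules interact. Its assumptions: the set of sun letters is taken from standard Maltese grammars, not from the slides; rule 5 is read as "the preceding word ends in a vowel", which is what the `initial_i` parser on a later slide checks; and special cases such as għ and h are ignored. The name `withArticle` is hypothetical.

```haskell
import Data.Char (toLower)

isVowel :: Char -> Bool
isVowel c = toLower c `elem` "aeiou"

-- Assumption: the sun letters, taken from standard grammars, not the slides.
sunLetters :: String
sunLetters = "ċdnrstxżz"

-- Attach the definite article to a noun, given the preceding word (if any).
withArticle :: Maybe String -> String -> String
withArticle _ "" = ""
withArticle prev noun@(c : rest)
  | isVowel c                     = "l-" ++ noun                   -- rule 4
  | c `elem` "sx" && cluster rest = "l-i" ++ noun                  -- rule 3
  | otherwise                     = i ++ [consonant] ++ "-" ++ noun
  where
    -- The noun starts with a consonant cluster if its second letter is a consonant.
    cluster (d : _) = not (isVowel d)
    cluster []      = False
    consonant
      | c `elem` sunLetters = c      -- rule 2: assimilate to the sun letter
      | otherwise           = 'l'    -- rule 1: basic form
    i | endsInVowel prev = ""        -- rule 5: context to the left
      | otherwise        = "i"
    endsInVowel (Just w) = not (null w) && isVowel (last w)
    endsInVowel Nothing  = False
```

For example, `withArticle Nothing "xemx"` gives `"ix-xemx"`, `withArticle Nothing "skola"` gives `"l-iskola"`, and `withArticle (Just "u") "kelb"` gives `"l-kelb"`. Rules 2 to 4 depend on what follows the article and rule 5 on what precedes it, which is exactly the context a stand-alone article parser has to see.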
How can we write a compositional parser –
one that parses a stand-alone article?
Looking at the Context
We would like to write:
definiteNoun =
article <**> noun
We can look at local context using other parsers:
article =
  do
    initial_i
    c <- parseSat isXemxija
    parseChar '-'
    matchFuture (parseChar c)
initial_i =
  matchPast (
    (parseSat isVowel <*> wordSep)
  ) `ifnot`
  parseChar 'i'
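One way `matchPast`, `matchFuture` and `ifnot` could work is sketched below. Everything about the representation is an assumption: the parser state is taken to be a pair of the consumed text and the remaining text, `matchPast p` is read as "p matches some suffix of the consumed text in full", and `ifnot` is read as "run the second parser only if the first fails". The sun-letter set and the single-character `wordSep` are also guesses; the paper's definitions may differ.

```haskell
import Control.Monad (ap, liftM)
import Data.Char (isSpace)
import Data.List (tails)

-- Assumption: the state is (consumed text, most recent character first;
-- remaining text). The paper's own representation may differ.
type Ctx = (String, String)

newtype Parser a = Parser { runParser :: Ctx -> [(a, Ctx)] }

instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure x = Parser (\st -> [(x, st)])
  (<*>)  = ap

instance Monad Parser where
  Parser p >>= f = Parser (\st -> concat [runParser (f x) st' | (x, st') <- p st])

failP :: Parser a
failP = Parser (const [])

-- Consume one character and remember it as part of the past.
one :: Parser Char
one = Parser step
  where
    step (_, [])          = []
    step (past, c : rest) = [(c, (c : past, rest))]

parseSat :: (Char -> Bool) -> Parser Char
parseSat cond = do
  x <- one
  if cond x then return x else failP

parseChar :: Char -> Parser Char
parseChar c = parseSat (c ==)

-- Lookahead: run p on the upcoming input but consume nothing.
matchFuture :: Parser a -> Parser a
matchFuture p = Parser (\st -> [(x, st) | (x, _) <- runParser p st])

-- Look-behind: succeed if p matches, in full, some suffix of the consumed text.
matchPast :: Parser a -> Parser a
matchPast p = Parser (\st@(past, _) ->
  [ (x, st)
  | suffix <- tails (reverse past)
  , (x, (_, "")) <- runParser p ("", suffix) ])

-- If p succeeds, nothing more is needed; only if p fails is q run.
ifnot :: Parser a -> Parser b -> Parser ()
ifnot p q = Parser (\st ->
  case runParser p st of
    [] -> [((), st') | (_, st') <- runParser q st]
    rs -> [((), st') | (_, st') <- rs])

isVowel, isXemxija :: Char -> Bool
isVowel   = (`elem` "aeiou")
isXemxija = (`elem` "ċdnrstxżz")   -- assumed set of sun letters

wordSep :: Parser Char
wordSep = parseSat isSpace

initial_i :: Parser ()
initial_i = matchPast (parseSat isVowel >> wordSep) `ifnot` parseChar 'i'

article :: Parser Char
article = do
  initial_i
  c <- parseSat isXemxija
  _ <- parseChar '-'
  matchFuture (parseChar c)
```

Starting with an empty past, `runParser article ("", "ix-xemx")` succeeds while `"x-xemx"` is rejected. With the past set to `" u"` (the text "u " stored most recent character first), the situation reverses: `"x-xemx"` is accepted and `"ix-xemx"` is not.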
More Context: First Form Verbs
                              Example   General case
                              rikeb     c1 v1 c2 v2 c3
1st Person, Singular          nirkeb    n v1 c1 c2 v2 c3
2nd Person, Singular          tirkeb    t v1 c1 c2 v2 c3
3rd Person, Singular, Male    jirkeb    j v1 c1 c2 v2 c3
3rd Person, Singular, Female  tirkeb    t v1 c1 c2 v2 c3
1st Person, Plural            nirkbu    n v1 c1 c2 c3 u
2nd Person, Plural            tirkbu    t v1 c1 c2 c3 u
3rd Person, Plural            jirkbu    j v1 c1 c2 c3 u
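The patterns in the table can be checked mechanically. The sketch below builds each form from a five-letter base c1 v1 c2 v2 c3; the type and function names are hypothetical, and it covers only the seven rows shown (the table has a single 3rd person plural row, so both third-person cases map to j in the plural).

```haskell
data Person = First | Second | ThirdMale | ThirdFemale deriving (Eq, Show)
data Number = Singular | Plural deriving (Eq, Show)

-- Prefix consonant for each row of the table.
prefix :: Person -> Number -> Char
prefix First       _        = 'n'
prefix Second      _        = 't'
prefix ThirdFemale Singular = 't'
prefix _           _        = 'j'   -- 3rd male singular, and 3rd plural

-- Build a form from a base of the shape c1 v1 c2 v2 c3.
-- Returns Nothing when the base does not have exactly five letters.
conjugate :: Person -> Number -> String -> Maybe String
conjugate person number [c1, v1, c2, v2, c3] = Just (prefix person number : form)
  where
    form = case number of
      Singular -> [v1, c1, c2, v2, c3]    -- prefix v1 c1 c2 v2 c3
      Plural   -> [v1, c1, c2, c3, 'u']   -- prefix v1 c1 c2 c3 u
conjugate _ _ _ = Nothing
```

`conjugate First Singular "rikeb"` gives `Just "nirkeb"` and `conjugate ThirdMale Plural "rikeb"` gives `Just "jirkbu"`. Note that `tirkeb` comes out for both the 2nd person and the 3rd person female singular, so a parser seeing `t` cannot decide between them without context.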
Setting the Context
We would like to write something like:
sentence =
noun <**> verb <**> noun
We set a context by setting attributes:
setAttribute (name, value)
getAttribute name
renameAttribute (name, name')
Clashing values cause the parser to fail
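A minimal sketch of how such attributes might be threaded through a parser, assuming the state carries an association list of attributes alongside the input. The representation is an assumption, as is the behaviour in the cases the slides do not mention: here `getAttribute` fails on an unset name and `renameAttribute` does nothing when the old name is unset. `secondOrThird` and `run` are hypothetical helpers for the example.

```haskell
import Control.Monad (ap, liftM)

-- Assumption: attributes live in the parser state next to the input.
type Attrs = [(String, String)]
type St    = (Attrs, String)

newtype Parser a = Parser { runParser :: St -> [(a, St)] }

instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure x = Parser (\st -> [(x, st)])
  (<*>)  = ap

instance Monad Parser where
  Parser p >>= f = Parser (\st -> concat [runParser (f x) st' | (x, st') <- p st])

(<+>) :: Parser a -> Parser a -> Parser a
p <+> q = Parser (\st -> runParser p st ++ runParser q st)

-- Set an attribute; a clash with an existing, different value fails the parse.
setAttribute :: (String, String) -> Parser ()
setAttribute (name, value) = Parser (\(attrs, s) ->
  case lookup name attrs of
    Nothing          -> [((), ((name, value) : attrs, s))]
    Just old
      | old == value -> [((), (attrs, s))]
      | otherwise    -> [])

-- Read an attribute; assumed to fail if it has not been set.
getAttribute :: String -> Parser String
getAttribute name = Parser (\st@(attrs, _) -> [(v, st) | Just v <- [lookup name attrs]])

-- Move a value to a new name; the move itself can clash and fail.
renameAttribute :: (String, String) -> Parser ()
renameAttribute (old, new) = Parser (\(attrs, s) ->
  case lookup old attrs of
    Nothing -> [((), (attrs, s))]
    Just v  -> runParser (setAttribute (new, v)) (filter ((/= old) . fst) attrs, s))

-- Hypothetical example: an ambiguous prefix that may mean 2nd or 3rd person.
secondOrThird :: Parser ()
secondOrThird = setAttribute ("SubjectPerson", "2") <+> setAttribute ("SubjectPerson", "3")

-- Run with no attributes set and return just the results.
run :: Parser a -> String -> [a]
run p s = map fst (runParser p ([], s))
```

On its own, `run (secondOrThird >> getAttribute "SubjectPerson") ""` returns both readings, `["2","3"]`. If an earlier parser has already set `SubjectPerson` to `"3"`, the clashing branch fails and only `["3"]` remains, which is how agreement falls out of non-deterministic choice plus failing clashes.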
parsePersonSingular =
  do
    setAttribute ("SubjectNumber", "Singular")
    c <- anyCharOf ['n','t','j']
    case c of
      'n' -> setAttribute ("SubjectPerson", "1")
      'j' -> setAttributes [("SubjectPerson", "3"), ...]
      't' -> setAttribute ("SubjectPerson", "2")
             <+>
             setAttributes [("SubjectPerson", "3"), ...]
parseSubject =
  do
    n <- parseNoun
    renameAttributes
      [ ("Noun"++x, "Subject"++x)
      | x <- ["Gender", "Person", "Number"]
      ]
parseObject = ...
parseSentence = parseSubject <**> parseVerb <**> parseObject
Contribution
- Local context is parsed using the same parser combinators;
- Global information is shared via attributes;
- All this is done compositionally, and by enriching standard parser combinators.
Conclusions
Optimise it, apply it to a larger natural-language subset, and compare it with other, more traditional approaches.
Optimise the new context-aware combinators.
Extend it to work both as a language generator and parser.