Machine Learning Group
Department of Computer Sciences
University of Texas at Austin
Learning to Transform Natural to Formal Languages
July 13, 2005
Rohit J. Kate Yuk Wah Wong Raymond J. Mooney
2
Introduction
• Semantic Parsing: transforming natural language sentences into complete, executable formal representations
• Different from Semantic Role Labeling, which involves only shallow semantic analysis
• Two application domains:
– CLang: RoboCup Coach Language
– GeoQuery: a database query application
3
CLang: RoboCup Coach Language
• In the RoboCup Coach competition, teams compete to coach simulated soccer players
• The coaching instructions are given in a formal language called CLang

[Figure: the coach sends CLang advice to players on a simulated soccer field]

Semantic parsing example:
NL: If the ball is in our penalty area, then all our players except player 4 should stay in our half.
CLang: ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))
4
GeoQuery: A Database Query Application
• Query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996]

Semantic parsing example:
NL: How many cities are there in the US?
Query: answer(A, count(B, (city(B), loc(B, C), const(C, countryid(USA))), A))
5
Outline
• Semantic Parsing using Transformation Rules
• Learning Transformation Rules
• Experiments
• Conclusions
6
Semantic Parsing using Transformation Rules
• SILT (Semantic Interpretation by Learning Transformations)
• Uses pattern-based transformation rules which map natural language phrases to formal language constructs
• Transformation rules are repeatedly applied to the sentence to construct its formal language expression
7
Formal Language Grammar

NL: If our player 4 has the ball, our player 4 should shoot.
CLang: ((bowner our {4}) (do our {4} shoot))

• Non-terminals: RULE, CONDITION, ACTION, ...
• Terminals: bowner, our, 4, ...
• Productions: RULE → CONDITION DIRECTIVE
               DIRECTIVE → do TEAM UNUM ACTION
               ACTION → shoot

CLang parse:
RULE
├─ CONDITION: bowner TEAM(our) UNUM(4)
└─ DIRECTIVE: do TEAM(our) UNUM(4) ACTION(shoot)
8
Transformation Rule Representation
• A rule has two components: a natural language pattern and an associated formal language template
• Two versions of SILT:
– String-based rules: convert the natural language sentence directly to the formal language
– Tree-based rules: convert the sentence's syntactic tree to the formal language

String-based rule:
Pattern: TEAM UNUM has [1] ball   ([1] is a word gap of up to one word)
Template: CONDITION → (bowner TEAM {UNUM})

Tree-based rule:
Pattern: (S (NP TEAM UNUM) (VP (VBZ has) (NP (DT the) (NN ball))))
Template: CONDITION → (bowner TEAM {UNUM})
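To make the word-gap notation concrete, a string pattern like the one above can be compiled into a regular expression. This is only an illustrative sketch under assumptions from the slides (a [n] marker allows up to n intervening words), not SILT's actual matcher:

```python
import re

def pattern_to_regex(pattern):
    """Compile a SILT-style string pattern into a regex.

    All-caps tokens (TEAM, UNUM, ...) are non-terminals introduced by
    earlier rule applications and match literally; a word-gap marker
    like [1] matches up to that many intervening words.
    """
    pieces = []
    for tok in pattern.split():
        gap = re.fullmatch(r"\[(\d+)\]", tok)
        if gap:
            # the gap: zero to n whole words, each followed by whitespace
            pieces.append((r"(?:\S+\s+){0,%s}" % gap.group(1), False))
        else:
            pieces.append((re.escape(tok), True))
    regex = ""
    for i, (part, is_word) in enumerate(pieces):
        regex += part
        if is_word and i < len(pieces) - 1:
            regex += r"\s+"
    return re.compile(regex)

rule_pattern = "TEAM UNUM has [1] ball"
sentence = "If TEAM UNUM has the ball , TEAM UNUM should ACTION ."
rx = pattern_to_regex(rule_pattern)
rewritten = rx.sub("CONDITION", sentence, count=1)
print(rewritten)  # If CONDITION , TEAM UNUM should ACTION .
```

Replacing the matched span by the template's left-hand non-terminal is exactly the rewrite step the next slides animate.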
9–20
Example of Semantic Parsing

If our player 4 has the ball, our player 4 should shoot.

Available rules:
our → TEAM (our)
player 4 → UNUM (4)
shoot → ACTION (shoot)
TEAM UNUM has [1] ball → CONDITION (bowner TEAM {UNUM})
TEAM UNUM should ACTION → DIRECTIVE (do TEAM {UNUM} ACTION)
If CONDITION, DIRECTIVE. → RULE (CONDITION DIRECTIVE)

The rules are applied repeatedly, one per animation step:
1. our → TEAM: If TEAM player 4 has the ball, TEAM player 4 should shoot.
2. player 4 → UNUM: If TEAM UNUM has the ball, TEAM UNUM should shoot.
3. shoot → ACTION: If TEAM UNUM has the ball, TEAM UNUM should ACTION.
4. CONDITION = (bowner our {4}): If CONDITION, TEAM UNUM should ACTION.
5. DIRECTIVE = (do our {4} shoot): If CONDITION, DIRECTIVE.
6. RULE = ((bowner our {4}) (do our {4} shoot))
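The derivation animated in the example slides can be mimicked with a few ordered rewrite rules. This is a toy sketch for illustration only: the NONTERM:<fragment> token encoding is an assumption made here to carry partial formal fragments through the rewrites, not SILT's internal representation.

```python
import re

# Transformation rules applied in order.  A rewritten span becomes a
# NONTERM:<fragment> token so that later rules can splice the
# already-built formal fragments into their own templates.
rules = [
    (r"\bour\b", "TEAM:our"),
    (r"\bplayer 4\b", "UNUM:4"),
    (r"\bshoot\b", "ACTION:shoot"),
    (r"TEAM:(\S+) UNUM:(\S+) has the ball", r"CONDITION:(bowner \1 {\2})"),
    (r"TEAM:(\S+) UNUM:(\S+) should ACTION:(\S+)", r"DIRECTIVE:(do \1 {\2} \3)"),
    (r"If CONDITION:(.+) , DIRECTIVE:(.+) \.", r"RULE:(\1 \2)"),
]

sentence = "If our player 4 has the ball , our player 4 should shoot ."
for pattern, template in rules:
    sentence = re.sub(pattern, template, sentence)

print(sentence)  # RULE:((bowner our {4}) (do our {4} shoot))
```

Each substitution corresponds to one animation step above; the final string carries the complete CLang expression.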
21
Learning Transformation Rules
• SILT induces rules from a corpus of NL sentences paired with their formal representations
• Patterns are learned for each production by bottom-up rule learning
• For every production:
– Call those sentences positive whose formal representations' parses use that production
– Call the remaining sentences negative
22
Rule Learning for a Production
• SILT applies a greedy-covering, bottom-up rule induction method that repeatedly generalizes positives until they start covering negatives

Production: CONDITION → (bpos REGION)

Positives:
• The ball is in REGION, our player 7 is in REGION and no opponent is around our player 7 within 1.5 distance.
• If the ball is in REGION and not in REGION then player 3 should intercept the ball.
• During normal play if the ball is in the REGION then player 7, 9 and 11 should dribble the ball to the REGION.
• When the play mode is normal and the ball is in the REGION then our player 2 should pass the ball to the REGION.
• All players except the goalie should pass the ball to REGION if it is in RP18.
• If the ball is inside rectangle ( -54 , -36 , 0 , 36 ) then player 10 should position itself at REGION with a ball attraction of REGION.
• Player 2 should pass the ball to REGION if it is in REGION.

Negatives:
• If our player 6 has the ball then he should take a shot on goal.
• If player 4 has the ball, it should pass the ball to player 2 or 10.
• If the condition DR5C3 is true, then player 2, 3, 7 and 8 should pass the ball to player 3.
• During play on, if players 6, 7 or 8 is in REGION, they should pass the ball to players 9, 10 or 11.
• If "Clear_Condition", players 2, 3, 7 or 5 should clear the ball REGION.
• If it is before the kick off, after our goal or after the opponent's goal, position player 3 at REGION.
• If the condition MDR4C9 is met, then players 4-6 should pass the ball to player 9.
• If Pass_11 then player 11 should pass to player 9 and no one else.
23–24
Generalization of String Patterns

Production: ACTION → (pos REGION)

Pattern 1: Always position player UNUM at REGION .
Pattern 2: Whenever the ball is in REGION , position player UNUM near the REGION .

Find the highest scoring common subsequence c, where score(c) grows with length(c) and is penalized by the sum of c's word gaps.

Generalization: position player UNUM [2] REGION .
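A rough sketch of this generalization step using Python's difflib to extract a common subsequence and turn unmatched stretches into word-gap markers. This only approximates SILT's scoring (which explicitly trades subsequence length against gap size); the function name is assumed here for illustration:

```python
from difflib import SequenceMatcher

def generalize(p1, p2):
    """Generalize two string patterns: shared tokens survive, and
    unmatched stretches between them become word-gap markers like [2]."""
    a, b = p1.split(), p2.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    out = []
    prev_a = prev_b = 0
    for i, j, n in matcher.get_matching_blocks():
        if n == 0:          # terminator block
            break
        gap = max(i - prev_a, j - prev_b)
        if out and gap:     # leading mismatches are simply dropped
            out.append("[%d]" % gap)
        out.extend(a[i:i + n])
        prev_a, prev_b = i + n, j + n
    return " ".join(out)

p1 = "Always position player UNUM at REGION ."
p2 = "Whenever the ball is in REGION , position player UNUM near the REGION ."
print(generalize(p1, p2))  # position player UNUM [2] REGION .
```

On the slide's two patterns this reproduces the generalization shown: "at" and "near the" disagree, so they collapse into the gap marker [2].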
25–26
Generalization of Tree Patterns

Production: REGION → (penalty-area TEAM)

Pattern 1: (NP (NP TEAM (POS 's)) (NN penalty) (NN box))
Pattern 2: (NP (PRP$ TEAM) (NN penalty) (NN area))

Find common subgraphs.

Generalization: (NP TEAM (NN penalty) (NN *))
27–28
Rule Learning for a Production

Production: CONDITION → (bpos REGION)

From the positives and negatives above, the bottom-up rule learner induces rules such as:
ball is [2] REGION → CONDITION (bpos REGION)
it is in REGION → CONDITION (bpos REGION)

Applying them rewrites the positives (the negatives remain uncovered):
• The CONDITION, our player 7 is in REGION and no opponent is around our player 7 within 1.5 distance.
• If the CONDITION and not in REGION then player 3 should intercept the ball.
• During normal play if the CONDITION then player 7, 9 and 11 should dribble the ball to the REGION.
• When the play mode is normal and the CONDITION then our player 2 should pass the ball to the REGION.
• All players except the goalie should pass the ball to REGION if CONDITION.
• If the CONDITION then player 10 should position itself at REGION with a ball attraction of REGION.
• Player 2 should pass the ball to REGION if CONDITION.
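The greedy-covering loop itself can be sketched in a few lines. Everything below is a toy stand-in for illustration: `generalize` is a simple ordered token intersection and `covers` a gapped-subsequence test (SILT's actual operators are the gap-aware pattern generalizations shown earlier), and the sentences are shortened from the slides.

```python
def covers(pat, sent):
    """True if pat occurs in sent as a (possibly gapped) subsequence."""
    it = iter(sent)
    return all(tok in it for tok in pat)   # `in` consumes the iterator

def generalize(a, b):
    """Toy generalization: keep a's tokens that also appear in b."""
    return [tok for tok in a if tok in b]

def learn_rules(positives, negatives):
    """Greedy-covering, bottom-up induction (simplified sketch)."""
    rules, uncovered = [], list(positives)
    while uncovered:
        best = uncovered[0]                      # maximally specific seed
        for other in uncovered[1:]:
            g = generalize(best, other)
            if g and not any(covers(g, n) for n in negatives):
                best = g                         # accept safe generalization
        rules.append(best)
        uncovered = [p for p in uncovered if not covers(best, p)]
    return rules

positives = [s.split() for s in [
    "the ball is in REGION",
    "if the ball is in the REGION",
    "the ball is inside REGION",
]]
negatives = [s.split() for s in [
    "player 4 has the ball",
    "pass the ball to REGION",
]]
print(learn_rules(positives, negatives))
# one rule generalizing all three positives
```

The loop stops generalizing a candidate as soon as a further generalization would cover a negative, mirroring the slide's "generalize positives until they start covering negatives".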
29
Rule Learning for All Productions
• Transformation rules for different productions must cooperate globally to generate complete semantic parses
• Redundantly cover every positive example with the β = 5 best rules
• Find the subset of these rules which best cooperate to generate complete semantic parses on the training data

goodness(r) = pos(r) * [ pos(r) / (pos(r) + neg(r)) ]
              coverage          accuracy
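The goodness score transcribes directly from the formula above; a minimal sketch, assuming a rule is summarized by its (pos, neg) coverage counts:

```python
def goodness(pos, neg):
    """goodness(r) = pos(r) * pos(r) / (pos(r) + neg(r)):
    a coverage term (pos) times an accuracy term (pos / (pos + neg))."""
    return pos * pos / (pos + neg) if pos + neg else 0.0

def beta_best(rules, beta=5):
    """Keep the beta highest-goodness rules (redundant covering)."""
    return sorted(rules, key=lambda r: goodness(*r), reverse=True)[:beta]

# a rule covering 9 positives and 1 negative beats a perfectly
# accurate rule that covers only 2 positives
print(goodness(9, 1), goodness(2, 0))  # 8.1 2.0
```

Multiplying coverage by accuracy prefers rules that are both broadly applicable and rarely wrong, rather than maximizing either alone.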
30
Experimental Corpora
• CLang
– 300 randomly selected pieces of coaching advice from the log files of the 2003 RoboCup Coach Competition
– 22.52 words on average in NL sentences
– 14.24 tokens on average in formal expressions
• GeoQuery [Zelle & Mooney, 1996]
– 250 queries for the given U.S. geography database
– 6.87 words on average in NL sentences
– 5.32 tokens on average in formal expressions
31
Experimental Methodology
• Evaluated using standard 10-fold cross validation
• Syntactic parses needed by the tree-based version were obtained by training Collins' parser [Bikel, 2004] on the WSJ treebank plus gold-standard parses of the training sentences
• Correctness
– CLang: output exactly matches the correct representation
– GeoQuery: the resulting query retrieves the same answer as the correct representation
• Metrics

Precision = |correct completed parses| / |completed parses|
Recall = |correct completed parses| / |sentences|
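The two metrics transcribe directly into code; a minimal sketch with illustrative counts (the function name is assumed):

```python
def metrics(correct_completed, completed, sentences):
    """Precision = |correct completed parses| / |completed parses|
       Recall    = |correct completed parses| / |sentences|"""
    precision = correct_completed / completed if completed else 0.0
    recall = correct_completed / sentences if sentences else 0.0
    return precision, recall

# e.g. 72 correct out of 80 completed parses, on 100 test sentences
print(metrics(72, 80, 100))  # (0.9, 0.72)
```

Precision and recall differ because the parser may fail to complete a parse at all: incomplete sentences hurt recall but not precision.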
32
Compared Systems
• CHILL
– Learns control rules for shift-reduce parsing using Inductive Logic Programming (ILP)
– CHILLIN [Zelle & Mooney, 1996]
– COCKTAIL [Tang & Mooney, 2001]
• GEOBASE
– Hand-built parser for GeoQuery [Borland International, 1988]
33
Precision Learning Curves for CLang
34
Recall Learning Curves for CLang
35
Precision Learning Curves for GeoQuery
36
Recall Learning Curves for GeoQuery
37
Related Work
• SCISSOR [Ge & Mooney, 2005]
– Integrates semantic and syntactic statistical parsing
– Requires extensive annotations but gives better results
• PRECISE [Popescu et al., 2003]
– Designed specifically for NL database interfaces
• Speech recognition community [Zue & Glass, 2000]
– Handles simpler queries, e.g. the ATIS corpus
38
Conclusions
• SILT: a new approach to semantic parsing based on transformation rules
• SILT learns transformation rules by bottom-up rule induction, exploiting the target language grammar
• Tested on two very different domains; performs better than previous ILP-based approaches
39
Thank You!
Our corpora can be downloaded from: http://www.cs.utexas.edu/~ml/nldata.html
Questions?
40
F-measure Learning Curves for CLang
41
F-measure Learning Curves for GeoQuery
42
Extra Slide: Average Training Time in Minutes
              CLang   GeoQuery
SILT-string     3.2       0.35
CHILLIN        10.4       6.3
SILT-tree      81.4      21.5
COCKTAIL         –       39.6
43–45
Extra Slide: Variations of Rule Representation

• Context in the patterns:
  in REGION → CONDITION (bpos REGION)
  the ball in REGION → CONDITION (bpos REGION)
• Templates with multiple productions:
  TEAM UNUM has the ball in REGION → CONDITION (and (bowner TEAM UNUM) (bpos REGION))
46
Extra Slide: Experimental Methodology

• Correctness
– CLang: output exactly matches the correct representation
– GeoQuery: the resulting query retrieves the same answer as the correct representation

NL: If the ball is in our penalty area, all our players except player 4 should stay in our half.
Correct: ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))
Output:  ((bpos (penalty-area opp)) (do (player-except our {4}) (pos (half our))))
47
Extra Slide: Future Work
• Hard-matching symbolic patterns are sometimes too brittle; exploit string and tree kernels as classifiers [Lodhi et al., 2002]
• Unified implementation of the string-based and tree-based versions for direct comparisons