Learning to Transform Natural to Formal Languages


Learning to Transform Natural to Formal Languages. Rohit J. Kate, Yuk Wah Wong, Raymond J. Mooney. July 13, 2005. Semantic parsing: transforming natural language sentences into complete, executable formal representations.


Machine Learning Group

Department of Computer Sciences

University of Texas at Austin

Learning to Transform Natural to Formal Languages

July 13, 2005

Rohit J. Kate, Yuk Wah Wong, Raymond J. Mooney

2

Introduction

• Semantic Parsing: Transforming natural language sentences into complete, executable formal representations

• Different from Semantic Role Labeling, which involves only shallow semantic analysis

• Two application domains:

– CLang: RoboCup Coach Language

– GeoQuery: A Database Query Application

3

CLang: RoboCup Coach Language

• In the RoboCup Coach Competition, teams compete to coach simulated players

• The coaching instructions are given in a formal language called CLang

[Figure: Coach → CLang → simulated soccer field]

• Semantic parsing example:

– NL: If the ball is in our penalty area, then all our players except player 4 should stay in our half.

– CLang: ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))

4

GeoQuery: A Database Query Application

• Query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996]

• Semantic parsing example:

– NL (user): How many cities are there in the US?

– Query: answer(A, count(B, (city(B), loc(B, C), const(C, countryid(USA))), A))

5

Outline

• Semantic Parsing using Transformation Rules

• Learning Transformation Rules

• Experiments

• Conclusions

6

Semantic Parsing using Transformation Rules

• SILT (Semantic Interpretation by Learning Transformations)

• Uses pattern-based transformation rules which map natural language phrases to formal language constructs

• Transformation rules are repeatedly applied to the sentence to construct its formal language expression

7

Formal Language Grammar

• NL: If our player 4 has the ball, our player 4 should shoot.

• CLang: ((bowner our {4}) (do our {4} shoot))

• CLang parse: (RULE (CONDITION bowner (TEAM our) (UNUM 4)) (DIRECTIVE do (TEAM our) (UNUM 4) (ACTION shoot)))

• Non-terminals: RULE, CONDITION, ACTION…

• Terminals: bowner, our, 4…

• Productions:

– RULE → CONDITION DIRECTIVE

– DIRECTIVE → do TEAM UNUM ACTION

– ACTION → shoot

8

Transformation Rule Representation

• Rule has two components: a natural language pattern and an associated formal language template

• Two versions of SILT:

– String-based rules: used to convert a natural language sentence directly to formal language

– Tree-based rules: used to convert a syntactic tree to formal language

• String-based rule:

– String-pattern: TEAM UNUM has [1] ball

– Template: CONDITION → (bowner TEAM {UNUM})

• Tree-based rule:

– Tree-pattern: (S (NP TEAM UNUM) (VP (VBZ has) (NP (DT the) (NN ball))))

– Template: CONDITION → (bowner TEAM {UNUM})

• [1] is a word gap: at most one word may be skipped at that position

9

Example of Semantic Parsing

• NL: If our player 4 has the ball, our player 4 should shoot.

• Transformation rules:

Pattern | Template
our | TEAM → our
player 4 | UNUM → 4
shoot | ACTION → shoot
TEAM UNUM has [1] ball | CONDITION → (bowner TEAM {UNUM})
TEAM UNUM should ACTION | DIRECTIVE → (do TEAM {UNUM} ACTION)
If CONDITION, DIRECTIVE. | RULE → (CONDITION DIRECTIVE)

• Derivation (each step applies one rule wherever its pattern matches):

– Start: If our player 4 has the ball, our player 4 should shoot.

– "our" becomes TEAM (our): If TEAM player 4 has the ball, TEAM player 4 should shoot.

– "player 4" becomes UNUM (4): If TEAM UNUM has the ball, TEAM UNUM should shoot.

– "shoot" becomes ACTION (shoot): If TEAM UNUM has the ball, TEAM UNUM should ACTION.

– "TEAM UNUM has the ball" becomes CONDITION (bowner our {4}): If CONDITION, TEAM UNUM should ACTION.

– "TEAM UNUM should ACTION" becomes DIRECTIVE (do our {4} shoot): If CONDITION, DIRECTIVE.

– "If CONDITION, DIRECTIVE." becomes RULE ((bowner our {4}) (do our {4} shoot))

• Result: ((bowner our {4}) (do our {4} shoot))
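The derivation above can be sketched in a few lines of Python. This is only an illustrative sketch, not SILT's implementation: the data layout (tokens as `(symbol, meaning)` pairs), the names `RULES`, `match_at` and `parse`, the reading of `[n]` as "skip at most n plain words", and the greedy first-match application order are all assumptions made for this example. The rules are applied in a slightly different order than on the slides, but they reach the same expression.

```python
import re

NONTERMINALS = {"TEAM", "UNUM", "ACTION", "CONDITION", "DIRECTIVE", "RULE"}

# (pattern, left-hand side, template). An int inside a pattern is a word gap:
# up to that many plain words may be skipped (assumed semantics of "[1]").
RULES = [
    (["our"], "TEAM", "our"),
    (["player", "4"], "UNUM", "4"),
    (["shoot"], "ACTION", "shoot"),
    (["TEAM", "UNUM", "has", 1, "ball"], "CONDITION", "(bowner TEAM {UNUM})"),
    (["TEAM", "UNUM", "should", "ACTION"], "DIRECTIVE", "(do TEAM {UNUM} ACTION)"),
    (["If", "CONDITION", ",", "DIRECTIVE", "."], "RULE", "(CONDITION DIRECTIVE)"),
]


def match_at(pattern, tokens, i):
    """Match pattern against tokens starting at i; return (end, bindings) or None."""
    if not pattern:
        return i, {}
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, int):
        # Word gap: try skipping 0..head plain words (never a non-terminal).
        for skip in range(head + 1):
            if i + skip > len(tokens):
                break
            if any(meaning is not None for _, meaning in tokens[i:i + skip]):
                break
            result = match_at(rest, tokens, i + skip)
            if result is not None:
                return result
        return None
    if i >= len(tokens):
        return None
    symbol, meaning = tokens[i]
    # A non-terminal in the pattern must match an already-built constituent,
    # and a plain word must match a plain word.
    if symbol != head or (head in NONTERMINALS) != (meaning is not None):
        return None
    result = match_at(rest, tokens, i + 1)
    if result is None:
        return None
    end, bindings = result
    if meaning is not None:
        bindings = {head: meaning, **bindings}
    return end, bindings


def parse(sentence):
    """Repeatedly apply the rules until no rule matches any more."""
    tokens = [(word, None) for word in sentence.split()]
    changed = True
    while changed:
        changed = False
        for pattern, lhs, template in RULES:
            for i in range(len(tokens)):
                result = match_at(pattern, tokens, i)
                if result is None:
                    continue
                end, bindings = result
                # Fill the template with the meanings of the matched non-terminals.
                meaning = re.sub(
                    r"[A-Z]+", lambda m: bindings.get(m.group(), m.group()), template
                )
                tokens[i:end] = [(lhs, meaning)]
                changed = True
                break
    return tokens


print(parse("If our player 4 has the ball , our player 4 should shoot ."))
```

The sentence is passed pre-tokenized (punctuation separated by spaces) so that `split()` is enough; a real system would need a proper tokenizer and a strategy for choosing among competing rule matches.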

21

Learning Transformation Rules

• SILT induces rules from a corpus of NL sentences paired with their formal representations

• Patterns are learned for each production by bottom-up rule learning

• For every production:

– Call those sentences positives whose formal representations’ parses use that production

– Call the remaining sentences negatives
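The positive/negative split can be illustrated with a small Python sketch. It is a simplification under stated assumptions: instead of parsing with the full CLang grammar, it treats the head symbol of each parenthesized sub-expression (for example `bpos`) as a stand-in for "the production used", and the names `parse_sexpr`, `heads`, `split_examples` and `CORPUS` are hypothetical. The two training pairs are taken from the examples on the earlier slides.

```python
def parse_sexpr(text):
    """Parse a parenthesized expression into nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                sub, pos = read(pos + 1)  # pos now points at the matching ")"
                items.append(sub)
            else:
                items.append(tokens[pos])
            pos += 1
        return items, pos

    return read(0)[0]


def heads(tree):
    """Collect the head symbol of every sub-expression (proxy for productions)."""
    found = set()
    if isinstance(tree, list):
        if tree and isinstance(tree[0], str):
            found.add(tree[0])
        for child in tree:
            found |= heads(child)
    return found


def split_examples(corpus, head):
    """Sentences whose formal representation uses `head` are positives."""
    positives, negatives = [], []
    for sentence, representation in corpus:
        used = heads(parse_sexpr(representation))
        (positives if head in used else negatives).append(sentence)
    return positives, negatives


CORPUS = [
    ("If the ball is in our penalty area, then all our players except "
     "player 4 should stay in our half.",
     "((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))"),
    ("If our player 4 has the ball, our player 4 should shoot.",
     "((bowner our {4}) (do our {4} shoot))"),
]

positives, negatives = split_examples(CORPUS, "bpos")
print(len(positives), len(negatives))
```

With this toy corpus, only the first sentence is a positive for `bpos`, while both sentences are positives for `do`.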

22

Rule Learning for a Production

• SILT applies a greedy-covering, bottom-up rule induction method that repeatedly generalizes positives until they start covering negatives

• Example production: CONDITION → (bpos REGION)

• Positives:

– The ball is in REGION , our player 7 is in REGION and no opponent is around our player 7 within 1.5 distance.

– If the ball is in REGION and not in REGION then player 3 should intercept the ball.

– During normal play if the ball is in the REGION then player 7 , 9 and 11 should dribble the ball to the REGION .

– When the play mode is normal and the ball is in the REGION then our player 2 should pass the ball to the REGION .

– All players except the goalie should pass the ball to REGION if it is in RP18.

– If the ball is inside rectangle ( -54 , -36 , 0 , 36 ) then player 10 should position itself at REGION with a ball attraction of REGION .

– Player 2 should pass the ball to REGION if it is in REGION .

• Negatives:

– If our player 6 has the ball then he should take a shot on goal.

– If player 4 has the ball , it should pass the ball to player 2 or 10.

– If the condition DR5C3 is true , then player 2 , 3 , 7 and 8 should pass the ball to player 3.

– During play on , if players 6 , 7 or 8 is in REGION , they should pass the ball to players 9 , 10 or 11.

– If "Clear_Condition" , players 2 , 3 , 7 or 5 should clear the ball REGION .

– If it is before the kick off , after our goal or after the opponent's goal , position player 3 at REGION .

– If the condition MDR4C9 is met , then players 4-6 should pass the ball to player 9.

– If Pass_11 then player 11 should pass to player 9 and no one else.

23

Generalization of String Patterns

• Production: ACTION → (pos REGION)

• Pattern 1: Always position player UNUM at REGION .

• Pattern 2: Whenever the ball is in REGION, position player UNUM near the REGION .

• Find the highest scoring common subsequence c, where η is a penalty on word gaps:

$$\mathrm{score}(c) = \mathrm{length}(c) - \eta \cdot (\text{sum of word gaps})$$

• Generalization: position player UNUM [2] REGION .
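A minimal Python sketch of this generalization step follows. It assumes the score is the length of the common subsequence minus a penalty η times the sum of the word gaps, and that the gap between two consecutive shared tokens is the larger of the two distances in the input patterns. The function name `generalize`, the dynamic-programming formulation, and the default `eta=0.4` are assumptions for illustration; the slides do not state a value for η.

```python
def generalize(p1, p2, eta=0.4):
    """Return (pattern, score) for the highest-scoring common subsequence.

    p1 and p2 are token lists. Gaps between consecutive shared tokens are
    written as "[n]", where n is the larger of the two gap sizes.
    """
    best = {}  # (i, j) -> (best score of a subsequence ending here, predecessor)
    for i, a in enumerate(p1):
        for j, b in enumerate(p2):
            if a != b:
                continue
            score, prev = 1.0, None  # start a new subsequence at (i, j)
            for (pi, pj), (pscore, _) in best.items():
                if pi < i and pj < j:
                    gap = max(i - pi - 1, j - pj - 1)
                    candidate = pscore + 1 - eta * gap
                    if candidate > score:
                        score, prev = candidate, (pi, pj)
            best[(i, j)] = (score, prev)
    if not best:
        return [], 0.0

    # Walk back from the best end point to recover the aligned token pairs.
    node = max(best, key=lambda key: best[key][0])
    total = best[node][0]
    pairs = []
    while node is not None:
        pairs.append(node)
        node = best[node][1]
    pairs.reverse()

    pattern = []
    for k, (i, j) in enumerate(pairs):
        if k > 0:
            pi, pj = pairs[k - 1]
            gap = max(i - pi - 1, j - pj - 1)
            if gap:
                pattern.append(f"[{gap}]")
        pattern.append(p1[i])
    return pattern, total


pattern_1 = "Always position player UNUM at REGION .".split()
pattern_2 = ("Whenever the ball is in REGION , position player "
             "UNUM near the REGION .").split()
generalization, score = generalize(pattern_1, pattern_2)
print(" ".join(generalization))
```

On the two patterns from the slide this sketch recovers `position player UNUM [2] REGION .`: the gap of 2 comes from "near the" in Pattern 2, which is longer than "at" in Pattern 1.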

25

Generalization of Tree Patterns

• Production: REGION → (penalty-area TEAM)

• Pattern 1: (NP (NP TEAM (POS ’s)) (NN penalty) (NN box))

• Pattern 2: (NP (PRP$ TEAM) (NN penalty) (NN area))

• Find common subgraphs.

• Generalization: (NP (* TEAM) (NN penalty) NN), where * appears to stand for the node above TEAM, whose label differs between the two patterns (NP vs. PRP$)

27

Rule Learning for a Production

• Production: CONDITION → (bpos REGION)

• Negatives: the eight sentences listed earlier for this production (beginning "If our player 6 has the ball then he should take a shot on goal.")

• Positives, with every region abstracted to REGION:

– The ball is in REGION , our player 7 is in REGION and no opponent is around our player 7 within 1.5 distance.

– If the ball is in REGION and not in REGION then player 3 should intercept the ball.

– During normal play if the ball is in the REGION then player 7 , 9 and 11 should dribble the ball to the REGION .

– When the play mode is normal and the ball is in the REGION then our player 2 should pass the ball to the REGION .

– All players except the goalie should pass the ball to REGION if it is in REGION.

– If the ball is inside REGION then player 10 should position itself at REGION with a ball attraction of REGION .

– Player 2 should pass the ball to REGION if it is in REGION .

• Rules output by the bottom-up rule learner:

Pattern | Template
ball is [2] REGION | CONDITION → (bpos REGION)
it is in REGION | CONDITION → (bpos REGION)

• Positives after applying the learned rules:

– The CONDITION , our player 7 is in REGION and no opponent is around our player 7 within 1.5 distance.

– If the CONDITION and not in REGION then player 3 should intercept the ball.

– During normal play if the CONDITION then player 7 , 9 and 11 should dribble the ball to the REGION .

– When the play mode is normal and the CONDITION then our player 2 should pass the ball to the REGION .

– All players except the goalie should pass the ball to REGION if CONDITION.

– If the CONDITION then player 10 should position itself at REGION with a ball attraction of REGION .

– Player 2 should pass the ball to REGION if CONDITION .

29

Rule Learning for All Productions

• Transformation rules for productions should cooperate globally to generate complete semantic parses

• Redundantly cover every positive example by β = 5 best rules

• Find the subset of these rules which best cooperate to generate complete semantic parses on the training data

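$$\mathrm{goodness}(r) = \underbrace{\mathrm{pos}(r)}_{\text{coverage}} \times \underbrace{\frac{\mathrm{pos}(r)}{\mathrm{pos}(r) + \mathrm{neg}(r)}}_{\text{accuracy}}$$

where pos(r) and neg(r) are the numbers of positives and negatives covered by rule r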
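As a worked example of the goodness measure, the Python sketch below scores a few candidate rules and keeps the best ones. It assumes goodness(r) is coverage times accuracy, that is pos(r) · pos(r) / (pos(r) + neg(r)). The function names `goodness` and `best_rules` and the coverage counts in `CANDIDATES` are made up for illustration; they are not results from the experiments.

```python
def goodness(pos, neg):
    """Coverage (pos) times accuracy (pos / (pos + neg))."""
    if pos + neg == 0:
        return 0.0
    return pos * pos / (pos + neg)


def best_rules(candidates, beta=5):
    """Keep the beta candidates with the highest goodness.

    Each candidate is (pattern, positives covered, negatives covered).
    """
    ranked = sorted(candidates, key=lambda c: goodness(c[1], c[2]), reverse=True)
    return ranked[:beta]


# Hypothetical coverage counts, for illustration only.
CANDIDATES = [
    ("it is in REGION", 2, 0),
    ("is in REGION", 4, 1),
    ("ball is [2] REGION", 5, 0),
]

for pattern, pos, neg in best_rules(CANDIDATES, beta=2):
    print(pattern, goodness(pos, neg))
```

Squaring the positive count in the numerator favors rules that are both accurate and broadly applicable: a rule covering 4 positives and 1 negative (goodness 3.2) still outranks a perfectly accurate rule that covers only 2 positives (goodness 2.0).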

30

Experimental Corpora

• CLang

– 300 randomly selected pieces of coaching advice from the log files of the 2003 RoboCup Coach Competition

– 22.52 words on average in NL sentences

– 14.24 tokens on average in formal expressions

• GeoQuery [Zelle & Mooney, 1996]

– 250 queries for the given U.S. geography database

– 6.87 words on average in NL sentences

– 5.32 tokens on average in formal expressions

31

Experimental Methodology

• Evaluated using standard 10-fold cross validation

• Syntactic parses needed by the tree-based version were obtained by training Collins’ parser [Bikel, 2004] on the WSJ treebank and gold-standard parses of training sentences

• Correctness

– CLang: output exactly matches the correct representation

– GeoQuery: the resulting query retrieves the same answer as the correct representation

• Metrics

$$\text{Precision} = \frac{|\text{Correct Completed Parses}|}{|\text{Completed Parses}|}$$

$$\text{Recall} = \frac{|\text{Correct Completed Parses}|}{|\text{Sentences}|}$$
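The two metrics can be computed as in the Python sketch below. It assumes precision is the number of correct completed parses divided by the number of completed parses, and recall is the same numerator divided by the number of sentences. A parser that produces no complete parse for a sentence is represented by `None`, and correctness uses the exact-match criterion described for CLang. The function name `precision_recall` and the toy data are assumptions for illustration, not experimental results.

```python
def precision_recall(results):
    """Compute (precision, recall) from (output, correct) pairs.

    output is None when the parser failed to produce a complete parse.
    """
    completed = [(out, gold) for out, gold in results if out is not None]
    correct = sum(1 for out, gold in completed if out == gold)
    precision = correct / len(completed) if completed else 0.0
    recall = correct / len(results) if results else 0.0
    return precision, recall


# Toy outputs: two exact matches, one wrong parse, one sentence left unparsed.
RESULTS = [
    ("((bowner our {4}) (do our {4} shoot))",
     "((bowner our {4}) (do our {4} shoot))"),
    ("(bpos (penalty-area opp))", "(bpos (penalty-area our))"),
    (None, "(bpos (half our))"),
    ("(pos (half our))", "(pos (half our))"),
]

precision, recall = precision_recall(RESULTS)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because sentences without a complete parse count against recall but not against precision, a conservative parser can score high precision with low recall. For GeoQuery the equality test would be replaced by a comparison of the answers the two queries retrieve.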

32

Compared Systems

• CHILL

– Learns control rules for shift-reduce parsing using Inductive Logic Programming (ILP)

– CHILLIN [Zelle & Mooney, 1996]

– COCKTAIL [Tang & Mooney, 2001]

• GEOBASE

– Hand-built parser for GeoQuery [Borland International, 1988]

33

Precision Learning Curves for CLang

34

Recall Learning Curves for CLang

35

Precision Learning Curves for GeoQuery

36

Recall Learning Curves for GeoQuery

37

Related Work

• SCISSOR [Ge & Mooney, 2005]

– Integrates semantic and syntactic statistical parsing

– Requires extensive annotations but gives better results

• PRECISE [Popescu et al., 2003]

– Designed specifically for NL database interfaces

• Speech Recognition Community [Zue & Glass, 2000]

– Simpler queries in ATIS corpus

38

Conclusions

• New approach for semantic parsing, SILT, which uses transformation rules

• SILT learns transformation rules by doing bottom-up rule induction exploiting the target language grammar

• Tested on two very different domains; performs better than previous ILP-based approaches

39

Thank You!

Our corpora can be downloaded from: http://www.cs.utexas.edu/~ml/nldata.html

Questions??

40

F-measure Learning Curves for CLang

41

F-measure Learning Curves for GeoQuery

42

Extra Slide: Average Training Time in Minutes

System | CLang | GeoQuery
SILT-string | 3.2 | 0.35
CHILLIN | 10.4 | 6.3
SILT-tree | 81.4 | 21.5
COCKTAIL | _ | 39.6

43

Extra Slide: Variations of Rule Representation

• Context in the patterns:

– Without context: pattern "in REGION", template CONDITION → (bpos REGION)

– With context: pattern "the ball in REGION", template CONDITION → (bpos REGION)

– In "TEAM UNUM has the ball in REGION", only "in REGION" is replaced by CONDITION (bpos REGION); "TEAM UNUM has the ball" remains and can still match "TEAM UNUM has [1] ball", template CONDITION → (bowner TEAM {UNUM})

• Templates with multiple productions:

– Pattern: TEAM UNUM has the ball in REGION

– Template: CONDITION → (and (bowner TEAM {UNUM}) (bpos REGION))

46

Extra Slide: Experimental Methodology

• Correctness

– CLang: output exactly matches the correct representation

– GeoQuery: the resulting query retrieves the same answer as the correct representation

• Example of an incorrect CLang output (one token differs):

– NL: If the ball is in our penalty area, all our players except player 4 should stay in our half.

– Correct: ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))

– Output: ((bpos (penalty-area opp)) (do (player-except our {4}) (pos (half our))))

47

Extra Slide: Future Work

• Hard-matching symbolic patterns are sometimes too brittle; exploit string and tree kernels as classifiers instead [Lodhi et al., 2002]

• Unified implementation of string and tree-based versions for direct comparisons
