85
Tools and Analyses for Ambiguous Input Streams Andrew Begel and Susan L. Graham University of California, Berkeley LDTA Workshop - April 3, 2004

Tools and Analyses for Ambiguous Input Streams

  • Upload
    yon

  • View
    36

  • Download
    0

Embed Size (px)

DESCRIPTION

Tools and Analyses for Ambiguous Input Streams. Andrew Begel and Susan L. Graham University of California, Berkeley LDTA Workshop - April 3, 2004. Harmonia: Language-aware Editing. Programming by Voice Code dictation Voice-based editing commands Program Transformations - PowerPoint PPT Presentation

Citation preview

Page 1: Tools and Analyses for Ambiguous Input Streams

Tools and Analyses for Ambiguous Input Streams

Andrew Begel and Susan L. Graham

University of California, Berkeley

LDTA Workshop - April 3, 2004

Page 2: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 2

Harmonia:Language-aware Editing Programming by Voice

– Code dictation– Voice-based editing commands

Program Transformations– Transformation actions– Pattern-matching constructs

Page 3: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 3

Harmonia:Language-aware Editing Programming by Voice

– Code dictation– Voice-based editing commands

Program Transformations– Transformation actions– Pattern-matching constructs

Human Speech

Page 4: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 4

Harmonia:Language-aware Editing Programming by Voice

– Code dictation– Voice-based editing commands

Program Transformations– Transformation actions– Pattern-matching constructs

Human Speech

EmbeddedLanguages

Page 5: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 5

Harmonia:Language-aware Editing Programming by Voice

– Code dictation– Voice-based editing commands

Program Transformations– Transformation actions– Pattern-matching constructs

Each kind of input stream ambiguity requires new language analyses

Human Speech

EmbeddedLanguages

Page 6: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 6

Speech Example

for (int i = 0; i < 10; i++ ) {

}

for int i equals zero i less than ten i plus plus

Page 7: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 7

Ambiguities

for (int i = 0; i < 10; i++ ) {

}

4 int eye equals 0 aye less then 10 i plus plus

Page 8: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 8

Ambiguities

for (int i = 0; i < 10; i++ ) {

}

4 int eye equals 0 aye less then 10 i plus plus

KW or #?

ID Spelling?KW or ID?

Page 9: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 9

Another Utterance

for times ate equals zero two plus equals one

Page 10: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 10

Many Valid Parses!

4 * 8 = zero; to += won

for times ate equals zero two plus equals one

for (times; ate == 0; to += 1) {

}

fore.times(8).equalsZero(2, plus == 1)

Page 11: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 11

Embedded Language Example

C and Regexps embedded in Flex

Flex Rule for Identifiers

[_a-zA-Z]([_a-zA-Z0-9])* i++; RETURN_TOKEN(ID);

Page 12: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 12

Embedded Language Example

C and Regexps embedded in Flex

Flex Rule for Identifiers

[_a-zA-Z]([_a-zA-Z0-9])* i++; RETURN_TOKEN(ID);

Why not this interpretation?

[_a-zA-Z]([_a-zA-Z0-9])* i++; RETURN_TOKEN(ID);

Page 13: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 13

Legacy Language Example

Fortran

DO 57 I = 3,10

Page 14: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 14

Legacy Language Example

Fortran

• Do Loop

DO 57 I = 3,10

Page 15: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 15

Legacy Language Example

Fortran

• Do Loop

DO 57 I = 3,10

DO 57 I = 3

Page 16: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 16

Legacy Language Example

Fortran

• Do Loop

DO 57 I = 3,10• Assignment

DO 57 I = 3

Page 17: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 17

Legacy Language Example

Fortran

• Do Loop

DO 57 I = 3,10• Assignment

DO57I = 3

Page 18: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 18

Legacy Language Example

PL/I

• Non-reserved Keywords

IF IF = THEN THEN THEN = ELSE ELSE ELSE = ENDEND

Page 19: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 19

Legacy Language Example

PL/I

• Non-reserved Keywords

IF IF = THEN THEN THEN = ELSE ELSE ELSE = ENDEND

KW

ID

ID

ID

Page 20: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 20

Single SpellingMultiple Spellings

Single Lexical Category

UnambiguousHomophone IDs

Lexical misspellings

Multiple Lexical Categories

Non-reserved keywords

Ambiguous interpretations

Homophones

Input Stream Classification

Page 21: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 21

Single SpellingMultiple Spellings

Single Lexical Category

UnambiguousHomophone IDs

Lexical misspellings

Multiple Lexical Categories

Non-reserved keywords

Ambiguous interpretations

Homophones

Input Stream Classification

Embedded Languages Fall in all Four Categories!

Page 22: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 22

GLR Analysis Architecture

LexerGLR

ParserSemantics

FOR I

FOR I

for (i = 0; i < 10; i++ ) {

}

(

Page 23: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 23

GLR Analysis Architecture

LexerGLR

ParserSemantics

FOR I

FOR I

for (i = 0; i < 10; i++ ) {

}

(

Handles syntactic ambiguities

Page 24: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 24

Our Contribution:XGLR Analysis Architecture

LexerXGLRParser

Semantics

for i equals zero ...

FOR I

FOR I

Page 25: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 25

Our Contribution:XGLR Analysis Architecture

LexerXGLRParser

Semantics

for i equals zero ...

FOR I4 EYE

FOR I

Handles input stream ambiguities

Page 26: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 26

LR Parsing

ID KW #

1 S2 S3 Err

2 R1 S4 Err

3 S9 R3 S7

IID

FORKW

=KW1

Parse Table

0#

Input StreamParse Stack

Page 27: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 27

LR Parsing

ID KW #

1 S2 S3 Err

2 R1 S4 Err

3 S9 R3 S7

IID

FORKW

=KW1

Parse Table

0#

Input StreamParse Stack

Page 28: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 28

LR Parsing

ID KW #

1 S2 S3 Err

2 R1 S4 Err

3 S9 R3 S7

IID

=KW1

Parse Table

0#

Input StreamParse Stack

FORKW

3

Page 29: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 29

GLR Parsing

ID KW #

1 S2S3

R5Err

2R1

R2S4 Err

3 S9 R3 S7

IID

FORKW

=KW

1 Parse Table

0#

Input StreamParse Stack

Page 30: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 30

GLR Parsing

ID KW #

1 S2S3

R5Err

2R1

R2S4 Err

3 S9 R3 S7

IID

FORKW

=KW

1 Parse Table

0#

Input StreamParse Stack

Page 31: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 31

GLR Parsing

ID KW #

1 S2S3

R5Err

2R1

R2S4 Err

3 S9 R3 S7

IID

FORKW

=KW

1 Parse Table

0#

Input StreamParse Stack

25

Page 32: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 32

GLR Parsing

ID KW #

1 S2S3

R5Err

2R1

R2S4 Err

3 S9 R3 S7

IID

=KW

1 Parse Table

0#

Input StreamParse Stack

2

FORKW

3

FORKW

45

Page 33: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 33

Single SpellingMultiple Spellings

Single Lexical Category

Not Shown Example 1

Multiple Lexical Categories

Example 2 Example 1

XGLR in Action

Page 34: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 34

23 FOR

Parsing Homophones

BAR

Page 35: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 35

23 FOR

FORE

4

ID

KW

NUM

XGLR Extension: Multiple Spellings, Single and Multiple Lexical Categories

BAR

FOUR

Page 36: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 36

23 FOR

FORE

4

23

23

ID

KW

NUM

XGLR Extension: Parsers fork due to input ambiguity

BAR

FOUR

Page 37: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 37

23

26

FOR

FORE

4

29

35

23

23

ID

KW

NUM

Each parser shifts its now unambiguous input

BAR

FOUR

Page 38: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 38

23

26

FOR

FORE

4

29

35

BAR

23

23

ID

IDKW

NUM

The next input is lexed unambiguously

FOUR

Page 39: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 39

23

26

FOR

FORE

4

29

35

BAR

23

23

42

49ID

IDKW

NUM

ID is only a valid lookahead for two parsers

FOUR

Page 40: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 40

Parsing Embedded LanguagesExample BNF Grammar

Contains Languages L and W

bL loopL dW ENDL

loopL LOOPL |

dW WHILEW NUMW doW

doW DOW |

L

W

Page 41: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 41

Parsing Embedded LanguagesExample BNF GrammarContains Languages L and W

bL loopL dW ENDL

loopL LOOPL |

dW WHILEW NUMW doW

doW DOW |

LOOP WHILE 34 END WHILE 56 DO END

L

W

Page 42: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 42

Page 43: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 43

Page 44: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 44

Page 45: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 45

Page 46: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 46

S 0

Parsing Embedded Languages

LOOP WHILE 34

Page 47: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 47

S 0 LOOP WHILE 34

Current parse state has ambiguous lexical language

Page 48: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 48

S

0

0

W

L

LOOP WHILE 34

XGLR Extension: Fork parsers, assign one to each lexical language

Page 49: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 49

S

0

0 LOOP

LOOPW

L

W

L

KW

ID

WHILE 34

XGLR Extension: Single spelling, Multiple lexical categoriesLex lookahead both in language L and W

Page 50: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 50

S

4L

0

0 LOOP

LOOPW

L

W

L

KW

ID

WHILE 34

Only LOOPL is valid lookahead, and is shifted

Page 51: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 51

S

4W

0

0 LOOP

LOOPW

L

W

L

KW

ID

WHILE 34

XGLR Extension: State 4 has lexer lookaheads only in language W

Page 52: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 52

S

4W

0

0 LOOP

LOOPW

L

W

L

KW

ID

WHILE

34KW

W

Lex lookahead in language W

Page 53: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 53

S

4W

0

0 LOOP

LOOPW

L

W

L

KW

ID

WHILE

34KW

W

1W

REDUCE by rule 2 and GOTO state 1

loopL

Page 54: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 54

S

4W

0

0 LOOP

LOOPW

L

W

L

KW

ID

WHILE

34

KW

W1

W

loopL

Page 55: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 55

S

4W

0

0 LOOP

LOOPW

L

W

L

KW

ID

WHILE

34

KW

W1

W2

W

Shift into state 2

loopL

Page 56: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 56

S

4W

0

0 LOOP

LOOPW

L

W

L

KW

ID

WHILE

34

KW

W1

W2

W

W

NUM

XGLR Extension: Lex lookahead in language W

loopL

Page 57: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 57

S

4W

0

0 LOOP

LOOPW

L

W

L

KW

ID

WHILE 34KW

W1

W2

W W

NUM

loopL

Page 58: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 58

4W

0

0 LOOP

LOOPW

L

W

L

KW

ID

WHILE 34KW

W1

W2

W W

NUM3

W

Shift into state 3

S

loopL

Page 59: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 59

4W

0

0 LOOP

LOOPW

L

W

L

KW

ID

WHILE 34KW

W1

W2

W W

NUM3

W

Shift into state 3, which has ambiguous lexical language

S

loopL

Page 60: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 60

4W

0

0 LOOP

LOOPW

L

W

L

KW

ID

WHILE 34KW

W1

W2

W W

NUM3

W

3L

XGLR Extension: Single spelling, Multiple lexical categoriesFork parsers, assign one to each lexical language

S

loopL

Page 61: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 61

GLR Ambiguity Support

1. Fork parser on shift-reduce conflict

2. Fork parser on reduce-reduce conflict

Page 62: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 62

XGLR Ambiguity Support

1. Fork parser on shift-reduce conflict

2. Fork parser on reduce-reduce conflict

Page 63: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 63

XGLR Ambiguity Support

1. Fork parser on shift-reduce conflict

2. Fork parser on reduce-reduce conflict

3. Fork parsers on ambiguous lexical language Single spelling, Multiple lexical categories

4. Fork parsers on ambiguous lexical lookahead Single/Multiple Spellings, Multiple lexical

categories Shift-shift conflict resolution

Page 64: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 64

XGLR Ambiguities

Many GLR programming language specs have finite, few ambiguities

XGLR language specs also have finite, but slightly more, ambiguities – Lexical ambiguity due to ambiguous input does

result in more ambiguous parse forests

Page 65: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 65

XGLR Ambiguities

Many GLR programming language specs have finite, few ambiguities

XGLR language specs also have finite, but slightly more, ambiguities– Lexical ambiguity due to ambiguous input does

result in more ambiguous parse forests

Ambiguity causes parsers to fork GLR maintains efficiency by merging parsers

when ambiguity is over

Page 66: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 66

Parser Merging

GLR: Parsers merge when in same parse state

DOKW

8

31

DOKW

5

57#

5

Page 67: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 67

Parser Merging

GLR: Parsers merge when in same parse state

DOKW

8

31

DOKW

5 57#

4

5

57#

Page 68: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 68

Parser Merging

XGLR: Parsers merge when in same parse state and same lexical state

DOKW

8

31

DOKW

5

57#

A

A A

A A

W

5A

Page 69: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 69

Parser Merging

XGLR: Parsers merge when in same parse state and same lexical state

DOKW

8

31

DOKW

5

A

A A

A A

W

5A

57#

57#

A

W

Page 70: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 70

Parser Merging

XGLR: Parsers merge when in same parse state and same lexical state

DOKW

8

31

DOKW

5

A

A A

A A

W

5A

57#

57#

A

W4

W

Page 71: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 71

Parser Merging

XGLR: Parsers merge when in same parse state and same lexical state

DOKW

8

31

DOKW

5

A

A A

A A

W

5A

57#

57#

A

W4

A

Page 72: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 72

Parser Merging

XGLR: Parsers merge when in same parse state and same lexical state

DOKW

8

31

DOKW

5

A

A A

A A

W

5A

57#

57#

A

W4

A

Page 73: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 73

Out of Sync Parsers

DO57I=3

XGLR: Parsers merge when in same parse state and same lexical state and same input position

8

1A

W

5A

Page 74: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 74

Out of Sync Parsers

57I=3

XGLR: Parsers merge when in same parse state and same lexical state and same input position

8

1A

W

5A

DOKW

A

DO57IID

W=3

Page 75: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 75

Out of Sync Parsers

57I=3

XGLR: Parsers merge when in same parse state and same lexical state and same input position

8

1A

W

5A

DOKW

A

DO57IID

W=3

3A

5W

Page 76: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 76

Out of Sync Parsers

I=3

XGLR: Parsers merge when in same parse state and same lexical state and same input position

8

1A

W

5A

DOKW

A

DO57IID

W3

3A

5W

=KW

W

57ID

A

Page 77: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 77

Out of Sync Parsers

I=3

XGLR: Parsers merge when in same parse state and same lexical state and same input position

8

1A

W

5A

DOKW

A

DO57IID

W3

3A

5W

=KW

W

57ID

A

6W

4A

Page 78: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 78

Out of Sync Parsers

=3

XGLR: Parsers merge when in same parse state and same lexical state and same input position

8

1A

W

5A

DOKW

A

DO57IID

W3

3A

5W

=KW

W

57ID

A

6W

4A

#

W

IID

A

Page 79: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 79

Out of Sync Parsers

=3

XGLR: Parsers merge when in same parse state and same lexical state and same input position

8

1A

W

5A

DOKW

A

DO57IID

W3

3A

5W

=KW

W

57ID

A

6W

4A

#

W

IID

A

9W

Page 80: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 80

Out of Sync Parsers

=3

XGLR: Parsers merge when in same parse state and same lexical state and same input position

8

1A

W

5A

DOKW

A

DO57IID

W3

3A

5W

=KW

W

57ID

A

6W

4A

#

W

IID

A

9W

Page 81: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 81

Implementation

Keep map: lookahead parser to use when looking for parsers to merge with

Sort parsers by position of lookahead in the input – Enables pruning of map as parsers move past a

particular input location– Extra memory required is bounded by dynamic

separation between first and last parsers

Page 82: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 82

Related Work GLR Parsing Algorithm

– Tomita [1985]– Farshi [1991]– Rekers [1992]– Johnstone et. al. [2002]

Incremental GLR– Wagner [1997]

GLR Implementations(that I heard of before today)

– ASF+SDF [1993]– Elkhound [2004]– Bison [2003]– DParser [2002]– Aycock and Horspool [1999]

Scannerless Parsing(or Context-Free Scanning)

– Salomon and Cormack [1989]– Visser [1997]

van den Brand [2002] Ambiguous Input Streams

– Aycock and Horspool [2001] Embedded Languages

– ASF+SDF [1997]– Van de Vanter and

Boshernitsan(CodeProcessor) [2000]

Page 83: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 83

Future Work

Semantic Analysis of Embedded Languages

Automated Semantic Disambiguation

Page 84: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 84

Contributions

1. Generalized GLR to handle input stream ambiguities

2. Classified input stream ambiguities into four categories

3. Implemented XGLR algorithm in Harmonia framework

4. Constructed combined lexer and parser generator to support embedded languages and lexical ambiguities at each stage of analysis

5. Enabled analysis of embedded languages, programming by voice, and legacy languages

Page 85: Tools and Analyses for Ambiguous Input Streams

April 3, 2004 LDTA 2004 85