
Grammar Induction

So what did we have?

[Figure: a graph with BEGIN and END markers, words as vertices (nodes 101-104, edges labeled (1)-(7)), and a legend for node and edge. The example sentences "Is that a dog?", "Is that a cat?", "Where is the dog?", and "And is that a horse?" are drawn as paths through the shared vertices.]

The Model: Graph representation with words as vertices and sentences as paths.
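For concreteness, here is a toy sketch of that representation (illustrative class and method names, not the ADIOS implementation described later): every distinct word becomes a vertex, every sentence a path from BEGIN to END, and sub-path counts then come for free.

```java
import java.util.*;

// Illustrative sketch only: words as vertices, sentences as paths (BEGIN ... END).
public class WordGraph {
    private final Set<String> vertices = new LinkedHashSet<>();
    private final List<List<String>> paths = new ArrayList<>();

    // Add one sentence as a path; every new word becomes a vertex.
    public void addSentence(String sentence) {
        List<String> path = new ArrayList<>();
        path.add("BEGIN");
        for (String word : sentence.toLowerCase().split("\\s+")) {
            vertices.add(word);
            path.add(word);
        }
        path.add("END");
        paths.add(path);
    }

    // Sub-paths are aligned for free: count how often a word sequence occurs across all paths.
    public int countSubPath(List<String> subPath) {
        int count = 0;
        for (List<String> path : paths)
            for (int i = 0; i + subPath.size() <= path.size(); i++)
                if (path.subList(i, i + subPath.size()).equals(subPath)) count++;
        return count;
    }

    public static void main(String[] args) {
        WordGraph g = new WordGraph();
        g.addSentence("is that a dog ?");
        g.addSentence("is that a cat ?");
        g.addSentence("where is the dog ?");
        g.addSentence("and is that a horse ?");
        System.out.println(g.countSubPath(Arrays.asList("is", "that", "a"))); // 3
    }
}
```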

Detecting significant patterns

Identifying patterns becomes easier on a graph: sub-paths are automatically aligned.

[Figure: a search path (begin, 1, 2, ..., end) traversing vertices shared with other corpus paths, with candidate edges e1-e6. Right-moving and left-moving probabilities along the path include PR(e1) = 4/41, PR(e2|e1) = 3/4, PR(e3|e1e2) = 1, PR(e4|e1e2e3) = 1, PR(e5|e1e2e3e4) = 1/3, and PL(e4) = 6/41, PL(e3|e4) = 5/6, PL(e2|e3e4) = 1, PL(e1|e2e3e4) = 3/5. A sharp drop in PR on the right and in PL on the left (marked SR and SL) bounds a significant pattern.]

Motif EXtraction
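To make the probabilities in the figure concrete, here is a toy estimate of the right-moving probability (illustrative code and corpus, not the actual MarkovMatrix implementation): P_R(next | prefix) is estimated as the number of path occurrences of prefix+next divided by the number of occurrences of prefix.

```java
import java.util.*;

// Toy estimate of MEX's right-moving probability P_R(next | prefix):
// occurrences of prefix+next divided by occurrences of prefix, over all corpus paths.
public class RightProbability {
    static double pR(List<List<String>> paths, List<String> prefix, String next) {
        List<String> extended = new ArrayList<>(prefix);
        extended.add(next);
        int m = countOccurrences(paths, prefix);
        int r = countOccurrences(paths, extended);
        return m == 0 ? 0.0 : (double) r / m;
    }

    static int countOccurrences(List<List<String>> paths, List<String> sub) {
        int count = 0;
        for (List<String> p : paths)
            for (int i = 0; i + sub.size() <= p.size(); i++)
                if (p.subList(i, i + sub.size()).equals(sub)) count++;
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> paths = Arrays.asList(
            Arrays.asList("begin", "is", "that", "a", "dog", "end"),
            Arrays.asList("begin", "is", "that", "a", "cat", "end"),
            Arrays.asList("begin", "where", "is", "the", "dog", "end"),
            Arrays.asList("begin", "and", "is", "that", "a", "horse", "end"));
        // 4 paths traverse "is"; 3 of them continue with "that":
        System.out.println(pR(paths, Arrays.asList("is"), "that")); // 0.75
    }
}
```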

Pattern significance

Say we found a potential pattern-edge at node n+1, at the end of a sub-path running from node 1 to node n. Define:
m – the number of paths from 1 to n
r – the number of paths from 1 to n+1
Because it is a pattern edge, we know that P_{n+1} = r/m < P_n.
Let's suppose that the true probability of n+1 given 1 through n is P*_{n+1}.
r/m is our best estimate of P*_{n+1}, but just an estimate. What are the odds of getting these r and m but still having P*_{n+1} ≥ P_n?

Pattern significance

Assume P*_{n+1} ≥ P_n. The odds of getting results r and m or better are then given by

B(r, m, P_n) = Σ_{i=0}^{r} C(m, i) · P_n^i · (1 − P_n)^(m−i)

If this is smaller than a predetermined α, we say the pattern-edge candidate is significant.
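A direct translation of this test into code (a sketch, not taken from the ADIOS sources): compute the binomial tail B(r, m, P_n) and compare it to α.

```java
// Sketch of the pattern-edge significance test: the probability of seeing r or
// fewer continuations out of m trials if the true probability were still at
// least P_n (the binomial tail B(r, m, P_n)), compared against alpha.
public class PatternSignificance {

    static boolean isSignificant(int r, int m, double pN, double alpha) {
        return binomialTail(r, m, pN) < alpha;
    }

    // B(r, m, p) = sum_{i=0}^{r} C(m, i) p^i (1-p)^(m-i)
    static double binomialTail(int r, int m, double p) {
        double sum = 0.0;
        for (int i = 0; i <= r; i++)
            sum += Math.exp(logChoose(m, i) + i * Math.log(p) + (m - i) * Math.log(1 - p));
        return sum;
    }

    // log C(n, k) via summed logarithms, to stay numerically stable for larger m
    static double logChoose(int n, int k) {
        return logFactorial(n) - logFactorial(k) - logFactorial(n - k);
    }

    static double logFactorial(int n) {
        double s = 0.0;
        for (int i = 2; i <= n; i++) s += Math.log(i);
        return s;
    }

    public static void main(String[] args) {
        // e.g. m = 40 paths reach node n, only r = 10 continue, and P_n = 0.65
        System.out.println(isSignificant(10, 40, 0.65, 0.01)); // true: the drop is significant
    }
}
```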

[Figure: rewiring. The sub-path covered by the significant pattern P1 (edges e2, e3, e4 along the search path) is collapsed into a single new vertex, and the corpus paths that traverse it are redirected through the new vertex.]

Rewiring the graph

Once a pattern is identified as significant, the sub-paths it subsumes are merged into a new vertex and the graph is rewired accordingly. Repeating this process leads to the formation of complex, hierarchically structured patterns.

Evaluating performance

Define:
Recall – the probability of ADIOS recognizing an unseen grammatical sentence
Precision – the proportion of ADIOS productions that are grammatical

Recall can be assessed by leaving out some of the training corpus.
Precision is trickier, unless we're learning a known CFG.
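A minimal sketch of the held-out recall estimate, assuming a hypothetical Learner with an accepts(sentence) predicate:

```java
import java.util.List;

// Sketch of recall estimation on held-out sentences; Learner is a hypothetical
// stand-in for a trained ADIOS instance.
public class RecallEstimate {
    interface Learner { boolean accepts(String sentence); }

    // Recall ~ fraction of unseen grammatical sentences the learner recognizes.
    static double recall(Learner learner, List<String> heldOut) {
        int accepted = 0;
        for (String s : heldOut)
            if (learner.accepts(s)) accepted++;
        return heldOut.isEmpty() ? 0.0 : (double) accepted / heldOut.size();
    }

    public static void main(String[] args) {
        Learner toy = s -> s.startsWith("is that a");   // toy learner for demonstration
        System.out.println(recall(toy, List.of("is that a dog ?", "where is the dog ?"))); // 0.5
    }
}
```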

Determining L

Involves a tradeoff:
A larger L will demand more context sensitivity in the inference, which will hamper generalization.
A smaller L will detect more patterns, but many might be spurious.

The effects of context window width

[Figure: precision vs. recall for learners trained with L = 3, 4, 5, 6 on corpora of 10,000, 40,000, and 120,000 sentences (several panels). The high-recall, low-precision region is labeled over-generalization; the high-precision, low-recall region is labeled low productivity.]

An ADIOS drawback

ADIOS is inherently a heuristic and greedy algorithm. Once a pattern is created it remains forever, so errors compound. Sentence ordering affects the outcome: running ADIOS with different orderings gives patterns that 'cover' different parts of the grammar.

An ad-hoc solution

Train multiple learners on the corpus, each on a different sentence ordering, creating a 'forest' of learners.
To create a new sentence: pick one learner at random and use it to produce the sentence.
To check the grammaticality of a given sentence: if any learner accepts the sentence, declare it grammatical.
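A sketch of this learner forest (Learner is a hypothetical interface standing in for a trained ADIOS instance):

```java
import java.util.*;

// Sketch of the 'forest of learners': each learner is assumed to have been
// trained on a different ordering of the corpus.
public class LearnerForest {
    interface Learner {
        String generateSentence();
        boolean accepts(String sentence);
    }

    private final List<Learner> learners;
    private final Random random = new Random();

    LearnerForest(List<Learner> learners) { this.learners = learners; }

    // To create a new sentence: pick one learner at random and let it generate.
    String generateSentence() {
        return learners.get(random.nextInt(learners.size())).generateSentence();
    }

    // A sentence is declared grammatical if any learner accepts it.
    boolean isGrammatical(String sentence) {
        for (Learner l : learners)
            if (l.accepts(sentence)) return true;
        return false;
    }

    public static void main(String[] args) {
        Learner l1 = new Learner() {  // toy learners standing in for trained instances
            public String generateSentence() { return "is that a dog ?"; }
            public boolean accepts(String s) { return s.startsWith("is that"); }
        };
        Learner l2 = new Learner() {
            public String generateSentence() { return "where is the dog ?"; }
            public boolean accepts(String s) { return s.startsWith("where is"); }
        };
        LearnerForest forest = new LearnerForest(Arrays.asList(l1, l2));
        System.out.println(forest.generateSentence());
        System.out.println(forest.isGrammatical("where is the cat ?")); // true: l2 accepts it
    }
}
```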

The ATIS experiments

ATIS-NL is a 13,043-sentence corpus of natural language: transcribed phone calls to an airline reservation service.
ADIOS was trained on 12,700 sentences of ATIS-NL; the remaining 343 sentences were used to assess recall.
Precision was determined with the help of 8 graduate students from Cornell University.

The ATIS experiments

ADIOS' performance scores (40 learners): Recall – 40%, Precision – 70%
For comparison, ATIS-CFG reached: Recall – 45%, Precision – <1% (!)

ADIOS/ATIS-N comparison

[Figure: bar chart comparing precision for ADIOS and ATIS-N.]

Meta-analysis of ADIOS results

Define a pattern spectrum as the histogram of pattern types for an individual learner. A pattern type is determined by its contents, e.g. TT, TET, EE, PE…
A single ADIOS learner was trained with each of 6 translations of the Bible.
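As a sketch, a pattern spectrum can be computed by counting each learned pattern's type string and normalizing (illustrative code; the type strings follow the T/E/P naming above):

```java
import java.util.*;

// Sketch of a pattern spectrum: the normalized histogram of pattern types
// (type strings such as "TT", "TET", "PE") produced by a single learner.
public class PatternSpectrum {
    static Map<String, Double> spectrum(List<String> patternTypes) {
        Map<String, Double> histogram = new TreeMap<>();
        for (String type : patternTypes)
            histogram.merge(type, 1.0, Double::sum);
        for (Map.Entry<String, Double> e : histogram.entrySet())
            e.setValue(e.getValue() / patternTypes.size());
        return histogram;
    }

    public static void main(String[] args) {
        // Each learned pattern is reduced to the sequence of its constituents:
        // T = terminal, E = equivalence class, P = pattern.
        List<String> types = Arrays.asList("TT", "TET", "TT", "PE", "EE", "TT");
        System.out.println(spectrum(types)); // {EE=0.166..., PE=0.166..., TET=0.166..., TT=0.5}
    }
}
```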

Pattern spectra

[Figure: pattern spectra for the six Bible translations (English, Spanish, Swedish, Chinese, Danish, French): histograms over pattern types such as TT, TE, TP, ET, EE, EP, PT, PE, PP, TTT, ..., together with performance curves; panels A-E.]

Language dendrogram

[Figure: dendrogram clustering the six languages (Chinese, Spanish, French, English, Swedish, Danish) by the similarity of their pattern spectra; panels A-E.]

So why doesn’t it work?

Our experience: ADIOS does nicely on ATIS-N, CHILDES, and artificial CFGs.
It fails miserably on almost anything else: the Wall Street Journal, children's literature, the Bible.

Results

CHILDES – very high recall + precision; the ESL test
ATIS-N – up to 70% recall (with 700 learners); a superior language model
Children's lit – very few patterns are detected

Some example sentences

CHILDES:
baby go ing to go up the ladder ?
the dog won 't sit in the chaise lounge .
take the lady for a ride

ATIS-N:
i would like one coach reservation for may ninth from pittsburgh to atlanta leaving pittsburgh before ten o'clock in the morning
where is the stopover of american airlines flight five four five nine
what are the flights from boston to washington on october fifteenth nineteen ninety one

Some example sentences

Children's lit:
The Tin Woodman and the Scarecrow didn ' t mind the dark at all , but Woot the Wanderer felt worried to be left in this strange place in this strange manner , without being able to see any danger that might threaten .
I know that some of you have been waiting for this story of the Tin Woodman , because many of my correspondents have asked me , time and again what ever became of the " pretty Munchkin girl " whom Nick Chopper was engaged to marry before the Wicked Witch enchanted his axe and he traded his flesh for tin .

Some corpus statistics

Corpus           Word types   #Sentences   Avg. sentence length
CHILDES          14,401       320,000      ~6 words
ATIS-N           1,153        12,700       ~10 words
Children's lit   52,180       41,129       ~52 words

Possible causes for failure I

Sentence complexity and structural diversity: CHILDES and ATIS-N have very few sentence 'types', most of which are simple, single-clause sentences. Children's lit has many complex sentences with multiple clauses.

Types of complex sentences

Complement clauses:
Peter promised that he would come
Sue wants Peter to leave

Relative clauses:
Sally bought the bike that was on sale
Is that the driver causing the accidents?

Adverbial clauses:
He arrived when Mary was just about to leave
She left the door open to hear the baby

Coordinate clauses:
He tried hard, but he failed

That example again

I know that some of you have been waiting for this story of the Tin Woodman , because many of my correspondents have asked me , time and again what ever became of the " pretty Munchkin girl " whom Nick Chopper was engaged to marry before the Wicked Witch enchanted his axe and he traded his flesh for tin .

Possible causes for failure

Sentence complexity and structural diversity: CHILDES and ATIS-N have very few sentence 'types', most of which are simple, single-clause sentences. Children's lit has many complex sentences with multiple clauses.

The music lesson

Possible remedies

How do children do it? Incremental learning – on the importance of starting small.
How might we mimic that? Sorting sentences according to complexity; starting out with a simpler corpus.
This raises the problem of the growing lexicon.

Generalizing patterns

New sentence: I like the cow

P1: I like the _E1
_E1 = {dog, cat, horse}

P1: I like the _E1
_E1 = {dog, cat, horse, cow}

May cause overgeneralization:

P1: I like the _E1
_E1 = {dog, cat, horse}

New sentence: I like the finer things in life

P1: I like the _E1
_E1 = {dog, cat, horse, finer}
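A toy sketch of this generalization step (illustrative names; the real algorithm works on the graph, not on token lists): any word aligned with the _E1 slot after the context "I like the" is absorbed into the class, which is exactly how "finer" sneaks in.

```java
import java.util.*;

// Toy sketch of equivalence-class generalization: the slot _E1 sits after a
// context window ("i like the"), and any word aligned with the slot is added
// to the class. The real implementation operates on the graph, not strings.
public class Generalize {
    static final List<String> CONTEXT = Arrays.asList("i", "like", "the"); // words before the slot
    static final Set<String> E1 = new LinkedHashSet<>(Arrays.asList("dog", "cat", "horse"));

    // If the sentence starts with the slot's context, absorb the next word into _E1.
    static void generalize(List<String> sentence) {
        if (sentence.size() <= CONTEXT.size()) return;
        if (!sentence.subList(0, CONTEXT.size()).equals(CONTEXT)) return;
        E1.add(sentence.get(CONTEXT.size()));
    }

    public static void main(String[] args) {
        generalize(Arrays.asList("i", "like", "the", "cow"));                           // adds "cow"
        generalize(Arrays.asList("i", "like", "the", "finer", "things", "in", "life")); // adds "finer" (!)
        System.out.println(E1); // [dog, cat, horse, cow, finer] -- the overgeneralization above
    }
}
```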

Allowing gaps

New sentence: I like the red dog

P1: I like the _E1
_E1 = {dog, cat, horse}

P2: I like the red _E1
_E1 = {dog, cat, horse}

Another approach: two-phase learning

Split complex sentences into simple clauses. Learn the simple clauses. Combine the results back into complex sentences and resume learning.
Sidesteps the problem of the growing lexicon, but introduces the problem of identifying clause boundaries.

That example again

I know that some of you have been waiting for this story of the Tin Woodman , because many of my correspondents have asked me , time and again what ever became of the " pretty Munchkin girl " whom Nick Chopper was engaged to marry before the Wicked Witch enchanted his axe and he traded his flesh for tin .

Possible causes for failure II

Sentence complexity and structural diversity

Lexicon size vs. #sentences: a large lexicon might curtail the alignments necessary for generalization.

Possible remedies

How do children do it? They have access to semantic information, which may be used for alignment.
How can we mimic it? Introducing pre-existing ECs (WordNet, distributional clustering); semantic tagging?

An aside - bootstrapping

Used for very small corpora. Iteratively:
Train a set of learners on the current corpus
Generate sentences
Replace the corpus with the generated sentences

Problematic for large corpora – there it must be performed by transforming the existing sentences.
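A sketch of that loop, assuming a hypothetical Learner that can be retrained and can generate sentences:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the bootstrapping loop for very small corpora. Learner is a
// hypothetical interface standing in for a trainable ADIOS instance.
public class Bootstrap {
    interface Learner {
        void train(List<String> corpus);
        String generateSentence();
    }

    static List<String> bootstrap(List<String> corpus, List<Learner> learners,
                                  int iterations, int sentencesPerIteration) {
        for (int it = 0; it < iterations; it++) {
            // Train the set of learners on the current corpus.
            for (Learner l : learners) l.train(corpus);
            // Generate a new corpus from the trained learners...
            List<String> generated = new ArrayList<>();
            for (int i = 0; i < sentencesPerIteration; i++)
                generated.add(learners.get(i % learners.size()).generateSentence());
            // ...and replace the old corpus with it before the next round.
            corpus = generated;
        }
        return corpus;
    }
}
```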

A word about the code

A little on Java classes: similar to a struct in C, but also allowing the definition of class-specific functions. Data members may be:
Private – only accessible to class functions
Public – accessible to everyone
Protected – like private, for most of our purposes
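A tiny illustration of those access levels:

```java
// Tiny illustration of the access levels mentioned above.
public class Example {
    private int hidden;      // only code inside Example can touch this
    public int visible;      // any code can touch this
    protected int inherited; // Example, its subclasses, and same-package code can touch this

    // A class-specific function (method); it may freely use the private member.
    public int doubledHidden() {
        return 2 * hidden;
    }
}
```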

The code

Consists of three packages:
Com.ADIOS.Model – contains the classes defining the graph (graph.java, node.java, edge.java, etc.)
Com.ADIOS.Algorithm – the 'brains' of the implementation (most importantly contains MarkovMatrix.java and Trainer.java)
Com.ADIOS.Helpers – various helper classes

The model

Node, EquivalenceClass, Pattern
Edge, Path, Graph

The algorithm

Trainer
MarkovMatrix – also finds new equivalence classes
Generator – calculates recall and generates new sentences

The main package

Main – processes command line arguments (context window width, corpus file name, etc.)
Finals – a repository of constants used throughout the code

The Model – Node.java

Data members: label, inEdges, outEdges
Nontrivial functions:
getOutEdges(Vector inEdges) – returns the edges going out of this node that come from inEdges
getInEdges(Vector outEdges) – the same, only in the other direction
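A rough sketch of what such a Node could look like (field and method names follow the slide; the bodies are guesswork, not the actual source):

```java
import java.util.Vector;

// Rough sketch of Node along the lines of the slide: a label plus incoming and
// outgoing edges, with a filtered accessor. Method bodies are guesswork.
public class Node {
    protected String label;
    protected Vector<Edge> inEdges = new Vector<>();
    protected Vector<Edge> outEdges = new Vector<>();

    // Returns the outgoing edges of this node that continue the paths arriving
    // on the given incoming edges (each edge knows its successor on its path).
    public Vector<Edge> getOutEdges(Vector<Edge> incoming) {
        Vector<Edge> result = new Vector<>();
        for (Edge in : incoming) {
            Edge next = in.nextEdge;
            if (next != null && outEdges.contains(next)) result.add(next);
        }
        return result;
    }
}

// Minimal Edge stub so the sketch compiles; the slides list more members.
class Edge {
    Node fromNode, toNode;
    Edge prevEdge, nextEdge;
}
```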

The model – EquivalenceClass.java

Inherits from Node.
Additional data members: nodes
Nontrivial functions: getOutEdges(), getOutEdges(Vector inEdges) – same as in Node, only summed over all constituent nodes

The model – Pattern.java

Inherits from Node.
Additional data members: id, path (the pattern specification)

The model – Path.java

Data members: id, nodes
Nontrivial functions:
Init(StringTokenizer st) – inits the path according to a line of text
Squeeze(Pattern p, int, int) – finds the instances of p in the path and replaces them by the single node p; does not rewire the graph!
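A sketch of what Squeeze might do, using plain strings instead of Node objects (guesswork around the slide's description, not the actual source):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the squeeze operation: find the occurrences of a pattern's node
// sequence inside a path and replace each one by the single pattern node.
// The graph's edges are not touched here (rewiring happens elsewhere).
public class SqueezeSketch {
    static List<String> squeeze(List<String> pathNodes, List<String> patternNodes, String patternNode) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < pathNodes.size()) {
            if (i + patternNodes.size() <= pathNodes.size()
                    && pathNodes.subList(i, i + patternNodes.size()).equals(patternNodes)) {
                result.add(patternNode);          // collapse the matched sub-path
                i += patternNodes.size();
            } else {
                result.add(pathNodes.get(i));     // keep this node as-is
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> path = List.of("BEGIN", "is", "that", "a", "dog", "?", "END");
        System.out.println(squeeze(path, List.of("is", "that", "a"), "P1"));
        // [BEGIN, P1, dog, ?, END]
    }
}
```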

The model – Edge.java

Data members: fromNode, toNode, prevEdge, nextEdge, path
No nontrivial functions.

The model – Graph.java

Main data members: nodes, edges, paths, equivalenceClasses, patterns
Nontrivial functions:
addPattern(Pattern p) – rewires the graph
Print functions – print various data to files

The algorithm – MarkovMatrix.java

Main data members: path, matrix, pathsCountMatrix, winSize, winIndex, wildcardIndex, ec
Nontrivial functions:
findWildcardCandidate() – generates the new equivalence class in the wildcard position
initMarkovMatrix() – calculates the matrix

The algorithm – Trainer.java

Main data members: leftCandidates, rightCandidates, patterns
Nontrivial functions:
trainSinglePath – runs MEX on a single (possibly generalized) path
createTrialPathMatrix – finds the subpaths that go through the context window
alignment – generalizes the search path and searches for patterns
getPatterns – intersects the candidates

The algorithm – Generator.java

Main functions:
getRecall – gets a file name and returns how many sentences were accepted by the current grammar
generatePathsFile – creates new sentences using the current grammar and stores them in a file

Helpers – Parser.java

Main function:
loadCorpusFile – loads the corpus into the graph, initializing everything

com.ADIOS

Finals.java – contains important constants used throughout the code
Main.java – processes command line arguments and calls the appropriate functions of the appropriate objects

The implementation’s output