Learning Deterministically Recognizable Tree Series — Revisited

Andreas Maletti

TU Dresden, 22 May 2007

00 Motivation

Goal
Given ψ: T_Σ → A with (A, +, ·, 0, 1) a semifield:

• Learn a finite representation (here: a deterministic wta) of ψ, if possible
• Access to ψ is granted by a certain form of teacher (oracle)

TU Dresden, 22 May 2007 Learning deterministic wta slide 2 of 26


00 Table of Contents

Notation

Weighted tree automaton

Myhill-Nerode congruence

Learning algorithm

An example


01 Notation

Trees
• T_Σ: trees over a ranked alphabet Σ
• C_Σ: contexts (trees with exactly one occurrence of □) over Σ
• size(t): number of nodes of a tree t

Tree series
• tree series: mapping of type T_Σ → A
• we write (ψ, t) for ψ(t) with ψ: T_Σ → A
• A⟨⟨T_Σ⟩⟩: set of all mappings of type T_Σ → A


02 Syntax

Definition (Borchardt and Vogler '03)
(Q, Σ, A, μ, F) is a weighted tree automaton (wta) if

• Q is a finite nonempty set (states)
• Σ is a ranked alphabet (of input symbols)
• A = (A, +, ·, 0, 1) is a semifield (of weights)
• μ = (μ_k)_{k≥0} with μ_k: Σ^(k) → A^{Q^k × Q} (called the tree representation)
• F ⊆ Q (final states)

Definition
A wta (Q, Σ, A, μ, F) is deterministic if for every σ ∈ Σ^(k) and w ∈ Q^k there exists at most one q ∈ Q such that μ_k(σ)_{w,q} ≠ 0.
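The determinism condition can be phrased operationally. A minimal sketch in Python, assuming a tree representation μ encoded as a dict from each symbol to a map {(q1, …, qk, q): weight} (this encoding is my own choice, not from the slides):

```python
def is_deterministic(mu):
    """Check the condition above: for every symbol sigma and every
    state tuple w, at most one target state q has nonzero weight."""
    for sigma, entries in mu.items():
        targets = {}  # w -> set of target states reached with nonzero weight
        for (*w, q), weight in entries.items():
            if weight != 0:
                targets.setdefault(tuple(w), set()).add(q)
        if any(len(qs) > 1 for qs in targets.values()):
            return False
    return True

# A nullary symbol uses the empty tuple as w:
print(is_deterministic({"Alice": {("NN",): 0.5}}))            # True
print(is_deterministic({"a": {("q1",): 1.0, ("q2",): 1.0}}))  # False
```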


02 Example wta

Q = {S, VP, NP, NN, ADJ, VB} and F = {S}

Alice -0.5-> NN    Bob -0.5-> NN    loves -0.5-> VB    hates -0.5-> VB
ugly -0.25-> ADJ   nice -0.25-> ADJ   mean -0.25-> ADJ   tall -0.25-> ADJ

NN VP -0.5-> S     NP VP -0.5-> S
VB NN -0.5-> VP    VB NP -0.5-> VP
ADJ NN -0.5-> NP   ADJ NP -0.5-> NP


02 Computation using wta

Example
The slides step through a bottom-up run on the tree for "mean Bob hates ugly Alice" (shown as a diagram), applying one rule per frame:

mean -0.25-> ADJ, Bob -0.5-> NN, then ADJ NN -0.5-> NP  (the subtree reaches NP with weight 0.0625)
hates -0.5-> VB
ugly -0.25-> ADJ, Alice -0.5-> NN, then ADJ NN -0.5-> NP  (the subtree reaches NP with weight 0.0625)
VB NP -0.5-> VP  (the subtree reaches VP with weight 0.0078125)
NP VP -0.5-> S  (the root reaches S with weight 0.000244140625)

So the tree for "mean Bob hates ugly Alice" is accepted with weight 0.000244140625.

02 Computation using wta (cont'd)

Example
A second run, on the tree for "Alice loves Bob":

Alice -0.5-> NN, loves -0.5-> VB, Bob -0.5-> NN

The leaves reach NN (0.5), VB (0.5), and NN (0.5), but then no rule applies, and the run gets stuck.

So the tree for "Alice loves Bob" is rejected (accepted with weight 0).
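Both runs can be reproduced mechanically. A sketch, assuming trees are encoded as (symbol, children) pairs and using a hypothetical binary symbol "o" for the internal nodes (the slides draw the trees without naming that symbol):

```python
RULES = {
    # leaf rules: target state and weight for each nullary symbol
    "Alice": {(): ("NN", 0.5)}, "Bob": {(): ("NN", 0.5)},
    "loves": {(): ("VB", 0.5)}, "hates": {(): ("VB", 0.5)},
    "ugly": {(): ("ADJ", 0.25)}, "mean": {(): ("ADJ", 0.25)},
    # binary rules under the assumed internal symbol "o"
    "o": {("NN", "VP"): ("S", 0.5), ("NP", "VP"): ("S", 0.5),
          ("VB", "NN"): ("VP", 0.5), ("VB", "NP"): ("VP", 0.5),
          ("ADJ", "NN"): ("NP", 0.5), ("ADJ", "NP"): ("NP", 0.5)},
}

def run(tree):
    """Bottom-up run of the deterministic wta; returns (state, weight),
    or None if the run gets stuck (i.e. the tree is rejected)."""
    symbol, children = tree
    results = [run(child) for child in children]
    if any(r is None for r in results):
        return None
    states = tuple(state for state, _ in results)
    if states not in RULES[symbol]:
        return None  # no applicable rule
    state, weight = RULES[symbol][states]
    for _, child_weight in results:
        weight *= child_weight
    return (state, weight)

print(run(("o", [("mean", []), ("Bob", [])])))  # ('NP', 0.0625)
```

The subtree for "mean Bob" reaches NP with weight 0.25 · 0.5 · 0.5 = 0.0625, as on the slides; a node whose child states match no rule yields None, matching rejection with weight 0.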

02 Deterministically recognizable

Definition
A tree series ψ ∈ A⟨⟨T_Σ⟩⟩ is deterministically recognizable if there exists a deterministic wta accepting ψ.


03 Definition

In the sequel, let ψ ∈ A⟨⟨T_Σ⟩⟩ with A = (A, +, ·, 0, 1) a semifield.

Definition (Borchardt '03)
Two trees t, u ∈ T_Σ are equivalent if there exists a ∈ A \ {0} such that for every context c ∈ C_Σ

  a · (ψ, c[t]) = (ψ, c[u]).

This equivalence relation is denoted by ≡.

Example
Let ψ = size and A = (ℤ ∪ {∞}, min, +, ∞, 0). Then t ≡ u for every t, u ∈ T_Σ because with a = size(u) − size(t)

  a + size(c[t]) = a + size(c) − 1 + size(t) = size(u) + size(c) − 1 = size(c[u]).
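The example can be checked concretely. A sketch, assuming the same (symbol, children) tree encoding and a placeholder symbol "[]" for □ (both my choices, not the slides'):

```python
def size(tree):
    """Number of nodes of a tree."""
    symbol, children = tree
    return 1 + sum(size(child) for child in children)

def plug(context, tree):
    """c[t]: substitute tree for the unique occurrence of the box."""
    symbol, children = context
    if symbol == "[]":
        return tree
    return (symbol, [plug(child, tree) for child in children])

# In the tropical semifield, multiplication is +; verify the identity above:
t = ("a", [])                        # size 1
u = ("f", [("a", []), ("a", [])])    # size 3
c = ("g", [("[]", []), ("b", [])])   # a context
a = size(u) - size(t)                # a = 2
print(a + size(plug(c, t)) == size(plug(c, u)))  # True
```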

03 Myhill-Nerode theorem

Lemma (Borchardt '03)
≡ is a congruence on (T_Σ, Σ).

Theorem (Borchardt '03)
The following are equivalent:

• ψ is deterministically recognizable.
• ≡ has finite index.

Note: The implementation of ≡ yields a minimal deterministic wta accepting ψ.

03 Approximating the Myhill-Nerode relation

Definition
Let C ⊆ C_Σ. Two trees t, u ∈ T_Σ are C-equivalent if there exists a ∈ A \ {0} such that for every context c ∈ C

  a · (ψ, c[t]) = (ψ, c[u]).

The C-equivalence relation is denoted by ≡_C.

Lemma
• ≡ and ≡_{C_Σ} coincide.
• If ≡ has finite index, then there exists a finite C ⊆ C_Σ such that ≡ and ≡_C coincide.
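For a finite C, C-equivalence is decidable with finitely many coefficient queries. A sketch over the rationals (a field, hence in particular a semifield); the oracle coeff(c, t) returning (ψ, c[t]) is a hypothetical interface, not part of the slides:

```python
from fractions import Fraction

def c_equivalent(coeff, C, t, u):
    """Decide t ≡_C u: search for a single nonzero a with
    a * coeff(c, t) == coeff(c, u) for every c in C."""
    a = None
    for c in C:
        ct, cu = coeff(c, t), coeff(c, u)
        if (ct == 0) != (cu == 0):
            return False               # no nonzero a maps 0 to nonzero
        if ct != 0:
            ratio = Fraction(cu, ct)   # the only candidate for a on c
            if a is None:
                a = ratio
            elif a != ratio:
                return False           # two contexts force different a
    return True

# Toy oracle: coeff(c, t) = c * t makes 2 and 4 equivalent (a = 2):
print(c_equivalent(lambda c, t: c * t, [1, 2, 3], 2, 4))  # True
```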


04 Supervised learning

Goal
Learn a small C ⊆ C_Σ such that ≡ and ≡_C coincide.

Definition (Drewes and Vogler '07)
A maximally adequate teacher answers two types of queries:

• Coefficient query: Given t ∈ T_Σ, the teacher supplies (ψ, t).
• Equivalence query: Given a wta M (recognizing the series S(M)), the teacher supplies either
  – ⊥ if S(M) = ψ, or
  – some t ∈ T_Σ such that (S(M), t) ≠ (ψ, t).

04 Main data structure

Definition
(E, T, C) is an observation table if

• E and T are finite subsets of T_Σ; C is a finite subset of C_Σ
• E ⊆ T ⊆ Σ(E) = {σ(t₁, …, t_k) | σ ∈ Σ^(k), t₁, …, t_k ∈ E}
• □ ∈ C
• T ∩ L_C = ∅ where L_C = {t ∈ T_Σ | ∀c ∈ C: (ψ, c[t]) = 0}
• card(E) = card(E/≡_C)

T ∩ L_C = ∅ means: "No dead states"
card(E) = card(E/≡_C) means: "No equivalent states"

Definition
An observation table (E, T, C) is complete if card(E) = card(T/≡_C).

04 Completing an observation table

Algorithm: COMPLETE
Require: an observation table (E, T, C)
Ensure: return a complete observation table (E′, T, C) such that E ⊆ E′

1: for all t ∈ T do
2:   if t ≢_C e for every e ∈ E then
3:     E ← E ∪ {t}
4: return (E, T, C)
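A direct transcription of COMPLETE, with the ≡_C test abstracted into a hypothetical predicate equiv(t, e, C) (in practice it would issue coefficient queries as sketched earlier):

```python
def complete(E, T, C, equiv):
    """Promote every t in T that has no C-equivalent representative
    in E; afterwards card(E) == card(T / ≡_C)."""
    E = list(E)
    for t in sorted(T):                  # deterministic scan order
        if not any(equiv(t, e, C) for e in E):
            E.append(t)                  # t becomes a new representative
    return E, T, C

# Toy equivalence: parity of integers stands in for ≡_C.
E, T, C = complete([], {1, 2, 3, 4}, [], lambda t, e, C: t % 2 == e % 2)
print(E)  # [1, 2]
```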

04 Construction of the wta

Definition
Let 𝒯 = (E, T, C) be a complete observation table, and let 𝒯(t) denote the representative e ∈ E with t ≡_C e. Construct (Q, Σ, A, μ, F):

• Q = E
• F = {e ∈ E | (ψ, e) ≠ 0}
• for every σ ∈ Σ^(k) and e₁, …, e_k ∈ E such that t = σ(e₁, …, e_k) ∈ T

  μ_k(σ)_{e₁⋯e_k, 𝒯(t)} = (ψ(𝒯), t) · ∏_{i=1}^{k} (ψ(𝒯), e_i)^{−1}

• all remaining entries are 0

04 Construction of the wta (cont'd)

Definition
Let 𝒯 = (E, T, C) be a complete observation table. Define ψ(𝒯): T_Σ → A \ {0} by

• if (ψ, t) ≠ 0, then (ψ(𝒯), t) = (ψ, t)
• if (ψ, t) = 0 and t ∈ T, then let c ∈ C be such that (ψ, c[t]) ≠ 0 and set

  (ψ(𝒯), t) = (ψ, c[t]) · (ψ, c[𝒯(t)])^{−1}

• all remaining entries are 1

04 The outer structure

Algorithm: MAIN

1: 𝒯 ← (∅, ∅, {□})      {initial observation table}
2: loop
3:   M ← M(𝒯)           {construct new wta}
4:   t ← EQUAL?(M)      {ask equivalence query}
5:   if t = ⊥ then
6:     return M         {return the approved wta}
7:   else
8:     𝒯 ← EXTEND(𝒯, t) {extend the observation table}
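The loop structure can be sketched directly; build_wta, equal, and extend are hypothetical callables standing in for M(𝒯), EQUAL?, and EXTEND, since the slides leave the teacher abstract:

```python
def main(build_wta, equal, extend):
    """Learner main loop: build a wta from the table, ask the teacher,
    and grow the table from each counterexample."""
    table = (set(), set(), {"[]"})   # (E, T, C): empty table, box context
    while True:
        M = build_wta(table)         # construct a wta from the table
        t = equal(M)                 # equivalence query to the teacher
        if t is None:                # no counterexample: wta approved
            return M
        table = extend(table, t)     # absorb the counterexample

# Toy instantiation: the "wta" is just the table size; the teacher
# objects until two counterexamples have been absorbed.
grown = main(lambda tab: len(tab[1]),
             lambda M: None if M >= 2 else "x%d" % M,
             lambda tab, t: (tab[0], tab[1] | {t}, tab[2]))
print(grown)  # 2
```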

04 The workhorse: EXTEND

Algorithm: EXTEND
Require: a complete observation table 𝒯 = (E, T, C) and a counterexample t ∈ T_Σ
Ensure: return a complete observation table 𝒯′ = (E′, T′, C′) such that E ⊆ E′ and T ⊆ T′ and one inclusion is strict

1: Decompose t into t = c[u] where c ∈ C_Σ and u ∈ Σ(E) \ E
2: if u ∈ T and u ≡_{C ∪ {c}} 𝒯(u) then
3:   return EXTEND(𝒯, c[𝒯(u)])          {normalize and continue}
4: else
5:   return COMPLETE(E, T ∪ {u}, C ∪ {c}) {add u and c to the table}


05 Input wta

Q = {S, VP, NP, NN, ADJ, VB} and F = {S}

Alice -0.5-> NN    Bob -0.5-> NN    loves -0.5-> VB    hates -0.5-> VB
ugly -0.25-> ADJ   nice -0.25-> ADJ   mean -0.25-> ADJ   tall -0.25-> ADJ

NN VP -0.5-> S     NP VP -0.5-> S
VB NN -0.5-> VP    VB NP -0.5-> VP
ADJ NN -0.5-> NP   ADJ NP -0.5-> NP

05 Learned wta

Q = {NP, VB, VP, S, ADJ} and F = {S}

Alice -1-> NP    Bob -1-> NP    loves -1-> VB    hates -1-> VB
ugly -1-> ADJ    nice -1-> ADJ    mean -1-> ADJ    tall -1-> ADJ

NP VP -0.03125-> S
VB NP -1-> VP
ADJ NP -0.125-> NP

05 Thank you for your attention!

References
• Borchardt, Vogler: Determinization of finite state weighted tree automata. J. Autom. Lang. Combin. 8(3):417–463, 2003
• Borchardt: The Myhill-Nerode theorem for recognizable tree series. Proc. 7th Int. Conf. Developments in Language Theory, LNCS 2710, pp. 146–158, Springer 2003
• Drewes, Vogler: Learning deterministically recognizable tree series. J. Autom. Lang. Combin., to appear, 2007