1 Polynomial Time Probabilistic Learning of a Subclass of Linear Languages with Queries Yasuhiro TAJIMA, Yoshiyuki KOTANI Tokyo Univ. of Agri. & Tech

1

Polynomial Time Probabilistic Learning of a Subclass of Linear Languages

with Queries

Yasuhiro TAJIMA, Yoshiyuki KOTANI

Tokyo Univ. of Agri. & Tech.

2

This talk…

• Probabilistic learning algorithm of a subclass of linear languages with membership queries

• learning via queries + special examples → Probabilistic learning

Use translation algorithms

representative sample → random examples

equivalence query → random examples

3

Motivations

A simple deterministic grammar (SDG) has at most one rule $A \to a\beta$ for every pair of $A \in N$ and $a \in \Sigma$

⇒ learning algorithm for SDGs from
• membership queries
• a representative sample

⇒ for linear languages

[Figure: diagram of the language classes CFLs, SDLs, Regular and Linear]

(Tajima et al. 2004)

4

Linear grammar

A context-free grammar is a linear grammar if every rule is of the form

$A \to aBc$, $A \to aB$, $A \to Bc$, $A \to a$

$A, B$ : nonterminal

$a, c$ : terminal

Any linear grammar can be written in RL-linear form, in which every rule is of the form

$A \to aB$, $A \to Bc$, $A \to a$

and

$A \to aB,\ A \to aC \Rightarrow B = C$,

$A \to Ba,\ A \to Ca \Rightarrow B = C$

5

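Strict-deterministic linear grammar

An RL-linear grammar is strict-deterministic linear (Strict-det) if, for any pair of rules $A \to u$, $A \to v$ $(|u| = |v| = 2)$,

$u = aB,\ v = cD$ or $u = Ba,\ v = Dc$ for some $a, B, c, D$

That is, each nonterminal $A$ has only left linear rules (or only right linear rules)

Ex) $L = \{a^i b^i,\ a^i c^i \mid i \ge 1\}$

$S \to aA$,

$A \to Bb$, $A \to Cc$, $A \to b$, $A \to c$,

$B \to aD$, $D \to Bb$, $D \to b$,

$C \to aE$, $E \to Cc$, $E \to c$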
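As a minimal sketch of the condition on this slide, the Python below checks that no nonterminal mixes rules of the form aB with rules of the form Ba. It reads the example rules as S → aA, A → Bb | Cc | b | c, B → aD, D → Bb | b, C → aE, E → Cc | c. The encoding (uppercase for nonterminals, lowercase for terminals) and the names `RULES`, `rule_kind` and `is_strict_det` are assumptions made for illustration and do not come from the talk.

```python
# Example grammar as read from the slide; uppercase = nonterminal, lowercase = terminal.
RULES = [
    ("S", "aA"),
    ("A", "Bb"), ("A", "Cc"), ("A", "b"), ("A", "c"),
    ("B", "aD"), ("D", "Bb"), ("D", "b"),
    ("C", "aE"), ("E", "Cc"), ("E", "c"),
]


def rule_kind(rhs):
    """Classify a right-hand side as 'right' (aB), 'left' (Ba) or 'terminal' (a)."""
    if len(rhs) == 1 and rhs.islower():
        return "terminal"
    if len(rhs) == 2 and rhs[0].islower() and rhs[1].isupper():
        return "right"
    if len(rhs) == 2 and rhs[0].isupper() and rhs[1].islower():
        return "left"
    raise ValueError(f"not an RL-linear right-hand side: {rhs!r}")


def is_strict_det(rules):
    """True if no nonterminal has both aB-type and Ba-type rules.

    Only the length-2 condition from this slide is checked (an assumption
    about what is sufficient for the illustration).
    """
    kinds = {}
    for lhs, rhs in rules:
        kind = rule_kind(rhs)
        if kind != "terminal":
            kinds.setdefault(lhs, set()).add(kind)
    return all(len(k) == 1 for k in kinds.values())
```

For instance, `is_strict_det(RULES)` holds for the example, while adding a rule such as `("A", "aB")` breaks the condition because A would then have both kinds of rules.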

6

Deterministic linear grammar

A linear grammar is deterministic linear (DL) if every rule is of the form

$A \to aBu$ or $A \to \varepsilon$ $(u, v \in \Sigma^*,\ A, B \in N,\ a \in \Sigma)$

and

$A \to aBu,\ A \to aCv \Rightarrow B = C,\ u = v$

Theorem : detstrict DL

Theorem (de la Higuera, Oncina 2002) : DL is identifiable in the limit from polynomial time and data

7

MAT learning (Angluin1987)

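[Diagram: a learner and a teacher; the teacher knows the target language $L_t$]

membership query: the learner asks "$w \in L_t$?" and the teacher answers yes or no, $(w, \{0,1\})$

equivalence query: the learner proposes a hypothesis $G_h$ with $L_h = L(G_h)$; if $L_h \ne L_t$ the teacher returns a counterexample $w \in L(G_t) \,\triangle\, L(G_h)$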

8

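PAC learning (Valiant 1984)

PAC : Probably Approximately Correct

$D$ : probability distribution

[Diagram: examples $w \in L_t$ and $u \in \Sigma^* \setminus L_t$ of the target concept $L_t$ are given to the learning algorithm, which outputs a hypothesis $L_h$]

$L_h$ is PAC if $\Pr(P(L_t \,\triangle\, L_h) \le \varepsilon) \ge 1 - \delta$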

9

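Equivalence query ⇒ PAC learning algorithm (Angluin [1987])

If a hypothesis is consistent with

$n_i = \frac{1}{\varepsilon}\left(\ln\frac{1}{\delta} + (\ln 2)(i+1)\right)$

examples, then it is PAC

[Diagram: hypothesis $G_i$ is checked against $n_i$ examples; when an example contradicts it (×), the next hypothesis $G_{i+1}$ is consistent with the $n_j$ examples of the rounds $j \le i$]

If there is a consistent hypothesis ⇒ PAC learnable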
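The sample size used in place of the i-th equivalence query can be computed directly. The sketch below assumes the bound reads n_i = (1/ε)(ln(1/δ) + (ln 2)(i+1)); rounding up with `math.ceil` and the function name `examples_for_round` are assumptions added for illustration.

```python
import math


def examples_for_round(i, epsilon, delta):
    """Number of random examples that stands in for the i-th equivalence query.

    epsilon is the error parameter, delta the confidence parameter.
    The result is rounded up to an integer (an assumption of this sketch).
    """
    return math.ceil((1 / epsilon) * (math.log(1 / delta) + math.log(2) * (i + 1)))
```

With this choice, a hypothesis whose error exceeds ε survives round i with probability at most (1 − ε)^{n_i} ≤ δ / 2^{i+1}, so the failure probabilities over all rounds sum to at most δ.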

10

Probabilistic learning with queries

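[Diagram: the learning algorithm interacts with two oracles for the target language $L_t$ and outputs a hypothesis $G_h$]

Example oracle: draws $w \in \Sigma^*$ according to the distribution $D$ on $\Sigma^*$, with probability $\Pr(w)$, and returns $(w, \{0,1\})$

Membership query: answered Yes or No

hypothesis $G_h$ : $\Pr(P(L_t \,\triangle\, L(G_h)) \le \varepsilon) \ge 1 - \delta$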

11

Representative sample for a Strict-det

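$G = (N, \Sigma, P, S)$ : Strict-det

$Q \subseteq L(G)$ : representative sample (RS)

$\iff \forall (A \to \alpha) \in P,\ \exists w \in Q$ s.t. $S \Rightarrow^* xAy \Rightarrow x\alpha y \Rightarrow^* w$ for some $x, y \in \Sigma^*$

All rules are used to generate $Q$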

12

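Example :

$G = (N = \{S, A, B, C, D, E\},\ \Sigma = \{a, b, c\},\ P,\ S)$

$P = \{\, S \to aA,$

$A \to Bb,\ A \to Cc,\ A \to b,\ A \to c,$

$B \to aD,\ D \to Bb,\ D \to b,$

$C \to aE,\ E \to Cc,\ E \to c \,\}$

then $Q = \{ab,\ ac,\ aaabbb,\ aaaccc\}$ is a representative sample (RS)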
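The claim that Q is a representative sample can be checked mechanically: derive every word of Q and collect the rules that appear. The sketch below reads the example as S → aA, A → Bb | Cc | b | c, B → aD, D → Bb | b, C → aE, E → Cc | c and Q = {ab, ac, aaabbb, aaaccc}. The helper names `derive` and `is_representative_sample` are hypothetical, and the backtracking search returns only the first derivation it finds, which is adequate here because the example grammar is unambiguous.

```python
RULES = [
    ("S", "aA"),
    ("A", "Bb"), ("A", "Cc"), ("A", "b"), ("A", "c"),
    ("B", "aD"), ("D", "Bb"), ("D", "b"),
    ("C", "aE"), ("E", "Cc"), ("E", "c"),
]


def derive(rules, symbol, word):
    """Return the rules used in one derivation symbol =>* word, or None if there is none."""
    for lhs, rhs in rules:
        if lhs != symbol:
            continue
        if len(rhs) == 1:                          # A -> a
            if word == rhs:
                return [(lhs, rhs)]
        elif rhs[0].islower():                     # A -> aB: consume the first letter
            if len(word) >= 2 and word[0] == rhs[0]:
                rest = derive(rules, rhs[1], word[1:])
                if rest is not None:
                    return [(lhs, rhs)] + rest
        else:                                      # A -> Ba: consume the last letter
            if len(word) >= 2 and word[-1] == rhs[1]:
                rest = derive(rules, rhs[0], word[:-1])
                if rest is not None:
                    return [(lhs, rhs)] + rest
    return None


def is_representative_sample(rules, start, sample):
    """True if every word is derivable and every rule is used by some word."""
    used = set()
    for word in sample:
        derivation = derive(rules, start, word)
        if derivation is None:
            return False                           # word is not in L(G)
        used.update(derivation)
    return used == set(rules)
```

Replacing aaabbb by aabb makes the check fail, because the derivation of aabb never uses D → Bb.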

13

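Rule occurring probability

$G_t$ : a target grammar

$D$ : a probability distribution on $\Sigma^*$; an example $(w, \{0,1\})$ is drawn with probability $\Pr(w)$

$\varepsilon$ : error parameter

$\delta$ : confidence parameter

$|P_t|$ : the number of rules of the target grammar

For every rule $A \to \alpha$, define

$Z(A \to \alpha) = \{\, w \in \Sigma^* \mid S \Rightarrow^*_{G_t} \alpha_1 A \alpha_2 \Rightarrow \alpha_1 \alpha \alpha_2 \Rightarrow^*_{G_t} w \text{ for some } \alpha_1, \alpha_2 \in (N \cup \Sigma)^* \,\}$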

14

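$\Pr(A \to \alpha) = \sum_{w \in Z(A \to \alpha)} \Pr(w)$

is a rule occurring probability s.t. $A \to \alpha$ appears in the derivation of an example $(w, \{0,1\})$ drawn from $D$ on $\Sigma^*$ with probability $\Pr(w)$

$\Pr(A \to \alpha)$ is the probability that
• $w \in L(G_t)$, and
• $A \to \alpha$ is used in the derivation $S \Rightarrow^*_{G_t} w$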
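For a distribution with finite support, the rule occurring probability is a plain sum over the words whose derivation uses the rule. The sketch below uses invented toy data: the grammar S → aA, A → b | Bb, B → aD, D → b, the distribution `DIST` and the table `RULES_USED` (word to rules used in its derivation) are all hypothetical and only illustrate the definition.

```python
from fractions import Fraction


def rule_occurring_probability(rules, dist, rules_used):
    """Sum Pr(w) over the words w whose derivation uses each rule.

    rules      : list of (lhs, rhs) rules of the target grammar
    dist       : word -> Pr(w)
    rules_used : word in L(G_t) -> set of rules used in its derivation
    Words outside L(G_t) are absent from rules_used and contribute nothing.
    """
    pr = {rule: Fraction(0) for rule in rules}
    for word, p in dist.items():
        for rule in rules_used.get(word, ()):
            pr[rule] += p
    return pr


# Hypothetical toy data, not taken from the talk.
RULES = [("S", "aA"), ("A", "b"), ("A", "Bb"), ("B", "aD"), ("D", "b")]
DIST = {"ab": Fraction(1, 2), "aabb": Fraction(1, 4), "ba": Fraction(1, 4)}
RULES_USED = {
    "ab": {("S", "aA"), ("A", "b")},
    "aabb": {("S", "aA"), ("A", "Bb"), ("B", "aD"), ("D", "b")},
}
PR = rule_occurring_probability(RULES, DIST, RULES_USED)
D_MIN = min(PR.values())   # smallest rule occurring probability
```

In this toy case S → aA occurs with probability 3/4 and the smallest rule occurring probability is 1/4.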

15

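Let $d = \min\{\Pr(A \to \alpha) \mid (A \to \alpha) \in P_t\}$

Suppose $m \ge \frac{1}{d} \log \frac{|P_t|}{\delta}$

The set of $m$ examples contains a set of RS with probability at least $1 - \delta$

Proof: "Some rule does not appear in the derivations of the $m$ examples" occurs with probability at most

$|P_t| (1 - d)^m \le |P_t| e^{-dm} \le \delta$

[Diagram: $m$ examples drawn from $\Sigma^*$ according to $D$ contain an RS]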
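The bound can be evaluated numerically. The sketch below assumes that "log" denotes the natural logarithm and that m is rounded up to an integer; the function names `rs_sample_size` and `miss_probability_bound` are hypothetical.

```python
import math


def rs_sample_size(d, num_rules, delta):
    """Smallest integer m with m >= (1/d) * ln(num_rules / delta)."""
    return math.ceil((1 / d) * math.log(num_rules / delta))


def miss_probability_bound(d, num_rules, m):
    """Union bound on the probability that some rule is missed by all m examples."""
    return num_rules * (1 - d) ** m
```

For example, with d = 0.05, 11 rules and δ = 0.1, `rs_sample_size` gives 95, and the union bound evaluated at m = 95 stays below δ.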

16

We can conclude that

1. Equivalence query can be replaced by

$n_i = \frac{1}{\varepsilon}\left(\ln\frac{1}{\delta} + (\ln 2)(i+1)\right)$ random examples

2. Representative sample can be replaced by

$m \ge \frac{1}{d} \log \frac{|P_t|}{\delta}$ random examples

17

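[Diagram: the probabilistic learning algorithm with queries wraps the learning algorithm that uses membership queries, equivalence queries and a representative sample]

• membership query: the query is passed to the membership oracle and the response is returned
• representative sample: replaced by $m$ random examples from the example oracle
• equivalence query: replaced by a consistency check against $n$ random examples from the example oracle
• the examples are labelled positive or negative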

18

Learning algorithm via queries and RS

$M_h := \{(u, v, w) \mid uvw \in RS\}$

$T := \Sigma$

while (finish == 0) begin

make nonterminals $N_h$ from $T, M_h$

make rules $P_h$ and hypothesis $G_h$

if (equivalence query for $G_h$ responds "yes")

output $G_h$, finish = 1

else

update $T$ by the counterexample $w$

end

19

Making nonterminals

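$M_h = \{(u, v, w) \mid uvw \in RS\}$

$(u, v, w) \equiv_T (x, y, z)$ : an equivalence relation on $M_h$; two triples are equivalent when the membership query results (MEM) for them agree with respect to $T$

then

$N_h = M_h / \equiv_T$

$A_T(u, v, w)$ : a nonterminal = the equivalence class that contains $(u, v, w)$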
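A minimal sketch of how nonterminals could be formed as equivalence classes of triples. The exact test used in the talk is not recoverable from this slide, so the sketch assumes one common choice: two triples are equivalent when their contexts (u, w) receive the same membership answers on u + t + w for every test string t in T. The toy oracle `member` for {a^i b^i | i ≥ 1}, the restriction to non-empty v, and all function names are assumptions.

```python
def member(s):
    """Toy membership oracle for {a^i b^i | i >= 1}; it stands in for the teacher."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def split_triples(sample):
    """All (u, v, w) with u + v + w in the sample and v non-empty (assumption)."""
    triples = set()
    for s in sample:
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                triples.add((s[:i], s[i:j], s[j:]))
    return triples


def make_nonterminals(triples, tests, member):
    """Group triples whose contexts get identical membership answers on every test."""
    classes = {}
    for (u, v, w) in triples:
        signature = tuple(member(u + t + w) for t in tests)
        classes.setdefault(signature, set()).add((u, v, w))
    return list(classes.values())


TRIPLES = split_triples(["ab", "aabb"])
TESTS = sorted({v for (_, v, _) in TRIPLES})   # hypothetical initial test set
CLASSES = make_nonterminals(TRIPLES, TESTS, member)
```

Under these assumptions the triples ("", "ab", "") and ("a", "ab", "b") land in the same class, which matches the intuition that both middle parts are generated by the same nonterminal.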

20

Making rules

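Make all rules as follows, except for those not consistent with query results

$P_{CFG} = \{\, A_T(u, avb, w) \to a\, A_T(ua, vb, w),$

$A_T(u, avb, w) \to A_T(u, av, bw)\, b,$

$A_T(u, a, w) \to a \,\}$

$G_h = (N_h, \Sigma, P_h, A_T(\varepsilon, w, \varepsilon))$

Select a hypothesis randomly: $P_h \subseteq P_{CFG}$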
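The rule schemas can be sketched as a generator of candidate rules from triples. The code assumes the three schemas read A(u, avb, w) → a A(ua, vb, w), A(u, avb, w) → A(u, av, bw) b and A(u, a, w) → a. For brevity it identifies each nonterminal with its triple instead of its equivalence class and omits the filtering by query results; the name `candidate_rules` and the tuple encoding are hypothetical.

```python
def candidate_rules(triples):
    """Candidate rules for a set of triples (u, v, w).

    A rule is a pair (lhs, rhs): lhs is a triple, rhs is a tuple that mixes
    terminal letters (str) and triples (nonterminals).
    """
    rules = set()
    for (u, v, w) in triples:
        if len(v) == 1:
            rules.add(((u, v, w), (v,)))                       # A(u, a, w) -> a
        elif len(v) >= 2:
            first, last = v[0], v[-1]
            # peel the first letter: A(u, a v', w) -> a A(ua, v', w)
            rules.add(((u, v, w), (first, (u + first, v[1:], w))))
            # peel the last letter: A(u, v' b, w) -> A(u, v', bw) b
            rules.add(((u, v, w), ((u, v[:-1], last + w), last)))
    return rules
```

For the single triple ("", "ab", "") this yields two candidates, one right linear and one left linear; a strict-det hypothesis keeps only one kind per nonterminal.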

21

Exact learning of strict-det

• Strict-det is polynomial time exact learnable via
– membership queries, and
– a representative sample (RS)

c.f. [Angluin (1980)] for regular sets

The learning algorithm overview:

[Diagram: RS → possible rules → a set of Strict-det grammars (SD, SD, SD, ...; not bounded by a polynomial) → choose one randomly and ask an equivalence query → witnesses delete incorrect rules → the correct hypothesis SD]

22

Conclusions

• Strict-det linear languages are probabilistically learnable with queries in polynomial time

Future work
• Identification from polynomial time and data (teachability)
• RS → Correction queries

23

24

Theorem

Strict-det linear languages are

polynomial time probabilistic learnable with membership queries

25

Simple Deterministic Languages

• A context-free grammar (CFG) $G = (N, \Sigma, P, S)$ in 2-standard Greibach normal form is a Simple Deterministic Grammar (SDG) iff $A \to a\beta$ is unique for every $A \in N$ and $a \in \Sigma$ $(\beta \in N^*,\ |\beta| \le 2)$

• A Simple Deterministic Language (SDL) is the language generated by an SDG

26

Representative sample for an SDG

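$G = (N, \Sigma, P, S)$ : SDG

$Q \subseteq L(G)$ : representative sample (RS)

$\iff \forall (A \to a\beta) \in P,\ \exists w \in Q$ s.t. $S \Rightarrow^* xA\gamma \Rightarrow xa\beta\gamma \Rightarrow^* w$ for some $x \in \Sigma^*,\ \gamma \in N^*$

All rules are used to generate $Q$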

27

Example :

$G = (N = \{S, A, B, C\},\ \Sigma = \{a, b, c\},\ P,\ S)$

$P = \{\, S \to aA,\ S \to cC,$

$A \to aAB,\ A \to b,$

$B \to b,\ C \to cCB,\ C \to b \,\}$

then $Q = \{ab,\ aabb,\ ccbb\}$ is a representative sample (RS)
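Because an SDG has at most one rule per pair (A, a), a word can be parsed with a stack and no backtracking, which makes the representative sample easy to verify. The sketch reads the example as S → aA | cC, A → aAB | b, B → b, C → cCB | b and Q = {ab, aabb, ccbb}; the names `RULES`, `rules_used` and `is_representative_sample` are hypothetical.

```python
RULES = [
    ("S", "aA"), ("S", "cC"),
    ("A", "aAB"), ("A", "b"),
    ("B", "b"),
    ("C", "cCB"), ("C", "b"),
]


def rules_used(rules, start, word):
    """Leftmost derivation of word; returns the set of rules used, or None."""
    table = {(lhs, rhs[0]): rhs for lhs, rhs in rules}   # unique rule per (A, a)
    stack, used = [start], set()
    for letter in word:
        if not stack:
            return None                  # input left over, nothing to expand
        top = stack.pop()
        rhs = table.get((top, letter))
        if rhs is None:
            return None                  # no rule A -> a beta for this pair
        used.add((top, rhs))
        stack.extend(reversed(rhs[1:]))  # push beta so its first symbol is on top
    return used if not stack else None


def is_representative_sample(rules, start, sample):
    """True if every word is derivable and every rule is used by some word."""
    used = set()
    for word in sample:
        found = rules_used(rules, start, word)
        if found is None:
            return False
        used |= found
    return used == set(rules)
```

Dropping ccbb from Q leaves the three rules for the c-branch unused, so the remaining set is no longer a representative sample.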

28

PAC learning

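Target language : $L_t$

Hypothesis language : $L(G_h)$

A PAC learning algorithm outputs $G_h$ such that

$\Pr(P(L_t \,\triangle\, L(G_h)) \le \varepsilon) \ge 1 - \delta$

where

$P(L_t \,\triangle\, L(G_h)) = \sum_{w \in L_t \triangle L(G_h)} P(w)$

Probability distribution : $P$ on $\Sigma^*$

(Valiant 1984)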
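For a distribution with finite support, the error of a hypothesis is the total probability of the words on which target and hypothesis disagree. The sketch below uses invented toy data: the target {a^i b^i | i ≥ 1}, a hypothesis that accepts only ab, and the distribution `DIST` are hypothetical and serve only to illustrate the symmetric-difference sum.

```python
from fractions import Fraction


def error(dist, in_target, in_hypothesis):
    """Probability mass of the symmetric difference of target and hypothesis."""
    return sum(
        (p for w, p in dist.items() if in_target(w) != in_hypothesis(w)),
        Fraction(0),
    )


def in_target(s):
    """Toy target language {a^i b^i | i >= 1}."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def in_hypothesis(s):
    """Toy hypothesis that accepts only the word ab."""
    return s == "ab"


DIST = {"ab": Fraction(1, 2), "aabb": Fraction(1, 4), "ba": Fraction(1, 4)}
ERR = error(DIST, in_target, in_hypothesis)
```

Here only aabb is classified differently, so the error equals its probability, 1/4; the hypothesis would count as approximately correct for any ε ≥ 1/4.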

29

Query learning of SDLs

• SDLs are polynomial time learnable via membership queries and a representative sample (Tajima 2000)

[Diagram: the teacher holds $L_t = L(G_t)$ and gives the learner a representative sample at the beginning; the learner asks membership queries "$w \in L_t$?", the teacher answers yes / no, and the learner outputs $G_h$]

representative sample : a special finite subset of $L_t$

30

Learning model

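[Diagram: the teacher holds $L_t = L(G_t)$ and gives the learner a representative sample at the beginning; the learner asks membership queries "$w \in L_t$?", the teacher answers yes / no, and the learner outputs $G_h$]

representative sample : a special finite subset of $L_t$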
