22
Finite State Transducers Eric Gribkoff May 29, 2013 Original Slides by Thomas Hanneforth (Universitat Potsdam)

Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

Finite State Transducers

Eric Gribkoff

May 29, 2013

Original Slides by Thomas Hanneforth (Universitat Potsdam)

Page 2: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

Outline

1 Definition of Finite State Transducer

2 Examples of FSTs

3 Definition of Regular Relations

4 Language of an FST

5 Closure Properties of Regular Relations5.1 Composition5.2 Intersection

6 FSTs Applied to String Indexing

Eric Gribkoff | UC Davis 2/22

Page 3: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

1 Definition of Finite State TransducerA Non-Deterministic Finite State Transducer (FST) is a 7-tuple(Q,Σ,Γ, δ, ω, q0, F )

1. Q is a finite set called the states

2. Σ is a finite set called the alphabet

3. Γ is a finite set called the output alphabet

4. δ : Q× Σ ∪ {ε} → P(Q) is the transition function

5. ω : Q× Σ ∪ {ε} ×Q→ Γ∗ is the output function

6. q0 ∈ Q is the start state

7. F ⊆ Q is the set of accept states

Eric Gribkoff | UC Davis 3/22

Page 4: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

A Deterministic Finite State Transducer (FST) is a 7-tuple(Q,Σ,Γ, δ, ω, q0, F )

1. Q is a finite set called the states

2. Σ is a finite set called the alphabet

3. Γ is a finite set called the output alphabet

4. δ : Q× Σ→ Q is the transition function

5. ω : Q× Σ→ Γ is the output function

6. q0 ∈ Q is the start state

7. F ⊆ Q is the set of accept states

Eric Gribkoff | UC Davis 4/22

Page 5: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

2 Examples of FSTs

The action of a Finite State Transducer can be viewed as computing arelation between two sets.

Eric Gribkoff | UC Davis 5/22

Page 6: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

Figure 1: Tlex maps words to morphological featuresdog → NOUN sg (singular)dogs → NOUN pl (plural)

Eric Gribkoff | UC Davis 6/22

Page 7: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

Figure 2: Laughing machine Tlaugh

The input string laugh is mapped to the infinite set {han|n ≥ 1}(laugh, ha)(laugh, haa)(laugh, haaa)

...

Eric Gribkoff | UC Davis 7/22

Page 8: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

Figure 3: Bracketing machine Tbracket

Every occurrence of ab is enclosed within brackets.cabbabc→ c{ab}b{ab}c

Eric Gribkoff | UC Davis 8/22

Page 9: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

3 Definition of Regular RelationsA relation over the sets A and B is a subset of the Cartesian product,A×B.

Ex: Let A = {dog, cat, cow}, B = {seven, π, octopus}. Then R ={(dog, seven), (cow, π), (cow, octopus)} is a relation.

A regular (or rational) relation over the alphabets Σ, Γ is formed froma finite combination of the following rules:

1. ∀ (x, y) ∈ Σ ∪ {ε} × Γ ∪ {ε}

2. ∅ is a regular relation

3. If R, S are regular relations, then so are R ◦ S, R ∪ S, and R∗

Eric Gribkoff | UC Davis 9/22

Page 10: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

4 Language of an FSTThe language L(T ) of a non-deterministic FST T = (Q,Σ,Γ, δ, ω, q0, F )is defined using the extended transition and output functions δ∗, σ∗:

L(T ) = {(u, v) | δ∗(q0, u) ∩ F 6= ∅ ∧ v ∈ σ∗(q0, u)}

δ∗(q, ε) = {q}

δ∗(q, wa) =⋃q′∈δ∗(q,w) δ(q′, a)

σ∗(q, ε) = {ε}

σ∗(q, wa) = σ∗(q, w) ◦⋃q′∈δ∗(q,w)

⋃p∈δ(q′,a) σ(q′, a, p)

Eric Gribkoff | UC Davis 10/22

Page 11: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

Explanation of δ∗

δ∗ is the extended transition function. It returns the set of possible statesafter processing a state and input string pair.

δ∗(q, ε) = {q} means that if the input is the empty string, we will remainat the current state

δ∗(q, wa) =⋃q′∈δ∗(q,w) δ(q′, a)⋃

q′∈δ∗(q,w) is a union over all states reachable on input string w from stateq

δ(q′, a) is the non-deterministic transition function, returning the set ofpossible states given state q and input character a

Eric Gribkoff | UC Davis 11/22

Page 12: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

Explanation of σ∗

σ∗ is the extended output function. It returns the set of possible outputstrings after processing a state and input string pair.

σ∗(q, ε) = {ε} (Missing output from ε-transitions?)

σ∗(q, wa) = σ∗(q, w) ◦⋃q′∈δ∗(q,w) σ(q′, a, p), p ∈ δ(q′, a)

σ∗(q, w) is the set of possible outputs after processing w from state q⋃q′∈δ∗(q,w) is a union over all states q′ reachable on input string w from

state q⋃p∈δ(q′,a) is a union over all states p reachable from q′

Eric Gribkoff | UC Davis 12/22

Page 13: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

Explanation of σ∗, continued

σ(q′, a, p) is the non-deterministic output function, returning the set ofpossible output strings given the transition from state q′ to state p oninput a.

Eric Gribkoff | UC Davis 13/22

Page 14: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

Parallel with DFAs

A Deterministic Finite Automaton recognizes a regular language

Similarly, a Finite State Transducer recognizes (or encodes) a regularrelation

Eric Gribkoff | UC Davis 14/22

Page 15: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

5 Closure Properties of Regular Relations

Regular relations are closed under the following operations:

• Concatenation

• Union

• Closure (Star)

• Composition

Eric Gribkoff | UC Davis 15/22

Page 16: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

5.1 Composition

Composing an FST T1 with an FST T2 means taking an input u for T1,collecting the output v of T1 , feeding v as input to T2, and collecting theoutput w of T2.

Eric Gribkoff | UC Davis 16/22

Page 17: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

Figure 4: Compositions of FSTsThe leftmost FST maps words to their categories. The middle FST mapsnoun phrase-patterns to noun-phrase categories. Their composition acceptsvalid noun-phrases.

Eric Gribkoff | UC Davis 17/22

Page 18: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

Closure Properties of Regular Relations

Unlike regular languages, regular relations are not closed under:

• Intersection

• Complementation

• Difference

Eric Gribkoff | UC Davis 18/22

Page 19: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

5.2 Intersection

The regular relation (an, bnc∗) can be formed by the concatenation of(a, b)∗ and (ε, c)∗. Likewise, (an, b∗cn) = (ε, b)∗ ◦ (a, c)∗.

Their intersection, (an, bncn), is not a regular relation.

The lack of closure under complementation and difference follows fromDe Morgan’s law.

Eric Gribkoff | UC Davis 19/22

Page 20: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

6 FSTs Applied to String IndexingFSTs can be used to map between a set of words and the index of eachword in a sorted array

Define a regular relation between the set of words and their numberingwithin the alphabetically sorted set

With an FST encoding this relation, words are matched to their index term

But its more space efficient to construct an FST with shared paths forcommon prefixes and suffixes

Eric Gribkoff | UC Davis 20/22

Page 21: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

The words {mop, moth, pop, star, stop, top} are matched to their ordinalindex by summing the digits of the output string.

Efficient algorithms exist to generate these FSTs for lists of millions ofwords.

Eric Gribkoff | UC Davis 21/22

Page 22: Finite State Transducersrogaway/classes/120/spring13/eric... · 1DefinitionofFiniteStateTransducer ANon-DeterministicFiniteStateTransducer(FST)isa7-tuple (Q,Σ,Γ,δ,ω,q0,F) 1

Q & A