Recognising Languages
We will tackle the problem of defining languages by considering how we could recognise them.
Problem:
Is there a method of recognising infinite languages?
i.e. given a language description and a string, is there an algorithm which will answer yes or no correctly?
We will define an abstract machine which takes a candidate string and produces the answer yes or no.
The abstract machine will be the specification of the language.
Finite State Automata
A finite state automaton is an abstract model of a simple machine (or computer).
The machine can be in a finite number of states. It receives symbols as input, and the result of receiving a particular input in a particular state moves the machine to a specified new state. Certain states are finishing states, and if the machine is in one of those states when the input ends, it has ended successfully (or has accepted the input).
Example: A1
[Diagram: the automaton A1 — states 1, 2, 3, 4; initial state 1; final state 4; edges 1 —a→ 2, 1 —b→ 4, 2 —a→ 3, 2 —b→ 4, 3 —a→ 3, 3 —b→ 3, 4 —a→ 2, 4 —b→ 4.]
Formal definition of FSAs
• We present here the special case of a Deterministic FSA (DFSA)
– As proven in CS2013, DFSAs can recognise the same set of languages as Nondeterministic FSAs (NDFSAs)
DFSA: Formal Definition
A DFSA is a 5-tuple (Q, I, F, T, E) where:
Q = states: Q is a finite set;
I = initial state: I is an element of Q;
F = final states: F is a subset of Q;
T = an alphabet;
E = edges: E is a partial function from Q × T to Q.
An FSA can be represented by a labelled, directed graph: a set of nodes (some final; one initial) plus directed arcs (arrows) between nodes, where each arc has a label from the alphabet.
Example: formal definition of A1
Q = {1, 2, 3, 4}
I = 1
F = {4}
T = {a, b}
E = { (1,a,2), (1,b,4), (2,a,3), (2,b,4), (3,a,3), (3,b,3), (4,a,2), (4,b,4) }
What does it mean to accept a string/language?
If (x,a,y) is an edge, x is its start state and y is its end state.
A path is a sequence of edges such that the end state of one is the start state of the next.
Example: path p1 = (2,b,4), (4,a,2), (2,a,3)
A path is successful if the start state of the first edge is an initial state, and the end state of the last is a final state.
Example: path p2 = (1,b,4), (4,a,2), (2,b,4), (4,b,4)
The label of a path is the sequence of edge labels.
label(p1) = baa.
A string is accepted by an FSA if it is the label of a successful path.
Let A be an FSA. The language accepted by A is the set of strings accepted by A, denoted L(A).
babb = label(p2) is accepted by A1.
The language accepted by A1 is the set of strings of a's and b's which end in b, and in which no two a's are adjacent.
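As a quick sketch (not part of the original slides), A1 can be simulated directly in Python by storing the edge set E from the formal definition above as a dictionary:

```python
# Transition table for A1: each entry maps a (state, symbol) pair
# to the next state, exactly as in the edge set E above.
E1 = {
    (1, 'a'): 2, (1, 'b'): 4,
    (2, 'a'): 3, (2, 'b'): 4,
    (3, 'a'): 3, (3, 'b'): 3,
    (4, 'a'): 2, (4, 'b'): 4,
}
INITIAL, FINALS = 1, {4}

def accepts_a1(w):
    """Follow the unique path labelled w from the initial state."""
    q = INITIAL
    for t in w:
        q = E1[(q, t)]        # deterministic: exactly one edge per (q, t)
    return q in FINALS

print(accepts_a1("babb"))     # True: babb is the label of the successful path p2
print(accepts_a1("baab"))     # False: contains two adjacent a's
```

State 3 acts as a dead state here: once two adjacent a's are read, the machine can never reach the final state 4 again.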
Some simple examples (assuming determinism)
1. Draw an FSA to accept the set of bitstrings starting with 0
2. Draw an FSA to accept the set of bitstrings ending with 0
3. Draw an FSA to accept the set of bitstrings containing the sequence 00
4. Draw an FSA to accept the set of bitstrings containing both 1 and 0
5. Can you draw an FSA to accept the set of bitstrings that contain an equal number of 0s and 1s?
L1. Bitstrings starting with 0
[Diagram: states q1 (initial), q2 (final), q3; edges q1 —0→ q2, q1 —1→ q3; q2 loops on 0 and 1; q3 loops on 0 and 1.]
Can you make a smaller FSA that accepts the same language?
L2. Bitstrings ending with 0
[Diagram: a three-state FSA with states q1 (initial), q2 (final) and q3, accepting exactly the bitstrings ending with 0.]
At home: Can you find a smaller FSA that accepts the same language?
L3. Bitstrings containing 00
[Diagram: states q1 (initial), q2, q3 (final); edges q1 —1→ q1, q1 —0→ q2, q2 —1→ q1, q2 —0→ q3; q3 loops on 0 and 1.]
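A sketch of this machine in Python (my reconstruction of the diagram; the state names are as in the slide): q1 means "no useful 0 yet", q2 means "just read one 0", and q3 (final) means "00 has been seen":

```python
# Three-state DFSA for L3 = bitstrings containing 00.
E3 = {
    ('q1', '0'): 'q2', ('q1', '1'): 'q1',
    ('q2', '0'): 'q3', ('q2', '1'): 'q1',
    ('q3', '0'): 'q3', ('q3', '1'): 'q3',   # q3 is a final sink state
}

def accepts_L3(w):
    q = 'q1'
    for t in w:
        q = E3[(q, t)]
    return q == 'q3'

print([w for w in ["100", "110", "001", "0101"] if accepts_L3(w)])  # ['100', '001']
```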
L5. Bitstrings with equal numbers of 0 and 1
• This cannot be done.
• FSAs are not powerful enough.
• Later we shall meet automata that can do it.
Later:
Some problems cannot be solved by any automaton
Recognition Algorithm
Problem: Given a DFSA, A = (Q,I,F,T,E), and a string w, determine whether w ∈ L(A).
Note: denote the current state by q, and the current input symbol by t. Since A is deterministic, E(q,t) is always either a single state or undefined. If it is undefined, denote the result by ⊥ (where ⊥ ∉ Q).
Algorithm:
Add symbol # to end of w.
q := initial state
t := first symbol of w#
while (t ≠ # and q ≠ ⊥)
begin
  q := E(q,t)
  t := next symbol of w#
end
return ((t == #) and (q ∈ F))
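The algorithm translates almost line for line into Python (a sketch, not the course's reference implementation). Representing the partial function E as a dictionary, a missing entry plays the role of ⊥, and reaching the end of the string plays the role of the sentinel #:

```python
def recognise(E, initial, finals, w):
    """Return True iff the DFSA with edge function E accepts w."""
    q = initial
    for t in w:
        q = E.get((q, t))     # None stands for "E(q, t) undefined" (i.e. bottom)
        if q is None:
            return False      # machine is stuck: reject
    return q in finals        # whole input consumed: accept iff q is final

# Running it on A1 from the earlier example:
E1 = {(1, 'a'): 2, (1, 'b'): 4, (2, 'a'): 3, (2, 'b'): 4,
      (3, 'a'): 3, (3, 'b'): 3, (4, 'a'): 2, (4, 'b'): 4}
print(recognise(E1, 1, {4}, "babb"))   # True
```

Note that the empty string is handled correctly: the loop body never runs, and the answer depends only on whether the initial state is final.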
Minimum Size of FSA's
Let A = (Q,I,F,T,E). Definition: For any two strings x, y in T*, x and y are distinguishable w.r.t. A if there is a string z ∈ T* s.t. exactly one of xz and yz is in L(A). We say that z distinguishes x and y w.r.t. A.
This means that with x and y as input, A must end in different states: A has to distinguish x and y in order to give the right results for xz and yz. This is used to prove the next result:
Theorem: (proof omitted)
Let L ⊆ T*. If there is a set of n elements of T* s.t. any two of its elements are distinguishable w.r.t. L, then any FSA that recognises L must have at least n states.
Applying the theorem (1)
L3 (above) = the set of bitstrings containing 00
Distinguishable: all of {11, 10, 00}:
{11,10}: take z = 0: 100 is in L, 110 is out;
{11,00}: take z = 1: 001 is in L, 111 is out;
{10,00}: take z = 1: 001 is in L, 101 is out.
{11,10,00} has 3 elements; hence any DFSA for L3 requires at least 3 states.
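These three claims are easy to check mechanically (a sketch; membership in L3 is just a substring test, by the definition of L3):

```python
def contains_00(s):
    """Membership test for L3: does s contain the substring 00?"""
    return "00" in s

# For each pair, the distinguishing string z claimed above:
claims = [("11", "10", "0"),   # 110 is out, 100 is in
          ("11", "00", "1"),   # 111 is out, 001 is in
          ("10", "00", "1")]   # 101 is out, 001 is in

for x, y, z in claims:
    # z distinguishes x and y: exactly one of xz, yz is in L3
    assert contains_00(x + z) != contains_00(y + z)
print("all three pairs are distinguishable")
```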
Applying the theorem (2)
L5 again: equal numbers of 0 and 1
• n=2: {01, 001} — need 2 states
• n=3: {01, 001, 0001} — need 3 states
• n=4: {01, 001, 0001, 00001} — need 4 states
• …
For any finite n, there's a set of n elements that are pairwise distinguishable, so the FSA for L5 would need more than any finite number of states (which is not permitted)!
A taste of the theory of Formal Languages
• This theorem tells you something about the kind of automaton (in terms of its number of states) that’s required given a particular kind of problem (i.e., a particular kind of language)
• It also tells you that certain languages cannot be accepted by any FSA
*Deterministic* FSAs
Note:
(i) there are no Λ-labelled edges (no edges labelled with the empty string);
(ii) for any pair of state and symbol (q,t), there is at most one edge (q,t,p); and
(iii) there is only one initial state.
DFSA and NDFSA stand for deterministic and non-deterministic FSA respectively.
At home: revise the old definition of FSA, to become the definition of a DFSA.
All three conditions must hold
Note: acceptance (of a string or a language) has been defined in a declarative (i.e., non-procedural) way.
Automata with Output (sketch)
We now have ways to formally define languages, and ways to automatically test whether a given string is a member of a language.
We also have a simple model of a computer.
Can we extend this model, so that instead of simply replying "yes" or "no" when presented with input, our abstract machine writes some output from an alphabet, determined by the input string?
Moore Machines
A Moore Machine is a 6-tuple (Q, I, T, E, Γ, O), where Q, I, T and E are as for DFSAs, Γ is an alphabet (the output alphabet), and O is a set of pairs (a function O : Q → Γ) defining the output corresponding to each state of the machine.
Graph: if (q, x) ∈ O, then draw state q as q/x.
Example: print out a 1 every time an aab substring is input.
[Diagram: a Moore machine with states 0, 1, 2, 3, where state i records how much of the substring aab has just been read; the state reached on completing aab outputs 1, and the other states output nothing.]
aaababaaab gives output of 11.
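The per-state outputs of the diagram did not survive extraction, so the following sketch assumes states 0–2 output the empty string and state 3 outputs "1", which reproduces the slide's example:

```python
# Moore machine that emits a 1 every time the substring aab is read.
# State i = "how much of aab has just been matched".
E = {
    (0, 'a'): 1, (0, 'b'): 0,
    (1, 'a'): 2, (1, 'b'): 0,
    (2, 'a'): 2, (2, 'b'): 3,   # aa already seen; b completes aab
    (3, 'a'): 1, (3, 'b'): 0,
}
OUT = {0: "", 1: "", 2: "", 3: "1"}   # output attached to each state

def moore(w):
    q = 0
    out = OUT[q]          # a Moore machine also outputs in its initial state
    for t in w:
        q = E[(q, t)]
        out += OUT[q]
    return out

print(moore("aaababaaab"))   # -> 11
```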
Mealy Machines
A Mealy Machine is a 6-tuple (Q, I, T, E, Γ, O), where Q, I, T and E are as for DFSAs, Γ is an alphabet (the output alphabet), and O is a set of triples (a function O : Q × T → Γ) defining the output corresponding to each (state, symbol) pair of the machine.
Graph: if (q, t, x) ∈ O, then draw it as an arc from q labelled t/x.
Example: read in a binary number (reversed), and output the (reversed) number one larger.
[Diagram: two states; the initial "carry" state loops on 1/0 and moves to the second state on 0/1; the second state loops on 0/0 and on 1/1.]
11101 input gives 00011 as output.
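A sketch of this incrementer (my reconstruction; the state names "carry" and "done" are assumptions): the input is a binary numeral written least-significant bit first, and the output is the numeral for that number plus one, also least-significant bit first:

```python
# Mealy machine: read a reversed binary number, output the reversed
# binary number one larger. Each edge q --t/x--> q' emits x on reading t.
E = {
    ('carry', '1'): ('carry', '0'),   # 1 + carry = 0, carry propagates
    ('carry', '0'): ('done',  '1'),   # 0 + carry = 1, carry absorbed
    ('done',  '0'): ('done',  '0'),   # no carry left: copy the bit
    ('done',  '1'): ('done',  '1'),
}

def mealy(w):
    q, out = 'carry', ""
    for t in w:
        q, x = E[(q, t)]
        out += x
    return out

print(mealy("11101"))   # -> 00011
```

(10111 in binary is 23; 24 is 11000, which reversed is 00011, matching the slide.) One limitation worth noticing: since a Mealy machine emits exactly one symbol per input symbol, an all-ones input loses its final carry.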
Moore-Mealy Equivalence
Let M be a Moore machine or a Mealy machine, with output alphabet Γ. Define Mo(w) to be the output of M on w.
Let M1 = (Q1, I1, T1, E1, Γ1, O1) be a Moore machine, and M2 = (Q2, I2, T2, E2, Γ2, O2) be a Mealy machine. Let M1o(Λ) = b (the output symbol of M1's initial state).
M1 and M2 are equivalent if T1 = T2 and, for all strings w ∈ T1*, M1o(w) = b M2o(w) (i.e. the Moore output is the Mealy output with b prepended).
Theorem: Moore-Mealy Equivalence
If M1 is a Moore machine, then there exists aMealy machine, M2, equivalent to M1.
If M2 is a Mealy machine, then there exists aMoore machine, M1, equivalent to M2.
Varieties of FSAs
• FSAs with output have applications e.g. in the design of machines (e.g. a vending machine: reads pounds, pennies, etc.; outputs a chocolate bar); coding and decoding of messages; etc.
Many other variations on the theme of FSAs exist:
• Markov models (FSAs whose edges have probabilities)
• Hidden Markov models (with hidden states)
These have many applications, e.g. in Natural Language Processing.
Markov Models
Markov Models model the probability of one state following another. (Assumption: only the previous state matters.)
Example: a simple weather model (transition diagram not reproduced here).
Applications of Markov models
• In Natural Language Processing, Markov Models are often used
• For instance, they can model how a word is pronounced as phones (“speech sounds”)– State transition model, where each state is a phone
• Probability of a path through the model is the product of the probabilities of the transitions
Markov models for word pronunciation
Pronunciation Example
• Suppose this string of Phones has been recognized: [aa n iy dh ax]
(Assume we know this for sure)
• Three different utterances (i.e., strings of words) may explain what was recognized:
1. I/[aa] need/[n iy] the/[dh ax]
2. I/[aa] the/[n iy] the/[dh ax]
3. On/[aa n] he/[iy] the/[dh ax]
Probabilities (based on the Markov Model)
Probabilities of paths:
• I/[aa] need/[n iy] the/[dh ax]: .2 × .12 × .92 × .77 = .017
• I/[aa] the/[n iy] the/[dh ax]: .2 × .08 × .12 × .92 × .77 = .001
• On/[aa n] he/[iy] the/[dh ax]: 1 × .1 × .92 × .77 = .071
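The arithmetic above can be reproduced in a few lines of Python: the probability of a path is just the product of its transition probabilities, and the paths can then be ranked by that product:

```python
from math import prod

# Transition probabilities of the three candidate paths, as on the slide.
paths = {
    "I need the": [.2, .12, .92, .77],
    "I the the":  [.2, .08, .12, .92, .77],
    "On he the":  [1, .1, .92, .77],
}

ranked = sorted(paths.items(), key=lambda kv: prod(kv[1]), reverse=True)
for words, ps in ranked:
    print(f"{words}: {prod(ps):.3f}")
# On he the: 0.071
# I need the: 0.017
# I the the: 0.001
```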
Ranked Probabilities
Ranked list of paths:
• On he the (.071)
• I need the (.017)
• I the the (.001)
• etc.
A system would only keep the top N.
• These probabilities can help a speech recognition system
• Other information is needed as well:– Probability of each phone– Probability of each word– Probability that one word follows another
(Can “on he the” be part of an English sentence?)
More about this in level 4 (NLP course).
Footnote (not for exam) Hidden Markov models
• States S1, ..., Sn cannot be observed
• Observations O1, ..., Om can be made
• Given an observation O, each state has a probability
• But the probability of a state depends on the previous state as well
• For those of you who know them: this makes Hidden Markov Models a special case of a Bayesian network
Footnote (not for exam): Hidden Markov Model Definition
• Hidden Markov Model = (A, B, π)
• A = {aij}: state transition probabilities
  aij = P(qt+1 = Sj | qt = Si)
• π = {πi}: initial state distribution
  πi = P(q1 = Si)
• B = {bi(v)}: observation probability distribution
  bi(v) = P(Ot = v | qt = Si)
Summing up
• Finite automata are flexible tools
  – Important in practical applications
• DFSAs are relatively simple automata
  – Variants of DFSAs can produce output (and/or use probabilities)
• Later in this course: more complex automata called Turing Machines (TMs)
  – Some variants of TMs produce output