Lecture Slides for MAT-73006 Theoretical computer science

PART Ib: Automata and Languages. Context-Free languages

Henri Hansen

January 26, 2015


Context-free languages

• There are several very simple languages that are not regular, such as {0^n 1^n | n ≥ 0}

• They are "simple" to describe mathematically, but computationally the situation is different

• An important class of languages is context-free languages.

• We shall explore a way of describing these languages, called context-free grammars.

• An important area of application for these grammars is found in programming languages

Context-free grammar

• Let us start with an example of a grammar:

A → 0A1

A → B

B → #

• These three rules are substitution rules. The left-hand side of each rule contains a variable, and the right-hand side contains a string consisting of variables and terminal symbols

• Terminal symbols are symbols of the language that is being defined, i.e., Σ is the set of terminal symbols

• A grammar describes a language by generating the strings in the language. This happens by the following procedure:

1. Write down the start variable. Unless otherwise stated, it is the left-hand side of the topmost rule

2. Find a variable that has been written down, and a rule that has this variable as its left-hand side. Replace the written down variable with the right-hand side of the rule

3. Repeat step 2 until no variables remain.

• For example, the example grammar can generate the string 000#111

• The sequence of substitutions that results in the string is called a derivation.

• A derivation can also have a graphic representation as a parse tree.

• The set of strings that can be generated by a given grammar is called the language of the grammar.
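The generating procedure can be sketched in Python for the example grammar A → 0A1, A → B, B → #. The function name derive and the encoding of the rules as a dictionary are assumptions made for this illustration; the sketch always rewrites the leftmost variable, which is one of several possible orders.

```python
# Rules of the example grammar; the dictionary keys are the variables.
RULES = {"A": ["0A1", "B"], "B": ["#"]}

def derive(choices, start="A"):
    """Apply the chosen rules to the leftmost variable, one per step.

    `choices` lists, for each step, the index of the rule to use for the
    leftmost variable. Returns every intermediate string of the derivation.
    """
    current = start
    steps = [current]
    for choice in choices:
        # Step 2 of the procedure: find a variable that has been written down.
        pos = next(i for i, symbol in enumerate(current) if symbol in RULES)
        rhs = RULES[current[pos]][choice]
        current = current[:pos] + rhs + current[pos + 1:]
        steps.append(current)
    return steps

# Use A → 0A1 three times, then A → B, then B → #.
derivation = derive([0, 0, 0, 1, 0])
print(" ⇒ ".join(derivation))
# A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
```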

A more complicated example

〈SENTENCE〉 → 〈NOUN-PHRASE〉 〈VERB-PHRASE〉

〈NOUN-PHRASE〉 → 〈CMPLX-NOUN〉 | 〈CMPLX-NOUN〉 〈PREP-PHRASE〉

〈VERB-PHRASE〉 → 〈CMPLX-VERB〉 | 〈CMPLX-VERB〉 〈PREP-PHRASE〉

〈PREP-PHRASE〉 → 〈PREP〉 〈CMPLX-NOUN〉

〈CMPLX-NOUN〉 → 〈ARTICLE〉 〈NOUN〉

〈CMPLX-VERB〉 → 〈VERB〉 | 〈VERB〉 〈NOUN-PHRASE〉

〈ARTICLE〉 → a | the

〈NOUN〉 → boy | girl | flower

〈VERB〉 → likes | sees | touches

〈PREP〉 → with

Formal definition of CFG

• A context-free grammar is a 4-tuple (V, Σ, R, S), where

1. V is a finite set, whose elements are called variables

2. Σ is a finite set, disjoint from V, whose elements are called terminals (also known as the alphabet)

3. R is a finite set of rules, a rule being a pair (v, σ) where v is a variable and σ is a string of variables and terminals; also written as v → σ

4. S ∈ V is the start variable

• If u, v and w are strings of variables and terminals, and A → w is a rule of the grammar, then uAv yields the string uwv, written uAv ⇒ uwv.

• We say that u derives v, written u ⇒* v, if u = v or if there is some sequence u ⇒ u_1 ⇒ u_2 ⇒ · · · ⇒ u_k ⇒ v

• The language of the grammar is the set {w ∈ Σ* | S ⇒* w}
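The definition of the language of a grammar can be explored with a small search over sentential forms. The sketch below is an illustration under stated assumptions: variables are single characters, no rule has an empty right-hand side (so forms never shrink and the length bound is safe), and the name language_up_to is hypothetical. It only expands the leftmost variable, which is enough to reach every terminal string.

```python
from collections import deque

def language_up_to(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    rules maps a variable (a single character) to a list of right-hand sides.
    Assumption: no rule has an empty right-hand side, so sentential forms
    never shrink and anything longer than max_len can be discarded.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Locate the leftmost variable, if any.
        pos = next((i for i, s in enumerate(form) if s in rules), None)
        if pos is None:
            words.add(form)          # only terminals left: start ⇒* form
            continue
        for rhs in rules[form[pos]]:
            nxt = form[:pos] + rhs + form[pos + 1:]   # one step uAv ⇒ uwv
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words

rules = {"A": ["0A1", "B"], "B": ["#"]}
print(sorted(language_up_to(rules, "A", 5), key=len))
# ['#', '0#1', '00#11']
```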


Examples of CFGs.

• Often we write a CFG by simply giving the rules; the variables are the symbols that appear on the left-hand sides and the others are terminals.

• S → aSb | SS | ε (think of a as "(" and b as ")")

E → E + T | T

T → T × F | F

F → (E) | n

where the alphabet is {n, +, ×, (, )}

• A compiler of a programming language translates code into another form; CFGs are used, for instance, in describing programming language syntax

• The process by which the meaning of a string is found by relating it to a grammar is known as parsing.

Ambiguity

• Consider the grammar rule E → E + E | E × E | (E) | a. There are several parse trees for strings such as a + a × a

• Definition: A grammar is ambiguous if some string of its language has two or more different parse trees (equivalently, two or more leftmost derivations)

• Ambiguity makes (unique) parsing impossible, so one should strive to describe languages unambiguously whenever possible.

• Some languages are inherently ambiguous, i.e., all grammars that generate them are ambiguous
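The ambiguity of E → E + E | E × E | (E) | a can be made concrete by counting parse trees. The following is a minimal sketch; the function name count_parse_trees and the string encoding without spaces are assumptions for this illustration.

```python
from functools import lru_cache

def count_parse_trees(s):
    """Count parse trees of s under E → E + E | E × E | (E) | a."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving s[i:j] from E.
        if j - i == 1:
            return 1 if s[i] == "a" else 0            # E → a
        total = 0
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)              # E → (E)
        for k in range(i + 1, j - 1):
            if s[k] in "+×":                          # E → E + E or E → E × E
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(s))

print(count_parse_trees("a+a×a"))   # 2
```

The two trees correspond to the readings (a + a) × a and a + (a × a); a count greater than 1 for any string witnesses that the grammar is ambiguous.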


Pushdown automata

• Regular languages were defined as languages that are recognized by some finite automaton

• Context-free languages can similarly be recognized by a certain kind of automaton. Due to the recursive nature of context-free languages, some form of memory is needed.

• Informally, pushdown automata are like nondeterministic finite automata, but instead of simply moving from one state to another, they use a stack to store information about what the automaton has done in the past, and this information affects what the automaton does next

• When a pushdown automaton is in a given state, it responds to the symbol that is read from the input, and to the symbol that is on top of the stack.

• Let Σ_ε denote the set Σ ∪ {ε} (and similarly Γ_ε = Γ ∪ {ε})

• Formally: A pushdown automaton is a 6-tuple (Q, Σ, Γ, δ, q_0, F), where

1. Q is the (finite) set of states

2. Σ is the input alphabet

3. Γ is the stack alphabet

4. δ : Q × Σ_ε × Γ_ε → 2^{Q × Γ_ε} is the nondeterministic transition function

5. q_0 ∈ Q is the start state

6. F ⊆ Q is the set of accept states

• A pushdown automaton (PDA) M = (Q, Σ, Γ, δ, q_0, F) accepts an input a_1 · · · a_n (where a_i ∈ Σ_ε) if and only if there is some sequence of states q_0 q_1 · · · q_n and a sequence of strings g_0, g_1, . . . , g_n ∈ Γ* such that the following conditions are met:

1. g_0 = ε, i.e., the automaton starts with an empty stack

2. for 0 ≤ i ≤ n − 1 we have (q_{i+1}, x) ∈ δ(q_i, a_{i+1}, y), g_i = yt and g_{i+1} = xt for some x, y ∈ Γ_ε and t ∈ Γ*; i.e., the content of the stack is the same after the move, except possibly the topmost element

3. q_n ∈ F

• To understand the transition function: if (q_{i+1}, x) ∈ δ(q_i, a_{i+1}, y), then this transition can be executed if y is on top of the stack, the automaton is in state q_i and the next input symbol is a_{i+1}. After it is executed, y has been removed from the stack and x put on top, and the automaton has moved to state q_{i+1}

Example

• Consider the language {a^i b^j c^k | i = j or i = k}, i.e., either the number of bs or the number of cs is the same as the number of as.

• Informally, it is relatively easy to describe a PDA that accepts the language: First read all as, pushing a counter onto the stack for each. Then, nondeterministically choose to count either the bs or the cs and match their number with the as.

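[Figure: state diagram of a PDA for this language with states q_0, . . . , q_6. Transition labels in the diagram: ε, ε → $; ε, ε → ε (three times); ε, $ → ε (twice); a, ε → a; b, ε → ε; c, a → ε; b, a → ε; c, ε → ε]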
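A PDA for {a^i b^j c^k | i = j or i = k} can be simulated with a breadth-first search over configurations (state, input position, stack). This is a sketch under assumptions: the assignment of roles to the states q0, . . . , q6 in the table below is hypothetical, chosen to follow the informal description (read the as, then guess which letter to match), and the names accepts, EPS and ACCEPT are made up for the illustration. The search terminates here because this automaton has no ε-loops; that is not guaranteed for arbitrary PDAs.

```python
from collections import deque

EPS = ""   # stands for ε

def accepts(delta, start, accept_states, word):
    """Nondeterministic PDA simulation by breadth-first search.

    delta maps (state, input symbol or EPS, popped symbol or EPS) to a set of
    (next state, pushed symbol or EPS). The stack is a string whose last
    character is the top.
    """
    start_cfg = (start, 0, "")
    seen = {start_cfg}
    queue = deque([start_cfg])
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(word) and state in accept_states:
            return True
        for (q, a, y), targets in delta.items():
            if q != state:
                continue
            if a != EPS and (pos >= len(word) or word[pos] != a):
                continue
            if y != EPS and not stack.endswith(y):
                continue
            rest = stack[:len(stack) - len(y)]              # pop y (if any)
            for nxt_state, x in targets:
                cfg = (nxt_state, pos + len(a), rest + x)   # push x (if any)
                if cfg not in seen:
                    seen.add(cfg)
                    queue.append(cfg)
    return False

# Hypothetical state assignment for the example language.
delta = {
    ("q0", EPS, EPS): {("q1", "$")},                # mark the bottom of the stack
    ("q1", "a", EPS): {("q1", "a")},                # push one a per input a
    ("q1", EPS, EPS): {("q2", EPS), ("q4", EPS)},   # guess i = j or i = k
    ("q2", "b", "a"): {("q2", EPS)},                # i = j: match bs against as
    ("q2", EPS, "$"): {("q3", EPS)},
    ("q3", "c", EPS): {("q3", EPS)},                # then skip the cs
    ("q4", "b", EPS): {("q4", EPS)},                # i = k: skip the bs
    ("q4", EPS, EPS): {("q5", EPS)},
    ("q5", "c", "a"): {("q5", EPS)},                # match cs against as
    ("q5", EPS, "$"): {("q6", EPS)},
}
ACCEPT = {"q3", "q6"}

print(accepts(delta, "q0", ACCEPT, "aabbc"))   # True  (i = j)
print(accepts(delta, "q0", ACCEPT, "aabcc"))   # True  (i = k)
print(accepts(delta, "q0", ACCEPT, "aabc"))    # False
```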


Equivalence

• Pushdown automata and context-free grammars are equivalent in the same way as regular expressions and finite automata are:

• Theorem: A language is context-free if and only if there is a pushdown automaton that recognizes it

• First we explain how to prove the "only if" direction. Let A be a context-free language. By definition, it has a CFG, say G, that generates it

• The idea of the proof is as follows: We construct a nondeterministic PDA that, when reading an input, "guesses" what substitutions are needed for a given string.

1. Initially, the PDA puts the start variable on the stack

2. After this, the automaton always looks at the top symbol of the stack. If it is a variable, then it nondeterministically chooses a rule to apply, removes the variable and pushes the right-hand side of the rule in its place (in reverse order, so that the leftmost symbol ends up on top)

3. If the top symbol is a terminal, then it compares it to the next input symbol. If the symbols differ, this branch rejects; otherwise the top symbol is removed and the input symbol is consumed.

4. If the stack is empty when the input ends, the automaton accepts.

• Please verify that the automaton accepts exactly the strings that are generated by the grammar!
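The four steps of the construction can be tried out by simulating the resulting PDA directly on a grammar. The following is a sketch, not the formal construction: the function name pda_from_grammar_accepts is hypothetical, the stack is kept as a string whose first character is the top, and the search assumes that no rule has an empty right-hand side so that stacks longer than the remaining input can be discarded.

```python
from collections import deque

def pda_from_grammar_accepts(rules, start, word):
    """Simulate the PDA built from a grammar on `word`.

    rules maps each variable (one character) to its right-hand sides.
    The stack is a string whose first character is the top.
    Assumption: no rule has an empty right-hand side, so every stack symbol
    must still consume at least one input symbol; this bounds the search.
    """
    start_cfg = (0, start)                  # step 1: push the start variable
    seen = {start_cfg}
    queue = deque([start_cfg])
    while queue:
        pos, stack = queue.popleft()
        if not stack:
            if pos == len(word):            # step 4: empty stack at end of input
                return True
            continue
        top, rest = stack[0], stack[1:]
        successors = []
        if top in rules:
            # step 2: replace the variable by a nondeterministically chosen
            # right-hand side, leftmost symbol on top.
            successors = [(pos, rhs + rest) for rhs in rules[top]]
        elif pos < len(word) and word[pos] == top:
            # step 3: matching terminal, consume it and pop.
            successors = [(pos + 1, rest)]
        for cfg in successors:
            if len(cfg[1]) <= len(word) - cfg[0] and cfg not in seen:
                seen.add(cfg)
                queue.append(cfg)
    return False

rules = {"A": ["0A1", "B"], "B": ["#"]}
print(pda_from_grammar_accepts(rules, "A", "000#111"))   # True
print(pda_from_grammar_accepts(rules, "A", "00#1"))      # False
```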

• The other direction is proven by constructing a context-free grammar from the transition relation of a PDA

• Given a PDA P, three modifications are made:

1. It will contain only one accepting state, q_a. This is not a problem, because nondeterminism is allowed

2. The automaton only accepts after it has emptied the stack. This is not a restriction either

3. Every transition either pushes a symbol onto the stack (but does not remove one) or removes a symbol from the stack (but does not add one). Again, this is not a restriction, because transitions can be "split" into two.

• The PDA is then used as a recipe for creating a grammar that generates exactly the language that is accepted by the PDA; let p be the first state and q be the last state (the unique accept state).

• When P is computing on a string, say x, conditions 2 and 3 require that the first operation adds a symbol to the stack and the last operation removes one. If the two symbols are different, then the stack must have been empty at some point in between (why?)

• If the symbols are the same, we create the rule A_{pq} → a A_{rs} b, where a is the input read at the first move, b is the input read at the last move, r is the state after the first move and s is the state before the last move.

• If the symbols are not the same, then there is some state r in which the stack is empty. We create a rule A_{pq} → A_{pr} A_{rq}, and so on.

• To formalize the proof, let (Q, Σ, Γ, δ, q_0, {q_a}) be a PDA (after the modifications)

1. For each p, q, r, s ∈ Q, u ∈ Γ and a, b ∈ Σ_ε, if δ(p, a, ε) contains (r, u) and δ(s, b, u) contains (q, ε), put the rule A_{pq} → a A_{rs} b in G

2. For each p, q, r ∈ Q put the rule A_{pq} → A_{pr} A_{rq} in G

3. Finally, for each p ∈ Q put the rule A_{pp} → ε in G
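The three rule-generating steps are mechanical enough to sketch in code. In this illustration a variable A_{pq} is represented by the tuple ("A", p, q), and the PDA in the usage part is a hypothetical automaton for {0^n 1^n | n ≥ 0} that is already in the modified form (one accept state q3, empty stack on acceptance, every transition either pushes or pops). The names grammar_from_pda and EPS are assumptions.

```python
from itertools import product

EPS = ""   # stands for ε

def grammar_from_pda(states, delta):
    """Generate the rules of G from a PDA in the modified form.

    delta maps (state, input symbol or EPS, popped symbol or EPS) to a set of
    (next state, pushed symbol or EPS). A variable A_pq is represented by the
    tuple ("A", p, q); a right-hand side is a tuple of symbols.
    """
    rules = set()
    pushes = [(p, a, r, u) for (p, a, y), targets in delta.items() if y == EPS
              for (r, u) in targets if u != EPS]
    pops = [(s, b, u, q) for (s, b, u), targets in delta.items() if u != EPS
            for (q, x) in targets if x == EPS]
    # Rule type 1: a push of u from p to r paired with a pop of u from s to q.
    for (p, a, r, u), (s, b, v, q) in product(pushes, pops):
        if u == v:
            rhs = tuple(sym for sym in (a, ("A", r, s), b) if sym != EPS)
            rules.add((("A", p, q), rhs))
    # Rule type 2: split a computation at an intermediate state r.
    for p, q, r in product(states, repeat=3):
        rules.add((("A", p, q), (("A", p, r), ("A", r, q))))
    # Rule type 3: the empty computation.
    for p in states:
        rules.add((("A", p, p), ()))
    return rules

# Hypothetical modified-form PDA for {0^n 1^n | n >= 0}.
delta = {
    ("q0", EPS, EPS): {("q1", "$")},
    ("q1", "0", EPS): {("q1", "0")},
    ("q1", "1", "0"): {("q2", EPS)},
    ("q1", EPS, "$"): {("q3", EPS)},
    ("q2", "1", "0"): {("q2", EPS)},
    ("q2", EPS, "$"): {("q3", EPS)},
}
rules = grammar_from_pda(["q0", "q1", "q2", "q3"], delta)
print(len(rules))   # 72
```

Most of the 72 rules come from step 2 (4 × 4 × 4 = 64 of them) and are useless; only the four rules from step 1 carry terminals, for example A_{q1 q2} → 0 A_{q1 q2} 1.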

• Lemma: If A_{pq} generates x, then P has an execution from p (with empty stack) to q (with empty stack) reading x.

• This can be proven by induction on the number of steps in the derivation of x

1. If the derivation of x happens in one step, then the right-hand side of the rule used contains no variables, only terminals. The only such rule generated by this construction is A_{pp} → ε; hence x must be the empty string

2. Assume the claim holds for all derivations with at most k steps. If A_{pq} ⇒* x in k + 1 steps, the first step is either A_{pq} → a A_{rs} b or A_{pq} → A_{pr} A_{rq}. In both cases the variables on the right-hand side derive their parts of x in at most k steps, so the induction hypothesis applies to them.

• Lemma: If P has an execution reading x from p to q (with empty stack at both ends), then A_{pq} generates x.

• This again is done by induction, now on the number of steps in the computation:

1. If the computation contains 0 steps, the automaton cannot read any symbols, so x is the empty string and the automaton stays in state p. The rule A_{pp} → ε generates x.

2. The inductive step is as before.

Non-context-free languages

• There are languages that are neither regular nor context-free.

• There is a lemma, similar to the pumping lemma for regular languages, for context-free languages:

• If A is a context-free language, then there is a number p such that, if s ∈ A with |s| ≥ p, then s can be divided into 5 parts s = wvxyz such that

1. w v^i x y^i z ∈ A for every i ≥ 0,

2. |vy| > 0, and

3. |vxy| ≤ p

• Proof: Let A be a CFL. Then it has a grammar G that generates it. Let s be a "very long" string of the language.

• Because s is "very long" (longer than p), its derivation will use (at least) one of the variable symbols more than once on (at least) one branch of the derivation tree (please compare to the pumping lemma for regular languages!). Let this variable be called R.

• Let x be the string that is derived from the last occurrence of R on that branch, and let the occurrence before the last derive vxy.

• Then, we can replace the subtree of the last occurrence of R with a copy of the subtree of the second-to-last occurrence

• Therefore, instead of vxy, we derive vvxyy.

• This can be done arbitrarily many times over.

Examples of non CF-languages.

• The language {a^n b^n c^n | n ≥ 0} is not context-free.

• The language {a^i b^j c^k | 0 ≤ i ≤ j ≤ k} is not context-free

• The language {ww | w ∈ {0,1}*} is not context-free
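For the first example, the way the pumping lemma fails can be checked by brute force for one small value of p. The sketch below tries every split s = wvxyz of s = a^p b^p c^p with |vy| > 0 and |vxy| ≤ p and tests the pumped strings for i = 0 and i = 2 only. It illustrates the argument for a single p and is not a proof; the function names in_language and pumpable_splits are assumptions.

```python
def in_language(s):
    """Membership test for {a^n b^n c^n | n >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

def pumpable_splits(s, p):
    """Return all splits s = w v x y z satisfying conditions 2 and 3 whose
    pumped strings for i = 0 and i = 2 stay in the language."""
    found = []
    n = len(s)
    for i1 in range(n + 1):                                # w = s[:i1]
        for i2 in range(i1, n + 1):                        # v = s[i1:i2]
            for i3 in range(i2, n + 1):                    # x = s[i2:i3]
                # y = s[i3:i4]; the upper bound enforces |vxy| <= p
                for i4 in range(i3, min(i1 + p, n) + 1):
                    w, v, x, y, z = s[:i1], s[i1:i2], s[i2:i3], s[i3:i4], s[i4:]
                    if len(v) + len(y) == 0:
                        continue                           # condition |vy| > 0
                    if all(in_language(w + v * i + x + y * i + z) for i in (0, 2)):
                        found.append((w, v, x, y, z))
    return found

p = 4
s = "a" * p + "b" * p + "c" * p
print(pumpable_splits(s, p))   # []
```

No split survives, because vxy is too short to touch all three letters, so pumping changes the number of at most two of them.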


Deterministic CFLs.

• Deterministic and nondeterministic finite automata are equivalent, but the same does not hold for pushdown automata

• To formalize the theory, let us begin with a definition of a deterministic PDA, or DPDA.

• A deterministic pushdown automaton is a 6-tuple (Q, Σ, Γ, δ, q_0, F) such that

1. Q is a finite set of states

2. Σ is the (input) alphabet

3. Γ is the stack alphabet

4. δ : Q × Σ_ε × Γ_ε → (Q × Γ_ε) ∪ {∅} is the transition function

5. q_0 ∈ Q is the start state

6. F ⊆ Q is the set of accept states

• The transition function is furthermore required to be nonempty (i.e., different from ∅) for exactly one of the values

δ(q, a, x), δ(q, a, ε), δ(q, ε, x), δ(q, ε, ε)

for every q ∈ Q, a ∈ Σ, and x ∈ Γ.

• In other words, the automaton either reads an input symbol and moves (the first two cases) or moves without reading input (the last two), and in each situation its move is uniquely determined.

• A language accepted by a DPDA is called a deterministic context-free language.
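The "exactly one of four" condition on δ is easy to check mechanically. The following sketch represents δ as a dictionary in which a missing key plays the role of ∅; the function name is_deterministic and the transition table, written in the spirit of a DPDA for {0^n 1^n | n ≥ 0} with a hypothetical reject state qr, are assumptions for the illustration.

```python
EPS = ""   # stands for ε

def is_deterministic(delta, states, sigma, gamma):
    """Check that exactly one of δ(q,a,x), δ(q,a,ε), δ(q,ε,x), δ(q,ε,ε)
    is defined for every state q, input symbol a and stack symbol x.

    delta is a dict from (state, input or EPS, stack top or EPS) to a single
    (next state, pushed symbol or EPS); a missing key plays the role of ∅.
    """
    for q in states:
        for a in sigma:
            for x in gamma:
                candidates = [(q, a, x), (q, a, EPS), (q, EPS, x), (q, EPS, EPS)]
                if sum(key in delta for key in candidates) != 1:
                    return False
    return True

# Hypothetical transition table; qr is a reject state that consumes the rest.
delta = {
    ("q0", EPS, EPS): ("q1", "$"),   # mark the bottom of the stack
    ("q1", "0", EPS): ("q1", "0"),   # push a counter for each 0
    ("q1", "1", "0"): ("q2", EPS),   # first 1: start removing counters
    ("q1", "1", "$"): ("qr", EPS),   # input starts with 1
    ("q2", "1", "0"): ("q2", EPS),
    ("q2", "0", "0"): ("qr", EPS),   # a 0 after a 1
    ("q2", EPS, "$"): ("q3", EPS),   # all counters removed
    ("q3", "0", EPS): ("qr", EPS),
    ("q3", "1", EPS): ("qr", EPS),
    ("qr", "0", EPS): ("qr", EPS),
    ("qr", "1", EPS): ("qr", EPS),
}
states = ["q0", "q1", "q2", "q3", "qr"]
print(is_deterministic(delta, states, "01", "0$"))    # True

# Adding a second option for state q2 with $ on top breaks the condition.
broken = dict(delta)
broken[("q2", "1", "$")] = ("qr", EPS)
print(is_deterministic(broken, states, "01", "0$"))   # False
```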


Examples

• The language {0^n 1^n | n ≥ 0} is deterministic: a DPDA reads 0s, pushing a counter token for each one, until the first 1, after which it removes one counter for every 1 it reads.

• The language {a^i b^j c^k | i = j ∨ i = k} is not deterministic.

• The language of palindromes is not deterministic

• Proving determinism is relatively easy: Simply give the deterministic PDA

• Proving nondeterminism is much harder, and for that we need some more theory
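The deterministic strategy for {0^n 1^n | n ≥ 0} described above, written as a single left-to-right pass with an explicit stack. This is a sketch of the idea rather than a formal DPDA; the function name accepts_0n1n is an assumption.

```python
def accepts_0n1n(word):
    """Deterministic one-pass check for {0^n 1^n | n >= 0} using a stack."""
    stack = []
    seen_one = False
    for symbol in word:
        if symbol == "0" and not seen_one:
            stack.append("0")        # push a counter token for each 0
        elif symbol == "1" and stack:
            seen_one = True
            stack.pop()              # remove one counter for each 1
        else:
            return False             # a 0 after a 1, an unmatched 1, or a foreign symbol
    return not stack                 # every 0 must have been matched

print(accepts_0n1n("000111"))   # True
print(accepts_0n1n("0101"))     # False
```

At no point does the procedure have to guess: the next move is fixed by the current symbol and the stack, which is what distinguishes it from the PDA for {a^i b^j c^k | i = j ∨ i = k}.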


Properties of deterministic CFLs

• Lemma: Every deterministic PDA has an equivalent automaton that always reads the entire input string

– There are two ways in which a DPDA might fail to read the whole input: hanging, where the automaton is forced to pop an empty stack, and looping, where the automaton makes an endless loop of ε-reads.

– Hanging is prevented by putting a special symbol into the stack before the automaton starts; if this symbol is popped from the stack before the input ends, the automaton reads the rest of the input and rejects.

– Looping is solved by identifying loops structurally: an ε-loop is replaced by reading the entire input and rejecting

– The exception is the situation where the whole input has already been read: if accept states are visited in such a loop, the automaton should accept.

• Theorem: The class of deterministic CFLs is closed under complementation

– Swapping accept and non-accept states works for DFAs.

– DPDAs need to solve an additional problem: if the automaton enters both accepting and non-accepting states at the end of an input, it accepts even after complementation. This is solved by requiring that only states which read input are allowed to accept.

– Swapping accept/non-accept states in such a DPDA complements the language accepted.

• This yields at least one test for nondeterminism: If the complement of a given CFL is not context-free, then the language is not deterministic.

• Sometimes it is easier to look at a modified language. Let A be a language, and let ⊥ be a symbol not in the alphabet. We call A⊥ = {w⊥ | w ∈ A} the end-marked language.

• Theorem: A is a deterministic CFL if and only if A⊥ is a deterministic CFL.

– Proof of "only if": Accept states of a DPDA for A are replaced by a transition reading ⊥ and accepting.

– Proof of "if": Let P⊥ accept A⊥. Construct P as follows: If P⊥ would accept after reading ⊥ without looking at the stack, simply accept immediately. For other situations, the stack contains "two stacks" as a memory. When ⊥ would be read (and possibly accepted, depending on the stack), the behaviour of P⊥ is simulated and P accepts accordingly, but if P⊥ would reject, then the stack is "reverted".

Deterministic CFGs

• Deterministic PDAs have a counterpart in grammars, called deterministic context-free grammars.

• Deterministic CFGs and deterministic languages have some attractive properties and restrictions on how strings can be derived.

• A reduce step is a substitution in reverse. For example, if R → xyz is a rule, then xyz is reduced into R; here xyz is the reducing string. The reverse of a derivation of a string is called a reduction

• When a rule T → h is used backwards on a string xhy to produce xTy, we write xhy ↪ xTy

• A reduction from u is a sequence u = u_1 ↪ u_2 ↪ · · · ↪ u_k = S, with S as the start symbol.

• The reduction is a leftmost reduction if each reducing string is reduced only after all other reducing strings that appear to its left.

• If the rule T → h is used in a leftmost reduction in the step u_i ↪ u_{i+1}, then h (with this rule) is called the handle of u_i.

• A string that appears in a leftmost reduction (for instance, u_i) is called a valid string.

• If v = xhy is a valid string and h is its handle, we say that h is a forced handle if h is the unique handle of every valid string of the form xhz, where z ∈ Σ*.

• A context-free grammar is deterministic iff every valid string has a forced handle

• In other words, in deterministic grammars, the next reduce step depends only on the part of the string up to and including the handle, not on what follows it.
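A leftmost reduction can be traced for the earlier example grammar A → 0A1, A → B, B → #. The sketch below reduces, at each step, the occurrence of a right-hand side that ends furthest to the left. For this particular grammar every valid string contains exactly one such occurrence, so the choice coincides with the handle; for other grammars this greedy choice is an assumption that need not hold. The function name leftmost_reduction is hypothetical.

```python
RULES = [("A", "0A1"), ("A", "B"), ("B", "#")]

def leftmost_reduction(word, start="A"):
    """Reduce `word` to the start variable, one reduce step at a time.

    At each step the occurrence of a right-hand side that ends furthest to
    the left is reduced. Returns the list of strings in the reduction, or
    None if the reduction gets stuck before reaching the start variable.
    """
    steps = [word]
    current = word
    while current != start:
        candidates = []
        for lhs, rhs in RULES:
            at = current.find(rhs)
            if at != -1:
                candidates.append((at + len(rhs), at, lhs, rhs))
        if not candidates:
            return None                                # no reduce step applies
        end, at, lhs, rhs = min(candidates)
        current = current[:at] + lhs + current[end:]   # xhy ↪ xTy
        steps.append(current)
    return steps

print(" ↪ ".join(leftmost_reduction("00#11")))
# 00#11 ↪ 00B11 ↪ 00A11 ↪ 0A1 ↪ A
```

In each string of this reduction the handle is determined without looking at the symbols to its right, which is the property that forced handles require.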

• This does not immediately give us a way of detecting determinism, but there is one test that we can derive from it.

The DK-test

• For any CFG G we can construct a deterministic finite automaton DK that identifies handles. Specifically, DK accepts z if

1. z is the prefix of some valid string v = zy and

2. z ends with a handle of v

• We first define a nondeterministic automaton, K

1. Let J be an NFA that accepts any string that ends with the right-hand side of some grammar rule

2. In any accepting run of J, it "follows" the right-hand side of a rule. Let us denote this so-called "rule-state" by B → u.v, where the dot marks that the automaton has read u and v has not yet been read. Then the rule-state B → uv. is accepting.

3. K works like J but with slight modifications.

4. For every rule-state B → u.Cv there is an ε-transition to every rule-state with C as the left-hand side that has not read anything yet, i.e., to C → .r.

• Lemma: K may enter state T → u.v on reading z if and only if z = xu and xuvy is a valid string with handle uv and reducing rule T → uv, for some y ∈ Σ*.

• Proof should be obvious from the construction

• Corollary: K may enter accept state T → h. on input z if and only if z = xh and h is a handle of some valid string xhy with reducing rule T → h.

• This gives us the DK-test: Make K deterministic and check that every accept state contains

1. exactly one completed rule-state, and

2. no rule-state in which a terminal symbol immediately follows the dot, i.e., no rule-state of the form B → u.av for some a ∈ Σ
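The DK-test can be sketched by building the states of DK directly as sets of dotted rules (the subset construction applied to K) and then checking the two conditions. This is a minimal sketch under assumptions: symbols are single characters, a rule-state is a triple (left-hand side, right-hand side, dot position), the start state contains the undotted rules of the start variable, and side conditions on the grammar that a textbook treatment may impose are not checked. The function name dk_test is hypothetical.

```python
def dk_test(rules, start):
    """Sketch of the DK-test via the subset construction on dotted rules.

    rules is a list of (variable, right-hand side string) pairs; symbols are
    single characters and the variables are the left-hand sides.
    """
    variables = {lhs for lhs, _ in rules}

    def closure(items):
        # Follow the ε-transitions of K: from B → u.Cv to every C → .r
        items = set(items)
        todo = list(items)
        while todo:
            lhs, rhs, dot = todo.pop()
            if dot < len(rhs) and rhs[dot] in variables:
                for l2, r2 in rules:
                    if l2 == rhs[dot] and (l2, r2, 0) not in items:
                        items.add((l2, r2, 0))
                        todo.append((l2, r2, 0))
        return frozenset(items)

    def step(state, symbol):
        # Move the dot over `symbol` in every rule-state that allows it.
        moved = {(l, r, d + 1) for (l, r, d) in state
                 if d < len(r) and r[d] == symbol}
        return closure(moved)

    initial = closure({(l, r, 0) for l, r in rules if l == start})
    states, todo = {initial}, [initial]
    while todo:
        state = todo.pop()
        for symbol in {r[d] for (_, r, d) in state if d < len(r)}:
            nxt = step(state, symbol)
            if nxt not in states:
                states.add(nxt)
                todo.append(nxt)

    for state in states:
        completed = [item for item in state if item[2] == len(item[1])]
        if not completed:
            continue                     # not an accept state
        if len(completed) != 1:
            return False                 # condition 1 fails
        if any(d < len(r) and r[d] not in variables for (_, r, d) in state):
            return False                 # condition 2 fails
    return True

print(dk_test([("A", "0A1"), ("A", "B"), ("B", "#")], "A"))   # True
print(dk_test([("E", "E+E"), ("E", "a")], "E"))               # False
```

For the second grammar the state reached after reading E+E contains the completed rule-state E → E+E. together with E → E.+E, where a terminal follows the dot, so condition 2 fails.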


• Theorem: G passes the DK-test iff G is deterministic

• If G is nondeterministic, there is some valid string with a handle that is not forced. If DK is run on such a string up to the end of that handle, then DK must enter an accept state. Because the handle is not a forced handle, it is not unique, so that the accept state contains another completed rule-state, or some continuation of the current string leads to an accepting state, and the test fails.

• If the DK-test fails, then there is a valid string with two handles: either a second handle is already complete, or there is a continuation of the valid string with a different handle.

Practical applications of the theory

• Deterministic CFLs are very important in practice, because parsing of deterministic CFGs is efficient. That is why the syntax of most programming languages is given as deterministic CFGs.

• The requirement of forced handles is, however, sometimes too restrictive, because it restricts the use of intuition in designing grammars: it is not always easy to make sure all handles are forced.

• There is a slightly broader class of grammars, however, that is both practical and intuitive.

• The so-called LR(k) grammars use lookahead. The idea is that you are allowed to have nondeterminism, as long as you can resolve it by looking ahead no more than k symbols of the input before choosing the handle.

• Formally: if h is the handle of v = xhy, then we say that h is forced by a lookahead of k if h is the unique handle of every valid string xhz where y and z agree on the first k symbols.

• LR(0) grammars are deterministic: with k = 0 the condition says exactly that every handle is forced

• LR(k) grammars are grammars for which the handle of every valid string is forced by a lookahead of k.