Enter Chomsky Grammars


What has Chomsky* to do with computing?

Linguistics and computing intersect at various places:

Things that are used to create computer software—programming languages, compilers, text editors, etc.—all of these have elements of linguistics.

More importantly, any computing problem can be seen as a “language recognition” problem! Even problems that seem only remotely connected to language recognition, such as adding two numbers, can be seen as language recognition problems:

Consider L = { a^m b^n c^x | x = m + n }, that is, there is one c for each and every occurrence of an a and of a b. In fact, recognizing L amounts to adding two numbers: the a’s represent the first number, the b’s the second number, and the c’s the sum of the two (in unary form).
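The link between recognizing L and adding can be made concrete with a small sketch. It assumes that m and n may be zero (the slide does not say), and the function names `in_L` and `add_in_unary` are made up for this illustration.

```python
import re


def in_L(s: str) -> bool:
    """Return True if s has the form a^m b^n c^x with x = m + n."""
    match = re.fullmatch(r"(a*)(b*)(c*)", s)
    if match is None:
        return False  # letters out of order, or letters other than a, b, c
    m, n, x = (len(group) for group in match.groups())
    return x == m + n


def add_in_unary(m: int, n: int) -> int:
    """Add m and n using only membership questions about L.

    We keep appending c's until the string a^m b^n c^x is accepted;
    the x that works is the sum.
    """
    x = 0
    while not in_L("a" * m + "b" * n + "c" * x):
        x += 1
    return x


print(in_L("aabccc"))      # True: 2 + 1 = 3
print(in_L("aabcc"))       # False: 2 + 1 != 2
print(add_in_unary(2, 3))  # 5
```

The recognizer never "adds" explicitly in `add_in_unary`; the answer falls out of asking which strings belong to L.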

Thus, the theoretician rightly views the whole business of computing simply as “language recognition”. And, describing languages in precise ways is definitely an issue …

* Noam Chomsky is, for linguists, what Einstein is for physicists. Chomsky, a Professor of Linguistics at MIT, is also a political analyst well known for his criticism of US foreign policy.


What has Chomsky to do with computing?

What Chomsky set out to do:

“Assuming the set of grammatical sentences of English to be given, we now ask what sort of device can produce this set …”

(N. Chomsky, Syntactic Structures, 1957, section 3.1.)

Chomsky “invented” grammars, a powerful theoretical tool for generating the strings/patterns of a given type (a language), and launched the new field of Formal Languages.


Grammars are “string generators”

[Figure: a language recognizer (e.g. a finite state automaton, states 0 to 3) takes the string a a b b as input; a language generator (e.g. a grammar) produces the string a a b b as output.]


A fairy tale

God loves chickens and even numbers. He decides to create a world with just chickens, an even number of them, and to allow them to “multiply”. (At no point in time would he allow an odd number of chickens in the world.) God is intelligent (!) and uses the following trick:

He creates the first egg.

He creates the following rule:

From each egg can come out two little eggs or two chickens. (Once again, from each little egg can come out two tiny eggs or two chickens, and so on.)


The Rule

Each egg (shell) can break and let out either two chickens or two little shells, each of which can in turn let out two chickens or two tiny little shells, and so on.

[Figure: one possible “chicken world”]

Using this rule, one generates any even number of chickens.

Grammar rule that mimics bursting shells:

Shell → Shell Shell | a a

This rule generates the language L = { strings of a’s of even length (greater than 0) }.

Grammar rules operate in a way similar to the way “God’s rule” works in the fairy tale.
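To see the rule at work, here is a minimal sketch that bursts shells mechanically and collects every finished world (one with no shells left) up to a chosen size. The function name `chicken_worlds` and the size limit are assumptions made for this illustration; each chicken is written as the letter a, as in the grammar rule.

```python
from collections import deque


def chicken_worlds(max_chickens: int) -> set[str]:
    """All finished worlds (strings of a's) with at most max_chickens chickens."""
    start = ("Shell",)
    seen = {start}
    queue = deque([start])
    finished = set()
    while queue:
        world = queue.popleft()
        if "Shell" not in world:
            finished.add("".join(world))  # only chickens left
            continue
        i = world.index("Shell")  # burst the left-most shell
        for replacement in (("Shell", "Shell"), ("a", "a")):
            new_world = world[:i] + replacement + world[i + 1:]
            # No rule ever shrinks a world, so oversized worlds can be dropped.
            if len(new_world) <= max_chickens and new_world not in seen:
                seen.add(new_world)
                queue.append(new_world)
    return finished


print(sorted(chicken_worlds(6), key=len))  # ['aa', 'aaaa', 'aaaaaa']
```

However the shells are burst, an odd number of chickens never appears, because every burst adds either one shell or two chickens.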


The Two Commandments

Anyone who wishes to write grammar rules to generate a language L has to follow these “commandments”:

1. Thou shalt not produce any string that is “outside the set” (strings that don’t belong to L).

2. Thou shalt produce ALL strings that are “inside the set” (strings that belong to L).
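Both commandments can be checked by brute force for short strings. The sketch below is only a sanity check up to a length bound, not a proof. It assumes single-letter symbols, capital letters for non-terminals, and no rule with an empty right-hand side; the helper names `generated` and `check_commandments` are made up for this illustration.

```python
from itertools import product


def generated(rules, start, max_len):
    """Terminal strings of length <= max_len derivable from start.

    Assumes no rule has an empty right-hand side, so a form that is
    already too long can never shrink back and may be discarded.
    """
    seen, stack, done = {start}, [start], set()
    while stack:
        form = stack.pop()
        i = next((k for k, sym in enumerate(form) if sym in rules), None)
        if i is None:
            done.add(form)  # no non-terminal left
            continue
        for rhs in rules[form[i]]:  # expand the left-most non-terminal
            new_form = form[:i] + rhs + form[i + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                stack.append(new_form)
    return done


def check_commandments(rules, start, alphabet, belongs, max_len):
    """Return (strings breaking commandment 1, strings breaking commandment 2)."""
    produced = generated(rules, start, max_len)
    target = {
        "".join(letters)
        for n in range(max_len + 1)
        for letters in product(alphabet, repeat=n)
        if belongs("".join(letters))
    }
    return produced - target, target - produced


def even_and_nonempty(w):
    return len(w) > 0 and len(w) % 2 == 0


# S plays the role of Shell in the fairy-tale rule.
print(check_commandments({"S": ["SS", "aa"]}, "S", "a", even_and_nonempty, 6))
# A deliberately wrong grammar: it also produces odd-length strings.
print(check_commandments({"S": ["aS", "aa"]}, "S", "a", even_and_nonempty, 6))
```

The first call returns two empty sets. The second reports `aaa` and `aaaaa` as strings produced from outside the set, which breaks the first commandment.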


An example

Let Σ = { a, b }

L = { w ∈ Σ* | w is of even length }

Production rules (substitution rules):

S → S S

S → aa | ab | ba | bb

S → ε

Did you notice? … that Finite State Automata use loops/cycles to generate patterns repetitively and grammars use recursion for the same purpose.

Example:

Let’s see how the string abbaba (which is of even length) can be generated / derived from this grammar.

Derivation 1: S ⇒ S S ⇒ S S S ⇒ ab S S ⇒ abbaS ⇒ abbaba

Derivation 2: S ⇒ S S ⇒ ab S ⇒ ab S S ⇒ abbaS ⇒ abbaba

A grammar that allows two different (left-most*) derivations for the same string (as above) is not considered “good” in general. This example is used just to show how a grammar works. Also, one can easily modify this grammar to avoid such multiple derivations.

*The left-most capital-letter symbol is always expanded first.
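The two derivations of abbaba can be replayed mechanically. The sketch below assumes the rules S → S S, S → aa | ab | ba | bb and S → ε, with the empty string standing for ε; the helper name `leftmost_derivation` is made up for this illustration. Each step replaces the left-most S with the chosen right-hand side.

```python
RULES = {"S": ["SS", "aa", "ab", "ba", "bb", ""]}  # "" stands for ε


def leftmost_derivation(start, choices):
    """Apply the chosen right-hand sides, always to the left-most non-terminal."""
    form = start
    forms = [form]
    for rhs in choices:
        i = next(k for k, sym in enumerate(form) if sym in RULES)
        if rhs not in RULES[form[i]]:
            raise ValueError(f"{form[i]} -> {rhs!r} is not a rule of the grammar")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms


derivation_1 = leftmost_derivation("S", ["SS", "SS", "ab", "ba", "ba"])
derivation_2 = leftmost_derivation("S", ["SS", "ab", "SS", "ba", "ba"])

print(" ⇒ ".join(derivation_1))  # S ⇒ SS ⇒ SSS ⇒ abSS ⇒ abbaS ⇒ abbaba
print(" ⇒ ".join(derivation_2))  # S ⇒ SS ⇒ abS ⇒ abSS ⇒ abbaS ⇒ abbaba
```

Both runs follow the left-most convention and end in the same string, yet they differ in their intermediate steps. This is what "two different left-most derivations for the same string" means.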


Grammar: Definition

A grammar G is a 4-tuple (V, Σ, R, S), where

V is a finite set of non-terminals (capital letters)

Σ is a finite set of terminals (small letters)

R is a finite set of rules

S is the start symbol, a special non-terminal

V and Σ are disjoint.

All grammar rules that we develop (in this course) will have only one non-terminal symbol on the left side of “→”. A grammar with such a restriction is known as a “context-free grammar”.
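The definition translates almost directly into a data structure. The sketch below is one possible encoding, not a standard one: the class name `Grammar`, its field names, and the representation of a rule as a pair of strings (left side, right side) are all assumptions made for this illustration, and every symbol is a single character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # V
    terminals: frozenset     # Σ
    rules: tuple             # R: pairs (left side, right side)
    start: str               # S

    def is_well_formed(self) -> bool:
        """V and Σ are disjoint, S is in V, and rules use only known symbols."""
        symbols = self.nonterminals | self.terminals
        return (
            self.nonterminals.isdisjoint(self.terminals)
            and self.start in self.nonterminals
            and all(set(left + right) <= symbols for left, right in self.rules)
        )

    def is_context_free(self) -> bool:
        """Every left side is exactly one non-terminal."""
        return all(left in self.nonterminals for left, _ in self.rules)


# The even-length grammar; "" stands for ε.
even_length = Grammar(
    nonterminals=frozenset("S"),
    terminals=frozenset("ab"),
    rules=(("S", "SS"), ("S", "aa"), ("S", "ab"), ("S", "ba"), ("S", "bb"), ("S", "")),
    start="S",
)

# A hypothetical grammar with a two-symbol left side, for contrast.
not_context_free = Grammar(
    nonterminals=frozenset("SB"),
    terminals=frozenset("ab"),
    rules=(("S", "aSB"), ("S", "ab"), ("aB", "ab")),
    start="S",
)

print(even_length.is_well_formed(), even_length.is_context_free())            # True True
print(not_context_free.is_well_formed(), not_context_free.is_context_free())  # True False
```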


Exercises (see next page for answers to some of the questions)

L1 = { a^n | n >= 1 } (one or more a’s)

L2 = { a^n | n >= 1 } ∪ { b^n | n >= 1 } (one or more a’s OR one or more b’s)

L3 = { a^m b^n | m >= 0, n >= 1 } (zero or more a’s followed by one or more b’s)

L4 = {a,b}* - {ε} (any non-empty combination of a’s and b’s)

L5 = { xabb | x ∈ {a,b}* } (zero or more a’s and b’s followed by abb)

L6 = { w ∈ {a,b}* : |w| = 3 } (a’s and b’s of length = 3)

L7 = { w ∈ {a,b}* : |w| ≠ 2 } (a’s and b’s of length ≠ 2)

L8 = { w ∈ {a,b}* : |w| is even } (a’s and b’s of even length)

L9 = { w ∈ {a,b}* : |w| is odd } (a’s and b’s of odd length)

L10 = { w ∈ {a,b}* : w doesn’t have two consecutive b’s }

L11 = { w w^R | w ∈ {a,b}* } (any string followed by its reverse)

L12 = { w ∈ {a,b}* | w is a palindrome }

L13 = { w ∈ { [ , ] }* | w has balanced parentheses }

L14 = { w ∈ {a,b}* | w contains bbb }

L15 = { a^n b^n | n > 0 }

L16 = { a^n b^m c^m d^n | n, m > 0 }

L17 = { a^m b^n | m, n > 0, m < n }

L18 = { a^m b^n | m, n > 0, m ≠ n }


Examples

L1 = { a^n | n >= 1 } (one or more a’s)

S → aS | a

L2 = { a^n | n >= 1 } ∪ { b^n | n >= 1 } (one or more a’s OR one or more b’s)

S → A | B

A → aA | a

B → bB | b

L3 = { a^m b^n | m >= 0, n >= 1 } (zero or more a’s followed by one or more b’s)

S → A B

A → aA | ε

B → bB | b

L4 = {a,b}* - {ε} (any non-empty combination of a’s and b’s)

S → aS | bS | a | b

L5 = { xabb | x ∈ {a,b}* } (zero or more a’s and b’s followed by abb)

Please work it out yourself!

L6 = { w ∈ {a,b}* : |w| = 3 } (a’s and b’s of length = 3)

S → A A A

A → a | b

L7 = { w ∈ {a,b}* : |w| ≠ 2 } (a’s and b’s of length ≠ 2)

S → ε | a | b | aa A | ab A | ba A | bb A

A → aA | bA | a | b

L8 = { w ∈ {a,b}* : |w| is even } (a’s and b’s of even length)

S → aaS | bbS | abS | baS | ε
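A grammar can also be run "backwards" as a recognizer, which gives a way to test candidate answers against the set definitions. The sketch below is a naive top-down search, not an efficient parser, and it assumes that every recursive rule adds at least one terminal (true for the grammars used here), since otherwise the search may not terminate. The name `make_recognizer` is made up for this illustration, "" stands for ε, and the L7 rules are written with the four prefixes aa, ab, ba, bb.

```python
import re
from functools import lru_cache
from itertools import product


def make_recognizer(rules, start="S"):
    """Build a membership test w -> bool from single-letter grammar rules."""

    @lru_cache(maxsize=None)
    def derives(form: str, w: str) -> bool:
        # Prune: the terminals already present in the form must fit into w.
        if sum(sym not in rules for sym in form) > len(w):
            return False
        if not form:
            return w == ""
        head, rest = form[0], form[1:]
        if head not in rules:  # terminal: must match the next letter of w
            return w[:1] == head and derives(rest, w[1:])
        return any(derives(rhs + rest, w) for rhs in rules[head])

    return lambda w: derives(start, w)


def agrees(rules, belongs, max_len=6):
    """Compare grammar and set definition on all strings over {a, b} up to max_len."""
    accepts = make_recognizer(rules)
    words = ("".join(p) for n in range(max_len + 1) for p in product("ab", repeat=n))
    return all(accepts(w) == belongs(w) for w in words)


L3 = {"S": ["AB"], "A": ["aA", ""], "B": ["bB", "b"]}
L7 = {"S": ["", "a", "b", "aaA", "abA", "baA", "bbA"], "A": ["aA", "bA", "a", "b"]}
L8 = {"S": ["aaS", "bbS", "abS", "baS", ""]}

print(agrees(L3, lambda w: re.fullmatch(r"a*b+", w) is not None))  # True
print(agrees(L7, lambda w: len(w) != 2))                           # True
print(agrees(L8, lambda w: len(w) % 2 == 0))                       # True
```

Dropping the `baA` alternative from the L7 rules makes the check fail, because strings such as baa would then be missing from the language.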


More examples

L9 = { w ∈ {a,b}* : |w| is odd } (a’s and b’s of odd length)

Please work it out yourself!

L10 = { w ∈ {a,b}* : w doesn’t have two consecutive b’s }

S → baS …

(Work out the remaining rules for this language yourself!)

L14 = { w ∈ {a,b}* | w contains bbb }

S → L bbb L

L → aL | bL | ε

Some of the remaining (harder) problems will be solved in class.