MELJUN CORTES Automata Lecture Context-free Languages 1


  • 8/21/2019 MELJUN CORTES Automata Lecture Context-free Languages 1


     

    Theory of Computation (With Automata Theory)

    Context-Free Languages *Property of STI

    TOPIC TITLE: Context-Free Languages

    Specific Objectives:

     At the end of the topic session, the students are expected to:

    Cognitive:

    1. Explain how context-free grammars are more powerful tools for describing languages than finite automata and regular expressions.

    2. Derive the strings of context-free languages from context-free grammars.

    3. Differentiate leftmost derivation from rightmost derivation.

    4. Use parse trees to graphically represent the derivation of strings from context-free grammars.

    Affective:

    1. Listen to others with respect.

    2. Participate actively in class discussions.

    MATERIALS/EQUIPMENT:

    o  topic slides

    o  OHP

    TOPIC PREPARATION:

    o  Have the students review related topics that were discussed in previous courses.

    o  Prepare the slides to be presented in class.

    o  It is imperative for the instructor to incorporate various kinds of teaching strategies while discussing the suggested topics.

    o  Prepare additional examples on the topic to be presented.


    Context-Free Languages

    Context-Free Grammars

    The past lessons established that finite automata and regular expressions are tools for describing regular languages. However, they also showed that these tools have limitations: they cannot describe non-regular languages.

    Context-free grammars (CFGs) are more powerful tools than finite automata and regular expressions because they can describe every regular language as well as some (though not all) non-regular languages.

    Context-free grammars are widely used to define the syntax of programming languages, and they have important applications, particularly in compiler design.

    In linguistics, a grammar is a system of rules by which sentences are constructed by putting together words of the language.


    Example of a very simple grammar, G1, in English:

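    1. ⟨SENTENCE⟩ → ⟨NOUN-PHRASE⟩⟨PREDICATE⟩

    2. ⟨NOUN-PHRASE⟩ → ⟨ARTICLE⟩⟨NOUN⟩

    3. ⟨PREDICATE⟩ → ⟨VERB⟩

    4. ⟨ARTICLE⟩ → a

    5. ⟨ARTICLE⟩ → the

    6. ⟨NOUN⟩ → boy

    7. ⟨NOUN⟩ → girl

    8. ⟨VERB⟩ → smiles

    9. ⟨VERB⟩ → laughs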

    The example given is a set of very simple rules that can be used to construct very limited and simple sentences. As can be seen, grammar G1 has nine rules for constructing sentences.

    Rule 1 states that a sentence can be formed by using a noun phrase followed by a predicate.

    Rule 2 states that a noun phrase is formed by using an article followed by a noun.

    Rule 3 states that a predicate is formed by simply using a verb.

    Rule 4 states that an article is formed by using the word a.

    Rule 5 states that an article can also be formed by using the word the.

    Rule 6 states that a noun is formed by using the word boy.

    Rule 7 states that a noun can also be formed by using the word girl.

    Rule 8 states that a verb can be formed by using the word smiles.

    Rule 9 states that a verb can also be formed by using the word laughs.


    As an example of how a sentence can be constructed or derived from the given grammar, consider the sequence of steps in forming the sentence “the girl smiles”:

    1. First, apply Rule 1, which shows how a sentence can be constructed:

    ⟨SENTENCE⟩ → ⟨NOUN-PHRASE⟩⟨PREDICATE⟩

    According to Rule 1, a sentence is constructed by taking a noun phrase and then following it with a predicate.

    2. Next, apply Rule 2, which indicates how a noun phrase is formed. Replace the term ⟨NOUN-PHRASE⟩ from the first step by substituting it using Rule 2.

    ⟨SENTENCE⟩ → ⟨ARTICLE⟩⟨NOUN⟩⟨PREDICATE⟩

    A sentence is now constructed by first taking an article, following it with a noun, and then with a predicate.

    3. Next, apply Rule 3, which indicates how a predicate is formed. Replace the term ⟨PREDICATE⟩ from the second step by substituting it using Rule 3.

    ⟨SENTENCE⟩ → ⟨ARTICLE⟩⟨NOUN⟩⟨VERB⟩

    At this point, a sentence is constructed by first taking an article, following it with a noun, and then with a verb.

    4. The fourth step is to use Rule 5, which indicates that an article is obtained by using the word the. Replace the term ⟨ARTICLE⟩ from the third step with the word the.

    ⟨SENTENCE⟩ → the ⟨NOUN⟩⟨VERB⟩

    5. The next step is to use Rule 7, which indicates that a noun is obtained by using the word girl. Replace the term ⟨NOUN⟩ from step 4 with the word girl.

    ⟨SENTENCE⟩ → the girl ⟨VERB⟩

    6. Finally, use Rule 8, which indicates that a verb is obtained by using the word smiles. Replace the term ⟨VERB⟩ from step 5 with the word smiles.

    ⟨SENTENCE⟩ → the girl smiles
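The six steps above can be replayed mechanically. The Python sketch below is a minimal illustration, assuming a hypothetical encoding of G1 as numbered (variable, right-hand side) pairs; the ⟨…⟩ variable names and the helpers `apply_rule` and `replay` are our own, not part of the lecture. Each step replaces the first occurrence of the rule's variable, which is unambiguous here because every intermediate form contains each variable at most once.

```python
# Hypothetical encoding of grammar G1: rule number -> (variable, right-hand side).
# The ⟨…⟩ names are reconstructed from the rule descriptions in these notes.
G1_RULES = {
    1: ("⟨SENTENCE⟩", ["⟨NOUN-PHRASE⟩", "⟨PREDICATE⟩"]),
    2: ("⟨NOUN-PHRASE⟩", ["⟨ARTICLE⟩", "⟨NOUN⟩"]),
    3: ("⟨PREDICATE⟩", ["⟨VERB⟩"]),
    4: ("⟨ARTICLE⟩", ["a"]),
    5: ("⟨ARTICLE⟩", ["the"]),
    6: ("⟨NOUN⟩", ["boy"]),
    7: ("⟨NOUN⟩", ["girl"]),
    8: ("⟨VERB⟩", ["smiles"]),
    9: ("⟨VERB⟩", ["laughs"]),
}


def apply_rule(form, rule_number):
    """Replace the first occurrence of the rule's variable in `form` with its right-hand side."""
    variable, rhs = G1_RULES[rule_number]
    i = form.index(variable)  # raises ValueError if the variable is not present
    return form[:i] + rhs + form[i + 1:]


def replay(rule_numbers, start="⟨SENTENCE⟩"):
    """Return every form produced by applying the given rules in order, starting from `start`."""
    forms = [[start]]
    for n in rule_numbers:
        forms.append(apply_rule(forms[-1], n))
    return forms


# Rules 1, 2, 3, 5, 7, 8: the same order as the six steps above.
for form in replay([1, 2, 3, 5, 7, 8]):
    print(" ".join(form))
```

The last line printed is `the girl smiles`; choosing rules 4, 6, or 9 at the corresponding steps yields the other sentences of G1.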

    As shown, sentences can be constructed simply by applying the different rules of the grammar of the language. In grammar G1, the sentences that can be constructed are:

    the girl smiles

    the girl laughs

    the boy smiles

    the boy laughs

    a girl smiles

    a girl laughs

    a boy smiles

    a boy laughs

    Of course, the actual grammatical rules of the English language are much more complicated than grammar G1.
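Because G1 has no recursive rules, its whole language can be listed by expanding every variable in every possible way. A minimal sketch, assuming the same hypothetical ⟨…⟩ encoding with alternatives grouped per variable; `expand` is an illustrative helper and would not terminate on a recursive grammar such as G2.

```python
from itertools import product

# Hypothetical encoding of G1: each variable maps to its list of alternatives.
# The ⟨…⟩ names are reconstructed from the rule descriptions in these notes.
G1 = {
    "⟨SENTENCE⟩": [["⟨NOUN-PHRASE⟩", "⟨PREDICATE⟩"]],
    "⟨NOUN-PHRASE⟩": [["⟨ARTICLE⟩", "⟨NOUN⟩"]],
    "⟨PREDICATE⟩": [["⟨VERB⟩"]],
    "⟨ARTICLE⟩": [["a"], ["the"]],
    "⟨NOUN⟩": [["boy"], ["girl"]],
    "⟨VERB⟩": [["smiles"], ["laughs"]],
}


def expand(symbol):
    """Return every word sequence derivable from `symbol` (safe only because G1 is non-recursive)."""
    if symbol not in G1:  # a terminal derives only itself
        return [[symbol]]
    results = []
    for rhs in G1[symbol]:
        # Expand each right-hand-side symbol, then take every combination of the choices.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


sentences = sorted(" ".join(words) for words in expand("⟨SENTENCE⟩"))
print(len(sentences))  # 8
print("\n".join(sentences))
```

The count follows from the rules: 2 articles × 2 nouns × 2 verbs = 8 sentences.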


    In the theory of computation, a context-free grammar is used in a similar manner. In this course, a grammar is a system of rules by which strings are constructed by putting together symbols of the given alphabet.

    Example:

    Consider the grammar G2 for the language $L = \{w \in \{0,1\}^* \mid w = w^R\}$, given by:

    1. S → ε  

    2. S → 0 

    3. S → 1 

    4. S → 0S0 

    5. S → 1S1 

    Recall that this language is composed of strings that are palindromes. A palindrome is a word that reads the same forward and backward. As shown in the discussions on the pumping lemma, this language is not a regular language.

    Rules 1 to 3 of grammar G2 specify the basic ways of constructing palindromes: a palindrome may simply be the empty string ε, 0, or 1. In other words, ε, 0, and 1 are already palindromes.

    Rule 4 provides a mechanism for constructing longer palindromes. It states that a palindrome is formed by “sandwiching” another palindrome between 0s. For example, given the palindrome 11011, another palindrome may be formed by adding a 0 at both the start and the end of that string. This results in 0110110, which is still a palindrome.

    Similarly, Rule 5 states that a palindrome is formed by sandwiching another palindrome between 1s. Given the same palindrome 11011, adding a 1 at both the start and the end results in 1110111, which is still a palindrome.

    Observe that Rules 4 and 5 are recursive: the variable S appears on both sides of the arrow, much like a recursive function that calls itself.
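Since Rules 4 and 5 behave like recursive calls, membership in L(G2) can be checked by running them backwards: strip a matching outer pair, then recurse on the middle. A minimal sketch (the function name `in_g2` is hypothetical):

```python
def in_g2(w: str) -> bool:
    """True if w can be derived from S using S → ε | 0 | 1 | 0S0 | 1S1 (grammar G2)."""
    if w in ("", "0", "1"):
        return True  # Rules 1 to 3: the basic palindromes
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        # Rules 4 and 5: w has the form 0S0 or 1S1, so undo the "sandwich" and recurse.
        return in_g2(w[1:-1])
    return False


print(in_g2("1010101"), in_g2("0110"), in_g2("0111"))  # True True False
```

Each recursive call mirrors exactly one use of Rule 4 or Rule 5, and the base case mirrors Rules 1 to 3.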

    As an example of how a string of this language can be constructed or derived from the given grammar, consider the sequence of steps in forming the string 1010101:

    1. First, apply Rule 5 which shows how one string can be constructed:

    S → 1S1 

    2. Then, replace the S on the right-hand side using Rule 4:

    S → 10S01 

    3. Then, replace the S on the right-hand side using Rule 5:

    S → 101S101 

    4. Then, apply Rule 2 by replacing the S on the right-hand side with a 0:

    S → 1010101 


    Therefore, a context-free grammar is simply a set of rules that dictate how strings are formed. These rules are called substitution rules or productions.

    The left-hand side of each production is a single symbol, called a variable, followed by an arrow and a string. In grammar G1, the variables are ⟨SENTENCE⟩, ⟨NOUN-PHRASE⟩, ⟨PREDICATE⟩, ⟨ARTICLE⟩, ⟨NOUN⟩, and ⟨VERB⟩. In grammar G2, there is only one variable, S.

    The right-hand side of each production is a string composed of variables and other symbols called terminals. Variables are symbols that can be replaced, while terminals are symbols that cannot be replaced (similar to constants in mathematical equations). In grammar G1, the terminals are a, the, boy, girl, smiles, and laughs (strictly speaking, the terminals here are the letters of the English alphabet). In grammar G2, the terminals are 0 and 1. The empty string ε is not really a terminal of grammar G2; this will be explained later.

    For each grammar, one variable is designated as the start variable. It usually appears on the left-hand side of the first production, and it is where all derivations begin. For grammar G1 the start variable is ⟨SENTENCE⟩, while for grammar G2 the start variable is S.


    By following the rules of a grammar, each string of that language can be generated or derived. The procedure is:

    1. Write down the rule that contains the start variable. It is usually the first rule of the grammar. For grammar G1, all derivations have to start with Rule 1, since it is the only rule with the start variable ⟨SENTENCE⟩ on the left-hand side. For grammar G2, derivations may start with any of the rules, since all of them have the start variable S on their left-hand side. The actual rule to start with depends on the string being derived.

    2. Find a variable on the right-hand side of the rule written in step 1, and find another rule whose left-hand side is that variable. Replace (substitute) the variable with the right-hand side of that rule. An arrow (→) means “substitute”, so the rule ⟨NOUN⟩ → boy means that the variable ⟨NOUN⟩ may be substituted with the terminal boy. In this way, every variable on the right-hand side of the current production may be replaced using the appropriate rule or rules.

    3. Repeat the second step until only terminals remain on the right-hand side.

    Starting from step 1, keep substituting the variables on the right-hand side using the appropriate rules (whose right-hand sides consist of variables and/or terminals) until no variables are left. As an example, again using grammar G2 to form the string 1010101:

    1. start with rule 5 to form S → 1S1 

    2. substitute the right-hand side S with 0S0 (rule 4) to form S → 10S01 

    3. substitute the right-hand side S with 1S1 (rule 5) to form S → 101S101 

    4. substitute the right-hand side S with 0 (rule 2) to form S → 1010101 

    Since there are no more variables on the right-hand side, the substitution stops.

    The sequence of substitutions performed to obtain a string is called a derivation.
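The derivation procedure above (start from S, substitute a variable, repeat until only terminals remain) can be sketched for G2 as follows. Assumptions: the rules are keyed by the lecture's rule numbers, ε is represented by the empty string, and `step` and `derive` are illustrative names.

```python
# Grammar G2, keyed by rule number; each value is the right-hand side for S ('' stands for ε).
G2 = {1: "", 2: "0", 3: "1", 4: "0S0", 5: "1S1"}


def step(form: str, rule: int) -> str:
    """Substitute the single variable S in `form` with the right-hand side of `rule`."""
    if "S" not in form:
        raise ValueError(f"no variable left to substitute in {form!r}")
    return form.replace("S", G2[rule], 1)


def derive(rules):
    """Apply the rules in order, starting from the start variable S; return every form."""
    forms = ["S"]
    for r in rules:
        forms.append(step(forms[-1], r))
    return forms


forms = derive([5, 4, 5, 2])
print(" → ".join(forms))     # S → 1S1 → 10S01 → 101S101 → 1010101
print("S" not in forms[-1])  # True: only terminals remain, so the substitution stops
```

Because G2 has only one variable, there is never a choice of which variable to replace; that choice only arises in grammars with several variables.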


    For convenience and simplicity, rules with the same left-hand side variable may be combined into a single rule, with their right-hand side strings separated by a vertical bar (|), which means “or”.

    Hence, grammar G1 can be written as

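    1. ⟨SENTENCE⟩ → ⟨NOUN-PHRASE⟩⟨PREDICATE⟩

    2. ⟨NOUN-PHRASE⟩ → ⟨ARTICLE⟩⟨NOUN⟩

    3. ⟨PREDICATE⟩ → ⟨VERB⟩

    4. ⟨ARTICLE⟩ → a | the

    5. ⟨NOUN⟩ → boy | girl

    6. ⟨VERB⟩ → smiles | laughs

    Rule 5 now states that the variable ⟨NOUN⟩ may be replaced by the terminal boy or the terminal girl.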

    Similarly, grammar G2 can be written as

    1. S → ε | 0 | 1 | 0S0 | 1S1
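The | shorthand is only notation: splitting on | recovers the separate rules. A minimal sketch of reading that shorthand, assuming one rule per line, → as the arrow, single-character symbols, and ε written literally; `parse_rules` is a hypothetical helper.

```python
def parse_rules(text: str) -> dict[str, list[str]]:
    """Parse lines such as 'S → ε | 0 | 1' into {variable: [alternatives]}; ε becomes ''."""
    rules: dict[str, list[str]] = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("→")
        alternatives = [alt.strip() for alt in rhs.split("|")]
        # Rules that share a left-hand side are merged into one list of alternatives.
        rules.setdefault(lhs.strip(), []).extend("" if alt == "ε" else alt for alt in alternatives)
    return rules


g2 = parse_rules("S → ε | 0 | 1 | 0S0 | 1S1")
print(g2)  # {'S': ['', '0', '1', '0S0', '1S1']}

separate = parse_rules("""
S → ε
S → 0
S → 1
S → 0S0
S → 1S1
""")
print(separate == g2)  # True: the combined rule and the five separate rules are the same grammar
```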


    All strings generated or derived from a grammar G constitute the language of that grammar, which is written as L(G).

    The language of grammar G1 is then the set of all sentences that can be derived from it. Specifically,

    L(G1) = {the girl smiles, the girl laughs, the boy smiles, the boy laughs, a girl smiles, a girl laughs, a boy smiles, a boy laughs}

    On the other hand, the language of grammar G2 is the set of all strings that can be derived from it, which is the set of all palindromes formed using the symbols 0 and 1. Specifically,

    $L(G_2) = \{w \in \{0,1\}^* \mid w = w^R\}$
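To see L(G2) concretely, its strings can be generated breadth-first from S and compared with all binary palindromes. A sketch, assuming a length bound `max_len` (our own parameter) so the search stays finite; the bound is safe because each use of Rule 4 or 5 adds two terminals that are never removed. The name `generate` is illustrative.

```python
from collections import deque

# Grammar G2: the single variable S and its five alternatives ('' stands for ε).
G2_ALTERNATIVES = ["", "0", "1", "0S0", "1S1"]


def generate(max_len: int) -> set[str]:
    """Breadth-first generation of every string of L(G2) with length <= max_len."""
    found, queue, seen = set(), deque(["S"]), {"S"}
    while queue:
        form = queue.popleft()
        if "S" not in form:
            found.add(form)  # only terminals left: a string of the language
            continue
        for alt in G2_ALTERNATIVES:
            nxt = form.replace("S", alt, 1)
            # Terminals are never erased, so forms with too many terminals are pruned.
            if len(nxt.replace("S", "")) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


strings = generate(4)
print(sorted(strings, key=lambda w: (len(w), w)))
print(all(w == w[::-1] for w in strings))  # True: every generated string is a palindrome
```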

    Any language that can be generated by some context-free grammar is called a context-free language. Hence, L(G1) and L(G2) are context-free languages.

    The term "context-free" expresses the fact that variables can be rewritten without regard to the context in which they occur: for example, the rule S → 0S0 may be applied to S no matter which symbols surround it.


    Formal Definition of Context-Free Grammars

    A formal definition of context-free grammars may now be given. It is obtained by simply listing the different components of a context-free grammar (similar to the formal definition of a DFA and/or NFA).

    A context-free grammar G is a 4-tuple $(V, \Sigma, R, S)$, where

    1. $V$ is the finite set of variables,

    2. $\Sigma$ is the finite set of terminals,

    3. $R$ is the finite set of rules, with each rule being a variable on the left-hand side and a string of variables and/or terminals on the right-hand side, and

    4. $S$ is the start variable.

    Take note of the following:

    The set of variables $V$ and the set of terminals $\Sigma$ must be disjoint, that is, have no common elements ($V \cap \Sigma = \emptyset$). In other words, a terminal cannot be a variable and vice versa.

    Each rule has the form $A \to w$, where $A \in V$ and $w \in (V \cup \Sigma)^*$. This means that the left-hand side of a rule is a single variable, while its right-hand side is a string of variables and/or terminals, which may be the empty string ε (by definition of the star operation). Hence, ε may be used in productions even though it is not included in the set of terminals.

    The start variable $S$ is a member of the set of variables ($S \in V$).
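The four components and the three conditions above can be written down directly. This is a minimal sketch, assuming single-character symbols and '' for ε; the class `CFG` and its `check` method are illustrative, not a standard library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar as the 4-tuple (V, Σ, R, S).

    Assumptions of this sketch: every symbol is a single character, each rule is a
    (variable, right-hand side) pair, and the empty right-hand side '' stands for ε.
    """
    variables: frozenset  # V
    terminals: frozenset  # Σ
    rules: tuple          # R
    start: str            # S

    def check(self) -> list:
        """List every violated condition from the notes above; an empty list means valid."""
        problems = []
        if self.variables & self.terminals:
            problems.append("V and Σ are not disjoint")
        symbols = self.variables | self.terminals
        for lhs, rhs in self.rules:
            if lhs not in self.variables:
                problems.append(f"left-hand side {lhs!r} is not a variable")
            stray = [ch for ch in rhs if ch not in symbols]
            if stray:
                problems.append(f"right-hand side {rhs!r} uses symbols outside V ∪ Σ: {stray}")
        if self.start not in self.variables:
            problems.append("start variable is not in V")
        return problems


# Grammar G2 (S → ε | 0 | 1 | 0S0 | 1S1) written as a 4-tuple.
g2 = CFG(frozenset("S"), frozenset("01"),
         (("S", ""), ("S", "0"), ("S", "1"), ("S", "0S0"), ("S", "1S1")), "S")
print(g2.check())  # []
```

Note how the ε-rule passes the check: its right-hand side is the empty string, which contains no symbol outside V ∪ Σ, even though ε itself is not in Σ.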


    Examples:

    Given grammar G3 as

    1. A → 0A1 | ε

    The formal definition of grammar G3 is then

    $G_3 = (V, \Sigma, R, S)$

    where

    $V = \{A\}$

    $\Sigma = \{0, 1\}$

    $R = \{A \to 0A1,\ A \to \varepsilon\}$

    $S = A$


    Sample derivations using grammar G3:

    A → 0A1
    → 00A11 [substituted A with 0A1]
    → 0011 [substituted A with ε]

    A → 0A1
    → 00A11 [substituted A with 0A1]
    → 000A111 [substituted A with 0A1]
    → 000111 [substituted A with ε]

    A → 0A1
    → 00A11 [substituted A with 0A1]
    → 000A111 [substituted A with 0A1]
    → 0000A1111 [substituted A with 0A1]
    → 00001111 [substituted A with ε]

    Take note that the strings formed by using grammar G3 start with a block of consecutive 0s followed by a block of consecutive 1s, in which the number of 0s equals the number of 1s following them. Hence, the language of grammar G3 is

    $L(G_3) = \{0^x 1^x \mid x \ge 0\}$
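Reading G3 backwards gives a simple membership test: each use of A → 0A1 wraps the string in one 0 and one 1, and A → ε ends the recursion. A sketch, with the hypothetical name `in_g3`; the recursion is written as a loop.

```python
def in_g3(w: str) -> bool:
    """Check membership in L(G3) by undoing A → 0A1 until only ε (from A → ε) remains."""
    while w:
        if w[0] != "0" or w[-1] != "1":
            return False  # no rule of G3 could have produced this outer pair
        w = w[1:-1]       # strip the 0 ... 1 contributed by one use of A → 0A1
    return True           # the empty string comes from A → ε


print(in_g3("000111"), in_g3("0101"))  # True False
```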


    Given grammar G4 as

    1. A → B1

    2. B → 0B1 | ε

    The formal definition of grammar G4 is

    $G_4 = (V, \Sigma, R, S)$

    where

    $V = \{A, B\}$

    $\Sigma = \{0, 1\}$

    $R = \{A \to B1,\ B \to 0B1,\ B \to \varepsilon\}$

    $S = A$


    Sample derivations using grammar G4:

    A → B1
    → 0B11 [substituted B with 0B1]
    → 011 [substituted B with ε]

    A → B1
    → 0B11 [substituted B with 0B1]
    → 00B111 [substituted B with 0B1]
    → 00111 [substituted B with ε]

    A → B1
    → 0B11 [substituted B with 0B1]
    → 00B111 [substituted B with 0B1]
    → 000B1111 [substituted B with 0B1]
    → 0001111 [substituted B with ε]

    Take note that the strings formed by using grammar G4 start with a block of consecutive 0s followed by a block of consecutive 1s, in which the number of 0s is one less than the number of 1s following them. Hence, the language of grammar G4 is

    $L(G_4) = \{0^x 1^{x+1} \mid x \ge 0\}$
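The same idea works for G4: Rule 1 contributes one trailing 1, and whatever B derives must have the form 0^x 1^x. A minimal sketch, with the hypothetical name `in_g4`:

```python
def in_g4(w: str) -> bool:
    """Check membership in L(G4), following A → B1 and B → 0B1 | ε."""
    # Rule 1 (A → B1): the string must end with the 1 that follows whatever B derives.
    if not w.endswith("1"):
        return False
    b_part = w[:-1]
    # Rule 2 (B → 0B1 | ε): undo B → 0B1 until only ε remains.
    while b_part:
        if b_part[0] != "0" or b_part[-1] != "1":
            return False
        b_part = b_part[1:-1]
    return True


print(in_g4("00111"), in_g4("0011"))  # True False
```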


    Given grammar G5 as

    1. A → (A) | AA | ε

    The formal definition of grammar G5 is

    $G_5 = (V, \Sigma, R, S)$

    where

    $V = \{A\}$

    $\Sigma = \{(,\ )\}$

    $R = \{A \to (A),\ A \to AA,\ A \to \varepsilon\}$

    $S = A$

    Take note that in this example, the terminals are the left and right parentheses and not the usual 0 and 1.


    Sample derivations using grammar G5:

    A → (A)
    → () [substituted A with ε]

    A → (A)
    → (AA) [substituted A with AA]
    → ((A)A) [substituted 1st A with (A)]
    → ((A)(A)) [substituted 2nd A with (A)]
    → (()(A)) [substituted 1st A with ε]
    → (()()) [substituted 2nd A with ε]


    A → (A)
    → (AA) [substituted A with AA]
    → ((A)A) [substituted 1st A with (A)]
    → ((A)(A)) [substituted 2nd A with (A)]
    → (()(A)) [substituted 1st A with ε]
    → (()(AA)) [substituted A with AA]
    → (()((A)A)) [substituted 1st A with (A)]
    → (()((A)(A))) [substituted 2nd A with (A)]
    → (()(()(A))) [substituted 1st A with ε]
    → (()(()())) [substituted 2nd A with ε]

    L(G5) is the language of all strings of properly nested (balanced) parentheses, including the empty string. This means that each left parenthesis has a matching right parenthesis at the proper level. Grammars like this are particularly useful for checking expressions in computer programs to determine whether their parentheses are properly nested.
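In practice, proper nesting is usually checked with a depth counter rather than by searching for a derivation. The sketch below uses that standard counter technique (not the grammar G5 itself) to decide membership for strings over the terminals ( and ); `balanced` is an illustrative name.

```python
def balanced(s: str) -> bool:
    """Return True if s is a string of properly nested parentheses, i.e. a string of L(G5).

    Depth rises on '(' and falls on ')'; it must never go negative and must end at zero.
    """
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' with no matching '(' before it
        else:
            return False      # a symbol outside Σ = {(, )}
    return depth == 0


print(balanced("(()(()()))"))  # True: the string derived in the last example
print(balanced("())("))        # False
```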


    More on Derivations

    Consider the following grammar G6 whose rules are:

    1. S → A1B 

    2. A → 0A | ε

    3. B → 0B | 1B | ε

    The set of variables is $V = \{S, A, B\}$, the set of terminals is $\Sigma = \{0, 1\}$, and the start variable is S.

    In the process of deriving a string, there will be situations where there is more than one variable on the right-hand side. Hence, there is a choice as to which variable will be replaced first.

    For example, in grammar G6, derivations begin with the first rule, S → A1B. After this, there is a choice whether to replace variable A or variable B first using the succeeding rules.

    To restrict the number of choices for simplicity, it is required that either the leftmost or the rightmost variable be replaced first at each step of the derivation.


    The derivation obtained by substituting the leftmost variable first at each step is called the leftmost derivation.

    Example:

    In deriving the string 00101 from grammar G6 using leftmost derivation:

    S → A1B 

    Since this is a leftmost derivation, the first variable to be replaced is A instead of B. Variable A will be replaced by 0A.

    S → 0A1B

    In the next step, variable A will be replaced by 0A again.

    S → 00A1B

    In the next step, variable A will be replaced by ε.

    S → 001B

    Since only one variable (B) remains in the current string, variable B will be replaced by 0B.

    S → 0010B

    In the next step variable B will be replaced by 1B.

    S → 00101B

    Finally, variable B will be replaced by ε.

    S → 00101


    The derivation obtained by substituting the rightmost variable first at each step is called the rightmost derivation.

    Example:

    In deriving the string 00101 from grammar G6 using rightmost derivation:

    S → A1B 

    Since this is a rightmost derivation, the first variable to be replaced is B instead of A. Variable B will then be replaced by 0B.

    S → A10B

    In the next step, variable B will be replaced by 1B.

    S → A101B

    In the next step, variable B will be replaced by ε.

    S → A101

    Since only one variable (A) remains in the current string, variable A will be replaced by 0A.

    S → 0A101


    In the next step, variable A will be replaced by 0A again.

    S → 00A101

    In the last step variable A will be replaced by ε .

    S → 00101
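Both derivations can be replayed with one helper that picks either the leftmost or the rightmost variable at each step. A minimal sketch, assuming single-character variables and '' for ε; `derive` and its `leftmost` flag are hypothetical. The caller supplies the right-hand sides in order, and each one is checked against G6.

```python
# Grammar G6: S → A1B, A → 0A | ε, B → 0B | 1B | ε  ('' stands for ε).
G6 = {"S": ["A1B"], "A": ["0A", ""], "B": ["0B", "1B", ""]}


def derive(form, choices, leftmost=True):
    """Return the forms obtained by expanding the leftmost (or rightmost) variable
    at each step with the next right-hand side in `choices`."""
    forms = [form]
    for rhs in choices:
        positions = [i for i, ch in enumerate(form) if ch in G6]
        if not positions:
            raise ValueError(f"{form!r} has no variable left to replace")
        i = positions[0] if leftmost else positions[-1]
        if rhs not in G6[form[i]]:
            raise ValueError(f"{form[i]} → {rhs or 'ε'} is not a rule of G6")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms


left = derive("S", ["A1B", "0A", "0A", "", "0B", "1B", ""], leftmost=True)
right = derive("S", ["A1B", "0B", "1B", "", "0A", "0A", ""], leftmost=False)
print(" → ".join(left))   # S → A1B → 0A1B → 00A1B → 001B → 0010B → 00101B → 00101
print(" → ".join(right))  # S → A1B → A10B → A101B → A101 → 0A101 → 00A101 → 00101
```

Both runs use the same rules, only in a different order, and end at the same string 00101.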


    Example:

    Consider grammar G7 with the following rules:

    1. S → 0AB

    2. A → 1B1

    3. B → A | ε

    Give the leftmost and rightmost derivation of the string 01111.
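One way to check an answer to this exercise is a breadth-first search over leftmost sentential forms. The sketch below is an illustration under stated assumptions ('' for ε, single-character symbols, and a pruning rule that suits G7), with the hypothetical helper `leftmost_derivation`. If the string has more than one leftmost derivation, the search returns only the first one it reaches; the rightmost derivation is left as part of the exercise.

```python
from collections import deque

# Grammar G7: S → 0AB, A → 1B1, B → A | ε  ('' stands for ε).
G7 = {"S": ["0AB"], "A": ["1B1"], "B": ["A", ""]}


def leftmost_derivation(target, grammar, start="S"):
    """Breadth-first search for a leftmost derivation of `target`.

    Returns the list of forms, or None if no derivation is found. Pruning relies on
    terminals never being erased: a form is dropped if it has more terminals than
    `target`, or if the terminals before its leftmost variable do not match the start
    of `target`. This keeps the search finite for G7; grammars with rules such as
    A → AA may need a different bound.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            path = []
            while form is not None:  # walk back to the start variable
                path.append(form)
                form = parent[form]
            return path[::-1]
        idx = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if idx is None:
            continue  # only terminals, but not the target string
        for rhs in grammar[form[idx]]:
            nxt = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(ch not in grammar for ch in nxt)
            first_var = next((i for i, ch in enumerate(nxt) if ch in grammar), len(nxt))
            if (terminals <= len(target) and target.startswith(nxt[:first_var])
                    and nxt not in parent):
                parent[nxt] = form
                queue.append(nxt)
    return None


for form in leftmost_derivation("01111", G7):
    print(form)
```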


    Parse Trees

    The derivations obtained from a context-free grammar can be represented graphically using a tree structure called a parse tree or derivation tree.

    Parse trees give a visualization of the entire derivation of a string.

    The variables occupy the internal nodes of the tree (nodes with at least one child), with the start variable at the root. The terminals occupy the leaf nodes (nodes with no children) at the bottom; when an ε-rule is applied, ε also appears as a leaf.

    The children of an internal node (a variable) are the symbols on the right-hand side of the rule used to expand that variable, in left-to-right order.
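A parse tree maps naturally onto a small recursive data structure. A minimal sketch, assuming a hypothetical `Node` class whose `leaves` method reads the leaf symbols from left to right and skips ε; a variable that has not been expanded yet is still a leaf and is read as itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One parse-tree node: a variable (internal once expanded) or a terminal/ε leaf."""
    symbol: str
    children: list = field(default_factory=list)  # right-hand side of the rule used, left to right

    def leaves(self) -> str:
        """Concatenate the leaf symbols from left to right; an ε leaf contributes nothing."""
        if not self.children:
            return "" if self.symbol == "ε" else self.symbol
        return "".join(child.leaves() for child in self.children)


# S expanded with G6's rule S → A1B; A and B are still unexpanded leaves.
first = Node("S", [Node("A"), Node("1"), Node("B")])
print(first.leaves())  # A1B
```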


    Example: Construct the parse tree for the derivation of the string 00101 using grammar G6.

    As with any derivation, the construction of the parse tree begins with the start variable:

    S → A1B 

    The parse tree for the first production will be

    S
    ├── A
    ├── 1
    └── B

    Take note that the children of node S are the symbols on the right-hand side of the rule used to expand S.

    The next step is to expand A. This is done by replacing A with 0A. Therefore,

    S → 0A1B

    The parse tree will then be

    S
    ├── A
    │   ├── 0
    │   └── A
    ├── 1
    └── B

    Take note that reading the leaf nodes from left to right gives 0A1B, which is the right-hand side of the current production.

    The next step is to expand A by replacing it again with 0A. Therefore,

    S → 00A1B

    The parse tree will now be:

    S
    ├── A
    │   ├── 0
    │   └── A
    │       ├── 0
    │       └── A
    ├── 1
    └── B

    Again, reading the leaf nodes from left to right gives 00A1B, which is the right-hand side of the current production.


    The next step is to expand A by replacing it with ε. Therefore,

    S → 001B

    The parse tree will now be

    S
    ├── A
    │   ├── 0
    │   └── A
    │       ├── 0
    │       └── A
    │           └── ε
    ├── 1
    └── B

    Again, reading the leaf nodes from left to right gives 00ε1B, which is simply 001B, the right-hand side of the current production.

    The next step is to expand B by replacing it with 0B. Therefore,

    S → 0010B

    The parse tree will now be

    S
    ├── A
    │   ├── 0
    │   └── A
    │       ├── 0
    │       └── A
    │           └── ε
    ├── 1
    └── B
        ├── 0
        └── B

    Reading the leaf nodes from left to right gives 00ε10B, which is simply 0010B, the right-hand side of the current production.

    The next step is to expand B by replacing it with 1B. Therefore:

    S → 00101B

    The parse tree will now be


    S
    ├── A
    │   ├── 0
    │   └── A
    │       ├── 0
    │       └── A
    │           └── ε
    ├── 1
    └── B
        ├── 0
        └── B
            ├── 1
            └── B

    The last step is to expand B by replacing it with ε. Therefore,

    S → 00101

    The parse tree will now be

    S
    ├── A
    │   ├── 0
    │   └── A
    │       ├── 0
    │       └── A
    │           └── ε
    ├── 1
    └── B
        ├── 0
        └── B
            ├── 1
            └── B
                └── ε

    Reading the leaf nodes from left to right gives 00ε101ε, which is simply 00101, the string being derived.
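Building the tree step by step, as above, amounts to expanding the leftmost unexpanded variable leaf with each rule of the leftmost derivation. A self-contained sketch, assuming the hypothetical names `Node`, `build_tree`, and `leftmost_open`, with '' standing for ε and G6's variables hard-coded:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)

    def leaves(self) -> str:
        """Left-to-right yield; ε leaves contribute nothing, unexpanded variables show as-is."""
        if not self.children:
            return "" if self.symbol == "ε" else self.symbol
        return "".join(c.leaves() for c in self.children)


VARIABLES = {"S", "A", "B"}  # the variables of grammar G6


def leftmost_open(node):
    """Depth-first, left-to-right search for a variable leaf that has not been expanded."""
    if not node.children:
        return node if node.symbol in VARIABLES else None
    for child in node.children:
        found = leftmost_open(child)
        if found is not None:
            return found
    return None


def build_tree(start, expansions):
    """Grow a parse tree from the right-hand sides of a leftmost derivation, in order."""
    root = Node(start)
    for rhs in expansions:
        target = leftmost_open(root)
        # An ε-rule gives the variable a single ε leaf, as in the trees above.
        target.children = [Node(ch) for ch in rhs] if rhs else [Node("ε")]
    return root


tree = build_tree("S", ["A1B", "0A", "0A", "", "0B", "1B", ""])
print(tree.leaves())  # 00101
```

Stopping after fewer expansions reproduces the intermediate trees; for example, after the first three the yield is 00A1B.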
