
Transformational Grammars


Page 1: Transformational Grammars

Transformational Grammars

The Chomsky hierarchy of grammars

From most general to most restricted (each class contains the ones listed after it):

• Unrestricted
• Context-sensitive
• Context-free
• Regular

Context-free grammars describe languages that regular grammars can’t

Slide after Durbin, et al., 1998

Page 2: Transformational Grammars

Limitations of Regular Grammars

Regular grammars can’t describe languages where there are long-distance interactions between the symbols!

Two classic examples are palindrome and copy languages:

Regular language: a b a a a b
Palindrome language: a a b b a a
Copy language: a a b a a b

Yes, OK. Regular grammars can produce palindromes. But you can’t design one that produces only palindromes!

Illustration after Durbin, et al., 1998
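To make the "long-distance interaction" concrete, here is a minimal Python sketch of membership tests for the two example languages. The function names are hypothetical and not from the slides; the point is only that each test has to compare symbols that sit arbitrarily far apart in the string.

```python
def is_palindrome(s):
    # Symbol i must equal symbol len(s) - 1 - i: nested dependencies.
    return s == s[::-1]


def is_copy(s):
    # Symbol i must equal symbol i + len(s) // 2: crossing dependencies.
    half, remainder = divmod(len(s), 2)
    return remainder == 0 and s[:half] == s[half:]


print(is_palindrome("aabbaa"), is_copy("aabaab"))
```

The first slide example, "abaaab", fails both tests, which is consistent with it belonging to neither the palindrome nor the copy language.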

Page 3: Transformational Grammars

Context-Free Grammars

Symbols and Productions (A.K.A. “rewriting rules”)

Like regular grammars, context-free grammars are defined by their set of symbols and the production rules for manipulating strings consisting of those symbols.

There are still only two types of symbols:

• Terminals (generically represented as “a”)
  • these actually appear in the final observed string (so imagine nucleotide or amino acid symbols)
• Non-terminals (generically represented as “W”)
  • abstract symbols – easiest to see how they are used through example. The start state (usually shown as “S”) is a commonly used non-terminal

The difference arises from the allowable types of production

Page 4: Transformational Grammars

Context-free Grammars

Symbols and Productions (A.K.A. “rewriting rules”)

The left-hand side must still be just a non-terminal, but the right-hand side can be any combination of terminals and non-terminals

W → aW
W → abWa
W → abW
W → WW
W → aWa
W → aWb
W → aabb
W → e

These are just examples of some possible valid productions

Page 5: Transformational Grammars

Context-free Grammars

Symbols and Productions (A.K.A. “rewriting rules”)

Here’s the minimal CFG that produces palindromes:

W = {S = “Start”}
a = {a,b}

S → aSa
S → bSb
S → aa
S → bb

As before, we start with S, then repeatedly choose any of the valid productions, with the non-terminal S being replaced each time by the string on the right-hand side of the production we’ve chosen…

Page 6: Transformational Grammars

Context-free Grammars

Symbols and Productions (A.K.A. “rewriting rules”)

Here’s the minimal CFG that produces palindromes:

W = {S = “Start”}
a = {a,b,e}

S → aSa | bSb | aa | bb

Or, with an explicit end state:

S → aSa | bSb | e

Here’s one possible sequence of productions:

S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabaabaa

Note that the sequence now grows from outside in, rather than from left to right!!
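The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the four-production form of the palindrome grammar; the names `PRODUCTIONS` and `derive` are illustrative and not from the slides.

```python
# Right-hand sides allowed for the single non-terminal S.
PRODUCTIONS = {"S": ("aSa", "bSb", "aa", "bb")}


def derive(choices, start="S"):
    """Apply the chosen right-hand sides in order and return every step."""
    current = start
    steps = [current]
    for rhs in choices:
        if rhs not in PRODUCTIONS["S"]:
            raise ValueError("not a valid production: S -> " + rhs)
        if "S" not in current:
            raise ValueError("no non-terminal left to rewrite")
        # Each string holds at most one S, so replace that single occurrence.
        current = current.replace("S", rhs, 1)
        steps.append(current)
    return steps


steps = derive(["aSa", "aSa", "bSb", "aa"])
print(" => ".join(steps))  # S => aSa => aaSaa => aabSbaa => aabaabaa
```

Because S always sits in the middle of the string, every rewrite adds one symbol on each side of it, which is the "outside in" growth the slide points out.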

Page 7: Transformational Grammars

A CFG for RNA stem-loops

[Figure: three RNA stem-loops labelled Seq1, Seq2 and Seq3, each drawn with a three-position stem and a four-base loop; paired positions are marked • and mismatched positions x. Figure after Durbin, et al., 1998]

RNA secondary structure imposes nested pairwise constraints similar to those of a palindrome language

Seq1 C A G G A A A C U G
Seq2 G C U G C A A A G C

Page 8: Transformational Grammars

A CFG for RNA stem-loops

[Figure repeated: RNA stem-loops Seq1, Seq2 and Seq3. Figure after Durbin, et al., 1998]

Sequences that violate the constraints would be rejected

Seq3 G C G G C A A C U G

Page 9: Transformational Grammars

A CFG for RNA stem-loops

[Figure repeated: RNA stem-loops Seq1, Seq2 and Seq3]

A context-free grammar specifying stem loops with a three base-pair stem and either a GAAA or GCAA loop

W = {S = “Start”, W1, W2, W3}
a = {a,c,g,u}

S → aW1u | cW1g | gW1c | uW1a
W1 → aW2u | cW2g | gW2c | uW2a
W2 → aW3u | cW3g | gW3c | uW3a
W3 → gaaa | gcaa
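Because this grammar has no recursion, its language is finite and small enough to enumerate. The sketch below expands every production and checks the three example sequences against the result. It is only an illustration: the dictionary layout and the names `GRAMMAR`, `expand` and `LANGUAGE` are assumptions, not the representation used in the slides.

```python
# Each right-hand side is a tuple of symbols; keys are the non-terminals.
GRAMMAR = {
    "S":  [("a", "W1", "u"), ("c", "W1", "g"), ("g", "W1", "c"), ("u", "W1", "a")],
    "W1": [("a", "W2", "u"), ("c", "W2", "g"), ("g", "W2", "c"), ("u", "W2", "a")],
    "W2": [("a", "W3", "u"), ("c", "W3", "g"), ("g", "W3", "c"), ("u", "W3", "a")],
    "W3": [("g", "a", "a", "a"), ("g", "c", "a", "a")],
}


def expand(symbol):
    """Return every terminal string that `symbol` can produce."""
    if symbol not in GRAMMAR:
        return [symbol]  # a terminal produces only itself
    results = []
    for rhs in GRAMMAR[symbol]:
        partials = [""]
        for sym in rhs:
            partials = [p + tail for p in partials for tail in expand(sym)]
        results.extend(partials)
    return results


LANGUAGE = set(expand("S"))

for name, seq in [("Seq1", "CAGGAAACUG"), ("Seq2", "GCUGCAAAGC"), ("Seq3", "GCGGCAACUG")]:
    print(name, seq.lower() in LANGUAGE)
```

Four choices at each of the three stem positions and two loops give 4 × 4 × 4 × 2 = 128 strings. Seq1 and Seq2 are among them and Seq3 is not. Enumeration only works because the language is finite; a recursive grammar such as the palindrome grammar needs an actual parser.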

Page 10: Transformational Grammars

Context-free grammars are parsed with push-down automata

Grammar                      Parsing automaton
Regular grammar              Finite state automaton
Context-free grammar         Push-down automaton
Context-sensitive grammar    Linear bounded automaton
Unrestricted grammar         Turing machine

Proviso: Push-down automata are generally only practical with deterministic CFGs!!

The PDA faces a combinatorial explosion if confronted with a non-deterministic CFG and a non-trivial problem size… but we can brute-force small N

Page 11: Transformational Grammars

A Push-Down Automaton

An RNA stem-loop considered as a sequence of states?

[Figure: the states S, W1, W2, W3 and e drawn as a sequence]

The regular grammar / finite state automaton paradigm will not work!!

S → aW1u | cW1g | gW1c | uW1a

W1 → aW2u | cW2g | gW2c | uW2a

W2 → aW3u | cW3g | gW3c | uW3a

W3 → gaaa | gcaa

Page 12: Transformational Grammars

Push-Down Automaton

Parse trees are the most useful way to depict PDA

S → aW1u | cW1g | gW1c | uW1a
W1 → aW2u | cW2g | gW2c | uW2a
W2 → aW3u | cW3g | gW3c | uW3a
W3 → gaaa | gcaa

[Figure: parse tree with the nodes S, W1, W2 and W3 nested above the sequence G C C G C A A G G C]

This depiction suggests a stack based method for parsing…

Page 13: Transformational Grammars

Python focus – stacks

Python lists have handy stack-like methods!

myStack = []                   # creates an empty list
myStack.append(someObject)     # “push”
otherObject = myStack.pop()    # “pop”

Remember, the stack is a “First-In, Last-Out” (FILO) data structure

How is FILO relevant to context-free grammars?
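One possible answer, sketched under assumptions: in a stem, the first base read on the 5' side pairs with the last base read on the 3' side, which is exactly the order a FILO stack hands items back. The helper below is hypothetical (it is not from the slides) and checks only Watson-Crick pairing of a stem whose two halves are supplied separately.

```python
# Watson-Crick partner of each RNA base.
PARTNER = {"A": "U", "C": "G", "G": "C", "U": "A"}


def stem_pairs_nested(five_prime, three_prime):
    """Check that the two halves of a stem pair up in nested order."""
    stack = []
    for base in five_prime:
        stack.append(PARTNER[base])        # "push" the partner we expect later
    for base in three_prime:
        if not stack or stack.pop() != base:   # "pop" the most recent expectation
            return False
    return not stack                       # every pushed partner was matched


print(stem_pairs_nested("CAG", "CUG"))   # Seq1 stem halves
print(stem_pairs_nested("GCG", "CUG"))   # Seq3 stem halves
```

The Seq1 halves pass and the Seq3 halves fail, using nothing more than `append()` and `pop()`.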

Page 14: Transformational Grammars

Python focus – stacks

Python exception handling may be convenient:

try:
    otherObject = myStack.pop()    # “pop”
except IndexError:
    # means myStack was empty!
    # accepting the input sequence
    return self.return_string

We’ll introduce exception handling on an “as-needed” basis, but it is a very powerful and useful feature of Python

Errors of various sorts each have their own internal error type. These are objects too!
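As a self-contained illustration of the pattern, here is a small sketch. The function name `pop_or_none` is an assumption made for this example; the slide's own snippet lives inside a class method that is not shown.

```python
def pop_or_none(stack):
    """Pop the top item, or return None if the stack is empty."""
    try:
        return stack.pop()
    except IndexError as err:
        # `err` is an ordinary object: an instance of the IndexError class,
        # which itself derives from LookupError and Exception.
        assert isinstance(err, Exception)
        return None


print(pop_or_none(["S"]))   # S
print(pop_or_none([]))      # None
```

Note that the exception name is case-sensitive: `IndexError` is the built-in raised by `list.pop()` on an empty list.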

Page 15: Transformational Grammars

Algorithm for PDA parsing

Initialization:

• Set cur_position in sequence under test (“input sequence”) to zero
• Push the start state “S” onto the stack

Iteration:

• Pop a symbol off the stack
  • stack empty? Accept!! Return string
• Is the symbol from the stack a terminal or non-terminal?
  • Terminal?
    • stack symbol matches symbol at cur_position?
      • Yes! – accept symbol and increment cur_position
      • No? – reject sequence, return False
  • Non-terminal?
    • Does symbol at cur_position have a valid production?
      • No? – reject sequence, return False
      • Yes! Push right side of production onto stack, rightmost symbols first

For non-deterministic grammars, we need to consider each possible production!
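The steps above can be sketched in Python as follows. This is one possible reading, not the course's reference implementation, and it rests on several assumptions that are also flagged in the comments: the grammar layout and the names `GRAMMAR`, `pda_accepts` and `run` are invented for the example; every production is assumed to begin with a terminal; an empty stack counts as acceptance only if the whole input has been consumed; and the function returns True or False rather than the accepted string. Non-determinism (W3 has two productions starting with g) is handled by trying each candidate in turn.

```python
# Stem-loop grammar; each right-hand side is a tuple of symbols.
GRAMMAR = {
    "S":  [("a", "W1", "u"), ("c", "W1", "g"), ("g", "W1", "c"), ("u", "W1", "a")],
    "W1": [("a", "W2", "u"), ("c", "W2", "g"), ("g", "W2", "c"), ("u", "W2", "a")],
    "W2": [("a", "W3", "u"), ("c", "W3", "g"), ("g", "W3", "c"), ("u", "W3", "a")],
    "W3": [("g", "a", "a", "a"), ("g", "c", "a", "a")],
}


def pda_accepts(sequence, grammar=GRAMMAR, start="S"):
    """Return True if `sequence` can be derived from `start`, else False."""
    seq = sequence.lower()

    def run(stack, pos):
        # `stack` is a tuple whose last element is the top of the stack.
        if not stack:
            # Assumption: accept only if all of the input was consumed.
            return pos == len(seq)
        if pos >= len(seq):
            # Symbols remain on the stack but the input is exhausted.
            # (Valid here because no production yields the empty string.)
            return False
        top, rest = stack[-1], stack[:-1]
        if top not in grammar:
            # Terminal: must match the current input symbol.
            return top == seq[pos] and run(rest, pos + 1)
        # Non-terminal: try each production starting with the current symbol,
        # pushing its right-hand side with the rightmost symbol first.
        for rhs in grammar[top]:
            if rhs[0] == seq[pos] and run(rest + tuple(reversed(rhs)), pos):
                return True
        return False

    return run((start,), 0)


print(pda_accepts("GCCGCAAGGC"))   # True
print(pda_accepts("GCGGCAACUG"))   # False
```

Passing the stack as an immutable tuple means a failed branch leaves nothing to undo, which keeps the backtracking simple at the cost of some copying; that is acceptable for the "brute-force small N" regime mentioned earlier.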

Page 16: Transformational Grammars

PDA parsing – an example

Input string: GCCGCAAGGC

Stack: S

Valid production: S → gW1c

Page 17: Transformational Grammars

PDA parsing – an example

Input string: GCCGCAAGGC

Stack: cW1g (top of the stack is the rightmost symbol)

Action: Accept G, move right

Remember, the previous production is added to the stack right-to-left!!

Page 18: Transformational Grammars

PDA parsing – an example

Input string: GCCGCAAGGC

Stack: cW1

Valid production: W1 → cW2g

Page 19: Transformational Grammars

PDA parsing – an example

Input string: GCCGCAAGGC

Stack: cgW2c

Action: Accept C, move right

Page 20: Transformational Grammars

PDA parsing – an example

Input string: GCCGCAAGGC

Stack: cgW2

Valid production: W2 → cW3g

Page 21: Transformational Grammars

PDA parsing – an example

Input string: GCCGCAAGGC

Stack: cggW3c

Action: Accept C, move right

Page 22: Transformational Grammars

PDA parsing – an example

Input string: GCCGCAAGGC

Stack: cggW3

Valid production: W3 → gcaa

Page 23: Transformational Grammars

PDA parsing – an example

Input string: GCCGCAAGGC

Stack: cggaacg

Action: Accept G, move right

Page 24: Transformational Grammars

PDA parsing – an example

Input string: GCCGCAAGGC

Stack: cggaacg

An interlude… If the stack has no non-terminals and corresponds to the input string, we would accept several symbols in a row. Let’s skip ahead a few steps!!

Page 25: Transformational Grammars

PDA parsing – an example

Input string: GCCGCAAGGC

Stack: c

Action: Accept C, move right

Page 26: Transformational Grammars

PDA parsing – an example

Input string: GCCGCAAGGC

Stack: Empty or e

Action: Accept input string!

Page 27: Transformational Grammars

Push-down Automata

Our stem-loop context-free grammar as a Python data structure

This dict has keys that are states corresponding to the left-hand side of valid productions, and values that are lists corresponding to the right-hand side of valid productions. These again are encapsulated as tuples

As with our regular grammar this is just one possible way…

states = {
    "Start": [("A", "W1", "U"), ("C", "W1", "G"), ("G", "W1", "C"), ("U", "W1", "A")],
    "W1": [("A", "W2", "U"), ("C", "W2", "G"), ("G", "W2", "C"), ("U", "W2", "A")],
    "W2": [("A", "W3", "U"), ("C", "W3", "G"), ("G", "W3", "C"), ("U", "W3", "A")],
    "W3": [("G", "A", "A", "A"), ("G", "C", "A", "A")],
}

Page 28: Transformational Grammars

Python focus

Some possibly useful Python

• The in keyword can be used to test membership in a list:

if my_symbol in mylist_of_terminals:
    pass    # do something

• Reverse iterate through a list or tuple with reversed():

for element in reversed(cur_tuple):
    pass    # do something

• Iterate by both index and item with enumerate():

for i, NT in enumerate(list_of_nucleotides):
    print(i)     # first will be 0, then 1, etc.
    print(NT)    # first will be A, then C, etc.
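To show the three idioms working together on the stem-loop grammar, here is a small sketch. The `states` dict follows the data structure shown on the previous slide; `list_of_terminals`, `push_production` and `matching_productions` are hypothetical helpers introduced only for this example.

```python
states = {
    "Start": [("A", "W1", "U"), ("C", "W1", "G"), ("G", "W1", "C"), ("U", "W1", "A")],
    "W1": [("A", "W2", "U"), ("C", "W2", "G"), ("G", "W2", "C"), ("U", "W2", "A")],
    "W2": [("A", "W3", "U"), ("C", "W3", "G"), ("G", "W3", "C"), ("U", "W3", "A")],
    "W3": [("G", "A", "A", "A"), ("G", "C", "A", "A")],
}
list_of_terminals = ["A", "C", "G", "U"]


def push_production(stack, production):
    """Push a right-hand side so that its leftmost symbol ends up on top."""
    for symbol in reversed(production):
        stack.append(symbol)


def matching_productions(non_terminal, next_symbol):
    """Return (index, production) pairs whose first symbol is next_symbol."""
    return [(i, rhs) for i, rhs in enumerate(states[non_terminal])
            if rhs[0] == next_symbol]


stack = []
index, production = matching_productions("Start", "G")[0]
push_production(stack, production)
top = stack.pop()
print(index, production, top, top in list_of_terminals)
print(matching_productions("W3", "G"))
```

The second call returns two candidates for W3, which is where this grammar stops being deterministic with a single symbol of lookahead.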