Upload
elijah-contreras
View
51
Download
3
Embed Size (px)
DESCRIPTION
Fall 2010. The Chinese University of Hong Kong. CSCI 3130: Automata theory and formal languages. Text search and closure properties. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Text search. The program grep. grep -E regexp file. - PowerPoint PPT Presentation
Citation preview
CSCI 3130: Automata theory and formal languages
Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130
The Chinese University of Hong Kong
Text search and closure properties
Fall 2010
The program grep
grep -E regexp file
Searches for the occurrence of patterns matching a regular expression
[ab][12] {a1, a2, b1, b2} concatenation
cat|12 {cat, 12} union
[abc] {a, b, c} shorthand for a|b|c
[ab]{2} {aa, ab, ba, bb} {n} copies
(ab)* {, ab, abab, ...} star
[ab]? {, a, b} zero or one
(cat)+ {cat, catcat, ...} one or more
Searching with grep
grep –E `savou?r` words
outsavorsavorsavoredsavorersavorilysavorinesssavoringlysavorlesssavorous
savorsomesavorysavourunsavoredunsavoredlyunsavorednessunsavorilyunsavorinessunsavory
Words containing savor or savour
grep –E `[ab]{5}` words
grabbable
Words with 5 consecutive a or b
grep –E `zo+zo+` words
zoozoo
More grep commands
$ end of line
. any symbol
\< beginning of line
[a-z] anything in a range
s u f f e r
r e g u l a r
s t a r
n f aa
l
en
1
2
3
4
5
1 If a DFA is too hard, I do an ...
2 CSCI 3130 homework makes me ...
3 If a DFA likes it, it is ...
4 $10000000 = $10?
5 I study 3130 hard because it will make me ...
grep –E `\<.ff.u..t` words
how do you look for...
grep –E `([aeiouy].*){10}` words
grep –E `\<cat.*cat` words
grep –E `\<[^AEIOUYaeiouy]*$` words
Words with at least ten vowels?
Words that start in cat and have another cat
Words without any vowels? [^R] does not contain R
grep –E `\<[^AEIOUYaeiouy]* ([aeiouy][^AEIOUYaeiouy]*){10}$` words
Words with exactly ten vowels?
How grep (could) work
regularexpression
NFA NFAwithout
DFA
text file input
[ab]?, a+, (cat){3} not allowed allowed
input handling matches wholelooks for pattern
output accept/reject finds pattern
differences in class in grep
Implementation of grep
• How do you handle expressions like:
[ab]? zero or one
(cat)+ one or more
a{3} {n} copies
→ ()|[ab]R? → |R
→ (cat)(cat)*R+ → RR*
→ aaaR{n} → RR...R
n times
[^aeiouy] not containing any ?
Example
• The language L of strings that end in 101 is regular
• How about the language L of strings that do not end in 101?
(0+1)*101
Example
• Hint: w does not end in 101 if and only if it ends in:
or it has length 0, 1, or 2
• So L can be described by the regular expression
000, 001, 010, 011, 100, 110 or 111
(0+1)*(000+001+010+010+100+110+111) + + (0 + 1) + (0 + 1)(0 + 1)
Complement
• The complement L of a language L is the set of all strings that are not in L
• Examples ( = {0, 1})– L1 = all strings that end in 101
– L1 = all strings that do not end in 101 = all strings end in 000, …, 111 or have length 0, 1, or 2
– L2 = 1* = {, 1, 11, 111, …}
– L2 = all strings that contain at least one 0 = (0 + 1)*0(0 + 1)*
Example
• The language L of strings that contain 101 is regular
• How about the language L of strings that do not contain 101?
(0+1)*101(0+1)*
You can write a regular expression,but it is a lot of work!
Closure under complement
• To argue this, we can use any of the equivalent definitions for regular languages:
• The DFA definition will be most convenient– We assume L has a DFA, and show L also has a
DFA
regularexpression
DFANFA
If L is a regular language, so is L.
Arguing closure under complement
• Suppose L is regular, then it has a DFA M
• Now consider the DFA M’ with the accepting and rejecting states of M reversed
accepts L
accepts strings not in L
this is exactly L
Food for thought
• Can we do the same thing with an NFA?
1 0
0, 1
q q q (0+1)*10
1 0
0, 1
q q q (0+1)*
NO!
Intersection
• The intersection L L’ is the set of strings that are in both L and L’
• Examples:
• If L, L’ are regular, is L L’ also regular?
L = (0 + 1)*11 L’ = 1* L L’ =
L = (0 + 1)*10 L’ = 1* L L’ =
1*11
∅
Closure under intersection
If L and L’ are regular languages, so is L L’.
regularexpression
DFANFA
• To argue this, we can use any of the equivalent definitions for regular languages:
Suppose L and L’ have DFAs, call them M and M’
Goal: Construct a DFA (or NFA) for L L’
An example
L = even number of 0s
r0 r1
1 10
0
Ms0 s1
0 01
1
M’
L’ = odd number of 1s
L L’ = even number of 0s and odd number of 1s
r0, s1
1
1
1
1
0 0 0 0
r0, s0
r1, s1r1, s0
Closure under intersection
M and M’ DFA for L L’
states Q = {r1, ..., rn}
Q’ = {s1, ..., sn’}
Q × Q’ = {(r1, s1), (r1, s2), ..., (r2, s1), ..., (rn, sn’)}
Whenever M is in state ri and M’ is in state sj,the DFA for L L’ will be in state (ri, sj)
start state
acceptingstates
ri for Msj for M’
(ri, sj)
F for MF’ for M’
F × F’ = {(ri, sj): ri ∈ F, sj ∈ F’}
Closure under intersection
M and M’ DFA for L L’
transitions
si’ sj’
a
ri, si’ rj, sj’ain M
in M’
ri rj
a
Reversal
• The reversal wR of a string w is w written backwards
• The reversal LR of a language L is the language obtained by reversing all its strings
w = cave wR = evac
L = {cat, dog} LR = {tac, god}
Reversal of regular languages
• L = all strings that end in 101 is regular
• How about LR?
• This is the language of all strings beginning in 101
• Yes, because it is represented by
(0+1)*101
101(0+1)*
Closure under reversal
• How do we argue?
If L is a regular language, so is LR.
regularexpression
DFANFA
Arguing closure under reversal
• Take a regular expression E for L
• We will show how to reverse E
• A regular expression can be of the following types:– The special symbols and – Alphabet symbols like a, b– The union, concatenation, or star of simpler
expressions
Proof of closure under reversal
regular expression E
a (alphabet symbol)
E1 + E2
reversal ER
E1E2
E1*
a
E1R + E2
R
E2RE1
R
(E1R)*
A question
If L is regular, is LDUP also regular?
regularexpression
DFANFA?
LDUP = {ww: w L} L = {cat, dog}Ex.
LDUP = {catcat, dogdog}
A question
• Let’s try with regular expression:
• Let’s try with NFA:
q0 q1NFA for L NFA for L
LDUP = LLL = {a, b}
LDUP = {aa, bb}
LL = {aa, ab, ba, bb}
An example
• Let’s try to design an NFA for LDUP
L = 0*1 is regular
LDUP = {11, 0101, 001001, 00010001, ...}
= {0n10n1: n ≥ 0}
LDUP = {1, 01, 001, 0001, ...}