WELCOME TO A JOURNEY TO Compilers
CS419 Lecture 3
Scanning (Lexical Analysis) using Regular Expressions
Cairo University, FCI
Dr. Hussien Sharaf, Computer Science [email protected]
PART ONE
Dr. Hussien M. Sharaf
SCANNING
A scanner reads a stream of characters and groups them into units that are meaningful with respect to the source language, called tokens.
It produces a stream of tokens for the next phase of the compiler.
Stream of characters -> scanner -> Stream of tokens
WHAT IS LEXICAL ANALYSIS
Lexical analysis (scanning) is the task of reading a stream of input characters and dividing it into tokens (words) that belong to a certain language.
Example: the stream a[index]=4+2; passes through the scanner and comes out as:
a        Identifier
[        left bracket
index    Identifier
]        right bracket
=        assignment
4        Number
+        operator
2        Number
Responsibility: the scanner accepts a stream and splits it into tokens according to rules defined by the source language.
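The token table above can be reproduced with a small Python sketch. The token class names follow the labels on the slide; the "semicolon" class and the exact patterns are assumptions added for illustration, not part of the lecture.

```python
import re

# (group name, slide label, pattern); "semicolon" is an added assumption.
TOKEN_SPEC = [
    ("IDENT", "Identifier", r"[A-Za-z_]\w*"),
    ("NUMBER", "Number", r"\d+"),
    ("LBRACKET", "left bracket", r"\["),
    ("RBRACKET", "right bracket", r"\]"),
    ("ASSIGN", "assignment", r"="),
    ("OP", "operator", r"[+\-*/]"),
    ("SEMI", "semicolon", r";"),
]
LABELS = {group: label for group, label, _ in TOKEN_SPEC}
MASTER = re.compile("|".join(f"(?P<{group}>{pattern})" for group, _, pattern in TOKEN_SPEC))

def scan(text):
    """Return (lexeme, label) pairs; characters that match no class are skipped."""
    return [(m.group(), LABELS[m.lastgroup]) for m in MASTER.finditer(text)]

tokens = scan("a[index]=4+2;")
for lexeme, label in tokens:
    print(f"{lexeme:<6} {label}")
```

Combining all classes into one alternation keeps the scan to a single pass; the order of the entries decides which class wins when two patterns could match at the same position.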
REGULAR EXPRESSIONS
RE is a language for describing simple languages and patterns.
Algorithm for using RE:
1. Define a pattern: S1 StudentName S2
2. Loop
3. Find the next match of the pattern
4. Store StudentName into a DB or encrypt StudentName
5. Until no match
REs are used in applications that involve file parsing and text matching.
Many implementations of RE exist.
HOW CAN RE SERVE IN LEXICAL ANALYSIS?
Algorithm for using RE in LA:
1. Define a pattern for each valid statement
2. Loop
3. Find the next pattern and call it Token
4. Parse(Token)
5. Until no match
6. If not at the end of the stream (EOF) then return Error, else return Success
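The loop above can be sketched in Python as follows. The token pattern and the parse stand-in are hypothetical; the point is only to show steps 2 to 6, including the Error result when input is left over after the last match.

```python
import re

# Hypothetical token pattern: identifiers, integers, or a few single-character symbols.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[\[\]=+;])")

def parse(token):
    """Stand-in for the slide's Parse(Token); here it only returns the token."""
    return token

def scan(stream):
    pos, parsed = 0, []
    while True:
        match = TOKEN.match(stream, pos)      # step 3: find the next pattern
        if match is None:                     # step 5: until no match
            break
        parsed.append(parse(match.group(1)))  # step 4: Parse(Token)
        pos = match.end()
    # step 6: leftover input (other than trailing whitespace) means Error
    status = "Success" if stream[pos:].strip() == "" else "Error"
    return status, parsed

print(scan("a[index]=4+2;"))
print(scan("a[index]=4 $ 2;"))
```

Matching at an explicit position, rather than searching, is what makes unexpected characters visible: the loop stops at the first character no pattern accepts.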
RE IN C++ BOOST LIBRARY
// Pattern matching in a string
// Created by Flavio Castelli <[email protected]>
// distributed under GPL v2 license
#include <boost/regex.hpp>
#include <cstdio>
#include <string>

int main() {
    boost::regex pattern("Ali", boost::regex_constants::icase | boost::regex_constants::perl);
    std::string stringa("Searching for Ali, Is there another Ali? Yes there is another Ali");
    int count = 0;
    // Restart each search after the previous match; searching the whole
    // string again on every iteration would never terminate.
    std::string::const_iterator start = stringa.begin();
    const std::string::const_iterator end = stringa.end();
    boost::smatch what;
    while (boost::regex_search(start, end, what, pattern)) {
        count++;
        start = what[0].second;
    }
    printf("%d found\n", count);
    return 0;
}
Please check [http://flavio.castelli.name/2006/02/16/regexp-with-boost/]
Output is: "3 found"
RE IN C++ STL NEEDS EXTRA FEATURE PACK
#include <regex>
#include <iostream>
#include <string>

bool is_email_valid(const std::string& email) {
    // define a regular expression
    const std::tr1::regex pattern("(\\w+)(\\.|_)?(\\w*)@(\\w+)(\\.(\\w+))+");
    // try to match the string with the regular expression
    return std::tr1::regex_match(email, pattern);
}
Please check http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339
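For comparison, the same pattern can be tried in Python. This is a minimal sketch that reuses the regular expression from the C++ slide unchanged; the sample addresses are made up, and the pattern is far looser than real email syntax.

```python
import re

# Same pattern as the C++ slide, written as a Python raw string.
EMAIL_PATTERN = re.compile(r"(\w+)(\.|_)?(\w*)@(\w+)(\.(\w+))+")

def is_email_valid(email):
    """Return True when the whole string matches the pattern."""
    return EMAIL_PATTERN.fullmatch(email) is not None

for candidate in ["john.doe@example.com", "john_doe@mail.example.org", "not-an-email", "a@b"]:
    print(candidate, is_email_valid(candidate))
```

fullmatch plays the role of regex_match in the C++ version: the whole string has to match, not just a part of it.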
RE IN C++ STL (CONT.)
std::string str("abc + (inside brackets)dfsd");
std::smatch m;
std::regex_search(str, m, std::regex("b"));
if (m[0].matched)
    std::cout << "Found " << m.str(0) << '\n';
else
    std::cout << "Not found\n";
Please check http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339
RE IN PYTHON
Example for Python RE
import re

line = "Cats are smarter than dogs"
# Note: \. matches a literal dot, so group(2) is empty for this line.
matchObj = re.search(r'(.*) are(\.*)', line, re.M | re.I)
if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")
Please check http://www.tutorialspoint.com/python/python_reg_expressions.htm
Some notations of Python RE
\d  Matches any decimal digit; this is equivalent to the class [0-9].
\D  Matches any non-digit character; this is equivalent to the class [^0-9].
\s  Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].
\S  Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].
\w  Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].
\W  Matches any non-alphanumeric character; this is equivalent to the class [^a-zA-Z0-9_].
In Python 3 these equivalences hold exactly for bytes patterns or with the re.ASCII flag; str patterns also match Unicode digits, letters and whitespace.
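A short sketch of these shorthand classes in use. The sample string is made up, and re.ASCII is passed so that each shorthand behaves exactly like the explicit class listed above.

```python
import re

sample = "Room 42\tis open"

digits = re.findall(r"\d", sample, re.ASCII)       # decimal digits
spaces = re.findall(r"\s", sample, re.ASCII)       # whitespace characters
words = re.findall(r"\w+", sample, re.ASCII)       # runs of letters, digits, underscore
non_words = re.findall(r"\W", sample, re.ASCII)    # everything that is not \w

# The shorthand and its explicit class select the same characters here.
same_digits = digits == re.findall(r"[0-9]", sample)

print(digits, spaces, words, non_words, same_digits)
```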
OTHER CODE SAMPLES
Other samples can be checked at:
Boost Library
1. http://stackoverflow.com/questions/5804453/c-regular-expressions-with-boost-regex
2. http://www.codeproject.com/KB/string/regex__.aspx
3. http://www.boost.org/doc/libs/1_43_0/more/getting_started/windows.html#build-from-the-visual-studio-ide
STL Library
1. http://www.codeproject.com/KB/recipes/rexsearch.aspx
2. http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339 [recommended]
3. http://www.microsoft.com/download/en/confirmation.aspx?id=6922 [feature pack VS2008 Pack]
4. http://channel9.msdn.com/Shows/Going+Deep/C9-Lectures-Stephan-T-Lavavej-Standard-Template-Library-STL-8-of-n [Video]
PART TWO
DEFINITION OF A LANGUAGE "L"
∑ is the alphabet (the set of letters).
1. Let ∑ = {x}. Then L = ∑*. In RE, we write x*.
2. Let ∑ = {x, y}. Then L = ∑*. In RE, we write (x+y)*.
3. Kleene's star "*" means any combination of letters of length zero or more.
Note: Please study the appendix at the end of the lecture.
RE RULES
Given an alphabet ∑, the set of regular expressions is defined by the following rules.
1. For every letter in ∑, the letter written in bold is a regular expression. Λ (lambda) or ε (epsilon) is a regular expression; Λ or ε means the empty word.
2. If r1 and r2 are regular expressions, then so are:
   1. (r1)
   2. r1 r2
   3. r1+r2
   4. r1* and also r2*
   NOTE 1: r1^+ (superscript plus) is not a RE.
   NOTE 2: spaces are ignored as long as they are not included in ∑.
3. Nothing else is a regular expression.
RE-1
Example 1: ∑ = {a, b}
Formally describe all words with a followed by any number of b's.
L = a b* = ab*
Give examples for words in L: {a ab abb abbb …}
RE-2
Example 2: ∑ = {a, b}
Formally describe the language that contains the empty word and all words where any a must be followed by one or more b's.
L = b*(abb*)* OR (b+ab)*
Give examples for words in L: {Λ ab abb abababb … b bbb …}
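One way to gain confidence that the two expressions in RE-2 describe the same language is to compare them on every short word. The sketch below does this by brute force in Python; the length bound of 8 is an arbitrary assumption, and agreement up to a bound is evidence, not a proof.

```python
import re
from itertools import product

# The slide's union "+" is written "|" in Python's re syntax.
first = re.compile(r"b*(abb*)*")
second = re.compile(r"(b|ab)*")

def words(alphabet, max_len):
    """Yield every word over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

# Words accepted by exactly one of the two expressions.
disagreements = [w for w in words("ab", 8)
                 if bool(first.fullmatch(w)) != bool(second.fullmatch(w))]
print(disagreements)
```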
RE-3
Example: ∑ = {a, b}
Formally describe all words with a followed by one or more b's.
L = a b^+ = abb*
Give examples for words in L: {ab abb abbb …}
RE-4
Example: ∑ = {a, b, c}
Formally describe all words that start with an a followed by any number of b's and then end with c.
L = a b*c
Give examples for words in L: {ac abc abbc abbbc …}
RE-5
Example: ∑ = {a, b}
Formally describe all words where a's, if any, come before b's, if any.
L = a* b*
Give examples for words in L: {Λ a b aa ab bb aaa abb abbb bbb …}
NOTE: a* b* ≠ (ab)* because the first language does not contain abab but the second does. Once a single b is detected, no more a's can be added.
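The note can be checked directly with Python's re module. This is a small sketch; the test words are chosen for illustration.

```python
import re

a_star_b_star = re.compile(r"a*b*")
ab_star = re.compile(r"(ab)*")

for word in ["", "ab", "aabb", "abab"]:
    in_first = a_star_b_star.fullmatch(word) is not None
    in_second = ab_star.fullmatch(word) is not None
    print(repr(word), in_first, in_second)
```

The words abab and aabb each belong to exactly one of the two languages, which is enough to show the languages differ.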
RE-6
Example: ∑ = {a}
Formally describe all words where the count of a is odd.
L = a(aa)* OR (aa)*a
Give examples for words in L: {a aaa aaaaa …}
RE-7.1
Example: ∑ = {a, b, c}
Formally describe all words where a single a or c comes at the start, followed by an odd number of b's.
L = (a+c)b(bb)*
Give examples for words in L: {ab cb abbb cbbb …}
RE-7.2
Example: ∑ = {a, b, c}
Formally describe all words where a single a or c comes at the start, followed by an odd number of b's in the case of a, and zero or an even number of b's in the case of c.
L = ab(bb)* + c(bb)*
Give examples for words in L: {ab c abbb cbb abbbbb …}
RE-8
Example: ∑ = {a, b, c}
Formally describe all words where one or more a's or one or more c's come at the start, followed by one or more b's.
L = (a^+ + c^+) b^+ = (aa* + cc*) bb*
Give examples for words in L: {ab cb aabb cbbb …}
RE-9
Example: ∑ = {a, b}
Formally describe all words of length three.
L = (a+b)^3 = (a+b) (a+b) (a+b)
List all words in L: {aaa aab aba baa abb bab bba bbb}
What is the count of words of length 4? 16 = 2^4
What is the count of words of length 44? 2^44
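The counts in RE-9 can be reproduced by enumerating the words. A minimal Python sketch, assuming the two-letter alphabet from the slide; the helper name words_of_length is made up for illustration.

```python
import re
from itertools import product

length_three = re.compile(r"(a|b){3}")   # (a+b)^3 in the slide's notation

def words_of_length(alphabet, n):
    """All words of exactly n letters over alphabet."""
    return ["".join(letters) for letters in product(alphabet, repeat=n)]

three = words_of_length("ab", 3)
four = words_of_length("ab", 4)

print(three)
print(all(length_three.fullmatch(w) for w in three))
print(len(four), 2 ** 4)
```

Length 44 is better computed as 2 ** 44 than enumerated, since the list would have more than 17 trillion entries.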
ASSIGNMENTS
1. [One week] Solve Compilers-Sheet_0-General.pdf
2. [One week] Solve Compilers_Sheet_1_RE.pdf
3. [Two weeks] Implement an email validator using Python.
   a. Build a GUI using Python Tkinter.
   b. The GUI must include at least two labels, a TextBox and a Button.
   c. The user is given a message on a label stating whether the email is valid or not.
DEADLINE
Thursday 28th of Feb is the deadline.
No excuses.
Don't wait until the last day.
TAs can help you to the highest limit within the next 3 days.
APPENDIX: MATHEMATICAL NOTATIONS AND TERMINOLOGY
SETS-[1]
A set is a group of elements represented as a unit.
For example, S = {a, b, c} is a set of 3 elements.
Elements are enclosed in curly brackets { }.
a ∈ S: a belongs to S
f ∉ S: f does NOT belong to S
SETS-[2]
If S = {a, b, c} and T = {a, b}, then:
T ⊆ S: T is a subset of S
T ∩ S = {a, b}: T intersection S = {a, b}
T ∪ S = {a, b, c}: T union S = {a, b, c}
[Figure: Venn diagram for S and T]
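The set notation above maps directly onto Python's built-in set type. A minimal sketch using the same S and T:

```python
S = {"a", "b", "c"}
T = {"a", "b"}

belongs = "a" in S             # a belongs to S
not_belongs = "f" not in S     # f does NOT belong to S
is_subset = T <= S             # T is a subset of S
intersection = T & S           # T intersection S
union = T | S                  # T union S

print(belongs, not_belongs, is_subset, sorted(intersection), sorted(union))
```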
SEQUENCES AND TUPLES
A sequence is a list of elements in some order, written in parentheses: (2, 4, 6, 8, …)
A finite sequence of k elements is called a k-tuple.
(2, 4) is a 2-tuple or pair; (2, 4, 6) is a 3-tuple.
A Cartesian product of S and P (S x P) is a set of 2-tuples/pairs (i, j) where i ∈ S and j ∈ P.
EXAMPLE FOR CARTESIAN PRODUCT
Example (1): If N = {1, 2, …} is the set of positive integers and O = {+, -}, then
N x O = {(1,+), (1,-), (2,+), (2,-), …}
Meaningless?
Example (2):
N x O x N = {(1,+,1), (1,-,1), (2,+,1), (2,-,2), …}
Does this make sense?
CONTINUE CARTESIAN PRODUCT
Example (3): If A = {a, b, …, z} is the set of English letters, then
A x A = {(a, a), (a, b), …, (d, g), (d, h), …, (z, z)}
These are all pairs of elements of A.
Example (4): If U = {0, 1, 2, 3, …, 9} is the set of digits, then
U x U x U = {(1,1,1), (1,1,2), …, (7,4,1), …, (9,9,9)}
Example (5): If U = {0, 1, 2, 3, …, 9} is the set of digits, then
U x … x U (n times) = {(1,..,1), (1,..,2), …, (7,..,1), …, (9,..,9)}
This can be written as U^n.
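Cartesian products like these can be generated with itertools.product. In the sketch below, N is infinite on the slide, so only its first three elements are used; that cut-off is an assumption made to keep the example finite.

```python
from itertools import islice, product

O = ["+", "-"]
U = range(10)            # the digits 0..9
N_prefix = [1, 2, 3]     # assumption: a finite prefix of N = {1, 2, ...}

pairs = list(product(N_prefix, O))                             # part of N x O
triples = list(islice(product(N_prefix, O, N_prefix), 4))      # first few of N x O x N
u_cubed_size = len(list(product(U, repeat=3)))                 # size of U x U x U

print(pairs)
print(triples)
print(u_cubed_size)
```

The repeat argument corresponds to the exponent in U^n, so the size of the result is always the size of U raised to n.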
RELATIONS AND FUNCTIONS
A relation is more general than a function.
Both map a set of elements called the domain to another set called the co-domain.
In the case of functions, the co-domain can be called the range.
R : D → C   A relation has no restrictions.
f : D → R   A function cannot map one element to two different elements in the range.
SURJECTIVE FUNCTION
[Figure: planes P1, P2, P3, P4, …, Pn mapped onto runways t1, t2, t3 in T]
Many planes fly at the same time.
BIJECTIVE FUNCTION
[Figure: s1, s2, s3 in S mapped one-to-one onto t1, t2, t3 in T]
Only one plane lands on one runway at a time.
WHAT IS THE USE OF FUNCTIONS IN CS?
Functions help to describe the transition function that transfers a computing device from one state to another.
Any computing device must have clear states.
[Figure: states s1, s2, s3 in domain D mapped to states s2, s3, s_halt in range R]
GRAPHS
A graph is a visual representation of a set and a relation of this set over itself.
G = (V, E)
V = {1, 2, 3, 4, 5}
E = {(i, j) and (j, i) | i, j belong to V}
E is a set of pairs = {(1, 3), (3, 1), …, (5, 4), (4, 5)}
[Figure: graph with vertices 1, 2, 3, 4, 5]
GRAPHS CONSTRUCT
Is there a formal language to describe a graph?
G = (V, E)
where:
V is a set of n vertices = {i | 0 ≤ i ≤ n-1}
E is a set of edges. Each edge is a pair of elements in V:
  = {(i, j), (j, i) | i, j ∈ V}
  or = {(i, j) | i, j ∈ V}
[Figure: graph with vertices 1, 2, 3, 4, 5]
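The definition G = (V, E) translates naturally into two Python sets. This sketch uses the vertex set from the earlier slide and only the edges that slide lists explicitly; the remaining edges of the figure are not recoverable, so they are left out.

```python
# Vertices from the slide; only the explicitly listed edges are used.
V = {1, 2, 3, 4, 5}
listed_pairs = [(1, 3), (5, 4)]

# Store each edge in both directions, as in E = {(i, j), (j, i) | i, j in V}.
E = {(i, j) for i, j in listed_pairs} | {(j, i) for i, j in listed_pairs}

def neighbours(vertex):
    """Vertices reachable from vertex by one edge."""
    return sorted(j for i, j in E if i == vertex)

print(sorted(E))
print(neighbours(3))
```

Dropping the reversed pairs gives the second form of E on the slide, where each pair (i, j) is stored in one direction only.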
DEFINITIONS
Σ (alphabet): a finite set of letters.
Letter: an element of an alphabet Σ.
Word: a finite sequence of letters from the alphabet Σ.
Λ (empty string): a word without letters.
Language Σ* (Kleene's star): the set of all words over Σ.
STRINGS AND LANGUAGES
A string w1 over an alphabet Σ is a finite sequence of symbols from that alphabet.
1. Σ is a set of symbols, e.g.:
   {a, b, c, …, z}: English letters
   {0, 1, 2, …, 9, .}: digits of Arabic numbers
   {AM, PM}: different clocking systems
   {1, 2, …, 12}: hours of a clock
STRINGS - 2
2.1 String: a sequence of Σ (sigma) symbols
Σ                     Example   Description                  Σ to the power?
{a, b, c, …, z}       apple     English string               Σ^5
{0, 1, 2, …, 9, .}    35        the oldest age for girls     Σ^2
{AM, PM}              PM        clocking system              Σ^1
{1, 2, …, 12}         12        a specific hour in the day   Σ^1 or Σ^2
STRINGS - 3
2.2 The empty string Λ (lambda) is of length zero: Σ^0 = {Λ}
2.3 Reverse(xyz) = zyx
2.4 A palindrome is a string whose reverse is identical to itself.
If Σ = {a, b} then PALINDROME = {Λ and all strings x such that reverse(x) = x}
English examples: radar, level, reviver, racecar, madam, pop and noon.
STRINGS - 4
2.5 Kleene star * (Kleene closure) is similar to a cross product of a set/string over itself.
If Σ = {x}, then Σ* = {Λ x xx xxx …}
If Σ = {x}, then Σ^+ = {x xx xxx …}
If S = {w1, w2, w3} then
S* = {Λ, w1, w2, w3, w1w1, w1w2, w1w3, …}
S^+ = {w1, w2, w3, w1w1, w1w2, w1w3, …}
Note 1: if w3 = Λ, then Λ ∈ S^+
Note 2: S* = S**
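Since S* is infinite, code can only build a finite slice of it. The sketch below generates all concatenations of up to max_words words, with Λ represented by the empty Python string; the function names and the bound are assumptions for illustration.

```python
from itertools import product

def kleene_star(S, max_words):
    """All concatenations of 0..max_words words from S (a finite slice of S*)."""
    return {"".join(combo)
            for n in range(max_words + 1)
            for combo in product(S, repeat=n)}

def kleene_plus(S, max_words):
    """Same as kleene_star, but at least one word is required."""
    return {"".join(combo)
            for n in range(1, max_words + 1)
            for combo in product(S, repeat=n)}

print(sorted(kleene_star({"x"}, 3), key=len))   # the empty string stands for Λ
print("" in kleene_plus({"x"}, 3))              # is Λ in this slice of Σ^+?
print("" in kleene_plus({"a", "b", ""}, 2))     # Note 1: Λ in S puts Λ in S^+
```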
PART THREE: EXERCISES
EXERCISE 1
Assume Σ = {0, 1}
1. How many elements are there in Σ^2? Length(Σ) x Length(Σ) = 2 x 2 = 4
2. How many combinations are there in Σ^3? 2 x 2 x 2 = 2^3
3. How many elements are there in Σ^n? (Length(Σ))^n = 2^n
EXERCISE 2
Assume A1 = {AM, PM}, A2 = {1, 2, …, 59}, A3 = {1, 2, …, 12}
1. How many elements are there in A1 x A3? Length(A1) x Length(A3) = 2 x 12 = 24
2. How many elements are there in A1 x A2 x A3? Length(A1) x Length(A2) x Length(A3) = 2 x 59 x 12 = 1416
EXERCISE 3
Assume L1 = {Add, Subtract}, DecimalDigits = {0, 1, 2, …, 9}
1. Construct integer numbers out of DecimalDigits.
   DecimalDigits^+ : a number of any length of digits.
2. Construct a language for assembly commands from L1 and DecimalDigits.
   L1 DecimalDigits^+ {,} DecimalDigits^+ : commands in the form of Add 1000, 555
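The command language from part 2 can be written as a Python regular expression. This is a sketch only: the formal expression has no spaces, so the optional whitespace around the operands is an assumption made to accept the sample command "Add 1000, 555", and parse_command is a made-up helper name.

```python
import re

# L1 DecimalDigits^+ {,} DecimalDigits^+, with optional spaces added as an assumption.
COMMAND = re.compile(r"(Add|Subtract)\s*([0-9]+)\s*,\s*([0-9]+)")

def parse_command(text):
    """Return (operation, first operand, second operand), or None if text is not in the language."""
    m = COMMAND.fullmatch(text)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))

print(parse_command("Add 1000, 555"))
print(parse_command("Multiply 2, 3"))
```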
EXERCISE 4
Assume HexaDecimal = {0, 1, …, 9, A, B, C, D, E, F}
1. How many HexaDecimals of length 4? 16^4
2. How many HexaDecimals of length n? 16^n
3. How many elements are there in {0, 1}^8? 2^8 = 256
THANK YOU