Programming Languages, Third Edition
Chapter 6, Part I: Syntax / Regular Expressions
Objectives
• Understand the lexical structure of programming languages
• Understand regular expressions
• Read Section 6.1, pp. 204-208
Programming Languages, Third Edition 2
Introduction
• Syntax is the structure of a language
• Syntax rules are analogous to the grammar rules of a natural language
• John Backus and Peter Naur developed a notational system for describing these grammars, now called Backus-Naur forms, or BNFs
– First used to describe the syntax of Algol60
• Every modern computer scientist needs to know how to read, interpret, and apply BNF descriptions of language syntax
Simple Flowchart for Compilation
Source Code (your program) → Compiler → Object Code (machine language) → CPU executes → Results / Output
Generally speaking, compilation is analogous to book translation (translated as a unit, then given to someone to read)
Simple Flowchart for Interpretation using a REPL (as in Racket interactions and Python shell)
One statement → Some translation → Intermediate Code (such as byte code) → Virtual machine executes → Results / Output
Generally speaking, interpretation is analogous to live translation of speech (translated and “given” to someone one sentence at a time)
Simple Flowchart for Interpretation that seems like compilation (huh?)
Source Code (your program) → Some translation → Intermediate Code (such as byte code) → Virtual machine executes → Results / Output
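CPython is one system that fits this picture (the slide itself does not name a language, so this is an illustrative assumption): the whole source is translated to byte code first, and a virtual machine then executes that byte code. A minimal sketch using the built-in compile and exec, with a made-up source string and file name:

```python
# Sketch: translate a whole program to byte code, then let the VM execute it.
# The source text and the "<slide-example>" file name are invented for illustration.
source = "total = 2 + 3\nresult = total * 10\n"

# Step 1: "some translation" of the entire source into a code object (byte code).
code_object = compile(source, "<slide-example>", "exec")

# Step 2: the Python virtual machine executes the byte code.
namespace = {}
exec(code_object, namespace)

print(type(code_object).__name__)  # code
print(namespace["result"])         # 50
```

The translation step sees the program as a unit, as a compiler would, but its output is intermediate code rather than machine language.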
Flowchart for Compilation – More Details
Source Code (your program = char stream) → Scanner / Lexer (lexical analysis) → Lexical items / Tokens → Parser (syntactic analysis) → Parse tree → Semantic analysis (analyzes meaning) → Intermediate Code → Optimization → Object Code (machine language)
Lexical Structure of Programming Languages
• Lexical structure: the structure of the tokens, or words, of a language
• Scanning phase: the phase in which a translator collects sequences of characters from the input program and forms them into tokens
• Parsing phase: the phase in which the translator processes the tokens, determining the program’s syntactic structure
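The two phases can be observed separately in Python's standard library. This is only a sketch: it uses Python rather than any language from the text, the one-line source is made up, and ast.parse runs its own scanner internally, so it is shown here only for the structure it produces.

```python
import ast
import io
import tokenize

source = "total = total + value\n"

# Scanning phase: the character stream is grouped into tokens.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type not in (tokenize.NEWLINE, tokenize.ENDMARKER)
]
print(tokens)

# Parsing phase: the syntactic structure is determined.
tree = ast.parse(source)
print(type(tree.body[0]).__name__)  # Assign
```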
Lexical Structure of Programming Languages (cont’d.)
• Tokens generally fall into several categories:
– Reserved words (or keywords)
– Literals or constants
– Special symbols, such as “;” “<=” “+”
– Identifiers
Lexical Structure of Programming Languages (cont’d.)
• Token delimiters (or white space): formatting that affects the way tokens are recognized
• Indentation can be used to determine structure
• Free-format language: one in which format has no effect on program structure other than satisfying the principle of longest substring
• Fixed format language: one in which all tokens must occur in pre-specified locations on the page
• Tokens can be formally described by regular expressions
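The principle of longest substring can be sketched as follows: at each position, try every token pattern and keep the longest match. The categories, patterns, and the next_token helper below are hypothetical and far smaller than any real language's token set.

```python
import re

# Hypothetical token patterns, for illustration only.
# Python tries alternatives left to right, so longer symbols are listed first.
TOKEN_PATTERNS = [
    ("KEYWORD", re.compile(r"if|do|while")),
    ("IDENTIFIER", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("SYMBOL", re.compile(r"<=|==|<|=|\+|;")),
]

def next_token(text, pos):
    """Return (category, lexeme) for the longest match starting at pos, or None."""
    best = None
    for category, pattern in TOKEN_PATTERNS:
        match = pattern.match(text, pos)
        # A strictly longer match wins; on a tie the earlier category is kept,
        # so "if" is reported as a KEYWORD rather than an IDENTIFIER.
        if match and (best is None or len(match.group()) > len(best[1])):
            best = (category, match.group())
    return best

for text in ["if", "doif", "<=", "<"]:
    print(text, "->", next_token(text, 0))
```

With these patterns, doif comes out as one identifier rather than the keyword do followed by the keyword if, and <= comes out as one symbol rather than < followed by =.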
Example: Scanner’s job
The Java statement:
total = total + value;
Looks to compiler like stream of characters:
t o t a l = t o t a l + v a l u e ;
So scanner has to split this up into tokens:
total = total + value ;
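A scanner of this kind can be sketched with one combined regular expression. This is a minimal illustration, not a Java scanner: the three token classes and the scan function are assumptions chosen to cover just this kind of statement.

```python
import re

# Assumed token classes; a real Java scanner recognizes many more.
TOKEN_REGEX = re.compile(r"""
    (?P<IDENTIFIER>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<NUMBER>[0-9]+)
  | (?P<SYMBOL>==|[=+;()\[\]])
  | (?P<SKIP>\s+)
""", re.VERBOSE)

def scan(source):
    """Split a character stream into (category, lexeme) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_REGEX.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(scan("total = total + value;"))
```

White space only separates tokens here; it is matched and then discarded.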
Example: Scanner’s job
The Java statement:
if (x==y) a[2]=;
Looks to compiler like stream of characters:
i f ( x = = y ) a [ 2 ] = ;
So scanner has to split this up into tokens:
if ( x == y ) a [ 2 ] = ;
Parser’s job is to take tokens and see if they form legal “sentences”
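The point that a statement can scan cleanly and still not be a legal sentence can be sketched with a tiny checker. The mini-grammar and the function names below are hypothetical and cover only simple assignments, not Java's actual grammar.

```python
# Hypothetical mini-grammar (much smaller than Java's):
#   assignment -> target "=" operand { "+" operand } ";"
#   target     -> NAME [ "[" operand "]" ]
#   operand    -> NAME | NUMBER

def is_operand(token):
    """True for a name or an unsigned integer literal; None marks end of input."""
    return token is not None and (token.isidentifier() or token.isdigit())

def is_legal_assignment(tokens):
    """Return True if the token list forms one assignment in the mini-grammar."""
    tokens = list(tokens) + [None]  # sentinel for end of input
    pos = 0

    # target -> NAME [ "[" operand "]" ]
    if tokens[pos] is None or not tokens[pos].isidentifier():
        return False
    pos += 1
    if tokens[pos] == "[":
        if not is_operand(tokens[pos + 1]) or tokens[pos + 2] != "]":
            return False
        pos += 3

    if tokens[pos] != "=":
        return False
    pos += 1

    # operand { "+" operand }
    if not is_operand(tokens[pos]):
        return False
    pos += 1
    while tokens[pos] == "+":
        if not is_operand(tokens[pos + 1]):
            return False
        pos += 2

    return tokens[pos] == ";" and tokens[pos + 1] is None

print(is_legal_assignment(["total", "=", "total", "+", "value", ";"]))  # True
print(is_legal_assignment(["a", "[", "2", "]", "=", ";"]))              # False
```

The second call fails because nothing follows the = sign, even though every individual token is valid.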
Scanning: Regular Expressions
• Metalanguage for describing patterns for strings of characters – metasymbols are:
| means choice
* means zero or more occurrences
+ means one or more occurrences
? means zero or one occurrence (optional)
[ ] means choose one of the list of chars in brackets; can use a range
. (period) means any one character
( ) can be used for grouping
\ placed before a metasymbol lets the metasymbol be used literally in a string
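Each metasymbol can be tried out with Python's re module, whose syntax agrees with this list for these cases. The patterns and test strings below are made up for illustration; re.fullmatch is used so that the whole string must match.

```python
import re

# (pattern, a string it matches, a string it rejects), one row per metasymbol.
EXAMPLES = [
    (r"cat|dog", "dog", "cow"),        # |   choice
    (r"ab*", "a", "b"),                # *   zero or more
    (r"ab+", "abbb", "a"),             # +   one or more
    (r"colou?r", "color", "colouur"),  # ?   optional
    (r"[a-c]x", "bx", "dx"),           # [ ] one char from a list or range
    (r"a.c", "a-c", "ac"),             # .   any one character
    (r"(ab)+", "abab", "aba"),         # ( ) grouping
    (r"3\.14", "3.14", "3x14"),        # \   literal use of a metasymbol
]

for pattern, good, bad in EXAMPLES:
    assert re.fullmatch(pattern, good) is not None
    assert re.fullmatch(pattern, bad) is None
    print(f"{pattern!r} matches {good!r} and rejects {bad!r}")
```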
Regular Expressions (cont’d.)
• Most modern text editors use regular expressions in text searches
• Utilities such as lex can automatically turn a regular expression description of a language’s tokens into a scanner
Regular Expressions (cont’d.)
• Examples:
[aeiou]
[aeiouAEIOU]
[aeiouAEIOU]+
[aeiouAEIOU]*
(a|b)*c
[ab]*c
(ab|ba|aa)*c
[A-Z][a-z]*
[A-Z]+[a-z]
[A-Za-z]*
[0-9]+
[0-9]+(\.[0-9]+)?
[a-z].[0-9]
[^aeiou][a-z]+
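Two of the patterns above, (a|b)*c and [ab]*c, describe the same set of strings, and [0-9]+(\.[0-9]+)? accepts a fraction only when digits appear on both sides of the period. A short sketch that checks both claims in Python; the helper name and the test strings are made up, and the comparison is exhaustive only up to length 4.

```python
import re
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

# Compare the two patterns on every string over {a, b, c} up to length 4.
same = all(
    bool(re.fullmatch(r"(a|b)*c", s)) == bool(re.fullmatch(r"[ab]*c", s))
    for s in strings_up_to("abc", 4)
)
print(same)  # True

number = re.compile(r"[0-9]+(\.[0-9]+)?")
for text in ["42", "3.14", "3.", ".5"]:
    print(text, bool(number.fullmatch(text)))
```

The last loop reports True for 42 and 3.14, and False for 3. and .5.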
Regular Expressions (cont’d.)
• Let’s try writing some for license plates:
– Start with VA, followed by zero or more digits
– Start with VA, followed by one or more digits
– Start with VA, followed by 2 digits, followed by zero or more lower case letters
– Start with V or A, followed by -, followed by 2-4 digits
– Start with VA, any case, followed by 2-3 digits or 2-3 letters
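One possible set of answers, checked in Python. These are a sketch, not the only correct solutions: they use only the metasymbols listed earlier, so repetition counts are spelled out with ?, and the last pattern assumes that "letters" means letters of either case. The names PLATE_PATTERNS and plate_matches are invented for this example.

```python
import re

# One candidate pattern per exercise, in the order the exercises are listed.
PLATE_PATTERNS = [
    r"VA[0-9]*",
    r"VA[0-9]+",
    r"VA[0-9][0-9][a-z]*",
    r"[VA]-[0-9][0-9][0-9]?[0-9]?",
    # Assumption: 2-3 letters of either case.
    r"[Vv][Aa]([0-9][0-9][0-9]?|[A-Za-z][A-Za-z][A-Za-z]?)",
]

def plate_matches(exercise, text):
    """True if the whole text matches the pattern for exercise 1..5."""
    return re.fullmatch(PLATE_PATTERNS[exercise - 1], text) is not None

print(plate_matches(1, "VA"))       # True: zero digits is allowed
print(plate_matches(2, "VA"))       # False: at least one digit is required
print(plate_matches(4, "A-1234"))   # True
print(plate_matches(5, "va12"))     # True
```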
Regular Expressions (cont’d.)
• Let’s try writing some:
– Signed integers, sign not optional
– Signed integers, sign optional
– Signed integers, sign optional, no signed zero
– Signed integers, sign optional, no signed zero, but allow leading zeroes (+0, -0 are invalid, but 0, +005, -06 are valid)
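One possible set of answers, again as a hedged sketch in Python. The third pattern assumes that leading zeroes are not allowed, since the fourth exercise introduces them as a relaxation. The names INTEGER_PATTERNS and integer_matches are invented for this example.

```python
import re

# One candidate pattern per exercise, in the order the exercises are listed.
INTEGER_PATTERNS = [
    r"[+-][0-9]+",                 # sign required
    r"[+-]?[0-9]+",                # sign optional
    r"0|[+-]?[1-9][0-9]*",         # no signed zero (assumes no leading zeroes)
    r"[0-9]+|[+-]0*[1-9][0-9]*",   # no signed zero, leading zeroes allowed
]

def integer_matches(exercise, text):
    """True if the whole text matches the pattern for exercise 1..4."""
    return re.fullmatch(INTEGER_PATTERNS[exercise - 1], text) is not None

for text in ["+0", "-0", "0", "+005", "-06"]:
    print(text, integer_matches(4, text))
```

For the fourth pattern the loop reports False for +0 and -0 and True for 0, +005, and -06, which agrees with the valid and invalid cases given in the exercise.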
Regular Expression Fun
• Regular Expression Crossword Puzzles
– http://regexcrossword.com/