Programming Languages Third Edition Chapter 6 Part I Syntax / Regular Expressions

Page 1: Programming Languages Third Edition

Programming Languages, Third Edition

Chapter 6 Part I: Syntax / Regular Expressions

Page 2: Programming Languages Third Edition

Objectives

• Understand the lexical structure of programming languages

• Understand regular expressions

• Read Section 6.1, pp. 204-208

Programming Languages, Third Edition 2

Page 3: Programming Languages Third Edition

Introduction

• Syntax is the structure of a language

• Syntax rules are analogous to the grammar rules of a natural language

• John Backus and Peter Naur developed a notational system for describing these grammars, now called Backus-Naur forms, or BNFs
  – First used to describe the syntax of Algol60

• Every modern computer scientist needs to know how to read, interpret, and apply BNF descriptions of language syntax


Page 4: Programming Languages Third Edition

Simple Flowchart for Compilation

Source Code (your program) → Compiler → Object Code (machine language) → CPU executes → Results / Output

Generally speaking, compilation is analogous to book translation (translated as a unit, then given to someone to read)

Page 5: Programming Languages Third Edition

Simple Flowchart for Interpretation using a REPL (as in Racket interactions and Python shell)

One statement → Some translation → Intermediate Code (such as byte code) → Virtual machine executes → Results / Output

Generally speaking, interpretation is analogous to live translation of speech (translated and “given” to someone one sentence at a time)

Page 6: Programming Languages Third Edition

Simple Flowchart for Interpretation that seems like compilation (huh?)

Source Code (your program) → Some translation → Intermediate Code (such as byte code) → Virtual machine executes → Results / Output

Page 7: Programming Languages Third Edition

Flowchart for Compilation – More Details

Source Code (your program = char stream) → Scanner / Lexer (lexical analysis) → Lexical items / Tokens → Parser (syntactic analysis) → Parse tree → Semantic analysis (analyzes meaning) → Intermediate Code → Optimization → Object Code (machine language)

Page 8: Programming Languages Third Edition

Lexical Structure of Programming Languages

• Lexical structure: the structure of the tokens, or words, of a language

• Scanning phase: the phase in which a translator collects sequences of characters from the input program and forms them into tokens

• Parsing phase: the phase in which the translator processes the tokens, determining the program’s syntactic structure


Page 9: Programming Languages Third Edition

Lexical Structure of Programming Languages (cont’d.)

• Tokens generally fall into several categories:
  – Reserved words (or keywords)
  – Literals or constants
  – Special symbols, such as “;” “<=” “+”
  – Identifiers


Page 10: Programming Languages Third Edition

Lexical Structure of Programming Languages (cont’d.)

• Token delimiters (or white space): formatting that affects the way tokens are recognized

• Indentation can be used to determine structure

• Free-format language: one in which format has no effect on program structure other than satisfying the principle of longest substring

• Fixed format language: one in which all tokens must occur in pre-specified locations on the page

• Tokens can be formally described by regular expressions
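The principle of longest substring mentioned above (sometimes called maximal munch) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule table `RULES` and the helper `longest_match` are hypothetical names, and the three rules are far smaller than any real language's token set.

```python
import re

# Hypothetical token rules for illustration only (not from the textbook).
# Inside SYMBOL, "<=" is listed before "<" because Python's alternation
# takes the first alternative that matches, not the longest one.
RULES = [
    ("KEYWORD", r"do|if"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("SYMBOL", r"<=|<|="),
]

def longest_match(text, pos=0):
    """Return (kind, lexeme) for the longest rule match starting at pos."""
    best = None
    for kind, pattern in RULES:
        m = re.compile(pattern).match(text, pos)
        # Keep only a strictly longer match, so on a tie the earlier rule wins.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (kind, m.group())
    return best

print(longest_match("doif"))   # one identifier, not the keywords do and if
print(longest_match("do if"))  # the blank ends the token, so this is the keyword do
print(longest_match("<=y"))    # the two-character symbol, not < followed by =
```

The blank in `do if` is what makes the format matter here: without it, the longest substring `doif` is taken as a single identifier.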


Page 11: Programming Languages Third Edition

Example: Scanner’s job

The Java statement: total = total + value;

Looks to compiler like stream of characters:

So scanner has to split this up into tokens:

total = total + value ;
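As a rough sketch of the scanner's job on this statement, the Python snippet below walks the character stream and collects token strings. The pattern `TOKEN` and the helper `scan` are hypothetical names chosen for illustration, and the pattern covers only a handful of tokens, not Java's full lexical structure.

```python
import re

# Hypothetical, deliberately tiny token pattern: identifiers, integer
# literals, and a few special symbols. Real Java has many more tokens.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|==|<=|[=+;()\[\]]")

def scan(source):
    """Split a character stream into a list of token strings."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():      # white space only separates tokens
            pos += 1
            continue
        m = TOKEN.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r}")
        tokens.append(m.group())
        pos = m.end()
    return tokens

print(scan("total = total + value;"))
# ['total', '=', 'total', '+', 'value', ';']
```

Note that the scanner only groups characters into tokens; it does not check whether the tokens form a legal statement.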


Page 12: Programming Languages Third Edition

Example: Scanner’s job

The Java statement: if (x==y) a[2]=;

Looks to compiler like stream of characters:

i f ( x = = y ) a [ 2 ] = ;

So scanner has to split this up into tokens:

if ( x == y ) a [ 2 ] = ;

Page 13: Programming Languages Third Edition

Parser’s job is to take tokens and see if they form legal “sentences”


Page 14: Programming Languages Third Edition

Scanning: Regular Expressions

• Metalanguage for describing patterns for strings of characters – metasymbols are
  | means choice
  * means zero or more occurrences
  + means one or more occurrences
  ? means one optional occurrence
  [ ] choose one of list of chars in brackets; can use a range
  . (period) means one of any character
  ( ) can be used for grouping
  \ can precede metasymbol with this to use metasymbol in string
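Each metasymbol can be tried out with Python's `re` module, which uses the same notation for all of them. The sketch below pairs a small pattern with one string it accepts and one it rejects; the patterns and test strings are made up for illustration. `re.fullmatch` is used so that the whole string has to match the pattern.

```python
import re

# (pattern, string that should match, string that should not match)
EXAMPLES = [
    (r"a|b",   "a",    "c"),     # |   choice
    (r"ab*",   "a",    "b"),     # *   zero or more occurrences
    (r"ab+",   "abb",  "a"),     # +   one or more occurrences
    (r"ab?",   "ab",   "abb"),   # ?   one optional occurrence
    (r"[abc]", "b",    "d"),     # [ ] one character from the list
    (r"[a-c]", "c",    "d"),     # [ ] with a range
    (r"a.c",   "a+c",  "ac"),    # .   any one character
    (r"(ab)+", "abab", "aba"),   # ( ) grouping
    (r"a\+b",  "a+b",  "aab"),   # \   makes the metasymbol + literal
]

for pattern, good, bad in EXAMPLES:
    assert re.fullmatch(pattern, good) is not None
    assert re.fullmatch(pattern, bad) is None
    print(f"{pattern:8} matches {good!r} but not {bad!r}")
```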


Page 15: Programming Languages Third Edition

Regular Expressions (cont’d.)

• Most modern text editors use regular expressions in text searches

• Utilities such as lex can automatically turn a regular expression description of a language’s tokens into a scanner
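The idea of turning a table of regular expressions into a scanner can be sketched in Python. This is not how lex itself works or what its input looks like; it is a minimal stand-in that combines the rules into one pattern with named groups. The function `make_scanner`, the table `SPEC`, and the token names are all hypothetical.

```python
import re

def make_scanner(spec):
    """Build a scanner function from (token_name, regex) pairs."""
    # Combine the rules into one pattern with a named group per token type.
    master = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in spec))

    def scanner(source):
        tokens = []
        pos = 0
        while pos < len(source):
            m = master.match(source, pos)
            if m is None:
                raise ValueError(f"unexpected character {source[pos]!r}")
            if m.lastgroup != "SKIP":          # drop white space
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
        return tokens

    return scanner

# Hypothetical token description; earlier rules win, so keywords come first.
# \b (a word boundary, a Python extension) keeps "iffy" from starting with "if".
SPEC = [
    ("KEYWORD",    r"(?:if|else|while)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("LITERAL",    r"[0-9]+"),
    ("SYMBOL",     r"==|<=|[=+;()\[\]]"),
    ("SKIP",       r"\s+"),
]

scan = make_scanner(SPEC)
for kind, lexeme in scan("if (x==y) a[2]=;"):
    print(kind, lexeme)
```

Changing the language's tokens then means editing only the `SPEC` table, which is the appeal of generating a scanner from a description.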


Page 16: Programming Languages Third Edition

Regular Expressions (cont’d.)

• Examples:

  [aeiou]
  [aeiouAEIOU]
  [aeiouAEIOU]+
  [aeiouAEIOU]*
  (a|b)*c
  [ab]*c
  (ab|ba|aa)*c

  [A-Z][a-z]*
  [A-Z]+
  [a-z][A-Za-z]*
  [0-9]+
  [0-9]+(\.[0-9]+)?
  [a-z].[0-9]
  [^aeiou][a-z]+
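A few of these patterns can be checked against sample strings with Python's `re` module. The sample strings are made up for illustration, and `re.fullmatch` is used so that the entire string must match.

```python
import re

# (pattern, strings it should accept, strings it should reject)
CASES = [
    (r"[aeiouAEIOU]+",     ["a", "AeIoU"],         ["", "xyz"]),
    (r"(a|b)*c",           ["c", "abbac"],         ["ab", "ca"]),
    (r"[ab]*c",            ["c", "abbac"],         ["ab", "ca"]),  # same strings as (a|b)*c
    (r"(ab|ba|aa)*c",      ["c", "abbac", "aac"],  ["bbc", "ac"]),
    (r"[0-9]+(\.[0-9]+)?", ["42", "3.14"],         ["3.", ".5"]),
]

def accepts(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

for pattern, good, bad in CASES:
    assert all(accepts(pattern, s) for s in good)
    assert not any(accepts(pattern, s) for s in bad)
    print(f"{pattern:20} accepts {good}, rejects {bad}")
```

The pair `(a|b)*c` and `[ab]*c` shows that choice between single characters and a bracketed list describe the same set of strings.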


Page 17: Programming Languages Third Edition

Regular Expressions (cont’d.)

• Let’s try writing some for license plates:
  – Start with VA, followed by zero or more digits
  – Start with VA, followed by one or more digits
  – Start with VA, followed by 2 digits, followed by zero or more lower case letters
  – Start with V or A, followed by -, followed by 2-4 digits
  – Start with VA, any case, followed by 2-3 digits or 2-3 letters
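One possible set of answers is sketched below in Python; other correct patterns exist. The patterns use only the metasymbols introduced earlier, so "2-4 digits" is spelled out with `?` rather than a counted repetition. The dictionary `PLATES` and the helper `is_plate` are hypothetical names, and the last pattern assumes the 2-3 letters may be in either case, which the exercise does not say.

```python
import re

# One possible answer per exercise (assumed, not the textbook's solutions).
PLATES = {
    "VA, zero or more digits":            r"VA[0-9]*",
    "VA, one or more digits":             r"VA[0-9]+",
    "VA, 2 digits, lower case letters":   r"VA[0-9][0-9][a-z]*",
    "V or A, -, 2-4 digits":              r"[VA]-[0-9][0-9][0-9]?[0-9]?",
    "VA any case, 2-3 digits or letters": r"[Vv][Aa]([0-9][0-9][0-9]?|[A-Za-z][A-Za-z][A-Za-z]?)",
}

def is_plate(exercise, text):
    """True if the whole of text matches the pattern for the given exercise."""
    return re.fullmatch(PLATES[exercise], text) is not None

print(is_plate("VA, zero or more digits", "VA"))                # True
print(is_plate("VA, one or more digits", "VA"))                 # False
print(is_plate("V or A, -, 2-4 digits", "A-1234"))              # True
print(is_plate("V or A, -, 2-4 digits", "A-12345"))             # False
print(is_plate("VA any case, 2-3 digits or letters", "va12a"))  # False
```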


Page 18: Programming Languages Third Edition

Regular Expressions (cont’d.)

• Let’s try writing some:
  – Signed integers, sign not optional
  – Signed integers, sign optional
  – Signed integers, sign optional, no signed zero
  – Signed integers, sign optional, no signed zero, but allow leading zeroes. (+0, -0 are invalid, but 0, +005, -06 are valid)
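One possible set of answers, again as a hedged Python sketch rather than the textbook's solutions. The names `SIGNED` and `is_integer` are hypothetical. Two readings are assumed: the third exercise is taken to forbid leading zeroes (since the fourth explicitly allows them), and in the fourth an unsigned run of zeroes such as `00` is accepted while a signed one such as `+00` is rejected.

```python
import re

# One possible pattern per exercise (assumed interpretations, see lead-in).
SIGNED = {
    "sign required":                  r"[+-][0-9]+",
    "sign optional":                  r"[+-]?[0-9]+",
    "no signed zero":                 r"0|[+-]?[1-9][0-9]*",
    "no signed zero, leading zeroes": r"[0-9]+|[+-]0*[1-9][0-9]*",
}

def is_integer(exercise, text):
    """True if the whole of text matches the pattern for the given exercise."""
    return re.fullmatch(SIGNED[exercise], text) is not None

# The valid and invalid strings listed in the last exercise.
for text in ["0", "+005", "-06", "+0", "-0"]:
    print(text, is_integer("no signed zero, leading zeroes", text))
```

In the last pattern a signed number must contain at least one nonzero digit, which is what rules out `+0` and `-0` while still allowing `+005`.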


Page 19: Programming Languages Third Edition

Regular Expression Fun

• Regular Expression Crossword Puzzles
  – http://regexcrossword.com/
