CSE 425: Syntax I

Syntax and Semantics

• Syntax gives the structure of statements in a language
  – Allowed ordering, nesting, repetition, omission of symbols
  – Can automate the process of checking correct syntax

• Semantics give meaning to the (structured) symbols
  – E.g., kinds of labels, types of variables, layout of classes
  – For example, what does 11 mean? (at least 3 good answers)

• Separating syntactic and semantic evaluation helps
  – Can isolate the problem of syntactic recognition in an engine
  – Can use the structure produced by the engine directly
  – Sometimes called syntax-directed (compiler in charge)
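One way to make the "11" question concrete: the same two characters pass the same syntactic check (a sequence of digits) but denote different values depending on the semantic rule applied. The sketch below assumes three possible readings (binary, decimal, hexadecimal); the helper name `interpret` is made up for this illustration and does not come from the slides.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: give a digit string a meaning by choosing a base.
// The syntax (a sequence of digits) is identical in every call; only the
// semantic rule, i.e. the base, changes.
int interpret(const std::string& digits, int base) {
    assert(base >= 2 && base <= 36);  // explicit bases supported by std::stoi
    return std::stoi(digits, nullptr, base);
}
```

Under these assumptions, `interpret("11", 2)` yields 3, `interpret("11", 10)` yields 11, and `interpret("11", 16)` yields 17.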


Syntax and Lexical Structure

• Syntax gives the structure of statements in a language
  – E.g., the format of tokens and how they can be arranged
  – Lexical structure also describes how to recognize them

• Scanning obtains tokens from a stream of characters
  – E.g., whitespace delimited vs. regular-expression based
  – Tokens include keywords, constants, symbols, identifiers
  – Usually based on assumption of taking longest substring

• Parsing recognizes more complex expressions
  – E.g., well-formed statements in logic, arithmetic, etc.
  – Free-format languages ignore indentation, etc. while fixed-format languages have specific restrictions/requirements
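The longest-substring rule for scanning can be sketched as follows. This is a minimal illustration, not the course's reference scanner: the token categories, the `Token` struct, and the function name `scan` are all assumptions introduced here, and keywords and multi-character symbols are left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for this sketch.
enum class Kind { Number, Identifier, Symbol };

struct Token {
    Kind kind;
    std::string text;
};

// Split the input into tokens. Whitespace is skipped, and each token is the
// longest run of digits (Number) or of letters/digits starting with a letter
// (Identifier) that begins at the current position. Anything else becomes a
// one-character Symbol.
std::vector<Token> scan(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[pos]);
        if (std::isspace(c)) {
            ++pos;
            continue;
        }
        std::size_t end = pos + 1;
        Kind kind = Kind::Symbol;
        if (std::isdigit(c)) {
            kind = Kind::Number;
            while (end < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[end]))) {
                ++end;
            }
        } else if (std::isalpha(c)) {
            kind = Kind::Identifier;
            while (end < input.size() &&
                   std::isalnum(static_cast<unsigned char>(input[end]))) {
                ++end;
            }
        }
        assert(end > pos);  // every token consumes at least one character
        tokens.push_back({kind, input.substr(pos, end - pos)});
        pos = end;
    }
    return tokens;
}
```

For the input `count1 = 425+x` this sketch produces five tokens: `count1`, `=`, `425`, `+`, `x`. Taking the longest substring is what makes `425` one Number token rather than three.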


Scanning vs. Parsing Roles

• It is often possible to simplify a grammar’s structure by making its tokens more sophisticated
  – For example, scanning for the terminal token NUMBER vs. parsing for the non-terminal number → nonzerodigit digit*

• Such simplification delegates work to a scanner
  – Often this is a good separation of concerns, especially since scanning may appropriately specialize its logic, etc.
  – E.g., a fairly general scanner built from classification functions (which look for all digits, all alphabetic, etc.) can be re-used or refactored easily for scanning different grammars
  – E.g., the C++11 <regex> library is worth studying and using
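As a rough sketch of pushing the rule number → nonzerodigit digit* down into the scanner, the pattern can be written as the regular expression `[1-9][0-9]*` and matched with `<regex>`. The function name `number_length` and its return convention (0 meaning "no NUMBER here") are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Length of the NUMBER token (nonzerodigit digit*) that starts exactly at
// `pos`, or 0 if no such token starts there. The name is hypothetical.
std::size_t number_length(const std::string& input, std::size_t pos) {
    assert(pos <= input.size());
    static const std::regex number("[1-9][0-9]*");
    std::smatch m;
    // match_continuous requires the match to begin at the first character
    // of the searched range instead of anywhere later in it.
    bool found = std::regex_search(
        input.begin() + static_cast<std::ptrdiff_t>(pos), input.end(), m,
        number, std::regex_constants::match_continuous);
    return found ? static_cast<std::size_t>(m.length(0)) : 0;
}
```

With this in place a parser only needs to ask for a NUMBER token. For example, `number_length("425+x", 0)` is 3, while `number_length("042", 0)` is 0 because the leading zero is not a nonzerodigit.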


Regular Expressions, DFAs, NDFAs

• Regular expressions capture lexical structure of symbols that can be built using 3 composition rules
  – Concatenation (ab), selection (a | b), repetition (b*)

• Finite automata can recognize the strings described by regular expressions
  – Deterministic finite automata (DFAs) associate a unique state with each sequence generated by a regular expression
  – Non-deterministic finite automata (NDFAs) allow multiple states to be reached by the same input sequence (adding “choice”)

• Can generate a unique (minimal) DFA in 3 steps
  – Generate NDFA from the regular expression (Scott p. 56)
  – Convert NDFA to (possibly larger) DFA (Scott pp. 56-58)
  – Minimize the DFA (Scott p. 59) to get a unique automaton

• C++11 <regex> library automates all this for you
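To show what "a unique state for each input sequence" looks like in code, here is a small table-driven DFA for the regular expression `[1-9][0-9]*`. The states and transitions were worked out by hand for this illustration rather than produced by the three-step construction, and all names (`State`, `CharClass`, `classify`, `accepts`) are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// States of a hand-built DFA for [1-9][0-9]*.
enum State { Start = 0, InNumber = 1, Dead = 2, NumStates = 3 };

// Character classes that serve as the DFA's input alphabet.
enum CharClass { Zero = 0, NonZeroDigit = 1, Other = 2, NumClasses = 3 };

CharClass classify(char c) {
    if (c == '0') return Zero;
    if (std::isdigit(static_cast<unsigned char>(c))) return NonZeroDigit;
    return Other;
}

// kTransition[state][class] is the single next state, which is what makes
// the automaton deterministic.
const State kTransition[NumStates][NumClasses] = {
    /* Start    */ {Dead, InNumber, Dead},
    /* InNumber */ {InNumber, InNumber, Dead},
    /* Dead     */ {Dead, Dead, Dead},
};

// Run the DFA over the whole string and report whether it ends in the
// accepting state.
bool accepts(const std::string& s) {
    State state = Start;
    for (char c : s) {
        assert(state < NumStates);
        state = kTransition[state][classify(c)];
    }
    return state == InNumber;  // InNumber is the only accepting state
}
```

Here `accepts("425")` and `accepts("100")` are true, while `accepts("0")`, `accepts("")`, and `accepts("40a")` are false.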


Today’s Studio Exercises

• We’ll code up some ideas from Scott Chapter 2.1-2.2
  – Looking at mechanisms for recognizing tokens and for parsing basic CFGs with straightforward recursion
  – Next studio we’ll look at more complicated variations

• Today’s exercises are all in C++
  – We’ll write our own code, but check out the <regex> library too, since you’ll be allowed to use it for lab assignments!
  – Please take advantage of the on-line tutorial and reference manual pages that are linked on the course web site
  – As always, please ask us for help as needed

• When done, email your answers to the course account with “Syntax Studio I” in the subject line