CSE 425: Syntax I
Syntax and Semantics
• Syntax gives the structure of statements in a language
– Allowed ordering, nesting, repetition, omission of symbols
– Can automate the process of checking correct syntax
• Semantics gives meaning to the (structured) symbols
– E.g., kinds of labels, types of variables, layout of classes
– For example, what does 11 mean? (at least 3 good answers)
• Separating syntactic and semantic evaluation helps
– Can isolate the problem of syntactic recognition in an engine
– Can use the structure produced by the engine directly
– Sometimes called syntax-directed (compiler in charge)
Syntax and Lexical Structure
• Syntax gives the structure of statements in a language
– E.g., the format of tokens and how they can be arranged
– Lexical structure also describes how to recognize them
• Scanning obtains tokens from a stream of characters
– E.g., whitespace-delimited vs. regular-expression based
– Tokens include keywords, constants, symbols, identifiers
– Usually based on the “longest match” rule: take the longest substring that forms a valid token (e.g., <= is one token, not < followed by =)
• Parsing recognizes more complex expressions
– E.g., well-formed statements in logic, arithmetic, etc.
– Free-format languages ignore indentation, etc., while fixed-format languages have specific restrictions/requirements
Scanning vs. Parsing Roles
• It is often possible to simplify a grammar’s structure by making its tokens more sophisticated
– For example, scanning for the terminal token NUMBER vs. parsing for the non-terminal number → nonzerodigit digit*
• Such simplification delegates work to a scanner
– Often this is a good separation of concerns, especially since a scanner can specialize its logic for recognizing tokens
– E.g., a fairly general scanner built from classification functions (which look for all digits, all alphabetic characters, etc.) can easily be reused or refactored to scan tokens for different grammars
– E.g., the C++11 <regex> library is worth studying and using
Regular Expressions, DFAs, NDFAs
• Regular expressions capture the lexical structure of symbols that can be built using 3 composition rules
– Concatenation (ab), selection or alternation (a | b), repetition or Kleene closure (b*)
• Finite automata can recognize the sets of strings that regular expressions describe
– Deterministic finite automata (DFAs) reach exactly one state for each input sequence
– Non-deterministic finite automata (NDFAs) let multiple states be reached by the same input sequence (adding “choice”)
• Can generate a unique (minimal) DFA in 3 steps
– Generate an NDFA from the regular expression (Scott p. 56)
– Convert the NDFA to a (possibly larger) DFA (Scott pp. 56-58)
– Minimize the DFA (Scott p. 59) to get an automaton that is unique up to renaming of states
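• The C++11 <regex> library automates recognition for you: it compiles a regular expression into a matcher, so you need not build the automaton by hand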
Today’s Studio Exercises
• We’ll code up some ideas from Scott Sections 2.1-2.2
– Looking at mechanisms for recognizing tokens and for parsing basic CFGs with straightforward recursion
– Next studio we’ll look at more complicated variations
• Today’s exercises are all in C++
– We’ll write our own code, but check out the <regex> library too, since you’ll be allowed to use it for lab assignments!
– Please take advantage of the online tutorial and reference manual pages that are linked on the course web site
– As always, please ask us for help as needed
• When done, email your answers to the course account with “Syntax Studio I” in the subject line