CSE 425: Syntax I

Syntax and Semantics

• Syntax gives the structure of statements in a language
  – Allowed ordering, nesting, repetition, omission of symbols
  – Can automate the process of checking correct syntax
• Semantics gives meaning to the (structured) symbols
  – E.g., kinds of labels, types of variables, layout of classes
  – For example, what does 11 mean? (at least 3 good answers)
• Separating syntactic and semantic evaluation helps
  – Can isolate the problem of syntactic recognition in an engine
  – Can use the structure produced by the engine directly
  – Sometimes called syntax-directed (compiler in charge)

Syntax and Lexical Structure

• Syntax gives the structure of statements in a language
  – E.g., the format of tokens and how they can be arranged
  – Lexical structure also describes how to recognize them
• Scanning obtains tokens from a stream of characters
  – E.g., whitespace delimited vs. regular-expression based
  – Tokens include keywords, constants, symbols, identifiers
  – Usually based on the "longest match" (maximal munch) rule: take the longest substring that forms a valid token
• Parsing recognizes more complex expressions
  – E.g., well-formed statements in logic, arithmetic, etc.
  – Free-format languages ignore indentation, etc., while fixed-format languages have specific restrictions/requirements

Scanning vs. Parsing Roles

• It is often possible to simplify a grammar’s structure by making its tokens more sophisticated
  – For example, scanning for the terminal token NUMBER vs. parsing for the non-terminal number → nonzerodigit digit*
• Such simplification delegates work to a scanner
  – Often this is a good separation of concerns, especially since the scanner can specialize its logic appropriately
  – E.g., a fairly general scanner built from classification functions (which look for all digits, all alphabetic characters, etc.) can be reused or refactored easily for scanning different grammars
  – E.g., the C++11 <regex> library is worth studying and using

Regular Expressions, DFAs, NDFAs

• Regular expressions, built with 3 composition rules, capture the lexical structure of symbols
  – Concatenation (ab), selection (a | b), repetition (b*)
• Finite automata can recognize regular expressions
  – Deterministic finite automata (DFAs) reach a unique state for each input sequence
  – Non-deterministic finite automata (NDFAs) let multiple states be reached by the same input sequence (adding “choice”)
• Can generate a unique (minimal) DFA in 3 steps
  – Generate an NDFA from the regular expression (Scott p. 56)
  – Convert the NDFA to a (possibly larger) DFA (Scott pp. 56-58)
  – Minimize the DFA (Scott p. 59) to get a unique automaton
• The C++11 <regex> library automates regular-expression matching, so you need not build these automata by hand

Today’s Studio Exercises

• We’ll code up some ideas from Scott Sections 2.1-2.2
  – Looking at mechanisms for recognizing tokens and for parsing basic CFGs with straightforward recursion
  – Next studio we’ll look at more complicated variations
• Today’s exercises are all in C++
  – We’ll write our own code, but check out the <regex> library too, since you’ll be allowed to use it for lab assignments!
  – Please take advantage of the on-line tutorial and reference manual pages that are linked on the course web site
  – As always, please ask us for help as needed
• When done, email your answers to the course account with “Syntax Studio I” in the subject line