2 CS 471 – Fall 2007
Semantic Analysis
Source code → Lexical Analysis → tokens → Parsing → AST → Semantic Analysis → Valid programs: decorated AST
Errors reported by each phase:
• Lexical Analysis: lexical errors
• Parsing: syntax errors
• Semantic Analysis: semantic errors
Goals of a Semantic Analyzer
Compiler must do more than recognize whether a sentence belongs to the language…
• Find all possible remaining errors that would make program invalid
– undefined variables, types
– type errors that can be caught statically
• Figure out useful information for later phases
– types of all expressions
– data layout
Semantic Actions
Can do useful things with the parsed phrases
– Each terminal and nonterminal may be associated with a type, e.g. exp: INT type is int
– For rule: A → B C D
• Type must match A
• Value can be built with B C D
Semantic Actions
Semantic action executed when grammar production is reduced
• Recursive-descent parser: semantic code interspersed with control flow
• Yacc: fragments of C code attached to a grammar production
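The recursive-descent case can be sketched in C as follows. This is a minimal illustration, not code from the course: the factored grammar (exp, term), the global cursor `cur`, and the function names are all assumptions chosen to keep the example short. Each parsing function returns the semantic value of the phrase it recognizes, so the "semantic code" is simply the arithmetic mixed into the parser's control flow.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical mini-grammar (an assumption, not taken from the slides):
 *   exp  -> term { ('+' | '-') term }
 *   term -> INT  { '*' INT }
 * The input cursor is a global only to keep the sketch short. */
static const char *cur;

static void skip_spaces(void) {
    while (*cur == ' ') cur++;
}

/* INT: the semantic value of the token is its numeric value. */
static int parse_int(void) {
    int value = 0;
    skip_spaces();
    assert(isdigit((unsigned char)*cur));   /* sketch: no error recovery */
    while (isdigit((unsigned char)*cur)) {
        value = value * 10 + (*cur - '0');
        cur++;
    }
    return value;
}

static int parse_term(void) {
    int value = parse_int();
    skip_spaces();
    while (*cur == '*') {
        cur++;
        value *= parse_int();    /* semantic action: multiply as we parse */
        skip_spaces();
    }
    return value;
}

static int parse_exp(void) {
    int value = parse_term();
    skip_spaces();
    while (*cur == '+' || *cur == '-') {
        char op = *cur++;
        int rhs = parse_term();
        /* semantic action: combine the values of the two sub-phrases */
        value = (op == '+') ? value + rhs : value - rhs;
        skip_spaces();
    }
    return value;
}

/* Parses and evaluates a whole expression string. */
int evaluate(const char *text) {
    cur = text;
    return parse_exp();
}
```

Calling `evaluate("1 + 2 * 3")` parses and evaluates in a single pass; no tree is ever built, which is exactly the style the later slides argue against for larger compilers.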
Interpreter
Could develop an interpreter that executes the program as part of the semantic actions!
Example Grammar:
E → id
E → E + E
E → E - E
E → E * E
E → -E
Unions in Yacc
%union allows us to declare a union datatype
used to package the types/attributes of symbols
%union {
int pos;
int ival;
string sval;
struct {
int intval;
enum Types valtype;
} constantval;
A_exp exp;
}
Exported as YYSTYPE
Types in Yacc
Using the values of union structs, tell Yacc the types
Terminals
%token <sval> ID STRING
%token <ival> INT
%token <pos> COMMA SEMI LBRACE RBRACE …
And Nonterminals (use %type)
%type <exp> expression program
In the %type line, <exp> is the type; expression and program are LHS of productions
Symbols in Yacc
• The symbol $n (n > 0) refers to the attribute of the nth symbol on the RHS
• The symbol $$ refers to the attribute of the LHS
• The symbol $n (n ≤ 0) refers to contextual information
Note: actions in the middle of a rule count as a symbol!
expr : expr1 PLUS expr2
 $$     $1          $3
Interpreter in Yacc
%{ declarations of yylex and yyerror %}
%union {int num; string id;}
%token <num> INT
%token <id> ID
%type <num> exp
%start exp
%left PLUS MINUS
%left TIMES
%left UMINUS
%%
[please fill in solution]
E → id
E → E + E
E → E - E
E → E * E
E → -E
Recall
expr : expr1 PLUS expr2
 $$     $1          $3
Internally: A Semantic Stack
Implemented using a stack parallel to the state stack
Stack                                 Input          Action
(empty)                               1 + 2 * 3 $    shift
INT: 1                                + 2 * 3 $      reduce
exp: 1                                + 2 * 3 $      shift
exp: 1  +:                            2 * 3 $        shift
exp: 1  +:  INT: 2                    * 3 $          reduce
exp: 1  +:  exp: 2                    * 3 $          shift
exp: 1  +:  exp: 2  *:                3 $            shift
exp: 1  +:  exp: 2  *:  INT: 3        $              reduce
exp: 1  +:  exp: 2  *:  exp: 3        $              reduce
exp: 1  +:  exp: 6                    $              reduce
exp: 7                                $              accept
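The trace above can be replayed by hand with a small value stack in C. This is only a sketch of the idea, not how Yacc's generated parser is actually laid out: the `SemStack` type and the `sem_*` function names are hypothetical, and operator tokens carry a dummy attribute of 0.

```c
#include <assert.h>

#define STACK_MAX 64

/* Value stack kept in step with the parser's state stack. */
typedef struct {
    int values[STACK_MAX];
    int top;                 /* number of values currently on the stack */
} SemStack;

void sem_init(SemStack *s) { s->top = 0; }

/* shift: push the token's attribute (operators carry a dummy 0 here). */
void sem_shift(SemStack *s, int attribute) {
    assert(s->top < STACK_MAX);
    s->values[s->top++] = attribute;
}

/* reduce exp -> INT: one entry is replaced by one entry with the same
 * value ($$ = $1), so the stack contents do not change. */
void sem_reduce_int(SemStack *s) {
    assert(s->top >= 1);
}

/* reduce exp -> exp op exp: pop three entries, push the result. */
void sem_reduce_binary(SemStack *s, char op) {
    assert(s->top >= 3);
    int rhs = s->values[s->top - 1];   /* $3 */
    int lhs = s->values[s->top - 3];   /* $1 */
    s->top -= 3;
    s->values[s->top++] = (op == '*') ? lhs * rhs : lhs + rhs;  /* $$ */
}

/* Replays the shift/reduce sequence for the input 1 + 2 * 3. */
int replay_trace(void) {
    SemStack s;
    sem_init(&s);
    sem_shift(&s, 1);  sem_reduce_int(&s);   /* INT: 1 -> exp: 1 */
    sem_shift(&s, 0);                        /* + */
    sem_shift(&s, 2);  sem_reduce_int(&s);   /* INT: 2 -> exp: 2 */
    sem_shift(&s, 0);                        /* * */
    sem_shift(&s, 3);  sem_reduce_int(&s);   /* INT: 3 -> exp: 3 */
    sem_reduce_binary(&s, '*');              /* exp: 2 * exp: 3 -> exp: 6 */
    sem_reduce_binary(&s, '+');              /* exp: 1 + exp: 6 -> exp: 7 */
    return s.values[0];
}
```

The order of the two binary reductions is what the precedence declarations decide: because TIMES binds tighter than PLUS, the parser shifts `*` instead of reducing `1 + 2` first.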
Inlined TypeChecker and CodeGen
You can even type check and generate code:
expr : expr PLUS expr {
    if ($1.type == $3.type &&
        ($1.type == IntType ||
         $1.type == RealType)) $$.type = $1.type;
    else error("+ applied on wrong type!");
    GenerateAdd($1, $3, $$);
}
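The typing rule buried in that action can also be written as a standalone C function, which hints at the separation the following slides argue for. This is a sketch under assumptions: `StringType`, `ErrorType`, and the name `check_plus` are hypothetical additions; only `IntType` and `RealType` appear in the slide.

```c
#include <assert.h>

/* IntType and RealType come from the slide; the other two are assumptions. */
typedef enum { IntType, RealType, StringType, ErrorType } Type;

/* Result type of `left + right`: both operands must have the same
 * numeric type. ErrorType signals a type error to the caller. */
Type check_plus(Type left, Type right) {
    if (left == right && (left == IntType || left == RealType))
        return left;
    return ErrorType;
}
```

Returning `ErrorType` instead of reporting and continuing lets the caller decide whether to generate code at all; the inlined action above calls `GenerateAdd` even after reporting an error.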
Problems
•Difficult to read
•Difficult to maintain
•Compiler must analyze program in order parsed
•Instead … we split up tasks
Compiler ‘main program’
void Compile() {
    TokenStream l = Lexer(input);
    AST tree = Parser(l);
    if (TypeCheck(tree)) {
        IR ir = genIntermediateCode(tree);
        emitCode(ir);
    }
}
Thread of control
Data flow: Input Stream → characters → Lexer → tokens → Parser → AST
Calls: compile → parse → getToken → readStream
Producing the Parse Tree
Separates issues of syntax (parsing) from issues of semantics (type checking, translation to machine code)
• One leaf for every token
• One internal node for every reduction during parsing
• Concrete parse tree represents concrete syntax
But … parse tree has problems
• Punctuation tokens redundant
• Structure of the tree conveys this info
Enter the Abstract Syntax Tree
AST
• Abstract Syntax Tree is a tree representation of the program. Used for
– semantic analysis (type checking)
– some optimization (e.g. constant folding)
– intermediate code generation (sometimes intermediate code = AST with somewhat different set of nodes)
• Compiler phases = recursive tree traversals
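To make "phases = recursive tree traversals" concrete, here is a minimal C sketch of constant folding over a tiny AST. The `Exp` node layout and the helper names (`int_exp`, `var_exp`, `op_exp`, `fold`) are assumptions made for this illustration and are much simpler than the Tiger abstract syntax.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { EXP_INT, EXP_VAR, EXP_OP } ExpKind;

typedef struct Exp Exp;
struct Exp {
    ExpKind kind;
    int value;          /* EXP_INT: the constant */
    const char *name;   /* EXP_VAR: the variable name */
    char op;            /* EXP_OP: '+', '-' or '*' */
    Exp *left, *right;  /* EXP_OP: operands */
};

static Exp *new_exp(ExpKind kind) {
    Exp *e = calloc(1, sizeof *e);
    assert(e != NULL);  /* sketch: no real out-of-memory handling */
    e->kind = kind;
    return e;
}

Exp *int_exp(int value) {
    Exp *e = new_exp(EXP_INT);
    e->value = value;
    return e;
}

Exp *var_exp(const char *name) {
    Exp *e = new_exp(EXP_VAR);
    e->name = name;
    return e;
}

Exp *op_exp(char op, Exp *left, Exp *right) {
    Exp *e = new_exp(EXP_OP);
    e->op = op;
    e->left = left;
    e->right = right;
    return e;
}

/* Constant folding as a post-order traversal: fold the children first,
 * then rewrite an operator node whose operands are both constants. */
Exp *fold(Exp *e) {
    if (e->kind != EXP_OP) return e;
    e->left = fold(e->left);
    e->right = fold(e->right);
    if (e->left->kind == EXP_INT && e->right->kind == EXP_INT) {
        int l = e->left->value, r = e->right->value;
        int result = (e->op == '+') ? l + r
                   : (e->op == '-') ? l - r
                   : l * r;
        free(e->left);
        free(e->right);
        e->kind = EXP_INT;
        e->value = result;
        e->left = e->right = NULL;
    }
    return e;
}
```

Folding `a + 2 * 3` rewrites only the right subtree into the constant 6, since `a` is not known at compile time. A type checker or code generator would follow the same recursive shape with different work at each node.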
Do We Need An AST?
• Old-style compilers: semantic actions generate code during parsing
Problems:
• hard to maintain
• limits language features
• not modular!
expr ::= expr PLUS expr {: emitCode(add); :}
input → parser (with stack) → code
Interesting Detour
•Old compilers didn’t create ASTs … not enough memory to store entire program
•Can also see reasons for C requiring forward declarations - avoids an extra compilation pass
Positions
In a one-pass compiler, errors are reported using the current position of the lexer as an approximation (global var)
Abstract syntax data structures must have pos fields
• Line number
• Char number
•Line number is unambiguous
•Char number is a matter of style
Abstract Syntax for Tiger
/* absyn.h */
typedef struct A_var_ *A_var;
struct A_var_ {
    enum {A_simpleVar, A_fieldVar, A_subscriptVar} kind;
    A_pos pos;
    union {
        S_symbol simple;
        struct {A_var var; S_symbol sym;} field;
        struct {A_var var; A_exp exp;} subscript;
    } u;
};
More Syntax (Constructors…p.98)
A_var A_SimpleVar(A_pos pos, S_symbol sym);
…
A_exp A_WhileExp(A_pos pos, A_exp test, A_exp body);
…
A_expList A_ExpList(A_exp head, A_expList tail);
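A constructor of this kind typically just allocates a node, sets the kind tag, and fills in the matching union member. The sketch below shows that pattern for two variable constructors. The stand-in typedefs for `A_pos`, `S_symbol`, and `A_exp`, and the helper `checked_alloc`, are assumptions made so the example compiles on its own; the real definitions live in the Tiger support code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins (assumptions) so the sketch is self-contained. */
typedef int A_pos;
typedef const char *S_symbol;
typedef struct A_exp_ *A_exp;    /* left opaque here */

typedef struct A_var_ *A_var;
struct A_var_ {
    enum { A_simpleVar, A_fieldVar, A_subscriptVar } kind;
    A_pos pos;
    union {
        S_symbol simple;
        struct { A_var var; S_symbol sym; } field;
        struct { A_var var; A_exp exp; } subscript;
    } u;
};

/* Hypothetical helper: malloc that exits on failure. */
static void *checked_alloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    return p;
}

/* Simple variable: tag the node and store the symbol. */
A_var A_SimpleVar(A_pos pos, S_symbol sym) {
    A_var p = checked_alloc(sizeof *p);
    p->kind = A_simpleVar;
    p->pos = pos;
    p->u.simple = sym;
    return p;
}

/* Field access var.sym: the node points at the variable it selects from. */
A_var A_FieldVar(A_pos pos, A_var var, S_symbol sym) {
    A_var p = checked_alloc(sizeof *p);
    p->kind = A_fieldVar;
    p->pos = pos;
    p->u.field.var = var;
    p->u.field.sym = sym;
    return p;
}
```

Because every node is built through a constructor, the `kind` tag and the union member that is filled in always agree, and later traversals can switch on `kind` safely.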
Tiger Program
(a := 5; a+1) translates to:
A_SeqExp(2,
  A_ExpList(A_AssignExp(4,
      A_SimpleVar(2, S_Symbol("a")), A_IntExp(7,5)),
  A_ExpList((A_OpExp(11,A_plusOp,
      A_VarExp(A_SimpleVar(10, S_Symbol("a"))),A_IntExp(12,1))),
  NULL)))
• AssignExp chooses the column of ":=" for pos
• OpExp chooses the column of "+" for pos
Some Odd Tiger Features
Tiger allows mutually recursive declarations:
let var a := 5
    function f() : int = g(a)
    function g(i: int) = f()
in f()
end
Thus: FunctionDec constructor takes a list of functions
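One common way to exploit such a list is to process it in two passes: enter every function header into the environment first, then check the bodies. The C sketch below contrasts that with a one-pass check. It is a deliberately simplified model and an assumption of this note, not Tiger's actual checker: each `FunDec` records only its name and the single function its body calls, and `Env` is a plain array of names.

```c
#include <assert.h>
#include <string.h>

#define MAX_FUNS 16

/* Simplified declaration: a name plus the one function its body calls. */
typedef struct {
    const char *name;
    const char *callee;
} FunDec;

/* Toy environment: the set of function names declared so far. */
typedef struct {
    const char *names[MAX_FUNS];
    int count;
} Env;

static int env_has(const Env *env, const char *name) {
    for (int i = 0; i < env->count; i++)
        if (strcmp(env->names[i], name) == 0) return 1;
    return 0;
}

static void env_add(Env *env, const char *name) {
    assert(env->count < MAX_FUNS);
    env->names[env->count++] = name;
}

/* One pass: each body is checked right after its own header is entered.
 * Returns the number of calls to functions not yet declared. */
int check_one_pass(const FunDec *decs, int n) {
    Env env = { {0}, 0 };
    int undefined = 0;
    for (int i = 0; i < n; i++) {
        env_add(&env, decs[i].name);
        if (!env_has(&env, decs[i].callee)) undefined++;
    }
    return undefined;
}

/* Two passes over the same list: all headers first, then all bodies. */
int check_two_pass(const FunDec *decs, int n) {
    Env env = { {0}, 0 };
    int undefined = 0;
    for (int i = 0; i < n; i++)
        env_add(&env, decs[i].name);
    for (int i = 0; i < n; i++)
        if (!env_has(&env, decs[i].callee)) undefined++;
    return undefined;
}
```

For the pair `f` calls `g`, `g` calls `f`, the one-pass check flags the call to `g` inside `f` as undefined, while the two-pass check accepts both. Having the whole group available as one list is what makes the second strategy possible.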