CS 153: Concepts of Compiler Design
August 31 Class Meeting
Department of Computer Science
San Jose State University
Fall 2015
Instructor: Ron Mak
www.cs.sjsu.edu/~mak
Computer Science Dept., Fall 2015: August 31
CS 153: Concepts of Compiler Design, © R. Mak
Conceptual Design (Version 3)
A compiler and an interpreter can both use the same front end and intermediate tier.
Three Java Packages
[Figure: the conceptual design (FROM) redrawn as UML package and class diagrams (TO); callouts label a Package and a Class.]
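Front End Class Relationships
[Figure: UML class diagram of the front end framework classes. The legend gives the visibility markers + public, - private, # protected, and ~ package. Callouts label a class, a field, an abstract class, an "owns a" relationship, and a transient relationship.]
These four framework classes (Parser, Scanner, Token, and Source) should be source language-independent.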
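As a rough guide to reading the legend, the sketch below maps each UML visibility marker onto the corresponding Java access level. The classes are hypothetical stand-ins, not the book's Source and Scanner. The sketch also assumes that a transient relationship means a temporary use (here, a method parameter) rather than a stored field.

```java
// Hypothetical classes that only illustrate the UML notation; they are
// not the book's Source and Scanner implementations.
class Source {
    private int lineNum = 1;      // "-" private: visible only inside Source
    protected String line = "";   // "#" protected: visible to subclasses and the package
    int currentPos;               // "~" package: no modifier, visible within the package

    public int getLineNum() {     // "+" public: visible everywhere
        return lineNum;
    }
}

abstract class Scanner {          // abstract class: cannot be instantiated directly
    protected final Source source;  // "owns a": Scanner keeps a Source in a field

    protected Scanner(Source source) {
        this.source = source;
    }

    // Transient relationship (as assumed here): the Source is used only for
    // the duration of this call, as a parameter, and is never stored.
    static int lineNumberOf(Source other) {
        return other.getLineNum();
    }
}
```

In Java, protected also grants access within the same package, which is slightly broader than the UML # marker on its own.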
Messages from the Front End
The Parser generates messages:
  - syntax error messages
  - a parser summary: the number of source lines parsed, the number of syntax errors, and the total parsing time
The Source generates messages. For each source line:
  - the line number
  - the contents of the line
Front End Messages, cont’d
We want the message producers (Parser and Source) to be loosely coupled to the message listeners.
The producers shouldn’t care who listens to their messages.
The producers shouldn’t care what the listeners do with the messages.
The listeners should have the flexibility to do whatever they want with the messages.
Front End Messages, cont’d
Producers implement the MessageProducer interface.
Listeners implement the MessageListener interface.
A listener registers its interest in the messages from a producer.
Whenever a producer generates a message, it “sends” the message to all of its registered listeners.
Front End Messages, cont’d
A message producer can delegate message handling to a MessageHandler.
This is the Observer Design Pattern.
Message Implementation
Message producers implement the MessageProducer interface.
Message listeners implement the MessageListener interface.
A message producer can delegate message handling to a MessageHandler.
Each Message has a message type and a body.
[Figure: UML class diagram of the message classes; callouts mark an "implements" relationship and a multiplicity of "zero or more".]
This appears to be a lot of extra work, but it will be easy to use and it will pay large dividends.
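Here is a minimal Java sketch of these pieces, assuming a simplified Message that carries only a type and a body. The names addMessageListener(), sendMessage(), messageReceived(), getType(), and getBody() appear elsewhere in these slides; removeMessageListener(), the MessageHandler method names, and the DemoSource producer are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified message types for this sketch only.
enum MessageType { SOURCE_LINE, PARSER_SUMMARY }

// Each Message has a type and a body.
class Message {
    private final MessageType type;
    private final Object body;

    Message(MessageType type, Object body) {
        this.type = type;
        this.body = body;
    }

    MessageType getType() { return type; }
    Object getBody() { return body; }
}

interface MessageListener {
    void messageReceived(Message message);
}

interface MessageProducer {
    void addMessageListener(MessageListener listener);
    void removeMessageListener(MessageListener listener);
    void sendMessage(Message message);
}

// Keeps the registered listeners and forwards every message to each one.
class MessageHandler {
    private final List<MessageListener> listeners = new ArrayList<>();

    void addListener(MessageListener listener) { listeners.add(listener); }
    void removeListener(MessageListener listener) { listeners.remove(listener); }

    void sendMessage(Message message) {
        for (MessageListener listener : listeners) {
            listener.messageReceived(message);
        }
    }
}

// Hypothetical producer that delegates all message handling to a MessageHandler.
class DemoSource implements MessageProducer {
    private final MessageHandler handler = new MessageHandler();

    public void addMessageListener(MessageListener listener) { handler.addListener(listener); }
    public void removeMessageListener(MessageListener listener) { handler.removeListener(listener); }
    public void sendMessage(Message message) { handler.sendMessage(message); }

    // Announce one source line; the body layout is {lineNumber, lineText}.
    void announceLine(int lineNumber, String lineText) {
        sendMessage(new Message(MessageType.SOURCE_LINE, new Object[] {lineNumber, lineText}));
    }
}
```

The producer never learns what its listeners do with a message; it only hands each Message to its MessageHandler, which loops over whoever has registered.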
Two Message Types
SOURCE_LINE message:
  - the source line number
  - the text of the source line
PARSER_SUMMARY message:
  - the number of source lines read
  - the number of syntax errors
  - the total parsing time
By convention, the message producers and the message listeners agree on the format and content of the messages.
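To make the convention concrete, here is a hypothetical listener-side helper that decodes a PARSER_SUMMARY body by position. It assumes the body is a Number[] in the order listed above (source lines, syntax errors, elapsed seconds), which matches the array that parse() sends later in these slides.

```java
import java.util.Locale;

// Hypothetical helper: decodes a PARSER_SUMMARY body laid out, by convention, as
// Number[] {sourceLineCount, syntaxErrorCount, elapsedSeconds}.
final class ParserSummaryFormat {
    private ParserSummaryFormat() {}

    static String format(Object body) {
        // Throws ClassCastException if the producer broke the agreed convention.
        Number[] fields = (Number[]) body;
        int lineCount = fields[0].intValue();
        int errorCount = fields[1].intValue();
        float elapsedTime = fields[2].floatValue();
        return String.format(Locale.ROOT,
                "%d source lines, %d syntax errors, %.2f seconds total parsing time",
                lineCount, errorCount, elapsedTime);
    }
}
```

Nothing in the type system enforces the layout: the producer and listener simply agree that element 0 is the line count, element 1 the error count, and element 2 the time.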
Good Framework Symmetry
An Apt Quote?
Before I came here, I was confused about this subject. Having listened to your lecture, I am still confused, but on a higher level.
Enrico Fermi, physicist, 1901-1954
Pascal-Specific Front End Classes
PascalParserTD is a subclass of Parser and implements the parse() and getErrorCount() methods for Pascal. TD stands for "top down."
PascalScanner is a subclass of Scanner and implements the extractToken() method for Pascal.
Strategy Design Pattern
The Pascal Parser Class
The initial version of method parse() does hardly anything, but it forces the scanner into action and serves our purpose of doing end-to-end testing.
public void parse()
    throws Exception
{
    Token token;
    long startTime = System.currentTimeMillis();

    while (!((token = nextToken()) instanceof EofToken)) {}

    // Send the parser summary message.
    float elapsedTime = (System.currentTimeMillis() - startTime)/1000f;
    sendMessage(new Message(PARSER_SUMMARY,
                            new Number[] {token.getLineNumber(),
                                          getErrorCount(),
                                          elapsedTime}));
}
What does this while loop do?
The Pascal Scanner Class
The initial version of method extractToken() doesn’t do much either, other than create and return either a default token or the EOF token.
protected Token extractToken()
    throws Exception
{
    Token token;
    char currentChar = currentChar();

    // Construct the next token. The current character determines the
    // token type.
    if (currentChar == EOF) {
        token = new EofToken(source);
    }
    else {
        token = new Token(source);
    }

    return token;
}
Remember that the Scanner method nextToken() calls the abstract method extractToken().
Here, the Scanner subclass PascalScanner implements method extractToken().
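The following self-contained sketch mimics the relationship described above: the language-independent Scanner.nextToken() defers to the abstract extractToken(), which the PascalScanner subclass implements. Source, Token, and EofToken are drastically simplified stand-ins, and this Source reads from a String rather than a file; these simplifications are assumptions of the sketch, not the book's code.

```java
// Simplified stand-in for Source: reads characters from a String.
class Source {
    static final char EOF = (char) 0;
    private final String text;
    private int pos = 0;

    Source(String text) { this.text = text; }

    char currentChar() { return pos < text.length() ? text.charAt(pos) : EOF; }

    // Consume the current character and return the new current character.
    char nextChar() {
        ++pos;
        return currentChar();
    }
}

// Default token: extracts exactly one character, like Token.extract() in the slides.
class Token {
    protected final Source source;
    protected String text;

    Token(Source source) {
        this.source = source;
        extract();
    }

    protected void extract() {
        text = Character.toString(source.currentChar());
        source.nextChar();  // consume current character
    }

    String getText() { return text; }
}

// End-of-file token: there is nothing left to consume.
class EofToken extends Token {
    EofToken(Source source) { super(source); }

    @Override
    protected void extract() { text = ""; }
}

// Language-independent framework class.
abstract class Scanner {
    protected final Source source;

    Scanner(Source source) { this.source = source; }

    // Framework method: delegates to the language-specific subclass.
    Token nextToken() { return extractToken(); }

    protected abstract Token extractToken();
}

// Language-specific subclass, mirroring the initial PascalScanner.
class PascalScanner extends Scanner {
    PascalScanner(Source source) { super(source); }

    @Override
    protected Token extractToken() {
        return source.currentChar() == Source.EOF ? new EofToken(source) : new Token(source);
    }
}
```

Scanning "abc" this way yields three one-character tokens followed by an EofToken. A loop like the one in parse() terminates only because Token.extract() consumes its character; without the nextChar() call it would return a token for the same character forever.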
The Token Class
The Token class’s default extract() method extracts just one character from the source. This method will be overridden by the various token subclasses. It serves our purpose of doing end-to-end testing.
protected void extract()
    throws Exception
{
    text = Character.toString(currentChar());
    value = null;

    nextChar();  // consume current character
}
The Token Class, cont’d
A character (or a token) is “consumed” after it has been read and processed, and the next one is about to be read.
If you forget to consume, you will loop forever on the same character or token.
A Front End Factory Class
A language-specific parser goes together with a scanner for the same language.
But we don’t want the framework classes to be tied to a specific language. Framework classes should be language-independent.
We use a factory class to create a matching parser-scanner pair.
Factory Method Design Pattern
A Front End Factory Class, cont’d
Good:
    Parser parser = FrontendFactory.createParser( … );
"Coding to the interface."
Arguments to the createParser() method enable it to create and return a parser bound to an appropriate scanner.
Variable parser doesn’t have to know what kind of parser subclass the factory created.
Once again, the idea is to maintain loose coupling.
A Front End Factory Class, cont’d
Good:
    Parser parser = FrontendFactory.createParser( … );
Bad:
    PascalParserTD parser = new PascalParserTD( … );
Why is this bad? Now variable parser is tied to a specific language.
A Front End Factory Class, cont’d
public static Parser createParser(String language, String type,
                                  Source source)
    throws Exception
{
    if (language.equalsIgnoreCase("Pascal") &&
        type.equalsIgnoreCase("top-down"))
    {
        Scanner scanner = new PascalScanner(source);
        return new PascalParserTD(scanner);
    }
    else if (!language.equalsIgnoreCase("Pascal")) {
        throw new Exception("Parser factory: Invalid language '" +
                            language + "'");
    }
    else {
        throw new Exception("Parser factory: Invalid type '" +
                            type + "'");
    }
}
Initial Back End Subclasses
The CodeGenerator and Executor subclasses will only be (do-nothing) stubs for now.
Strategy Design Pattern
The Code Generator Class
All the process() method does for now is send the COMPILER_SUMMARY message:
  - the number of instructions generated (none for now)
  - the code generation time (nearly no time at all for now)
public void process(ICode iCode, SymTab symTab)
    throws Exception
{
    long startTime = System.currentTimeMillis();
    float elapsedTime = (System.currentTimeMillis() - startTime)/1000f;
    int instructionCount = 0;

    // Send the compiler summary message.
    sendMessage(new Message(COMPILER_SUMMARY,
                            new Number[] {instructionCount,
                                          elapsedTime}));
}
The Executor Class
All the process() method does for now is send the INTERPRETER_SUMMARY message:
  - the number of statements executed (none for now)
  - the number of runtime errors (none for now)
  - the execution time (nearly no time at all for now)
public void process(ICode iCode, SymTab symTab)
    throws Exception
{
    long startTime = System.currentTimeMillis();
    float elapsedTime = (System.currentTimeMillis() - startTime)/1000f;
    int executionCount = 0;
    int runtimeErrors = 0;

    // Send the interpreter summary message.
    sendMessage(new Message(INTERPRETER_SUMMARY,
                            new Number[] {executionCount,
                                          runtimeErrors,
                                          elapsedTime}));
}
A Back End Factory Class
public static Backend createBackend(String operation)
    throws Exception
{
    if (operation.equalsIgnoreCase("compile")) {
        return new CodeGenerator();
    }
    else if (operation.equalsIgnoreCase("execute")) {
        return new Executor();
    }
    else {
        throw new Exception("Backend factory: Invalid operation '" +
                            operation + "'");
    }
}
End-to-End: Program Listings
Here’s the heart of the main Pascal class’s constructor:
source = new Source(new BufferedReader(new FileReader(filePath)));
source.addMessageListener(new SourceMessageListener());

parser = FrontendFactory.createParser("Pascal", "top-down", source);
parser.addMessageListener(new ParserMessageListener());

backend = BackendFactory.createBackend(operation);
backend.addMessageListener(new BackendMessageListener());

parser.parse();
iCode = parser.getICode();
symTab = parser.getSymTab();

backend.process(iCode, symTab);
source.close();
The front end parser creates the intermediate code and the symbol table of the intermediate tier.
The back end processes the intermediate code and the symbol table.
Listening to Messages
Class Pascal has inner classes that implement the MessageListener interface.
private static final String SOURCE_LINE_FORMAT = "%03d %s";

private class SourceMessageListener implements MessageListener
{
    public void messageReceived(Message message)
    {
        MessageType type = message.getType();
        Object body[] = (Object []) message.getBody();

        switch (type) {
            case SOURCE_LINE: {
                int lineNumber = (Integer) body[0];
                String lineText = (String) body[1];

                System.out.println(String.format(SOURCE_LINE_FORMAT,
                                                 lineNumber, lineText));
                break;
            }
        }
    }
}
Demo
Is it Really Worth All this Trouble?
Major software engineering challenges:
  - Managing change.
  - Managing complexity.
To help manage change, use the open-closed principle:
  - Close the code for modification.
  - Open the code for extension.
Closed: The language-independent framework classes.
Open: The language-specific subclasses.
Is it Really Worth All this Trouble? cont’d
Techniques to help manage complexity:
  - Partitioning
  - Loose coupling
  - Incremental development
      - Always build upon working code.
  - Good object-oriented design with design patterns.
Source Files from the Book
Download the Java source code from each chapter of the book: http://www.apropos-logic.com/wci/
You will not survive this course if you use a simple text editor like Notepad to view and edit the Java code.
The complete Pascal interpreter in Chapter 12 contains 127 classes and interfaces.
Integrated Development Environment (IDE)
You can use either Eclipse or NetBeans.
Eclipse is preferred because there is a JavaCC plug-in.
Learn how to create projects, edit source files, single-step through execution, set breakpoints, examine variables, read stack dumps, etc.
Pascal-Specific Front End Classes
The Payoff
Now that we have …
  - source language-independent framework classes,
  - Pascal-specific subclasses (mostly just placeholders for now), and
  - an end-to-end test (the program listing generator),
… we can work on the individual components without worrying (too much) about breaking the rest of the code.
Front End Framework Classes
Pascal-Specific Subclasses
PascalTokenType
Each token is an enumerated value.
public enum PascalTokenType implements TokenType
{
    // Reserved words.
    AND, ARRAY, BEGIN, CASE, CONST, DIV, DO, DOWNTO, ELSE, END,
    FILE, FOR, FUNCTION, GOTO, IF, IN, LABEL, MOD, NIL, NOT,
    OF, OR, PACKED, PROCEDURE, PROGRAM, RECORD, REPEAT, SET,
    THEN, TO, TYPE, UNTIL, VAR, WHILE, WITH,

    // Special symbols.
    PLUS("+"), MINUS("-"), STAR("*"), SLASH("/"), COLON_EQUALS(":="),
    DOT("."), COMMA(","), SEMICOLON(";"), COLON(":"), QUOTE("'"),
    EQUALS("="), NOT_EQUALS("<>"), LESS_THAN("<"), LESS_EQUALS("<="),
    GREATER_EQUALS(">="), GREATER_THAN(">"), LEFT_PAREN("("),
    RIGHT_PAREN(")"), LEFT_BRACKET("["), RIGHT_BRACKET("]"),
    LEFT_BRACE("{"), RIGHT_BRACE("}"), UP_ARROW("^"), DOT_DOT(".."),

    IDENTIFIER, INTEGER, REAL, STRING,
    ERROR, END_OF_FILE;

    ...
}
PascalTokenType, cont’d
The static set RESERVED_WORDS contains all of Pascal’s reserved word strings in lower case: "and", "array", "begin", etc.
// Set of lower-cased Pascal reserved word text strings.
public static HashSet<String> RESERVED_WORDS = new HashSet<String>();
static {
    PascalTokenType values[] = PascalTokenType.values();
    for (int i = AND.ordinal(); i <= WITH.ordinal(); ++i) {
        RESERVED_WORDS.add(values[i].getText().toLowerCase());
    }
}
We can test whether a token is a reserved word:
if (RESERVED_WORDS.contains(text.toLowerCase())) …
PascalTokenType, cont’d
Static hash table SPECIAL_SYMBOLS contains all of Pascal’s special symbols.
  - Each entry’s key is the string, such as "<", "=", or "<=".
  - Each entry’s value is the corresponding enumerated value.
// Hash table of Pascal special symbols.
// Each special symbol's text is the key to its Pascal token type.
public static Hashtable<String, PascalTokenType> SPECIAL_SYMBOLS =
    new Hashtable<String, PascalTokenType>();
static {
    PascalTokenType values[] = PascalTokenType.values();
    for (int i = PLUS.ordinal(); i <= DOT_DOT.ordinal(); ++i) {
        SPECIAL_SYMBOLS.put(values[i].getText(), values[i]);
    }
}
PascalTokenType, cont’d
We can test whether the current character is (or begins) a special symbol:
if (PascalTokenType.SPECIAL_SYMBOLS.containsKey(Character.toString(currentChar))) …
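For experimenting with these lookup tables outside the full interpreter, here is a trimmed, hypothetical MiniTokenType enum built the same way: reserved-word text defaults to the lower-cased constant name, special symbols carry explicit text, and a static initializer fills both tables by ordinal range. It substitutes HashMap for the slides' Hashtable and lower-cases with Locale.ROOT so the result does not depend on the default locale.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// A trimmed, hypothetical subset of PascalTokenType for illustration only.
enum MiniTokenType {
    // Reserved words: the text defaults to the lower-cased constant name.
    BEGIN, END, IF, THEN,
    // Special symbols: the text is given explicitly.
    LESS_THAN("<"), LESS_EQUALS("<="), COLON_EQUALS(":="), DOT("."),
    IDENTIFIER, END_OF_FILE;

    private final String text;

    MiniTokenType() { this.text = this.toString().toLowerCase(Locale.ROOT); }
    MiniTokenType(String text) { this.text = text; }

    String getText() { return text; }

    static final Set<String> RESERVED_WORDS = new HashSet<>();
    static final Map<String, MiniTokenType> SPECIAL_SYMBOLS = new HashMap<>();

    // Runs after all enum constants exist, so values() is safe to call here.
    static {
        MiniTokenType[] values = values();
        for (int i = BEGIN.ordinal(); i <= THEN.ordinal(); ++i) {
            RESERVED_WORDS.add(values[i].getText());
        }
        for (int i = LESS_THAN.ordinal(); i <= DOT.ordinal(); ++i) {
            SPECIAL_SYMBOLS.put(values[i].getText(), values[i]);
        }
    }
}
```

Because LESS_THAN and LESS_EQUALS share the first character <, a single-character lookup only tells the scanner that a special symbol starts here; it still has to examine the next character to decide which token to build.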
Pascal-Specific Token Classes
Each of the classes PascalWordToken, PascalNumberToken, PascalStringToken, PascalSpecialSymbolToken, and PascalErrorToken is a subclass of class PascalToken, which in turn is a subclass of class Token.
Each Pascal token subclass overrides the default extract() method of class Token, which could only create single-character tokens.
Loosely coupled. Highly cohesive.
Syntax Diagrams
How to Scan for Tokens
Suppose the source line contains
IF (index >= 10) THEN
The scanner skips over the leading blanks. The current character is I, so the next token must be a word.
The scanner extracts a word token by copying characters up to but not including the first character that is not valid for a word, which in this case is a blank. The blank becomes the current character. The scanner determines that the word is a reserved word.
How to Scan for Tokens, cont’d
The scanner skips over any blanks between tokens. The current character is (. The next token must be a special symbol.
After extracting the special symbol token, the current character is i. The next token must be a word.
After extracting the word token, the current character is a blank.
How to Scan for Tokens, cont’d
Skip the blank. The current character is >.
Extract the special symbol token >=. The current character is a blank.
Skip the blank. The current character is 1, so the next token must be a number.
After extracting the number token, the current character is ).
How to Scan for Tokens, cont’d
Extract the special symbol token. The current character is a blank.
Skip the blank. The current character is T, so the next token must be a word.
Extract the word token. Determine that it’s a reserved word.
The current character is \n, so the scanner is done with this line.
Basic Scanning Algorithm
Skip any blanks until the current character is nonblank.
In Pascal, a comment and the end-of-line character should each be treated as a blank.
The current (nonblank) character determines what the next token is and becomes that token’s first character.
Extract the rest of the next token by copying successive characters up to but not including the first character that does not belong to that token.
Extracting a token consumes all the source characters that constitute the token. After extracting a token, the current character is the first character after the last character of that token.
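The algorithm can be sketched in Java as a small, self-contained scanner. It is a simplification written for this illustration, not the book's PascalScanner: it recognizes words, unsigned integers, and special symbols (with a fixed, hypothetical set of two-character symbols), treats { ... } comments and line ends as blanks, treats any other character as a one-character symbol instead of an error, and returns strings of the form KIND:text. It uses Set.of, so it assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified illustration of the basic scanning algorithm (not the book's PascalScanner).
final class MiniScanner {
    // Hypothetical subsets, just large enough for the example line.
    private static final Set<String> RESERVED_WORDS = Set.of("if", "then", "begin", "end");
    private static final Set<String> TWO_CHAR_SYMBOLS = Set.of(":=", "<=", ">=", "<>", "..");
    private static final char EOF = '\0';

    private final String source;
    private int pos = 0;

    MiniScanner(String source) { this.source = source; }

    private char currentChar() { return pos < source.length() ? source.charAt(pos) : EOF; }
    private char peekChar() { return pos + 1 < source.length() ? source.charAt(pos + 1) : EOF; }

    // Skip blanks; a { ... } comment and the end-of-line character count as blanks too.
    private void skipBlanks() {
        while (true) {
            char c = currentChar();
            if (c == '{') {
                while (currentChar() != '}' && currentChar() != EOF) pos++;
                if (currentChar() == '}') pos++;  // consume the closing brace
            } else if (c != EOF && Character.isWhitespace(c)) {
                pos++;
            } else {
                return;
            }
        }
    }

    // Returns tokens as "KIND:text" strings so the result is easy to inspect.
    List<String> scan() {
        List<String> tokens = new ArrayList<>();
        for (skipBlanks(); currentChar() != EOF; skipBlanks()) {
            int start = pos;
            char first = currentChar();  // the first character determines the token kind
            if (Character.isLetter(first)) {
                // Copy up to, but not including, the first character not valid in a word.
                while (Character.isLetterOrDigit(currentChar())) pos++;
                String word = source.substring(start, pos);
                boolean reserved = RESERVED_WORDS.contains(word.toLowerCase(Locale.ROOT));
                tokens.add((reserved ? "RESERVED:" : "IDENTIFIER:") + word);
            } else if (Character.isDigit(first)) {
                while (Character.isDigit(currentChar())) pos++;
                tokens.add("NUMBER:" + source.substring(start, pos));
            } else {
                // Special symbol: prefer a two-character symbol such as >= when one matches.
                String pair = "" + first + peekChar();
                pos += TWO_CHAR_SYMBOLS.contains(pair) ? 2 : 1;
                tokens.add("SYMBOL:" + source.substring(start, pos));
            }
            // The token is consumed: pos now indexes the first character after it.
        }
        return tokens;
    }
}
```

Scanning the source line from the walkthrough, "IF (index >= 10) THEN" with its leading blanks, produces [RESERVED:IF, SYMBOL:(, IDENTIFIER:index, SYMBOL:>=, NUMBER:10, SYMBOL:), RESERVED:THEN], following the same steps as the slides.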