Structure of Programming Languages – Lecture 2
CSCI 6636 – 4536
Spring 2017
CSCI 6636 – 4536 Lecture 2. . . 1/41 Spring 2017 1 / 41
Outline
1 Translators have Layers
    Translation
    Dozens of Languages
    Representation Issues
    Good or Bad: Fundamental Considerations
2 The Layers of FORTH
    Layer 6: The FORTH System
    Layer 4: Lexing FORTH
3 Homework
Translators have Layers
Part 1
1. Languages and Translators have Layers
Stages of Translation
Front End: Lexical and Syntactic Analysis
Middle: Semantic Analysis and Optimization
Native-code Compilers: Back End and Execution
Byte-code Compilers: Back End and Interpretation
Interpreters: Back End and Interpretation
What is a Program?
We can view a program two ways:
Architect’s view: A program is the implementation of a design (a model) for a piece of software.
Builder’s view: A program is a description of a set of actions that we want a computer to carry out on some data.
Similarly, we can view a language more than one way:
High level: It permits us to express a model easily and precisely.
Low level: It permits us to define a correct and efficient set of instructions for the computer.
A Language Definition has Layers
[Figure: layer diagram comparing Forth, C++, C, Scheme, and Java. Layers shown: IDE operating on your program; Project construction: files and folders; Lexical analysis, preprocessing; Syntactic analysis; Semantic interpretation; Language runtime system; Link with library functions, load into memory; Interpret or execute on system and hardware platform. Items in the diagram are marked with the level numbers 0 through 4.]
The beginning programmer learns to use items marked (1).
An intermediate student learns more about (1) and begins (2).
A professional programmer needs to know more about both levels and add knowledge of (3).
The IDE
Structured Editor
Error Analysis
Configure and call translator
Console Window
Link and run code
Output Window
Projects or a Make System
Workspaces
Used in self-contained interpreted languages such as Lisp, Forth.
Function definitions
Garbage-collected dynamic storage area
Global variables
Runtime stack
Programs, Modules, Folders and Files
Used for compiled languages that run on top of the OS.
Explicit Includes
Folder = Package
File = Public Class
Single Module
Pathnames
Search Paths
Jars
Translators have Layers Translation
Stages of Translation: Front End
Starting with source code, the compiler works in several stages. The front end of a compiler does not depend on the host system or machine. It converts source code to a parse tree.
Lexical analysis: Using lexical rules written as regular expressions, identify the tokens and the comments in the source code. Illegal characters are identified.
Preprocessing: An optional stage: directives are identified and followed. Missing include files are identified.
Syntactic analysis: Using grammatical rules written as a context-free grammar, the source code is parsed into a tree form. Syntax errors and undefined identifiers are identified.
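The lexical-analysis step can be sketched in a few lines of Python. This is only an illustration, not any real compiler's lexer: the token class names and the tiny C-like rule set are assumptions made for the example.

```python
import re

# Hypothetical lexical rules for a tiny C-like language, one regular
# expression per token class. Order matters: earlier rules win.
TOKEN_RULES = [
    ("COMMENT", r"//[^\n]*"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=<>(){};]"),
    ("SPACE", r"\s+"),
    ("ILLEGAL", r"."),  # anything not matched above is an illegal character
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_RULES))


def lex(source):
    """Return (tokens, comments, illegal characters) found in source."""
    tokens, comments, illegal = [], [], []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SPACE":
            continue  # whitespace only separates tokens
        if kind == "COMMENT":
            comments.append(text)
        elif kind == "ILLEGAL":
            illegal.append(text)
        else:
            tokens.append((kind, text))
    return tokens, comments, illegal
```

Calling `lex("x = 42; // set x")` separates the four tokens from the comment; a character such as `@` lands in the illegal list, which is where a real lexer would report an error.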
Lexical Analysis: Tokens, Keywords and Comments
Delimiters
Comments
Numbers
Identifiers
Keywords
Syntactic Analysis: Expressions, Statements, and Control
Expressions
Control Structures
Grouping Symbols
Data Structuring Methods
Declarations
Names
Name Spaces
Variables
Semantic Analysis and Optimization: the Middle Stages
This stage of translation does not depend on the host system or machine. It analyzes and manipulates the parse tree to improve it and check for semantic errors.
Semantic analysis: The types of operands and arguments are checked against the types defined for parameters. Conversions are generated if necessary to produce a match. Type errors are identified.
Flow analysis: Uninitialized variables are found.
Tree optimization: Common subexpressions and constant expressions are found and optimized.
These translation stages developed more recently than the front end and the back end. A particular language might use some, all, or none of them.
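One of the tree optimizations named above, evaluating constant expressions, can be sketched as follows. The tuple-based tree representation and the function name are assumptions made for this example; a real compiler uses richer node types.

```python
def fold_constants(node):
    """Replace constant subexpressions in a parse tree with their values.

    A tree node is an int (literal), a str (variable name), or a tuple
    (operator, left, right). This representation is hypothetical.
    """
    if not isinstance(node, tuple):
        return node  # literals and variable names are already as simple as possible
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right
        if op == "-":
            return left - right
        if op == "*":
            return left * right
    return (op, left, right)  # not foldable: keep the (simplified) subtree
```

For the tree of `x + 2 * 3`, written `("+", "x", ("*", 2, 3))`, the subtree `2 * 3` is computed at translation time, while the addition stays in the tree because `x` is not known until run time.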
Semantic Interpretation
These issues are the concern of the semantic interpretation step:
Types
Type Checking
Scope
Visibility
Parameter Passing
Lifetime
Memory Model
Type Coercion
Binding time
Defaults
Optimization
Compilers can often be configured to do more of these things, or less.
Eliminate Common Subexpressions
Find Unreachable Code
Detect Uninitialized Variables
Evaluate Constant Expressions
Native-code Compilers: Back End and Execution
Starting with the type-checked parse tree, the compiler finishes the compilation thus:
Code generation: Relocatable memory addresses are assigned for variables. (They will be adjusted later, when the executable code is loaded into memory.) Object code is generated. Symbol files for the debugger are written out.
Linking: The object code and code from other modules (libraries) are linked together into an executable format. An executable file may be written.
Missing and mis-named functions are identified. Definitions that have been included twice are identified. Any of these problems will terminate linking and prevent execution.
Loading: The executable code is loaded into the computer and control is transferred to the first line of the main program.
Byte-code Compilers: Back End and Interpretation
Starting with the parse tree, the compiler finishes the compilation thus:
Code generation: Memory addresses are assigned for variables. A platform-independent byte-code program is generated. Symbol files for the debugger are written out.
The byte-code program becomes input to a virtual machine which searches for other modules to link in. Missing modules are identified.
The virtual machine interprets the codes and runs the program. Just-in-time optimization may happen during execution.
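The split between byte-code generation and interpretation by a virtual machine can be sketched with a toy stack machine. Everything here is hypothetical: the instruction names (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`) and the tuple-based expression tree are assumptions for the example and do not correspond to the JVM or any other real byte-code format.

```python
def compile_expr(node, code=None):
    """Translate an expression tree into a list of stack-machine instructions.

    A node is an int, a variable name (str), or a tuple (op, left, right).
    """
    if code is None:
        code = []
    if isinstance(node, int):
        code.append(("PUSH", node))
    elif isinstance(node, str):
        code.append(("LOAD", node))
    else:
        op, left, right = node
        compile_expr(left, code)   # operands first ...
        compile_expr(right, code)
        code.append(({"+": "ADD", "-": "SUB", "*": "MUL"}[op],))  # ... then the operator
    return code


def run(code, variables):
    """Interpret the instruction list on a simple operand stack."""
    stack = []
    for instr in code:
        name = instr[0]
        if name == "PUSH":
            stack.append(instr[1])
        elif name == "LOAD":
            stack.append(variables[instr[1]])
        else:
            right, left = stack.pop(), stack.pop()
            if name == "ADD":
                stack.append(left + right)
            elif name == "SUB":
                stack.append(left - right)
            elif name == "MUL":
                stack.append(left * right)
            else:
                raise ValueError(f"unknown instruction {name}")
    return stack.pop()
```

The instruction list produced by `compile_expr` does not depend on the machine it was generated on; only `run`, the stand-in for the virtual machine, would need to be ported to each platform.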
Interpreters: Back End and Interpretation
An interpreter runs within a dedicated environment and implements a read-execute-write (REW) cycle.
Starting with the parse tree, the interpreter finishes the execution thus:
The interpreter operates incrementally on the parsed source code.
It does not wait for the end of the program to begin processing it. This limits its ability to do optimization.
The program may be stored in a partially-processed form that is easier to handle than source code.
When run, the program code becomes input to an interpreter (a virtual machine) that runs a program unit and displays the output.
Type errors may be identified during execution.
Undefined words are identified.
The Front End of Translation
Compiler: Lex, parse, and typecheck
Byte-code compiler: Lex, parse, and typecheck
Interpreter: Lex and parse
The Back End of Translation: Code Generation
Compiler: Generate object code (machine language)
Byte-code compiler: Generate byte code (VM language)
Interpreter: (no code generation shown)
Runtime: Execution
Compiler: Link, load, and execute in hardware.
Byte-code interpreter: Link, load, and execute in VM.
Interpreter: Typecheck primitives. Run machine code for parsed program.
Translators have Layers Dozens of Languages
Part 3
3. Languages Come in Many Flavors.
Possible Design Goals
Design Examples
Representation Issues
Possible Design Goals
These apply both to entire languages and to features within a language.
1 Utility. Is the language or language feature often useful?
2 Efficiency. Does it lead to efficient software?
3 Portability. Does a program produce the same results on any machine?
4 Convenience. Is it easy to use? Does it support concise code?
5 Readability. Is it naturally readable?
6 Modeling ability. Will this feature or language help the programmer model a problem more fully, more precisely, or more easily?
7 Simplicity. Is the language design as a whole simple, unified, and general, or is it full of dozens of special-purpose features?
8 Clarity. Does every legal program have one defined, unambiguous, time-invariant meaning?
Design Examples
Scheme was created to clean up the semantics of LISP and make it more true to the underlying mathematical model (Lambda Calculus).
C is useful, efficient, simple, and mostly-portable. Some people think it is convenient. Its readability is good if a programmer uses a disciplined style, but modeling ability is limited. Clarity and portability are damaged by the ambiguous nature of type int and the undefined order of evaluation of expressions.
C++ is useful, efficient, convenient, and has excellent modeling ability. Its readability, clarity and portability are about the same as C. It is absolutely NOT a simple language.
Java is useful, highly portable, convenient, and has excellent modeling ability and clarity. Its readability is damaged by the length of the identifiers in the standard libraries. It is absolutely NOT a simple language; the API is massive, complex, and confusing. Its efficiency is limited by the fact that it runs inside a virtual machine.
Translators have Layers Representation Issues
Representation Issues
Semantic Intent
Power
Explicit vs. Implicit
Coherence
Locality
Distinct Representation
Semantic Intent
A programmer has some idea or model of what he wants and expects the program to do. This is his semantic intent.
A program has semantic validity if it carries out the programmer’s semantic intent.
Or... the program works properly, as expected.
Language Power
A language is powerful to the extent that it permits the programmer to easily and explicitly state his semantic intent, and that intent will be honored and enforced.
There are two kinds of power a language can have:
The power to do something easily.
The power to prevent something you do not want to happen.
Example: The template classes in C++ and Collection classes in Java make it very easy to use stacks, queues, maps, trees, etc.
The private qualifier in Java or C++ prevents unwanted access to a variable.
Explicit vs. Implicit
The structure of an object can be reflected in a program either
Implicitly: the object has structure but nothing in the program (or maybe only the comments) describes that structure.
Explicitly: something that is part of the language defines the intended structure.
Example:
In FORTRAN-77, a table of objects would be defined as a set of parallel arrays, each storing one property of the objects.
In C, the same table can be defined explicitly as an array of structs.
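The same contrast can be sketched in Python, with parallel lists standing in for the FORTRAN-77 arrays and a dataclass playing the role of the C struct. The `Student` type, its fields, and the sample data are hypothetical, chosen only for the illustration.

```python
from dataclasses import dataclass

# Implicit structure: three parallel lists. Nothing in the code says that
# names[i], ages[i], and gpas[i] describe the same student.
names = ["Ann", "Bob"]
ages = [20, 22]
gpas = [3.9, 3.1]


# Explicit structure: a language construct states what belongs together.
@dataclass
class Student:
    name: str
    age: int
    gpa: float


students = [Student(n, a, g) for n, a, g in zip(names, ages, gpas)]


def oldest(table):
    """Return the record with the greatest age."""
    return max(table, key=lambda s: s.age)
```

With the explicit form, a function such as `oldest` receives whole records; with parallel lists it would have to return an index and trust every caller to apply it to the right lists.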
Explicit vs. Implicit Typing
The type of an object or a function can be either explicit or implicit.
Implicit: In older C (before C99), you do not need to specify the return type of a function. If it is omitted, it will default to type int, and return values will be coerced to type int.
Explicit: In C, you MAY specify the return type of a function. If it is declared, return values will be coerced to match the declared type.
It is misleading, error prone, and unmaintainable when a programmer relies on implicit type declarations.
Coherence
An object, idea, or process is represented coherently if it is represented by a single symbol in the program so that it may be used as a unit.
A coherently represented object may have parts that can also be used separately.
It does not need to be stored in consecutive memory locations.
Example:
Five #define symbols vs. an enum declaration with 5 enum constants.
The former leaves us in doubt about any relationship among the five symbols. The latter explicitly says that these constants belong together and define a set of alternatives.
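A Python analogue of this example, as a rough sketch: five module-level constants play the role of the five #define symbols, and an `Enum` class plays the role of the C enum. The color names are hypothetical.

```python
from enum import Enum

# Five separate constants, analogous to five #define symbols.
# Nothing ties them together or says the list is complete.
RED, ORANGE, YELLOW, GREEN, BLUE = 0, 1, 2, 3, 4


# One coherent declaration, analogous to a C enum with five constants.
class Color(Enum):
    RED = 0
    ORANGE = 1
    YELLOW = 2
    GREEN = 3
    BLUE = 4


def is_valid_color(value):
    """Because the alternatives form one unit, the set can be queried."""
    return value in {c.value for c in Color}
```

The coherent version can be counted, iterated over, and checked for membership as a unit; the five loose constants cannot.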
Coherent Arrays
The advantage of coherence is that information that belongs together is kept together at all times. Two examples arise out of arrays.
Sometimes we use parallel arrays to represent two or more properties of a set of objects. For example, the x-coordinates and y-coordinates of a set of points. The same data can be represented coherently as an array of 2-member structures.
How do we handle passing an array to functions?
To use an array in C, you need the array itself and either its allocation length or the number of items currently stored in it. The beginner declares these as three separate variables. When the array is passed to a function, two or three parameters are required.
In C++, you can use a vector, which is a structure consisting of an array and the two integers needed to manage it. This can be passed to a function as a single parameter.
Locality
It is easier to write, debug, modify, and maintain a program with high locality.
Locality is high if related things are written together.
Locality is low if many lines or pages of code separate a definition from its uses, or if parts of the same object are defined in two or more places.
Example: A Java class has better locality than a C++ class.
In Java, all parts of a class are defined in the same file.
In C++, a class is usually split into two files: a header and an implementation.
Distinct Representation
If the same word or construct is used simultaneously for two purposes, trouble will follow.
Example: In early Basic, each line of code starts with a line number. Line numbers have two purposes:
To designate the execution order of the code lines.
As the targets of GOTO commands.
The problem is that these purposes conflict.
A programmer will often need to add more code between two lines that were written earlier. This can easily force renumbering of part of the program. But when lines are renumbered, any GOTOs into the renumbered part will go to the wrong place.
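The conflict can be made concrete with a small simulation. The four-line program and both helper functions are hypothetical; they model a renumbering that updates the ordering role of the line numbers but not their role as GOTO targets.

```python
import re

# A tiny BASIC-like program as (line number, statement) pairs.
program = [
    (10, "LET I = 0"),
    (20, "LET I = I + 1"),
    (30, "IF I < 5 THEN GOTO 20"),
    (40, "END"),
]


def naive_renumber(lines, start=100, step=10):
    """Assign new line numbers but leave GOTO targets untouched."""
    return [(start + i * step, stmt) for i, (_, stmt) in enumerate(lines)]


def dangling_gotos(lines):
    """Return GOTO targets that no longer name an existing line."""
    numbers = {number for number, _ in lines}
    targets = [int(t) for _, stmt in lines for t in re.findall(r"GOTO (\d+)", stmt)]
    return [t for t in targets if t not in numbers]
```

After `naive_renumber(program)`, the loop body sits at line 110 but the statement still says `GOTO 20`, so the target is gone. In a partial renumbering, the old number might instead land on a different statement, which is harder to notice.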
Translators have Layers Good or Bad: Fundamental Considerations
What Makes a Language Design Good (or Bad)
A language should provide or encourage or support:
Lexical rules that do not ascribe meaning to invisible things.
Syntax that is easy to type and not prone to errors.
Semantics relatively free of “gotchas” ( if (a < b < c)...)
Consistent semantics for the same syntax in different contexts.
Syntax that is kind to program modification and maintenance.
Readable layout for programs.
Ability to define necessary restrictions on data access.
Ability to group related things on the same page or screen.
. . . and I could continue. . .
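The "gotcha" mentioned in the list, if (a < b < c), can be illustrated as follows. In C the expression groups as (a < b) < c, so the 0-or-1 result of the first comparison is compared with c. The two function names are made up for this sketch; Python itself chains comparisons, so the C behavior is emulated explicitly.

```python
def c_style_chain(a, b, c):
    """Evaluate a < b < c the way C does: (a < b) yields 0 or 1, then compare with c."""
    return int(a < b) < c


def math_style_chain(a, b, c):
    """Evaluate a < b < c with the mathematical reading: a < b and b < c."""
    return a < b and b < c
```

For the values 5, 3, 2 the mathematical reading is false, yet the C reading is true, because 5 < 3 yields 0 and 0 < 2 holds.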
Religious Wars: Good Design or Bad?
People argue without end about whether these features are good or bad!
In Basic, you do not need to declare variables. (Good for small jobs, not good for big ones.)
In C++, a class definition involves a lot of typing that is not required in C. (Classes are good for large, complex jobs, bad for beginners and small jobs.)
C thinks compact names are good because they minimize time wasted typing, spelling errors, and typing errors. Java thinks long names are good because they are not cryptic.
Python is good because you can interactively write and debug it. Java is good because the compiler can and will catch type errors.
Java is good because it uses a garbage collector to manage memory. C++ is good because memory management CAN BE done efficiently with little overhead.
Some “features” are flaws.
Many programmers will defend the flaws in their favorite language. They call them “features” and love to show you the tricks you can do with them.
In APL, you can compute a number then “go to” that line.
In APL, you can write a fairly complex program on one line.
In Basic, you don’t have to worry about the difference between integers and reals.
In Python, programs look great because the indentation defines the scope of each statement.
In C, you can walk off the end of an array.
The Layers of FORTH
Part 5: The Layers of FORTH
1 Fundamental concepts: postfix notation, integer-based, small set of keywords.
2 The runtime environment: Dictionary, Stack, Return stack.
3 The interpreter and stack operations
4 The compiler
5 Type declarations and language extension mechanisms
FORTH also has a text editor and an assembler but we will not use them.
The Layers of FORTH Layer 6: The FORTH System
Layer 6 (Top): The FORTH Environment
See: gforth manual and tutorial
Download: gforth-0.7.0.exe (gforth implementation for Windows)
FORTH, like many languages of its class, is a self-contained system. In addition to the compiler and interpreter, every standard FORTH system supplies an editor, an assembler, and libraries.
In this course, we will not be using the assembler or the libraries; only the core language, the compiler, and the interpreter.
My advice is to use your favorite editor to write programs and store them in normal files called xxxx.for. When you want to bring a piece of program into the FORTH system, you can either LOAD a file or use the mouse to paste the lines into the FORTH window.
FORTH implements a REW cycle
When you run a FORTH system, you start in the interpreter window.
You can type in arithmetic expressions and pre-defined commands.
Your code will be executed and the results (if any) displayed to the screen.
If all is well, you will see the system prompt OK.
If there was an error, you will see an error comment. This will cause the parameter stack to be emptied.
The system is then ready for another command.
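The read-execute-write cycle described above can be imitated with a toy postfix interpreter in Python. This is a sketch under simplifying assumptions, not how gforth is implemented: only integers and the four words `+`, `-`, `*`, and `.` are supported, and the names `make_interpreter`, `interpret`, and `repl` are invented for the example.

```python
def make_interpreter():
    """Return an interpret(line) function that keeps its own parameter stack."""
    stack = []

    def binary(fn):
        def word(output):
            right, left = stack.pop(), stack.pop()
            stack.append(fn(left, right))
        return word

    def dot(output):
        output.append(str(stack.pop()))  # "." pops and prints the top of stack

    words = {
        "+": binary(lambda a, b: a + b),
        "-": binary(lambda a, b: a - b),
        "*": binary(lambda a, b: a * b),
        ".": dot,
    }

    def interpret(line):
        """Read one line, execute it word by word, return the text to write."""
        output = []
        try:
            for token in line.split():
                if token in words:
                    words[token](output)
                else:
                    stack.append(int(token))  # not a known word: must be a number
        except (IndexError, ValueError):
            stack.clear()  # an error empties the parameter stack
            return "error"
        output.append("ok")
        return " ".join(output)

    return interpret


def repl():
    """Read-execute-write loop; run interactively, end with end-of-file."""
    interpret = make_interpreter()
    while True:
        try:
            print(interpret(input("> ")))
        except EOFError:
            break
```

Values left on the stack by one line are still there for the next line, so `1 2` followed by `+ .` prints 3; an undefined word or a stack underflow reports an error and clears the stack, after which the loop is ready for another command.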
The Layers of FORTH Layer 4: Lexing FORTH
Lex: FORTH is different!
Forth is simple. . . so simple that it is “hard”.
Whitespace delimited. Don’t forget to type the whitespace after every symbol.
Any ASCII character can begin a word or occur in a word.
Everything is written in postfix order.
There is no distinction between operators and other symbols. No precedence is needed in a postfix language.
Each symbol has one unambiguous meaning.
All of these things permit the FORTH compiler and interpreter to be very simple and very fast. Why?
Example: Lexing a few lines of FORTH
-3 5 + .
    integer  -3
    integer  5
    word     +
    word     .

\ A basic hello program
: hello ( -- ) ." Hi, I’m here." ;
    comment  A basic hello program
    word     :
    word     hello
    comment  --
    string   Hi, I’m here.
    word     ;
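A lexer that produces this classification can be sketched in Python. It is a simplified illustration that handles only the constructs appearing in the example (integers, words, backslash comments, parenthesized comments, and ." strings); the function name `lex_forth` is an assumption, and a real FORTH system treats `\`, `(`, and `."` as ordinary words that consume the following text themselves.

```python
def lex_forth(source):
    """Split FORTH source into (kind, text) tokens, one line at a time."""
    tokens = []
    for line in source.splitlines():
        pos = 0
        while pos < len(line):
            if line[pos].isspace():
                pos += 1
                continue
            end = pos
            while end < len(line) and not line[end].isspace():
                end += 1
            word = line[pos:end]  # every token is delimited by whitespace
            if word == "\\":  # comment runs to the end of the line
                tokens.append(("comment", line[end:].strip()))
                break
            if word == "(":  # comment runs to the closing parenthesis
                close = line.find(")", end)
                close = len(line) if close == -1 else close
                tokens.append(("comment", line[end:close].strip()))
                pos = close + 1
            elif word == '."':  # string runs to the closing quote
                close = line.find('"', end)
                close = len(line) if close == -1 else close
                tokens.append(("string", line[end + 1:close]))
                pos = close + 1
            else:
                try:
                    int(word)
                    tokens.append(("integer", word))
                except ValueError:
                    tokens.append(("word", word))
                pos = end
    return tokens
```

Note that the loop needs no operator table and no lookahead beyond the next whitespace character, which is one reason a whitespace-delimited language is quick to lex.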
Homework
This Week’s Work
Due on September 22
Read Chapter 3 of the text.
Hw 2: Written Exercises; Answer these questions, briefly, precisely, and clearly.
1 Explain the difference between lexical analysis and syntactic analysis. What happens at each stage?
2 List six kinds of errors that the compiler and/or language system can identify. For each, explain the stage of translation or execution where the error is detected.
3 Choose one of the properties of representation, listed on pages 24–32 (the Representation Issues slides). Think of the languages you have used. Give a good example and a bad example of the property you chose. Please don’t repeat my examples or the examples from the textbook.
4 Take the first three lines of the gcd program used for lab 1. Identify the lexical tokens on these three lines.