Source: eliza.newhaven.edu/lang/attach/L2.pdf

Structure of Programming Languages – Lecture 2

CSCI 6636 – 4536

Spring 2017


Outline

1 Translators have Layers

Translation

Dozens of Languages

Representation Issues

Good or Bad: Fundamental Considerations

2 The Layers of FORTH

Layer 6: The FORTH System

Layer 4: Lexing FORTH

3 Homework


Translators have Layers

Part 1

1. Languages and Translators have Layers

Stages of Translation

Front End: Lexical and Syntactic Analysis

Middle: Semantic Analysis and Optimization

Native-code Compilers: Back End and Execution

Byte-code Compilers: Back End and Interpretation

Interpreters: Back End and Interpretation




What is a Program?

We can view a program two ways:

Architect’s view: A program is the implementation of a design (a model) for a piece of software.

Builder’s view: A program is a description of a set of actions that we want a computer to carry out on some data.

Similarly, we can view a language more than one way:

High level: It permits us to express a model easily and precisely.

Low level: It permits us to define a correct and efficient set of instructions for the computer.



A Language Definition has Layers

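[Figure: the layers of a language definition, shown for Forth, C, C++, Scheme, and Java (with the JVM), together with the compiler and libraries. The layers, from the IDE down to the hardware platform, are: IDE operating on your program; project construction (files and folders); lexical analysis and preprocessing; syntactic analysis; semantic interpretation; language runtime system; link with library functions and load into memory; interpret or execute on the system and hardware platform. The layers are grouped into numbered levels, referred to below.]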

The beginning programmer learns to use items marked (1).

An intermediate student learns more about (1) and begins (2).

A professional programmer needs to know more about both levels and add knowledge of (3).



The IDE

Structured Editor

Error Analysis

Configure and call translator

Console Window

Link and run code

Output Window

Projects or a Make System



Workspaces

Used in self-contained interpreted languages such as Lisp and Forth.

Function definitions

Garbage-collected dynamic storage area

Global variables

Runtime stack



Programs, Modules, Folders and Files

Used for compiled languages that run on top of the OS.

Explicit Includes

Folder = Package; File = Public Class

Single Module

Pathnames, Search Paths, Jars


Translators have Layers Translation

Stages of Translation: Front End

Starting with source code, the compiler works in several stages. The front end of a compiler does not depend on the host system or machine. It converts source code to a parse tree.

Lexical analysis: Using lexical rules written as regular expressions, identify the tokens and the comments in the source code. Illegal characters are identified.

Preprocessing: An optional stage: directives are identified and followed. Missing include files are identified.

Syntactic analysis: Using grammatical rules written as a context-free grammar, the source code is parsed into a tree form. Syntax errors and undefined identifiers are identified.



Lexical Analysis: Tokens, Keywords and Comments

Delimiters

Comments

Numbers

Identifiers

Keywords



Syntactic Analysis: Expressions, Statements, and Control

Expressions

Control Structures

Grouping Symbols

Data Structuring Methods

Declarations

Names

Name Spaces

Variables



Semantic Analysis and Optimization: the Middle Stages

This stage of translation does not depend on the host system or machine. It analyzes and manipulates the parse tree to improve it and check for semantic errors.

Semantic analysis: The types of operands and arguments are checked against the types defined for parameters. Conversions are generated if necessary to produce a match. Type errors are identified.

Flow analysis: uninitialized variables are found.

Tree-optimization: Common subexpressions and constant expressions are found and optimized.

These translation stages developed more recently than the front end and the back end. A particular language implementation might perform some, all, or none of them.



Semantic Interpretation

These issues are the concern of the semantic interpretation step:

Types, Type Checking

Scope, Visibility

Parameter Passing

Lifetime

Memory Model

Type Coercion

Binding Time, Defaults



Optimization

Compilers can often be configured to do more of these things, or less.

Eliminate Common Subexpressions

Find Unreachable Code

Detect Uninitialized Variables

Evaluate Constant Expressions



Native-code Compilers: Back End and Execution

Starting with the type-checked parse tree, the compiler finishes the compilation thus:

Code generation: Relocatable memory addresses are assigned for variables. (They will be adjusted later, when the executable code is loaded into memory.) Object code is generated. Symbol files for the debugger are written out.

Linking: The object code and code from other modules (libraries) are linked together into an executable format. An executable file may be written.

Missing and misnamed functions are identified. Definitions that have been included twice are identified. Any of these problems will terminate linking and prevent execution.

Loading: The executable code is loaded into the computer and control is transferred to the first line of the main program.



Byte-code Compilers: Back End and Interpretation

Starting with the parse tree, the compiler finishes the compilation thus:

Code generation: Memory addresses are assigned for variables. A platform-independent byte-code program is generated. Symbol files for the debugger are written out.

The byte-code program becomes input to a virtual machine, which searches for other modules to link in. Missing modules are identified.

The virtual machine interprets the codes and runs the program. Just-in-time optimization may happen during execution.



Interpreters: Back End and Interpretation

An interpreter runs within a dedicated environment that implements a read-execute-write cycle (REW).

Starting with the parse tree, the interpreter finishes the execution thus:

The interpreter operates incrementally on the parsed source code.

It does not wait for the end of the program to begin processing it.This limits its ability to do optimization.

The program may be stored in a partially-processed form that is easier to handle than source code.

When run, the program code becomes input to an interpreter (a virtual machine) that runs a program unit and displays the output.

Type errors may be identified during execution.

Undefined words are identified.



The Front End of Translation

Compiler: lex, parse, and typecheck.

Byte-code compiler: lex, parse, and typecheck.

Interpreter: lex and parse.



The Back End of Translation: Code Generation

Compiler: generate object code (machine language).

Byte-code compiler: generate byte code (VM language).



Runtime: Execution

Compiler: link, load, and execute in hardware.

Byte-code interpreter: link, load, and execute in the VM.

Interpreter: typecheck primitives; run machine code for the parsed program.


Translators have Layers Dozens of Languages

Part 3

3. Languages Come in Many Flavors.

Possible Design Goals

Design Examples

Representation Issues



Possible Design Goals

These apply both to entire languages and to features within a language.

1 Utility. Is the language or language feature often useful?

2 Efficiency. Does it lead to efficient software?

3 Portability. Does a program produce the same results on any machine?

4 Convenience. Is it easy to use? Does it support concise code?

5 Readability. Is it naturally readable?

6 Modeling ability. Will this feature or language help the programmer model a problem more fully, more precisely, or more easily?

7 Simplicity. Is the language design as a whole simple, unified, and general, or is it full of dozens of special-purpose features?

8 Clarity. Does every legal program have one defined, unambiguous, time-invariant meaning?



Design Examples

Scheme was created to clean up the semantics of LISP and make it more true to the underlying mathematical model (Lambda Calculus).

C is useful, efficient, simple, and mostly portable. Some people think it is convenient. Its readability is good if a programmer uses a disciplined style, but modeling ability is limited. Clarity and portability are damaged by the implementation-defined size of type int and the unspecified order of evaluation of operands within expressions.

C++ is useful, efficient, convenient, and has excellent modeling ability. Its readability, clarity and portability are about the same as C. It is absolutely NOT a simple language.

Java is useful, highly portable, convenient, and has excellent modeling ability and clarity. Its readability is damaged by the length of the identifiers in the standard libraries. It is absolutely NOT a simple language; the API is massive, complex, and confusing. Its efficiency is limited by the fact that it runs inside a virtual machine.


Translators have Layers Representation Issues

Representation Issues

Semantic Intent

Power

Explicit vs. Implicit

Coherence

Locality

Distinct Representation



Semantic Intent

A programmer has some idea or model of what he wants and expects the program to do. This is his semantic intent.

A program has semantic validity if it carries out the programmer’s semantic intent.

Or... the program works properly, as expected.



Language Power

A language is powerful to the extent that it permits the programmer to easily and explicitly state his semantic intent, and that intent will be honored and enforced.

There are two kinds of power a language can have:

The power to do something easily.

The power to prevent something you do not want to happen.

Example: The template classes in C++ and Collection classes in Java make it very easy to use stacks, queues, maps, trees, etc.

The private qualifier in Java or C++ prevents unwanted access to a variable.



Explicit vs. Implicit

The structure of an object can be reflected in a program either

Implicitly: the object has structure but nothing in the program (or maybe only the comments) describes that structure.

Explicitly: something that is part of the language defines the intended structure.

Example: In FORTRAN-77, a table of objects would be defined as a set of parallel arrays, each storing one property of the objects.

In C, the same table can be defined explicitly as an array of structs.



Explicit vs. Implicit Typing

The type of an object or a function can be either explicit or implicit.

Implicit: In C89 (before C99), you do not need to specify the return type of a function. If it is omitted, it defaults to type int, and return values are coerced to type int.

Explicit: In C, you MAY (and, since C99, MUST) specify the return type of a function. If it is declared, return values are coerced to match the declared type.

It is misleading, error prone, and unmaintainable when a programmer relies on implicit type declarations.



Coherence

An object, idea, or process is represented coherently if it is represented by a single symbol in the program so that it may be used as a unit.

A coherently represented object may have parts that can also be used separately.

It does not need to be stored in consecutive memory locations.

Example: Five #define symbols vs. an enum declaration with 5 enum constants.

The former leaves us in doubt about any relationship among the five symbols. The latter explicitly says that these constants belong together and define a set of alternatives.



Coherent Arrays

The advantage of coherence is that information that belongs together is kept together at all times. Two examples arise out of arrays.

Sometimes we use parallel arrays to represent two or more properties of a set of objects. For example, the x-coordinates and y-coordinates of a set of points. The same data can be represented coherently as an array of 2-member structures.

How do we handle passing an array to functions?

To use an array in C, you need the array itself and its allocation length, the number of items currently stored in it, or both. The beginner declares these as separate variables. When the array is passed to a function, two or three parameters are required.

In C++, you can use a vector, which is a structure consisting of an array and the two integers needed to manage it. This can be passed to a function as a single parameter.



Locality

It is easier to write, debug, modify, and maintain a program with high locality.

Locality is high if related things are written together.

Locality is low if many lines or pages of code separate a definition from its uses, or if parts of the same object are defined in two or more places.

Example: A Java class has better locality than a C++ class.

In Java, all parts of a class are defined in the same file.

In C++, a class is split into two files: a header and an implementation.



Distinct Representation

If the same word or construct is used simultaneously for two purposes, trouble will follow.

Example: In early Basic, each line of code starts with a line number. Line numbers have two purposes:

To designate the execution order of the code lines.

As the targets of GOTO commands.

The problem is, these purposes conflict.

A programmer will often need to add more code between two lines that were written earlier. This can easily force renumbering for a part of the program. But when lines are renumbered, any GOTOs into the renumbered part will go to the wrong place.


Translators have Layers Good or Bad: Fundamental Considerations

What Makes a Language Design Good (or Bad)

A language should provide or encourage or support:

Lexical rules that do not ascribe meaning to invisible things.

Syntax that is easy to type and not prone to errors.

Semantics relatively free of “gotchas” ( if (a < b < c)...)

Consistent semantics for the same syntax in different contexts.

Syntax that is kind to program modification and maintenance.

Readable layout for programs.

Ability to define necessary restrictions on data access.

Ability to group related things on the same page or screen.

. . . and I could continue. . .



Religious Wars: Good Design or Bad?

People argue without end about whether these features are good or bad!

In Basic, you do not need to declare variables. (Good for small jobs, not good for big ones.)

In C++, a class definition involves a lot of typing that is not required in C. (Classes are good for large, complex jobs, bad for beginners and small jobs.)

C thinks compact names are good because they minimize time wasted typing, spelling errors, and typing errors. Java thinks long names are good because they are not cryptic.

Python is good because you can interactively write and debug it. Java is good because the compiler can and will catch type errors.

Java is good because it uses a garbage collector to manage memory. C++ is good because memory management CAN BE done efficiently with little overhead.



Some “features” are flaws.

Many programmers will defend the flaws in their favorite language. They call them “features” and love to show you the tricks you can do with them.

In APL, you can compute a number then “go to” that line.

In APL, you can write a fairly complex program on one line.

In Basic, you don’t have to worry about the difference between integers and reals.

In Python, programs look great because the indentation defines the scope of each statement.

In C, you can walk off the end of an array.


The Layers of FORTH

Part 5: The Layers of FORTH

1 Fundamental concepts: postfix notation, integer-based, small set of keywords.

2 The runtime environment: Dictionary, Stack, Return stack.

3 The interpreter and stack operations

4 The compiler

5 Type declarations and language extension mechanisms

FORTH also has a text editor and an assembler but we will not use them.


The Layers of FORTH Layer 6: The FORTH System

Layer 6 (Top): The FORTH Environment

See: gforth manual and tutorial

Download: gforth-0.7.0.exe (gforth implementation for Windows)

FORTH, like many languages of its class, is a self-contained system. In addition to the compiler and interpreter, every standard FORTH system supplies an editor, an assembler, and libraries.

In this course, we will not be using the assembler or the libraries; only the core language, the compiler, and the interpreter.

My advice is to use your favorite editor to write programs and store them in normal files called xxxx.for . When you want to bring a piece of program into the FORTH system, you can either load a file (in gforth, type include xxxx.for) or use the mouse to paste the lines into the FORTH window.


http://www.complang.tuwien.ac.at/forth/gforth/Docs-html/Tutorial.html#Tutorial


FORTH implements a REW cycle

When you run a FORTH system, you start in the interpreter window.

You can type in arithmetic expressions and pre-defined commands.

Your code will be executed and the results (if any) displayed to the screen.

If all is well, you will see the system prompt OK.

If there was an error, you will see an error message. The error also causes the parameter stack to be emptied.

The system is then ready for another command.


The Layers of FORTH Layer 4: Lexing FORTH

Lex: FORTH is different!

Forth is simple. . . so simple that it is “hard”.

Whitespace delimited. Don’t forget to type the whitespace after every symbol.

Any printable, non-whitespace ASCII character can begin a word or occur in a word.

Everything is written in postfix order.

There is no distinction between operators and other symbols. No precedence is needed in a postfix language.

Each symbol has one unambiguous meaning.

All of these things permit the FORTH compiler and interpreter to be very simple and very fast. Why?



Example: Lexing a few lines of FORTH

-3 5 + .

integer -3

integer 5

word +

word .

\ A basic hello program
: hello ( -- ) ." Hi, I’m here." ;

comment A basic hello program

word :

word hello

comment --

string Hi, I’m here.

word ;


Homework

This Week’s Work

Due on September 22

Read Chapter 3 of the text.

Hw 2: Written Exercises; Answer these questions, briefly, precisely, and clearly.

1 Explain the difference between lexical analysis and syntactic analysis. What happens at each stage?

2 List six kinds of errors that the compiler and/or language system can identify. For each, explain the stage of translation or execution where the error is detected.

3 Choose one of the properties of representation, listed on pages 24–32. Think of the languages you have used. Give a good example and a bad example of the property you chose. Please don’t repeat my examples or the examples from the textbook.

4 Take the first three lines of the gcd program used for lab 1. Identify the lexical tokens on these three lines.

