Structure of Programming Languages – Lecture 2

CSCI 6636 – 4536, Spring 2017

Source: eliza.newhaven.edu/lang/attach/L2.pdf


Outline

1 Translators have Layers
    Translation
    Dozens of Languages
    Representation Issues
    Good or Bad: Fundamental Considerations

2 The Layers of FORTH
    Layer 6: The FORTH System
    Layer 4: Lexing FORTH

3 Homework


Translators have Layers

Part 1

1. Languages and Translators have Layers

Stages of Translation

Front End: Lexical and Syntactic Analysis

Middle: Semantic Analysis and Optimization

Native-code Compilers: Back End and Execution

Byte-code Compilers: Back End and Interpretation

Interpreters: Back End and Interpretation


What is a Program?

We can view a program two ways:

Architect’s view: A program is the implementation of a design (a model) for a piece of software.

Builder’s view: A program is a description of a set of actions that we want a computer to carry out on some data.

Similarly, we can view a language more than one way:

High level: It permits us to express a model easily and precisely.

Low level: It permits us to define a correct and efficient set of instructions for the computer.


A Language Definition has Layers

[Figure: the layers of a language definition, drawn for Forth, Scheme, C, C++, and Java. Layer labels: IDE operating on Your Program; Project construction: Files and Folders; Lexical Analysis, Preprocessing; Syntactic Analysis; Semantic Interpretation; Language Runtime System; Link with Library Functions; Load into Memory; Interpret or Execute on System and Hardware Platform. Other labels: compiler, JVM, Libraries, F.I., S.I. Regions of the figure are numbered 0 through 4.]

The beginning programmer learns to use items marked (1).

An intermediate student learns more about (1) and begins (2).

A professional programmer needs to know more about both levels and add knowledge of (3).


The IDE

Structured Editor

Error Analysis

Configure and call translator

Console Window

Link and run code

Output Window

Projects or a Make System


Workspaces

Used in self-contained interpreted languages such as Lisp, Forth.

Function definitions

Garbage-collected dynamic storage area

Global variables

Runtime stack


Programs, Modules, Folders and Files

Used for compiled languages that run on top of the OS.

Explicit Includes

Folder = Package

File = Public Class

Single Module

Pathnames

Search Paths

Jars


Translators have Layers Translation

Stages of Translation: Front End

Starting with source code, the compiler works in several stages. The front end of a compiler does not depend on the host system or machine. It converts source code to a parse tree.

Lexical analysis: Using lexical rules written as regular expressions, identify the tokens and the comments in the source code. Illegal characters are identified.

Preprocessing: An optional stage: directives are identified and followed. Missing include files are identified.

Syntactic analysis: Using grammatical rules written as a context-free grammar, the source code is parsed into a tree form. Syntax errors and undefined identifiers are identified.
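
The lexical-analysis stage described above can be sketched in a few lines. This is a minimal illustration for a made-up toy language, not the lexer of any real compiler: the token classes, the patterns, the keyword set, and the function name tokenize are all assumptions chosen for the example.

```python
import re

# Hypothetical token classes for a toy language (assumed for illustration).
# Order matters: COMMENT must be tried before OP so that "//" is not read as "/".
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),      # comment runs to the end of the line
    ("NUMBER",  r"\d+"),           # unsigned integer literal
    ("IDENT",   r"[A-Za-z_]\w*"),  # identifier or keyword
    ("OP",      r"[+\-*/=()]"),    # one-character operators and delimiters
    ("SKIP",    r"\s+"),           # whitespace only separates tokens
    ("ILLEGAL", r"."),             # any other character is illegal
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"if", "else", "while"}  # assumed keyword set


def tokenize(source):
    """Return a list of (kind, text) pairs; raise on an illegal character."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ILLEGAL":
            raise SyntaxError(f"illegal character {text!r} at offset {match.start()}")
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens
```

Each lexical rule is one regular expression, and the lexer reports illegal characters itself, before any parsing happens. A parser would then consume the (kind, text) pairs rather than raw characters.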


Lexical Analysis: Tokens, Keywords and Comments

Delimiters

Comments

Numbers

Identifiers

Keywords


Syntactic Analysis: Expressions, Statements, and Control

Expressions

Control Structures

Grouping Symbols

Data Structuring Methods

Declarations

Names

Name Spaces

Variables


Semantic Analysis and Optimization: the Middle Stages

This stage of translation does not depend on the host system or machine. It analyzes and manipulates the parse tree to improve it and check for semantic errors.

Semantic analysis: The types of operands and arguments are checked against the types defined for parameters. Conversions are generated if necessary to produce a match. Type errors are identified.

Flow analysis: Uninitialized variables are found.

Tree optimization: Common subexpressions and constant expressions are found and optimized.

These translation stages were developed more recently than the front end and the back end. A particular language might use some, all, or none of them.
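
As a concrete case of tree optimization, the sketch below evaluates constant subexpressions at translation time. The tuple encoding of tree nodes and the name fold are assumptions made for this example; a real compiler uses richer node types and must also respect overflow and type rules.

```python
import operator

# Assumed tree encoding for this sketch:
#   ("num", 3)   ("var", "x")   ("+", left, right)   ("-", ...)   ("*", ...)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    kind = node[0]
    if kind in ("num", "var"):
        return node                      # leaves are already as simple as possible
    left, right = fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        # Both operands are known now, so compute the result once, here.
        return ("num", OPS[kind](left[1], right[1]))
    return (kind, left, right)
```

For example, the tree for x + 2 * 3 folds to the tree for x + 6, so the multiplication never happens at run time.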


Semantic Interpretation

These issues are the concern of the semantic interpretation step:

Types

Type Checking

Scope

Visibility

Parameter Passing

Lifetime

Memory Model

Type Coercion

Binding time

Defaults


Optimization

Compilers can often be configured to do more of these things, or less.

Eliminate Common Subexpressions

Find Unreachable Code

Detect Uninitialized Variables

Evaluate Constant Expressions


Native-code Compilers: Back End and Execution

Starting with the type-checked parse tree, the compiler finishes the compilation thus:

Code generation: Relocatable memory addresses are assigned for variables. (They will be adjusted later, when the executable code is loaded into memory.) Object code is generated. Symbol files for the debugger are written out.

Linking: The object code and code from other modules (libraries) are linked together into an executable format. An executable file may be written.

Missing and mis-named functions are identified. Definitions that have been included twice are identified. Any of these problems will terminate linking and prevent execution.

Loading: The executable code is loaded into the computer and control is transferred to the first line of the main program.


Byte-code Compilers: Back End and Interpretation

Starting with the parse tree, the compiler finishes the compilation thus:

Code generation: Memory addresses are assigned for variables. A platform-independent byte-code program is generated. Symbol files for the debugger are written out.

The byte-code program becomes input to a virtual machine which searches for other modules to link in. Missing modules are identified.

The virtual machine interprets the codes and runs the program. Just-in-time optimization may happen during execution.
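
The split between generating byte code and interpreting it can be sketched with a toy stack machine. The instruction names (PUSH, LOAD, ADD, SUB, MUL), the tuple encoding of the parse tree, and both function names are assumptions for this illustration; they are not the JVM instruction set or any real byte-code format.

```python
# Assumed tree encoding: ("num", 3), ("var", "x"), ("+", left, right), and so on.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def compile_expr(node, code=None):
    """Back end: emit stack-machine instructions for an expression tree."""
    if code is None:
        code = []
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    elif kind == "var":
        code.append(("LOAD", node[1]))
    else:
        compile_expr(node[1], code)      # operands first ...
        compile_expr(node[2], code)
        code.append((OPCODES[kind], None))  # ... then the operator
    return code


def run(code, variables):
    """Virtual machine: interpret the instructions one at a time."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "LOAD":
            if arg not in variables:
                raise NameError(f"undefined variable {arg}")
            stack.append(variables[arg])
        else:
            right, left = stack.pop(), stack.pop()
            if opcode == "ADD":
                stack.append(left + right)
            elif opcode == "SUB":
                stack.append(left - right)
            else:
                stack.append(left * right)
    return stack.pop()
```

The instruction list is independent of the host machine; only run would need to be rewritten for a new platform. Note that the undefined variable is detected by the virtual machine at run time, not by the compiler.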


Interpreters: Back End and Interpretation

An interpreter runs within a dedicated environment that implements a read-execute-write (REW) cycle.

Starting with the parse tree, the interpreter finishes the execution thus:

The interpreter operates incrementally on the parsed source code.

It does not wait for the end of the program to begin processing it. This limits its ability to do optimization.

The program may be stored in a partially processed form that is easier to handle than source code.

When run, the program code becomes input to an interpreter (a virtual machine) that runs a program unit and displays the output.

Type errors may be identified during execution.

Undefined words are identified.


The Front End of Translation

Compiler: Lex, parse and typecheck

Byte-code compiler: Lex, parse and typecheck

Interpreter: Lex and parse


The Back End of Translation: Code Generation

Compiler: Generate object code (machine language)

Byte-code compiler: Generate byte code (VM language)

Interpreter: (nothing shown for this stage)


Runtime: Execution

Compiler: Link, load, and execute in hardware

Byte-code interpreter: Link, load, and execute in VM

Interpreter: Typecheck primitives. Run machine code for parsed program.


Translators have Layers Dozens of Languages

Part 3

3. Languages Come in Many Flavors.

Possible Design Goals

Design Examples

Representation Issues


Possible Design Goals

These apply both to entire languages and to features within a language.

1 Utility. Is the language or language feature often useful?

2 Efficiency. Does it lead to efficient software?

3 Portability. Does a program produce the same results on any machine?

4 Convenience. Is it easy to use? Does it support concise code?

5 Readability. Is it naturally readable?

6 Modeling ability. Will this feature or language help the programmer model a problem more fully, more precisely, or more easily?

7 Simplicity. Is the language design as a whole simple, unified, and general, or is it full of dozens of special-purpose features?

8 Clarity. Does every legal program have one defined, unambiguous, time-invariant meaning?


Design Examples

Scheme was created to clean up the semantics of LISP and make it more true to the underlying mathematical model (Lambda Calculus).

C is useful, efficient, simple, and mostly portable. Some people think it is convenient. Its readability is good if a programmer uses a disciplined style, but modeling ability is limited. Clarity and portability are damaged by the ambiguous nature of type int and the undefined order of evaluation of expressions.

C++ is useful, efficient, convenient, and has excellent modeling ability. Its readability, clarity and portability are about the same as C. It is absolutely NOT a simple language.

Java is useful, highly portable, convenient, and has excellent modeling ability and clarity. Its readability is damaged by the length of the identifiers in the standard libraries. It is absolutely NOT a simple language; the API is massive, complex, and confusing. Its efficiency is limited by the fact that it runs inside a virtual machine.


Translators have Layers Representation Issues

Representation Issues

Semantic Intent

Power

Explicit vs. Implicit

Coherence

Locality

Distinct Representation


Semantic Intent

A programmer has some idea or model of what he wants and expects the program to do. This is his semantic intent.

A program has semantic validity if it carries out the programmer’s semantic intent.

Or... the program works properly, as expected.


Language Power

A language is powerful to the extent that it permits the programmer to easily and explicitly state his semantic intent, and that intent will be honored and enforced.

There are two kinds of power a language can have:

The power to do something easily.

The power to prevent something you do not want to happen.

Example: The template classes in C++ and Collection classes in Java make it very easy to use stacks, queues, maps, trees, etc.

The private qualifier in Java or C++ prevents unwanted access to a variable.


Explicit vs. Implicit

The structure of an object can be reflected in a program either

Implicitly: the object has structure but nothing in the program (or maybe only the comments) describes that structure.

Explicitly: something that is part of the language defines the intended structure.

Example: In FORTRAN-77, a table of objects would be defined as a set of parallel arrays, each storing one property of the objects.

In C, the same table can be defined explicitly as an array of structs.


Explicit vs. Implicit Typing

The type of an object or a function can be either explicit or implicit.

Implicit: In older C (before C99), you do not need to specify the return type of a function. If it is omitted, it defaults to type int, and return values are coerced to type int.

Explicit: In C, you MAY specify the return type of a function. If it is declared, return values will be coerced to match the declared type.

Relying on implicit type declarations is misleading, error prone, and hard to maintain.


Coherence

An object, idea, or process is represented coherently if it is represented by a single symbol in the program so that it may be used as a unit.

A coherently represented object may have parts that can also be used separately.

It does not need to be stored in consecutive memory locations.

Example: Five #define symbols vs. an enum declaration with 5 enum constants.

The former leaves us in doubt about any relationship among the five symbols. The latter explicitly says that these constants belong together and define a set of alternatives.
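
The same contrast can be shown outside C. The sketch below is a Python analogue, not the C code the slide refers to: five loose module-level constants stand in for the five #define symbols, and an Enum class stands in for the enum declaration. The color names and the describe function are invented for the example.

```python
from enum import Enum

# Incoherent: five separate names. Nothing says they form one set,
# and any integer can be passed where a "color" is expected.
COLOR_RED, COLOR_ORANGE, COLOR_YELLOW, COLOR_GREEN, COLOR_BLUE = 0, 1, 2, 3, 4


# Coherent: one symbol, Color, names the whole set of alternatives.
class Color(Enum):
    RED = 0
    ORANGE = 1
    YELLOW = 2
    GREEN = 3
    BLUE = 4


def describe(color):
    """Accept only members of Color; the set itself can be checked and listed."""
    if not isinstance(color, Color):
        raise TypeError("expected a Color")
    return color.name.lower()
```

With the coherent form, the program can ask how many alternatives exist (len(Color)) or loop over them, which is impossible with the five loose constants.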


Coherent Arrays

The advantage of coherence is that information that belongs together is kept together at all times. Two examples arise out of arrays.

Sometimes we use parallel arrays to represent two or more properties of a set of objects. For example, the x-coordinates and y-coordinates of a set of points. The same data can be represented coherently as an array of 2-member structures.

How do we handle passing an array to functions?

To use an array in C, you need the array itself and either its allocation length or the number of items currently stored in it. The beginner declares these as three separate variables. When the array is passed to a function, two or three parameters are required.

In C++, you can use a vector, which is a structure consisting of an array and the two integers needed to manage it. This can be passed to a function as a single parameter.
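
Both array examples can be sketched side by side. This is a Python illustration under assumptions: the Point class, the sample coordinates, and the two function names are invented here. Python lists always know their own length, so the explicit count parameter exists only to mimic the C situation.

```python
from dataclasses import dataclass

# Parallel arrays: nothing in the code says xs[i] and ys[i] belong together.
xs = [0.0, 3.0, 6.0]
ys = [0.0, 4.0, 8.0]


@dataclass
class Point:
    x: float
    y: float


# Coherent form: each point is one object, and the list is one parameter.
points = [Point(x, y) for x, y in zip(xs, ys)]


def farthest_parallel(xs, ys, count):
    """Three parameters travel together, as in the beginner's C version."""
    best = 0
    for i in range(1, count):
        if xs[i] ** 2 + ys[i] ** 2 > xs[best] ** 2 + ys[best] ** 2:
            best = i
    return xs[best], ys[best]


def farthest(points):
    """One parameter carries the data and its length."""
    return max(points, key=lambda p: p.x ** 2 + p.y ** 2)
```

In the parallel version a caller can pass mismatched arrays or a wrong count and nothing complains; in the coherent version that mistake cannot be written.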


Locality

It is easier to write, debug, modify, and maintain a program with high locality.

Locality is high if related things are written together.

Locality is low if many lines or pages of code separate a definition from its uses, or if parts of the same object are defined in two or more places.

Example: A Java class has better locality than a C++ class.

In Java, all parts of a class are defined in the same file.

In C++, a class is split into two files: a header and an implementation.


Distinct Representation

If the same word or construct is used simultaneously for two purposes, trouble will follow.

Example: In early Basic, each line of code starts with a line number. Line numbers have two purposes:

To designate the execution order of the code lines.

As the targets of GOTO commands.

The problem is that these purposes conflict. A programmer will often need to add more code between two lines that were written earlier. This can easily force renumbering for a part of the program. But when lines are renumbered, any GOTOs into the renumbered part will go to the wrong place.
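
The conflict can be reproduced with a toy model. The sketch below is not a Basic interpreter: it assumes a program stored as a dictionary from line number to statement text, and the helper names renumber_naively and goto_target are invented for the example.

```python
# A toy Basic-like program. Line 15 was squeezed in after the others were written.
program = {
    10: "LET I = 0",
    15: "PRINT I",
    20: "LET I = I + 1",
    30: "IF I < 3 THEN GOTO 20",
    40: "END",
}


def renumber_naively(program, start=10, step=10):
    """Give the lines evenly spaced numbers but leave GOTO targets untouched."""
    renumbered = {}
    for index, old_number in enumerate(sorted(program)):
        renumbered[start + index * step] = program[old_number]
    return renumbered


def goto_target(program, line_number):
    """Return the statement that the GOTO on the given line jumps to, if any."""
    words = program[line_number].split()
    if "GOTO" not in words:
        return None
    target = int(words[words.index("GOTO") + 1])
    return program.get(target)
```

Before renumbering, the GOTO on line 30 reaches LET I = I + 1. After renumbering, the same statement sits on line 40 and its target, 20, now names PRINT I, so the loop never increments I. The line number served as both position and label, and changing one silently changed the other.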


Translators have Layers Good or Bad: Fundamental Considerations

What Makes a Language Design Good (or Bad)

A language should provide or encourage or support:

Lexical rules that do not ascribe meaning to invisible things.

Syntax that is easy to type and not prone to errors.

Semantics relatively free of “gotchas” (for example, if (a < b < c) ...)

Consistent semantics for the same syntax in different contexts.

Syntax that is kind to program modification and maintenance.

Readable layout for programs.

Ability to define necessary restrictions on data access.

Ability to group related things on the same page or screen.

. . . and I could continue. . .
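
The a < b < c gotcha in the list above is worth spelling out. In C, relational operators group left to right and each yields 0 or 1, so the expression means (a < b) < c. The sketch below emulates that rule in Python for illustration; the function names are invented, and Python itself treats a < b < c as a chained comparison, which is why the C rule has to be written out by hand here.

```python
def c_style_chain(a, b, c):
    """Evaluate a < b < c by the C rule: (a < b) gives 0 or 1, then compare with c."""
    return int(a < b) < c


def intended(a, b, c):
    """What the programmer almost certainly meant."""
    return a < b and b < c
```

For a=5, b=1, c=3 the C-style reading is true (0 < 3) although 5 < 1 < 3 is plainly false as mathematics. The syntax is legal, compiles without error, and means something different from what it appears to say.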


Religious Wars: Good Design or Bad?

People argue without end about whether these features are good or bad!

In Basic, you do not need to declare variables. (Good for small jobs, not good for big ones.)

In C++, a class definition involves a lot of typing that is not required in C. (Classes are good for large, complex jobs, bad for beginners and small jobs.)

C thinks compact names are good because they minimize time wasted typing, spelling errors, and typing errors. Java thinks long names are good because they are not cryptic.

Python is good because you can interactively write and debug it. Java is good because the compiler can and will catch type errors.

Java is good because it uses a garbage collector to manage memory. C++ is good because memory management CAN BE done efficiently with little overhead.


Some “features” are flaws.

Many programmers will defend the flaws in their favorite language. They call them “features” and love to show you the tricks you can do with them.

In APL, you can compute a number then “go to” that line.

In APL, you can write a fairly complex program on one line.

In Basic, you don’t have to worry about the difference between integers and reals.

In Python, programs look great because the indentation defines the scope of each statement.

In C, you can walk off the end of an array.


The Layers of FORTH

Part 5: The Layers of FORTH

1 Fundamental concepts: postfix notation, integer-based, small set of keywords.

2 The runtime environment: Dictionary, Stack, Return stack.

3 The interpreter and stack operations

4 The compiler

5 Type declarations and language extension mechanisms

FORTH also has a text editor and an assembler but we will not use them.


The Layers of FORTH Layer 6: The FORTH System

Layer 6 (Top): The FORTH Environment

See: gforth manual and tutorial
Download: gforth-0.7.0.exe (gforth implementation for Windows)

FORTH, like many languages of its class, is a self-contained system. In addition to the compiler and interpreter, every standard FORTH system supplies an editor, an assembler, and libraries.

In this course, we will not be using the assembler or the libraries; only the core language, the compiler, and the interpreter.

My advice is to use your favorite editor to write programs and store them in normal files called xxxx.for . When you want to bring a piece of program into the FORTH system, you can either LOAD a file or use the mouse to paste the lines into the FORTH window.


FORTH implements a REW cycle

When you run a FORTH system, you start in the interpreter window.

You can type in arithmetic expressions and pre-defined commands.

Your code will be executed and the results (if any) displayed on the screen.

If all is well, you will see the system prompt OK.

If there was an error, you will see an error comment. This will cause the parameter stack to be emptied.

The system is then ready for another command.
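
The cycle just described can be imitated in a few lines. This is a sketch of the idea only, not how gforth is implemented: it assumes a three-operator vocabulary plus the printing word ".", and the names WORDS and rew_step, as well as the exact text of the prompt and error message, are invented for the example.

```python
# Assumed toy vocabulary: three arithmetic words that take two stack items.
WORDS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def rew_step(line, stack):
    """Read one input line, execute it, and return the text to write."""
    printed = []
    try:
        for token in line.split():
            try:
                stack.append(int(token))       # a number is pushed on the stack
                continue
            except ValueError:
                pass                           # not a number, so look it up as a word
            if token in WORDS:
                right, left = stack.pop(), stack.pop()
                stack.append(WORDS[token](left, right))
            elif token == ".":
                printed.append(str(stack.pop()))   # "." pops and prints the top item
            else:
                raise LookupError(f"undefined word {token}")
    except LookupError as error:               # also catches IndexError on stack underflow
        stack.clear()                          # an error empties the parameter stack
        return f"error: {error}"
    return " ".join(printed + ["ok"])
```

Calling rew_step repeatedly with the same stack list models successive commands: values left on the stack by one line are still there for the next, unless an error intervenes and clears it.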


The Layers of FORTH Layer 4: Lexing FORTH

Lex: FORTH is different!

Forth is simple. . . so simple that it is “hard”.

Whitespace delimited. Don’t forget to type the whitespace after every symbol.

Any ASCII character can begin a word or occur in a word.

Everything is written in postfix order.

There is no distinction between operators and other symbols. No precedence is needed in a postfix language.

Each symbol has one unambiguous meaning.

All of these things permit the FORTH compiler and interpreter to be very simple and very fast. Why?


Example: Lexing a few lines of FORTH

-3 5 + .

    integer   -3
    integer   5
    word      +
    word      .

\ A basic hello program
: hello ( -- ) ." Hi, I’m here." ;

    comment   A basic hello program
    word      :
    word      hello
    comment   --
    string    Hi, I’m here.
    word      ;
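
A lexer that produces the classification shown above fits in one short function, which is one answer to why whitespace delimiting keeps FORTH translators simple. The sketch below is an illustration under assumptions, not gforth's text interpreter: it handles only the three special words that appear in the example (backslash, open parenthesis, and dot-quote), and the name lex_forth and the token labels are chosen to match the slide.

```python
def lex_forth(source):
    """Split FORTH source into (kind, value) tokens, one input line at a time."""
    tokens = []
    for line in source.splitlines():
        pos = 0
        while True:
            while pos < len(line) and line[pos].isspace():
                pos += 1                          # skip the delimiting whitespace
            if pos >= len(line):
                break
            end = pos
            while end < len(line) and not line[end].isspace():
                end += 1                          # a word runs to the next whitespace
            word, pos = line[pos:end], end
            if word == "\\":                      # backslash: comment to end of line
                tokens.append(("comment", line[pos:].strip()))
                break
            if word == "(":                       # parenthesis: comment up to ")"
                close = line.find(")", pos)
                close = len(line) if close == -1 else close
                tokens.append(("comment", line[pos:close].strip()))
                pos = close + 1
            elif word == '."':                    # dot-quote: string up to the next quote
                close = line.find('"', pos)
                close = len(line) if close == -1 else close
                tokens.append(("string", line[pos + 1:close]))
                pos = close + 1
            else:
                try:
                    tokens.append(("integer", int(word)))
                except ValueError:
                    tokens.append(("word", word))
    return tokens
```

No regular expressions, operator tables, or lookahead are needed: every token ends at whitespace, and the only decision is whether the text converts to an integer. As on the slide, the words that introduce a comment or string are folded into the comment or string token rather than reported separately.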


Homework

This Week’s Work

Due on September 22

Read Chapter 3 of the text.

Hw 2: Written Exercises. Answer these questions briefly, precisely, and clearly.

1 Explain the difference between lexical analysis and syntactic analysis. What happens at each stage?

2 List six kinds of errors that the compiler and/or language system can identify. For each, explain the stage of translation or execution where the error is detected.

3 Choose one of the properties of representation, listed under Representation Issues (pages 24–32). Think of the languages you have used. Give a good example and a bad example of the property you chose. Please don’t repeat my examples or the examples from the textbook.

4 Take the first three lines of the gcd program used for lab 1. Identify the lexical tokens on these three lines.
