Chapter 5 - Basic Semantics

Programming Languages: Principles and Practice, 2nd Ed.

Kenneth C. Louden

© Kenneth C. Louden, 2003


Attributes

Properties of language entities, especially identifiers used in a program.

Important examples:

– Value of an expression
– Data type of an identifier
– Maximum number of digits in an integer
– Location of a variable
– Code body of a function or method

Declarations ("definitions") bind attributes to identifiers.

Different declarations may bind the same identifier to different sets of attributes.


Binding times can vary widely:

Value of an expression: during execution or during translation (constant expression).

Data type of an identifier: translation time (Java) or execution time (Smalltalk, Lisp).

Maximum number of digits in an integer: language definition time or language implementation time.

Location of a variable: load or execution time.

Code body of a function or method: translation time or link time or execution time.
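A few of these binding times can be seen side by side in Java. The following is only a sketch: the class name BindingTimes and its members are hypothetical names chosen for illustration.

```java
import java.util.function.IntSupplier;

class BindingTimes {
    // Value bound during translation: a constant expression the compiler can evaluate.
    static final int SECONDS_PER_DAY = 24 * 60 * 60;

    // Value bound during execution: it depends on the argument supplied by the caller.
    static int secondsIn(int days) {
        return days * SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        // Code body bound during execution: which lambda runs is decided while running.
        IntSupplier body = args.length > 0 ? () -> 1 : () -> 2;
        System.out.println(secondsIn(2));
        System.out.println(body.getAsInt());
        // Range of int: fixed by the Java language definition, not by the implementation.
        System.out.println(Integer.MAX_VALUE);
    }
}
```

The data type of every name here (int, IntSupplier) is bound at translation time, as the slide says for Java.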


Symbol table and environment

A dictionary or table is used to maintain the identifier/attribute bindings.

It can be maintained either during translation or execution or both. (Pre-translation entities are entered into the initial or default table.)

During translation this table is called the symbol table.

During execution this table is called the environment.

If both are maintained, the environment can usually dispense with names, keeping track only of locations (names are maintained implicitly).


Semantic functions

Formally, we can think of the symbol table and environment as functions.

If the symbol table and environment are maintained separately (compiler):

SymbolTable: Names → Static Attributes
Environment: Names → Locations
Memory: Locations → Values

If the symbol table and environment are maintained together (interpreter):

Environment: Names → Attributes (including locations and values)
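For the compiled case, the three functions can be pictured as lookup tables. This is a minimal sketch, assuming attributes are plain strings and locations are integers; the class SemanticFunctions and its field names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class SemanticFunctions {
    // SymbolTable: Names -> Static Attributes (here just a type name).
    static final Map<String, String> symbolTable = new HashMap<>();
    // Environment: Names -> Locations (an integer address).
    static final Map<String, Integer> environment = new HashMap<>();
    // Memory: Locations -> Values.
    static final Map<Integer, Object> memory = new HashMap<>();

    // The value of a name is the composition Memory(Environment(name)).
    static Object valueOf(String name) {
        return memory.get(environment.get(name));
    }

    public static void main(String[] args) {
        symbolTable.put("x", "int");   // translation: x has type int
        environment.put("x", 0);       // load or execution: x lives at location 0
        memory.put(0, 42);             // execution: location 0 holds 42
        System.out.println(symbolTable.get("x") + " x = " + valueOf("x"));
    }
}
```

In the interpreted case the three maps would collapse into a single map from names to a record holding type, location, and value.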


Declarations

Declarations bind identifiers to attributes.

The collection of bound attributes (including the identifier) can be viewed as equivalent to the declaration itself.

Attributes may be explicit or implicit in a declaration.

A declaration may fail to fully specify all necessary attributes—a supplemental declaration is then necessary elsewhere.

By abuse of language, we sometimes refer to the declaration of x, or just x—using the identifier to stand for the declaration.


Examples of declarations (C)

int x = 0;
Explicitly specifies data type and initial value. Implicitly specifies scope (see next slide) and location in memory.

int f(double);
Explicitly specifies type (double → int). Implicitly specifies nothing else: needs another declaration specifying code.

The former is called a definition in C, the latter is simply a declaration.


Scope

The scope of a declaration is the region of the program to which the bindings established by the declaration apply. (If individual bindings apply over different regions: scope of a binding.)

Scope is typically indicated implicitly by the position of the declaration in the code, though keywords can modify it.

In a block-structured language, the scope is typically the code from the end of the declaration to the end of the "block" (indicated by braces {…} in C and Java) in which the declaration occurs.

Scope can extend backwards to the beginning of the block in certain cases (class declarations in Java and C++, top-level declarations in Scheme).
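Both rules can be observed in Java. The sketch below uses a hypothetical class ScopeRegions: a method uses a field that is declared later in the class, while a local declared in an inner block is visible only up to the closing brace of that block.

```java
class ScopeRegions {
    // Legal: the scope of a class member extends backwards to the start of the class.
    static int twice() {
        return 2 * later;
    }

    static int later = 21;

    static int blockSum() {
        int total = 0;
        {
            int inner = 5;   // scope of inner: from here to the end of this block
            total += inner;
        }
        // inner is out of scope here; naming it would be a compile-time error.
        return total;
    }

    public static void main(String[] args) {
        System.out.println(twice());
        System.out.println(blockSum());
    }
}
```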


Lexical vs. dynamic scope

Scope is maintained by the properties of the lookup operation in the symbol table or environment.

If scope is managed statically (prior to execution), the language is said to have static or lexical scope ("lexical" because it follows the layout of the code in the file).

If scope is managed directly during execution, then the language is said to have dynamic scope.

The next slide has an example showing the difference.

It is possible to maintain lexical scope during execution (i.e. by the environment), but it requires extra links and a somewhat unusual lookup operation (see Chapter 8). (Scheme does this.)


Java scope example

public class Scope {
    public static int x = 2;
    public static void f() {
        System.out.println(x);
    }
    public static void main(String[] args) {
        int x = 3;
        f();
    }
}

Of course, this prints 2, but under dynamic scope it would print 3 (the most recent declaration of x in the execution path is found).
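Java itself cannot be switched to dynamic scope, but the behavior can be imitated by managing the bindings by hand during execution. The following is a sketch under that assumption: the class DynamicScopeDemo and the helpers declare, undeclare, and lookup are hypothetical, and the methods f and mainBody stand in for f and main of the Scope class.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class DynamicScopeDemo {
    // One stack of values per name; the top is the most recent active declaration.
    static final Map<String, Deque<Integer>> bindings = new HashMap<>();

    static void declare(String name, int value) {
        bindings.computeIfAbsent(name, k -> new ArrayDeque<>()).push(value);
    }

    static void undeclare(String name) {
        bindings.get(name).pop();
    }

    static int lookup(String name) {
        return bindings.get(name).peek();
    }

    // Stands in for f: it reads x through the run-time lookup.
    static int f() {
        return lookup("x");
    }

    // Stands in for main: declares a local x = 3, calls f, then leaves the scope.
    static int mainBody() {
        declare("x", 3);
        int seen = f();
        undeclare("x");
        return seen;
    }

    public static void main(String[] args) {
        declare("x", 2);                 // the static x of the class
        System.out.println(mainBody());  // prints 3: f finds the x declared in mainBody
        System.out.println(f());         // prints 2: only the static x is still active
    }
}
```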


Dynamic scope evaluated

Almost all languages use lexical scope: with dynamic scope the meaning of a variable cannot be known until execution time, thus there cannot be any static checking.

In particular, no static type checking.

Originally used in Lisp. Scheme could still use it, but doesn't.

Some languages still use it: VBScript, Javascript, Perl (older versions).

Lisp inventor (McCarthy) now calls it a bug.

Still useful as a pedagogical tool to understand the workings of scope.

In some ways a lot like dynamic binding of methods.


Scope holes

Under either lexical or dynamic scope, a nested or more recent declaration can mask a prior declaration. Indeed, in the Java scope example (slide 10), the local declaration of x in main masks the static declaration of x in the Scope class.

How would you access the static x inside main in Java?

Use Scope.x in place of x:

public static void main(String[] args) {
    int x = 3;
    Scope.x = 4;
    ...
}

Exercise: how to do this for non-statics?


Symbol table structure

A table of little stacks of declarations under each name. For example the table for the Scope class of slide 10 would look as follows inside main (using lexical scope):

name    bindings (top of stack first)
x       int local to main, then public static int
f       public static void method
main    public static void method
args    String[] parameter of main


Symbol table structure (2)

Alternatively, a stack of little tables, one for each scope. For example, the previous example would look as follows (lexical scope):

Symbol table for Scope:

name    bindings
x       public static int
f       Symtab: (empty) [can be deleted after leaving f]
main    Symtab: [current table inside main]
            name    bindings
            x       int local
            args    String[] parameter


Symbol table structure (3)

Symbol table is constructed as declarations are encountered (insert operation).

Insertions follow static structure of source code with lexical scope.

Insertions follow execution path with dynamic scope.

Lookups occur as names are encountered in dynamic scope (in symbol table to that point).

In lexical scope, lookups occur either as names are encountered in symbol table to that point (declaration before use, as in C), or all lookups are delayed until after the symbol table is fully constructed and then performed (Java class: scope applies backwards to beginning of class).
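The insert and lookup operations for the stack-of-little-tables organization can be sketched as follows. The class ScopedSymbolTable and its method names are hypothetical, and attributes are reduced to descriptive strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class ScopedSymbolTable {
    // The innermost scope is at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    void enterScope() { scopes.push(new HashMap<>()); }

    void exitScope() { scopes.pop(); }

    // insert: record a declaration in the current (innermost) scope.
    void insert(String name, String attributes) {
        scopes.peek().put(name, attributes);
    }

    // lookup: search from the innermost scope outwards; null if undeclared.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String found = scope.get(name);
            if (found != null) return found;
        }
        return null;
    }

    public static void main(String[] args) {
        ScopedSymbolTable table = new ScopedSymbolTable();
        table.enterScope();                      // class Scope
        table.insert("x", "public static int");
        table.enterScope();                      // body of main
        table.insert("x", "int local to main");
        System.out.println(table.lookup("x"));   // the inner declaration masks the outer
        table.exitScope();
        System.out.println(table.lookup("x"));   // the outer declaration is visible again
    }
}
```

Calling enterScope and exitScope as the translator walks the source text gives lexical scope; calling them as blocks and functions are entered and left during execution gives dynamic scope.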


Symbol table structure (4)

Using dynamic scope, the same example would look as follows:

[Figure: the same three tables as in the lexical-scope diagram. Symbol table for Scope: x (public static int), f, main. Symtab for main: x (int local), args (String[] parameter), labeled "Current table inside main". Symtab for f: (empty), labeled "Current table inside f".]


Symbol table structure evaluated

Which organization is better?

Table of little stacks is simpler (C, Pascal).

Stack of little tables is more versatile, and helpful when you need to recover outer scopes from within inner ones or from elsewhere in the code (Ada, Java, C++).

Normally, no specific table structure is part of a language specification: any structure that provides the appropriate properties will do.


Ada example (Fig. 5.17, p.151):

(global symbol table:)

name    bindings
ex      procedure, symtab:
            name    bindings
            x       integer
            y       character
            p       procedure, symtab:
                        name    bindings
                        x       float
                        A       block, symtab:
                                    name    bindings
                                    y       array (1..10) of integer


Overloading

Overloading is a property of symbol tables that allows them to successfully handle declarations that use the same name within the same scope.

It is the job of the symbol table to pick the correct choice from among the declarations for the same name in the same scope. This is called overload resolution.

It must do so by using extra information, typically the data type of each declaration, which it compares to the probable type at the use site, picking the best match.

If it cannot successfully do this, a static semantic error occurs.


Overloading (2)

Overloading typically applies only to functions or methods.

Overloading must be distinguished from dynamic binding in an OO language.

Overloading is made difficult by weak typing, particularly automatic conversions.

In the presence of partially specified types, such as in ML, overload resolution becomes even more difficult, which is why ML disallows it.

Scheme disallows it for a different reason: there are no types on which to base overload resolution, even during execution.


Overloading (3)

An example in Java:

public class Overload {
    public static int max(int x, int y) {
        return x > y ? x : y;
    }
    public static double max(double x, double y) {
        return x > y ? x : y;
    }
    public static int max(int x, int y, int z) {
        return max(max(x,y),z);
    }
    public static void main(String[] args) {
        System.out.println(max(1,2));
        System.out.println(max(1,2,3));
        System.out.println(max(4,1.3));
    }
}

Adding more max functions that mix double and int parameters is ok. But adding ones that differ from an existing max only in return type (double instead of int) is not!
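To make the choice at each call site visible, the variant below has every overload return a description of itself instead of a maximum. It is a sketch only: the class OverloadResolution is hypothetical and changes the return types of the max example for illustration.

```java
class OverloadResolution {
    static String max(int x, int y) { return "max(int, int)"; }

    static String max(double x, double y) { return "max(double, double)"; }

    static String max(int x, int y, int z) { return "max(int, int, int)"; }

    // Rejected by the compiler if uncommented: it differs from max(int, int)
    // only in its return type, which Java does not use for overload resolution.
    // static double max(int x, int y) { return 0.0; }

    public static void main(String[] args) {
        System.out.println(max(1, 2));     // exact match on two ints
        System.out.println(max(1, 2, 3));  // only one overload takes three arguments
        System.out.println(max(4, 1.3));   // 4 is converted to double
    }
}
```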


Overloading (4)

C++ and Ada are even more challenging for overload resolution: C++, because it allows many more automatic conversions, Ada because the return type is also used to resolve overloading (Ada gets away with this only because it allows no automatic conversions).

It is possible for languages to also keep different symbol tables for different kinds of declarations. In Java these are called "name spaces," and they also represent a kind of overloading.

Java is particularly ugly in this respect: there are different name spaces for classes, methods, params/vars, labels, and even packages -- see Figure 5.23, page 158.


The Environment

Can be constructed entirely statically (Fortran): all vars and functions have fixed locations for the duration of execution.

Can also be entirely dynamic: functional languages like Scheme and ML.

Most languages use a mix: C, C++, Java, Ada.

Consists of three components:

– A fixed area for static allocation
– A stack area for LIFO allocation (usually the processor stack)
– A "heap" area for on-demand dynamic allocation (with or without garbage collection)


Typical environment organization (possible C)

[Figure 5.25, p. 165]

© 2003 Brooks/Cole - Thomson Learning™


The Runtime Stack

Used for:

– Procedure/function/method calls

– temporaries

– local variables

Temporaries: intermediate results that cannot be kept in registers; not considered here further.

Procedure calls: Chapter 8.

Local variables: part of calls, but can be considered independently, showing LIFO behavior for nested scopes (next slide).


Example of stack-based allocation in C within a procedure:

A: { int x;
     char y;
     B: { double x;
          int a;
        } /* end B */
     C: { char y;
          int b;
          D: { int x;
               double y;
             } /* end D */
        } /* end C */
   } /* end A */

Point #1

Point #2


Stack at Point #1:



Stack at Point #2:



An alternative: a flat local space

All local variables allocated at once, regardless of nesting. Wastes some space, but not critically.

With this approach, only complete function/method calls get allocated on the stack.

Even with the previous approach, the primary structure of the stack is still the call structure: a complete record of a call on the stack is called an activation record or frame, and the stack is referred to as the call stack. (Chapter 8)

Java promotes a flat space by forbidding nested redeclarations, but this is not an essential property: a symbol table can easily distinguish nested declarations as A.x, A.y, A.B.x, A.B.a, etc.
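The Java restriction can be seen in a small method. This is a sketch with a hypothetical class LocalBlocks: a nested block may not redeclare a local of an enclosing block, but two sibling blocks may each declare their own x.

```java
class LocalBlocks {
    static int sumOfBlocks() {
        int total = 0;
        {   // first nested block
            int x = 1;
            // int total = 5;   // compile-time error if uncommented: total is
            //                  // already a local variable of an enclosing scope
            total += x;
        }
        {   // sibling block: x may be declared again, since the first x is out of scope
            int x = 10;
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfBlocks());
    }
}
```

Whether the two x variables share one slot in the frame or get separate ones is left to the compiler; either way all locals of sumOfBlocks belong to a single activation record.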


Heap Allocation

In "standard" languages (C, C++, Java) heap allocation requires a special operation: new (malloc in C).

Any kind of data can be allocated on the heap in C/C++; in Java all objects and only objects are allocated on the heap.

Even with heap allocation available in Java & C/C++, the stack is still used to represent calls.

In C/C++, deallocation is typically by hand (free in C; delete and destructors in C++), but it is hard to do right.

Java uses a garbage collector that periodically sweeps the heap looking for data that cannot be accessed any more by the program and adding it back to free space.


Heap Allocation (2)

In functional languages (Scheme, ML) heap allocation is performed automatically, and virtually everything, including function calls, is allocated on the heap.

Of course, functional languages also use garbage collection, since deallocation is automatic as well. (Indeed, SML/NJ as its default still quaintly announces calls to the garbage collector, including some statistics.)

A lot of study and effort has been devoted by both the functional language and OO language communities to making garbage collection efficient in both time and space. Sadly, C and C++ still lack standard garbage collectors.


Lifetime/Extent

The lifetime or extent of a program entity is the duration of its allocation in the environment.

Allocation is static when the lifetime is the duration of the entire program execution.

Lifetime is related to but not identical to scope. With scope holes, lifetime can extend to regions of the program where the program entity is not accessible.

It is also possible for scope to exceed lifetime when a language allows locations to be manipulated directly (as, for example, with manual deallocation). This is of course very dangerous!


Variables and Constants

A variable is an object whose stored value can change during execution.

A constant is an object whose value does not change throughout its lifetime.

Constants are often confused with literals: constants have names, literals do not.

Constants may be:

– compile-time static (may not ever be allocated)
– load-time static
– dynamic


Constants (2)

Compile-time constant in Java:
static final int zero = 0;

Load-time constant in Java:
static final Date now = new Date();

Dynamic constant in Java:
any non-static final assigned in a constructor.

Java takes a very general view of constants, since it is not very worried about getting rid of them during compilation.

C takes a much stricter view of constants, essentially forcing them to be capable of elimination during compilation.
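The three kinds of Java constants can be placed in one class. The sketch below uses a hypothetical class ConstantKinds; the fields ZERO and LOADED mirror the zero and now examples, and id is an invented example of a dynamic constant.

```java
import java.util.Date;

class ConstantKinds {
    // Compile-time constant: the compiler can substitute 0 wherever ZERO is used.
    static final int ZERO = 0;

    // Load-time constant: computed once, when the class is initialized.
    static final Date LOADED = new Date();

    // Dynamic constant: fixed separately for each object when it is constructed.
    final int id;

    ConstantKinds(int id) {
        this.id = id;
        // this.id = id + 1;   // compile-time error if uncommented: id is final
    }

    public static void main(String[] args) {
        ConstantKinds first = new ConstantKinds(1);
        ConstantKinds second = new ConstantKinds(2);
        System.out.println(ZERO + " " + first.id + " " + second.id);
        System.out.println(LOADED);
    }
}
```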


Aliases, Dangling References, and Garbage

An alias occurs when the same object is bound to two different names at the same time. This is fairly common with Java objects.

A dangling reference is a location that has been deallocated from the environment, but is still accessible within the program. Dangling references are impossible in a garbage-collected environment with no direct access to addresses.

Garbage is memory that is still allocated in the environment but has become inaccessible to the program. Garbage can be a problem in a non-garbage collected environment, but is much less serious than dangling references.
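Aliases and garbage are both easy to produce in Java. The following sketch uses a hypothetical class AliasDemo: two names are bound to one list, a change made through one name is seen through the other, and the list becomes garbage once neither name refers to it.

```java
import java.util.ArrayList;
import java.util.List;

class AliasDemo {
    static int sizeSeenThroughFirstName() {
        List<String> a = new ArrayList<>();   // one object, allocated on the heap
        List<String> b = a;                   // b is an alias: same object, second name
        b.add("added through b");
        int size = a.size();                  // the change is visible through a
        a = null;
        b = null;                             // no name reaches the list: it is garbage
        return size;
    }

    public static void main(String[] args) {
        System.out.println(sizeSeenThroughFirstName());
    }
}
```

Because Java gives no direct access to addresses and reclaims the list only after it is unreachable, this code cannot produce a dangling reference.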
