Upload
stephen-cameron
View
226
Download
0
Embed Size (px)
Citation preview
PLLab, NTHU,Cs2403 Programming Languages
Scope binding and Data types
Kun-Yuan Hsieh
Programming Language Lab., NTHU
PLLab, NTHU,Cs2403 Programming Languages
Names• Identify labels, variables, subprograms, parameters,
etc.• Name length
– Earliest PL: 1– FORTRAN I: max 6– CORBOL: max 30– FORTRAN 90 & ANSI C: max 31– Java, C++: no limit, implementers often impose one
• Case sensitive (e.g. StrLeng != strleng)– C, C++, and Java are– Problems: poor readability & writability
PLLab, NTHU,Cs2403 Programming Languages
Names (con’t)• Keyword:
– word that is special only in certain contexts(in FORTRAN) (in FORTRAN)REAL APPLE REAL INTEGERREAL = 3.4 INTEGER REAL
• Reserved word: – word that cannot be used as a name(in C) for, while, if, else
• Predefined words: – words that have been defined by public libraries (in C) printf, scan, createfile
PLLab, NTHU,Cs2403 Programming Languages
Variable (var)• A var is an abstraction of a memory cell• A var can be viewed as below:
(Name, Address, Value, Type, Lifetime, Scope)• Name
– Referred to as identifier (id)
• Address– The memory address with which it is associated– May have different address at different times during
execution (e.g. local vars in a recursive subprogram)– May have different address at different places in a
program (e.g. count defined in sub1 and sub2
PLLab, NTHU,Cs2403 Programming Languages
Variable• Alias
– More than one variables are used to access the same memory location
– (in C++) &p = d ; (p and d are aliases)
• Type– Determines the range of its values, and the set of ops
defined– (in C) unsigned int count ; (range: 0 – 65535)
• Value– The contents of the memory cell associated with the
name
PLLab, NTHU,Cs2403 Programming Languages
Binding
• Binding is an association between two attributes– Ex:
int count ;count = count + 5;
• Type of count: bound at compile time• Value range of count: bound at compiler design time• Value of count: bound at execution time• Definition of + op: bound at language design time
PLLab, NTHU,Cs2403 Programming Languages
Static & dynamic binding
• Static binding: it first occurs before run time and remains unchanged in execution
• Dynamic binding: it first occurs during run time or can change in execution
• Discussed issues: type & storage– Type binding
• How is type specified?• When does the binding take place?
– Storage binding• When does the allocation take place?
PLLab, NTHU,Cs2403 Programming Languages
Static type binding
• Static type binding: explicit & implicit declaration– Explicit declaration
• (in C) int A, B ; double C ;
– Implicit declaration• (in FORTRAN) name begins with I, J, K, L, M, N are INTEGER,
others are REAL
• (C/C++) declaration & definition– declaration specifies types but do not cause allocation– external declaration is used for type checking– definition specifies types also causes allocation– only declares in header files but defines in C files
PLLab, NTHU,Cs2403 Programming Languages
Dynamic type binding
• Variable is bound to a type when assigned a value in an assignment statement– (in JavaScript) list = [10.2, 3.5]
list = count• Advantage: generic programming
– process a list of data with any type• Disadvantage: poor error detection
– i = x (int) and i = y (double)!!!• Disadvantage: large cost
– storage of a variable must be of varying size
PLLab, NTHU,Cs2403 Programming Languages
Storage binding• The lifetime of a variable is the time during
which it is bound to a specific memory location– static variables– stack-dynamic variables– explicit heap-dynamic variables– implicit heap-dynamic variables
PLLab, NTHU,Cs2403 Programming Languages
Static variables• Vars that bound to memory cells before
execution begins and unbound when execution terminates– static storage binding– e.g. global variables– (in C) static int count ; // history sensitive
• Advantages:– code efficiency by direct addressing– no allocation/deallocation during runtime
• Disadvantages:– less flexibility : cannot support recursive– no storage sharing
PLLab, NTHU,Cs2403 Programming Languages
Stack-dynamic variables
• Vars whose storage bindings are created when declarations are executed– dynamic storage binding– static type binding– allocated from the run-time stack
• Advantages:– storage sharing, support recursive
• Disadvantages:– overhead of allocation / deallocation
PLLab, NTHU,Cs2403 Programming Languages
Example I
#include <stdio.h>
int count;
main( ) {
int i ;
for (i=0; i<=10; i++)
{ test( ) ; }
}
test( ) {
int i ;
static int count = 0;
count = count + 1 ;
}
count
test::count
main::i
test::i
static varptr
run-timestack ptr
virtual address space
PLLab, NTHU,Cs2403 Programming Languages
Example I
#include <stdio.h>
int count;
main( ) {
int i ;
for (i=0; i<=10; i++)
{ test( ) ; }
}
test( ) {
int i ;
static int count = 0;
count = count + 1 ;
}
Type-binding
Storage-binding
count static static
main::i static stack-dynamic
test::i static stack-dynamic
test::count Static static
PLLab, NTHU,Cs2403 Programming Languages
Explicit heap-dynamic variables
• Vars that are allocated and deallocated by explicit statements during run-time– Dynamic storage binding– Referenced through pointers or references– (in C) malloc( ) and free( ) (in C++) new and delete– Memory leak is a major cause of SW bugs
• Heap: a section of virtual address space reserved for dynamic memory allocation– heap fragmentation: a major cause of SW perf bugs
• Java objects are explicit heap-dynamic vars– implicit garbage collection (no free or delete)
PLLab, NTHU,Cs2403 Programming Languages
Example II
#include <stdio.h>main( ) { int i ; test( ) ; }test( ) { int i ; char *space; static int count = 0; space = (char *) malloc(64); // memory leak}
main::i
test::i
static varptr
run-timestack ptr
virtual address space
test::count
heaptest::space
PLLab, NTHU,Cs2403 Programming Languages
Example II
#include <stdio.h>main( ) { int i ; test( ) ; }test( ) { int i ; char *space; static int count = 0; space = (char *) malloc(64); free(space);}
main::i
test::i
static varptr
run-timestack ptr
virtual address space
test::count
heap
PLLab, NTHU,Cs2403 Programming Languages
Implicit heap-dynamic variables
• Vars that are bound to heap storage only when they are assigned values– dynamic storage binding– Strings and arrays in Perl and JavaScript@names = { “door”, “windows”, “table”};$names[3] = “bed”;
• Advantages:– highest degree of flexibility
• Disadvantages:– huge run-time overhead– some loss of error detection by compiler
PLLab, NTHU,Cs2403 Programming Languages
Type checking
• The activity of ensuring that the operands of an operator are of compatible types– (int) A = (int) B + (real) C– int + real okay for the + operator– (int + real) converts to (real + real) type
coercion– int = real type error
• Static type checking: done in compile time• Dynamic type checking: done in run time
PLLab, NTHU,Cs2403 Programming Languages
Strong typing• Strong-typed: if type errors are always detected
– Yes (Ada, Java), No (Pascal, C, C++)• Example: union in C: type error cannot be detected
typedef union { float A; int B; } Sample;Sample data;main( ) { float R1 = 3.231; int N1; data.A = R1; ….. N1 = data.B; // N1 = 1078905012}
PLLab, NTHU,Cs2403 Programming Languages
Type compatibility
• Name type compatibility: two variables declared with the same type name– Easy to implement but less flexibility
type days = integer ; type months= integer ;days N1; months N2;
– N1 and N2 are not compatible• Structured type compatibility: two variables
have identical structures– More flexible, but harder to implement– N1 and N2 above are compatible
PLLab, NTHU,Cs2403 Programming Languages
Scope
• The range of statements over which it is visible (it can be referenced)
• Static scope: determined in compile time– Search: based on declaration sequence
• Dynamic scope: determined in run-time– Search: based on the call stack sequence
PLLab, NTHU,Cs2403 Programming Languages
Scope example proc main var X1, X2; proc A
var X1, X2; end proc B var X1, X2; call A; end call B;end
Proc A reference environment
Static scope: A main
Dynamic scope: A B main
PLLab, NTHU,Cs2403 Programming Languages
Static & dynamic scopeProcedure big;
var x : integer;
procedure sub1;begin
…x… (1)end;
procedure sub2;var x : integer;begin
…x… (2)sub1;
end;
beginsub1; (3)…x… (4)sub2; (5)end
Static scope:(31) x = big’s x(4) x = big’s x(2) x = sub2’s x(51) x = big’s x
Dynamic scope(31) x = big’s x(4) x = big’s x(2) x = sub2’s x(51) x = sub2’s x
PLLab, NTHU,Cs2403 Programming Languages
Static vs. dynamic scope
• Static scope– easier to read– reference statically resolved – fast execution
• Dynamic scope– harder to read– reference dynamically resolved – slow execution
PLLab, NTHU,Cs2403 Programming Languages
Block• Block: a method of creating static scopes inside
program unit• Global reference uses :: operator
(in C/C++)for (…) {
int index; index = temp; ….
temp = ::index;}
PLLab, NTHU,Cs2403 Programming Languages
Lifetime & Scope• Scope and lifetime are closely related but
different concepts
void office( ) { …} /* end of office */
void home( ) { int count; … office( );} /* end of home */
• Example in C– static scope– count scope only in
home( )– count lifetime ends at
the return of office( )
PLLab, NTHU,Cs2403 Programming Languages
Named constants• Named constant: variable that is bound to a
value only when it is bound to storage– value cannot be changed later– e.g. final in Java, const in C or C++– readability & modifiability– different from #define (no storage allocated)– Ex:
(in C++)void students( ) {
const int count = 100; for (i = 0; i < count; i++) total += scores[i];
}
PLLab, NTHU,Cs2403 Programming Languages
Data types
PLLab, NTHU,Cs2403 Programming Languages
Primitive data types
• Those are not defined in terms of other data types
• Numeric types– Integer– Floating-point– Decimal
• Boolean types• Character types
PLLab, NTHU,Cs2403 Programming Languages
Numeric Types
• Integer– Several sizes– Byte, short, int, long…– Range limited
• Floating point– Approximation to model real numbersIEEE float:IEEE double:
exp fraction (23)
exp (11) fraction (52)
PLLab, NTHU,Cs2403 Programming Languages
Numeric Types
• Decimal– For business applications– Store a fixed number of decimal digits (coded)– Advantage: accuracy– Disadvantages: limited range, wastes memory– e.g. 924.1 in an 6-byte data w/ a 4-byte integer
part
00 00 09 24 10 00
PLLab, NTHU,Cs2403 Programming Languages
Boolean & character types
• Boolean– Could be implemented as bits, but often
as bytes– Advantage: readability
• Character– ASCII: one byte per char (0 – 127)– Unicode: 2 bytes per char (multi-
language support)
PLLab, NTHU,Cs2403 Programming Languages
Character String Types
• Values are sequences of characters
• C and C++– Not primitive– Use char arrays and a library of functions
that provide operations
PLLab, NTHU,Cs2403 Programming Languages
User-defined ordinal types• Ordinal type: the range of possible values can be
easily associated with positive integers– enumeration & subrange
• Enumeration type:– (C++)
enum colors {red, blue, yellow};colors mycolor = blue;
– C and C++: are implicitly converted to integermycolor++ assigns green to mycolor
– Readability: code is more meaningful– Reliability: compiler can check operations and ranges
PLLab, NTHU,Cs2403 Programming Languages
Subrange types• A contiguous subsequence of an ordinal type
– (Pascal) type uppercase = ‘A’ .. ‘Z’;
index = 1..100;
• Subrange types are different from derived types (Ada)
type Cash is new INTEGER range 1..100;
subtype Score is INTEGER range 1..100;– type Cash (derived) is not compatible with any
INTEGER type– type Score (subrange) is compatible with any
INTEGER type
PLLab, NTHU,Cs2403 Programming Languages
Array types• An aggregate of homogeneous data elements• Indexing: mapping to an element
– Subscript range checking:• C, C++, FORTRAN: no range checking• Pascal, Ada, and Java: yes
– Subscript type: • FORTRAN, C, Java: integer only• Pascal: any ordinal type (int, char, boolean, enum)
– # of subscripts:• FORTRAN I: 3 (max)• FORTRAN 77: 7 (max)• Others: no limit
PLLab, NTHU,Cs2403 Programming Languages
Array categories (4)• Static array
– storage: statically bound– subscript range: statically
bound– e.g. global arrays in C/C++
• Fixed stack-dynamic array– storage: dynamically bound– subscript range: statically
bound– e.g. local arrays in C/C++
#include <stdio.h>int data[10];main( ) { …}
#include <stdio.h>int data[10];main( ) { int bounds[20]; …}
PLLab, NTHU,Cs2403 Programming Languages
Array categories (4) (con’t)
• Stack-dynamic array– storage: dynamic bound– subscript range: dynamic– only bound once
(Ada) GET(LEN)declare
LIST : array (1..LEN) of integerbegin … end;
• Heap-dynamic array– both: dynamic bound– bound multiple times– e.g. (FORTRAN 90)
INTEGER, ALLOCTABLE, ARRAY (:,:) :: MAT(MAT is 2-dim array)
ALLOCATE(MAT(10, 5))(MAT has 10 rows, 5 cols)
DEALLOCATE MAT(deallocate MAT’s storage)
PLLab, NTHU,Cs2403 Programming Languages
Array categories (4) (con’t)• Heap-dynamic array in
C/C++
sample( ) { int bounds*, A; bound = (int *) malloc(40); A = bound[3]; free(bound); bound = (int *) malloc(80); A = bound[10]; free(bound);}
• Heap-dynamic array in Perl/JavaScript
@home = {“couch”, “chair”, “table”, “stove” };
// $home[0] = “couch”// $home[1] = “chair”// $home[2] = “table”// $home[3] = “stove”
$home[4] = “bed”;
PLLab, NTHU,Cs2403 Programming Languages
Array implementation• One-dim array address calculation
– addr(a[j]) = addr(a[0]) + j * elementSize;– Ex:
int bounds[10]; // one int = 4 bytes&bounds[0] = 1200 &bounds[4] =
• Row-major allocation– addr(a[i,j]) = addr(a[0,0]) + ((i * n) + j) * elementSize;– ex:
a[0, ] = 3 4 7a[1, ] = 6 2 5 3, 4, 7, 6, 2, 5, 1, 3, 8a[2, ] = 1 3 8
&a[0,0] = 1200 &a[1,2] =
1216
1220
PLLab, NTHU,Cs2403 Programming Languages
Column-major allocation
• Column-major allocation– addr(a[i, j]) = addr(a[0,0]) + ((j * n) + i) * elementSize;
– Ex:a[0, ] = 3 4 7
a[1, ] = 6 2 5 3, 6, 1, 4, 2, 3, 7, 5, 8
a[2, ] = 1 3 8
&a[0,0] = 1200 &a[1,2] = 1228
PLLab, NTHU,Cs2403 Programming Languages
Associative arrays• an unordered collection of data elements
that are indexed by an equal # of keys
• example in Perl%salaries = { “John” => 5000, “Tom” => 3500, “Mary” => 6000};
$salaries{“Perry”} = 6500; { add an entry into this array }
tax = $salaries{“John”}; {tax is assigned with 5000}
delete $salaries{“Tom”}; {remove the entry with key “Tom”}
@salaries = (); {remove all entries in this array}
PLLab, NTHU,Cs2403 Programming Languages
Record types• A heterogeneous aggregate of data elements
identified by names• Ex:
(Ada) EMPLOYEE_RECORD :
record EMPLOYEE_NAME : record FIRST : STRING (1..20); LAST : STRING (1..20); end record; SALARY : FLOAT;end record
PLLab, NTHU,Cs2403 Programming Languages
Record types (con’t)
• C/C++: struct• Field reference
– COBOL: field OF Name1 OF … OF NameN– Others: Name1.Name2. … .NameN.field– Pascal and Module-2 provide with clausewith employee do begin
name := ‘Bob’; age := 42; end
• Implementation: field are stored in adjacent memory, and accessed through offset address
PLLab, NTHU,Cs2403 Programming Languages
Array vs. Record• Array: same type of data• int data[100];• A = data[35];• &data[35] =
data 1200
size 4
lower 0
upper 0
• Record: diff type of data• struct {
char name[32]; int age;
float salary; } dd;• A = dd.age;• &dd.age =
dd 1600
name 0
age 32
salary 36
1340
1632
PLLab, NTHU,Cs2403 Programming Languages
Union types• Whose variables are
allowed to store diff type values at diff time– FORTRAN:
EQUIVALENCE– C and C++: union (free
union: free from type checking)
typedef union { float A; short B; } Sample;Sample data;
main( ) { float R1 = 3.231; short N1; data.A = R1; N1 = data.B; // N1 = error number}
data B
A
PLLab, NTHU,Cs2403 Programming Languages
Pointer type• Usage: indirect addressing & dynamic storage
management• Operator: * (C, C++), access (Ada), ^ (Pascal)• Problems:
– Dangling pointer: points to a heap-dynamic var that has been deallocated
– Memory leakage (aka lost heap-dynamic var): an allocated var that is no longer accessible
– These 2 problems cause a large part of bugs in software
int *p1, *p2;p1 = (int) malloc(sizeof(int));p2 = p1;delete p1;value = *p2;
int *p1, *p2;p1 = (int) malloc(sizeof(int));p2 = (int) malloc(sizeof(int));p1 = (int) malloc(sizeof(int));delete p1;
PLLab, NTHU,Cs2403 Programming Languages
Solution to pointer problems• Tombstone approach
– Each dynamic storage has a special cell, called tombstone, that points to this storage
– Pointer(s) point to the tombstone– Deallocate: set tombstone to NULL– Disads: cost in time & space as tombstone is never
deallocated and reused
dynamicvariable
Heap
dynamicvariable
Heap
tombstone
0
dangling pointers
PLLab, NTHU,Cs2403 Programming Languages
Solution to pointer problems
• Locks-and-keys approach– Pointer value represented by (key,
address)– Dynamic storage represented by (lock,
storage)– Each access will first check if (key == lock),
unmatched pair causes run-time error.– Deallocation: lock invalid value
PLLab, NTHU,Cs2403 Programming Languages
Heap management for reclaiming garbage
• Reference counters: – maintaining a counter in every cell– reclaim if (counter == 0)– Space required is significant– Execution time overhead for mantaining the counter
• Garbage collection: – Run-time system allocated storage – Garbage collection if no available cells– 3 phases:
• Invalidate all cell• Trace pointer to mark valid ones• Sweep the invalid to available list