Data Types and the Type System

James Brucker

Important Topics

what is a "type system"?
what are common data types?
how are numeric types stored and operated on?
compound types: arrays, structs, records
enumerated types
character and string types
strong type checking versus not-so-strong
enumerations, type compatibility, and type safety
advantages & disadvantages of compile-time checking

Important Topics

type compatibility
  what are the compatibility rules in C, C++, and Java?
  when are user-defined types compatible?

type conversion
  what conversions are automatic in C, C++, and Java?
  what conversions are allowed using a cast?

Importance of knowing data types

Need to know the valid range of data values.

for(int k=0; k<9999999; k++) /* do something */;

Need to know rules for operations.

int k = Integer.MAX_VALUE; // = 2,147,483,647
k = k + 1;                 // overflow?
int m = 7, n = 4;
float x = m / n;           // 1.75, 2.0 or 1.0?

Need to know what assignments are valid and how the compiler will convert from one type to another.

Need to know what variables represent: value of data or a reference to a storage location.
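The two questions posed in the code above can be checked directly. The following is a minimal Java sketch; the class and method names (RangeDemo, increment, divide) are illustrative and not from the slides. It shows that int addition wraps around on overflow and that m / n is evaluated as an integer division before the result is converted to float.

```java
public class RangeDemo {
    // Adds 1 using int arithmetic; Java wraps around silently on overflow.
    static int increment(int k) {
        return k + 1;
    }

    // Both operands are int, so the division truncates before widening to float.
    static float divide(int m, int n) {
        return m / n;
    }

    public static void main(String[] args) {
        System.out.println(increment(Integer.MAX_VALUE)); // prints -2147483648
        System.out.println(divide(7, 4));                 // prints 1.0
    }
}
```

In C, the same overflow on a signed int is undefined behavior, so the wrap-around shown here should not be relied on there.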

Data Type

A data type is a set of possible values and operations on those values

Example: int

set of values: -2,147,483,648, ..., 0, 1, ..., 2,147,483,647

( -2^31 to 2^31 - 1 )

operations:

+ : int x int -> int
* : int x int -> int
etc.

internal representation:

32-bit 2's complement

Data Types define meaning

To the computer, a stored value is just bits. The data type assigns meaning to those bits.

Example C function:

#include <stdio.h>

/* "An int is an unsigned is a float" --the cpu */
void rawdata( ) {
    union {
        int i;
        unsigned int u;
        float f;
    } x;

    while( 1 ) {
        printf("Input a value: ");
        if ( scanf("%d", &x.i) != 1 ) break; // read as an int; stop on bad input
        printf("int %d is unsigned %u is float %g\n", x.i, x.u, x.f);
    }
}
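Java has no union, but the same idea (one bit pattern, several meanings) can be sketched with library calls that reinterpret bits. This is an illustrative example only; the class and method names (BitsDemo, asFloat, asUnsigned) are assumptions, while Float.intBitsToFloat and Integer.toUnsignedLong are standard library methods.

```java
public class BitsDemo {
    // Interprets a 32-bit pattern as an IEEE 754 single-precision value.
    static float asFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }

    // Interprets the same 32 bits as an unsigned integer.
    static long asUnsigned(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    public static void main(String[] args) {
        int bits = 0x40200000;
        System.out.println(bits);             // prints 1075838976
        System.out.println(asUnsigned(bits)); // prints 1075838976
        System.out.println(asFloat(bits));    // prints 2.5
        System.out.println(asUnsigned(-1));   // prints 4294967295
    }
}
```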

Type System

The type system is
  the collection of all data types
  the rules for type equivalence, type compatibility, and type conversion between data types

Type System

Example: int n = 0.5 * 99;
  cannot directly multiply a floating point value by an int
  type conversion rule: "int" can be automatically converted to floating point
  type system says floating point * floating point is floating point (49.5)
  in C, the assignment compatibility rule says that you can convert the floating point result back to int by truncation (n = 49)
  in Java or C#, the result is a double and it is not assignment compatible with int (assignment error)

Memory Concepts

We will cover memory management later, but first...

When the OS runs a program it allocates at least 2 memory segments:
  text segment for program instructions (read only)
  data segment for data (variables, constants, ...)

The data segment is divided into 3 parts:
  static area - static data
  stack area - stack oriented data
  heap - dynamic, non-stack data

Memory Concepts

#include <stdio.h>
#include <stdlib.h>

int count = 0;
const int MAXSIZE = 4000;

int *getarray(int n) {
    int *a = (int *)malloc( n*sizeof(int) );
    return a;
}

int main( ) {
    int size;
    scanf("%d", &size);
    int *a = getarray(size);
    scanf("%d", a);   // read a value into a[0]
    a++;              // a now points to a[1]
}

Where each item is stored:

Static Area:   count, MAXSIZE
Stack Area:    stack frame for main: size, a (pointer only)
               stack frame for getarray
               unused space
Heap Area:     a[0], a[1], ...

Virtual Memory

Most operating systems use virtual memory.
The actual location of memory pages varies.
Accessing memory efficiently affects program speed.

[Figure: pages n and n+1 of the program's virtual memory are mapped by the memory manager to pages in real memory.]

Integer Data Types

C/C++ support both "unsigned" and "signed" integer types. The sizes below are typical of a 32-bit compiler; the C standard only guarantees minimum ranges.

Type             # Bytes   Range of values
short int        2         -32,768 (-2^15) to 32,767 (2^15 - 1)
unsigned short   2         0 to 65,535 (2^16 - 1)
int              4         -2,147,483,648 (-2^31) to 2,147,483,647 (2^31 - 1)
unsigned int     4         0 to 4,294,967,295 (2^32 - 1)
long int         same as "int" on Pentium and Athlon CPU

C permits the "char" type for integer values, too:

Type             # Bytes   Range of values
char             1         -128 to 127
unsigned char    1         0 to 255

Example use of unsigned int

To display the address of a variable in C:

printf("address of %s is %u\n", "x", (unsigned int)&x); // on 64-bit systems use "%p" with (void *)&x instead

IEEE 754 Floating Point Standard

Problem: some numerical algorithms would run on one computer, but fail with an arithmetic overflow/underflow error on another.

Worse problem: results from different computers could differ greatly!
This reduced trust in the answers from computers.
In fact, when numerical results differ greatly it usually indicates a problem in the algorithm!

Solution: IEEE 754 (1985) defines a standard for computer storage of floating point numbers.

IEEE Floating Point Data Types

-1.011100 x 2^11 is stored as:

Sign bit   Biased Exponent   Mantissa
1          10001010          0111000 . . . 0

           Sign bit   Biased Exponent         Mantissa
Float:     1          8 bits, bias = 127      23 bits
Double:    1          11 bits, bias = 1023    52 bits

           Range                   Precision
Float:     10^-38 to 10^+38        24 bits =~ 7 dec. digits
Double:    10^-308 to 10^+308      53 bits =~ 15 dec. digits

Stored exponent = actual exponent + bias

Implicit Leading "1"

Floating point numbers are stored in normalized form:

13.3125 = 1101.0101 (binary) = 1.1010101 x 2^3

3/16 = 0.00110000 (binary) = 1.1000000 x 2^-3

Normalized form: the leading digit is always one. So, IEEE 754 doesn't store it.

Rule: if the exponent of the value is in the range 1-bias to +bias (-126 to +127 for single precision), then the floating point value is stored in normalized format:

1011.01110 = 1.01101110 x 2^3

mantissa: 011011100...

exponent: 3+bias = 130 (single prec)
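The stored fields of the last example can be inspected in Java. This is a sketch under the assumption that the value is held in a float; the class and method names (FloatFields, sign, exponent, mantissa) are illustrative, and Float.floatToIntBits is the standard call that exposes the raw bits.

```java
public class FloatFields {
    // Bit 31 is the sign bit.
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Bits 30..23 hold the biased exponent (bias = 127).
    static int exponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    // Bits 22..0 hold the mantissa without the implicit leading 1.
    static int mantissa(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float f = 11.4375f; // 1011.0111 in binary = 1.0110111 x 2^3
        System.out.println(sign(f));     // prints 0
        System.out.println(exponent(f)); // prints 130
        // Pad to 23 digits so the leading zero of the mantissa is visible.
        String bits = String.format("%23s", Integer.toBinaryString(mantissa(f))).replace(' ', '0');
        System.out.println(bits);        // prints 01101110000000000000000
    }
}
```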

Gradual underflow

To extend the precision for small numbers, very small numbers are not stored in normalized form.

In this case the leading "1" is also stored and the biased exponent has value 0 (smallest exponent)

Value                  Mantissa                  Biased Exp.
1.01101110 x 2^-126    01101110000000000000000   -126+bias = 1
1.01101110 x 2^-127    10110111000000000000000   -127+bias = 0
1.01101110 x 2^-128    01011011100000000000000   -127+bias = 0
1.01101110 x 2^-129    00101101110000000000000   -127+bias = 0
1.01101110 x 2^-130    00010110111000000000000   -127+bias = 0
... as the number gets smaller, the leading significant digits shift right
1.01101110 x 2^-147    00000000000000000000101   -127+bias = 0
1.01101110 x 2^-148    00000000000000000000010   -127+bias = 0
1.01101110 x 2^-149    00000000000000000000001   -127+bias = 0
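The boundary between normalized and denormalized values can be observed in Java, which exposes both limits as constants. This is an illustrative sketch; the class and helper names (SubnormalDemo, biasedExponent, mantissa) are assumptions, while Float.MIN_NORMAL (2^-126) and Float.MIN_VALUE (2^-149) are standard constants.

```java
public class SubnormalDemo {
    // Bits 30..23 of the stored float.
    static int biasedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    // Bits 22..0 of the stored float.
    static int mantissa(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float smallestNormal = Float.MIN_NORMAL; // 2^-126
        float half = smallestNormal / 2;         // 2^-127, stored denormalized

        System.out.println(biasedExponent(smallestNormal)); // prints 1
        System.out.println(biasedExponent(half));           // prints 0
        System.out.println(mantissa(half) == 0x400000);     // prints true: the leading 1 is now stored
        System.out.println(mantissa(Float.MIN_VALUE));      // prints 1
        System.out.println(Float.MIN_VALUE / 2);            // prints 0.0 (underflow to zero)
    }
}
```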

IEEE 754 Floating Point Values

The standard defines special values:
  +/-Infinity: 1.0/0 = +Infinity, -3.0/0 = -Infinity, exp(5000) = +Infinity, Infinity+Infinity = Infinity
  NaN (Not-a-Number): 0.0/0 = NaN, Infinity*0 = NaN, ...

Value             Sign Bit   Exponent       Mantissa
Normalized f.p.   0, 1       1 to 2*bias    any
Denormalized      0, 1       00000000       any non-0
Zero              0, 1       00000000       0
+Infinity         0          11111111       0
-Infinity         1          11111111       0
NaN               0, 1       11111111       any non-0
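The special values listed above can be produced with ordinary double arithmetic in Java. The sketch below uses illustrative names (SpecialValues, divide); note that it relies on floating point operands, since integer division by zero in Java throws an ArithmeticException instead.

```java
public class SpecialValues {
    // Floating point division never throws; it yields Infinity or NaN.
    static double divide(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        System.out.println(divide(1.0, 0.0));   // prints Infinity
        System.out.println(divide(-3.0, 0.0));  // prints -Infinity
        System.out.println(Math.exp(5000));     // prints Infinity

        double nan = divide(0.0, 0.0);
        System.out.println(nan);                // prints NaN
        System.out.println(nan == nan);         // prints false: NaN is not equal to anything
        System.out.println(Double.isNaN(nan));  // prints true
    }
}
```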

Floating Point Questions

Question: How do you store 2.50 as a "float"?

2.50 = 1.25 x 2 = 1.01000000000 x 2^1

Implicit leading 1 rule: mantissa = 01000000000000000000000

Exponent: 1 + bias = 128

Stored value: 0 10000000 01000000000000000000000

Question: What is the decimal value of:

1 10000000 10000000000000000000000

0 00000000 00000000000000000000000

0 11111111 00000000000000000000000

Floating Point Questions (cont'd)

Question: How do you store 0.1 as a "float"?

0.1 = 0.000110011001100110011 ... (binary) = 1.100110011001100 ... x 2^-4

Normalized mantissa = 10011001100110011001101 (rounded to nearest)

Exponent: -4 + bias = 123

Stored val: 0 01111011 10011001100110011001101

0.1 has no exact representation in binary!

Question: what decimal values have an exact binary representation (no truncation error)???

Consequence of inexact conversion

0.1 does not have an exact binary representation. Therefore, we may have:

10 * 0.1 != 1.0
0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 != 1.0

Don't use "==" as a test criterion in loops with floats.

This loop never terminates:

double x = 0.1;

while( x != 1.0 ) { // better: ( x <= 1.0 )

System.out.println( x );

x = x + 0.1;

}
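The loop above misses 1.0 because the accumulated sum is slightly less than 1.0 after ten additions. The following Java sketch shows the value and two common alternatives: comparing with a tolerance and driving the loop with an integer counter. The names (FloatCompare, sumTenths, nearlyEqual) and the tolerance 1e-9 are illustrative assumptions, not part of the slides.

```java
public class FloatCompare {
    // Adds 0.1 ten times using double arithmetic.
    static double sumTenths() {
        double x = 0.0;
        for (int k = 0; k < 10; k++) {
            x += 0.1;
        }
        return x;
    }

    // Compares two doubles using an absolute tolerance chosen by the caller.
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        double sum = sumTenths();
        System.out.println(sum);                         // prints 0.9999999999999999
        System.out.println(sum == 1.0);                  // prints false
        System.out.println(nearlyEqual(sum, 1.0, 1e-9)); // prints true

        // An integer counter always terminates; the float is derived from it.
        for (int k = 1; k <= 10; k++) {
            System.out.println(k / 10.0);
        }
    }
}
```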

Type Compatibility for built-in types

Operations in most languages will automatically convert ("promote") some data types:

2 * 1.75    convert 2 (int) to floating point

Assignment compatibility: what automatic type conversions are allowed on assignment?

int n = 1234567890;
float x = n; // OK in C or Java
n = x;       // allowed in C? Java?

char -> short -> int -> long -> double
short -> int -> float -> double

What about long -> float ?

Rules for C/C++ are not the same as for Java.
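For Java, the questions above can be answered with a short experiment. In this sketch (names WideningDemo, roundTrip, and fromLong are illustrative), int to float and long to float are accepted without a cast even though precision can be lost, while float to int compiles only with an explicit cast.

```java
public class WideningDemo {
    // int -> float is implicit; float -> int needs an explicit cast in Java.
    static int roundTrip(int n) {
        float x = n;      // allowed, but a float has only 24 bits of precision
        return (int) x;   // without the cast this line would not compile
    }

    // long -> float is also an implicit (widening) conversion in Java.
    static float fromLong(long v) {
        return v;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1234567890)); // prints 1234567936
        System.out.println(roundTrip(16777216));   // prints 16777216 (2^24 is exact)
        System.out.println(roundTrip(16777217));   // prints 16777216 (2^24 + 1 is not)
        System.out.println(fromLong(1L << 40) == 1099511627776.0f); // prints true
    }
}
```

In C, the assignment n = x compiles without a cast and truncates the value.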

C/C++ Arithmetic Type Conversion

For +, -, *, /, both operands must be the same type.
The C/C++ compiler "promotes" mixed type operands to make all operands the same type using the following rules:

Operand Types      Promote            Result
short op int       short => int       int
long op int        int => long        long
int op float       int => float       float
int op double      int => double      double
float op double    float => double    double
etc...

"op" is any arithmetic operation: + - * /

Assignment Type Conversion is not Arithmetic Type Conversion (1)

What is the result of this calculation?

int m = 15;

int n = 16;

double x = m / n;

Forcing Type Conversion

Since the arguments are integers, integer division is used:

double x = 15 / 16; // = 0 !

You must coerce the "int" values to floating point.

There are two ways:

int m = 15;

int n = 16;

/** Efficient way: cast as a double */

double x = (double)m / (double)n ;

/** Clumsy way: multiply by a floating point constant (a la Fortran) */

double y = 1.0*m / n;

Assignment Type Conversion is not Arithmetic Type Conversion (2)

Many students wrote this in Fraction program:

public class Fraction {

int numerator; // numerator of the fraction

int denominator; // denominator of the fraction

...etc...

/** compare this fraction to another. */

public int compareTo( Fraction frac ) {

double r1 = this.numerator / this.denominator;

double r2 = frac.numerator / frac.denominator;

if ( r1 > r2 ) return 1;

else if ( r1 == r2 ) return 0;

else return -1;

}
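In the compareTo method above, each division is done in integer arithmetic and only the truncated result is converted to double, so 1/2 and 1/3 both become 0.0. One way to avoid division entirely is to cross-multiply. The class below is a hypothetical sketch (SafeFraction is not the course's Fraction class) and assumes denominators are positive, which the constructor enforces.

```java
public class SafeFraction implements Comparable<SafeFraction> {
    private final int numerator;
    private final int denominator; // assumed positive

    public SafeFraction(int numerator, int denominator) {
        if (denominator <= 0) {
            throw new IllegalArgumentException("denominator must be positive");
        }
        this.numerator = numerator;
        this.denominator = denominator;
    }

    /** Compare a/b with c/d by comparing a*d with c*b, using long to avoid overflow. */
    @Override
    public int compareTo(SafeFraction frac) {
        long left = (long) this.numerator * frac.denominator;
        long right = (long) frac.numerator * this.denominator;
        if (left > right) return 1;
        else if (left == right) return 0;
        else return -1;
    }

    public static void main(String[] args) {
        System.out.println(new SafeFraction(1, 2).compareTo(new SafeFraction(1, 3)));   // prints 1
        System.out.println(new SafeFraction(1, 2).compareTo(new SafeFraction(2, 4)));   // prints 0
        System.out.println(new SafeFraction(15, 16).compareTo(new SafeFraction(1, 1))); // prints -1
    }
}
```

Casting one operand of each division to double would also remove the truncation, but exact comparison with == on the resulting doubles is less robust than the integer comparison used here.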

Arrays

An array is a series of elements of the same type, with an index, which occupy consecutive memory locations.

float x[10]; // C: array of 10 “float” vars

char [] c = new char[40]; // Java: array of 40 "char"

Array x[ ] in memory:   x[0] x[1] x[2] . . . x[9]     (each element is 4 bytes = sizeof(float))

Array c[ ] in memory:   c[0] c[1] . . . c[39]

Array "dope vector"

In C or Fortran an array is just a set of contiguous elements. No type or length information is stored.

Some languages store a "dope vector" (aka array descriptor) describing the array.

/* C language */
double x[10];

x -> 01E4820 -> x[0] x[1] x[2] x[3] ...

/* Language with dope */
double x[10];

x -> [ double | 0 | 10 | 01E4820 ] -> x[0] x[1] x[2] x[3] ... x[9]

Array as Object

In Java, arrays are objects:

double [ ] x = new double[10];

x is an Object; x[0] is a double (primitive type).

x -> [ double[ ] | +length = 10 ] -> x[0] x[1] x[2] ...

x.getClass( ).getName( ) returns "[D"

1-Dimensional Arrays

An element of a 1-D array is computed as an offset from the start:

float f[20];
address of f[n] = address(f) + n*sizeof(float)

Some languages permit arbitrary index bounds:

Pascal:
var a: array [ 2..5 ] of real;

FORTRAN:
REAL X(100)    array is X(1) ... X(100)
REAL Y(2:5)    array is Y(2) ... Y(5)

In any case, the array element can be computed as an offset:

address of a[n] = address(a) + (n-start)*sizeof(real)

2-Dimensional Arrays

There are different organizations of 2-D arrays:

Rectangular array in row major order:

float r[4,3];

In memory (row major order):

r[0,0] r[0,1] r[0,2] r[1,0] r[1,1] r[1,2] r[2,0] ...

Rectangular array in column major order (Fortran):

REAL X(4,3)

In memory (column major order):

x(1,1) x(2,1) x(3,1) x(4,1) x(1,2) x(2,2) x(3,2) x(4,2) x(1,3) ...

2-Dimensional Arrays

Computing the address of array elements

Rectangular arrays in row major order:

float x[ROWS][COLS];

address of x[j][k] = address(x) + (j*COLS + k) * sizeof(float)

Three dimensional array:

float y[J][K][L];

address of y[j][k][l] = address(y) + (j*K*L + k*L + l) * sizeof(float)

2-D and 3-D arrays require more time to access due to this calculation.
The compiler can optimize when you access consecutive items:

for(k = 0; k<COLS; k++) sum += x[j][k];
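The offset formulas above can be exercised by storing a rectangular array in a flat 1-D array and computing the element index by hand. This Java sketch is illustrative only; the names (RowMajor, offset2D, offset3D) are assumptions, and the offsets are in elements rather than bytes, so the sizeof factor does not appear.

```java
public class RowMajor {
    // Element offset of x[j][k] in a row-major array with `cols` columns.
    static int offset2D(int j, int k, int cols) {
        return j * cols + k;
    }

    // Element offset of y[j][k][l] in a row-major array of dimensions J x K x L.
    // (j*K + k)*L + l is the same value as j*K*L + k*L + l.
    static int offset3D(int j, int k, int l, int K, int L) {
        return (j * K + k) * L + l;
    }

    public static void main(String[] args) {
        final int ROWS = 4, COLS = 3;
        float[] flat = new float[ROWS * COLS];

        // Fill so that element (j,k) holds the value 10*j + k.
        for (int j = 0; j < ROWS; j++) {
            for (int k = 0; k < COLS; k++) {
                flat[offset2D(j, k, COLS)] = 10 * j + k;
            }
        }
        System.out.println(offset2D(2, 1, COLS));       // prints 7
        System.out.println(flat[offset2D(2, 1, COLS)]); // prints 21.0
        System.out.println(offset3D(1, 2, 3, 4, 5));    // prints 33
    }
}
```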

Arrays of Pointers: ragged arrays

Each element of a vector is a pointer to a vector:

char *days[7] = { "Sunday", "Monday",
    "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday" };

days =
days[0] -> S u n d a y \0
days[1] -> M o n d a y \0
days[2] -> T u e s d a y \0
days[3] -> W e d n e s d a y \0
days[4] -> T h u r s d a y \0
days[5] -> F r i d a y \0
days[6] -> S a t u r d a y \0

Vector of pointers: 7 x 4 bytes = 28 bytes

Array of characters: = 57 bytes

Arrays of Pointers: ragged arrays (2)

Compare the previous slide with a 2-D array:

char days[ ][10] = { "Sunday", "Monday",
    "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday" };

days =
S u n d a y \0
M o n d a y \0
T u e s d a y \0
W e d n e s d a y \0
T h u r s d a y \0
F r i d a y \0
S a t u r d a y \0

(each row occupies 10 bytes; unused bytes follow the \0)

2-D array = 7 x 10 bytes = 70 bytes

What is sizeof( ) for 2-D arrays?

Rectangular array in C:

char days[7][10] = { "Sunday", "Monday", ... };

int m = sizeof( days );

int n = sizeof( days[0] );

Array of pointers in C:

char *days[7] = { "Sunday", "Monday", ... };

int m = sizeof( days );

int n = sizeof( days[0] );

Java: always uses array of pointers

2-D arrays in Java are always treated as array of pointers

final int N = 10;

double [][] a;

a = new double[N][ ]; // create row pointers

for(int k=0; k<N; k++)

a[k] = new double[k+1]; // create columns

// array dimensions determined by initial values

int [][] m = { { 1, 2, 3, 4 },
               { 5, 6 },
               { 8, 9, 10 },
               { 11 }
             };

What are the sizes of each row of m ?
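Because each row of a Java 2-D array is a separate array object, the question can be answered by asking each row for its length. A small sketch, with illustrative names (RaggedDemo, rowLengths):

```java
import java.util.Arrays;

public class RaggedDemo {
    // Returns the length of each row; rows of a Java 2-D array may differ in size.
    static int[] rowLengths(int[][] m) {
        int[] lengths = new int[m.length];
        for (int r = 0; r < m.length; r++) {
            lengths[r] = m[r].length;
        }
        return lengths;
    }

    public static void main(String[] args) {
        int[][] m = { { 1, 2, 3, 4 },
                      { 5, 6 },
                      { 8, 9, 10 },
                      { 11 } };
        System.out.println(m.length);                      // prints 4 (number of rows)
        System.out.println(Arrays.toString(rowLengths(m))); // prints [4, 2, 3, 1]
    }
}
```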

C#: rectangular and ragged arrays

A rectangular array in C# (one set of brackets):

const int N1 = 10, N2 = 20, N3 = 25;

// 2-dimensional array
double [,] a = new double[N1,N2];

// 3-dimensional array
double [,,] a3 = new double[N1,N2,N3];

A ragged array in C# or Java uses multiple brackets:

// create array of row pointers
double [][] b = new double[N1][ ];

// allocate space for each row (can differ)
for (int k=0; k<N1; k++) b[k] = new double[N2];

In Java (but not C#) you can write: b = new double[N1][N2]

Accessing Array Elements

Ragged arrays require multiple levels of dereferencing:

result = b[i][j];

1. get address of b.
2. get b[i]: _addr = valueat( address(b) + i*sizeof( b[ ] ) )
3. result = valueat( _addr + j*sizeof( b[ ][ ] ) )

Rectangular array computes address as offset:

double [ , ] b = new double[N1,N2];
result = b[i,j];

1. get address of b.
2. result = valueat( address(b) + (i*N2 + j)*sizeof(double) )

In Java and C#, arrays are objects, so the address is not this simple.

Efficiency and multi-dimensional arrays

Multi-dimensional array access can be slower than 1-D array access.
Access in row order is more efficient, and can minimize paging.

// search a[ROWS,COLS] in row major order
for(int r=0; r<ROWS; r++)
    for (int c=0; c<COLS; c++)
        if ( a[r,c] > max ) max = a[r,c];

Memory layout: a[0,0] a[0,1] ... a[0,COLS-1] a[1,0] a[1,1] ...   (elements are visited in the order they are stored)

// search a[ROWS,COLS] in column major order
for(int c=0; c<COLS; c++)
    for (int r=0; r<ROWS; r++)
        if ( a[r,c] > max ) max = a[r,c];

Memory layout: a[0,0] a[0,1] ... a[0,COLS-1] a[1,0] a[1,1] ...   (each access jumps ahead by COLS elements)

Type Checking

Verifying that the actual value of an expression is valid for the type to which it is assigned.

A strongly typed language is one in which all type errors are detected at compile time or run time.

Example: Java is strongly typed: most type errors are detected by the compiler. Others, like casts, are checked at run time and generate exceptions:

Object obj = new Double(2.5);

String s = obj; // compile time error

String s = (String) obj; // run-time ClassCastException
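The run-time check on the cast can be seen by catching the exception, and it can be avoided by testing the type first with instanceof. This is a minimal sketch; the names (CastDemo, castFails, describe) are illustrative.

```java
public class CastDemo {
    // Returns true if casting obj to String throws ClassCastException at run time.
    static boolean castFails(Object obj) {
        try {
            String s = (String) obj; // compiles, but is checked when executed
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Tests the run-time type before casting, so no exception can occur.
    static String describe(Object obj) {
        if (obj instanceof String) {
            return "String: " + (String) obj;
        }
        return "not a String: " + obj;
    }

    public static void main(String[] args) {
        Object obj = Double.valueOf(2.5);
        System.out.println(castFails(obj));     // prints true
        System.out.println(castFails("hello")); // prints false
        System.out.println(describe(obj));      // prints not a String: 2.5
    }
}
```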

Type Compatibility for user types

typedef int type_a;

typedef int type_b;

int main( ) {

type_a a;

type_b b;

b = 5; // assign integer to "type_b" variable OK?

a = b; // assign "type_b" to "type_a" variable OK?

}

In C, "typedef" defines an alias for a type -- it doesn't create a new type.

Type Compatibility for user types (2)

struct A {

float x;

char c;

};

struct B {

float x;

char c;

};

struct C {

float z;

char c;

};

int main( ) {

struct A a;

struct B b;

struct C c;

a.x = 0.5;

a.c = 'a';

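b = a; // OK under structural equivalence (same members), but an error in C: struct A and struct B are distinct types

c = a; // Error

if (b == a) // OK? (C does not define == for structs)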

Type Compatibility for classes (3)

public class A {

public float x;

public char c;

}

public class B {

public float x;

public char c;

}

public class C {

public float z;

public char c;

}

public static void main(...) {

A a = new A();

B b;

C c;

a.x = 0.5f;

a.c = 'a';

b = a; // Error

c = a; // Error

Type Conversion and Polymorphism

int max( int a, int b) { if ( a > b ) return a; else return b; }
float max( float a, float b) { if ( a > b ) return a; else return b; }

int main( ) {
    int m, n;
    float x, y, z;
    x = 5.5;
    m = x;          // OK to convert float to int
    z = max(x, y);  // EASY! call max(float,float)
    n = max(m, x);  // which max function?
    y = max(x, m);  // which max function?
}
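The answer to "which max function?" depends on the language. In Java, float to int is never applied implicitly to an argument, so only max(float,float) is applicable to a mixed call. The sketch below makes the chosen overload visible by returning its signature as a string; the names (OverloadDemo, whichMax) are illustrative. A C++ compiler would likely treat the same mixed calls differently (typically rejecting them as ambiguous), which is stated here as an expectation, not something this example demonstrates.

```java
public class OverloadDemo {
    // Each overload reports its own signature so the selected one can be observed.
    static String whichMax(int a, int b) {
        return "max(int,int)";
    }

    static String whichMax(float a, float b) {
        return "max(float,float)";
    }

    public static void main(String[] args) {
        int m = 3;
        float x = 5.5f;
        System.out.println(whichMax(m, m)); // prints max(int,int)
        System.out.println(whichMax(x, x)); // prints max(float,float)
        System.out.println(whichMax(m, x)); // prints max(float,float): m is widened to float
        System.out.println(whichMax(x, m)); // prints max(float,float)
    }
}
```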

Explicit Polymorphism in C++

/* This template generates "max" functions of
 * any parameter type that the program needs. */
template <typename T>
T max( T a, T b ) { if ( a > b ) return a; else return b; }

int main( ) {
    int m = 4, n = 9;
    float x = 0.5, y = 2.7;

    n = max(m, n);  // generate max(int, int)
    y = max(x, y);  // generate max(float, float)
}