Basic data types and their representation

CS101 2012.1

Announcements

• If biometric ID does not work, write your roll number and sign your name on a piece of paper
• All lab batches should be stable now
• Lab this week will continue with familiarization and small programs
• No tutorials this week either
• But we will begin posting homework on Moodle
  • Will give you an impression of exam questions
  • Will not be graded
  • Will be discussed in tutorials

Layers of abstraction

Three layers, from the bottom up:

• Memory as array of bytes
• Primitive data types: character, integer, float, double
  • Implement fixed size primitive types by mapping possible/supported values to bit patterns
• Collection types: arrays, matrices, lists, maps, strings
  • Add collection types on top of primitive types to assist writing complicated programs
  • Collections usually change sizes and memory layout during program execution

Memory, values, variables

• Unit of storage: bit (0/1)
  • Because such computers are easier to implement by switching transistors off and on
• A byte is 8 bits wide
  • Values range from 00000000 to 11111111
  • 2^8 = 256 possible bit configurations
  • Can be interpreted as integers from 0 to 255 ("unsigned char")
  • Electronic and magnetic memory is allocated in units of bytes

Binary arithmetic

• Byte value in binary: 00000000 (8 bits)
• Corresponding decimal value = 0
  • Written as 0dec to avoid confusion
• In decimal, to increment a number, increment the unit position, unless there is overflow, in which case carry over… etc.
• Same in binary
• Next few values are 00000001 (=1dec), 00000010 (=2dec), 00000011 (=3dec), 00000100 (=4dec), 00000101 (=5dec) etc.

Character (char)

• To a first approximation, a character is the same as an 8-bit byte
  • (More recently, multi-byte characters have been designed to support all the world's languages)
• The key difference is in how the byte is interpreted and processed (e.g., printed)
• E.g., 97 means 'a', 98='b', 65='A', 66='B' etc.
• C++ lets you compare characters using the corresponding integer
  • Useful for sorting strings in dictionary order

Hexadecimal notation (hex)

• Byte (8 bits) consists of two "nibbles" (4 bits)
• Nibble ranges between 0 and 15
• Expressed in hexadecimal, 0 to 9, a to f
  • a=10, b=11, c=12, d=13, e=14, f=15
• So a byte is written as two hexadecimal digits, e.g. 0a or c5
• Note that 23 hex is not 23 decimal!
  • To make clear, written as 0x23
• printf demo

Fixed size integer types

• "Short integers" (short) are 16 bits wide
  • 65536 possible values
• Standard integers (int) are 32 bits wide
  • 4,294,967,296 possible values
  • Adequate for most purposes except governments bailing out banks and airlines
• A long long int is 64 bits wide
  • Will sometimes call it long for brevity (as in Java)
• Real numbers are represented using float and double ("double precision") … later

Two's complement representation

• Want to represent both positive and negative integers with a bit sequence (say 4 bits)
• Trivial: use one bit for sign
  • Waste one configuration (plus and minus zero)
• 0000 (0) through 0111 (7) are non-negative
• 8 more values, so assign to -8 through -1

Binary  Decimal    Binary  Decimal
1000    -8         1100    -4
1001    -7         1101    -3
1010    -6         1110    -2
1011    -5         1111    -1

The wrap-around

• Zero is one position to the right of center
• Min = 10…0, -1 = 1…1, 0 = 0…0, Max = 01…1

Two's complement, continued

• One sudden "wrap-around" from 7 to -8
• Works exactly the same for short, int and long int, with corresponding wrap, max, min values
• Most programming systems will not detect if the wrap happens
• If your program uses values near the edges, be careful in doing arithmetic and check the result!
• Library packages exist to support arbitrarily large integers, not as efficient as fixed length

Real number representations

• "Floating (decimal) point"
• In decimal we write 0.314 × 10^11
  • 0.314 is the mantissa, 11 is the exponent
  • Mantissa has decimal point at beginning
• Same approach in computers, with radix 2 instead of 10
• In a float
  • 1 sign, 8 exponent, 23 mantissa bits
• In a double
  • 1 sign, 11 exponent, 52 mantissa bits

Floating point numbers

Type    Costs how many bits to store   Magnitude of maximum value   Magnitude of minimum value
float   32                             3.4 × 10^38                  1.4 × 10^-45
double  64                             1.798 × 10^308               4.9 × 10^-324

• Finite bits cannot represent all real values
• Gaps between numbers that can be represented
• Need care in writing expressions that combine values to avoid errors, minimize loss of precision

Some finite precision pitfalls

• Some 32- and 64-bit patterns have been set aside to represent
  • Positive and negative infinity
  • Not a number or NaN (e.g. result of 0.0/0.0)
• Most systems will detect overflow but not underflow
• float a = 3.3e38 / 0.01; correctly results in a being "inf"
• But 3.3e38 + 5 silently equals 3.3e38 (not enough bits in mantissa)

Operations on numeric types

• All integers support +, -, *, /, % (remainder)
• Even characters support + and -
  • E.g., 'a' + 1 = 'b'; what is 'Z'+1? (Try it)
• Float and double support +, -, *, /
• More complicated operations like log, exp, sine, etc. are implemented as functions
• You can compare numbers using comparison operators <, <=, ==, >=, >, !=
  • The result is a Boolean (0/1) value (next)
  • cout << (5 > 7);
  • cout << (4 != 3);

Boolean values and operations

• In C++, int can be reused as Boolean (0 = false, anything else is true)
• Binary operator && (and)
• Binary operator || (or)
• Short-circuit evaluation

x  y  x || y
0  0  0
0  1  1
1  0  1
1  1  1

x  y  x && y
0  0  0
0  1  0
1  0  0
1  1  1

Not and ex(clusive) or

• Unary operator ! (not)
• Binary operator exor is not available on single Booleans but instead on bit vectors (next)

Input x  Output !x
0        1
1        0

x  y  x ^ y
0  0  0
0  1  1
1  0  1
1  1  0

The bool type

• Old C++ used int to store Boolean values
• But ANSI standard C++ does offer a type called bool
  • bool tval = true, fval = false;
  • int ival = int(tval);
• However, old bad habits still allowed
  • if (37) { … }
  • bool bval = 37;
• Overall value unclear

Bit array manipulation

• Fixed size integers are arrays of bits
• C++ lets you do bitwise Boolean algebra
• a & b (and), a | b (or), a ^ b (exor), ~b (not)

  10110110        10110110        10110110
& 10010101      ^ 10010101      | 10010101      ~ 00100011
  --------        --------        --------        --------
  10010100        00100011        10110111        11011100

Bit shift operations

• int c = 5; cout << (c << 2);
• Bits lost from the left (msb)
• Zero bits inserted from the right (lsb)
• Result is 20 (= 5 × 2^2)
• Cheap way to multiply by powers of two

c       = 00000000,00000000,00000000,00000101
c << 2  = 00000000,00000000,00000000,00010100

Right shift

• c >> 2
• Bits discarded to the right (lsb)
• If msb of c was 0, then 0 bits injected from left (msb)
  • 5 >> 2 gives 1
• If msb of c was 1 (c was negative) then 1 bits injected from left
  • -5 >> 2 gives -2 (work it out)
  • 0xfffffffb >> 2 gives 0xfffffffe
• Preserves sign of number

Some applications of bit operations

• Is an int x odd or even?
  • int isOdd = (x & 1);
• Remainder when divided by 8
  • int remain = (x & 7);
  • Faster than x % 8 (same answer for non-negative x)
• How many one bits in a 32-bit int?
• Repeat 32 times:
  • numOnes = numOnes + ((x & 0x80000000) != 0);
  • x = x << 1;
• In binary 0x80000000 looks like a one followed by 31 zeros

Primitive variable declaration and literals

• float fahrenheit;
  • Uninitialized, may get garbage on read
• float fahrenheit = 95;
• const float fahrenheit = 9.52e14;
  • Value will never change
  • Scientific notation saves typing lots of zeros
• int x = 3, y = x/2;
  • Can initialize variables based on others already initialized

Why bother to declare

• Variable names
  • What if you type it incorrectly later?
  • To initialize before any use
• Types
  • To check all assignments to the variable
  • To interpret a bit sequence as intended in your program (e.g. float and int are both 32 bits)
• There are languages that do not enforce variable name and type declarations
  • Can be lazy, but generally a Bad Idea

Type conversions

• Some conversions are implicit
  • short x = 20000; int y = x;
  • int x = 40000; short y = x;
• Others may result in overflow
  • double x = 5e40; float y = 2*x;
• Some are errors
  • float x = (float) "hello world";
• Implicit typing
  • float x = 7/3;
  • float x = 7/3.;

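Polymorphic operators and literals

• 7/3 vs 7/3.
• / represents division for int, float, double
• Which one is invoked depends on the (inferred) type of arguments

[Diagram: for 7/3, both '7' and '3' pass through toInt, then intDiv, then toFloat. For 7/3., '7' passes through toInt and then intToFloat, '3.' passes through toFloat, and the two meet at floatDiv.]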

The string data type

• When we said cout << "Hello world\n", the text "Hello world\n" was stored as an array of characters
• Bytes corresponding to H, e, …, \n, and finally a "null byte" or 00000000 (in binary) to mark the end of the string
• A more modern and better way is to use the string data type
  • string message("Hello world");

Common string operations

• Get the number of characters in the string
  • message.size() (calling a method on a string object)
• Get the character at a specific position
  • message.at(5) or message[5]
• Get a substring of the given string
  • message.substr(1, 3)
• Index out of bound?
  • Some operations throw exceptions
  • Some silently truncate
  • Some may return garbage

More string operations

• Find the first (leftmost) or last (rightmost) occurrence of a character
  • message.find_first_of('o')
  • message.find_last_of('e')
• Compare two strings (dictionary or lexicographic order)
  • msg1.compare(msg2)
  • Returns an integer
    • Negative if msg1 should appear before msg2
    • Zero if msg1 and msg2 are equal
    • Positive if msg1 should appear after msg2
