Lecture 13 - Review


Review

Lecture 1 - Address Map - Global vs Local

Pointer

int var; (var is an integer variable and typically occupies 4 bytes)

int *ptr; (ptr is a pointer to an integer; *ptr is the integer it points to)

Example

Example 1 *

int var = 108;

int *varpt;

varpt = &var; (varpt now holds the address of var; *varpt = 123 then sets var to 123)

Pointer to an integer
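The example above can be sketched as a small C fragment. The function names set_through_pointer and pointer_demo are hypothetical, chosen only for illustration; the values 108 and 123 come from the slide.

```c
#include <stdio.h>

/* Writes new_value into the integer that target points to. */
void set_through_pointer(int *target, int new_value)
{
    *target = new_value;   /* dereference: modify the pointed-to int */
}

/* Returns the value of var after it has been modified through a pointer. */
int pointer_demo(void)
{
    int var = 108;
    int *varpt = &var;     /* varpt holds the address of var */

    set_through_pointer(varpt, 123);
    printf("var = %d, *varpt = %d\n", var, *varpt);
    return var;
}
```

Calling pointer_demo() shows that var and *varpt name the same memory location, so a write through one is visible through the other.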

Example 2

Point to location 0x0012FF7C

Value is 0x61 = 'a'

Array and Pointer *

char a[11] = "1234567890"; (10 characters plus the terminating 0x00)
a[0] is '1';
a[1] is '2';
char *ptr;
ptr = a; (or ptr = &a[0];)
*ptr -> '1' (a[0])
*(ptr + 1) -> '2' (a[1])

Same result, different expression:

array[i] is *(array + i)
&array[i] is array + i
array[i + j] is *(array + i + j)
&array[i + j] is array + i + j
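A minimal sketch of the equivalence between indexing and pointer arithmetic. The helper name same_element is an assumption made for this illustration.

```c
#include <stddef.h>

/* Returns 1 if indexing and pointer arithmetic agree for element i. */
int same_element(const char *array, size_t i)
{
    return array[i] == *(array + i)     /* same value */
        && &array[i] == array + i;      /* same address */
}
```

The function returns 1 for every valid index, because the C language defines array[i] as *(array + i).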

Multi-Dimensional Arrays – pointer to pointer

array2[7][10] (**array2 is the same element as array2[0][0])

Naughty Pointers

The value pointed to by a pointer can be modified through that pointer.

Doing this carelessly can corrupt the program and is not recommended.

However, if you can master pointers, you can write very elegant programs.

Summary

Integer: 4 bytes, such as int a = 3;

0x0065FDF1: 03
0x0065FDF2: 00
0x0065FDF3: 00
0x0065FDF4: 00

The least significant byte is stored first (little-endian), so you have to reverse the byte order to read the value 0x00000003
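The byte order can be inspected at run time. This is a minimal sketch; int_bytes and is_little_endian are hypothetical helper names, and the result depends on the machine the code runs on.

```c
#include <string.h>

/* Copies the bytes of value, in memory order, into out[0..sizeof(int)-1]. */
void int_bytes(int value, unsigned char *out)
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 on a little-endian machine, 0 otherwise. */
int is_little_endian(void)
{
    unsigned char bytes[sizeof(int)];

    int_bytes(1, bytes);
    return bytes[0] == 1;
}
```

On a little-endian machine such as an x86 PC, int_bytes(3, out) stores 03 in out[0] and 00 in the remaining bytes, matching the memory dump above.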

Summary

Short: two bytes

short a = 3;

0x0065FDF3: 03
0x0065FDF4: 00

short b = 4;

0x0065FDF0: 04
0x0065FDF1: 00

Not used: as a short uses only 2 bytes, the remaining two bytes in memory, 0x0065FDF2 (0xCC) and 0x0065FDF1 (0xCC), are not used

Lecture 2

Example – abc program name

#include <stdio.h>

int first;
int second;

void callee(int first)
{
    int second;

    second = 1;
    first = 2;
    printf("callee: first = %d second = %d\n", first, second);
}

int main(int argc, char *argv[])
{
    first = 1;
    second = 2;
    callee(first);
    printf("caller: first = %d second = %d\n", first, second);
    return 0;
}

DOS>abc 12 34

Here, argc = 3, argv[0] = "abc", argv[1] = "12", argv[2] = "34"

Same variable name "second", but a different memory location

Example (passed by pointer) *

#include <stdio.h>

int first;
int second;

void callee(int *first)   /* not a variable, but an address */
{
    int second;

    second = 1;
    *first = 2;           /* content by address: changes the caller's variable */
    printf("callee: first = %d second = %d\n", *first, second);
}

int main(int argc, char *argv[])
{
    first = 1;
    second = 2;
    callee(&first);       /* passed by address */
    printf("caller: first = %d second = %d\n", first, second);
    return 0;
}

Content by address

Diagram - stack push (create)

Diagram - stack pop (return)

The CPU also Has Memory

The CPU also maintains its own banks of memory called registers.

They temporarily hold the data.

As a result, the program is faster.

Memory hierarchy: register, cache memory, main memory, disk

Lecture 3 - attention

Bit Operations

AND &

OR |

ONE'S COMPLEMENT ~

EXCLUSIVE OR ^

SHIFT (right) >>

SHIFT (left) <<

Operation - examples

AND 1 & 1 = 1; 1 & 0 = 0

OR 1 | 1 = 1; 1 | 0 = 1; 0 | 0 = 0

~ ~1 = 0; ~0 = 1 (for a single bit)

^ 0 ^ 0 = 0; 1 ^ 1 = 0; 1 ^ 0 = 1; 0 ^ 1 = 1

>> 010 >> 1 = 001 (binary)

<< 001 << 1 = 010 (binary)

One's complement

~ 1111 0010 (0xf2)
--------------
  0000 1101 (0x0d)

unsigned char c = 0xf2;
unsigned char e = ~c; // e is 0x0d

EXCLUSIVE OR

1111 0010 (0xf2)
1111 1110 (0xfe)
-------------- (^)
0000 1100 (0x0c)

unsigned char c = 0xf2;
unsigned char d = 0xfe;
unsigned char e = c ^ d; // e is 0x0c

SHIFT >> (right) by one bit

1111 0010 (0xf2)
>> 1 (shift right by one bit)
---------------------
0111 1001 (0x79)

unsigned char c = 0xf2;
unsigned char e = c >> 1; // e is 0x79 (with a signed char, the result is implementation-defined)

SHIFT << (left) by one bit

1111 0010 (0xf2)
<< 1 (shift left by one bit)
---------------------
1110 0100 (0xe4)

unsigned char c = 0xf2;
unsigned char e = c << 1; // e is 0xe4

SHIFT << by two bits

1111 0010 (0xf2)
<< 2 (shift left by two bits)
---------------------
1100 1000 (0xc8)

unsigned char c = 0xf2;
unsigned char e = c << 2; // e is 0xc8
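The bit operations above can be checked with a few one-line helpers. This is a sketch that assumes 8-bit bytes; the function names are hypothetical. It uses unsigned char so that the right shift fills with zeros.

```c
/* Each helper returns the low 8 bits of the result, as stored in one byte. */
unsigned char complement8(unsigned char c)
{
    return (unsigned char)~c;
}

unsigned char xor8(unsigned char c, unsigned char d)
{
    return (unsigned char)(c ^ d);
}

unsigned char shift_right8(unsigned char c, unsigned n)
{
    return (unsigned char)(c >> n);
}

unsigned char shift_left8(unsigned char c, unsigned n)
{
    return (unsigned char)(c << n);   /* bits shifted past bit 7 are dropped */
}
```

With c = 0xf2 these helpers reproduce the values on the slides: 0x0d, 0x0c (with d = 0xfe), 0x79, 0xe4 and 0xc8.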

Lecture 4

Expression

1 sign bit, 8 exponent bits, and 23 mantissa bits (32 bits in total)

Value = (-1)^Sign * 2^(Exponent - 127) * (1 + Mantissa * 2^-23)

Positive: sign bit is 0. Negative: sign bit is 1.

The exponent is stored as an unsigned value with a bias of 127. If the stored value is 128, the exponent is 128 – 127 = 1; if it is 255 (the largest 8-bit value), 255 – 127 = 128; if it is zero, 0 – 127 = -127. The stored values 0 and 255 are reserved for special cases (zero and denormalized numbers, infinity and NaN).

Example

Example

2.5 (floating point)

0100 0000 0010 0000 0000 0000 0000 0000

Sign: positive (sign bit is 0)

Exponent: 1000 0000 = 128 (128 – 127 = 1)

Mantissa: 1.010 0000 0000 0000 0000 0000 (binary) = 1.25

Result: 1 x 1.25 x 2^1 = 2.5
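The three fields can be extracted from a float in C. This is a minimal sketch that assumes float is a 32-bit IEEE 754 single-precision number, which holds on most current machines; the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a 32-bit single-precision number. */
struct float_fields {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 bits */
};

struct float_fields split_float(float f)
{
    uint32_t bits;
    struct float_fields out;

    memcpy(&bits, &f, sizeof bits);       /* reinterpret the bytes safely */
    out.sign     = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}
```

For 2.5f the fields are sign 0, exponent 128 and mantissa 0x200000, which is 0.25 x 2^23 and matches the bit pattern shown above.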

String

A string is an array of characters and is terminated by a null character (0x00)

char a[4] = "Hi?";

a[0] = 'H';
a[1] = 'i';
a[2] = '?';
a[3] = 0x00

Incorrect declaration: char a[3] = "Hi?"; as there is no room for the 0x00

An example

struct {
    char a, b, c, cc;
    int i;
    double d;
} mystruct;

The name of the variable is mystruct

Lecture 5

Static Allocation

The word static (fixed) refers to things that happen at compile time and link time, when the program is constructed. For example, you can define

char a[9] = "12345678"; // assign 9 bytes for array a

The compiler will assign 9 bytes during compilation.

The linker will assign the correct address for array a.

You cannot change it, even if you think you need 10 bytes while running this program.

An example

#include <stdbool.h>

int my_var[128];                          // a statically allocated variable
static bool my_var_initialized = false;   // static declaration; initially, it is false

void my_fn(void)
{
    if (my_var_initialized)
        return;                           // already initialized, nothing to do
    my_var_initialized = true;
    for (int i = 0; i < 128; i++)
        my_var[i] = 0;
}

Dynamic allocation

Limitations of Static Allocation

If two procedures use a local variable named i, there will be a conflict if both i's are globally visible. If i is only declared once, then i will be shared by the two procedures. One might call the other, even indirectly, and cause i to be overwritten unexpectedly. It would be better if each procedure could have its own copy of i.

Grab memory

To grab memory, we have to use malloc(size). For example:

ptr = malloc(4) returns a pointer to a memory block of 4 bytes
ptr = malloc(4 * sizeof(int)) returns a pointer to 4 x 4 (integer) = 16 bytes, assuming a 4-byte int
ptr = malloc(4 * sizeof(long)) returns a pointer to 4 x 4 (long) = 16 bytes, assuming a 4-byte long (32 bytes where a long is 8 bytes)
free(ptr) returns the memory that ptr points to
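A minimal sketch of the malloc and free pattern described above. The helper make_int_array is a hypothetical name; using sizeof avoids hard-coding the size of an int.

```c
#include <stdlib.h>

/* Allocates an array of count ints, all set to fill.
   Returns NULL if the allocation fails.
   The caller owns the block and must release it with free(). */
int *make_int_array(size_t count, int fill)
{
    int *ptr = malloc(count * sizeof *ptr);   /* size in bytes, not in elements */

    if (ptr == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++)
        ptr[i] = fill;
    return ptr;
}
```

A caller would write int *p = make_int_array(4, 0); use p[0] to p[3], and finish with free(p);.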

Fragmentation – holes

Although it has memory

Example of First fit

Example of Best fit

Example of Worst fit

Lecture 6

Block sizes – the size to hold the data for users' usage

The standard method for determining the size of a block, given a pointer to the block, is to store its size in the word before the pointer.

Here, the memory block that can be used is only 16 bytes; the block size is 20 bytes, including 4 bytes for the size.

Determine the size

Note that it uses [-1] to read the location before the pointer (the location that contains the block size).

As the size is a multiple of 4, its lowest two bits are always zero and can hold flags, so they are cleared to get the size (in binary, 3 = 0000 0000 0000 0011 and ~3 = 1111 1111 1111 1100).

The lowest bit is the free flag: 1 means the block is free and can be used by the user.

size = ((int *) ptr)[-1];   // read integer before the memory block
correct_size = size & ~3;   // clear the lower 2 bits
free = size & 1;            // get low-order bit
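The header decoding can be wrapped in two small functions. This is a sketch under the layout described above (one int stored just before the user block, with bit 0 as the free flag); the function names are hypothetical and real allocators differ in detail.

```c
#include <stddef.h>

/* Size of the block: the header with its two flag bits cleared. */
size_t block_size(const void *ptr)
{
    int header = ((const int *)ptr)[-1];   /* the word before the block */

    return (size_t)(header & ~3);
}

/* Returns 1 if the block is marked free, 0 otherwise. */
int block_is_free(const void *ptr)
{
    int header = ((const int *)ptr)[-1];

    return header & 1;
}
```

For a header value of 21 (binary 1 0101), block_size gives 20 and block_is_free gives 1.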

Splitting a Free Block

The heap is normally initialized to look like one giant free block (40 bytes in this example).

When allocations occur, it would be wasteful to return a large block of free memory when a small one would do just as well. (If I need 8 bytes, there is no point in returning 40 bytes.)

Therefore, the memory allocator will typically split a block if the block size is larger than the requested size. (Here, 12 bytes are allocated, 8 for the data plus 4 for the size, and 28 bytes stay free.)

Common bug in scanf

Note that you should supply the address rather than the variable.

Use &i instead of i.

It is important in your exam.

int i;
double d;

scanf("%d %lf", i, d);     // wrong!!! passes the values, not the addresses

// here is the correct call (%lf reads a double):
scanf("%d %lf", &i, &d);

Overwriting Memory

Here, i runs from 0 to array_size, not array_size – 1, so the last iteration writes one element past the end of the block.

The solution is to use i < array_size, not i <= array_size.

#define array_size 100

int **a = (int **) malloc(sizeof(int *) * array_size);

for (int i = 0; i <= array_size; i++)   // wrong: should be i < array_size
    a[i] = NULL;

Memory bug

Here, the memory allocated is 100 bytes, not the 400 bytes needed for 100 integers (assuming a 4-byte int), because malloc takes a size in bytes.

#define array_size 100

int *a = (int *) malloc(array_size);   // wrong: only 100 bytes

a[99] = 0;   // this overwrites memory beyond the block

The solution is:

int *a = (int *) malloc(array_size * sizeof(int));

String must be terminated by 0x00

strlen(s) does not count the terminating 0x00, so malloc(len) is one byte too small and strcpy writes past the end of the block.

char *heapify_string(char *s)
{
    int len = strlen(s);
    char *new_s = (char *) malloc(len);   // wrong: no room for the 0x00
    strcpy(new_s, s);
    return new_s;
}

The solution is:

char *new_s = (char *) malloc(len + 1);
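A corrected version of heapify_string might look like the sketch below. The NULL check and the use of memcpy are additions made for this illustration, not part of the original slide.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s, or NULL if the allocation fails.
   The caller must free() the result. */
char *heapify_string(const char *s)
{
    size_t len = strlen(s);
    char *new_s = malloc(len + 1);    /* +1 for the terminating 0x00 */

    if (new_s == NULL)
        return NULL;
    memcpy(new_s, s, len + 1);        /* copies the characters and the 0x00 */
    return new_s;
}
```

For the string "Hi?" this allocates 4 bytes: three characters plus the terminator.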

Memory leaks

A memory leak occurs when memory that is no longer used is not returned to the memory pool.

The result is that the system will eventually run out of memory.

The failure to deallocate (free) a block of memory when it is no longer needed is often called a memory leak.

In the diagram, the memory block is not returned to the pool.

Lecture 7

Procedure

A slow but correct program

Modify the program to make it faster

What to Measure (Wall clock)

An alternative is to measure real time, or "wall clock time". This is the time an ordinary clock on the wall or a wrist watch shows.

The difference between CPU time and wall time can give some indication of the time spent waiting for I/O.

Wall time

CPU time

I/O time
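Both times can be measured with the standard C library. This is a rough sketch: measure and spin are hypothetical names, clock() reports CPU time, and time() has a resolution of only one second, so the wall-clock figure is coarse.

```c
#include <time.h>

/* Example workload: a short CPU-bound loop. */
void spin(void)
{
    volatile unsigned long x = 0;

    for (unsigned long i = 0; i < 1000000UL; i++)
        x += i;
}

/* Runs work() once; stores CPU seconds in *cpu_s and wall seconds in *wall_s. */
void measure(void (*work)(void), double *cpu_s, double *wall_s)
{
    clock_t c0 = clock();
    time_t t0 = time(NULL);

    work();

    *cpu_s = (double)(clock() - c0) / CLOCKS_PER_SEC;
    *wall_s = difftime(time(NULL), t0);
}
```

If the wall time is much larger than the CPU time, the program probably spent the difference waiting, for example for I/O.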

Principles - Performance

The 80/20 Rule – It means 80% of the CPU time is spent in 20% of the program.

In this case, you get the most improvement by optimizing this 20%.

Amdahl's Law – for parallel processing, the performance is limited by the sequential part of the program.
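Amdahl's Law is usually written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be sped up and n is the speedup of that fraction. The symbols p and n and the function name below are choices made for this sketch.

```c
/* Amdahl's Law: overall speedup when a fraction p of the run time
   (0 <= p <= 1) is made n times faster (n > 0). */
double amdahl_speedup(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}
```

For example, if half of the program can run in parallel on 2 processors, the overall speedup is 1 / (0.5 + 0.25), about 1.33, not 2.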

Example of 80/20: 10% on one means 2% as a whole

A program consists of 5 modules

Before: 20 ms, 20 ms, 20 ms, 20 ms, 20 ms (100 ms in total)

After: 20 ms, 18 ms, 20 ms, 20 ms, 20 ms (98 ms in total)

Example of 80/20: 10% on one means about 5.6% as a whole

A program consists of 5 modules

Before: 10 ms, 50 ms, 10 ms, 10 ms, 10 ms (90 ms in total)

After: 10 ms, 45 ms, 10 ms, 10 ms, 10 ms (85 ms in total, 5 ms saved out of 90 ms)

Conclusion: focus on the module with more CPU time

Lecture 8

Coding for Speed (mainly from this web site: http://www.abarnett.demon.co.uk/tutorial.html)

Array Indices, Aliases, Registers, Integers, Loop Jamming, Dynamic Loop Unrolling, Faster for() loops, Switch, Pointers, Early loop breaking, Misc, Using array indices

There are many ways to speed up the operation.

Aliases (1)

void func1(int *data)
{
    int i;

    for (i = 0; i < 10; i++)
    {
        somefunc2(*data, i);
    }
}

Not very good: the compiler must assume that somefunc2 may change *data, so *data is read from memory on every iteration

Aliases – better change to this

void func1(int *data)
{
    int i;
    int localdata;

    localdata = *data;   /* read the value once */
    for (i = 0; i < 10; i++)
    {
        somefunc2(localdata, i);
    }
}

Better way

Loop Jamming

Never use two loops where one will suffice:

for (i = 0; i < 100; i++)
{
    stuff();
}
for (i = 0; i < 100; i++)
{
    morestuff();
}

Better to combine them
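The combined loop might look like the sketch below. The bodies of stuff() and morestuff() are hypothetical stand-ins that only count calls; jamming is only valid when the two calls do not depend on all of the first loop having finished.

```c
/* Hypothetical stand-ins for the stuff() and morestuff() calls above. */
int stuff_calls, morestuff_calls;

void stuff(void)     { stuff_calls++; }
void morestuff(void) { morestuff_calls++; }

/* One loop does the work of two: the loop counter is tested and
   incremented 100 times instead of 200. */
void jammed_loop(void)
{
    int i;

    for (i = 0; i < 100; i++)
    {
        stuff();
        morestuff();
    }
}
```

Note that the calls are now interleaved (stuff, morestuff, stuff, ...) instead of running all stuff() calls first.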

Early loop breaking

This loop searches a list of 10000 numbers to see if there is a -99 in it.

found = FALSE;
for (i = 0; i < 10000; i++)
{
    if (list[i] == -99)
    {
        found = TRUE;
    }
}
if (found) printf("Yes, there is a -99. Hooray!\n");

This works well but searches the whole list.

Early loop breaking

A better way is to abort the search as soon as the value is found.

found = FALSE;
for (i = 0; i < 10000; i++)
{
    if (list[i] == -99)
    {
        found = TRUE;
        break;
    }
}
if (found) printf("Yes, there is a -99. Hooray!\n");

Lecture 9

Memory and CPU

Program here

Cache and register here

Memory hierarchies

Within CPU

Common memory technologies

Static Random Access Memory (SRAM)

Dynamic Random Access Memory (DRAM)

Magnetic disks

Magnetic tapes

Optical disks

Speed

Size and Cost

Principle of Locality

References to a single address occur close together in time, like a variable i that is used again and again in a loop (this is called temporal locality).

References to addresses that are near to each other occur close together in time, like variables declared together (int i, j;), where the program uses i and then j shortly afterwards (this is called spatial locality).

Principle of locality

The principle of locality of reference is not an assurance, but rather a conjecture (that is, a guess).

Empirically, however, there is little doubt that programs behave according to this principle.

Think about it: if you need to use the same variable i later, it is better to keep it in the cache than to release it to the main memory.

Graph showing CPU, DRAM & SRAM

Four-level hierarchy

Lecture 10

Example

/* Assumes n is a power of two */
void merge_sort(int *data, int n)
{
    int half = n >> 1;   /* n / 2 */

    if (n == 1)
        return;
    merge_sort(data, half);
    merge_sort(data + half, half);
    merge(data, data + half, half);
}

// no need to memorise

Graph of Merge Sort

The graph shows the access times in nanoseconds (ns) for the L1 cache (T1), L2 cache (T2), L3 cache (T3), and main memory (Tm).

Looking at the Caches

We can deduce many things about the cache design of a particular computer by carefully examining its memory performance.

We can design a benchmark program whose locality we control.

int data[MAXSIZE];

for (r = 0; r < repeat; r++) {     /* separate counter for the outer loop */
    for (i = 0; i < N; i++) {
        dummy = data[i];
    }
}

Control the spatial locality

Here, stride controls the amount of spatial locality

int data[MAXSIZE];

for (r = 0; r < repeat; r++) {     /* separate counter for the outer loop */
    for (i = 0; i < N; i += stride) {
        dummy = data[i];
    }
}

Graph showing the effect

Example of a matrix

int data[M][N];

for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        sum += data[j][i];
    }
}

Changing the order of the iterations is not always better. Below is an example.

int original[M][N];
int transposed[N][M];

for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        transposed[i][j] = original[j][i];
    }
}

Insufficient Temporal Locality

int original[M][N];
int transposed[N][M];

/* Process the matrix in blocks of m x n elements */
for (k = 0; k < N / m; k++) {
    for (l = 0; l < M / n; l++) {
        for (i = k*m; i < (k+1)*m; i++) {
            for (j = l*n; j < (l+1)*n; j++) {
                transposed[i][j] = original[j][i];
            }
        }
    }
}

Lecture 11

Example of a matrix

int data[M][N];

for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        sum += data[j][i];
    }
}

This is an M x N matrix

Row-major and Column-major

Row major – sequence of access

data

Column major

Accessing a column-major

Accessing row data

It will be faster, as it accesses [0][0], [0][1], [0][2], ... in the same sequence in which the data is stored in memory. After [0][0] is read, the neighbouring elements (up to [1][3] in this example) are already loaded into the cache line.

Row major is faster than column major
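The two access orders can be compared with the sketch below. The sizes M = 4 and N = 6 and the function names are assumptions made for illustration; the speed difference only becomes visible for matrices much larger than the cache.

```c
enum { M = 4, N = 6 };   /* assumed small sizes, for illustration only */

/* Row-major order: the inner loop walks along a row, so consecutive
   accesses touch adjacent addresses. */
long sum_row_major(int data[M][N])
{
    long sum = 0;

    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            sum += data[i][j];
    return sum;
}

/* Column-major order: consecutive accesses are N ints apart. */
long sum_column_major(int data[M][N])
{
    long sum = 0;

    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            sum += data[i][j];
    return sum;
}
```

Both functions return the same sum; only the order of the memory accesses differs.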

Segment address translation

Disk

Memory

Paging

The allocation of memory into chunks of varying size causes external fragmentation.

To solve this problem we can change the nature of the address translation so that, instead of mapping virtual to physical addresses in big chunks of varying size, it maps them in small chunks of constant size, called pages.
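With constant-size chunks, a virtual address splits into a page number and an offset. The sketch below assumes a page size of 4096 bytes and a hypothetical page table that maps each virtual page number to a physical frame number; real systems add valid bits and multi-level tables.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size; a power of two */

/* Virtual page number: which constant-size chunk the address falls in. */
uint32_t page_number(uint32_t vaddr)
{
    return vaddr / PAGE_SIZE;
}

/* Offset of the address inside its page. */
uint32_t page_offset(uint32_t vaddr)
{
    return vaddr % PAGE_SIZE;
}

/* Translates vaddr with a table of physical frame numbers. */
uint32_t translate(const uint32_t *page_table, uint32_t vaddr)
{
    return page_table[page_number(vaddr)] * PAGE_SIZE + page_offset(vaddr);
}
```

For example, virtual address 0x1234 lies in page 1 at offset 0x234; if page 1 maps to frame 3, the physical address is 0x3234.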

Paging

Impact of VM on Performance

int data[M][N];

for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        sum += data[j][i];
    }
}
// column major – more page faults

Impact of VM on Performance

int data[M][N];

for (j = 0; j < M; j++) {
    for (i = 0; i < N; i++) {
        sum += data[j][i];
    }
}
// row major – fewer page faults

Example of Context Switching

[Figure: context switching; each box in the diagram is labelled CPU]

Process state

Here, there are three states for one process. Running means it is using the CPU, ready means it is ready to use the CPU, while suspended means it is waiting for an I/O.

Non-Preemptive process

A process must finish before the CPU can switch to another one. Say you have three processes: P1, P2 and P3.

Preemptive process

The CPU can switch to another process before the current one finishes

Lecture 12

Network Programming

Review

Client Server Programming Model

Networks

Global IP Internet

Socket Interface

Web Servers

Client Server Programming Model

1. When a client needs service, it initiates a transaction
2. The server receives the request, interprets and manipulates it
3. The server sends a response to the client and waits for the next request
4. The client receives the response and manipulates it.

[Figure labels: client process, server process, resource]

Hardware and Software Organisations

Client

TCP/IP

Network Adaptor

Client

TCP/IP

Network Adaptor

Internet Domain Names

Unnamed root

mil edu gov com

cityu cuhk

DCO

/* DNS entry structure */
struct hostent {
    char *h_name;         /* official domain name of host */
    char **h_aliases;     /* null-terminated array of domain names */
    int h_addrtype;       /* host address type (AF_INET) */
    int h_length;         /* length of an address in bytes */
    char **h_addr_list;   /* null-terminated array of in_addr structs */
};

Internet Connection

Internet clients and servers communicate by sending and receiving streams of bytes over connections.

A connection is point-to-point.

A connection is full duplex in the sense that data can flow in both directions.

A socket is an end point of a connection.

Socket Connection

Client Server

Each socket has a corresponding socket address that consists of an IP address and a 16-bit integer port. It is denoted by address:port (such as 121.2.3.4:12345)
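An address:port pair can be stored in a struct sockaddr_in. The sketch below assumes a POSIX system and IPv4; make_socket_address is a hypothetical helper name, while htons and inet_pton are standard POSIX calls.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Fills *out with the IPv4 socket address ip:port.
   Returns 0 on success, -1 if ip is not a valid dotted-decimal address. */
int make_socket_address(const char *ip, uint16_t port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = htons(port);                 /* port in network byte order */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1)
        return -1;
    return 0;
}
```

For the example above, make_socket_address("121.2.3.4", 12345, &addr) fills addr with the address 121.2.3.4 and port 12345.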

Socket Interface

Client: socket, connect, Rio_writen, Rio_readlineb, close

Server: socket, bind, listen, accept, Rio_readlineb, Rio_writen, Rio_readlineb, close

Socket Function

/* listen function */
#include <sys/socket.h>

int listen(int sockfd, int backlog);   /* returns 0 if OK, -1 on error */

/* accept function */
#include <sys/socket.h>

int accept(int listenfd, struct sockaddr *addr, socklen_t *addrlen);   /* returns a connected descriptor if OK, -1 on error */

Role of the Listening and Connected Descriptors

[Figure: three panels, each showing a client and a server]

Last Page
