7/27/2019 21SCS147L17Mid3 Revision4-14[1]
Mid3 Revision, VM and Instruction Set Architecture
Prof. Sin-Min Lee
Classification of Digital Circuits
Combinational:
Output depends only on current input values.
Sequential:
Output depends on current input values and the present state of the circuit, where the present state is the current value of the device's memory. Also called finite state machines.
Characteristic tables
The tables we've made so far are called characteristic tables. They show the next state Q(t+1) in terms of the current state Q(t) and the inputs. For simplicity, the control input C is not usually listed.
D | Q(t+1) | Operation
0 | 0      | Reset
1 | 1      | Set

T | Q(t+1) | Operation
0 | Q(t)   | No change
1 | Q'(t)  | Complement

J K | Q(t+1) | Operation
0 0 | Q(t)   | No change
0 1 | 0      | Reset
1 0 | 1      | Set
1 1 | Q'(t)  | Complement
Characteristic equations
We can also write characteristic equations, where the next state Q(t+1) is defined in terms of the current state Q(t) and the inputs.

D | Q(t+1) | Operation
0 | 0      | Reset
1 | 1      | Set

Q(t+1) = D

T | Q(t+1) | Operation
0 | Q(t)   | No change
1 | Q'(t)  | Complement

Q(t+1) = TQ'(t) + T'Q(t) = T xor Q(t)

J K | Q(t+1) | Operation
0 0 | Q(t)   | No change
0 1 | 0      | Reset
1 0 | 1      | Set
1 1 | Q'(t)  | Complement

Q(t+1) = JQ'(t) + K'Q(t)
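The three characteristic equations can be checked exhaustively against the tables; a sketch in Python (function names are mine, not from the slides):

```python
# Next-state functions taken directly from the characteristic equations.
def d_next(d, q):      # Q(t+1) = D
    return d

def t_next(t, q):      # Q(t+1) = TQ'(t) + T'Q(t) = T xor Q(t)
    return t ^ q

def jk_next(j, k, q):  # Q(t+1) = JQ'(t) + K'Q(t)
    return (j & (1 - q)) | ((1 - k) & q)

# Exhaustive check against every row of the characteristic tables.
for q in (0, 1):
    assert d_next(0, q) == 0 and d_next(1, q) == 1      # reset / set
    assert t_next(0, q) == q and t_next(1, q) == 1 - q  # no change / complement
    assert jk_next(0, 0, q) == q                        # no change
    assert jk_next(0, 1, q) == 0                        # reset
    assert jk_next(1, 0, q) == 1                        # set
    assert jk_next(1, 1, q) == 1 - q                    # complement
```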
Memory Allocation
Compile for overlays
Compile for fixed partitions
Separate queue per partition versus single queue
Relocation and variable partitions
Dynamic contiguous allocation (bit maps versus linked lists)
Fragmentation issues
Swapping
Paging
Overlays
[Figure: main memory (0K, 5K, 7K, 12K boundaries) holds the main program, an overlay manager, and a single overlay area; Overlay 1, Overlay 2, and Overlay 3 reside in secondary storage and are swapped into the overlay area as needed.]
Multiprogramming with Fixed Partitions
Divide memory into n (possibly unequal) partitions.
Problem: fragmentation.
[Figure: memory divided at 0K, 4K, 16K, 64K, and 128K into fixed partitions, with free space left inside them.]
Fixed Partitions
[Figure: the same 0K, 4K, 16K, 64K, 128K partition layout; the legend marks free space inside the partitions as internal fragmentation, which cannot be reallocated.]
Fixed Partition Allocation: Implementation Issues
Separate input queue for each partition:
Requires sorting the incoming jobs and putting them into separate queues.
Inefficient utilization of memory when the queue for a large partition is empty but the queue for a small partition is full: small jobs have to wait to get into memory even though plenty of memory is free.
One single input queue for all partitions:
Allocate a partition where the job fits, using best fit, worst fit, or first fit.
Relocation
Correct the starting address when a program starts in memory; different jobs will run at different addresses.
When a program is linked, the linker must know at what address the program will begin in memory.
Logical (virtual) addresses: the logical address space has range 0 to max.
Physical addresses: the physical address space has range R+0 to R+max for base value R. The user program never sees the real physical addresses.
The memory-management unit (MMU) maps virtual to physical addresses; the mapping requires hardware (the MMU) with a base register, the relocation register.
Relocation Register
[Figure: the CPU issues a logical address MA; an adder combines it with the base register contents BA to produce the physical address MA+BA, which goes to memory.]
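A minimal model of the base-register mapping in the figure (the limit check is my addition; the slide shows only the adder):

```python
def translate(logical_addr, base, limit):
    """Map a logical address to a physical one via a relocation register.

    base  -- contents of the relocation (base) register, set at load time
    limit -- size of the job's logical address space (0..limit-1 are legal)
    """
    if not 0 <= logical_addr < limit:
        raise MemoryError("address outside this job's logical space")
    return base + logical_addr   # MA + BA in the figure

# A job loaded at physical address 14000 with a 3000-byte address space:
assert translate(0, 14000, 3000) == 14000
assert translate(346, 14000, 3000) == 14346
```

The job itself only ever manipulates addresses in 0..limit-1; relocation happens on every memory reference, invisibly to the program.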
Storage Placement Strategies
Best fit: use the hole whose size is equal to the need, or if none is equal, the hole that is larger but closest in size. Rationale?
First fit: use the first available hole whose size is sufficient to meet the need. Rationale?
Worst fit: use the largest available hole. Rationale?
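A minimal sketch of the three strategies in Python (the hole sizes and function names are mine; each returns the index of the chosen hole, or None):

```python
def first_fit(holes, need):
    """Return the index of the first hole large enough, or None."""
    for i, size in enumerate(holes):
        if size >= need:
            return i
    return None

def best_fit(holes, need):
    """Return the index of the smallest hole that still fits, or None."""
    fits = [(size, i) for i, size in enumerate(holes) if size >= need]
    return min(fits)[1] if fits else None

def worst_fit(holes, need):
    """Return the index of the largest hole, if it fits, or None."""
    size, i = max((size, i) for i, size in enumerate(holes))
    return i if size >= need else None

holes = [20, 4, 12, 7]            # free-hole sizes, in address order
assert first_fit(holes, 6) == 0   # 20 is the first hole that fits
assert best_fit(holes, 6) == 3    # 7 is the closest fit
assert worst_fit(holes, 6) == 0   # 20 is the largest hole
```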
Storage Placement Strategies
Every placement strategy has its own problem:
Best fit creates small holes that can't be used.
Worst fit gets rid of large holes, making it difficult to run large programs.
First fit creates average-size holes.
Locality of Reference
Most memory references are confined to a small region:
A well-written program spends its time in a small loop, procedure, or function.
Data are likely in arrays, and variables are stored together.
Working set: the number of pages sufficient to run the program normally, i.e., to satisfy the locality of a particular program.
Page Replacement Algorithms
Page fault: the page is not in memory and must be loaded from disk.
Algorithms to manage swapping:
First-In, First-Out (FIFO), subject to Belady's anomaly
Least Recently Used (LRU)
Least Frequently Used (LFU)
Not Used Recently (NUR), using a referenced bit and a modified (dirty) bit
Second-chance replacement algorithms
Thrashing: too many page faults degrade system performance.
Virtual Memory Tradeoffs
Disadvantages:
The swap file takes up space on disk.
Paging consumes CPU resources.
Advantages:
Programs share memory space.
More programs run at the same time.
Programs run even if they cannot fit into memory all at once.
Process separation.
Virtual Memory vs. Caching
A cache speeds up memory access.
Virtual memory increases the amount of perceived storage, provides independence from the configuration and capacity of the memory system, and has a low cost per bit compared to main memory.
How Bad Is Fragmentation?
Statistical arguments with random request sizes: under first fit, given N allocated blocks, about 0.5N blocks will be lost to fragmentation.
Known as the 50% rule.
Solve Fragmentation with Compaction
[Figure: successive snapshots of memory holding the monitor, Job 3, Job 5, Job 6, Job 7, Job 8, and free space; compaction slides the jobs down toward the monitor so the free space coalesces into one block.]
Storage Management Problems
Fixed partitions suffer from internal fragmentation.
Variable partitions suffer from external fragmentation.
Compaction suffers from overhead.
Placement Policy
Determines where in real memory a process piece is to reside.
Important in a segmentation system.
With paging, or combined paging and segmentation, the hardware performs address translation.
Replacement Policy
Which page is replaced?
The page removed should be the page least likely to be referenced in the near future.
Most policies predict future behavior on the basis of past behavior.
Replacement Policy: Frame Locking
If a frame is locked, it may not be replaced. Locked frames hold:
The kernel of the operating system
Control structures
I/O buffers
A lock bit is associated with each frame.
Basic Replacement Algorithms
Optimal policy: selects for replacement the page for which the time to the next reference is the longest.
Impossible in practice, since it requires perfect knowledge of future events.
Basic Replacement Algorithms
Least Recently Used (LRU): replaces the page that has not been referenced for the longest time.
By the principle of locality, this should be the page least likely to be referenced in the near future.
Each page could be tagged with the time of its last reference, but this would require a great deal of overhead.
Basic Replacement Algorithms
First-In, First-Out (FIFO): treats the page frames allocated to a process as a circular buffer, removing pages in round-robin style.
The simplest replacement policy to implement: the page that has been in memory the longest is replaced.
But such pages may be needed again very soon.
Basic Replacement Algorithms
Clock policy: each frame carries an additional bit called a use bit.
When a page is first loaded into memory, its use bit is set to 1.
When the page is referenced, its use bit is set to 1.
When it is time to replace a page, the first frame encountered with its use bit set to 0 is replaced.
During the search for a replacement, each use bit set to 1 is changed to 0.
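The victim-selection steps above can be sketched as follows (a minimal sketch; the function name and frame labels are mine):

```python
def clock_replace(frames, use_bits, hand):
    """Pick a victim frame with the clock policy.

    Advances the hand, clearing use bits that are 1, until it finds a
    frame whose use bit is 0; returns (victim_index, new_hand).
    """
    n = len(frames)
    while True:
        if use_bits[hand] == 0:
            return hand, (hand + 1) % n
        use_bits[hand] = 0          # give this page a second chance
        hand = (hand + 1) % n

frames   = ['A', 'B', 'C', 'D']
use_bits = [1, 0, 1, 1]
victim, hand = clock_replace(frames, use_bits, 0)
assert victim == 1                  # B had use bit 0
assert use_bits == [0, 0, 1, 1]     # A's bit was cleared on the way past
```

Because every set bit gives its page one more sweep of survival, the loop always terminates: after at most one full revolution some use bit is 0.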
FIFO Replacement Policy (3 frames)

String: 2 1 3 4 2 5 4 1 2 3 1 4 5 4 6
Frames: 2 1 3 4 2 5 5 1 1 3 3 4 5 5 6
          2 1 3 4 2 2 5 5 1 1 3 4 4 5
            2 1 3 4 4 2 2 5 5 1 3 3 4

Hits at references 7, 9, 11, and 14; hit ratio 4/15.
LRU Replacement Policy (3 frames)

String: 2 1 3 4 2 5 4 1 2 3 1 4 5 4 6
Frames: 2 1 3 4 2 5 4 1 2 3 1 4 5 4 6
          2 1 3 4 2 5 4 1 2 3 1 4 5 4
            2 1 3 4 2 5 4 1 2 3 1 1 5

Hits at references 7, 11, and 14; hit ratio 3/15.
Optimal Replacement Policy (3 frames)

String: 2 1 3 4 2 5 4 1 2 3 1 4 5 4 6
Frames: 2 1 3 4 4 4 4 4 4 4 4 4 4 4 6
          2 1 1 1 1 1 1 1 1 1 1 5 5 4
            2 2 2 5 5 5 2 3 3 3 3 3 5

Hits at references 5, 7, 8, 11, 12, and 14; hit ratio 6/15.
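The three hit ratios above can be reproduced with a short simulation (a sketch; the function names are mine):

```python
from collections import deque

def fifo_hits(refs, nframes):
    frames, queue, hits = set(), deque(), 0
    for p in refs:
        if p in frames:
            hits += 1
        else:
            if len(frames) == nframes:
                frames.remove(queue.popleft())   # evict the oldest page
            frames.add(p)
            queue.append(p)
    return hits

def lru_hits(refs, nframes):
    frames, hits = [], 0          # ordered least- to most-recently used
    for p in refs:
        if p in frames:
            hits += 1
            frames.remove(p)
        elif len(frames) == nframes:
            frames.pop(0)         # evict the least recently used page
        frames.append(p)
    return hits

def opt_hits(refs, nframes):
    frames, hits = set(), 0
    for i, p in enumerate(refs):
        if p in frames:
            hits += 1
            continue
        if len(frames) == nframes:
            # evict the page whose next use is farthest away (or never)
            def next_use(q):
                rest = refs[i + 1:]
                return rest.index(q) if q in rest else len(refs)
            frames.remove(max(frames, key=next_use))
        frames.add(p)
    return hits

refs = [2, 1, 3, 4, 2, 5, 4, 1, 2, 3, 1, 4, 5, 4, 6]
assert fifo_hits(refs, 3) == 4    # 4/15, matching the FIFO slide
assert lru_hits(refs, 3) == 3     # 3/15, matching the LRU slide
assert opt_hits(refs, 3) == 6     # 6/15, matching the optimal slide
```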
Early Memory Management Schemes
Originally the computer was devoted to a single user: the user has all of memory (addresses 0 to 65535).
Limitations of the Single-User Contiguous Scheme
Only one person using the machine: lots of computer time going to waste (why?).
The largest job is bounded by the size of machine memory.
Next: Fixed Partitions
Created chunks of memory for each job:
[Figure: memory from 0 to 65535 divided into partitions for Job 1, Job 2, and Job 3.]
Limitations of Fixed Partitions
The operator had to correctly guess the size of programs.
Programs were limited to the partitions they were given.
Memory fragmentation resulted; the kind illustrated here is called internal memory fragmentation.
Dynamic Partitions
[Figure: snapshots of memory as jobs 1 through 7 arrive and depart, each carving out a partition exactly its own size.]
Internal versus External Memory Fragmentation
[Figure: Job 8 currently occupies part of the space previously allocated to Job 1; the leftover space illustrates the difference between internal and external fragmentation.]
Dynamic Partitions
Contiguous memory is still required for processes.
How do we decide the size of the partitions?
Once the machine is going, how do old jobs get replaced by new ones?
Dynamic Partitions: First Fit
In this scheme, we search forward in the free list for a partition large enough to accommodate the next job.
Fast, but the gaps left can be large.
Dynamic Partitions: Best Fit
In this scheme, we try to find the smallest partition large enough to hold the next job.
This tends to minimize the size of the gaps, but it also requires that we keep a list of free spaces.
Deallocating Memory
If the block we are deallocating is adjacent to one or two free blocks, it must be merged with them.
So either we return a pointer to the free block, or we change the size of an existing block, or both.
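One way to sketch the merge step (the sorted free-list representation and the function name are my assumptions, not from the slides):

```python
def free_block(free_list, start, size):
    """Return a new free list with [start, start+size) inserted, merging
    any free blocks adjacent to it.

    free_list is a sorted list of (start, size) tuples.
    """
    merged = []
    for s, sz in sorted(free_list + [(start, size)]):
        if merged and merged[-1][0] + merged[-1][1] == s:
            # the new block touches the previous one: coalesce them
            merged[-1] = (merged[-1][0], merged[-1][1] + sz)
        else:
            merged.append((s, sz))
    return merged

# Freeing 100..199 between two existing holes merges all three into one:
assert free_block([(0, 100), (200, 50)], 100, 100) == [(0, 250)]
```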
Relocatable Dynamic Partitions
In some cases a job could fit into the combined free spaces within or between partitions of the earlier schemes.
So how do we take advantage of that space?
One way is to move programs while they are in the machine, compacting them down into the lower end of memory, just above the operating system.
Several names for this
Garbage collection
Defragmentation
Compaction
All share a problem: relative addressing!
Page Replacement Algorithms
Optimal page replacement is simply not possible.
Instead, keep referenced (R) and modified (M) bits to track past usage:
A page is referenced by any read or write to it.
A page is modified by any change (write) made to it.
Page Replacement Algorithms, Continued
FIFO = First in, first out
LRU = Least recently used
LFU = Least frequently used
Both of the latter rely on a page-request call to the operating system.
A failure to find a page is a page interrupt.
We might measure quality by failure rate = page interrupts / page requests.
Page Replacement Algorithms, Continued
Clock page replacement: the hand of the clock points to the oldest page.
If a page fault occurs, check R bits in clockwise order.
A variant called the two-handed clock is used in some UNIX systems.
The FIFO Solution Is Not More Memory
This is called Belady's anomaly: the page-request order is an important factor, not just the size of memory.
LRU
Doesn't suffer from Belady's anomaly.
Presumes locality of reference.
But while it works well, it is a little more complex to implement in software.
Consequently, aging and various clock algorithms are the most common in practice; aging can yield a good approximation of LRU.
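A sketch of one tick of the aging algorithm (the 8-bit counter width and the names are my assumptions):

```python
def age_tick(counters, referenced, bits=8):
    """One clock tick of the aging approximation of LRU.

    Each page's counter is shifted right and its R bit is placed in the
    high-order position; larger counters mean more recent use, so the
    replacement victim is the page with the smallest counter.
    """
    for page in counters:
        counters[page] = (counters[page] >> 1) | (referenced[page] << (bits - 1))
        referenced[page] = 0          # R bits are cleared every tick
    return counters

counters   = {'A': 0, 'B': 0}
referenced = {'A': 1, 'B': 0}         # only A was touched this interval
age_tick(counters, referenced)
assert counters['A'] == 0b10000000 and counters['B'] == 0
# A, referenced in the last interval, now outranks B as "recently used".
```

The right shift means old references decay away, which is why aging only approximates true LRU: it cannot distinguish references within one tick.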
Segmented Memory Allocation
Instead of equal divisions, try to break code into its natural modules.
The compiler is now asked to help the operating system.
No page frames; different sizes are required, meaning we get external fragmentation again.
Segmented/Demand Paging
Subdivide the natural program segments into equal-sized parts to load into page frames.
This eliminates external fragmentation and allows for a large virtual memory, so it is often used in more modern OSs.
Tradeoffs
Note that there is a tradeoff between external fragmentation and page faults in paging systems.
Note also that we probably want slightly smaller page frames in a segmented demand-paging framework.
Instruction Set Architectures, Part 1
[Figure: the layers of a computer system, from Application and Compiler/Operating System at the top, through the Instruction Set Architecture, down to the Instruction Set Processor and I/O system, Digital Design, and Circuit Design.]
Some Ancient History
The earliest (1940s) computers were one-of-a-kind.
With early commercial computers (1950s), each new model had an entirely different instruction set, programmed at the machine-code or assembler level.
In 1957 IBM introduced FORTRAN:
Much easier to write programs.
Remarkably, code wasn't much slower than hand-written assembly.
Possible to use a new machine without reprogramming.
Impact of High-Level Languages
Customers were delighted.
Computer makers weren't so happy:
They needed to write new compilers (and OSs) for each new model, written in assembly code.
Portable compilers didn't exist.
IBM 360 Architecture
The first ISA used for multiple models; IBM invested $5 billion.
Six models were introduced in 1964, with performance varying by a factor of 50.
24-bit addresses (huge for 1964), though the largest model had only 512 KB of memory.
A huge success! The architecture is still in use today, having evolved to the 370 (which added virtual addressing) and the 390 (32-bit addresses).
Let's Learn from Our Successes ...
In the early '70s, IBM took another big gamble: FS, a new layer between the ISA and high-level languages, putting a lot of OS function into hardware.
It was a huge failure.
Moral: getting the right abstraction is hard!
The Instruction Set Architecture
The agreed-upon interface between the software that runs on a computer and the hardware that executes it.
[Figure: the same layer diagram as before, with the Instruction Set Architecture sitting between Application/Compiler/Operating System above and Instruction Set Processor/I/O system, Digital Design, and Circuit Design below.]
The Instruction Set Architecture
The part of the architecture that is visible to the programmer:
Instruction formats
Opcodes (available instructions)
Number and types of registers
Storage access, addressing modes
Exceptional conditions
Overall Goals of an ISA
Can be implemented by simple hardware
Can be implemented by fast hardware
Instructions do useful things
Easy to write (or generate) machine code
Key ISA Decisions
Instruction length: are all instructions the same length?
How many registers?
Where do operands reside? E.g., can you add the contents of memory to a register?
Instruction format: which bits designate what?
Operands: how many? how big? how are memory addresses computed?
Operations: what operations are provided?
Running Examples
We'll look at four example ISAs:
Digital's VAX (1977): elegant
Intel's x86 (1978): ugly, but successful (IBM PC)
MIPS: focus of the text, used in assorted machines
PowerPC: used in Macs, IBM supercomputers, ...
VAX and x86 are CISC (Complex Instruction Set Computers); MIPS and PowerPC are RISC (Reduced Instruction Set Computers).
Almost all machines of the '80s and '90s are RISC, including the VAX's successor, the DEC Alpha.
Instruction Length
Variable:
x86 instructions vary from 1 to 17 bytes long.
VAX instructions from 1 to 54 bytes.
Fixed:
MIPS, PowerPC, and most other RISCs: all instructions are 4 bytes long.
Instruction Length
Variable-length instructions (x86, VAX):
- require multi-step fetch and decode.
+ allow for a more flexible and compact instruction set.
Fixed-length instructions (RISCs):
+ allow easy fetch and decode.
+ simplify pipelining and parallelism.
- instruction bits are scarce.
What's Going On?
How is it possible that ISAs of the '70s were much more complex than those of the '90s? Doesn't everything get more complex? Today transistors are much smaller and cheaper, and design tools are better, so building a complex computer should be easier.
How could IBM make two models of the 370 ISA in the same year that differed by 50x in performance?
Microcode
Another layer, between the ISA and the hardware: one instruction expands into a sequence of microinstructions, and a microinstruction specifies the values of individual wires.
Each model can have its own micro-language; the low-end (cheapest) model uses simple hardware and long microprograms.
We'll look at the rise and fall of microcode later. Meanwhile, back to ISAs ...
How Many Registers?
All computers have a small set of registers: memory to hold values that will be used soon. (In 141, "load" means moving data from memory to a register; "store" is the reverse.) A typical instruction will use 2 or 3 register values.
Advantages of a small number of registers:
Fewer bits to specify which one
Less hardware
Faster access (shorter wires, fewer gates)
Faster context switch (when all registers need saving)
Advantages of a larger number:
Fewer loads and stores needed
Easier to do several operations at once
How Many Registers?
VAX: 16 registers. R15 is the program counter (PC). Elegant! Loading R15 is a jump instruction.
x86: 8 general-purpose registers (fine print: some restrictions apply), plus floating-point and special-purpose registers.
Most RISCs have 32 integer and 32 floating-point registers, plus some special-purpose ones. The PowerPC has 8 four-bit condition registers, a count register (to hold a loop index), and others.
Itanium has 128 fixed, 128 float, and 64 predicate registers.
Where Do Operands Reside?
Stack machine: Push loads memory into the first register (top of stack) and moves the other registers down; Pop does the reverse. Add combines the contents of the first two registers and moves the rest up.
Accumulator machine: only 1 register (called the accumulator). Instructions include store and acc = acc + mem.
Register-memory machine: arithmetic instructions can use data in registers and/or memory.
Load-store machine (aka register-register machine): arithmetic instructions can only use data in registers.
Load-Store Architectures
Can do:
add r1 = r2 + r3
load r3, M(address)
store r1, M(address)
Can't do:
add r1 = r2 + M(address)
This forces heavy dependence on registers, which is exactly what you want in today's CPUs.
- more instructions
+ fast implementation (e.g., easy pipelining)
Where Do Operands Reside?
VAX: register-memory. Very general: 0, 1, 2, or 3 operands can be in registers.
x86: register-memory ... but the floating-point registers are a stack, so it is not as general as VAX instructions.
RISC machines: always load-store machines.
I'm not aware of any accumulator machines in the last 20 years, but they may be used by embedded processors, and might conceivably be appropriate for the 141L project.
Comparing the Number of Instructions
Code sequence for C = A + B:

Stack     Accumulator   Register-Memory   Load-Store
Push A    Load A        Add C, A, B       Load R1,A
Push B    Add B                           Load R2,B
Add       Store C                         Add R3,R1,R2
Pop C                                     Store C,R3
Alternate ISAs
A = X*Y + X*Z
Stack | Accumulator | Reg-Mem | Load-store