Advanced Computer Architecture (CSL502)
Unit 1: Introduction To Parallel Processing
PARALLEL PROCESSING
Advanced Computer Architecture (CSL502)
•Evolution of Computer Systems
• Parallelism in Uniprocessor Systems
Evolution of Computer systems
First Generation - 1940-1956: Vacuum Tubes
•The first computers used vacuum tubes for circuitry and magnetic drums for memory, and were often enormous, taking up entire rooms. They were very expensive to operate, and in addition to using a great deal of electricity they generated a lot of heat, which was often the cause of malfunctions.
•First-generation computers relied on machine language to perform operations, and they could only solve one problem at a time. Machine languages are the only languages understood by computers.
Evolution of Computer systems
First Generation - 1940-1956: Vacuum Tubes
•The UNIVAC and ENIAC computers are examples of first-generation computing devices. The UNIVAC was the first commercial computer delivered to a business client, the U.S. Census Bureau, in 1951.
•ENIAC, an acronym for Electronic Numerical Integrator and Computer, was the world's first operational electronic digital computer, developed by Army Ordnance to compute World War II ballistic firing tables. The ENIAC, weighing 30 tons, using 200 kilowatts of electric power, and consisting of 18,000 vacuum tubes, 1,500 relays, and hundreds of thousands of resistors, capacitors, and inductors, was completed in 1945.
Evolution of Computer systems
First Generation - 1940-1956: Vacuum Tubes •In addition to ballistics, the ENIAC's field of application included weather prediction, atomic-energy calculations, cosmic-ray studies, thermal ignition, random-number studies, wind-tunnel design, and other scientific uses. The ENIAC soon became obsolete as the need arose for faster computing speeds.
Evolution of Computer systems
Second Generation - 1956-1963: Transistors
•Transistors replaced vacuum tubes and ushered in the second generation of computers. A transistor is a device composed of semiconductor material that amplifies a signal or opens or closes a circuit. Invented in 1947 at Bell Labs, transistors have become the key ingredient of all digital circuits, including computers. Today's latest microprocessors contain tens of millions of microscopic transistors.
•Though the transistor still generated a great deal of heat that subjected the computer to damage, it was a vast improvement over the vacuum tube. Second-generation computers still relied on punched cards for input and printouts for output.
Evolution of Computer systems
Second Generation - 1956-1963: Transistors
•Second-generation computers moved from cryptic binary machine language to symbolic, or assembly, languages, which allowed programmers to specify instructions in words. High-level programming languages were also being developed at this time, such as early versions of COBOL and FORTRAN.
•These were also the first computers that stored their instructions in their memory, which moved from a magnetic drum to magnetic core technology.
•The first computers of this generation were developed for the atomic energy industry.
Evolution of Computer systems
Third Generation - 1964-1971: Integrated Circuits
•The development of the integrated circuit was the hallmark of the third generation of computers. Transistors were miniaturized and placed on silicon chips, called semiconductors, which drastically increased the speed and efficiency of computers.
•A chip is a small piece of semiconducting material (usually silicon) on which an integrated circuit is embedded. A typical chip is less than 1/4 square inch and can contain millions of electronic components (transistors). Computers consist of many chips placed on electronic boards called printed circuit boards. There are different types of chips. For example, CPU chips (also called microprocessors) contain an entire processing unit, whereas memory chips contain blank memory.
Evolution of Computer systems
Fourth Generation - 1971-Present: Microprocessors
•The microprocessor brought the fourth generation of computers, as thousands of integrated circuits were built onto a single silicon chip: a silicon chip that contains a CPU. In the world of personal computers, the terms microprocessor and CPU are used interchangeably. At the heart of all personal computers and most workstations sits a microprocessor. Microprocessors also control the logic of almost all digital devices, from clock radios to fuel-injection systems for automobiles.
Evolution of Computer systems
Fourth Generation - 1971-Present: Microprocessors
•Three basic characteristics differentiate microprocessors:
Instruction Set: the set of instructions that the microprocessor can execute.
Bandwidth: the number of bits processed in a single instruction.
Clock Speed: given in megahertz (MHz), the clock speed determines how many instructions per second the processor can execute.
In both cases, the higher the value, the more powerful the CPU. For example, a 32-bit microprocessor that runs at 50 MHz is more powerful than a 16-bit microprocessor that runs at 25 MHz.
Evolution of Computer systems
Fourth Generation - 1971-Present: Microprocessors
•The Intel 4004 chip, developed in 1971, located all the components of the computer - from the central processing unit and memory to input/output controls - on a single chip.
•In 1981 IBM introduced its first computer for the home user, and in 1984 Apple introduced the Macintosh. Microprocessors also moved out of the realm of desktop computers and into many areas of life as more and more everyday products began to use microprocessors.
Evolution of Computer systems
Fifth Generation - Present and Beyond: Artificial Intelligence
•Fifth generation computing devices, based on artificial intelligence, are still in development, though there are some applications, such as voice recognition, that are being used today.
•Artificial intelligence is the branch of computer science concerned with making computers behave like humans. The term was coined in 1956 by John McCarthy at the Massachusetts Institute of Technology. Artificial intelligence includes:
Games Playing: programming computers to play games such as chess and checkers
Evolution of Computer systems
Fifth Generation - Present and Beyond: Artificial Intelligence
•Expert Systems: programming computers to make decisions in real-life situations (for example, some expert systems help doctors diagnose diseases based on symptoms)
•Natural Language: programming computers to understand natural human languages
•Neural Networks: systems that simulate intelligence by attempting to reproduce the types of physical connections that occur in animal brains
•Robotics: programming computers to see and hear and react to other sensory stimuli
Evolution of Computer systems Trends Towards Parallel Processing
From an operating system point of view, computer systems have improved chronologically in four phases:
•Batch processing
•Multiprogramming
•Time sharing
•Multiprocessing
Evolution of Computer systems Trends Towards Parallel Processing
Parallel processing can be challenged in four programmatic levels:
•Job or program level
•Task or procedure level
•Interinstruction level
•Intrainstruction level
The highest job level is conducted algorithmically, while the lowest intrainstruction level is implemented by hardware means; there is a trade-off between the two.
Developments in data communication technologies bridge the gap between distributed processing and parallel processing, so distributed processing can be viewed as a form of parallel processing in a special environment.
Parallelism In Uniprocessor Systems Basic Uniprocessor Architecture
Parallelism In Uniprocessor Systems Parallel Processing Mechanisms
•Multiplicity of functional units
•Parallelism and pipelining within the CPU
•Overlapped CPU and I/O operations
•Use of hierarchical memory system
•Balancing of subsystem bandwidths
•Multiprogramming and time sharing
Parallelism In Uniprocessor Systems Parallel Processing Mechanisms
Multiplicity of functional units
Parallelism In Uniprocessor Systems Parallel Processing Mechanisms
Parallelism and pipelining within the CPU
•The ALU contains parallel adders using carry-lookahead and carry-save techniques
•For multiplication and division, high-speed multiplier recoding and convergence division techniques are used to exploit parallelism and resource sharing
•Instruction pipelining is used
Parallelism In Uniprocessor Systems Parallel Processing Mechanisms
Use of hierarchical memory system
Parallelism In Uniprocessor Systems Parallel Processing Mechanisms
Balancing of subsystem bandwidth
•In general, td > tm > tp, where td, tm, and tp are the access/cycle times of the disk, the main memory, and the processor, respectively.
•The BANDWIDTH of a system is defined as the number of operations performed per unit time.
•In the case of memory, if W is the number of words delivered per memory cycle tm, then the memory bandwidth is
Bm = W / tm (words/s or bytes/s)
•Memory access conflicts may cause delayed access of some of the processor requests. In practice, the utilized memory bandwidth Bum is approximately
Bum = Bm / √M
where M is the number of interleaved modules in the memory system.
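As a quick numerical sketch of these two formulas (the word count, cycle time, and module count below are invented values, not figures from the slides):

```python
from math import sqrt

# Nominal memory bandwidth: Bm = W / tm
W = 4            # words delivered per memory cycle (assumed)
tm = 100e-9      # memory cycle time: 100 ns (assumed)
Bm = W / tm      # words per second

# Utilized bandwidth with M interleaved modules: Bum = Bm / sqrt(M)
M = 16           # number of interleaved memory modules (assumed)
Bum = Bm / sqrt(M)

print(f"Bm  = {Bm:.1e} words/s")   # 4.0e+07
print(f"Bum = {Bum:.1e} words/s")  # 1.0e+07
```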
Parallelism In Uniprocessor Systems Parallel Processing Mechanisms
Balancing of subsystem bandwidth
Parallelism In Uniprocessor Systems Parallel Processing Mechanisms
Multiprogramming and Time Sharing
•In the BATCH PROCESSING approach, once the CPU is allocated to a program it remains allocated, whether the I/O-bound or the CPU-bound part is being run.
•In MULTIPROGRAMMING, when a program's CPU-bound part is over and its I/O-bound part is about to begin, the CPU is taken back from that program and allocated to another program whose CPU-bound part is ready.
•In TIME SHARING, an equal time slot is given to all programs for execution, in round-robin fashion.
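The round-robin discipline behind time sharing is easy to sketch; the program names and CPU demands below are invented for illustration:

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate equal time slots in round-robin fashion; return the dispatch order."""
    queue = deque(bursts.items())      # (program, remaining CPU time)
    order = []
    while queue:
        name, remaining = queue.popleft()
        order.append(name)             # program runs for one time slot
        remaining -= quantum
        if remaining > 0:              # unfinished: rejoin the end of the queue
            queue.append((name, remaining))
    return order

print(round_robin({"P1": 3, "P2": 5, "P3": 2}, quantum=2))
# ['P1', 'P2', 'P3', 'P1', 'P2', 'P2']
```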
Parallelism In Uniprocessor Systems Parallel Processing Mechanisms
Multiprogramming and Time Sharing
Advanced Computer Architecture (CSL502)
Unit 1: Introduction To Parallel Processing
Advanced Computer Architecture (CSL502)
•Parallel Computer Structures
• Architectural Classification Schemes
Parallel Computer Structures
We can divide parallel computers into three architectural configurations:
Pipeline computers
Array processors
Multiprocessor systems
Parallel Computer Structures Pipeline computers
Instruction execution in a digital computer can be divided into four major steps:
•IF (Instruction Fetch): fetch the instruction from main memory
•ID (Instruction Decode): identify the operation to be performed
•OF (Operand Fetch): access the data (if any) on which the operation is to be performed
•EX (Execution): execute the instruction on the data
In a nonpipelined computer, these four steps must be completed before the next instruction can be issued. In a pipelined computer, successive instructions are executed in overlapped fashion.
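A small sketch makes the overlap visible; it assumes one clock per stage and no stalls, and simply prints when each instruction occupies each step:

```python
STAGES = ["IF", "ID", "OF", "EX"]

def pipeline_schedule(n):
    """Print the overlapped schedule: instruction i enters stage s at clock i + s."""
    k = len(STAGES)
    for i in range(n):
        print(f"I{i+1}: " + "   " * i + " ".join(STAGES))
    print(f"pipelined: {k + n - 1} clocks, nonpipelined: {k * n} clocks")

pipeline_schedule(4)
# I1: IF ID OF EX
# I2:    IF ID OF EX
# ...
# pipelined: 7 clocks, nonpipelined: 16 clocks
```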
Parallel Computer Structures Pipeline computers
Parallel Computer Structures Pipeline computers
An instruction cycle consists of multiple pipeline cycles. In a pipeline, the operation of all stages is synchronized under a common clock, and interface latches are used between adjacent segments to hold the intermediate results.
Theoretically, a k-stage pipeline processor could be at most k times faster than a nonpipelined processor. However, due to memory conflicts, data dependency, branches, and interrupts, this ideal speedup may not be achieved.
For some CPU-bound instructions, the execution phase can be further partitioned into a multiple-stage arithmetic logic pipeline, as for floating-point operations.
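The "at most k times faster" bound follows from the standard timing argument; as a sketch (τ is the clock period of one stage, n the number of instructions, no stalls assumed):

```latex
T_{\mathrm{nonpipe}} = n\,k\,\tau, \qquad
T_{\mathrm{pipe}} = (k + n - 1)\,\tau, \qquad
S_k = \frac{T_{\mathrm{nonpipe}}}{T_{\mathrm{pipe}}}
    = \frac{n\,k}{k + n - 1} \longrightarrow k \quad (n \to \infty)
```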
Parallel Computer Structures Pipeline computers
Some main issues in pipeline design:
Job sequencing
Collision prevention
Congestion control
Branch handling
Reconfiguration
Hazard resolution
Pipeline computers are suitable for VECTOR PROCESSING
Parallel Computer Structures Array Computers
An array processor is a synchronous parallel computer with multiple ALUs, called processing elements (PEs), that can operate in parallel
A data-routing mechanism is used among the PEs
Scalar and control-type instructions are directly executed in the control unit (CU)
Each PE consists of an ALU with registers and a local memory
PEs are passive devices without instruction-decoding capabilities
Parallel Computer Structures Array Computers
Parallel Computer Structures Multiprocessor Systems
Multiprocessor systems are used to improve:
Throughput
Reliability
Flexibility
Availability
A multiprocessor system contains two or more processors. All processors share access to common sets of memory modules, I/O channels, and peripheral devices, and a single integrated operating system governs everything.
Multiprocessor hardware organization is determined primarily by the interconnection structure used between the memories and the processors. Three different interconnections have been used:
oTime-shared common bus
oCrossbar switch network
oMultiport memories
Parallel Computer Structures Multiprocessor Systems
Parallel Computer Structures Performance of Parallel Computers
The theoretical speedup achieved by n identical parallel processors is at most n times that of a single processor.
In practice, this speedup is not achieved, due to:
Memory and communication-path conflicts
Inefficient algorithms, etc.
Parallel Computer Structures Performance of Parallel Computers
Parallel Computer Structures Data Flow Computer
The conventional von Neumann machines are called control flow computers: a program counter controls the execution of the program.
To exploit maximum parallelism in a program, data flow computers were suggested. The basic concept is to enable the execution of an instruction whenever its required operands become available.
Programs for data-driven computations can be represented by data flow graphs.
Next slide shows Data Flow Graph for z=(x+y)*2
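Data-driven execution can be sketched directly from that graph: each node fires as soon as all of its operand tokens are present, with no program counter sequencing the steps. (The node layout and sample values below are invented.)

```python
import operator

# Data flow graph for z = (x + y) * 2: each node lists its operation and inputs.
nodes = {
    "add": {"op": operator.add, "inputs": ("x", "y"), "output": "t"},
    "mul": {"op": operator.mul, "inputs": ("t", "two"), "output": "z"},
}
tokens = {"x": 3, "y": 4, "two": 2}   # initial operand tokens (sample values)

fired = set()
while len(fired) < len(nodes):
    for name, node in nodes.items():
        # A node is enabled when every one of its input tokens has arrived.
        if name not in fired and all(i in tokens for i in node["inputs"]):
            a, b = (tokens[i] for i in node["inputs"])
            tokens[node["output"]] = node["op"](a, b)
            fired.add(name)

print(tokens["z"])   # (3 + 4) * 2 = 14
```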
Parallel Computer Structures Data Flow Computer
Parallel Computer Structures Data Flow Computer
Parallel Computer Structures Data Flow Computer
The basic mechanism for the execution of a data flow program:
Each instruction in a data flow computer is implemented as a template
Activity templates are stored in the activity store, and each activity template has a unique address
An activity template's address is entered in the instruction queue when the instruction is ready to execute
Instruction fetch and data access are handled by the fetch and update units
The operation unit performs the required operation
Architectural Classification Schemes Multiplicity of Instruction – Data Streams
It was introduced by Michael J. Flynn
According to it, computer organization is characterized by the multiplicity of the hardware provided to service the instruction and data streams. There are four categories:
Single instruction stream - single data stream (SISD)
Single instruction stream - multiple data stream (SIMD)
Multiple instruction stream - single data stream (MISD)
Multiple instruction stream - multiple data stream (MIMD)
Architectural Classification Schemes Multiplicity of Instruction – Data Streams
Architectural Classification Schemes Serial versus Parallel Processing
It was given by Feng
It uses the degree of parallelism to classify various computer architectures.
The maximum number of bits processed by a computer in unit time is called the "Maximum Parallelism Degree" P.
Let Pi be the number of bits processed in the ith processor cycle, and consider T processor cycles indexed by i = 1, 2, …, T. The average parallelism degree is
Pa = (Σ i=1..T Pi) / T
Architectural Classification Schemes Serial versus Parallel Processing
In general, Pi ≤ P.
The utilization rate of a computer system within T cycles is μ = Pa / P.
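A numeric sketch of both definitions, using an invented trace of bits processed per cycle:

```python
P  = 32                          # maximum parallelism degree (assumed machine width)
Pi = [32, 16, 32, 8, 32, 24]     # bits processed in each of T cycles (invented)
T  = len(Pi)

Pa = sum(Pi) / T                 # average parallelism degree
mu = Pa / P                      # utilization rate within T cycles

print(Pa, mu)                    # 24.0 0.75
```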
Architectural Classification Schemes Serial versus Parallel Processing
The maximum parallelism degree P(C) of a given computer C is represented by the product of the word length w and the bit-slice length m:
P(C) = w . m
There are four types of processing methods:
Word-serial and bit-serial (WSBS)
Word-parallel and bit-serial (WPBS)
Word-serial and bit-parallel (WSBP)
Word-parallel and bit-parallel (WPBP)
Architectural Classification Schemes Serial versus Parallel Processing
Architectural Classification Schemes Parallelism versus Pipelining
It was proposed by Handler. It is based on the degree of parallelism in the processor control unit, the ALU, and the bit-level circuit.
Advanced Computer Architecture (CSL502)
Unit 1: Introduction To Parallel Processing
Advanced Computer Architecture (CSL502)
Parallel Processing Applications
Introduction To Parallel Processing Parallel Processing Applications
Fast and efficient computing is highly demanded in many areas, such as:
Scientific
Engineering
Energy resource
Medical
Military
Artificial intelligence
Basic research areas
Introduction To Parallel Processing Parallel Processing Applications
Large-scale scientific problem solving involves
three interactive disciplines:
Theories
Experiments
Computations
Introduction To Parallel Processing Parallel Processing Applications
Introduction To Parallel Processing Parallel Processing Applications
Computer simulation has several advantages:
Computer simulations are far cheaper and faster than physical experiments
Computers can solve a wider range of problems than scientific laboratory equipment can
Computational approaches are limited only by computer speed and memory capacity, while physical experiments have many practical constraints
Introduction To Parallel Processing Parallel Processing Applications
We can divide parallel processing applications into FOUR categories:
Predictive Modeling and Simulations
Engineering Design and Automation
Energy Resources Exploration
Medical, Military, and Basic Research
Introduction To Parallel Processing Parallel Processing Applications
Predictive Modeling and Simulations
World scientists are concerned with multidimensional modeling of the atmosphere, the earth environment, outer space, and the world economy.
Predictive modeling is done through extensive computer simulation experiments, which need computing speeds of 1000 million megaflops or above.
FLOPS = Floating-Point Operations Per Second
Introduction To Parallel Processing Parallel Processing Applications
Predictive modeling and simulations are required in the following areas:
Numerical weather forecasting
Oceanography and astrophysics
Socioeconomics and government use
Introduction To Parallel Processing Parallel Processing Applications
Numerical weather forecasting
Computations are carried out on a three-dimensional grid that partitions the atmosphere vertically into K levels and horizontally into M intervals of longitude and N intervals of latitude.
Using a 270-mile grid (about the distance between New York and Washington, D.C.), a 24-hour forecast would need to perform about 100 billion data operations.
A 100-megaflops computer needs about 100 minutes for this computation.
Introduction To Parallel Processing Parallel Processing Applications
Introduction To Parallel Processing Parallel Processing Applications
Oceanography and astrophysics
To do a complete simulation of the Pacific Ocean with adequate resolution (1° grid) for 50 years would take 1000 hours on a Cyber-205 computer.
The formation of the earth from planetesimals in the solar system can be simulated with a high-speed computer.
The dynamic range of astrophysics studies may span from billions of years down to milliseconds.
Interesting problems include the physics of supernovae and the dynamics of galaxies; the Illiac-IV array processor was used for such studies.
Introduction To Parallel Processing Parallel Processing Applications
Oceanography and astrophysics
Since oceans exchange heat with the atmosphere, a good understanding of the oceans would help in the following areas:
Climate predictive analysis
Fishery management
Ocean resource exploration
Coastal dynamics and tides
Oceanography studies use a grid size on a smaller scale, and a time variability on a larger scale, than those used for atmospheric studies.
Introduction To Parallel Processing Parallel Processing Applications
Socioeconomics and government use
Nobel laureate W. W. Leontief (1980) proposed an input-output model of the world economy which performs large-scale matrix operations on a CDC scientific computer. This United Nations-supported world economic simulation suggests how a system of international economic relations featuring partial disarmament could narrow the gap between the rich and the poor.
In the US, the FBI uses large computers for crime control, and the IRS uses a large number of fast mainframes for tax collection and auditing.
Introduction To Parallel Processing Parallel Processing Applications
Engineering Design and Automation
Some of the areas where fast computers are used:
Finite-element analysis
Computational aerodynamics
Artificial intelligence and automation
Remote sensing applications
Introduction To Parallel Processing Parallel Processing Applications
Energy Resources Exploration
Seismic exploration: used in finding oil
Reservoir modeling: modeling of oil fields
Plasma fusion power: used in nuclear fusion research
Nuclear reactor safety
Introduction To Parallel Processing Parallel Processing Applications
Medical, Military, and Basic Research
Computer-assisted tomography: the human body can be modeled by CAT scanning
Genetic engineering: biological systems can be simulated on supercomputers. A highly pipelined machine, called the Cytocomputer, has been developed at the Environmental Research Institute of Michigan for biomedical image processing.
Weapons research and defense
Advanced Computer Architecture (CSL502)
Unit 1: Memory and Input-Output Subsystem
Advanced Computer Architecture (CSL502)
Hierarchical Memory Structure
Virtual Memory System
Memory Allocation and Management
Cache Memories and Management
Input-Output Subsystems
Hierarchical Memory Structure Memory Hierarchy
The design objective of a hierarchical memory, in both a parallel processing system and a multiprogrammed uniprocessor system, is to attempt to match the processor speed with the rate of information transfer (the bandwidth) of the memory at the lowest level, at a reasonable cost.
In multiprocessor systems, concurrent memory requests frequently arrive at memory at the same level of the hierarchy. If two or more requests are directed to the same section of memory at the same level, a conflict is said to occur, which can degrade the performance of the system.
To avoid conflicts, the memory at each level is partitioned into several modules so that some degree of concurrent access can be achieved.
Hierarchical Memory Structure Memory Hierarchy
Memories in the hierarchy can be classified on the basis of:
Accessing method
 Random Access Memory (RAM): the access time ta of a memory word is independent of its location
 Sequential Access Memory (SAM): information is accessed serially
 Direct Access Storage Device (DASD): rotational devices made of magnetic materials, where any block of information can be accessed directly
Speed or access time: in a memory hierarchy, the highest level has the fastest memory and the lowest level the slowest
 Primary: example is RAM
 Secondary: example is DASD
Hierarchical Memory Structure Memory Hierarchy
Hierarchical Memory Structure Memory Hierarchy
CCD = Charge-Coupled Devices
Hierarchical Memory Structure Memory Hierarchy Example- Three level memory hierarchy
Hierarchical Memory Structure Memory Hierarchy
The processor usually references an item in memory by providing the address of that item.
A memory hierarchy is usually organized so that the address space in level i is a subset of that in level i+1. Address Ak in level i is not necessarily address Ak in level i+1, but any information in level i may also exist in level i+1, and some of the information in level i may be more up to date than that in level i+1.
These different copies of the same data create a DATA CONSISTENCY or COHERENCE problem between adjacent levels.
The data consistency problem may also exist between the local memories or caches when two cooperating processes, executing concurrently or on separate processors, interact via one or more shared variables.
Hierarchical Memory Structure Memory Hierarchy
(figure: two processors P1 and P2 sharing a copy of a variable X = 1, illustrating the consistency problem)
In modeling the performance of a hierarchical memory, the HIT RATIO (H) is used: the probability of finding the requested information in the memory of a given level.
H depends upon the granularity of information transfer, the capacity of the memory, the management strategy, and other factors.
The hit ratio/success function may be written as H(s), where s = memory size. The miss ratio is F(s) = 1 - H(s).
The access frequency at level i, the relative number of successful accesses to level i, is
hi = H(si) - H(si-1)
Hierarchical Memory Structure Memory Hierarchy Optimization of Memory Hierarchy
The design of an n-level memory hierarchy is a trade-off between performance and cost. Performance depends on:
Program behavior with respect to memory references
The access time and memory size of each level
The granularity (block size) of information transferred and the management policy
The design of the processor-memory interconnection network
Two performance measures: effective memory access time and utilization of the processor

The effective access time Ti from the processor to the ith level of the memory hierarchy is the sum of the individual average access times tk of each level from k = 1 to i:
Ti = Σ (k=1 to i) tk
The effective access time for each memory reference in an n-level memory hierarchy is
T = Σ (i=1 to n) hi . Ti
which is
T = Σ (i=1 to n) [H(sn) - H(si-1)] . ti

All data are available at level n, thus H(sn) = 1, which gives
T = Σ (i=1 to n) [1 - H(si-1)] . ti = Σ (i=1 to n) F(si-1) . ti
The total cost of the memory system is
C = Σ (i=1 to n) c(ti) . si
where c(ti) is the cost per byte of memory at level i and si is the size at level i.
A typical memory-hierarchy design problem involves
min T = Σ (i=1 to n) F(si-1) . ti
subject to the constraint C <= C0, where si > 0 and ti > 0, for i = 1, 2, 3, …, n.
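Evaluating T and C for a toy three-level hierarchy (all hit ratios, times, sizes, and per-byte costs below are invented numbers, chosen only to exercise the formulas):

```python
# Levels 1..n (e.g., cache, main memory, disk); H(s_n) = 1 by definition.
t = [10e-9, 100e-9, 5e-3]    # access time t_i of each level (assumed)
H = [0.95, 0.999, 1.0]       # cumulative success function H(s_i) (assumed)
c = [1e-6, 1e-8, 1e-11]      # cost per byte c(t_i) at each level (assumed)
s = [64e3, 8e6, 1e9]         # size s_i of each level in bytes (assumed)

# T = sum over i of [1 - H(s_{i-1})] * t_i, with H(s_0) = 0.
H_prev = [0.0] + H[:-1]
T = sum((1 - hp) * ti for hp, ti in zip(H_prev, t))

# C = sum over i of c(t_i) * s_i.
C = sum(ci * si for ci, si in zip(c, s))

print(f"effective access time T = {T:.3e} s")   # 5.015e-06 s
print(f"total memory cost     C = {C:.3f}")     # 0.154
```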
Hierarchical Memory Structure Addressing Schemes for Main Memory
Main memory is partitioned into several independent modules and the addresses are distributed across these modules. This scheme is called interleaving; the interleaving of addresses among M modules is called M-way interleaving.
There are two methods of interleaving:
The high-order m bits select the module, while the remaining n-m bits select the address within the module
The low-order m bits select the module, while the remaining n-m bits select the address within the module
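The difference between the two methods is just which bits pick the module; a sketch with an assumed n = 8-bit address and M = 2^m = 4 modules:

```python
n, m = 8, 2                 # n-bit addresses, M = 2**m modules (assumed sizes)
M = 2 ** m

def low_order(addr):
    """Low-order m bits select the module: consecutive addresses spread out."""
    return addr % M, addr >> m               # (module, address within module)

def high_order(addr):
    """High-order m bits select the module: consecutive addresses stay together."""
    return addr >> (n - m), addr % (1 << (n - m))

for a in (0, 1, 2, 3):
    print(a, low_order(a), high_order(a))
# low-order puts 0,1,2,3 in modules 0,1,2,3; high-order keeps them all in module 0
```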
Hierarchical Memory Structure Addressing Schemes for Main Memory
(figures: high-order and low-order interleaved addressing)
Virtual Memory System The Concept of Virtual Memory
Using the virtual memory (VM) concept, a program whose size is larger than the available free memory space can be executed.
In the VM concept, a program is divided into pages (equal-sized parts) and loaded into memory one by one, as demanded by the CPU.
Memory management is required in the following phases:
Program structure and design
The compiler assigns names while translating the program modules from a programming language into modules of machine code with unique identifiers
A linker then combines these modules of unique identifiers
The composite is translated by a loader into main memory locations
The set of unique identifiers defines the virtual space or name space
The set of main memory locations allocated to the program defines the physical memory space
The last phase is dynamic memory management, required during the execution of the program
Virtual Memory System The Concept of Virtual Memory
Let the name space Vj generated by the jth program running on a processor consist of a set of n unique identifiers:
Vj = {0, 1, …, n-1}
Let the memory space allocated to the program in execution have m locations:
M = {0, 1, …, m-1}
Since the allocated memory space may vary with program execution, m is a function of time. At any time t and for each referenced name x ϵ Vj there is an address map
fj(t) : Vj -> M U {Ф}
The function fj(t) is defined by
fj(x, t) = y, if at time t item x is in location y
fj(x, t) = Ф, if at time t item x is missing from M
When an item is missing, a fault handler takes the following actions:
A placement policy selects a location in memory where the fetched item will be placed
If memory is full, a replacement policy selects item(s) to remove
A fetch policy decides when an item is to be fetched from the lower memory level
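A sketch of the address map and the three policies as plain functions; FIFO replacement and demand fetch are assumed here purely for illustration:

```python
from collections import deque

M_SIZE = 3                  # number of memory locations m (assumed)
memory = {}                 # address map: item x -> location y
load_order = deque()        # FIFO queue used by the replacement policy

def f(x):
    """Address map f_j: location of x, or None (the slides' phi) if x is missing."""
    return memory.get(x)

def fault_handler(x):
    if len(memory) >= M_SIZE:          # replacement policy: evict the FIFO victim
        y = memory.pop(load_order.popleft())
    else:                              # placement policy: use the next free location
        y = len(memory)
    memory[x] = y                      # fetch policy: demand fetch on the miss
    load_order.append(x)
    return y

for x in (0, 1, 2, 0, 3):
    y = f(x)
    print(f"item {x} -> location {y if y is not None else fault_handler(x)}")
```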
Virtual Memory System The Concept of Virtual Memory
Program locality: due to the looping, sequential, and block-formatted control structures inherent in the grouping of instructions and data in a program, the CPU's reference-generation pattern is predictable. This property is called locality of reference.
There are three types of locality:
Temporal: there is a tendency for a process to reference in the near future the elements of the reference string referenced in the recent past. This is due to loops, temporary variables, or process blocks.
Spatial: there is a tendency for a process to make references to a portion of the virtual address space in the neighborhood of the last reference.
Working set (W): if we consider a hypothetical time window ∆ which moves across the virtual time axis, it can be seen that only a subset of the virtual address space is needed during each time interval of the history of a process. The subset of the virtual space referenced during the interval (t, t + ∆) is called the working set W(t, ∆). ∆ is a critical parameter for optimizing the working set of the process.
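A sketch computing W(t, ∆) over a page reference string (both the string and the window are invented):

```python
def working_set(refs, t, delta):
    """Set of pages referenced during the virtual-time interval (t, t + delta]."""
    return set(refs[t:t + delta])

refs = [1, 2, 1, 3, 2, 2, 4, 1, 4, 4]    # invented page reference string
print(working_set(refs, t=0, delta=4))   # {1, 2, 3}
print(working_set(refs, t=5, delta=4))   # {1, 2, 4}
```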
Virtual Memory System The Concept of Virtual Memory
Virtual Memory System The Concept of Virtual Memory
Program relocation: during program execution, the processor generates logical addresses, which are mapped into physical addresses in main memory.
Address mapping done when the program is initially loaded is called static relocation; address mapping done during execution is called dynamic relocation.
Static relocation makes it difficult for processes to share information which is modifiable during execution.
One technique for dynamic relocation is the use of a relocation (base) register. A program may be loaded initially using static relocation, after which it may be displaced within memory and the contents of the relocation register adjusted to reflect the displacement. Two or more processes may share a program by using different relocation registers.
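A minimal sketch of relocation-register translation (base, limit, and addresses are invented):

```python
class RelocationRegister:
    """Dynamic relocation: physical = base + logical, with a bounds check."""
    def __init__(self, base, limit):
        self.base, self.limit = base, limit

    def translate(self, logical):
        if not 0 <= logical < self.limit:
            raise MemoryError(f"logical address {logical:#x} out of bounds")
        return self.base + logical

# Two processes sharing one program, each with its own register contents.
p1 = RelocationRegister(base=0x4000, limit=0x1000)
p2 = RelocationRegister(base=0x9000, limit=0x1000)
print(hex(p1.translate(0x10)), hex(p2.translate(0x10)))   # 0x4010 0x9010
```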
Virtual Memory System Paged Memory System
DIRECT MAPPING
This method needs two memory accesses per reference, and is therefore slow.
C = changed bit
Virtual Memory System Paged Memory System
In this scheme, the virtual space is partitioned into pages and the memory is partitioned into frames.
Each virtual address contains a virtual page number ip (the mapped part) and a displacement iw (the unmapped part).
The address map consists of a page table (PT), which contains the base address of the frame in memory, if one exists.
The simplest page table contains one entry for each possible virtual page.
There is one page table per process, created in main memory at the initiation of the process.
The PTBR (page table base register) in each processor contains the base address of the page table of the process currently running on that processor.
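The translation path can be sketched as follows; the 10-bit displacement, table contents, and addresses are invented, and a real page table entry would also carry control bits:

```python
PAGE_BITS = 10                        # 1K pages: iw is the low 10 bits (assumed)
page_table = {0: 5, 1: 9, 2: 3}       # ip -> page frame address (invented entries)

def translate(va):
    ip = va >> PAGE_BITS              # virtual page number (mapped)
    iw = va & ((1 << PAGE_BITS) - 1)  # displacement (unmapped)
    if ip not in page_table:
        raise LookupError(f"page fault on virtual page {ip}")
    return (page_table[ip] << PAGE_BITS) | iw   # frame base + displacement

print(hex(translate(0x0412)))         # ip=1 -> frame 9, so 0x2400 | 0x12 = 0x2412
```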
Virtual Memory System Paged Memory System
Technique of maintaining multiple virtual address spaces
In a multiprogrammed processor, each page map entry contains a virtual page number (ip), a process identification, the access rights (RWX), a modified bit (C), and the page frame address (PFA) in shared memory.
The process identification of the currently running process is held in the current process register (CPR) of the processor.
In this scheme, the virtual page number of the virtual address (VA) is associatively compared with all page map entries (PMEs) that have the same process identification as the currently running process. If there is a match, the page frame number is retrieved and the displacement is concatenated to it to form the physical address. If there is no match, a page fault interrupt occurs, which locates the page.
Virtual Memory System Paged Memory System
Problems with a pure paged memory system
It is inefficient if the virtual space is large. For example, with a 32-bit VA and a 1K page size, the page address is 22 bits, which needs 2^22 page table entries. Assuming an 8 MB main memory, there are 2^23 / 2^10 = 2^13 page frames, so each PTE has a 13-bit page frame field.
There is no mechanism for a reasonable implementation of sharing.
Internal fragmentation: the last page may have unused space.
Table fragmentation: main memory is occupied by page tables, and that space is unavailable for virtual pages.
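The arithmetic in that example can be checked directly (a sketch):

```python
va_bits, page_bits = 32, 10               # 32-bit VA, 1K (2**10-byte) pages
entries  = 2 ** (va_bits - page_bits)     # 2**22 page table entries
frames   = (8 * 2**20) // 2**page_bits    # 8 MB / 1K = 2**13 page frames
pfa_bits = frames.bit_length() - 1        # 13-bit page frame field

print(entries, frames, pfa_bits)          # 4194304 8192 13
```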
Virtual Memory System Paged Memory System Example- Address & page table entry formats of VAX-11/780 virtual mem.(VM)
Virtual Memory System Paged Memory System Example-Partition of virtual address space of VAX-11/780 virtual mem.(VM)
Virtual Memory System Paged Memory System Example-Region addressing scheme of VAX-11/780
P0: program region page table
P1: control region page table
Virtual Memory System Paged Memory System
Example- VAX-11/780
Virtual Memory System Segmented Memory System
Block-structured HLL programs (e.g., C) have a high degree of modularity. Modules are compiled to produce machine code in a logical space, which is further loaded, linked, and executed.
A set of logically related contiguous data elements is called a segment.
Segments are allowed to grow and shrink almost arbitrarily, unlike pages.
Segmentation is a technique for managing virtual space allocation, whereas paging is a concept for managing the physical space.
An element in a segment is referenced by a segment name-element name pair (<s>, [i]).
During program execution, the segment name <s> is translated into a segment address by the OS, and the element name [i] is a displacement within the segment.
A program consists of a set of linked segments, where the links are created as a result of procedure segment calls within the program segments.
Virtual Memory System Segmented Memory System Example- Segmentation was used in Burroughs B5500
Each process has a segment table (ST) pointed to by the STBR. The address field contains the base address of the segment in main memory.
Virtual Memory System Segmented Memory System
When a segment <s> is initially referenced in a process, its segment number is not yet established; in this case an entry must be created in the ST.
A global table, the active segment table (AST), is searched to determine whether the segment is active in memory. If it is, the base address and its attributes are returned from the AST, and an entry is made in the AST to indicate that the process is using the segment. If the entry is not present in the AST, a file directory search is initiated and appropriate entries are made in the AST and the ST.
A known segment table is associated with each process; it contains entries for the set of segments known to the process.
Virtual Memory System Paged Segmentation Memory System
Advanced Computer Architecture (CSL502)
Unit 1: Memory and Input-Output Subsystem
Advanced Computer Architecture (CSL502)
Hierarchical Memory Structure
Virtual Memory System
Memory Allocation and Management
Cache Memories and Management
Input-Output Subsystems
Memory Allocation & Management Classification of Memory Policies
Two policies, fixed and variable partitioning, are used for allocating memory pages to active processes.
If the resident set size zi(t) is fixed for all t during which process Pi is active, then the size vector Z(t) is constant during any interval in which the set of active processes is fixed; this is called fixed partitioning.
In variable partitioning, the partition vector Z(t) varies with time.
The advantage of fixed partitioning is the low overhead of implementation, but memory utilization is degraded.
Besides the fixed and variable partitioning strategies, a memory policy can be global or local. A local policy involves only the resident set of the faulting process; a global policy considers the history of the resident sets of all active processes in making a decision.
Memory Allocation & Management Classification of Memory Policies
When a page fault occurs, one of two memory-fetch policies is used in fetching the pages of a process: demand prefetching and demand fetching. In demand prefetching, a number of pages, including the faulting page, are fetched in anticipation of the process's future requirements. In demand fetching, only the page referenced is fetched on a miss.
The ith process's behavior is described in terms of its reference string, which is a sequence
Ri(T) = ri(1) ri(2) … ri(T)
where ri(k) is the number of the page containing the virtual address referenced by process Pi at time k, and k = 1, 2, …, T measures the execution (virtual) time.
Memory Allocation & Management Optimal Load Control
In a multiprogramming environment, memory is used dynamically.
The number of active processes (the degree of multiprogramming) in a parallel processor system is usually greater than the number of processors, so that switching among processes can be done.
This capability requires that the memory be able to hold the pages of the active processes, in order to reduce context-switching time.
Multiprogramming improves concurrency in the use of all system resources, but the degree of multiprogramming should be varied dynamically to maintain both a low overhead on the system and a high degree of concurrency.
Memory Allocation & Management Optimal Load Control
A multiprogrammed multiprocessing virtual memory system model: the network has two portions:
(a) Active network: contains the processors, the memory, and the file memory
(b) Passive network: contains the process queue and the policies for admitting new processes to active status
A process is active if it is in the active network, where it is eligible to receive processing and to have pages in main memory. Each active process is either waiting or in service.
Memory Allocation & Management Optimal Load Control
Memory Allocation & Management Memory Management Policies
Memory Allocation & Management Cache Memories and Management
Characteristics of cache
It consists of two parts: the cache directory and the RAM.
The memory portion is partitioned into a number of equal-sized blocks called block frames.
The directory is implemented as some form of associative memory; it consists of block address tags and some control bits, such as a "dirty" bit, a "valid" bit, and protection bits.
The address tags contain the block addresses of the blocks that are currently in the cache memory.
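A sketch of a directory lookup with those control bits, using a fully associative directory; the block size and write handling are invented for illustration:

```python
BLOCK_BITS = 4                        # 16-byte block frames (assumed)

class CacheLine:
    def __init__(self):
        self.valid, self.dirty = True, False

directory = {}                        # associative directory: address tag -> line

def access(addr, write=False):
    tag = addr >> BLOCK_BITS          # block address tag
    line = directory.get(tag)
    if line and line.valid:           # hit: the block is currently in the cache
        line.dirty |= write
        return "hit"
    directory[tag] = CacheLine()      # miss: fetch the block, create an entry
    directory[tag].dirty = write
    return "miss"

print(access(0x100), access(0x104), access(0x104, write=True))
# miss hit hit   (0x100 and 0x104 fall in the same 16-byte block)
```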
Memory Allocation & Management Cache Memories and Management
Simplified flowchart of cache operation for fetch