
Advanced Computer Architecture (CSL502)

Unit 1: Introduction To Parallel Processing

1

PARALLEL PROCESSING

2

Advanced Computer Architecture (CSL502)

•Evolution of Computer Systems

• Parallelism in Uniprocessor Systems

Evolution of Computer systems

First Generation - 1940-1956: Vacuum Tubes
•The first computers used vacuum tubes for circuitry and magnetic drums for memory. They were often enormous, taking up entire rooms, were very expensive to operate, and, in addition to using a great deal of electricity, generated a lot of heat, which was often the cause of malfunctions.
•First-generation computers relied on machine language to perform operations, and they could only solve one problem at a time. Machine languages are the only languages understood directly by computers.

Evolution of Computer systems

First Generation - 1940-1956: Vacuum Tubes
•The UNIVAC and ENIAC computers are examples of first-generation computing devices. The UNIVAC was the first commercially produced computer; it was delivered to the U.S. Census Bureau in 1951.
•ENIAC is an acronym for Electronic Numerical Integrator And Computer, the world's first operational electronic digital computer, developed by Army Ordnance to compute World War II ballistic firing tables. The ENIAC, weighing 30 tons, using 200 kilowatts of electric power, and consisting of 18,000 vacuum tubes, 1,500 relays, and hundreds of thousands of resistors, capacitors, and inductors, was completed in 1945.

Evolution of Computer systems

First Generation - 1940-1956: Vacuum Tubes •In addition to ballistics, the ENIAC's field of application included weather prediction, atomic-energy calculations, cosmic-ray studies, thermal ignition, random-number studies, wind-tunnel design, and other scientific uses. The ENIAC soon became obsolete as the need arose for faster computing speeds.

5

Evolution of Computer systems

Second Generation - 1956-1963: Transistors
•Transistors replaced vacuum tubes and ushered in the second generation of computers. A transistor is a device composed of semiconductor material that amplifies a signal or opens or closes a circuit. Invented in 1947 at Bell Labs, transistors became the key ingredient of all digital circuits, including computers. Today's microprocessors contain tens of millions of microscopic transistors.
•Though the transistor still generated a great deal of heat that subjected the computer to damage, it was a vast improvement over the vacuum tube. Second-generation computers still relied on punched cards for input and printouts for output.

Evolution of Computer systems Second Generation - 1956-1963: Transistors
•Second-generation computers moved from cryptic binary machine language to symbolic, or assembly, languages, which allowed programmers to specify instructions in words. High-level programming languages were also being developed at this time, such as early versions of COBOL and FORTRAN.
•These were also the first computers that stored their instructions in memory, which moved from magnetic drum to magnetic core technology.
•The first computers of this generation were developed for the atomic energy industry.

7

Evolution of Computer systems Third Generation - 1964-1971: Integrated Circuits
•The development of the integrated circuit was the hallmark of the third generation of computers. Transistors were miniaturized and placed on silicon chips, called semiconductors, which drastically increased the speed and efficiency of computers.
•A chip is a small piece of semiconducting material (usually silicon) on which an integrated circuit is embedded. A typical chip is less than ¼ square inch and can contain millions of electronic components (transistors). Computers consist of many chips placed on electronic boards called printed circuit boards. There are different types of chips. For example, CPU chips (also called microprocessors) contain an entire processing unit, whereas memory chips contain blank memory.

8

Evolution of Computer systems

Fourth Generation - 1971-Present: Microprocessors
•The microprocessor brought the fourth generation of computers, as thousands of integrated circuits were built onto a single silicon chip, a silicon chip that contains a CPU. In the world of personal computers, the terms microprocessor and CPU are used interchangeably. At the heart of all personal computers and most workstations sits a microprocessor. Microprocessors also control the logic of almost all digital devices, from clock radios to fuel-injection systems for automobiles.

9

Evolution of Computer systems Fourth Generation - 1971-Present: Microprocessors
•Three basic characteristics differentiate microprocessors:
Instruction Set: the set of instructions that the microprocessor can execute.
Bandwidth: the number of bits processed in a single instruction.
Clock Speed: given in megahertz (MHz), the clock speed determines how many instructions per second the processor can execute.
For the latter two, the higher the value, the more powerful the CPU. For example, a 32-bit microprocessor that runs at 50 MHz is more powerful than a 16-bit microprocessor that runs at 25 MHz.

10

Evolution of Computer systems

Fourth Generation - 1971-Present: Microprocessors
•The Intel 4004 chip, developed in 1971, located all the components of the computer - from the central processing unit and memory to input/output controls - on a single chip.
•In 1981 IBM introduced its first computer for the home user, and in 1984 Apple introduced the Macintosh. Microprocessors also moved out of the realm of desktop computers and into many areas of life as more and more everyday products began to use microprocessors.

11

Evolution of Computer systems

Fifth Generation - Present and Beyond: Artificial Intelligence
•Fifth-generation computing devices, based on artificial intelligence, are still in development, though there are some applications, such as voice recognition, that are being used today.
•Artificial Intelligence is the branch of computer science concerned with making computers behave like humans. The term was coined in 1956 by John McCarthy at the Dartmouth Conference. Artificial intelligence includes:
Games Playing: programming computers to play games such as chess and checkers

Evolution of Computer systems

Fifth Generation - Present and Beyond: Artificial Intelligence
•Expert Systems: programming computers to make decisions in real-life situations (for example, some expert systems help doctors diagnose diseases based on symptoms)
•Natural Language: programming computers to understand natural human languages
•Neural Networks: systems that simulate intelligence by attempting to reproduce the types of physical connections that occur in animal brains
•Robotics: programming computers to see, hear, and react to other sensory stimuli

14

Evolution of Computer systems Trends Towards Parallel Processing

15

Evolution of Computer systems Trends Towards Parallel Processing

16

Evolution of Computer systems Trends Towards Parallel Processing

From an operating system point of view, computer systems have improved chronologically through four phases:

•Batch processing

•Multiprogramming

•Time sharing

•Multiprocessing

Evolution of Computer systems

17

Trends Towards Parallel Processing

Parallel processing can be pursued at four programmatic levels:
•Job or program level
•Task or procedure level
•Interinstruction level
•Intrainstruction level
The highest (job) level is handled algorithmically, while the lowest (intrainstruction) level is implemented by hardware means; there is a trade-off between these two extremes. Developments in data communication technology bridge the gap between distributed processing and parallel processing, so distributed processing can be viewed as a form of parallel processing in a special environment.

18

Parallelism In Uniprocessor Systems Basic Uniprocessor Architecture

19

Parallelism In Uniprocessor Systems Basic Uniprocessor Architecture

20

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

•Multiplicity of functional units

•Parallelism and pipelining within the CPU

•Overlapped CPU and I/O operations

•Use of hierarchical memory system

•Balancing of subsystem bandwidths

•Multiprogramming and time sharing

21

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Multiplicity of functional units

22

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Parallelism and pipelining within the CPU

•The ALU contains parallel adders using carry-lookahead and carry-save techniques
•For multiply and divide, high-speed multiplier recoding and convergence division techniques are used to exploit parallelism and resource sharing
•Instruction pipelining is used

23

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Use of hierarchical memory system

24

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Balancing of subsystem bandwidth

•Access times typically satisfy t_d > t_m > t_p, where t_d, t_m, and t_p are the access/cycle times of the devices, the main memory, and the processor, respectively
•The BANDWIDTH of a system is defined as the number of operations performed per unit time.
•In the case of memory, let W be the number of words delivered per memory cycle t_m; then
  B_m = W / t_m (words/s or bytes/s)
•Memory access conflicts may cause delayed access for some processor requests. In practice, the utilized memory bandwidth B_m^u is approximately
  B_m^u = B_m / √M
  where M is the number of interleaved modules in the memory system
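A minimal sketch (my own illustration with made-up numbers, not from the slides) of these bandwidth estimates, assuming the √M rule of thumb above:

```python
import math

def memory_bandwidth(words_per_cycle: float, cycle_time_s: float) -> float:
    """B_m = W / t_m, in words per second."""
    return words_per_cycle / cycle_time_s

def utilized_bandwidth(b_m: float, modules: int) -> float:
    """Rule-of-thumb utilized bandwidth B_m^u = B_m / sqrt(M) under access conflicts."""
    return b_m / math.sqrt(modules)

# Example: 4 words per 500 ns memory cycle, 16 interleaved modules (illustrative).
b_m = memory_bandwidth(4, 500e-9)        # 8e6 words/s
print(b_m, utilized_bandwidth(b_m, 16))  # utilized bandwidth ≈ 2e6 words/s
```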

25

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Balancing of subsystem bandwidth

26

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Multiprogramming and Time Sharing

In BATCH PROCESSING, once the CPU is allocated to a program it remains allocated, whether the I/O-bound or the CPU-bound part is being run. In MULTIPROGRAMMING, when the CPU-bound part of a program finishes and its I/O-bound part is about to begin, the CPU is taken back and allocated to another program whose CPU-bound part is ready. In TIME SHARING, an equal time slot is given to every program, and the programs execute in round-robin fashion.

27

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Multiprogramming and Time Sharing

Advanced Computer Architecture (CSL502)

Unit 1: Introduction To Parallel Processing

29

Advanced Computer Architecture (CSL502)

•Parallel Computer Structures

• Architectural classification Scheme

Parallel Computer Structures

We can divide parallel computers into three architectural configurations:

Pipeline computers

Array processors

Multiprocessor systems

30

Parallel Computer Structures Pipeline computers
Instruction execution in a digital computer can be divided into four major steps:
•IF (Instruction Fetch): fetch the instruction from main memory
•ID (Instruction Decode): identify the operation to be performed
•OF (Operand Fetch): access the data (if any) on which the operation is to be performed
•EX (Execution): perform the operation on the data
In a nonpipelined computer, these four steps must be completed before the next instruction can be issued. In a pipelined computer, successive instructions are executed in an overlapped fashion.

Parallel Computer Structures Pipeline computers

32

Parallel Computer Structures Pipeline computers
An instruction cycle consists of multiple pipeline cycles. In a pipeline, the operation of all stages is synchronized under a common clock, and interface latches are used between adjacent segments to hold the intermediate results. Theoretically, a k-stage pipeline processor can be at most k times faster than a nonpipelined processor. However, due to memory conflicts, data dependency, branches, and interrupts, this ideal speedup may not be achieved. For some CPU-bound instructions, such as floating-point operations, the execution phase can be further partitioned into a multiple-stage arithmetic logic pipeline.
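A minimal sketch (my own illustration, not from the slides) comparing nonpipelined and ideally pipelined execution times for a k-stage pipeline:

```python
def nonpipelined_time(n_instructions: int, k_stages: int, cycle: float = 1.0) -> float:
    """Each instruction occupies all k stages before the next one starts."""
    return n_instructions * k_stages * cycle

def pipelined_time(n_instructions: int, k_stages: int, cycle: float = 1.0) -> float:
    """Fill the pipeline once (k cycles), then retire one instruction per cycle."""
    return (k_stages + (n_instructions - 1)) * cycle

n, k = 1000, 4  # e.g. the IF/ID/OF/EX pipeline above
speedup = nonpipelined_time(n, k) / pipelined_time(n, k)
print(round(speedup, 2))  # approaches k = 4 as n grows, ignoring hazards and stalls
```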

34

Parallel Computer Structures Pipeline computers
Some main issues in pipeline design:
•Job sequencing
•Collision prevention
•Congestion control
•Branch handling
•Reconfiguration
•Hazard resolution
Pipeline computers are suitable for VECTOR PROCESSING.

Parallel Computer Structures Pipeline computers

35

36

Parallel Computer Structures Array Computers
An array processor is a synchronous parallel computer with multiple ALUs, called Processing Elements (PEs), that can operate in parallel. A data routing mechanism is used among the PEs. Scalar and control-type instructions are executed directly in the control unit (CU). Each PE consists of an ALU with registers and a local memory. PEs are passive devices without instruction decoding capabilities.

Parallel Computer Structures Array Computers

37

Parallel Computer Structures Multiprocessor Systems
Multiprocessor systems are used to improve:
•Throughput
•Reliability
•Flexibility
•Availability
A multiprocessor system contains two or more processors. All processors share access to common sets of memory modules, I/O channels, and peripheral devices, and a single integrated operating system governs everything. Multiprocessor hardware organization is determined primarily by the interconnection structure used between the memories and the processors. Three different interconnections have been used:
oTime-shared common bus
oCrossbar switch network
oMultiport memories

Parallel Computer Structures Multiprocessor Systems

39

40

Parallel Computer Structures Performance of Parallel Computers

The theoretical speedup achieved by n identical parallel processors is at most n times that of a single processor. In practice, this ideal speedup is not achieved, due to:
•Memory and communication path conflicts
•Inefficient algorithms, etc.
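One common way to see why the ideal n-fold speedup is not reached is to account for a non-parallelizable fraction and a fixed conflict/communication overhead. A hedged, Amdahl-style sketch (my own illustration, not from the slides):

```python
def speedup(n_processors: int, serial_fraction: float = 0.0, overhead: float = 0.0) -> float:
    """Illustrative speedup model: ideal n-fold speedup degraded by a serial
    fraction (Amdahl-style) and a normalized communication/conflict overhead."""
    parallel_time = serial_fraction + (1 - serial_fraction) / n_processors + overhead
    return 1.0 / parallel_time

print(speedup(16))                        # 16.0  -> ideal case, at most n
print(speedup(16, serial_fraction=0.05))  # ~9.1  -> conflicts and serial parts cut it down
```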

Parallel Computer Structures Performance of Parallel Computers

41

42

Parallel Computer Structures Data Flow Computer

The conventional von Neumann machines are called control flow computers: a program counter controls the execution of the program. To exploit the maximum parallelism in a program, data flow computers were suggested. The basic concept is to enable the execution of an instruction as soon as its required operands become available. Programs for data-driven computation can be represented by data flow graphs.
The next slide shows the data flow graph for z = (x + y) * 2.
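A minimal, hedged sketch of the data-driven firing rule for z = (x + y) * 2: each node fires as soon as all of its input tokens are present, with no program counter involved (my own illustration, not the textbook's notation):

```python
# Tiny dataflow interpreter: a node fires when all of its input tokens have arrived.
from operator import add, mul

# Graph for z = (x + y) * 2: node "plus" feeds node "times".
nodes = {
    "plus":  {"op": add, "inputs": ["x", "y"],   "output": "t"},
    "times": {"op": mul, "inputs": ["t", "two"], "output": "z"},
}

tokens = {"x": 3, "y": 4, "two": 2}  # initial tokens on the input arcs

fired = set()
while len(fired) < len(nodes):
    for name, node in nodes.items():
        ready = name not in fired and all(i in tokens for i in node["inputs"])
        if ready:  # firing rule: all operands available
            args = [tokens[i] for i in node["inputs"]]
            tokens[node["output"]] = node["op"](*args)
            fired.add(name)

print(tokens["z"])  # 14
```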

Parallel Computer Structures Data Flow Computer

43

Parallel Computer Structures Data Flow Computer

44

45

Parallel Computer Structures Data Flow Computer

The basic mechanism for the execution of a data flow program:
•Each instruction in a data flow computer is implemented as a template
•Activity templates are stored in the activity store, and each activity template has a unique address
•An activity template's address is entered in the instruction queue when the instruction is ready to execute
•Instruction fetch and data access are handled by the fetch and update units
•The operation unit performs the required operation

46

Architectural Classification Schemes Multiplicity of Instruction – Data Streams

It was introduced by Michael J. Flynn. According to it, a computer organization is characterized by the multiplicity of the hardware provided to service the instruction and data streams. There are four categories:
•Single instruction stream - single data stream (SISD)
•Single instruction stream - multiple data stream (SIMD)
•Multiple instruction stream - single data stream (MISD)
•Multiple instruction stream - multiple data stream (MIMD)

Architectural Classification Schemes Multiplicity of Instruction – Data Streams

47

Architectural Classification Schemes Multiplicity of Instruction – Data Streams

48

Architectural Classification Schemes Multiplicity of Instruction – Data Streams

49

50

Architectural Classification Schemes Serial versus Parallel Processing

It was given by Feng. It uses the degree of parallelism to classify various computer architectures. The maximum number of bits processed by a computer in unit time is called the "maximum parallelism degree" P. Let P_i be the number of bits processed in the i-th processor cycle, and consider T processor cycles indexed by i = 1, 2, ..., T. The average parallelism degree is
  P_a = ( Σ_{i=1..T} P_i ) / T

51

Architectural Classification Schemes Serial versus Parallel Processing

In general, P_i ≤ P. The utilization rate of a computer system within T cycles is
  μ = P_a / P = ( Σ_{i=1..T} P_i ) / (T · P)
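A minimal sketch (illustrative values, not from the slides) of the average parallelism degree and the utilization rate:

```python
def average_parallelism(bits_per_cycle: list[int]) -> float:
    """P_a = (sum of P_i over T cycles) / T."""
    return sum(bits_per_cycle) / len(bits_per_cycle)

def utilization(bits_per_cycle: list[int], max_parallelism: int) -> float:
    """mu = P_a / P, with P_i <= P in every cycle."""
    return average_parallelism(bits_per_cycle) / max_parallelism

p_i = [16, 32, 32, 8, 32]   # bits processed in each of T = 5 cycles (illustrative)
P = 32                      # maximum parallelism degree of the machine
print(average_parallelism(p_i), utilization(p_i, P))  # 24.0 0.75
```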

52

Architectural Classification Schemes Serial versus Parallel Processing

The maximum parallelism degree P(C) of a given computer C is the product of the word length n and the bit-slice length m:
  P(C) = n · m
There are four types of processing methods:
•Word-serial and bit-serial (WSBS)
•Word-parallel and bit-serial (WPBS)
•Word-serial and bit-parallel (WSBP)
•Word-parallel and bit-parallel (WPBP)

Architectural Classification Schemes Serial versus Parallel Processing

53

54

Architectural Classification Schemes Parallelism versus Pipelining

It was proposed by Händler. It is based on the degree of parallelism and pipelining at three levels: the processor control unit (PCU), the arithmetic logic unit (ALU), and the bit-level circuit (BLC).

Architectural Classification Schemes Parallelism versus Pipelining

55

Architectural Classification Schemes Parallelism versus Pipelining

56

Architectural Classification Schemes Parallelism versus Pipelining

57

Architectural Classification Schemes Parallelism versus Pipelining

58

Advanced Computer Architecture (CSL502)

Unit 1: Introduction To Parallel Processing

59

Advanced Computer Architecture (CSL502)

Parallel Processing Applications

60

61

Introduction To Parallel Processing Parallel Processing Applications

Fast and efficient computing is in high demand in many areas, such as:
•Scientific
•Engineering
•Energy resources
•Medical
•Military
•Artificial intelligence
•Basic research

62

Introduction To Parallel Processing Parallel Processing Applications

Large-scale scientific problem solving involves three interactive disciplines:
•Theories
•Experiments
•Computations

Introduction To Parallel Processing Parallel Processing Applications

63

64

Introduction To Parallel Processing Parallel Processing Applications

Computer simulation has several advantages:
•Computer simulations are far cheaper and faster than physical experiments
•Computers can solve a much wider range of problems than scientific laboratory equipment can
•Computational approaches are limited only by computer speed and memory capacity, while physical experiments have many practical constraints

65

Introduction To Parallel Processing Parallel Processing Applications

We can divide parallel processing applications into four categories:
•Predictive Modeling and Simulations
•Engineering Design and Automation
•Energy Resources Exploration
•Medical, Military, and Basic Research

66

Introduction To Parallel Processing Parallel Processing Applications

Predictive Modeling and Simulations
Scientists worldwide are concerned with multidimensional modeling of the atmosphere, the earth environment, outer space, and the world economy. Predictive modeling is done through extensive computer simulation experiments, which need computing speeds of 1000 million floating-point operations per second (1000 megaflops) or above.
FLOPS = Floating-Point Operations Per Second

67

Introduction To Parallel Processing Parallel Processing Applications

Predictive modeling and simulations are required in the following areas:
•Numerical weather forecasting
•Oceanography and astrophysics
•Socioeconomics and government use

68

Introduction To Parallel Processing Parallel Processing Applications

Numerical weather forecasting
•Computations are carried out on a three-dimensional grid that partitions the atmosphere vertically into K levels and horizontally into M intervals of longitude and N intervals of latitude.
•Using a 270-mile grid (about the distance between New York and Washington, D.C.), a 24-hour forecast would need to perform about 100 billion data operations; a 100-megaflops computer would need about 100 minutes for such a forecast.

Introduction To Parallel Processing Parallel Processing Applications

69

70

Introduction To Parallel Processing Parallel Processing Applications

Oceanography and astrophysics
•A complete simulation of the Pacific Ocean with adequate resolution (1° grid) for 50 years would take 1000 hours on a Cyber-205 computer.
•The formation of the earth from planetesimals in the solar system can be simulated with a high-speed computer.
•The dynamic range of astrophysics studies may span from billions of years down to milliseconds. Interesting problems include the physics of supernovae and the dynamics of galaxies; the Illiac-IV array processor was used for such studies.

71

Introduction To Parallel Processing Parallel Processing Applications

Oceanography and astrophysics
Since oceans exchange heat with the atmosphere, a good understanding of the oceans would help in the following areas:
•Climate predictive analysis
•Fishery management
•Ocean resource exploration
•Coastal dynamics and tides
Oceanography studies use a smaller grid size and a larger time scale than those used for atmospheric studies.

72

Introduction To Parallel Processing Parallel Processing Applications

Socioeconomics and government use
•Nobel laureate W. W. Leontief (1980) proposed an input-output model of the world economy that performs large-scale matrix operations on a CDC scientific computer. This United Nations-supported world economic simulation suggests how a system of international economic relations featuring partial disarmament could narrow the gap between rich and poor nations.
•In the US, the FBI uses large computers for crime control, and the IRS uses a large number of fast mainframes for tax collection and auditing.

73

Introduction To Parallel Processing Parallel Processing Applications

We can divide parallel processing applications into four categories:
•Predictive Modeling and Simulations
•Engineering Design and Automation
•Energy Resources Exploration
•Medical, Military, and Basic Research

74

Introduction To Parallel Processing Parallel Processing Applications

Engineering Design and Automation
Some of the areas where fast computers are used:
•Finite-element analysis
•Computational aerodynamics
•Artificial intelligence and automation
•Remote sensing applications

75

Introduction To Parallel Processing Parallel Processing Applications

We can divide parallel processing applications into four categories:
•Predictive Modeling and Simulations
•Engineering Design and Automation
•Energy Resources Exploration
•Medical, Military, and Basic Research

76

Introduction To Parallel Processing Parallel Processing Applications

Energy Resources Exploration
•Seismic exploration - used in finding oil
•Reservoir modeling - modeling of oil fields
•Plasma fusion power - used in nuclear fusion research
•Nuclear reactor safety

77

Introduction To Parallel Processing Parallel Processing Applications

We can divide parallel processing applications into four categories:
•Predictive Modeling and Simulations
•Engineering Design and Automation
•Energy Resources Exploration
•Medical, Military, and Basic Research

78

Introduction To Parallel Processing Parallel Processing Applications

Medical, Military, and Basic Research
•Computer-assisted tomography (CAT) scanning: the human body can be modeled and imaged by CAT scanning.
•Genetic engineering: biological systems can be simulated on supercomputers. A highly pipelined machine called the Cytocomputer was developed at the Environmental Research Institute of Michigan for biomedical image processing.
•Weapons research and defence.

Advanced Computer Architecture (CSL502)

Unit 1: Memory and Input-Output Subsystem

79

Advanced Computer Architecture (CSL502)

Hierarchical Memory Structure Virtual Memory System Memory Allocation and Management Cache Memories and Management Input-Output Subsystems

80

81

Hierarchical Memory Structure Memory Hierarchy

The design objective of a hierarchical memory, in both parallel processing systems and multiprogrammed uniprocessor systems, is to match the processor speed with the rate of information transfer (the bandwidth) of the memory at the lowest level, at a reasonable cost. In multiprocessor systems, concurrent memory requests frequently arrive at the same level of the hierarchy. If two or more requests are directed to the same section of memory at the same level, a conflict is said to occur, which can degrade the performance of the system. To avoid conflicts, the memory at each level is partitioned into several modules so that some degree of concurrent access can be achieved.

82

Hierarchical Memory Structure Memory Hierarchy

Memories in the hierarchy can be classified on the basis of:
•Accessing method
oRandom Access Memory (RAM): the access time t_a of a memory word is independent of its location
oSequential Access Memory (SAM): information is accessed serially
oDirect Access Storage Device (DASD): rotational devices made of magnetic material on which any block of information can be accessed directly
•Speed or access time: in the memory hierarchy, the highest level has the fastest memory and the lowest level the slowest
oPrimary: example is RAM
oSecondary: example is DASD

Hierarchical Memory Structure Memory Hierarchy

83

Hierarchical Memory Structure Memory Hierarchy

84

CCD = Charge-Coupled Devices

85

Hierarchical Memory Structure Memory Hierarchy Example- Three level memory hierarchy

86

Hierarchical Memory Structure Memory Hierarchy

The processor usually references an item in memory by providing the address of that item. A memory hierarchy is usually organized so that the address space at level i is a subset of that at level i+1. Address A_k at level i is not necessarily address A_k at level i+1, but any information at level i may also exist at level i+1; some of the information at level i may be more up to date than that at level i+1. These different copies of the same data create a DATA CONSISTENCY or COHERENCE problem between adjacent levels. The data consistency problem may also exist between local memories or caches when two cooperating processes, executing concurrently on separate processors, interact via one or more shared variables.

87

Hierarchical Memory Structure Memory Hierarchy
(Figure: two processors P1 and P2 caching the same shared variable X = 1, illustrating the coherence problem.)
In modeling the performance of a hierarchical memory, the HIT RATIO (H) is used: the probability of finding the requested information in the memory of a given level. H depends on the granularity of information transfer, the capacity, the management strategy, and other factors. The hit ratio (success function) may be written as H(s), where s is the memory size. The miss ratio is F(s) = 1 - H(s). The access frequency at level i, i.e. the relative number of successful accesses to level i, is
  h_i = H(s_i) - H(s_{i-1})
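A minimal sketch (illustrative hit ratios, not from the slides) of turning cumulative hit ratios H(s_i) into per-level access frequencies h_i:

```python
def access_frequencies(cumulative_hit_ratios: list[float]) -> list[float]:
    """h_i = H(s_i) - H(s_{i-1}), with H(s_0) = 0 and H(s_n) = 1 at the last level."""
    h, previous = [], 0.0
    for H_i in cumulative_hit_ratios:
        h.append(H_i - previous)
        previous = H_i
    return h

# Three-level hierarchy: cache, main memory, disk (illustrative hit ratios).
print(access_frequencies([0.95, 0.999, 1.0]))  # ≈ [0.95, 0.049, 0.001]
```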

88

The granularity(Block size) transferred & the

management policy Design of the processor-memory interconnection network Two performance measures Effective memory access time Utilization of processor

Hierarchical Memory Structure Memory Hierarchy Optimization of Memory Hierarchy

The of n-level memory hierarchy designing is a tradeoff in between performance and cost Performance depends on Program behavior with respect to memory references The access time & memory size of each level

89

Hierarchical Memory Structure Memory Hierarchy Optimization of Memory Hierarchy
The effective access time T_i from the processor to the i-th level of the memory hierarchy is the sum of the individual average access times t_k of each level from k = 1 to i:
  T_i = Σ_{k=1..i} t_k
The effective access time for each memory reference in an n-level memory hierarchy is
  T = Σ_{i=1..n} h_i · T_i
which, substituting h_i = H(s_i) - H(s_{i-1}), can be rewritten as
  T = Σ_{i=1..n} [H(s_n) - H(s_{i-1})] · t_i
Since all data are available at level n, H(s_n) = 1, and therefore
  T = Σ_{i=1..n} [1 - H(s_{i-1})] · t_i = Σ_{i=1..n} F(s_{i-1}) · t_i
The total cost of the memory system is
  C = Σ_{i=1..n} c(t_i) · s_i
where c(t_i) is the cost per byte of memory at level i and s_i is the size of level i. A typical memory-hierarchy design problem is therefore
  minimize  T = Σ_{i=1..n} F(s_{i-1}) · t_i
subject to the constraint C ≤ C_0, with s_i > 0 and t_i > 0 for i = 1, 2, ..., n.
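A minimal sketch (illustrative parameters, not from the slides) that evaluates the effective access time T and the cost C for a candidate hierarchy and checks it against a budget C_0:

```python
def effective_access_time(hit_ratios: list[float], access_times: list[float]) -> float:
    """T = sum_i [1 - H(s_{i-1})] * t_i, with H(s_0) = 0 and H(s_n) = 1."""
    T, H_prev = 0.0, 0.0
    for H_i, t_i in zip(hit_ratios, access_times):
        T += (1.0 - H_prev) * t_i
        H_prev = H_i
    return T

def total_cost(cost_per_byte: list[float], sizes: list[float]) -> float:
    """C = sum_i c(t_i) * s_i."""
    return sum(c * s for c, s in zip(cost_per_byte, sizes))

# Cache / main memory / disk (illustrative parameters).
H = [0.95, 0.999, 1.0]        # cumulative hit ratios H(s_i)
t = [10e-9, 100e-9, 10e-3]    # access time t_i of each level, in seconds
c = [1e-4, 1e-6, 1e-9]        # cost per byte c(t_i)
s = [64e3, 8e6, 1e9]          # size s_i of each level, in bytes

T, C = effective_access_time(H, t), total_cost(c, s)
print(T, C, C <= 20.0)        # check against a budget C_0 = 20.0
```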

91

Hierarchical Memory Structure Addressing Schemes for Main Memory

Main memory is partitioned into several independent modules, and addresses are distributed across these modules. This scheme is called interleaving; interleaving addresses among M modules is called M-way interleaving. There are two methods of interleaving:
•High-order interleaving: the high-order m bits of the address select the module, while the remaining n-m bits select the address within the module
•Low-order interleaving: the low-order m bits of the address select the module, while the remaining n-m bits select the address within the module
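A minimal sketch (my own illustration) of decomposing an address under low-order M-way interleaving, assuming M = 2^m is a power of two:

```python
def low_order_interleave(address: int, m_bits: int) -> tuple[int, int]:
    """Low-order interleaving: the low m bits pick the module, the remaining
    bits pick the word within the module."""
    module = address & ((1 << m_bits) - 1)
    offset = address >> m_bits
    return module, offset

# 8-way interleaving (m = 3): consecutive addresses hit consecutive modules.
for a in range(10):
    print(a, low_order_interleave(a, 3))
```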

Hierarchical Memory Structure Addressing Schemes for Main Memory

92

(Figure: interleaved memory addressing using the high-order bits to select the module.)

Hierarchical Memory Structure Addressing Schemes for Main Memory

93

94

Virtual Memory System The Concept of Virtual Memory
Using the virtual memory (VM) concept, a program whose size is larger than the available free memory space can still be executed. In the VM concept a program is divided into pages (equal-sized parts) that are loaded into memory one by one as demanded by the CPU.
Memory management is required in the following phases:
•Program structure and design
•The compiler assigns names (unique identifiers) while translating the program modules from a programming language into modules of machine code
•A linker then combines these modules of unique identifiers
•The composite is translated by a loader into main memory locations
The set of unique identifiers defines the virtual space or name space, and the set of main memory locations allocated to the program defines the physical memory space. The last phase is dynamic memory management, required during the execution of the program.

95

Virtual Memory System The Concept of Virtual Memory
Let the name space V_j generated by the j-th program running on a processor consist of a set of n unique identifiers:
  V_j = {0, 1, ..., n-1}
Let the memory space allocated to the program in execution have m locations:
  M = {0, 1, ..., m-1}
Since the allocated memory space may vary during program execution, m is a function of time. At any time t, and for each referenced name x ∈ V_j, there is an address map
  f_j(t) : V_j -> M ∪ {Φ}
defined by
  f_j(x, t) = y   if at time t item x is in location y
  f_j(x, t) = Φ   if at time t item x is missing from M
When an item is missing, a fault handler takes the following actions:
•A placement policy selects a location in memory where the fetched item will be placed
•If memory is full, a replacement policy selects item(s) to remove
•A fetch policy decides when an item is to be fetched from the lower level of memory
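A minimal sketch (hypothetical policies, not the textbook's) of the address map f_j and the three fault-handler policies working together:

```python
from collections import OrderedDict

class AddressMap:
    """Toy f_j(x, t): maps virtual names to memory frames; a miss plays the role
    of the fault symbol Φ. Placement: lowest free frame; replacement: FIFO;
    fetch: on demand."""
    def __init__(self, capacity: int):
        self.free = list(range(capacity - 1, -1, -1))  # free frames, lowest on top
        self.map = OrderedDict()                       # name -> frame

    def reference(self, name: int) -> int:
        if name in self.map:                           # hit: f_j(x, t) = y
            return self.map[name]
        if not self.free:                              # memory full:
            _, frame = self.map.popitem(last=False)    #   replacement policy (FIFO)
            self.free.append(frame)
        frame = self.free.pop()                        # placement policy
        self.map[name] = frame                         # fetch policy: on demand
        return frame

vm = AddressMap(capacity=3)
for x in [0, 1, 2, 0, 5]:
    print(x, vm.reference(x))   # referencing 5 faults and evicts the oldest entry
```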

96

Virtual Memory System The Concept of Virtual Memory
Program locality: due to the looping, sequential, and block-structured control structures inherent in the grouping of instructions and data in a program, the CPU's reference-generation pattern is predictable. This property is called locality of reference. There are three types of locality:
•Temporal: there is a tendency for a process to reference in the near future the elements of the reference string referenced in the recent past. It is due to loops, temporary variables, or process blocks.
•Spatial: there is a tendency for a process to make references to a portion of the virtual address space in the neighborhood of the last reference.
•Working set (W): if we consider a hypothetical time window ∆ that moves across the virtual time axis, it can be seen that only a subset of the virtual address space is needed during any interval of the history of a process. The subset of the virtual space referenced during the interval (t, t + ∆) is called the working set W(t, ∆). ∆ is a critical parameter for optimizing the working set of the process.
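A minimal sketch (illustrative reference string) of computing the working set W(t, ∆) from a page reference string:

```python
def working_set(reference_string: list[int], t: int, window: int) -> set[int]:
    """W(t, delta): the distinct pages referenced in the interval (t, t + delta]."""
    return set(reference_string[t:t + window])

refs = [1, 2, 1, 1, 3, 2, 2, 2, 4, 4]   # r(1) r(2) ... r(10): pages over virtual time
print(working_set(refs, 0, 4))          # {1, 2}
print(working_set(refs, 4, 4))          # {2, 3}
```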

Virtual Memory System The Concept of Virtual Memory

97

98

Virtual Memory System The Concept of Virtual Memory
Program relocation: during program execution, the processor generates logical addresses, which are mapped into physical addresses in main memory. When the address mapping is fixed at the time the program is initially loaded, it is called static relocation; address mapping performed during execution is called dynamic relocation. Static relocation makes it difficult for processes to share information that is modifiable during execution. One technique for dynamic relocation is the use of a relocation (base) register: a program may be loaded initially using static relocation, after which it may be displaced within memory and the contents of the relocation register adjusted to reflect the displacement. Two or more processes may share a program by using different relocation registers.
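A minimal sketch (my own illustration) of dynamic relocation with a base register:

```python
class RelocationRegister:
    """Every logical address is translated to base + offset at run time, so the
    program can be moved by simply reloading the base register."""
    def __init__(self, base: int):
        self.base = base

    def translate(self, logical_address: int) -> int:
        return self.base + logical_address

reg = RelocationRegister(base=0x4000)
print(hex(reg.translate(0x10)))   # 0x4010
reg.base = 0x9000                 # program displaced in memory; only the register changes
print(hex(reg.translate(0x10)))   # 0x9010
```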

Virtual Memory System Paged Memory System

99

(Figure: DIRECT MAPPING of a paged virtual address through a page table in main memory; C = changed bit. This method needs two memory accesses per reference, which is slow.)

100

Virtual Memory System Paged Memory System
In this scheme, the virtual space is partitioned into pages and the physical memory is partitioned into page frames.
•Each virtual address contains a virtual page number i_p (the mapped part) and a displacement i_w (the unmapped part)
•The address map consists of a page table (PT), which contains the base address of the frame in memory, if the page is resident
•The simplest page table contains one entry for each possible virtual page
•There is one page table per process, created in main memory at the initiation of the process
•The PTBR (page table base register) in each processor contains the base address of the page table of the process currently running on that processor
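A minimal sketch (hypothetical page table contents) of translating a virtual address (i_p, i_w) through a per-process page table:

```python
PAGE_SIZE = 1024  # 1K pages, as in the example later in this unit

# Hypothetical page table: virtual page number -> page frame number (None = not resident).
page_table = {0: 7, 1: 3, 2: None}

def translate(virtual_address: int) -> int:
    ip, iw = divmod(virtual_address, PAGE_SIZE)   # page number and displacement
    frame = page_table.get(ip)
    if frame is None:
        raise RuntimeError(f"page fault on virtual page {ip}")
    return frame * PAGE_SIZE + iw                 # concatenate frame number and displacement

print(translate(1 * PAGE_SIZE + 20))  # page 1 -> frame 3, so 3*1024 + 20 = 3092
```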

101

Virtual Memory System Paged Memory System Technique of maintaining multiple virtual address space

102

Virtual Memory System Paged Memory System Technique of maintaining multiple virtual address space

In a multiprogrammed processor, each page map entry contains the virtual page number (i_p), a process identification, the access rights (RWX), a modified bit (C), and the page frame address (PFA) in shared memory.
The process identification of the currently running process is held in the current process register (CPR) of the processor.
In this scheme, the virtual page number of the virtual address (VA) is associatively compared with all page map entries (PMEs) that have the same process identification as the currently running process. If there is a match, the page frame number is retrieved and the displacement is concatenated to it to form the physical address. If there is no match, a page fault interrupt occurs, which locates the page.

103

Virtual Memory System Paged Memory System

Problems with a pure paged memory system:
•It is inefficient if the virtual space is large. For example, with a 32-bit virtual address and a 1K page size, the page number is 22 bits, so the page table needs 2^22 entries. With an 8 MB (2^23 byte) main memory there are 2^23 / 2^10 = 2^13 page frames, so each page table entry needs a 13-bit page frame field.
•There is no mechanism for a reasonable implementation of sharing.
•Internal fragmentation: the last page of a program may have unused space.
•Table fragmentation: main memory is occupied by page tables and is therefore unavailable for virtual pages.
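A quick arithmetic check of the example above (a sketch, using the numbers as given in the slide):

```python
virtual_address_bits = 32
page_size = 1 << 10                 # 1K pages
main_memory = 8 * (1 << 20)         # 8 MB = 2**23 bytes

page_number_bits = virtual_address_bits - 10       # 22
page_table_entries = 1 << page_number_bits         # 2**22 = 4,194,304
page_frames = main_memory // page_size             # 2**13 = 8192
frame_field_bits = page_frames.bit_length() - 1    # 13

print(page_number_bits, page_table_entries, page_frames, frame_field_bits)
```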

104

Virtual Memory System Paged Memory System Example- Address & page table entry formats of VAX-11/780 virtual mem.(VM)

105

Virtual Memory System Paged Memory System Example-Partition of virtual address space of VAX-11/780 virtual mem.(VM)

106

Virtual Memory System Paged Memory System Example-Region addressing scheme of VAX-11/780

P0: page table for the program region
P1: page table for the control region

Virtual Memory System Paged Memory System

107

Example- VAX-11/780

Virtual Memory System Paged Memory System

108

Example- VAX-11/780

109

Virtual Memory System Segmented Memory System

Block-structured HLL programs (e.g. in C) have a high degree of modularity. Modules are compiled into machine code in a logical space, which is then loaded, linked, and executed.
•A set of logically related contiguous data elements is called a segment
•Segments are allowed to grow and shrink almost arbitrarily, unlike pages
•Segmentation is a technique for managing virtual space allocation, whereas paging is a concept for managing the physical space
•An element in a segment is referenced by the segment name - element name pair (<s>, [i])
•During program execution, the segment name <s> is translated into a segment address by the OS, and the element name is a displacement within the segment
•A program consists of a set of linked segments, where the links are created as a result of procedure segment calls within the program segments

110

Virtual Memory System Segmented Memory System Example- Segmentation was used in Burroughs B5500

Each process has a segment table (ST) pointed to by the STBR (segment table base register). The address field of each ST entry contains the base address of the segment in main memory.

111

Virtual Memory System Segmented Memory System Example- Segmentation was used in Burroughs B5500

112

Virtual Memory System Segmented Memory System

When a segment <s> is initially referenced by a process, its segment number is not yet established; in this case an entry must be created in the ST. A global table, the active segment table (AST), is searched to determine whether the segment is already active in memory. If it is, its base address and attributes are returned from the AST, and an entry is made in the AST to indicate that the process is using the segment. If the entry is not present in the AST, a file directory search is initiated and appropriate entries are made in the AST and the ST. A known segment table is associated with each process; it contains entries for the set of segments known to the process.

Virtual Memory System Paged Segmentation Memory System

113

Virtual Memory System Paged Segmentation Memory System

114

Advanced Computer Architecture (CSL502)

Unit 1: Memory and Input-Output Subsystem

115

Advanced Computer Architecture (CSL502)

Hierarchical Memory Structure Virtual Memory System Memory Allocation and Management Cache Memories and Management Input-Output Subsystems

116

117

Memory Allocation & Management Classification of Memory Policies

Two policies, fixed and variable partitioning, are used for allocating memory pages to active processes. If the resident set size z_i(t) is fixed for all t during which process P_i is active, then the size vector Z(t) is constant during any interval in which the set of active processes is fixed; this is called fixed partitioning. In variable partitioning, the partition vector Z(t) varies with time. The advantage of fixed partitioning is the low overhead of implementation, but memory utilization may be degraded. Besides fixed and variable partitioning strategies, a memory policy can be global or local. A local policy involves only the resident set of the faulting process; a global policy considers the history of the resident sets of all active processes in making a decision.

118

Memory Allocation & Management Classification of Memory Policies

When a page fault occurs, one of two memory-fetching policies is used to fetch the pages of a process: demand prefetching and demand fetching. In demand prefetching, a number of pages, including the faulting page of the process, are fetched in anticipation of the process's future requirements. In demand fetching, only the page referenced is fetched on a miss.
The i-th process's behavior is described in terms of its reference string, which is a sequence
  R_i(T) = r_i(1) r_i(2) ... r_i(T)
where r_i(k) is the number of the page containing the virtual address referenced by process P_i at time k, and k = 1, 2, ..., T measures the execution (virtual) time.

Memory Allocation & Management Optimal Load Control

In a multiprogramming environment, memory is used dynamically. The number of active processes (the degree of multiprogramming) in a parallel processor system is usually greater than the number of processors, so that switching among processes is possible. This capability requires that the memory be able to hold the pages of the active processes in order to reduce context-switching time. Multiprogramming improves concurrency in the use of all system resources, but the degree of multiprogramming should be varied dynamically to maintain both a low overhead on the system and a high degree of concurrency.

120

Memory Allocation & Management Optimal Load Control
A multiprogrammed multiprocessing virtual memory system model
The network has two portions:
(a) the active network, which contains the processors, the main memory, and the file memory;
(b) the passive network, which contains the process queue and the policies for admitting new processes to active status.
A process is active if it is in the active network, where it is eligible to receive processing and to have pages in main memory. Each active process is either waiting or in service.

Memory Allocation & Management Optimal Load Control

121

Memory Allocation & Management Memory Management Policies

122

123

Memory Allocation & Management Cache Memories and Management

Characteristics of a cache:
•It consists of two parts: the cache directory and the cache RAM
•The memory portion is partitioned into a number of equal-sized blocks called block frames
•The directory is implemented as some form of associative memory and consists of block address tags and some control bits, such as a "dirty" bit, a "valid" bit, and protection bits
•The address tags contain the block addresses of the blocks that are currently in the cache
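A minimal sketch (hypothetical parameters, not the textbook's design) of how a directory tag match with valid and dirty bits drives a cache fetch, in the spirit of the flowchart that follows:

```python
from dataclasses import dataclass

@dataclass
class BlockFrame:
    tag: int = 0
    valid: bool = False
    dirty: bool = False
    data: bytes = b""

class DirectMappedCache:
    """Toy direct-mapped cache: the directory holds tags plus control bits,
    the RAM part holds the block frames."""
    def __init__(self, num_frames: int, block_size: int):
        self.num_frames, self.block_size = num_frames, block_size
        self.frames = [BlockFrame() for _ in range(num_frames)]

    def fetch(self, address: int, memory_read) -> bytes:
        block = address // self.block_size
        index, tag = block % self.num_frames, block // self.num_frames
        frame = self.frames[index]
        if frame.valid and frame.tag == tag:          # directory hit
            return frame.data
        if frame.valid and frame.dirty:
            pass                                      # write-back of the old block omitted in this toy
        frame.tag, frame.valid, frame.dirty = tag, True, False
        frame.data = memory_read(block)               # miss: fetch the block from main memory
        return frame.data

cache = DirectMappedCache(num_frames=4, block_size=16)
print(cache.fetch(0x40, memory_read=lambda b: bytes([b] * 16))[:4])
```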

Memory Allocation & Management Cache Memories and Management

124

Simplified flowchart of cache operation for fetch
