

Parallel Processing

• Large class of techniques used to provide simultaneous data processing tasks

• Purpose: Increase computational speed of the computer

• A parallel processing system is able to process multiple tasks simultaneously


• While one instruction is being executed in the ALU, the next instruction can be read from memory

• Parallelism may come from 2 or more ALUs within one processor, or from 2 or more processors

• Goal is speedup - greater throughput, the amount of processing that can be done in a given amount of time

• As the amount of hardware increases, cost and complexity increase as well


• Parallelism can be viewed at various levels of complexity

• Lowest level - distinguish between serial and parallel load registers

• Higher level - multiple functional units (FUs)

– Arithmetic: adder-subtractor, integer multiplier

– Logic: logic unit, incrementer, shifter

– Floating point: add-subtract, multiply, divide


Parallel Processing Classification

• Internal organization of processors

• Interconnection structure between processors

• Flow of information through the system

• Organization of computer system by number of instructions and data items that are manipulated simultaneously


Classifications

• Normal operation of a computer is to fetch instructions from memory and execute them in the processor

• The sequence of instructions read from memory is the instruction stream

• The operations performed on the data in the processor form the data stream

• Parallel processing may occur in the instruction stream, the data stream, or both


4 Major Groups

• SISD - Single Instruction, Single Data

• SIMD - Single Instruction, Multiple Data

• MISD - Multiple Instruction, Single Data

• MIMD - Multiple Instruction, Multiple Data


SISD

• Single computer containing a

– Control unit

– Processing unit

– Memory unit

• Instructions executed sequentially

• System may or may not have internal parallel processing capabilities

– Multiple FUs or pipelining


SIMD

• Organization including many processing units under supervision of a common control unit

• All processors receive the same instruction from the control unit

• Operate on different items of data

• Shared memory unit must contain multiple modules so that it can communicate with all processors simultaneously

• Array processor
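To make the idea concrete, here is a minimal C sketch (it runs sequentially on an ordinary CPU): the loop body is the single instruction that an array processor would broadcast so that all processing units execute it in lockstep, each on its own data element. The array sizes and values are illustrative only.

#include <stdio.h>

#define N 8   /* one data element per processing unit */

int main(void) {
    int a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    int b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    int c[N];
    /* One instruction (add), many data items: an array processor
       would execute all N additions in lockstep, one per PE. */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
    for (int i = 0; i < N; i++)
        printf("%d ", c[i]);   /* prints 9 eight times */
    printf("\n");
    return 0;
}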


MISD

• Only of theoretical interest


MIMD

• Computer system capable of processing several programs at the same time

• Most multiprocessor and multicomputer systems are in this category

• Flynn's classification depends on the distinction between the performance of the control unit and the data processing unit

• Emphasizes behavioral characteristics of the computer system rather than its operational structures and interconnections


Pipelining

• Pipelining does not fit into Flynn's parallel processing classification scheme

• In practice, only 2 of the 4 categories see real use: SIMD and MIMD


Multiprocessors

• A multiprocessor system is an interconnection of 2 or more CPUs with memory and input-output equipment

• 'Processor' in multiprocessor can mean either a central processing unit (CPU) or an input-output processor (IOP)

• A system with a single CPU and multiple IOPs is usually not considered a multiprocessor


Multiprocessors / Multicomputers

• Both support concurrent operations

• Computers are interconnected with each other by means of communication lines to form a computer network

– A network consists of several autonomous computers that may or may not communicate with each other

• A multiprocessor system is controlled by one operating system that provides interaction between processors, and all components in the system cooperate to solve the problem at hand


Multiprocessors

• Major motivation - microprocessors are cheap and small

• VLSI helps make it possible too

• Improves reliability

– Like mutual funds, the risk is spread: a failure costs some efficiency, not the whole system

• Benefits

– Improved system performance

– Computations can proceed in parallel in 2 ways (see the sketch after this list)

• Multiple independent jobs run in parallel

• A single job can be partitioned into multiple parallel tasks
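A hedged illustration of the second way, using POSIX threads (assuming a system with pthreads; compile with -pthread): one job, summing an array, is partitioned into four parallel tasks, one per processor. The names sum_chunk and NTHREADS are invented for this sketch.

#include <pthread.h>
#include <stdio.h>

#define N 1000000          /* problem size */
#define NTHREADS 4         /* one task per (hypothetical) processor */

static int data[N];
static long long partial[NTHREADS];

static void *sum_chunk(void *arg) {
    long id = (long)arg;
    long lo = id * (N / NTHREADS);
    long hi = (id == NTHREADS - 1) ? N : lo + N / NTHREADS;
    long long s = 0;
    for (long i = lo; i < hi; i++)
        s += data[i];
    partial[id] = s;       /* each task writes only its own slot */
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < N; i++)
        data[i] = 1;
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, sum_chunk, (void *)i);
    long long total = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        total += partial[i];    /* combine the partial results */
    }
    printf("total = %lld\n", total);    /* prints 1000000 */
    return 0;
}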


• Overall functions can be partitioned into several tasks

• System tasks can be allocated to specialized processors

– Designed for optimal performance

– Example: One processor performs standard tasks for an industrial process while others sense and control various parameters such as temperature and flow rate

– Example: One processor takes care of high speed floating point operations while another processes standard operations and tasks


Performance Improvement

• Decompose problem into multiple discrete tasks

• User can explicitly direct computer to split tasks

• Or provide a compiler that automatically detects when parts of a program can be split (see the OpenMP-style sketch after this list)

– A parallelizing compiler

• Multiprocessors are classified by the way memory is organized
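As a sketch of the user explicitly directing the split (one common notation, not necessarily the course's mechanism): an OpenMP directive in C asks the compiler to divide the loop iterations among the available processors. Compile with -fopenmp; without that flag the pragma is ignored and the loop simply runs serially.

#include <stdio.h>

int main(void) {
    double sum = 0.0;
    /* The directive explicitly tells the compiler to split the
       iterations of this loop among the available processors. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 1000000; i++)
        sum += 1.0 / i;
    printf("harmonic sum = %f\n", sum);
    return 0;
}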


Tightly Coupled

• A multiprocessor system with common shared memory

– Called a shared-memory or tightly coupled multiprocessor

• Does not preclude each processor from having its own local memory

• Most commercial tightly coupled systems provide cache memory for each CPU

• In addition, a global common memory is provided that all CPUs can access


Loosely Coupled

• Distributed memory = Loosely coupled

• Each processing element (PE) in a loosely coupled system has its own local memory

• Processors tied together by switching scheme designed to route information between processors through a message passing scheme

• Programs and data relayed in packets consisting of address, data, and error detection codes (see the packet sketch after this list)



• A packet is either destined for a specific processor or grabbed by the first processor that finds it, depending on the communication system design

• Loose coupling is most efficient when interaction between tasks is minimal

• Tightly coupled systems can tolerate a higher degree of interaction between tasks
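A rough sketch of such a packet in C, assuming a 32-byte payload and a toy additive checksum as the error detection code (real systems would use CRCs and richer headers; all names here are illustrative):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAYLOAD_BYTES 32   /* illustrative payload size */

struct packet {
    uint16_t dest;                /* destination PE address */
    uint16_t src;                 /* source PE address */
    uint8_t  data[PAYLOAD_BYTES]; /* program or data payload */
    uint16_t checksum;            /* error detection code */
};

/* Toy additive checksum over header fields and payload. */
static uint16_t packet_checksum(const struct packet *p) {
    uint32_t s = p->dest + p->src;
    for (int i = 0; i < PAYLOAD_BYTES; i++)
        s += p->data[i];
    return (uint16_t)s;
}

int main(void) {
    struct packet p = { .dest = 3, .src = 0 };
    memcpy(p.data, "hello", 6);
    p.checksum = packet_checksum(&p);
    /* The receiving PE recomputes the checksum to detect errors. */
    printf("checksum ok: %d\n", packet_checksum(&p) == p.checksum);
    return 0;
}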


Interconnection Structures

• Components forming a multiprocessor are

– CPUs

– IOPs

– A memory unit (may be partitioned into separate modules)

• Interconnections can have different physical configurations

– Depending on the number of transfer paths available between processors and memory in a shared memory system

– Depending on number of transfer paths among PEs in a loosely coupled system


Physical Forms

• Time-Shared Common Bus

• Multiported Memory

• Crossbar Switch

• Multistage Switching Network

• Hypercube System


Time-Shared Common Bus

• N processors connected through a common bus to a memory unit

• Only 1 processor can access (communicate with) the memory unit or another processor at a time

• Transfer operations conducted by processor that is in control of the bus

• Other processors must wait, checking availability

• A command is issued to inform the destination that communication is requested

– What operation, and from where

• Destination responds and transfer begins


Common Bus

• Bus contention

• Resolved by including a bus controller

– Priorities

• Restricted to a single transfer at a time (modeled in the sketch after this list)

– When one processor is transferring to/from memory, the other processors are either busy with internal processing or idle, waiting for the bus

• System's overall transfer rate is limited by the speed of the bus

• Multiple buses are possible, but you pay a penalty ($$)
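A toy model of the time-shared bus in C with pthreads (an analogy, not real bus hardware; compile with -pthread): a mutex stands in for the bus, so only one "processor" thread can transfer at a time while the others wait.

#include <pthread.h>
#include <stdio.h>

#define NPROC 4

static pthread_mutex_t bus = PTHREAD_MUTEX_INITIALIZER;
static int shared_memory[NPROC];

static void *processor(void *arg) {
    long id = (long)arg;
    pthread_mutex_lock(&bus);        /* request and win the bus */
    shared_memory[id] = (int)id;     /* the single transfer */
    printf("P%ld used the bus\n", id);
    pthread_mutex_unlock(&bus);      /* release for the next processor */
    return NULL;
}

int main(void) {
    pthread_t t[NPROC];
    for (long i = 0; i < NPROC; i++)
        pthread_create(&t[i], NULL, processor, (void *)i);
    for (int i = 0; i < NPROC; i++)
        pthread_join(t[i], NULL);
    return 0;
}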


Dual Buses

• Not more economical

• Local buses, local memory

• System bus controller is the big coordinator

• Local memory can be cache memory

– Coherency problems possible


Multiported Memory

• Separate buses between each memory module (MM) and processor

• Each processor bus connected to each MM

• Processor bus consists of

– Address

– Data

– Control lines

• Each MM has 4 ports, 1 for each bus


• MM must have internal logic to determine which bus has control

• Fixed priorities assigned to each memory port (1, 2, 3, 4) - see the arbitration sketch after this list

• Advantage: high transfer rate

• Disadvantages:

– Expensive memory control logic

– Many cables and connectors

• Usually only appropriate for a small number of processors
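A minimal C sketch of fixed-priority arbitration, assuming the lowest-numbered port has the highest priority; the grant function and bit-mask encoding are illustrative, not taken from any real memory controller.

#include <stdio.h>

/* Grant the memory module to the lowest-numbered (highest-priority)
   port among those requesting; request_mask bit i = port i requesting. */
static int grant(unsigned request_mask) {
    for (int port = 0; port < 4; port++)
        if (request_mask & (1u << port))
            return port;
    return -1;   /* no requests */
}

int main(void) {
    /* Ports 1 and 3 request at once (mask 1010): port 1 wins. */
    printf("granted port %d\n", grant(0x0Au));
    return 0;
}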


Crossbar Switch

• Crosspoints placed at intersections of processor buses and memory buses

• See figure 13-4 on page 495

• Each switch determines the path (control logic)

– Examines the address on the bus

– Resolves conflicts by a predetermined, hardcoded definition

• See figure 13-5 on page 495

– Data flows in both directions

– Multiplexers select the data (remember select lines?)


• Supports simultaneous transfers from all MMs

– A separate path is associated with each MM

• Hardware can be large and complex

• Number of switches needed is processors × MMs (e.g., 8 processors and 4 MMs require 32 crosspoint switches)


Multistage Switching Network

• Basic component is a 2-input, 2-output interchange switch (see the sketch after this list)

• See figure 13-6 on page 496 - explain

• Switch can arbitrate between conflicts

• Can be used to build a multistage switching network

• See figure 13-7 on page 497 - explain
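A small C model of the interchange switch, assuming a single control bit that selects the straight or crossed connection (the conflict arbitration mentioned above is omitted):

#include <stdio.h>

struct outputs { int out0, out1; };

/* One control bit selects the connection pattern of the 2x2 switch. */
static struct outputs interchange(int in0, int in1, int cross) {
    struct outputs o;
    if (cross) { o.out0 = in1; o.out1 = in0; }   /* crossed */
    else       { o.out0 = in0; o.out1 = in1; }   /* straight */
    return o;
}

int main(void) {
    struct outputs o = interchange(7, 9, 1);
    printf("out0=%d out1=%d\n", o.out0, o.out1);   /* prints 9 and 7 */
    return 0;
}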


Patterns & Omega

• Not all patterns are always available to all processors

• If P1 is accessing 0xx, then P2 can only access 1xx

• Used in both tightly and loosely coupled systems

• Omega Switching Network - see figure 13-8 on page 498 (routing sketch after this list)

– Exactly 1 path from each source to each MM

– Some patterns cannot be connected simultaneously (000 and 001)

• 1 switch, 1 signal at a time
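A sketch of destination-tag routing through an 8×8 omega network in C, assuming the common convention that at each stage the corresponding destination-address bit (most significant first) selects the upper (0) or lower (1) switch output:

#include <stdio.h>

#define STAGES 3   /* log2(8) stages for an 8x8 network */

/* Print the switch setting used at each stage on the way to dest. */
static void route(int source, int dest) {
    printf("%d -> %d: ", source, dest);
    for (int stage = 0; stage < STAGES; stage++) {
        int bit = (dest >> (STAGES - 1 - stage)) & 1;
        printf("%s ", bit ? "lower" : "upper");
    }
    printf("\n");
}

int main(void) {
    route(2, 5);   /* dest 101: lower, upper, lower */
    route(0, 0);   /* dest 000: upper, upper, upper */
    return 0;
}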


Omega Network

• Tightly coupled systems

– Sources - processors

– Destinations - MMs

• Loosely coupled systems

– Source - processor

– Destination - processor


Hypercube

• Hypercube or binary n-cube

• Loosely coupled system composed of N = 2^n processors interconnected in an n-dimensional binary cube

• Each node contains CPU, local memory, I/O interfaces

• Direct communication paths to n other nodes (1 hop)

• There are 2^n distinct n-bit binary addresses to be assigned to the processors

• Each neighboring processor address differs by exactly 1 bit position (see the sketch after this list)

• See figure 13-9 on page 499
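A quick C illustration of the addressing property: the n neighbors of a node are obtained by flipping each of its n address bits in turn. A 3-cube with node 101 is used as the example.

#include <stdio.h>

#define N_DIM 3   /* a 3-cube: 8 nodes, addresses 000..111 */

int main(void) {
    int node = 5;   /* binary 101 */
    printf("neighbors of %d:", node);
    for (int bit = 0; bit < N_DIM; bit++)
        printf(" %d", node ^ (1 << bit));   /* flip one address bit */
    printf("\n");   /* prints: neighbors of 5: 4 7 1 */
    return 0;
}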


Routing Messages

• Will take from 1 to n hops (maximum, source to destination)

• Routing procedure (sketched below)

– XOR the source and destination addresses

• The result shows on which axes the addresses differ

– Send along any indicated axis

– Repeat until arrival at destination
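A minimal C sketch of this routing procedure on a 3-cube; choosing the lowest-numbered differing axis at each step is one arbitrary policy among the "any indicated axis" options above.

#include <stdio.h>

/* Hop from src toward dst, one axis at a time. */
static void route(int src, int dst) {
    int cur = src;
    printf("%d", cur);
    while (cur != dst) {
        int diff = cur ^ dst;      /* set bits = axes that differ */
        int axis = diff & -diff;   /* pick the lowest differing axis */
        cur ^= axis;               /* one hop along that axis */
        printf(" -> %d", cur);
    }
    printf("\n");
}

int main(void) {
    route(0, 7);   /* 000 -> 001 -> 011 -> 111: n = 3 hops */
    return 0;
}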