CSE 8383 - Advanced Computer Architecture Week-4 Week of Feb 2, 2004 engr.smu.edu/~rewini/8383


Contents

Reservation Table; Latency Analysis; State Diagrams; MAL and its bounds; Delay Insertion; Throughput; Group Work; Introduction to Multiprocessors

Reservation Table

A reservation table displays the time-space flow of data through the pipeline for one function evaluation

A static pipeline is specified by a single reservation table

A dynamic pipeline may be specified by multiple reservation tables

Static Pipeline

Dynamic Pipeline

Reservation Table (Cont.)

The number of columns in a reservation table is called the evaluation time of a given function.

The checkmarks in a row correspond to the time instants (cycles) that a particular stage will be used.

Multiple checkmarks in a row: repeated usage of the same stage in different cycles

Reservation Table (Cont.)

Contiguous checkmarks: extended usage of a stage over more than one cycle

Multiple checkmarks in one column: multiple stages are used in parallel

A dynamic pipeline may allow different initiations to follow a mix of reservation tables

Reservation Table

[Figure: reservation table over cycles 1 to 7; stage A is checked in three cycles]

Latency Analysis

The number of cycles between two initiations is the latency between them

A latency of k: two initiations are separated by k cycles

Collision: a resource conflict between two initiations

Forbidden latencies: latencies that cause collisions
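The definitions above can be turned into a small calculation. The sketch below assumes a reservation table is given as a mapping from stage name to the cycles in which that stage is checked; the function name `forbidden_latencies` and the three-stage table are hypothetical and are not taken from the slides.

```python
from itertools import combinations

def forbidden_latencies(table):
    """Return the sorted forbidden latencies of a reservation table.

    `table` maps a stage name to the 1-based cycles (columns) in which
    the stage is checked. Two checkmarks in the same row that are k
    columns apart mean that a second initiation k cycles later would
    need the same stage in the same cycle, so latency k is forbidden.
    """
    forbidden = set()
    for cycles in table.values():
        for earlier, later in combinations(sorted(cycles), 2):
            forbidden.add(later - earlier)
    return sorted(forbidden)

# Hypothetical three-stage table, used only for illustration.
example_table = {"S1": [1, 4], "S2": [2, 3], "S3": [3, 4]}
print(forbidden_latencies(example_table))  # [1, 3]
```

Any latency that is not in the returned list is permissible for this table.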

Collision with latency 2 & 5 in evaluating X

[Figure: timing charts showing initiations X1 and X2 colliding at latencies 2 and 5]

Latency Analysis (cont.)

Latency sequence: a sequence of permissible latencies between successive initiations

Latency cycle: a latency sequence that repeats the same subsequence (cycle) indefinitely

Latency sequence: 1, 8

Latency cycle (1, 8): 1, 8, 1, 8, 1, ...

Latency Analysis (cont.)

Average latency (of a latency cycle): sum of all latencies / number of latencies along the cycle

Constant cycle: a latency cycle with only one latency value

Objective: obtain the shortest average latency between initiations without causing collisions.

Latency Cycle (1,8)

[Figure: pipeline timing chart over cycles 1 to 19]

Average Latency = (1+8)/2 = 4.5

Latency Cycle (6)

[Figure: pipeline timing chart over cycles 1 to 19]

Average Latency = 6
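The two latency cycles above can be checked numerically. The following is a minimal sketch, assuming the forbidden latencies of X are 2, 4, 5, and 7 (the set these slides use for the collision vector of X); the helper names `average_latency` and `is_collision_free` are illustrative only.

```python
from fractions import Fraction

def average_latency(cycle):
    """Sum of the latencies divided by the number of latencies in the cycle."""
    return Fraction(sum(cycle), len(cycle))

def is_collision_free(cycle, forbidden):
    """True if no two initiations end up a forbidden latency apart.

    The gap between any two initiations is a sum of consecutive
    latencies of the repeating cycle, so every such sum up to the
    largest forbidden latency is checked. Latencies must be positive.
    """
    limit = max(forbidden, default=0)
    n = len(cycle)
    for start in range(n):
        gap, i = 0, start
        while gap <= limit:
            gap += cycle[i % n]
            if gap in forbidden:
                return False
            i += 1
    return True

forbidden_x = {2, 4, 5, 7}  # assumed forbidden latencies of X
for cycle in [(1, 8), (6,), (1, 3)]:
    print(cycle, float(average_latency(cycle)), is_collision_free(cycle, forbidden_x))
```

Under these assumptions (1, 8) and (6) are collision free, while (1, 3) is not, because initiations separated by 1 + 3 = 4 cycles collide.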

Collision Vector

C = (C_m, C_{m-1}, …, C_2, C_1)

C_i = 1 if latency i causes a collision (forbidden)

C_i = 0 if latency i is permissible

C_m = 1 (always), because m is the maximum forbidden latency

Maximum forbidden latency: m <= n - 1, where n = number of columns in the reservation table

Collision Vector (X after X)

Forbidden Latencies: 2, 4, 5, 7

Collision Vector = 1 0 1 1 0 1 0

Collision Vector (Y after Y)

Forbidden Latencies: 2, 4

Collision Vector = 1 0 1 0
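The mapping from forbidden latencies to a collision vector can be sketched in a few lines. The function name `collision_vector` and the choice of a bit string as the return type are assumptions made for illustration.

```python
def collision_vector(forbidden):
    """Build the collision vector as a bit string, C_m on the left, C_1 on the right.

    Bit C_i is 1 when latency i is forbidden and 0 when it is
    permissible; m is the largest forbidden latency, so C_m is always 1.
    """
    m = max(forbidden)
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))

print(collision_vector({2, 4, 5, 7}))  # 1011010 (X after X)
print(collision_vector({2, 4}))        # 1010    (Y after Y)
```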

State Diagram

It specifies the permissible state transitions among successive initiations

The collision vector corresponds to the initial state at time t = 1 (initial collision vector)

The next state comes at time t + p, where p is a permissible latency in the range 1 <= p < m

Right Shift Register

The next state can be obtained with the help of an m-bit right shift register

A 1 shifted out of the right end: collision

A 0 shifted out of the right end: safe to allow an initiation

Each 1-bit shift corresponds to an increase in the latency by 1

The next state

The next state is obtained by bitwise ORing the initial collision vector with the shifted register

C.V. = 1 0 1 1 0 1 0 (first state)

0 1 0 1 1 0 1   C.V. 1-bit right shifted
1 0 1 1 0 1 0   initial C.V.
---------------- OR
1 1 1 1 1 1 1
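The shift-and-OR step above can be reproduced with integer bit operations. This is a sketch only; `is_permissible` and `next_state` are hypothetical helper names, and states are represented as bit strings with C_m on the left.

```python
def is_permissible(state, p):
    """Latency p is permissible if bit C_p of the state is 0 (or p exceeds m)."""
    m = len(state)
    return p > m or state[m - p] == "0"

def next_state(state, initial_cv, p):
    """Shift the current state right by p bits, then OR with the initial C.V."""
    m = len(initial_cv)
    shifted = int(state, 2) >> p
    return format(shifted | int(initial_cv, 2), f"0{m}b")

cv = "1011010"
print(is_permissible(cv, 1))  # True
print(next_state(cv, cv, 1))  # 1111111
print(next_state(cv, cv, 3))  # 1011011
```

Shifting by more than m bits empties the register, so any latency greater than m returns the pipeline to the initial state.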

State Diagram for X

States: 1 0 1 1 0 1 0 (initial), 1 1 1 1 1 1 1, and 1 0 1 1 0 1 1
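A state diagram such as the one for X can be generated mechanically by applying the shift-and-OR rule to every permissible latency of every reachable state. The sketch below is one possible way to do it; the function name `state_diagram` is hypothetical, and the single latency m + 1 is used here to stand for all latencies greater than m.

```python
def state_diagram(cv):
    """Map each reachable state to its outgoing edges {latency: next state}.

    States are bit strings with C_m on the left. The edge labelled
    m + 1 represents every latency greater than m, which always leads
    back to the initial state.
    """
    m = len(cv)
    initial = int(cv, 2)
    diagram = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        if state in diagram:
            continue
        edges = {}
        for p in range(1, m + 2):
            # Bit p-1 (counting from the right) is C_p; skip forbidden latencies.
            if p <= m and (state >> (p - 1)) & 1:
                continue
            successor = (state >> p) | initial
            edges[p] = successor
            pending.append(successor)
        diagram[state] = edges

    def as_bits(value):
        return format(value, f"0{m}b")

    return {as_bits(s): {p: as_bits(t) for p, t in e.items()}
            for s, e in diagram.items()}

for state, edges in state_diagram("1011010").items():
    print(state, edges)
```

For the collision vector 1011010 this yields the three states listed above, with the edge labelled 8 playing the role of "8 or more".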

Cycles

Simple cycles: each state appears only once

(3), (6), (8), (1, 8), (3, 8), and (6, 8)

Greedy cycles: simple cycles whose edges are all made with minimum latencies from their respective starting states

(1, 8) and (3); one of them yields the MAL
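The list of simple cycles can be cross-checked by a depth-first search over the state diagram. The sketch below hard-codes the diagram for X as an assumption (it was obtained by applying the shift-and-OR rule to the collision vector 1 0 1 1 0 1 0, with latency 8 standing for "8 or more"); the function name `simple_cycles` is hypothetical.

```python
# Assumed state diagram for X: state -> {latency: next state}.
DIAGRAM_X = {
    "1011010": {1: "1111111", 3: "1011011", 6: "1011011", 8: "1011010"},
    "1111111": {8: "1011010"},
    "1011011": {3: "1011011", 6: "1011011", 8: "1011010"},
}

def simple_cycles(diagram):
    """List the latency cycles in which every state appears only once.

    Each cycle is reported once, starting from its earliest state in
    the diagram's ordering; parallel edges with different latencies
    count as different cycles.
    """
    order = {state: i for i, state in enumerate(diagram)}
    cycles = []

    def extend(start, state, visited, latencies):
        for p, successor in sorted(diagram[state].items()):
            if successor == start:
                cycles.append(tuple(latencies + [p]))
            elif successor not in visited and order[successor] > order[start]:
                extend(start, successor, visited | {successor}, latencies + [p])

    for start in diagram:
        extend(start, start, {start}, [])
    return cycles

print(sorted(simple_cycles(DIAGRAM_X)))
```

With the assumed diagram, the search returns the same six cycles as the slide: (3), (6), (8), (1, 8), (3, 8), and (6, 8).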

MAL

Minimum Average Latency

At least one of the greedy cycles will lead to the MAL

Consider the state diagram for Y: the MAL is 3 (see diagram)
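Since at least one greedy cycle leads to the MAL, the MAL can be estimated by following the minimum-latency edge out of every reachable state and taking the smallest average. The sketch below does this directly from a collision vector; the function names are hypothetical, and latency m + 1 stands for every latency greater than m.

```python
from fractions import Fraction

def _permissible(state, p, m):
    """Latency p is permissible when p > m or bit C_p of the state is 0."""
    return p > m or not (state >> (p - 1)) & 1

def greedy_cycles(cv):
    """Return the greedy cycles reachable from the initial collision vector."""
    m, initial = len(cv), int(cv, 2)

    # Collect every state reachable through any permissible latency.
    states, pending = set(), [initial]
    while pending:
        s = pending.pop()
        if s in states:
            continue
        states.add(s)
        pending += [(s >> p) | initial for p in range(1, m + 2) if _permissible(s, p, m)]

    cycles = set()
    for start in states:
        path, latencies = [start], []
        while True:
            # Greedy choice: smallest permissible latency from the current state.
            p = next(q for q in range(1, m + 2) if _permissible(path[-1], q, m))
            successor = (path[-1] >> p) | initial
            latencies.append(p)
            if successor in path:
                cycle = latencies[path.index(successor):]
                # Rotate to a canonical form so (8, 1) and (1, 8) coincide.
                k = min(range(len(cycle)), key=lambda j: cycle[j:] + cycle[:j])
                cycles.add(tuple(cycle[k:] + cycle[:k]))
                break
            path.append(successor)
    return cycles

def mal_from_greedy(cv):
    """Smallest average latency over the greedy cycles."""
    return min(Fraction(sum(c), len(c)) for c in greedy_cycles(cv))

print(greedy_cycles("1011010"), mal_from_greedy("1011010"))  # X
print(greedy_cycles("1010"), mal_from_greedy("1010"))        # Y
```

For Y (collision vector 1 0 1 0) the greedy cycles are (1, 5) and (3), both with average latency 3, which agrees with the MAL stated above.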

State Diagram for Y

States: 1 0 1 0 (initial), 1 1 1 1, and 1 0 1 1

Bounds on the MAL

MAL is lower-bounded by the maximum number of checkmarks in any row of the reservation table. (Shar, 1972)

MAL is lower than or equal to the average latency of any greedy cycle in the state diagram. (Shar, 1972)

The average latency of any greedy cycle is upper-bounded by the number of 1's in the initial collision vector plus 1. This is also an upper bound on the MAL. (Shar, 1972)
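The first and third bounds are easy to evaluate from a reservation table and its collision vector. The sketch below assumes the table is summarized by the number of checkmarks in each row; the function name `mal_bounds` and the sample inputs (three rows with two checkmarks each, collision vector 1011) are illustrative.

```python
def mal_bounds(checkmarks_per_row, cv):
    """Return (lower, upper) bounds on the MAL.

    lower: the maximum number of checkmarks in any row of the table.
    upper: the number of 1's in the initial collision vector plus 1.
    """
    lower = max(checkmarks_per_row)
    upper = cv.count("1") + 1
    return lower, upper

print(mal_bounds([2, 2, 2], "1011"))  # (2, 4)
```

A pipeline whose MAL is above the lower bound, such as MAL = 3 against a lower bound of 2, leaves room for improvement.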

Delay Insertion

The purpose is to modify the reservation table, yielding a new collision vector

This may lead to a modified state diagram, which may produce greedy cycles meeting the lower bound on MAL

Example

[Figure: pipeline with stages S1, S2, S3 and an output]

Example (Cont.)

1 2 3 4 5

S1 X X

S2 X X

S3 X X

Forbidden Latencies: 1, 2, 4

C.V. 1 0 1 1

Example (Cont.)

State Diagram: state 1 0 1 1 with a self-loop labeled 3*

MAL = 3

Example (Cont.)

[Figure: pipeline with stages S1, S2, S3, an inserted delay D1, and an output]

Example (Cont.)

1 2 3 4 5 6 7

S1 X X

S2 X X

S3 X X

Forbidden: 2, 6

C.V. 1 0 0 0 1 0

Group Activity 1

Find the State Diagram

Pipeline Throughput

The average number of task initiations per clock cycle

The inverse of MAL
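As a quick numerical illustration, the throughput can be expressed per clock cycle and, given a clock period, per second. The function name `throughput` and the sample values (MAL = 3, clock period 20 ns) are assumptions chosen for the example.

```python
def throughput(mal, clock_period_ns):
    """Return (initiations per cycle, initiations per second).

    Throughput per cycle is the inverse of the MAL; dividing by the
    clock period converts it to initiations per second.
    """
    per_cycle = 1 / mal
    per_second = per_cycle / (clock_period_ns * 1e-9)
    return per_cycle, per_second

per_cycle, per_second = throughput(mal=3, clock_period_ns=20)
print(f"{per_cycle:.3f} initiations/cycle, {per_second / 1e6:.2f} million initiations/s")
```

With these sample values the pipeline starts one task every three cycles on average, that is, roughly 16.67 million initiations per second.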

Group Activity 2

1 2 3 4

S1 X X

C.V., State Diagram, Simple Cycles, Greedy Cycles, MAL, Throughput (t = 20 ns)

Multiprocessors

Introduction

Uniprocessor systems are not capable of delivering solutions to some problems in reasonable time

Multiple processors cooperate to jointly execute a single computational task in order to speed up its execution

Speed-up versus Quality-up

Architecture Background

Three major components:

Processors

Memory Modules

Interconnection Network

Parallel and Distributed Computers

MIMD Shared Memory: bus based, switch based, CC-NUMA

MIMD Distributed Memory

SIMD Computers

Clusters

Grid Computing

MIMD Shared Memory Systems

Interconnection Networks

M M M M

P P P P P

Bus Based & switch based SM Systems

Global Memory

M M M M

Cache Coherent NUMA

MIMD Distributed Memory Systems

Interconnection Networks

M M M M

P P P P

SIMD Computers

Processor

Memory

von Neumann Computer

Some Interconnection Network

Clusters

Middleware

Programming Environment

Grids

Grids are geographically distributed platforms for computation.

They provide dependable, consistent, pervasive, and inexpensive access to high end computational capabilities.

Interconnection Network Taxonomy

Static: 1-D, 2-D, HC

Dynamic:
  Bus-based: Single, Multiple
  Switch-based: SS, MS, Crossbar
