Presented by: Quinn Gaumer CPS 221




Page 1: Presented by: Quinn Gaumer CPS 221.  16,384 Processing Nodes (32 MHz)  30 m x 30 m  Teraflop  1992

Presented by: Quinn Gaumer, CPS 221

Page 2:

16,384 Processing Nodes (32 MHz)

30 m x 30 m
Teraflop
1992

Page 3:

With 16,384 processors, the interconnect plays a large role

3 Types of Networks
◦ Data
◦ Control
◦ Diagnostic

Page 4:

Easily Attainable High Performance
Scaling
Data Parallel Programming
High Reliability and Availability
Space/Time Shared
Fast Time to Market
Modular

Page 5:

Include:
◦ Control Processor
◦ Processing Nodes
◦ Slices of Data and Control Networks
Privileged vs. Non-Privileged
Program Isolation
Time Sharing

Page 6:

Provide Simple View of Network to Processors
Sharing and Fault Tolerance
Decouple Network/Processor by Providing Contract
◦ Software -> ISA -> Hardware

Page 7:

“The data network promises to eventually accept and deliver all messages injected into the network by the processors as long as the processors promise to eventually eject all messages from the network when they are delivered to the processors.”

Page 8:

Collection of Memory-Mapped FIFOs
◦ Outgoing/Incoming
Restricted Operations
◦ Implemented with protected pages
Physical/Relative (Virtual) Addresses
◦ Programs use only relative addresses
Network Independent of User
◦ Delivery guaranteed by the network, not the processing node
◦ Requires network diagnostics
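The memory-mapped FIFO interface can be modeled in a few lines. This is a toy Python sketch, not the CM-5 hardware interface: the class, method names, and FIFO depth are illustrative stand-ins for what would be protected memory-mapped registers.

```python
from collections import deque

FIFO_DEPTH = 4  # illustrative depth, not the real hardware value

class NetworkInterface:
    """Toy model of a node's memory-mapped send/receive FIFO pair."""
    def __init__(self):
        self.send_fifo = deque()
        self.recv_fifo = deque()

    # "Status registers" a user program would poll via protected pages.
    def send_ok(self):
        return len(self.send_fifo) < FIFO_DEPTH

    def recv_ok(self):
        return bool(self.recv_fifo)

    def write_send(self, word):
        # In hardware this is a store to a mapped address; full FIFO
        # means the user must retry rather than block the network.
        if not self.send_ok():
            raise RuntimeError("send FIFO full: retry later")
        self.send_fifo.append(word)

    def read_recv(self):
        return self.recv_fifo.popleft()

ni = NetworkInterface()
for word in (0xCAFE, 0xF00D):   # inject a 2-word message
    if ni.send_ok():
        ni.write_send(word)
# (the network would drain send_fifo and fill the target's recv_fifo)
```

The key property the sketch preserves is that the user only sees FIFOs and status flags, never the network topology.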

Page 9:

Fat Tree Structure
◦ The closer to the root, the thicker the tree
◦ Ensures no bottlenecks at the root
User Partitions and I/O are Sub-trees
◦ Guarantees network independence
◦ Messages in a partition stay within the partition
Many Optimal Node-to-Node Paths
◦ Choose randomly among open links
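The up-then-down routing on a fat tree can be sketched as follows. This is a simplified model (arity, node numbering, and function names are assumptions): a message only needs to climb to the least common ancestor of source and destination, and at each upward hop any open link is equally good, so one is picked at random.

```python
import random

def route_up_levels(src, dst, arity=4):
    """Levels a message must ascend: up to the least common
    ancestor of src and dst in an arity-ary fat tree."""
    levels = 0
    while src != dst:
        src //= arity   # move both endpoints up one level
        dst //= arity
        levels += 1
    return levels

def pick_up_link(open_links):
    """Any open upward link is optimal, so choose uniformly at
    random -- this is what spreads load across the fat links."""
    return random.choice(open_links)

assert route_up_levels(5, 5) == 0    # same node: no ascent
assert route_up_levels(0, 3) == 1    # siblings under one parent
assert route_up_levels(0, 15) == 2   # must reach the grandparent
```

Once the message reaches the common ancestor, the downward path is fully determined by the destination address, so randomization applies only on the way up.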

Page 10:

Data can be only 1-5 Words
Wormhole Routing
CRC Checking done at every Link
◦ Additional !CRC sent when error first found
◦ Primary errors allow the Diagnostic Network to determine the location
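Per-link CRC checking amounts to recomputing the checksum at each hop and comparing it with the one the message carries. A minimal sketch, using Python's standard `zlib.crc32` in place of the CM-5's actual link CRC polynomial:

```python
import zlib

def link_check(words, carried_crc):
    """Recompute the CRC at a link and compare with the CRC the
    message carries; a mismatch flags the message as corrupted."""
    data = b"".join(w.to_bytes(4, "little") for w in words)
    return zlib.crc32(data) == carried_crc

msg = [0x1234, 0x5678, 0x9ABC]   # a 3-word payload
crc = zlib.crc32(b"".join(w.to_bytes(4, "little") for w in msg))

assert link_check(msg, crc)                    # clean link: passes
corrupted = [0x1234, 0x5678, 0x9ABD]           # one bit flipped
assert not link_check(corrupted, crc)          # error detected here
```

Because every link runs this check, the first link to fail it marks where the corruption entered, which is what lets the diagnostic network localize the fault.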

Page 11:

Message Counters at every Link
Kirchhoff’s Law to Determine Missing Messages
What to do with a Bad Chip or Link?
◦ Route Messages Away from Failure
◦ Map Out Nearby Processors
◦ Which is better? Both.
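The Kirchhoff-style bookkeeping is a flow-conservation check: at every chip, messages in must equal messages out plus messages delivered locally. A chip where the books don't balance is where a message went missing. A small illustrative sketch (counter values are made up):

```python
def leaky_chips(chips):
    """Flow conservation per chip: in == out + delivered.
    Returns the chips where a message disappeared."""
    return [name
            for name, (n_in, n_out, n_delivered) in chips.items()
            if n_in != n_out + n_delivered]

counters = {
    "chip0": (10, 7, 3),   # 10 == 7 + 3: balanced
    "chip1": (8, 5, 2),    # 8 != 5 + 2: a message was lost here
    "chip2": (6, 6, 0),    # balanced
}

assert leaky_chips(counters) == ["chip1"]
```

The hardware counters make this check cheap: the diagnostic network only has to read them out and find the node that violates conservation.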

Page 12:

Solution: Virtual Channels
◦ Separate channels for requests and responses
◦ 4 channels per chip (Incoming and Outgoing)
Deadlock still possible!
◦ User sends but never attempts to receive messages
◦ Higher-level languages implement the communication protocol
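The residual user-level deadlock can be shown in a few lines: if both endpoints only inject and never drain their receive FIFOs, every bounded buffer eventually fills and both senders stall forever. A toy simulation (capacity and names are illustrative):

```python
from collections import deque

CAPACITY = 4  # illustrative bounded-FIFO depth

def try_send(fifo):
    """Inject a message if the bounded FIFO has room."""
    if len(fifo) < CAPACITY:
        fifo.append("msg")
        return True
    return False

a_to_b, b_to_a = deque(), deque()

# Both users send repeatedly and never eject anything from the
# network, violating their half of the contract.
for _ in range(20):
    a_ok = try_send(a_to_b)
    b_ok = try_send(b_to_a)

assert not a_ok and not b_ok   # both stalled with full FIFOs: deadlock
```

This is exactly why the contract on the earlier slide requires processors to "eventually eject all messages": the hardware virtual channels cannot save a user program that never receives.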

Page 13:

Objectives
◦ Clear all messages for the new user
◦ Allow all messages in transit to eventually finish
“All Fall Down” Method
◦ Evenly misroute all messages in transit to nodes
◦ Messages saved at the node
◦ Resent when swapped in
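The "All Fall Down" context switch can be sketched as two phases: every in-flight message drops to a nearby node and is buffered there, and on swap-in each node re-injects what it held. A toy model (message and node names are hypothetical):

```python
# Phase 1: "all fall down" -- each in-flight message is misrouted
# to the nearest node below it and saved there.
in_transit = [("msgA", "node3"), ("msgB", "node7")]  # (payload, nearest node)
saved = {}

for payload, node in in_transit:
    saved.setdefault(node, []).append(payload)
in_transit.clear()          # the network is now empty for the new user

# Phase 2: when the old user is swapped back in, every node
# re-injects the messages it was holding.
resent = [p for node in sorted(saved) for p in saved.pop(node)]

assert resent == ["msgA", "msgB"]
assert not saved and not in_transit
```

The point the sketch preserves: no message is dropped, yet the network drains in bounded time regardless of whether the old user's processors were receiving.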

Page 14:

Control Processor broadcasts program
◦ Not instructions (as in SIMD)
Each Processor runs the program on its data set
Inter-Processor Communication
◦ Hardware barriers allow processes to communicate without shared semaphores
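The barrier style of communication can be illustrated with Python's standard `threading.Barrier` standing in for the hardware barrier (worker count and the squared-value workload are illustrative): each node computes on its own data, waits at the barrier, and only then reads the others' results, with no semaphores or locks.

```python
import threading

N = 4
barrier = threading.Barrier(N)   # software stand-in for a hardware barrier
partial = [0] * N
totals = [0] * N

def worker(rank):
    partial[rank] = rank * rank   # local phase on this node's data
    barrier.wait()                # everyone arrives before anyone proceeds
    # Past the barrier, every node may safely read all partial results.
    totals[rank] = sum(partial)

threads = [threading.Thread(target=worker, args=(r,)) for r in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert totals == [14, 14, 14, 14]   # 0 + 1 + 4 + 9, seen by every node
```

The barrier replaces per-variable synchronization with one global "all phases done" event, which is exactly what dedicated barrier hardware makes cheap.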

Page 15:

Program smaller than its instruction stream
◦ Easier to deliver
Local fetch allows commodity processors
◦ Fast new RISC processors, less R&D
Control system useful for other problems
Execution of generic MIMD code
◦ Message passing

Page 16:

Broadcasting
◦ User/Supervisor
◦ Interrupt
◦ Utility
Combining
◦ Reduction
◦ Forward/Backward Scan
◦ Router Done
Global Operations
◦ Synchronous/Asynchronous OR
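The combining operations have simple sequential definitions, which a few lines of Python make concrete (the node values are made up; the real control network computes these over the tree in hardware): a forward scan is an inclusive prefix over node values in node order, a backward scan is the same from the other end, and a reduction is the final scan element.

```python
from itertools import accumulate
import operator

node_values = [3, 1, 4, 1, 5]   # one contribution per processing node

forward_scan = list(accumulate(node_values, operator.add))
backward_scan = list(accumulate(node_values[::-1], operator.add))[::-1]
reduction = forward_scan[-1]    # the reduction is the last scan value

assert forward_scan == [3, 4, 8, 9, 14]
assert backward_scan == [14, 11, 10, 6, 5]
assert reduction == 14
```

On the binary-tree control network these take logarithmic depth rather than the linear pass shown here, but the results are identical.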

Page 17:

Binary Tree
Four Types of Packets
◦ Single Source: Broadcasting
◦ Multiple Source: Combining
◦ Idle: Filler
◦ Abstain: Allows the control node to skip waiting
Collisions on Network
◦ Multiple/Multiple: Buffering based on arrival time
◦ Multiple/Single: Single-Source packets prioritized
◦ Single/Single: Error

Page 18:

Control Processor for each Partition
◦ Executes scalar code while processing nodes execute parallel code
Connect any Control Processor to any Partition
◦ Problems can occur in control networks too
◦ Diagnostics may show that part of the control network must be mapped out

Page 19:

Binary Network
◦ Pods (physical subsystems) are leaves
JTAG
◦ Designed for multichip…but serial
Do JTAG for each Pod
Combine Responses with OR/AND
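Combining per-pod responses with OR/AND is just two boolean reductions over the tree's leaves, which a short sketch makes concrete (the per-pod results below are made up):

```python
from functools import reduce
import operator

pod_ok = [True, True, False, True]   # per-pod JTAG self-test results

# AND across the tree: did every pod pass?
all_pods_ok = reduce(operator.and_, pod_ok)
# OR across the tree: did any pod fail?
any_pod_failed = reduce(operator.or_, (not ok for ok in pod_ok))

assert not all_pods_ok
assert any_pod_failed
```

Because OR and AND are associative, the diagnostic network can merge responses pairwise at each internal tree node and still report the correct global answer at the root.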