64-bit Scalable Chip Multiprocessor (SCMP) Tongji University


Page 1: 64-bit Scalable Chip Multiprocessor ( SCMP)

64-bit Scalable Chip Multiprocessor

(SCMP)

Tongji University

Page 2: 64-bit Scalable Chip Multiprocessor ( SCMP)

Why SCMP?

• Memory access latency is the bottleneck
• Thread-level parallelism (TLP) is the trend
• Flexible, scalable from 1 to 4, up to 16 cores
• CPU core is small and simple, so it is easier to verify
• Higher throughput
• Improved wafer utilization

Page 3: 64-bit Scalable Chip Multiprocessor ( SCMP)

How SCMP?

• Full custom 64-bit CPU core
• On-chip switch
• L2 cache and controller
• Hardware thread scheduler

Page 4: 64-bit Scalable Chip Multiprocessor ( SCMP)

SCMP Block Diagram

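[Figure: SCMP block diagram. Four cores, each with an instruction cache (I$), a data cache (D$), an integer unit, an FPU and four register files (RF), connect through a non-blocking crossbar switch to the multi-bank L2 cache. The diagram also shows the thread scheduler, the crypto coprocessor and IO.]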

Page 5: 64-bit Scalable Chip Multiprocessor ( SCMP)

4-Core-Architecture Feature

• Target application: server
• 4 multi-thread processor cores
• 4 MB L2 cache, multi-bank
• Non-blocking crossbar switch between cores and L2 cache banks
• Directory-based cache coherency
• Thread scheduler
• Reconfigurable crypto-coprocessor
• FB-DIMM memory controller (possibly)

Page 6: 64-bit Scalable Chip Multiprocessor ( SCMP)

Multi-thread Core Architecture

• 64-bit MIPS instruction set architecture
• 4 threads, coarse multithreading, only one thread running at a time
• 16 KB L1 instruction cache, 8 KB (or 16 KB) data cache
• 5-8 stage pipeline
• Includes integer unit, floating-point unit and L1 cache

Page 7: 64-bit Scalable Chip Multiprocessor ( SCMP)

64-bit CPU Core Feature

Full custom:

• ST 90 nm technology
• High speed, 1 GHz
• Low power consumption
• Small die size
• Robust
• Used as hard core

Page 8: 64-bit Scalable Chip Multiprocessor ( SCMP)

64-bit CPU Core Feature

Multithreading:

• Coarse multithreading
  • Makes the core design easier
  • Small, simple core
• Only one thread runs at a time
  • Bottleneck: memory access
  • While one thread waits for memory, the core switches to another thread
• 4 threads in total per core
  • Memory latency is more severe in a common multiprocessor
  • Memory latency is masked by switching threads
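How switching threads masks memory latency can be sketched with a small simulation. This is a toy model, not the SCMP pipeline: it assumes each thread computes for a fixed number of cycles, then misses and waits a fixed number of cycles, and that a thread switch costs nothing. The function name `utilization` and the values of `compute` and `miss_latency` are illustrative assumptions, not figures from the slides.

```python
def utilization(n_threads, compute=20, miss_latency=60, cycles=10_000):
    """Fraction of cycles in which the pipeline does useful work.

    Assumed model: every thread runs `compute` cycles, then misses and
    waits `miss_latency` cycles. Only one thread runs at a time; when the
    running thread stalls, the core switches to any thread that is ready.
    """
    ready_at = [0] * n_threads          # cycle at which each thread may run
    remaining = [compute] * n_threads   # compute cycles left before next miss
    current = 0
    busy = 0
    for now in range(cycles):
        if ready_at[current] > now:
            # The current thread is waiting for memory: look for a ready one.
            ready = [t for t in range(n_threads) if ready_at[t] <= now]
            if not ready:
                continue                # every thread is stalled: idle cycle
            current = ready[0]
        busy += 1
        remaining[current] -= 1
        if remaining[current] == 0:
            # The thread misses and must wait for memory.
            remaining[current] = compute
            ready_at[current] = now + 1 + miss_latency
    return busy / cycles


for n in (1, 2, 4):
    print(n, "thread(s):", utilization(n))
```

Under these assumed numbers one thread keeps the pipeline busy a quarter of the time, and four threads are enough to cover the whole stall, because three other threads each supply 20 cycles of work during the 60-cycle wait.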

Page 9: 64-bit Scalable Chip Multiprocessor ( SCMP)

Performance gap between processor and memory

Page 10: 64-bit Scalable Chip Multiprocessor ( SCMP)

Multithreading

Page 11: 64-bit Scalable Chip Multiprocessor ( SCMP)

Multithreading Multiprocessor

Page 12: 64-bit Scalable Chip Multiprocessor ( SCMP)

On-chip interconnection

Crossbar:

• Increases memory bandwidth
  • More than one core can possibly access the L2 cache at the same time
• Gives the L2 cache higher associativity
• Easier switch design
• Optimized for low latency
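The idea that several cores can reach the L2 cache at once can be illustrated with a one-cycle arbitration sketch. The slides do not describe the arbitration policy, so the rotating priority used here, the function name `arbitrate` and the request encoding are all assumptions made for illustration.

```python
def arbitrate(requests, priority=0):
    """Grant at most one core per L2 bank for a single cycle.

    `requests[core]` is the bank that core wants, or None if it is idle.
    Cores that target different banks are all granted in the same cycle.
    Cores that collide on one bank are resolved by a rotating priority
    (an assumed policy): the search starts at core `priority`.
    Returns a dict mapping bank id -> granted core id.
    """
    n_cores = len(requests)
    grants = {}
    for offset in range(n_cores):
        core = (priority + offset) % n_cores
        bank = requests[core]
        if bank is not None and bank not in grants:
            grants[bank] = core
    return grants


# Four cores, four different banks: no conflict, all proceed together.
print(arbitrate([0, 1, 2, 3]))
# Cores 0 and 1 both want bank 2: only one of them is granted this cycle.
print(arbitrate([2, 2, None, 1]))
```

A shared bus would serialize all four requests in the first call; in this sketch only requests to the same bank have to wait.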

Page 13: 64-bit Scalable Chip Multiprocessor ( SCMP)

L2 Cache

• Multi-banked
  • Higher bandwidth
• Multiple memory interfaces to main memory

[Figure: four L1 caches, the crossbar, L2 banks 0 to 3, and four memory interfaces.]
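One common way to spread accesses over a multi-banked cache is to interleave consecutive cache lines across the banks. The slides do not say how SCMP maps addresses to banks, so the sketch below is only an assumption: the 64-byte line size and the name `l2_bank` are hypothetical, and only the bank count of four comes from the diagram.

```python
LINE_BYTES = 64   # assumed cache line size, not stated in the slides
N_BANKS = 4       # four L2 banks, as drawn in the slide's diagram


def l2_bank(addr):
    """Return the L2 bank for a physical address (line-interleaved)."""
    return (addr // LINE_BYTES) % N_BANKS


# Consecutive cache lines land in consecutive banks.
print([l2_bank(a) for a in range(0, 512, LINE_BYTES)])
```

With this mapping, a core streaming through memory touches each bank in turn, so sequential traffic from different cores tends to fall on different banks.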

Page 14: 64-bit Scalable Chip Multiprocessor ( SCMP)

Cache Coherency

Directory-Based Cache Coherency:

• Tracking the processors that have copies of the block
• Tracking the state of each data block in the L2 cache
  • Shared
  • Uncached
  • Exclusive

[Figure: directory banks 0 to 3 alongside L2 banks 0 to 3, connected to the crossbar.]
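The two things a directory tracks, the set of processors holding a copy and the state of the block, can be sketched as a small data structure. This is a simplified textbook-style directory using the three states named on the slide; the actual SCMP protocol, its messages and its transitions are not given in the slides, so the class `DirectoryEntry` and its methods are hypothetical.

```python
from dataclasses import dataclass, field

UNCACHED, SHARED, EXCLUSIVE = "Uncached", "Shared", "Exclusive"


@dataclass
class DirectoryEntry:
    """Directory state for one L2 block (simplified sketch)."""
    state: str = UNCACHED
    sharers: set = field(default_factory=set)   # cores holding a copy

    def read(self, core):
        """Handle a read miss from `core`.

        Returns the cores that must give up exclusive ownership.
        """
        if self.state == EXCLUSIVE and core in self.sharers:
            return set()                 # the owner already has the block
        downgrade = set(self.sharers) if self.state == EXCLUSIVE else set()
        self.sharers.add(core)
        self.state = SHARED
        return downgrade

    def write(self, core):
        """Handle a write miss from `core`.

        Returns the cores whose copies must be invalidated.
        """
        invalidate = self.sharers - {core}
        self.sharers = {core}
        self.state = EXCLUSIVE
        return invalidate


entry = DirectoryEntry()
entry.read(0)
entry.read(1)
print(entry.state, entry.sharers)        # two readers share the block
print(entry.write(2), entry.state)       # the writer invalidates both copies
```

Because the directory knows exactly which cores hold a copy, invalidations go only to those cores instead of being broadcast to every core.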

Page 15: 64-bit Scalable Chip Multiprocessor ( SCMP)

Thread Scheduler

• Dispatch threads
  • Hardware logic coupled with the OS
• Thread switch
  • On an L1 cache miss
• Load balance (hardware counters)
  • L1 cache hit / miss
  • Core pipeline idle
• Configure the crypto-coprocessor
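How hardware counters could drive load balancing can be sketched as a selection function. The slides name the counters (L1 hit / miss, pipeline idle) but not the policy, so the rule below, which prefers the core with the most idle cycles and breaks ties by fewest L1 misses, is purely an assumption, as is the name `pick_core`.

```python
def pick_core(counters):
    """Pick the core that should receive the next thread.

    `counters[core]` is a tuple (l1_misses, idle_cycles) standing in for
    the per-core hardware counters. Assumed policy: choose the core with
    the most idle pipeline cycles, breaking ties by fewest L1 misses.
    """
    return max(range(len(counters)),
               key=lambda core: (counters[core][1], -counters[core][0]))


# Cores 1 and 2 are equally idle; core 1 has fewer misses, so it is chosen.
print(pick_core([(10, 5), (3, 50), (7, 50), (1, 0)]))
```

In a real design this decision would be made in hardware logic together with the OS, as the slide notes; the function only shows the kind of comparison involved.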

Page 16: 64-bit Scalable Chip Multiprocessor ( SCMP)
Page 17: 64-bit Scalable Chip Multiprocessor ( SCMP)

Reconfigurable Crypto-coprocessor

Supports encryption and decryption with symmetric algorithms:

• AES
• DES, 3DES, GDES
• RCx

Page 18: 64-bit Scalable Chip Multiprocessor ( SCMP)

Reconfigurable Crypto-coprocessor

Reconfiguration is controlled by the thread scheduler

[Figure: four 4-thread full-custom cores, the thread scheduler and the crypto coprocessor.]

Page 19: 64-bit Scalable Chip Multiprocessor ( SCMP)

OS / Software

• Uses a commercial operating system, such as Linux
• Minimizes OS/compiler modification
  • Almost no change in OS/compiler
• Optimizing compiler to improve machine code efficiency

Page 20: 64-bit Scalable Chip Multiprocessor ( SCMP)

FB-DIMM (possibly)

Serial data path

Latency is managed with new channel features

Cost-effective

Server memory in the future

[Figure: four memory interfaces and the FB-DIMM interface.]

Page 21: 64-bit Scalable Chip Multiprocessor ( SCMP)

Thank you!