Niagara: A 32-Way Multithreaded Sparc Processor Poonacha Kongetira, Kathirgamar Aingaran, Kunle Olukotun Sun Microsystems Charalampos S. Nikolaou [email protected] Department of Informatics and Telecommunications 25 June 2008

Niagara: A 32-Way Multithreaded Sparc Processor · cgi.di.uoa.gr/~charnik/files/niagara.pdf · 2008-07-04


Niagara: A 32-Way Multithreaded Sparc Processor

Poonacha Kongetira, Kathirgamar Aingaran, Kunle Olukotun
Sun Microsystems

Charalampos S. Nikolaou
[email protected]

Department of Informatics and Telecommunications

25 June 2008


Goal
    Architectural Goal
    Goal's characteristics
    Sun's Approach

Niagara
    Niagara Overview
    Sparc Pipeline
    Thread scheduling
    Integer Register File
    Memory Subsystem

Performance
    Power Consumption


Architectural Goal

Provide:

• High performance for commercial server applications

• Low levels of power consumption


Goal’s characteristics

Commercial server applications tend to have:

• Low ILP
    - high cache miss rates (large working sets/poor locality)
    - many unpredictable branches
    - frequently undetectable load-load dependencies
    => memory access time limits performance

• High TLP
    - large numbers of parallel client requests

• High power consumption
    - 400 to 700 W/ft² for racked server clusters at Google


Sun’s Approach

• UltraSPARC T1 processor (Niagara 1)

• Avoids the high-latency communication between the processors of a symmetric multiprocessor (SMP)

• Multicore approach (cores aggregated on a single die)

• Fine-grain multithreading within each core

• Small L1 caches per core

• L2 cache shared by all cores


Niagara Overview

• 8 cores
    - 4 threads per core
    - 1 pipeline (Sparc pipeline) per core
    - 2 L1 caches (instruction/data) per core, shared by the 4 threads
    - thread scheduling per core

• 3-Mbyte L2 cache
    - 4-way banked and pipelined for high bandwidth
    - 12-way set-associative to minimize conflict misses
    - shared by all threads

• Crossbar interconnect with up to 200 GB/s bandwidth
    - connects Sparc pipes with L2 cache banks and other shared resources
    - provides a port for accessing the I/O subsystem
    - uses an age-based priority scheme

• 4 channels of DDR2 DRAM
    - maximum bandwidth up to 20 GB/s
    - capacity up to 128 GB
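The headline figures on this slide can be combined with a little arithmetic. The sketch below only uses numbers stated above; the even split of DRAM bandwidth across threads is a simplifying assumption made here for illustration, not something the slides claim.

```python
# Figures taken from the overview slide.
CORES = 8
THREADS_PER_CORE = 4
L2_BYTES = 3 * 1024 * 1024   # 3-Mbyte L2 cache
L2_BANKS = 4                 # 4-way banked
DRAM_PEAK_GBPS = 20          # "up to 20 GB/s"

# Total hardware threads: the "32-way" in the title.
hardware_threads = CORES * THREADS_PER_CORE

# Capacity of one L2 bank, assuming the banks are equally sized.
l2_bytes_per_bank = L2_BYTES // L2_BANKS

# Assumption: peak DRAM bandwidth divided evenly among all threads.
dram_gbps_per_thread = DRAM_PEAK_GBPS / hardware_threads

print(hardware_threads)
print(l2_bytes_per_bank // 1024, "KB per L2 bank")
print(dram_gbps_per_thread, "GB/s per thread")
```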


Niagara Processor


Sparc pipe

Single-issue pipeline with six stages: Fetch, Select, Decode, Execute, Memory, Write Back

Unique resources per thread:

• set of registers

• instruction buffer

• store buffer

Shared resources among threads:

• L1 caches

• translation look-aside buffers (ITLB, DTLB)

• ALU, divider, multiplier, shifter


Sparc pipe block diagram


Thread scheduling (1/2)

Policy based on:

• LRU status

• instruction type

• cache misses

• traps

• resource conflicts

• speculative loads

Figure: Thread selection: all threads available


Thread scheduling (2/2)

Figure: Thread selection: only two threads available (0, 1)
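The LRU part of the policy can be illustrated with a toy model: each cycle, the least recently selected thread among those able to issue is chosen. This is a minimal sketch of that one criterion only; the class `ThreadSelect`, the function `schedule`, and the way readiness is passed in are hypothetical, and the real select logic also weighs instruction type, misses, traps, and resource conflicts.

```python
class ThreadSelect:
    """Toy model of least-recently-used selection among ready threads."""

    def __init__(self, num_threads=4):
        # Front of the list is the least recently selected thread.
        self.lru_order = list(range(num_threads))

    def pick(self, ready):
        """Return the least recently selected thread in `ready`, or None."""
        tid = next((t for t in self.lru_order if t in ready), None)
        if tid is not None:
            # Move the chosen thread to the most-recently-selected position.
            self.lru_order.remove(tid)
            self.lru_order.append(tid)
        return tid


def schedule(ready_per_cycle, num_threads=4):
    """Run the selector over a sequence of per-cycle ready sets."""
    selector = ThreadSelect(num_threads)
    return [selector.pick(ready) for ready in ready_per_cycle]


# All four threads available: selection rotates through them.
print(schedule([{0, 1, 2, 3}] * 6))   # [0, 1, 2, 3, 0, 1]
# Only threads 0 and 1 available: the two alternate.
print(schedule([{0, 1}] * 4))         # [0, 1, 0, 1]
```

The two calls mirror the two figures: with every thread available the pipeline switches threads each cycle, and with two threads stalled the remaining two share the issue slots.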


Integer Register File

• One register file per thread

• A register file consists of 8 windows, each with 8 in, 8 local, and 8 out registers

• A window corresponds to a procedure call

• Adjacent windows overlap: the caller's out registers are the callee's in registers

• Only one window is active at a time

• Reads/writes take a single cycle

Figure: Integer register file per thread
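The window overlap can be made concrete by mapping each (window, group, register) triple to a slot in a flat array. This is a sketch under stated assumptions: the flat layout, the names `physical_index` and `callee_window`, and the direction in which a call moves the window pointer are all chosen here for illustration and do not describe Niagara's actual register file organization.

```python
NUM_WINDOWS = 8      # windows per thread, from the slide
REGS_PER_GROUP = 8   # 8 ins, 8 locals, 8 outs per window
UNIQUE_PER_WINDOW = 2 * REGS_PER_GROUP  # ins + locals; outs are shared


def callee_window(window):
    # Assumption: a procedure call advances the window pointer by one,
    # wrapping around after the last window.
    return (window + 1) % NUM_WINDOWS


def physical_index(window, group, i):
    """Map a windowed register to a slot in a hypothetical flat array."""
    if not 0 <= i < REGS_PER_GROUP:
        raise ValueError("register number out of range")
    if group == "in":
        base = window * UNIQUE_PER_WINDOW
    elif group == "local":
        base = window * UNIQUE_PER_WINDOW + REGS_PER_GROUP
    elif group == "out":
        # A window's outs are the same storage as the callee's ins.
        base = callee_window(window) * UNIQUE_PER_WINDOW
    else:
        raise ValueError("group must be 'in', 'local', or 'out'")
    return base + i


# The caller's out register 3 and the callee's in register 3 coincide.
print(physical_index(0, "out", 3) == physical_index(1, "in", 3))
```

Under this layout the 8 windows need 8 × 16 = 128 physical slots per thread rather than 8 × 24, because each group of outs is counted once as the next window's ins.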


Memory Subsystem

• 16 KB L1 instruction cache
    - 4-way set-associative with a line (block) size of 32 bytes
    - random replacement scheme for area savings

• 8 KB L1 data cache
    - 4-way set-associative with a line size of 16 bytes
    - write-through policy (allocate on load, no-allocate on stores)
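The L1 parameters above determine the number of sets and how an address splits into offset and index bits. The helper below is a small sketch using the standard relation sets = size / (ways × line size); the function name `cache_geometry` is hypothetical, and it assumes all sizes are powers of two.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, associativity, and line size are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, offset_bits, index_bits


# L1 instruction cache: 16 KB, 4-way, 32-byte lines.
print(cache_geometry(16 * 1024, 4, 32))   # (128, 5, 7)
# L1 data cache: 8 KB, 4-way, 16-byte lines.
print(cache_geometry(8 * 1024, 4, 16))    # (128, 4, 7)
```

Both L1 caches come out at 128 sets; the data cache is half the size only because its lines are half as long.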

L2 cache:

• maintains a sharers list at L1-line granularity

• stores do not update the L1 caches until they have updated the L2 cache

• copy-back policy (write back dirty lines, drop clean lines)

The L1 caches have miss rates in the range of 10%. The multiple threads per core hide the latencies of L1 and L2 misses.
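A rough way to see how threads hide miss latency is a simple textbook-style model: if each thread computes for some cycles and then stalls on a miss, the pipeline stays busy as long as other threads have work. This is a simplified sketch, not a model from the paper; the function `pipeline_utilization` and the cycle counts in the example are hypothetical.

```python
def pipeline_utilization(threads, run_cycles, stall_cycles):
    """Fraction of cycles the pipeline issues an instruction.

    Simplified model: every thread alternates between `run_cycles` of
    useful work and `stall_cycles` waiting on a miss, and switching
    threads costs nothing.
    """
    single_thread = run_cycles / (run_cycles + stall_cycles)
    return min(1.0, threads * single_thread)


# Hypothetical numbers: 10 cycles of work per 30-cycle stall.
for n in (1, 2, 4):
    print(n, pipeline_utilization(n, run_cycles=10, stall_cycles=30))
```

With these illustrative numbers one thread keeps the pipeline busy a quarter of the time, while four threads are enough to cover the stalls completely.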


Power Consumption

Niagara's power dissipation ranges from 60 to 72 W, with a peak of 75 W.

Figure: Power consumption of various processors


References

P. Kongetira, K. Aingaran, K. Olukotun, Niagara: A 32-Way Multithreaded SPARC Processor, IEEE Micro, March-April 2005, pp. 21-29.

Wikipedia, Comparison of power consumption of some nearly modern CPUs, http://en.wikipedia.org/wiki/CPU_power_dissipation#Intel_processors, 2006.


The End

Thank you!