UIUC - CS 433
IBM Power7
Adam Kunk, Anil John, Pete Bohman
Quick Facts
▪ Released by IBM in 2010 (~February)
▪ Successor of the POWER6
▪ Clock rate: 2.4 GHz to 4.25 GHz
▪ Feature size: 45 nm
▪ ISA: Power ISA v2.06
▪ Cores: 4, 6, or 8
▪ Caches: L1, L2, L3
References: [1]
Why the Power7?
PERCS (Productive, Easy-to-use, Reliable Computing System): a DARPA-funded contract that IBM won to develop the Power7 ($244 million contract, 2006)
▪ The contract called for developing a petascale supercomputer architecture before 2011 as part of the HPCS (High Productivity Computing Systems) program.
IBM, Cray, and Sun Microsystems received HPCS grants for Phase II.
IBM was chosen for Phase III in 2006.
References: [1], [2]
Blue Waters
Side note: The Blue Waters system was meant to be the first supercomputer built on PERCS technology.
However, the contract was cancelled because of cost and complexity.
History of Power
POWER4/4+ (2001): First dual core in industry
▪ Dual core, chip multiprocessing, distributed switch, shared L2, dynamic LPARs (32), 180 nm
POWER5/5+ (2004): Hardware virtualization for Unix & Linux
▪ Dual-core & quad-core modules, enhanced scaling, 2-thread SMT, distributed switch +, core parallelism +, FP performance +, memory bandwidth +, 130 nm, 90 nm
POWER6/6+ (2007): Fastest processor in industry
▪ Dual core, high frequencies, virtualization +, memory subsystem +, AltiVec, instruction retry, dynamic energy management, 2-thread SMT +, protection keys, 65 nm
POWER7/7+ (2010): Most POWERful & scalable processor in industry
▪ 4, 6, or 8 cores, 32 MB on-chip eDRAM, power-optimized cores, memory subsystem ++, 4-thread SMT ++, reliability +, VSM & VSX, protection keys +, 45 nm, 32 nm
POWER8: Future
References: [3]
Power7 Layout
Cores:
▪ 8 intelligent cores per chip (socket); 4- and 6-core versions available on some models
▪ 12 execution units per core
▪ Out-of-order execution
▪ 4-way SMT per core, 32 threads per chip
▪ L1: 32 KB instruction cache / 32 KB data cache per core
▪ L2: 256 KB per core
Chip:
▪ 32 MB intelligent L3 cache on chip
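The per-core figures above can be totaled for a full 8-core chip. The short Python sketch below does that arithmetic; it assumes all eight cores are active and treats the 32 MB L3 as evenly divided among the cores, which is a simplification for illustration rather than a description of how the L3 is actually managed.

```python
# Cache sizes from the slide, in KB, for the 8-core Power7.
CORES = 8
L1_I_KB = 32             # L1 instruction cache per core
L1_D_KB = 32             # L1 data cache per core
L2_KB = 256              # L2 per core
L3_CHIP_KB = 32 * 1024   # 32 MB shared eDRAM L3 per chip

totals = {
    "L1": CORES * (L1_I_KB + L1_D_KB),
    "L2": CORES * L2_KB,
    "L3": L3_CHIP_KB,
    # Even split of the shared L3 across cores (illustrative assumption).
    "L3_per_core": L3_CHIP_KB // CORES,
}

for level, kb in totals.items():
    print(f"{level}: {kb} KB")
```

For the 8-core part this works out to 512 KB of L1, 2 MB of L2, and a 4 MB share of L3 per core.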
[Figure: Power7 chip layout: eight cores, each with its own L2, surrounding the shared eDRAM L3 cache, with the memory interface, GX bus, and SMP fabric (POWER bus) on the chip edges.]
References: [3]
Power7 Options (8, 6, 4 cores)
References: [3]
Power7 Core
Each core implements “aggressive” out-of-order (OoO) instruction execution
The processor has an Instruction Sequence Unit capable of dispatching up to six instructions per cycle to a set of queues
Up to eight instructions per cycle can be issued to the Instruction Execution units
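Why would the issue width (8) exceed the dispatch width (6)? One way to see it is a toy model: the queues let instructions pile up while issue is stalled, and the wider issue stage then drains the backlog. The sketch below is deliberately simplified and is not the Power7 microarchitecture: it assumes one unbounded queue, lets an instruction issue in the same cycle it is dispatched, and treats every queued instruction as ready except in explicitly stalled cycles (stand-ins for, say, a cache miss). The function name `simulate` is hypothetical.

```python
DISPATCH_WIDTH = 6  # instructions dispatched to the issue queues per cycle
ISSUE_WIDTH = 8     # instructions issued to execution units per cycle


def simulate(n_instructions, stall_cycles, issue_width=ISSUE_WIDTH):
    """Return the cycles needed to issue n_instructions in a toy model.

    One unbounded queue; nothing issues during cycles in stall_cycles.
    Instructions may issue in the same cycle they are dispatched.
    """
    to_dispatch = n_instructions
    queued = 0
    issued = 0
    cycle = 0
    while issued < n_instructions:
        # Dispatch stage: move up to DISPATCH_WIDTH instructions into the queue.
        d = min(DISPATCH_WIDTH, to_dispatch)
        to_dispatch -= d
        queued += d
        # Issue stage: drain up to issue_width instructions unless stalled.
        i = 0 if cycle in stall_cycles else min(issue_width, queued)
        queued -= i
        issued += i
        cycle += 1
    return cycle


print(simulate(24, set()))               # no stalls
print(simulate(24, {0}))                 # one stalled cycle, 8-wide issue
print(simulate(24, {0}, issue_width=6))  # same stall, issue only as wide as dispatch
```

In this model, 24 instructions take 4 cycles with no stall and still take 4 cycles after a one-cycle stall, because the 8-wide issue stage catches up; capping issue at 6 stretches the stalled case to 5 cycles.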
References: [4]
Execution Units
Each Power7 core has a set of 12 execution units:
▪ 2 fixed-point units
▪ 2 load/store units
▪ 4 double-precision floating-point units
▪ 1 vector unit
▪ 1 branch unit
▪ 1 condition register unit
▪ 1 decimal floating-point unit
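As a quick consistency check, the unit counts in the list above sum to the 12 execution units per core quoted on the layout slide. The Python sketch below tallies them and scales the total to an 8-core chip; the dictionary keys are just descriptive labels, not IBM unit names.

```python
# Execution units per Power7 core, as listed on the slide.
EXECUTION_UNITS = {
    "fixed point": 2,
    "load/store": 2,
    "double-precision floating point": 4,
    "vector": 1,
    "branch": 1,
    "condition register": 1,
    "decimal floating point": 1,
}
CORES = 8  # full 8-core chip

per_core = sum(EXECUTION_UNITS.values())
per_chip = CORES * per_core

print(f"{per_core} execution units per core, {per_chip} per 8-core chip")
```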
References: [4]
Pipeline
Exceptions
ILP
SMT
Simultaneous Multithreading
▪ SMT1: single instruction execution thread per core
▪ SMT2: two instruction execution threads per core
▪ SMT4: four instruction execution threads per core
In SMT4 mode, an 8-core Power7 can therefore execute 32 threads simultaneously (8 cores × 4 threads).
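The same multiplication applies to every chip option and SMT mode. The Python sketch below tabulates hardware threads per chip for the 4-, 6-, and 8-core parts; it assumes all cores on a chip run in the same SMT mode, and the helper `hardware_threads` is a hypothetical name used only for illustration.

```python
CORE_OPTIONS = (4, 6, 8)                       # Power7 chip configurations
SMT_MODES = {"SMT1": 1, "SMT2": 2, "SMT4": 4}  # hardware threads per core


def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Hardware threads on a chip where every core runs the same SMT mode."""
    return cores * threads_per_core


for cores in CORE_OPTIONS:
    row = ", ".join(
        f"{mode}={hardware_threads(cores, t)}" for mode, t in SMT_MODES.items()
    )
    print(f"{cores} cores: {row}")
```

The 8-core row ends with SMT4=32, matching the figure above; a 6-core chip in SMT4 exposes 24 threads.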
Multithreading History
[Figure: execution-unit occupancy over time (FX0, FX1, FP0, FP1, LS0, LS1, BRX, CRL) for four designs: single-thread out-of-order, S80 hardware multithreading, POWER5 2-way SMT, and POWER7 4-way SMT. Legend: threads 0 to 3 executing; no thread executing.]
References: [3]
TLP
Memory
Memory Access
(See section 2.1.4 of reference [4].)
Caches
(See section 2.1.6 of reference [4].)
Performance
Closing/Wrap-up
References
1. http://en.wikipedia.org/wiki/POWER7
2. http://en.wikipedia.org/wiki/PERCS
3. Central PA PUG POWER7 review.ppt, http://www.ibm.com/developerworks/wikis/download/attachments/135430247/Central+PA+PUG+POWER7+review.ppt
4. http://www.redbooks.ibm.com/redpapers/pdfs/redp4639.pdf