sammy17
Microprocessors
Introduction to PowerPC Architecture
History & Interesting Tidbits
Outline
Motorola, with a long tradition as a leading provider of embedded technologies, has produced revolutionary microprocessor and microcontroller solutions
And Motorola continues to build on that tradition of leadership and innovation with the ever-expanding family of microprocessors that implement the PowerPC instruction set architecture
In these slides, we'll take a look at how the PowerPC got to where it is today.
Background of POWER1
Part of IBM's first attempt at making a real workstation
POWER – Performance Optimization With Enhanced RISC
IBM redefined RISC to mean Reduced Instruction Set Cycles
Unlike classic RISC designs, the POWER1 would be a complex processor. This meant more high-level instructions and more memory-data processors.
This goes against the initial RISC philosophy!
POWER1 Branch unit
Had three main units: branch, integer, and floating point
The branch unit was unusually complex: it contained a program counter, a condition code (CC) register, and a loop register
CC register: 8 fields
The first 2 were reserved for fixed- and floating-point ops, the 7th for vector operations, and the rest could be set separately
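To make the field idea concrete, here is a minimal Python sketch of an eight-field condition register. It assumes the layout PowerPC later standardized (a 32-bit register split into fields CR0 through CR7, four bits each, with CR0 in the most significant nibble and each field holding LT, GT, EQ, and SO bits); the POWER1's exact bit assignments may differ, and the helper names `set_field`, `get_field`, and `compare` are hypothetical.

```python
# Bits inside one 4-bit CR field, most significant first (PowerPC-style, assumed).
LT, GT, EQ, SO = 0b1000, 0b0100, 0b0010, 0b0001
MASK32 = 0xFFFFFFFF


def _shift(field: int) -> int:
    """Bit offset of a field; CR0 occupies the top nibble (bits 31..28)."""
    if not 0 <= field <= 7:
        raise ValueError("field must be in 0..7")
    return 4 * (7 - field)


def set_field(cr: int, field: int, bits: int) -> int:
    """Return a copy of `cr` with one 4-bit field replaced, other fields untouched."""
    shift = _shift(field)
    cleared = cr & ~(0xF << shift) & MASK32
    return cleared | ((bits & 0xF) << shift)


def get_field(cr: int, field: int) -> int:
    """Read one 4-bit field out of the condition register."""
    return (cr >> _shift(field)) & 0xF


def compare(a: int, b: int) -> int:
    """Encode a signed comparison as LT/GT/EQ bits (SO left clear)."""
    if a < b:
        return LT
    if a > b:
        return GT
    return EQ


# Two compares land in different fields and coexist.
cr = set_field(0, 0, compare(3, 5))    # CR0 <- LT
cr = set_field(cr, 3, compare(7, 7))   # CR3 <- EQ
print(hex(cr))  # 0x80020000
```

Because each compare writes only its own field, several results can sit in the register at once and be tested later by separate branches, which is what lets the remaining fields be set separately.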
POWER1 Branch unit cont.
The loop register is a counter for 'decrement and branch on zero' loops with no branch penalty
The branch unit could dispatch multiple instructions while itself executing a program control op (up to four ops at once, and out of order)
This made the POWER1 one of the first superscalar CPUs!
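A minimal Python simulation of how a counted loop uses the loop register. It loosely models the later PowerPC idiom of loading the count register with `mtctr` and ending the loop with `bdnz` (decrement the counter, branch back unless it has reached zero); the function name `ctr_loop_sum` is hypothetical, and the sketch models only the control flow, not the timing that makes the branch free.

```python
MASK32 = 0xFFFFFFFF


def ctr_loop_sum(values: list[int]) -> int:
    """Sum `values` the way a counted loop driven by the loop register runs.

    Roughly:  mtctr n ; loop: <body> ; bdnz loop
    """
    ctr = len(values) & MASK32      # load the loop (count) register
    if ctr == 0:
        # On real hardware a zero count would wrap around to a huge trip
        # count; this sketch simply skips the loop instead.
        return 0
    total = 0
    i = 0
    while True:
        total += values[i]          # loop body: integer-unit work
        i += 1
        ctr = (ctr - 1) & MASK32    # branch unit: decrement the counter...
        if ctr == 0:                # ...fall through once it reaches zero,
            break                   # otherwise branch back to the top
    return total


print(ctr_loop_sum([1, 2, 3, 4]))  # 10
```

Because the counter lives in the branch unit, the decision to loop back does not have to wait on the integer unit; that is the intuition behind the "no branch penalty" claim.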
Integer/Float units
Thirty-two 32-bit registers for the integer unit and all load/store operations
Register R0 treated as a constant zero for some instructions
Used an MQ register for extended-precision multiply/divide, similar to the MIPS HI/LO registers
Thirty-two 64-bit registers for the floating-point unit
Performed only double-precision operations
Used a condition bit to catch float errors (no exceptions!)
MQ register
The MQ register is 36 bits
During a multiply instruction, MQ contains the multiplier
During a divide instruction, MQ receives the quotient
It can be shifted right or left, independently or combined with the accumulator (AC) into a 72-bit register
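The register roles described on this slide can be sketched in Python as follows. The word width is a parameter (36 bits matches the slide, giving a 72-bit AC:MQ pair), only unsigned values are handled, and `mq_multiply` and `mq_divide` are hypothetical names, not real instruction mnemonics.

```python
def mq_multiply(multiplicand: int, mq: int, width: int = 36) -> tuple[int, int]:
    """MQ holds the multiplier; the double-length product is split so the
    high word lands in AC and the low word replaces MQ. Returns (ac, mq)."""
    mask = (1 << width) - 1
    product = (multiplicand & mask) * (mq & mask)   # up to 2*width bits
    return product >> width, product & mask


def mq_divide(ac: int, mq: int, divisor: int, width: int = 36) -> tuple[int, int]:
    """Divide the combined AC:MQ double word by `divisor`.
    The quotient goes to MQ and the remainder to AC. Returns (ac, mq)."""
    mask = (1 << width) - 1
    d = divisor & mask
    if d == 0:
        raise ZeroDivisionError("divide by zero")
    dividend = ((ac & mask) << width) | (mq & mask)  # 72-bit AC:MQ value
    quotient, remainder = divmod(dividend, d)
    if quotient > mask:
        raise OverflowError("quotient does not fit in MQ")
    return remainder, quotient


big = (1 << 36) - 1
ac, mq = mq_multiply(big, big)   # largest possible 36x36-bit product
print(hex(ac), hex(mq))          # 0xffffffffe 0x1
```

Keeping the product (or dividend) as a register pair is what lets a single-word machine do full extended-precision arithmetic without a wider datapath.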
PowerPC
Born out of a desire to produce a version of the POWER that could succeed both the Motorola 68000 and the Intel x86
Most notable changes:
Elimination of the MQ register, replaced by separate upper- and lower-half instructions (able to execute simultaneously)
Some complex instructions were removed (emulated on the new PowerPC)
Support for 32-bit floating point
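On PowerPC, the single MQ-based multiply gives way to two independent instructions: `mullw` returns the low 32 bits of a product and `mulhw` returns the high 32 bits of the signed 64-bit product, each writing an ordinary general-purpose register. The Python sketch below models only their results, not pipeline timing; `to_signed32` is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mullw(ra: int, rb: int) -> int:
    """Low 32 bits of the product (the same bits for signed or unsigned inputs)."""
    return (to_signed32(ra) * to_signed32(rb)) & MASK32


def mulhw(ra: int, rb: int) -> int:
    """High 32 bits of the signed 64-bit product, as a raw 32-bit pattern."""
    # Python's >> on negative ints is arithmetic, matching two's complement.
    return ((to_signed32(ra) * to_signed32(rb)) >> 32) & MASK32


hi, lo = mulhw(-3, 5), mullw(-3, 5)
print(hex(hi), hex(lo))  # 0xffffffff 0xfffffff1
```

Because neither instruction depends on the other's output or on a shared MQ register, a superscalar core is free to issue them independently.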
PowerPC 601 (G1)
Meant to bridge POWER1 and PowerPC features
Geared toward consumers using workstations rather than high-end systems
Essentially the same as the POWER1 except for a unified 32K cache (rather than separate I/D caches)
Held onto many of the 'legacy' instructions from the POWER1
The POWER2 is RISCy
The big selling point of the POWER2 was its ability to handle six instructions at one time
However, it came with the caveat "under ideal conditions"
They couldn't be just any old instructions: to maintain that performance, the POWER2 had to mix exactly two integer instructions, two floating-point instructions, and two branch or condition-code instructions
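The "ideal conditions" caveat can be illustrated with a toy dispatch model. The two-slots-per-class limits come from the slide; the rule that dispatch is strictly in order and stops at the first instruction whose class is already full is an assumption for illustration, not a description of IBM's actual dispatch logic. The names `SLOTS`, `dispatch_cycle`, and `cycles_needed` are hypothetical.

```python
from collections import Counter

# Per-cycle slots per instruction class, from the slide's six-wide ideal case.
SLOTS = {"int": 2, "fp": 2, "branch": 2}


def dispatch_cycle(queue: list[str]) -> int:
    """How many instructions at the head of `queue` dispatch this cycle.

    Assumes strictly in-order dispatch: the group ends at the first
    instruction whose class has no free slot left.
    """
    used = Counter()
    issued = 0
    for kind in queue:
        if used[kind] >= SLOTS[kind]:
            break
        used[kind] += 1
        issued += 1
    return issued


def cycles_needed(queue: list[str]) -> int:
    """Cycles to drain the whole queue under the same model."""
    cycles = 0
    while queue:
        queue = queue[dispatch_cycle(queue):]  # at least one always dispatches
        cycles += 1
    return cycles


ideal = ["int", "fp", "branch", "int", "fp", "branch"]
print(dispatch_cycle(ideal), cycles_needed(["int"] * 6))  # 6 3
```

Under these assumptions, a perfectly mixed group dispatches all six instructions in one cycle, while a run of six integer instructions needs three cycles.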
POWER2 cont.
Other additions to the POWER2 were:
Quad-word load and store instructions
A hardware square root instruction
New instructions for converting floating-point values to integers
Like the POWER1, this was targeted at high-end systems, leaving average users to use the PowerPC
PowerPC 603 (G2)
Separated the load/store ops from the integer unit
Split the branch unit into a fetch/branch unit, a dispatch unit, and a completion/exception unit
Added a 'rename' buffer in the dispatch unit for speculative execution using renamed integer & float registers
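Register renaming can be sketched as a mapping from architectural register names to a pool of physical rename buffers. This toy Python model illustrates the general technique under stated assumptions and is not the 603's actual design: the class name `RenameTable`, its methods, and the buffer counts are hypothetical.

```python
class RenameTable:
    """Toy renamer: architectural registers -> physical rename buffers."""

    def __init__(self, num_arch: int, num_phys: int) -> None:
        if num_phys < num_arch:
            raise ValueError("need at least one physical register per architectural one")
        self.map = list(range(num_arch))              # arch reg i -> phys reg i at reset
        self.free = list(range(num_arch, num_phys))   # spare rename buffers

    def rename(self, dest: int, srcs: list[int]) -> tuple[int, list[int], int]:
        """Rename one instruction; returns (new_phys, phys_srcs, old_phys)."""
        phys_srcs = [self.map[s] for s in srcs]       # read sources before remapping dest
        if not self.free:
            raise RuntimeError("no free rename buffer: dispatch would stall")
        new = self.free.pop(0)
        old = self.map[dest]
        self.map[dest] = new                          # later readers see the new buffer
        return new, phys_srcs, old

    def commit(self, old: int) -> None:
        """Instruction retired: the previous physical copy can be reused."""
        self.free.append(old)

    def squash(self, dest: int, new: int, old: int) -> None:
        """Undo a mis-speculated rename; call in reverse program order."""
        self.map[dest] = old
        self.free.insert(0, new)


rt = RenameTable(num_arch=4, num_phys=6)
new1, srcs1, old1 = rt.rename(1, [2, 3])   # r1 = r2 + r3
new2, srcs2, old2 = rt.rename(2, [1, 1])   # r2 = r1 + r1 (reads r1's new buffer)
print(srcs2, rt.map)  # [4, 4] [0, 4, 5, 3]
```

Each speculative write gets a fresh buffer, so the old value survives until the instruction commits; if the speculation was wrong, `squash` simply restores the previous mapping.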
The little processor that couldn't
Strategy for reducing the size of the 603:
Use a split cache design (instead of a more complex unified cache)
Remove "unused" or "legacy" instructions
This reduced cost and power, so 603s could be made much cheaper and run at higher speeds.
It had a slight performance penalty (per MHz), but the chips could be made at higher speeds, which would more than make up for it.
A good idea, but marketing can be unpredictable
603's Marketing Blunder
The 603 was compared to the 601 and other high-end machines
On MHz per dollar, the 603 beat out the 601
But comparing MHz to MHz, the 601 was generally faster
So buyers got the impression that they were getting ripped off
A case of mistaken expectations!
603 – The Energizer processor
Despite initial marketing problems, this processor became prolific and had far more variants than any other PowerPC:
603e (603+ / Stretch): used to solve cache size problems
603ev, 603p (Valiant), 603r, 603er (Goldeneye): manufacturing optimization
PowerPC 604
The G2 processors were split into two different families (the 603s and the 604s).
The 604s were meant to be the 'bad boys' of the desktop: power and cost were not as important as pure blinding speed.
Unlike the 601 and 603, the 604 can execute as many as 4 instructions simultaneously
PowerPC 604 float support
The 604 also had tweaks to improve its ability to run inside its larger L2 cache
Floating-point units can become very dependent on cache and memory performance
The results:
20% faster than the 603 at integer
Roughly 70% faster in floating point
Just over twice as fast as the Pentiums of the same time
Dynamic Branch Prediction
Processors take big performance penalties if they can't preload the cache
Being able to accurately "guess" the most likely path can help keep the cache "preloaded" and increase processor performance
The 604 was the first mainstream processor to use "Dynamic Branch Prediction"
This greatly increased performance
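One common form of dynamic branch prediction is a table of 2-bit saturating counters indexed by the branch's address. The Python sketch below implements that generic scheme; the 512-entry default and the indexing by address bits are assumptions for illustration, not a claim about the 604's exact tables, and the names are hypothetical.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address bits.

    Counter values 0 and 1 predict not-taken; 2 and 3 predict taken.
    """

    def __init__(self, entries: int = 512) -> None:
        self.entries = entries
        self.table = [1] * entries          # start every entry weakly not-taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.entries     # instructions are 4-byte aligned

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)   # saturate at strongly taken
        else:
            self.table[i] = max(0, self.table[i] - 1)   # saturate at strongly not-taken


def accuracy(pred: TwoBitPredictor, pc: int, outcomes: list[bool]) -> float:
    """Fraction of branches at `pc` predicted correctly, training as we go."""
    hits = 0
    for taken in outcomes:
        if pred.predict(pc) == taken:
            hits += 1
        pred.update(pc, taken)
    return hits / len(outcomes)


loop_branch = ([True] * 9 + [False]) * 10   # taken 9 times, then falls through
print(accuracy(TwoBitPredictor(), 0x1000, loop_branch))
```

For this loop branch the predictor is right on 89 of 100 branches: after warming up it misses only once per loop exit, because a single not-taken outcome moves a strongly-taken counter only to weakly taken.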
G3s – The Next Generation
Initially, the plan had been to create a new chip 'solely' based on the 604
But after the highly successful second generation of PowerPCs, IBM and Motorola decided to split up development and create more processors
740 (Arthur)
The first was the 603 derivative
This processor got some changes to the core (the way it executes instructions)
These optimized the processor for the Macintosh OS
This of course resulted in a large performance boost, even more so than the boost offered by the new backside cache
The 740 was fast, extremely small, and efficient
It was outperforming Pentium IIs while using less than 1/5th of the power and size
750 (Typhoon)
A variant of the 740 with a fast method of access to the L2 'backside' cache
Allows higher performance
The L2 cache runs much faster than most, at speeds up to the clock rate of the main processor
The cache system really speeds things up, but requires more electronics (and pins) than the 740
So while the chip cost isn't much more, the added cache can drive up the cost of the system (and increase the total power usage).
Still has very good performance per cost
Hardware Aside
Aluminum has long been the standard material used for semiconductor wiring
IBM managed to use copper technology in their G3s
The result?
Enhanced chip performance
Reduced die size and power consumption
The 750 was first created with a standard aluminum design operating at up to 300 MHz
Applying IBM's copper manufacturing process to the same chip, the 750 reached speeds of at least 400 MHz: a 33 percent performance improvement for the same chip!
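The 33 percent figure is the ratio of the two clock rates:

```latex
\frac{f_{\text{copper}} - f_{\text{aluminum}}}{f_{\text{aluminum}}}
  = \frac{400\,\text{MHz} - 300\,\text{MHz}}{300\,\text{MHz}}
  = \frac{1}{3} \approx 33\%
```

It translates directly into a performance gain only to the extent that performance scales with clock speed.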
Make room for the 4th Generation
The 603-derived G3 performed very well with its backside cache, was very cheap to make, and was quite scalable just by adding more L2 cache (or faster L2 cache)
Apple killed the clones and focused its product lines, which reduced demand for as many different high-end desktop PPCs
The end result was that the 604-derived G3s (code-named Habanero) and some of the other flavors (like ones with better MP support) were scrapped in favor of focusing on the G4s. This makes sense: these other processors wouldn't have come out until basically the same time as the G4s anyway, and splitting into that many different development efforts would have been a waste of money
G4
In direct response to Intel's MMX instructions, AltiVec extensions were added to the G4 PowerPC
AltiVec adds a new set of 128-bit registers
A separate vector execution unit & instruction set, supported by the branch unit
Allows multimedia instructions to be executed in parallel with both integer and float ops
Added a VRSAVE register to track which vector registers are being used
Reduces the number of registers that need to be saved
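A rough Python model of two G4 ideas: a 128-bit AltiVec register treated as four 32-bit lanes added in one operation (in the spirit of AltiVec's `vadduwm`, vector add unsigned word modulo), and a VRSAVE-style bitmask marking which of the 32 vector registers are live. The lane ordering and the bit order (v0 in the most significant bit) are assumptions, and all helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def pack_words(words: list[int]) -> int:
    """Pack four 32-bit lanes into one 128-bit value (lane 0 most significant)."""
    if len(words) != 4:
        raise ValueError("a 128-bit register holds exactly four 32-bit words")
    v = 0
    for w in words:
        v = (v << 32) | (w & MASK32)
    return v


def unpack_words(v: int) -> list[int]:
    """Split a 128-bit value back into its four 32-bit lanes."""
    return [(v >> (32 * (3 - i))) & MASK32 for i in range(4)]


def vadduwm(va: int, vb: int) -> int:
    """Lane-wise modulo add: carries never cross from one lane into the next."""
    lanes = [(a + b) & MASK32 for a, b in zip(unpack_words(va), unpack_words(vb))]
    return pack_words(lanes)


def vrsave_mask(used_regs: set[int]) -> int:
    """VRSAVE-style mask: v0 maps to the MSB, v31 to the LSB (assumed order)."""
    mask = 0
    for r in used_regs:
        if not 0 <= r <= 31:
            raise ValueError("vector registers are v0..v31")
        mask |= 1 << (31 - r)
    return mask


def regs_to_save(vrsave: int) -> list[int]:
    """Vector registers a context switch would need to save."""
    return [r for r in range(32) if vrsave & (1 << (31 - r))]


v = vadduwm(pack_words([1, 2, 3, 0xFFFFFFFF]), pack_words([10, 20, 30, 1]))
print(unpack_words(v), regs_to_save(vrsave_mask({3, 5})))  # [11, 22, 33, 0] [3, 5]
```

With such a mask, context-switch code only needs to save the registers whose bits are set instead of all thirty-two 128-bit registers.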
G4 cont.
Supports a 2 MB L2 cache, which can help performance over the previous 1 MB L2 limit.
The MPX bus (used on the G4) is asynchronous and allows up to 4 outstanding accesses at the same time
The result is up to a threefold performance increase for memory-bound operations.
This is why specs can be so deceptive: without changing the speed of the bus at all, Apple/Motorola made it up to 3 times faster!
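An idealized Python model shows why allowing several outstanding accesses speeds up memory-bound work without touching the bus clock. It assumes every access takes the same fixed, hypothetical latency and that up to `max_outstanding` accesses overlap perfectly; real systems fall short of this ceiling, which is consistent with the slide's "up to 3 fold" rather than the ideal 4x. The function name `bus_time` is hypothetical.

```python
def bus_time(num_accesses: int, latency_cycles: int, max_outstanding: int) -> int:
    """Cycles to finish `num_accesses` memory accesses in an idealized bus model.

    Accesses are issued in waves of up to `max_outstanding`; each wave takes
    `latency_cycles` from issue to completion. The bus clock never changes.
    """
    if max_outstanding < 1:
        raise ValueError("at least one access must be allowed in flight")
    if num_accesses <= 0:
        return 0
    waves = (num_accesses + max_outstanding - 1) // max_outstanding  # ceiling division
    return waves * latency_cycles


serial = bus_time(100, 20, 1)       # one access in flight at a time
overlapped = bus_time(100, 20, 4)   # four outstanding accesses
print(serial, overlapped, serial / overlapped)  # 2000 500 4.0
```

With the hypothetical 20-cycle latency, 100 accesses take 2000 cycles one at a time but 500 cycles with four in flight.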
Conclusion
The PowerPC architecture will clearly play a part in embedded technology for years to come (due to its low cost and low energy use)
As far as personal computers and workstations go, PowerPCs generally outperform their Pentium counterparts
However, much of what's holding the PowerPC back is consumer obsession with MHz
MHz vs. Mega Bucks
“Only weeks ago, Motorola announced at a semiconductor conference that it would soon start shipping G4 processors operating close to the 1GHz mark. During his conference call, Jobs indicated that Apple would be working closely with Motorola to bridge the MHz gap, and introduce faster chips into the G4 systems. And in a rare preview of the future, Jobs indicated that new, faster G4 systems would begin shipping within the next 6 months.”
- G4 Store Special Report
Works Cited
http://www.g4store.com/news/
http://www.mackido.com/Hardware/
http://developer.apple.com/technotes/
http://www.byte.com/art/9401/sec7/art2.htm
http://www3.sk.sympatico.ca/jbayko/cpu5.html
http://www.mot.com/SPS/PowerPC/