Computer Organization and Architecture
Pentium Processor
Pentium 4 Diagram (Simplified)
Pentium 4 Core Processor
• Fetch/Decode Unit
  • Fetches instructions from L2 cache
  • Decodes into micro-ops
  • Stores micro-ops in L1 cache
• Out-of-order execution logic
  • Schedules micro-ops
  • Based on data dependence and resources
  • May speculatively execute
• Execution units
  • Execute micro-ops
  • Data from L1 cache
  • Results in registers
• Memory subsystem
  • L2 cache and system bus
• System bus speed: 400 MHz
• The datapath between the L2 cache and the L1 data cache is 256 bits wide
• The datapath between the L2 cache and the prefetch unit remains 64 bits wide
• 128 internal registers
• Pentium 4 has five execution units working in parallel and two units for loading and storing data in RAM
• BTB was increased to 4,096 entries
• Each CPU uses its own RISC-like microinstructions, which are not publicly documented and are incompatible with the microinstructions of other CPUs. For example, Pentium III microinstructions differ from Pentium 4 microinstructions.
• Intel does not disclose the depth (size) of this queue.
Pentium 4 Design Reasoning
• Decodes instructions into RISC-like micro-ops before the L1 cache
• Micro-ops are fixed length
  • Superscalar pipelining and scheduling
• Pentium instructions are long and complex
• Performance is improved by separating decoding from scheduling and pipelining
• Data cache is write back
  • Can be configured to write through
• L1 cache is controlled by 2 bits in a control register
  • CD = cache disable
  • NW = not write through
  • 2 instructions: one to invalidate (flush) the cache (INVD), one to write back then invalidate (WBINVD)
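The effect of the two control bits can be read as a small truth table. The sketch below assumes the bits sit in control register CR0 at positions 30 (CD) and 29 (NW), as on IA-32 processors. The function name and the returned descriptions are illustrative, not taken from the slides.

```python
CD_BIT = 30  # cache disable (assumed position in CR0)
NW_BIT = 29  # not write-through (assumed position in CR0)


def l1_cache_mode(cr0: int) -> str:
    """Describe the cache operating mode selected by the CD and NW bits."""
    cd = (cr0 >> CD_BIT) & 1
    nw = (cr0 >> NW_BIT) & 1
    if cd == 0 and nw == 0:
        return "fills enabled; write-throughs and invalidates enabled"
    if cd == 0 and nw == 1:
        return "invalid combination"
    if cd == 1 and nw == 0:
        return "fills disabled; write-throughs and invalidates enabled"
    return "fills disabled; write-throughs and invalidates disabled"


if __name__ == "__main__":
    for cr0 in (0, 1 << NW_BIT, 1 << CD_BIT, (1 << CD_BIT) | (1 << NW_BIT)):
        print(f"CR0={cr0:#010x}: {l1_cache_mode(cr0)}")
```

Only the two bits of interest are inspected; every other bit of the register value is ignored.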
Pentium Data Types
• 8 bit byte
• 16 bit word
• 32 bit double word
• 64 bit quad word
• Addressing is by 8 bit unit
• A 32 bit double word is read at addresses divisible by 4
Specific Data Types
• General - arbitrary binary contents
• Integer - signed binary value
• Ordinal - unsigned integer
• Unpacked BCD - one digit per byte
• Packed BCD - 2 BCD digits per byte
• Floating point
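The difference between the two BCD forms is easiest to see on a concrete number. The following is a minimal sketch, assuming non-negative integers and most-significant-digit-first byte order. Both choices, and the function names, are assumptions made for illustration.

```python
def to_unpacked_bcd(n: int) -> bytes:
    """One decimal digit per byte."""
    if n < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    return bytes(int(d) for d in str(n))


def to_packed_bcd(n: int) -> bytes:
    """Two decimal digits per byte: high nibble first, then low nibble."""
    if n < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    digits = str(n)
    if len(digits) % 2:  # pad to an even number of digits
        digits = "0" + digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


if __name__ == "__main__":
    print(to_unpacked_bcd(1234).hex())  # 01020304
    print(to_packed_bcd(1234).hex())    # 1234
```

The packed form needs half the storage of the unpacked form for the same number of digits.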
Pentium Floating Point Data Types
Pentium Operation Types
• Arithmetic
• Logical
• Data movement
• Control transfer
• String operations
• MMX
• Segment register
• Protection
• Cache management
Pentium Addressing Modes
• Immediate
• Register operand
• Displacement
• Base
• Base with displacement
• Scaled index with displacement
• Base with index and displacement
• Base scaled index with displacement
• Relative
Pentium Addressing Mode Calculation
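Most of the memory addressing modes listed above are special cases of one formula: effective address = base + index × scale + displacement, with the unused terms left out. The sketch below illustrates this under stated assumptions. It uses 32-bit wraparound, omits the segment base that is added afterwards to form the linear address, and uses an illustrative function name and parameters.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      displacement: int = 0) -> int:
    """Compute a 32-bit effective address from the addressing-mode terms."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


if __name__ == "__main__":
    # Displacement only
    print(hex(effective_address(displacement=0x2000)))
    # Base with displacement
    print(hex(effective_address(base=0x1000, displacement=8)))
    # Base, scaled index and displacement
    print(hex(effective_address(base=0x1000, index=3, scale=4, displacement=8)))
```

Immediate and register operands involve no address calculation, and the relative mode adds a displacement to the address of the next instruction instead of to a base register.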
Pentium Instruction Format
Pentium 4 Registers
EFLAGS Register
Control Registers
MMX Register Mapping
• MMX uses several 64 bit data types
• Uses 3 bit register address fields
  • 8 registers
• No MMX-specific registers
  • Aliased to the lower 64 bits of the existing floating point registers
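The aliasing can be pictured as two views of the same storage. The following is a simplified model, assuming eight 80-bit floating point registers held as Python integers. The class and method names are hypothetical, and the sketch leaves the upper 16 bits untouched on an MMX write, which does not model what the hardware does to those bits.

```python
MASK64 = (1 << 64) - 1
MASK80 = (1 << 80) - 1


class FpMmxRegisterFile:
    """Eight 80-bit FP registers; MMX registers are a view of their low 64 bits."""

    def __init__(self) -> None:
        self.fp = [0] * 8  # 3-bit register field, so indices 0..7

    @staticmethod
    def _check(reg: int) -> None:
        if not 0 <= reg <= 7:
            raise IndexError("register number must fit in 3 bits")

    def read_mmx(self, reg: int) -> int:
        self._check(reg)
        return self.fp[reg] & MASK64

    def write_mmx(self, reg: int, value: int) -> None:
        self._check(reg)
        upper = self.fp[reg] & MASK80 & ~MASK64  # bits 79..64, kept as-is here
        self.fp[reg] = upper | (value & MASK64)


if __name__ == "__main__":
    rf = FpMmxRegisterFile()
    rf.write_mmx(2, 0x1122334455667788)
    print(hex(rf.fp[2]))  # the FP register now holds the MMX value in its low 64 bits
```

Because both views share storage, a program in this model cannot keep independent floating point and MMX values in the same register number.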
MMX Register Mapping Diagram
Pentium 4 Diagram
BRIEF DESCRIPTION OF EACH PIPELINE STAGE
PIPELINE STAGES
• TC Nxt IP: Trace cache next instruction pointer. Looks at the BTB for the next microinstruction to be executed. This step takes two stages.
• TC Fetch: Trace cache fetch. Loads this microinstruction from the trace cache. This step takes two stages.
• Drive: Sends the microinstruction to be processed to the resource allocator and register renaming circuit.
• Alloc: Allocate. Checks which CPU resources the microinstruction will need.
• Rename: If the program uses one of the eight standard x86 registers, it is renamed to one of the 128 internal registers present on the Pentium 4. This step takes two stages.
• Que: Queue. The microinstructions are put in queues according to their type (for example, integer or floating point).
• Sch: Schedule. Microinstructions are scheduled for execution according to their type (integer, floating point, etc.). Until they reach this stage, all instructions are in order. This step takes three stages.
• Disp: Dispatch. Sends the microinstructions to their corresponding execution engines. This step takes two stages.
• RF: Register file. The internal registers, stored in the instructions pool, are read. This step takes two stages.
• Ex: Execute. Microinstructions are executed.
• Flgs: Flags. The microprocessor flags are updated.
• Br Ck: Branch check. Checks whether the branch taken by the program is the one predicted by the branch prediction circuit.
• Drive: Sends the results of this check to the branch target buffer (BTB) at the processor's entrance.
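Adding up the stage counts in the descriptions above gives the overall pipeline depth. The following is a minimal sketch, assuming that every step without an explicit count takes exactly one stage. The names PIPELINE_STEPS, total_stages and stage_ranges are illustrative.

```python
# (step name, number of stages); counts of 1 are an assumption where the
# description gives no explicit number.
PIPELINE_STEPS = [
    ("TC Nxt IP", 2),
    ("TC Fetch", 2),
    ("Drive", 1),
    ("Alloc", 1),
    ("Rename", 2),
    ("Que", 1),
    ("Sch", 3),
    ("Disp", 2),
    ("RF", 2),
    ("Ex", 1),
    ("Flgs", 1),
    ("Br Ck", 1),
    ("Drive", 1),
]


def total_stages(steps=PIPELINE_STEPS) -> int:
    """Total pipeline depth in stages."""
    return sum(count for _, count in steps)


def stage_ranges(steps=PIPELINE_STEPS):
    """Return (name, first_stage, last_stage) for each step, numbering from 1."""
    ranges = []
    first = 1
    for name, count in steps:
        last = first + count - 1
        ranges.append((name, first, last))
        first = last + 1
    return ranges


if __name__ == "__main__":
    for name, first, last in stage_ranges():
        print(f"{name:10s} stages {first}-{last}")
    print("total:", total_stages())
```

Under that assumption the steps add up to 20 stages.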
Power PC Processors Summary

Processor      First ship date   Clock speeds   L1 cache                  L2 cache       Number of transistors (10^6)
601            1993              50-120 MHz     -                         -              2.8
603/603e       1994              100-300 MHz    16 KB inst, 16 KB data    -              1.6-2.6
604/604e       1994              166-350 MHz    32 KB inst, 32 KB data    -              3.6-5.1
740/750 (G3)   1997              200-366 MHz    32 KB inst, 32 KB data    256 KB-1 MB    6.35
G4             1999              500 MHz        32 KB inst, 32 KB data    256 KB-1 MB    -
G5             2003              2.5 GHz        32 KB inst, 64 KB data    512 KB         58
POWER PC BLOCK DIAGRAM
Power PC G5 Cache
• L1: eight way set associative
• L2: two way (256 KB, 512 KB or 1 MB)
• L3: off-chip, up to 1 MB
PowerPC Data Types
• 8 (byte), 16 (halfword), 32 (word) and 64 (doubleword) bit length data types
• Fixed point processor recognises:
  • Unsigned byte, unsigned halfword, signed halfword, unsigned word, signed word, unsigned doubleword, byte string
• Floating point
  • IEEE 754
  • Single or double precision
PowerPC Addressing Modes
• Load/store architecture
  • Indirect
    • Instruction includes a 16 bit displacement to be added to the base register (may be a GP register)
    • Can replace the base register content with the new address
  • Indirect indexed
    • Instruction references a base register and an index register (both may be GP)
    • EA is the sum of their contents
• Branch address
  • Absolute
  • Relative
  • Indirect
• Arithmetic
  • Operands in registers or part of the instruction
  • Floating point is register only
PowerPC Memory Operand Addressing Modes
lwz   r3, 4(r1)     (without update)   r3 = mem[r1+4]
lwzu  r3, 4(r1)     (with update)      r3 = mem[r1+4]; r1 = r1 + 4
lwzx  r3, r1, r2                       r3 = mem[r1+r2]
lwzux r3, r1, r2                       r3 = mem[r1+r2]; r1 = r1 + r2
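The four load forms above differ only in how the effective address is formed and whether the base register is updated with it. The sketch below simulates them on a toy machine. The Machine class and the dictionary-based memory are hypothetical, and architectural special cases (such as the treatment of r0 as a base register) are not modelled.

```python
class Machine:
    """Toy model: 32 general-purpose registers and a word memory keyed by address."""

    def __init__(self, memory: dict) -> None:
        self.r = [0] * 32
        self.mem = dict(memory)

    def lwz(self, rt: int, d: int, ra: int) -> None:
        """Indirect (base + displacement), without update."""
        self.r[rt] = self.mem[self.r[ra] + d]

    def lwzu(self, rt: int, d: int, ra: int) -> None:
        """Indirect, with update: the base register receives the effective address."""
        ea = self.r[ra] + d
        self.r[rt] = self.mem[ea]
        self.r[ra] = ea

    def lwzx(self, rt: int, ra: int, rb: int) -> None:
        """Indirect indexed (base + index), without update."""
        self.r[rt] = self.mem[self.r[ra] + self.r[rb]]

    def lwzux(self, rt: int, ra: int, rb: int) -> None:
        """Indirect indexed, with update."""
        ea = self.r[ra] + self.r[rb]
        self.r[rt] = self.mem[ea]
        self.r[ra] = ea


if __name__ == "__main__":
    m = Machine({0x104: 0xDEADBEEF, 0x110: 7})
    m.r[1] = 0x100
    m.lwzu(3, 4, 1)  # r3 = mem[0x104], r1 = 0x104
    print(hex(m.r[3]), hex(m.r[1]))
```

The update forms are convenient for walking through an array, since each load leaves the base register pointing at the element just read.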
PowerPC Instruction Format
PowerPC User Visible Registers
PowerPC Register Formats