Computer Organization and Architecture: Pentium Processor

Pentium 4 Structure

Page 1: Pentium 4 Structure

Computer Organization and Architecture

Pentium Processor

Page 2: Pentium 4 Structure

Pentium 4 Diagram (Simplified)

Page 3: Pentium 4 Structure

Pentium 4 Core Processor

Fetch/Decode Unit
• Fetches instructions from L2 cache
• Decodes them into micro-ops
• Stores micro-ops in L1 cache

Out-of-order execution logic
• Schedules micro-ops
• Based on data dependence and resources
• May speculatively execute

Execution units
• Execute micro-ops
• Data from L1 cache
• Results in registers

Memory subsystem
• L2 cache and system bus

Page 4: Pentium 4 Structure

Pentium 4 Core Processor

• System bus speed: 400 MHz
• The datapath between the L2 memory cache and the L1 data cache is 256 bits wide
• The datapath between the L2 memory cache and the pre-fetch unit continues to be 64 bits wide
• 128 internal registers
• Pentium 4 has five execution units working in parallel and two units for loading and storing data in RAM
• BTB (branch target buffer) was increased to 4,096 entries

Page 5: Pentium 4 Structure

Pentium 4 Core Processor

• Each CPU uses its own RISC instructions (microinstructions), which are not publicly documented and are incompatible with microinstructions from other CPUs. For example, Pentium III microinstructions are different from Pentium 4 microinstructions.
• Intel doesn't disclose the depth (size) of this queue.

Page 6: Pentium 4 Structure

Pentium 4 Design Reasoning

Decodes instructions into RISC-like micro-ops before L1 cache
Micro-ops are fixed length
• Superscalar pipelining and scheduling
Pentium instructions are long and complex
Performance improved by separating decoding from scheduling and pipelining
Data cache is write back
• Can be configured to write through
L1 cache controlled by 2 bits in register
• CD = cache disable
• NW = not write through
• 2 instructions to invalidate (flush) cache and write back then invalidate
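
The two cache control bits can be illustrated with a short sketch. It assumes the bits sit in control register CR0 at positions 30 (CD) and 29 (NW), as on IA-32 processors; the slide itself only says "2 bits in register". The helper name cache_control_bits and the sample register values are hypothetical.

```python
# Assumed bit positions in CR0 (not stated on the slide).
CR0_NW_BIT = 29  # NW: not write through
CR0_CD_BIT = 30  # CD: cache disable


def cache_control_bits(cr0: int) -> dict:
    """Extract the CD and NW bits from a 32-bit CR0 value."""
    cd = (cr0 >> CR0_CD_BIT) & 1
    nw = (cr0 >> CR0_NW_BIT) & 1
    # With CD = 0 the cache may be filled; with CD = 1 fills are disabled.
    return {"CD": cd, "NW": nw, "caching_enabled": cd == 0}


# Sample values chosen only for illustration.
print(cache_control_bits(0x80000011))  # CD = 0, NW = 0
print(cache_control_bits(0x60000010))  # CD = 1, NW = 1
```

Real code can only read CR0 at the highest privilege level, so this sketch works on a plain integer instead.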

Page 7: Pentium 4 Structure

Pentium Data Types

8 bit byte
16 bit word
32 bit double word
64 bit quad word
Addressing is by 8 bit unit
A 32 bit double word is read at addresses divisible by 4
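
The alignment rule for double words can be checked with a few lines. This is a minimal sketch; the table SIZES and the function is_aligned are illustrative names, and extending the "divisible by size" rule to the other data types is an assumption.

```python
# Size in bytes of each data type listed above.
SIZES = {"byte": 1, "word": 2, "double word": 4, "quad word": 8}


def is_aligned(address: int, type_name: str) -> bool:
    """True if the byte address is a multiple of the type's size."""
    return address % SIZES[type_name] == 0


for addr in (0x1000, 0x1002, 0x1004):
    print(hex(addr), is_aligned(addr, "double word"))
```

Here 0x1000 and 0x1004 are divisible by 4 and count as aligned double word addresses, while 0x1002 does not.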

Page 8: Pentium 4 Structure

Specific Data Types

General - arbitrary binary contents
Integer - signed binary value
Ordinal - unsigned integer
Unpacked BCD - one digit per byte
Packed BCD - 2 BCD digits per byte
Floating point
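
The difference between the two BCD forms is easiest to see on a concrete number. The sketch below is illustrative only: the function names are hypothetical, it handles non-negative integers, and it pads packed BCD with a leading zero digit when the digit count is odd.

```python
def to_unpacked_bcd(n: int) -> bytes:
    """One decimal digit per byte."""
    return bytes(int(d) for d in str(n))


def to_packed_bcd(n: int) -> bytes:
    """Two decimal digits per byte, high digit in the upper 4 bits."""
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits  # pad to an even number of digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


print(to_unpacked_bcd(1234).hex())  # 01020304
print(to_packed_bcd(1234).hex())    # 1234
```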

Page 9: Pentium 4 Structure

Pentium Floating Point Data Types

Page 10: Pentium 4 Structure

Pentium Operation Types

Arithmetic
Logical
Data movement
Control transfer
String operations
MMX
Segment register
Protection
Cache management

Page 11: Pentium 4 Structure

Pentium Addressing Modes

• Immediate
• Register operand
• Displacement
• Base
• Base with displacement
• Scaled index with displacement
• Base with index and displacement
• Base scaled index with displacement
• Relative

Page 12: Pentium 4 Structure

Pentium Addressing Mode Calculation
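
The memory addressing modes are all special cases of one sum: base + index x scale + displacement. The sketch below assumes 32 bit addresses and scale factors of 1, 2, 4 or 8; it leaves out the segment base that is added afterwards to form the linear address. The function name effective_address is hypothetical.

```python
def effective_address(base: int = 0, index: int = 0,
                      scale: int = 1, displacement: int = 0) -> int:
    """Effective address = base + index * scale + displacement (mod 2^32)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Base scaled index with displacement
print(hex(effective_address(base=0x1000, index=3, scale=4, displacement=8)))  # 0x1014
# Base with displacement (no index)
print(hex(effective_address(base=0x1000, displacement=8)))  # 0x1008
# Displacement only
print(hex(effective_address(displacement=0x2000)))  # 0x2000
```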

Page 13: Pentium 4 Structure

Pentium Instruction Format

Page 14: Pentium 4 Structure

Pentium 4 Registers

Page 15: Pentium 4 Structure

EFLAGS Register

Page 16: Pentium 4 Structure

Control Registers

Page 17: Pentium 4 Structure

MMX Register Mapping

MMX uses several 64 bit data types
Uses 3 bit register address fields
• 8 registers
No MMX specific registers
• Aliasing to lower 64 bits of existing floating point registers
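
The aliasing can be pictured with a small model of the floating point register file. This is a sketch under assumptions: the class and method names are hypothetical, each 80 bit register is held as a Python integer, and filling the upper 16 bits with ones on an MMX write follows the documented IA-32 behaviour rather than anything stated on the slide.

```python
MASK64 = (1 << 64) - 1


class FloatingPointRegisters:
    """Eight 80-bit registers; MMX registers alias their low 64 bits."""

    def __init__(self) -> None:
        self.regs = [0] * 8

    def write_mmx(self, field: int, value: int) -> None:
        index = field & 0b111  # 3 bit register address field -> 8 registers
        # Low 64 bits take the MMX value; upper 16 bits are set to ones.
        self.regs[index] = (0xFFFF << 64) | (value & MASK64)

    def read_mmx(self, field: int) -> int:
        return self.regs[field & 0b111] & MASK64


rf = FloatingPointRegisters()
rf.write_mmx(3, 0x1122334455667788)
print(hex(rf.read_mmx(3)))  # 0x1122334455667788
print(hex(rf.regs[3]))      # 0xffff1122334455667788
```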

Page 18: Pentium 4 Structure

MMX Register Mapping Diagram

Page 19: Pentium 4 Structure

Pentium 4 Diagram

Page 20: Pentium 4 Structure

BRIEF DESCRIPTION OF EACH PIPELINE STAGE

Page 21: Pentium 4 Structure

PIPELINE STAGES

Page 22: Pentium 4 Structure

BRIEF DESCRIPTION OF EACH PIPELINE STAGE

TC Nxt IP: Trace cache next instruction pointer. Looks at the BTB for the next microinstruction to be executed. This step takes two stages.
TC Fetch: Trace cache fetch. Loads this microinstruction from the trace cache. This step takes two stages.
Drive: Sends the microinstruction to be processed to the resource allocator and register renaming circuit.

Page 23: Pentium 4 Structure

BRIEF DESCRIPTION OF EACH PIPELINE STAGE

Alloc: Allocate. Checks which CPU resources will be needed by the microinstruction.
Rename: If the program uses one of the eight standard x86 registers, it will be renamed into one of the 128 internal registers present on Pentium 4. This step takes two stages.
Que: Queue. The microinstructions are put in queues according to their types (for example, integer or floating point).
Sch: Schedule. Microinstructions are scheduled to be executed according to their type (integer, floating point, etc.). Before arriving at this stage, all instructions are in order. This step takes three stages.

Page 24: Pentium 4 Structure

BRIEF DESCRIPTION OF EACH PIPELINE STAGE

Disp: Dispatch. Sends the microinstructions to their corresponding execution engines. This step takes two stages.
RF: Register file. The internal registers, stored in the instructions pool, are read. This step takes two stages.
Ex: Execute. Microinstructions are executed.
Flgs: Flags. The microprocessor flags are updated.
Br Ck: Branch check. Checks if the branch taken by the program is the same one predicted by the branch prediction circuit.
Drive: Sends the results of this check to the branch target buffer (BTB) present at the processor's entrance.
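
Adding up the stage counts from the descriptions above gives the overall pipeline depth. The sketch below assumes that any step without a stated count takes one stage; the list name PIPELINE is illustrative.

```python
# (step name, number of stages); counts of 1 are assumed where none is stated.
PIPELINE = [
    ("TC Nxt IP", 2),
    ("TC Fetch", 2),
    ("Drive", 1),
    ("Alloc", 1),
    ("Rename", 2),
    ("Que", 1),
    ("Sch", 3),
    ("Disp", 2),
    ("RF", 2),
    ("Ex", 1),
    ("Flgs", 1),
    ("Br Ck", 1),
    ("Drive", 1),
]

total = sum(count for _, count in PIPELINE)

# Number the individual stages in pipeline order.
stage = 1
for name, count in PIPELINE:
    last = stage + count - 1
    print(f"{name:10s} stages {stage}-{last}" if count > 1
          else f"{name:10s} stage {stage}")
    stage = last + 1

print("total stages:", total)  # total stages: 20
```

Under that assumption the total is 20, which matches the commonly cited depth of the original Pentium 4 pipeline.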

Page 25: Pentium 4 Structure

PowerPC Processors Summary

Model        | First Ship Date | Clock Speeds | L1 cache             | L2 cache    | Number of transistors (10^6)
601          | 1993            | 50-120 MHz   | -                    | -           | 2.8
603/603e     | 1994            | 100-300 MHz  | 16KB inst, 16KB data | -           | 1.6-2.6
604/604e     | 1994            | 166-350 MHz  | 32KB inst, 32KB data |             | 3.6-5.1
740/750 (G3) | 1997            | 200-366 MHz  | 32KB inst, 32KB data | 256KB-1MB   | 6.35
G4           | 1999            | 500 MHz      | 32KB inst, 32KB data | 256KB-1MB   |
G5           | 2003            | 2.5 GHz      | 32KB inst, 64KB data | 512KB       | 58

Page 26: Pentium 4 Structure

POWERPC BLOCK DIAGRAM

Page 27: Pentium 4 Structure
Page 28: Pentium 4 Structure

PowerPC G5 cache

L1: eight way set associative
L2: two way (256K, 512K or 1MB)
L3: off-chip, up to 1MB

Page 29: Pentium 4 Structure

PowerPC Data Types

8 (byte), 16 (halfword), 32 (word) and 64 (doubleword) bit length data types
Fixed point processor recognises:
• Unsigned byte, unsigned halfword, signed halfword, unsigned word, signed word, unsigned doubleword, byte string
Floating point
• IEEE 754
• Single or double precision

Page 30: Pentium 4 Structure

PowerPC Addressing Modes

Load/store architecture
• Indirect
  Instruction includes 16 bit displacement to be added to base register (may be GP register)
  Can replace base register content with new address
• Indirect indexed
  Instruction references base register and index register (both may be GP)
  EA is sum of contents
Branch address
• Absolute
• Relative
• Indirect
Arithmetic
• Operands in registers or part of instruction
• Floating point is register only

Page 31: Pentium 4 Structure

PowerPC Memory Operand Addressing Modes

Page 32: Pentium 4 Structure

lwz r3, 4(r1) (without update)
  r3 = mem[r1+4]
lwzu r3, 4(r1) (with update)
  r3 = mem[r1+4]
  r1 = r1 + 4

Page 33: Pentium 4 Structure

lwzx r3, r1, r2
  r3 = memory[r1+r2]
lwzux r3, r1, r2
  r3 = memory[r1+r2]
  r1 = r1 + r2
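
The four load forms (lwz, lwzu, lwzx, lwzux) can be traced with a toy register and memory model. This is a sketch only: the class Machine and its memory layout (a dictionary from address to word value) are hypothetical, and it ignores details of the real architecture such as the special treatment of r0 as a base register and the invalid update forms.

```python
class Machine:
    """Toy model: 32 general purpose registers and a sparse word memory."""

    def __init__(self, memory: dict) -> None:
        self.reg = {f"r{i}": 0 for i in range(32)}
        self.mem = dict(memory)  # address -> word value

    def lwz(self, rt: str, d: int, ra: str) -> None:
        """Indirect (base + displacement), without update."""
        self.reg[rt] = self.mem[self.reg[ra] + d]

    def lwzu(self, rt: str, d: int, ra: str) -> None:
        """Indirect, with update: base register receives the EA."""
        ea = self.reg[ra] + d
        self.reg[rt] = self.mem[ea]
        self.reg[ra] = ea

    def lwzx(self, rt: str, ra: str, rb: str) -> None:
        """Indirect indexed (base + index), without update."""
        self.reg[rt] = self.mem[self.reg[ra] + self.reg[rb]]

    def lwzux(self, rt: str, ra: str, rb: str) -> None:
        """Indirect indexed, with update."""
        ea = self.reg[ra] + self.reg[rb]
        self.reg[rt] = self.mem[ea]
        self.reg[ra] = ea


m = Machine({104: 0xAAAA, 108: 0xBBBB, 120: 0xCCCC})
m.reg["r1"] = 100
m.lwz("r3", 4, "r1")       # r3 = mem[104], r1 unchanged
m.lwzu("r3", 4, "r1")      # r3 = mem[104], r1 becomes 104
m.reg["r2"] = 16
m.lwzux("r4", "r1", "r2")  # r4 = mem[120], r1 becomes 120
print(hex(m.reg["r3"]), hex(m.reg["r4"]), m.reg["r1"])  # 0xaaaa 0xcccc 120
```

The update forms are convenient for walking through arrays, since the base register is advanced as a side effect of the load.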

Page 34: Pentium 4 Structure

PowerPC Instruction Format

Page 35: Pentium 4 Structure

PowerPC Instruction Format

Page 36: Pentium 4 Structure

PowerPC User Visible Registers

Page 37: Pentium 4 Structure

PowerPC Register Formats

Page 38: Pentium 4 Structure