18
CS854 Pentium III group 1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

Embed Size (px)

Citation preview

Page 1: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 1

Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction

MMX Instruction SSE Instruction

System Instruction

Page 2: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 2

General Purpose Instruction Data transfer Binary integer arithmetic Decimal arithmetic Logic operations Shift and rotate Bit and byte operations Program control String Flag control Segment register operations

Page 3: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 3

X87 FPU Instruction Floating-point Integer Binary-coded decimal (BCD)

operands

Page 4: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 4

SIMD Instruction SIMD: Single Instruction Multiple Data

MMX instruction & SSE instruction provides a group of instructions that

perform SIMD operations on packed integer and/or packed floating-point data elements contained in the 64-bit MMX or the 128-bit XMM registers.

enables increased performance on a wide variety of multimedia and communications applications.

Page 5: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 5

What’s new in Pentium III Pentium III=Pentium II + SSE SSE : Internet Streaming SIMD

Extensions Seventy New Instruction Three Categories:

SIMD-Floating Point New Media Instruction Streaming Memory Instruction

Page 6: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 6

The implementation of SSE

SSE has 128-bit architectural width Double-cycling the existing 64-bit

data paths. Deliver a realized 1.5 – 2x speedup Only have 10% die size overhead

Page 7: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 7

SIMD-FP Instruction SIMD feature introduce a new register

file containing eight 128-bit registers Capable of holding a vector of four IEEE

single precision FP data elements Allow four single precision FP operations to

be carried out within a single instruction

Page 8: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 8

SIMD-FP Instruction

R esverationS tation

ExecutionU nit 0

M ultip lerS IM D FPM M X x87

M M XALU 0

W ire

ExecutionU nit 1

LD /STAG U 1

LD /STAG U 0

S toreD ata

S IM D FPAdder

Shuffle

M M XALU 1

R eciprocalEstim ates

Port 0 Port 1

Port 2 Port 3 Port 4

uopDispatch

D ispatchM echan ism

Page 9: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 9

SIMD-FP Instruction The dispatch mechanism allows

half of a SIMD multiply and half of an independent SIMD add to be issued together

The peak rate of one 128-bit operation is when the instructions alternate between different execution unit.

Page 10: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 10

SIMD-FP Instruction The news unit for shuffle instruction and

reciprocal estimate Rearranging elements within a vector Two approximation instructions: RCP RSQRT

Adding a new state SIMD-FP and MMX or x87 instruction can be

used concurrently A new control/status register MXCSR

Page 11: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 11

SIMD-FP Instruction Support two modes of FP arithmetic

Full IEEE-754 mode Flush-to-zero(FTZ) mode

More that SIMD-FP arithmetic Need perform operations on a subset of

elements within a vector SIMD logical Instructions

(AND,ANDN,OR,XOR) MOVMSKPS instruction

Page 12: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 12

SIMD-FP Instruction With MOVHPS,MOVLPS, SHUFPS

instructions,Pentium III can transpose vectors with only a small overhead.

Drawback of this implementation Code-scheduling dilemma

W 3 Z3 Y3 X3 W 2 Z2 Y2 X2 W 1 Z1 Y1 X1 W 0 Z0 Y0 X0

X0X1X2X3

Y3 X3 Y2 X2 Y1 X1 Y0 X0

M em ory

R egisters

R egister

S H U F P S

M O V H P SM O V H P S

M O V LP S M O V LP S

R d

R d R s

Page 13: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 13

New Media Instruction New integer instruction—extensions

to the MMX instruction set Accelerate important multimedia

tasks PMAX PMIN : Viterbi-Search algorithm in

speech recognition PAVG: accelerate video decoding PSADBW: Speed motion-search in video

encoding

Page 14: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 14

Streaming Memory Instruction One Downside of SIMD engines

Increase the processing rate above the memory system’s ability to supply data

Intel increased the throughput of the memory system and the P6 bus Prefetch instruction Streaming store Enhanced write combining(WC)

Page 15: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 15

Streaming Memory Instruction

Prefetch instruction Bring data into the cache before the

program actually needs it Overlap processing with long-latency

memory read Just hints never cause a program fault

so can be hoisted arbitrarily far and retired before the memory access completes

Page 16: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 16

Streaming Memory Instruction

Specify the cache level to which data will be prefetched

Streaming store Instruction Store data directly to

memory,bypassing the caches Avoid polluting the caches when it

knows the data being stored will not be accessed again soon

Page 17: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 17

Streaming Memory Instruction

Enhanced write combining (WC) Increase to four WC buffers Improve the buffer-management

policies SFENCE instruction

Flush the wc buffer Ensure that all prior stores are globally

visible

Page 18: CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction

CS854 Pentium III group 18

Conclusion Almost the same core with

Pentium II SSE enhance the multimedia

capability SSE has some advantages over

3Dnow