CS854 Pentium III group 1
Instruction Set
- General-purpose instructions
- x87 FPU instructions
- SIMD instructions
  - MMX instructions
  - SSE instructions
- System instructions
CS854 Pentium III group 2
General-Purpose Instructions
- Data transfer
- Binary integer arithmetic
- Decimal arithmetic
- Logic operations
- Shift and rotate
- Bit and byte operations
- Program control
- String
- Flag control
- Segment register operations
CS854 Pentium III group 3
x87 FPU Instructions
- Operate on floating-point, integer, and binary-coded decimal (BCD) operands
CS854 Pentium III group 4
SIMD Instructions
- SIMD: Single Instruction, Multiple Data
- The MMX and SSE instructions
  - perform SIMD operations on packed integer and/or packed floating-point data elements held in the 64-bit MMX or the 128-bit XMM registers
  - enable increased performance on a wide variety of multimedia and communications applications
CS854 Pentium III group 5
What's New in Pentium III
- Pentium III = Pentium II + SSE
- SSE: Internet Streaming SIMD Extensions
- Seventy new instructions
- Three categories:
  - SIMD floating-point instructions
  - New media instructions
  - Streaming memory instructions
CS854 Pentium III group 6
The Implementation of SSE
- SSE has a 128-bit architectural width
- Implemented by double-cycling the existing 64-bit data paths
- Delivers a realized 1.5x to 2x speedup
- Adds only about 10% die-size overhead
CS854 Pentium III group 7
SIMD-FP Instructions
- The SIMD-FP feature introduces a new register file containing eight 128-bit registers
- Each register can hold a vector of four IEEE single-precision FP data elements
- A single instruction can carry out four single-precision FP operations
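To make "four operations in one instruction" concrete, here is a minimal sketch using the SSE compiler intrinsics from `<xmmintrin.h>`. The helper name `add4` is hypothetical; the intrinsics are the standard ones that map to MOVUPS and ADDPS. It assumes an x86 compiler with SSE enabled.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Adds four pairs of single-precision values with one ADDPS.
   add4 is a hypothetical helper name used only for this sketch. */
void add4(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);       /* MOVUPS: load four floats into an XMM register */
    __m128 vb = _mm_loadu_ps(b);
    __m128 sum = _mm_add_ps(va, vb);   /* ADDPS: four additions in one instruction */
    _mm_storeu_ps(out, sum);           /* MOVUPS: store the four results */
}
```

The unaligned load and store forms are used so the sketch works for any `float` array; aligned forms exist for 16-byte-aligned data.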
CS854 Pentium III group 8
SIMD-FP Instructions
[Figure: Dispatch mechanism. Labels: Reservation Station; uop dispatch; Port 0, Port 1, Port 2, Port 3, Port 4; Execution Unit 0; Execution Unit 1; Multiplier (SIMD FP, MMX, x87); MMX ALU 0; MMX ALU 1; WIRE; SIMD FP Adder; Shuffle; Reciprocal Estimates; LD/ST AGU 0; LD/ST AGU 1; Store Data]
CS854 Pentium III group 9
SIMD-FP Instructions
- The dispatch mechanism allows half of a SIMD multiply and half of an independent SIMD add to be issued together
- The peak rate of one 128-bit operation per cycle is reached when the instructions alternate between different execution units
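A loop that mixes multiplies and adds is the kind of code this dispatch scheme favors. The sketch below is an illustrative multiply-add kernel, not code from the slides. The name `saxpy4` and the requirement that `n` be a multiple of 4 are assumptions made to keep it short. Within one iteration the add depends on the multiply, but the multiply of the next iteration is independent of it, which gives an out-of-order core a chance to overlap the two. Whether the hardware actually pairs them is not something the C code can show.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* y[i] = a * x[i] + y[i] for i in [0, n).
   saxpy4 is a hypothetical name; n is assumed to be a multiple of 4. */
void saxpy4(size_t n, float a, const float *x, float *y)
{
    __m128 va = _mm_set1_ps(a);  /* broadcast a into all four lanes */
    for (size_t i = 0; i < n; i += 4) {
        __m128 prod = _mm_mul_ps(va, _mm_loadu_ps(x + i));   /* MULPS: multiplier unit */
        __m128 sum  = _mm_add_ps(prod, _mm_loadu_ps(y + i)); /* ADDPS: adder unit */
        _mm_storeu_ps(y + i, sum);
    }
}
```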
CS854 Pentium III group 10
SIMD-FP Instructions
- New units for the shuffle instruction and the reciprocal estimates
  - Shuffle: rearranges elements within a vector
  - Two approximation instructions: RCP and RSQRT
- New architectural state
  - SIMD-FP and MMX or x87 instructions can be used concurrently
  - A new control/status register: MXCSR
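The two approximation instructions can be tried through their packed intrinsics. This is a small sketch with hypothetical helper names (`approx_recip4`, `approx_rsqrt4`). `_mm_rcp_ps` and `_mm_rsqrt_ps` map to RCPPS and RSQRTPS, which return estimates rather than correctly rounded results, so callers should compare against a tolerance and not expect exact values.

```c
#include <xmmintrin.h>

/* Approximate 1/x for four floats (RCPPS). Hypothetical helper name. */
void approx_recip4(const float in[4], float out[4])
{
    _mm_storeu_ps(out, _mm_rcp_ps(_mm_loadu_ps(in)));
}

/* Approximate 1/sqrt(x) for four floats (RSQRTPS). Hypothetical helper name. */
void approx_rsqrt4(const float in[4], float out[4])
{
    _mm_storeu_ps(out, _mm_rsqrt_ps(_mm_loadu_ps(in)));
}
```

Code that needs more precision typically refines the estimate afterwards or falls back to the full-precision divide and square-root instructions.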
CS854 Pentium III group 11
SIMD-FP Instructions
- Supports two modes of FP arithmetic
  - Full IEEE-754 mode
  - Flush-to-zero (FTZ) mode
- More than SIMD-FP arithmetic
  - Programs need to operate on a subset of the elements within a vector
  - SIMD logical instructions (AND, ANDN, OR, XOR)
  - MOVMSKPS instruction
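Operating on a subset of elements usually means building a per-lane mask and combining vectors with the logical instructions. The sketch below is one common pattern, not the only one. It replaces the negative lanes of a vector with values from a second vector and uses MOVMSKPS to report which lanes were replaced. The function name `replace_negatives4` is hypothetical, and the compare intrinsic `_mm_cmplt_ps` is an SSE instruction not listed on the slide.

```c
#include <xmmintrin.h>

/* For each lane: out = (x < 0) ? fallback : x.
   Returns a 4-bit mask with bit i set when lane i was negative.
   replace_negatives4 is a hypothetical name for this sketch. */
int replace_negatives4(const float x[4], const float fallback[4], float out[4])
{
    __m128 vx  = _mm_loadu_ps(x);
    __m128 vf  = _mm_loadu_ps(fallback);
    __m128 neg = _mm_cmplt_ps(vx, _mm_setzero_ps());  /* all-ones in lanes where x < 0 */

    /* (neg AND fallback) OR ((NOT neg) AND x): ANDPS, ANDNPS, ORPS */
    __m128 picked = _mm_or_ps(_mm_and_ps(neg, vf), _mm_andnot_ps(neg, vx));

    _mm_storeu_ps(out, picked);
    return _mm_movemask_ps(neg);                      /* MOVMSKPS: sign bit of each lane */
}
```

The integer mask lets scalar code branch on the vector result, for example to skip work when no lane matched.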
CS854 Pentium III group 12
SIMD-FP Instructions
- With the MOVHPS, MOVLPS, and SHUFPS instructions, the Pentium III can transpose vectors with only a small overhead
- Drawback of this implementation: a code-scheduling dilemma
[Figure: Transposing vectors with MOVLPS, MOVHPS, and SHUFPS. Memory: W3 Z3 Y3 X3 | W2 Z2 Y2 X2 | W1 Z1 Y1 X1 | W0 Z0 Y0 X0. Registers after the MOVLPS/MOVHPS loads (Rd, Rs): Y3 X3 Y2 X2 and Y1 X1 Y0 X0. Register after SHUFPS (Rd): X0 X1 X2 X3.]
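The load-and-shuffle sequence can be sketched with intrinsics as follows. The `Vertex` struct and the function name `gather_xy4` are assumptions for illustration. The intrinsics `_mm_loadl_pi`, `_mm_loadh_pi`, and `_mm_shuffle_ps` correspond to MOVLPS, MOVHPS, and SHUFPS. The sketch collects the X and Y components of four vertices into two vectors.

```c
#include <xmmintrin.h>

/* Hypothetical array-of-structures layout: x, y, z, w stored contiguously. */
typedef struct { float x, y, z, w; } Vertex;

/* Collects x0..x3 into xs and y0..y3 into ys. Hypothetical helper name. */
void gather_xy4(const Vertex v[4], float xs[4], float ys[4])
{
    __m128 lo = _mm_setzero_ps();
    __m128 hi = _mm_setzero_ps();

    lo = _mm_loadl_pi(lo, (const __m64 *)&v[0].x);  /* MOVLPS: lo = x0 y0 .  .  */
    lo = _mm_loadh_pi(lo, (const __m64 *)&v[1].x);  /* MOVHPS: lo = x0 y0 x1 y1 */
    hi = _mm_loadl_pi(hi, (const __m64 *)&v[2].x);  /* MOVLPS: hi = x2 y2 .  .  */
    hi = _mm_loadh_pi(hi, (const __m64 *)&v[3].x);  /* MOVHPS: hi = x2 y2 x3 y3 */

    /* SHUFPS picks two lanes from lo and two from hi. */
    _mm_storeu_ps(xs, _mm_shuffle_ps(lo, hi, _MM_SHUFFLE(2, 0, 2, 0)));  /* x0 x1 x2 x3 */
    _mm_storeu_ps(ys, _mm_shuffle_ps(lo, hi, _MM_SHUFFLE(3, 1, 3, 1)));  /* y0 y1 y2 y3 */
}
```

Lane comments list elements from lowest to highest, which is the reverse of the order drawn in the figure.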
CS854 Pentium III group 13
New Media Instructions
- New integer instructions: extensions to the MMX instruction set
- Accelerate important multimedia tasks
  - PMAX, PMIN: Viterbi search algorithm in speech recognition
  - PAVG: accelerates video decoding
  - PSADBW: speeds up motion search in video encoding
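What PAVG and PSADBW compute is easiest to see in scalar form. The functions below are portable C models of the arithmetic, written for illustration. They are not the instructions themselves, and the names `pavgb_lane` and `psadbw_model` are hypothetical. PAVGB takes a rounded average of unsigned bytes; PSADBW sums the absolute differences of eight unsigned byte pairs, which is the inner step of block matching in motion search.

```c
#include <stdint.h>

/* Scalar model of one PAVGB lane: rounded average of two unsigned bytes. */
uint8_t pavgb_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Scalar model of PSADBW on 64-bit operands:
   sum of absolute differences over eight unsigned bytes (at most 8 * 255). */
uint16_t psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += (a[i] > b[i]) ? (unsigned)(a[i] - b[i]) : (unsigned)(b[i] - a[i]);
    }
    return (uint16_t)sum;
}
```

The real instructions perform the eight byte operations in a single step, replacing the loop above.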
CS854 Pentium III group 14
Streaming Memory Instructions
- One downside of SIMD engines: they can raise the processing rate above the memory system's ability to supply data
- Intel increased the throughput of the memory system and the P6 bus
  - Prefetch instructions
  - Streaming stores
  - Enhanced write combining (WC)
CS854 Pentium III group 15
Streaming Memory Instructions
- Prefetch instructions
  - Bring data into the cache before the program actually needs it
  - Overlap processing with long-latency memory reads
  - Are just hints and never cause a program fault, so they can be hoisted arbitrarily far ahead and retired before the memory access completes
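A prefetch is typically issued a fixed distance ahead of the data being processed. The sketch below assumes a simple summation loop. The name `sum_with_prefetch` and the distance `PREFETCH_AHEAD` are hypothetical, and a good distance depends on the memory latency and loop cost of the actual machine. `_mm_prefetch` with `_MM_HINT_T0` is the standard intrinsic for the prefetch instructions.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical tuning value for this sketch: how many floats ahead to prefetch. */
#define PREFETCH_AHEAD 64

/* Sums n floats while hinting that data further ahead will be needed soon. */
float sum_with_prefetch(const float *data, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* One hint per 8 floats (32 bytes). The bounds check keeps the C pointer
           arithmetic inside the array; the prefetch itself would not fault. */
        if ((i % 8) == 0 && i + PREFETCH_AHEAD < n) {
            _mm_prefetch((const char *)(data + i + PREFETCH_AHEAD), _MM_HINT_T0);
        }
        total += data[i];
    }
    return total;
}
```

Because the prefetch is only a hint, removing it changes performance but never the result.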
CS854 Pentium III group 16
Streaming Memory Instructions
- Prefetch instructions specify the cache level to which data will be prefetched
- Streaming store instructions
  - Store data directly to memory, bypassing the caches
  - Avoid polluting the caches when the program knows the data being stored will not be accessed again soon
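A streaming store looks like an ordinary vector store in source code. The sketch below fills a buffer with `_mm_stream_ps`, the intrinsic for MOVNTPS. The function name `fill_stream` is hypothetical, and the sketch assumes `dst` is 16-byte aligned and `n` is a multiple of 4, since the instruction requires an aligned address.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Fills dst[0..n) with value using non-temporal stores.
   Assumes dst is 16-byte aligned and n is a multiple of 4. Hypothetical name. */
void fill_stream(float *dst, size_t n, float value)
{
    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4) {
        _mm_stream_ps(dst + i, v);  /* MOVNTPS: store without filling the caches */
    }
}
```

This suits output buffers that are written once and consumed much later, such as a frame handed to another device.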
CS854 Pentium III group 17
Streaming Memory Instructions
- Enhanced write combining (WC)
  - Increased to four WC buffers
  - Improved buffer-management policies
- SFENCE instruction
  - Flushes the WC buffers
  - Ensures that all prior stores are globally visible
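SFENCE matters when another agent must see streamed data before it sees a "ready" signal. The sketch below shows that ordering under stated assumptions: `publish_block` and the `ready` flag are hypothetical, `dst` is assumed to be 16-byte aligned with `n` a multiple of 4, and the `volatile` flag only illustrates the hardware ordering. A production program would use its platform's atomics for the flag.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Copies src to dst with streaming stores, then signals completion.
   Assumes dst is 16-byte aligned and n is a multiple of 4. Hypothetical name. */
void publish_block(float *dst, const float *src, size_t n, volatile int *ready)
{
    for (size_t i = 0; i < n; i += 4) {
        _mm_stream_ps(dst + i, _mm_loadu_ps(src + i));  /* MOVNTPS via WC buffers */
    }
    _mm_sfence();  /* SFENCE: the stores above become globally visible before what follows */
    *ready = 1;    /* a consumer that observes ready == 1 can then read dst */
}
```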
CS854 Pentium III group 18
Conclusion
- Almost the same core as the Pentium II
- SSE enhances the multimedia capability
- SSE has some advantages over 3DNow!