Native Signal Processing

Ravi Bhargava, Laboratory of Computer Architecture, Electrical and Computer Engineering Department, The University of Texas at Austin. November 22, 1999


  • Native Signal Processing
    Ravi Bhargava
    Laboratory of Computer Architecture
    Electrical and Computer Engineering Department
    The University of Texas at Austin

    November 22, 1999

  • Motivation
      • Need new ways to support emerging applications
          • Multimedia
          • Signal processing
          • 3D graphics
      • Many applications have similar properties
          • Large streams of data
          • Data parallelism
          • Small data types
          • Tight loops
          • High percentage of computations
          • Common computations (e.g. MAC)

  • Target Applications
      • IP telephony gateways
      • Multi-channel modems
      • Speech processing systems
      • Echo cancelers
      • Image and video processing systems
      • Scientific array processing systems
      • Internet routers
      • Virtual private network servers

  • Basic Idea
      • Extend capabilities of general-purpose processor
          • Additions to the instruction set architecture
          • Additions to the CPU state
      • Exploit data parallelism
          • Single Instruction Multiple Data (SIMD)
          • Multiple operations using one functional unit

  • SIMD
      • Use one register to store many pieces of data
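To make the "one register, many pieces of data" idea concrete, here is a minimal sketch in portable C. It models a 64-bit register as a `uint64_t` holding four 16-bit lanes. The function names and the choice to put lane 0 in the low-order bits are illustrative assumptions, not taken from the slides.

```c
#include <stdint.h>

/* Pack four 16-bit lanes into one 64-bit word.
 * Assumption for this sketch: lane 0 occupies the low-order bits. */
static uint64_t pack4x16(const uint16_t lanes[4])
{
    uint64_t reg = 0;
    for (int i = 0; i < 4; i++)
        reg |= (uint64_t)lanes[i] << (16 * i);
    return reg;
}

/* Extract lane i (0..3) from a packed 64-bit word. */
static uint16_t lane16(uint64_t reg, int i)
{
    return (uint16_t)(reg >> (16 * i));
}
```

A SIMD instruction then operates on all four lanes of such a word at once, instead of on one 16-bit value at a time.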

  • Packed Arithmetic
      • Unsigned Packed Add with Saturation
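The slide's figure is not in this transcript, so the following is a scalar sketch of what an unsigned packed add with saturation computes. It emulates eight 8-bit lanes inside a 64-bit word. The function name `paddusb_emul` is hypothetical; a real SIMD unit does all eight additions in one instruction.

```c
#include <stdint.h>

/* Emulate an unsigned packed byte add with saturation over eight 8-bit lanes.
 * Each lane is added independently; a sum above 255 clamps to 255
 * instead of wrapping around. */
static uint64_t paddusb_emul(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        unsigned x = (unsigned)((a >> (8 * i)) & 0xFFu);
        unsigned y = (unsigned)((b >> (8 * i)) & 0xFFu);
        unsigned sum = x + y;
        if (sum > 0xFFu)
            sum = 0xFFu;                 /* saturate */
        result |= (uint64_t)sum << (8 * i);
    }
    return result;
}
```

Saturation matters for pixel data: a bright pixel that overflows should stay white rather than wrap to a dark value.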

  • Packed Arithmetic
      • Multiply Accumulate
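As a rough illustration of a packed multiply-accumulate, the sketch below follows the behavior of the MMX multiply-add on signed 16-bit words: four products are formed and adjacent pairs are summed into two 32-bit results. The function name and the array-based interface are assumptions made for readability.

```c
#include <stdint.h>

/* Emulate a packed multiply-add on four signed 16-bit lanes.
 * out[0] = a[0]*b[0] + a[1]*b[1]
 * out[1] = a[2]*b[2] + a[3]*b[3]
 * Corner case not handled here: if all four inputs of a pair are -32768,
 * the true sum does not fit in int32_t. */
static void pmaddwd_emul(const int16_t a[4], const int16_t b[4], int32_t out[2])
{
    out[0] = (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1];
    out[1] = (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3];
}
```

Widening the products to 32 bits is what makes the operation useful for MAC-heavy kernels such as FIR filters.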

  • MMX Example
      • 16-bit Vector Dot-Product

  • Dot-Product (cont.)
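The original MMX listing from these two slides did not survive in this transcript. As a stand-in, the sketch below shows the usual structure of a 16-bit dot product built on a packed multiply-add, written in portable C rather than MMX assembly. The name `dot16`, the two-element accumulator, and the scalar tail loop are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit vector dot product structured the way a packed multiply-add
 * version would be: four elements per step, two 32-bit partial sums
 * (modelling the two halves of a 64-bit register), and one final
 * horizontal add. Assumes the total fits in int32_t. */
static int32_t dot16(const int16_t *x, const int16_t *y, size_t n)
{
    int32_t acc[2] = {0, 0};
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* One multiply-add step: two sums of adjacent products. */
        acc[0] += (int32_t)x[i]     * y[i]     + (int32_t)x[i + 1] * y[i + 1];
        acc[1] += (int32_t)x[i + 2] * y[i + 2] + (int32_t)x[i + 3] * y[i + 3];
    }

    int32_t total = acc[0] + acc[1];     /* horizontal add of the two lanes */

    for (; i < n; i++)                   /* scalar tail when n % 4 != 0 */
        total += (int32_t)x[i] * y[i];

    return total;
}
```

Keeping two partial sums until the end mirrors how the SIMD version avoids a horizontal add inside the loop.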

  • Alternative Dot-Product
      • Useful in certain cases

  • Alternative Dot-Product (cont.)
      • Interleaving the Data

  • Support for NSP
      • Libraries
          • Most common type of support
          • Trade off speed for robustness
      • Compiler
          • Can't do too much, but getting better
          • Can use macros, annotations, and intrinsics
      • Operating System
          • Needs to know about any new registers and states
          • Needs to know about any new exceptions

  • Common Concerns
      • Organizing SIMD data
      • Packing and unpacking SIMD registers
      • Shuffling data within the register
      • Swizzling, scatter and gather
      • Permute units
      • Aligning data
      • Special load and store instructions
      • New data types in compilers
      • Padding data types
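Packing and unpacking are easier to picture with a small example. The sketch below emulates an interleaving "unpack" of two registers of four 16-bit lanes, in the style of the MMX unpack-low and unpack-high instructions. The function names and the array representation are illustrative assumptions.

```c
#include <stdint.h>

/* Interleave the two low lanes of a and b: out = {a0, b0, a1, b1}. */
static void interleave_low16(const uint16_t a[4], const uint16_t b[4],
                             uint16_t out[4])
{
    out[0] = a[0]; out[1] = b[0];
    out[2] = a[1]; out[3] = b[1];
}

/* Interleave the two high lanes of a and b: out = {a2, b2, a3, b3}. */
static void interleave_high16(const uint16_t a[4], const uint16_t b[4],
                              uint16_t out[4])
{
    out[0] = a[2]; out[1] = b[2];
    out[2] = a[3]; out[3] = b[3];
}
```

Interleaving with an all-zero register is a common way to widen unsigned 16-bit lanes to 32 bits before arithmetic that would otherwise overflow.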

  • New Instructions
      • Packed comparisons
      • Packed maximum, minimum
      • Packed average
      • Packed sum of absolute differences
      • Multiply-accumulate
      • Conversion: signed, data length, data type
      • Approximations: reciprocal, reciprocal square root
      • Prefetch
      • Store fence
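Two of the listed operations can be sketched in scalar C to show what a single packed instruction replaces. The code below assumes eight unsigned byte lanes for the sum of absolute differences and a round-half-up average, which matches the x86 packed-average behavior. Both function names are hypothetical.

```c
#include <stdint.h>

/* Sum of absolute differences over eight unsigned bytes,
 * the core operation of block matching in video motion estimation. */
static unsigned sad8(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (a[i] > b[i]) ? (unsigned)(a[i] - b[i])
                             : (unsigned)(b[i] - a[i]);
    return sum;
}

/* Rounded average of two unsigned bytes: (a + b + 1) >> 1,
 * computed in a wider type so the intermediate sum cannot overflow. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}
```

In scalar code the absolute difference needs a compare or branch per element, which is a large part of why a dedicated packed instruction pays off.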

  • Evolution of NSP
      • Visual Instruction Set (VIS)
          • First NSP extension, found on Sun's UltraSPARC
      • MMX
          • First x86 NSP extensions, created for Intel's Pentium
      • 3DNow!
          • Addition of FP SIMD instructions on AMD's K6-2
      • Altivec
          • Adds new CPU state for Motorola's G4 PowerPC
      • Streaming SIMD Extensions
          • New x86 FP SIMD for Intel's Pentium III

  • VIS
      • Uses 64-bit floating point registers
      • Data formats tailored for graphics
      • No saturation arithmetic
      • Only 8-bit x 16-bit multiplication
      • Implicit assumptions about rounding & number of significant bits
      • 3% increase in die area
      • C macros and MediaLib for development

  • VIS Tailored for Graphics
      • Assumes pixels will be most common data

  • UltraSPARC with VIS

  • 3DNow! on K6-II
      • 21 new SIMD instructions
          • Including floating point SIMD instructions
          • PREFETCH instruction
          • Approximation and intra-element instructions
      • Two 32-bit FP values in one register
      • Two 3DNow! instructions possible per cycle
      • Latency of two cycles, throughput of one cycle
      • Aliased to the FP registers (like MMX)

  • 3DNow! on the K6-II

  • Streaming SIMD Extension
      • 70 new instructions
      • 8 new 128-bit SSE registers (OS support)
      • Execute two SSE instructions simultaneously
      • Double-cycle through existing 64-bit hardware
      • 2 micro-ops per SSE instruction
      • Latency often two cycles or greater
      • Explicit scalar SSE instructions
      • Prefetch to various levels of cache
      • Non-allocating stores
      • 10% increase in die area

  • Pentium III

  • 3DNow! on the Athlon
      • Zero switching overhead back to FP!
      • Most instructions have one cycle latency
      • 24 more instructions
          • 12 more integer SIMD instructions
          • 7 more data manipulation instructions
          • 5 unique DSP instructions
              • Packed FP to integer word conversion with sign extend
              • Packed FP negative accumulate
              • Packed FP mixed negative-positive accumulate
              • Packed integer word to FP conversion
              • Packed swap double-word
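The accumulate and swap operations named above are compact enough to sketch in scalar C. The code assumes the lane conventions AMD documents for PFNACC, PFPNACC, and PSWAPD: each result lane comes from one source register's pair of values. The struct and function names are hypothetical.

```c
/* Two packed single-precision values, modelling one 64-bit register. */
typedef struct {
    float lo;   /* bits 31..0  */
    float hi;   /* bits 63..32 */
} pf2;

/* Packed FP negative accumulate: {a.lo - a.hi, b.lo - b.hi}. */
static pf2 pf_neg_acc(pf2 a, pf2 b)
{
    pf2 r = { a.lo - a.hi, b.lo - b.hi };
    return r;
}

/* Packed FP mixed negative-positive accumulate: {a.lo - a.hi, b.lo + b.hi}. */
static pf2 pf_mixed_acc(pf2 a, pf2 b)
{
    pf2 r = { a.lo - a.hi, b.lo + b.hi };
    return r;
}

/* Packed swap double-word: exchange the two 32-bit halves. */
static pf2 pf_swap(pf2 a)
{
    pf2 r = { a.hi, a.lo };
    return r;
}
```

Difference-and-sum pairs like these show up in complex multiplication and FFT butterflies, which is one plausible reason they are grouped as DSP instructions.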

  • Athlon

  • Altivec
      • 162 new instructions
          • Intra-element: rotate, shift, max, min, sum across
          • Approximations: log2, 2 raised to an exponent
          • No explicit SAD instruction
      • Separate 32-entry, 128-bit vector register file
      • Floating-point and fixed-point data types
      • Instructions have three source operands

  • Altivec (cont.)
      • Permute Unit
          • Shuffle data from any of the 32 byte locations in two operands to any of 16 byte locations in the destination
          • Permute instructions: one cycle
      • Vector Unit
          • Three sub-units
              • Simple fixed-point: one cycle
              • Complex fixed-point: three cycles
              • Floating-point: four cycles
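The permute operation described on this slide can be sketched in scalar C as follows. The sketch assumes that the low five bits of each control byte select one of the 32 source bytes, with indices 0 to 15 taken from the first operand and 16 to 31 from the second. The function name `vperm_emul` is hypothetical.

```c
#include <stdint.h>

/* Emulate a byte permute across two 16-byte operands.
 * For each destination byte i, ctrl[i] picks a source byte from the
 * 32-byte concatenation of a (indices 0..15) and b (indices 16..31).
 * Only the low five bits of each control byte are used. */
static void vperm_emul(const uint8_t a[16], const uint8_t b[16],
                       const uint8_t ctrl[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned sel = ctrl[i] & 0x1Fu;
        out[i] = (sel < 16) ? a[sel] : b[sel - 16];
    }
}
```

Because the control vector is data, the same instruction covers byte reversal, unaligned-load fix-up, table lookup, and arbitrary interleaving.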

  • Applications using 3DNow!
      • 3D games and engines
          • Descent 4, Diablo 2, Quake III, Unreal 2
      • 3D video card drivers
          • Matrox, Diamond, ATI, nVidia, 3DLabs, 3dfx
      • Graphics APIs
          • OpenGL, DirectX, Glide
      • Other apps
          • Naturally Speaking, LiveArt98, WinAMP, PicVideo, PC-Doctor, QuickTech, XingMP3

  • Additional Information
      • MMX talk and additional references: http://www.ece.utexas.edu/~ravib/mmxdsp/
      • NSP talk outline and links: http://www.ece.utexas.edu/~ravib/nsp/