Low-Level Plumbing for Media Integration
Turner Whitted
Microsoft Research
JTW/MSR
Outline
• Part I: Implementation chronology
  – Then, Now, Then again …
  – aka Wheel Of Reincarnation
• Part II: Architectural musing
  – Perceptual/content requirements
  – Data paths, data types
  – Not about programming models
Starting points – E&S Frame Buffer
• NO fixed function units
• Code for basic logic
• Controller programmable by designer only
• Treated as peripheral
[Diagram labels: RAM; 12 bit video (R), (G), (B); Controller, 64x48(?) bits; Host; Port 1 … Port N; Bus]
Ref: Kajiya, J.T., Sutherland, I.E., and Cheadle, E.C., "A Random-Access Video Frame Buffer," Proceedings of the Conference on Computer Graphics, Pattern Recognition, and Data Structure, UCLA Extension, Los Angeles, California, May 14-16, 1975
Starting points – Ikonas RDS3000
• Single, 32-bit wide datapath based on 2901 bit-sliced DSP
• Programmed in C (Gary’s Ikonas Assembler)
• NO fixed function units
• Bound [later memory mapped] to single application
Ref: N. England, A graphics system architecture for interactive application-specific display functions, IEEE CGA, pp. 60-70, Jan 1986.
Starting points – Pixar CHAP
• SIMD processors
• Loops, conditional execution
• Focus on parallel programming issues
Ref: Adam Levinthal and Thomas Porter, “Chap – A SIMD Graphics Processor,” Proceedings of SIGGRAPH 84, (18) 3, July 1984, pp. 77 – 82.
[Diagram labels: CHAP; Memory; CHAP Video; 4 x 16b x 16K MEM; 4 x 16b MAC/SHIFT; Code; Control Unit; Crossbar; P-Bus]
“The” pipeline
• [Mostly] Fixed structure, programmable nodes
• Vector graphics legacy
[Diagram labels: Host Bus; one or, optionally, two 9-element geometry pipelines; 16, 20, 32, 40, or 64 Pixel Nodes]
Example: AT&T Pixel Machine
Structural evolution
• Texture engine with remnants of line drawing DNA
[Diagram 1 blocks: Geometry Stream; Texture Stream; Programmable T&L Processor; Raster Setup; Texture Mapping Rasterizer; Conventional Frame Store; Display Device]
[Diagram 2 blocks: Geometry Stream; Texture Stream; Programmable Vertex Processor; Rasterizer; Programmable Fragment Processor; Conventional Frame Store; Display Device]
Performance
• Ikonas RDS3000 (1980)
  – 20 MB/s processor to memory
  – 20 MIPS equiv.
• Pixar CHAP (1984)
  – 240 MB/s processor to memory (P-bus)
  – 64 MIPS peak
• ATI 9700 (2002)
  – 20800 MB/s chip to memory
  – 8400 MIPS
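One way to read these three data points is as memory bytes available per instruction, since MB/s divided by MIPS gives bytes per instruction. The sketch below is illustrative only: the figures are copied from the slide, the helper names are hypothetical, and the MIPS ratings mix "equivalent", "peak", and unqualified numbers, so the ratios are rough.

```python
# Figures copied from the slide; names and derived metrics are illustrative.
SYSTEMS = {
    "Ikonas RDS3000 (1980)": {"mb_per_s": 20, "mips": 20},
    "Pixar CHAP (1984)": {"mb_per_s": 240, "mips": 64},
    "ATI 9700 (2002)": {"mb_per_s": 20800, "mips": 8400},
}


def bytes_per_instruction(mb_per_s, mips):
    """Memory bytes available per instruction (MB/s divided by MIPS)."""
    return mb_per_s / mips


def growth(newer, older, key):
    """How many times larger `key` is in `newer` than in `older`."""
    return SYSTEMS[newer][key] / SYSTEMS[older][key]


if __name__ == "__main__":
    for name, s in SYSTEMS.items():
        ratio = bytes_per_instruction(s["mb_per_s"], s["mips"])
        print(f"{name}: {ratio:.2f} bytes per instruction")
    for key in ("mb_per_s", "mips"):
        factor = growth("ATI 9700 (2002)", "Ikonas RDS3000 (1980)", key)
        print(f"{key}: x{factor:.0f} from 1980 to 2002")
```

On these numbers bandwidth grew somewhat faster than instruction rate between 1980 and 2002, so bytes per instruction stayed within the same order of magnitude.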
API abstractions (OpenGL, DX)
• Fixed structure pipeline accessed through API
  – API tracks hardware through several generations while maintaining consistency
• Integrated with mainstream computing
• Easy to program
  – Single largest key to commercial success of graphics systems and applications
[Diagram labels: Application; Abstraction layer; Vertex Processing; Rasterizer; Fragment/Pixel Proc.]
Complete the circle
• FPGA+memory
• NO fixed function
• Programmable in C (Verilog actually)
• Good for simple prototyping
  – No API
  – No device drivers
  – No SDK
Part II: Going forward …
• Motivation
  – We’ve tacked every imaginable feature onto what was initially a line drawing pipeline
  – It’s time to start over
  – The window of opportunity is wide open
Going forward … with graphics processors
• Alternatives
  – 3D Raster Graphics
  – Real-time photorealism
  – Parallel execution for scientific applications
  – Parallel execution for interactive applications
  – Feed eyeballs with integrated media
Integrated media (partial illustration)
[Diagram labels: Geometry centric; Image centric; Warping; Interpolation; Polygon rendering + texture mapping; Fixed geometry; View-dependent geometry; View-dependent texture; Sprites with depth; Lumigraph; Light field; Concentric mosaics]
Geometry Image
[Figure labels: geometry image, 257 x 257, 12 bits/channel; 3D geometry, completely regular sampling]
Ref: X. Gu, S. Gortler, H. Hoppe, “Geometry images,” ACM Transactions on Graphics 21(3): 355-361 (2002)
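Because a geometry image samples the surface on a completely regular grid, connectivity need not be stored: it follows from the sample indices. A minimal sketch, assuming row-major sample order and three channels (x, y, z) per sample; the function names are hypothetical and this is not the authors' code.

```python
def grid_triangles(n):
    """Return triangle index triples implied by an n x n regular sample grid.

    Samples are assumed to be numbered row-major, so sample (row, col)
    has index row * n + col. Each grid cell is split into two triangles.
    """
    tris = []
    for row in range(n - 1):
        for col in range(n - 1):
            i = row * n + col  # top-left sample of this cell
            tris.append((i, i + n, i + 1))          # upper-left triangle
            tris.append((i + 1, i + n, i + n + 1))  # lower-right triangle
    return tris


def raw_bits(n, channels=3, bits_per_channel=12):
    """Uncompressed size in bits of an n x n image (channel count assumed)."""
    return n * n * channels * bits_per_channel


if __name__ == "__main__":
    n = 257  # resolution quoted on the slide
    print(len(grid_triangles(n)), "triangles implied by the grid")
    print(raw_bits(n) / 8, "bytes uncompressed")
```

The point of the sketch is that only the sample values travel through the data path; the index triples are recomputed from the grid dimensions.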
From first principles: function
[Diagram labels: U-VWL* Content (Video; Text; Line drawing; Animated 3D shapes; Still photos); Display Processor; Display Device]
*Ultra-Vast Wasteland
3D text experiment
Olynyk, Mitchell, Snyder, MSR
• Extend IBR/volume rendering to text
• Superior image reconstruction
• Higher image quality than mip-mapped texture
Generic physical blocks
• General purpose front end
• Fixed function back end
[Diagram labels: Read Cache; Write Cache; Mapper; Reconstruction/Filtering; Common Memory]
Ref: T. Whitted, “Overview of IBR: Hardware and Software Issues,” ICIP 2000.
From first principles: implementation
• Don’t count on quantum GPUs soon – stick to CMOS digital logic
• Count on CAD more than feature size
• Heat is the enemy
• The economy of commodity DRAM is hard to beat
  – But there is huge performance pressure on DRAM
• Designers are restricted only by a lack of experience
[Diagram labels: CMOS digital circuitry; Commodity DRAM; Content]
Design challenges
• Essence of the problem
  – We don’t have a function to implement
  – We must design specifically for unknown methods
  – Brute force is prohibited
• Feeds and speeds … What do we know?
  – Regulated by content
    • But we rarely turn content conventions into quantitative measures
  – Limited by perception
    • Which we don’t fully understand
  – Function of representation
    • If we don’t know the representation, we don’t know the flows
    • In my group’s research we limit flow to HW “sanity” and then work backwards
Summary
• Graphics hardware has nearly completed the circuit back to its starting point
  – Flexible, powerful, programmable
• Media processing requirements extend beyond the classic 3D pipeline
• Unusual window of opportunity
  – to match architecture with a broader range of applications and content