A High Performance 3D Graphics Rasterizer with Effective Memory Structure
Woo-Chan Park, Kil-Whan Lee, Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung, Il-San Kim, Won-Ho Chun, Won-Suk Kim, Tack-Don Han, Moon-Key Lee, Sung-Bong Yang, and Shin-Dug Kim
Media System Lab.
Yonsei University
Seoul, Korea
E-mail: kiwh@kurene.yonsei.ac.kr
COOL Chips IV
Outline
Introduction
David Simulator
High Performance 3D Graphics Rasterizer with Effective Memory Structure (David Rasterizer)
Performance Analysis
Conclusions
Introduction
NRL Project: Yearly Research Plan
Title: The Design of High Performance 3D Graphics Accelerator for Realistic Image
1st Year (Basic Environment & Research): Study Simulation Environment, 3D GA Simulator development, Survey of Related Works
2nd Year (Technology Prevalent 3D Graphic Accelerator, 15 million textured polygons): Propose an Effective Architecture, Performance Evaluation, IP Co-Development
3rd Year ∼ 5th Year (High Performance 3D Graphic Accelerator, 25 million textured polygons): High Performance Architecture, Building international core IP, Implementation of Prototype System, Parallel Rendering Architecture
Properties
• Institution for bringing up an excellent lab. with a core technology
• A Government-initiated Project
Necessities
• Leadership of an advanced research area
• Synergy effect through research interchanges
The Design of a High Performance 3D Graphics Accelerator for Realistic Image & Building Core IPs
Technology Prevalent High Performance 3D Graphics Accelerator
Architecture Research
• Geometry Processing Unit, Rendering Unit
• Realization Mapping Unit
• P-M Architecture, Memory Architecture for 3D GA
• Cache & Memory Hierarchy
• Parallel 3D Rendering System
Design Research
• Execution Model (VLIW, SIMD, RISC etc.), Control & Interface
• Application to other system libraries by implementing in VHDL
SW Research
• API & Rendering Algorithm
• Geometry Compression / Modeling
Verification / Integration
• Construction of Simulation Environment & Verification by Simulation
• Prototype System
Current Research Work
Major Research
Arithmetic Unit
• High performance floating point adder/subtractor
• High performance floating point multiplier
• High performance floating point divider
Geometry Unit
• Object-oriented rendering using the analytic model
• Overlapped light geometry processing (OLGP)
• New method for topological compression
Rendering Unit
• Perspective texture mapping
• Effective reuse buffer for triangle mesh
• Order-independent transparency
Realization Unit
• Efficient bump mapping
• Modified anti-aliasing execution model
Memory Unit
• Texture cache simulation for various cache architectures
• Texture cache sharing
• Memory bandwidth saving scheme for texture data
David Simulator
Vertex data (x, y, z, w) → Model-view Transform → (Lighting) → Clipping → Projection → Divide by w → Viewport Transform → Triangle Setup → Scan-conversion → Texture Mapping (Bump map., Perspective) → Fog / Alpha blending → Z-buffering → Anti-aliasing → Pixel data
Related Works & Basic Research
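The geometry stages named in the David Simulator pipeline (model-view transform, projection, divide by w, viewport transform) can be illustrated with a short sketch. It is not taken from the simulator: the function names, the row-major 4x4 matrices, the OpenGL-style normalized device coordinates in [-1, 1], and the 640x480 viewport are all assumptions made for the example, and lighting and clipping are left out.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vertex."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def perspective_divide(clip):
    """Divide by w: clip coordinates to normalized device coordinates."""
    x, y, z, w = clip
    return [x / w, y / w, z / w]


def viewport_transform(ndc, width, height):
    """Map NDC x, y in [-1, 1] to window coordinates and z to [0, 1].

    The [0, 1] depth range is an assumed convention.
    """
    x, y, z = ndc
    return [(x + 1.0) * 0.5 * width, (y + 1.0) * 0.5 * height, (z + 1.0) * 0.5]


def transform_vertex(vertex, model_view, projection, width, height):
    """Run one vertex through model-view, projection, divide by w, viewport."""
    eye = mat_vec(model_view, vertex)
    clip = mat_vec(projection, eye)
    return viewport_transform(perspective_divide(clip), width, height)


# Hypothetical inputs: identity model-view and a bare-bones perspective
# matrix that copies -z into w.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
SIMPLE_PERSPECTIVE = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -1, 0]]

window = transform_vertex([1.0, 1.0, -2.0, 1.0], IDENTITY, SIMPLE_PERSPECTIVE, 640, 480)
print(window)
```

The vertex sits two units in front of the eye, so the divide by w halves its x and y before the viewport mapping is applied.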
David Simulator
David Simulator: Simulation Work Flow
Model Data (Obj format) → Parser → (OpenGL format) → Mesa Library Call → Geometry Simulator Call (FPU Instructions) → Rasterizer Simulator Call → Performance Result
Geometry Engine
• Vertex Buffer; Inst. & Data transfer & Store to Local memory
• Instruction Fetch → Decode → Execute #1 → Execute #2 → Execute #3 → Write Back
• Write data to local memory
Rasterizer
• Setup pipeline → Edge walk pipeline → Span processing
• Texture cache; Mapping unit (texture, bump, environment, displacement)
• Z Compare; Color Blend
• MC for frame buffer access; MC for image map access
David Rasterizer Block Diagram
[Figure: block diagram. Labeled blocks: Setup; Edge Walk; Span Processing; Z-test pipeline; Mapping pipeline; Lambda Calculation; Address Calculation; Tags; Request FIFO; Fragment FIFO; Reorder Buffer; Texture Cache; Pixel Cache; Z-Buffer Compare; Interpolation; Bump Engine; Env. Engine; Color Blend; Memory Controller. Labeled signals: X, Y, Z; gradient value of U and V, W, D; R, G, B, α; Texel Addresses, BumXel Addresses, EXel Address; Miss Addresses; Cache Addresses; Texels, BumXel, EXel; Z-Buffer address; Frame buffer address. The Memory Controller connects to the Frame Buffer / Texture Memory / Bump Map / Environment Map through Address and Data lines labeled 24 bits and 128 bits.]
High Performance 3D Graphics Rasterizer with Effective Memory Structure (David Rasterizer)
Architectural features of David rasterizer
Performing z-test pipeline before TBE (Texture, Bump, and Environment) mapping completion
• Saving memory bandwidth
• Solve the inconsistency problem with tagging scheme for pixel cache
Texture cache sharing with BE (Bump and Environment) mapping
• Efficient structure
Rasterizer model: Neon, S3
Neon: Pixel information → Texture read / filter (texture cache, texture memory) → Texture blend → Z read → Z test → Alpha test → Z write → Destination read → Alpha blend → Destination write
S3: Pixel information → Z read → Z test → Z write → Texture read / filter (texture cache, texture memory) → Texture blend → Alpha test → Destination read → Alpha blend → Destination write
David Rasterizer
Pixel information → Tag test and Z read (pixel cache) → Z test → Texture read / filter (texture cache, texture memory) → Texture blend → Alpha test → Tag update and Z write (pixel cache) → Destination read (pixel cache) → Alpha blend → Destination write (pixel cache)
Wide separation between Z read and Z write
Architecture Comparison
When is texture mapping performed?
• Neon: before Z test
• S3: after Z test
• David: after Z test
OpenGL semantics for perfectly transparent texture
• Neon: Support
• S3: Not support
• David: Support
Advantages
• Neon: Simple scheme; supports OpenGL semantics
• S3: No wasting bandwidth (no fetching of texture data that are obscured)
• David: No wasting bandwidth (no fetching of texture data that are obscured); supports OpenGL semantics
Disadvantages
• Neon: Wasting bandwidth
• S3: Unable to support OpenGL semantics
• David: Wide separation causes an inconsistency problem; solved using additional flag bits in a pixel cache
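The rows of the comparison can be made concrete with a toy model of a single pixel. The sketch below orders the Z test, texture fetch, alpha test, and Z write as in the Neon, S3, and David pipelines, then reports the final color and the number of texture fetches. Everything in it is an assumption made for illustration: the function name, one texture fetch per fragment, a less-than depth test, an alpha test that discards only alpha 0, no alpha blending, and no pipelining (so the inconsistency problem of the David pipeline does not appear here).

```python
def shade_pixel(fragments, model):
    """Process (z, texel_color, texel_alpha) fragments for one pixel.

    Returns (final_color, texture_fetches). final_color is None if nothing
    was drawn. model is "neon", "s3", or "david".
    """
    if model not in ("neon", "s3", "david"):
        raise ValueError("unknown model: " + model)
    z_buffer = float("inf")
    color = None
    fetches = 0
    for z, texel_color, texel_alpha in fragments:
        if model == "neon":
            fetches += 1                 # texture is read before the Z test
            if z >= z_buffer:
                continue                 # Z test fails, fetch was wasted
            if texel_alpha == 0:
                continue                 # alpha test discards before Z write
            z_buffer, color = z, texel_color
        elif model == "s3":
            if z >= z_buffer:
                continue                 # Z test fails, no texture fetch
            z_buffer = z                 # Z is written before the alpha test
            fetches += 1
            if texel_alpha == 0:
                continue                 # discarded, but Z already updated
            color = texel_color
        else:  # "david"
            if z >= z_buffer:
                continue                 # Z test fails, no texture fetch
            fetches += 1
            if texel_alpha == 0:
                continue                 # alpha test discards before Z write
            z_buffer, color = z, texel_color
    return color, fetches


# A perfectly transparent texel in front of a red and a blue fragment.
FRAGMENTS = [(0.2, "glass", 0), (0.5, "red", 1), (0.9, "blue", 1)]
for name in ("neon", "s3", "david"):
    print(name, shade_pixel(FRAGMENTS, name))
```

In this example the Neon ordering draws red but fetches texture data for all three fragments, the S3 ordering fetches once but lets the transparent fragment occlude the red one, and the David ordering draws red with two fetches.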
Texture Cache Sharing
[Figure: Current 3D Architecture vs. David 3D Architecture. Current: Texture Mapping #1 and Texture Mapping #2 with Cache and DRAM; Bump Mapping and Environment Mapping with DRAM. David: Texture Mapping #1, Texture Mapping #2, Bump Mapping, and Environment Mapping on Shared H/W with a Shared Cache and DRAM.]
Mapping Hardware
• Current Architecture: Independent H/W
• David Architecture: Shared H/W; reduces H/W cost (about 30%)
Features
• Current Architecture: BE Mapping uses DRAM access
• David Architecture: BE Mapping uses cache access; removes pipeline stalls due to DRAM access
Cache Size
• Current Architecture: Same
• David Architecture: Same
# of Read Port in Cache
• Current Architecture: 2
• David Architecture: 2
Throughput
• Current Architecture: 1 Cycle
• David Architecture: Texture Mapping: 1 Cycle; BE Mapping: 2 Cycles (infrequent operations)
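The throughput row lends itself to a back-of-the-envelope estimate. Assuming a fraction of the fragments needs BE mapping at 2 cycles while the rest need only texture mapping at 1 cycle, the average cost per fragment is a weighted sum. The function name and the example fraction are hypothetical; the slides only state that BE mapping is infrequent and do not give a fraction.

```python
def average_cycles_per_fragment(be_fraction, texture_cycles=1, be_cycles=2):
    """Average mapping cycles per fragment for a given share of BE-mapped fragments.

    be_fraction is an assumed workload parameter in [0, 1].
    """
    if not 0.0 <= be_fraction <= 1.0:
        raise ValueError("be_fraction must be between 0 and 1")
    return (1.0 - be_fraction) * texture_cycles + be_fraction * be_cycles


# Hypothetical workload in which a quarter of the fragments use BE mapping.
print(average_cycles_per_fragment(0.25))
```

With the default cycle counts the result is 1 plus the BE fraction, so the penalty of sharing stays small as long as BE mapping remains rare.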
Performance Analysis
Environments for Performance Evaluation
Model Data → (OpenGL format) → Mesa Library Call → Trace Generation → Rasterizer Simulator Call
Rasterizer
• Setup pipeline → Edge walk pipeline → Span processing
• Texture cache; Mapping unit (texture, bump, environment, displacement)
• Z Compare; Color Blend
• MC for frame buffer access; MC for image map access
Outputs
• Texture Cache Simulator, Pixel Cache Simulator: Miss Ratio, Performance
• Z Depth Complexity, # of Z Test Fails
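A trace-driven cache simulator of the kind named in this environment can be sketched briefly. The slides do not describe the cache organization, so the direct-mapped layout, the line size, the number of lines, and the function names below are assumptions chosen to keep the example short.

```python
def simulate_direct_mapped_cache(addresses, num_lines=4, line_size=16):
    """Return (misses, accesses) for a direct-mapped cache over a byte-address trace.

    num_lines and line_size are hypothetical parameters.
    """
    stored_blocks = [None] * num_lines   # block number held by each cache line
    misses = 0
    accesses = 0
    for address in addresses:
        block = address // line_size     # memory block containing the address
        index = block % num_lines        # cache line the block maps to
        accesses += 1
        if stored_blocks[index] != block:
            misses += 1                  # miss: fetch the block and replace
            stored_blocks[index] = block
    return misses, accesses


def miss_ratio(addresses, num_lines=4, line_size=16):
    """Miss ratio of the trace, or 0.0 for an empty trace."""
    misses, accesses = simulate_direct_mapped_cache(addresses, num_lines, line_size)
    return misses / accesses if accesses else 0.0


# Hypothetical texel address trace: addresses 0 and 64 conflict on line 0.
TRACE = [0, 4, 8, 64, 0, 16]
print(simulate_direct_mapped_cache(TRACE), miss_ratio(TRACE))
```

Replaying the same trace with different `num_lines` and `line_size` values is the usual way to compare cache configurations by miss ratio.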
Model Data
SPECviewperf: Lightscape, ProCDRS
Crystal Space (Game engine): Blocks, Flarge
Bandwidth Saving in Texture Data (ProCDRS)
[Figure: Depth Complexity and % Bandwidth Saving plotted against Frame Number.]
Average: Depth Complexity 2.22, Bandwidth Saving 21.55%
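The per-frame quantities in these plots can be derived from a fragment trace along the following lines. Depth complexity is taken here as fragments per covered pixel, and the saving is taken as the share of fragments that fail the Z test and therefore need no texture fetch. That definition of the saving is an assumption for the sketch, as are the function name and the less-than depth test; the slides do not spell out how the plotted values were computed.

```python
def frame_statistics(fragments):
    """Return (depth_complexity, z_test_fails, saving_percent) for one frame.

    fragments: iterable of (x, y, z) tuples in submission order, z finite.
    """
    z_buffer = {}        # (x, y) -> nearest depth seen so far
    total = 0
    z_test_fails = 0
    for x, y, z in fragments:
        total += 1
        if z < z_buffer.get((x, y), float("inf")):
            z_buffer[(x, y)] = z         # fragment passes and updates the depth
        else:
            z_test_fails += 1            # obscured: texture fetch can be skipped
    if total == 0:
        return 0.0, 0, 0.0
    depth_complexity = total / len(z_buffer)       # fragments per covered pixel
    saving_percent = 100.0 * z_test_fails / total  # assumed saving metric
    return depth_complexity, z_test_fails, saving_percent


# Hypothetical frame: three fragments on one pixel, one on another.
FRAME = [(0, 0, 0.5), (0, 0, 0.2), (0, 0, 0.9), (1, 0, 0.3)]
print(frame_statistics(FRAME))
```

The saving in this model depends on submission order as well as depth complexity, since a fragment fails the Z test only if a nearer one was drawn first.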
Bandwidth Saving in Texture Data (Light)
[Figure: Depth Complexity and % Bandwidth Saving plotted against Frame Number.]
Average: Depth Complexity 2.31, Bandwidth Saving 29.16%
Bandwidth Saving in Texture Data (Blocks)
[Figure: Depth Complexity and % Bandwidth Saving plotted against Frame Number.]
Average: Depth Complexity 1.43, Bandwidth Saving 4.76%
Bandwidth Saving in Texture Data (Flarge)
[Figure: Depth Complexity and % Bandwidth Saving plotted against Frame Number.]
Average: Depth Complexity 1.31, Bandwidth Saving 4.93%
Conclusions
Simulation Environment (David Simulator)
  Evaluation for 3D graphics accelerator architecture
  Performance comparison
David Architecture
  Performing z-test before TBE mapping completion
    • 5%~29% bandwidth savings for texture data in scenes with 1~3 depth complexity
    • As the depth complexity grows, the amount of bandwidth savings becomes large
      – Recent 3D graphics applications show high depth complexity
  Texture cache sharing with BE mapping
    • Hardware saving from sharing and hardware reduction for BE mappings