24
A High Performance 3D Graphics A High Performance 3D Graphics Rasterizer with Effective Memory Rasterizer with Effective Memory Structure Woo-Chan Park, Kil-Whan Lee, Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung, Il S Ki W H Ch W Sk Ki T k D H Il-San Kim, Won-Ho Chun, Won-Suk Kim, Tack-Don Han, Moon-Key Lee, Sung-Bong Yang, and Shin-Dug Kim Media System Lab. Yonsei University Seoul, Korea E mail kiwh@kurene yonsei ac kr COOL Chips IV E-mail : kiwh@kurene.yonsei.ac.kr

A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

  • Upload
    others

  • View
    13

  • Download
    1

Embed Size (px)

Citation preview

Page 1: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

A High Performance 3D Graphics A High Performance 3D Graphics Rasterizer with Effective Memory Rasterizer with Effective Memory

StructureWoo-Chan Park, Kil-Whan Lee, Seung-Gi Lee, Moon-Hee Choi,

Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung, Il S Ki W H Ch W S k Ki T k D HIl-San Kim, Won-Ho Chun, Won-Suk Kim, Tack-Don Han,

Moon-Key Lee, Sung-Bong Yang, and Shin-Dug Kim

Media System Lab.

Yonsei University

Seoul, Korea

E mail kiwh@kurene yonsei ac kr

COOL Chips IV

E-mail : [email protected]

Page 2: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 22 --

Outline

Introduction

David Simulator

High Performance 3D Graphics Rasterizer with g pEffective Memory Structure (David Rasterizer)

Performance AnalysisPerformance Analysis

Conclusions

COOL Chips IV

Page 3: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

IntroductionIntroduction

COOL Chips IV

Page 4: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

Yearly Research PlanNRL Project y

Basic Environment & ResearchBasic Environment & Research (15m(15m

TitleTitle

1st Year Study Simulation Environment 3D GA Simulator development Survey of Related Works

Technologm

illion texTechnolog

million tex

The Design of High Performance 3D Graphics Accelerator for Realistic Image

2nd Year Propose an Effective Architecture

Technology Prevalent 3D Graphic Accelerator

Technology Prevalent 3D Graphic Accelerator

PropertiesProperties

gy Prevalextured polygy Prevalextured poly 2nd YearPropose an Effective Architecture

Performance Evaluation IP Co-Development

entygons)entygons)

• Institution for bringing up anexcellent lab. with a core technology

• A Government-initiated Project

3rd Year High Performance Architecture

High Performance 3D Graphic Accelerator

High Performance 3D Graphic Accelerator

High Pe

(25mtextured

High Pe

(25mtextured

NecessitiesNecessities

5th Year

∼ High Performance Architecture Building international core IP Implementation of Prototype System Parallel Rendering Architecture

erformanc

million

d polygons

erformanc

million

d polygons

• Leadership of an advanced researcharea

• Synergy effect through research interchanges

COOL Chips IV

ces) ces)interchanges

Page 5: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 55 --

The Design of A High Performance 3D Graphics Acceleratorfor Realistic Image & Building Core IPs

The Design of A High Performance 3D Graphics Acceleratorfor Realistic Image & Building Core IPsfor Realistic Image & Building Core IPsfor Realistic Image & Building Core IPs

Technology Prevalent High Performance

• Geometry Processing Unit, Rendering Unit

3D Graphics Accelerator 3D Graphics Accelerator

ArchitectureResearch

y g , g• Realization Mapping Unit

• P-M Architecture, Memory Architecture for 3D GA

• Cache & Memory Hierarchy

• Parallel 3D Rendering System

Design • Execution Model(VLIW, SIMD, RISC etc.), Control & Interface

Research • Appliance to other system library by implementing VHDL

SW • API & Rendering Algorithm

Research • Geometry Compression / Modeling

Verification • Construction of Simulation Environment & Verification by Simulation

COOL Chips IV

/Integration • Prototype System

Page 6: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 66 --

Current Research Work

High performance floating point adder/subtractorA ith ti U it

Major ResearchMajor Research David SimulatorDavid SimulatorVertex data(x,y,z,w)

l d li h i ( )

High performance floating point adder/subtractor High performance floating point multiplier High performance floating point divider

Arithmetic UnitModel-view Transform

( Lighting )

Clipping

Object-oriented rendering using the analytic model

Overlapped light geometry processing(OLGP) New method for topological compression

Geometry Unit

Rendering Unit Viewport Transform

Projection

Divide by w

Perspective texture mapping

Effective reuse buffer for triangle mesh Order-independent transparency

g

Realization Unit

Triangle Setup

Scan-conversion

Perspecti e

Texture cache simulation for various cache architectures

p pp g Efficient bump mappingModified anti-aliasing execution model

Realization Unit

M U it

TextureMapping

Bump map. Perspective

Fog / Alpha blending Texture cache simulation for various cache architectures Texture Cache SharingMemory bandwidth saving scheme for texture data

Memory Unit Z-buffering

Anti-aliasing

Pixel data

COOL Chips IV

Related Works & Basic Research

Page 7: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

David SimulatorDavid Simulator

COOL Chips IV

Page 8: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 88 --

David Simulator Simulation Work FlowModel DataModel Data

Obj format

ParserParser

M Lib C llM Lib C llOpenGL format

Mesa Library CallMesa Library Call Geometry Simulator CallGeometry Simulator Call

FPU InstructionsRasterizer Simulator CallRasterizer Simulator Call

Rasterizer

Inst & h

Geometry EngineSetup pipeline

Edge work pipeline Texture cache

Rasterizer Simulator CallRasterizer Simulator Call

Verte

x B

uffe

r Inst. & Data

transfer & Store to Local

nstr

uctio

n Fe

tch

Dec

ode

Exe

cute

#1

Wri

te B

ack

Exe

cute

#2

Exe

cute

#3 Write

data to local

memory

Edge work pipeline

Span processing

Texture cache

Mapping unit (texture, bump, environment,

displacement)Z C C l Bl d memory In

MC for frame buffer accessMC for image map

access

Z Compare Color Blend

COOL Chips IVPerformance Result Performance Result

Page 9: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 99 --

David Rasterizer Block Diagram

Setup

Edge W alk

Block DiagramLambda Calculation

Gradient value ofU and V, W , D

X, Y ZLambda Calculation

Span ProcessingZ-test pipeline Mapping pipeline

Lambda Calculation

,R, G, B, α Address Calculation

Tags

Texel Addresses, BumXel Addresses, EXel Address

M iss Addresses

Cache Addresses

Z B ff

RequestFIFO

ReorderBuffer

M iss Addresses

F t

Interpolation

Z-BufferCompare

Texture Cache

FragmentFIFO Cache

Addresses

Interpolation

Texels, BumXel, EXel

Color Blend

AddressZ-Buffer

AddressFrame buffer

Bump Engine Env. Engine

Pixel Cache

M emory Controller

128bits24bitsAddress Data

COOL Chips IVFrame Buffer/ Texture M emory / Bump M ap / Environment M ap

Page 10: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

High Performance 3D Graphics Rasterizer with Effective Memory Rasterizer with Effective Memory

Structure (David Rasterizer)

COOL Chips IV

Page 11: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 1111 --

Architectural features of David rasterizer

Performing z-test pipeline before TBE(Texture, g p p ,Bump, and Environment) mapping completion Saving memory bandwidthSa g e o y ba d d

Solve the incosistency problem with tagging scheme for pixel cache

Texture cache sharing with BE(Bump and Environment) mappingEnvironment) mapping Efficient structure

COOL Chips IV

Page 12: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 1212 --

Rasterizer model : Neon, S3

Pixel information Pixel information

Neon S3

Texture cache

Texture read / filter

Texture blend

Z read

Z test

Z read

Texture

Z write

memory

Z test

Alpha test memory

Texture cache

Texture read / filter

Texture blend

Z write

Destination read

Alpha test

Destination readDestination read

Alpha blend

Destination read

Alpha blend

COOL Chips IV

Destination write Destination write

Page 13: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 1313 --

David Rasterizer

Tag test and Z read

Pixel information

Pixel cache

Z test

cache

rate

memory

Texture cache

Texture read / filter

Texture blend ide

sepa

Alpha test

wi

Destination readPixel

Tag update and Z write

Alpha blend

Destination write

Pixel cache

COOL Chips IV

Destination write

Page 14: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 1414 --

Architecture Comparison Neon

When is texture mapping

S3 David

When is texture mapping performed? before Z test

OpenGL semantics for perfectly

after Z test after Z test

OpenGL semantics for perfectly transparent texture Support Not support Support

• Simple scheme

Advantages • Support OpenGL semantics

• No wasting bandwidth No fetching texture data that are obscured

• No wasting bandwidth No fetching texture data that are obscured

• Support OpenGL semantics

• Wide separation

Disadvantages • Wasting bandwidth • Unable to supportOpenGL semantics

• Wide separation Inconsistency problemSolve it using additional

flag bits in a pixel cache

COOL Chips IV

Page 15: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 1515 --

Texture Cache Sharing

Texture Mapping #1 C h DRAM

Current 3D Architecture

Texture Mapping #1

David 3D Architecture

Texture Mapping #2Texture Mapping #1 Cache DRAM

Bump Mapping DRAM

Shared H/W

Bump Mapping SharedCache

DRAM

Environment Mapping

Texture Mapping #2

Shared H/W

Environment Mapping

Cache

DRAM

Current Architecture David Architecture

M i H d Shared H/WMapping Hardware Independent H/W Shared H/W Reduce H/W cost (about 30%)

Features BE Mapping : DRAM Access BE Mapping : Cache AccessRemove Pipeline Stalls due to DRAM Access

Cache Size

# of Read Port in Cache

Same Same

2 2

COOL Chips IV

Throughput 1 Cycle Texture Mapping : 1 CycleBE Mapping : 2 Cycles (infrequent operations)

Page 16: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

Performance AnalysisPerformance Analysis

COOL Chips IV

Page 17: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 1717 --

Environments for Performance Evaluation

Model DataModel DataO GL f t

Mesa Library CallMesa Library CallOpenGL format

Trace GenerationTrace Generation

RasterizerSetup pipeline

Rasterizer Simulator CallRasterizer Simulator CallTexture Cache SimulatorTexture Cache Simulator

Edge work pipeline

Span processing

Texture cache

Mapping unit (texture,

Pixel Cache SimulatorPixel Cache Simulator

Miss Ratio, PerformanceMC for frame buffer access

pp g ( ,bump, environment,

displacement)

MC for image map access

Z Compare Color Blend

Z Depth Complexity # of Z Test Fails

,

COOL Chips IV

Z Depth Complexity, # of Z Test Fails

Page 18: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 1818 --

Model Data

SPECviewperf

LightscapeProCDRS

Crystal Space(Game engine)(Game engine)

COOL Chips IV

Blocks Flarge

Page 19: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 1919 --

Bandwidth Saving in Texture Data(ProCDRS)

25

Depth Complexity Bandwidth Saving

20

25

th S

avin

g

15

% B

andw

idt

10

ompl

exity

, %

0

5

Dep

th C

o

0Frame Number

Depth Complexity Bandwidth Saving

COOL Chips IV

Depth Complexity Bandwidth SavingAverage 2.22 21.55%

Page 20: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 2020 --

Bandwidth Saving in Texture Data(Light)

45

Depth Complexity Bandwidth Saving

33363942

th S

avin

g

24273033

% B

andw

idt

12151821

ompl

exity

, %

369

12

Dep

th C

0

Frame NumberDepth Complexity Bandwidth Saving

COOL Chips IV

Depth Complexity Bandwidth SavingAverage 2.31 29.16%

Page 21: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 2121 --

Bandwidth Saving in Texture Data(Blocks)

9

Depth Complexity Bandwidth Saving

7

8

9

th S

avin

g

5

6

7

% B

andw

idth

3

4

ompl

exity

, %

0

1

2

Dep

th C

o

0Frame Number

Depth Complexity Bandwidth Saving

COOL Chips IV

Depth Complexity Bandwidth SavingAverage 1.43 4.76%

Page 22: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 2222 --

Bandwidth Saving in Texture Data(Flarge)

25

Depth Complexity Bandwidth Saving

20

25

th S

avin

g

15

% B

andw

idth

10

ompl

exity

, %

0

5

Dep

th C

o

0Frame Number

Depth Complexity Bandwidth Saving

COOL Chips IV

Depth Complexity Bandwidth SavingAverage 1.31 4.93%

Page 23: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

ConclusionsConclusions

COOL Chips IV

Page 24: A High Performance 3D Graphics Rasterizer with Effective Memory Structureweb.yonsei.ac.kr/wjlee/document/HP3DGrasterizer01Park.pdf · A High Performance 3D Graphics Rasterizer with

-- 2424 --

Conclusions

Simulation Environment (David Simulator)Evaluation for 3D graphics accelerator architecture Evaluation for 3D graphics accelerator architecture

Performance comparison

David ArchitectureDavid Architecture Performing z-test before TBE mapping completion

• 5% 29% bandwidth savings for texture data in Scenes with• 5%~29% bandwidth savings for texture data in Scenes with 1~3 depth complexity

• As the depth complexity grows, the amount of bandwidth i b lsavings become large

– Recently, a 3D graphic application shows high depth complexity

Texture cache sharing with BE mapping• Hardware saving from sharing and hardware reduction for BE

mappings

COOL Chips IV