24
COOL Chips IV A High Performance 3D Graphics Raste rizer with Effective Memory Structure Woo-Chan Park, Kil-Whan Lee*, Seung-Gi Lee, Moon-H ee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-N am Jung, Il-San Kim, Won-Ho Chun, Won-Suk Kim, Tack-Don Han , Moon-Key Lee, Sung-Bong Yang, and Shin-Dug Kim Media System Lab. Yonsei University Seoul, Korea E-mail : [email protected]

A High Performance 3D Graphics Rasterizer with Effective Memory Structure

  • Upload
    austin

  • View
    44

  • Download
    0

Embed Size (px)

DESCRIPTION

A High Performance 3D Graphics Rasterizer with Effective Memory Structure. Woo-Chan Park, Kil-Whan Lee * , Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung, Il-San Kim, Won-Ho Chun, Won-Suk Kim, Tack-Don Han, - PowerPoint PPT Presentation

Citation preview

Page 1: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

A High Performance 3D Graphics Rasterizer with Effective Memory Structure

Woo-Chan Park, Kil-Whan Lee*, Seung-Gi Lee, Moon-Hee Choi,

Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung,

Il-San Kim, Won-Ho Chun, Won-Suk Kim, Tack-Don Han,

Moon-Key Lee, Sung-Bong Yang, and Shin-Dug Kim

Media System Lab.Yonsei University

Seoul, KoreaE-mail : [email protected]

Page 2: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 22 - -

Outline

IntroductionDavid SimulatorHigh Performance 3D Graphics Rasterizer with

Effective Memory Structure (David Rasterizer)Performance AnalysisConclusions

Page 3: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

Introduction

Page 4: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

5th Year

1st Year

2nd Year

3rd Year∼Yearly Research Plan

Study Simulation Environment 3D GA Simulator development Survey of Related Works

Propose an Effective Architectur

e Performance Evaluation IP Co-Development

High Performance Architecture Building international core IP Implementation of Prototype System Parallel Rendering Architecture

Basic Environment & ResearchBasic Environment & Research

Technology Prevalent 3D Graphic Accelerator

Technology Prevalent 3D Graphic Accelerator

High Performance 3D Graphic Accelerator

High Performance 3D Graphic Accelerator

NRL Project

PropertiesProperties

Tech

nology P

revalent

(15million

textured

polygon

s)T

echn

ology Prevalen

t(15m

illion textu

red p

olygons)

High

Perform

ance

(25million

textu

red p

olygons)

High

Perform

ance

(25million

textu

red p

olygons)

• Institution for bringing up an excellent lab. with a core technology• A Government-initiated Project

NecessitiesNecessities

• Leadership of an advanced research area• Synergy effect through research interchanges

TitleTitle

The Design of High Performance 3D Graphics Accelerator for Realistic Image

Page 5: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 55 - -

Architecture

Research

Architecture

Research

The Design of A High Performance 3D Graphics Acceleratorfor Realistic Image & Building Core IPs

The Design of A High Performance 3D Graphics Acceleratorfor Realistic Image & Building Core IPs

• Geometry Processing Unit, Rendering Unit

• Realization Mapping Unit

• P-M Architecture, Memory Architecture for 3D GA

• Cache & Memory Hierarchy

• Parallel 3D Rendering System

• Geometry Processing Unit, Rendering Unit

• Realization Mapping Unit

• P-M Architecture, Memory Architecture for 3D GA

• Cache & Memory Hierarchy

• Parallel 3D Rendering System

Technology Prevalent3D Graphics Accelerator

High Performance3D Graphics Accelerator

Design

Research

Design

Research

• Execution Model(VLIW, SIMD, RISC etc.), Control & Interface

• Appliance to other system library by implementing VHDL

• Execution Model(VLIW, SIMD, RISC etc.), Control & Interface

• Appliance to other system library by implementing VHDL

SW

Research

SW

Research

• API & Rendering Algorithm

• Geometry Compression / Modeling

• API & Rendering Algorithm

• Geometry Compression / Modeling

Verification

/Integration

Verification

/Integration

• Construction of Simulation Environment & Verification by Simulation

• Prototype System

• Construction of Simulation Environment & Verification by Simulation

• Prototype System

Page 6: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 66 - -

Current Research Work

Related Works & Basic ResearchRelated Works & Basic Research

Texture cache simulation for various cache architectures Texture Cache Sharing Memory bandwidth saving scheme for texture data

Texture cache simulation for various cache architectures Texture Cache Sharing Memory bandwidth saving scheme for texture data

Perspective texture mapping Efficient bump mapping Modified anti-aliasing execution model

Perspective texture mapping Efficient bump mapping Modified anti-aliasing execution model

Object-oriented rendering using the analytic model Effective reuse buffer for triangle mesh Order-independent transparency

Object-oriented rendering using the analytic model Effective reuse buffer for triangle mesh Order-independent transparency

Overlapped light geometry processing(OLGP) New method for topological compression

Overlapped light geometry processing(OLGP) New method for topological compression

Geometry Unit

Rendering Unit

Realization Unit

Memory Unit

High performance floating point adder/subtractor High performance floating point multiplier High performance floating point divider

High performance floating point adder/subtractor High performance floating point multiplier High performance floating point divider

Arithmetic Unit

Major ResearchMajor Research David SimulatorDavid Simulator

Vertex data(x,y,z,w)

Model-view Transform

( Lighting )

Clipping

Viewport Transform

Triangle Setup

Projection

Divide by w

TextureMapping

Z-buffering

Scan-conversion

Bump map. Perspective

Fog / Alpha blending

Anti-aliasing

Pixel data

Page 7: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

David Simulator

Page 8: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 88 - -David Simulator Simulation Work Flow

Model DataModel Data

ParserParser

Mesa Library CallMesa Library Call Geometry Simulator CallGeometry Simulator Call

FPU Instructions

Ver

tex

Buf

fer

Performance Result

Rasterizer

Inst. & Data

transfer & Store to Local memory

Inst. & Data

transfer & Store to Local memory In

stru

ctio

n F

etch

In

stru

ctio

n F

etch

Dec

ode

Dec

ode

Exe

cute

#1

Exe

cute

#1

Wri

te B

ack

Wri

te B

ack

Exe

cute

#2

Exe

cute

#2

Exe

cute

#3

Exe

cute

#3

Write data to local

memory

Write data to local

memory

Geometry Engine

OpenGL format

Setup pipelineSetup pipeline

Edge work pipelineEdge work pipeline

Span processingSpan processing

MC for frame buffer accessMC for frame buffer access

Texture cacheTexture cache

Mapping unit (texture, bump, environment,

displacement)

Mapping unit (texture, bump, environment,

displacement)

MC for image map access

MC for image map access

Z CompareZ Compare Color BlendColor Blend

Rasterizer Simulator CallRasterizer Simulator Call

Obj format

Performance Result

Page 9: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 99 - -

David Rasterizer

Block Diagram

Interpolation

Lambda Calculation

Lambda Calculation

Setup

Edge Walk

Z-BufferCompare

Color Blend

Frame Buffer/ Texture Memory / Bump Map / Environment Map

AddressZ-Buffer

AddressFrame buffer

Gradient value ofU and V, W, D

X, Y

R, G, B, αZ

Lambda Calculation

Address Calculation

Tags

Texel Addresses, BumXel Addresses, EXel Address

RequestFIFO

ReorderBuffer

Texture Cache

Miss Addresses

FragmentFIFO

Cache Addresses

CacheAddresses

Pixel Cache

Span Processing

Bump Engine

Interpolation

Memory Controller

128bits24bits

Env. Engine

Texels, BumXel, EXel

Address Data

Z-test pipeline Mapping pipeline

Page 10: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

High Performance 3D Graphics Rasterizer with Effective Memory Str

ucture (David Rasterizer)

Page 11: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 1111 - -

Architectural features of David rasterizer

Performing z-test pipeline before TBE(Texture, Bump, and Environment) mapping completion Saving memory bandwidth Solve the incosistency problem with tagging schem

e for pixel cache Texture cache sharing with BE(Bump and Envi

ronment) mapping Efficient structure

Page 12: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 1212 - -

Rasterizer model : Neon, S3

Texture cache

Texture read / filter

Texture blend

Pixel information

memory

Z read

Z test

Z write

Alpha test

Destination read

Alpha blend

Destination write

memory

Texture cache

Z read

Z test

Z write

Texture read / filter

Texture blend

Alpha test

Destination read

Alpha blend

Destination write

Pixel information

Neon S3

Page 13: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 1313 - -

David Rasterizer

memory

Texture cache

Tag test and Z read

Z test

Texture read / filter

Texture blend

Alpha test

Destination read

Alpha blend

Destination write

Pixel information

Pixel cache

Pixel cache

Tag update and Z write

wid

e se

par

ate

Page 14: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 1414 - -

Architecture Comparison Neon

When is texture mapping performed?

before Z test

OpenGL semantics for perfectly transparent texture

Support

S3

after Z test

Not support

David

after Z test

Support

Advantages• Support OpenGL semantics

• No wasting bandwidth No fetching texture data that are obscured

• Simple scheme• No wasting bandwidth No fetching texture data that are obscured• Support OpenGL semantics

Disadvantages • Wasting bandwidth• Unable to support OpenGL semantics

• Wide separation Inconsistency problemSolve it using additional flag bits in a pixel cache

Page 15: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 1515 - -

Texture Cache Sharing

Texture Mapping #2Texture Mapping #1 Cache DRAM

Bump Mapping

Environment Mapping

Current 3D Architecture

Texture Mapping #1

DRAM

David 3D Architecture

Shared H/W

Bump Mapping

Texture Mapping #2

Shared H/W

Environment Mapping

SharedCache

DRAM

DRAM

Current Architecture David Architecture

Mapping Hardware

Cache Size

# of Read Port in Cache

Independent H/WShared H/W

Reduce H/W cost (about 30%)

Same Same

2 2

Throughput 1 CycleTexture Mapping : 1 Cycle

BE Mapping : 2 Cycles (infrequent operations)

Features BE Mapping : DRAM AccessBE Mapping : Cache Access

Remove Pipeline Stalls due to DRAM Access

Page 16: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

Performance Analysis

Page 17: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 1717 - -

Z Depth Complexity, # of Z Test Fails

Environments for Performance Evaluation

Model DataModel Data

Mesa Library CallMesa Library Call

Miss Ratio, Performance

Rasterizer

OpenGL format

Setup pipelineSetup pipeline

Edge work pipelineEdge work pipeline

Span processingSpan processing

MC for frame buffer accessMC for frame buffer access

Texture cacheTexture cache

Mapping unit (texture, bump, environment,

displacement)

Mapping unit (texture, bump, environment,

displacement)

MC for image map access

MC for image map access

Z CompareZ Compare Color BlendColor Blend

Rasterizer Simulator CallRasterizer Simulator Call

Trace GenerationTrace Generation

Texture Cache SimulatorTexture Cache Simulator

Pixel Cache SimulatorPixel Cache Simulator

Page 18: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 1818 - -

Model Data

Blocks Flarge

LightscapeProCDRS

Crystal Space(Game engine)

SPECviewperf

Page 19: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 1919 - -Bandwidth Saving in Texture Data(ProCDRS)

0

5

10

15

20

25

Frame Number

Dep

th C

ompl

exity

, % B

andw

idth

Sav

ing

Depth Complexity Bandwidth Saving

Depth Complexity Bandwidth SavingAverage 2.22 21.55%

Page 20: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 2020 - -Bandwidth Saving in Texture Data(Light)

0369

121518212427303336394245

Frame Number

Dep

th C

ompl

exit

y, %

Ban

dwid

th S

avin

g

Depth Complexity Bandwidth Saving

Depth Complexity Bandwidth SavingAverage 2.31 29.16%

Page 21: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 2121 - -Bandwidth Saving in Texture Data(Blocks)

0

1

2

3

4

5

6

7

8

9

Frame Number

Dep

th C

ompl

exity

, % B

andw

idth

Sav

ing

Depth Complexity Bandwidth Saving

Depth Complexity Bandwidth SavingAverage 1.43 4.76%

Page 22: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 2222 - -

Bandwidth Saving in Texture Data(Flarge)

0

5

10

15

20

25

Frame Number

Dep

th C

ompl

exity

, % B

andw

idth

Sav

ing

Depth Complexity Bandwidth Saving

Depth Complexity Bandwidth SavingAverage 1.31 4.93%

Page 23: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

Conclusions

Page 24: A High Performance 3D Graphics Rasterizer with Effective Memory Structure

COOL Chips IV

- - 2424 - -

ConclusionsSimulation Environment (David Simulator)

Evaluation for 3D graphics accelerator architecture

Performance comparisonDavid Architecture

Performing z-test before TBE mapping completion

• 5%~29% bandwidth savings for texture data in Scenes with 1~3 depth complexity

• As the depth complexity grows, the amount of bandwidth savings become large

– Recently, a 3D graphic application shows high depth complexity

Texture cache sharing with BE mapping• Hardware saving from sharing and hardware reduction for BE

mappings