24
1 Real-Time Graphics Architecture Kurt Akeley Pat Hanrahan http://www.graphics.stanford.edu/courses/cs448a-01-fall Texture

Real-Time Graphics Architecture - Computer Graphics at Stanford

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Real-Time Graphics Architecture - Computer Graphics at Stanford

1

Real-Time Graphics Architecture

Kurt Akeley

Pat Hanrahan

http://www.graphics.stanford.edu/courses/cs448a-01-fall

Texture

Page 2: Real-Time Graphics Architecture - Computer Graphics at Stanford

2

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Topics

1. Review of texture mapping

2. RealityEngine and InfiniteReality

3. Texture caching

4. Texture prefetching

5. Trends and pitfalls

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Readings

Required

1. Z. Hakura, A. Gupta, The design and analysis of a cache architecture for texture mapping

2. H. Igehy, M. Eldridge, K. Proudfoot, Prefetching in a texture cache architecture

Background

1. P. Heckbert, Texture mapping polygons in perspective

2. P. Heckbert and H. Moreton, Interpolation for polygon texture mapping and shading

3. J. Blinn, Hyperbolic interpolation

4. L Williams, Pyramidal parametrics

Page 3: Real-Time Graphics Architecture - Computer Graphics at Stanford

3

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Texture Mapping

2D (3D) Texture Space

Texture Transformation

2D Object Parameters

Parameterization

3D Object Space

Model Transformation

3D World Space

Viewing Transformation

3D Camera Space

Projection

2D Image Space

s

t

x

y

s

t

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Texture Mapping Polygons

Forward transformation: linear projective map

Backward transformation: linear projective map

x a b c s

y d e f t

w g h i r

=

1

s a b c x

t d e f y

r g h i w

=

Page 4: Real-Time Graphics Architecture - Computer Graphics at Stanford

4

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Perspective-Correct Interpolation

Transform & Clip

Project (/w)

Rasterize and Interpolate

Texture Project (/qw’)

[ ], , ,1, , , , , , , ,1,x y z r g b a s t r

[ ], , , , , , , , , , , ,xw yw zw w r g b a sq tq rq q

[ ], , , , , , , , , , , ,xw yw zw w r g b a sw tw rw qw′ ′ ′ ′ ′ ′ ′ ′

[ ], , ,1, , , , , , , , ,xw yw zw r g b a sw tw rw qw′ ′ ′ ′ ′ ′ ′

[ ], , , , , , , , / , / , / ,1,xw yw zw w r g b a sw qw tw qw rw qw′ ′ ′ ′ ′ ′ ′ ′ ′ ′

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Linear Perspective

Linear Interpolation, BadPerspective Interpolation, Good

Incorrect Perspective

Correct Linear Perspective (0,0)

(1,1)

(1,0)

(0,1)

Page 5: Real-Time Graphics Architecture - Computer Graphics at Stanford

5

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Filtering Textures

TextureImage

Minification

Magnification

� Texture footprint

Footprint changes from pixel to pixel

i.e. not shift-invariant

� Resampling theory: two cases

1. Magnification => Interpolation

2. Minification => Filter (averaging)

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

MipMaps - L. Williams

s

t

d

s

d=2

d=4

Multum In Parvo = Many things in a small place

d=1R G

BR G

B

Address: 7 CMP(3 FIX + 4 RANGE), 2 MUL, 2 ADD

Page 6: Real-Time Graphics Architecture - Computer Graphics at Stanford

6

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Texture Filtering

Constant time filtering

fs

ft

fd

1-fs

1-ft

1-fd

Linear (LERP)

1 MUL + 2 ADD / comp

Bilinear

3 LERPs

3 MUL + 6 ADD / comp

Trilinear

7 LERPs

7 MUL + 14 ADD / comp

Quadrilinear

15 LERPs

15 MUL + 30 ADD / comp

1 2 1 2 1( , , ) ( )lerp t v v v t v v= + −

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Texture Filtering

Constant time filtering

fs ft fd

Linear (LERP)

1 ADD

2 MUL / comp

Bilinear

4 MUL + 2 ADD

4 MUL / comp

Trilinear

12 MUL + 3 ADD

8 MUL / comp

Quadrilinear

28 MUL + 4 ADD

16 MUL / comp

(1-fs) ft fd

Page 7: Real-Time Graphics Architecture - Computer Graphics at Stanford

7

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Principle of Texture Thrift

Given a scene consisting of 3D textured surfaces, the amount of texture information minimally required to render an image of the scene is proportional to the resolution of the image and is independent of the number of surfaces and the size of the textures.

T = d t I

d – depth complexity

t – average number of textures per surface

D. Peachey, Texture on demand, PIXAR Technical Memo, 1990

Mipmaps

1. Constant time to filter a textured fragment

2. Output sensitive algorithm

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Derivatives

( 1, ) ( , )

( , 1) ( , )

ss x y s x y

x

ss x y s x y

y

∂= + −

∂= + −

( 1, ) ( , )

( , 1) ( , )

tt x y t x y

x

tt x y t x y

y

∂= + −

∂= + −

Page 8: Real-Time Graphics Architecture - Computer Graphics at Stanford

8

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

mipd

Approximates quadrilateral with a square

Common formula:

[Heckbert]

s t

x xA

s t

y y

∂ ∂

∂ ∂=

∂ ∂

∂ ∂

2 2 2 2

max ,s t s t

Ax x y y

∂ ∂ ∂ ∂ ≈ + + ∂ ∂ ∂ ∂

2log

d A

mipd d

=

=

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Compute & Bandwidth Requirements

Texture access random (sort texels to fragments)

Requires high bandwidth

Address computation and filtering arithmetic intensive

Performance goal: 1 billion fragments per second

Most demanding stage of the graphics pipeline

83182418Total

1410Filter

8722Address

3146LOD

14Project

READSPEDIVCMPMULADD

Page 9: Real-Time Graphics Architecture - Computer Graphics at Stanford

9

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

History

Flight simulators

GE Apollo Simulator 1963

Clever method for procedurally generating textures

Workstations

SGI RE and IR and others …

Single-chip PC

3DFX and Nvidia and others …

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

RealityEngine (3rd Generation)

Page 10: Real-Time Graphics Architecture - Computer Graphics at Stanford

10

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

RE Fragment Generator

Capacity: 16 MB = 8 MT (>1024^2 mipmap)

Texture replicated per fragment generator (5,10,20)x16MB

Fill Rate: 12 MT/s x (5,10,20) = (60,120,240)

ProjectDerivative

Address

Filter

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

InfiniteReality (3rd Generation)

Page 11: Real-Time Graphics Architecture - Computer Graphics at Stanford

11

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

InfiniteReality

2x2 fragment quads (TA)

4x8 texture memories (TM)

2x2 texture filterers (TF)

32 by 80 crossbar TF->IE

Capacity: (16 MB, 64, 256)

Fill: 200 MT/s x (1,2,4)

Page 12: Real-Time Graphics Architecture - Computer Graphics at Stanford

12

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Memory Access

High bandwidth required

� 1 GF/s ⇒ 32 GB/s texture read (8 16-bit texels)

Mip map accesses

� Small granularity when interleaving

Memories

� Large granularity

� High latency

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Solutions

Caching

� Reduce the bandwidth requirement

� Match granularity of accesses to memory

Prefetching

� Hide the high latency of memory accesses

� Handle highly variable latency

Compression (not covered)

Page 13: Real-Time Graphics Architecture - Computer Graphics at Stanford

13

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Unique T/F Ratio

Key statistic: unique texel to fragment ratio

� Average memory bandwidth required

Definition:

Total texels accessed / Total fragments generated

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Texture Locality

quake

flight2x

qtvr

qtvr2x

Percent trilinear

30%

47%

62%

87%

0%

100%

Unique T/F

0.033

0.092

Image

quake2x

flight 0.706

1.554

0.569

2.832

Page 14: Real-Time Graphics Architecture - Computer Graphics at Stanford

14

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Texture Locality Measures

Texture locality variables

� Small repeated textures

� Average magnification textures

� Percentage of magnified textures

� Level of detail bias

� Average minification when mipmapping

Frac(d)~0 ⇒ T/F ~1.25

Frac(d)~1 ⇒ T/F ~5

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Texture Caching

Cache parameters

� Line size (blocking)

� Cache size (working set)

� Direct-mapped or associative

Representation of textures in memory

Rasterization order

Page 15: Real-Time Graphics Architecture - Computer Graphics at Stanford

15

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Quake Miss Rate

0

0.1

0.2

0.3

0.4

1 2 4 8 16 32 64 128

total cache size [KB]

mis

se

s p

er

fra

gm

en

t

0

2

4

6

tex

els

pe

r fr

ag

me

nt

1-way

2-way

4-way

1-way

2-way

4-way2x

1x

4x4 32-bit T

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Flight Miss Rate

0

0.1

0.2

0.3

0.4

1 2 4 8 16 32 64 128

total cache size [KB]

mis

se

s p

er

fra

gm

en

t

0

2

4

6

tex

els

pe

r fr

ag

me

nt

1-way

2-way

4-way

1-way

2-way

4-way2x

1x

4x4 32-bit T

Page 16: Real-Time Graphics Architecture - Computer Graphics at Stanford

16

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

QTVR Miss Rate

0

0.1

0.2

0.3

0.4

1 2 4 8 16 32 64 128

total cache size [KB]

mis

se

s p

er

fra

gm

en

t

0

2

4

6

tex

els

pe

r fr

ag

me

nt

1-way

2-way

4-way

1-way

2-way

4-way2x

1x

4x4 32-bit T

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Bandwidth Savings

0

20

40

60

80

100

me

ga

by

tes

te

xtu

re r

ea

d

quake quake2x flight flight2x qtvr qtvr2x

without cache with cache

2 8 KB DM-caches4x4 32-bit T blocks

Page 17: Real-Time Graphics Architecture - Computer Graphics at Stanford

17

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Texture Blocking

Texture Map

2D blocks

Hide orientation effects

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Texture Blocking

4x4 texels

Cache Line SizeCache Size6D Organization

(s2,t2)(s1,t1) (s2,t2)

s1 t1 s2 t2 s3 t3baseAddress

4x4 blocks

Page 18: Real-Time Graphics Architecture - Computer Graphics at Stanford

18

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Rasterization Order

Scanline Order Tile Order

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Tiling and Blocking Results

Blocked Textures

Tiled Rasterization Blocked and Tiled

32KB, 2-way Associative, 128 byte lines

Misses123+

Linear

Page 19: Real-Time Graphics Architecture - Computer Graphics at Stanford

19

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Summary: Texture Caching

� Reasonably small workings sets

16-32 KB caches has 95% hit rate

� Separate caches for even and odd mip-levels to prevent conflicts

Alternatively 2-way associative cache

� Blocked textures further reduces miss rate

� Tiled rasterization further reduces miss rate

� 6D tiling minimizes working set

Conclusion

Caches highly effective for reducing texture memory bandwidth (roughly 5-10:1)

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Texture Prefetching Architecture

Request

FIFO

Texture Apply

FIFO

Rasterizer

Reorder

Buffer

Texture

Filter

Texture

Memory

Page 20: Real-Time Graphics Architecture - Computer Graphics at Stanford

20

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

“Microprocessor” Architecture

Texture Apply

FIFO

Rasterizer

Texture

Filter

Cache Data

Stall Fetch Buffer

Cache Tags

Texture

Memory

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Disadvantages

1. Prefetches may generate conflict misses

2. Cache tags accessed twice

3. Large, fully associative fetch buffer

Page 21: Real-Time Graphics Architecture - Computer Graphics at Stanford

21

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Texture Prefetching Architecture

Texture Apply

FIFO

Rasterizer

Texture

Memory

Texture

Filter

Cache Data

Cache TagsRequest

FIFO

Reorder

BufferStall

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Advantages

1. No conflicts caused by prefetching

2. Cache tags accessed only once

3. No fully associative fetch buffer

4. Reorder buffer tolerates out-of order memory replies

Page 22: Real-Time Graphics Architecture - Computer Graphics at Stanford

22

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

50 100

agp

Bandwidth(texels/cycle)

1

24

Latency(cycles)

numa

rdram2xrdram

4

Memory Models

200 Mpixel fragment generator

5 ns cycle time

50 150 250

20

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Prefetching Results

0

5

10

15

20

25

30

quake

quake2x

flig

ht

flig

ht2

x

qtv

r

qtv

r2x

quake

quake2x

flig

ht

flig

ht2

x

qtv

r

qtv

r2x

quake

quake2x

flig

ht

flig

ht2

x

qtv

r

qtv

r2x

quake

quake2x

flig

ht

flig

ht2

x

qtv

r

qtv

r2x

cycl

es

per

fra

gm

ent

ideal prefetch no prefetch

agp rdram rdram2x numa

97% stall free unless limited by memory bandwidth

Buffering requirements modest compared to cache size

Page 23: Real-Time Graphics Architecture - Computer Graphics at Stanford

23

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Summary: Texture Prefetching

Prefetching effectively hides memory latency

� Early calculation of texture coordinates

Tolerating latency

� FIFO implements “context switch”

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Trends and Pitfalls

Trends

� Programmable texture coordinate generation

� Multitexture (4) and dependent textures (1)

� Programmable texture combination

� Better quality filters

Pitfalls

� 2D texture memory allocation; use 1D!

� Texture thrashing (draw in texture order)

� Handling borders is complicated

� Precision: very large textures (swimming)

Page 24: Real-Time Graphics Architecture - Computer Graphics at Stanford

24

CS448 Lecture 7 Kurt Akeley, Pat Hanrahan, Fall 2001

Additional Topics

CATS and RATS; RIP-MAPS

Anisotropic filtering

Detail textures

Texture compression

Texture management; clip-maps

Parallel texture caching