38
Anatomy of a Texture Fetch Mark J. Kilgard [email protected] presented for Dr. Bajaj’s graphics class University of Texas, Austin April 29, 2010

Anatomy of a Texture Fetch

Embed Size (px)

DESCRIPTION

Presentation for Dr. Bajaj's computer graphics class (CS 354) at University of Texas, Austin (April 29, 2010).

Citation preview

Page 1: Anatomy of a Texture Fetch

Anatomy of aTexture Fetch

Mark J. Kilgard

[email protected]

presented for Dr. Bajaj’s graphics classUniversity of Texas, Austin

April 29, 2010

Page 2: Anatomy of a Texture Fetch

About Me

Principal System Software EngineerNVIDIA Distinguished Inventor

Native Texan, living in AustinBut instead of UT, attended Rice (C.S. 1991)

Yet I root for Texas in football (unless playing Rice)

Interested inProgrammable shading languages for GPUs

Novel GPU-accelerated rendering paradigms

OpenGL & GPU Architecture Improvements

Page 3: Anatomy of a Texture Fetch

Our Topic: the Texture Fetch Dissected

Texture FetchAll modern 3D games rely on texture-rich rendering

Basis for all sorts of richness and detail

More than graphics—GPU compute supports texture fetches

ConceptuallyAn image load with built-in re-sampling

GeForce 480 capable of 42,000,000,000 per second! *

* 700 Mhz clock × 15 Streaming Multiprocessors × 4 fetches per clock

Page 4: Anatomy of a Texture Fetch

A Texture Fetch Simplified

Seems pretty simple…

Given1. An image

2. A position

Return thecolor of image at position

Fetch at

(s,t) = (0.6, 0.25)

T a

xis

S axis0.0

1.0

0.0

1.0

RGBA Result is

0.95,0.4,0.24,1.0)

Page 5: Anatomy of a Texture Fetch

Texture Supplies Detail to Rendered Scenes

Without texture

With texture

Page 6: Anatomy of a Texture Fetch

Textures Make Graphics Pretty

Texture → detail,

detail → immersion,

immersion → fun

Unreal Tournament

Microsoft Flight Simulator X

Sacred 2

Page 7: Anatomy of a Texture Fetch

Shaders Combine Multiple Textures

(modulate)

=

lightmaps onlylightmaps only

decal onlydecal only

combined scenecombined scene* Id Software’s Quake 2

circa 1997

Page 8: Anatomy of a Texture Fetch

Projected Texturing for Shadow Mapping

Depth map from light’s point of viewis re-used as a texture andre-projected into eye’s view to generate shadows

lightposition

without

shadows

with

shadows

“what thelight sees”

Page 9: Anatomy of a Texture Fetch

Shadow Mapping Explained

Planar distance from lightPlanar distance from light Depth map projected onto sceneDepth map projected onto scene

≤≤ ==lesslessthanthan

True “un-shadowed” True “un-shadowed” region shown greenregion shown green

equalsequals

Page 10: Anatomy of a Texture Fetch

Volume rendering of fluid turbulence,

Lawrence Berkley National Lab

Automotive design, RTT

Seismic visualization, Landmark Graphics

Texture’s Not All Fun and Games

Page 11: Anatomy of a Texture Fetch

What’s so hard?

FilteringPoor quality results if you just return the closest color sample in the imageBilinear filtering + mipmapping needed

ComplicationsWrap modes, formats, compression, color spaces, other dimensionalities (1D, 3D, cube maps), etc.

Gotta be quickApplications desire billions of fetches per secondWhat’s done per-fragment in the shader, must be done per-texel in the texture fetch—so 8x times as much work!

Essentially a miniature, real-time re-sampling kernel

Page 12: Anatomy of a Texture Fetch

Anatomy of a Texture Fetch

Filteredtexel vectorTexel

SelectionTexel

Combination

Texel offsets

Texel data

Texture images

Combination parameters

Texture coordinate

vector

Texture parameters

Page 13: Anatomy of a Texture Fetch

Texture Fetch Functionality (1)

Texture coordinate processingProjective texturing (OpenGL 1.0)Cube map face selection (OpenGL 1.3)Texture array indexing (OpenGL 2.1)Coordinate scale: normalization (ARB_texture_rectangle)

Level-of-detail (LOD) computationLog of maximum texture coordinate partial derivative (OpenGL 1.0)LOD clamping (OpenGL 1.2)LOD bias (OpenGL 1.3)Anisotropic scaling of partial derivatives (SGIX_texture_lod_bias)

Wrap modesRepeat, clamp (OpenGL 1.0)Clamp to edge (OpenGL 1.2), Clamp to border (OpenGL 1.3)Mirrored repeat (OpenGL 1.4)Fully generalized clamped mirror repeat (EXT_texture_mirror_clamp)Wrap to adjacent cube map faceRegion clamp & mirror (PlayStation 2)

Page 14: Anatomy of a Texture Fetch

Arrays of 2D Textures

Multiple skins packed in texture arrayMotivation: binding to one multi-skin texture array avoids texture bind per object

Texture array index

0 1 2 3 4

0

1

234

Mip

map

lev

el i

nd

ex

Page 15: Anatomy of a Texture Fetch

Texture Fetch Functionality (2)

Filter modesMinification / magnification transition (OpenGL 1.0)

Nearest, linear, mipmap (OpenGL 1.0)

1D & 2D (OpenGL 1.0), 3D (OpenGL 1.2), 4D (SGIS_texture4D)

Anisotropic (EXT_texture_filter_anisotropic)

Fixed-weights: Quincunx, 3x3 Gaussian

Used for multi-sample resolves

Detail texture magnification (SGIS_detail_texture)

Sharpen texture magnification (SGIS_sharpen_texture)

4x4 filter (SGIS_texture_filter4)

Sharp-edge texture magnification (E&S Harmony)

Floating-point texture filtering (ARB_texture_float, OpenGL 3.0)

Page 16: Anatomy of a Texture Fetch

Texture Fetch Functionality (3)

Texture formatsUncompressed

Packing: RGBA8, RGB5A1, etc. (OpenGL 1.1)Type: unsigned, signed (NV_texture_shader)Normalized: fixed-point vs. integer (OpenGL 3.0)

CompressedDXT compression formats (EXT_texture_compression_s3tc)4:2:2 video compression (various extensions)1- and 2-component compression (EXT_texture_compression_latc,OpenGL 3.0)Other approaches: IDCT, VQ, differential encoding, normal maps, separable decompositions

Alternate encodingsRGB9 with 5-bit shared exponent (EXT_texture_shared_exponent)Spherical harmonicsSum of product decompositions

Page 17: Anatomy of a Texture Fetch

Texture Fetch Functionality (4)

Pre-filtering operationsGamma correction (OpenGL 2.1)

Table: sRGB / arbitraryShadow map comparison (OpenGL 1.4)

Compare functions: LEQUAL, GREATER, etc. (OpenGL 1.5)Needs “R” depth value per texel

Palette lookup (EXT_paletted_texture)Thresh-holding

Color keyGeneralized thresh-holding

Page 18: Anatomy of a Texture Fetch

Delicate Color Fidelity with sRGB

Problem: PC display devices have a legacy non-linear (sRGB) display gamut—delicate color shading looks wrong

Conventional

rendering(uncorrect

ed color)

Gamma correct(sRGB rendered)

Softer and more natural

Unnaturallydeep facial

shadows

NVIDIA’s Adriana GeForce 8 Launch Demo

Implication for texture hardware: Should perform sRGB-to-linear color space convert per-texel, so 24 scalar conversions—or more—per fetch

Page 19: Anatomy of a Texture Fetch

Texture Fetch Functionality (5)

OptimizationsLevel-of-detail weighting adjustments

Mid-maps (extra pre-filtered levels in-between existing levels)

Unconventional usesBitmap textures for fonts with large filters (Direct3D 10)

Rip-mapping

Non-uniform texture border color

Clip-mapping (SGIX_clipmap)

Multi-texel borders

Silhouette maps (Pardeep Sen’s work)

Shadow mapping

Sharp piecewise linear magnification

Page 20: Anatomy of a Texture Fetch

Phased Data Flow

Must hide long memory read latency between Selection and Combination phases

Memoryreads for samples

FIFOing of combination

parameters

Filteredtexel vector

TexelSelection

TexelCombination

Texel offsets

Texel data

Texture images

Combination parameters

Texture parameters

Texture coordinate

vector

Page 21: Anatomy of a Texture Fetch

What really happens?

Let’s consider a simple tri-linear mip-mapped 2D projective texture fetch

Logically just one instructionfloat4 color = tex2Dproj(decalSampler, st);

TXP o[COLR], f[TEX3], TEX2, 2D;

LogicallyTexel selection

Texel combination

How many operations are involved?

Assembly instruction

(NV_fragment_program)

High-level language

statement (Cg/HLSL)

Page 22: Anatomy of a Texture Fetch

Medium-Level Dissectionof a Texture Fetch

Converttexel

coordsto

texeloffsets

integer /fixed-point

texelcombination

texel offsets

texel data

texture images

combinationparameters

interpolatedtexture coords

vector

texture parameters

Converttexturecoords

totexel

coords

filteredtexelvector

texel coords

floor /frac

integercoords & fractional

weights floating-pointscaling

andcombination

integer /fixed-pointtexelintermediates

Page 23: Anatomy of a Texture Fetch

Interpolation

First we need to interpolate (s,t,r,q)This is the f[TEX3] part of the TXP instructionProjective texturing means we want (s/q, t/q)

And possible r/q if shadow mapping

In order to correct for perspective, hardware actually interpolates(s/w, t/w, r/w, q/w)

If not projective texturing, could linearly interpolate inverse w (or 1/w)Then compute its reciprocal to get w

Since 1/(1/w) equals w Then multiply (s/w,t/w,r/w,q/w) times w

To get (s,t,r,q)

If projective texturing, we can insteadCompute reciprocal of q/w to get w/qThen multiple (s/w,t/w,r/w) by w/q to get (s/q, t/q, r/q)

Observe projective texturing is same cost as perspective correction

Page 24: Anatomy of a Texture Fetch

Interpolation Operations

Ax + By + C per scalar linear interpolation2 MADs

One reciprocal to invert q/w for projective texturingOr one reciprocal to invert 1/w for perspective texturing

Then 1 MUL per component for s/w * w/qOr s/w * w

For (s,t) means4 MADs, 2 MULs, & 1 RCP(s,t,r) requires 6 MADs, 3 MULs, & 1 RCP

All floating-point operations

Page 25: Anatomy of a Texture Fetch

Texture Space Mapping

Have interpolated & projected coordinates

Now need to determine what texels to fetch

Multiple (s,t) by (width,height) of texture base levelCould convert (s,t) to fixed-point first

Or do math in floating-point

Say based texture is 256x256 so

So compute (s*256, t*256)=(u,v)

Page 26: Anatomy of a Texture Fetch

Mipmap Level-of-detail Selection

Tri-linear mip-mapping means compute appropriate mipmap levelHardware rasterizes in 2x2 pixel entities

Typically called quad-pixels or just quadFinite difference with neighbors to get change in u and v with respect to window space

Approximation to ∂u/∂x, ∂u/∂y, ∂v/∂x, ∂v/∂yMeans 4 subtractions per quad (1 per pixel)

Now compute approximation to gradient lengthp = max(sqrt((∂u/∂x)2+(∂u/∂y)2), sqrt((∂v/∂x)2+(∂v/∂y)2))

one-pixel separation

Page 27: Anatomy of a Texture Fetch

Level-of-detail Bias and Clamping

Convert p length to power-of-two level-of-detail and apply LOD bias

λ = log2(p) + lodBias

Now clamp λ to valid LOD rangeλ’ = max(minLOD, min(maxLOD, λ))

Page 28: Anatomy of a Texture Fetch

Determine Mipmap Levels andLevel Filtering Weight

Determine lower and upper mipmap levelsb = floor(λ’)) is bottom mipmap level

t = floor(λ’+1) is top mipmap level

Determine filter weight between levelsw = frac(λ’) is filter weight

Page 29: Anatomy of a Texture Fetch

Determine Texture Sample Point

Get (u,v) for selected top and bottom mipmap levelsConsider a level l which could be either level t or b

With (u,v) locations (ul,vl)

Perform GL_CLAMP_TO_EDGE wrap modesuw = max(1/2*widthOfLevel(l), min(1-1/2*widthOfLevel(l), u))

vw = max(1/2*heightOfLevel(l), min(1-1/2*heightOfLevel(l), v))

Get integer location (i,j) within each level(i,j) = ( floor(uw* widthOfLevel(l)), floor(vw* ) )

border

edge

s

t

Page 30: Anatomy of a Texture Fetch

Determine Texel Locations

Bilinear sample needs 4 texel locations(i0,j0), (i0,j1), (i1,j0), (i1,j1)

With integer texel coordinatesi0 = floor(i-1/2)

i1 = floor(i+1/2)

j0 = floor(j-1/2)

j1 = floor(j+1/2)

Also compute fractional weights for bilinear filteringa = frac(i-1/2)

b = frac(j-1/2)

Page 31: Anatomy of a Texture Fetch

Determine Texel Addresses

Assuming a texture level image’s base pointer, compute a texel address of each texel to fetch

Assume bytesPerTexel = 4 bytes for RGBA8 texture

Exampleaddr00 = baseOfLevel(l) + bytesPerTexel*(i0+j0*widthOfLevel(l))addr01 = baseOfLevel(l) + bytesPerTexel*(i0+j1*widthOfLevel(l))addr10 = baseOfLevel(l) + bytesPerTexel*(i1+j0*widthOfLevel(l))addr11 = baseOfLevel(l) + bytesPerTexel*(i1+j1*widthOfLevel(l))

More complicated address schemes are needed for good texture locality!

Page 32: Anatomy of a Texture Fetch

Initiate Texture Reads

Initiate texture memory reads at the 8 texel addressesaddr00, addr01, addr10, addr11 for the upper level

addr00, addr01, addr10, addr11 for the lower level

Queue the weights a, b, and wLatency FIFO in hardware makes these weights available when texture reads complete

Page 33: Anatomy of a Texture Fetch

Phased Data Flow

Must hide long memory read latency between Selection and Combination phases

Memoryreads for samples

FIFOing of combination

parameters

Filteredtexel vector

TexelSelection

TexelCombination

Texel offsets

Texel data

Texture images

Combination parameters

Texture parameters

Texture coordinate

vector

Page 34: Anatomy of a Texture Fetch

Texel Combination

When texels reads are returned, begin filteringAssume results are

Top texels: t00, t01, t10, t11Bottom texels: b00, b01, b10, b11

Per-component filtering math is tri-linear filterRGBA8 is four components

result = (1-a)*(1-b)*(1-w)*b00 + (1-a)*b*(1-w)*b*b01 + a*(1-b)*(1-w)*b10 + a*b*(1-w)*b11 + (1-a)*(1-b)*w*t00 + (1-a)*b*w*t01 + a*(1-b)*w*t10 + a*b*w*t11;24 MADs per component, or 96 for RGBA

Lerp-tree could do 14 MADs per component, or 56 for RGBA

Page 35: Anatomy of a Texture Fetch

Total Texture Fetch Operations

Interpolation6 MADs, 3 MULs, & 1 RCP (floating-point)

Texel selectionTexture space mapping

2 MULs (fixed-point)LOD determination (floating-point)

1 pixel difference, 2 SQRTs, 4 MULs, 1 LOG2LOD bias and clamping (fixed-point)

1 ADD, 1 MIN, 1 MAXLevel determination and level weighting (fixed-point)

1 FLOOR, 1 ADD, 1 FRACTexture sample point

4 MAXs, 4 MINs, 2 FLOORs (fixed-point)Texel locations and bi-linear weights

8 FLOORs, 4 FRACs, 8 ADDs (fixed-point)Addressing

16 integer MADs (integer)Texel combination

56 fixed-point MADs (fixed-point)

Assuming a fixed-point RGBAtri-linear mipmap filteredprojective texture fetch

Page 36: Anatomy of a Texture Fetch

Observations about the Texture Fetch

Lots of ways to implement the mathLots of clever ways to be efficient

Lots more texture operations not considered in this analysis

Compression

Anisotropic filtering

sRGB

Shadow mapping

Arguably TEX instructions are “world’s most CISC instructions”Texture fetches are incredibly complex instructions

Good deal of GPU’s superiority at graphics operations over CPUs is attributable to TEX instruction efficiency

Good for compute too

Page 37: Anatomy of a Texture Fetch

Take Away Information

The GPU texture fetch is about two orders of magnitude more complex than the most complex CPU instruction

And texture fetches are extremely common

Dozens of billions of texture fetches are expected by modern GPU applications

Not just a graphics thingUsing CUDA, you can access textures from within your compute- and bandwidth-intensive parallel kernels

Page 38: Anatomy of a Texture Fetch

GPU Technology Conference 2010Monday, Sept. 20 - Thurs., Sept. 23, 2010San Jose Convention Center, San Jose, California

The most important event in the GPU ecosystemLearn about seismic shifts in GPU computing

Preview disruptive technologies and emerging applications

Get tools and techniques to impact mission critical projects

Network with experts, colleagues, and peers across industries

Opportunities

Call for SubmissionsSessions & posters

Sponsors / ExhibitorsReach decision makers

“CEO on Stage” Showcase for Startups

Tell your story to VCs and analysts

Opportunities

Call for SubmissionsSessions & posters

Sponsors / ExhibitorsReach decision makers

“CEO on Stage” Showcase for Startups

Tell your story to VCs and analysts