PARTIALLY RESIDENT TEXTURES ON NEXT-GENERATION GPUS Bill Bilodeau, Graham Sellers, Karl Hillesland - AMD

2| Partially Resident Textures on Next-Generation AMD GPUs | March 8, 2012

Part 1 – Introduction to HD7970 and Partially Resident Textures, Bill Bilodeau

Part 2 – Implementation in OpenGL, Graham Sellers

Part 3 – Ptex, an example PRT application, Karl Hillesland

AGENDA FOR TODAY’S TALK

PART 1: INTRODUCTION TO THE RADEON HD7970 AND PARTIALLY RESIDENT TEXTURES

Partially Resident Textures (PRTs) are textures that have only portions of the texture stored in GPU video memory

Best known example of virtual texturing (software implementation) is John Carmack’s “MegaTextures”

WHAT ARE PARTIALLY RESIDENT TEXTURES?

Image from id Software’s Rage

World’s first GPU to have dedicated hardware for Partially Resident Textures

Completely new Shader architecture

Improved cache and memory bandwidth

World’s first Direct3D® 11.1 GPU

RADEON HD7970 OVERVIEW

Previous AMD GPUs used VLIW (Very Long Instruction Word) architecture

– Combines instructions into a 4-wide VLIW that gets executed on a SIMD

PREVIOUS SHADER ARCHITECTURE

Shader Instructions -> VLIW Instruction (lanes X Y Z W; threads 0 to 63 each run the same VLIW instructions)

Independent instructions fill all four lanes of one VLIW instruction:

a = b + c; b = c + d; c = d + e; d = e + f;

X: b + c | Y: c + d | Z: d + e | W: e + f

Dependent instructions need one VLIW instruction each, with three lanes idle:

a = b + c; b = a + c; c = b + a; d = c + d;

X: b + c | Y: idle | Z: idle | W: idle
X: a + c | Y: idle | Z: idle | W: idle
X: b + a | Y: idle | Z: idle | W: idle
X: c + d | Y: idle | Z: idle | W: idle
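The lane utilization in the two VLIW cases above can be checked with a small C sketch. It is only a model of the slide's diagram, not a description of the real shader compiler: the function names, the `dep` encoding, and the greedy in-order packing rule are assumptions made for illustration.

```c
#include <assert.h>

/* Greedy in-order packing of scalar ops into VLIW instructions.
 * dep[i] is the index of the earlier op whose result op i reads, or -1.
 * An op cannot share a VLIW instruction with the op that produces its input. */
static int vliw_instructions(const int *dep, int n, int width)
{
    assert(n >= 0 && width > 0);
    int count = 0;
    int start = 0;                          /* first op in the current VLIW instruction */
    for (int i = 0; i < n; ++i) {
        assert(dep[i] < i);                 /* an op may only depend on an earlier op */
        int full   = (i - start) >= width;  /* all lanes already used */
        int hazard = dep[i] >= start;       /* producer sits in the current instruction */
        if (i == 0 || full || hazard) {
            ++count;
            start = i;
        }
    }
    return count;
}

/* Fraction of lanes doing useful work across all issued VLIW instructions. */
static double vliw_utilization(const int *dep, int n, int width)
{
    int count = vliw_instructions(dep, n, width);
    return count ? (double)n / ((double)count * width) : 0.0;
}
```

With `dep = {-1, -1, -1, -1}` (the independent case) the four ops fit in one 4-wide instruction; with `dep = {-1, 0, 1, 2}` (the dependent chain) they need four instructions and only a quarter of the lanes do work.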

64-wide SIMD architecture without VLIW instructions

– No need to combine instructions, since multiple threads can run in parallel

NEW SHADER ARCHITECTURE

Shader Instructions -> ALUs (SIMD lanes S0, S1, S2, ... S63)

a = b + c; b = a + c; c = b + a; d = c + d;

Instruction 1: b + c on every lane
Instruction 2: a + c on every lane
Instruction 3: b + a on every lane
Instruction 4: c + d on every lane

No idle ALUs!

Each Compute unit consists of 4 SIMDs and one Scalar unit

Higher execution efficiency

Simplified logic design

Simplified assembly language

HD7970 has 32 Compute Units

– 4 SIMDs per CU

COMPUTE UNITS ARE THE NEW BASIC BUILDING BLOCK FOR SHADERS

Compute Unit block diagram: SQ (instruction buffers/arbiters); Scalar ALU; SIMD0, SIMD1, SIMD2, SIMD3; LDS (32 banks of 512x32, 64 KB total); Texture Unit (address section); Texture Unit (data section); 16 KB R/W L1

Improved Tessellation Performance

Improved Geometry Shader Performance

Fast depth accept for fully visible triangles, depth bounds testing support

384 bit memory bus

DX11.1

And of course, Partially Resident Texture support!

ADDITIONAL FEATURES OF THE HD7970

Enables application to manage more texture data than can physically fit in a fixed footprint

– A.k.a. Virtual texturing or Sparse texturing

The principle behind PRT is that not all of a texture’s contents are likely to be needed at any given time

– Current render view may only require selected portions of the texture to be resident in memory

– Or selected MIPMap levels

PRT textures only have a portion of their data mapped into GPU-accessible memory at a given time

INTRODUCTION TO PARTIALLY RESIDENT TEXTURES

The PRT texture is chunked into 64 KB tiles

– Fixed memory size

– Not dependent on texture type or format

PRT TILES

Highlighted areas represent texture data that needs highest resolution

Figure labels: Chunked texture; Texture tiles needing to be resident in GPU memory

Images from “Sparse Virtual Textures”, Sean Barrett, GDC 2008

The GPU virtual memory page table translates tiles into a resident texture tile pool

TRANSLATION TABLE

Figure: Texture Map -> Page Table -> Texture Tile Pool (Video Memory, linear storage)

Each page table entry is either mapped (it points to a 64 KB tile in the pool) or unmapped

Images from “Sparse Virtual Textures”, Sean Barrett, GDC 2008
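The translation step can be modeled on the CPU in a few lines of C. This is a minimal sketch of the idea only: the struct layout, the `PRT_UNMAPPED` marker, and the function name are hypothetical, and the real hardware page table format is not described in the slides.

```c
#include <assert.h>
#include <stdint.h>

#define PRT_TILE_BYTES 65536u        /* 64 KB tiles, as stated on the slides */
#define PRT_UNMAPPED   UINT32_MAX    /* hypothetical marker for an unmapped entry */

typedef struct {
    uint32_t tiles_x, tiles_y;       /* texture size measured in tiles */
    const uint32_t *entries;         /* tiles_x * tiles_y pool slots, or PRT_UNMAPPED */
} prt_page_table;

/* Returns 1 and writes the byte offset of the tile within the pool when the
 * tile is mapped; returns 0 (a failed fetch) and leaves the offset untouched
 * when it is not. */
static int prt_translate(const prt_page_table *pt, uint32_t tile_x,
                         uint32_t tile_y, uint64_t *pool_offset)
{
    assert(tile_x < pt->tiles_x && tile_y < pt->tiles_y);
    uint32_t slot = pt->entries[tile_y * pt->tiles_x + tile_x];
    if (slot == PRT_UNMAPPED)
        return 0;
    *pool_offset = (uint64_t)slot * PRT_TILE_BYTES;
    return 1;
}
```

Because the pool is linear storage, a mapped tile's location is just its slot index times the tile size; neighbouring tiles in the texture may land anywhere in the pool.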

MIPMaps can be included in the Texture Tile Pool

TRANSLATION TABLE - MIPMAPS

Figure: Texture Map -> Page Table -> Texture Tile Pool (Video Memory); page entries are either mapped to a 64 KB tile or unmapped

Images from “Sparse Virtual Textures”, Sean Barrett, GDC 2008

“FAILED” TEXEL FETCH CONDITION

How does the application know which texture tiles to upload?

Answer: PRT-specific texture fetch instructions in a shader

– Return a “Failed” texel fetch condition when sampling a PRT pixel whose tile is currently not in the pool

This information is then stored in a render target or UAV

– Texel fetch failed for a given (x,y) tile location

...and then copied to the CPU so that the application can upload the required tiles

App chooses what to render until missing data gets uploaded

“LOD WARNING” TEXEL FETCH CONDITION

PRT fetch condition code can also indicate an “LOD Warning”

The minimum LOD warning is specified by the application on a per texture basis

If a fetched pixel’s LOD is below the specified LOD warning value then the condition code is returned

This functionality is typically used to try to predict when higher-resolution MIP levels are going to be needed

–E.g. Camera getting closer to PRT-mapped geometry

EXAMPLE USAGE

1) App allocates PRT (e.g. 16kx16k DXT1) using PRT API

2) App uploads MIP levels using API calls

3) Shader fetches PRT data at specified texcoords

Two possibilities:

3a) Texel data belongs to a resident (64KB) tile

- Valid color returned, no error code

3b) Texel data points to non-resident tile or specified LOD

- Error/LOD Warning code returned

- Shader writes tile location and error code to RT or UAV

4) App reads RT or UAV and uploads/releases tiles as needed
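Step 4 can be sketched on the CPU side as a pass over the feedback records read back from the RT or UAV. The record layout, the function name, and the per-frame upload budget below are assumptions for illustration; the slides do not prescribe a feedback format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One record written by the shader: which tile was sampled and whether the
 * fetch failed (hypothetical layout). */
typedef struct {
    uint16_t tile_x, tile_y;
    uint8_t  failed;
} prt_feedback;

/* Scans the feedback records and appends each missing tile once to
 * upload_list (as tile_y * tiles_x + tile_x). `queued` must hold
 * tiles_x * tiles_y bytes and start out zeroed. Returns the number of tiles
 * queued for upload. */
static size_t prt_collect_misses(const prt_feedback *fb, size_t count,
                                 uint32_t tiles_x, uint32_t tiles_y,
                                 uint8_t *queued,
                                 uint32_t *upload_list, size_t capacity)
{
    size_t n = 0;
    for (size_t i = 0; i < count; ++i) {
        assert(fb[i].tile_x < tiles_x && fb[i].tile_y < tiles_y);
        uint32_t index = fb[i].tile_y * tiles_x + fb[i].tile_x;
        if (!fb[i].failed || queued[index])
            continue;               /* resident, or already requested */
        if (n == capacity)
            break;                  /* upload budget for this frame is used up */
        queued[index] = 1;
        upload_list[n++] = index;
    }
    return n;
}
```

Many pixels usually report the same missing tile, so the `queued` flags keep each tile from being uploaded more than once.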

PRT ADVANTAGES VS SOFTWARE IMPLEMENTATION

PRT

Ease of implementation
• Eliminates the complexity and limitations of SW solutions

Full filtering support
• Includes anisotropic filtering

Full-speed filtering
• SW solution requires “manual” filtering in pixel shader
• Can be quite costly if anisotropic filtering is used

Don’t go overboard with PRT allocation!
• Page table entry size is 4 DWORDs
• Page table entries have to be resident in video memory

Software Implementation
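The page table cost is easy to estimate. The sketch below assumes one 4-DWORD (16-byte) entry per 64 KB tile and counts only a single mip level; the helper names are hypothetical, and a real allocation may carry additional overhead.

```c
#include <assert.h>
#include <stdint.h>

#define PRT_TILE_BYTES       65536u   /* 64 KB per tile */
#define PRT_PAGE_ENTRY_BYTES 16u      /* 4 DWORDs per page table entry */

/* Size in bytes of a width x height level at a given number of bits per texel. */
static uint64_t level_bytes(uint64_t width, uint64_t height, uint32_t bits_per_texel)
{
    assert(bits_per_texel > 0);
    return width * height * bits_per_texel / 8;
}

/* Tiles needed to cover that many bytes (rounded up). */
static uint64_t prt_tile_count(uint64_t texture_bytes)
{
    return (texture_bytes + PRT_TILE_BYTES - 1) / PRT_TILE_BYTES;
}

/* Video memory consumed by the page table entries for those tiles. */
static uint64_t prt_page_table_bytes(uint64_t texture_bytes)
{
    return prt_tile_count(texture_bytes) * PRT_PAGE_ENTRY_BYTES;
}
```

For a 16k x 16k DXT1 base level (4 bits per texel) this gives 128 MB of texture data, 2048 tiles, and 32 KB of page table entries, all of which must stay in video memory even when no tile is resident.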

PART 2: IMPLEMENTATION IN OpenGL

AMD_sparse_texture Extension

Partially Resident Textures exposed in OpenGL via extension

Two design goals for the extension

– Minimally invasive to the API

Easy to retrofit into existing application

Plays well with non-sparse textures

– Easy fallback path

Most of the same code will work in the absence of the extension

Two parts to the extension

– Update to the API – 1 function, a handful of tokens

– Update to the shading language

OPENGL EXTENSION | AMD_sparse_texture

Use of immutable texture storage

This is the existing OpenGL immutable storage API – declare storage, specify image data

UPLOAD TEXTURES | Example Using Existing OpenGL API

GLuint tex;

glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexStorage2D(GL_TEXTURE_2D, 10, GL_RGBA8, 1024, 1024);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);

Use of sparse texture storage

glTexStorageSparseAMD is the one new function in the extension

– Notice very little difference to previous API

UPLOAD TEXTURES | Example Using New OpenGL Extension

GLuint tex;

glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexStorageSparseAMD(GL_TEXTURE_2D, GL_RGBA, 1024, 1024, 1, 1, GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);

Previous example used glTexSubImage2D

– Upload sub-region of the texture

– Physical pages allocated on demand by the OpenGL driver

– Unused pages remain free

Enough storage for two 256x256 regions allocated

MAKE PAGES RESIDENT | Reuse Existing API

glTexStorageSparseAMD(GL_TEXTURE_2D, GL_RGBA, 1024, 1024, 1, 10, GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, data1);
glTexSubImage2D(GL_TEXTURE_2D, 0, 768, 768, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, data2);
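How many physical pages an upload touches depends on how the region lines up with page boundaries. The arithmetic can be sketched as below; the helper names are hypothetical, and the 128x128-texel page used in the example is an assumption (the real value should come from the GL_VIRTUAL_PAGE_SIZE_{X,Y}_AMD queries).

```c
#include <assert.h>
#include <stdint.h>

/* Pages spanned along one axis by the texel range [start, start + length). */
static uint32_t pages_spanned(uint32_t start, uint32_t length, uint32_t page)
{
    assert(length > 0 && page > 0);
    return (start + length - 1) / page - start / page + 1;
}

/* Pages touched by a w x h texel region whose corner is at (x, y). */
static uint32_t pages_for_region(uint32_t x, uint32_t y, uint32_t w, uint32_t h,
                                 uint32_t page_w, uint32_t page_h)
{
    return pages_spanned(x, w, page_w) * pages_spanned(y, h, page_h);
}
```

With the assumed 128x128 page, each aligned 256x256 region touches 4 pages, so the two uploads above would back 8 pages in total. The same 256x256 region placed at (64, 64) would straddle 9 pages, which is why page-aligned uploads are cheaper.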

Passing NULL to glTexSubImage2D makes pages non-resident

– Driver returns physical pages to the pool

FREE PHYSICAL PAGES | Again, Reuse Existing API

glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, NULL);

Sparse Textures rely on VM subsystem

– Pages are 64KB in size on Southern Islands

Note size is measured in bytes, not texels

– Texel size of a page depends on texture format

PAGE SIZES | Determining Page Sizes
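Since a page is a fixed 64 KB, the number of texels it holds follows directly from the format's bits per texel. A minimal sketch (the function name is hypothetical; how the texels are split into X and Y page dimensions is hardware-specific and is not modeled here):

```c
#include <assert.h>
#include <stdint.h>

#define PRT_PAGE_BYTES 65536u   /* 64 KB pages on Southern Islands */

/* Texels that fit in one page for a format with the given bits per texel. */
static uint32_t texels_per_page(uint32_t bits_per_texel)
{
    assert(bits_per_texel > 0);
    return PRT_PAGE_BYTES * 8u / bits_per_texel;
}
```

For example, a 32-bit format such as RGBA8 fits 16384 texels per page, a 64-bit format such as RGBA16F fits 8192, and DXT1 at 4 bits per texel fits 131072.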

Reuse existing API: glGetInternalformativ

– New OpenGL tokens – GL_VIRTUAL_PAGE_SIZE_{X,Y,Z}_AMD

Given a target (texture dimensionality) and format, returns the page size

– It is not necessary to create a texture to get this information

PAGE SIZE | Retrieving Page Size from OpenGL

GLint page_size_x;

// bufSize is the number of GLint values that may be written, not a byte count
glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8, GL_VIRTUAL_PAGE_SIZE_X_AMD, 1, &page_size_x);

Highest resolution LOD requires multiple pages

Each LOD requires fewer and fewer pages

Eventually, one LOD does not fill a page

– Now what?

At some point, we must make all LODs resident

– But which LOD?

Use glGetInternalformativ to retrieve the lowest sparse level for a given target/format

– All levels below this reside in the same page and share residency

MIPMAPS | Dealing With Small Textures

GLint min_sparse_level;

glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA16F, GL_MIN_SPARSE_LEVEL_AMD, 1, &min_sparse_level);
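The "fewer and fewer pages" progression can be worked through with plain arithmetic. The sketch below is only a model: the helper names and the page dimensions passed in are assumptions, and the level reported by the driver through GL_MIN_SPARSE_LEVEL_AMD is authoritative and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one dimension at a mip level (never smaller than 1 texel). */
static uint32_t mip_size(uint32_t base, uint32_t level)
{
    assert(level < 32);
    uint32_t s = base >> level;
    return s ? s : 1;
}

/* Pages needed to back a whole mip level. */
static uint32_t pages_in_level(uint32_t base_w, uint32_t base_h, uint32_t level,
                               uint32_t page_w, uint32_t page_h)
{
    assert(page_w > 0 && page_h > 0);
    uint32_t w = mip_size(base_w, level), h = mip_size(base_h, level);
    return ((w + page_w - 1) / page_w) * ((h + page_h - 1) / page_h);
}

/* First level that is smaller than one page in either dimension. */
static uint32_t first_level_below_page(uint32_t base_w, uint32_t base_h,
                                       uint32_t page_w, uint32_t page_h)
{
    uint32_t level = 0;
    while (level < 31 &&
           mip_size(base_w, level) >= page_w &&
           mip_size(base_h, level) >= page_h)
        ++level;
    return level;
}
```

For a 1024x1024 texture and an assumed 128x128 page, levels 0 to 3 need 64, 16, 4 and 1 pages, and level 4 (64x64) is the first one that no longer fills a page.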

To assist in streaming we include a per-texture low water mark

– Set this to the highest resolution LOD that’s fully resident

– Once you hit this, you’ll get a signal in the shader

Returned data is still valid

Signal says it’s time to start streaming the next mip

Exposed using the glTexParameter API

– Here, an LOD warning will be returned to the shader if hardware attempts to access LOD 4 or lower

More on residency returns later...

LOD WARNING | Low Water Mark

glTexParameteri(GL_TEXTURE_2D, GL_MIN_WARNING_LOD_AMD, 4);

It is possible to render to a PRT using an FBO

Writes to unmapped regions are simply dropped

RENDERING TO PRT | Attach PRT to FBO

GLuint prt, fbo;

glGenTextures(1, &prt);
glBindTexture(GL_TEXTURE_2D, prt);
glTexStorageSparseAMD(GL_TEXTURE_2D, GL_RGBA, 1024, 1024, 1, 1, GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, prt, 0);

Applications can read PRTs to CPU memory using existing APIs

– Call glGetTexImage to read the entire content back

– Bind to FBO and use glReadPixels or glBlitFramebuffer

Reads to system memory or into another FBO, respectively

READING FROM PRT | Retrieving Data from PRTs

glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, data);

glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, prt, 0);
glReadPixels(0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
glBlitFramebuffer(0, 0, 1024, 1024, 0, 0, 128, 128, GL_COLOR_BUFFER_BIT, GL_LINEAR);

There are some restrictions on the use of sparse textures

– Dimensions of the base level must be integer multiples of the page size (GL_VIRTUAL_PAGE_SIZE_{X,Y,Z}_AMD)

This means... no sparse textures below this size

– No buffer textures or “TBOs” – another extension is coming for that!

– No depth or stencil textures, nor MSAA textures

RESTRICTIONS | Mostly Everything Works

Virtual address space is extremely large – 10’s to 100’s of gigabytes

– You will run out eventually, but it’ll take a while

Physical memory is still limited

– glTexSubImage2D etc., may fail

– Draw calls may fail

Feel free to create a 4k x 4k x 4k volume texture

– Don’t try to make it all resident at the same time!

There are no sparse read-backs

– glGetTexImage could read gigabytes of data back

– This will fail

MANAGING FAILURE | Memory is not Unlimited

First and most important:

SPARSE TEXTURES IN SHADERS | Extending GLSL

IT IS NOT NECESSARY TO MAKE SHADER CHANGES TO USE SPARSE TEXTURES

Basic type for textures in GLSL is the ‘sampler’

– Several types of samplers exist... sampler2D, sampler3D, samplerCube, sampler2DArray, etc.

– We didn’t add any new sampler types

PRTs look like regular textures in the shader

Textures are read using the ‘texture’ built-in function, its overloads and variants

– We didn’t add any overloads

SPARSE TEXTURES IN SHADERS | Extending GLSL

gvec4 texture(gsampler1D sampler, float P [, float bias]);
gvec4 texture(gsampler2D sampler, vec2 P [, float bias]);
gvec4 texture(gsampler2DArray sampler, vec3 P [, float bias]);
gvec4 textureLod(gsampler2D sampler, vec2 P, float lod);
gvec4 textureProj(gsampler2D sampler, vec4 P [, float bias]);
gvec4 textureOffset(gsampler2D sampler, vec2 P, ivec2 offset [, float bias]);
// ... etc.

Adding more overloads to existing functions was difficult

– Need to return a status code and a texel

– Need user-specified defaults with conditional-move-like functionality

– Optional parameters in existing overloads made this very difficult

Added new built-in functions

– New built-in functions return status code

– New built-in functions return texel data via inout parameters

– Most existing texture functions have a sparseTexture equivalent

Non-PRTs work with new functions

– Will appear as fully-resident PRT

EXTENDING GLSL | New Built-in Functions

int sparseTexture(gsampler2D sampler, vec2 P, inout gvec4 texel [, float bias]);
int sparseTextureLod(gsampler2D sampler, vec2 P, float lod, inout gvec4 texel);
// ... etc.

All sparseTexture functions return two pieces of data:

– Texel data via inout parameter

– Residency status code

Texel data returned in inout parameter

– If texel fetch fails, old data remains in variable

– Think of it as a CMOV type operation

Return code is hardware-dependent bit-field information

– More built-in functions for decoding status codes

– This allows us to extend this further in the future, or to change the implementation

EXTENDING GLSL | sparseTexture Functions

int sparseTexture(gsampler2D sampler, vec2 P, inout gvec4 texel [, float bias]);

Texel data is returned in inout parameter

– No direct support for ‘default value’ behavior

– This is emulated in the shader:

Note that regular texture fetch functions work on PRTs too:

– Value of texel is undefined if you miss ...

... but feel free to use on known-resident data (atlases, explicit LoD, etc.)

sparseTexture FUNCTIONS | Texture Data Return

vec4 texel = vec4(1.0, 0.0, 0.7, 1.0); // Default value

sparseTexture(s, texCoord, texel);

// On success, texel contains texture data. On failure, it has the shader-supplied
// default value in it (pinkish magenta here).

vec4 texel = texture(s, texCoord);

Residency data is bit-packed into the return value from the fetch

After this, code can be interpreted by three additional functions:

sparseTexture FUNCTIONS | Residency Data Return

vec4 texel = vec4(1.0, 0.0, 0.7, 1.0); // Default value
int code;

code = sparseTexture(s, texCoord, texel);

bool sparseTexelResident(int code);
bool sparseTexelMinLodWarning(int code);
int sparseTexelLodWarningFetch(int code);

sparseTexelResident simply indicates whether the data fetched is valid

Returns true if data is valid, false otherwise

Texel miss is generated if any required sample is not resident, including:

– Texels required for bilinear or trilinear sampling

– Missing mip maps

– Anisotropic filter taps

It is up to the shader to ‘do the right thing’

– Fall back to lower mips

– Write out to an image or framebuffer attachment

– etc., etc.

RESIDENCY DATA | sparseTexelResident

bool sparseTexelResident(int code);
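The "fall back to lower mips" strategy amounts to walking toward coarser levels until a resident one is found. The C sketch below models that decision on the CPU; the `resident` array and the function name are hypothetical, and in a real shader the check would come from sparseTexelResident on each attempted fetch.

```c
#include <assert.h>

/* resident[i] is nonzero when mip level i (0 = highest resolution) is resident.
 * Returns the first resident level at or coarser than `wanted`, or -1 if
 * nothing usable is resident. */
static int fallback_level(const unsigned char *resident, int level_count, int wanted)
{
    assert(level_count > 0 && wanted >= 0 && wanted < level_count);
    for (int level = wanted; level < level_count; ++level)
        if (resident[level])
            return level;
    return -1;
}
```

Keeping the coarsest levels resident at all times guarantees that this search always succeeds, so the shader never has to show the default value.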

sparseTexelMinLodWarning returns true if a min LOD warning was generated

– This occurs when generating the returned texel required fetching from an LOD lower than the low-water mark specified by the application

– This can be a signal to the application to start streaming more mip levels

RESIDENCY DATA | sparseTexelMinLodWarning

bool sparseTexelMinLodWarning(int code);

sparseTexelLodWarningFetch returns the LOD that caused the low-watermark warning to be generated

– This also causes sparseTexelMinLodWarning to return true

– sparseTexelLodWarningFetch returns 0 if the warning was not hit

RESIDENCY DATA | sparseTexelLodWarningFetch

int sparseTexelLodWarningFetch(int code);

Drop-in replacement for traditional 2D Sparse Virtual Texture (SVT)

– Well, almost – maximum texture size hasn’t increased

Very large texture arrays

– Sparsely populate array

– Can almost eliminate texture binds in some applications

Volume textures + ray marching

– Sparse or homogeneous media

– Default value is maximum step distance for ray marching distance fields

Arrays of variable sized textures

– Make a large array, but populate different mip levels in each slice

– Store LOD bias per array slice in an auxiliary array (UBO, for example)

Etc., etc., etc.

EXAMPLE USE CASES | What Can I Use This For?

PART 3: PRT PTEX

PTex Using Sparse Textures

Ptex: Per-face Texture Mapping for Production Rendering

[Burley and Lacewell, 2008]

No UV setup (it’s implicit)

No Seams

Per-Patch Resolution Control

Out-of-core Performance Advantages

PTEX | Introduction

Ptex: Per-face Texture Mapping for Production Rendering

[Burley and Lacewell, 2008]

Per-face textures + MIPs

Adjacency for filtering

PTEX | Introduction

BORDERS FOR FILTERING

Face Texture A Face Texture B

MANUAL TRILINEAR FILTERING

Diagram labels: Resolution Lookup (ddx ddy) -> floor, floor+1, frac -> Lerp
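The floor / floor+1 / frac / Lerp labels describe the usual manual trilinear blend between two MIP levels. A minimal C sketch of that step follows; it assumes the fractional LOD has already been derived from the screen-space derivatives (that part is not modeled), and all names are hypothetical.

```c
#include <assert.h>

typedef struct {
    int   level_lo;   /* floor(lod) */
    int   level_hi;   /* floor(lod) + 1, clamped to the last level */
    float frac;       /* blend weight toward level_hi */
} trilinear_levels;

/* Splits a fractional LOD into the two levels to sample and a blend weight. */
static trilinear_levels select_levels(float lod, int level_count)
{
    assert(level_count > 0);
    float max_lod = (float)(level_count - 1);
    if (lod < 0.0f) lod = 0.0f;
    if (lod > max_lod) lod = max_lod;
    trilinear_levels t;
    t.level_lo = (int)lod;   /* lod >= 0 here, so truncation equals floor */
    t.level_hi = t.level_lo + 1 < level_count ? t.level_lo + 1 : t.level_lo;
    t.frac = lod - (float)t.level_lo;
    return t;
}

/* Linear interpolation between the samples taken from the two levels. */
static float lerpf(float a, float b, float t)
{
    return a + (b - a) * t;
}
```

For an LOD of 2.5 this selects levels 2 and 3 with a weight of 0.5, so the final value is the average of the two bilinear samples.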

PRT PTEX

Packed in one texture array

– Slice per resolution

– Resolution includes MIPs

– Cannot fit in standard MIP chain

– Easy lookups

– Easy resolution management

– Still one texture

PRT PTEX PRAGMATICS

Better organization possibilities

– Pack pages

– Scaled squares

Other Methods

– Packed Ptex – all in one texture slice

– Face per slice, array per resolution

MULTIRES SLICES

MIP FALLBACK

Diagram labels: Resolution Lookup (ddx ddy) -> floor, floor+1, frac -> Lerp

Demo

Trademark Attribution

AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.

©2012 Advanced Micro Devices, Inc. All rights reserved.