Glift: Generic, Efficient Random-Access GPU Data Structures
Aaron Lefohn
University of California, Davis

Problem Statement
• Goal
  • Simplify creation and use of random-access GPU data structures for graphics and GPGPU programming
• Contributions
  • Abstraction for GPU data structures
  • Glift template library
  • Iterator computation model for GPUs

Collaborators
• Joe Kniss, University of Utah
• Robert Strzodka, Stanford University
• Shubhabrata Sengupta, University of California, Davis
• John Owens, University of California, Davis

Many Interesting GPU Data Structures
• Photon map (Purcell)
• Sparse matrix (Bolz, Krueger)
• Sparse simulation grid (Lefohn)
• Polycube (3D grid, cubeMap, …) (Tarini)
• N-tree (Lefebvre)
• But…
  • No way to distribute/reuse implementations
  • Complexity stifles innovation
Motivation

CPU Software Development
• Benefits
  • Algorithms and data structures expressed in problem domain
  • Decouple algorithms and data structures
  • Code reuse
[Diagram: Application, Algorithm Library, Data Structure Library, CPU Memory]

GPU Software Development
• Problems
  • Code is a tangled mess of algorithm and data structure access
  • Algorithms expressed in GPU memory domain
  • No code reuse
[Diagram: Application (data structure and algorithm), GPU Memory]

GPU Data Structures
• What’s Missing?
  • Standalone abstraction for GPU data structures for graphics or GPGPU programming
[Diagram: C++, Cg, OpenGL; STL, ???; Sh, Scout, Brook]

Simple Example
• CPU (C++)
    float srcData[10][10][10];
    float dstData[10][10][10];
    … initialize data …
    for (size_t z = 1; z < 10; ++z) {
        for (size_t y = 1; y < 10; ++y) {
            for (size_t x = 1; x < 10; ++x) {
                dstData[z][y][x] = log( 1 + srcData[z][y][x] );
            }
        }
    }

We Want To Transform This…
• GPU (Cg and C++)
    float3 getAddr3D( float2 winPos, float2 winSize, float3 sizeConst3D ) {
        float3 addr3D;
        float2 winPosInt = floor(winPos);
        float addr1D = winPosInt.y * winSize.x + winPosInt.x;
        addr3D.z = floor( addr1D / sizeConst3D.z );
        addr1D -= addr3D.z * sizeConst3D.z;
        addr3D.y = floor( addr1D / sizeConst3D.y );
        addr3D.x = addr1D - addr3D.y * sizeConst3D.y;
        return addr3D;
    }

    float3 logAlg( uniform samplerRECT data, uniform float2 winSize, uniform float3 sizeConst3D,
                   float2 winPos : WPOS ) : COLOR
    {
        float3 addr3D = getAddr3D( winPos, winSize, sizeConst3D );
        float value = texRECT( data, addr3D );
        return log( 1 + value );
    }

    GLuint srcDataId = 1;
    glBindTexture(GL_TEXTURE_RECTANGLE_ARB, srcDataId);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_WRAP_S, GL_CLAMP);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_WRAP_T, GL_CLAMP);
    glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_LUMINANCE32F_ARB, 40, 40, 0, GL_LUMINANCE, GL_FLOAT, NULL);

    GLuint dstDataId = 2;
    glBindTexture(GL_TEXTURE_RECTANGLE_ARB, dstDataId);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_WRAP_S, GL_CLAMP);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_WRAP_T, GL_CLAMP);
    glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_LUMINANCE32F_ARB, 40, 40, 0, GL_LUMINANCE, GL_FLOAT, NULL);
    … initialize data …

Into This.
• GPU (C++ and Cg with Glift)
    typedef glift::ArrayGpu<vec3i, vec1f> ArrayType;
    ArrayType src( vec3i(10,10,10) );
    ArrayType dst( vec3i(10,10,10) );
    … initialize data …

    float logAlg( ElementIter srcData ) : COLOR
    {
        return log( 1 + srcData.value() );
    }

Overview
• Motivation and Previous Work
• Abstraction
• Implementation
• Examples
• Conclusions

Abstraction Design Goals
• GPU data structure abstraction that
  • Enables easy creation of new structures
  • Is a minimal abstraction of the GPU memory model
  • Separates data structures and algorithms
  • Encourages efficiency
Abstraction

Building the Abstraction
• Approach
  • Bottom-up, working towards STL-like syntax
  • Identify common patterns in GPU papers and code
  • Inspired by
    • STL, Boost, Brook, STAPL, Stepanov

What is the GPU Memory Model?
• CPU interface
  • glTexImage: malloc
  • glDeleteTextures: free
  • glTexSubImage: memcpy CPU -> GPU
  • glGetTexSubImage*: memcpy GPU -> CPU
  • glCopyTexSubImage: memcpy GPU -> GPU
  • glBindTexture: read-only parameter bind
  • glFramebufferTexture: write-only parameter bind
* Does not exist. Emulate with glReadPixels.

What is the GPU Memory Model?
• GPU interface (shown in Cg)
  • uniform samplerND: data structure param declaration
  • texND(tex, addr): random-access read
  • varying floatN stream: stream parameter declaration
  • stream: stream read

GPU Data Structure Abstraction
• Factor GPU data structures into
  • Physical memory
  • Virtual memory
  • Address translator
  • Iterators

Physical Memory
• Native GPU textures
  • Choose based on algorithm efficiency requirements
  • 1D, 2D, 3D, Cube, Mip
    • Dimensionality
    • Read-only vs. read-write
    • Point-sample vs. filtering
    • Maximum size

Virtual Memory
• Virtual N-D address space
  • Choose based on problem space of algorithm
  • Defined by physical memory and address translator
[Diagram: virtual representation of memory (3D grid), translated to 3D native memory, 2D slices, or a flat 3D texture]

Address Translator
• Mapping between physical and virtual addresses
• Core of data structure
• Small amount of code defines all required CPU and GPU memory interfaces
[Diagram: Physical Address, Virtual Address]

Address Translator
• Core of data structure
  • Extension point for creating new structures
  • Must define
      translate(…)
      translate_range(…)
Implementation
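A minimal C++ sketch of an analytic address translator, assuming a 3D virtual array packed row by row into 2D physical memory. The names Addr3, Addr2, Flat3DTranslator and inverse are hypothetical and are not Glift's API; translate_range is omitted because the slides do not specify its semantics.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical address types (stand-ins, not Glift's vec3i/vec2i).
struct Addr3 { std::size_t x, y, z; };
struct Addr2 { std::size_t x, y; };

// Analytic 3D-to-2D translator: flattens the virtual address to a 1D
// offset, then wraps that offset into rows of the 2D physical memory.
class Flat3DTranslator {
public:
    Flat3DTranslator(Addr3 virtSize, std::size_t physWidth)
        : virtSize_(virtSize), physWidth_(physWidth) {}

    // Virtual (3D) -> physical (2D). O(1) memory and compute.
    Addr2 translate(Addr3 va) const {
        const std::size_t addr1D =
            (va.z * virtSize_.y + va.y) * virtSize_.x + va.x;
        return Addr2{ addr1D % physWidth_, addr1D / physWidth_ };
    }

    // Physical (2D) -> virtual (3D): the inverse mapping.
    Addr3 inverse(Addr2 pa) const {
        std::size_t addr1D = pa.y * physWidth_ + pa.x;
        const std::size_t sliceSize = virtSize_.x * virtSize_.y;
        Addr3 va;
        va.z = addr1D / sliceSize;
        addr1D -= va.z * sliceSize;
        va.y = addr1D / virtSize_.x;
        va.x = addr1D - va.y * virtSize_.x;
        return va;
    }

private:
    Addr3 virtSize_;
    std::size_t physWidth_;
};
```

With a 10x10x10 virtual array and a physical width of 40, virtual address (3, 2, 1) has 1D offset 123 and lands at physical address (3, 3).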

Address Translator Classifications
• Representation
  • Analytic / Discrete
• Memory Complexity
  • O(1), O(log N), O(N), …
• Compute Complexity
  • O(1), O(log N), O(N), …
• Compute Consistency
  • Uniform vs. non-uniform
• Total / Partial
  • Complete vs. sparse
• One-to-one / Many-to-one
  • Uniform vs. adaptive

Data Structure Examples
• Brook streams (Buck et al. 2004)
  • Physical address: 2D
  • Virtual address: N-D
  • Address translator: ND-to-2D
    • Analytic
    • O(1) memory
    • O(1) compute
    • Uniform consistency
    • Total, uniform mapping
[Diagram: 1D virtual, 2D physical]

Data Structure Examples
• Dynamic sparse 3D grid (Lefohn et al. 2003)
  • Physical address: 2D
  • Virtual address: 3D
  • Address translator: 3D page table
    • Discrete
    • O(N) memory
    • O(1) compute
    • Uniform consistency
    • Partial, uniform mapping
[Diagram: Virtual Domain, Page Table, Physical Memory]

Data Structure Examples
• Photon Map (kNN-grid) (Purcell et al. 2003)
  • Physical address: 2D
  • Virtual address: 3D
  • Address translator: 3D page table
    • Variable sized phys pages
    • “Grid of lists”
    • Discrete
    • O(N) memory
    • O(L) compute
    • Non-uniform consistency
    • Partial, adaptive mapping
Image from “Implementing Efficient Parallel Data Structures on GPUs,” Lefohn et al., GPU Gems II, ch. 33, 2005

Glift Iterators
• We’ve so far only discussed data access
• What about data structure traversal?

Iterators
• Separate algorithms and data structures
  • Minimal interface between data and algorithm
  • Required for GPGPU use of data structure
  • Encapsulate GPGPU optimizations

Iterators
• Abstract data access and traversal
    DataStructureType::iterator it;
    for (it = data.begin(); it != data.end(); ++it)
    {
        *it = -(*it);
    }
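A small C++ sketch of the same loop written once against the iterator interface, so that it works for any container. The function name negate_range is hypothetical; only standard-library containers are assumed.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Negates every element in [first, last). The algorithm only needs
// read-write access and forward traversal, so it never sees how the
// underlying data structure stores its elements.
template <typename Iter>
void negate_range(Iter first, Iter last) {
    for (Iter it = first; it != last; ++it) {
        *it = -(*it);
    }
}
```

The same function can be called on a std::vector<float> or a std::list<int>, which is the decoupling of algorithm and data structure that the slide describes.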

Glift Iterators
• Address iterators
  • Iterator value is N-D address
  • GPU interpolants
• Element iterators
  • Iterator value is data structure element
  • C/C++ pointer, STL iterator, streams

Element Iterator Concepts
• Permission
  • Read-only, write-only, read-write
• Access region
  • Single, neighborhood, random
• Traversal
  • Forward, backward, parallel range

Which Element Iterators?
• Read-only, single access, range iterator
  • GPU stream input
• Read-only, random-access, range iterator
  • GPU texture input
• Write-only, single access, range iterator
  • GPU render target

Example 1: “Before” and “After” Glift
• Transform GPU code with Glift

Simple Example
• 3D Array with 2D physical memory
  CPU (C++)
    float srcData[10][10][10];
    float dstData[10][10][10];
    … initialize data …
    for (size_t z = 1; z < 10; ++z) {
        for (size_t y = 1; y < 10; ++y) {
            for (size_t x = 1; x < 10; ++x) {
                dstData[z][y][x] = srcData[z-1][y-1][x-1];
            }
        }
    }

Example 1: Shader w/out Glift
    // Physical-to-virtual address translation
    float3 physToVirt( float2 pa, float2 physSize, float3 virtSizes ) {
        float3 va;
        float addr1D = pa.y * physSize.x + pa.x;
        va.z = floor( addr1D / virtSizes.z );
        addr1D -= va.z * virtSizes.z;
        va.y = floor( addr1D / virtSizes.y );
        va.x = addr1D - va.y * virtSizes.y;
        return va;
    }

    // Virtual-to-physical address translation
    float2 virtToPhys( float3 va, float2 physSize, float3 virtSizes ) {
        float addr1D = dot( va, virtSizes );
        float normAddr1D = addr1D / physSize.x;
        float2 pa = float2( frac(normAddr1D) * physSize.x, normAddr1D );
        return pa;
    }

    float3 main( uniform samplerRECT physMem, uniform float2 physSize, uniform float3 virtSizes,
                 float2 pa : WPOS ) : COLOR
    {
        float3 va = physToVirt( floor(pa), physSize, virtSizes );
        float3 neighborAddr = va - float3(1, 1, 1);
        // Physical memory read
        return texRECT( physMem, virtToPhys(neighborAddr, physSize, virtSizes) );
    }

Example 1: Glift Components
• The same shader as on the previous slide, annotated with the Glift component that replaces each part
  • physToVirt (physical-to-virtual address translation): Address Iterator
  • virtToPhys (virtual-to-physical address translation): VirtMem
  • texRECT (physical memory read): VirtMem

Example 1: GPU Shader with Glift
• Cg Usage
    float3 main( uniform VMem3D srcData,
                 AddrIter3D iter ) : COLOR
    {
        float3 va = iter.value();
        return srcData.vTex3D( va - float3(1,1,1) );
    }

Example 1: Glift Data Structures
• C++ Usage
    vec3i origin(0,0,0);
    vec3i size(10,10,10);

    typedef ArrayGpu<vec3i, vec1f> ArrayType;
    ArrayType srcData( size );
    ArrayType dstData( size );

    … initialize dataPtr …
    srcData.write( origin, size, dataPtr );

    typedef ArrayType::addr_trans AddrTransType;
    AddrTransType::gpu_range it =
        dstData.addr_trans().gpu_range( origin, size );

    it.bind_for_read( iterCgParam );
    srcData.bind_for_read( srcCgParam );
    dstData.bind_for_write( COLOR0, myFrameBufferObject );

    exec_gpu_iterators( it );

Overview
• Motivation
• Abstraction
• Implementation
• Examples
• Conclusions

Glift Components
[Diagram: Application; Container Adaptors; VirtMem; PhysMem and AddrTrans; C++ / Cg / OpenGL]

Glift Design Goals
• Efficiency
  • Static polymorphism (C++ and Cg)
  • Cg program specialization
  • Cg compiler optimizations
• Easy, incremental adoption
  • Integrate with Cg/OpenGL/C++
  • STL-like and texture-like interfaces
  • Use components alone or composited
• Easily extensible
  • Create new structure by:
    • Change behavior of existing address translator
    • New address translator
    • New container adaptor
• CPU/GPU interoperability
  • Unified C++/Cg code base
  • Map memory to CPU or GPU
  • CPU and GPU iterators

C++/Cg Integration
• Each component defines C++ and Cg code
  • C++ objects have Cg struct representation
  • Stringified Cg parameterized by C++ templates
• Cg “template” instantiation
  • Insert generated Glift source code into shader
      glift::cgGetTemplateType<MyDataStructType>();
      glift::cgInstantiateParameter(…);
  • All other compilation/loading/binding identical to standard shader
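One way to picture “stringified Cg parameterized by C++ templates” is a C++ class template that emits Cg source text specialized for its element type. This is only an illustrative sketch: CgTypeTraits, Float4, Array2D and cgSource are hypothetical names, and Glift's actual code generation is not shown in the slides.

```cpp
#include <cassert>
#include <string>

// Hypothetical trait: maps a C++ element type to the Cg type name and
// to the swizzle that extracts it from the float4 a texture read returns.
template <typename T> struct CgTypeTraits;

template <> struct CgTypeTraits<float> {
    static const char* type()    { return "float"; }
    static const char* swizzle() { return ".x"; }
};

struct Float4 { float x, y, z, w; };  // stand-in for a 4-component vector type

template <> struct CgTypeTraits<Float4> {
    static const char* type()    { return "float4"; }
    static const char* swizzle() { return ""; }
};

// Hypothetical component: the C++ template parameter selects the element
// type used in the generated Cg struct.
template <typename Elem>
struct Array2D {
    static std::string cgSource(const std::string& structName) {
        const std::string type = CgTypeTraits<Elem>::type();
        const std::string swizzle = CgTypeTraits<Elem>::swizzle();
        return "struct " + structName + " {\n"
               "    samplerRECT physMem;\n"
               "    " + type + " read(float2 addr) {\n"
               "        return texRECT(physMem, addr)" + swizzle + ";\n"
               "    }\n"
               "};\n";
    }
};
```

The generated string would be inserted into the shader source before compilation, so each instantiation is specialized at compile time rather than branching at run time.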

Cg Compilation Example
• Cg code
    float4 main( uniform VMem3D octree,
                 float3 coord ) : COLOR
    {
        return octree.vMem3D(coord);
    }
• C++ code
    typedef OctreeGPU<vec4ub> octree_type;
    GliftType type = cgGetTemplateType<octree_type>();
    CGprogram prog = cgCreateProgram(…);
    prog = cgInstantiateParameter(prog, "octree", type);
    cgCompileProgram(prog);

Overview
• Motivation and previous work
• Abstraction
• Case Study
  • Adaptive shadow maps and octree 3D paint
• Conclusions

Example 2: Adaptive Shadow Maps
• Show Glift usage with
  • Complex application
  • Complex data structure
Application

Example 2: Adaptive Shadow Maps
• Fernando et al., ACM SIGGRAPH 2001
  • Elegant solution to shadow map aliasing
  • Quadtree of small shadow maps
  • Shadow maps need resolution only on shadow boundary
  • Required resolution determined by projected area of screen space pixel into light space

Adaptive Shadow Maps
• Why Adaptive Shadow Maps with Glift?
  • Many recent (2004) shadow papers cite ASMs as a high quality solution but not possible on graphics hardware
  • Algorithm is simple. Data structure is hard.

Adaptive Shadow Map Algorithm
• Iterative refinement algorithm
  • Identify shadow pixels w/ resolution mismatch
  • Create small shadow map “pages” at requested resolution
• Shadow lookup
  • Compute shadow map coordinate and resolution
  • Lookup in ASM (tree of small shadow map pages)
• ASM depends on both camera and light position!

ASM Data Structure Requirements
• Adaptive
• Multiresolution
• Fast, parallel random-access read
  • 2x2 native Percentage Closer Filtering (PCF)
  • Trilinear interpolated mipmapped PCF
• Fast, parallel write
• Fast, parallel insert and erase

ASM Data Structure
• Start with page table address translator
  • Coarse, uniform discretization of virtual domain
  • O(N) memory
  • O(1) computation
  • O(1) insert
  • O(1) erase
  • Uniform consistency
  • Partial mapping (sparse)

ASM Data Structure
• Page table example
    vpn = va / pageSize
    ppa = pageTable(vpn)
    off = va % pageSize
    pa  = ppa + off
[Diagram: Virtual Domain, Page Table, Physical Memory]
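The page table lookup above can be sketched in C++ as follows. This is a CPU-side illustration under stated assumptions, not Glift code: addresses are 1D for brevity (the slides use N-D addresses), and the names PageTable and kUnmapped are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel for virtual pages with no physical page (partial mapping).
static const std::size_t kUnmapped = static_cast<std::size_t>(-1);

class PageTable {
public:
    PageTable(std::size_t numVirtualPages, std::size_t pageSize)
        : pageSize_(pageSize), table_(numVirtualPages, kUnmapped) {}

    // O(1) insert: map virtual page vpn to the physical page at ppa.
    void insert(std::size_t vpn, std::size_t ppa) { table_[vpn] = ppa; }

    // O(1) erase: mark the virtual page as unmapped.
    void erase(std::size_t vpn) { table_[vpn] = kUnmapped; }

    // O(1) translate: returns kUnmapped when the page is not resident.
    std::size_t translate(std::size_t va) const {
        const std::size_t vpn = va / pageSize_;
        if (vpn >= table_.size() || table_[vpn] == kUnmapped) {
            return kUnmapped;
        }
        const std::size_t ppa = table_[vpn];
        const std::size_t off = va % pageSize_;
        return ppa + off;
    }

private:
    std::size_t pageSize_;
    std::vector<std::size_t> table_;  // O(N) memory in the number of virtual pages
};
```

With a page size of 16 and virtual page 2 mapped to physical address 64, virtual address 35 translates to 64 + 3 = 67.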

ASM Data Structure
• Adaptive Page Table
  • Map multiple virtual pages to single physical page
    vpn = va / pageSize
    ppa = pageTable(vpn).ppa()
    s   = pageTable(vpn).s()
    off = (va * s) % pageSize
    pa  = ppa + off
[Diagram: Virtual Domain, Page Table, Physical Memory]
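A hedged C++ sketch of the adaptive lookup, again with 1D addresses. The slides do not say how the scale factor s is encoded; this sketch assumes s = 1 / span, where span is the number of consecutive virtual pages that share one physical page. PageEntry and AdaptivePageTable are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PageEntry {
    std::size_t ppa;   // physical page address
    std::size_t span;  // virtual pages sharing this physical page (s = 1 / span)
};

class AdaptivePageTable {
public:
    // Every virtual page starts as an identity-scale entry at address 0;
    // handling of unmapped pages is omitted to keep the sketch short.
    AdaptivePageTable(std::size_t numVirtualPages, std::size_t pageSize)
        : pageSize_(pageSize), table_(numVirtualPages, PageEntry{0, 1}) {}

    // Maps `span` consecutive virtual pages to one physical page.
    // Assumes firstVpn is a multiple of span so offsets stay monotonic.
    void insert(std::size_t firstVpn, std::size_t span, std::size_t ppa) {
        for (std::size_t i = 0; i < span; ++i) {
            table_[firstVpn + i] = PageEntry{ppa, span};
        }
    }

    std::size_t translate(std::size_t va) const {
        const std::size_t vpn = va / pageSize_;
        const PageEntry& e = table_[vpn];
        const std::size_t off = (va / e.span) % pageSize_;  // (va * s) % pageSize
        return e.ppa + off;
    }

private:
    std::size_t pageSize_;
    std::vector<PageEntry> table_;
};
```

With a page size of 16 and virtual pages 0 and 1 sharing the physical page at 128, virtual addresses 4 and 20 translate to 130 and 138, so the two virtual pages together cover the single physical page at half resolution.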
Aaron LefohnUniversity of California, Davis
ASM Data Structure
• Multiresolution Page Table
Diagram labels: Virtual Domain, Mipmap Page Table, Physical Memory
ASM Data Structure Requirements
• How to support bilinear filtering?
  • Duplicate 1 column and 1 row of texels in each page
• Mipmapped trilinear?
  • “By-hand” interpolation between mipmap levels
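The “by-hand” interpolation between mipmap levels can be sketched as a blend of two per-level lookups. This is only an illustration under assumptions: `sampleLevel` is a hypothetical callback standing in for a bilinear-filtered lookup at one mipmap level, and `blend` and `trilinearByHand` are names invented for the example.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Linear blend with the same semantics as Cg's lerp(a, b, t).
inline float blend(float a, float b, float t) { return a + t * (b - a); }

// Interpolate between the two mipmap levels that bracket a fractional
// level of detail. sampleLevel(level) is a hypothetical per-level lookup.
inline float trilinearByHand(const std::function<float(int)>& sampleLevel,
                             float lod, int maxLevel) {
    assert(maxLevel >= 0);
    const float clamped =
        std::fmin(std::fmax(lod, 0.0f), static_cast<float>(maxLevel));
    const int lo = static_cast<int>(std::floor(clamped));  // finer level
    const int hi = lo < maxLevel ? lo + 1 : lo;            // coarser level
    const float t = clamped - static_cast<float>(lo);      // blend weight
    return blend(sampleLevel(lo), sampleLevel(hi), t);
}
```

The level of detail is clamped to the available range, so a request beyond the coarsest level simply returns the coarsest sample.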
How to Define the ASM Structure in Glift?
• Start with generic page table AddrTrans
• Use mipmapped PhysMem for page table
• Change template parameter to add adaptivity
• Write page allocator
  • alloc_pages, free_pages
• Finally…
typedef PageTableAddrTrans<…> PageTable;
typedef PhysMemGPU<vec2f, vec1s> PMem2D;
typedef VirtMemGPU<PageTable, PMem2D> VPageTable;
typedef AdaptiveMem<VPageTable, PageAllocator> ASM;
ASM Data Structure Usage
float4 main( uniform VMem2D asm,
             float3 shadowCoord,
             float4 litColor ) : COLOR
{
    float isInLight = asm.vTex2Ds( shadowCoord );
    return lerp( black, litColor, isInLight );
}
asm.bind_for_read( … );
asm.bind_for_write( … );
asm.alloc_pages( … );
asm.free_page( … );
…
Adaptive Shadow Map Algorithm
• Faithful to Fernando et al. 2001
• Refinement algorithm
  • Identify shadow pixels w/ resolution mismatch (GPU)
  • Compact pixels into small stream (GPU)
  • CPU reads back compacted stream (GPU → CPU)
  • Allocate pages
  • Draw new PTEs into mipmap page tables (CPU → GPU)
  • Draw depth into ASM for each new page (GPU)
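The compaction step in the refinement loop above can be illustrated with a CPU-side reference sketch. The slides perform this step on the GPU, so the code below only shows the data flow. The `PageRequest` record, the `compactRequests` function, and the de-duplication of repeated requests are assumptions added for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <tuple>
#include <vector>

// Hypothetical per-pixel record produced by the resolution-mismatch test.
struct PageRequest {
    bool needed;  // true if this pixel saw a resolution mismatch
    int level;    // mipmap level of the page it wants
    int x, y;     // page coordinates at that level
};

// Compact per-pixel records into a short, sorted list of distinct requests.
inline std::vector<PageRequest> compactRequests(
        const std::vector<PageRequest>& perPixel) {
    std::vector<PageRequest> out;
    // Keep only pixels that flagged a mismatch (the compaction itself).
    std::copy_if(perPixel.begin(), perPixel.end(), std::back_inserter(out),
                 [](const PageRequest& r) { return r.needed; });
    // Assumed extra step: drop duplicate requests for the same page.
    auto key = [](const PageRequest& r) {
        return std::make_tuple(r.level, r.x, r.y);
    };
    std::sort(out.begin(), out.end(),
              [&](const PageRequest& a, const PageRequest& b) {
                  return key(a) < key(b);
              });
    out.erase(std::unique(out.begin(), out.end(),
                          [&](const PageRequest& a, const PageRequest& b) {
                              return key(a) == key(b);
                          }),
              out.end());
    assert(out.size() <= perPixel.size());
    return out;
}
```

The resulting list is what a page allocator would consume before new page table entries and depth values are drawn.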
[Thanks to Yong Kil for the tree model]
ASM: Effective resolution 131,072² (37 MB); SM: 2048²
“Octree” 3D Paint
• Interactive painting on unparameterized 3D surfaces
• 3D version of ASM data structure
• Differs from previous work:
  • Quadrilinear filtering
  • O(1), uniform access
• Interactive with effective resolutions between 64³ and 2048³
Demo
ASM Results
• Effective shadow map resolution up to 131,072²
  • 16² - 64² page size
  • 512² - 2048² page table
  • 2048² - 4096² physical memory
  • 20 - 80 MB
• Performance (45k polygon model)
  • 15 fps while moving camera (including refinement)
  • 5-10 fps while moving light
• Lookup time compared to 2048² shadow map:
  • Bilinear filtered: 90% performance of traditional
  • Trilinear filtered mipmapped: 73%
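The 131,072² figure above is consistent with the largest listed page table and page size, assuming the effective resolution per axis is the page table side multiplied by the page side. The slides do not state this formula, and the `effectiveResolution` helper is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed relation: effective texels per axis =
// page table entries per axis * texels per page per axis.
inline std::uint64_t effectiveResolution(std::uint64_t pageTableSide,
                                         std::uint64_t pageSide) {
    assert(pageTableSide > 0 && pageSide > 0);
    return pageTableSide * pageSide;
}
```

Under that assumption, a 2048² page table with 64² pages gives `effectiveResolution(2048, 64)`, which is 131,072 texels per axis.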
Glift Results
• Static instruction results
  • With Cg program specialization

                   Glift   By-Hand   Brook
  1D → 2D            4        3        4
  3D page table      5        5
  ASM                9        9
  Octree            10        9
  ASM + offset      10        9

• Conclusion: Glift structures within 1 instruction of hand-coded Cg
Measured with NVShaderPerf, NVIDIA driver 75.22, Cg 1.4a
Overview
• Motivation and previous work
• Abstraction
• Implementation
• Examples
• Conclusions
Summary
• GPU programming needs data structure abstraction
  • Separate data structures and algorithms
  • More complex data structures and algorithms
• Why programmable address translation?
  • Common pattern in GPU data structures
  • Small amount of code virtualizes GPU memory model
Summary
• Glift template library
  • Generic C++/Cg implementation of abstraction
  • Nearly as efficient as hand coding
  • Integrates with OpenGL/Cg
• Iterator computation model
  • Generalize GPU computation model
  • Can future rasterizer increment iterators?
Acknowledgements
• Craig Kolb, Nick Triantos, Cass Everitt (NVIDIA)
• Fabio Pellacini (Dartmouth)
• Adam Moerschell, Yong Kil, Serban Porumbescu, Chris Co, … (UC Davis)
• Ross Whitaker, Chuck Hansen, Milan Ikits (U. of Utah)
• National Science Foundation Graduate Fellowship
• Department of Energy
More Information
• Upcoming paper in ACM Transactions on Graphics
  • “Glift: Generic, Efficient, Random-Access GPU Data Structures”
• ACM SIGGRAPH 2005 Sketches
  • “Dynamic Adaptive Shadow Maps on Graphics Hardware”
  • “Octree Texture on Graphics Hardware”
• Google “Glift”
• http://graphics.cs.ucdavis.edu/~lefohn/