DAVID K. MCALLISTER, PH.D. OPTIX MANAGER ADVANCES IN OPTIX

ADVANCES IN OPTIX - GTC On-Demand Featured Talks
on-demand.gputechconf.com/gtc/2015/presentation/S5246-David...

OPTIX EXECUTION MODEL

[Diagram labels: rtContextLaunch; Launch; Ray Generation Program; Traverse; Shade]

SAMPLE DEVICE CODE

RT_PROGRAM void dome_camera()
{
  size_t2 screen = output_buffer.size();

  // Map the pixel index to [-1, 1] in x and y.
  float2 d = make_float2(launch_index) / make_float2(screen)
             * make_float2(2.0f, 2.0f) - make_float2(1.0f, 1.0f);

  // Lift the point onto the unit hemisphere.
  float3 angle = make_float3(d.x, d.y, sqrtf(1.0f - (d.x*d.x + d.y*d.y)));
  float3 ray_origin = eye;
  float3 ray_direction = normalize(angle.x*normalize(U) +
                                   angle.y*normalize(V) +
                                   angle.z*normalize(W));

  optix::Ray ray(ray_origin, ray_direction, radiance_ray_type, scene_epsilon);

  PerRayData_radiance prd;
  prd.importance = 1.f;
  prd.depth = 0;

  // Trace the ray; the shading programs fill in prd.result.
  rtTrace(top_object, ray, prd);

  output_buffer[launch_index] = make_color(prd.result);
}
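The pixel-to-direction mapping used by dome_camera can be checked on the CPU. The following is a minimal sketch in plain C++, not OptiX code. Vec3 and dome_direction are hypothetical names, the result is expressed in camera-basis coordinates (the U, V, W blend is left out), and the clamp of the square-root argument is an assumption added here: the slide code takes the square root unclamped, so pixels outside the unit disc would produce NaN.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for OptiX's float3; not part of the OptiX API.
struct Vec3 { float x, y, z; };

// Map a pixel index to a unit direction on the hemisphere, in camera-basis
// coordinates. Follows the arithmetic of dome_camera, except that the
// square-root argument is clamped to zero (an assumption of this sketch).
inline Vec3 dome_direction(unsigned px, unsigned py, unsigned width, unsigned height)
{
    // Pixel index -> [-1, 1) on each axis.
    float dx = static_cast<float>(px) / static_cast<float>(width) * 2.0f - 1.0f;
    float dy = static_cast<float>(py) / static_cast<float>(height) * 2.0f - 1.0f;

    // Height above the image plane; zero for pixels outside the unit disc.
    float r2 = dx * dx + dy * dy;
    float dz = std::sqrt(std::max(0.0f, 1.0f - r2));

    // Inside the disc the vector is already unit length; outside it the
    // clamped vector (dx, dy, 0) is longer than 1 and needs normalizing.
    float len = std::sqrt(dx * dx + dy * dy + dz * dz);
    return Vec3{dx / len, dy / len, dz / len};
}
```

With a 512 x 512 launch, the center pixel looks straight along the third basis axis, and the corner pixels map to directions lying in the image plane.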

OPTIX EXECUTION MODEL

[Diagram labels. Stages: Launch, Traverse, Shade. Traversal: Node Graph Traversal, Acceleration Traversal. Programs: Ray Generation, Exception, Selector Visit, Miss, Closest Hit, Any Hit, Intersection, Callable. Calls: rtContextLaunch, rtTrace.]

OPTIX ENCAPSULATES THE ALGORITHM

OptiX is a to-the-algorithm API

[Diagram labels: Processor, Software, Algorithm; To-the-metal, To-the-algorithm]

GOLDENROD

MAJOR ARCHITECTURAL RENOVATION

LLVM-based OptiX compiler

Better GPU ray tracing performance

More fluid interactive rendering

Better multi-GPU scaling

More efficient complex node graphs

Additional input languages

CPU backend

UNIFIED VIRTUAL MEMORY

Merges CPU and GPU memory spaces

Full read/write access from both processors

Eliminates GPU memory footprint barrier

Coming in Pascal architecture (2016)

OPTIX 3.7

OPTIX PRIME

Specialized for ray tracing

Latest algorithms from NVIDIA Research

Ray tracing kernels

Treelet Reordering BVH (TRBVH)

Support for asynchronous computation

CPU support

No programming model support for shading

No support for Quadro VCA

No support for dynamic materials

Triangles only

No ability to target different architectures

INSTANCING IN PRIME

A model is a set of instances:

RTP_BUFFER_FORMAT_INSTANCE_MODEL

RTP_BUFFER_FORMAT_TRANSFORM_FLOAT4x3

New API call

rtpModelSetInstances

Hit result formats

RTP_BUFFER_FORMAT_HIT_T_TRIID_INSTID

RTP_BUFFER_FORMAT_HIT_T_TRIID_INSTID_U_V

[Diagram labels: Context; Model (three boxes); BufferDesc (two boxes: instances, transforms)]

INSTANCING IN PRIME

std::vector<instInfo_t> instanceData;
std::vector<RTPmodel> instanceList;
std::vector<SimpleMatrix4x3> transformList;
createInstances(numInstances, models, instanceList, transformList, instanceData);

RTPbufferdesc instances, transforms;
rtpBufferDescCreate(context, RTP_BUFFER_FORMAT_INSTANCE_MODEL, RTP_BUFFER_TYPE_HOST, &instanceList[0], &instances);
rtpBufferDescSetRange(instances, 0, instanceList.size());

rtpBufferDescCreate(context, RTP_BUFFER_FORMAT_TRANSFORM_FLOAT4x3, RTP_BUFFER_TYPE_HOST, &transformList[0], &transforms);
rtpBufferDescSetRange(transforms, 0, transformList.size());

RTPmodel scene;
rtpModelCreate(context, &scene);
rtpModelSetInstances(scene, instances, transforms);
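Each instance pairs a model with a transform. As a minimal sketch of what such a transform does to a vertex, the plain C++ below applies an affine matrix stored as three rows of four floats, with an implied last row of 0 0 0 1. Point3, Affine4x3, transform_point, and make_translation are hypothetical names, and the row-major layout is an assumption of this sketch; the OptiX Prime documentation is the authority on the layout RTP_BUFFER_FORMAT_TRANSFORM_FLOAT4x3 expects.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical types for illustration; not part of the OptiX Prime API.
struct Point3 { float x, y, z; };

// Affine transform: three rows of four floats (assumed row-major).
// The fourth row is implied to be (0, 0, 0, 1).
struct Affine4x3 { float m[3][4]; };

// Apply the transform to a point (w = 1, so the translation column applies).
inline Point3 transform_point(const Affine4x3& t, const Point3& p)
{
    return Point3{
        t.m[0][0] * p.x + t.m[0][1] * p.y + t.m[0][2] * p.z + t.m[0][3],
        t.m[1][0] * p.x + t.m[1][1] * p.y + t.m[1][2] * p.z + t.m[1][3],
        t.m[2][0] * p.x + t.m[2][1] * p.y + t.m[2][2] * p.z + t.m[2][3]};
}

// Build a pure translation, the simplest per-instance transform.
inline Affine4x3 make_translation(float tx, float ty, float tz)
{
    Affine4x3 t = {{{1.0f, 0.0f, 0.0f, tx},
                    {0.0f, 1.0f, 0.0f, ty},
                    {0.0f, 0.0f, 1.0f, tz}}};
    return t;
}
```

Storing only three rows saves four floats per instance compared with a full 4x4 matrix, which adds up when a scene holds many instances.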

OPTIX PRIME IN MENTAL RAY 3.12

OPTIX 3.8

PROGRESSIVE API

Render all subframes in a single API call

Encapsulate even more of the algorithm

STREAM BUFFERS

RTbuffer output_buffer, stream_buffer;
rtBufferCreate(context, RT_BUFFER_OUTPUT, &output_buffer);
rtBufferCreate(context, RT_BUFFER_PROGRESSIVE_STREAM, &stream_buffer);

rtBufferSetSize2D(output_buffer, width, height);
rtBufferSetSize2D(stream_buffer, width, height);
rtBufferSetFormat(output_buffer, RT_FORMAT_FLOAT4);
rtBufferSetFormat(stream_buffer, RT_FORMAT_UNSIGNED_BYTE4);

rtBufferBindProgressiveStream(stream_buffer, output_buffer);

PROGRESSIVE API

rtContextLaunchProgressive2D(context, width, height, num_subframes);

while(!finished) {
  int ready;
  rtBufferGetProgressiveUpdateReady(stream_buffer, &ready, 0, 0);

  if(ready) {
    rtBufferMap(stream_buffer, &data);
    display(data);
    rtBufferUnmap(stream_buffer);
  }

  if(scene_changed()) {
    // Update OptiX state
    rtVariableSet(...);
  }

  rtContextLaunchProgressive2D(context, width, height, num_subframes);
}

PROGRESSIVE API (DEVICE)

rtDeclareVariable(unsigned int, subframe_idx, rtSubframeIndex, );

unsigned int seed = rand_seed(launch_index, frame, subframe_idx);
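Conceptually, progressive rendering averages the samples from successive subframes, and the stream buffer carries a display-ready 8-bit version of that average. The sketch below shows that arithmetic in plain C++; it is not OptiX's implementation. accumulate and to_byte are hypothetical helper names, and the clamp-and-scale conversion (no gamma or tone mapping) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Fold one new sample into a running mean. subframe_idx is 0-based, so the
// result is the mean of (subframe_idx + 1) samples.
inline float accumulate(float mean, float sample, unsigned subframe_idx)
{
    return mean + (sample - mean) / static_cast<float>(subframe_idx + 1);
}

// Convert a float channel to 8 bits: clamp to [0, 1], scale, and round.
// Assumption of this sketch: no gamma correction or tone mapping is applied.
inline std::uint8_t to_byte(float v)
{
    float c = std::min(1.0f, std::max(0.0f, v));
    return static_cast<std::uint8_t>(c * 255.0f + 0.5f);
}
```

Keeping a running mean rather than a running sum means the value is displayable after every subframe without a separate normalization pass.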

Quadro VCA Under the Hood

GPUs                 8 x M6000-VCA GPUs
GPU Memory           12 GB per GPU
CUDA Cores           23,040
CPU Cores            20 Physical
System Memory        256 GB
Storage              4 x 512GB SSD
Network              2 x 1GigE, 2 x 10GigE (SFP+), 1 x InfiniBand
Installed Software   Iray IQ + CentOS Linux + VCA Cluster Manager
U.S. MSRP            $50,000

Custom OptiX Applications

All Processing on VCA
OptiX Leveraging Same Infrastructure as Iray (using DiCE)
Minimal Work within the OptiX App

[Diagram: OptiX App connected to the VCA over Ethernet or Internet; labels: Incremental Updates, Interactive Image Stream]

CONNECTION API

RTremotedevice rdev;
rtRemoteDeviceCreate("url", "user", "password", &rdev);

unsigned int num_configs;
rtRemoteDeviceGetAttribute(rdev, RT_REMOTEDEVICE_ATTRIBUTE_NUM_CONFIGURATIONS,
                           sizeof(unsigned int), &num_configs);

int vca_config_index = chooseConfig(num_configs);

rtRemoteDeviceReserve(rdev, vca_num_nodes, vca_config_index);

// Poll until the reserved nodes are ready.
int ready;
do {
  rtRemoteDeviceGetAttribute(rdev, RT_REMOTEDEVICE_ATTRIBUTE_STATUS, sizeof(int), &ready);
  if(ready != RT_REMOTEDEVICE_STATUS_READY) sleep(10);
} while(ready != RT_REMOTEDEVICE_STATUS_READY);

RTcontext context;
rtContextCreate(&context);
rtContextSetRemoteDevice(context, rdev);
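The status loop on this slide waits indefinitely. A minimal sketch of the same polling pattern with a timeout is shown below in plain C++. wait_until is a hypothetical helper, not part of OptiX; in an application the is_ready callback would wrap the RT_REMOTEDEVICE_ATTRIBUTE_STATUS query, which is left out here so the sketch stays self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Poll is_ready at a fixed interval until it returns true or the timeout
// expires. Returns true if the condition was met, false on timeout.
// Hypothetical helper for illustration; not an OptiX call.
inline bool wait_until(const std::function<bool()>& is_ready,
                       std::chrono::milliseconds interval,
                       std::chrono::milliseconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!is_ready()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // Gave up: the condition never became true.
        }
        std::this_thread::sleep_for(interval);
    }
    return true;
}
```

A bounded wait lets the application report a failed reservation to the user instead of hanging if the remote nodes never become ready.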

JOHN STONE

NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/
Beckman Institute, U. Illinois at Urbana-Champaign

S5246: Innovations in OptiX

Guest Presentation: Integrating OptiX in VMD

John E. Stone

Theoretical and Computational Biophysics Group

Beckman Institute for Advanced Science and Technology

University of Illinois at Urbana-Champaign

http://www.ks.uiuc.edu/

S5246, GPU Technology Conference

15:00-15:50, Room LL21E, San Jose Convention Center,

San Jose, CA, Wednesday March 18, 2015

VMD – “Visual Molecular Dynamics”

Goal: A Computational Microscope

Study the molecular machines in living cells

[Images: Ribosome: target for antibiotics; Poliovirus]

Lighting Comparison

Two lights, no shadows
Two lights, hard shadows, 1 shadow ray per light
Ambient occlusion + two lights, 144 AO rays/hit
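Ambient occlusion estimates how much of the hemisphere above a hit point is unobstructed by shooting many short rays from it. A minimal sketch of generating such ray directions is given below in plain C++, using cosine-weighted hemisphere sampling about the +Z axis. The slides do not say how VMD distributes its 144 AO rays; the 12 x 12 grid of cell centers used here is an assumption chosen only because 12 x 12 = 144, and Dir, cosine_sample, and ao_directions are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical direction type for illustration.
struct Dir { float x, y, z; };

// Cosine-weighted direction about +Z from two numbers u1, u2 in [0, 1).
inline Dir cosine_sample(float u1, float u2)
{
    const float kPi = 3.14159265358979f;
    float r = std::sqrt(u1);            // radius on the unit disc
    float phi = 2.0f * kPi * u2;        // angle around the normal
    return Dir{r * std::cos(phi), r * std::sin(phi), std::sqrt(1.0f - u1)};
}

// Generate n*n directions from the cell centers of an n x n grid.
// No jitter is applied; a renderer would normally randomize within cells.
inline std::vector<Dir> ao_directions(unsigned n)
{
    std::vector<Dir> dirs;
    dirs.reserve(static_cast<std::size_t>(n) * n);
    for (unsigned i = 0; i < n; ++i) {
        for (unsigned j = 0; j < n; ++j) {
            float u1 = (static_cast<float>(i) + 0.5f) / static_cast<float>(n);
            float u2 = (static_cast<float>(j) + 0.5f) / static_cast<float>(n);
            dirs.push_back(cosine_sample(u1, u2));
        }
    }
    return dirs;
}
```

Calling ao_directions(12) yields 144 unit vectors in the upper hemisphere; each would be rotated into the frame of the surface normal before being traced as an occlusion ray.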

VMD Chromatophore Rendering on Blue Waters

• New representations, GPU-accelerated molecular surface calculations, memory-efficient algorithms for huge complexes
• VMD GPU-accelerated ray tracing engine w/ CUDA+OptiX+MPI+Pthreads
• Each revision: 7,500 frames render on ~96 Cray XK7 nodes in 290 node-hours, 45GB of images prior to editing

GPU-Accelerated Molecular Visualization on Petascale Supercomputing Platforms. J. E. Stone, K. L. Vandivort, and K. Schulten. UltraVis’13, 2013.

Visualization of Energy Conversion Processes in a Light Harvesting Organelle at Atomic Detail. M. Sener, et al. SC'14 Visualization and Data Analytics Showcase, 2014. Winner of the SC'14 Visualization and Data Analytics Showcase.

VMD 1.9.2 Interactive GPU Ray Tracing

• Ray tracing heavily used for VMD publication-quality images/movies
• High quality lighting, shadows, transparency, depth-of-field focal blur, etc.
• VMD now provides –interactive– ray tracing on laptops, desktops, and remote visual supercomputers

VMD TachyonL-OptiX Interactive RT w/ Progressive Rendering

[Diagram: rendering loop over the Scene Graph and the TrBvh RT Acceleration Structure, with an Accum. Buf and an Output Framebuffer. Steps shown:]

RT Rendering Pass
Seed RNGs
Accumulate RT samples
Normalize+copy accum. buf
Compute ave. FPS, adjust RT samples per pass
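The "compute ave. FPS, adjust RT samples per pass" step is a feedback rule: do less work per pass when the display is too slow, and more when there is headroom. A minimal sketch of one such rule is given below in plain C++. The slides do not describe VMD's actual heuristic; adjust_samples, the one-step increments, and the 20% headroom margin are all assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Return the samples-per-pass count to use next, given the measured average
// frame rate. Hypothetical rule: step down by one when below the target,
// step up by one when more than 20% above it, otherwise hold steady.
inline unsigned adjust_samples(unsigned samples, double avg_fps, double target_fps,
                               unsigned min_samples, unsigned max_samples)
{
    if (avg_fps < target_fps && samples > min_samples) {
        --samples;  // Too slow: trade image quality for interactivity.
    } else if (avg_fps > 1.2 * target_fps && samples < max_samples) {
        ++samples;  // Headroom available: converge faster.
    }
    return std::min(max_samples, std::max(min_samples, samples));
}
```

The margin above the target keeps the count from oscillating when the frame rate hovers near the target value.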

VMD TachyonL-OptiX: Multi-GPU on a Desktop or Single Node

Scene Data Replicated, Image Space Parallel Decomposition onto GPUs

[Diagram: the VMD Scene and its TrBvh RT Acceleration Structure on GPU 0, GPU 1, GPU 2, and GPU 3]
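Image-space decomposition gives every GPU the full scene but only part of the image to render. A minimal sketch of one way to divide the work is given below in plain C++: contiguous bands of rows, with any remainder rows spread over the first GPUs. OptiX distributes work across devices internally and the slides do not state the partitioning it uses, so the band scheme, RowRange, and split_rows are assumptions of this sketch.

```cpp
#include <cassert>
#include <vector>

// Half-open range of image rows [begin, end). Hypothetical type.
struct RowRange { unsigned begin, end; };

// Split `height` rows into `num_gpus` contiguous bands whose sizes differ by
// at most one row. Returns an empty vector if num_gpus is zero.
inline std::vector<RowRange> split_rows(unsigned height, unsigned num_gpus)
{
    std::vector<RowRange> bands;
    if (num_gpus == 0) {
        return bands;
    }
    const unsigned base = height / num_gpus;   // rows every GPU gets
    const unsigned extra = height % num_gpus;  // leftover rows to hand out
    unsigned row = 0;
    for (unsigned g = 0; g < num_gpus; ++g) {
        const unsigned count = base + (g < extra ? 1u : 0u);
        bands.push_back(RowRange{row, row + count});
        row += count;
    }
    return bands;
}
```

Contiguous bands are the simplest scheme; interleaved tiles usually balance load better when some image regions are much more expensive to trace than others.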

VMD TachyonL-OptiX Interactive RT w/ OptiX 3.8 Progressive API

[Diagram: rendering loop over the Scene Graph and the TrBvh RT Acceleration Structure. Steps shown:]

RT Progressive Subframe: rtContextLaunchProgressive2D()
rtBufferGetProgressiveUpdateReady()
Draw Output Framebuffer
Check for User Interface Inputs, Update OptiX Variables
rtContextStopProgressive()

VMD TachyonL-OptiX: Multi-GPU on NVIDIA VCA Cluster

Scene Data Replicated, Image Space / Sample Space Parallel Decomposition onto GPUs

[Diagram: the VMD Scene distributed to VCA 0 through VCA N, each with 8 K6000 GPUs]

Future Work

• Improved performance / quality trade-offs in interactive RT stochastic sampling strategies
• Optimize GPU scene DMA and BVH regen speed for time-varying geometry, e.g. MD trajectories
• Continue tuning of GPU-specific RT intersection routines, memory layout
• GPU-accelerated movie encoder back-end
• Interactive RT combined with remote viz on HPC systems, much larger data sizes

Acknowledgements

• Theoretical and Computational Biophysics Group, University of Illinois at Urbana-Champaign

• NVIDIA CUDA Center of Excellence, University of Illinois at Urbana-Champaign

• NVIDIA CUDA team

• NVIDIA OptiX team

• NCSA Blue Waters Team

• Funding:

– DOE INCITE, ORNL Titan: DE-AC05-00OR22725

– NSF Blue Waters: NSF OCI 07-25070, PRAC “The Computational Microscope”, ACI-1238993, ACI-1440026

– NIH support: 9P41GM104601, 5R01GM098243-02

REGISTERED DEVELOPER PROGRAM

Access latest OptiX version

Access private beta releases

Tighter communication with OptiX developers

https://developer.nvidia.com/optix

MORE OPTIX TALKS

Session | Title | Day | Start | End | Room | Speaker
S5659 | Accelerating Mountain Bike Development with Optimized Design Visualization | Tuesday | 13:30 | 13:55 | LL21A | Geoff Casey
S5188 | FurryBall RT: New OptiX Core and 30x Speed Up | Tuesday | 15:00 | 15:25 | LL21D | Jan Tománek
S5643 | Advanced Rendering Solutions from NVIDIA | Tuesday | 15:30 | 16:20 | LL21E | Phillip Miller
S5622 | Dekko: A Framework for Real-Time Preview for VFX | Wednesday | 9:30 | 9:55 | LL21D | Damien Fagnou
S5644 | Flexible Cluster Rendering with NVIDIA VCA | Wednesday | 10:00 | 10:50 | LL21E | Phillip Miller
S5541 | CATIA Live Rendering Iray and NVIDIA VCA | Wednesday | 10:00 | 10:50 | LL21A | Pierre Maheut
S5409 | Custom Iray Applications and MDL for Consistent Visual Appearance | Wednesday | 14:00 | 14:50 | LL21E | Dave Hutchinson
S5246 | Innovations in OptiX | Wednesday | 15:00 | 15:50 | LL21E | David McAllister
S5628 | Simulation-Based CGI for Automotive Applications | Wednesday | 16:00 | 16:25 | LL21A | Benoit Deschamps
S5386 | VMD: Publication-Quality Ray Tracing of Molecular Graphics with OptiX | Thursday | 9:00 | 9:25 | LL21E | John Stone
S5416 | Accelerad: Daylight Simulation for Architectural Spaces Using GPU Ray Tracing | Thursday | 14:00 | 14:25 | LL21E | Nathaniel Jones
S5210 | GPU-Accelerated Spectral Caustic Rendering of Homogeneous Caustic Objects | Thursday | 14:30 | 14:55 | LL21E | Budianto Tandianus