Fast Stereoscopic Rendering on Mobile Ray Tracing GPU for Virtual Reality
Applications
SAMSUNG Advanced Institute of Technology Won-Jong Lee, Seok Joong Hwang, Youngsam Shin,
Jeong-Joon Yoo, Soojung Ryu
- 1 -
What is ray tracing?
Ray tracing is a technique for generating an image by tracing the path of light
through pixels in an image plane and simulating the effects of its encounters
with virtual objects.
It naturally represents global effects such as shadows, reflection, and
refraction.
Image source: Wikipedia
- 2 -
Ray Tracing and Mobile
Ray tracing is a promising rendering technique for future
mobile applications that require photorealistic graphics.
- 3 -
Ray tracing for VR
Stereo Iray VR (NVIDIA, 2016) Foveated rendering (AMD, 2014)
VRWorks Audio (NVIDIA, 2016)
- 4 -
Background: Ray Tracing
Early desktop CPU/GPU (`00~09)
- Packet tracing [Gunther 07][Overbeck 08][Benthin 09]
HW Specialization (`02~06)
- SaarCOR [Schmittler `02], RPU, D-RPU [Woop `05, `06]
- Not commercialized
Modern GPUs and MICs (`10~)
- OptiX [Parker `10], Embree [Wald `14]
- For professional graphics
Mobile GPU and H/W revisit (`13~present)
- SGRT [Lee `13, `15], GR6500 [McCombe `14], RayCore [Nah `14]
- Targeted for real-time applications (Game, UX, AR/VR)
- 5 -
Background: Ray Tracing
Performance & Power Consumption
- Future graphics applications will require higher resolutions (4K and above),
e.g. VR and high-quality 3D games
- 3 Grays/s for a full RT game engine, 100~250 Mrays/s for hybrid rendering
Ray tracing requirements (1080p, 60fps),
Techniques vs Ray throughput
Reference: PowerVR Graphics Keynote, Imagination Developer Connection 2015
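To put the throughput figures above in per-pixel terms, the following is a minimal arithmetic sketch. It assumes the 1080p, 60 fps target quoted on the slide means 1920 x 1080 pixels; the function name `rays_per_pixel` is only illustrative.

```python
def rays_per_pixel(rays_per_second: float,
                   width: int = 1920,
                   height: int = 1080,
                   fps: int = 60) -> float:
    """Average ray budget per pixel per frame at a given ray throughput."""
    return rays_per_second / (width * height * fps)


if __name__ == "__main__":
    # Throughput figures quoted on the slide.
    for label, rate in [("full RT game engine (3 Grays/s)", 3e9),
                        ("hybrid rendering, low (100 Mrays/s)", 100e6),
                        ("hybrid rendering, high (250 Mrays/s)", 250e6)]:
        print(f"{label}: {rays_per_pixel(rate):.2f} rays per pixel per frame")
```

Under these assumptions, 3 Grays/s corresponds to roughly 24 rays per pixel per frame, while 100~250 Mrays/s corresponds to roughly 0.8 to 2 rays per pixel per frame.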
- 6 -
Motivation: stereoscopic reprojection
[Figure: left (L) and right (R) view pairs showing a GOOD and a BAD reprojected pixel; bad pixels are detected by examining reprojection indices]
Project the left eye's ray hit points onto the right eye 2) [Adelson et al. 1992]
Four pixel classes 1) [Badt et al. 1988]
- Good, Missed, Overlapped
- Bad: projected but may be occluded
1) Badt, “Two algorithms for taking advantage of temporal coherence in ray tracing,” The Visual Computer, 1988
2) Adelson and Hodges, “Visible Surface Ray-Tracing of Stereoscopic Images,” Southeast Regional Conf., 1992
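The four pixel classes can be illustrated with a small scanline sketch. This is not the authors' kernel: it assumes parallel-axis stereo cameras (so reprojection is a horizontal shift by the disparity `focal_px * baseline / depth`), and the bad-pixel rule used here (a pixel that lands right after a jump in the reprojection index) is a simplified stand-in for the index test mentioned on the slide. All names are hypothetical.

```python
from typing import List, Optional


def classify_scanline(depths: List[Optional[float]],
                      focal_px: float,
                      baseline: float) -> List[str]:
    """Reproject one left-eye scanline and label every right-eye pixel."""
    width = len(depths)
    sources = [0] * width         # number of left pixels landing on each right pixel
    after_gap = [False] * width   # landed right after a jump in the reprojection index
    prev_x_right = None

    for x_left, depth in enumerate(depths):
        if depth is None:         # the left-eye ray hit nothing
            continue
        disparity = focal_px * baseline / depth   # nearer points shift further
        x_right = round(x_left - disparity)
        if 0 <= x_right < width:
            sources[x_right] += 1
            # Assumed heuristic: a jump of more than one pixel opens a hole,
            # so the pixel after the hole may be occluded in the right view.
            if prev_x_right is not None and x_right - prev_x_right > 1:
                after_gap[x_right] = True
            prev_x_right = x_right

    labels = []
    for x in range(width):
        if sources[x] == 0:
            labels.append("missed")       # nothing reprojected here
        elif sources[x] > 1:
            labels.append("overlapped")   # several left pixels compete
        elif after_gap[x]:
            labels.append("bad")          # reprojected, but possibly occluded
        else:
            labels.append("good")         # safe to reuse
    return labels


if __name__ == "__main__":
    # Far background (depth 10) with a near object (depth 2) in the middle.
    row = [10.0] * 3 + [2.0] * 3 + [10.0] * 2
    print(classify_scanline(row, focal_px=4.0, baseline=1.0))
```

In this toy row the near object shifts further than the background, which produces overlapped pixels on one side of it and a hole of missed pixels on the other.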
- 7 -
Goal of this paper
Efficiently map the reprojection algorithm onto the existing ray tracing GPUs
[Figure caption] Stereoscopic ray-traced rendering with the reprojection method. Except for the yellow pixels, which indicate bad pixels, most of the pixels (91.54%) in the right image can reuse the results of the left image.
Proposed Framework
- 9 -
Target platform: a mobile ray tracing GPU (SGRT)
[Block diagram: four SGRT cores attached to external DRAM over the AXI system bus. Each core contains a T&I unit (traversal units and an intersection unit, each with an L1 cache, plus an L2 cache), an SRP (VLIW engine, coarse-grained reconfigurable array, internal SRAM, I-cache, C-Mem), and a texture unit with an L1 cache. Host side: four host CPU cores and host DRAM on the host system bus.]
A mobile GPU based on ray tracing that combines the advantages of programmable DSP cores and dedicated hardware
- T&I units: fast, compact H/W that accelerates traversal and intersection
- SRP: a programmable shader core that supports flexible shading and ray generation
High-performance features: dual AABB test unit [Lee et al. 2014], reorder buffer [Lee et al. 2015], hybrid number representation [Hwang et al. 2015]
- 10 -
SGRT: T&I (Traversal and Intersection) Units
Specialized H/W for BVH traversal and intersection operations in ray tracing
‒ MIMD parallel architecture [Lee et al. 2013], 2-AABB traversal unit [Lee et al. 2014], latency hiding [Shin et al. 2015], and hybrid number representation [Hwang et al. 2015]
‒ The T&I unit and the shader core in the GPU are connected via direct interfaces
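For readers unfamiliar with what the T&I unit accelerates, here is a minimal software sketch of BVH traversal in which both child boxes of a node are tested per step. It only illustrates the general technique (slab test plus stack-based traversal), not the SGRT hardware design; the `Node` layout and function names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                          # AABB minimum corner
    hi: Vec3                          # AABB maximum corner
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prim_id: Optional[int] = None     # set only on leaves


def hit_aabb(origin: Vec3, direction: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test: does the ray (t >= 0) pass through the box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:        # parallel to this slab and outside it
                return False
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(root: Node, origin: Vec3, direction: Vec3) -> List[int]:
    """Collect the leaf primitives whose boxes the ray enters."""
    candidates: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.prim_id is not None:  # leaf: hand over to the intersection stage
            candidates.append(node.prim_id)
            continue
        # Test both child boxes in the same step.
        for child in (node.left, node.right):
            if child is not None and hit_aabb(origin, direction, child.lo, child.hi):
                stack.append(child)
    return candidates


if __name__ == "__main__":
    a = Node((0.0, 0.0, 5.0), (1.0, 1.0, 6.0), prim_id=0)
    b = Node((3.0, 0.0, 5.0), (4.0, 1.0, 6.0), prim_id=1)
    root = Node((0.0, 0.0, 5.0), (4.0, 1.0, 6.0), left=a, right=b)
    print(traverse(root, (0.5, 0.5, 0.0), (0.0, 0.0, 1.0)))
```

A real traversal unit would also order the children by distance and stop early once a closer hit is found; the sketch omits both for brevity.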
- 11 -
SGRT: SRP (Samsung Reconfigurable Processor)
A flexible architecture template [Lee et al, 2011]
The ISA covers arithmetic, special-function, and texture instructions.
The VLIW engine is useful for general-purpose computations (function calls, control flow).
The CGRA makes full use of software pipelining for loop acceleration.
[Figure: SRP block diagram (a grid of FUs with local RFs, a central register file, four VLIW FUs, instruction and data paths; CGA and VLIW modes) and a code timeline in which control processing alternates with data processing of for-loops]
- 15 -
Overview: stereoscopic reprojection rendering on SGRT
This is the first demonstration of stereoscopic rendering on a mobile ray tracing GPU
for VR applications.
SGRT efficiently supports flexible real-time ray tracing by combining the advantages of
hardware and software, so the new software kernels (Reprojection, Validation, and Reuse)
were easy to add.
The H/W accelerator (T&I unit) performs ray traversal efficiently.
Tile-based ray tracing: by ray tracing on a per-tile basis, the G-buffer fits
into the internal memory, which allows the reprojection and reuse kernels to run
from the on-chip internal SRAM without accessing the external DRAM
(the validation kernel is the exception).
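To see why tiling lets the G-buffer fit on chip, the following back-of-the-envelope sketch computes a tile's G-buffer footprint. The attribute list (Color, Normal, Texcoord, Position) comes from the processing-detail diagram, but the per-attribute byte sizes and the 64 KiB SRAM budget are purely hypothetical values chosen for illustration; the slides do not state them.

```python
# Assumed storage formats (hypothetical, not from the slides):
# RGBA8 color, three 32-bit floats for normal and position, two for texcoord.
BYTES_PER_PIXEL = {"color": 4, "normal": 12, "texcoord": 8, "position": 12}


def tile_gbuffer_bytes(tile_w: int, tile_h: int) -> int:
    """G-buffer footprint of one tile under the assumed formats."""
    return tile_w * tile_h * sum(BYTES_PER_PIXEL.values())


def largest_square_tile(sram_bytes: int) -> int:
    """Largest power-of-two tile side whose G-buffer fits in the budget."""
    if tile_gbuffer_bytes(1, 1) > sram_bytes:
        raise ValueError("budget too small for a single pixel")
    side = 1
    while tile_gbuffer_bytes(side * 2, side * 2) <= sram_bytes:
        side *= 2
    return side


if __name__ == "__main__":
    budget = 64 * 1024  # hypothetical on-chip SRAM budget in bytes
    side = largest_square_tile(budget)
    print(side, tile_gbuffer_bytes(side, side))
```

With these assumed numbers a 32 x 32 tile needs 36,864 bytes, whereas a full 2048 x 1024 frame at 36 bytes per pixel would need 72 MiB, which is why a whole-frame G-buffer would have to live in DRAM.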
- 23 -
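Processing Detail
[Diagram: left-image pipeline across the ray tracing H/W (T&I unit), the shader core (SRP), internal memory, and DRAM]
For left image
- Ray Generation sends rays to Traversal & Intersection, which returns hit points
- Shading fills the Tile G-Buffer (Color, Normal, Texcoord, Position)
- Re-projection maps the tile from the left (L) to the right (R) view and fills the Span G-Buffer
For right image
- The Span G-Buffer is the input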
- 28 -
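Processing Detail
[Diagram: right-image pipeline across the ray tracing H/W (T&I unit), the shader core (SRP), internal memory, and DRAM]
For left image
- Produces the Span G-Buffer (L to R)
For right image
- A row is prefetched by DMA
- Validation Test on each Span G-Buffer entry: bad pixel?
- No: Re-use the reprojected result in the Tile Color-Buffer
- Yes: Ray Generation, Traversal & Intersection (rays out, hit points back), and Shading through the Tile G-Buffer
- Updating tile colors: the Tile Color-Buffer is written to the Frame Buffer
[Example image] Red: bad pixels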
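The reuse-or-retrace decision for the right image can be sketched as follows. This is a simplified illustration rather than the SGRT kernels: it assumes the validation result is already available as a per-pixel flag, that missed pixels are represented by `None`, and that `trace` is a caller-supplied stand-in for ray generation, traversal and intersection, and shading. All names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Color = Tuple[float, float, float]


def shade_right_row(reprojected: List[Optional[Color]],
                    bad: List[bool],
                    trace: Callable[[int], Color]) -> Tuple[List[Color], int]:
    """Fill one right-eye row, tracing rays only where reuse is not possible."""
    row: List[Color] = []
    traced = 0
    for x, (color, is_bad) in enumerate(zip(reprojected, bad)):
        if color is None or is_bad:
            # Missed or bad pixel: fall back to regular ray tracing.
            row.append(trace(x))
            traced += 1
        else:
            # Valid reprojection: reuse the left-eye result.
            row.append(color)
    return row, traced


if __name__ == "__main__":
    grey = (0.5, 0.5, 0.5)
    reprojected = [(1.0, 0.0, 0.0), None, (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    bad = [False, False, True, False]
    row, traced = shade_right_row(reprojected, bad, trace=lambda x: grey)
    print(row, traced)
```

Only the pixels that fail validation or received no reprojected sample reach the tracing path, which is where the reduction in T&I workload comes from.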
Evaluation
- 31 -
Experimental Setup
Performance and energy simulation model
‒ Cycle-accurate simulators for SGRT, integrated with an energy model
‒ The energy and power model of [Lee et al. 2015], a custom model
built on a database of per-component power values
obtained from Synopsys PrimeTime PX [SYNOPSYS 2016] with the SAMSUNG 14nm LPP
process technology [Samsung 2016].
The configuration of the T&I unit is the same as in [Lee et al. 2014]
‒ 4 TRV + 1 IST units, 500 MHz
- 32 -
Experimental Setup
Test Application
‒ We used six datasets (Figure 1): Teapot (15K triangles), Chess (42K), BMW (55K),
Chemical Lab. (98K), Music box (106K), and Provence (600K).
‒ All test scenes were rendered at 2048 x 1024 resolution with ample secondary
ray effects.
We compared against the standard reference: ray tracing without reprojection
on the same hardware platform.
[Images of the six test scenes: Teapot, Chess, BMW, Chemical Lab., Music box, Provence]
- 33 -
Reused pixels
The results of the stereoscopic rendering for the six test scenes*
The pixels marked in yellow in the right image are the bad pixels.
Most of the pixels (91.54% on average) could be reused from the left image,
as shown in the figure.
* The “Barrel Distortion Correction” filter was intentionally not applied to these rendered scenes, so as to focus on the reprojection effect.
- 34 -
Relative Performance
Overall, the reprojection approach achieved up to 1.64 times better performance than the
reference platform, because it substantially reduces the computing cost of
the T&I unit.
In absolute terms, we obtained 131.3, 14.4, 18.2, 8.6, 28.9,
and 44.5 fps for the six test scenes, respectively.
[Bar chart: performance of Reprojected relative to Standard (= 1.00) for Teapot, Chess, Musicbox, BMW, Chemical Lab., and Provence. Bar labels, in extraction order: 1.64, 1.13, 1.28, 1.42, 1.11, 1.51. Peak: 1.64x.]
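As a rough sanity check on the reported speedup, the sketch below computes an idealized upper bound from the reuse ratio. It rests on two simplifying assumptions that are not made in the slides: rendering cost is proportional to the number of pixels that are actually ray traced, and the reprojection, validation, and reuse kernels cost nothing. The function name is hypothetical.

```python
def ideal_stereo_speedup(reuse_fraction: float) -> float:
    """Upper bound on stereo speedup when a fraction of right-eye pixels is reused.

    Baseline: both eyes fully ray traced (cost 2).
    Reprojected: left eye fully traced (cost 1) plus the non-reused
    share of the right eye (cost 1 - reuse_fraction).
    """
    if not 0.0 <= reuse_fraction <= 1.0:
        raise ValueError("reuse_fraction must lie in [0, 1]")
    return 2.0 / (1.0 + (1.0 - reuse_fraction))


if __name__ == "__main__":
    # Average reuse ratio reported for the six test scenes.
    print(f"{ideal_stereo_speedup(0.9154):.3f}")
```

With the average reuse ratio of 91.54%, this bound is about 1.84x. The measured peak of 1.64x sits below it, which would be consistent with the extra kernels and per-scene differences in reuse having a non-negligible cost.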
- 35 -
Relative Performance
Our implementation reduces energy consumption by up to 20%
because it cuts the workload on the
hardware.
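[Bar chart: energy consumption of Reprojected relative to Standard (= 1.00) for Teapot, Chess, Musicbox, BMW, Chemical Lab., and Provence. Bar labels, in extraction order: 0.96, 1.02, 0.98, 0.97, 0.99, 0.80. Largest reduction: 20%.]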
Conclusion
- 37 -
Summary
In this work,
‒ we present a solution that realizes ray-tracing-based stereoscopic rendering
on a mobile ray tracing GPU.
‒ By combining reprojection with tile-based ray tracing, our
approach could be a versatile solution for future VR applications,
‒ as it achieves up to 1.64 times better performance and up to 20% lower
energy consumption than standard ray tracing without reprojection on the same platform.
Future work:
‒ Apply more adaptive rendering techniques such as foveated rendering
Thank you!