Antoine Cohade & Emil Persson 16/03/2016 More Explosions, More Chaos, and Definitely More Blowing Stuff Up : Optimizations and New DirectX Features in


Intel Software – Developer Relations Division Copyright © 2016, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others.

Legal Copyright © 2016 Intel Corporation. All rights reserved.

*Other names and brands may be claimed as the property of others.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice.

All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.

Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.

Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.Intel.com/performance

Iris™ graphics is available on select systems. Consult your system manufacturer.

Intel, Intel Inside, the Intel logo, Intel Core and Iris are trademarks of Intel Corporation in the United States and other countries.


Agenda

- Introduction

- Tools

- Optimizations

- DirectX features

- Future Work

- Conclusion


Intel ® HD Graphics: Market Share

72.8% total GPU market

18.49% STEAM

23.9% Unity

Jon Peddie Research, Q3 2015 Steam HW Survey, Jan 2016 Unity HW Stats, Q4 2015

Millions of gamers with Intel ® HD Graphics equipped PCs


Introduction: Avalanche Studios

Avalanche Studios

- Founded in 2003

- Offices in Stockholm and New York

Games

- Just Cause 1, 2 and 3

- Mad Max

- theHunter, theHunter: Primal

- Renegade Ops

- Rumble City


Introduction: Just Cause 3

• Open world action-adventure game

• Developed by Avalanche Studios

• Published by Square Enix

• Released Dec 1, 2015

• Huge open world

• 1000 km2 or 400 square miles

• Advanced graphics technology


About this project

Small targeted development effort

- Collaboration with Intel®

- Focused on Intel® GPU performance optimizations

- DirectX features pioneered by Intel®

- Additional resources (from R&D, Engine etc.)

- Separate from JC3 mainline development



GPA: Just Cause 3 Analysis

Run with Intel® GPA:

- Live analysis: HUD / System Analyzer (is the game CPU limited or GPU limited?)

- Offline analysis, GPU limited: capture frame, then Frame Analyzer

- Offline analysis, CPU limited: capture trace, then Platform Analyzer


GPA: Just Cause 3 – Platform Analyzer view

[Screenshot callouts: frames, GPU queue, CPU threads, other metrics]


GPA: Just Cause 3 – Frame Analyzer view

[Screenshot callouts: custom view chart, render target overview, render target preview, RT & draw call (erg) selection & timings, detailed metrics]



Performance Optimizations: Low-Level ALU

Deferred Lighting Shader

- 6-8ms on Iris™ Pro 5200

- Very long shader, lots of math

- Lots of history

Low-Level ALU optimizations

- Tweaking the math to generate fewer instructions

- Low-Level Thinking in High-Level Shading Languages [Persson13]

- Low-Level Shader Optimization for Next-Gen and DX11 [Persson14]

- No changes to output


Performance Optimizations: Low-Level ALU

Remove a division, use MAD-form:

    Before: float k = 2.0f / sqrt(PI * (spec_power + 2.0f));        // add + mul + sqrt + rcp + mul
    After:  float k = rsqrt((0.25f*PI) * spec_power + (0.5f*PI));   // mad + rsqrt

Separate scalars and vectors:

    Before: return spec_color * spec_intensity * spec_mask;     // 6×mul (3+3)
    After:  return spec_color * (spec_intensity * spec_mask);   // 4×mul (1+3)

Precompute:

    Before:
    float3 Color = PointLights[index + 1].rgb;
    float HDRScale = PointLights[index + 1].w;
    Color *= HDRScale;                            // 3×mul

    After:
    float3 Color = PointLights[index + 1].rgb;    // no ALU ops
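The MAD-form rewrite relies on the identity 2 / sqrt(PI * (p + 2)) = rsqrt(0.25 * PI * p + 0.5 * PI). As a minimal CPU-side sketch in C++ (not the shader itself), the two forms can be compared numerically. The function names and the tested range of specular powers are assumptions made for illustration.

```cpp
#include <cmath>

constexpr float kPi = 3.14159265358979f;

// Original form: add + mul + sqrt + rcp + mul on the GPU.
inline float spec_norm_div(float spec_power) {
    return 2.0f / std::sqrt(kPi * (spec_power + 2.0f));
}

// MAD form: 0.25*PI and 0.5*PI are compile-time constants, leaving one
// multiply-add and one reciprocal square root (1/sqrt stands in for rsqrt).
inline float spec_norm_mad(float spec_power) {
    return 1.0f / std::sqrt((0.25f * kPi) * spec_power + (0.5f * kPi));
}

// Largest relative difference between the two forms over a range of powers.
inline float max_relative_diff() {
    float worst = 0.0f;
    for (float p = 1.0f; p <= 1024.0f; p *= 2.0f) {
        const float a = spec_norm_div(p);
        const float b = spec_norm_mad(p);
        worst = std::fmax(worst, std::fabs(a - b) / a);
    }
    return worst;
}
```

The two forms agree to within floating-point rounding, which is consistent with the "no changes to output" goal stated for these optimizations.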


Performance Optimizations: Low-Level ALU

Optimizing inputs

- 4x float4 for spotlights, partly packed

- LightDir stored in 2 floats + sign

- HDRScale and NearCap, 16bits each in a float

- Unpacking the packed

- Falloff scale and bias was 2 floats

- Compute falloff bias from scale (saved one float, added one ALU op)

- LightDir a full float3 (saves ~10 cycles of unpacking)

- HDRScale gone, NearCap gets entire float (saved 6 ALU ops of unpacking)

- Rearranged in access order

- Fewer fetches if branch not taken


Performance Optimizations: Low-Level ALU

Reuse intermediate results:

    Before:
    dist = dot(l_vec, l_vec) * InvRadSqr;   // mul + 2×mad + mul
    if (dist < 1.0f)
    {
        l_vec = normalize(l_vec);           // rsqrt + 3×mul
        dist *= rsqrt(dist);                // rsqrt + mul
        ...

    After:
    dist = dot(l_vec, l_vec);               // mul + 2×mad
    if (dist < RadiusSqr)
    {
        float rd = rsqrt(dist);
        l_vec *= rd; // normalize()         // rsqrt + 3×mul
        dist *= rd;                         // mul
        ...
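The reuse above works because a single reciprocal square root of the squared length gives both the normalized vector and the length itself, since |v| = |v|² · rsqrt(|v|²). A minimal C++ sketch of that idea follows; the `Vec3` and `NormalizedLight` types and the function name are hypothetical, not names from the shader.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

struct NormalizedLight {
    Vec3 dir;    // unit-length light vector
    float dist;  // length of the original vector
};

// One rsqrt feeds both results: dir = v * rd and dist = |v|^2 * rd.
inline NormalizedLight normalize_with_distance(Vec3 l_vec) {
    const float dist_sqr = dot(l_vec, l_vec);
    const float rd = 1.0f / std::sqrt(dist_sqr);  // stands in for HLSL rsqrt()
    return { { l_vec.x * rd, l_vec.y * rd, l_vec.z * rd }, dist_sqr * rd };
}
```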


Performance Optimizations: Low-Level ALU

Operation modifiers:

    Before (3×mul + 6×mad + 3×mul):
    float3 tex_proj;
    tex_proj = mat[0].xyz * light_vec.x;
    tex_proj += mat[1].xyz * light_vec.y;
    tex_proj += mat[2].xyz * light_vec.z;
    tex_proj *= float3(-1, -1, -1);

    After (3×mul + 6×mad):
    float3 tex_proj;
    tex_proj = mat[0].xyz * -light_vec.x;
    tex_proj += mat[1].xyz * -light_vec.y;
    tex_proj += mat[2].xyz * -light_vec.z;


Performance Optimizations: Low-Level ALU

Loop counters:

    Before (2×iadd / loop):
    // Point lights
    for (uint pl = 0; pl < pl_count; pl++) {
        uint index = LightIndices[light_index++];
        ...
    }

    // Spot lights
    for (uint sl = 0; sl < sl_count; sl++) {
        uint index = LightIndices[light_index++];
        ...
    }

    After (1×iadd / loop + 2×iadd):
    // Point lights
    uint end = light_index + pl_count;
    for (; light_index < end; ++light_index) {
        uint index = LightIndices[light_index];
        ...
    }

    // Spot lights
    end += sl_count;
    for (; light_index < end; ++light_index) {
        uint index = LightIndices[light_index];
        ...
    }
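To illustrate that the single-counter loops visit exactly the same entries, here is a small C++ sketch of both variants running on the CPU. The function names and the choice to collect visited indices in a `std::vector` are assumptions for demonstration only.

```cpp
#include <cstdint>
#include <vector>

// Original pattern: a loop counter plus a separate running index,
// so two integer adds per iteration.
inline std::vector<uint32_t> visit_two_counters(const std::vector<uint32_t>& light_indices,
                                                uint32_t pl_count, uint32_t sl_count) {
    std::vector<uint32_t> visited;
    uint32_t light_index = 0;
    for (uint32_t pl = 0; pl < pl_count; pl++)   // point lights
        visited.push_back(light_indices[light_index++]);
    for (uint32_t sl = 0; sl < sl_count; sl++)   // spot lights
        visited.push_back(light_indices[light_index++]);
    return visited;
}

// Optimized pattern: the running index is the loop counter, so one add
// per iteration plus one add per loop to compute the end value.
inline std::vector<uint32_t> visit_one_counter(const std::vector<uint32_t>& light_indices,
                                               uint32_t pl_count, uint32_t sl_count) {
    std::vector<uint32_t> visited;
    uint32_t light_index = 0;
    uint32_t end = light_index + pl_count;
    for (; light_index < end; ++light_index)     // point lights
        visited.push_back(light_indices[light_index]);
    end += sl_count;
    for (; light_index < end; ++light_index)     // spot lights
        visited.push_back(light_indices[light_index]);
    return visited;
}
```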


Performance Optimizations: Low-Level ALU

Share computations:

    Before:
    if (shadow > 0) {
        float2 rot = ExpensivePseudoRandom();
        shadow *= SampleShadow(..., rot);
        ...
    }

    float2 rot = ExpensivePseudoRandom();
    for (spotlights) {
        if (spot > 0) {
            if (shadow_caster) {
                shadow = SampleShadow(..., rot);
                ...
            }
        }
    }

    After:
    float2 rot = ExpensivePseudoRandom();

    if (shadow > 0) {
        shadow *= SampleShadow(..., rot);
        ...
    }

    for (spotlights) {
        if (spot > 0) {
            if (shadow_caster) {
                shadow = SampleShadow(..., rot);
                ...
            }
        }
    }


Performance Optimizations: Low-Level ALU

Pull computations out of the loop:

    Before, (2×mul + 4×mad)×16 (96 ops):
    for (int i = 0; i < 16; i++) {
        float2 offset;
        offset.x = kernel[i].x * rot.x + kernel[i].y * rot.y;
        offset.y = kernel[i].y * rot.x - kernel[i].x * rot.y;

        float2 tap = coord.xy + offset * scale;
        ...
    }

    After, (4×mad)×16 + 2×mul (66 ops):
    rot *= scale;

    for (int i = 0; i < 16; i++) {
        float2 tap;
        tap.x = coord.x + kernel[i].x * rot.x + kernel[i].y * rot.y;
        tap.y = coord.y + kernel[i].y * rot.x - kernel[i].x * rot.y;
        ...
    }
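Folding `scale` into `rot` before the loop is valid because the rotation is linear in `rot`. The following C++ sketch checks on the CPU that both loop bodies produce the same taps. The kernel contents, rotation, coordinate and scale values are made-up test inputs, and `Float2` is a hypothetical stand-in for HLSL `float2`.

```cpp
#include <cmath>

struct Float2 { float x, y; };

constexpr int kTaps = 16;

// Original: rotate each kernel offset, then scale it inside the loop.
inline void taps_in_loop(const Float2 (&kernel)[kTaps], Float2 rot, Float2 coord,
                         float scale, Float2 (&out)[kTaps]) {
    for (int i = 0; i < kTaps; ++i) {
        Float2 offset;
        offset.x = kernel[i].x * rot.x + kernel[i].y * rot.y;
        offset.y = kernel[i].y * rot.x - kernel[i].x * rot.y;
        out[i] = { coord.x + offset.x * scale, coord.y + offset.y * scale };
    }
}

// Optimized: scale the rotation once, outside the loop.
inline void taps_hoisted(const Float2 (&kernel)[kTaps], Float2 rot, Float2 coord,
                         float scale, Float2 (&out)[kTaps]) {
    rot.x *= scale;
    rot.y *= scale;
    for (int i = 0; i < kTaps; ++i) {
        out[i].x = coord.x + kernel[i].x * rot.x + kernel[i].y * rot.y;
        out[i].y = coord.y + kernel[i].y * rot.x - kernel[i].x * rot.y;
    }
}

// Largest per-component difference between the two variants on test inputs.
inline float max_tap_difference() {
    Float2 kernel[kTaps];
    for (int i = 0; i < kTaps; ++i) {
        const float radius = static_cast<float>(i) / kTaps;
        kernel[i] = { radius * std::cos(0.4f * i), radius * std::sin(0.4f * i) };
    }
    const Float2 rot = { 0.8f, 0.6f };
    const Float2 coord = { 0.25f, 0.75f };
    const float scale = 0.01f;
    Float2 a[kTaps], b[kTaps];
    taps_in_loop(kernel, rot, coord, scale, a);
    taps_hoisted(kernel, rot, coord, scale, b);
    float worst = 0.0f;
    for (int i = 0; i < kTaps; ++i) {
        worst = std::fmax(worst, std::fabs(a[i].x - b[i].x));
        worst = std::fmax(worst, std::fabs(a[i].y - b[i].y));
    }
    return worst;
}
```

The results can differ in the last bits because the order of floating-point operations changes, so the comparison uses a tolerance rather than exact equality.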


Performance Optimizations: Low-Level ALU

Game had configurable attenuation curve

- Not really used. Only one curve existed.

- Using ALU saved 0.2ms on Iris™ Pro 5200.

- Small script to brute-force match a set of functions and parameters

- Picked the best match

- ~1% error

ALU instead of lookup table:

    Before: atten = Falloff.SampleLevel(Samp, dist, 0.0f);                             // sample_l
    After:  atten = saturate((1.0f - dist) / (dist * dist * 12.21f + 1.0f));          // 2×mul + mad + add + rcp
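A brute-force fit of this kind can be sketched in C++ as follows. The slides do not give the contents of the original lookup table, so `reference_curve` below is a synthetic stand-in generated from the final formula; the search range, step size and sample count are likewise assumptions. The sketch only searches one parameter of one function family, whereas the script described above matched a set of functions.

```cpp
#include <algorithm>
#include <cmath>

// ALU falloff from the slide: saturate((1 - d) / (d*d*k + 1)), with k = 12.21.
inline double attenuation(double dist, double k = 12.21) {
    const double a = (1.0 - dist) / (dist * dist * k + 1.0);
    return std::min(std::max(a, 0.0), 1.0);  // saturate()
}

// Hypothetical stand-in for the lookup texture (the real table is not in the slides).
inline double reference_curve(double dist) {
    return attenuation(dist, 12.21);
}

// Brute-force search over k in [0, 20] in steps of 0.01, minimizing the
// maximum absolute error against the reference at 65 sample points.
inline double fit_k() {
    double best_k = 0.0;
    double best_err = 1e30;
    for (int i = 0; i <= 2000; ++i) {
        const double k = i * 0.01;
        double err = 0.0;
        for (int s = 0; s <= 64; ++s) {
            const double d = s / 64.0;
            err = std::max(err, std::fabs(attenuation(d, k) - reference_curve(d)));
        }
        if (err < best_err) {
            best_err = err;
            best_k = k;
        }
    }
    return best_k;
}
```

Minimizing the maximum error rather than the mean is one possible choice; it bounds the worst-case deviation, which matches how the "~1% error" figure is naturally read.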


Performance Optimizations: Low-Level ALU

Deferred Lighting Shader

- 4-5.5ms on Iris™ Pro 5200

- About 2ms saved (depending on scene)

- Potential for saving more


Performance Optimizations: GPU gaps

Learning: Regularly check CPU/GPU concurrency to avoid surprises


Performance Optimizations: Instancing

Solution

- Instancing support added to common materials

- Drastic reduction in number of draw calls

- Reduced constant buffer updates

- Removed lots of unused constants

- Removed debug constants, tweak variables etc.



Performance Optimizations: Manual Instancing

Tree Impostors

- Many instances, tiny mesh. (4 vertices, 6 indices)

- Standard instancing implementation

- DrawIndexedInstanced(6, num_instances, 0, 0, 0);

- Poor wavefront occupancy

Manual Instancing optimization

- Draw as regular indexed mesh

- DrawIndexed(6 * num_instances, 0, 0);

- Immutable index buffer of MAX_INSTANCES * 6

- Manually fetch data from texture buffer


Performance Optimizations: Manual Instancing

    Before:
    v2p main(a2v In, uint VertexID: SV_VertexID) {
        ...
    }

    After:
    struct InputData {
        float Elevation;
        float2 Data;
    };
    StructuredBuffer<InputData> Insts;

    v2p main(uint VertexID: SV_VertexID) {
        uint InstanceID = VertexID >> 2;
        VertexID = VertexID & 0x3;

        // Manually fetch vertex data
        a2v In;
        In.Elevation = Insts[InstanceID].Elevation;
        uint2 prt = asuint(Insts[InstanceID].Data);
        In.Data.x = int(prt.x & 0xFFFF) * scale;
        In.Data.y = int(prt.x >> 16) * scale;
        In.Data.z = int(prt.y & 0xFFFF) * scale;
        In.Data.w = int(prt.y >> 16) * scale;
        ...
    }
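The shader's `VertexID >> 2` and `VertexID & 0x3` only work if the immutable index buffer stores, for each quad, indices of the form `instance * 4 + corner`. A possible CPU-side construction of that buffer is sketched below in C++. The corner order (0, 1, 2, 2, 1, 3) and the function names are assumptions; the slides specify only the buffer size of MAX_INSTANCES * 6.

```cpp
#include <cstdint>
#include <vector>

// Builds 6 indices per quad, each referencing one of 4 consecutive vertex IDs.
// The corner order used here is an assumption, not taken from the slides.
inline std::vector<uint32_t> build_quad_indices(uint32_t max_instances) {
    static const uint32_t corners[6] = { 0, 1, 2, 2, 1, 3 };
    std::vector<uint32_t> indices;
    indices.reserve(6u * max_instances);
    for (uint32_t inst = 0; inst < max_instances; ++inst)
        for (uint32_t c : corners)
            indices.push_back(inst * 4u + c);
    return indices;
}

struct DecodedVertex {
    uint32_t instance_id;
    uint32_t corner;
};

// Mirrors the shader: InstanceID = VertexID >> 2, corner = VertexID & 0x3.
inline DecodedVertex decode_vertex_id(uint32_t vertex_id) {
    return { vertex_id >> 2, vertex_id & 0x3u };
}
```

Because the buffer depends only on the maximum instance count, it can be created once and reused for every draw, with `6 * num_instances` passed as the index count.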


Performance Optimizations: Manual Instancing

Manual Instancing

- 2.4ms before, 0.7ms after, on Iris™ Pro 5200


Performance Optimizations: Vegetation stalls


Performance Optimizations: Stencil stalls

Learning: If vegetation rendering seems abnormally long, try disabling stencil writes. If the rendering speeds up significantly, you are impacted.



Performance Optimizations: Forest Layer

Forest Layer

- Lowest LOD tree representation

- Provides forest silhouette in distance

- Alpha texture filling in detail

Dense grid mesh

- 129x129 per patch

- 5ms in some scenes

- Stencil writes enabled, but disabling didn’t help


Performance Optimizations: Forest Layer

Optimization

- Mostly Vertex Bound

- Mesh optimizations

- Added 65x65 and 33x33 LODs

- Large reduction in total vertices shaded

- Small visual difference. High settings mostly use highest LODs.

- Packed vertex format (2 floats → 2 shorts)

- 16bit index buffer
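The effect of the added LODs and the 16-bit index buffer can be checked with simple grid arithmetic. The C++ sketch below assumes each patch is a regular square grid drawn as two triangles per cell; the `GridStats` type and function name are hypothetical.

```cpp
#include <cstdint>

struct GridStats {
    uint32_t vertices;
    uint32_t triangles;
    bool fits_16bit_indices;
};

// Vertex and triangle counts for a side x side vertex grid.
inline GridStats grid_stats(uint32_t side) {
    const uint32_t verts = side * side;
    const uint32_t tris = 2u * (side - 1u) * (side - 1u);
    // 16-bit indices address vertices 0..65535; the limit here is kept at
    // 65535 vertices so that 0xFFFF stays free as the strip-cut index.
    return { verts, tris, verts <= 65535u };
}
```

Under these assumptions a 129x129 patch has 16,641 vertices and 32,768 triangles, a 65x65 patch has 4,225 vertices, and a 33x33 patch has 1,089, so every LOD fits in a 16-bit index buffer and each LOD step cuts the vertex count to roughly a quarter.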


Performance Optimizations: Forest Layer

Optimization

- Shader optimizations

- Added a simpler “no-fade” vertex shader, used by most patches

- Pre-computations

- Prebaked scaling into the world matrix

- Folded constants

- Handful of low-level optimizations

- Simplified math


Performance Optimizations: Forest Layer

Results

- Good performance gain

- Down from 5.0ms to 2.5ms, Iris™ Pro 5200

- Revisited disabling stencil writes

- Down to 0.5ms (!!)

- Revisited triangle strips

- Down to 0.4ms

- More than an order of magnitude faster in the end!


Performance Optimizations: Low/normal settings

Optimize lower graphical settings

- Shadow size culling

- Made dependent on shadow buffer size

- Disabled cloud shadows for low shadow settings

- Velocity buffer rendering

- Disabled when motion blur and temporal AA is disabled

- Disabled planar reflection pass when screen-space reflection is enabled


Performance Optimizations: Buffer Clears

G-Buffers

- Cleared to (0.5f, 0.5f, 1.0f, 1.0f) for historical reasons

- Now clears to (0, 0, 0, 0), or skipped entirely

- Still clearing for SLI / CrossFire

- Screen space reflections

- Only clear when enabled and needed


Performance Optimizations: Shadows

Shadow Cascades

- 4 sun shadow cascades

- Scattered update pattern

- 2 cascades / frame, cycled over 8 frames

- Saves many milliseconds

- Problematic for camera flipping (shadows pop in over a few frames)

- Center outer cascade on camera

- Keeps shadow behind player (in theory)

- Have to disable frustum culling for outer cascade

- Many milliseconds lost

- Lost shadow range


Performance Optimizations: Shadows

3,747.0


Performance Optimizations: Shadows

Solution

- Revert to previous outer cascade

- Restores lost milliseconds

- Restores lost shadow range

- Reset refresh cycle on camera flip

- Outer cascade always gets updated first frame after flip

- Added resolution dependent size culling
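One way a scattered update with a reset on camera flip could look is sketched below in C++. The slides state only that 2 of the 4 cascades are updated per frame over an 8-frame cycle and that the outer cascade is updated on the first frame after a flip; the specific distribution used here (cascade 0 every frame, cascade 1 every second frame, cascades 2 and 3 twice per cycle) is a hypothetical choice, as are all names.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 8-frame schedule: two of the four cascades per frame.
// Cascade 0 is the innermost, cascade 3 the outermost.
inline std::array<int, 2> cascades_for_frame(uint32_t cycle_frame) {
    const uint32_t f = cycle_frame % 8u;
    int second;
    if (f % 2u == 1u)
        second = 1;   // frames 1, 3, 5, 7
    else if (f % 4u == 0u)
        second = 3;   // frames 0 and 4: outer cascade
    else
        second = 2;   // frames 2 and 6
    return { 0, second };
}

struct CascadeScheduler {
    uint32_t cycle_frame = 0;

    // Restarting the cycle on a camera flip means the outer cascade is
    // refreshed on the first frame after the flip.
    std::array<int, 2> next(bool camera_flipped) {
        if (camera_flipped)
            cycle_frame = 0;
        return cascades_for_frame(cycle_frame++);
    }
};
```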


Performance Optimizations: Shadows

718.0


Performance Optimizations: float3x4

Convert float4[] to float3x4[]:

    Before:
    float4 MatrixPalette[2 /*(SIZE*3)*/];

    index *= 3;
    float3x4 mat = float3x4(
        MatrixPalette[index    ],
        MatrixPalette[index + 1],
        MatrixPalette[index + 2]);

    float3 s_pos = mul(mat, pos);
    ...

    Compiled output (before):
    imad r2.xy, v1.xx, l(3, 3), l(1, 2)
    dp4 r0.x, cb0[r0.x + 9], r1
    dp4 r0.y, cb0[r2.x + 9], r1
    dp4 r0.z, cb0[r2.y + 9], r1
    ...

    After:
    float3x4 MatrixPalette[2 /*SIZE*/];

    float3x4 mat = MatrixPalette[index];

    float3 s_pos = mul(mat, pos);
    ...

    Compiled output (after):
    imul null, r0.x, v1.x, l(3)
    dp4 r2.x, cb0[r0.x + 9], r1
    dp4 r2.y, cb0[r0.x + 10], r1
    dp4 r2.z, cb0[r0.x + 11], r1
    ...


Performance Optimizations: Terrain optimization


Solution

- Terrain system continuously developed

- New system was in the build, but disabled by default

- Saved around 1-2 milliseconds depending on scene

- Unstable on some drivers

- Detect old drivers and fall back to previous system



Performance Optimizations: Misc

Misc optimizations

- CPU vs. GPU performance very different on Intel vs. the consoles

- Moved some work back to CPU

- Shorter shader, more computations for CPU

- Better culling

- When all waterboxes are culled, we could save a render pass


Performance Optimizations: Final Results

Performance benefits: Rendering time* (ms)

            Car scene   City scene   Sky scene
    Before  51          59           59
    After   27          32           28
    Delta   24 ms       27 ms        31 ms

Performance benefits: Real performance* (ms) – impact of power

                                  Car scene   City scene   Sky scene
    Average frame time (static)   27          32           28
    Average frame time (dynamic)  30          35           30

*Measured on a 5th gen Core™ i7 with Iris™ Pro Graphics 6200 @ 1366x768, Medium settings



Conservative Rasterization

Light assignment using CR [Örtegren16]

- Shell pass

- Lights as low-res meshes

- CR to touch all affected clusters

- Allows arbitrary convex light shapes

- “Perfect clustering”

- MIN blending resolves depth range

- Fill pass

- Writes results to cluster light lists


Conservative Rasterization

Light list

- Max 256 lights per type

- JC3: Tightly packed list of light indexes [ 2, 7, 12, 38 … ]

- [Örtegren16]: Linked list (2, next)→ (7, next)→ …

- New approach: Bitfield 001000010000100000 …

- Performance

- Bitfield: Faster under heavy load (0.5ms), slower under light load (-0.2ms)

- Light assignment (LA): 0.1 - 0.3ms cost. Shading: 0 - 3ms saved. (6th gen Core™ with HD Graphics 520)

- Shell pass independent of depth slice count

- Can scale to higher slice count
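A bitfield light list for up to 256 lights per type fits in eight 32-bit words per cluster. The C++ sketch below shows one possible way to set and enumerate bits; the type and function names are hypothetical, and the inner scan loop stands in for a hardware bit-scan such as HLSL's `firstbitlow`.

```cpp
#include <array>
#include <cstdint>
#include <vector>

constexpr uint32_t kMaxLights = 256;
using LightMask = std::array<uint32_t, kMaxLights / 32>;  // 8 words per cluster and light type

// Marks a light as affecting this cluster.
inline void add_light(LightMask& mask, uint32_t light) {
    mask[light >> 5] |= 1u << (light & 31u);
}

// Enumerates the set bits in ascending order, as a shading loop would.
inline std::vector<uint32_t> lights_in(const LightMask& mask) {
    std::vector<uint32_t> out;
    for (uint32_t w = 0; w < mask.size(); ++w) {
        uint32_t bits = mask[w];
        while (bits != 0u) {
            uint32_t bit = 0;
            while (((bits >> bit) & 1u) == 0u)
                ++bit;                 // index of the lowest set bit
            out.push_back(w * 32u + bit);
            bits &= bits - 1u;         // clear the lowest set bit
        }
    }
    return out;
}
```

The storage cost is fixed at 32 bytes per cluster and light type regardless of how many lights are present, which is one plausible reading of why the bitfield does relatively better under heavy load than under light load.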


UAVs & Rasterizer Ordered Views 101

• The DX API specifies “in order” processing rules

• UAVs enable arbitrary R/W memory ops from a pixel shader…

… but no ordering of data input…

[Timeline diagram: "shade fragment from 1st triangle" (r/m/w) and "shade fragment from 2nd triangle" (r/m/w) overlap in time, causing a data race, e.g. programmable blending]

[Timeline diagram, second case: the order of the two r/m/w operations is not deterministic, e.g. programmable blending]

• ROV is a DX12 feature that guarantees primitive order for R/M/W operations, which:

  • Avoids data races

  • Ensures deterministic ordering

[Timeline diagram: the fragment from the 2nd triangle waits until the fragment from the 1st triangle has finished its r/m/w, so the data is safe]
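Why ordering matters for programmable blending can be shown with a small C++ sketch: a read/modify/write "over" blend is not commutative, so applying the same two fragments in a different order gives a different pixel. This is a CPU illustration of the ordering problem only, not of how ROVs are declared or used in a shader; the types and values are made up.

```cpp
struct Color { float r, g, b; };

struct Fragment {
    Color color;
    float alpha;
};

// Read/modify/write "over" blend of one fragment onto the destination.
inline Color blend_over(Color dst, const Fragment& src) {
    const float inv = 1.0f - src.alpha;
    return { src.color.r * src.alpha + dst.r * inv,
             src.color.g * src.alpha + dst.g * inv,
             src.color.b * src.alpha + dst.b * inv };
}

// Applies two fragments in the given order.
inline Color composite(Color background, const Fragment& first, const Fragment& second) {
    return blend_over(blend_over(background, first), second);
}
```

With a half-transparent red fragment and a half-transparent blue fragment over black, red-then-blue yields (0.25, 0, 0.5) while blue-then-red yields (0.5, 0, 0.25). A guaranteed primitive order makes the result the same every frame.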


Order-Independent Transparency

• Correct compositing, rendering foliage & fences with zero aliasing!

• Raster Ordered Views enable a new approach:

  • Single geometry pass and fixed memory requirements

  • Stable and predictable performance

  • Scalable: easily trade off image quality for performance/memory

Correct render order

Differentcorrect render order

Page 56: More explosions, more chaos, and definitely more blowing stuff up


Order-Independent Transparency

• Store Visibility Function as a sorted fixed-size array of nodes, in a UAV surface

• Sort N Layers, blend furthest fragments

• Use more or fewer layers to trade off image quality for perf/memory

Sample code: https://software.intel.com/en-us/articles/oit-approximation-with-pixel-synchronization-update-2014

New fragment insertion

Blending of furthest fragments
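The two steps named above (new fragment insertion, then blending of the furthest fragments) can be sketched on the CPU as follows. This is a simplified illustration under stated assumptions, not the code of the linked Intel sample: the `Fragment` and `PixelLayers` types, the layer budget `kMaxLayers`, and the rule of always merging the two furthest nodes with an "over" blend are all hypothetical choices made for this sketch. In a real pixel shader the per-pixel array would live in a UAV accessed through an ROV.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical per-pixel node: depth plus premultiplied color and alpha.
struct Fragment { float depth; float r, g, b, a; };

constexpr std::size_t kMaxLayers = 4;  // assumed layer budget N

struct PixelLayers {
    // One spare slot holds the incoming fragment before merging.
    std::array<Fragment, kMaxLayers + 1> nodes{};
    std::size_t count = 0;
};

// Composite `front` over `back` (premultiplied alpha), keeping the nearer depth.
inline Fragment mergeOver(const Fragment& front, const Fragment& back) {
    const float k = 1.0f - front.a;
    return { front.depth, front.r + back.r * k, front.g + back.g * k,
             front.b + back.b * k, front.a + back.a * k };
}

// Inserts a fragment keeping nodes sorted near-to-far. If the budget is
// exceeded, the two furthest nodes are blended into a single node.
inline void insertFragment(PixelLayers& px, const Fragment& f) {
    assert(px.count <= kMaxLayers);
    std::size_t i = px.count;
    while (i > 0 && px.nodes[i - 1].depth > f.depth) {  // shift farther nodes back
        px.nodes[i] = px.nodes[i - 1];
        --i;
    }
    px.nodes[i] = f;
    ++px.count;
    if (px.count > kMaxLayers) {
        px.nodes[kMaxLayers - 1] =
            mergeOver(px.nodes[kMaxLayers - 1], px.nodes[kMaxLayers]);
        px.count = kMaxLayers;
    }
}
```

Memory per pixel stays fixed at `kMaxLayers` nodes no matter how many transparent fragments arrive, which is where the fixed memory requirement comes from. Raising `kMaxLayers` delays the lossy merge and so improves quality at the cost of memory and bandwidth.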

Page 57: More explosions, more chaos, and definitely more blowing stuff up


Agenda

- Introduction

- Tools

- Optimizations

- DirectX features

- Future Work

- Conclusion

Page 58: More explosions, more chaos, and definitely more blowing stuff up


Future Work: Frame time variance

[Chart: JC3 frame time variation over 2 minutes of gameplay. X axis: frame number (1 to 3979). Y axis: frame time (ms), 0 to 50. Legend: "Frame time - 10 frames moving average", "PW = 33.3ms".]
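A 10-frame moving average like the one named in the chart legend can be computed as in the sketch below. The function names and the choice of a trailing window are assumptions for illustration; the talk does not say how its curve was produced. The second helper counts frames above a budget such as the 33.3 ms line, which is the frame time of a 30 fps target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Trailing moving average over `window` frames. Entry i averages the last
// `window` frames ending at i; early entries average the frames seen so far.
inline std::vector<double> movingAverage(const std::vector<double>& frameTimesMs,
                                         std::size_t window) {
    assert(window > 0);
    std::vector<double> out;
    out.reserve(frameTimesMs.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < frameTimesMs.size(); ++i) {
        sum += frameTimesMs[i];
        if (i >= window) sum -= frameTimesMs[i - window];  // drop the oldest frame
        const std::size_t n = (i + 1 < window) ? i + 1 : window;
        out.push_back(sum / static_cast<double>(n));
    }
    return out;
}

// Counts frames whose time exceeds a budget, e.g. 33.3 ms.
inline std::size_t framesOverBudget(const std::vector<double>& frameTimesMs,
                                    double budgetMs) {
    std::size_t n = 0;
    for (double t : frameTimesMs) {
        if (t > budgetMs) ++n;
    }
    return n;
}
```

Comparing the raw series with `movingAverage(times, 10)` separates isolated spikes from sustained slow stretches, which is the distinction a frame time variance study needs.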

Page 59: More explosions, more chaos, and definitely more blowing stuff up


Future Work: Dynamic resolution rendering

• Idea: for the most intense scenes, lower the rendering resolution

• Based on an Intel sample:

https://software.intel.com/en-us/articles/dynamic-resolution-rendering-sample

if (frametime > max_allowed_frametime && render_target_size != min_RT_size)
    render_target_size--;

if (frametime < min_allowed_frametime && render_target_size != max_RT_size)
    render_target_size++;
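The two conditions above can be wrapped into a small controller, sketched here under assumptions: the `DynamicResolution` struct, its field names, and the idea that `renderTargetSize` is a step index into a list of render-target sizes are all hypothetical and not taken from the Intel sample.

```cpp
#include <cassert>

// Hypothetical controller state. Sizes are step indices, not pixel counts.
struct DynamicResolution {
    int   renderTargetSize;    // current step, in [minSize, maxSize]
    int   minSize, maxSize;
    float minAllowedFrameMs;   // below this, resolution may go up
    float maxAllowedFrameMs;   // above this, resolution must go down
};

// Call once per frame with the last measured frame time.
inline void updateResolution(DynamicResolution& s, float frameMs) {
    assert(s.minSize <= s.renderTargetSize && s.renderTargetSize <= s.maxSize);
    assert(s.minAllowedFrameMs <= s.maxAllowedFrameMs);
    if (frameMs > s.maxAllowedFrameMs && s.renderTargetSize != s.minSize)
        --s.renderTargetSize;
    if (frameMs < s.minAllowedFrameMs && s.renderTargetSize != s.maxSize)
        ++s.renderTargetSize;
}
```

Keeping a gap between the two thresholds gives a dead band in which the resolution does not change, so the controller does not oscillate between two sizes when the frame time sits near the target.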

Page 60: More explosions, more chaos, and definitely more blowing stuff up


Future Work: G-buffer blending

• To apply tire skid marks, bullet holes or explosions!

• Same principle as AOIT

– Render your G-Buffer

– Take a normal map of a decal

– Blend it with the G-Buffer

– Result will be a correctly mapped bullet hole

• Prototyped in JC3

• Requires alpha-blendable decals
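The blend step for the normal channel might look like the sketch below. It is only an illustration: the `Vec3` type and `blendNormal` function are hypothetical, both normals are assumed to be unit length and expressed in the same space, and the renormalization after the linear blend is an assumption, not something stated in the talk.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Blends a decal normal over the normal already stored in the G-buffer,
// weighted by the decal's alpha, then renormalizes the result.
inline Vec3 blendNormal(const Vec3& gbufferN, const Vec3& decalN, float alpha) {
    assert(alpha >= 0.0f && alpha <= 1.0f);
    Vec3 n{ gbufferN.x + (decalN.x - gbufferN.x) * alpha,
            gbufferN.y + (decalN.y - gbufferN.y) * alpha,
            gbufferN.z + (decalN.z - gbufferN.z) * alpha };
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len > 0.0f) {  // opposite normals at alpha 0.5 cancel out; leave as-is
        n.x /= len;
        n.y /= len;
        n.z /= len;
    }
    return n;
}
```

An alpha of 0 leaves the G-buffer normal untouched and an alpha of 1 replaces it with the decal normal, which is why the decal needs a usable alpha channel.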

Page 61: More explosions, more chaos, and definitely more blowing stuff up


Agenda

- Introduction

- Tools

- Optimizations

- DirectX features

- Future Work

- Conclusion

Page 62: More explosions, more chaos, and definitely more blowing stuff up


Conclusion

• Even the most demanding titles, such as JC3, can run on Iris graphics

• Feature-wise, integrated graphics are now on par with discrete

• Focused optimizations can bring terrific improvements …

• … you have tools to help you …

• … and it is definitely worth it

Page 63: More explosions, more chaos, and definitely more blowing stuff up


References

[Persson13] Low-Level Thinking in High-Level Shading Languages, GDC 2013 presentation. http://humus.name/index.php?page=Articles&ID=6

[Persson14] Low-Level Shader Optimization for Next-Gen and DX11, GDC 2014 presentation. http://humus.name/index.php?page=Articles&ID=9

[Örtegren16] Clustered Shading: Assigning Lights Using Conservative Rasterization in DirectX 12. GPU Pro 7.

Page 64: More explosions, more chaos, and definitely more blowing stuff up