Jeff Rous – Graphics Software Engineer, Intel
Twitter: @jeff_rous
Optimization Deep Dive: Unreal Engine 4 on Intel
Intel Software – Developer Relations Division Intel Confidential
Overview
• Rationale
• Intel Graphics Roadmap/Details
• How We Measured
• Common Pain Points
• Shader Optimizations
• Optimizing for DX12
• VR Tips and Tricks
• Android x86/x64 and ASTC Support
Why Work Together?
• Benefits all games that use the engine
• UE4 runs on more hardware
• Intel holds an 18% GPU share, and 4 of the 10 most popular GPUs are Intel (Steam)
• Optimizations help everyone, from high end to phone
Common goals
• APIs like DX12 and Vulkan are going to power tomorrow’s games
• Virtual reality is an important new segment
• Android is a large market and key for Epic and Intel
Intel® HD Graphics: Roadmap
Sandy Bridge (2011)
Intel® 2nd Gen Core™ Processor
• 32nm
• Feature Level 10.1
• Up to 12 EUs
Ivy Bridge (2012)
Intel® 3rd Gen Core™ Processor
• 22nm
• Feature Level 11.0
• Up to 16 EUs
Haswell (2013)
Intel® 4th Gen Core™ Processor
• Feature Level 11.1
• DX Extensions
• GT3 (40 EUs)
• EDRAM
• Iris Pro™, Iris™ brands
Broadwell (2014)
Intel® 5th Gen Core™ Processor
• 14nm
• Feature Level 11.2
• Up to 48 EUs
Skylake (2015-16)
Intel® 6th Gen Core™ Processor
• Feature Level 12.0
• GT4 (72 EUs)
• GT3e 15/28W
• DX12 HW
Up to 30X faster graphics over the last 5 years
Intel® HD Graphics: EDRAM
Basic facts
• Located on the same package as the CPU
• 64-128MB
• Bandwidth – 50 GB/sec each way (100 GB/sec total)
• Acts as a 4th-level cache
• Just works: no API is required to take advantage of it
Bandwidth saving
• Increasing compute requires more bandwidth
• EDRAM helps reduce bandwidth consumption and improve EU efficiency
• Just works, but efficiency can be improved by re-using frame data
[Diagram: CPU package containing an Intel 6th Gen Core™ chip (four CPU cores, the Gfx core, and the LL$ connected by a ring bus) and EDRAM, with system memory alongside]
How We Measured – Intel GPA
• Use the ToggleDrawEvents command
• Frame debugging and live mode
• Experiment!
How We Measured
• ProfileGPU command
• Stat commands
• Windows Performance Analyzer
• Intel Extreme Tuning Utility
Intel Pain Points – Memory Bandwidth
• Memory bandwidth is at a premium with integrated graphics
• G-buffers are memory hungry. UE4 is configurable: you can change the format, and eliminate or even combine channels
• Scaling down the resolution of the G-buffers helps, up to a point
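To get a feel for why G-buffer format and resolution matter, the following is a rough back-of-the-envelope sketch, not UE4 code. The function name `gbufferBytes`, the 16 bytes-per-pixel layout, and the percentage-style scale are all assumptions chosen for illustration. It only counts one full write of the G-buffer and ignores reads, depth, and overdraw.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to write every G-buffer texel once.
// bytesPerPixel: sum over all G-buffer render targets (hypothetical layout).
// scalePercent:  per-axis resolution scale, 1..100 (hypothetical knob).
std::uint64_t gbufferBytes(std::uint32_t width, std::uint32_t height,
                           std::uint32_t bytesPerPixel,
                           std::uint32_t scalePercent) {
    assert(scalePercent > 0 && scalePercent <= 100);
    // Round each scaled dimension to the nearest pixel with integer math.
    const std::uint64_t w =
        (static_cast<std::uint64_t>(width) * scalePercent + 50) / 100;
    const std::uint64_t h =
        (static_cast<std::uint64_t>(height) * scalePercent + 50) / 100;
    return w * h * bytesPerPixel;
}
```

With these assumed numbers, `gbufferBytes(1920, 1080, 16, 100)` is 33,177,600 bytes, while a 70% scale gives 16,257,024 bytes, roughly half the traffic, because the scale applies to both axes.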
Intel Pain Points – Dense Geometry
• With sub-pixel triangles or very dense meshes, vertex shader execution can’t be covered by pixel shader execution, which starves the hardware. Use LODs where possible
• The clipper can become a bottleneck in the worst cases. At the very least, frustum-cull on bounding boxes
• Use occlusion culling for hidden objects
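Frustum culling on bounding boxes can be sketched as a standard plane test. This is a generic illustration rather than UE4's own culling code; the types `Vec3`, `Plane`, and `Aabb` and both function names are hypothetical. Planes are assumed to point inward, so a point is inside when `dot(n, p) + d >= 0`.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };   // inside when dot(n, p) + d >= 0
struct Aabb { Vec3 lo, hi; };        // axis-aligned box, lo <= hi per axis

// True when the whole box lies on the outside of the plane.
bool isFullyOutside(const Aabb& box, const Plane& p) {
    // Pick the box corner that lies farthest along the plane normal.
    const Vec3 v{ p.n.x >= 0.0f ? box.hi.x : box.lo.x,
                  p.n.y >= 0.0f ? box.hi.y : box.lo.y,
                  p.n.z >= 0.0f ? box.hi.z : box.lo.z };
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0.0f;
}

// Conservative test: may keep a few boxes that are actually outside,
// but never rejects a box that intersects the frustum.
bool mayBeVisible(const Aabb& box, const std::array<Plane, 6>& frustum) {
    assert(box.lo.x <= box.hi.x && box.lo.y <= box.hi.y &&
           box.lo.z <= box.hi.z);
    for (const Plane& p : frustum) {
        if (isFullyOutside(box, p)) return false;
    }
    return true;
}
```

Rejecting a whole mesh with six plane tests keeps its vertices away from the vertex shader and the clipper entirely, which is the point of the advice above.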
A Word About Power
• Intel graphics typically ship in low-power systems, where the CPU and GPU share one power budget
• Less CPU usage leaves more power for graphics
Shaders – Local Memory
• Cache lines are 64 bytes, so loops benefit a great deal from unrolling
• Avoid small loads in tight loops
Shaders – Unused Attributes
• Shaders are often bound with large structures full of constants that go unused. This is not cache friendly
• Depth passes are especially bad: they output values that a null pixel shader never uses
• In UE4, make use of r.ShaderPipelines for depth passes
• In DX12, make liberal use of the DENY_*_ACCESS root signature flags to limit resource-shader visibility
Shaders – Branching and Sampling
• Using lots of temporaries can starve the hardware
• Branching is expensive if loads are inside the conditional blocks
• Group loads as early in the shader as possible to help cover latency
Demo – DX12 Driver Metrics
DX12 Performance – Fast Clear
• Specify the optional D3D12_CLEAR_VALUE when calling CreateCommittedResource
• Intel hardware has a fast clear path for 1-bit clear values, where every component is 0 or 1, e.g. (1,0,1,0)
• When clearing, use the color specified up front for maximum performance
• ~9% performance gain on the Elemental demo on DX12!
• In the engine today
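The two conditions above can be written down as a small check. This is a sketch that models the guidance on this slide, not actual driver behavior, and it does not call the D3D12 API. `ClearColor`, `isOneBitClearColor`, and `expectsFastClear` are hypothetical names, and the reading "every component is 0 or 1" is inferred from the (1,0,1,0) example.

```cpp
#include <array>
#include <cassert>

using ClearColor = std::array<float, 4>;  // RGBA

// True when every component is exactly 0 or 1, as in (1,0,1,0).
bool isOneBitClearColor(const ClearColor& c) {
    for (float v : c) {
        if (v != 0.0f && v != 1.0f) return false;
    }
    return true;
}

// optimized: the color given up front at resource creation.
// requested: the color passed when the render target is cleared.
// Both conditions from the slide must hold (assumed model, see lead-in).
bool expectsFastClear(const ClearColor& optimized,
                      const ClearColor& requested) {
    return isOneBitClearColor(optimized) && optimized == requested;
}
```

A check like this could sit in a debug build to flag clears whose color differs from the one declared when the resource was created.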
DX12 Performance – Root Signature
Blueprint of the resources available
• Root constants
• Root descriptors
• Descriptor tables
Constants that sit directly in the root are copied to each invocation of the shader (pushed) rather than read from memory when used (pulled)
• Can significantly speed up shader execution
• Handled automatically by the driver in DX11
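Root space is limited, so it helps to budget which constants go directly in the root. D3D12 caps a root signature at 64 DWORDs: each 32-bit root constant costs 1 DWORD, each root descriptor 2 DWORDs, and each descriptor table 1 DWORD. The sketch below only does that arithmetic; `RootLayout` and the function names are hypothetical and nothing here calls the D3D12 API.

```cpp
#include <cassert>
#include <cstdint>

// Counts of each root parameter kind in a planned root signature.
struct RootLayout {
    std::uint32_t rootConstants;     // number of 32-bit root constants
    std::uint32_t rootDescriptors;   // CBV/SRV/UAV placed directly in the root
    std::uint32_t descriptorTables;  // tables pointing into a descriptor heap
};

constexpr std::uint32_t kMaxRootDwords = 64;  // D3D12 root signature limit

// Total root signature size in DWORDs.
constexpr std::uint32_t rootSignatureDwords(const RootLayout& l) {
    return l.rootConstants * 1u + l.rootDescriptors * 2u +
           l.descriptorTables * 1u;
}

constexpr bool fitsInRootSignature(const RootLayout& l) {
    return rootSignatureDwords(l) <= kMaxRootDwords;
}
```

For example, a layout with 16 root constants, 2 root descriptors, and 3 descriptor tables uses 23 DWORDs, leaving room to push more frequently read constants.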
VR Tips and Tricks
• Simple techniques to take advantage of an under-utilized resource, the CPU!
• Easily adds realism to your VR scenes without much incremental GPU work
• Min spec is defined for high-end VR
• Effects can be scaled up easily through Blueprints
VR Tips and Tricks - Destruction
• Simulates dynamic fracturing of meshes into smaller pieces
• A typical destruction workload is a few seconds of heavy simulation followed by a return to the baseline
• Better CPUs can keep pieces around longer and fracture more, for more realism
VR Tips and Tricks - Cloth
• Dynamic mesh simulation that responds to the player, wind or other environmental factors
• Typical cloth workloads include player capes or flags
• Simulated every frame
• Easy to scale: more cloth systems means more CPU usage
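One way to picture scaling destruction and cloth with the CPU is a tier lookup keyed on logical core count. This is purely an illustrative sketch: the talk scales effects through Blueprints, and the struct, the function name, the thresholds, and every number below are hypothetical values, not recommendations from Intel or Epic.

```cpp
#include <cassert>

// Budget for CPU-driven effects (all values are made-up examples).
struct CpuEffectBudget {
    float debrisLifetimeSeconds;  // how long fractured pieces stay around
    int   maxClothSystems;        // how many cloth systems to simulate
};

// Map a logical core count to a budget. A count of 0 means "unknown"
// (std::thread::hardware_concurrency() may return 0) and gets the
// lowest tier.
CpuEffectBudget pickCpuEffectBudget(unsigned logicalCores) {
    if (logicalCores >= 8) return {30.0f, 16};
    if (logicalCores >= 4) return {10.0f, 6};
    return {3.0f, 2};
}
```

A game could call this once at startup with `std::thread::hardware_concurrency()` and feed the result into its destruction and cloth settings.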
Android x86/x64 Support
• Native apps reduce CPU load, startup times and power consumption
• Supported in UE4 today through the editor menu
• Requires a source build
• Package as fat or separated APKs
• OpenGL ES 3.1 + AEP for best quality
• ASTC textures
• Deferred renderer
• Supported on the latest Intel tablets
Fast ASTC compression
• Next-gen format (OpenGL ES, Vulkan)
• Very good compression on RGB/RGBA for a variety of block sizes
• UE4 now has support for Intel’s fast texture compressor for ASTC
• 44x speed improvement
• Quality comparable to the ARM compressor
• UE4 uses Intel’s BC6H/BC7 compressors already
• Released with 4.13
ASTC Quality Comparison
Zoomed in portion of a 2048x2048 normal map
• Original: 12 MB
• ETC1: 2 MB
• ASTC 6x6: 1.8 MB
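The sizes quoted for the 2048x2048 normal map can be reproduced with a little arithmetic. The sketch assumes the original is uncompressed at 3 bytes per texel and that only the top mip level is counted; both are assumptions that happen to match the 12 MB figure. It relies on two format facts: every ASTC block is 128 bits (16 bytes) regardless of its footprint, and ETC1 stores each 4x4 block in 64 bits (8 bytes). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of blocks needed to cover `texels` along one axis (rounds up).
constexpr std::uint64_t blocksAlong(std::uint64_t texels,
                                    std::uint64_t blockDim) {
    return (texels + blockDim - 1) / blockDim;
}

// Uncompressed size of a single mip level.
constexpr std::uint64_t rawBytes(std::uint64_t w, std::uint64_t h,
                                 std::uint64_t bytesPerTexel) {
    return w * h * bytesPerTexel;
}

// Size of a single mip level in any fixed-size block format.
constexpr std::uint64_t blockCompressedBytes(std::uint64_t w, std::uint64_t h,
                                             std::uint64_t blockW,
                                             std::uint64_t blockH,
                                             std::uint64_t bytesPerBlock) {
    return blocksAlong(w, blockW) * blocksAlong(h, blockH) * bytesPerBlock;
}

// ASTC: 16 bytes per block for every block footprint.
constexpr std::uint64_t astcBytes(std::uint64_t w, std::uint64_t h,
                                  std::uint64_t blockW, std::uint64_t blockH) {
    return blockCompressedBytes(w, h, blockW, blockH, 16);
}

// ETC1: 8 bytes per 4x4 block.
constexpr std::uint64_t etc1Bytes(std::uint64_t w, std::uint64_t h) {
    return blockCompressedBytes(w, h, 4, 4, 8);
}
```

For 2048x2048 this gives 12,582,912 bytes raw (12 MB), 2,097,152 bytes for ETC1 (2 MB), and 1,871,424 bytes for ASTC 6x6 (about 1.8 MB), in line with the figures above.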
What’s Next?
• Intel Compiler Support – 4.14
• VTune Amplifier Support – Event-based CPU sampling using the itt_notify framework. Gives deep insight into what the engine is doing at all times. Future release
• VR sample showing off techniques to take advantage of extra CPU cycles
Wrap up
• Intel and Epic have worked together on key technologies that help developers make their best games
• Take advantage of the scaling features in UE4: Epic has done a lot of work to support lower-end hardware
• Test on Intel hardware early. UE4 is powerful, but it can easily bring down a high-end system
• With proper optimization, UE4 games run really well on Intel hardware
Links
• Intel Developer Zone (software.intel.com)
• Unreal Engine 4 (unrealengine.com)
• Intel GPA (software.intel.com/en-us/gpa)
• ISPC Texture Compressor sample (software.intel.com/en-us/articles/fast-ispc-texture-compressor-update)
• Using Android x86 on UE4 (software.intel.com/en-us/articles/Unreal-Engine-4-with-x86-Support)
• UE4 Code Sharing Hub (Intel Hardware Metrics) (wiki.unrealengine.com/GitHub_Sharing_Hub)