Unity - Internals: memory and performance


DESCRIPTION

by Marco Trivellato - In this presentation we will provide in-depth knowledge about the Unity runtime. The first part will focus on memory and how to deal with fragmentation and garbage collection. The second part will cover implementation details and their memory vs cycles tradeoffs in both Unity 4 and the upcoming Unity 5.


Internals: Memory and Performance

Codemotion

Milano, 29/11/2014

About me

Field Engineer @ Unity Technologies

Past:

– Worked as a Software Engineer on several games at EA, Next Level Games and Milestone

Agenda

• Quick Update

• Memory Overview

• Memory vs Cycles

• Graphics

• Scripting

Latest News

• New CEO

• Unity 4.6 / New UI

• Unity 5.0

• Support for Apple iOS 64-bit

• WebPlayer

MEMORY OVERVIEW

Native and Managed Memory, Garbage Collection

Memory Overview

• Native (internal)

– Assets data, game objects and components

– Engine internals

• Managed (Mono)

– Script objects (managed DLLs)

– Wrappers for Unity objects

• Native DLLs

– User and third-party DLLs

Managed Memory Internals

• Mono allocates system heap blocks for its internal allocator

• It allocates new heap blocks when needed

• The garbage collector cleans up

• Heap blocks are kept in Mono for later use

– Memory can be given back to the system after a while

– …but it depends on the platform, so don’t count on it

• Fragmentation can cause new heap blocks to be allocated even though memory is not exhausted

Reference vs Value Types

Value types (bool, int, float, struct, ...)

• Exist in stack memory when used as locals (value-type fields of a class live inside that object on the heap)

• De-allocated when removed from the stack

• No garbage

Reference types (classes)

• Exist on the heap and are handled by the Mono/.NET GC

• Reclaimed by the GC some time after they are no longer referenced

• Lots of garbage
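A minimal plain-.NET sketch of the difference (no Unity APIs; PointStruct, PointClass and ValueVsReference are hypothetical names). Assigning a struct copies its fields and a struct local needs no GC allocation; assigning a class instance copies only the reference to one heap object that the GC must reclaim later.

```csharp
// Hypothetical types for illustration only; not Unity API.
public struct PointStruct
{
    public float X, Y;
    public PointStruct(float x, float y) { X = x; Y = y; }
}

public sealed class PointClass
{
    public float X, Y;
    public PointClass(float x, float y) { X = x; Y = y; }
}

public static class ValueVsReference
{
    // Modifies a copy of a struct and returns the original's X.
    public static float MutateStructCopy()
    {
        var original = new PointStruct(1f, 2f); // value-type local: no heap allocation
        var copy = original;                    // copies the fields
        copy.X = 100f;                          // the original is unaffected
        return original.X;
    }

    // Modifies the object through a second reference and returns the original's X.
    public static float MutateClassAlias()
    {
        var original = new PointClass(1f, 2f);  // heap object, tracked by the GC
        var alias = original;                   // copies only the reference
        alias.X = 100f;                         // both variables see the change
        return original.X;
    }
}
```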

Garbage Collection

• Roots, and everything reachable from them, are not collected by GC.Collect

– Thread stacks

– CPU registers

– GC handles (used by Unity to hold onto managed objects)

– Static variables!!

• Collection time scales with managed heap size

– The more you allocate, the slower it gets
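A minimal plain-.NET sketch of why static variables get two exclamation marks (GcRootsDemo is a hypothetical name; WeakReference is used only to observe whether the object is still alive). Anything a static field references is reachable from a root, so it survives every GC.Collect until the field is cleared.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class GcRootsDemo
{
    // A static field is a GC root: whatever it references survives collections.
    private static byte[] cache;

    // Kept out of line so the static field, not a caller's local, is what keeps the buffer alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference AllocateRooted()
    {
        cache = new byte[1024 * 1024];
        return new WeakReference(cache);
    }

    // Returns true if the statically rooted buffer is still alive after full collections.
    public static bool RootedBufferSurvivesCollect()
    {
        WeakReference weak = AllocateRooted();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return weak.IsAlive;
    }

    // Clearing the static removes the root; a later collection is then free
    // to reclaim the buffer (exactly when is up to the runtime).
    public static void ReleaseCache()
    {
        cache = null;
    }
}
```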

Temporary Allocations

• Don’t use FindObjects or LINQ

• Use StringBuilder for string concatenation

• Reuse large temporary work buffers

• Avoid ToString() in hot paths: it allocates a new string

• .tag allocates a new string: use CompareTag() instead
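A sketch contrasting repeated concatenation with a reused StringBuilder (InventoryText is a hypothetical helper). It resets the builder with Length = 0 rather than Clear(), because Clear() is not available in the older .NET profile exposed by Unity 4's Mono.

```csharp
using System.Collections.Generic;
using System.Text;

public static class InventoryText
{
    // Each += allocates a brand-new string that copies everything built so far,
    // so n items produce n new strings and O(n^2) character copying.
    public static string ConcatNaive(IReadOnlyList<string> items)
    {
        string result = "";
        for (int i = 0; i < items.Count; i++)
            result += items[i] + "\n";
        return result;
    }

    // One builder allocated once and reused: appends go into its growable buffer,
    // and only the final ToString() creates a new string.
    private static readonly StringBuilder Builder = new StringBuilder(256);

    public static string ConcatWithBuilder(IReadOnlyList<string> items)
    {
        Builder.Length = 0; // reset but keep the already allocated capacity
        for (int i = 0; i < items.Count; i++)
            Builder.Append(items[i]).Append('\n');
        return Builder.ToString();
    }
}
```

The shared static builder assumes single-threaded (main-thread) use.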

Internal Temporary Allocations

Some Examples:

– GetComponents<T>

– Vector3[] Mesh.vertices

– Camera[] Camera.allCameras

– foreach

• does not allocate by definition

• However, there can be a small allocation, depending on the implementation of .GetEnumerator()

5.x: We are working on new non-allocating versions
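A plain-.NET illustration of the GetEnumerator() caveat above (ForeachDemo is a hypothetical name). The same loop over the same list can allocate or not depending on the static type it is iterated through. Whether the first loop allocates can also depend on the C# compiler version, so confirm with the profiler on the target.

```csharp
using System.Collections.Generic;

public static class ForeachDemo
{
    // List<T>.GetEnumerator() returns a struct (List<T>.Enumerator),
    // so this loop can run without any heap allocation.
    public static int SumList(List<int> values)
    {
        int sum = 0;
        foreach (int v in values)
            sum += v;
        return sum;
    }

    // Through the interface, GetEnumerator() returns IEnumerator<int>,
    // so the struct enumerator is boxed: a small heap allocation per loop.
    public static int SumEnumerable(IEnumerable<int> values)
    {
        int sum = 0;
        foreach (int v in values)
            sum += v;
        return sum;
    }
}
```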

Data Layout

// Array of structures (AoS)
struct Stuff
{
    int a;
    float b;
    bool c;
    string name;
};

Stuff[] arrayOfStuff;

// Structure of arrays (SoA)
int[] As;
float[] Bs;
bool[] Cs;
string[] names;
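A sketch of the two layouts above, with the fields made public so the example compiles as-is (StuffSoA and LayoutDemo are hypothetical names). The slide does not spell out the benefit. The usual argument is cache locality: a loop that only reads b streams through one packed float[] in the SoA version, instead of striding over whole Stuff elements.

```csharp
// Array of structures: each element interleaves a, b, c and name in memory.
public struct Stuff
{
    public int a;
    public float b;
    public bool c;
    public string name;
}

// Structure of arrays: one tightly packed array per field.
public sealed class StuffSoA
{
    public int[] As;
    public float[] Bs;
    public bool[] Cs;
    public string[] names;

    public StuffSoA(int count)
    {
        As = new int[count];
        Bs = new float[count];
        Cs = new bool[count];
        names = new string[count];
    }
}

public static class LayoutDemo
{
    // Strides over whole Stuff elements: every cache line loaded also
    // carries a, c, name and padding that this loop never reads.
    public static float SumB(Stuff[] arrayOfStuff)
    {
        float sum = 0f;
        for (int i = 0; i < arrayOfStuff.Length; i++)
            sum += arrayOfStuff[i].b;
        return sum;
    }

    // Reads one contiguous float[]: every cache line loaded holds only b values.
    public static float SumB(StuffSoA soa)
    {
        float sum = 0f;
        for (int i = 0; i < soa.Bs.Length; i++)
            sum += soa.Bs[i];
        return sum;
    }
}
```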

Memory Fragmentation

• Memory fragmentation is hard to account for

– Fully unload dynamically allocated content

– Switch to a blank scene before proceeding to the next level

• This scene could have a hook where you may pause the game long enough to sample if there is anything significant in memory

• Ensure you clear out variables so GC.Collect will remove as much as possible

• Avoid allocations where possible

• Reuse objects where possible while a scene is playing

• Clear them out on map load to clean up the memory

Wrappers: Disposable Types

Some objects used in scripts have large native backing memory in Unity

– Memory is not freed for some time…

[Diagram: a managed WWW object backed by native memory that holds the decompression buffer, the compressed file and the decompressed file]

Garbage Collection

• GC.Collect

– Runs on the main thread when

• Mono exhausts the heap space

• or the user calls System.GC.Collect()

• Finalizers

– Run on a separate thread

• Controlled by Mono

• Can have several seconds of delay

• Unity native memory

– Dispose() cleans up internal memory

• Eventually called from the finalizer

• Manually call Dispose() to clean up

[Diagram: main thread vs. finalizer thread]

Main thread: www = null; -> new(someclass); -> //no more heap -> GC.Collect();

Finalizer thread (later): www.Dispose();
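A minimal sketch of the Dispose pattern behind these slides, using a hypothetical NativeBufferWrapper class with Marshal.AllocHGlobal standing in for Unity's native allocation. Calling Dispose() frees the native block immediately on the caller's thread. The finalizer is only a fallback that runs on the finalizer thread, at a time the runtime chooses.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper standing in for a Unity object (such as WWW) that owns
// a large native buffer; AllocHGlobal plays the role of native memory.
public sealed class NativeBufferWrapper : IDisposable
{
    private IntPtr buffer;
    public int Size { get; }
    public bool IsDisposed => buffer == IntPtr.Zero;

    public NativeBufferWrapper(int size)
    {
        Size = size;
        buffer = Marshal.AllocHGlobal(size);
    }

    // Deterministic cleanup on the calling thread; safe to call more than once.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this); // the finalizer no longer needs to run
    }

    // Fallback: runs on the finalizer thread, only if Dispose() was never called.
    ~NativeBufferWrapper()
    {
        Release();
    }

    private void Release()
    {
        if (buffer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(buffer);
            buffer = IntPtr.Zero;
        }
    }
}
```

Wrapping the object in a using statement calls Dispose() when the block exits, even if an exception is thrown.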

Wrappers for Unity Objects

• Inherit from UnityEngine.Object

• Types:

– GameObject

– Assets: Texture2D, AudioClip, Mesh, etc…

– Components: MeshRenderer, Transform, MonoBehaviour

• Native memory is released when Destroy is called

Best Practices

• Reuse objects: use object pools

• Prefer stack-based allocations: use struct instead of class

• System.GC.Collect can be used to trigger collection

• Calling it 6 times returns the unused memory to the OS

• Manually call Dispose to clean up immediately
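A minimal generic pool, written as a hypothetical SimplePool<T> helper rather than any Unity API. Get() hands back a previously released instance when one is available, so steady-state gameplay allocates nothing new and produces no garbage.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not a Unity API.
public sealed class SimplePool<T> where T : class
{
    private readonly Stack<T> free = new Stack<T>();
    private readonly Func<T> create;
    private readonly Action<T> reset;

    public SimplePool(Func<T> create, Action<T> reset)
    {
        this.create = create;
        this.reset = reset;
    }

    public int FreeCount => free.Count;

    // Reuse a pooled instance if available; allocate only when the pool is empty.
    public T Get()
    {
        return free.Count > 0 ? free.Pop() : create();
    }

    // Reset and keep the instance for later reuse instead of leaving it to the GC.
    public void Release(T item)
    {
        reset(item);
        free.Push(item);
    }
}
```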

MEMORY VS CYCLES

Writable Meshes, Static & Dynamic Batching

Mesh Read/Write Option

• It allows you to modify the mesh at run-time

• If enabled, a copy of the mesh data will remain in system memory

• It is enabled by default

• In some cases, disabling this option will not reduce the memory usage

– Skinned meshes

– iOS

Non-Uniform scaled Meshes

We need to correctly transform vertex normals

• Unity 4.x:

– transform the mesh on the CPU

– create an extra copy of the data

• Unity 5.0

– Scaled on GPU

– Extra memory no longer needed
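The slide leaves the math implicit. Under the standard rule, with M as the 3×3 linear part of the object-to-world matrix, tangents transform by M but normals transform by the inverse transpose of M. With uniform scale the two agree up to length, which is why only non-uniformly scaled meshes need the extra work:

```latex
\begin{aligned}
&\mathbf{t}' = M\mathbf{t},\quad \mathbf{n}^{\top}\mathbf{t} = 0
\;\Rightarrow\; \mathbf{n}' = (M^{-1})^{\top}\mathbf{n},
\quad\text{since}\quad \mathbf{n}'^{\top}\mathbf{t}' = \mathbf{n}^{\top}M^{-1}M\mathbf{t} = \mathbf{n}^{\top}\mathbf{t} = 0 \\[4pt]
&\text{Example: } M = \operatorname{diag}(2,1,1),\quad \mathbf{n} = \tfrac{1}{\sqrt{2}}(1,1,0),\quad \mathbf{t} = (1,-1,0),\quad \mathbf{t}' = (2,-1,0) \\
&\text{naive: } M\mathbf{n} = \tfrac{1}{\sqrt{2}}(2,1,0),\qquad (M\mathbf{n})\cdot\mathbf{t}' = \tfrac{1}{\sqrt{2}}(4-1) = \tfrac{3}{\sqrt{2}} \neq 0 \\
&\text{correct: } (M^{-1})^{\top}\mathbf{n} = \tfrac{1}{\sqrt{2}}\left(\tfrac{1}{2},1,0\right),\qquad \left((M^{-1})^{\top}\mathbf{n}\right)\cdot\mathbf{t}' = \tfrac{1}{\sqrt{2}}(1-1) = 0
\end{aligned}
```

The corrected normal is then renormalized to unit length.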

Static Batching

What is it?

• It’s an optimization that reduces the number of draw calls and state changes

How do I enable it?

• Enable it in the Player Settings and mark the objects as static

Static Batching (cont.)

How does it work internally?

• Build-time: vertices are transformed to world space

• Run-time: an index buffer is created with the indices of the visible objects

Unity 5.0:

• Re-implemented static batching without copying index buffers

• Beware of misleading stats
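A simplified sketch of the two steps above, using System.Numerics math and hypothetical SourceMesh/StaticBatch classes. This is not Unity's implementation. Vertices are baked into world space once, and each frame only an index buffer for the visible objects is assembled. The sketch copies indices into a fresh index buffer on every call, which is the cost the Unity 5.0 re-implementation avoids.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical types for illustration only.
public sealed class SourceMesh
{
    public Vector3[] Vertices;
    public int[] Indices;
    public Matrix4x4 LocalToWorld;
}

public sealed class StaticBatch
{
    // One shared vertex buffer holding every object's vertices in world space.
    public readonly List<Vector3> WorldVertices = new List<Vector3>();
    // Per-object indices, already offset into the shared vertex buffer.
    private readonly List<int[]> objectIndices = new List<int[]>();

    // "Build time": bake each object's vertices into world space once.
    public void Add(SourceMesh mesh)
    {
        int baseVertex = WorldVertices.Count;
        foreach (Vector3 v in mesh.Vertices)
            WorldVertices.Add(Vector3.Transform(v, mesh.LocalToWorld));

        var indices = new int[mesh.Indices.Length];
        for (int i = 0; i < indices.Length; i++)
            indices[i] = mesh.Indices[i] + baseVertex;
        objectIndices.Add(indices);
    }

    // "Run time": concatenate the indices of the visible objects only,
    // so they can all be drawn from the shared vertex buffer in one draw call.
    public List<int> BuildIndexBuffer(IReadOnlyList<bool> visible)
    {
        var ib = new List<int>();
        for (int o = 0; o < objectIndices.Count; o++)
            if (visible[o])
                ib.AddRange(objectIndices[o]);
        return ib;
    }
}
```

The memory vs cycles trade-off is visible here: the world-space copy of every static mesh is kept in memory so that no per-frame vertex transformation is needed.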

Dynamic Batching

What is it?

• Similar to Static Batching, but it batches non-static objects at run-time

How do I enable it?

• In the Player Settings

• No need to tag: it auto-magically works…

Dynamic Batching (cont.)

How does it work internally?

• Objects are transformed to world space on the CPU

• Temporary vertex and index buffers (VB & IB) are created

• Rendered in one draw call
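For contrast, a sketch of the dynamic path, again with System.Numerics math and a hypothetical DynamicBatcher class (not Unity's code). Nothing is baked ahead of time, so on every call the vertices are re-transformed on the CPU into temporary buffers that feed a single draw call.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical per-frame batcher for illustration only.
public static class DynamicBatcher
{
    public static (Vector3[] vertexBuffer, int[] indexBuffer) Batch(
        IReadOnlyList<(Vector3[] vertices, int[] indices, Matrix4x4 localToWorld)> objects)
    {
        var vb = new List<Vector3>();
        var ib = new List<int>();
        foreach (var (vertices, indices, localToWorld) in objects)
        {
            int baseVertex = vb.Count;
            foreach (Vector3 v in vertices)
                vb.Add(Vector3.Transform(v, localToWorld)); // CPU work, repeated every frame
            foreach (int i in indices)
                ib.Add(i + baseVertex);                     // offset into the combined VB
        }
        // Temporary buffers for one draw call; rebuilt on the next frame.
        return (vb.ToArray(), ib.ToArray());
    }
}
```

Compared with static batching, this spends CPU cycles each frame instead of keeping a persistent world-space copy of the meshes.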

GRAPHICS

Render Paths, Command Buffers, Shadows

Render Paths

• Vertex Lit

• Forward Rendering

– First pass for ambient + directional light

– One additional pass for each light hitting the object

• Deferred Lighting

– Two geometry passes + lighting

– G-buffer: normal + specular, depth
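A rough pass-count model of the bullets above, written as a hypothetical RenderPathCost helper. It counts passes only, not their actual cost, and it ignores real limits such as the quality setting that caps the number of per-pixel lights.

```csharp
// Hypothetical helper; a simplification of the render path bullets.
public static class RenderPathCost
{
    // Forward: one base pass (ambient + directional light) per object,
    // plus one additional pass per extra light hitting that object.
    public static int ForwardPasses(int[] extraLightsPerObject)
    {
        int passes = 0;
        foreach (int lights in extraLightsPerObject)
            passes += 1 + lights;
        return passes;
    }

    // Deferred Lighting: two geometry passes per object; lighting is then
    // applied in a separate stage that does not multiply with the object count.
    public static int DeferredLightingGeometryPasses(int objectCount)
    {
        return 2 * objectCount;
    }
}
```

For example, three objects where two are each hit by three extra lights cost 9 forward passes, against 6 geometry passes with Deferred Lighting. Which one is faster still depends on the scene and platform.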

Deferred Shading

• New Render Path in Unity 5

• Only one geometry pass

• On platforms with MRT (multiple render targets) support

• Fallback is Forward Rendering

Deferred Shading

Depth buffer + 4 × 32-bit RTs:

• RT0: diffuse color (rgb), unused (a)

• RT1: spec color (rgb), roughness (a)

• RT2: normal (rgb), unused (a); 10.10.10.2 format when available

• RT3: emission/light (rgb), unused (a)

• Z: depth buffer & stencil
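This back-of-the-envelope estimate puts the layout above in numbers. It assumes a 1920 × 1080 target, all four RTs at 32 bits as listed, a 32-bit depth/stencil buffer (24-bit depth + 8-bit stencil) and no MSAA. These inputs are assumptions, not figures from the talk:

```latex
\begin{aligned}
\text{bytes per pixel} &= \underbrace{4 \times 4}_{\text{RT0..RT3, 32 bit each}} + \underbrace{4}_{\text{depth 24 + stencil 8}} = 20 \\
\text{pixels} &= 1920 \times 1080 = 2\,073\,600 \\
\text{G-buffer size} &= 2\,073\,600 \times 20 = 41\,472\,000\ \text{bytes} \approx 39.6\ \text{MiB}
\end{aligned}
```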

Command Buffers

• Command buffers hold a list of rendering commands

• They can be set to execute at various points during camera rendering

Shadows

• Directional Light:

– Uses CSM (cascaded shadow maps), up to 4 cascades

– Shadows are rendered into screen space to a 32-bit RT

• Point Light:

– Renders 6 cube faces

• Spot Light:

– One shadow map per light

Mesh Skinning

Different implementations depending on platform:

• x86: SSE

• iOS/Android/WP8: NEON optimizations

• D3D11/Xbox One/GLES 3.0: GPU

• Xbox 360, Wii U: GPU (memexport)

• PS3: SPU

• Wii U: GPU w/ stream out

Unity 5.0: Skinned meshes use less memory by sharing index buffers between instances
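The platform paths above accelerate the same inner loop. Below is a scalar sketch of standard linear blend skinning with up to four bone influences per vertex. The formula is an assumption, since the slide does not show it, and CpuSkinning is a hypothetical name using System.Numerics matrices.

```csharp
using System.Numerics;

// Hypothetical scalar skinning routine for illustration only.
public static class CpuSkinning
{
    // boneIndices/boneWeights: up to 4 influences per vertex, weights summing to 1.
    // skinMatrices[b]: the bone's current transform combined with its inverse bind pose.
    public static void Skin(
        Vector3[] bindPositions, int[,] boneIndices, float[,] boneWeights,
        Matrix4x4[] skinMatrices, Vector3[] output)
    {
        for (int v = 0; v < bindPositions.Length; v++)
        {
            Vector3 p = Vector3.Zero;
            for (int k = 0; k < 4; k++)
            {
                float w = boneWeights[v, k];
                if (w == 0f) continue; // unused influence slot
                p += w * Vector3.Transform(bindPositions[v], skinMatrices[boneIndices[v, k]]);
            }
            output[v] = p; // weighted blend of the bone-transformed positions
        }
    }
}
```

Only vertex data is rewritten; the triangle indices never change, which is what makes sharing one index buffer between instances possible.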

Best Practices

• Try different Render Paths

– Performance depends on scene and platform

• Mix Realtime and Baked Lighting

• Use Level-Of-Detail Techniques

– Mesh, Texture, Shader

SCRIPTING

Scripting API and JIT compilation performance, allocations

GetComponent<T>

It asks the GameObject for a component of the specified type:

• The GameObject (GO) contains a list of Components

• Each Component type is compared to T

• The first Component of type T (or of a type derived from T) is returned to the caller

• Not much overhead, but it still needs to call into native code
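A managed model of the lookup described above, built on hypothetical ComponentBase/GameObjectModel classes. In Unity the component list lives on the native side, so the real call also pays for the transition into native code.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for Unity's component types, for illustration only.
public class ComponentBase { }
public class ColliderComponent : ComponentBase { }
public class BoxColliderComponent : ColliderComponent { }
public class LightComponent : ComponentBase { }

public sealed class GameObjectModel
{
    private readonly List<ComponentBase> components = new List<ComponentBase>();

    public void AddComponent(ComponentBase c) => components.Add(c);

    public T GetComponent<T>() where T : ComponentBase
    {
        // Linear scan: each component's type is compared with T;
        // "is T" also matches types derived from T. The first match wins.
        foreach (ComponentBase c in components)
            if (c is T match)
                return match;
        return null;
    }
}
```

Because every call repeats the scan, caching the result in a field (for example in Awake) keeps it out of per-frame code.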

Property Accessors

• Most accessors will be removed in Unity 5.0

• The objective is to reduce dependencies and therefore improve modularization

• Transform will remain

• Existing scripts will be converted. Example: [code snippet: a 4.x accessor call and its 5.0 equivalent]

Transform Component

• this.transform is the same as GetComponent<Transform>()

• transform.position/rotation needs to:

– find the Transform component

– traverse the hierarchy to calculate the absolute position

– apply the translation/rotation

• The Transform internally stores the position relative to the parent

– transform.localPosition = new Vector3(…) is a simple assignment

– transform.position = new Vector3(…) costs the same if there is no parent; otherwise it needs to traverse the hierarchy up to transform the absolute position into a local one

• Finally, other components (collider, rigidbody, light, camera, etc.) will be notified via messages
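A translation-only sketch of why the two setters above cost different amounts. TransformNode is a hypothetical class with no rotation, scale or change notifications. Local positions are stored, and world positions are derived by walking up the parent chain.

```csharp
using System.Numerics;

// Hypothetical, translation-only model of a transform hierarchy.
public sealed class TransformNode
{
    public TransformNode Parent;
    public Vector3 LocalPosition; // stored relative to the parent

    // World position: walk up the hierarchy, accumulating parent offsets.
    public Vector3 Position
    {
        get
        {
            Vector3 p = LocalPosition;
            for (TransformNode n = Parent; n != null; n = n.Parent)
                p += n.LocalPosition;
            return p;
        }
        // With no parent this is a plain assignment; otherwise the parent's
        // world position must be computed first to convert into local space.
        set => LocalPosition = Parent == null ? value : value - Parent.Position;
    }
}
```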

WWW class properties

WWW.texture: allocates a new Texture2D every time it is accessed

…another example is WWW.audioClip

Object.Instantiate

API:

• Object Instantiate(Object, Vector3, Quaternion);

• Object Instantiate(Object);

Implementation:

• Clone GameObject Hierarchy and Components

• Copy Properties

• Awake

• Apply new Transform (if provided)

Object.Instantiate (cont.)

• Awake can be expensive

• AwakeFromLoad (main thread)

– clear states

– internal state caching

– pre-compute

Unity 5.0:

• Allocations have been reduced

• Some inner loops for copying the data have been optimized

JIT Compilation

What is it?

• The process in which machine code is generated from CIL code during the application's run-time

Pros:

• It generates optimized code for the current platform

Cons:

• The first time each method is called, the application suffers a performance penalty because of the compilation

JIT compilation spikes

What about pre-JITting?

• RuntimeHelpers.PrepareMethod does not work

• …better to use MethodHandle.GetFunctionPointer()
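A sketch of the GetFunctionPointer() approach, using a hypothetical PreJit helper meant to run during a loading screen. It relies on Mono compiling a method when its function pointer is requested. Other runtimes may return a stub without compiling, so treat this as a best-effort warm-up rather than a guarantee.

```csharp
using System;
using System.Reflection;

public static class PreJit
{
    // Requests the native entry point of every concrete, non-generic method
    // declared on 'type'; on Mono this compiles the method ahead of its first call.
    public static int WarmUp(Type type)
    {
        const BindingFlags flags = BindingFlags.DeclaredOnly | BindingFlags.Instance |
                                   BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic;
        int count = 0;
        foreach (MethodInfo method in type.GetMethods(flags))
        {
            if (method.IsAbstract || method.ContainsGenericParameters)
                continue; // no body to compile, or concrete type arguments are needed
            method.MethodHandle.GetFunctionPointer();
            count++;
        }
        return count;
    }
}

// Hypothetical type whose methods should be compiled before gameplay starts.
public static class SampleWork
{
    public static int Square(int x) => x * x;
    private static double Half(double x) => x / 2.0;
}
```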

CONCLUSIONS

Best Practices

• Don’t make assumptions

• Platform X != Platform Y

• Profile on target device

• Editor != Player

• Managed Memory is not returned to Native Land!

• For best results…: Profile early and regularly

Want to know more ?

• Unite: http://unity3d.com/unite/archive

• Blog: http://blog.unity3d.com

• Forum: http://forum.unity3d.com

• Support: support@unity3d.com

That’s it!

Questions?

@m_trive | marcot@unity3d.com
