Coding for Multiple Cores
ATI Developer Day

Bruce Dawson
Programmer/European Technical Representative
Microsoft Game Technology Group

2004 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.


Why multi-threading/multi-core?

  • Clock rates are stagnant
  • Future CPUs will be predominantly multi-thread/multi-core
      • Xbox 360 has 3 cores
      • PS3 will be multi-core
      • >70% of PC sales will be multi-core by end of 2006
  • Most Windows Vista systems will be multi-core
  • Two performance possibilities:
      • Single-threaded? Minimal performance growth
      • Multi-threaded? Exponential performance growth

Design for Multithreading

  • Good design is critical
  • Bad multithreading can be worse than no multithreading
      • Deadlocks, synchronization bugs, poor performance, etc.

Bad Multithreading

[Figure: diagram of five threads, labeled Thread 1 through Thread 5]

Good Multithreading

[Figure: diagram with boxes labeled Main Thread, Game Thread, Rendering Thread, Physics, Animation/Skinning, Particle Systems, Networking, File I/O]

Handling Dependencies: Cascades

[Figure: Thread 1 through Thread 5 run the stages Input, Physics, AI, Rendering, and Present, staggered across Frame 1 through Frame 4]

  • Advantages:
      • Synchronization points are few and well-defined
  • Disadvantages:
      • Increases latency (for constant frame rate)
      • Needs simple (one-way) data flow

Typical Threaded Tasks

  • File Decompression
  • Rendering
  • Graphics Fluff
  • Physics

File Decompression

  • Most common CPU heavy thread on the Xbox 360
  • Easy to multithread
  • Allows use of aggressive compression to improve load times
  • Don’t throw a thread at a problem better solved by offline processing
      • Texture compression, file packing, etc.

Threading File I/O & Decompression

  • First: use large reads and asynchronous I/O
  • Then: consider compression to accelerate loading
      • Don't do format conversions etc. that are better done at build time!
  • Have resource proxies to allow rendering to continue

Bad Load Thread

    hashtable<Resource*> g_resources;

    // Load thread
    void LoadResource(ResID resName) {
        // Bad: the lock is held for the whole load, so the render
        // thread can stall in GetResource() until loading finishes.
        Locker lock(&resourceLock);
        Resource* pNewResource = LoadCompressedResource(resName);
        g_resources.add(pNewResource);
    }

    // Render thread
    Resource* GetResource(ResID resName) {
        Locker lock(&resourceLock);
        return g_resources.find(resName);
    }

Good Load Thread, Poor Render Thread

    hashtable<Resource*> g_resources;

    // Load thread
    void LoadResource(ResID resName) {
        // Good: the slow load happens before the lock is taken.
        Resource* pNewResource = LoadCompressedResource(resName);
        Locker lock(&resourceLock);
        g_resources.add(pNewResource);
    }

    // Render thread
    Resource* GetResource(ResID resName) {
        // Poor: the render thread still takes the lock on every lookup.
        Locker lock(&resourceLock);
        return g_resources.find(resName);
    }

Good Load Thread, Good Render Thread

    hashtable<Resource*> g_resources, g_renderRes;

    // Load thread
    void LoadResource(ResID resName) {
        Resource* pNewResource = LoadCompressedResource(resName);
        Locker lock(&resourceLock);
        g_resources.add(pNewResource);
    }

    // Render thread
    Resource* GetResource(ResID resName) {
        // No lock: g_renderRes is only touched by the render thread.
        return g_renderRes.find(resName);
    }

    // Copy from g_resources to g_renderRes once per frame
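
The once-per-frame copy is the only place the two threads meet. A minimal sketch of that idea in portable C++ follows, assuming `std::mutex` in place of the slide's `Locker` and `std::unordered_map` in place of `hashtable`. The class name `ResourceTables`, the `int` resource ID, and the `Resource` struct are hypothetical, and this version moves only newly published entries instead of copying the whole table.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

struct Resource { std::string name; };  // hypothetical payload
using ResID = int;                      // hypothetical ID type

class ResourceTables {
public:
    // Load thread: call after the expensive load has finished.
    void Publish(ResID id, Resource* res) {
        std::lock_guard<std::mutex> lock(mutex_);  // held only for the insert
        pending_[id] = res;
    }

    // Render thread, once per frame: take everything loaded since last frame.
    void SyncForFrame() {
        std::unordered_map<ResID, Resource*> fresh;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            fresh.swap(pending_);  // constant-time swap while the lock is held
        }
        for (const auto& entry : fresh)
            renderRes_[entry.first] = entry.second;
    }

    // Render thread: no lock, because only this thread touches renderRes_.
    Resource* Get(ResID id) const {
        auto it = renderRes_.find(id);
        return it == renderRes_.end() ? nullptr : it->second;
    }

private:
    std::mutex mutex_;
    std::unordered_map<ResID, Resource*> pending_;    // shared, guarded by mutex_
    std::unordered_map<ResID, Resource*> renderRes_;  // render thread only
};
```

A resource published mid-frame stays invisible to `Get` until the next `SyncForFrame`, which is where a resource proxy would stand in.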

Rendering

  • Separate update and render threads
  • Multi-threaded device ownership (D3DCREATE_MULTITHREADED) works poorly
      • Exception: Xbox 360 command buffers
  • Special case of cascades paradigm
  • Pass render state from update to render
      • With constant workload gives same latency, better frame rate
      • With increased workload gives same frame rate, worse latency

Separate Rendering Thread

[Figure: Update Thread and Render Thread exchanging data through Buffer 0 and Buffer 1]
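
One way to picture the two-buffer handoff is the sketch below, in portable C++ with `std::thread`. It is an illustration under assumptions, not the deck's implementation: `RenderDesc` and `RunDoubleBuffered` are hypothetical names, and a real engine would keep one persistent render thread and signal it with events instead of creating a thread per frame.

```cpp
#include <thread>
#include <vector>

struct RenderDesc { std::vector<int> drawCalls; };  // hypothetical per-frame render description

// Runs `frames` update steps and returns what the render side consumed, in order.
std::vector<int> RunDoubleBuffered(int frames) {
    RenderDesc buffers[2];
    std::vector<int> rendered;
    int writeIndex = 0;
    for (int frame = 0; frame <= frames; ++frame) {
        const int readIndex = 1 - writeIndex;
        // Render side: consume the buffer that was filled during the previous frame.
        std::thread render([&buffers, &rendered, readIndex, frame] {
            if (frame == 0) return;  // nothing has been produced yet
            for (int call : buffers[readIndex].drawCalls)
                rendered.push_back(call);
        });
        // Update side (this thread): fill the other buffer at the same time.
        if (frame < frames)
            buffers[writeIndex].drawCalls.assign(1, frame);
        render.join();           // the single synchronization point per frame
        writeIndex = readIndex;  // swap buffer roles
    }
    return rendered;
}
```

The render side always works on data that is one frame old, which is the latency cost the cascade approach accepts.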

Graphics Fluff

  • Extra graphics that don't affect play
      • Procedurally generated animating cloud textures
      • Cloth simulations
      • Dynamic ambient occlusion
      • Procedurally generated vegetation, etc.
      • Extra particles, better particle physics, etc.
  • Easy to synchronize
  • Potentially expensive, but if the core is otherwise idle...?

Physics?

  • Could cascade from update to physics to rendering
      • Makes use of three threads
      • May be too much latency
  • Could run physics on many threads
      • Uses many threads while doing physics
      • May leave threads mostly idle elsewhere
  • Other possibilities (rendering and physics decoupled?): see Intel's talk

Overcommitted Multithreading?

[Figure: diagram with boxes labeled Game Thread, Rendering Thread, Physics, Animation/Skinning, Particle Systems]

How Many Threads?

  • No more than one CPU intensive software thread per core
      • 3-6 on Xbox 360
      • 1-? on PC (1-4 for now, need to query)
  • Too many busy threads add complexity and lower performance
      • Context switches are not free
  • Can have many non-CPU intensive threads
      • I/O threads that block, or intermittent tasks
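
The "need to query" step on PC can be sketched in portable C++ as below. This assumes `std::thread::hardware_concurrency()` as the query, which reports logical processors (so SMT threads count as processors) and may return 0 when the value is unknown. The function names are hypothetical.

```cpp
#include <algorithm>
#include <thread>

// Decide how many CPU-heavy worker threads to create.
// hardwareThreads: logical processor count, or 0 if unknown.
// maxWanted: the most heavy threads the game can usefully feed.
unsigned HeavyThreadCount(unsigned hardwareThreads, unsigned maxWanted) {
    if (hardwareThreads == 0)
        hardwareThreads = 1;  // unknown: fall back to a single heavy thread
    return std::min(hardwareThreads, maxWanted);
}

// Same decision, using the count reported by the standard library.
unsigned QueryHeavyThreadCount(unsigned maxWanted) {
    return HeavyThreadCount(std::thread::hardware_concurrency(), maxWanted);
}
```

Blocking I/O threads and intermittent helpers are not counted here, since they do not compete for a core most of the time.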

Simultaneous Multi-Threading

  • Be careful with Simultaneous Multi-Threading (SMT) threads
      • Not the same as double the number of cores
      • Can give a small perf boost
      • Can cause a perf drop
      • Can avoid scheduler latency
  • Ideally one heavy thread per core plus some additional intermittent threads

Case Study: Kameo (Xbox 360)

  • Started single threaded
  • Rendering was taking half of the time, so it was put on a separate thread
      • Two render-description buffers created to communicate from update to render
      • Linear read/write access for best cache usage
      • Doesn't copy const data
  • File I/O and decompress on other threads

Case Study: Kameo (Xbox 360)

    Core   Thread   Software threads
    0      0        Game update
    0      1        File I/O
    1      0        Rendering
    1      1        (none)
    2      0        XAudio
    2      1        File decompression

Total usage was ~2.2-2.5 cores

Case Study: Project Gotham Racing

    Core   Thread   Software threads
    0      0        Update, physics, rendering, UI
    0      1        Audio update, networking
    1      0        Crowd update, texture decompression
    1      1        Texture decompression
    2      0        XAudio
    2      1        (none)

Total usage was ~2.0-3.0 cores

Managing Your Threads

  • Creating threads
  • Synchronizing
  • Terminating
      • Don't use TerminateThread()
          • Bad idea on Windows: leaves the process in an indeterminate state, doesn't allow clean-up, etc.
          • Unavailable on Xbox 360
      • Instead return from your thread function, or call ExitThread

Creating Threads Poorly

    const int stackSize = 0;
    HANDLE hThread = CreateThread(0, stackSize,
            ThreadFunctionBad, 0, 0, 0);
    // Do work on main thread here.
    for (;;) { // Wait for child thread to complete
        DWORD exitCode;
        GetExitCodeThread(hThread, &exitCode);
        if (exitCode != STILL_ACTIVE)
            break;
    }

    ...

    DWORD __stdcall ThreadFunctionBad(void* data)
    {
    #ifdef WIN32
        SetThreadAffinityMask(GetCurrentThread(), 8);
    #endif
        // Do child thread work here.
        return 0;
    }

Callouts on the slide:

  • CreateThread doesn't initialize C runtime
  • Stack size of zero means inherit parent's stack size
  • Busy waiting is bad!
  • Don't forget to close this (hThread) when done with it
  • Be careful with thread affinities on Windows

Creating Threads Well

    const int stackSize = 65536;
    HANDLE hThread = (HANDLE)_beginthreadex(0, stackSize,
            ThreadFunction, 0, 0, 0);
    // Do work on main thread here.
    // Wait for child thread to complete
    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);

    ...

    unsigned __stdcall ThreadFunction(void* data)
    {
    #ifdef XBOX
        // On Xbox 360 you must explicitly assign
        // software threads to hardware threads.
        XSetThreadProcessor(GetCurrentThread(), 2);
    #endif
        // Do child thread work here.
        return 0;
    }

Callouts on the slide:

  • _beginthreadex initializes CRT
  • Specify stack size on Xbox 360
  • The correct way to wait for a thread to exit (WaitForSingleObject)
  • Don't forget to close this (hThread) when done with it
  • Thread affinities must be specified on Xbox 360
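
For comparison, the same create, wait, and clean-up pattern can be sketched with the C++11 standard library. This is a portable stand-in, not the deck's API: `std::thread` offers no standard way to set stack size or processor affinity, so the Xbox 360 requirements above still need the native calls. `RunChildAndWait` is a hypothetical name.

```cpp
#include <thread>

// Starts a child thread, lets the caller do its own work, then blocks until
// the child finishes. Returns the value the child computed.
int RunChildAndWait(int input) {
    int result = 0;
    std::thread child([&result, input] {
        result = input * 2;  // child thread work goes here
    });
    // Do work on the main thread here.
    child.join();  // blocking wait, no busy loop; also releases the thread's resources
    return result;
}
```

Reading `result` after `join()` is safe because the join orders the child's writes before the caller's read.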

Alternative: OpenMP

  • Available in VC++ 2005 (Windows and Xbox 360)
  • Simple way to parallelize loops and some other constructs
  • Works best on long symmetric tasks (particles?)
  • Game tasks are short, asymmetric
  • OpenMP is nice, but not ideal for games
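
A particle update is the kind of long, symmetric loop the slide has in mind. The sketch below assumes a hypothetical `Particle` struct with one position and one velocity component. Without an OpenMP compiler switch the pragma is ignored and the loop simply runs serially.

```cpp
#include <vector>

struct Particle { float pos; float vel; };  // hypothetical, one dimension for brevity

// Advances every particle by dt. Each iteration touches only its own particle,
// so iterations can be divided among threads without any locking.
void UpdateParticles(std::vector<Particle>& particles, float dt) {
    const int count = static_cast<int>(particles.size());
    #pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        particles[i].pos += particles[i].vel * dt;
    }
}
```

The loop index is a signed `int` because older OpenMP versions, including the one shipped with VC++ 2005, require a signed loop variable.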

Available Synchronization Objects

  • Events
  • Semaphores
  • Mutexes
  • Critical Sections
  • Don't use SuspendThread()
      • Some titles have used this for synchronization
      • Can easily lead to deadlocks
      • Interacts badly with Visual Studio debugger

Exclusive Access: Mutex

    // Initialize
    HANDLE mutex = CreateMutex(0, FALSE, 0);

    // Use
    void ManipulateSharedData() {
        WaitForSingleObject(mutex, INFINITE);
        // Manipulate stuff...
        ReleaseMutex(mutex);
    }

    // Destroy
    CloseHandle(mutex);

Exclusive Access: CRITICAL_SECTION

    // Initialize
    CRITICAL_SECTION cs;
    InitializeCriticalSection(&cs);

    // Use
    void ManipulateSharedData() {
        EnterCriticalSection(&cs);
        // Manipulate stuff...
        LeaveCriticalSection(&cs);
    }

    // Destroy
    DeleteCriticalSection(&cs);
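
Pairing every enter with a leave by hand is easy to get wrong on early returns, which is presumably why the earlier load-thread slides use a scoped `Locker` object. A minimal sketch of that scoped style in portable C++ follows, assuming `std::mutex` and `std::lock_guard` as stand-ins for `CRITICAL_SECTION` and `Locker`. `SharedCounter` and `CountWithThreads` are hypothetical names.

```cpp
#include <mutex>
#include <thread>
#include <vector>

class SharedCounter {
public:
    void Add(int amount) {
        std::lock_guard<std::mutex> lock(mutex_);  // released automatically at scope exit
        value_ += amount;
    }
    int Get() {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }
private:
    std::mutex mutex_;
    int value_ = 0;
};

// Starts threadCount threads that each add 1 perThread times; returns the total.
int CountWithThreads(int threadCount, int perThread) {
    SharedCounter counter;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i)
                counter.Add(1);
        });
    }
    for (std::thread& th : threads)
        th.join();
    return counter.Get();
}
```

Locking once per increment is deliberately wasteful here; it only demonstrates that the scoped lock keeps the total correct.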

Lockless programming

  • Trendy technique to use clever programming to share resources without locking
  • Includes InterlockedXXX(), lockless message passing, Double Checked Locking, etc.
  • Very hard to get right:
      • Compiler can reorder instructions
      • CPU can reorder instructions
      • CPU can reorder reads and writes
      • InterlockedXxx is not a memory barrier on Xbox 360
  • Not as fast as avoiding synchronization entirely

Lockless Messages: Buggy

    void SendMessage(void* input) {
        // Wait for the message to be 'empty'.
        while (g_msg.filled)
            ;
        memcpy(g_msg.data, input, MESSAGESIZE);
        g_msg.filled = true;
    }

    void GetMessage() {
        // Wait for the message to be 'filled'.
        while (!g_msg.filled)
            ;
        memcpy(localMsg.data, g_msg.data, MESSAGESIZE);
        g_msg.filled = false;
    }
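
The slide does not spell out the bug, but the hazards listed for lockless programming (compiler and CPU reordering) apply directly: nothing stops the flag write from becoming visible before the copied data. One possible repair, sketched in C++11 with `std::atomic` and assumed to serve exactly one sender and one receiver, is shown below. The `Fixed` function names and the 16-byte `MESSAGESIZE` are assumptions.

```cpp
#include <atomic>
#include <cstring>
#include <thread>

const int MESSAGESIZE = 16;  // assumed size for the sketch

struct Message {
    char data[MESSAGESIZE];
    std::atomic<bool> filled{false};
};

Message g_msg;

// Single sender only.
void SendMessageFixed(const void* input) {
    // Acquire pairs with the receiver's release, so its reads of data are done.
    while (g_msg.filled.load(std::memory_order_acquire))
        std::this_thread::yield();
    std::memcpy(g_msg.data, input, MESSAGESIZE);
    // Release: the copy above cannot be reordered after this store.
    g_msg.filled.store(true, std::memory_order_release);
}

// Single receiver only.
void GetMessageFixed(void* output) {
    // Acquire pairs with the sender's release, so the copied data is visible.
    while (!g_msg.filled.load(std::memory_order_acquire))
        std::this_thread::yield();
    std::memcpy(output, g_msg.data, MESSAGESIZE);
    g_msg.filled.store(false, std::memory_order_release);
}
```

Both sides still spin while waiting, so this fixes correctness only; the earlier advice against busy waiting applies if messages are rare.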

Synchronization tips/costs:

  • Synchronization is moderately expensive when there is no contention
      • Hundreds to thousands of cycles
  • Synchronization can be arbitrarily expensive when there is contention!
  • Goals:
      • Synchronize rarely
      • Hold locks briefly
      • Minimize shared data

Beware hidden synchronization:

  • Allocations are (generally) a synch point
      • Consider per-thread heaps with no locking
      • HEAP_NO_SERIALIZE flag avoids lock on Win32 heaps
      • Consider custom single-purpose allocators
      • Consider avoiding memory allocations!
  • Avoid synch in in-house profilers
  • D3DCREATE_MULTITHREADED causes synchronization on almost every Direct3D call
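
As one illustration of a per-thread, single-purpose allocator, the sketch below gives each thread its own fixed-size scratch arena through `thread_local`, so allocation never takes a lock. Everything here is hypothetical (the names, the 64 KB capacity, the 16-byte alignment), and memory is only reclaimed in bulk with `Reset`, for example once per frame.

```cpp
#include <cstddef>

class ScratchArena {
public:
    // Returns 16-byte-aligned memory, or nullptr if the arena is full.
    void* Allocate(std::size_t bytes) {
        if (bytes > kCapacity)
            return nullptr;
        const std::size_t aligned = (bytes + kAlign - 1) & ~(kAlign - 1);
        if (aligned > kCapacity - used_)
            return nullptr;
        void* p = buffer_ + used_;
        used_ += aligned;
        return p;
    }
    void Reset() { used_ = 0; }  // frees everything at once
    std::size_t Used() const { return used_; }

private:
    static constexpr std::size_t kAlign = 16;
    static constexpr std::size_t kCapacity = 64 * 1024;
    alignas(16) unsigned char buffer_[kCapacity];
    std::size_t used_ = 0;
};

// One arena per thread: no sharing, so no synchronization.
ScratchArena& ThreadArena() {
    thread_local ScratchArena arena;
    return arena;
}
```

Pointers from one thread's arena must not outlive that thread or its next `Reset`.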

Profiling multi-threaded apps

  • Need thread-aware profilers
  • Profiling may hide many synchronization stalls
  • Home-grown spin locks make profiling harder
  • Consider instrumenting calls to synchronization functions
      • Don't use locks in instrumentation; use TLS variables to store results
  • Windows: Intel VTune and the Visual Studio Team System Profiler
  • Xbox 360: PIX, XbPerfView, etc.
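
Instrumenting a lock without adding another lock can look like the sketch below. It assumes C++11 `thread_local` variables as the TLS mechanism and `std::mutex` as the lock being measured. `TimedLock` and the two counters are hypothetical names.

```cpp
#include <chrono>
#include <mutex>

// Per-thread totals. Each thread writes only its own copy, so no lock is needed.
thread_local long long t_lockWaitNs = 0;
thread_local int t_lockCount = 0;

// Scoped lock that records how long the calling thread waited to acquire it.
class TimedLock {
public:
    explicit TimedLock(std::mutex& m) : mutex_(m) {
        const auto start = std::chrono::steady_clock::now();
        mutex_.lock();
        const auto waited = std::chrono::steady_clock::now() - start;
        t_lockWaitNs +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(waited).count();
        ++t_lockCount;
    }
    ~TimedLock() { mutex_.unlock(); }
    TimedLock(const TimedLock&) = delete;
    TimedLock& operator=(const TimedLock&) = delete;

private:
    std::mutex& mutex_;
};
```

Each thread would report or merge its own totals at a quiet point, such as the end of a frame.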

PIX timing capture

[Figure: screenshot of a PIX timing capture]

Naming Threads

    typedef struct tagTHREADNAME_INFO {
        DWORD dwType;     // must be 0x1000
        LPCSTR szName;    // pointer to name (in user addr space)
        DWORD dwThreadID; // thread ID (-1=caller thread)
        DWORD dwFlags;    // reserved for future use, must be zero
    } THREADNAME_INFO;

    void SetThreadName(DWORD dwThreadID, LPCSTR szThreadName) {
        THREADNAME_INFO info;
        info.dwType = 0x1000;
        info.szName = szThreadName;
        info.dwThreadID = dwThreadID;
        info.dwFlags = 0;

        __try {
            RaiseException(0x406D1388, 0, sizeof(info)/sizeof(DWORD),
                    (DWORD*)&info);
        }
        __except(EXCEPTION_CONTINUE_EXECUTION) {
        }
    }

    SetThreadName(-1, "Main thread");

Other Ideas

  • Debugging tips for MT
      • Visual Studio does support multi-threaded debugging
          • Use threads window
          • Use @hwthread in watch window on Xbox 360
      • KD and WinDBG support multi-threaded debugging
  • Thread Local Storage (TLS)
      • __declspec(thread) declares per-thread variables
          • But doesn't work in dynamically loaded DLLs
      • TlsAlloc is less efficient, less convenient, but works in dynamically loaded DLLs

Windows tips

  • Test on multiple machines and configurations
      • Single-core, SMT (i.e. Hyper-Threading), Dual-core, Intel and AMD chips, Multi-socket multicore (4+ cores)

Windows API features

  • WaitForMultipleObjects
      • Obviously better than a series of WaitForSingleObject calls
      • The OS is highly optimized around multithreading and event-based blocking
  • I/O Completion Ports
      • Very efficient way to have the OS assign a pool of worker threads to incoming I/O requests
      • Useful construct for implementing a game server

SMT versus Multicore

  • OS returns number of logical processors in GetSystemInfo(), so a 2 could mean an SMT machine with only 1 actual core, or 2 cores
  • Detailed Win32 APIs exposing this distinction not available until Windows XP x64, Windows Server 2003 SP1, Windows Vista, etc.
      • GetLogicalProcessorInformation()
  • For now you have to use CPUID detailed by Intel and AMD to parse this out…

Timing with Multiple Cores

  • RDTSC is not always synced between cores!
      • As your thread moves from core to core, results of RDTSC counter deltas may be nonsense
  • CPU frequency itself can change at run-time through speed step technologies
      • See Power Management APIs for more information
  • Best thing to do is use Win32 API QueryPerformanceCounter / QueryPerformanceFrequency
  • See DirectX SDK article Game Timing and Multiple Cores
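
In portable C++ the closest analogue to QueryPerformanceCounter is `std::chrono::steady_clock`, a monotonic clock that does not depend on which core the thread is running on. Treating it as a substitute is an assumption of this sketch, and its resolution is implementation-defined. `FrameTimer` is a hypothetical name.

```cpp
#include <chrono>

// Measures elapsed time between calls using a monotonic clock instead of RDTSC.
class FrameTimer {
public:
    FrameTimer() : last_(std::chrono::steady_clock::now()) {}

    // Returns seconds since the previous Tick() (or since construction).
    double Tick() {
        const auto now = std::chrono::steady_clock::now();
        const std::chrono::duration<double> delta = now - last_;
        last_ = now;
        return delta.count();
    }

private:
    std::chrono::steady_clock::time_point last_;
};
```

Calling `Tick()` once per frame yields the frame time, and it never goes negative because the clock is monotonic.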

Thread Micromanagement

  • Use SetThreadAffinityMask with caution!
      • May be useful for assigning ‘heavy’ work threads
      • This mask is technically a hint, not a commitment
      • RDTSC-based instrumenting will require locking the game threads to a single core
      • Otherwise let the Windows scheduler do the right thing
      • CreateDevice/Reset might have a side-effect on the calling thread’s affinity with software vertex processing enabled

Thread Micromanagement (cont)

  • Be careful about boosting thread priority
      • If the priority is too high, you could cause the system to hang and become unresponsive
      • If the priority is too low, the thread may starve

DLLs and Multithreading

  • DllMain for every DLL is informed of thread creation/destruction
      • For some DLLs this is required to initialize TLS
      • For many this is a waste of time, so call DisableThreadLibraryCalls() from your DllMain during process creation (DLL_PROCESS_ATTACH)
  • The OS serializes access to the entry point
      • This means threads created during DllMain won’t start for a while, so don’t wait on them in the DLL startup

Resources

  • Multithreading Applications in Win32, Jim Beveridge & Robert Weiner, Addison-Wesley, 1997
  • Multiprocessor Considerations for Kernel-Mode Drivers
      • http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/MP_issues.doc
  • Determining Logical Processors per Physical Processor
      • http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/threading/knowledgebase/43842.htm
  • GetLogicalProcessorInformation
      • http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/getlogicalprocessorinformation.asp
  • Double checked locking
      • http://en.wikipedia.org/wiki/Double-checked_locking

Resources

  • GDC 2006 Presentations
      • http://msdn.com/directx/presentations
  • DirectX Developer Center
      • http://msdn.com/directx
  • XNA Developer Center
      • http://msdn.com/xna
  • Xbox Developer Center (Registered Devs Only)
      • https://xds.xbox.com
  • XNA, DirectX, XACT Forums
      • http://msdn.com/directx/forums
  • Email addresses
      • [email protected] (DirectX Feedback)
      • [email protected] (Xbox Developers Only)
      • [email protected] (XNA Feedback)

© 2006 Microsoft Corporation. All rights reserved. Microsoft, DirectX, Xbox 360, the Xbox logo, and XNA are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.