Direct3D and the Future of Graphics APIs - AMD at GDC14

DIRECT3D AND THE FUTURE OF GRAPHICS APIS

Dave Oldcorn, AMD

Dan Baker, Oxide Games

Johan Andersson, EA / DICE

2| AMD Direct3D Futures | March 20th, 2014

NITROUS AND DX12

Dan Baker

Partner, Oxide Games

HAVEN’T WE BEEN HERE BEFORE?

Goal of DX9

– Remember state blocks?

Goal of DX10

– Large state groups

Goal of DX11

– Deferred contexts

Are we actually getting faster, or are CPUs just faster?

– Quite possibly no performance improvements due to API features in 10 years

Maybe adding features isn’t the answer…

DEEPLY ROOTED PROBLEM

Coding design philosophies clash with the real world

– OOP, data hiding and polymorphic design clash with task-driven, data-parallel design

– Evident in language trends: striking disconnect between what is considered good code and what is fast

Gap has always been there, but has grown in recent years

– 15 years ago, processors often bound by computation

– Now, usually bound by cache misses, serialization, pipeline stalls, etc.

– Multi-core CPUs are ineffectively utilized

‘Heavy Iron’, e.g. big objects, opaque memory, is a dead end for performance

The revolt is beginning in high performance graphics APIs, but will spread

BUT… HOW MUCH FASTER?

Biggest problem with industry today: Acceptance

Only 1 secret in API design: That it can be done.

– And isn’t that hard

– And our code isn’t that ugly

Star Swarm already demonstrating what is possible on a PC

D3D12 FEATURES THAT NITROUS USES

True de-coupled multi-core rendering

– Expecting near linear thread scheduling

Manual hazard tracking

– Hazards have been resolved already

Memory heaps

– Bigger chunks of memory pool grouping make management simpler

Descriptor tables

– Table exposure allows a cheaper way of binding textures

– Allows texture bindings to be shared between non-adjacent batches

WHAT’S DIFFERENT NOW?

Then:

Spec Written

Spec Reviewed

API implemented

Released to public

First Engine use

Analysis done

WHAT’S DIFFERENT NOW?

Now:

[Figure: a repeating cycle of Create Spec, Implement Spec, Prototype on Actual Engines, Analyze, Discuss with IHVs and ISVs. Annotations on the cycle: "Start Here" and "If Ready, exit here to prep for release".]

IN THE SPIRIT OF CONTRIBUTING

Oxide is proud to announce that we have a prototype of Nitrous running on D3D12

*PR DISCLAIMER* This is not an official announcement regarding D3D12 support

Porting from other modern APIs is much simpler than porting from D3D11 to D3D12

EXPECTED RESULTS

CPU driver overhead largely put to rest

Huge increases in driver reliability

Huge decreases in frame latency, expecting median frame latency to be 1.5 frames

– Increased perceptual responsiveness

Never a dropped frame or stall due to driver API issues

– *Other OS events could cause stalls

Driver should be far smaller, simpler to implement, IHVs can spend more time on optimizations

DIRECT3D12 AND THE FUTURE OF GRAPHICS APIS

Dave Oldcorn, Direct3D12 Driver Architect, AMD

THE PROBLEM

Mismatch between existing Direct3D and hardware capabilities

– Lots of CPU cores, but only one stream of data

– State communication in small chunks

– “Hidden” work: hard to predict from any one given call what the overhead might be; implicit memory management

– Hardware evolving away from classical register programming

API LANDSCAPE

Gap between PC ‘raw’ 3D APIs and the hardware has opened up

Very high level APIs now ubiquitous; easy to access even for casual developers, plenty of choice

Where the PC APIs are is a middle ground

[Figure: API layers placed along a vertical axis labelled "Capability, ease of use, distance from 3D engine". Labels shown: Application; Game Engines (Frostbite, Unity, Unreal, CryEngine, BlitzTech); Flash / Silverlight; D3D7/8; D3D9; OpenGL; D3D11; Console APIs; Opportunity; Metal (register level access).]

WHAT ARE THE CONSEQUENCES?

WHAT ARE THE SOLUTIONS?

SEQUENTIAL API

Sequential API: state for given draw comes from arbitrary previous time

Some states must be reconciled on the CPU (“delayed validation”)

– All contributing state needs to be visible

GPU isn’t like this, uses command buffers

– Must save and restore state at start and end

[Figure: "API input", a stream of calls: ..., Draw, Set PS CB, Draw x 5, Set VS CB, Draw x 3, Set Blend, Set PS, Set RT state, Draw, Set VS VB, Draw, ... Alongside it, "State contributing to draw": (more, earlier), PS CB, VS CB, Blend state, PS, RT state, Draw.]
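A toy model can make the "state from an arbitrary previous time" point concrete. The sketch below is not Direct3D code; `SequentialContext`, `replay` and the slot names are hypothetical, chosen only to mirror the call stream in the figure. It shows that the state a draw depends on can only be gathered at draw time.

```python
from dataclasses import dataclass, field


@dataclass
class SequentialContext:
    """Toy immediate-mode context: state persists until overwritten."""
    state: dict = field(default_factory=dict)
    draws: list = field(default_factory=list)

    def set_state(self, slot: str, value: str) -> None:
        # A set call only records the value; nothing is validated yet.
        self.state[slot] = value

    def draw(self) -> None:
        # "Delayed validation": only now is the full contributing state
        # known, so this is where it has to be gathered and reconciled.
        self.draws.append(dict(self.state))


def replay(calls):
    """Replay ("set", slot, value) and ("draw",) tuples; return per-draw state."""
    ctx = SequentialContext()
    for call in calls:
        if call[0] == "set":
            ctx.set_state(call[1], call[2])
        else:
            ctx.draw()
    return ctx.draws


calls = [
    ("set", "PS CB", "cb0"),
    ("draw",),
    ("set", "VS CB", "cb1"),
    ("draw",),
    ("set", "Blend", "alpha"),
    ("draw",),
]
draws = replay(calls)
```

The last draw depends on all three earlier set calls, even though only one of them was issued immediately before it.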

THREADING A SEQUENTIAL API

Sequential API threading

– Simple producer / consumer model: extra latency; buffering has a cost; more threading would mean dividing tasks on a finer grain

– Bottlenecked on application or driver thread: difficult to extract parallelism (Amdahl’s Law)

[Figure: threading of a sequential API. Application: Application simulation, Prebuild Thread 0, Prebuild Thread 1, Application Render Thread. Runtime / Driver: Driver Thread. GPU Execution Queue: Queued Buffer 0, Queued Buffer 1, Queued Buffer 2, ...]
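The Amdahl's Law reference can be made concrete with the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of work that can run in parallel and n is the thread count. The function name and the example fractions below are illustrative assumptions, not measurements from the talk.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speedup when only part of the work can be parallelised."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# If half of the frame's CPU work is stuck on one render or driver thread,
# no number of extra cores can reach a 2x speedup.
two_threads = amdahl_speedup(0.5, 2)
many_threads = amdahl_speedup(0.5, 1024)
```

With a serial render or driver thread in the path, the serial fraction sets the ceiling, which is the bottleneck the slide describes.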

COMMAND BUFFER API

GPUs only listen to command buffers

Let the app build them

– Command Lists, at the API level

Solves sequential API CPU issues

[Figure: command buffer API. Application: Application simulation; Thread 0 and Thread 1 each "Build Cmd Buffer". Runtime / Driver. GPU Execution Queue: Queued Buffer 0, Queued Buffer 1, ...]
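The structure of "let the app build them" can be sketched as several workers each recording an independent command list, followed by one ordered submission. This is a structural sketch only: `build_command_list`, `record_frame` and the tuple-based commands are hypothetical stand-ins, and Python threads here illustrate the shape of the design rather than real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def build_command_list(chunk):
    """Record one command list for a slice of the scene; touches no shared state."""
    commands = []
    for obj in chunk:
        commands.append(("bind_pipeline", obj["pipeline"]))
        commands.append(("draw", obj["name"]))
    return commands


def record_frame(objects, workers=4):
    """Build command lists on several threads, then return them in submit order."""
    chunks = [objects[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in chunk order, so submission stays deterministic
        # even though recording order across threads is not.
        command_lists = list(pool.map(build_command_list, chunks))
    # Drop empty lists (more workers than objects) before queuing.
    return [cl for cl in command_lists if cl]


objects = [{"name": f"obj{i}", "pipeline": "opaque"} for i in range(10)]
queue = record_frame(objects, workers=4)
```

Because each list carries its own state, no worker needs to see what another worker recorded, which is what removes the single sequential stream.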

BETTER SCHEDULING

App has much more control over scheduling work

– Both CPU side and GPU

Threads don’t really share much resource

Many more options for streaming assets

[Figure: two thread timelines. Captions: "D3D11: CB building threads tend to interfere" and "D3D12: CB building threads more independent". Annotation: "GPU load still added but only after queuing". Labels shown: Driver thread, Create thread, Build threads, Render work, Create work, GPU executes.]

PIPELINE OBJECTS

Pipeline objects get rid of JIT and enable LTCG for GPUs

Decouple interface and implementation

We’re aware that this is a hairpin bend for many graphics engines to negotiate.

– Many engines don’t think in terms of predicting state up front

– The benefits are worth it

[Figure: "Simplified dataflow through pipeline". Stages shown: Index Process, VS, Primitive Generation, Rasteriser, PS, Rendertarget Output, with three "?" marks between stages.]
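One way engines can adapt to up-front state is to describe the whole pipeline as an immutable key and create the pipeline object once per distinct key. The sketch below is a hypothetical model of that idea, not the Direct3D 12 API: `PipelineDesc`, `PipelineCache` and the field names are assumptions, and the "linked" tuple stands in for a compiled, linked pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineDesc:
    """Everything that would otherwise be reconciled at draw time."""
    vertex_shader: str
    pixel_shader: str
    blend: str
    render_target_format: str


class PipelineCache:
    def __init__(self):
        self._objects = {}
        self.compiles = 0  # counts expensive create-time work

    def get_or_create(self, desc: PipelineDesc):
        pso = self._objects.get(desc)
        if pso is None:
            # All compile and link cost is paid here, at create time,
            # instead of as a JIT step hidden inside a draw call.
            self.compiles += 1
            pso = ("linked", desc)
            self._objects[desc] = pso
        return pso


cache = PipelineCache()
opaque = PipelineDesc("vs_main", "ps_main", "opaque", "rgba8")
first = cache.get_or_create(opaque)
again = cache.get_or_create(PipelineDesc("vs_main", "ps_main", "opaque", "rgba8"))
blended = cache.get_or_create(PipelineDesc("vs_main", "ps_main", "alpha", "rgba8"))
```

Creating the cache entries during loading rather than on first use is what moves the cost out of the frame.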

RENDER OBJECT BINDING MISMATCH

Hardware uses tables in video memory

BUT still programmed like a register solution

– So one bind becomes: allocate a new chunk of video memory; create a new copy of the entire table; update the one entry; write the register with the new table base address

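[Figure: on-chip root table (1 per stage) with SR and CB entries. The SR entry is a "Pointer to table (here, textures)" into an SRD table in GPU memory, whose entries are each a "Pointer to (+ params of) resource" in GPU memory. The CB entry is a "Pointer to table (constant buffers)".]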
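The cost of the four-step bind can be seen in a small model. This is a sketch under stated assumptions: `RegisterStyleBinder` is hypothetical, a Python list stands in for the table in video memory, and the old table is left untouched on the assumption that in-flight GPU work may still be reading it.

```python
class RegisterStyleBinder:
    """Toy model of a table-based GPU programmed through a per-slot bind API."""

    def __init__(self, table_size: int):
        self.table = [None] * table_size  # table the "register" points at
        self.entries_copied = 0

    def bind(self, slot: int, descriptor: str) -> None:
        new_table = list(self.table)           # allocate memory, copy entire table
        self.entries_copied += len(new_table)
        new_table[slot] = descriptor           # update the one entry
        self.table = new_table                 # write register with new base address


binder = RegisterStyleBinder(table_size=16)
original = binder.table
binder.bind(3, "albedo_texture")
binder.bind(4, "normal_texture")
```

Two single-slot binds copy 32 entries in this model, so the per-bind cost scales with table size rather than with the amount of state that changed.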

DESCRIPTOR TABLES

Several tables of each type of resource

– Easy to divide up by frequency

Tables can be of arbitrary size; dynamically indexed to provide bindless textures

Changing a table pointer is cheap

Updating a descriptor in a table is not

[Figure: on-chip table with entries SR.T[0], SR.T[1], SR.T[2], SR.T[3], UAV, CB.T[0], CB.T[1], Samp. SR.T[0] is a "Pointer to table (textures table 0)" into an SRD table in GPU memory holding SR.T[0][0], SR.T[0][1], SR.T[0][2]. CB.T[1] is a "Pointer to table (constbuf table 1)" holding CB.T[1][0], CB.T[1][1].]
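The cheap-pointer versus expensive-descriptor trade-off suggests building tables once and switching between them. The sketch below models that with hypothetical names (`DescriptorTableBinder`, `create_table`, `set_table`, `sample`); it counts descriptor writes and pointer changes separately so the difference is visible.

```python
class DescriptorTableBinder:
    """Toy model: descriptors are written at table creation, draws only repoint."""

    def __init__(self):
        self.tables = {}
        self.bound = None
        self.descriptor_writes = 0
        self.pointer_changes = 0

    def create_table(self, name, descriptors):
        # The expensive part: every descriptor is written into the table once.
        self.tables[name] = tuple(descriptors)
        self.descriptor_writes += len(descriptors)

    def set_table(self, name):
        # The cheap part: only the table pointer changes.
        self.bound = self.tables[name]
        self.pointer_changes += 1

    def sample(self, index):
        # Dynamic indexing into the bound table, in the spirit of bindless textures.
        return self.bound[index]


binder = DescriptorTableBinder()
binder.create_table("per_frame", ["shadow_map", "env_map"])
binder.create_table("per_material", ["albedo", "normal", "roughness"])
for _ in range(100):
    binder.set_table("per_frame")
    binder.set_table("per_material")
```

Splitting tables by update frequency, as the slide suggests, keeps rarely changing descriptors out of the tables that are rebuilt often.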

KEY INNOVATIONS

Innovation: CPU-side and GPU-side wins

Command buffers: build on many threads; control of scheduling; lower latency; simplified state tracking

Pipeline state objects: link at create time; no JIT shader compiles; efficient batched updates; cheaper state updates; enables LTCG

Bind objects in groups: cheap to change group (CPU side and GPU side); fits hardware paradigm

Move work to Create: predictability (CPU side); enables optimisations (GPU side)

KEY INNOVATIONS

Innovation: CPU-side and GPU-side wins

Explicit synchronisation: efficiency; required for bindless textures; less overhead

Explicit memory management: efficiency; predictability; application flexibility; zero copy; control over placement

Do less: predictability, efficiency; enables aggressive schedule

FEWER BUGS

NEW PROBLEMS (AND TIPS TO SOLVE THEM)

NEW VISIBLE LIMITS

More draws in does not automatically mean more triangles out

– You will not see full rendering rates with triangles averaging 1 pixel each.

– Wireframe mode should look different to filled rendering

NEW VISIBLE LIMITS

Feeding the GPU much more efficiently means exploring interesting new limits that weren’t visible before

10k/frame of anything is ~1µs per thing.

GPU pipeline depth is likely to be 1-10µs (1k-10k cycles).

Specific limit: context registers

– Shader tables are NOT in the context

– Compute doesn’t bottleneck on context
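The per-item figure is an order-of-magnitude budget, and the arithmetic behind it is short. The sketch below assumes a frame rate and a GPU clock that the slide does not state: ~1µs per item matches a 10 ms frame (100 frames per second), and 1µs = 1k cycles matches a clock of about 1 GHz. Function names are illustrative.

```python
def per_item_budget_us(items_per_frame: int, frames_per_second: float) -> float:
    """Time available per item if a frame's worth of items must fit in one frame."""
    frame_time_us = 1_000_000 / frames_per_second
    return frame_time_us / items_per_frame


def microseconds_to_cycles(duration_us: float, clock_ghz: float) -> float:
    """Convert a duration to clock cycles; 1 GHz is 1,000 cycles per microsecond."""
    return duration_us * clock_ghz * 1_000


budget_100fps = per_item_budget_us(10_000, 100)  # 10 ms frame
budget_60fps = per_item_budget_us(10_000, 60)    # about 16.7 ms frame
depth_cycles = microseconds_to_cycles(10, 1.0)   # assumed 1 GHz clock
```

At 10k items per frame the budget per item is of the same order as the pipeline depth, which is why per-draw costs such as context register changes start to show up.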

APPLICATION IN CHARGE

Application is arbiter of correct rendering

– This is a serious responsibility

– The benefits of D3D12 aren’t readily available without this condition

Applications must be warning-free on the debug layer

Different opportunities for driver intervention

APPLICATION IN CHARGE

No driver thread in play

– App can target much lower latency

– BUT implies app has to be ready with new GPU work

[Figure: frame timelines for App Render (Frame 1, Frame 2, Frame 3), Driver (F1, F2, F3) and GPU (F1, F2, F3). Caption: "D3D11: No dead GPU time after 1st frame (but extra latency)". Annotations: "First work sent to driver"; "Driver buffers Present; no future dead time"; "Dead Time"; "No buffered present reveals dead time on GPU".]

USE COMMAND BUFFERS SPARINGLY

Each API command list maps to a single hardware command buffer

Starting / ending a command list has an overhead

– Writes full 3D state, may flush caches or idle GPU

We think a good rule of thumb will be to target around 100 command buffers/frame

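– Use the multiple submission API where possible

[Figure: "Multiple applications running on system". Labels shown: Application 0 queue, Application 1 queue, GPU executes, with command buffers CB0, CB1, CB2 moving from the application queues into the GPU's execution order.]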
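The rule of thumb implies grouping many draws per command list rather than one list per draw. The sketch below splits a frame's draws into at most a target number of contiguous ranges and compares total cost under a fixed per-list overhead. The helper names and the cost figures (1µs per draw, 20µs per list) are hypothetical, chosen only to show the shape of the trade-off.

```python
def plan_command_lists(draw_count: int, target_lists: int = 100):
    """Split draws into at most target_lists contiguous (start, end) ranges."""
    if draw_count <= 0:
        return []
    lists = min(target_lists, draw_count)
    base, extra = divmod(draw_count, lists)
    ranges, start = [], 0
    for i in range(lists):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        ranges.append((start, start + size))
        start += size
    return ranges


def frame_cost_us(draw_count, list_count, per_draw_us, per_list_us):
    """Total CPU-side cost with a fixed start/end overhead per command list."""
    return draw_count * per_draw_us + list_count * per_list_us


ranges = plan_command_lists(10_000)
batched = frame_cost_us(10_000, len(ranges), per_draw_us=1.0, per_list_us=20.0)
one_list_per_draw = frame_cost_us(10_000, 10_000, per_draw_us=1.0, per_list_us=20.0)
```

Under these assumed costs the batched plan spends 2,000µs on list overhead, against 200,000µs with one list per draw.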

ROUND-UP

ALL-NEW

There’s a learning curve here for all of us

In the main it’s a shallow one

– Compared at least to the general problem of multithreaded rendering; multithreading is always hard.

– Simpler design means fewer bugs and more predictable performance

WHAT AMD PLAN TO DELIVER

An early preview driver “soon”

Release driver for Direct3D12 launch

Continuous engagement

– With Microsoft

– With ISVs: bring your opinions to us and to Microsoft.

DX12 AND FROSTBITE

Johan Andersson

Technical Director

DX12 AND FROSTBITE

PC is very important for EA and we’ve been pushing hard to improve graphics capabilities on Windows

Excited to be working with Microsoft and the IHVs on Direct3D again!

Good & very healthy collaboration between Microsoft, the IHVs and us game/engine developers

DX12 is a really big step forward from DX11 or GL4

DX12 FEATURES AND FROSTBITE

Key DX12 features that are a great fit for Frostbite:

– Efficient parallel command buffers

– Descriptor tables

– Pipeline objects

– Explicit resource synchronization

– Explicit memory management

DX12 is still in development, so we are actively working with Microsoft & the IHVs to help make sure all of it fits together and is efficient

DX12 PLATFORMS

DX12 support on Windows 7 & most existing PC hardware is critical for us

– Huge user base still on Windows 7

– Gamers would see major benefits without upgrading

DX12 support on Xbox One is critical for us

– Will lead to improved performance & quality for future Xbox One titles

– Almost all of our games are cross platform Gen4/PC

– Easier development – renderer is shared between Windows & Xbox One

Looking forward to DX12 on mobile/tablets

– Power efficiency & low overhead is really key

– Need larger user base to target on Windows for mobile

DX12 AND FROSTBITE

We are building a DX12 renderer for Frostbite!

– Will work on GPUs from all vendors – benefits a wide set of gamers

Expected benefits over DX11:

– More stable and consistent performance

– Higher overall performance

– Move our design target – richer & more detailed game worlds

– Thinner drivers – easier to work with / less of a black box

– More control for us developers – new techniques & optimizations

Really happy that the full Windows & Xbox ecosystems are moving to a low-level graphics API!

QUESTIONS