91

Performance OpenGL Platform Independent Techniques Dave Shreiner Brad Grantham Dave Shreiner Brad Grantham

Embed Size (px)

Citation preview

Performance OpenGLPerformance OpenGLPlatform Independent TechniquesPlatform Independent Techniques

Performance OpenGLPerformance OpenGLPlatform Independent TechniquesPlatform Independent Techniques

Dave ShreinerDave Shreiner

Brad GranthamBrad GranthamDave ShreinerDave Shreiner

Brad GranthamBrad Grantham

Performance OpenGL 3

What You’ll See Today What You’ll See Today ……What You’ll See Today What You’ll See Today ……

•An in-depth look at the OpenGL pipeline from a performance perspective

•Techniques for determining where OpenGL application performance bottlenecks are

•A bunch of simple, good habits for OpenGL applications

•An in-depth look at the OpenGL pipeline from a performance perspective

•Techniques for determining where OpenGL application performance bottlenecks are

•A bunch of simple, good habits for OpenGL applications

Performance OpenGL 4

Performance Tuning Performance Tuning AssumptionsAssumptionsPerformance Tuning Performance Tuning AssumptionsAssumptions

•You’re trying to tune an interactive OpenGL application

•There’s an established metric for estimating the application’s performance– Consistent frames/second

– Number of pixels or primitives to be rendered per frame

•You can change the application’s source code

•You’re trying to tune an interactive OpenGL application

•There’s an established metric for estimating the application’s performance– Consistent frames/second

– Number of pixels or primitives to be rendered per frame

•You can change the application’s source code

Performance OpenGL 5

Errors Skew Errors Skew Performance Performance MeasurementsMeasurements

Errors Skew Errors Skew Performance Performance MeasurementsMeasurementsOpenGL Reports Errors OpenGL Reports Errors

AsynchronouslyAsynchronously•OpenGL doesn’t tell you when something

goes wrong– Need to use glGetError() to determine if something

went wrong

– Calls with erroneous parameters will silently set error state and return without completing

OpenGL Reports Errors OpenGL Reports Errors AsynchronouslyAsynchronously•OpenGL doesn’t tell you when something

goes wrong– Need to use glGetError() to determine if something

went wrong

– Calls with erroneous parameters will silently set error state and return without completing

Performance OpenGL 6

Checking a single Checking a single commandcommandChecking a single Checking a single commandcommand

Simple MacroSimple Macro

• Some limitations on where the macro can be used– can’t use inside of glBegin() / glEnd() pair

Simple MacroSimple Macro

• Some limitations on where the macro can be used– can’t use inside of glBegin() / glEnd() pair

#define CHECK_OPENGL_ERROR( cmd ) \ cmd; \ { GLenum error; \ while ( (error = glGetError()) != GL_NO_ERROR) { \ printf( "[%s:%d] '%s' failed with error %s\n", \

__FILE__, __LINE__, #cmd, \ gluErrorString(error) ); \ }

Performance OpenGL 7

The OpenGL PipelineThe OpenGL Pipeline(The Macroscopic View)(The Macroscopic View)The OpenGL PipelineThe OpenGL Pipeline(The Macroscopic View)(The Macroscopic View)

Ap

plic

ati

on

Tra

nsf

orm

ati

on

Rast

eri

zati

on

Fram

eb

uff

er

Performance OpenGL 8

Performance Performance BottlenecksBottlenecksPerformance Performance BottlenecksBottlenecks

BottlenecksBottlenecks are the performance are the performance limiting part of the applicationlimiting part of the application•Application bottleneck

– Application may not pass data fast enough to the OpenGL pipeline

•Transform-limited bottleneck– OpenGL may not be able to process vertex

transformations fast enough

BottlenecksBottlenecks are the performance are the performance limiting part of the applicationlimiting part of the application•Application bottleneck

– Application may not pass data fast enough to the OpenGL pipeline

•Transform-limited bottleneck– OpenGL may not be able to process vertex

transformations fast enough

Performance OpenGL 9

Performance Performance Bottlenecks (cont.)Bottlenecks (cont.)Performance Performance Bottlenecks (cont.)Bottlenecks (cont.)

•Fill-limited bottleneck– OpenGL may not be able to rasterize primitives fast

enough

•Fill-limited bottleneck– OpenGL may not be able to rasterize primitives fast

enough

Performance OpenGL 10

There Will Always Be A There Will Always Be A BottleneckBottleneckThere Will Always Be A There Will Always Be A BottleneckBottleneck

Some portion of the application will Some portion of the application will always be the limiting factor to always be the limiting factor to performanceperformance• If the application performs to expectations, then the

bottleneck isn’t a problem

• Otherwise, need to be able to identify which part of the application is the bottleneck

• We’ll work backwards through the OpenGL pipeline in resolving bottlenecks

Some portion of the application will Some portion of the application will always be the limiting factor to always be the limiting factor to performanceperformance• If the application performs to expectations, then the

bottleneck isn’t a problem

• Otherwise, need to be able to identify which part of the application is the bottleneck

• We’ll work backwards through the OpenGL pipeline in resolving bottlenecks

Performance OpenGL 11

Fill-limited BottlenecksFill-limited BottlenecksFill-limited BottlenecksFill-limited Bottlenecks

System cannot fill all the pixels System cannot fill all the pixels required in the allotted timerequired in the allotted time•Easiest bottleneck to test

•Reduce number of pixels application must fill– Make the viewport smaller

System cannot fill all the pixels System cannot fill all the pixels required in the allotted timerequired in the allotted time•Easiest bottleneck to test

•Reduce number of pixels application must fill– Make the viewport smaller

Performance OpenGL 12

Reducing Fill-limited Reducing Fill-limited BottlenecksBottlenecksReducing Fill-limited Reducing Fill-limited BottlenecksBottlenecks

The Easy FixesThe Easy Fixes•Make the viewport smaller

– This may not be an acceptable solution, but it’s easy

•Reduce the frame-rate

The Easy FixesThe Easy Fixes•Make the viewport smaller

– This may not be an acceptable solution, but it’s easy

•Reduce the frame-rate

framepixels

secondframes

secondpixels

M3.1306

M800frame

pixels

secondframes

secondpixels

M7.1075

M800

Performance OpenGL 13

A Closer Look at A Closer Look at OpenGL’s OpenGL’s Rasterization PipelineRasterization Pipeline

A Closer Look at A Closer Look at OpenGL’s OpenGL’s Rasterization PipelineRasterization Pipeline

TextureMappingEngine

ToFragment Tests

PointRasterization

LineRasterization

TriangleRasterization

Pixel Rectangle

Rasterization

BitmapRasterization

FogEngine

Color Sum(Sep.

Specular Color)

Performance OpenGL 14

Reducing Fill-limited Reducing Fill-limited Bottlenecks (cont.)Bottlenecks (cont.)Reducing Fill-limited Reducing Fill-limited Bottlenecks (cont.)Bottlenecks (cont.)

Rasterization PipelineRasterization Pipeline• Cull back facing

polygons

– Does require all primitives have same facediness

• Use per-vertex fog, as compared to per-pixel

Rasterization PipelineRasterization Pipeline• Cull back facing

polygons

– Does require all primitives have same facediness

• Use per-vertex fog, as compared to per-pixel

TextureMappingEngine

ToFragment Tests

PointRasterization

LineRasterization

TriangleRasterization

Pixel Rectangle

Rasterization

BitmapRasterization

FogEngine

Color Sum

Performance OpenGL 15

A Closer Look at A Closer Look at OpenGL’s OpenGL’s Rasterization Pipeline Rasterization Pipeline (cont.)(cont.)

A Closer Look at A Closer Look at OpenGL’s OpenGL’s Rasterization Pipeline Rasterization Pipeline (cont.)(cont.)

Fragments(from previous

stages)

PixelOwnership

Test

ScissorTest

AlphaTest

StencilTest

DepthBufferTest

Blending DitheringLogical

Operations

Fram

eb

uff

er

Performance OpenGL 16

Reducing Fill-limited Reducing Fill-limited Bottlenecks (cont.)Bottlenecks (cont.)Reducing Fill-limited Reducing Fill-limited Bottlenecks (cont.)Bottlenecks (cont.)

Fragment PipelineFragment Pipeline•Do less work per pixel

– Disable dithering

– Depth-sort primitives to reduce depth testing

– Use alpha test to reject transparent fragments saves doing a pixel read-back from the framebuffer in the

blending phase

Fragment PipelineFragment Pipeline•Do less work per pixel

– Disable dithering

– Depth-sort primitives to reduce depth testing

– Use alpha test to reject transparent fragments saves doing a pixel read-back from the framebuffer in the

blending phase

PixelOwnership

Test

ScissorTest

AlphaTest

StencilTest

DepthBufferTest

Blending DitheringLogical

Operations

Performance OpenGL 17

A Closer Look at A Closer Look at OpenGL’s Pixel OpenGL’s Pixel PipelinePipeline

A Closer Look at A Closer Look at OpenGL’s Pixel OpenGL’s Pixel PipelinePipeline

Pixel Scale&

BiasClamp

PixelRectangle

PixelUnpacking

TypeConversion

FramebufferTextureMemory

Pixel Map

Performance OpenGL 18

Working with Pixel Working with Pixel RectanglesRectanglesWorking with Pixel Working with Pixel RectanglesRectangles

Texture downloads and BltsTexture downloads and Blts•OpenGL supports many formats for storing

pixel data– Signed and unsigned types, floating point

•Type conversions from storage type to framebuffer / texture memory format occur automatically

Texture downloads and BltsTexture downloads and Blts•OpenGL supports many formats for storing

pixel data– Signed and unsigned types, floating point

•Type conversions from storage type to framebuffer / texture memory format occur automatically

Performance OpenGL 19

Pixel Data ConversionsPixel Data ConversionsPixel Data ConversionsPixel Data Conversions

0

5

10

15

20

25

30

Machine 1 Machine 2 Machine 3

GL_BYTE GL_UNSIGNED_BYTE GL_SHORT GL_UNSIGNED_SHORT

GL_INT GL_UNSIGNED_INT GL_FLOAT

Performance OpenGL 20

Pixel Data Conversions Pixel Data Conversions (cont.)(cont.)Pixel Data Conversions Pixel Data Conversions (cont.)(cont.)

0

0.5

1

1.5

2

2.5

Machine 1 Machine 2 Machine 3

GL_UNSIGNED_SHORT_4_4_4_4 GL_UNSIGNED_SHORT_4_4_4_4_REV GL_UNSIGNED_SHORT_5_5_5_1

GL_UNSIGNED_SHORT_1_5_5_5_REV GL_UNSIGNED_INT_8_8_8_8 GL_UNSIGNED_INT_8_8_8_8_REV

GL_UNSIGNED_INT_10_10_10_2 GL_UNSIGNED_INT_2_10_10_10_REV

Performance OpenGL 21

Pixel Data Conversions Pixel Data Conversions (cont.)(cont.)Pixel Data Conversions Pixel Data Conversions (cont.)(cont.)

ObservationsObservations•Signed data types probably aren’t optimized

– OpenGL clamps colors to [0, 1]

•Match pixel format to window’s pixel format for blts– Usually involves using packed pixel formats

– No significant difference for rendering speed for texture’s internal format

ObservationsObservations•Signed data types probably aren’t optimized

– OpenGL clamps colors to [0, 1]

•Match pixel format to window’s pixel format for blts– Usually involves using packed pixel formats

– No significant difference for rendering speed for texture’s internal format

Performance OpenGL 22

Fragment Operations Fragment Operations and Fill Rate and Fill Rate Fragment Operations Fragment Operations and Fill Rate and Fill Rate

The more you do, the less you getThe more you do, the less you get•The more work per pixel, the less fill you get

The more you do, the less you getThe more you do, the less you get•The more work per pixel, the less fill you get

Performance OpenGL 23

Fragment Operations Fragment Operations and Fill Rate (cont’d)and Fill Rate (cont’d)Fragment Operations Fragment Operations and Fill Rate (cont’d)and Fill Rate (cont’d)

Percentage of Peak Fill

0.00%

20.00%

40.00%

60.00%

80.00%

100.00%

120.00%

1 4 7 10

13

16

19

22

25

28

31

34

37

40

43

46

49

52

55

58

61

64

Machine 2Machine 2 Machine 4Machine 4

Percentage of Peak Fill

0.00%

20.00%

40.00%

60.00%

80.00%

100.00%

120.00%

1 4 7 10

13

16

19

22

25

28

31

34

37

40

43

46

49

52

55

58

61

64

Machine 3Machine 3

Percentage of Peak Fill

0.00%

20.00%

40.00%

60.00%

80.00%

100.00%

120.00%

1 4 7 10

13

16

19

22

25

28

31

34

37

40

43

46

49

52

55

58

61

64

Pecentage of Peak Fill

0.00%

20.00%

40.00%

60.00%

80.00%

100.00%

120.00%

1 4 7 10

13

16

19

22

25

28

31

34

37

40

43

46

49

52

55

58

61

64

Machine 1Machine 1

Performance OpenGL 24

Texture-mapping Texture-mapping ConsiderationsConsiderationsTexture-mapping Texture-mapping ConsiderationsConsiderations

Use Texture ObjectsUse Texture Objects•Allows OpenGL to do texture memory

management– Loads texture into texture memory when appropriate

– Only convert data once

•Provides queries for checking if a texture is resident– Load all textures, and verify they all fit simultaneously

Use Texture ObjectsUse Texture Objects•Allows OpenGL to do texture memory

management– Loads texture into texture memory when appropriate

– Only convert data once

•Provides queries for checking if a texture is resident– Load all textures, and verify they all fit simultaneously

Performance OpenGL 25

Texture-mapping Texture-mapping Considerations (cont.)Considerations (cont.)Texture-mapping Texture-mapping Considerations (cont.)Considerations (cont.)

Texture Objects (cont.)Texture Objects (cont.)•Assign priorities to textures

– Provides hints to texture-memory manager on which textures are most important

•Can be shared between OpenGL contexts– Allows one thread to load textures; other thread to

render using them

•Requires OpenGL 1.1

Texture Objects (cont.)Texture Objects (cont.)•Assign priorities to textures

– Provides hints to texture-memory manager on which textures are most important

•Can be shared between OpenGL contexts– Allows one thread to load textures; other thread to

render using them

•Requires OpenGL 1.1

Performance OpenGL 26

Texture-mapping Texture-mapping Considerations (cont.)Considerations (cont.)Texture-mapping Texture-mapping Considerations (cont.)Considerations (cont.)

Sub-loading TexturesSub-loading Textures•Only update a portion of a texture

– Reduces bandwidth for downloading textures

– Usually requires modifying texture-coordinate matrix

Sub-loading TexturesSub-loading Textures•Only update a portion of a texture

– Reduces bandwidth for downloading textures

– Usually requires modifying texture-coordinate matrix

Performance OpenGL 27

Texture-mapping Texture-mapping Considerations (cont.)Considerations (cont.)Texture-mapping Texture-mapping Considerations (cont.)Considerations (cont.)

Know what sizes your textures Know what sizes your textures need to beneed to be•What sizes of mipmaps will you need?

•OpenGL 1.2 introduces texture level-of-detail– Ability to have fine grained control over mipmap stack

Only load a subset of mipmaps Control which mipmaps are used

Know what sizes your textures Know what sizes your textures need to beneed to be•What sizes of mipmaps will you need?

•OpenGL 1.2 introduces texture level-of-detail– Ability to have fine grained control over mipmap stack

Only load a subset of mipmaps Control which mipmaps are used

Performance OpenGL 28

What If Those Options What If Those Options Aren’t Viable?Aren’t Viable?What If Those Options What If Those Options Aren’t Viable?Aren’t Viable?

•Use more or faster hardware

•Utilize the “extra time” in other parts of the application– Transform pipeline

tessellate objects for smoother appearance use better lighting

– Application more accurate simulation better physics

•Use more or faster hardware

•Utilize the “extra time” in other parts of the application– Transform pipeline

tessellate objects for smoother appearance use better lighting

– Application more accurate simulation better physics

Performance OpenGL 29

Transform-limited Transform-limited BottlenecksBottlenecksTransform-limited Transform-limited BottlenecksBottlenecks

System cannot process all the System cannot process all the vertices required in the allotted vertices required in the allotted timetime• If application doesn’t speed up in fill-limited

test, it’s most likely transform-limited

•Additional tests include– Disable lighting

– Disable texture coordinate generation

System cannot process all the System cannot process all the vertices required in the allotted vertices required in the allotted timetime• If application doesn’t speed up in fill-limited

test, it’s most likely transform-limited

•Additional tests include– Disable lighting

– Disable texture coordinate generation

Performance OpenGL 30

A Closer Look at A Closer Look at OpenGL’s OpenGL’s Transformation Transformation PipelinePipeline

A Closer Look at A Closer Look at OpenGL’s OpenGL’s Transformation Transformation PipelinePipeline

TextureCoordinateGeneration

VertexCoordinates

LightingNormals

TextureCoordinates

Color

ModelViewTransform

ProjectionTransform

Lighting Clipping

TextureCoordinateTransform

Performance OpenGL 31

Reducing Transform-Reducing Transform-limited Bottleneckslimited BottlenecksReducing Transform-Reducing Transform-limited Bottleneckslimited Bottlenecks

Do less work per-vertexDo less work per-vertex•Tune lighting

•Use “typed” OpenGL matrices

•Use explicit texture coordinates

•Simulate features in texturing– lighting

Do less work per-vertexDo less work per-vertex•Tune lighting

•Use “typed” OpenGL matrices

•Use explicit texture coordinates

•Simulate features in texturing– lighting

Performance OpenGL 32

Lighting Lighting ConsiderationsConsiderationsLighting Lighting ConsiderationsConsiderations

•Use infinite (directional) lights– Less computation compared to local (point) lights

– Don’t use GL_LIGHTMODEL_LOCAL_VIEWER

•Use fewer lights– Not all lights may be hardware accelerated

•Use infinite (directional) lights– Less computation compared to local (point) lights

– Don’t use GL_LIGHTMODEL_LOCAL_VIEWER

•Use fewer lights– Not all lights may be hardware accelerated

Performance OpenGL 33

Lighting Lighting Considerations (cont.)Considerations (cont.)Lighting Lighting Considerations (cont.)Considerations (cont.)

•Use a texture-based lighting scheme– Only helps if you’re not fill-limited

•Use a texture-based lighting scheme– Only helps if you’re not fill-limited

Performance OpenGL 34

Reducing Transform-Reducing Transform-limited Bottlenecks limited Bottlenecks (cont.)(cont.)

Reducing Transform-Reducing Transform-limited Bottlenecks limited Bottlenecks (cont.)(cont.)Matrix AdjustmentsMatrix Adjustments

•Use “typed” OpenGL matrix calls

– Some implementations track matrix type to reduce matrix-vector multiplication operations

Matrix AdjustmentsMatrix Adjustments•Use “typed” OpenGL matrix calls

– Some implementations track matrix type to reduce matrix-vector multiplication operations

glLoadMatrix*()glMultMatrix*()

glRotate*()glScale*()glTranslate*()glLoadIdentity()

“Untyped”“Typed”

Performance OpenGL 35

Application-limited Application-limited BottlenecksBottlenecksApplication-limited Application-limited BottlenecksBottlenecks

When OpenGL does all you ask, and When OpenGL does all you ask, and your application still runs too slowyour application still runs too slow•System may not be able to transfer data to

OpenGL fast enough

•Test by modifying application so that no rendering is performed, but all data is still transferred to OpenGL

When OpenGL does all you ask, and When OpenGL does all you ask, and your application still runs too slowyour application still runs too slow•System may not be able to transfer data to

OpenGL fast enough

•Test by modifying application so that no rendering is performed, but all data is still transferred to OpenGL

Performance OpenGL 36

Application-limited Application-limited Bottlenecks (cont.)Bottlenecks (cont.)Application-limited Application-limited Bottlenecks (cont.)Bottlenecks (cont.)

•Rendering in OpenGL is triggered when vertices are sent to the pipe

•Send all data to pipe, just not necessarily in its original form– Replace all glVertex*() and glColor*() calls with glNormal*() calls

glNormal*() only sets the current vertex’s normal values Application transfers the same amount of data to the

pipe, but doesn’t have to wait for rendering to complete

•Rendering in OpenGL is triggered when vertices are sent to the pipe

•Send all data to pipe, just not necessarily in its original form– Replace all glVertex*() and glColor*() calls with glNormal*() calls

glNormal*() only sets the current vertex’s normal values Application transfers the same amount of data to the

pipe, but doesn’t have to wait for rendering to complete

Performance OpenGL 37

Reducing Application-Reducing Application-limited Bottleneckslimited BottlenecksReducing Application-Reducing Application-limited Bottleneckslimited Bottlenecks

•No amount of OpenGL transform or rasterization tuning will help the problem

•Revisit application design decisions– Data structures

– Traversal methods

– Storage formats

•Use an application profiling tool (e.g. pixie & prof, gprof, or other similar tools)

•No amount of OpenGL transform or rasterization tuning will help the problem

•Revisit application design decisions– Data structures

– Traversal methods

– Storage formats

•Use an application profiling tool (e.g. pixie & prof, gprof, or other similar tools)

Performance OpenGL 38

The Novice OpenGL The Novice OpenGL Programmer’s View of Programmer’s View of the Worldthe World

The Novice OpenGL The Novice OpenGL Programmer’s View of Programmer’s View of the Worldthe World

SetState

Render

Performance OpenGL 39

What Happens When What Happens When You Set OpenGL StateYou Set OpenGL StateWhat Happens When What Happens When You Set OpenGL StateYou Set OpenGL State

• The amount of work varies by operation

• But all request a validation at next rendering operation

• The amount of work varies by operation

• But all request a validation at next rendering operation

Turning on or off a Turning on or off a feature (feature (glEnable()))

Set the feature’s enable Set the feature’s enable flagflag

Set a “typed” set of Set a “typed” set of data data ((glMaterialfv()))

Set values in OpenGL’s Set values in OpenGL’s contextcontext

Transfer “untyped” Transfer “untyped” data data ((glTexImage2D()))

Transfer and convert Transfer and convert data from host format data from host format into internal into internal representationrepresentation

Performance OpenGL 40

A (Somewhat) More A (Somewhat) More Accurate Accurate RepresentationRepresentation

A (Somewhat) More A (Somewhat) More Accurate Accurate RepresentationRepresentation

SetState

Render

Validation

Performance OpenGL 41

ValidationValidationValidationValidation

OpenGL’s synchronization processOpenGL’s synchronization process• Validation occurs in the transition from state setting

to rendering

• Not all state changes trigger a validation

– Vertex data (e.g. color, normal, texture coordinates)

– Changing rendering primitive

OpenGL’s synchronization processOpenGL’s synchronization process• Validation occurs in the transition from state setting

to rendering

• Not all state changes trigger a validation

– Vertex data (e.g. color, normal, texture coordinates)

– Changing rendering primitive

glMaterial( GL_FRONT, GL_DIFFUSE, blue );glEnable( GL_LIGHT0 );glBegin( GL_TRIANGLES );

Performance OpenGL 42

What Happens in a What Happens in a ValidationValidationWhat Happens in a What Happens in a ValidationValidation

•Changing state may do more than just set values in the OpenGL context– May require reconfiguring the OpenGL pipeline

selecting a different rasterization routine enabling the lighting machine

– Internal caches may be recomputed vertex / viewpoint independent data

•Changing state may do more than just set values in the OpenGL context– May require reconfiguring the OpenGL pipeline

selecting a different rasterization routine enabling the lighting machine

– Internal caches may be recomputed vertex / viewpoint independent data

Performance OpenGL 43

The Way it Really Is The Way it Really Is (Conceptually)(Conceptually)The Way it Really Is The Way it Really Is (Conceptually)(Conceptually)

SetState

Render

Validation

DifferentRenderingPrimitive

Performance OpenGL 44

Why Be Concerned Why Be Concerned About Validations?About Validations?Why Be Concerned Why Be Concerned About Validations?About Validations?

Validations can rob performance Validations can rob performance from an applicationfrom an application• “Redundant” state and primitive changes

•Validation is a two-step process– Determine what data needs to be updated

– Select appropriate rendering routines based on enabled features

Validations can rob performance Validations can rob performance from an applicationfrom an application• “Redundant” state and primitive changes

•Validation is a two-step process– Determine what data needs to be updated

– Select appropriate rendering routines based on enabled features

Performance OpenGL 45

How Can Validations How Can Validations Be Minimized?Be Minimized?How Can Validations How Can Validations Be Minimized?Be Minimized?

Be LazyBe Lazy• Change state as little as possible

• Try to group primitives by type

• Beware of “under the covers” state changes– GL_COLOR_MATERIAL

may force an update to the lighting cache ever call to glColor*()

Be LazyBe Lazy• Change state as little as possible

• Try to group primitives by type

• Beware of “under the covers” state changes– GL_COLOR_MATERIAL

may force an update to the lighting cache ever call to glColor*()

Performance OpenGL 46

How Can Validations How Can Validations Be Minimized? (cont.)Be Minimized? (cont.)How Can Validations How Can Validations Be Minimized? (cont.)Be Minimized? (cont.)

Beware of Beware of glPushAttrib() / glPopAttrib()• Very convenient for writing libraries

• Saves lots of state when called– All elements of an attribute groups are copied for later

• Almost guaranteed to do a validation when calling glPopAttrib()

Beware of Beware of glPushAttrib() / glPopAttrib()• Very convenient for writing libraries

• Saves lots of state when called– All elements of an attribute groups are copied for later

• Almost guaranteed to do a validation when calling glPopAttrib()

Performance OpenGL 47

State SortingState SortingState SortingState Sorting

Simple technique … Big payoffSimple technique … Big payoff•Arrange rendering sequence to minimize

state changes

•Group primitives based on their state attributes

•Organize rendering based on the expense of the operation

Simple technique … Big payoffSimple technique … Big payoff•Arrange rendering sequence to minimize

state changes

•Group primitives based on their state attributes

•Organize rendering based on the expense of the operation

Performance OpenGL 48

State Sorting (cont.)State Sorting (cont.)State Sorting (cont.)State Sorting (cont.)

Texture Download

Modifying LightingParameters

Matrix Operations

Vertex Data Least Expensive

Most Expensive

Performance OpenGL 49

A Comment on A Comment on EncapsulationEncapsulationA Comment on A Comment on EncapsulationEncapsulation

An Extremely Handy Design An Extremely Handy Design Mechanism, however …Mechanism, however …•Encapsulation may affect performance

– Tendency to want to complete all operations for an object before continuing to next object

limits state sorting potential may cause unnecessary validations

An Extremely Handy Design An Extremely Handy Design Mechanism, however …Mechanism, however …•Encapsulation may affect performance

– Tendency to want to complete all operations for an object before continuing to next object

limits state sorting potential may cause unnecessary validations

Performance OpenGL 50

A Comment on A Comment on Encapsulation (cont.)Encapsulation (cont.)A Comment on A Comment on Encapsulation (cont.)Encapsulation (cont.)

• Using a “visitor” type pattern can reduce state changes and validations– Usually a two-pass operation

Traverse objects, building a list of rendering primitives by state and type

Render by processing lists

– Popular method employed by many scene-graph packages

• Using a “visitor” type pattern can reduce state changes and validations– Usually a two-pass operation

Traverse objects, building a list of rendering primitives by state and type

Render by processing lists

– Popular method employed by many scene-graph packages

Performance OpenGL 51

““Scene Graph” Scene Graph” LessonsLessons““Scene Graph” Scene Graph” LessonsLessons

Can you use pre-packaged software?Can you use pre-packaged software?•Save yourself some trouble

Lessons learned from scene graphsLessons learned from scene graphsInventor OpenRM Performer

DirectModel Cosmo3D OpenSG

DirectSceneGraph/Fahrenheit

Can you use pre-packaged software?Can you use pre-packaged software?•Save yourself some trouble

Lessons learned from scene graphsLessons learned from scene graphsInventor OpenRM Performer

DirectModel Cosmo3D OpenSG

DirectSceneGraph/Fahrenheit

Performance OpenGL 52

Scene Graph LessonsScene Graph LessonsScene Graph LessonsScene Graph Lessons

Organization is the Organization is the keykey•Organize scene data for performance

•E.g. by transformation & bounding hierarchy

All about balance (like all high-perf All about balance (like all high-perf coding)coding)•Speed versus convenience

•Portability versus speed

Organization is the Organization is the keykey•Organize scene data for performance

•E.g. by transformation & bounding hierarchy

All about balance (like all high-perf All about balance (like all high-perf coding)coding)•Speed versus convenience

•Portability versus speed

Performance OpenGL 53

Performance GoalsPerformance GoalsPerformance GoalsPerformance Goals

Identify performance techniques and Identify performance techniques and goalsgoals•Sort to eliminate costly state changes

•Evaluate state lazily– Don’t set state if won’t draw geometry

•Eliminate redundance– Don’t set “red” if material is already “red”

Identify performance techniques and Identify performance techniques and goalsgoals•Sort to eliminate costly state changes

•Evaluate state lazily– Don’t set state if won’t draw geometry

•Eliminate redundance– Don’t set “red” if material is already “red”

Performance OpenGL 54

Feature GoalsFeature GoalsFeature GoalsFeature Goals

Identify algorithms and requirementsIdentify algorithms and requirements•E.g. Sort alpha/transparent shapes back-to-

front

Identify multiple rendering passesIdentify multiple rendering passes•Shadow texture

•Environment Map

Will you need threading?Will you need threading?

Identify algorithms and requirementsIdentify algorithms and requirements•E.g. Sort alpha/transparent shapes back-to-

front

Identify multiple rendering passesIdentify multiple rendering passes•Shadow texture

•Environment Map

Will you need threading?Will you need threading?

Performance OpenGL 55

Scene Graph LessonsScene Graph LessonsScene Graph LessonsScene Graph Lessons

Put units of work in nodes/leavesPut units of work in nodes/leaves•Speed of traversal vs. convenience

– Instancing

•e.g. Vertex Arrays, Material, Primitives

Do work where you have the Do work where you have the information neededinformation needed•e.g. don't require leaf info before traversal

Put units of work in nodes/leavesPut units of work in nodes/leaves•Speed of traversal vs. convenience

– Instancing

•e.g. Vertex Arrays, Material, Primitives

Do work where you have the Do work where you have the information neededinformation needed•e.g. don't require leaf info before traversal

Performance OpenGL 56

Scene Graph LessonsScene Graph LessonsScene Graph LessonsScene Graph Lessons

Data (“Nodes”) versus operation Data (“Nodes”) versus operation (“Action”)(“Action”)VertexSet::Draw() might be too inflexible

Renderer::draw(Node*)allows more flexibility

•Write new Renderer without changing nodes

Data (“Nodes”) versus operation Data (“Nodes”) versus operation (“Action”)(“Action”)VertexSet::Draw() might be too inflexible

Renderer::draw(Node*)allows more flexibility

•Write new Renderer without changing nodes

Performance OpenGL 57

Scene Graph LessonsScene Graph LessonsScene Graph LessonsScene Graph Lessons

Data (“Nodes”) versus operation Data (“Nodes”) versus operation (“Action”)(“Action”)•Might take a little thought - but beneficial

•Gives you flexibility for future

•Need to keep careful eye on performance

Data (“Nodes”) versus operation Data (“Nodes”) versus operation (“Action”)(“Action”)•Might take a little thought - but beneficial

•Gives you flexibility for future

•Need to keep careful eye on performance

Performance OpenGL 58

Example Data Example Data StructureStructureExample Data Example Data StructureStructure

Directed Graph / TreeDirected Graph / Tree• Internal and leaf Nodes

•Subclassing

Render TraversalRender Traversal•Use bounding volumes

•OpenGL state management

Directed Graph / TreeDirected Graph / Tree• Internal and leaf Nodes

•Subclassing

Render TraversalRender Traversal•Use bounding volumes

•OpenGL state management

Performance OpenGL 59

ExampleExampleExampleExample GroupGroup

GroupGroup

ShapeShape

VertexSet

PixelState

VertexState

Performance OpenGL 60

Internal NodesInternal NodesInternal NodesInternal Nodes

Represent spatial organizationRepresent spatial organization•Bounding hierarchy, spatial grid, etc…

Might encode some functionalityMight encode some functionality•Animation

•Level of Detail

Represent spatial organizationRepresent spatial organization•Bounding hierarchy, spatial grid, etc…

Might encode some functionalityMight encode some functionality•Animation

•Level of Detail

Performance OpenGL 61

Internal NodesInternal NodesInternal NodesInternal Nodes

Opportunities from bounding Opportunities from bounding volumes volumes •View frustum cull (early if using a hierarchy)

•Screen space size estimation– e.g. for level-of-detail or subdivision

•Ray/picking optimizations

Opportunities from bounding Opportunities from bounding volumes volumes •View frustum cull (early if using a hierarchy)

•Screen space size estimation– e.g. for level-of-detail or subdivision

•Ray/picking optimizations

Performance OpenGL 62

Leaf NodesLeaf NodesLeaf NodesLeaf Nodes

Make leaf data approximate OpenGL Make leaf data approximate OpenGL state state •Rapid application from leaf to OpenGL

But encode some abstractionBut encode some abstraction• Inventor’s SoComplexity

•EnvironmentMap instead of assuming GL_SPHERE_MAP

•Allow optimizations on specific platforms

Make leaf data approximate OpenGL Make leaf data approximate OpenGL state state •Rapid application from leaf to OpenGL

But encode some abstractionBut encode some abstraction• Inventor’s SoComplexity

•EnvironmentMap instead of assuming GL_SPHERE_MAP

•Allow optimizations on specific platforms

Performance OpenGL 63

Example: Example: VertexSetVertexSetExample: Example: VertexSetVertexSet

class VertexSet {class VertexSet {

GLenum fmtGLenum fmt

float *verts;float *verts;

int vertCount;int vertCount;

};};

class VertexSet {class VertexSet {

GLenum fmtGLenum fmt

float *verts;float *verts;

int vertCount;int vertCount;

};};

Performance OpenGL 64

Example: Example: VertexStateVertexStateExample: Example: VertexStateVertexState

class VertexState {class VertexState {

MaterialMaterial *Mtl; *Mtl;

LightingLighting *Lighting; *Lighting;

ClipPlaneClipPlane *Planes; *Planes;

......

};};

class VertexState {class VertexState {

MaterialMaterial *Mtl; *Mtl;

LightingLighting *Lighting; *Lighting;

ClipPlaneClipPlane *Planes; *Planes;

......

};};

Performance OpenGL 65

Example: Example: NodeNodeExample: Example: NodeNode

class Node {class Node {

VolumeVolume Bounds; Bounds;

......

};};

class Node {class Node {

VolumeVolume Bounds; Bounds;

......

};};

Performance OpenGL 66

Example: Example: ShapeShapeExample: Example: ShapeShape

class Shape : Node {class Shape : Node {

VertexSetVertexSet *Verts; *Verts;

VVertexStateertexState *VState; *VState;

PPixelStateixelState *PState; *PState;

int PrimLengths[];int PrimLengths[];

Glenum PrimTypes[];Glenum PrimTypes[];

int *PrimIndices;int *PrimIndices;

......

};};

class Shape : Node {class Shape : Node {

VertexSetVertexSet *Verts; *Verts;

VVertexStateertexState *VState; *VState;

PPixelStateixelState *PState; *PState;

int PrimLengths[];int PrimLengths[];

Glenum PrimTypes[];Glenum PrimTypes[];

int *PrimIndices;int *PrimIndices;

......

};};

Performance OpenGL 67

Example: Example: GroupGroupExample: Example: GroupGroup

class Group : Node {class Group : Node {

int ChildCount;int ChildCount;

NodeNode Children[]; Children[];

......

};};

class Group : Node {class Group : Node {

int ChildCount;int ChildCount;

NodeNode Children[]; Children[];

......

};};

Performance OpenGL 68

OpenGL Context OpenGL Context EncapsulationEncapsulationOpenGL Context OpenGL Context EncapsulationEncapsulation

Bundles of OpenGL stateBundles of OpenGL state• Items commonly changed together

texture fragment ops

lighting material

vertex arrays shaders

Bundles of OpenGL stateBundles of OpenGL state• Items commonly changed together

texture fragment ops

lighting material

vertex arrays shaders

Performance OpenGL 69

Example: Example: OpenGLRendererOpenGLRendererExample: Example: OpenGLRendererOpenGLRenderer

This is where you evaluate lazily and skip This is where you evaluate lazily and skip redundant state changesredundant state changes

class OpenGLRenderer : Traversal {class OpenGLRenderer : Traversal { virtual void Traverse(Node *root);virtual void Traverse(Node *root);private:private: void Schedule (float mtx[16],void Schedule (float mtx[16],

ShapeShape * *shape);shape); void Finish(void); void Finish(void); ......};};

This is where you evaluate lazily and skip This is where you evaluate lazily and skip redundant state changesredundant state changes

class OpenGLRenderer : Traversal {class OpenGLRenderer : Traversal { virtual void Traverse(Node *root);virtual void Traverse(Node *root);private:private: void Schedule (float mtx[16],void Schedule (float mtx[16],

ShapeShape * *shape);shape); void Finish(void); void Finish(void); ......};};

Performance OpenGL 70

Example: Example: OpenGLRendererOpenGLRendererExample: Example: OpenGLRendererOpenGLRenderer

OpenGLRenderer::Traverse(root)OpenGLRenderer::Traverse(root)Recursive traversal - call from appRecursive traversal - call from appCheck bounding volumeCheck bounding volumeAccumulate transformation matrixAccumulate transformation matrix

private OpenGLRenderer::Schedule(mtx, shape)private OpenGLRenderer::Schedule(mtx, shape)Schedule a shape for renderingSchedule a shape for rendering

private OpenGLRenderer::Finish()private OpenGLRenderer::Finish()State sort by bundleState sort by bundleSort transparent objs back to frontSort transparent objs back to frontDraw all the objectsDraw all the objects

OpenGLRenderer::Traverse(root)OpenGLRenderer::Traverse(root)Recursive traversal - call from appRecursive traversal - call from appCheck bounding volumeCheck bounding volumeAccumulate transformation matrixAccumulate transformation matrix

private OpenGLRenderer::Schedule(mtx, shape)private OpenGLRenderer::Schedule(mtx, shape)Schedule a shape for renderingSchedule a shape for rendering

private OpenGLRenderer::Finish()private OpenGLRenderer::Finish()State sort by bundleState sort by bundleSort transparent objs back to frontSort transparent objs back to frontDraw all the objectsDraw all the objects

Performance OpenGL 71

Case Study: Rendering Case Study: Rendering A CubeA CubeCase Study: Rendering Case Study: Rendering A CubeA Cube

More than one way to render a cubeMore than one way to render a cube•Render 100000 cubes

More than one way to render a cubeMore than one way to render a cube•Render 100000 cubes

Render sixRender sixseparate quadsseparate quads

Render twoRender twoquads, and onequads, and one

quad-stripquad-strip

Performance OpenGL 72

Case Study: Method 1Case Study: Method 1Case Study: Method 1Case Study: Method 1

Once for each cube …

glColor3fv( color );

for ( i = 0; i < NUM_CUBE_FACES; ++i ) {

glBegin( GL_QUADS );

glVertex3fv( cube[cubeFace[i][0]] );

glVertex3fv( cube[cubeFace[i][1]] );

glVertex3fv( cube[cubeFace[i][2]] );

glVertex3fv( cube[cubeFace[i][3]] );

glEnd();

}

Once for each cube …

glColor3fv( color );

for ( i = 0; i < NUM_CUBE_FACES; ++i ) {

glBegin( GL_QUADS );

glVertex3fv( cube[cubeFace[i][0]] );

glVertex3fv( cube[cubeFace[i][1]] );

glVertex3fv( cube[cubeFace[i][2]] );

glVertex3fv( cube[cubeFace[i][3]] );

glEnd();

}

Performance OpenGL 73

Case Study: Method 2Case Study: Method 2Case Study: Method 2Case Study: Method 2

Once for each cube …

glColor3fv( color ); glBegin( GL_QUADS ); for ( i = 0; i < NUM_CUBE_FACES; ++i ) { glVertex3fv( cube[cubeFace[i][0]] ); glVertex3fv( cube[cubeFace[i][1]] ); glVertex3fv( cube[cubeFace[i][2]] ); glVertex3fv( cube[cubeFace[i][3]] ); } glEnd();

Once for each cube …

glColor3fv( color ); glBegin( GL_QUADS ); for ( i = 0; i < NUM_CUBE_FACES; ++i ) { glVertex3fv( cube[cubeFace[i][0]] ); glVertex3fv( cube[cubeFace[i][1]] ); glVertex3fv( cube[cubeFace[i][2]] ); glVertex3fv( cube[cubeFace[i][3]] ); } glEnd();

Performance OpenGL 74

Case Study: Method 3Case Study: Method 3Case Study: Method 3Case Study: Method 3

glBegin( GL_QUADS ); for ( i = 0; i < numCubes; ++i ) { for ( i = 0; i < NUM_CUBE_FACES; ++i ) { glVertex3fv( cube[cubeFace[i][0]] ); glVertex3fv( cube[cubeFace[i][1]] ); glVertex3fv( cube[cubeFace[i][2]] ); glVertex3fv( cube[cubeFace[i][3]] ); } } glEnd();

glBegin( GL_QUADS ); for ( i = 0; i < numCubes; ++i ) { for ( i = 0; i < NUM_CUBE_FACES; ++i ) { glVertex3fv( cube[cubeFace[i][0]] ); glVertex3fv( cube[cubeFace[i][1]] ); glVertex3fv( cube[cubeFace[i][2]] ); glVertex3fv( cube[cubeFace[i][3]] ); } } glEnd();

Performance OpenGL 75

Case Study: Method 4Case Study: Method 4Case Study: Method 4Case Study: Method 4

Once for each cube …

glColor3fv( color );

glBegin( GL_QUADS );

glVertex3fv( cube[cubeFace[0][0]] );

glVertex3fv( cube[cubeFace[0][1]] );

glVertex3fv( cube[cubeFace[0][2]] );

glVertex3fv( cube[cubeFace[0][3]] );

glVertex3fv( cube[cubeFace[1][0]] );

glVertex3fv( cube[cubeFace[1][1]] );

glVertex3fv( cube[cubeFace[1][2]] );

glVertex3fv( cube[cubeFace[1][3]] );

glEnd();

Once for each cube …

glColor3fv( color );

glBegin( GL_QUADS );

glVertex3fv( cube[cubeFace[0][0]] );

glVertex3fv( cube[cubeFace[0][1]] );

glVertex3fv( cube[cubeFace[0][2]] );

glVertex3fv( cube[cubeFace[0][3]] );

glVertex3fv( cube[cubeFace[1][0]] );

glVertex3fv( cube[cubeFace[1][1]] );

glVertex3fv( cube[cubeFace[1][2]] );

glVertex3fv( cube[cubeFace[1][3]] );

glEnd();

glBegin( GL_QUAD_STRIP );

for ( i = 2; i < NUM_CUBE_FACES; ++i ) {

glVertex3fv( cube[cubeFace[i][0]] );

glVertex3fv( cube[cubeFace[i][1]] );

}

glVertex3fv( cube[cubeFace[2][0]] );

glVertex3fv( cube[cubeFace[2][1]] );

glEnd();

glBegin( GL_QUAD_STRIP );

for ( i = 2; i < NUM_CUBE_FACES; ++i ) {

glVertex3fv( cube[cubeFace[i][0]] );

glVertex3fv( cube[cubeFace[i][1]] );

}

glVertex3fv( cube[cubeFace[2][0]] );

glVertex3fv( cube[cubeFace[2][1]] );

glEnd();

Performance OpenGL 76

Case Study: Method 5Case Study: Method 5Case Study: Method 5Case Study: Method 5

glBegin( GL_QUADS );

for ( i = 0; i < numCubes; ++i ) {

Cube& cube = cubes[i];

glColor3fv( color[i] );

glVertex3fv( cube[cubeFace[0][0]] );

glVertex3fv( cube[cubeFace[0][1]] );

glVertex3fv( cube[cubeFace[0][2]] );

glVertex3fv( cube[cubeFace[0][3]] );

glVertex3fv( cube[cubeFace[1][0]] );

glVertex3fv( cube[cubeFace[1][1]] );

glVertex3fv( cube[cubeFace[1][2]] );

glVertex3fv( cube[cubeFace[1][3]] );

}

glEnd();

glBegin( GL_QUADS );

for ( i = 0; i < numCubes; ++i ) {

Cube& cube = cubes[i];

glColor3fv( color[i] );

glVertex3fv( cube[cubeFace[0][0]] );

glVertex3fv( cube[cubeFace[0][1]] );

glVertex3fv( cube[cubeFace[0][2]] );

glVertex3fv( cube[cubeFace[0][3]] );

glVertex3fv( cube[cubeFace[1][0]] );

glVertex3fv( cube[cubeFace[1][1]] );

glVertex3fv( cube[cubeFace[1][2]] );

glVertex3fv( cube[cubeFace[1][3]] );

}

glEnd();

for ( i = 0; i < numCubes; ++i ) { Cube& cube = cubes[i];

glColor3fv( color[i] );

glBegin( GL_QUAD_STRIP );

for ( i = 2; i < NUM_CUBE_FACES; ++i ) {

glVertex3fv( cube[cubeFace[i][0]] );

glVertex3fv( cube[cubeFace[i][1]] );

}

glVertex3fv( cube[cubeFace[2][0]] );

glVertex3fv( cube[cubeFace[2][1]] );

glEnd();

}

for ( i = 0; i < numCubes; ++i ) { Cube& cube = cubes[i];

glColor3fv( color[i] );

glBegin( GL_QUAD_STRIP );

for ( i = 2; i < NUM_CUBE_FACES; ++i ) {

glVertex3fv( cube[cubeFace[i][0]] );

glVertex3fv( cube[cubeFace[i][1]] );

}

glVertex3fv( cube[cubeFace[2][0]] );

glVertex3fv( cube[cubeFace[2][1]] );

glEnd();

}

Performance OpenGL 77

Case Study: ResultsCase Study: ResultsCase Study: ResultsCase Study: Results

0.00.20.40.60.81.01.21.41.61.82.0

Machine 1 Machine 2 Machine 3

Method 1

Method 2

Method 3

Method 4

Method 5

Performance OpenGL 78

Rendering GeometryRendering GeometryRendering GeometryRendering Geometry

OpenGL has four ways to specify OpenGL has four ways to specify vertex-based geometryvertex-based geometry• Immediate mode

•Display lists

•Vertex arrays

• Interleaved vertex arrays

OpenGL has four ways to specify OpenGL has four ways to specify vertex-based geometryvertex-based geometry• Immediate mode

•Display lists

•Vertex arrays

• Interleaved vertex arrays

Performance OpenGL 79

Rendering Geometry Rendering Geometry (cont.)(cont.)Rendering Geometry Rendering Geometry (cont.)(cont.)

Not all ways are created equalNot all ways are created equal

0

0.5

1

1.5

2

2.5

3

3.5

Machine 1 Machine 2 Machine 3

Immediate

Display List

Array Element

Draw Array

Draw Elements

Interleaved Array Element

Interleaved Draw Array

Interleaved Draw Elements

Performance OpenGL 80

Rendering Geometry Rendering Geometry (cont.)(cont.)Rendering Geometry Rendering Geometry (cont.)(cont.)

Add lighting and color material to Add lighting and color material to the mixthe mix

Add lighting and color material to Add lighting and color material to the mixthe mix

0

1

2

3

4

5

6

Machine 1 Machine 2 Machine 3

Immediate

Display List

Array Element

Draw Array

Draw Elements

Interleaved Array Element

Interleaved Draw Array

Interleaved Draw Elements

Performance OpenGL 81

Case Study: Case Study: Application DescriptionApplication DescriptionCase Study: Case Study: Application DescriptionApplication Description

•1.02M Triangles

•507K Vertices

•Vertex Arrays– Colors

– Normals

– Coordinates

•Color Material

•1.02M Triangles

•507K Vertices

•Vertex Arrays– Colors

– Normals

– Coordinates

•Color Material

Performance OpenGL 82

Case Study: What’s the Case Study: What’s the Problem?Problem?Case Study: What’s the Case Study: What’s the Problem?Problem?

Low frame rateLow frame rate•On a machine capable of 13M

polygons/second application was getting less than 1 frame/second

•Application wasn’t fill limited

Low frame rateLow frame rate•On a machine capable of 13M

polygons/second application was getting less than 1 frame/second

•Application wasn’t fill limited

secondframes

frametrianglessecond

polygons

12 M1.02

M1.13

Performance OpenGL 83

Case Study: The Case Study: The Rendering LoopRendering LoopCase Study: The Case Study: The Rendering LoopRendering Loop

•Vertex Arrays

•glDrawElements() – index based rendering

•Color MaterialglColorMaterial( GL_FRONT, GL_AMBIENT_AND_DIFFUSE );

•Vertex Arrays

•glDrawElements() – index based rendering

•Color MaterialglColorMaterial( GL_FRONT, GL_AMBIENT_AND_DIFFUSE );

glVertexPointer( GL_VERTEX_POINTER );

glNormalPointer( GL_NORMAL_POINTER );

glColorPointer( GL_COLOR_POINTER );

Performance OpenGL 84

Case Study: What To Case Study: What To NoticeNoticeCase Study: What To Case Study: What To NoticeNotice

•Color Material changes two lighting material components per glColor*() call

•Not that many colors used in the model– 18 unique colors, to be exact

– (3 * 1020472 – 18) = 3061398 redundant color calls per frame

•Color Material changes two lighting material components per glColor*() call

•Not that many colors used in the model– 18 unique colors, to be exact

– (3 * 1020472 – 18) = 3061398 redundant color calls per frame

Performance OpenGL 85

Case Study: Case Study: ConclusionsConclusionsCase Study: Case Study: ConclusionsConclusions

A little state sorting goes a long wayA little state sorting goes a long way•Sort triangles based on color

•Rewriting the rendering loop slightly

•Frame rate increased to six frames/second– 500% performance increase

A little state sorting goes a long wayA little state sorting goes a long way•Sort triangles based on color

•Rewriting the rendering loop slightly

•Frame rate increased to six frames/second– 500% performance increase

for ( i = 0; i < numColors; ++i ) { glColor3fv( color[i] ); glDrawElements( …, trisForColor[i] );

}

Performance OpenGL 86

SummarySummarySummarySummary

Know the answer before you startKnow the answer before you start•Understand rendering requirements of your

applications– Have a performance goal

•Utilize applicable benchmarks– Estimate what the hardware’s capable of

•Organize rendering to minimize OpenGL validations and other work

Know the answer before you startKnow the answer before you start•Understand rendering requirements of your

applications– Have a performance goal

•Utilize applicable benchmarks– Estimate what the hardware’s capable of

•Organize rendering to minimize OpenGL validations and other work

Performance OpenGL 87

Summary (cont.)Summary (cont.)Summary (cont.)Summary (cont.)

Pre-process dataPre-process data•Convert images and textures into formats

which don’t require pixel conversions

•Pre-size textures– Simultaneously fit into texture memory

– Mipmaps

•Determine what’s the best format for sending data to the pipe

Pre-process dataPre-process data•Convert images and textures into formats

which don’t require pixel conversions

•Pre-size textures– Simultaneously fit into texture memory

– Mipmaps

•Determine what’s the best format for sending data to the pipe

Performance OpenGL 88

Questions & AnswersQuestions & AnswersQuestions & AnswersQuestions & Answers

Thanks for comingThanks for coming•Updates to notes and slides will be available

at

http://www.plunk.org/Performance.OpenGL

•Feel free to email if you have questions

Thanks for comingThanks for coming•Updates to notes and slides will be available

at

http://www.plunk.org/Performance.OpenGL

•Feel free to email if you have questionsDave ShreinerDave Shreiner

[email protected] Brad GranthamBrad [email protected]

Performance OpenGL 89

ReferencesReferencesReferencesReferences

•OpenGL Programming Guide, 3rd EditionWoo, Mason et. al., Addison Wesley

•OpenGL Reference Manual, 3rd EditionOpenGL Architecture Review Board, Addison Wesley

•OpenGL Specification, Version 1.4OpenGL Architecture Review Board

•OpenGL Programming Guide, 3rd EditionWoo, Mason et. al., Addison Wesley

•OpenGL Reference Manual, 3rd EditionOpenGL Architecture Review Board, Addison Wesley

•OpenGL Specification, Version 1.4OpenGL Architecture Review Board

Performance OpenGL 90

For More InformationFor More InformationFor More InformationFor More Information

•SIGGRAPH 2002 Course # 3 - Developing Efficient Graphics Software

•SIGGRAPH 2000 Course # 32 - Advanced Graphics Programming Techniques Using OpenGL

•SIGGRAPH 2002 Course # 3 - Developing Efficient Graphics Software

•SIGGRAPH 2000 Course # 32 - Advanced Graphics Programming Techniques Using OpenGL

Performance OpenGL 91

AcknowledgementsAcknowledgementsAcknowledgementsAcknowledgements

A Big Thank You to …A Big Thank You to …•Peter Shaheen for a number of the

benchmark programs

•David Shirley for Case Study application

A Big Thank You to …A Big Thank You to …•Peter Shaheen for a number of the

benchmark programs

•David Shirley for Case Study application