1
KIPA Game Engine Seminars
Jonathan Blow
Seoul, Korea
December 12, 2002
Day 15
2
Bit Tricks
• Generating Bit Masks
• Is some number a power of two?
• Avoiding ‘if’ statements (branch prediction)
• Floating-point absolute value
• Floating-point compare
• Floating-point log2
3
Generating Bit Masks
• Suppose we want to mask the low n bits of a machine word
• We can generate that with a loop
• Show summation equation for the loop
• Identity that lets us do something faster
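The loop on this slide adds up one bit per iteration, so the summation is presumably 2^0 + 2^1 + ... + 2^(n-1), and the identity is that this sum equals 2^n - 1. A minimal sketch of both forms follows; the function names are illustrative and not from the seminar.

```cpp
#include <cassert>
#include <cstdint>

// Loop version: OR in one bit at a time, i.e. the sum of 2^i for i in [0, n).
inline uint32_t low_mask_loop(unsigned n) {
    assert(n <= 32);
    uint32_t mask = 0;
    for (unsigned i = 0; i < n; ++i) {
        mask |= (uint32_t{1} << i);
    }
    return mask;
}

// Closed form: 2^n - 1. The n == 32 case is handled separately because
// shifting a 32-bit value by 32 is undefined behavior in C++.
inline uint32_t low_mask(unsigned n) {
    assert(n <= 32);
    return n >= 32 ? 0xFFFFFFFFu : ((uint32_t{1} << n) - 1u);
}
```

For example, low_mask(5) yields binary 11111.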
4
Is some number a power of two?
• The power-of-two will be a single bit somewhere in the middle of the word
• The power-of-two minus one will be a bit mask like the ones we just looked at
• ANDing them together will produce 0
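The test described above can be sketched as follows. The extra check for zero is an addition here: zero also satisfies x & (x - 1) == 0 but is not a power of two.

```cpp
#include <cassert>
#include <cstdint>

// A power of two has exactly one bit set. Subtracting 1 turns that bit off
// and turns on every bit below it, so the two values share no set bits.
inline bool is_power_of_two(uint32_t x) {
    return x != 0 && (x & (x - 1u)) == 0;
}
```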
5
Counting the number of set bits in a machine word
• Slow loop version
• “Trick” O(num set bits) version
• Discussion of tree version
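The slide does not reproduce the code, so the following is a sketch of what the three versions commonly look like, assuming a 32-bit word. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Slow loop version: test each of the 32 bit positions in turn.
inline unsigned popcount_loop(uint32_t x) {
    unsigned count = 0;
    for (int i = 0; i < 32; ++i) {
        count += (x >> i) & 1u;
    }
    return count;
}

// "Trick" version: x & (x - 1) clears the lowest set bit, so the loop
// runs once per set bit rather than once per bit position.
inline unsigned popcount_sparse(uint32_t x) {
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1u;
        ++count;
    }
    return count;
}

// Tree version: add adjacent 1-bit fields into 2-bit sums, then 2-bit
// fields into 4-bit sums, and so on up to the full word.
inline unsigned popcount_tree(uint32_t x) {
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x & 0x0F0F0F0Fu) + ((x >> 4) & 0x0F0F0F0Fu);
    x = (x & 0x00FF00FFu) + ((x >> 8) & 0x00FF00FFu);
    x = (x & 0x0000FFFFu) + (x >> 16);
    return x;
}
```

The tree version takes a fixed number of steps regardless of the input, while the sparse version is fastest when few bits are set.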
6
Pentium 4 “fireball”
• A 16-bit integer unit at the core of the chip that runs at very high clock speeds
• 32-bit integer operations are pipelined through the fireball as multi-stage 16-bit operations
• Pipeline is organized for bits to flow from bottom to top of the word (as with addition and subtraction)
• Right-shifts require a dependency that goes in the opposite direction (slower!)
7
“How many bits does it take to store this range of values?”
• Application: network or file i/o
• Want ceil(log2(n_max + 1)) bits, assuming the values go from 0 to n_max inclusive
• Slow floating-point versions
• Fast bit-extraction versions
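A sketch of the two approaches, assuming the values run from 0 to n_max inclusive, so that n_max + 1 distinct values must be representable. The shift loop is only one possible bit-extraction version; the seminar's own code is not shown on the slide.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Slow floating-point version: ceil(log2(n_max + 1)).
inline unsigned bits_needed_float(uint32_t n_max) {
    return static_cast<unsigned>(
        std::ceil(std::log2(static_cast<double>(n_max) + 1.0)));
}

// Bit-extraction version: the answer is the position of the highest set
// bit plus one, found here by shifting until the value is exhausted.
inline unsigned bits_needed(uint32_t n_max) {
    unsigned bits = 0;
    while (n_max != 0) {
        ++bits;
        n_max >>= 1;
    }
    return bits;
}
```

For a range of 0 to 255 this gives 8 bits, and for 0 to 256 it gives 9.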
8
Floating-Point log2
• Show slow version
• Fast version utilizing the IEEE-754 format
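One way to use the IEEE-754 layout is to read the biased exponent field directly, which gives floor(log2(x)) for positive, normalized single-precision values. This is a sketch under that assumption; it does not handle zero, denormals, infinities, or NaN, and the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

// Slow version: call the math library and round down.
inline int floor_log2_slow(float x) {
    return static_cast<int>(std::floor(std::log2(x)));
}

// Fast version: bits 23..30 of a single-precision float hold the exponent
// with a bias of 127. Valid only for positive, normalized inputs.
inline int floor_log2_fast(float x) {
    assert(x > 0.0f);
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bits
    return static_cast<int>((bits >> 23) & 0xFFu) - 127;
}
```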
9
Fast absolute value
• Utilizing IEEE-754 floating point format
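In the IEEE-754 format the sign is a single bit at the top of the word, so the absolute value can be taken by clearing that bit. A minimal sketch, assuming 32-bit single-precision floats:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

// Clear bit 31 (the sign bit) and leave the exponent and mantissa alone.
inline float fast_abs(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```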
10
Fast floating-point compare
• Description of how x86 machines compare floating point numbers
– Get at least one of them on the stack
– Perform ‘fcomp’ instruction
– Load the floating-point status word
– Bit-mask it to see if the desired field is set
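The slide only describes the slow x87 sequence and does not say which fast alternative was shown. One commonly used approach, sketched here as an assumption, is to compare the IEEE-754 bit patterns as integers after remapping them so that integer order matches float order. The function names are illustrative. NaNs are not handled, and -0.0 sorts below +0.0.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

// Map a float to an unsigned key whose integer order matches float order.
// Negative values have all bits flipped (reversing their order); positive
// values just get the top bit set so they sort above all negatives.
inline uint32_t ordered_key(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    uint32_t sign_mask = 0u - (bits >> 31);  // all ones if negative, else 0
    return bits ^ (sign_mask | 0x80000000u);
}

// Integer-only comparison; no floating-point compare instruction needed.
inline bool less_than(float a, float b) {
    return ordered_key(a) < ordered_key(b);
}
```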
11
Decision-making without branching
• (And without writing in assembly language, to use instructions like CMOV)
• Build a mask based on whether some intermediate result is negative or not
• Use that to mask values and add them, or whatever you want
– Examples
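The examples themselves are not on the slide, so the following is a sketch of the general technique with illustrative function names: build an all-ones or all-zeros mask from the sign bit of an intermediate result, then use it to select or zero out values.

```cpp
#include <cassert>
#include <cstdint>

// 0xFFFFFFFF if x is negative, 0 otherwise. Done in unsigned arithmetic so
// it does not depend on how the compiler shifts signed values.
inline uint32_t negative_mask(int32_t x) {
    return 0u - (static_cast<uint32_t>(x) >> 31);
}

// Returns a if test is negative, otherwise b, without a branch.
inline int32_t select_if_negative(int32_t test, int32_t a, int32_t b) {
    uint32_t m = negative_mask(test);
    return static_cast<int32_t>((static_cast<uint32_t>(a) & m) |
                                (static_cast<uint32_t>(b) & ~m));
}

// max(x, 0): masks x to zero when it is negative.
inline int32_t clamp_to_zero(int32_t x) {
    return static_cast<int32_t>(static_cast<uint32_t>(x) & ~negative_mask(x));
}

// min(a, b), assuming a - b does not overflow: add the difference to b
// only when the difference is negative.
inline int32_t branchless_min(int32_t a, int32_t b) {
    int32_t diff = a - b;
    return b + static_cast<int32_t>(static_cast<uint32_t>(diff) &
                                    negative_mask(diff));
}
```

Whether this beats a well-predicted branch depends on the data and the CPU, so it is worth measuring.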
12
Collision Detection
• Speedbox and Schnitzel as alternatives to the “prevent tunneling” raycast
13
Collision Detection
• Don’t forget to optimize mainly for the expected case!
– To miss a lot, or to hit a lot?
• Example of Shock Force and the “early hit test”
– We expect to miss usually!
– So the early hit test was not so effective
14
Collision detection
• More Shock Force examples
– Hierarchy of tests: bounding sphere, OBB, simple plane divide, BSP “hard case”
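A hierarchy of tests can be sketched as a chain of early-outs, cheapest first, so that the expected miss costs as little as possible. This is not the Shock Force code: the types and names are hypothetical, and an axis-aligned box stands in for the OBB, plane, and BSP stages to keep the example short.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical collision body: a bounding sphere plus a box.
struct Body {
    Vec3 center;
    float radius;       // bounding sphere radius
    Vec3 half_extent;   // axis-aligned box half sizes (stand-in for an OBB)
};

// Stage 1: cheapest test, rejects most pairs when misses are expected.
inline bool spheres_overlap(const Body& a, const Body& b) {
    float dx = a.center.x - b.center.x;
    float dy = a.center.y - b.center.y;
    float dz = a.center.z - b.center.z;
    float r = a.radius + b.radius;
    return dx * dx + dy * dy + dz * dz <= r * r;
}

// Stage 2: tighter and more expensive than the sphere test.
inline bool boxes_overlap(const Body& a, const Body& b) {
    return std::fabs(a.center.x - b.center.x) <= a.half_extent.x + b.half_extent.x &&
           std::fabs(a.center.y - b.center.y) <= a.half_extent.y + b.half_extent.y &&
           std::fabs(a.center.z - b.center.z) <= a.half_extent.z + b.half_extent.z;
}

// Run the stages in order of cost and stop at the first rejection.
inline bool collide(const Body& a, const Body& b) {
    if (!spheres_overlap(a, b)) return false;
    return boxes_overlap(a, b);
}
```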
15
Profiling
• Motivation
– You can’t optimize unless you profile. For some reason some people think they can… they’re wrong.
• Demo of sample app
• Goals:
– Know where the overall CPU time is being spent
• May depend on which kind of behavior is happening!
– Know which routines are stable and which ones are not
16
Profiling
• Example of getting the current time on Windows
– At different accuracy levels
• Description of how this is slow, and why
– Too slow to call very often in code!
17
Profiling (2)
• Using the rdtsc instruction
• Converting this to realtime units by calling QueryPerformanceCounter once per frame
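The conversion can be sketched as a once-per-frame calibration: sample the cycle counter and a wall-clock time together, and derive cycles per second from the deltas. In this sketch the wall-clock seconds are passed in as a parameter; on Windows they would come from QueryPerformanceCounter divided by QueryPerformanceFrequency. The struct and function names are illustrative, and the non-x86 fallback is an addition so the code compiles elsewhere.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>
  inline uint64_t read_cycles() { return __rdtsc(); }
#elif defined(__x86_64__) || defined(__i386__)
  #include <x86intrin.h>
  inline uint64_t read_cycles() { return __rdtsc(); }
#else
  #include <chrono>
  // Fallback for non-x86 targets: not a cycle counter, just a cheap tick.
  inline uint64_t read_cycles() {
      return static_cast<uint64_t>(
          std::chrono::steady_clock::now().time_since_epoch().count());
  }
#endif

// Tracks the cycles-per-second rate, updated once per frame.
struct CycleCalibration {
    uint64_t last_cycles = 0;
    double last_seconds = 0.0;
    double cycles_per_second = 0.0;
    bool primed = false;

    // Call once per frame with read_cycles() and the wall-clock time.
    void end_of_frame(uint64_t cycles_now, double seconds_now) {
        if (primed && seconds_now > last_seconds) {
            cycles_per_second =
                static_cast<double>(cycles_now - last_cycles) /
                (seconds_now - last_seconds);
        }
        last_cycles = cycles_now;
        last_seconds = seconds_now;
        primed = true;
    }

    // Convert a cycle count measured by the profiler into seconds.
    double to_seconds(uint64_t cycles) const {
        return cycles_per_second > 0.0
                   ? static_cast<double>(cycles) / cycles_per_second
                   : 0.0;
    }
};
```

This keeps the expensive wall-clock call to one per frame while the per-function measurements use only the cheap counter.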
18
Profiling (3)
• Define macros that put rdtsc calls into preambles and postambles for functions
• Measure and categorize CPU time this way
• Measure “self time” and “hierarchical time”
• Code review of macros / constructors
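The macros reviewed in the seminar are not reproduced on the slide. A minimal sketch of the idea, with hypothetical names throughout, is a scope object whose constructor and destructor act as the preamble and postamble. The tick source is a replaceable function pointer here (rdtsc in the seminar, std::chrono by default in this sketch). It is single-threaded, and recursive functions would have their hierarchical time counted more than once.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Accumulated timing for one instrumented function or code region.
struct Zone {
    const char* name;
    uint64_t self_ticks = 0;   // time in this zone, excluding child zones
    uint64_t hier_ticks = 0;   // time in this zone, including child zones
    uint32_t calls = 0;
    explicit Zone(const char* n) : name(n) {}
};

inline uint64_t default_ticks() {
    return static_cast<uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
}

// Replaceable tick source; an rdtsc wrapper could be plugged in here.
uint64_t (*g_read_ticks)() = default_ticks;

class Scope {
public:
    explicit Scope(Zone& zone)
        : zone_(zone), parent_(current_), start_(g_read_ticks()) {
        current_ = this;   // preamble: become the innermost active scope
    }
    ~Scope() {             // postamble: charge the elapsed time
        uint64_t elapsed = g_read_ticks() - start_;
        zone_.calls += 1;
        zone_.hier_ticks += elapsed;
        zone_.self_ticks += elapsed - child_ticks_;
        if (parent_) parent_->child_ticks_ += elapsed;
        current_ = parent_;
    }
    Scope(const Scope&) = delete;
    Scope& operator=(const Scope&) = delete;

private:
    Zone& zone_;
    Scope* parent_;
    uint64_t start_;
    uint64_t child_ticks_ = 0;
    static Scope* current_;
};

Scope* Scope::current_ = nullptr;

// Place at the top of a function body, at most once per block.
#define PROFILE_SCOPE(zone) Scope profile_scope_guard_(zone)
```

A function would declare a static Zone and put PROFILE_SCOPE(zone) as its first statement.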
19
Problem with rdtsc
• There’s this SpeedStep thing on Intel laptops
– Change the CPU’s clock speed based on performance / temperature demands
– Does not adjust rdtsc to compensate
• May spread beyond laptops in the future
– Power consumption of CPUs is becoming an important concern for businesses
20
We can detect if rdtsc is screwing up profiling data
• But we can’t fix the profiling data
• Solution: just draw a big warning on the screen
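The slide does not say how the detection works. One plausible sketch, assuming a cycles-per-second rate is measured each frame, is to flag any frame where that rate differs from the previous frame's by more than a tolerance. The function name and the 5% default are illustrative.

```cpp
#include <cassert>
#include <cmath>

// True if the measured cycle rate moved by more than the given fraction
// between two consecutive frames, which suggests the CPU changed speed.
inline bool clock_rate_changed(double prev_cycles_per_second,
                               double cur_cycles_per_second,
                               double tolerance = 0.05) {
    if (prev_cycles_per_second <= 0.0) return false;  // no baseline yet
    double delta = std::fabs(cur_cycles_per_second - prev_cycles_per_second);
    return delta > tolerance * prev_cycles_per_second;
}
```

When this returns true the frame's data cannot be repaired, so the display would only show the warning.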
21
Division of Profiler
• Low-Level Profiler
• High-Level Profiler
22
Walkthrough of first demo app
• How it uses the macros
• How it collects and draws the profiling data
23
Measuring variance of profiling data
• To figure out how stable each function is
• Draw which functions are “hot” in the realtime display
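Per-function variance can be tracked incrementally, one sample per frame, without storing the history. The sketch below uses Welford's running update, which is a choice made here and not necessarily what the seminar's profiler does.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Running mean and variance of one function's per-frame time.
struct RunningStats {
    uint64_t count = 0;
    double mean = 0.0;
    double m2 = 0.0;   // sum of squared deviations from the running mean

    void add(double sample) {
        ++count;
        double delta = sample - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (sample - mean);
    }

    // Population variance of the samples seen so far.
    double variance() const {
        return count > 0 ? m2 / static_cast<double>(count) : 0.0;
    }

    double stddev() const { return std::sqrt(variance()); }
};
```

A function whose standard deviation is large relative to its mean is an unstable one.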
24
Behaviors
• We would like some better analysis of what the different behaviors are for our program
• Just “eyeing” the results is not very scientific
• Examples of different behaviors
– Fill rate limited, AI limited, etc.
25
Batch Profiling vs Interactive Profiling
• Batch profiling averages a bunch of data together over a session
– Maybe it provides a way to peek at individual samples but the processing is never very convenient
• Interactive profiling is about seeing results as soon as they happen
– But interactive profilers are usually hacked together
• What if we made a good one?
26
Want to detect and analyze specific behaviors
• But without preconceived ideas of what they might be
• Treat incoming frames of profiling data as vectors, and cluster them
• Description of k-means clustering
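As a sketch of batch k-means on frame vectors, assume each frame is a vector with one entry per profiled function. The algorithm alternates between assigning each frame to its nearest centroid and moving each centroid to the mean of its frames. Seeding the centroids with the first k frames is a simplification made here; the names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

inline double dist2(const Vec& a, const Vec& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

inline std::size_t nearest(const std::vector<Vec>& centroids, const Vec& p) {
    std::size_t best = 0;
    for (std::size_t c = 1; c < centroids.size(); ++c) {
        if (dist2(centroids[c], p) < dist2(centroids[best], p)) best = c;
    }
    return best;
}

// Returns, for each frame, the index of the cluster it was assigned to.
inline std::vector<std::size_t> kmeans(const std::vector<Vec>& frames,
                                       std::size_t k, int iterations) {
    assert(k > 0 && frames.size() >= k);
    const std::size_t dim = frames[0].size();
    // Assumption: seed centroids with the first k frames.
    std::vector<Vec> centroids(frames.begin(),
                               frames.begin() + static_cast<std::ptrdiff_t>(k));
    std::vector<std::size_t> label(frames.size(), 0);

    for (int it = 0; it < iterations; ++it) {
        // Assignment step: needs every frame, hence the batch requirement.
        for (std::size_t i = 0; i < frames.size(); ++i) {
            label[i] = nearest(centroids, frames[i]);
        }
        // Update step: move each centroid to the mean of its frames.
        std::vector<Vec> sum(k, Vec(dim, 0.0));
        std::vector<std::size_t> n(k, 0);
        for (std::size_t i = 0; i < frames.size(); ++i) {
            for (std::size_t d = 0; d < dim; ++d) sum[label[i]][d] += frames[i][d];
            ++n[label[i]];
        }
        for (std::size_t c = 0; c < k; ++c) {
            if (n[c] == 0) continue;  // leave an empty cluster where it is
            for (std::size_t d = 0; d < dim; ++d) {
                centroids[c][d] = sum[c][d] / static_cast<double>(n[c]);
            }
        }
    }
    return label;
}
```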
27
Clustering algorithms tend to be pretty slow
• And they require batch data to process
– k-means needs random access to the input!
• Online k-means
– Faster, non-batch. But quality?
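The online variant can be sketched as a single update per incoming frame: find the nearest centroid and nudge it toward the sample. The 1/count learning rate used here makes each centroid the running mean of the samples it has won; other rate schedules are possible, and the names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Vec = std::vector<double>;

struct OnlineKMeans {
    std::vector<Vec> centroids;
    std::vector<uint64_t> counts;  // samples won by each centroid so far

    explicit OnlineKMeans(std::vector<Vec> initial)
        : centroids(std::move(initial)), counts(centroids.size(), 0) {}

    // Process one sample and return the index of the cluster it joined.
    // No earlier samples are revisited, so no batch storage is needed.
    std::size_t observe(const Vec& x) {
        std::size_t best = 0;
        double best_d2 = dist2(centroids[0], x);
        for (std::size_t c = 1; c < centroids.size(); ++c) {
            double d2 = dist2(centroids[c], x);
            if (d2 < best_d2) { best_d2 = d2; best = c; }
        }
        ++counts[best];
        double rate = 1.0 / static_cast<double>(counts[best]);
        for (std::size_t d = 0; d < x.size(); ++d) {
            centroids[best][d] += rate * (x[d] - centroids[best][d]);
        }
        return best;
    }

    static double dist2(const Vec& a, const Vec& b) {
        double sum = 0.0;
        for (std::size_t d = 0; d < a.size(); ++d) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }
};
```

The result depends on the order in which samples arrive, which is one reason the quality question on the slide is a fair one.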
28
Self-Organizing Map
• “Kohonen Self-Organizing Map”
• Description of the algorithm
• Much like online k-means
– But with coherence in a separate space
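A sketch of one SOM training step, assuming a one-dimensional map for brevity. As in online k-means, the nearest node (the best matching unit) moves toward the sample. The difference is that its neighbors on the map grid move too, with an influence that falls off with grid distance, not with distance in data space. The Gaussian falloff and the names are choices made here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<double>;

struct Som1D {
    std::vector<Vec> nodes;  // node i sits at grid position i on the map

    explicit Som1D(std::vector<Vec> initial) : nodes(std::move(initial)) {}

    // Index of the node whose weight vector is closest to x in data space.
    std::size_t best_matching_unit(const Vec& x) const {
        std::size_t best = 0;
        double best_d2 = dist2(nodes[0], x);
        for (std::size_t i = 1; i < nodes.size(); ++i) {
            double d2 = dist2(nodes[i], x);
            if (d2 < best_d2) { best_d2 = d2; best = i; }
        }
        return best;
    }

    // One online training step. rate scales the update; radius controls
    // how far along the grid the update spreads.
    std::size_t train(const Vec& x, double rate, double radius) {
        assert(radius > 0.0);
        std::size_t bmu = best_matching_unit(x);
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            double grid_dist = static_cast<double>(i) - static_cast<double>(bmu);
            double influence =
                std::exp(-(grid_dist * grid_dist) / (2.0 * radius * radius));
            for (std::size_t d = 0; d < x.size(); ++d) {
                nodes[i][d] += rate * influence * (x[d] - nodes[i][d]);
            }
        }
        return bmu;
    }

    static double dist2(const Vec& a, const Vec& b) {
        double sum = 0.0;
        for (std::size_t d = 0; d < a.size(); ++d) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }
};
```

Because grid neighbors are pulled along together, nearby map nodes end up representing similar behaviors, which is what makes the map readable as a picture.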
29
Demo of SOM-enabled Profiling Tool
• Visualizations are still early
• Hopefully they will mature into something truly useful (people in other visualization fields like SOMs, so hopes are high)
30
Discussions of changes made to SOM to support online clustering