Software Engineering Essentials - Randy Gaul


Essentials of Software Engineering With a Game Programming Focus


Version 1.0

Table of Contents

Preface

To The Reader
Prerequisites

Memory and Hardware
Simplified CPU Architecture and Cache Awareness
Prefetching
Additional Topics of Interest

Data Structures
Additional References

Algorithms
Sorting

Sorting: Integers
Sorting: Strings
Sorting: Auxiliary Data

Searching
Recursion
BFS and DFS
Hashing
Root Finding

Non-linear Min/Maximization
No Derivative

Linear Algebra
Linear Transformations
Affine Transformations
Math Library
Dot Product
Planes and Lines

Lines
Interpolation

Bilinear Interpolation

Multi-Threading

Language Design
Lexer – String Matching
Parsing
Code Generation

Type Reflection

Software Design and Architecture
File Dependencies and Messaging
API Design

Example Layer Problem

Game Architecture
Compilation
Iteration

Memory Storage
Run-time Object Model
Specific Solutions

Preface

Learning software engineering takes time and patience, and never has to end. Becoming a solid software engineer (at least by the author’s personal standards) involves understanding one’s own limitations as a developer, followed by a continuous grind to press those professional boundaries to the next level. This implies hard work as another ingredient in the time and patience recipe.

The goal of this PDF is to provide readers with a trustworthy compilation of information essential to becoming a solid software engineer. Many topics referenced in this PDF are well presented by other authors; for brevity’s sake external works are often cited instead of directly presenting such topics with new materials.

This PDF takes a focus on game programming, which means the language of choice is C/C++. This does not mean readers interested in other applications of computer science will not benefit from the read. On the contrary, the author views problems in the game programming domain as rich, plentiful, and ideal for continuous professional challenge. Anecdotally: Elon Musk has been seen recruiting engineers from the game industry for his SpaceX endeavors due to the challenges game programmers are used to facing (circa 2014).

To The Reader

As a writer the author prefers to start each section with a high level overview. The overview serves to connect various concepts together so that readers can understand the relationships between topics. Readers interested only in direct application or technical knowledge may feel free to skip the overview of each section.

This PDF was created upon commission. Special thanks to the commissioner, who wishes to remain anonymous, for their generous support.

This PDF will contain many opinionated statements without any empirical evidence. Readers will just have to deal with it. The value of this PDF is in the author’s professional merit; read: YMMV. At the time this document was created the author had no academic degree of any kind. Readers are urged to take this document with a grain of salt and a healthy dose of skepticism.


Prerequisites

Most of the material in this document aims to provide practical and intuitive knowledge, in contrast to the external materials it references. Fundamentals are largely assumed to be already familiar to the reader, such as:

1. Structs, functions, arrays, pointers, basic C++ template usage
2. Some familiarity with the C++ standard library
3. Compiling, linking
4. Classes, member functions, inheritance, virtual dispatch
5. C stack and heap
6. Processes, threads, some familiarity with operating system concepts
7. Familiarity with trigonometry and basic differential calculus

Littered throughout the document are references to external materials and sources. If ever the theory presented here becomes too abstract or doesn’t make sense, another source may present the material in a manner more suited to the reader. Here are some recommended resources for the above topics:

1. C Programming, a Modern Approach 2nd Ed. By K. N. King
2. Any C++ book, preferably multiple of them since they are all terrible (in my opinion)
3. See 1.
4. See 2.
5. See 1.
6. Wikipedia searches, google searches. These topics are fairly simple. Perhaps try to find an online course freely viewable to the public and read through lecture notes + homeworks.
7. See 6.

Memory and Hardware

Code runs on hardware, and hardware has physical limitations. The limitations imposed by hardware give rise to software engineering creativity. There are two major forms of software engineering creativity: abstraction and algorithms.

Abstraction

Suppression of details by defining an alternative level of system interaction. In this case the system means whatever problem-solving tools the engineer has at their disposal. As an example, assembly languages are abstracted by the C programming language. The purpose of C is to generate assembly. Since assembly languages are often specific to the hardware on which they operate, an engineer may be required to learn many forms of assembly to write code for various pieces of hardware.

If a hardware vendor implements a C compiler, suddenly engineers writing C code can use the vendor’s compiler to generate hardware-specific assembly. In this way C abstracts the specific details of the assembly instructions behind a different layer of language constructs.

Algorithms

For the purposes of this PDF algorithms are the steps, code, or rules that an engineer defines in order to solve problems. For example, C compilers often use algorithms for reading source code files, turning individual keywords into tokens. Algorithms are then used to turn tokens into data structures (like trees). These structures are then converted, via algorithms, into assembly instructions. In general algorithms are about reading data, transforming it, and outputting some new data.

Many software engineers heavily focus on abstractions and as such spend little time actually writing useful code. Abstraction should be treated as a dirty word. When a new programmer first learns of abstractions an illusion is pulled over their naïve view of power. The illusion is that if a little more time is spent making code more generic, with another layer of indirection, or more reusable it becomes 10x more powerful. Abstraction is an illusion of power. The “power” that abstraction seems to provide comes from neglect of the details an abstraction hides. If these details are not taken into consideration when they are important then the abstraction has failed. In practice failure rate of abstraction is immensely high and as such all code should be viewed through the lens of skepticism.

http://xkcd.com/676/


Skepticism of code disallows any code to exist that cannot justify its own existence through necessity. When applied heavily, skepticism of code can prevent much over-engineering and many harmful abstractions. There’s an old saying that “dumb code” is smarter, meaning the simplest code that gets the job done is the smartest.

The engineer’s brain constantly attempts to solve very difficult problems, so difficult that an engineer’s focus must be zeroed in on the algorithms used to solve them. Distractions are costly, and abstractions muddy many (often important) details of the code. The more time an engineer must sit and think “what is this code really doing”, the more likely it is that abstractions are getting in the way of productivity. In short, abstraction makes difficult problems impossible.

If an algorithm respects the hardware it runs upon a curious phenomenon occurs: often times the code becomes simpler. How can this be? Respecting modern hardware often comes down to respecting the CPU cache. This translates to using contiguous arrays, which leads to straightforward for loops. If memory is stored in arrays engineers can reason about memory consumption and worst-case memory footprints. This can lead to pre-allocation of memory for later-to-run algorithms, resulting in overall simpler code (as opposed to a myriad of dynamically allocated and inter-coupled “objects”).

The reality of hardware is that hardware likes arrays. Modern CPUs will prefetch memory automatically when code runs over arrays linearly. The CPU likes small amounts of data that can fit into its tiny cache.
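As a minimal sketch of the point about contiguous arrays and pre-allocation, consider the following. The Particle type, the MAX_PARTICLES budget, and the function names are hypothetical and chosen only for illustration; the idea is that the worst-case footprint is fixed up front and the update is a single linear loop.

```cpp
#include <cstddef>

// Hypothetical example type; not from the original text.
struct Particle
{
    float x, y;
    float vx, vy;
};

// Assumed worst-case count, chosen up front so the memory footprint is known.
const std::size_t MAX_PARTICLES = 1024;

struct ParticleSystem
{
    Particle particles[ MAX_PARTICLES ]; // contiguous, pre-allocated storage
    std::size_t count;
};

// Returns false instead of allocating when the pre-allocated budget is spent.
bool AddParticle( ParticleSystem* sys, const Particle& p )
{
    if ( sys->count == MAX_PARTICLES )
        return false;
    sys->particles[ sys->count++ ] = p;
    return true;
}

// A straightforward linear pass over the array.
void Integrate( ParticleSystem* sys, float dt )
{
    for ( std::size_t i = 0; i < sys->count; ++i )
    {
        Particle& p = sys->particles[ i ];
        p.x += p.vx * dt;
        p.y += p.vy * dt;
    }
}
```

Because the storage is a plain array, the memory cost is simply sizeof( ParticleSystem ), and no allocation can fail in the middle of a frame.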

Simplified CPU Architecture and Cache Awareness

A CPU has cores. Each core can run a thread (and some cores on certain CPUs can run multiple threads simultaneously). See the Multi-Threading section for more details about threads.

Assembly instructions are fed to the CPU to be physically executed. Instructions are executed by means of registers. A CPU register holds a small piece of data, commonly 32 or 64 bits. SIMD registers are wider, storing 128 bits of data or more. Each register can be used by the CPU to execute an instruction. There is a limited number of registers within a CPU, as they are very expensive to construct. If it were possible, CPUs would contain unlimited registers for infinitely fast execution.

To store more data than fits within the registers the main memory (RAM) is used. RAM stands for random access memory, which means the RAM can be accessed more or less like a giant array. When the CPU puts data into a register, that data came from the RAM. Pulling data from memory into a register is a very slow operation, often hundreds or thousands of times slower than executing simple instructions.

Modern CPUs employ memory caches. Conceptually a CPU memory cache acts like a “closer to the CPU”, but smaller, piece of RAM. Fetching memory from the cache is often an order of magnitude or so faster than fetching from main memory.

The CPU loads memory into the cache via cache lines. A cache line is a small chunk of memory, often 64 bytes in size. The entire cache consists of cache lines. Whenever a program requires access to any data in main memory an entire cache line will be pulled into the cache. Cache lines are aligned to memory boundaries according to their size. This means that if the executing program requires a single byte of data from main memory, and this data does not already exist in the cache, the CPU will pull in the 64-byte-aligned cache line that the single byte lies within.

When cache lines are pulled into the cache, old cache lines are evicted from the cache. Which line is evicted depends on the implementation of the CPU, but it will likely be the least recently used cache line (or perhaps be decided by some heuristic). The reader is referred to Naughty Dog’s “Dogged Determination” presentation for more CPU information.
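The alignment rule for cache lines can be made concrete with a little arithmetic. The sketch below assumes a 64-byte line, which is common but not universal; the constant and function names are illustrative only.

```cpp
#include <cstdint>

// Assumed cache line size; 64 bytes is common but not universal.
const std::uintptr_t CACHE_LINE_SIZE = 64;

// Address of the first byte of the cache line containing `address`.
// Works because the line size is a power of two.
std::uintptr_t CacheLineBase( std::uintptr_t address )
{
    return address & ~( CACHE_LINE_SIZE - 1 );
}

// True when both addresses fall within the same cache line.
bool SameCacheLine( std::uintptr_t a, std::uintptr_t b )
{
    return CacheLineBase( a ) == CacheLineBase( b );
}
```

For instance, a read of the byte at address 130 pulls in the line covering addresses 128 through 191, so bytes 63 and 64 live in different lines even though they are adjacent.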

The implication here is that whenever a cache line is read it is up to the programmer to try to use as much of that cache line as possible. Even though we might have 8 gigabytes of RAM, if we aren’t running in the cache the CPU will be sitting there waiting. Even if a single byte is read from main memory an entire cache line will be fetched. Reading a single byte from a random location in main memory is about the worst possible way to use memory. Try to use every single byte of every single cache line pulled in from memory.

This tends toward the idea of using very compact and concise data structures. If a data structure is packed together in memory it can be operated upon by the CPU very quickly once it arrives in the CPU’s cache. The cache isn’t very big. Here’s a nice slide by Scott Meyers on the topic:

32KB of L1 data cache is tiny. You don’t even get to use all of it as the operating system does need to do stuff too! Here are some additional references:

CPU Caches and Why You Care – (video) – Scott Meyers: http://vimeo.com/97337258

What Every Programmer Should Know About Memory – (pdf) – Ulrich Drepper: http://people.freebsd.org/~lstewart/articles/cpumemory.pdf

Managing Data Relationships – (blog post) – Noel Llopis: http://gamesfromwithin.com/managing-data-relationships
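To illustrate the earlier advice about compact data structures, here is a small sketch showing how member order alone can change how many objects fit in a cache line. The struct names are hypothetical, the sizes in the comments are what common compilers typically produce (padding rules are implementation-defined), and a 64-byte line is assumed.

```cpp
#include <cstddef>
#include <cstdint>

// Small members interleaved with large ones force padding after each byte.
struct Loose
{
    std::uint8_t a;
    std::uint32_t b;
    std::uint8_t c;
    std::uint32_t d;
}; // typically 16 bytes

// Same members, largest first; the small ones share one padded tail.
struct Packed
{
    std::uint32_t b;
    std::uint32_t d;
    std::uint8_t a;
    std::uint8_t c;
}; // typically 12 bytes

// How many whole objects of a given size fit in one assumed 64-byte line.
std::size_t ObjectsPerCacheLine( std::size_t objectSize )
{
    return 64 / objectSize;
}
```

With the typical sizes above, one cache line holds five Packed objects but only four Loose ones, so a linear pass over an array of Packed touches fewer lines for the same amount of useful data.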


Prefetching

Prefetching exists to try to hide the latency of fetching memory and placing it into the CPU’s cache. A prefetch is when a cache line is preemptively fetched and placed into the cache, such that when the memory is actually requested a cache hit occurs.

Hardware can detect patterns in memory accesses, but it can only detect pretty simple patterns like array traversals. Hardware is made in such a way that it can detect iterating over arrays forwards, backwards, and with a constant element stride of varying size. It can also do this for all hardware threads simultaneously. However, if you’re not looping over an array you can’t count on any intelligent prefetching. It will take two or more cache misses in a recognizable pattern to start automatic prefetching.

Usually compilers provide an intrinsic (for example __builtin_prefetch in GCC and Clang, or _mm_prefetch on x86) to hint to the CPU to grab a specific cache line from somewhere in memory. This can be used by programmers to eke out a final bit of performance, given a proper implementation to prefetch for.
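A hedged sketch of a manual prefetch hint follows. It uses __builtin_prefetch, which GCC and Clang provide, and falls back to a no-op elsewhere. The Prefetch wrapper, the SumIndirect function, and the look-ahead distance of 8 are assumptions made for illustration; a real distance should be found by measurement.

```cpp
#include <cstddef>

// Thin wrapper: GCC and Clang provide __builtin_prefetch; elsewhere do nothing.
inline void Prefetch( const void* address )
{
#if defined( __GNUC__ ) || defined( __clang__ )
    __builtin_prefetch( address );
#else
    (void)address;
#endif
}

// Sums integers reached through an array of pointers. The pointer array is
// walked linearly, but the integers themselves may be scattered in memory,
// which is a pattern the hardware prefetcher cannot predict on its own.
int SumIndirect( const int* const* pointers, std::size_t count )
{
    const std::size_t LOOK_AHEAD = 8; // assumed distance; tune by profiling
    int sum = 0;
    for ( std::size_t i = 0; i < count; ++i )
    {
        if ( i + LOOK_AHEAD < count )
            Prefetch( pointers[ i + LOOK_AHEAD ] );
        sum += *pointers[ i ];
    }
    return sum;
}
```

The hint never changes the result of the program; it only asks the CPU to start loading a line early, so it is worth keeping only if profiling shows a gain.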

Additional Topics of Interest

Branch Prediction

Register Pressure

Custom allocators

False sharing

Pointer aliasing

CPU pipelining

Superscalar and SIMD architecture

http://www.futurechips.org/chip-design-for-all/prefetching.html

http://www.randygaul.net/2014/07/30/memory-management/


Data Structures

Data structures are incredibly important. Learn them all, implement them all. Ideally be able to implement each one on demand (not necessarily super efficiently, but well enough to solve an interview problem). Having a working knowledge of most data structures allows an engineer to understand problems with more intellectual vocabulary. There is a plethora of freely available resources on each of these topics. I encourage readers to go forth and google search each of these topics until their souls hunger no longer:

Array
Linked List (link 1, link 2)
Trees
BST
Trie
Stack
Queue
Heap
Hash Table
Graph

Additional References

“C Programming – A Modern Approach” 2nd Ed. By K. N. King

Wikipedia

Bitsquid “Foundation” library

Lua the programming language (advanced hash table implementation)

Algorithms

Sorting

In real life there are usually only a couple of ways sorting goes down: std::sort is used or some other library function is used. In very rare cases a custom sorting routine can be employed, though honestly this probably will not be necessary 95% of the time. The downside to sorting is that for some reason many big tech firms have asked me specifically about a bunch of annoying sorting algorithms. A solid software engineer would be out of his mind to go off implementing complicated sorting algorithms from scratch when battle-tested, bug-free, efficient library implementations are readily available. Not only is writing from-scratch code risky in terms of bugs and efficiency, it’s also an expensive time sink. Despite these arguments I am consistently asked about sorting algorithms by companies like Microsoft, SpaceX, Google, etc.

http://www.randygaul.net/2015/08/05/freelist-concept/

http://www.randygaul.net/2015/02/01/circular-linked-lists-and-branching/


I’m not really sure what the fascination is here, especially when sorting algorithm questions are bundled with big O notation questions, but the fact is since these questions are asked by interviewers the ability to answer them is valuable.

Sorting: Integers

Integer sorting is actually an extremely common task. Integers can be used to index into arrays or represent important information, and as such integers are very commonly sorted. Luckily integers are one of the simplest (if not the simplest) types of data to sort. In production code go ahead and include <algorithm> and call std::sort on your array of integers. For interview questions I would recommend being able to produce, on the spot:

2-3 N^2 sorting algorithms (like bubble sort, selection sort, etc.)

Quick Sort or Merge Sort

Counting Sort (and Radix Sort)

These sorting algorithms are not too difficult to understand and subsequently memorize, however it can be a hugely annoying pain to do so. Stick with it, grind through the slow process of practicing these algorithms, and the effort will likely pay off in your interviews✝.
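As one example from the list above, here is a minimal counting sort sketch. It assumes every value lies in the range [0, maxValue]; the function name and signature are illustrative rather than taken from any library.

```cpp
#include <cstddef>
#include <vector>

// Sorts `values` in place in O( n + maxValue ) time.
// Assumes every element is in the range [ 0, maxValue ].
void CountingSort( std::vector<int>& values, int maxValue )
{
    // counts[ v ] holds how many times the value v occurs.
    std::vector<std::size_t> counts( static_cast<std::size_t>( maxValue ) + 1, 0 );
    for ( std::size_t i = 0; i < values.size( ); ++i )
        ++counts[ values[ i ] ];

    // Write each value back as many times as it was counted, in ascending order.
    std::size_t out = 0;
    for ( int v = 0; v <= maxValue; ++v )
        for ( std::size_t n = 0; n < counts[ v ]; ++n )
            values[ out++ ] = v;
}
```

No element is ever compared against another, which is why the algorithm escapes the O( n log n ) bound of comparison sorts, at the cost of memory proportional to the value range.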

Sorting: Strings

String sorting is difficult to do in an efficient manner. Strings are defined as arrays of integers, and depending on the format each individual character can occupy a varying number of bytes (like in UTF-8 encoding). Each individual string can also be of variable length. The problem becomes sorting an array of arrays of integers. If the strings are stored in null-terminated format then each string’s length may be unknown until computed.

Unfortunately the C++ standard library is rather ill-equipped for high-performance string sorting. std::string can perform a silly number of heap allocations and deallocations during std::sort’s swap loops (particularly in pre-C++11 implementations without move semantics). Pairing std::string and std::sort may be a good general-purpose sorting routine, however when a more high-performance solution is required alternative solutions must be used.

✝ Make sure to research any companies you are applying to beforehand. Smaller companies are especially likely to ask a highly specific set of questions. For example I interviewed at multiple small game studios (10-30 employees) where not a single sorting algorithm question was asked. Instead these smaller companies focused on asking questions relevant to their studio culture, or founder’s style. All in all smaller companies seemed to respond keenly when I portrayed myself as a product-focused candidate, interested in creating a great product.


Often times the best solution is to stop using strings and rethink the problem. Is sorting strings really a necessity? Can the problem be transformed into a similar format requiring the sorting of integers instead? If the answers are no some potential solutions may be:

Insert strings into a trie. A depth-first search that visits children in character order will yield a lexicographically correct sorting of the strings.

Implement a string radix/bucket sort.

However it must be stressed that most problems likely do not require highly efficient string sorting algorithms. Engineers often just have a certain itch to implement complicated solutions when much easier ones suffice, resulting in wasted implementation time, a high risk of bugs, and in the end likely poor efficiency anyway. This is just the nature of complex algorithms! Often times a simpler and specific approach, tailored to the exact problem to solve, will end up highly efficient.
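The trie approach mentioned above can be sketched as follows. This is a deliberately simple version that assumes byte-wise ordering and spends 256 child slots per node, which wastes memory; TrieNode, TrieInsert, TrieCollect, and TrieSort are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct TrieNode
{
    std::unique_ptr<TrieNode> children[ 256 ]; // one slot per byte value
    int count = 0;                             // number of strings ending here
};

void TrieInsert( TrieNode* root, const std::string& s )
{
    TrieNode* node = root;
    for ( std::size_t i = 0; i < s.size( ); ++i )
    {
        unsigned char c = static_cast<unsigned char>( s[ i ] );
        if ( !node->children[ c ] )
            node->children[ c ].reset( new TrieNode );
        node = node->children[ c ].get( );
    }
    ++node->count;
}

// Pre-order depth-first walk: a node's own strings come out before any longer
// string sharing the prefix, and children are visited in ascending byte order.
void TrieCollect( const TrieNode* node, std::string& prefix, std::vector<std::string>& out )
{
    for ( int n = 0; n < node->count; ++n )
        out.push_back( prefix );
    for ( int c = 0; c < 256; ++c )
    {
        if ( node->children[ c ] )
        {
            prefix.push_back( static_cast<char>( c ) );
            TrieCollect( node->children[ c ].get( ), prefix, out );
            prefix.pop_back( );
        }
    }
}

std::vector<std::string> TrieSort( const std::vector<std::string>& strings )
{
    TrieNode root;
    for ( std::size_t i = 0; i < strings.size( ); ++i )
        TrieInsert( &root, strings[ i ] );

    std::vector<std::string> sorted;
    std::string prefix;
    TrieCollect( &root, prefix, sorted );
    return sorted;
}
```

Duplicates are preserved through the per-node count. A production version would likely use a more compact child representation and an iterative walk to bound stack usage.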

Sorting: Auxiliary Data

Often times data is sorted in some type of { key, value } format. When keys must be sorted the values may need to move along with them. Sorting key-value tuples can be fairly tricky in C and C++ since the code must explicitly state, at compile time, what kind of data will flow through it. Some sort of code-generation mechanism can allow an engineer to use a sorting routine generated on the spot for a particular type of data. C++ templates are the obvious choice and work quite well. Since std::sort is a template and can take an optional predicate, efficient sorts can be generated, as the predicate will likely be inlined during the code generation phase of compilation.


Using std::sort in this manner will swap around both the keys and values in memory together, assuming they are stored together in one contiguous struct. Here’s an example:

    // Key and Value stand for whatever types are being sorted.
    struct Pair
    {
        Key key;
        Value value;
    };

    // Orders pairs by key only; each value travels with its key.
    bool SortPredicate( const Pair& A, const Pair& B )
    {
        return Key_A_LessThan_B( A.key, B.key );
    }

    Pair pairs[ N ];
    std::sort( pairs, pairs + N, SortPredicate );

An alternative method can be to construct an array of indices into the array of { key, value } tuples. The predicate can use each index to look up a { key, value } pair, like so:

    // Compares two indices by the pairs they refer to.
    bool SortPredicate( int iA, int iB )
    {
        Pair& A = Pair::instances[ iA ];
        Pair& B = Pair::instances[ iB ];
        return PairCompare( A, B );
    }

It should be understood that when using C++ templates often C++-styled solutions are necessary. This sorting predicate is a great example. In order to get std::sort to sort the array of indices based on the Pair structs they represent, a static member variable was employed to allow the predicate to access the array of Pair instances. This solution reeks of C++-ism – not that this is necessarily a bad thing! It just ought to be understood. If a different code generation technique were used perhaps code could be generated that looks much more like so:

    void QuickSort( int* indices, Pair* pairs, int pairCount )
    {
        // ...
        // A and B are the pairs referred to by the indices being compared.
        if ( SortPredicate( A, B ) )
            Swap( indices[ 0 ], indices[ 1 ] );
        // ...
    }

In the above example the predicate is inlined into the sorting routine, and the routine has an understanding of the relationship between the indices array and the pairs array. Since this solution intimately understands the specific data it is sorting a simple solution arises, one without the need for static member variables, references, or any other C++-isms. Implementing a code generator, or finding a suitable one, can be a time-consuming task. In the end the style of code-gen used will be best chosen based on the project at hand, along with the problems the project demands be solved.
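For completeness, here is a hedged sketch of one more way to sort an index array without a static member: since C++11 a lambda can capture the array it needs. The KeyValue type with an int key and the SortedIndices function are assumptions made for this illustration, not part of the original examples.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical { key, value } type with a plain integer key.
struct KeyValue
{
    int key;
    float value;
};

// Returns indices into `pairs`, ordered by ascending key.
// The pairs themselves are never moved.
std::vector<int> SortedIndices( const std::vector<KeyValue>& pairs )
{
    std::vector<int> indices( pairs.size( ) );
    for ( std::size_t i = 0; i < indices.size( ); ++i )
        indices[ i ] = static_cast<int>( i );

    // The lambda captures `pairs` by reference, so no global state is needed.
    std::sort( indices.begin( ), indices.end( ),
        [ &pairs ]( int iA, int iB ) { return pairs[ iA ].key < pairs[ iB ].key; } );

    return indices;
}
```

Sorting indices instead of the structs keeps swaps cheap when the value type is large, at the price of an extra indirection in every comparison.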

Searching

Most of the time searching involves the traversal of data structures. Depending on the project, searching mathematical functions can also come up. If an engineer has solid knowledge of data structures, the problem at hand, and the relationships between data, coming up with efficient-enough search algorithms will likely be only a matter of time.

Recursion

Recursive searching operations are usually good for on-the-fly debugging routines. For example say a tree structure is used to hierarchically organize some data. In games programming common use cases are graphics scene organization and physics scene organization. Another example could be to represent file modifications as a tree of file-sections. This would allow for colliding files together to search for merge conflicts in source control software.

The reason recursion is stated as useful for debugging is that recursive search techniques are often fairly quick and easy to get up and running in a robust manner. Each recursive function call uses the stack to allocate space for saved registers, local variables, etc. This alleviates the need for the user to manage any memory manually. However, recursive searching can consume quite a bit of stack memory, and in many cases this is a stability risk as stack space can be exhausted. The alternative is to use manual memory handling techniques, such as pre-allocated memory and explicit stack data structures.

Recursion is also useful for creating small parsing tools, or doing other kinds of data preprocessing. Preprocessing steps typically do not occur when clients use shipped software and can be run off-line, and as such don’t usually need to be too performant.

Very often interviewers ask recursion-related programming problems, so knowing how to search through a tree or graph is a highly valuable skill. Common approaches to searching are breadth-first searches and depth-first searches.
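A recursive search of the kind described above can be very short. The sketch below assumes a hypothetical binary TreeNode with an integer payload; the call stack holds all of the traversal state, which is exactly what makes the routine quick to write and also what makes deep trees a stack-exhaustion risk.

```cpp
// Hypothetical tree node used only for this illustration.
struct TreeNode
{
    int value;
    TreeNode* left;
    TreeNode* right;
};

// Recursive depth-first search. Returns the first node holding `value`,
// or nullptr when no such node exists.
const TreeNode* Find( const TreeNode* node, int value )
{
    if ( !node )
        return nullptr;
    if ( node->value == value )
        return node;
    if ( const TreeNode* found = Find( node->left, value ) )
        return found;
    return Find( node->right, value );
}
```

The recursion depth equals the depth of the tree, so an unbalanced tree with many nodes is the case to watch for.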

BFS and DFS BFS stands for breadth-first search, while DFS stands for depth-first search. Either can be implemented recursively or iteratively. There are a million resources on these topics, most of which explain these concepts exhaustively. A quick google search will yield all the information anyone could ever hope for on these topics.


My favorite of the searches mentioned here would be the iterative DFS. More often than not the DFS will behave in a more cache friendly way than the BFS due to the orders of traversal and layouts of the traversed data structures. Here’s some pseudo code (which depicts code I have written in production-ready projects):

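    #define Push( node ) stack[ stack_pointer++ ] = node
    #define Pop( ) stack[ --stack_pointer ]

    Node* DFS( Node* graph )
    {
        // Fixed capacity; assumes the search never holds more than N pending nodes.
        const int N = 256;
        int stack_pointer = 0;
        Node* stack[ N ];

        Push( graph );

        while ( stack_pointer )
        {
            Node* seed = Pop( );

            if ( Done( seed ) )
            {
                ClearState( graph ); // reset the visited flags for the next search
                return seed;
            }

            seed->visited = 1;

            // Push every unvisited neighbor. The `next` link between edges is an
            // assumed field name; without advancing the edge this loop never ends.
            Edge* edge = seed->edge_list;
            while ( edge )
            {
                Node* B = edge->B;
                if ( !B->visited )
                    Push( B );
                edge = edge->next;
            }
        }

        // The stack emptied without any node satisfying Done.
        ClearState( graph );
        return NULL;
    }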

Hashing From Wikipedia:

“A hash function is any function that can be used to map data of arbitrary size to data of fixed size.”

The way I interpret this is: transform any data to an integer. This is useful since integers can index into arrays. Hashing strings is a very common operation, and so I’d like to recommend two of my favorite string hashing algorithms: djb2 and FNV-1a. These two hashing algorithms are straightforward, don’t require much code to implement, and can easily be inserted into pre-existing projects. They are also good enough for most applications. However most large code bases I’ve seen employ a CRC-32 hashing algorithm of some kind; for example the Unreal Engine can be seen to contain an implementation within crc.h, where the MemCrc32 function seems to be used quite often throughout Unreal.

http://www.cse.yorku.ca/~oz/hash.html

http://www.isthe.com/chongo/tech/comp/fnv/
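Both recommended string hashes fit in a few lines. The sketch below shows 32-bit variants operating on null-terminated strings; the function names are illustrative. The constants are the commonly published ones: djb2 starts at 5381 and multiplies by 33, while 32-bit FNV-1a uses offset basis 2166136261 and prime 16777619.

```cpp
#include <cstdint>

// djb2: hash = hash * 33 + c, starting from 5381.
std::uint32_t HashDjb2( const char* str )
{
    std::uint32_t hash = 5381;
    while ( *str )
        hash = hash * 33 + static_cast<unsigned char>( *str++ );
    return hash;
}

// 32-bit FNV-1a: xor in each byte, then multiply by the FNV prime.
std::uint32_t HashFnv1a( const char* str )
{
    std::uint32_t hash = 2166136261u; // FNV offset basis
    while ( *str )
    {
        hash ^= static_cast<unsigned char>( *str++ );
        hash *= 16777619u;            // FNV prime
    }
    return hash;
}
```

To index into a table of size tableSize, a typical use is HashFnv1a( key ) % tableSize. Unsigned overflow wraps by definition in C++, which both functions rely on.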

Root Finding

Sometimes projects require simulations of some kind that involve finding the roots of single-variable functions of the form f( x ) = y (curves in two dimensions). These roots are the values of x at which the function is zero valued (i.e. where the curve intersects the x-axis, along y = 0). Some examples of situations where this can be useful are:

Vehicle dynamics simulation

Pre-calculating shader parameters

Continuous collision detection

Statistics

More often than not the functions in question are linear, quadratic, or otherwise simple enough that the Newton-Raphson method will work well. Specifically, if the assumptions made in the Proof of Quadratic Convergence are met then Newton’s method will converge. Newton’s method works by using a tangent line approximation of a curve, intersecting the tangent with the x-axis (where y = 0), and using this point as the starting point of the next iteration. Wikipedia has a wonderful animation showing the algorithm converge. If the derivative is not known or is hard to calculate the secant method can be used instead. A secant is a line that intersects two points of a curve. If these two points are close enough together the slope of the secant approximates the derivative of the function in question.
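Both methods described above can be sketched in a few lines. The function names, the Function typedef, and the example function x^2 - 2 (whose positive root is the square root of 2) are assumptions for illustration; neither routine guards against divergence beyond an iteration cap.

```cpp
#include <cmath>

typedef double ( *Function )( double );

// Newton-Raphson: follow the tangent at x down to the x-axis and repeat.
double NewtonRaphson( Function f, Function df, double x, double tolerance, int maxIterations )
{
    for ( int i = 0; i < maxIterations; ++i )
    {
        double y = f( x );
        if ( std::fabs( y ) < tolerance )
            break;
        double slope = df( x );
        if ( slope == 0.0 )
            break; // horizontal tangent never reaches the x-axis
        x -= y / slope;
    }
    return x;
}

// Secant method: the slope through the two latest points stands in for the derivative.
double Secant( Function f, double x0, double x1, double tolerance, int maxIterations )
{
    for ( int i = 0; i < maxIterations; ++i )
    {
        double y0 = f( x0 );
        double y1 = f( x1 );
        if ( std::fabs( y1 ) < tolerance || y1 == y0 )
            break;
        double x2 = x1 - y1 * ( x1 - x0 ) / ( y1 - y0 );
        x0 = x1;
        x1 = x2;
    }
    return x1;
}

// Example function with a root at sqrt( 2 ), and its derivative.
double SquareMinusTwo( double x ) { return x * x - 2.0; }
double SquareMinusTwoDerivative( double x ) { return 2.0 * x; }
```

Starting Newton-Raphson at x = 1 or the secant method at x0 = 1, x1 = 2 converges toward the square root of 2 within a handful of iterations for this well-behaved function.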

https://github.com/EpicGames

http://www.math.mtu.edu/~msgocken/ma5630spring2003/lectures/newton/newton/node4.html


https://en.wikipedia.org/wiki/Linear_approximation

https://en.wikipedia.org/wiki/Newton%27s_method#Description

https://en.wikipedia.org/wiki/Secant_method


Non-linear Min/Maximization

If one is lucky enough to be dealing with a non-linear function that has an easy-to-compute derivative, finding minima/maxima can be done by searching for where the derivative is equal to zero. In this case a root finding algorithm can be used to search for a local minimum/maximum by inspecting the derivative and treating df/dx( x ) = 0 as the root. However, in order to do so the minimum/maximum must first be bracketed.

Imagine our non-linear function looks like the curve above. To bracket the minimum means to find points 1 and 4 (known as sample points) such that it is known that at least one minimum lies between the two sample points. Bracketing algorithms are conceptually simple though care must be taken during implementation!

Imagine we start with sample point 1, and move along the derivative to sample point 2. We now know our derivative increased, but is still negative; we have traveled downhill. Onward to sample point 3: the derivative increased, but is still negative; downhill again. Finally we arrive at sample point 4, at which the derivative is positive. It is now known that at least one minimum exists somewhere between sample points 3 and 4, and in most cases there’s likely only a single minimum.

Once bracketed the sample points 3 and 4 can be handed off to a root-finding algorithm to iteratively refine the derivative towards 0. Once “close enough”, as defined by some error tolerance (typically a hard-coded value), the minimum is found.
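The bracketing walk and the refinement step can be sketched as follows. This is a minimal version that assumes the derivative is available, that it is negative at the starting point, and that a fixed step is fine enough not to jump over the minimum; all names and the example derivative are hypothetical.

```cpp
#include <cmath>

typedef double ( *Function )( double );

// Walks right from `start` in fixed steps until the derivative stops being
// negative. On success [ *lo, *hi ] brackets a minimum.
// Assumes df( start ) < 0.
bool BracketMinimum( Function df, double start, double step, int maxSteps, double* lo, double* hi )
{
    double a = start;
    for ( int i = 0; i < maxSteps; ++i )
    {
        double b = a + step;
        if ( df( b ) >= 0.0 )
        {
            *lo = a;
            *hi = b;
            return true;
        }
        a = b;
    }
    return false; // never crossed zero within maxSteps
}

// Bisection on the derivative, maintaining df( lo ) < 0 <= df( hi ).
double RefineMinimum( Function df, double lo, double hi, double tolerance )
{
    while ( hi - lo > tolerance )
    {
        double mid = 0.5 * ( lo + hi );
        if ( df( mid ) < 0.0 )
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * ( lo + hi );
}

// Derivative of ( x - 2 )^2, which has its single minimum at x = 2.
double ExampleDerivative( double x ) { return 2.0 * ( x - 2.0 ); }
```

Bisection is used here for simplicity; the bracket could equally be handed to a faster root finder once it is known to contain a sign change of the derivative.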


The same kind of technique can be applied to finding maxima. However, how can it be known that the current minimum or maximum is the global minimum, or the global maximum? Well, the short answer is: it’s not possible to know without searching the entire non-linear space (unless special knowledge is known a-priori about the function).

Say our function looks like the above cross-section. This function is quite non-linear and contains various minima. Typically the previous techniques can be used to find any particular minimum given a starting point, however it can be difficult to determine if a minimum is the global minimum. One technique for searching for the global minimum is called simulated annealing.

Annealing is a term from metallurgy. The idea is that if a metal is molten and allowed to cool down very quickly all of the molecules or atoms will be aligned in random orientations, resulting in a brittle metal. If the metal cools slower the atoms have time to align themselves in a more orderly fashion, thus relieving internal stresses and resulting in less brittle metal.

We can take the concept of annealing and apply it to sample points. Imagine finding a minimum with the previously mentioned bracket + search technique. This is analogous to tossing a ball randomly into the cross-section. The ball will land somewhere, probably in a local minimum (as opposed to the global minimum). Annealing is like violently shaking the cross-section so the ball bounces out of the first minimum, and lands somewhere else. As this continues we shake less and less violently, until the ball (likely) settles into the global minimum.

Simulated annealing can make use of a virtual “temperature”, which would represent a radius within which sample points can be taken. The temperature can then be dialed down over multiple searches, while keeping track of the best minimum. Once cooled completely the annealing process can be repeated as long as necessary to feel comfortable with the best found minimum, which is then asserted as the global minimum (on good faith).
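Below is a small one-dimensional sketch of simulated annealing. It follows the description above in letting the temperature double as the sampling radius, and it adds the classic acceptance rule in which uphill moves are accepted with probability exp( -delta / temperature ). The function names, the geometric cooling schedule, and the example function are all assumptions; this is not a tuned implementation.

```cpp
#include <cmath>
#include <random>

typedef double ( *Function )( double );

// Returns the best sample point found. The result is never worse than `start`
// because the best point seen so far is tracked separately from the walker.
double SimulatedAnnealing( Function f, double start, double temperature,
                           double cooling, int iterations, unsigned seed )
{
    std::mt19937 rng( seed );
    std::uniform_real_distribution<double> unit( 0.0, 1.0 );

    double current = start;
    double currentValue = f( start );
    double best = current;
    double bestValue = currentValue;

    for ( int i = 0; i < iterations; ++i )
    {
        // The temperature doubles as the sampling radius around the walker.
        double candidate = current + temperature * ( 2.0 * unit( rng ) - 1.0 );
        double value = f( candidate );
        double delta = value - currentValue;

        // Always accept improvements; sometimes accept uphill moves so the
        // walker can "bounce" out of a local minimum while it is still hot.
        if ( delta < 0.0 || unit( rng ) < std::exp( -delta / temperature ) )
        {
            current = candidate;
            currentValue = value;
        }

        if ( currentValue < bestValue )
        {
            best = current;
            bestValue = currentValue;
        }

        temperature *= cooling; // shake less violently over time
    }

    return best;
}

// Example function with several local minima.
double Bumpy( double x ) { return x * x + 10.0 * std::sin( x ); }
```

A call such as SimulatedAnnealing( Bumpy, 5.0, 4.0, 0.99, 2000, 1234u ) returns some point at least as good as the start; whether it is the global minimum is, as the text says, taken on good faith, and repeated runs with different seeds raise confidence.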


For more information on non-linear optimization and annealing, readers are pointed to Timothy Masters’ “Practical Neural Network Recipes in C++”.

No Derivative

If one is unlucky enough to not have the luxury of a derivative function at hand, secants can be used to approximate the derivative. Care must be taken during implementation to bracket a minimum without making any false assumptions! Often times this kind of minimization/maximization is only useful if a lot of information about the function at hand is available. For example let us inspect Pacejka’s magic formula, which is useful for car tire simulation. The function takes in the slip ratio and outputs an expected force exerted onto the ground by the tire. Here is a graph of a typical (simplified)✝ Pacejka function:

https://en.wikipedia.org/wiki/Hans_B._Pacejka

✝ Pacejka’s magic formula, for professional car simulation purposes, can take a great many coefficient parameters. For many game-ish simulations the “simplified” model can be used, which takes 4 tunable coefficients (as shown in the Wikipedia image above).


https://en.wikipedia.org/wiki/Slip_ratio



This function is defined as:

F( x ) = D * sin( C * atan( B * x - E * ( B * x - atan( B * x ) ) ) )

If we assign b = B * x we arrive at:

F( b ) = D * sin( C * atan( b - E * ( b - atan( b ) ) ) )

It can easily be seen that the maximum and minimum reside at y = 1 and y = -1 respectively (for D = 1; in general the peaks are scaled by D, since the sine term is bounded by 1 and -1). This allows one to calculate an error from the minimum or maximum! If the error can be known then the extremal points can be searched for. It is also easy to see there is only one maximum and one minimum, which allows the initial bracketing algorithm to make various simplifying assumptions! Once bracketed, the bracket interval can be iteratively refined via methods similar to bisection until convergence with respect to a tolerance is achieved.
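To make the bracket-then-refine idea concrete, here is a small sketch that evaluates the simplified magic formula and narrows a bracket around its single positive peak. I am swapping in golden-section search as the "method similar to bisection". The struct layout, function names, and coefficient values used in the usage note are assumptions for illustration only.

```cpp
#include <cassert>
#include <cmath>

// Simplified Pacejka "magic formula". B, C, D, E are the four tunable
// coefficients from the equation above.
struct Pacejka
{
    double B, C, D, E;
};

double Evaluate( const Pacejka& p, double slip )
{
    double b = p.B * slip;
    return p.D * std::sin( p.C * std::atan( b - p.E * ( b - std::atan( b ) ) ) );
}

// Golden-section search for the maximum inside the bracket [lo, hi].
// Assumes the function has exactly one peak inside the bracket.
double FindPeakSlip( const Pacejka& p, double lo, double hi, double tolerance )
{
    assert( lo < hi && tolerance > 0.0 );
    const double invPhi = 0.6180339887498949; // 1 / golden ratio
    while ( hi - lo > tolerance )
    {
        double a = hi - invPhi * ( hi - lo ); // left probe
        double b = lo + invPhi * ( hi - lo ); // right probe
        if ( Evaluate( p, a ) > Evaluate( p, b ) )
        {
            hi = b; // the peak lies in [lo, b]
        }
        else
        {
            lo = a; // the peak lies in [a, hi]
        }
    }
    return 0.5 * ( lo + hi );
}
```

With illustrative coefficients such as Pacejka{ 10.0, 1.9, 1.0, 0.97 } and the bracket [0, 1], the search should land on a slip ratio somewhere around 0.18, where the force is essentially equal to D. Since the peak value D is known ahead of time, D - Evaluate( p, slip ) could also serve as the error measure for an early-out.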

Linear Algebra

Games are made of 2D or 3D spaces. These spaces are defined by the mathematics of linear algebra. Linear algebra can provide an engineer with an intellectual vocabulary to think and reason about the space the game occupies. This ability is invaluable for defining how things move, interact with one another, and in general operate within a space. Readers are heavily urged to get the book “Essential Mathematics for Games and Interactive Applications” by Van Verth and Bishop. This book covers all critical mathematics required for game development and is presented spectacularly! My writings here will not do the readers justice on their own, and should be thought of as supplements to a more well-rounded educational foundation.

http://www.essentialmath.com/book.htm


Linear Transformations

A linear transformation is one that preserves arrangements of lines: straight lines remain straight, parallel lines remain parallel, and the origin stays fixed. For example, if line A and line B are parallel and undergo a linear transformation, the results A’ and B’ will also be parallel. Linear transformations in 3D can be represented as a 3x3 matrix. Oftentimes these matrices are used to transform points or vectors like so:

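\[
\begin{bmatrix} A & B & C \\ D & E & F \\ G & H & I \end{bmatrix}
\begin{Bmatrix} x \\ y \\ z \end{Bmatrix}
=
\begin{Bmatrix} x' \\ y' \\ z' \end{Bmatrix}
\]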

If we define the matrix itself as A and the vector as v we can rewrite the above equation as:

A * v = v'

If we have two matrices A and B, both of which are linear transformations, we can compose the matrix C such that:

A * B = C ∴ (A * B) * v = v' = C * v

In words: C will transform v the same way as (A * B), which is the same as transforming by B first and then by A. Common linear transformation operations include (but are not limited to): rotation, scaling, shear, and flipping (or mirroring). These properties can be used to compose trees of transformations in order to form a hierarchical animation model. In the case of animations the matrices themselves can be thought of as bones in skeletal animation.

However linear transformations alone aren’t quite enough to implement full skeletal animation, or to implement other interesting transformations.
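Composition can be checked numerically. The sketch below uses a bare-bones row-major Mat3 (the types and the helper names Scale and RotationZ are made up for this example) to show that transforming by C = A * B gives the same result as transforming by B and then by A.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Mat3 { float m[ 3 ][ 3 ]; }; // m[ row ][ column ]

// Matrix * vector: transforms v by a.
Vec3 Mul( const Mat3& a, const Vec3& v )
{
    return Vec3{
        a.m[ 0 ][ 0 ] * v.x + a.m[ 0 ][ 1 ] * v.y + a.m[ 0 ][ 2 ] * v.z,
        a.m[ 1 ][ 0 ] * v.x + a.m[ 1 ][ 1 ] * v.y + a.m[ 1 ][ 2 ] * v.z,
        a.m[ 2 ][ 0 ] * v.x + a.m[ 2 ][ 1 ] * v.y + a.m[ 2 ][ 2 ] * v.z };
}

// Matrix * matrix: composes two linear transformations into one.
Mat3 Mul( const Mat3& a, const Mat3& b )
{
    Mat3 c = {};
    for ( int i = 0; i < 3; ++i )
        for ( int j = 0; j < 3; ++j )
            for ( int k = 0; k < 3; ++k )
                c.m[ i ][ j ] += a.m[ i ][ k ] * b.m[ k ][ j ];
    return c;
}

// Uniform scale by s.
Mat3 Scale( float s )
{
    return Mat3{ { { s, 0.0f, 0.0f }, { 0.0f, s, 0.0f }, { 0.0f, 0.0f, s } } };
}

// Counter-clockwise rotation about the z axis.
Mat3 RotationZ( float radians )
{
    float c = std::cos( radians );
    float s = std::sin( radians );
    return Mat3{ { { c, -s, 0.0f }, { s, c, 0.0f }, { 0.0f, 0.0f, 1.0f } } };
}
```

For example, with A = RotationZ( 90 degrees ) and B = Scale( 2 ), both Mul( Mul( A, B ), v ) and Mul( A, Mul( B, v ) ) take v = { 1, 0, 0 } to approximately { 0, 2, 0 }.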

https://en.wikipedia.org/wiki/Matrix_(mathematics)

https://en.wikipedia.org/wiki/Skeletal_animation


Affine Transformations

An affine transformation is a linear transformation followed by a translation. Affine transformations can change the positions of points through translation, whereas linear transformations alone always leave the origin in place. An affine transformation in 3D has the form:

A * v + b = v'

In the above equation A * v linearly transforms the vector (or point) v, and then adds a vector b to the result, producing a new value v’ (which is a point). If readers are confused by the difference between vectors and points a quick google search should remedy the confusion! In short: vectors are directions and points are locations. Knowing this one can, in a hand-wavy way, say: linear transformations modify vectors while affine transformations modify points. Most modern graphics software, and many older educational graphics materials, use 4x4 matrices, or 4x4 matrix notation. We can view a 4x4 matrix as an affine transformation. Let A be a 3x3 matrix and b be a vector, we then have:

\[
\begin{bmatrix} A & b \\ \begin{matrix} 0 & 0 & 0 \end{matrix} & 1 \end{bmatrix} = M
\]

M is a 4x4 matrix, and can be used to transform 4-component vectors. In this 4-dimensional linear algebra notation 3D vectors are represented as:

\[
v = \begin{Bmatrix} x \\ y \\ z \\ 0 \end{Bmatrix}
\]

While a point would have the number 1 in place of the 0. What is the reason for this strange notation? The answer is: an affine transformation in N dimensions can be written as a linear transformation in N+1 dimensions. The 1 in a point’s fourth component picks up the translation column b of M, while the 0 in a vector’s fourth component ignores it.

This is also useful when trying to describe view matrices (such as a perspective projection matrix), which are non-linear in 3D. Once lifted to the fourth dimension the problem can be treated in a linear manner, and once completed, can then be “projected” back down to the third dimension. This can be thought of as a qualitative description of homogeneous coordinates. In regards to skeletal animation from the previous section:

https://en.wikipedia.org/wiki/3D_projection
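The difference between the trailing 0 and the trailing 1 is easy to see in code. This is a minimal sketch with made-up Vec4 and Mat4 types (not taken from any particular library) where the same 4x4 translation matrix moves a point but leaves a direction vector untouched.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[ 4 ][ 4 ]; }; // m[ row ][ column ]

// 4x4 matrix times a 4-component vector.
Vec4 Mul( const Mat4& a, const Vec4& v )
{
    const float in[ 4 ] = { v.x, v.y, v.z, v.w };
    float out[ 4 ] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for ( int i = 0; i < 4; ++i )
        for ( int j = 0; j < 4; ++j )
            out[ i ] += a.m[ i ][ j ] * in[ j ];
    return Vec4{ out[ 0 ], out[ 1 ], out[ 2 ], out[ 3 ] };
}

// Builds M with A = identity and b = { tx, ty, tz }.
Mat4 Translation( float tx, float ty, float tz )
{
    return Mat4{ { { 1.0f, 0.0f, 0.0f, tx },
                   { 0.0f, 1.0f, 0.0f, ty },
                   { 0.0f, 0.0f, 1.0f, tz },
                   { 0.0f, 0.0f, 0.0f, 1.0f } } };
}

// Points carry w = 1 so they pick up the translation column.
Vec4 Point( float x, float y, float z ) { return Vec4{ x, y, z, 1.0f }; }

// Vectors (directions) carry w = 0 so translation is ignored.
Vec4 Vector( float x, float y, float z ) { return Vec4{ x, y, z, 0.0f }; }
```

Multiplying Translation( 1, 2, 3 ) with Point( 1, 1, 1 ) gives { 2, 3, 4, 1 }, while multiplying it with Vector( 1, 1, 1 ) gives { 1, 1, 1, 0 }.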


Math Library

Do not use templates in your math libraries. Templates generate tons of code in every single translation unit the compiler operates upon. Once all this code-gen is finished the linker has to painstakingly churn through the turmoil of duplicate symbols a bajillion times! Just write out your specific math functions as needed. Solve specific problems that the project actually demands. I recommend taking Erin Catto’s lead in his Box2D library and writing a math library that models affine transformations like so (very similar code can be seen in other very popular physics engines, both professional and open-source):

struct Rotation
{
    // Rows of matrix, aka orthonormal basis
    Vec3 x;
    Vec3 y;
    Vec3 z;
};

struct Transform
{
    // linear component (3x3 matrix)
    // can optionally contain scaling
    Rotation r;

    // affine component
    Vec3 p;
};

// performs an affine transformation
// A * v + b = v'
// where A = t.r, b = t.p
Vec3 Mul( Transform t, Vec3 v )
{
    return Mul( t.r, v ) + t.p;
}

// "transposed", or inverse affine transformation
// v' = A^T * (v - b)
// (the transpose is only the inverse when A is a pure rotation without scaling)
Vec3 MulT( Transform t, Vec3 v )
{
    return MulT( t.r, v - t.p );
}

Here is a good resource by Richard Mitton on constructing solid math libraries, which also takes SIMD instructions into account (more on SIMD later).

http://box2d.org/

http://www.codersnotes.com/notes/maths-lib-2016/


Dot Product

The dot product comes from the law of cosines. Here’s the formula:

\[ c^2 = a^2 + b^2 - 2ab\cos\gamma \tag{1} \]

This is just an equation that relates the cosine of an angle within a triangle to its side lengths a, b and c. The Wikipedia page on the law of cosines (linked below) does a nice job of explaining this.

Equation (1) can be rewritten as:

\[ c^2 - a^2 - b^2 = -2ab\cos\gamma \tag{2} \]

The right hand side of equation (2) is interesting! Let’s say that instead of writing the equation with side lengths a, b and c, it is written with two vectors: u and v. The third side can be represented as u - v. Re-writing equation (2) in vector notation yields:

\[ |u - v|^2 - |u|^2 - |v|^2 = -2|u||v|\cos\gamma \tag{3} \]

Which can be expressed in scalar form as:

\[ (u_x - v_x)^2 + (u_y - v_y)^2 + (u_z - v_z)^2 - (u_x^2 + u_y^2 + u_z^2) - (v_x^2 + v_y^2 + v_z^2) = -2|u||v|\cos\gamma \tag{4} \]

Crossing out some redundant terms, and getting rid of the -2 on each side of the equation, this ugly equation can be turned into a much more approachable version:

\[ u_x v_x + u_y v_y + u_z v_z = |u||v|\cos\gamma \tag{5} \]

Equation (5) is the equation for the dot product. If both u and v are unit vectors then the equation will simplify to:

\[ \operatorname{dot}(\hat{u}, \hat{v}) = \cos\gamma \tag{6} \]

If u and v are not unit vectors equation (5) says that the dot product between both vectors is equal to cos( γ ) scaled by the lengths of u and v. This is a nice thing to know! For example: the squared length of a vector is just itself dotted with itself.
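In code, equation (5) and the squared-length trick come out to only a few lines. A minimal sketch, with a made-up Vec3 type and function names chosen for this example:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Left hand side of equation (5).
float Dot( Vec3 u, Vec3 v ) { return u.x * v.x + u.y * v.y + u.z * v.z; }

// The squared length of a vector is itself dotted with itself.
float LengthSq( Vec3 v ) { return Dot( v, v ); }
float Length( Vec3 v ) { return std::sqrt( LengthSq( v ) ); }

// Equation (5) solved for cos( gamma ).
// Assumes neither vector has zero length.
float CosAngle( Vec3 u, Vec3 v )
{
    float lengths = Length( u ) * Length( v );
    assert( lengths > 0.0f );
    return Dot( u, v ) / lengths;
}
```

For perpendicular vectors such as { 1, 0, 0 } and { 0, 2, 0 } CosAngle returns 0, and for vectors pointing the same way it returns 1 regardless of their lengths.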

http://en.wikipedia.org/wiki/Law_of_cosines


If u is a unit vector and v is not, then dot( u, v ) will return the distance in which v travels in the u direction. Here’s a slide from a slideshow I created some time ago about this property:

This is useful for understanding the plane equation in three dimensions (or any other dimension)✝:

ax + by + cz - d = 0 (7)

The normal of a plane would be the vector: { a, b, c }. If this normal is a unit vector, then d represents the distance to the plane from the origin. If the normal is not a unit vector then d is scaled by the length of the normal.

To compute the distance of a point to this plane any point can be substituted into the plane equation, assuming the normal of the plane equation is of unit length. This operation is computing the distance along the normal a given point travels. The subtraction by d can be viewed as “translating the plane to the origin” in order to convert the distance along the normal to a distance to the plane.

✝ More on planes and lines in the next section.
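The point-to-plane distance described above is just a dot product followed by a subtraction. A small sketch, assuming a unit-length normal; the Plane struct and function names are made up for this example:

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

float Dot( Vec3 u, Vec3 v ) { return u.x * v.x + u.y * v.y + u.z * v.z; }

// Plane in the form of equation (7): dot( n, p ) - d = 0.
struct Plane
{
    Vec3 n;  // normal { a, b, c }, assumed to be unit length
    float d; // distance of the plane from the origin along n
};

// Signed distance from point p to the plane.
// Positive in front of the plane (the side n points to), negative behind.
float Distance( const Plane& plane, Vec3 p )
{
    return Dot( plane.n, p ) - plane.d;
}
```

For the plane y = 2 (normal { 0, 1, 0 }, d = 2) the point { 5, 7, -1 } is 5 units in front of the plane, and the origin is 2 units behind it.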


Planes and Lines

The equation of a plane in 2D (which can be thought of as a 2D line) looks like:

ax + by - c = 0

In 3D we add on the z component and rename the c from 2D to d in 3D:

ax + by + cz - d = 0

Hopefully something has jumped out at the readers… If not, here’s another slide I made some time ago:

Planes are a dot product!✝ Along with this -d term. { a, b, c } is called the plane normal. { x, y, z } is any point on the plane. If we put an arbitrary point into { x, y, z }, and then subtract d from its dot product with the normal, the result is zero exactly when the point lies on the plane; otherwise it is the signed distance from the point to the plane (scaled by the length of the normal). In this way d can be thought of as representing the distance of the plane to the origin. Since d = ax + by + cz for points on the plane, d is equal to the normal dotted with any point on the plane. By definition of the dot product, this means d is scaled by the length of the normal. If the normal is unit length d represents the distance to the origin in normalized units. These facts allow us to piece together an algorithm to take any point and project it onto the surface of any plane (here’s another one of my old slides):

✝ (NSFW) https://www.youtube.com/watch?v=OoAlf0-U7EA

https://en.wikipedia.org/wiki/Plane_(geometry)

https://en.wikipedia.org/wiki/Normal_(geometry)



The above slide describes how to compute a vector (shown in red) that can be used to take P straight to the plane by a translation. If we subtract this vector from P, P is translated to the plane (first equation in the slide).

A reader who has been diligently following along ought to be able to construct an algorithm to rotate and translate a plane! Translating a plane can be done by adjusting d, and rotating a plane can be done by adjusting the normal. In this way the normal can be thought of as the “linear” component, and d can be thought of as the “affine” component.

Finding the intersection point of two 2D planes (lines) involves taking the two 2D plane equations and solving them simultaneously for the point of intersection. It’s a straightforward two-equations, two-unknowns scenario, and can be solved in-code directly via algebraic substitution.
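Both operations fit in a few lines. The sketch below assumes a unit-length plane normal for the projection, and uses Cramer's rule for the 2D intersection (which is just the closed form of doing the substitution by hand). All type and function names here are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float Dot( Vec3 u, Vec3 v ) { return u.x * v.x + u.y * v.y + u.z * v.z; }

// 3D plane: dot( n, p ) - d = 0, with n assumed to be unit length.
struct Plane { Vec3 n; float d; };

// Moves p straight onto the plane by subtracting n * distance.
Vec3 Project( const Plane& plane, Vec3 p )
{
    float distance = Dot( plane.n, p ) - plane.d;
    return Vec3{ p.x - plane.n.x * distance,
                 p.y - plane.n.y * distance,
                 p.z - plane.n.z * distance };
}

// 2D plane (line): a * x + b * y - c = 0.
struct Line2 { float a, b, c; };

// Solves the two line equations for their shared point.
// Returns false when the lines are (nearly) parallel.
bool Intersect( Line2 l0, Line2 l1, float* x, float* y )
{
    float det = l0.a * l1.b - l1.a * l0.b;
    if ( std::fabs( det ) < 1e-6f )
        return false;
    *x = ( l0.c * l1.b - l1.c * l0.b ) / det;
    *y = ( l0.a * l1.c - l1.a * l0.c ) / det;
    return true;
}
```

Projecting { 5, 7, -1 } onto the plane y = 2 gives { 5, 2, -1 }, and intersecting the lines x + y = 2 and x - y = 0 gives the point ( 1, 1 ).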

Lines

In 3D lines commonly take on the vector form. Vector form is nice since a vector can be used to describe the line’s direction. A point in 3D space on the line (any point) can be used to fix the line’s direction to a location. In this way the line’s direction vector becomes bound. Let the aforementioned point be named P and the direction vector be named D. We then have:

L = D * t + P

Where L is the line and t is a scalar parameter (or in other words, just a floating point number). If we multiply D with t we end up stretching, shrinking, or flipping D. If we add D * t to P, we slide P along the D direction. In this way the equation can be used to describe any point on the line L.

https://en.wikipedia.org/wiki/Line_(geometry)#Vector_equation

https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=bound+vector

https://en.wikipedia.org/wiki/Scalar_(mathematics)


Interpolation

Interpolation is documented all over the place! Some good terms to google search for are: LERP, bilinear interpolation, SLERP, Bezier curves, splines, etc. Here’s an old post of mine describing splines, lerp, and cubic Bezier curves. Unfortunately I couldn’t find a good resource on linear interpolation (lerp) from a quick google search, so here goes a quick explanation: Recall from the Lines section under Planes and Lines, the vector form of a 3D line looks like:

L( t ) = D * t + P

Where L is the line, D is a direction vector, P is a point fixed on L, and t is the scalar parameter. Now let’s say we have two points A and B instead of just one single point, and let’s restrict t to the (closed) interval [0, 1]. We then have:

P = A * (1 − t) + B * t

This equation can be rearranged to form:

P = (B − A) * t + A

In the above equation we see the expression B – A, which results in a vector. This vector points from A toward B (when we fix it to the point A via the + operator), and is scaled by t. The t value determines how far along the interval from A to B the point P resides. The last two equations are called lerp, which stands for linear interpolation. Lerp lets an engineer define points between two other points. Points are not the only type of data that can be lerp’d: scalars, integers, and pretty much whatever else the heart desires can be lerp’d. Lerp returns values within a range, which is a form of interpolation, whereas extrapolation returns a value outside of the range. If we use lerp with a value of t outside the interval [0, 1] we would be performing an extrapolation. When I mentioned the concept of “tangent line approximation” in the root finding section, this approximation is a form of extrapolation.
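Written out in code, the P = (B − A) * t + A form of lerp looks like this. It is a minimal sketch with a made-up Vec3 type; the same function body works for scalars and points alike.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Scalar lerp: t = 0 returns a, t = 1 returns b.
float Lerp( float a, float b, float t )
{
    return a + ( b - a ) * t;
}

// Point lerp: applies the scalar lerp to each component.
Vec3 Lerp( Vec3 a, Vec3 b, float t )
{
    return Vec3{ Lerp( a.x, b.x, t ), Lerp( a.y, b.y, t ), Lerp( a.z, b.z, t ) };
}
```

Lerp( 2, 6, 0.25f ) returns 3, a quarter of the way from 2 to 6. Passing t = 1.5f returns 8, which lies outside of the range and is therefore an extrapolation.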

http://www.randygaul.net/2015/05/20/interactive-cubic-bezier-splines/

30 Multi-Threading

Bilinear Interpolation

Bilinear interpolation is done by:

Lerp from A to B, call the result P

Lerp from C to D, call the result Q

Lerp from P to Q, this is the result of bilinear interpolation

Bilinear interpolation is used in modern graphics cards to pull colors out of a rectangular texture. The idea is to lerp along the x axis of the texture twice (once for the top pair of texels and once for the bottom pair), then lerp between these two values along the y axis to produce an intermediary color. Usually the lerp operations are performed on a 2x2 square of texels. This is called bilinear filtering. Modern GPUs achieve blazing fast filtering speeds by doing many of these bilinear interpolations in parallel, likely implemented directly in hardware. By understanding this one can implement a quick and dirty bilinear filtering scheme for pixel-art games in 2D. Say a game is set up to have a perfect 1:1 texel to pixel ratio. In a shader it can be quite easy to offset each pixel by { 0.5f, 0.5f } to trigger bilinear filtering across the entire screen, which will happen “for free” since the GPU will be doing bilinear filtering in the hardware itself. { 0.5f, 0 } and { 0, 0.5f } can be used to trigger filtering along a specific axis.
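The three lerps can be sketched on the CPU as follows. This assumes A and B are the top pair of values and C and D the bottom pair of a 2x2 block; the function name Bilerp and the parameter names are made up for this example.

```cpp
#include <cassert>

float Lerp( float a, float b, float t )
{
    return a + ( b - a ) * t;
}

// a, b: top-left and top-right values
// c, d: bottom-left and bottom-right values
// tx, ty: position inside the 2x2 block, each in [0, 1]
float Bilerp( float a, float b, float c, float d, float tx, float ty )
{
    float p = Lerp( a, b, tx ); // along x on the top row
    float q = Lerp( c, d, tx ); // along x on the bottom row
    return Lerp( p, q, ty );    // along y between the two rows
}
```

Sampling the exact center of the block, Bilerp( 0, 10, 20, 30, 0.5f, 0.5f ), returns 15, the average of all four values. This is why a half-texel offset blends neighboring texels together.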

Multi-Threading

In my highly opinionated opinion there are no good resources available anywhere on good ways to go about multi-threading. There just aren’t. Why? Who knows! Perhaps the dudes who are good at multi-threading are so dang tired after work every day they have no energy to write documents like this one. Or perhaps they get so tired and rich (rich because they are paid oodles of money for being good at something so ridiculously hard) they just retire and leave the rest of us in the dust. Jokes aside I firmly believe there’s only one good way to multi-thread code, and that’s through a job system. Forget locks, semaphores, turnstiles, condition variables, critical sections, atomics, or anything else related to multi-threading. The job pool (aka thread queue, or job system) is by far my preferred method (at least for now until someone comes up with something better). Rather than pore over the details, yet again, I will leave the readers with a link to a presentation on this topic that I gave in the past (linked below). Implementation probably requires a semaphore, so the best resource on semaphores I know of, “The Little Book of Semaphores”, is linked below as well.
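To give a feel for the shape of a job system, here is a deliberately tiny sketch built on the C++ standard library. It uses a mutex and condition variables in place of a raw semaphore, and the class name JobSystem and its methods are made up for this example. It is not the design from the linked presentation. The point is that all of the synchronization lives inside the job system, so game code only ever pushes jobs and waits.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class JobSystem
{
public:
    explicit JobSystem( unsigned workerCount )
    {
        for ( unsigned i = 0; i < workerCount; ++i )
            m_workers.emplace_back( [this] { WorkerLoop(); } );
    }

    ~JobSystem()
    {
        {
            std::lock_guard<std::mutex> lock( m_mutex );
            m_quit = true;
        }
        m_wake.notify_all();
        for ( std::thread& t : m_workers )
            t.join();
    }

    // Queues a job and wakes one sleeping worker.
    void Push( std::function<void()> job )
    {
        {
            std::lock_guard<std::mutex> lock( m_mutex );
            m_jobs.push( std::move( job ) );
            ++m_pending;
        }
        m_wake.notify_one();
    }

    // Blocks until every pushed job has finished running.
    void Wait()
    {
        std::unique_lock<std::mutex> lock( m_mutex );
        m_idle.wait( lock, [this] { return m_pending == 0; } );
    }

private:
    void WorkerLoop()
    {
        for ( ;; )
        {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock( m_mutex );
                m_wake.wait( lock, [this] { return m_quit || !m_jobs.empty(); } );
                if ( m_jobs.empty() )
                    return; // quitting and nothing left to do
                job = std::move( m_jobs.front() );
                m_jobs.pop();
            }
            job(); // run outside of the lock
            {
                std::lock_guard<std::mutex> lock( m_mutex );
                if ( --m_pending == 0 )
                    m_idle.notify_all();
            }
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_wake; // signaled when jobs arrive or on quit
    std::condition_variable m_idle; // signaled when all jobs are done
    std::queue<std::function<void()>> m_jobs;
    std::vector<std::thread> m_workers;
    int m_pending = 0; // queued + currently running jobs
    bool m_quit = false;
};
```

Typical usage is to create one JobSystem with roughly as many workers as hardware threads, Push a batch of independent jobs (for example one per chunk of an array), and call Wait before using the results.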

https://en.wikipedia.org/wiki/Texture_filtering#Bilinear_filtering

https://en.wikipedia.org/wiki/Texel_(graphics)

http://www.randygaul.net/2014/09/24/multi-threading-best-practices-for-gamedev/

http://greenteapress.com/semaphores/downey08semaphores.pdf


Language Design

I’ve found that language design has shaped my viewpoint on what good code is. I would define language design as the concepts/algorithms revolving around compiling code into machine language, as well as the design of the language’s grammar. Von Neumann architecture is about taking machine instructions and feeding them to a processor, which executes them and produces an output. In this way code (instructions) is just a type of data, freely modifiable. As long as the correct sequence of bits is fed to the CPU, the program will execute accordingly. Understanding this can allow an engineer to increase productivity by thinking about the code they are writing, and how to leverage its patterns. For example, an engineer could write a code parser that interprets code. Once achieved, the parser transforms the code into another format (perhaps an abstract syntax tree), one which shows useful or interesting information not readily available from the raw code format. In turn the parser output can then be transformed back into new code. This type of code generation can be used to fuel very implementation-heavy tasks that might normally take a lot of tedious and monotonous work on the part of the engineer.

Lexer – String Matching

A lexer is a program that performs lexical analysis, and may also be referred to as a tokenizer. The job of the lexer is to produce tokens. A token is a meaningful representation of a lexeme. A lexeme is just a piece of raw code, like a keyword or function name. Tokens are usually implemented as an integer, or enumeration. Therefore a good way to implement a lexer is to scan source code and output integral tokens that represent each chunk of meaningful code-data. Writing a lexer can range from very simple to very complicated depending on the features required for a given language. For C-style languages Sean T. Barrett has implemented an absolutely wonderful lexer, perfect for use as a reference when constructing a production-ready lexer. Here is the source to Sean’s lexer.

There exist tools to generate lexers, such as the tool lex (which is commonly used with the parser generator yacc). However writing a custom lexer can be quite straightforward and not too time consuming, if the language of choice is a C-style language.

For Sean’s implementation style the concept is to store a small lex_state struct that keeps track of an offset into the source file, along with a couple other small data members. A single function, perhaps called Lex_Next( lex_state* state ) is used to grab the next token from the source file.
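As a rough sketch of that style (this is not Sean's actual API, and the token set here is made up and far smaller than what a real C lexer needs), a lex_state plus a single Lex_Next function might look like this:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

enum Token
{
    TOKEN_EOF,
    TOKEN_IDENTIFIER,
    TOKEN_NUMBER,
    TOKEN_SYMBOL
};

struct lex_state
{
    const char* source; // null-terminated source text
    int offset;         // where the next token starts
    std::string lexeme; // raw text of the most recent token
};

// Grabs the next token from the source and advances the offset.
Token Lex_Next( lex_state* state )
{
    const char* s = state->source;
    int i = state->offset;

    // skip whitespace between lexemes
    while ( std::isspace( (unsigned char)s[ i ] ) )
        ++i;

    int start = i;
    Token token;
    if ( s[ i ] == '\0' )
    {
        token = TOKEN_EOF;
    }
    else if ( std::isalpha( (unsigned char)s[ i ] ) || s[ i ] == '_' )
    {
        while ( std::isalnum( (unsigned char)s[ i ] ) || s[ i ] == '_' )
            ++i;
        token = TOKEN_IDENTIFIER;
    }
    else if ( std::isdigit( (unsigned char)s[ i ] ) )
    {
        while ( std::isdigit( (unsigned char)s[ i ] ) )
            ++i;
        token = TOKEN_NUMBER;
    }
    else
    {
        ++i; // any other single character, like = or ;
        token = TOKEN_SYMBOL;
    }

    state->lexeme.assign( s + start, (std::size_t)( i - start ) );
    state->offset = i;
    return token;
}
```

Feeding it the text "int x1 = 42;" yields an identifier, another identifier, a symbol, a number, a symbol, and finally TOKEN_EOF. A real lexer would go on to distinguish keywords from identifiers and handle strings, comments, and multi-character operators.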

https://en.wikipedia.org/wiki/Von_Neumann_architecture

https://github.com/nothings/stb/blob/master/stb_c_lexer.h

32 Language Design

Parsing

Parsing is a topic people typically tend to get very passionate about, and so I won’t say too much here; there are endless resources available at the tip of a google search. I will say my preferred method of parsing is with a recursive descent parser. In short a recursive descent parser will look at a single token at a time, along with the “next” token coming directly after the current token. The language being parsed should a-priori be designed such that at any given moment the meaning of the two tokens is enough to deduce exactly where in the language grammar the parser resides. A typical hand-written recursive descent parser will be very efficient (likely much more so than a generated parser), and can be extremely fun (and educational) to implement. I recommend staying away from backtracking, along with any other kind of parsing scheme, under the argument that it just isn’t necessary. Again, these are just my own personal conclusions, as I don’t see more complicated or exotic grammars as any better or more necessary compared to a straightforward language design that can be parsed with an LL(1) recursive descent parser.

One of my favorite resources of what is, in my opinion, a pretty good parser to study is the one Lua has equipped itself with. Here is a paper describing the implementation of Lua 5.0. Lua’s source code is freely viewable online. One exceptionally practical implementation of lexing and parsing is within the Handmade Hero video series. Search for the videos about meta-programming.

If anyone wants more reading on implementing compilers I would suggest “Compilers” by Aho et al. I highly recommend getting the older edition from 1988. The older edition contains C code (unlike the newer ones with verbose Java), and I find it overall much easier to read and more practical. I’ve read maybe 6 or 7 other compiler books and honestly none of the others even come close!

One word of practicality: if implementing a custom parser for a C-style language, I’ve found parsing expressions to be the most difficult thing about it. I’ve written my experiences on this topic here.
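For a taste of recursive descent, here is a small sketch that parses and evaluates arithmetic expressions. To stay short it looks ahead one character instead of one token, has almost no error handling, and uses a tiny made-up grammar: expression = term { (+|-) term }, term = factor { (*|/) factor }, factor = number | ( expression ) | - factor. Each grammar rule becomes one function.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>

struct Parser
{
    const char* cursor; // current position in the source text
};

// Skips whitespace and returns the next character without consuming it.
static char Peek( Parser* p )
{
    while ( std::isspace( (unsigned char)*p->cursor ) )
        ++p->cursor;
    return *p->cursor;
}

double ParseExpression( Parser* p );

// factor = number | ( expression ) | - factor
double ParseFactor( Parser* p )
{
    char c = Peek( p );
    if ( c == '(' )
    {
        ++p->cursor;
        double value = ParseExpression( p );
        if ( Peek( p ) == ')' )
            ++p->cursor;
        return value;
    }
    if ( c == '-' )
    {
        ++p->cursor;
        return -ParseFactor( p );
    }
    char* end = nullptr;
    double value = std::strtod( p->cursor, &end );
    p->cursor = end;
    return value;
}

// term = factor { (*|/) factor }
double ParseTerm( Parser* p )
{
    double value = ParseFactor( p );
    for ( ;; )
    {
        char c = Peek( p );
        if ( c == '*' )      { ++p->cursor; value *= ParseFactor( p ); }
        else if ( c == '/' ) { ++p->cursor; value /= ParseFactor( p ); }
        else return value;
    }
}

// expression = term { (+|-) term }
double ParseExpression( Parser* p )
{
    double value = ParseTerm( p );
    for ( ;; )
    {
        char c = Peek( p );
        if ( c == '+' )      { ++p->cursor; value += ParseTerm( p ); }
        else if ( c == '-' ) { ++p->cursor; value -= ParseTerm( p ); }
        else return value;
    }
}

double Evaluate( const char* text )
{
    Parser p = { text };
    return ParseExpression( &p );
}
```

Operator precedence falls out of the call structure: ParseExpression only ever sees whole terms, so Evaluate( "1 + 2 * 3" ) returns 7 while Evaluate( "(1 + 2) * 3" ) returns 9. No backtracking is needed since one character of lookahead always decides which rule applies.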

https://en.wikipedia.org/wiki/Context-free_grammar

https://en.wikipedia.org/wiki/Recursive_descent_parser

https://www.lua.org/doc/jucs05.pdf

https://handmadehero.org/

http://www.randygaul.net/2015/06/15/parsing-c-style-expressions/



Code Generation

Once code is parsed the real juicy information can be known. Information about types, the form of the code, and pretty much anything else of interest is readily available. How to store this information is not an easy matter to decide, however. Should the parser output new code directly? Should an abstract syntax tree be formed? Should some kind of symbol table be present? What data structures should be used to represent the code? How should these data structures be transformed? These are all open questions that I myself do not have the answers to. However I can share my own practical experiences in hopes someone else can figure out something more on this topic than myself.


Type Reflection

Without a doubt type reflection is a must-have for any large project. The ability for code to intimately understand the data it operates upon allows for automation of very tedious and repetitive tasks, the big one being serialization. Unfortunately type reflection isn’t really a thing in C or C++ and we are all filled with hatred and lament at this fact. Ideally it would be invaluable to be able to write this in C:

struct Player
{
    float x;
    float y;
};

member_t* m = Player.members;
for ( int i = 0; i < countof( m ); ++i ) { /* ... */ }

The reason is that writing a general purpose serializer (albeit a simple one without many features) could be as easy as:

void Serialize( void* memory, type_t* t, FILE* fp )
{
    if ( t->members )
    {
        // compound type: recurse into each member
        member_t* m = t->members;
        for ( int i = 0; i < countof( m ); ++i )
        {
            member_t* m0 = m + i;
            Serialize( offset( memory, m0 ), m0->type, fp );
        }
    }
    else
    {
        // primitive type: write out the value itself
        switch ( t->typeid )
        {
        case _INT_T:
            fprintf( fp, "%d", *(int*)memory );
            break;
        /* other cases ... */
        }
    }
}

https://en.wikipedia.org/wiki/Serialization

http://www.codersnotes.com/notes/cpp-rant-2/


Where the member_t and type_t could look something like this:

struct type_t
{
    char* name;
    int size;
    member_t* members;
};

struct member_t
{
    type_t* type;
    char* name;
    int offset;
};

Equipped with a lexer and parser an engineer could quite easily generate the above code definitions and instantiate various type_t and member_t instances for all the data types present within the code. New types of data can even be defined and described at run-time! Converting between different versions of data can also make sense. An algorithm to do so would loop over the member_t’s of version A (older) and mutate them into an instance of version B’s members. I’ve seen these tactics implemented at various game studios, often with custom technology, ranging from fairly simple to way over-engineered. Here is the best introduction to this topic I know of.
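Here is a compilable sketch of the idea with the type tables written out by hand, the way a code generator might emit them. The member_count field, the kFloatType and kPlayerType tables, and serializing into a std::string instead of a FILE* are my own assumptions to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

struct member_t;

struct type_t
{
    const char* name;
    int size;
    const member_t* members; // null for primitive types
    int member_count;
};

struct member_t
{
    const type_t* type;
    const char* name;
    int offset; // byte offset of the member within its parent
};

struct Player
{
    float x;
    float y;
};

// Hand-written here; a generator built on a lexer + parser would emit these.
static const type_t kFloatType = { "float", (int)sizeof( float ), nullptr, 0 };
static const member_t kPlayerMembers[] = {
    { &kFloatType, "x", (int)offsetof( Player, x ) },
    { &kFloatType, "y", (int)offsetof( Player, y ) },
};
static const type_t kPlayerType = { "Player", (int)sizeof( Player ), kPlayerMembers, 2 };

// Appends "name=value;" for every member, recursing into compound types.
void Serialize( const void* memory, const type_t* t, std::string* out )
{
    if ( t->members )
    {
        for ( int i = 0; i < t->member_count; ++i )
        {
            const member_t* m = t->members + i;
            *out += m->name;
            *out += "=";
            Serialize( (const char*)memory + m->offset, m->type, out );
            *out += ";";
        }
    }
    else if ( t == &kFloatType )
    {
        char buffer[ 32 ];
        std::snprintf( buffer, sizeof( buffer ), "%g", *(const float*)memory );
        *out += buffer;
    }
    /* other primitive types ... */
}
```

Serializing a Player{ 1.5f, -2.0f } with kPlayerType produces the text x=1.5;y=-2; without Serialize knowing anything about Player at compile time. Adding a new member only requires a new row in the member table.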

Software Design and Architecture

File dependencies are often the bane of large projects. Each individual file will likely be compiled as a separate translation unit. This sucks as each file probably pulls in a ton of common code shared across all translation units by means of headers. stdio.h is a good example. All these headers do is set up symbols for the compiler in order to perform program linkage (and to verify program correctness). This means that a precompiled header can be used as a “hacky” solution. Common headers can be packaged up into a special translation unit and compiled once. All other translation units can just copy-paste the precompiled header unit at the top of their own unit, thus massively reducing compile times for all translation units. I call this a “hack” because it’s just a band-aid solution. The real problem is that compiling code in tons of different translation units is silly! Sure, having multiple files is great, but actually separating code into different translation units should probably not happen nearly as often as it does in the modern world.

https://www.youtube.com/watch?v=hWDZ3Yy-NMA

36 Software Design and Architecture

The argument in favor of many translation units is: we have cool “build tools” that let us compile only the translation units that were modified! Tools like make, or msbuild. Well to that I say “Guphaw!” Have fun with your fancy build tools when you require a dedicated salaried engineer to maintain them. I’ll happily stick with my own unity build for as long as possible. I can say I’ve seen this type of thing happen at one studio I’ve been to, and I sure loved it. I’ve also done it on my own personal projects which gave me great personal happiness. As for a much larger codebase, unfortunately I just don’t have the experience to back myself up here.

File Dependencies and Messaging

Messaging is another one of those topics people seem to get very passionate about. Unfortunately whenever a bunch of programmers get passionate about something they often obfuscate and over-engineer unwieldy solutions, and messaging is one such case. Messaging is just about sending data from one part of the program to another. This can be across various parts of the program, across function calls, across time, or across a network (or across anything else you can think of). That’s it. The simplest form of messaging might be considered a plain old C-style function call, which incurs some kind of jump assembly instruction, along with some stack allocation. In larger projects with many lines of code it’s oftentimes best to make sure lots of data can be sent over a more opaque connection compared to that of a plain function call. This allows various bits of code to not require explicit data type declarations (like those seen in a header), and instead focus more on the algorithm of sending the data, often as a stream of bits. Here is an article I wrote a while ago with a decent introduction to the concept of messaging. Much more discussion about how to implement messaging is a bit beyond the scope of this document, as the author does not really take an interest in things much more complicated than a function call.
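One possible shape for such an opaque connection is sketched below: the sender and receiver only agree on an integer message type and a blob of bytes, so neither needs the other's headers. The Message struct and MessageBus class are hypothetical names invented for this example, not something from the linked article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// An opaque message: an integer id plus a blob of bytes.
struct Message
{
    int type;
    const void* data;
    std::size_t size;
};

class MessageBus
{
public:
    using Handler = std::function<void( const Message& )>;

    // Registers a handler to be called for every message of the given type.
    void Subscribe( int type, Handler handler )
    {
        m_handlers[ type ].push_back( std::move( handler ) );
    }

    // Delivers the bytes to every handler subscribed to this type.
    void Send( int type, const void* data, std::size_t size ) const
    {
        auto it = m_handlers.find( type );
        if ( it == m_handlers.end() )
            return; // nobody is listening
        Message message = { type, data, size };
        for ( const Handler& handler : it->second )
            handler( message );
    }

private:
    std::unordered_map<int, std::vector<Handler>> m_handlers;
};
```

A damage system could Send( 1, &amount, sizeof( amount ) ) while a health system subscribes to type 1 and copies the bytes back out, with the two systems sharing nothing but the integer id and the byte layout. Whether this indirection is worth it over a plain function call is exactly the judgment call discussed above.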

http://stackoverflow.com/questions/543697/include-all-cpp-files-into-a-single-compilation-unit

http://www.randygaul.net/2015/10/19/single-file-libraries-bundle-pl-incbin-pl/

http://www.randygaul.net/2013/05/20/component-based-engine-design/


API Design

API design is wicked difficult. I’m not too great at it myself! Please, readers, equip yourself with some of Casey Muratori’s API design knowledge! Casey may be one of the best people on the planet at designing effective APIs. Casey implemented the Granny SDK for game development, which is wildly successful and broadly regarded as a great API. In general it seems there are a few different kinds of APIs that Casey outlined in his video (linked below), take a look at one of his slides:

The “layer” portion would describe code that sits on top of a rock-solid (never changing) service, like a piece of hardware, perhaps a GPU (i.e. OpenGL or DirectX). An “engine” would be what most people are developing: software that solves a consumer’s problems, and often makes use of a bunch of pre-existing tools and paradigms. Examples would be Unity, Unreal, etc. Another example is a new engineer coming into a company with a big pre-existing codebase. That engineer will be writing “new” code in their small little box, whereas the huge “reused” box will be the company’s code. Finally we have the “component”, which defines an input and output for the user. Examples could be physics middleware, or the Granny SDK itself. In all instances it can be extremely difficult to decide what goes where, and what degree of control to expose to the user. Please, watch Casey’s video! He will do the topic much more justice than I, as I am a noob compared to him. If possible I recommend readers get a hold of tried and tested APIs like the Granny SDK, or the Havok physics SDK. Unreal’s source code is available, which is also a great way to learn about the Unreal API.

http://stackoverflow.com/questions/7440379/what-exactly-is-the-meaning-of-an-api

http://www.randygaul.net/2015/10/19/single-file-libraries-bundle-pl-incbin-pl/

https://www.youtube.com/watch?v=ZQ5_u8Lgvyk

http://www.radgametools.com/granny/sdk.html

https://github.com/RandyGaul/qu3e


Study the sources of these products (or other similar products) and try to form opinions. Another good example is the Maya SDK.

Example Layer Problem

In an attempt to provide readers with some food for thought, below is an example scenario of implementing a layer API. Say we are implementing an OS layer that abstracts underlying hardware, and exposes sys calls to the user. Perhaps we have written the file abstraction and implemented POSIX-style sys calls such as open, close, flush, and write. Our job is to write some documentation on these sys calls after implementing them. Here is some pseudo documentation:

int open( const char *pathname, int flags )
    o returns a file descriptor given a path

void close( int filedes )
    o closes a file given a file descriptor

And so on and so forth for the other functions. One day a user submits a bug report saying they cannot figure out WTF is up with this code. It crashes on the close syscall:

int fd = open( path, flags );
write( fd, memory, count );
// ...
close( fd ); // assert( fp != 0 ); !!!

What might the problem be? The user clearly opened, wrote and closed a file. All seems well, right? close is crashing with a fairly cryptic assert message of fp != 0. Can you imagine what the problem may be? In this case the user forgot to call flush to submit the changes to the file, and as such the OS file layer is asserting due to its internal pointer residing at 0. The user couldn’t quite figure out what the problem was because the internal code had a fairly cryptic assert, and the user didn’t know how close was implemented. However, the implementer of close (us) was completely happy with the use of the assert. It immediately caught the user bug, and was also valuable during development to assert implementation correctness. Should an exception have been thrown by close, so the user could catch the exception and read the error themselves? If that’s the case then exceptions need to be enabled by all users wishing to use close. Should a more verbose assert have existed? Perhaps. Should an error message be returned by close? If so these error messages need to be documented, and this requires the user to read the documentation and then correctly implement retrieving + decoding the error message. All in all there is no best solution as they all have tradeoffs. This is API design. It’s difficult.

https://en.wikipedia.org/wiki/System_call
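To make one of those options tangible, here is a sketch of the "return an error code" route using a fake in-memory file table. Everything here (the FileError enum, the Fake* functions, the message strings) is invented for illustration and has nothing to do with real POSIX behavior.

```cpp
#include <cassert>

enum FileError
{
    FILE_OK = 0,
    FILE_ERROR_BAD_DESCRIPTOR,
    FILE_ERROR_UNFLUSHED_WRITES
};

// The part the user has to read the documentation to discover.
const char* FileErrorString( FileError error )
{
    switch ( error )
    {
    case FILE_OK:                     return "ok";
    case FILE_ERROR_BAD_DESCRIPTOR:   return "file descriptor is not open";
    case FILE_ERROR_UNFLUSHED_WRITES: return "close called with pending writes, call flush first";
    }
    return "unknown error";
}

struct FakeFile
{
    bool open;
    bool dirty; // true when writes have not been flushed yet
};

static FakeFile g_files[ 8 ];

// Returns a descriptor, or -1 when the table is full.
int FakeOpen()
{
    for ( int i = 0; i < 8; ++i )
    {
        if ( !g_files[ i ].open )
        {
            g_files[ i ].open = true;
            g_files[ i ].dirty = false;
            return i;
        }
    }
    return -1;
}

void FakeWrite( int fd ) { g_files[ fd ].dirty = true; }
void FakeFlush( int fd ) { g_files[ fd ].dirty = false; }

// Instead of asserting, close reports what went wrong.
FileError FakeClose( int fd )
{
    if ( fd < 0 || fd >= 8 || !g_files[ fd ].open )
        return FILE_ERROR_BAD_DESCRIPTOR;
    if ( g_files[ fd ].dirty )
        return FILE_ERROR_UNFLUSHED_WRITES;
    g_files[ fd ].open = false;
    return FILE_OK;
}
```

With this design the forgotten flush shows up as FILE_ERROR_UNFLUSHED_WRITES with a readable message rather than a cryptic assert. The tradeoff is that nothing forces the user to check the return value at all, which is the kind of cost each alternative carries.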

Game Architecture In this section I will attempt to take readers on a small thought-experiment of designing the architecture for a pretend platformer game. Personally I am not a fan of game engine architectures. Currently my opinion is that “architecture” gets in the way of making a game. To that end I don’t even really believe in game engines. I believe in writing a custom codebase (while reusing some common pieces) tailored to solve specific problems of the game. This preference only works well if the core design of the game is known from project start, and rapid iteration is achieved (somehow) in the early stages of the project. The rapid iteration needs to have a smooth transition into “shipping mode” as the project matures as at some point iteration and shipping will become mutually exclusive. Given these constraints I will attempt to walk readers through a brief design process of mentally constructing a valid game engine for a single-player platformer style game.

Compilation

Right off the bat let’s set up the game engine to have fast compilation times, and make it easy for people to add more files to the project at will. Let’s use the unity build style (described earlier in this document), where all files are bundled together to form a single translation unit for the compiler to churn through at lightning speed. Next up let’s use a hand-made script to compile the project. A batch file, Python script, bash script, or anything else will suffice; anything that can execute the compiler and pass along the main translation unit (along with any compiler flags) will work just fine. Compiling is as simple as executing the script, and should take less than a second throughout project development.

Iteration

Ensuring fast iterations can be done through three main methods: reading data files off disk at run-time that determine program behavior, scripting language integration, and run-time compilation of C++.

The first option is known as data driving the program, and can be implemented in a manner as simple as reading values from a text file upon program startup. The benefit here is that tunable parameters can be added as necessary; however, the bulk of the work is in the C++ code that interprets the data files. Level editors fit this model! A level editor is likely creating these data files so the game can read them and adjust execution as necessary. A level editor is a good example of building a tool to generate the data that is driving the program. The downside to this style is that it can take a very long time to create these kinds of asset creation tools.

The second option would be to incorporate a scripting language, such as Lua, into the game project. The benefit here is that Lua files can be reloaded at run-time in order to adjust program logic while the game is running. This takes the whole “data driven” idea a step further! Not only are game assets and tunable parameters up for change, but so is the game’s own logic. The downside here is that scripting languages are not as efficient as native C code.

The final option is similar to incorporating a scripting language, but is not quite as flexible. This tradeoff in flexibility comes with the benefit of native C code for blazing fast speed! Let’s choose this option. One way to implement run-time compiled C++ is to place all game logic within a dynamic library, such as a DLL (or a .so file on Linux). The compiler can be executed while the program is running to write out a new dynamic library. The program can then notice a new library is available, unload the one it currently has, load the new one, and continue running with the new logic.

Losses in flexibility with this style are: function addresses can change when a library is reloaded, thus destroying many functor implementations; virtual table addresses can likewise change if they are linked into the dynamic library; if the C run-time was statically linked to the DLL, all heap memory allocated by the DLL will be lost upon reload; all static memory (bss, data, code section, etc.) of the process space for the DLL will be overwritten and lost upon reload; and perhaps more limitations I have forgotten to list here. Casey Muratori of Handmade Hero has shown a great way to implement this style of dynamic code compilation in his various videos.
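The notice, unload, and reload cycle described above can be sketched as follows for POSIX systems, using dlopen, dlsym, and dlclose (on Windows the rough equivalents are LoadLibrary, GetProcAddress, and FreeLibrary). This is a minimal sketch under stated assumptions and not Casey Muratori’s implementation: the GameCode struct, the exported symbol name game_update, and the choice to detect a new library through its file modification time are all hypothetical. Older glibc versions also need -ldl when linking.

```c
#include <dlfcn.h>     /* dlopen, dlsym, dlclose (POSIX) */
#include <stddef.h>
#include <sys/stat.h>  /* stat, to read the library's modification time */
#include <time.h>

/* Hypothetical signature of the one function the game library exports. */
typedef void (*GameUpdateFn)(void* memory, float dt);

typedef struct GameCode
{
    void* handle;        /* result of dlopen, or NULL when nothing is loaded */
    GameUpdateFn update; /* re-fetched after every reload; do not cache it elsewhere */
    time_t last_write;   /* modification time of the file at the last load attempt */
} GameCode;

/* Returns the file's modification time, or 0 if the file cannot be inspected. */
static time_t file_write_time(const char* path)
{
    struct stat info;
    return stat(path, &info) == 0 ? info.st_mtime : 0;
}

void game_code_unload(GameCode* code)
{
    if (code->handle) dlclose(code->handle);
    code->handle = NULL;
    code->update = NULL;
}

/* Loads the library and looks up "game_update". Returns 1 on success, 0 on failure. */
int game_code_load(GameCode* code, const char* path)
{
    code->handle = dlopen(path, RTLD_NOW);
    code->update = NULL;
    code->last_write = file_write_time(path);
    if (!code->handle) return 0;

    /* POSIX allows converting the void* from dlsym to a function pointer. */
    code->update = (GameUpdateFn)dlsym(code->handle, "game_update");
    if (!code->update)
    {
        game_code_unload(code);
        return 0;
    }
    return 1;
}

/* Call once per frame: swaps in the new library when the file on disk changed. */
int game_code_reload_if_changed(GameCode* code, const char* path)
{
    time_t now = file_write_time(path);
    if (now == 0 || now == code->last_write) return 0;
    game_code_unload(code);
    return game_code_load(code, path);
}
```

A host loop would call game_code_reload_if_changed each frame and then call code.update only when it is non-null. Because the old function address is invalid after a reload, the host always goes through the GameCode struct rather than keeping its own copy of the pointer.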

Memory Storage

Clearly defining where memory is stored and where it comes from is integral to the success of dynamic library reloading. There are a couple of options:

The main executable owns all memory. This memory is handed to the dynamic library to operate upon, but the dynamic library does not own this memory. When the library is reloaded no program memory is lost. This requires the memory to be handed to the library over an interface, and the library must store a pointer to this memory. The main executable and the library can both statically link all project code definitions, thus simplifying data definitions. Or, the main executable need not know about the dynamic library’s data definitions at all, and can simply hand opaque memory to the dynamic library.

The dynamic library owns whatever memory it needs. Upon reloading, the dynamic library serializes all run-time objects into temporary memory held by the main executable. Once reloaded, all objects are deserialized back into the dynamic library’s run-time memory.
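The first option can be sketched in C as below. The names GameMemory, GameState, host_create_memory, and game_update are hypothetical, and for the sake of a self-contained example the host side and the library side live in one file; in a real project game_update would be exported from the reloadable library. The sketch also assumes the pointer is passed on every call instead of being stored inside the library, so that nothing depends on the library’s static memory surviving a reload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical block of memory owned by the main executable. */
typedef struct GameMemory
{
    void* storage;
    size_t size;
    int is_initialized;
} GameMemory;

/* Layout known only to the game code; the host treats storage as opaque bytes. */
typedef struct GameState
{
    float player_x;
    int frame_count;
} GameState;

/* Host side: allocate once at startup, zero-filled, and keep it across reloads. */
GameMemory host_create_memory(size_t size)
{
    GameMemory memory;
    memory.storage = calloc(1, size);
    memory.size = size;
    memory.is_initialized = 0;
    return memory;
}

void host_destroy_memory(GameMemory* memory)
{
    free(memory->storage);
    memory->storage = NULL;
    memory->size = 0;
}

/* Library side: operates on memory it was handed but does not own. */
void game_update(GameMemory* memory, float dt)
{
    GameState* state;
    assert(memory->storage != NULL);
    assert(memory->size >= sizeof(GameState));
    state = (GameState*)memory->storage;

    if (!memory->is_initialized)
    {
        state->player_x = 0.0f;
        state->frame_count = 0;
        memory->is_initialized = 1;
    }

    state->player_x += 10.0f * dt; /* hypothetical movement speed */
    state->frame_count += 1;
}
```

Since GameState lives entirely inside the host’s allocation, swapping in a new version of game_update leaves player_x and frame_count untouched, provided the struct layout has not changed between builds.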

I highly recommend the first option due to ease of implementation. However, the second option may be preferred! For example, if the user wishes to use virtual dispatch, the second method can gracefully handle patching of virtual tables upon deserialization back into the run-time. In order to support polymorphism with the first style the user may need to implement their own virtual table, either through actual function pointers and an actual table, or through C switches (or if-else combos). These options do not break down like C++ virtual tables, since all of these methods reside in the process space of the dynamic library itself, and when reloaded will be set to new and appropriate values. Once a scheme is chosen and implemented, developers will likely want to recompile and reload code while the game is running via an in-game hotkey. In this case it would be wise to implement a mechanism to do so within the game itself.
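Both hand-rolled dispatch styles mentioned above can be sketched in C as follows. The entity types, the Entity struct, and the update functions are hypothetical names for the pretend platformer. The key assumption is that objects store only a plain type tag, never a pointer into the library, so objects kept in memory owned by the executable stay valid across a reload.

```c
#include <assert.h>

/* Hypothetical entity kinds for the pretend platformer. */
typedef enum EntityType
{
    ENTITY_PLAYER,
    ENTITY_WALKER,
    ENTITY_TYPE_COUNT
} EntityType;

typedef struct Entity
{
    EntityType type; /* plain data: safe to keep in memory owned by the executable */
    float x;
    float velocity;
} Entity;

static void player_update(Entity* e, float dt) { e->x += e->velocity * dt; }
static void walker_update(Entity* e, float dt) { e->x -= e->velocity * dt; }

/* Style 1: a C switch on the type tag. */
void entity_update_switch(Entity* e, float dt)
{
    switch (e->type)
    {
    case ENTITY_PLAYER: player_update(e, dt); break;
    case ENTITY_WALKER: walker_update(e, dt); break;
    default: assert(0 && "unknown entity type"); break;
    }
}

/* Style 2: an actual table of function pointers, indexed by the type tag.
   The table is static data of the library, so it is rebuilt with fresh
   addresses every time the library is loaded. */
typedef void (*EntityUpdateFn)(Entity* e, float dt);

static const EntityUpdateFn g_update_table[ENTITY_TYPE_COUNT] =
{
    player_update, /* ENTITY_PLAYER */
    walker_update  /* ENTITY_WALKER */
};

void entity_update_table(Entity* e, float dt)
{
    assert((unsigned)e->type < (unsigned)ENTITY_TYPE_COUNT);
    g_update_table[e->type](e, dt);
}
```

A C++ object, by contrast, carries a hidden pointer to its virtual table inside the object itself, which is exactly the part that goes stale when the table moves.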

Run-time Object Model

Jason Gregory’s book Game Engine Architecture has a great section on all the various choices of the run-time object model. I highly suggest reading his book! Whatever model is chosen can be incorporated into our pretend design.

Specific Solutions

The rest of the platformer game can be tackled on a per-feature basis. Since I stated we were making a platformer, some questions immediately arise:

Can the player jump?

Are there typical “Mario-style” enemies?

How does the player interact with obstacles or enemies?

Are there tiles in the game?

Is there advanced physics, such as bridges or joints?

What kind of animations do we want: skeletal, 2D frame-based, etc.?

What do levels look like?

How are the level transitions defined?

http://www.randygaul.net/2015/08/06/the-concept-of-polymorphism/

http://www.gameenginebook.com/


And the list can go on and on. These questions help to define the features the game requires. Each feature can be solved with a specific solution! Development in this pretend project would consist of picking features to implement, while trying to write only the code necessary to solve the given feature. In this way the “architecture” of the game is all the implementation details involved in achieving each feature. From here on lies the realm of game design…
