Tuesday November 26 (Sessions 1 and 2)

1. #include dependencies vs. forward declarations

Include dependencies:

#include “random_number_generator.h”

struct Something {

Random_Number_Generator generator;

};

Forward declaration:

struct Random_Number_Generator;

struct Something {

Random_Number_Generator *generator;

};

Forward declaration provides a tremendous increase in speed when used consistently in a large project. The disadvantage is that the compiler doesn’t know the size of the forward-declared type, so you can’t use that type directly in the header. You must instead use a pointer to the type. So the semantics of your program are affected (you need to dynamically allocate things sometimes when you might prefer not to). This way in which compile-speed concerns bleed into the semantics of your program can be viewed as a general weakness of C++.

2. Compiler timing experiments performed by Casey Muratori of RAD Game Tools

Date: Wed, 20 Sep 2000 19:09:24 -0400 (EDT)

From: Casey Muratori <>

Recently, I decided to run some very simple tests on my machine to get a baseline for just how bad the STL (and templates in general) can be. Even on my 1gHz machine with a supposedly super-fast memory subsystem and all that mumbo jumbo, I was able to bring the compiler to its knees with some very simple STL statements. Simple things like including and instantiating a std::string bloated compile time by a half a second per file, which can really add up for large projects. Instantiating something like std::map<std::string, int> was far worse, bloating compile time by more than a second per file. And, needless to say, over-the-top STL statements like std::list<std::vector<std::deque<... >, etc., were pure

death. In my most extreme case, I was able to get the compiler to spend over ten seconds per file instantiating just five such variables.

Disk activity is also seriously affected by template usage. Instantiating a single std::map can add 40k per obj with -Zi, and a std::map<std::string, ...> type instantiation adds another 35k or more. In the extreme cases, the complex five-variable STL declarations made each obj bloat by over 400k, even without debug symbols.

Debug symbols do not make as big a difference as I thought they would, although they do affect the file size dramatically (as one would expect, since the ungodly complicated names of the templates must be stored). Compiling with -Zi generally added about 1/4th more compile time, although it usually at least doubled all the file sizes.

Changing from the STL to my own template implementations did generally reduce all figures, since my templates are much more simplified versions (ie, I don't use traits and those sorts of things). However, not by enough to consider it acceptable for use in large projects.

Also, as a sidenote, I finally took some samples to figure out how much time including windows.h eats. It's actually quite considerable – over two seconds per file. With WIN32_LEAN_AND_MEAN that drops to a half a second, but WIN32_EXTRA_LEAN does not reduce it further.

- Casey

PS. For those interested, here is a concise listing of some of the test runs. Each snippet was copied to 20 separate cpps, all of which were compiled together to form an .exe. The time is for the complete compilation, including linking, as is given as MM:SS. The numbers in parentheses are file sizes, in kilobytes, for the exe, ilk, pdb, and obj files produced, respectively. Note that the obj size is for a single obj, so multiply by 20 if you want the total k used by all objs. The compiler was MSVC 12.00.8804 (6.0 sp4), and the only compiler switch was -Zi, so that the pdb and ilk would be produced.

---- Time: 00:01 (61, 104, 123, 0)

#include <stdio.h>

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:34 (61, 104, 189, 9)

#include <windows.h>

#include <stdio.h>

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:10 (61, 104, 173, 4)

#define WIN32_LEAN_AND_MEAN

#include <windows.h>

#include <stdio.h>

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:09 (61, 104, 173, 4)

#define WIN32_LEAN_AND_MEAN

#define WIN32_EXTRA_LEAN

#include <windows.h>

#include <stdio.h>

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:08 (61, 116, 132, 4)

#include <string>

#include <stdio.h>

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:08 (65, 133, 140, 9)

#include <string>

#include <stdio.h>

static std::string X;

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:08 (73, 143, 173, 11)

#include <string>

#include <stdio.h>

static std::string X[1024];

static std::string Y[1024];

static std::string Z[1024];

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:03 (61, 104, 123, 1)

#include <map>

#include <stdio.h>

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:08 (77, 258, 181, 40)

#pragma warning(disable:4786) // debug symbol truncated to 255 characters

#include <map>

#include <stdio.h>

static std::map<int, char> X;

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:08 (90, 264, 205, 42)

#pragma warning(disable:4786) // debug symbol truncated to 255 characters

#include <map>

#include <stdio.h>

static std::map<int, char> X[1024];

static std::map<int, char> Y[1024];

static std::map<int, char> Z[1024];

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:10 (61, 116, 132, 4)

#pragma warning(disable:4786) // debug symbol truncated to 255 characters

#include <map>

#include <string>

#include <stdio.h>

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:24 (81, 300, 214, 69)

#pragma warning(disable:4786) // debug symbol truncated to 255 characters

#include <map>

#include <string>

#include <stdio.h>

static std::map<std::string, std::string> X;

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:13 (61, 116, 132, 6)

#pragma warning(disable:4786) // debug symbol truncated to 255 characters

#include <map>

#include <string>

#include <list>

#include <vector>

#include <deque>

#include <set>

#include <stdio.h>

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:27 (90, 530, 304, 145)

#pragma warning(disable:4786) // debug symbol truncated to 255 characters

#pragma warning(disable:4503) // decorated length exceeded

#include <map>

#include <string>

#include <list>

#include <vector>

#include <deque>

#include <set>

#include <stdio.h>

static std::map<std::string,

std::list<std::vector<std::deque<std::set<int> > > > > X;

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 00:29 (98, 530, 361, 148)

#pragma warning(disable:4786) // debug symbol truncated to 255 characters

#pragma warning(disable:4503) // decorated length exceeded

#include <map>

#include <string>

#include <list>

#include <vector>

#include <deque>

#include <set>

#include <stdio.h>

static std::map<std::string,

std::list<std::vector<std::deque<std::set<int> > > > > X;

static std::map<std::string,

std::list<std::vector<std::deque<std::set<int> > > > > Y;

static std::map<std::string,

std::list<std::vector<std::deque<std::set<int> > > > > Z;

static std::map<std::string,

std::list<std::vector<std::deque<std::set<int> > > > > Q;

static std::map<std::string,

std::list<std::vector<std::deque<std::set<int> > > > > R;

static std::map<std::string,

std::list<std::vector<std::deque<std::set<int> > > > > S;

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

---- Time: 02:28 (151, 2122, 975, 836)

#pragma warning(disable:4786) // debug symbol truncated to 255 characters

#pragma warning(disable:4503) // decorated length exceeded

#include <map>

#include <string>

#include <list>

#include <vector>

#include <deque>

#include <set>

#include <stdio.h>

static std::map<std::string,

std::list<std::vector<std::deque<std::set<int> > > > > X;

static std::list<std::vector<std::deque<std::set<std::map<int,

std::string> > > > > Z;

static std::vector<std::deque<std::set<std::map<std::string,

std::list<int> > > > > Q;

static std::deque<std::set<std::map<std::string,

std::list<std::vector<int> > > > > R;

static std::set<std::map<std::string,

std::list<std::vector<std::deque<int> > > > > S;

static void foo(void)

{

{for(int x = 0;

x < 100;

++x)

{

printf("%d\n", x);

}}

}

3. Books

Recommended books:

Linear Algebra Done Right (Sheldon J. Axler)

Matrix Analysis (Horn and Johnson)

Hypercomplex Numbers (Kantor and Solodovnikov)

Clifford Algebras and Spinors (Pertti Lounesto)

A Digital Signal Processing Primer (Ken Steiglitz)

Digital Filters (R.W. Hamming)

Graphics Gems books (various editors)

Jim Blinn’s Corner books (Jim Blinn)

The Mythical Man-Month (Fred Brooks)

Calculus books, statistics books

Various academic research papers (search on citeseer.nj.nec.com/cs)

The thing about research papers is: there are a lot of bad ones. For example, you should not take a paper seriously just because it’s published somewhere like SIGGRAPH (most siggraph papers are not very good).

Not Very Recommended books:

“Learn DirectX” kinds of books

These don’t tell you anything you can’t learn on the Web for free.

Game Programming Gems series

There are a few good things in Game Programming Gems, but there are also

a lot of bad things. Useful mostly for getting some context about the kinds

of things game industry programmers think about.

Lecture Summary / Notes:

Game programming these days is a huge task. Teams can be large and have many programmers. We generally divide programmers into three types: engine, gameplay, and tools programmers.

Engine programmers work on core services that are used by the game: implementing character animation, collision detection, audio streaming, stencil shadows, whatever. Even if the team has licensed the engine for a high-tech complete game (like the Unreal or Doom engines) some engine work will probably need to be done. Tweaks will need to be done to existing features, and new features added – and this takes significant work, since code just is not magically modular like many OO Programming books would have you believe. (It is beyond the state of the art in software technology to create efficient modular code; this is not anyone’s fault, it is just a hard problem).

Gameplay programmers use the features provided by the engine programmers to make things happen for a specific game. They program UI elements (example: the light gem from Thief), game-specific AI (like squad-based enemy cooperation in Half-Life), game-specific special effects (like making the particle system in the engine do something particular), etc.

Tools programmers create software that lets game designers / map builders / 3D modelers and animators put their work into the game and try it out. Examples of tools are: animation compressor, portal generator, lighting analyzer.

Tools code can be relatively slow, since tools are often batch-oriented. This makes internal tool code easier to write than engine code. But tools need to have a very good user interface, which can take a lot of work.

Gameplay code can be slower than engine code, since typically it acts as a wrapper to the engine code; the engine code will tend to take the most CPU time.

Engine code must be very fast and very robust.

Demos always tend to look more impressive than full games. This is because the feature sets of games are larger and more complex, and features interfere with each other. Example of triangle stripping, vertex caching, and LOD all fighting with each other. Demos are free to isolate a small set of features and exploit them, ignoring things that don’t play well with those features. Games typically cannot do this to the same extent.

One of the most important tasks in engine programming is choosing a set of features that combines together well, to maximize the impact of the final game while minimizing the amount of programming work required to create that game.

Engine programmers need to keep in mind that the jobs of the level designers / animators / etc have to be as easy as possible for good looking content and good gameplay to make it into the final product. Try not to add engine features that make it more difficult to build content, unless those engine features are extremely compelling.

For artists to work efficiently, they need a build/test/modify cycle that is as short as possible. There definitely should not be a programmer in the loop, and ideally they don’t even need to run a separate program.

For programmers to work efficiently, the compile/edit/debug loop needs to be as tight as possible. Scripting languages are good for this, but only if they are strongly typed and have good debuggers. No scripting languages are strongly typed and have good debuggers. So we are stuck probably using C++ for almost everything. C++ compile times for large games tend to be terribly slow (can be as high as 20 minutes on state-of-the-art machines). Keeping compile times down is crucial to keeping programmers productive and code quality high. There are some simple techniques that can help tremendously, like forward-declaration and template avoidance (see above). Unfortunately even so, C++ has a syntax that inherently bloats compile time (like the need to make private things public). Being built around the mistaken concept of “1-pass file processing”, C++ actually ends up processing much of the source code many, many times over. It might be a good idea to fix these problems if you ever think about making your own language.

Wednesday November 27 (Sessions 3 and 4)

Fast sine/cosine for spinning particles in a particle system (don’t use a generalized rotation function, it’s too slow!)

From: Olli Niemitalo ()
Subject: Trick: Simultaneous parabolic approximation of sin and cos
Newsgroups: comp.dsp
Date: 2001-06-30 05:15:09 PST
Name: Simultaneous parabolic approximation of sin and cos
Category: Algorithmic
Application: When you need both sin and cos at once, and you need 'em fast,
and using multiplications and parabolic approximation is OK, try this.
Possible applications are audio panning, mixing fader, maybe even realtime
filter coefficient calculations and 2D/3D rotation transformations.
Advantages: Cheap, only one or two multiplies per approximation pair. No
discontinuities.
Introduction:
A quarter of a sinusoid cycle is approximated with a parabola. The remaining
quarters are horizontally and vertically flipped versions of the
approximated quarter. Instead of creating two different approximations, the
same polynomial is used for both sin and cos. The positive crossing point of
sin(angle) and cos(angle), at angle = pi/4, is made to correspond to x = 0
in the polynomial. This shifts the center of a quarter at x = 0 and turns
switching between quarters (and between sin and cos) into x and y sign
flips.
The Trick:
The range that is approximated is:
x = -1/2 .. +1/2, angle = 0 .. pi/2
Angle and x are related by:
x = angle 2 / pi - 1/2
The approximation: ("~=" stands for "approximately equals")
sin(angle) ~= sinapprox(x) = (2 - 4 c) x^2 + c + x
cos(angle) ~= cosapprox(x) = (2 - 4 c) x^2 + c - x
As you see, the only difference is in the last sign, so you can calculate
the part left from that first and then add or sub x to get sin or cos. For
different angle ranges, you have to flip signs of the whole equation or the
sign of the x term. See the example source code.
c is a constant that can be fine-tuned for different applications. With c
yet undefined, the polynomials hit 0 and 1 in the right places. Here are
some differently optimized (and approximated) values for c:
A) c ~= 0.7035
B) c ~= 0.71256755058
C) c = sqrt(2)/2 ~= 0.70710678118654752440
D) c = 3/4 = 0.75 (one mul choice!)
To compare the errors produced by different choices of c, we look at the
maximums and minimums of the following error measures, for different c:
error = sinapprox(x) - sin(angle)
Min: A) -0.0213 B) -0.0150 C) -0.0187 D) 0
Max: A) +0.0212 B) +0.0271 C) +0.0235 D) +0.0560
error = sqrt(sinapprox(x)^2 - cosapprox(x)^2) - 1
Min: A) -0.0261 B) -0.0155 C) -0.0214 D) 0
Max: A) 0 B) +0.0155 C) 0 D) +0.1250
The different c were optimized as follows:
A) was optimized for minimum maximum absolute difference to sin.
B) was optimized for minimum maximum absolute difference of
sqrt(sinapprox(x)^2+cosapprox(x)^2) to 1. This is important for example in a
rotation transformation, where you want the dimensions to stretch as little
as possible from the original. Note, though, that C) gives the lowest total
magnitude "wobbling" range.
C) was optimized similarly to B but never lets
sqrt(sinapprox(x)^2+cosapprox(x)^2) exceed 1. This is useful if you
calculate filter coefficients and don't want unstable poles. Also,
sinapprox(0) and cosapprox(0) give sqrt(2)/2, which is the correct
value.
D) was optimized for continuous differential, which reduces high harmonics.
This is good if you are creating sine and cosine signals rather than doing
"random access" to the function. Also, it eliminates the other
multiplication, making the algo one-mul and low bit-depth:
x = -1/2 .. +1/2
sinapprox(x) = 3/4 - x^2 + x
cosapprox(x) = 3/4 - x^2 – x

Friday November 29 (Sessions 5 and 6)

References:

“GeForce 256 and RIVA TNT Combiners”; white paper.

“Tool Postmortem: Climax Brighton’s Supertools”, Chris Hargreaves.

ATI RenderMonkey documentation and release notes.

Saturday November 30 (Sessions 7 and 8)

Lecture Notes:

Scenes tend to be divided into two different categories, “indoor” and “outdoor”. Games like first-person shooters are usually built with “indoor” engines; when they want a scene in an outside setting, they fake being outside, placing rocks and trees and such in the scene. But there are inevitably high “walls” separating one area from another. Counter-Strike does this frequently (see maps like cs_militia, de_nuke, de_dust). These are all indoor maps, though they appear to be outdoors since many areas don’t have ceilings.

The definition of a “true” outdoor scene is that you can stand in an area (perhaps a high area) and see for a large distance across the map in most directions. Flight simulators use outdoor engines. Games like Starsiege: TRIBES have hybrid engines that switch between outdoor and indoor.

Basic methods of reducing rendered geometry are: PVS, portals, occluders (also known as antiportals).

For true outdoor scenes, PVS and portals are not very useful. Occluders are the only reliable technique; but because of the definition of an outdoor scene, there are many times when you won’t have suitable candidates for occluders. Because of consistency-of-framerate arguments, this means maybe you probably shouldn’t try to do occlusion detection at all for a pure outdoor scene. I will spend a lot of time during the lecture presenting this argument because it is an important thing to understand and applies to many kinds of LOD (though analyzing outdoor terrain rendering is the easiest way to understand the issue).