Warning: The post contains a bit of technical stuff, along with some source code. Read on if you're interested.
I don't claim to be an expert, or even an intermediate in the vast field of graphic programming, but here are a few cool optimizations/replacements that I have been using for common math functions. This list is by no means complete, and by no means are most of the optimizations my inventions. If anyone wishes to suggest any more, please feel free to do so.
Graphic programming, 2D as well as 3D, involves extensive use of math, especially Vectors and Matrices. And it therefore extensively uses several math functions, like trigonometric ratios, squares and square-roots etc. Vectors also extensively involve "inverse-square-root" i.e. 1/sqrt(x) i.e. one divided by square root. This invSqrt is used in everything from finding magnitudes to finding angles between vectors, projections etc. The main point of this paragraph is that graphics heavily utilize math.
Now before we can make the next Doom3 engine, we need to first get the engine up-to-the-mark. Its no good having a fantastic engine intended for games, only to find it running like a slide-show on even high-end systems. The reason
John Carmack is famous is because he has written all the game engines of id software, famous for the Doom, Wolf3D, Quake-series, Doom3 engines/games, and reading his code reveals some amazing tricks and optimizations. All the code upto Quake-3 (inclusive) has been made Open-Source under GPL, so you can try
downloading and taking a peek if you're interested. I recommend that. So it turns out that most math functions can be hacked a lot to gain a great deal of optimization and speed.
A)
Inverse Square RootFirst I look at the famous "Inverse-Square-Root" function that was first seen in Quake-3 and is based on the
Newton-Raphson method for approximating functions. The function in this case is 1/sqrt(x). To read in detail about how this works, you can see
this and
this (more technical). The code:
float InvSqrt(float x)
{
float xhalf = 0.5f*x;
int i = *(int*)&x; // get bits for floating value
i = 0x5f375a86- (i>>1); // gives initial guess y0
x = *(float*)&i; // convert bits back to float
x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy
return x;
}
Test Result: I had tested this before using it in my code. For calculating the inverse square roots of 500000 numbers, the standard compiler method of 1/sqrt(x) took on an average
132 milliseconds, while the above code took on an average
18 milliseconds.
B)
Square Root:Then digging around Quake's source, I found an optimization for the simple square root function, which is essentially the same thing as above with a little modification. The Code:
float fSqrt(float number) {
long i;
float x, y;
const float f = 1.5F;
x = number * 0.5F;
y = number;
i = * ( long * ) &y;
i = 0x5f3759df - ( i >> 1 );
y = * ( float * ) &i;
y = y * ( f - ( x * y * y ) );
y = y * ( f - ( x * y * y ) );
return number * y;
}
Test Result: To calculate the square root of 500000 numbers, the standard compiler function took on an average
110 milliseconds, while the above code took on an average
26 milliseconds.
C)
Square, Cube etc.This I found on my own. This method is best used if you know the exponent (the power to raise to) beforehand. There are two ways of raising a number to a specific power. One is to use the simple compiler pow(x, n) function which raises x to the power n. I've often observed people using this method if 'n' is large.. above 5 or 6, even if they know beforehand (while coding) the power to raise to. So they use the pow() function. The other way is to simply multiply that number as many times as required by itself. So if I want to raise x to 6, I'll do: [x = x*x*x*x*x*x]. Now it might seem odd, since this method looks so ugly (imagine doing that for 15th power), but believe it or not, the speed difference is amazing. Use the second method always, even if you're raising to the 15th power.
Test Result: I performed two separate tests. In each case, as usual, I took 500000 test numbers. I tried simple "squaring" using the pow() compiler function, and it took on an average
45 milliseconds. I tried the other ugly method of x*x, and it took
2 millseconds. I then tried raising 500000 numbers to the 9th power, and the pow() compiler method took on an average
67 milliseconds, while the [x*x*x*x*x*x*x*x*x] method took just
3 milliseconds. This proves this uglier way to be much more efficient.
D)
Trigonometric RatiosThis is quite a common trick, and I therefore won't be putting up a result sheet for this one. The idea lies in making a 'lookup table' beforehand for all those trigo functions you are using. This is because the compiler trigo functions are VERY expensive (take time) and its bad to use them while rendering. So suppose we use cosine in our program, what we do is, make a table (array) of all the angles that the program might use and a table (array) of the cosines of those angles. So the speed boost is immediate, as picking a random element in the array (random access) is a very fast operation. Now generally we have an idea of the range of angles that we might use, and the smallest required intervals between each angle, for example, we might just need whole number angles, not fractional angles, thus we need to make, say, just 90 cosines. Or if the smallest step between angles is 0.5 degrees, we need to calculate the cosine of 0.0, 0.5, 1.0, 1.5 and so on.. while the program is starting up. After that we need to just look up the table and get the cosine very quickly. Incase the angle required is not *exactly* present in the table, like we need 13.7 degrees and we have 13.5 deg in the table, it makes sense to pick the cosine of 13.5 and be happy with it, since in Graphic programming, for real-time work it is smart work to make small compromises here and there to gain an great deal of speed.
PS: I think I need to change the blog theme. I tend to write long entries and this theme squeezes the text into a narrow column making it look dauntingly long.