cmdr2's blog: maths

Showing posts with label maths. Show all posts

Sunday, May 25, 2008

Implementation of the 'Threshold Whole Square Method'

So here's the 'C' implementation of the 'Threshold Whole Square Method' I talked about (here). This one is only for 'long' data-type, but I've written a generic one that can deal with float types too, but I'll post that one later.

Code:

long lsqr(long n) {
if (n > 1)
{
long twoBase = 1, i;
for (i = 1; twoBase <= n; i++)
{
twoBase <<= 1;
}
twoBase >>= 1;
long k = (long)(n - twoBase);
long sqrN = (long)(twoBase << i - 2);
return (k == 0 ? sqrN : sqrN + lsqr(k) + ((k * twoBase) << 1));
}
else
return n;
}

Basically, the variable twoBase is the nearest threshold which is a power of 2, so that it can be used as 'n' in the formula mentioned in that other post. This is done as the whole square of this number can be calculated very cheaply by a simple left shift operation. After that, the difference between the given number and this threshold is calculated to determine 'k', i.e. the offset. Then a simple application of the formula gives the result. The advantage of this method is that the we need to effectively calculate the whole-square of just a very small number (the offset 'k'), instead of the conventional calculation of the entire number 'n'. This make this method marginally faster than the compiler's method, but this one can only be used for 'long' data types and should be used carefully.

Quick calculation of whole squares using "Threshold Method"

Not a big deal, millions might have already guessed it or might be using it. Just found it useful to mentally calculate the square of the next number quickly, if you know the square of a basic 'check-point' number like 10, 15, 20, 25, 30 etc.

The standard formula I use for such a technique is:

sqr(n+k) = sqr(n) + sqr(k) + 2*k*n
For e.g.: sqr(29) = sqr(25+4) = 625 + 16 + 2*4*25 = 625 + 216 = 841

Thus the above sqr(n+k) method of finding squares is very useful for quick calculation of squares using the left shifts in even multiples to find the value of the nearest power of 2 and then adding 'k' to determine the desired whole square quickly for computer algorithms as well as human calculations.

For humans, it is easy to calculate with the nearest multiple of 10 or 5 as 'n' and 'k' as the extra required to achieve the number (like in the previous example). For computers, it will be quick to take the nearest power of 2 as 'n', as they can quickly calculate the square of that n by left shifting. I did write an 'C' implementation of such a "Threshold Whole Square" algorithm, which can be found here.

Shashank

PS: The ball in the picture is called a "Spherical Cube". Its made of 6 squares. Yes, they are squares, even though they don't look like one. :-)

Saturday, May 17, 2008

The Lore of the Inverse Square Root function

This one is for all the 3D geeks!

I had posted some useful optimizations for Graphic programmers earlier, and one of the optimizations I had suggested was one I saw in the Quake 3 source code, which performs the inverse square root, i.e 1 / sqrt(x). In 3D Graphics, the question of who came up with the 'magic' constant 0x5f3759df continues to be a question that 3D geeks still seek to answer! Most attribute it to the legendary programmer John Carmack, the man behind the Quake, Doom and all id Software Game Engines. However there are several other people attributed to it, and John himself refused to take claim for the creation of the constant in the mail:

-----Original Message-----
From: John Carmack
Sent: 26 April 2004 19:51
Subject: Re: Origin of fast approximated inverse square root

At 06:38 PM 4/26/2004 +0100, you wrote:

>Hi John,
>
>There's a discussion on Beyond3D.com's forums about who the author of
>the following is:
>
>float InvSqrt (float x){
> float xhalf = 0.5f*x;
> int i = *(int*)&x;
> i = 0x5f3759df - (i>>1);
> x = *(float*)&i;
> x = x*(1.5f - xhalf*x*x);
> return x;
>}
>
>Is that something we can attribute to you? Analysis shows it to be
>extremely clever in its method and supposedly from the Q3 source.
>Most people say it's your work, a few say it's Michael Abrash's. Do
>you know who's responsible, possibly with a history of sorts?

Not me, and I don't think it is Michael. Terje Matheson perhaps?

John Carmack

Why this is so important could be understood by the fact that Chris Lomont, a professor at Purdue took up the task of deriving that constant by mathematical means for the initial approximation of the Newton Raphson method (that the magic code uses), and his 'mathematically and theoretically' brilliant approximation constant turned out to perform WORSE than the constant found in the Quake III source code. So the question that arises is how John Carmack, or the author who originally wrote the code managed to figure out the magic approximation. That could be pretty useful I think for designing other fast mathematical functions.

So this article on Beyond3D.com does a great job of tracing down the original author, or atleast a major contributor, after a great deal of digging around. Check it out:
Origin of Quake III's Fast Inverse Sqrt()

Sunday, June 25, 2006

Effective optimizations for graphic programmers

Warning: The post contains a bit of technical stuff, along with some source code. Read on if you're interested.

I don't claim to be an expert, or even an intermediate in the vast field of graphic programming, but here are a few cool optimizations/replacements that I have been using for common math functions. This list is by no means complete, and by no means are most of the optimizations my inventions. If anyone wishes to suggest any more, please feel free to do so.

Graphic programming, 2D as well as 3D, involves extensive use of math, especially Vectors and Matrices. And it therefore extensively uses several math functions, like trigonometric ratios, squares and square-roots etc. Vectors also extensively involve "inverse-square-root" i.e. 1/sqrt(x) i.e. one divided by square root. This invSqrt is used in everything from finding magnitudes to finding angles between vectors, projections etc. The main point of this paragraph is that graphics heavily utilize math.

Now before we can make the next Doom3 engine, we need to first get the engine up-to-the-mark. Its no good having a fantastic engine intended for games, only to find it running like a slide-show on even high-end systems. The reason John Carmack is famous is because he has written all the game engines of id software, famous for the Doom, Wolf3D, Quake-series, Doom3 engines/games, and reading his code reveals some amazing tricks and optimizations. All the code upto Quake-3 (inclusive) has been made Open-Source under GPL, so you can try downloading and taking a peek if you're interested. I recommend that. So it turns out that most math functions can be hacked a lot to gain a great deal of optimization and speed.

A) Inverse Square Root
First I look at the famous "Inverse-Square-Root" function that was first seen in Quake-3 and is based on the Newton-Raphson method for approximating functions. The function in this case is 1/sqrt(x). To read in detail about how this works, you can see this and this (more technical). The code:

float InvSqrt(float x)
{
float xhalf = 0.5f*x;
int i = *(int*)&x; // get bits for floating value
i = 0x5f375a86- (i>>1); // gives initial guess y0
x = *(float*)&i; // convert bits back to float
x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy
return x;
}
Test Result: I had tested this before using it in my code. For calculating the inverse square roots of 500000 numbers, the standard compiler method of 1/sqrt(x) took on an average 132 milliseconds, while the above code took on an average 18 milliseconds.

B) Square Root:
Then digging around Quake's source, I found an optimization for the simple square root function, which is essentially the same thing as above with a little modification. The Code:

float fSqrt(float number) {
long i;
float x, y;
const float f = 1.5F;
x = number * 0.5F;
y = number;
i = * ( long * ) &y;
i = 0x5f3759df - ( i >> 1 );
y = * ( float * ) &i;
y = y * ( f - ( x * y * y ) );
y = y * ( f - ( x * y * y ) );
return number * y;
}
Test Result: To calculate the square root of 500000 numbers, the standard compiler function took on an average 110 milliseconds, while the above code took on an average 26 milliseconds.

C) Square, Cube etc.
This I found on my own. This method is best used if you know the exponent (the power to raise to) beforehand. There are two ways of raising a number to a specific power. One is to use the simple compiler pow(x, n) function which raises x to the power n. I've often observed people using this method if 'n' is large.. above 5 or 6, even if they know beforehand (while coding) the power to raise to. So they use the pow() function. The other way is to simply multiply that number as many times as required by itself. So if I want to raise x to 6, I'll do: [x = x*x*x*x*x*x]. Now it might seem odd, since this method looks so ugly (imagine doing that for 15th power), but believe it or not, the speed difference is amazing. Use the second method always, even if you're raising to the 15th power.
Test Result: I performed two separate tests. In each case, as usual, I took 500000 test numbers. I tried simple "squaring" using the pow() compiler function, and it took on an average 45 milliseconds. I tried the other ugly method of x*x, and it took 2 millseconds. I then tried raising 500000 numbers to the 9th power, and the pow() compiler method took on an average 67 milliseconds, while the [x*x*x*x*x*x*x*x*x] method took just 3 milliseconds. This proves this uglier way to be much more efficient.

D) Trigonometric Ratios
This is quite a common trick, and I therefore won't be putting up a result sheet for this one. The idea lies in making a 'lookup table' beforehand for all those trigo functions you are using. This is because the compiler trigo functions are VERY expensive (take time) and its bad to use them while rendering. So suppose we use cosine in our program, what we do is, make a table (array) of all the angles that the program might use and a table (array) of the cosines of those angles. So the speed boost is immediate, as picking a random element in the array (random access) is a very fast operation. Now generally we have an idea of the range of angles that we might use, and the smallest required intervals between each angle, for example, we might just need whole number angles, not fractional angles, thus we need to make, say, just 90 cosines. Or if the smallest step between angles is 0.5 degrees, we need to calculate the cosine of 0.0, 0.5, 1.0, 1.5 and so on.. while the program is starting up. After that we need to just look up the table and get the cosine very quickly. Incase the angle required is not *exactly* present in the table, like we need 13.7 degrees and we have 13.5 deg in the table, it makes sense to pick the cosine of 13.5 and be happy with it, since in Graphic programming, for real-time work it is smart work to make small compromises here and there to gain an great deal of speed.

PS: I think I need to change the blog theme. I tend to write long entries and this theme squeezes the text into a narrow column making it look dauntingly long.