Wednesday, June 28, 2006

Of internet communities and hair..


Browsing around the 'net, I found some really wierd community websites/boards. Though now it makes perfect logical sense, but I really found this particular board pretty interesting: Link


I find amazing comradeship among long hairs. Walk into a bus, you'll immediately be noticed by another ponytail, not that I'm making one yet. Though of course, I find the "super groomed", "super silky" hair that kids try to emulate really gross. Hair should be allowed to grow as it wants to, what's the kick in spending 45 mins a day trying to get your hair silky smooth and steady?

PS: That guy in the picture is, unfortunately, not me. :-)
PPS: I know this sort of breaks the tradition of semi-technical posts that I have set with the last few ones.. Its unlikely I will post anything this idiotic in the near future.. :-)
PPPS: I really had nothing to post about, and was feeling generally bored. :-)

Sunday, June 25, 2006

Effective optimizations for graphic programmers


Warning: The post contains a bit of technical stuff, along with some source code. Read on if you're interested.

I don't claim to be an expert, or even an intermediate in the vast field of graphic programming, but here are a few cool optimizations/replacements that I have been using for common math functions. This list is by no means complete, and by no means are most of the optimizations my inventions. If anyone wishes to suggest any more, please feel free to do so.

Graphic programming, 2D as well as 3D, involves extensive use of math, especially Vectors and Matrices. And it therefore extensively uses several math functions, like trigonometric ratios, squares and square-roots etc. Vectors also extensively involve "inverse-square-root" i.e. 1/sqrt(x) i.e. one divided by square root. This invSqrt is used in everything from finding magnitudes to finding angles between vectors, projections etc. The main point of this paragraph is that graphics heavily utilize math.

Now before we can make the next Doom3 engine, we need to first get the engine up-to-the-mark. Its no good having a fantastic engine intended for games, only to find it running like a slide-show on even high-end systems. The reason John Carmack is famous is because he has written all the game engines of id software, famous for the Doom, Wolf3D, Quake-series, Doom3 engines/games, and reading his code reveals some amazing tricks and optimizations. All the code upto Quake-3 (inclusive) has been made Open-Source under GPL, so you can try downloading and taking a peek if you're interested. I recommend that. So it turns out that most math functions can be hacked a lot to gain a great deal of optimization and speed.

A) Inverse Square Root
First I look at the famous "Inverse-Square-Root" function that was first seen in Quake-3 and is based on the Newton-Raphson method for approximating functions. The function in this case is 1/sqrt(x). To read in detail about how this works, you can see this and this (more technical). The code:

float InvSqrt(float x)
{
float xhalf = 0.5f*x;
int i = *(int*)&x; // get bits for floating value
i = 0x5f375a86- (i>>1); // gives initial guess y0
x = *(float*)&i; // convert bits back to float
x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy
return x;
}
Test Result: I had tested this before using it in my code. For calculating the inverse square roots of 500000 numbers, the standard compiler method of 1/sqrt(x) took on an average 132 milliseconds, while the above code took on an average 18 milliseconds.

B) Square Root:
Then digging around Quake's source, I found an optimization for the simple square root function, which is essentially the same thing as above with a little modification. The Code:

float fSqrt(float number) {
long i;
float x, y;
const float f = 1.5F;
x = number * 0.5F;
y = number;
i = * ( long * ) &y;
i = 0x5f3759df - ( i >> 1 );
y = * ( float * ) &i;
y = y * ( f - ( x * y * y ) );
y = y * ( f - ( x * y * y ) );
return number * y;
}
Test Result: To calculate the square root of 500000 numbers, the standard compiler function took on an average 110 milliseconds, while the above code took on an average 26 milliseconds.

C) Square, Cube etc.
This I found on my own. This method is best used if you know the exponent (the power to raise to) beforehand. There are two ways of raising a number to a specific power. One is to use the simple compiler pow(x, n) function which raises x to the power n. I've often observed people using this method if 'n' is large.. above 5 or 6, even if they know beforehand (while coding) the power to raise to. So they use the pow() function. The other way is to simply multiply that number as many times as required by itself. So if I want to raise x to 6, I'll do: [x = x*x*x*x*x*x]. Now it might seem odd, since this method looks so ugly (imagine doing that for 15th power), but believe it or not, the speed difference is amazing. Use the second method always, even if you're raising to the 15th power.
Test Result: I performed two separate tests. In each case, as usual, I took 500000 test numbers. I tried simple "squaring" using the pow() compiler function, and it took on an average 45 milliseconds. I tried the other ugly method of x*x, and it took 2 millseconds. I then tried raising 500000 numbers to the 9th power, and the pow() compiler method took on an average 67 milliseconds, while the [x*x*x*x*x*x*x*x*x] method took just 3 milliseconds. This proves this uglier way to be much more efficient.

D) Trigonometric Ratios
This is quite a common trick, and I therefore won't be putting up a result sheet for this one. The idea lies in making a 'lookup table' beforehand for all those trigo functions you are using. This is because the compiler trigo functions are VERY expensive (take time) and its bad to use them while rendering. So suppose we use cosine in our program, what we do is, make a table (array) of all the angles that the program might use and a table (array) of the cosines of those angles. So the speed boost is immediate, as picking a random element in the array (random access) is a very fast operation. Now generally we have an idea of the range of angles that we might use, and the smallest required intervals between each angle, for example, we might just need whole number angles, not fractional angles, thus we need to make, say, just 90 cosines. Or if the smallest step between angles is 0.5 degrees, we need to calculate the cosine of 0.0, 0.5, 1.0, 1.5 and so on.. while the program is starting up. After that we need to just look up the table and get the cosine very quickly. Incase the angle required is not *exactly* present in the table, like we need 13.7 degrees and we have 13.5 deg in the table, it makes sense to pick the cosine of 13.5 and be happy with it, since in Graphic programming, for real-time work it is smart work to make small compromises here and there to gain an great deal of speed.

PS: I think I need to change the blog theme. I tend to write long entries and this theme squeezes the text into a narrow column making it look dauntingly long.

Friday, June 23, 2006

What I've been up to.. procedural programming.


Alright, as usual, I was too lazy to actually carry through my idea of that "Interesting Projects.." column, which I described in my last post, and in another previous post which is also refered to in that previous post of mine. Anyhow, the point is that I probably won't be writing about the Morse Code-based Wireless Network I was developing from scratch, because I've been recently swept off my feet.

Not in 'The Pool', which I should tell you I have discontinued swimming in for a variety of reasons, but in my mind. I was hanging out at a friend's, and he showed me the video of a game by Will Wright (of The Sims fame) called Spore. The video was actually Wright's presentation at the Annual Game Development Conference where he presented his (and his team's) demo. I've never actually *awaited eagerly* the release of any game, or any movie, but this time, I do believe we are about to see a really revolutionary game which I just can't wait to get my hands on. I am probably not interested too much in the game, but in its source code and the concept that it uses to do the incredible things it does. Now that probably would be expected from a devoted game programmer, but how would you react if you could play a game where you see the entire universe in real-time, zoom in to any of the trillion^n stars with full detail, land on any of the near-infinite planets with full detail, control the evolution of species right from the stage of a single-celled organism to a developed animal, blow up planets with a *Death-Ray*, basically do whatever you want? All in real-time. Just imagine the amount of processing that would be required to simulate each and every cell/living-being, every planet, asteroid, comet, star, galaxy in the whole universe, every 50 milliseconds? Imagine the amount of storage required to store the data of all those things, all those cells and animals etc etc.. let's take a guess.. 40 Gigabytes? 40 Terabytes? Much more? Absolutely correct. Much more is indeed the amount it would take to store all that. So what am I talking about? What if I tell you that there exists a method to store all that data in about 10-15 bytes? That method is called Procedural Programming, where all the data required is generated on the fly.

Ever wondered why your games are always growing in size? Unreal 2000 came in about 1 CD, Unreal 2003 came in 3 CDs, Unreal 2004 came in 6 CDs, GTA San-Andreas takes about 6 GB of hard-disk space, and it goes on.. Why? The reason lies in the fact that the gamers are always demanding more realistic environments, we're not satisfied by barriers inside games, where you hit an invisible wall and can't go explore outside it. So they're making more detailed games, generating each and every building inside a city so that a player can explore all of them, generating textures on walls, basically generating everything beforehand. So if GTA-SA with just the measly amount of detail it has, takes 6 GB of space for code and data, imagine what the game I described in the above paragraph would take. In comes procedural programming. Procedural programming, as the name suggests, generates all the content on-the-fly. Each and every building is generated on the fly if required, all those textures are rendered on-the-spot if required. Now here is the important point, it doesn't simply generate *all* the buildings. It only generates what is *visible* to the player. How do you know if the universe still exists behind your back? This is the fundamental beauty of procedural programming. The time taken to render a single building or stuff like that is really quite small. As you may realise, procedural universes make a few compromises, but give you back much more. For example, with the simplest model I've described, its not possible to allow the player to *hear* a very loud bang at a place that he can't see, but neat programming tricks and hacks mostly tackle all those limitations.

At the heart of a procedural engine lies a seeded (or pseudo) random number generator, which essentially takes in a number and gives out a random number, except that the same random number is returned if the same original number is entered. So if I enter 5, it might return, say, 3.5. If I enter 23, it might return, say, 75. But every time I enter 5, it will always return 3.5, every time I enter 23, it will always return 75. That's why its a *seeded* random generator. The entered number, called the seed, gives us the same final number each time. So here a starting seed number is used, say A, and passed to a random number generator. The output number, say B, might be used to give us the number of buildings in the city. Now we iterate (loop) as many times as B (no. of buildings) and increment a particular variable, say 'i', by one each time. Inside each loop, we can pass the sum of B and I to the random generator, to get a particular number, say C, whose first two digits might give us the X coordinate of the building, and next two digits give the Y coordinate, and the next two digits, say D, are the seed for each building. Now we pass D into the random gen and get another number that might describe the properties of the building etc. And using a straight-forward technique we check which all points are visible to the player, and we generate only those buildings using the properties generated from D. All from a simple single number. As you can see, this need not end at just buildings, we can keep using the numbers after each random generation for more details, number of windows, number of people working in it, type of flooring, and the level of detail can be infinite. All from a single, say, two-digit number.

Obviously GTA-SA cannot be made using this method because the developers want a particular scripted story to play out, and the city generated in the above example is randomly generated. So trying to get a *particular* city, say, New York, using simple procedural generation would be worse than looking for a grain of sand in all the beaches of the universe. The entire city is generated from a single seed number, if we change the starting seed number (A) then the entire city changes, in all aspects. Big games have begun to use this kind of approach for small things, like making the textures on the walls, making rocks etc. But Spore is the first game that uses this approach for the entire game at the level it works on i.e. the universe.

Read up Procedural Generation on Wikipedia to get a better idea (and links to better descriptions). So what I'm upto is a universe-in-a-nutshell (literally!) thing, in the most efficient way. BTW, just as a side-info, procedural programming is almost as old as computers are. Ever heard of demoscene games (called demo-games) that work and look as good as Unreal 2003 and are all packed in just 96 KB (including the executable)? Look at .kkrieger. Demoscenes have been around ever since computers were first found as little electronic things like the Commodore, probably earlier.

I've been doing nothing else with my life, just programming, staring at TV since it has nothing interesting to watch, and sleeping.