Tuesday, October 1, 2013

Space Tyrant: Multithreading lessons learned on SMP hardware

From:  http://librenix.com/?inode=6861 :




There is much to report in this update of Space Tyrant. Before getting into the new features and functions, I’ll dispense with the crisis of The Bug.

For a couple of weeks, we had been noticing odd anomalies with Space Tyrant (ST) running on the virtual server at Ioresort.com (now offline -Ed.). We never saw the problem on any other box -- and it was tested on at least four other Linux boxes and a Mac OS X system. We did all manner of stress testing, locally and over the Internet, script based and even feeding the game the output of /dev/random. Nothing caused the anomaly on any other box.

At first, I suspected that it might just be an obscure problem with the virtual server itself; after all, I had been forced to modify the TLR code to get it to run properly there. That problem turned out to be merely a limitation of NFS, not a bug with the virtual server software. However, the environment was clearly different from any other system I had used which raised my suspicions -- and reduced my urgency about looking for the bug.

While the bug wasn’t frequent, it was persistent. The bug appeared to be related to corrupted buffers or corrupted buffer indexes. Out of idle curiosity, I lowered the number of buffers used by ST to see if that affected the bug. Somewhat counter-intuitively, it substantially raised the frequency of the problem.

Brian Estabrooks (the hero of this release) and I spent more and more of our efforts hunting this incredibly elusive bug until that was all we were doing. I implemented various diagnostic routines hunting for clues. The all seemed to point to buffer indexes being changed incorrectly. Both Brian and I audited the code. It seemed impossible for the indexes to be changed improperly. Brian even went so far as to replace the ring buffer scheme with a high watermark approach but to no avail.

While I continued to suspect it to be a simple logic error in the code, Brian turned his efforts elsewhere. What he came up with was quite interesting. It seems that on many hardware architectures (most? all?), modifying a bit field can temporarily modify other bit fields in the same word! Now, this isn’t a problem on a single-CPU system; it repairs the damage in the same operation, making it, effectively, atomic. On an SMP machine, however, two different CPU’s working on different bit fields of the same word simultaneously create havoc. The operation isn’t really atomic and it doesn’t work.

Did I mention that the virtual server is a 4-way Xeon system?

The ring buffer indexing in ST relies on unsigned integer bit fields to automate wrapping back around to the first buffer after using the last one. My parsimonious programming, of course, packed all the bit fields together, several to a word. Brian’s test version of ST added a pad after each buffer index to round it out so that each bit field lived alone in its own complete word. We abused the new version for nearly an hour before either of us would dare say it. The bug was gone.

Yay!

So, the moral of this story is: Operations on sub-word fields affect other bits in that word (at least on many hardware architectures). Tread very carefully if multiple threads are accessing different bits in shared words. It may appear to work perfectly, only to crumble into a pile of smoldering rubble the first time it's loaded on a multiple CPU system!

Other than the primary lesson, some other good things came out of (the search for) the bug. Several other latent bugs were found and fixed and Brian and I are both much more intimate with the code.

And, on to the enhancements. ST is starting to look like an actual playable game. The following functions implement the new major features.

players(): We now have player rankings. It works by adding all the players’ ship resources to an integer array. Then it scans the universe looking for deployed fighters and adds those to the array as well. Currently, those two items comprise the total strength of a player.

It then sorts the array with a recursive bit-plane sort that I wrote for Starship Traders in 1998. The qsort() function in the C library was plenty fast, but took too much memory for my taste. Memory was a bit scarcer in those days and, worse, the SST software model gave each player his own copy of the server.

The sort reorders the array in place as follows. It scans the high-order bit in each element of the array. It then moves all elements starting with ‘1’ bits to the top and all starting with ‘0’ bits to the bottom. Next, it calls itself twice to reorder the first and second chunks of the array on the second bit. Each of those two instances of the sort then call the sort twice again, now giving 4 new sorts for the third bit, and so on. When all 32 bits are accounted for, the array is in the correct order with the top player on top, etc.

Scanning the entire universe can be expensive with a large map. Therefore, the player rankings function keeps the result and time stamps it. If another player asks for a player ranking within five seconds, the system just gives them the old one. After five seconds, however, any new request triggers a fresh listing.

autopilot(): We’ve added an autopilot to let a player find a specific sector -- or to locate the nearest planet. If you type a ‘0’ (zero), you’ll be prompted for a sector number within 1000 of the sector you’re currently in. You then will have the option of pressing ‘/’ to automatically warp to the destination sector.

If you’re looking for a planet, type the ‘L’ command that you would normally use to land on a planet in your sector. In the absence of a planet, the L key will engage the autopilot which will search for the nearest planet and give you a ‘/’ command to autowarp to it.

The new autopilot function consists of two other functions in addition to autopilot(), which is merely a control function. I had intended to use the old shortest path algorithm function from TLR but it was big and complicated. I decided to try to write a simpler, recursive shortest path algorithm instead. The new recursive function is much simpler but not quite as efficient as the giant for loop in TLR.

The actual algorithm is implemented in two functions called pathdepth() and pathcalc(). The pathdepth() function repeatedly calls pathcalc() with an increasing ‘depth’ parameter. ‘Depth’ tells pathcalc() how many levels deep to search before giving up.

The pathcalc() function simply looks to see if the sector it is looking at is the target sector. If not, it calls itself for each new sector that the current sector connects to. If the current sector is the target sector, it starts filling in an array for the autowarp() function to follow to reach the target sector. As the previous recursive calls to the pathcalc() function exit, they fill in the remainder of the path array.

And, yes, I seem to like reinventing the wheel. ;-)

The other interesting addition to the code is the backup thread. It is implemented by a function called backupdata() and works as follows: It scans the player data, the map data, and the history data looking for ‘dirty’ flags. (Whenever any persistent data is changed anywhere in the game, a dirty flag is set to tell the backup thread to write it out to disk.) This process is quite fast for a small game, but for a game with millions of sectors, it’s a significant waste of resources to scan the dirty flag array frequently.

Therefore, for the map and history data, I’ve implemented a ‘dirty block’ scheme as well. When a dirty flag is set, its corresponding dirty block flag is set too. Then, the backup thread need only scan the dirty block arrays, typically only about one percent the size of the arrays it represents. When a dirty block is found, only the hundred or so records it points to are scanned to find the actual dirty records for backup.

The backup file, named ‘st.9999.dat’ -- where ‘9999’ varies with the port number you run the game on -- goes into the current working directory from where you start the daemon. If the file doesn’t exist, a new game is started. Also, if you’ve modified the game in a way that changes the size of the data -- by increasing the map size, for example -- it will start a new game upon startup.

The game can be shut down from the command line by sending a signal 15 (kill -15 pid) or by the admin with the ^ command. Note that the first player to create an account in a new game automatically becomes the admin of the game!

makehistory(): The storing of historical data is new as well. Whenever another player attacks your ship while you’re logged off, you’ll get a report of the action and any losses when you next log on. Also, for remote deployed fighters, you never get immediate notification, so that information is stored in the history log even if you're logged on when it happens. You can view any accumulated event information since your login time by pressing the ‘e’ key.

deploy(): This simple function allows a player to deploy, or retrieve, guard fighters in a sector. Those fighters will not let another player pass through or view any of the contents of that sector. Any ships parked under the fighters are automatically protected against all attacks except for an attack by the fighters’ owner. Once the fighters are destroyed, of course, all ships there are visible and can be attacked.

There is also a newly implemented time limit in the game to limit the total online time of a day’s sessions to 4 hours. Like most other parameters, it can be changed by modifying a #define statement near the top of the code.

command(): The help page, a menu of available commands that a player can perform, has been redesigned and rewritten. This menu is attached to the '?' key.

The old debugger thread is gone, replaced by an in-game command function called showdata(). Press the ‘z’ key to see information on buffers, buffer indexes, and the backup thread’s state and history. Only if you’re serious about modifying the code will this information be useful.

The section of the gameloop thread that broadcasts radio and news messages has been modified to show only one of each type of message per pass. That way, replaying a long radio history won’t flood the output buffers and longer radio and news histories can therefore be retained.

The old jumprtn() movement function has been consolidated into the warprtn() function. It’s only slightly more complicated than having them separate.

The current source code can be downloaded from http://librenix.com/st/st.158.c. and the original article in this series is here. As usual, the compile script is embedded in the comments at the top of the source file. You’ll have to rename the source st.c for the script to work unchanged.

[A Space Tyrant home page has been created as a central index to the various ST articles, links, and files.]

No comments:

Post a Comment