Path-finding, architecture and more.

Share this post:

Share on Bluesky Share on Twitter Share on Facebook

The flu sucks, and sadly I succumbed to it right after writing that I'd avoided it in last week's Sharing Saturday. I'm moving around again, but I've temporarily lost most of my hearing; this has made me a generally cranky individual, so I focused on cleaning up things in Nox Futura that annoyed me.

It's funny, but I have a long history of writing bizarre code when I'm sick (particularly if I have a fever, and take medication) - and then not really remembering how it works afterwards. For some reason, when I'm sick I just don't write code comments. Years ago, I woke up after a nasty bout of sickness and discovered that I'd written a complete SMTP proxy with virus checking facilities - and had no real memory of how it worked (it ended up in production for a while, after I spent a week figuring out my code!). This time was no different; I started to feel better on Wednesday, and marveled at the ray-marching rendering engine I'd written - and really don't understand. It's not sufficiently finished to use, so I stored it away in a git branch for study. Strange.

I'm also fighting the urge to make a Boulderdash roguelike (more specifically *Repton*, if anyone remembers that series of BBC Micro masterpieces).

New Path-Finding Engine

I'd known for a while that I'd eventually hit the point at which path-finding was my major bottleneck. Nox Futura generates a *lot* of paths - most tasks require an idea of where to go, and there's an ever-increasing number of tasks to evaluate as the game progresses. This is especially bad when you set a bunch of jobs, unpause and sit back to watch your settlers scurry to work; they mill around, regularly looking for things to do - and how to get there. So path-finding is a really critical part of the game.

I started out with profiling, and was surprised to see that my Dijkstra map generation was causing serious problems. Basically, it was walking the whole map too frequently - and generating the maps was sucking up nearly 20% of the game's CPU budget. There was also a *nasty* race condition going on, a result of a tiny bit of sloppy code in the asynchronous dijkstra walker. Fixing it turned out to be problematic; it basically needed its own copy of the region map, and that is too much data (256x256x128 = 8,388,608 cells; that's a lot of map data to copy around!). So, I massively reduced the number of Dijkstra maps the game uses, made the update synchronous, and optimized the heck out of the generation process. A lot of things that used to use a Dijkstra now store a set of "targets" instead. When a settler wants to get to a target, it sorts the targets by weighted distance (basically z-level changes cost more than x/y; it's rare to be standing on a staircase when searching for work) and runs the generic pathfinder on each in turn until one works. There are a few optimizations here; impossible targets are never added (first a check for "can I even stand on this tile?" and then a more generic "can this be reached from anywhere?" check that still needs a little tweaking). That greatly reduced CPU usage (down to about 10%), but I decided that this needs to be bullet-proof as the game progresses.

Fortunately, I've been good and all path-finding checks go through a `find_path` function (that accepts a start and end point, plus a bunch of flags for things like "is it ok to end up adjacent to the destination, rather than on it?" and placeholders for things like "am I flying?"). So I started there.

* The first optimization was to do a quick distance check on the path. If it's short enough, and both z-levels match, then it casts a line between the two points. If every point on the line has the `CAN_STAND_HERE` flag set, then it returns the line rather than calling an expensive pathing function. This cut CPU usage substantially, since a lot of movement is short and direct. (Thanks to u/Kyzrati for this one!).

* I then played a bit with caching paths, but it was so infrequent to need the same path twice that I ditched the idea.

* I added 10-directional movement (been meaning to do that anyway) (NW, N, NE, E, SE, S, SW, U, D). This shortened all the paths, and I was pleasantly surprise at how little broke when I did it.

So... I rewrote my A-star code. I was using this - and still am in RLTK (I'm planning a big RLTK update soon). It's good and fast, but the code isn't very readable, doesn't conform to what I expect from modern C++, and has a custom allocator that confuses the heck out of me. It also had a bad habit of stalling on very long paths.

So, I took a psuedocode description of how A-star works, and wrote my own. Mine is only 280 lines of code (including whitespace), and I was pleasantly surprised how simple it turned out to be - and how easy it was to optimize it to go faster than the previous one.

I know a lot of people wrestly with writing A-Star, so here is a run-down of the process I used, and what helped the most. Heavy benchmarking resulted in a few things defying conventional wisdom, which really surprised me. Note that this is optimized for my use-case; navigating in 3D across a dynamic world (the dynamic nature makes things like jump-point optimization *really hard*, since I'd have to also dynamically decide where "rooms" are, even after the player decides to build a castle with a twisty mine underneath it...).

Within a node, `f` is the sum of `g` and `h`. `g` is the distance from the previous node, and `h` is the distance to the goal.

* Nodes are a simple POD (Plain Old Data) type. The operator `<` if overridden to compare with the `f` field, but otherwise it's a dumb struct - nothing to stop the "trivial type" optimizations in the compiler. This made a surprising difference in general, and the assembly shows that VS goes to great lengths to make manipulations go at super speed (right down to keeping entire nodes in registers if it can).
* I took a look at how Boost does a-star (it's in there, but you practically need a Ph.D. to figure out how to use it). I really didn't like what I saw; in particular it uses exception handling to break out of the main loop. In my experience, using an exception for expected behavior is asking for trouble. So I went the other way, and made the whole thing `noexcept`. That only made a small difference to speed, but made the resultant code *much* smaller.
* Just about everything I've read says "your open list should be a heap". I tried it, and it worked - but looking at how it worked under the hood, I couldn't shake the feeling that it wasn't as efficient as I'd like. I tried `std::priority_queue` (which is basically a heap, under the hood) - and it performed about the same. Looking for a baseline, I wrote a super-trivial version that stores the open list in a vector. After successor-nodes are added to the current node (`q`), it uses simple-old `std::sort` to sort the vector. Well, it turns out that `sort` is super-smart, especially with POD types. On a mostly-sorted list (which the open list is, after the first couple of nodes) it is so fast that I found myself asking what black magic they were using! It appears to be at least partly cache-related. Anyway, for short paths the `vector` version is about 10% faster than the heap version. For long (200+ steps, open list of 500 or more) the `vector` version was 20-30% faster! So instead of a base-line, I accidentally ended up with really usable code.
* When you add a successor, you only add it if `f` is lower than any `f` (of the same target) in the open list. So rather than traverse the whole open vector, it bails out if `f` in the vector is higher than the successor's `f`. This uses a `goto`, which I'm not proud of (it's the first one!). Since the open list is guaranteed to be sorted at the time of addition, this saves a bunch of time (especially given my discovery that the `sort` is so fast I can pretend it's free!). This gave another good speedup, especially on big paths.
* The closed list is also checked based on `f`. I tried a few variations, and ended up storing it as a `boost::container::flat_map`, indexed on the tile ID and storing the lowest `f` we've seen (it turns out that there's no point in storing higher ones). This gave another small speedup.
* The parents list (used to reassemble the path at the end) also ended up being a `flat_map`. I tried a few variations, and this out-performed everything by a good margin.
* I carefully limited what each traversal checks. First, it grabs a reference to the whole bitset of flags for a tile (a `uint32_t`, with my own comparison code; `std::bitset` is awesome, but heavier than I need). Then it checks (for 10 directions) for each navigation flag on that tile (`CAN_GO_NORTH` and so on). It relies on the region map keeping these uptodate (they update when the map changes), but this kept the succession code small/really fast.
* Added some "bail out" code for paths that get too long. This probably means that not every path is solvable, but it prevents the node explosion (when you get pretty much the entire map in your open list) for paths that just aren't working out. I'll probably have to tweak the maximum path length; right now it's working fine.
* I played with the heuristic, and adjusted the distance cost for z-level changes. Most of the time, if you are pathing to something on another z-level, it isn't really 1 tile away when you are above it - you have to go to the stairs, climb down, and then path to it. So I multiply `z` by 50 on the input to the distance calculation, causing it to optimize on `z` before it optimizes on `x` and `y`. This made a big difference on some paths. It also uses distance squared, rather than distance - to avoid calling square root a lot.
* Experimented with using ints vs. floats for distances, and discovered that floats are actually faster on my system. Another surprise. (This shouldn't really surprise me, since I discovered a while ago that it's faster on my system to generate lines with float-based vector math as opposed to Bresenham).
* Added a ton of instrumentation, and set the game running for a few hours on its own. It recorded the eventual size requirements of the open list, closed list and parents lists. I then added `reserve` calls to ensure that 90% of path-finding never needs to allocate. This gave me another big speed-up.

So at the end of all of this, I have A-star code that I understand (always a win in itself); it's small, clean, and really performant. The whole game runs *much* faster than ever before. I'd say that was a win. :-)

Other Changes

Pathfinding ate up most of my available time this week, and was definitely worth it. I still managed to get in a bunch of small fixes:

* Moved various `bool` flags out of their own arrays and into the `flags` field per tile (which grew to 32-bits). Memory usage went down by a couple of hundred megabytes (that may be a sign that I have too many flags...), and performance improved as a result. Added a `get_flags_by_reference` call that returns a reference to the flag field for routines that check a lot of flags (pathfinding!).
* Fixed all the bugs I introduced last week, notably my string descriptors in fmt (I kept typing `%s` when I meant `{0}`)!
* Re-enabled the sun.
* Hunting is now a designated task; the "jobs" window lets you pick your hunters. Nobody will hunt without a designation (but they may shoot wildlife that worries them); if hunting is enabled, then the hunters make killing things their #1 priority in life. Getting this working was actually the straw that broke the original pathing code's back; but now it works great.
* Butchering is now a task for hunters. That will change when I write the animal husbandry arc (quite a ways off). It works pretty well now; after slaying wildlife, hunters will pick up the carcasses and haul them to your butcher shop. Bones, skulls, hide and meat are extracted (and can be used in various other processes).
* Fixed a funny rendering error that rotated voxel models and didn't rotate their surface normals. People are now lit from the right direction, even when facing away from the light now. Fixed the normals on ramps while I was at it.
* Added a one-liner that darkens tiles on a lower elevation than the camera. This makes it MUCH easier to see what's going on!
* Settlers and NPCs rotate their voxel models to face their current direction of travel.
* Fixed a stupid bug that wasn't properly saving settler's schedules/shift settings. So everyone was on the early shift, leading to a period from noon to midnight in which everyone was either idle/leisure or sleeping.
* Settlers "lie down" when unconscious or sleeping. It simply rotates the voxel model to be parallel to the ground. If you look closely, unconscious settlers are actually levitating at the moment by a few pixels.
* Started modernizing and fixing up the workflow logic. It is more reliable already. There are big plans for what this should turn into, so this is very much an ongoing project.

Screenshots

First up, the new z-layer based darkening feature:

Building a castle!

Next, I decided to really stress-test the new code by building a castle. Cutting down trees requires a path to the tree (and changes the map). Turning wooden logs into wooden blocks at the sawmill requires a path to the wood, and a path back to the workshop (and also tests the locking/claiming code pretty well). Architecture then requires a path to the block, and a path to the nearest buildable tile. There's over 300 blocks involved in this project, so I'm pretty confident that I stress tested the path-finding code!

I started by designating the ground floor constructions:

and then adding some walkways on the floor above:

I then designated half the map for deforestation, built a sawmill, added a few more designations and sat back for 2 hours while the settlers built the castle. 16 days into development, and we have a basic structure:

I let it run some more. 47 days in, and I've added some watch-fires:

here's a top down view:

And an ASCII view:

This represents a huge milestone. For the first time, I built a structure like this without using any cheat codes, it ran for 3 hours (real time) or 60 days (game time) without crashing, slowing down significantly, or noticably increasing its memory footprint. There were a few instances of the embarrassing "settler is stuck, activating emergency teleport to bed!" warning, but it generally ran really, really well. It's left me with a lot of confidence; I'm only 7 systems away from hitting my "phase 0" objectives!

Oh, and the game ran for long enough that two waves of settlers teleported down from the mothership. We had quite the posse of useless individuals!