March 9th 2018 Update
Still working on moving office (it'll be a while - we have a *lot* of stuff!), but still finding time to get some things done. I put off a couple of large pieces of feature work and mostly focused on foundational items, because I haven't had much uninterrupted time, just lots of small blocks of it. (There's one exception: a new ECS!) There's a new pre-alpha build on itch with these changes in it.
* The back-end code that handles shadows is a LOT more efficient now. It uses a geometry shader to emit to all the cubemap layers at once, and I optimized the living daylights out of it. It's quite a lot faster as a result. It also produces somewhat better shadows, even on low shadow-map detail levels.
* Implemented a "world flags" SSBO (shader storage buffer object, an OpenGL 4.3 feature). When the terrain changes, it copies the map's "flags" vector (each entry is an int32 bitset) into GPU memory. It's awesome to be able to map it and do a simple `memcpy` rather than all the stupid fiddling with vertex buffers and attributes. Other shaders read from it.
* Reworked the terrain cube generation code slightly, fixing some "winding" issues; cubes are now ALWAYS wound correctly, so enabling `GL_CULL_FACE` will correctly remove only the invisible cube faces. This gave a good performance boost, particularly on my low-end laptop.
* Rewrote the sunlight code (big dark shadows from trees were annoying me, and while they looked good, they weren't good to play with). The sun/moon shader polls the new "world flags" bitset to determine if a tile is indoors or outdoors, and lights outdoor tiles with the sun/moon. I removed sun/moon shadows; they were too confusing (and slow). It looks pretty decent right now.
* Added an exposure-based tonemapper, and some *primitive* code that determines exposure based on the average lightness of the screen. It needs work, but it already looks better on dark scenes.
* Re-enabled the bloom code (smudging bright lights), with less stupid blur code. It's very subtle (I *hate* the overdone bloom in some games - the blurriness gives me a headache), and I'm reasonably happy with it.
* The engine now knows how to load compute shaders.
* Ran into some horrible issues with game saving/loading. I ended up rewriting my ECS. See below.
* Re-enabled the "wish" command `sploosh`. It sets the top of the sky to be full of water, which then stress-tests the fluids system by having it all fall and make a mess.
* Savegames now use Cereal's binary format. I thought XML would be helpful for debugging, but it really wasn't. A freshly started game's save went from 1.5 MB to 120 KB!
* Optimized the living daylights out of the fluid dynamics code. I ended up not using a compute shader for now (I'll almost certainly try one later), because my "let's write this for the CPU to see how it should work" pass performed well for anything short of a truly extreme amount of water. Had a lot of fun playing with various scenarios in which people drowned themselves. Also identified a few problems with terrain generation giving me a river that wasn't water-tight. Spent some fun time adjusting worldgen to produce rivers that don't leak (I seriously love worldgen).
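The "world flags" bullets describe a 32-bit bitset per tile that is copied into an SSBO and read by the sun/moon shader to tell indoor tiles from outdoor ones. Here is a minimal CPU-side sketch of that kind of lookup. The flag names (`SOLID`, `OUTDOORS`), their bit positions, the map dimensions and the `mapidx` layout are all hypothetical, not NF's actual values; a GLSL shader would do the same index arithmetic and bit test against the SSBO contents.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical flag bits; the real bit assignments are not given in the post.
constexpr std::uint32_t SOLID    = 1u << 0;
constexpr std::uint32_t OUTDOORS = 1u << 1;

// Hypothetical map dimensions, for illustration only.
constexpr int MAP_W = 256, MAP_H = 256, MAP_D = 128;

// Row-major tile index: x varies fastest, then y, then z.
constexpr std::size_t mapidx(int x, int y, int z) {
    return static_cast<std::size_t>(z) * MAP_W * MAP_H
         + static_cast<std::size_t>(y) * MAP_W
         + static_cast<std::size_t>(x);
}

// Tests a single flag bit in one tile's bitset.
constexpr bool has_flag(std::uint32_t tile_flags, std::uint32_t flag) {
    return (tile_flags & flag) != 0;
}
```

Uploading a `std::vector<std::uint32_t>` of such flags amounts to mapping the buffer (for example with `glMapBufferRange` on `GL_SHADER_STORAGE_BUFFER`), `memcpy`-ing `flags.size() * sizeof(std::uint32_t)` bytes from `flags.data()`, and unmapping it again.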
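On winding: with OpenGL's defaults (`glFrontFace(GL_CCW)` and `glCullFace(GL_BACK)`), a triangle survives culling only if its vertices appear counter-clockwise when viewed from outside the cube. One way to check a generated face, sketched here under the assumption that the generator knows each face's outward normal, is to test whether the cross product of two triangle edges points the same way as that normal. The helper names are illustrative, not taken from NF.

```cpp
struct vec3 { float x, y, z; };

constexpr vec3 sub(vec3 a, vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

constexpr vec3 cross(vec3 a, vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

constexpr float dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// True if triangle (a, b, c) is counter-clockwise when seen from the side
// `outward` points to, i.e. front-facing under glFrontFace(GL_CCW).
constexpr bool wound_ccw(vec3 a, vec3 b, vec3 c, vec3 outward) {
    return dot(cross(sub(b, a), sub(c, a)), outward) > 0.0f;
}
```

For example, the top face of a unit cube at y = 1 with vertices (0,1,0), (0,1,1), (1,1,1) passes, while the same three vertices in the order (0,1,0), (1,1,1), (0,1,1) would be culled as a back face.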
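The exposure-based tonemapper bullet maps naturally onto a small formula. This sketch uses one common exposure operator, `1 - exp(-v * exposure)`, and derives exposure as a hypothetical key value (0.18, roughly middle grey) divided by the frame's average Rec. 709 luminance. The post doesn't say which curve or constants NF uses, so treat both as assumptions; real renderers usually do this on the GPU and often use a log-average luminance instead of a plain mean.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct rgb { float r, g, b; };

// Rec. 709 relative luminance of a linear RGB colour.
inline float luminance(rgb c) { return 0.2126f * c.r + 0.7152f * c.g + 0.0722f * c.b; }

// Plain arithmetic mean of the frame's luminance (a deliberately simple estimate).
inline float average_luminance(const std::vector<rgb>& pixels) {
    assert(!pixels.empty());
    float sum = 0.0f;
    for (const rgb& p : pixels) sum += luminance(p);
    return sum / static_cast<float>(pixels.size());
}

// Hypothetical key value: scale so the average lands on middle grey.
inline float exposure_for(float avg_lum, float key = 0.18f) {
    return key / std::max(avg_lum, 1e-4f);  // clamp avoids dividing by zero on black frames
}

// Exponential exposure tonemap: maps [0, inf) HDR values into [0, 1).
inline rgb tonemap(rgb c, float exposure) {
    auto curve = [exposure](float v) { return 1.0f - std::exp(-v * exposure); };
    return {curve(c.r), curve(c.g), curve(c.b)};
}
```

With these constants, a dark frame averaging 0.02 gets an exposure of 9 (brightened), while a frame averaging 1.0 gets 0.18 (darkened).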
The New Entity-Component-System (ECS)
I've had a pretty solid ECS going for 2.5 years now (closer to 3; my ECS actually predates the game). It's worked fine in various other games, and has always been a good performer. It was based on [EntityX](https://github.com/alecthomas/entityx), which always impressed me. I ran into problems with saving/loading games that have been going for a while (new games work every time), and after some *very frustrating* debugging realized that the ECS wasn't consistent in the ID numbers it assigned to component types. Rather than force you to register every type up-front, it used a static counter to assign a `family_id` to each component type the first time it was seen (and looked these IDs up whenever it encountered a component). In a long-running game that introduces new components that aren't seen during world-gen, the order in which types are first encountered varies from run to run, so it becomes increasingly unlikely that a given component gets the same `family_id` every time. That leads to *really* confusing bugs, and things generally falling apart after save/load. Oops. I spent several hours trying to fix this, before concluding that some of my underlying designs in the ECS were incorrect for this type of game (they work fine for more predictable games). I also identified some inefficiencies I could fix while I was at it.
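The ordering problem is easiest to see in code. This is a minimal sketch of a lazily assigned family ID in the EntityX style described above (a function-local static per component type, fed by one shared counter); it illustrates the pattern and is not the actual EntityX or NF code. Each type's ID is fixed the first time `family_id<T>()` runs, so two runs that touch component types in a different order number them differently.

```cpp
#include <cstddef>

// Shared counter: every call hands out the next free ID.
inline std::size_t next_family_id() {
    static std::size_t counter = 0;
    return counter++;
}

// One static per component type, initialised on the first call for that type,
// so the resulting ID depends on runtime call order, not on the type itself.
template <typename Component>
std::size_t family_id() {
    static const std::size_t id = next_family_id();
    return id;
}
```

If one session touches a `health_t` component before a `position_t` one, `health_t` gets ID 0; a session whose world-gen touches `position_t` first gives that type ID 0 instead, so IDs written by one session point at the wrong types when read by the other.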
I'm a big believer in forcing myself to use interfaces, so fortunately every access to the ECS in-game goes through one interface. So I created a test project, implemented the same interface and rebuilt the entity/component store and query mechanisms from the ground up - writing tests as I went. Once it passed every test I could throw at it, I put it into NF - compiled and ran it. Other than having to clean up after deciding to use `int` rather than `size_t` in a couple of places, it ran *really well*. Faster than before, and load/save worked every time. :-) (This took a total of about 6 hours; I didn't sleep much that night...)
So first up, I made the ECS require that you declare all the components with which it can interact up-front. NF uses a *lot* of components (98 last time I counted); this led to a truly horrific statement:
```cpp
using my_ecs_t = bengine::ecs_t<position_t, designations_t, farming_designations_t, ai_tag_work_farm_plant, ai_tag_work_guarding,
    ai_mode_idle_t, ai_settler_new_arrival_t, ai_tag_leisure_shift_t, ai_tag_my_turn_t, ai_tag_sleep_shift_t, ai_tag_work_architect,
    ai_tag_work_building, ai_tag_work_butcher, ai_tag_work_farm_clear, ai_tag_work_farm_fertilize, ai_tag_work_farm_fixsoil,
    ai_tag_work_farm_water, ai_tag_work_farm_weed, ai_tag_work_harvest, ai_tag_work_hunting, ai_tag_work_lumberjack,
    ai_tag_work_miner, ai_tag_work_order, ai_tag_work_pull_lever, ai_tag_work_shift_t, architecture_designations_t,
    bridge_t, building_t, building_designations_t, construct_container_t, construct_power_t, construct_door_t, construct_provides_sleep_t,
    entry_trigger_t, receives_signal_t, smoke_emitter_t, turret_t, designated_farmer_t, designated_hunter_t,
    calendar_t, camera_options_t, claimed_t, corpse_harvestable, corpse_settler, designated_lumberjack_t, explosion_t,
    falling_t, game_stats_t, grazer_ai, health_t, initiative_t, lever_t, lightsource_t, logger_t, name_t, natural_attacks_t,
    renderable_t, renderable_composite_t, riding_t, sentient_ai, settler_ai_t, sleep_clock_t, slidemove_t, species_t,
    stockpile_t, viewshed_t, water_spawner_t, wildlife_group, world_position_t,
    item_ammo_t, item_bone_t, item_chopping_t, item_digging_t, item_drink_t, item_farming_t, item_fertilizer_t,
    item_food_t, item_hide_t, item_leather_t, item_melee_t, item_ranged_t, item_seed_t, item_skull_t, item_spice_t,
    item_topsoil_t, item_t, item_carried_t, item_creator_t, item_quality_t, item_stored_t, item_wear_t, designated_miner_t,
    // ...
```
I was afraid that this would kill compile times, but it had the opposite effect: all the component headers get parsed once, so the compiler is able to re-use them a lot. I also worried about size limits on parameter packs, but I haven't hit them.
The biggest change is in how components are stored/identified. Previously, I didn't enforce registration up-front, so the ECS did a little dance with a template `component_family` and a static counter to figure out an ID#. Since I didn't know the component types at compile-time, but didn't want to force components to adhere to an interface, a lot of things were dynamic. A `component_base` class contained `is_deleted` and the `entity_id`; a templated `component_holder` class then inherited from that base to decorate it. Finally, components were stored in a `vector` of `unique_ptr` to the template `component_holder` type (which in turn forced me to register a TON of base-types in Cereal). These were themselves stored in another `vector`, indexed by `component_family` index. It worked, was pretty fast, but relied upon a number of compiler optimizations to perform well (vtable elimination, and the unique_ptr being optimized away).
The new system determines `family_id` at compile-time, via an `std::index_sequence` (and the near-magical `std::index_sequence_for`, which generates a sequential index for each type in a parameter pack, such as the types in a tuple). So *all* of the static incrementer is gone, and each component is *guaranteed* to get the same ID # each time. I eliminated the inheritance by making `component_holder<C>` a simple struct of `entity_id`, `is_deleted` and `C component`, so it simply "decorates" your component by adding an int and a bool to the beginning. And since we know all of the types up front, allocating component storage becomes:
```cpp
std::tuple<std::pair<size_t, std::vector<component_holder<Components>>>...> storage;
```
That lets me use `std::get<std::pair<size_t, std::vector<component_holder<Component>>>>` to retrieve the correct storage block at compile time (and the layout of the storage tuple itself is fixed at compile time, too!). The ECS constructor populates the `size_t` with the family_ids (helpful in some retrieval code). So the new system eliminates the `unique_ptr` indirection, guarantees that the vector objects (*not* their contents, which the `vector` will allocate on its own) are adjacent in memory for happy caches, and eliminates the need for a vtable/virtual lookup completely. When querying, a lot of the type lookups are now purely compile-time, which is more of a performance win than I expected.
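The storage declaration above can be sketched as a small class. Apart from the `tuple`-of-`pair` layout and the `component_holder` fields, everything here is an assumption for illustration: the `component_store` name, the `add`/`all`/`family_id` methods and the `slot` alias are hypothetical, not NF's API.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Decorates a component with its owning entity and a deletion flag.
template <typename C>
struct component_holder {
    int entity_id = 0;
    bool is_deleted = false;
    C component{};
};

template <typename... Components>
class component_store {  // hypothetical name, for illustration only
    // One (family_id, vector) pair per registered component type.
    template <typename C>
    using slot = std::pair<std::size_t, std::vector<component_holder<C>>>;

    std::tuple<slot<Components>...> storage_;

public:
    component_store() {
        // Record each slot's position in the pack; the comma fold runs left to right.
        std::size_t id = 0;
        ((std::get<slot<Components>>(storage_).first = id++), ...);
    }

    template <typename C>
    void add(int entity_id, C component) {
        std::get<slot<C>>(storage_).second.push_back({entity_id, false, std::move(component)});
    }

    // std::get by type picks the tuple element at compile time.
    template <typename C>
    std::vector<component_holder<C>>& all() {
        return std::get<slot<C>>(storage_).second;
    }

    template <typename C>
    std::size_t family_id() const {
        return std::get<slot<C>>(storage_).first;
    }
};
```

Because each `slot<C>` is a distinct type, `std::get<slot<C>>` resolves to a fixed tuple element at compile time; the only runtime work left is the `vector` operations themselves.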
This gives a number of advantages:
* Determining the ID# (`family_id`) of a given `Component` is now a compile-time task, and I can return a handy error message via `static_assert` (rather than 100+ lines of gobbledegook) if a component isn't registered. So *every* function that needs to determine the IDs of component types can do so with zero overhead.
* This lets me store components in a big `tuple` of (`family_id`, `vector<component_holder<C>>`) pairs. Accessing the right one via `std::get` is also compile time, so there's zero overhead for finding the right component store. (It previously used a `static` counter and instantiated an empty component to find the ID, which apparently was slower than I thought!)
* Since `family_id` is determined at compile time, the `bitset` of which components an entity has is also sized automatically. I can compile-time generate a list of bit numbers to test given a variadic parameter pack of component types, so a complex query is very, very fast.
* It eliminated any allocation from query calls.
* It really cleaned up my serialization code. No more polymorphic type registrations for Cereal!
* Much to my surprise, it compiles really fast.
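The compile-time pieces in the list above (a `family_id` found via `std::index_sequence_for`, a `static_assert` for unregistered types, and a `bitset` sized from the component count) can be sketched together as follows, assuming C++17. Because `std::bitset::set` is not `constexpr` before C++23, this sketch computes the bit numbers at compile time but builds each query's mask once, on first use. The `query_masks` name and its members are hypothetical, not NF's API.

```cpp
#include <bitset>
#include <cstddef>
#include <type_traits>
#include <utility>

template <typename... Components>
struct query_masks {  // hypothetical helper, not NF's API
    static constexpr std::size_t num_components = sizeof...(Components);
    using mask_t = std::bitset<num_components>;  // one bit per registered type

    // Position of C within Components..., or num_components if C is absent.
    template <typename C, std::size_t... Is>
    static constexpr std::size_t index_of(std::index_sequence<Is...>) {
        std::size_t result = num_components;
        ((result = std::is_same_v<C, Components> ? Is : result), ...);
        return result;
    }

    // Compile-time family ID; unregistered types fail with a readable message.
    template <typename C>
    static constexpr std::size_t family_id() {
        constexpr std::size_t id = index_of<C>(std::index_sequence_for<Components...>{});
        static_assert(id < num_components, "component type is not registered with this ECS");
        return id;
    }

    // Sets the bit for each requested component type.
    template <typename... Wanted>
    static mask_t build_mask() {
        mask_t bits;
        (bits.set(family_id<Wanted>()), ...);
        return bits;
    }

    // Each distinct query's mask is built once and then reused.
    template <typename... Wanted>
    static const mask_t& mask_for() {
        static const mask_t mask = build_mask<Wanted...>();
        return mask;
    }

    // True if the entity's component bitset contains every type in Wanted...
    template <typename... Wanted>
    static bool has_all(const mask_t& entity_mask) {
        const mask_t& wanted = mask_for<Wanted...>();
        return (entity_mask & wanted) == wanted;
    }
};
```

With three registered types, an entity whose mask has bits 0 and 2 set satisfies a query for the first and third types but not one that also asks for the second; the check itself is a bitwise AND and a compare over the bitset's words.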
The net result is a big speed increase, one I really wasn't expecting!