GPU Instancing

Now that the overworld generation is ported, and before moving on to porting resource generation and the rest, I had a sudden realisation: I have some more unusual rendering requirements, one of them being GPU instancing. If you are not familiar, it's a technique for displaying lots and lots of entities very cheaply by sharing some properties across all of them (e.g. the mesh or the texture atlas) while keeping other properties unique per instance (e.g. sprite index, transform).
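To make the shared vs. per-instance split concrete, here is a rough sketch in Python (not the C# and shader code the project actually uses). The instance layout, the names `INSTANCE_FORMAT` and `pack_instances`, and the choice of fields are all assumptions for illustration: one quad is defined once, and each instance only contributes a small fixed-size record to a contiguous buffer.

```python
import struct

# Shared by every instance: one unit quad (two triangles), defined once.
QUAD_VERTICES = [
    (-0.5, -0.5), (0.5, -0.5), (0.5, 0.5),
    (-0.5, -0.5), (0.5, 0.5), (-0.5, 0.5),
]

# Hypothetical per-instance layout (an assumption, not the project's format):
# x, y, rotation, scale as 4 little-endian floats, then a uint32 sprite index.
INSTANCE_FORMAT = "<4fI"
INSTANCE_STRIDE = struct.calcsize(INSTANCE_FORMAT)  # bytes per instance


def pack_instances(instances):
    """Pack (x, y, rotation, scale, sprite_index) tuples into one byte buffer."""
    buffer = bytearray()
    for x, y, rotation, scale, sprite_index in instances:
        buffer += struct.pack(INSTANCE_FORMAT, x, y, rotation, scale, sprite_index)
    return bytes(buffer)


def unpack_instance(buffer, index):
    """Read back the record of the instance at position `index`."""
    return struct.unpack_from(INSTANCE_FORMAT, buffer, index * INSTANCE_STRIDE)
```

A buffer like this is what would get uploaded to the GPU in one go; the vertex shader then reads the record for the current instance and applies it to the shared quad.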

So, does Godot support instancing? Yes, using MultiMesh ... but MultiMesh is tightly coupled with Godot's rendering infrastructure, which is possibly tied to nodes being the units of rendering. Uh-oh. I've been trying to use MultiMeshInstance2D, but I've been very unsuccessful. It looks like the class is in some sort of messy state as of now. Also, the fact that transforms have to be set explicitly from C# is not promising either, as I package things up in GPU buffers. There is a useful video that shows how you can use it, but it involves a lot of manual editor steps, which I'm not a fan of at all.

Alright, now here's a crazy idea. Why not go down to the metal? The RenderingDevice, which I've already familiarised myself with, is quite low level and easily supports things like instancing. More importantly, it allows for fully custom rendering, which I like the sound of. Now, I can already hear you: "Why reinvent the wheel?", "Sounds very complicated", "Sounds like a rabbit hole" and so on. But the truth is, my rendering needs are very simple. So, excluding UI, which can be rendered traditionally, what are my rendering requirements?

  • Render a quad with a shader (overworld background, level background layers, post-process effects)
  • Render many instances of a quad with a shader (sprites, particles, ... everything else)

Yep, that's it. Doesn't look that complicated now, right? So, what's needed from the RenderingDevice? Creating and using framebuffers, textures, render pipelines, shaders, framebuffer formats, vertex formats, push constants, etc. All the gory details.

After a while, having crashed into some bugs and hit some limitations along the way, I got a proof of concept: using the low-level rendering engine to do some low-level work (instanced rendering) and render to a framebuffer whose results are shown on a sprite! I can't begin to express how exciting this is. The cost is 0.8 ms to copy from the framebuffer to the Sprite2D texture. Fine.

Now for the final test: create and bind a texture array and use that to render sprite instances. After half an hour, success! This changes everything. Proof of concept is complete. The non-UI game visuals will be rendered with the low-level rendering engine. I'm going to write a wrapper around the Vulkan code, so that it exposes exactly what I need (which is not that much).

A low-level custom render pipeline

This might seem like a rabbit hole, but my gut feeling says otherwise. The intent here is to use the low-level Vulkan wrappers to do most of the game rendering (including camera handling), and use the regular Godot functionality to render the GUI. The low-level wrappers are more complete and allow far greater freedom in rendering, but they are far more verbose. My rendering requirements are non-standard but limited in variety, so the verbosity can be handled without much pain.

One fresh casualty of the custom render pipeline is Godot's Camera2D node. Since the game graphics will be rendered on a sprite that fills the screen, the camera should follow suit and be defined at this low level. Thankfully, it's not that hard to make a 2D camera, especially with helpers like the GLM library that can easily set up orthographic projection and view matrices.
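As a rough illustration of how little a 2D camera needs, here is a minimal sketch in plain Python rather than GLM. The class name `SimpleCamera2D`, its fields and the `transform_point` helper are hypothetical, and the sketch assumes that world and screen space both have y pointing down, so no axis flip is applied. It folds the view (pan, plus a shake offset) and the orthographic projection (zoom and viewport size) into a single 3x3 matrix that maps world coordinates to normalised device coordinates in [-1, 1].

```python
from dataclasses import dataclass


@dataclass
class SimpleCamera2D:
    # World-space point shown at the centre of the screen.
    x: float = 0.0
    y: float = 0.0
    # zoom > 1 magnifies, zoom < 1 shows more of the world.
    zoom: float = 1.0
    # Temporary offset added on top of the position, e.g. for screenshake.
    shake_x: float = 0.0
    shake_y: float = 0.0

    def view_projection(self, viewport_w, viewport_h):
        """Return a row-major 3x3 matrix mapping world space to NDC."""
        sx = 2.0 * self.zoom / viewport_w
        sy = 2.0 * self.zoom / viewport_h
        cx = self.x + self.shake_x
        cy = self.y + self.shake_y
        return [
            [sx, 0.0, -cx * sx],
            [0.0, sy, -cy * sy],
            [0.0, 0.0, 1.0],
        ]


def transform_point(m, px, py):
    """Apply a 3x3 matrix to the 2D point (px, py, 1)."""
    return (
        m[0][0] * px + m[0][1] * py + m[0][2],
        m[1][0] * px + m[1][1] * py + m[1][2],
    )
```

With this layout the camera position always lands on the NDC origin, and panning, zooming and shaking are just changes to a handful of floats before the matrix is rebuilt each frame.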

Work has been progressing on this front, slightly slower due to some holidays, but here's what's been achieved so far:

  • Rendering a full-screen quad with the "game"'s content (there's no content, just random sprite splatting for now)
  • Rendering instanced sprites
  • 2D camera, supporting pan/zoom/screenshake, working at this low level
  • Tests using multisampled render targets (I've got some pixel swimming issues)
  • Utility code to build/cache texture arrays including mipmap generation: the texture arrays will be used as the sprite atlases
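
For the mipmap part of that last item, the bookkeeping can be sketched roughly as follows. This is plain Python with hypothetical helper names, not the project's utility code, and the 2x2 box filter is only an assumed, simplest-possible downsampling choice. The sketch computes how many mip levels a layer needs, the size of each level, and one halving step on a grayscale grid.

```python
def mip_level_count(width, height):
    """Number of levels in a full mip chain, down to 1x1."""
    return max(width, height).bit_length()


def mip_chain(width, height):
    """Sizes of every mip level, halving each dimension but never below 1."""
    sizes = []
    w, h = width, height
    for _ in range(mip_level_count(width, height)):
        sizes.append((w, h))
        w, h = max(1, w // 2), max(1, h // 2)
    return sizes


def downsample_box(pixels):
    """Halve a 2D grid of grayscale values by averaging each 2x2 block.

    Assumes both dimensions are even.
    """
    h, w = len(pixels), len(pixels[0])
    return [
        [
            (pixels[y][x] + pixels[y][x + 1]
             + pixels[y + 1][x] + pixels[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]
```

In a texture array every layer shares the same size and therefore the same chain, so the level sizes only need to be computed once per array.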

Closing with a video of a custom camera panning/zooming/shaking over 10 million sprite instances at 60fps. Not bad!