For my korlibs libraries I have several useful classes to be used for games where allocation-free is really important, and sometimes you have to do less appealing code to make it fast and stutter-free frames. Inline classes are already available as experimental feature in Kotlin 1.2.30, this will allow to use primitives and still be able to use concepts like TimeSpan, RGBA, Angle, Anchor and stuff or for example reuse a double vector, as an Int vector without recreating instances. And that’s pretty handy. But still, there are places where we need several fields. For example Vector2 or Rectangle. For Vector2 we could even pack two floats in a single Long, but Long in JS at the moment and before BigInt is really spread, requires allocation and packing a vector might not have the expected performance. Or maybe yes? We should try first.

At any rate, in KorMA I have an immutable Vector interface (Vector2) and a Mutable (MVector2) vector implementation. The MVector2 also have some methods to make vector operations without allocations, but you have to be explicit about the who contains the output and instead of operators, you have to use less appealing functions. So after a bit thinking about how Kotlin/Native was handling memory areas, I decided to do an experiment:

I had this code for rendering TileMaps in KorGE (that preallocates tt2, … tt8 in the TileMap instance), and calls the setToAdd and setToMul methods to compute the points for each tile fast. This code is allocation-free per frame, but is pretty hard to read:

And with the trick I implemented, the code changed to (I’m computing p0, p1, p2 and p3 using plain + and * operators), and still the code is allocation-free:

How it works?

Note: in addition to the trick itself I’m using the inline + Number trick to allow to use Int, Long, Float and Double arguments with a single signature without having to box stuff.

The idea here is to preallocate a list of points big enough. (I could grow it dynamically or create a global pool, but that would introduce a overhead I want to avoid at least until benchmarked).

And then define the Vector2 operations inside this context. It keeps a pointer to the current unused point, the invoke is in charge of making it work like an alloca (allocating in the stack, and thus dropping the stuff once the block exits).

The defined operators inside this block, allocate one point per call and can be reused once the block is done. The overhead is very low since we have a single array with a pointer, and entering the block only changes the pointer and it is inline.

Of course, you can only use the points you generate locally to that function. If you want to make those points accessible from outside, you have to copy their values. This is not controlled neither at compile-time, editing-time or runtime, so you have to be careful about it. But still, it is nicer than having to allocate, or having to write complex method names and keep and manage temporal variables like in any non-stack-based assembly language.