Input handling, compression, and I/O
ECSE's replay system is tied in many ways to its input system, so I'll cover it briefly here.
One of my goals with ECSE was to make user input as painless as possible to manage. To me, this meant designing a system that would allow easy re-binding of keys while also allowing digital inputs (e.g. keyboard keys and controller buttons) and analog inputs (e.g. analog sticks and pressure-sensitive buttons) to be used interchangeably without a lot of boilerplate code.
To set up the system, the user creates two enums: one that describes input "modes" (e.g. Keyboard, Controller, MouseAndKeyboard), and one that describes "bindings" (e.g. Jump, Attack, MoveHorizontal). For each mode, the user then assigns input "source" functions to each of the bindings. These functions may return a boolean, an int, or a float; numeric values must lie between -1 and 1 inclusive. For instance, a user could bind MoveHorizontal to a function that returns -1 when A is pressed, 1 when D is pressed, and 0 otherwise when in Keyboard mode; then in Controller mode, they could bind the same binding to a function that returns a float between -1 and 1 based on the horizontal axis of the joystick. Generally, the user won't even have to write these functions themselves, as shortcut functions are provided to easily bind single keys, pairs of keys, joystick axes, etc.
When the user wants to poll input, they call a function with arguments specifying a binding and a type (either int or float). The source function corresponding to that binding for the current input mode is called, and its return value is converted to the requested type. For instance, if the user requests a float value and the input source returns a float, then they get its original value; if the input source returns an integer, then the integer is cast to a float before being returned. Going the other way, if the user requests an integer from a floating-point source, then the float is compared against a threshold (user-configurable per binding) to convert it to -1, 0, or 1. This is very convenient for setting up things like movement controls, such as in the example below:
void setup() {
    // -1 when A is pressed, 0 when neither is pressed, 1 when D is pressed
    manager.bindInput(MoveHorizontal, Keyboard, sf::Keyboard::A, sf::Keyboard::D);

    // X-axis of joystick 0, scaled to [-1, 1]
    manager.bindInput(MoveHorizontal, Controller, 0, sf::Joystick::X);
}

void movePlayer() {
    // If the input mode is set to Keyboard, this will give us either -10.f, 0.f, or 10.f.
    // If the input mode is set to Controller, then the value can vary between -10.f and 10.f
    // depending on the joystick's position.
    float xSpeed = 10.f * manager.getFloatValue(MoveHorizontal);
    position.x += xSpeed;
}
Effectively, the heavy lifting of taking a variety of input methods and converting them into a single number representing player movement is handled behind the scenes by the engine, making it fairly painless to support many different control schemes.
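The conversion rules described earlier (a plain cast from integer to float, a threshold test from float to integer) might look roughly like the sketch below. The function names and the exact comparison are assumptions made for illustration; they are not ECSE's actual API.

```cpp
#include <cassert>

// Hypothetical helper: collapse an analog reading in [-1, 1] to -1, 0, or 1
// using a per-binding threshold. Readings inside (-threshold, threshold)
// count as "not pressed".
int toDigital(float value, float threshold) {
    assert(threshold > 0.f && threshold <= 1.f);
    if (value >= threshold) return 1;
    if (value <= -threshold) return -1;
    return 0;
}

// Going the other way is a plain cast: -1, 0, 1 become -1.f, 0.f, 1.f.
float toAnalog(int value) {
    return static_cast<float>(value);
}
```

With a threshold of 0.5, a stick pushed 80% of the way to the right reads as 1, while a stick resting at 20% reads as 0.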
As all input in ECSE passes through the input manager, and since the engine is set up to be entirely deterministic, playing back replays is really just a matter of playing back the same series of inputs. Further, ECSE actually stores all inputs as integers internally, including floating-point input sources. This makes it very easy to save and retrieve our input data: each frame while recording, we can write out the internal integer value corresponding to every input source. When playing back a replay, we assume that the same number of inputs are bound, read that many values from the file, and update the input sources accordingly.
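As a minimal sketch of this naive scheme, assuming the replay is simply a flat list of 32-bit values with one entry per source per frame (the function names and the in-memory buffer are hypothetical, standing in for ECSE's file I/O):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Recording: append the internal value of every input source, every frame.
void recordFrame(const std::vector<std::int32_t>& sourceValues,
                 std::vector<std::int32_t>& replay) {
    replay.insert(replay.end(), sourceValues.begin(), sourceValues.end());
}

// Playback: assume the same number of sources is bound as when recording,
// and read exactly that many values for the requested frame.
std::vector<std::int32_t> playFrame(const std::vector<std::int32_t>& replay,
                                    std::size_t frame, std::size_t sourceCount) {
    const std::size_t start = frame * sourceCount;
    assert(start + sourceCount <= replay.size());
    return std::vector<std::int32_t>(replay.begin() + start,
                                     replay.begin() + start + sourceCount);
}
```

Because the format carries no per-value labels, playback only works if the bindings are registered in the same order as during recording.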
Naturally, this results in large replay files: input updates occur dozens of times per second. If each binding's value is a 4-byte integer, then at 60 updates per second, a game with 10 bindings will use 2.4 KB per second. Though certainly a significantly smaller file than a video recording, we can do much better!
The first, easiest optimization is to reduce the precision of our floating-point values from 32-bit to 8-bit. This isn't really noticeable to the user, and since all our inputs use the same representation internally, this allows us to store our boolean and integer values (which only have 2 or 3 possible values respectively) in 8-bit form as well. Effectively, this reduces our file size by 75%. Not bad, but we can still do better!
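One way to picture the 8-bit representation is the sketch below, which assumes values in [-1, 1] are mapped onto the signed range [-127, 127]. That mapping and the function names are illustrative guesses, not ECSE's confirmed encoding. The `static_assert` lines restate the size arithmetic from the text.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed encoding: map [-1, 1] onto [-127, 127], rounding to nearest.
std::int8_t packValue(float value) {
    assert(value >= -1.f && value <= 1.f);
    return static_cast<std::int8_t>(std::lround(value * 127.f));
}

// Inverse mapping back to [-1, 1].
float unpackValue(std::int8_t packed) {
    return static_cast<float>(packed) / 127.f;
}

// Bytes written per second when every binding is stored on every update.
constexpr int bytesPerSecond(int bytesPerValue, int bindings, int updatesPerSecond) {
    return bytesPerValue * bindings * updatesPerSecond;
}

static_assert(bytesPerSecond(4, 10, 60) == 2400, "32-bit values: 2.4 KB per second");
static_assert(bytesPerSecond(1, 10, 60) == 600, "8-bit values: 75% smaller");
```

Digital sources survive the round trip exactly (-1, 0, and 1 map to -127, 0, and 127), while analog values lose a little precision.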
The next optimization I made was based on a Gamasutra article by Cyrille Wagner: we should avoid recording every input on every frame, because much of the time, we're just writing the same numbers over and over. Instead, we can check which input sources actually changed each frame and save only their values. On each frame, we write out the number of inputs that changed, then for each changed input, we write its 8-bit mode ID, its 8-bit binding ID, and its 8-bit value. Obviously this adds more overhead per input, as we have to write 3 bytes instead of 1. However, unless the player is pressing and releasing every input on every frame, we will save considerably in the long term, as we only have to write those 3 bytes once for each change.
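The per-frame layout described here (a count, then a mode/binding/value triple per change) could be written along these lines. The `SourceState` struct, the function name, and the byte buffer are assumptions for the sake of the sketch, and it assumes no more than 255 sources so the count fits in one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One input source, identified by its mode and binding IDs (hypothetical layout).
struct SourceState {
    std::uint8_t mode;
    std::uint8_t binding;
    std::int8_t value;
};

// Appends one frame to `out`: a count byte, then (mode, binding, value) for
// every source whose value differs from the previous frame.
void writeFrameChanges(const std::vector<SourceState>& previous,
                       const std::vector<SourceState>& current,
                       std::vector<std::uint8_t>& out) {
    assert(previous.size() == current.size());
    assert(current.size() <= 255);
    const std::size_t countIndex = out.size();
    out.push_back(0);  // placeholder for the change count, patched below
    std::uint8_t changed = 0;
    for (std::size_t i = 0; i < current.size(); ++i) {
        if (current[i].value == previous[i].value) continue;
        out.push_back(current[i].mode);
        out.push_back(current[i].binding);
        out.push_back(static_cast<std::uint8_t>(current[i].value));
        ++changed;
    }
    out[countIndex] = changed;
}
```

A frame with one change costs 4 bytes and a frame with no changes costs 1 byte, compared with one byte per source on every frame in the previous scheme.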
Cyrille Wagner's article also points out that floating-point values wreak havoc on a change-based replay system: even if the player is holding their analog stick in one direction, it will likely wiggle around slightly, meaning that we'll have to re-record its position many times, effectively defeating the point of only saving changes. To reduce this problem, I round floating-point inputs down to only 16 possible values rather than the 256 afforded by an 8-bit value. This means that even if the joystick moves slightly, the change won't be recorded. It does come at the expense of lower input accuracy, but I found it to be barely noticeable in playtesting — and, of course, it is easily configurable by the engine's user if they want to increase accuracy at the expense of larger replays.
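The exact rounding ECSE uses isn't spelled out here, so the following is only a sketch of the idea. It assumes symmetric rounding to 7 steps on each side of zero, which yields 15 distinct values (close to the 16 mentioned above) and keeps a centered stick at exactly 0. The name `snapAnalog` and the `stepsPerSide` parameter are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Snap an analog value in [-1, 1] to a coarse grid so that small wiggles
// map to the same stored value. A larger stepsPerSide gives more accuracy
// at the cost of more recorded changes.
float snapAnalog(float value, int stepsPerSide = 7) {
    assert(stepsPerSide >= 1);
    const float steps = static_cast<float>(stepsPerSide);
    return std::round(value * steps) / steps;
}
```

With the default setting, readings of 0.60 and 0.62 both snap to 4/7, so a slight wobble while holding the stick produces no new replay entry. Values that sit right on a rounding boundary can still flicker between two neighboring steps.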
At this point I made one last optimization: on frames when no inputs have changed (which, ideally, should be most of the time), we still have to write out a 0 to indicate that no inputs have changed. We can avoid this by only writing on frames when input actually changed. This requires us to indicate at which frame the change occurred. My initial thought was just to write out the frame number. But what size of integer do we use to represent the frame? ECSE uses a 32-bit number to keep track of the current frame, but this would require writing an additional 4 bytes for every frame in which a change occurred. If we reduce to a 16-bit number, then a replay can only be 2^16 frames long (at 60 FPS, that's about 18 minutes). Neither of these sounded ideal to me, so I looked for a middle ground.
I then realized that while actively playing, players tend to change an input source's value at least every second or two, if not more often. At 60 FPS, this means that the number of frames between input changes will generally be less than 255, which fits neatly in an 8-bit integer! Rather than writing the frame at which the input change occurred, we can just write the number of frames elapsed since the last change (which I call "delta time"). Of course, the value may sometimes be 255 or larger if the player isn't pressing buttons for an extended period of time. In that situation, we write 255 for the delta time, followed by the full 32-bit representation of the frame on which the input change occurred. Effectively, we fall back to the more expensive 32-bit representation only when necessary.
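The delta-time scheme with its 255 escape value can be sketched as a pair of write/read helpers. The function names, the in-memory byte buffer, and the little-endian byte order for the 32-bit fallback are assumptions; the text only specifies the one-byte delta and the 255 marker followed by the full frame number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kLongGap = 255;  // escape marker: a full frame number follows

// Writes the time stamp for a frame in which at least one input changed.
void writeTimestamp(std::uint32_t frame, std::uint32_t lastChangeFrame,
                    std::vector<std::uint8_t>& out) {
    assert(frame >= lastChangeFrame);
    const std::uint32_t delta = frame - lastChangeFrame;
    if (delta < kLongGap) {
        out.push_back(static_cast<std::uint8_t>(delta));  // common case: 1 byte
        return;
    }
    out.push_back(kLongGap);                              // rare case: 1 + 4 bytes
    for (int shift = 0; shift < 32; shift += 8) {         // little-endian (assumed)
        out.push_back(static_cast<std::uint8_t>((frame >> shift) & 0xFF));
    }
}

// Reads a time stamp at `pos`, advances `pos`, and returns the absolute frame.
std::uint32_t readTimestamp(const std::vector<std::uint8_t>& in, std::size_t& pos,
                            std::uint32_t lastChangeFrame) {
    const std::uint8_t delta = in.at(pos++);
    if (delta != kLongGap) return lastChangeFrame + delta;
    std::uint32_t frame = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        frame |= static_cast<std::uint32_t>(in.at(pos++)) << shift;
    }
    return frame;
}
```

A change 30 frames after the previous one costs a single byte, while a change after a long pause costs five. Since 255 is reserved as the marker, a gap of exactly 255 frames also takes the long form.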
At this point, the replay system works very well for my purposes and produces incredibly small files. However, there are still some areas that could be improved.
One drawback of an input-based replay system is that fast-forwarding and especially rewinding the replay can be very expensive as we have to simulate every frame sequentially. The solution to this problem would be to save occasional "key frames" so we can start from the middle of a replay and simulate a few frames up to the desired position rather than starting from the very beginning. This would require a way of serializing the game's state, which is not currently built into ECSE, so it would be a big undertaking.
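To illustrate how key frames would cut down the work of seeking, here is a purely hypothetical sketch. It assumes a state snapshot is saved every `interval` frames starting at frame 0, which ECSE does not currently do, and it only computes which snapshot to restore and how many frames would still need to be simulated.

```cpp
#include <cassert>
#include <cstdint>

// Result of planning a seek: which key frame to restore, and how many frames
// of recorded input to replay afterwards to reach the target.
struct SeekPlan {
    std::uint32_t keyFrame;
    std::uint32_t framesToSimulate;
};

// Hypothetical: key frames exist at every multiple of `interval`.
SeekPlan planSeek(std::uint32_t targetFrame, std::uint32_t interval) {
    assert(interval > 0);
    const std::uint32_t keyFrame = targetFrame - (targetFrame % interval);
    return {keyFrame, targetFrame - keyFrame};
}
```

With key frames every 300 frames, jumping to frame 1000 would mean restoring the snapshot at frame 900 and simulating 100 frames instead of 1000.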
Another downside of this system is that it doesn't handle mouse position very nicely. The mouse position changes frequently, and unlike an analog stick, rounding its value leads to ugly consequences, so storing its 32-bit X and Y coordinates can bloat a replay's file size by quite a bit. At the moment, the best solution I've found is to disable saving the mouse's position if it's not being used, which is an option provided by ECSE out of the box. If the mouse position is needed, replays inevitably become larger for the time being. Saving the mouse's position as a delta in a similar manner to the delta time mentioned above may be one means of reducing this impact somewhat.