05-20-13 - Thoughts on Data Compression for MMOs

I've been thinking about this a bit and thought I'd scribble some ideas publicly. (By "MMO" I mean not necessarily just MMOs, but any central game server with a very large number of clients that cares about total bandwidth.)

The situation is roughly this : you have a server and many clients (let's say 10k clients per server just to be concrete). Data is mostly sent server->client , not very much is client->server. Let's say 10k bytes per second per channel from server->client, and only 1k bytes per second from client->server. So the total data rate from the server is high (100 MB/s) but the data rate on any one channel is low. The server must send packets more than once per second for latency reasons; let's say 10 times per second, so packets are only 1k on average; server sends 100k packets/second. You don't want the compressor to add any delay by buffering up packets.
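As a sanity check, the scenario numbers above work out like this (a back-of-envelope sketch; the figures are the hypothetical ones from the text, not measurements) :

```python
# Hypothetical scenario figures from the text above.
CLIENTS = 10_000
BYTES_PER_SEC = 10_000      # server -> client, per channel
PACKETS_PER_SEC = 10        # per channel, for latency

total_server_rate = CLIENTS * BYTES_PER_SEC           # bytes/sec off the server
avg_packet_size = BYTES_PER_SEC // PACKETS_PER_SEC    # bytes per packet
total_packet_rate = CLIENTS * PACKETS_PER_SEC         # packets/sec

print(total_server_rate, avg_packet_size, total_packet_rate)
# -> 100000000 1000 100000 : 100 MB/s total, 1k packets, 100k packets/sec
```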

I'm going to assume that you're using something like TCP so you have guaranteed packet order and no loss, so that you can use previous packets from any given channel as encoder history on that channel. If you do have an error in a connection you have to reset the encoder.

This is a rather unusual situation for data compression, and the standard LZ77 solutions don't work great. I'm going to talk only about the server->client transmission for now; you might use a completely different algorithm for the other direction. Some properties of this situation :

1. You care more about encode time than decode time. CPU time on the server is one of the primary limiting factors. The client machine has almost no compression work to do, so decode time could be quite slow.

2. Per-call encode time is very important (not just per-byte time). Packets are small and you're doing lots of them (100k packets/sec), so you can't afford long startup/shutdown times for the encoder. This is mostly just an annoyance for coding, it means you have to be really careful about your function setup code and such.

3. Memory use on the server is a bit limited. Say you allow 4 GB for encoder states; that's only 400k per client. (this is assuming that you're doing per-client encoder state, which is certainly not the only choice).

4. Cache misses are much worse than a normal compression encoder scenario. Say you have something like a 256k hash table to accelerate match finding. Normally when you're compressing you get that whole table into L2 so your hash lookups are in cache. In this scenario you're jumping from one state to another all the time, so you must assume that every memory lookup is a cache miss.

5. The standard LZ77 thing of not allowing matches at the beginning or end is rather more of a penalty. In general all those inefficiencies that you normally have on tiny buffers are more important than usual.

6. Because clients can be added at any time and connections can be reset, encoder init/reset time can't be very long. This is another reason aside from memory use that encoder state must be small.

7. The character of the data being sent doesn't really vary much from client to client. This is one way in which this scenario differs from a normal web server type of situation (in which case, different clients might be receiving vastly different types of data). The character of the data can change from packet to packet; there are sort of a few different modes of the data and the stream switches between them, but it's not like one client is usually receiving text and another one is receiving images. They're all generally receiving bit-packed 3d positions and the same type of thing.

And now some rambling about what encoder you might use that suits this scenario :

A. It's not clear that adaptive encoding is a win here. You have to do the comparison with CPU use held constant : if you just compare an encoder running adaptively against the same encoder with a static model, that's not fair, because the static model is so much faster that you could afford a more sophisticated encoder. The static model can also use vastly more memory. Maybe not a whole 4G, but a lot more than 400k.

B. LZ77 is not great here. The reason we love LZ77 is the fast, simple decoder, and we don't really care about that here. An LZP or ROLZ variant would be better : the decoder is slightly slower and more memory-hungry, but the encoder is simpler and faster.

C. There are some semi-static options. Perhaps a static match dictionary with something like LZP, and then an adaptive simple context model per channel. That makes the per-channel adaptive part small in memory, but still allows some local adaptation for packets of different character. Another option would be a switching static-model scheme. Do something like train 4 different static models for different packet types, and send 2 bits to pick the model then encode the packet with that static model.
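The switching static-model scheme can be sketched like so. This is a toy, assuming order-0 byte models and ideal entropy coding; the function names and the add-one smoothing are my assumptions, not from the post :

```python
import math
from collections import Counter

def train_static_model(training_bytes):
    """Order-0 static model : byte -> probability, with add-one smoothing
    so no byte gets probability zero."""
    counts = Counter(training_bytes)
    total = len(training_bytes) + 256
    return [(counts.get(b, 0) + 1) / total for b in range(256)]

def cost_in_bits(model, packet):
    """Ideal entropy-coded size of 'packet' under 'model',
    plus 2 bits for the model selector."""
    return 2 + sum(-math.log2(model[b]) for b in packet)

def pick_model(models, packet):
    """Return the index of the static model that codes this packet cheapest."""
    costs = [cost_in_bits(m, packet) for m in models]
    return min(range(len(models)), key=costs.__getitem__)
```

In a real encoder you'd train the (say) 4 models offline on captured traffic, and the 2-bit selector would be the only per-packet mode overhead.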

D. Static context mixing is kind of appealing. You can have static hash tables and a static mixing scheme, which eliminates a lot of the slowness of CM. Perhaps the order-0 model is adaptive per channel, and perhaps the secondary-statistics table is adaptive per channel. Hitting 100 MB/s might still be a challenge, but I think it's possible. One nice thing about CM here is that it can have the idea of packets of different character implicit in the model.
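The core of a static mixing step might look like this : each model contributes a probability for the next bit, and a fixed weight vector (trained offline) combines them in the logistic domain. A minimal sketch; real CM mixers select weights by context, and the per-channel pieces here would be adaptive :

```python
import math

def squash(x):
    """Logistic : map a stretched value back to a probability in (0,1)."""
    return 1.0 / (1.0 + math.exp(-x))

def stretch(p):
    """Inverse logistic : map a probability to the mixing domain."""
    return math.log(p / (1.0 - p))

def mix_static(probs, weights):
    """Static logistic mixing : combine each model's P(next bit = 1)
    with fixed weights. No weight update, so no per-call adaptation cost."""
    return squash(sum(w * stretch(p) for w, p in zip(weights, probs)))
```

With equal weights summing to 1, two agreeing models just pass their prediction through; disagreeing models pull the mix toward 0.5.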

E. For static-dictionary LZ, the normal linear offset encoding doesn't really make a lot of sense. Sure, you could try to optimize a dictionary by laying out the data in it such that more common data is at lower offsets, but that seems like a nasty indirect way of getting at the solution. Off the top of my head, it seems like you could use something like an LZFG encoding. That is, make a Suffix Trie and then send match references as node or leaf indexes; leaves all have equal probability, nodes have a child count which is proportional to their probability (relative to siblings).
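Concretely, selecting one child of a trie node with probability proportional to its leaf count costs this many bits under ideal arithmetic coding (a sketch of the cost model only; a real encoder would walk the trie node by node and code each step) :

```python
import math

def child_code_bits(child_leaf_counts, chosen):
    """Ideal bits to select one child of a suffix-trie node when each
    child's probability is its leaf count over the node's total."""
    total = sum(child_leaf_counts)
    return -math.log2(child_leaf_counts[chosen] / total)
```

So a child holding half the node's leaves costs 1 bit to select, a child with a quarter costs 2 bits, which is exactly the "more common data is cheaper to reference" property you'd otherwise fake with dictionary layout.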

F. Surely the ideal solution is a blended static/dynamic coder. That is, you have some large trained static model (like a CM or PPM model, or a big static dictionary for LZ77) and then you also run a local adaptive model in a circular window for each channel. Then you compress using a mix of the two models. There are various options on how to do this. For LZ77 you might send 0-64k offsets in the local adaptive window, and then 64k-4M offsets as indexes into the static dictionary. Or you could more explicitly code a selector bit to pick one of the two and then an offset. For CM it's most natural, you just mix the result of the static model and the local adaptive model.
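The offset-space split for the LZ77 variant could be as simple as this (the 64K/4M sizes are the example numbers from the text; the names are mine) :

```python
LOCAL_WINDOW = 1 << 16   # 64 KB per-channel adaptive window (example size)
STATIC_DICT = 1 << 22    # 4 MB shared static dictionary (example size)

def classify_offset(offset):
    """Map a raw offset into (source, index) : the low 64K of offset space
    refers back into the channel's local window, everything above it
    indexes the shared static dictionary."""
    if offset < LOCAL_WINDOW:
        return ("local", offset)
    index = offset - LOCAL_WINDOW
    if index < STATIC_DICT:
        return ("static", index)
    raise ValueError("offset out of range")
```

The alternative mentioned above, an explicit selector bit plus an offset, lets each half use its own offset statistics at the cost of one more coded decision.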

G. What is not awesome is model preconditioning (and it's what most people do, because it's the only thing available with off-the-shelf compressors like zlib or whatever). Model preconditioning means taking an adaptive coder and initially loading its model (eg. an LZ dictionary) from some static data; then you encode packets adaptively. This might actually offer excellent compression, but it has bad channel startup time, and high memory use per channel, and it doesn't allow you to use more efficient algorithms that are possible with fully static models (such as different types of data structures that provide fast lookup but not fast update).
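Python's zlib exposes exactly this preconditioning mechanism via the zdict parameter, which also makes the per-channel cost visible : every stream needs its own compressobj seeded with the dictionary. A minimal sketch (one packet per compressobj here just for illustration; a real channel would keep its objects alive and flush per packet) :

```python
import zlib

def compress_with_precondition(packet, preset_dict):
    """Precondition zlib's LZ window with a preset dictionary, then
    compress one packet. Building this state is the per-channel
    startup cost described above."""
    c = zlib.compressobj(level=6, zdict=preset_dict)
    return c.compress(packet) + c.flush()

def decompress_with_precondition(data, preset_dict):
    """Decoder must be seeded with the identical dictionary."""
    d = zlib.decompressobj(zdict=preset_dict)
    return d.decompress(data) + d.flush()
```

When the dictionary covers the packet's content well, the preconditioned stream comes out smaller than plain zlib on the same tiny packet, since early bytes can match into the dictionary instead of being sent as literals.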

I believe if you're doing UDP or some other unreliable packet scheme, then static-model compression is the only way to go (rather than trying to track the different received and transmitted states to use for a dynamic model). Also if clients are very frequently joining and leaving or moving servers, then they will never build up much channel history, so static model is the way to go there as well. If streams vary vastly in size, like if they're usually less than 1k but occasionally you do large 1M+ transfers (like for content serving as opposed to updating game state) you would use a totally different scheme for the large transfers.

I'd like to do some tests. If you work on an MMO or similar game situation and can give me some real-world test data, please be in touch.

14 comments:

Yann Collet said...

> model precondition has bad channel startup time

Are you sure ?
It seems possible to pre-build an LZ-dictionary, and then just copy/paste it for each different stream.

Yann Collet said...
This comment has been removed by the author.
Sergey Schetinin said...

With UDP you could modify compressor context when receiving confirmations of packet reception by the client. Outgoing packets would also need to include info on what confirmations were received by the server at the time that packet was compressed, so that decompressor is in sync.

Am I missing something that would stop this from working?

cbloom said...

"It seems possible to pre-build an LZ-dictionary, and then just copy/paste it for each different stream."

Well, you're not just pasting the LZ dictionary, you have to also paste the whole match acceleration structure, which is usually several X larger.

But yeah, if the precondition is small, then the time is not bad. But then you're losing the advantage of the semi-static case that the shared (static) part of the model could be much larger than the adaptive part per-channel.

cbloom said...

"With UDP you could modify compressor context when receiving confirmations of packet reception by the client. Outgoing packets would also need to include info on what confirmations were received by the server at the time that packet was compressed, so that decompressor is in sync.

Am I missing something that would stop this from working?"

It seems like a huge mess. The confirmation messages would have to contain the order of receipt (that the client saw) of the server messages. The confirmation messages themselves could go back out of order. Every message from server->client would have to indicate what state was used for the encoding of that packet.

I'm sure something is possible but I can't imagine it's worth it.

Plus UDP is pretty out of fashion these days.

Fabian Giesen said...

"Plus UDP is pretty out of fashion these days."
What gave you that impression?

Game packets are usually UDP, at least for 3D games. TCP retransmits and resulting latency spikes are completely unacceptable for any game where players can see other players' characters.

cbloom said...

I've been seeing a lot of threads around the net about games switching from UDP to TCP. It used to be always UDP back in my day of course. (and we walked uphill both ways, programmed with vacuum tubes, etc)

The reason I gather is : too many headaches with UDP through NAT and firewalls and such, and TCP latency has come way down.

I could certainly be wrong, I'm taking the pulse of the game industry with a hundred foot long pole.

Sergey Schetinin said...

I agree doing that with UDP is a hairy problem and probably not worth it, just thought it was possible.

cbloom said...

@cbloom - yeah, I think you're just wrong about that. Twitch shooter type games are all UDP still.

nothings said...

Probably it's basically : PvP requires UDP, PvE can live with TCP.

But arguably it's partly because TCP people aren't passionate about the craft and don't give a shit about turning people popping around due to 100ms latencies into people popping around even worse due to 400ms latencies.

Rasmus Bording said...

I worked on an MMO with heavy focus on PvP that used TCP exclusively. In practice there are so many things that can introduce delayed positional messages in an MMO that I don't think using UDP to avoid TCP retransmissions is worth it. Your time is better spent creating a robust positional extrapolation/interpolation system. This will cover up message delays independent of the cause of the delay. UDP is ideally better, but guaranteed delivery and ordering are awfully convenient.

We did a simple delta compression on positional data and a fast LZ77 on all data sent to clients. In practice bandwidth wasn't an issue for us, so we didn't do any more work on this.

I think it's better to not compress/pack data on a per player basis, but instead divide the game world in tiles/cubes and compress all character/player positions in each tile in one go. Then all the compressed tile packets that intersect with a player's area of interest would be sent to the player without further compression. The worst case of all players in the same place seeing everyone else can actually be handled in this way.
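[Ed: the tile idea above can be sketched like this. A toy illustration only, not the MMO's actual code; the tile size and the text encoding of positions are assumptions.]

```python
import zlib
from collections import defaultdict

TILE = 64  # world units per tile side (assumed)

def compress_tiles(positions):
    """Group (player_id, x, y) updates by tile and compress each tile's
    batch once. Every client whose area of interest covers a tile gets
    the same pre-compressed blob, so compression work is per tile,
    not per client."""
    tiles = defaultdict(list)
    for pid, x, y in positions:
        tiles[(x // TILE, y // TILE)].append(f"{pid}:{x},{y}")
    return {t: zlib.compress(";".join(rows).encode()) for t, rows in tiles.items()}
```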

SkipIntro said...

[…] We were at […] and managed to get it down to […] simple binary range encoder and an adaptive order […], but we will have to go all the way down to […]'s acceptable. Haven't really tried most of the options you mention, but I doubt they would work well for […]. [Ed: this comment is garbled in the source; gaps marked.]

cbloom said...

@nothings - I suspect in cases like WoW it's not a question of lack of passion or technical skill. I imagine they had a pro/con something like :

1. UDP is better, but 1% of users will have problems that make them return the game vs.

2. TCP is worse but doesn't make people return the game.

nothings said...

Could be.

But that still reflects a triumph of commerce over passion.
