In fact, it would let you send video more like a normal predictive context coder: for each pixel, you predict a probability for each possible value. That prediction comes from context matching, curve fitting, motion compensation, etc., and it has to be reproduced exactly in the decoder. This is a basic context coder. These kinds of coders take a lot of CPU power, but they're actually much simpler conceptually and architecturally than something like H264. Basically you're just doing the well-understood Model-Coder paradigm: you feed probabilities to an arithmetic coder, so your only goal is to make the probabilities in your model more accurate.
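A minimal sketch of that idea, just to make the Model-Coder split concrete: an adaptive context model where the context is simply the previous pixel, and the coded size is the ideal arithmetic-coder cost -log2(p). This is a hypothetical toy (4-bit pixels, 1-D scan), not a real codec; the point is that the decoder can mirror the model exactly because every prediction uses only already-decoded pixels.

```python
import math
from collections import defaultdict

def code_cost_bits(pixels):
    """Estimate coded size of a stream of 4-bit pixels (0..15) under a
    simple adaptive context model. The decoder can reproduce every
    prediction, since each one depends only on already-decoded pixels."""
    # context = previous pixel value; model = adaptive count histogram
    counts = defaultdict(lambda: [1] * 16)  # Laplace-smoothed counts
    total_bits = 0.0
    prev = 0
    for px in pixels:
        hist = counts[prev]
        p = hist[px] / sum(hist)
        total_bits += -math.log2(p)   # ideal arithmetic-coder cost
        hist[px] += 1                 # update the model after coding
        prev = px
    return total_bits
```

A constant stream rapidly becomes nearly free as the model adapts, while a less predictable stream costs more; a better model (curve fitting, mocomp, etc.) just means sharper probabilities and fewer bits.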
Other, slightly less ambitious possibilities include things like 3D directional wavelets for the transform. Again you're eliminating the traditional "mocomp" step, but building it into the transform instead.
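To illustrate the flavor of folding time into the transform, here's one level of a plain (non-directional) separable 3D Haar transform over (t, y, x). This is a sketch under simplifying assumptions (even dimensions, one level, Haar rather than a directional wavelet); the key property is that for static content the temporal high-pass band vanishes, which is the work mocomp would otherwise do.

```python
import numpy as np

def haar_1level_3d(video):
    """One level of a separable 3-D Haar transform over a (t, y, x)
    array with even dimensions. Low band = pairwise averages,
    high band = pairwise differences, applied along each axis in turn."""
    def haar_axis(a, axis):
        a = np.moveaxis(a, axis, 0)
        lo = (a[0::2] + a[1::2]) / 2.0   # average (low-pass)
        hi = (a[0::2] - a[1::2]) / 2.0   # difference (high-pass)
        return np.moveaxis(np.concatenate([lo, hi], axis=0), 0, axis)
    out = video.astype(np.float64)
    for ax in (0, 1, 2):                 # time first, then y, then x
        out = haar_axis(out, ax)
    return out
```

For a static clip the entire second half of the time axis (the temporal high band) is zero, so it codes to almost nothing; motion shows up as energy in that band instead of as explicit motion vectors.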
Another possibility is to do true per-pixel optical flow, and step the pixels forward frame to frame along the flow lines like an incompressible fluid (i.e. not only do the colors follow the flow, but so do their velocities). Then of course you also send deltas.
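A toy sketch of that advection step, under strong simplifying assumptions (integer per-pixel flow, clamped borders, backward lookup with the flow sampled at the destination; a real scheme would interpolate). The point is the second return value: the velocity field itself is carried along the flow, not just the colors, and the residual you'd actually transmit is the delta against this prediction.

```python
import numpy as np

def advect_frame(color, flow):
    """Step a (H, W) frame one time unit along a per-pixel flow field.
    `flow` is (H, W, 2) integer (dy, dx). Both the colors and the
    velocities themselves are advected, fluid-style."""
    H, W = color.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # backward warp: each destination pixel pulls from where it came from,
    # approximating the departure point with the flow at the destination
    src_y = np.clip(ys - flow[..., 0], 0, H - 1)
    src_x = np.clip(xs - flow[..., 1], 0, W - 1)
    return color[src_y, src_x], flow[src_y, src_x]
```

With a uniform downward shift of one pixel, each row of the predicted frame is the row above it from the previous frame, and the advected flow field is unchanged, so a steadily translating region predicts itself and its own motion for free.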
Unfortunately this is all a little bit pointless, because no other architecture is anywhere close to as flexible and powerful, so you would be making a video format that can only be played back on LRB. There's also the issue that we're getting to the point where H264 is "good enough", in the sense that you can do HD video at near-lossless quality; the files may be bigger than you'd like, but disks keep getting bigger and cheaper, so who cares.