So I finally had a deeper look to sort it out. The short answer is that LZHAM has some sort of very long initialization (even for just the decoder), which makes its speed extremely poor on small buffers. I was seeing speeds like 2 MB/s, much worse than LZMA (which generally gets 10-25 MB/s on my machine). (This is just from calling lzham_lib_decompress_memory.)
On large buffers, LZHAM is in fact pretty fast (some numbers below). Its space-speed tradeoff is very good (on large buffers); it gets almost LZMA compression with much faster decodes. Unfortunately the breakdown on small buffers makes it not a general solution at the moment IMO (it's also very slow on incompressible and nearly-incompressible data). I suspect it's something like the Huffman table construction being very slow, a fixed cost which gets hidden on large files but dominates small ones.
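To see why a fixed per-call setup cost produces exactly this pattern, here's a tiny model. Note this is purely illustrative: the init time and steady-state speed below are made-up assumptions chosen to show the shape of the effect, not measurements of LZHAM.

```python
# Illustrative model (not LZHAM's actual internals): if a decoder pays a
# fixed setup cost on every call, effective throughput on a buffer of
# `size` bytes is size / (t_init + size / stream_speed).

T_INIT = 0.5e-3        # assumed fixed init cost per call: 0.5 ms (hypothetical)
STREAM_SPEED = 100e6   # assumed steady-state decode speed: 100 MB/s (hypothetical)

def effective_mbps(size_bytes):
    seconds = T_INIT + size_bytes / STREAM_SPEED
    return size_bytes / seconds / 1e6

for size in (4 * 1024, 64 * 1024, 1024 * 1024, 64 * 1024 * 1024):
    print(f"{size:>10} bytes -> {effective_mbps(size):7.1f} MB/s")
```

Under these made-up numbers, a 4 KB buffer decodes at under 10 MB/s effective while a 64 MB buffer runs near the full 100 MB/s: the fixed cost swamps small calls and vanishes into large ones, which matches the behavior described above.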
Anyhoo, here are some numbers. Decode speed is shown in MB/s.
BTW BEWARE: don't pay too much attention to enwik8 results; compressing huge amounts of text is irrelevant to almost all users. The results on lzt99 are more reflective of typical use.
(LZMA should beat LZHAM on enwik8, but I can't be bothered to fiddle with all the compression options to find the ones that make it win; this is just setting both to "uber" (and -9) parse level and setting dict size = 2^29 for both.)
And some charts for lzt99. See the previous post on how to read the charts.