I do a lot of weird threading stuff so my first fear was that I had some kind of race. So I turned off all my threading, but it kept happening.
My next thought was some kind of uninitialized memory problem or out-of-bounds problem. The circumstances of failure jibe with the bug only happening after I have touched a lot of memory and maybe moved into a weird part of address space, or maybe I'm writing past the end of a buffer somewhere and it doesn't show up and hurt me until much later.
So I turned on my various debug allocator features and tried a bunch of things to stress that, but still couldn't get it to fail in any kind of repeatable way.
Yesterday I saw the exact same kind of bug happen in a few of my different compressors and the lightbulb finally came on in my head : maybe I have bad RAM. I ran Memtest86 and just a few seconds in, yep, bad RAM.
Phew. As pissed as I am to have to deal with this (getting into the RAM on my lappy is a serious pain in the ass) it's nice to not actually have a bizarro bug.
The failure rate of RAM in desktop-replacement lappies is around 100% in my experience. I've had two different desktop-replacement lappies in the past 8 years and I have burned out 3 RAM chips; I've blown the OEM RAM on both of them and on this one I also toasted the replacement RAM. Presumably the problem is that it just gets too hot in there and they don't have sufficient cooling. (And yes, I keep them on a screen for air flow and all that, and never actually use them on a lap or pillow or anything bad like that. Perhaps I should get one of those laptop stands that has active cooling fans.)
Also, shouldn't we have better debugging features by now?
I should be able to take any range of memory, not just page boundaries, and mark it as "no access". So for example I could take compression buffers and put little no access regions at the head and tail.
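For comparison, here is roughly the closest thing available today: a sketch that puts a no-access page before and after a buffer, assuming a POSIX system with `mmap` and `mprotect`. The names `GuardedBuffer`, `guarded_alloc` and `guarded_free` are made up for illustration. Note how it runs straight into the page-boundary limitation: the buffer is pushed up against the tail guard so an overrun of even one byte faults, but an underrun only faults once it has crossed the slack left at the head.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: a buffer sandwiched between two no-access pages.
struct GuardedBuffer {
    void*          base  = nullptr;  // start of the whole mapping (head guard page)
    std::size_t    total = 0;        // mapped bytes, guard pages included
    unsigned char* data  = nullptr;  // the usable buffer
    std::size_t    size  = 0;        // usable bytes
};

inline GuardedBuffer guarded_alloc(std::size_t size) {
    assert(size > 0 && "zero-size buffers are not handled in this sketch");
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t body = (size + page - 1) / page * page;  // round up to whole pages
    const std::size_t total = body + 2 * page;

    // Map everything as no-access, then open up only the middle.
    void* p = mmap(nullptr, total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return GuardedBuffer{};
    unsigned char* bytes = static_cast<unsigned char*>(p);
    if (mprotect(bytes + page, body, PROT_READ | PROT_WRITE) != 0) {
        munmap(p, total);
        return GuardedBuffer{};
    }

    GuardedBuffer g;
    g.base  = p;
    g.total = total;
    g.data  = bytes + page + body - size;  // end of buffer touches the tail guard
    g.size  = size;
    return g;
}

inline void guarded_free(GuardedBuffer& g) {
    if (g.base) munmap(g.base, g.total);
    g = GuardedBuffer{};
}
```

Writing to `g.data[g.size]` hits the tail guard page and faults immediately. The cost is two extra pages per buffer and a `data` pointer that may not be aligned for anything wider than a byte, which is exactly why byte-granular no-access ranges would be nicer.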
For uninitialized memory you want to be able to mark every allocation as "fault if it's read before it's written". (This requires a flag bit per byte, set when the memory is allocated and cleared on write.)
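Without hardware support, the bit-per-byte idea can only be emulated in software by routing every access through a wrapper. A minimal sketch, where `TrackedBuffer` is a hypothetical class and a thrown exception stands in for the fault:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical wrapper: one shadow flag per byte, set at allocation
// and cleared by the first write to that byte.
class TrackedBuffer {
public:
    explicit TrackedBuffer(std::size_t size)
        : bytes_(size), unwritten_(size, true) {}

    void write(std::size_t i, unsigned char v) {
        bytes_.at(i) = v;       // at() also bounds-checks the index
        unwritten_[i] = false;  // this byte is now initialized
    }

    unsigned char read(std::size_t i) const {
        if (unwritten_.at(i))   // the stand-in for a hardware fault
            throw std::logic_error("read of uninitialized byte");
        return bytes_[i];
    }

    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
    std::vector<bool>          unwritten_;  // the shadow bits
};
```

The obvious weakness is that it only catches accesses that go through `read` and `write`; a raw pointer into the buffer bypasses the check entirely, which is why this wants to live in the memory system rather than in user code.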
You could enforce true const in C by making a true_const template that marks its memory as read-only.
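With today's page-granular protection, such a template might look something like the sketch below. It assumes POSIX `mmap`/`mprotect`, burns a whole page (or more) per constant, and everything beyond the name `true_const` is invented for illustration.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <new>
#include <stdexcept>
#include <utility>

// Sketch only: constructs a T in its own page(s), then makes them read-only.
template <typename T>
class true_const {
public:
    template <typename... Args>
    explicit true_const(Args&&... args) {
        const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        bytes_ = (sizeof(T) + page - 1) / page * page;
        void* p = mmap(nullptr, bytes_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::bad_alloc();
        try {
            value_ = new (p) T(std::forward<Args>(args)...);
        } catch (...) {
            munmap(p, bytes_);
            throw;
        }
        // From here on, any write to the object faults, const_cast or not.
        if (mprotect(p, bytes_, PROT_READ) != 0) {
            value_->~T();
            munmap(p, bytes_);
            throw std::runtime_error("mprotect failed");
        }
    }

    ~true_const() {
        // The destructor may write to the object, so unprotect first.
        mprotect(static_cast<void*>(value_), bytes_, PROT_READ | PROT_WRITE);
        value_->~T();
        munmap(static_cast<void*>(value_), bytes_);
    }

    true_const(const true_const&) = delete;
    true_const& operator=(const true_const&) = delete;

    const T& get() const { return *value_; }
    const T* operator->() const { return value_; }

private:
    T*          value_ = nullptr;
    std::size_t bytes_ = 0;
};
```

Something like `true_const<int> k(42);` then gives a `k.get()` that can be read freely, while `const_cast<int&>(k.get()) = 0;` crashes on the spot instead of silently succeeding. Only the bytes of `T` itself are protected; anything `T` owns on the heap (the characters of a `std::string`, say) is still writable.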
I've ranted before about how thread debugging would be much better if we could mark memory as "fault unless you are thread X", eg. give exclusive access of a memory region to a thread.
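The software-only approximation is a wrapper that remembers which thread owns the data and checks on every access. A rough sketch, with `thread_owned` as a hypothetical name and an exception standing in for the fault:

```cpp
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical wrapper: the constructing thread becomes the exclusive owner.
template <typename T>
class thread_owned {
public:
    explicit thread_owned(T v)
        : value_(std::move(v)), owner_(std::this_thread::get_id()) {}

    T& get() {
        check();
        return value_;
    }

    const T& get() const {
        check();
        return value_;
    }

private:
    void check() const {
        // The stand-in for "fault unless you are thread X".
        if (std::this_thread::get_id() != owner_)
            throw std::logic_error("access from non-owning thread");
    }

    T                     value_;
    const std::thread::id owner_;  // fixed at construction, safe to read from any thread
};
```

This only checks at the moment `get()` is called; once a reference has escaped, any thread can use it unchecked. Real per-thread memory protection would catch every access, including the ones made through stale pointers, which are exactly the ones you care about.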
I see two good solutions for this : 1. a VM that could run your exe and add these features, or 2. special memory chips and MMUs for programmers. I certainly would pay extra for RAM that had an extra 2 bits per byte with access flags. Hell, with how cheap RAM is these days I would pay extra for more error-correction bits too; maybe even completely duplicate bytes. And self-healing RAM wouldn't be bad either (just mark a page as unusable if it sees failures in that page).
(for thread debugging we should also have a VM that can record exact execution traces and replay them, of course).