Which reminds me that the internet just doesn't work.
I mean, as a way of stealing money from stupid people, it works great, which I suppose is what the people behind the modern internet are really interested in. But as a way of presenting information in a simple, efficient, permanent, archivable format, it's shit.
Whenever I go back to one of my posts from a few years ago, one that I carefully linked to good info, half the links don't work anymore.
It's just worse than fucking five-hundred-year-old technology (books). I can buy a book and put it on my shelf and it doesn't disappear in the night.
Of course it's even worse if you make fancy pages that use AJAX or Flash (I don't even know what the new widget flavor of the month is) or proprietary formats like PPT or whatever, since that stuff will be a huge pain to keep working 10-20 years from now.
Anyway, pursuant to this I thought I should go and actually download some of my favorite pages. Unfortunately it's much harder than you might think.
Anything at Google Groups is a good example of the problem.
It sure looks like just a bunch of plain text. Oh no. It's running through some kind of crazy Google mumbo-jumbo. If you just use Firefox's "Save Page" you get 600k of shit for that tiny bit of text, and it doesn't come remotely close to saving the page correctly (though at least it does get the primary text).
If you use HTTrack to try to mirror the whole page, it downloads about 1000k of shit and fails to get a readable page AT ALL.
OMG this should not be so difficult. Text, people, text!
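For what it's worth, here is a minimal sketch of the "just give me the text" step, using only Python's standard `html.parser`. It assumes you already have the HTML saved locally and that the text you care about is actually in that HTML; if a page builds its content with scripts at load time, this won't recover it. The names `TextExtractor` and `html_to_text` are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

You'd feed it a saved page with something like `html_to_text(open("saved_page.html", encoding="utf-8").read())` (the filename is a placeholder) and write the result to a `.txt` file. It throws away links and formatting, which is a real loss, but a plain text file will still open in 20 years.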