10/07/2008

10-07-08 : Random file stuff I've learned

Random file stuff I've learned :

Good article on Windows file caching . The Windows file cache architecture is pretty sweet really. Files are cached in 256k blocks. The caching is also file "buffering". That is, file IO in the application doesn't get any buffer just for that app - you get access to the cache pages, that is your buffer. It means multiple apps on the same file share buffers. This is all pretty sweet. The caching is just a file mapping of pages in the system address space. When you create a memory mapped file, you are just getting direct access to those pages, which you can then lock to get pointers in your address space. You can lock & unlock the pages just like any pages using VirtualAlloc and VirtualFree. When you write to pages, you can control them getting flushed out the write cache by unlocking them.

Windows file caching is *extremely* greedy. It will basically throw away anything it has cached in order to cache the latest request. This makes it look very good in benchmarks, but can be pretty bad in practice. For example, I was testing by doing linear streaming IO across a 2 GB file. Windows caches every single byte of the 2 GB file so that future touches to it are instant; in order to do that, it will evict the baby and the bathwater from the cache (including stuff like the cache of the NTFS MFT table).

One thing I haven't been able to figure out is a way to manually control the prefetching that windows does. When you do a bunch of ReadFile calls in a row, the cache will asynchronously prefetch ahead the next 256k chunk. This means that even if you just call the synchronous ReadFile (or fread) to get bits of the file, the return is almost immediate. What I would like, however, is a way to manually say "prefetch at this byte". The only way I can do that at the moment is to issue a read at that byte on a thread, but that's not really ideal, it would be better to be able to signal the underlying cache directly.

Also it appears that if you use memory mapped files, windows stops doing the smart async prefetch ahead. Say you memory map a huge file, then you run through streaming touching pages in order. Windows uses the virtual page fault from the memory manager to interrupt and pull in pages from disk. It seems to not prefetch because you get a lot of faults and your app just stalls when you get to pages that aren't in memory yet. Memory mapped files are pretty awesome, it's cool that you just get access to the cache, but without the ability to request prefetches they're pretty worthless since the plain old ReadFile beats them so badly. (if you are in fact doing totally random file accesses on a huge file, then they are just about perfect).

Stdio in MSVC has a 4k buffer and then calls straight to ReadFile to fill it. The 4k buffer isn't really providing you any "buffering" it just hides function call overhead and makes the reads page aligned; the real buffering is inside ReadFile.

If you're linking with the multithreaded libs, stdio becomes outrageously slow. Personally I like my threading model to work such that individual objects, such as a "FILE" are not protected from concurrent use by the library. That is, once the API hands out an atomic object to the app, it is up to the app if it wants to protect that object from multiple threads accessing it or not. The library should only do protection from access to shared/global data structures. (I know they can't do this with stdio because they want to make sure it's safe, and they want to protect stdout/stdin/stdierr). Anyway, the result is that "getc" goes from being around 10 clocks to being around 170 clocks. Fortunately, you can fix that pretty easily by just bringing back the macro yourself. Just define this :

#define macrogetc(_stream)     (--(_stream)->_cnt >= 0 ? ((U8)*(_stream)->_ptr++) : _filbuf(_stream))

And use it to do your IO when you know the FILE is only being used by one thread.

Something I just discovered today : while Windows buffering is in fact very good for reads, it sucks for writes. If I write out a stream of data through a buffered file, I only get around 25 MB/sec. If I write out chunks with async non-buffered calls, I get 100 MB/sec. (conversely, buffered streaming input is actually faster than doing a bunch of async non-buffered reads).

10/06/2008

10-06-08 : Followup on the "Russian Range Coder"


10-06-08

Followup on the "Russian Range Coder". So, first of all let me stress once again that this thing is pointless and you should not use it. It's an "optimization" that gives up some compression and doesn't actually make you faster at all. You should never do approximations or optimizations unless they actually help, even if they look like they should be faster. Best to stick with full quality everywhere lest it come back to bite you.

But I was thinking about it and I realized you can improve the renorm loop. The original loop in the case of underflow just truncates the range down to the bottom part to make the top bits always equal to the top bits of "low". But that can be really bad in some cases, since you might be straddling a boundary, but most of your range could be above the straddle, not below. It's easy enough to just check whether you have more range above or below the straddle :


    while(rc->range < (1<<16))
    {
  
  int above = (rc->low + rc->range) & 0xFFFF;
  int below = (~rc->low) & 0xFFFF;
  //int above = rc->range - 1 - below;
  
  /*
  U32 hi = rc->low + rc->range;
  U32 hi_below = rc->low + below;
  U32 lo_above = hi & 0xFFFF0000;
  */
  
  if ( above > below )
  {
   rc->low += rc->range - above;
   rc->range = above; 
  }  
  else
  {
   rc->range = below;
  }
  
  *rc->ptr++ = (U8) (rc->low >> (RANGE_SIZE - 8));
  rc->range <<= 8;
  rc->low <<= 8;
    }

That is, we either push the top bits of low up to the top bits of hi, or push the top bits of hi down to the top bits of low, depending on which gives us more code space. Again, this doesn't hurt speed at all because it's rare. In fact, that's why you should just use a proper virtual-queue range coder that handles underflow right.

Just to make it super clear what's happening, here is an actual example :


    rc->low + rc->range            0x4d0043e4
    rc->low                        0x4cffa580
    above                          0x000043e4
    below                          0x00005a7f
    rc->low + below                0x4cffffff
    rc->low + rc->range - above    0x4d000000

Some numbers :


m:\ccc\book1 : inLen : 768,771

True Entropy : 435,042.6 : 4.527 bpb

radArith : encLen : 435,116 = 4.528 bpb
 bytes exceeding entropy : 73

original RRC : encLen : 435,284 = 4.530 bpb
 bytes exceeding entropy : 241

cbloom RRC : encLen : 435,216 = 4.529 bpb
 bytes exceeding entropy : 173

radArith cycles: 69.6
original rc cycles: 74.2
cbloom rc cycles: 70.3

Here "radArith" is a normal Michael Schindler range coder with virtual queue. My modified "russian range coder" has about half the waste of the original (compared to the canonical range coder, not compared to entropy). Note that the arithmetic coders here are all using a max cumulative frequency of 16384, while the "true entropy" uses the exact character counts in the file.

BTW these are the links to the authors of the "Russian Range Coder" - they seem to all be dead. If anyone has live links to these people let me know and I'll hook them up.

 

/*
  Arithmetic coder based on "carryless rangecoder" by D. Subbotin.
  * http://www.softcomplete.com/algo/pack/rus-range.txt
    (Original C++ code by Subbotin)
  * http://hem.spray.se/mikael.lundqvist/
    (C impelementation by M. Lundqvist)
  * H. Okumura: C magazine, Vol.14, No.7, pp.13-35, Jul. 2002
    (Tutorial of data compression with sample codes, in japanese).
*/

10/05/2008

10-05-08 : Rant on New Arithmetic Coders

Rant on New Arithmetic Coders

Can we fucking stop calling arithmetic coding "range coding" !? It's fucking arithmetic coding. "Range coding" differs from the ancient CACM arithmetic coder in exactly two ways :

1. It uses a "virtual queue" for the carry propagation, exactly as I described in my VirtQ paper . (warning : ancient!)

2. It introduces an approximation to the range update which lets it use more bits by changing the order of multiplication :

CACM :

range = range_high - range_low + 1;
range_high = range_low + (symhigh * range) / symtot - 1;
range_low += (symlow * range) / symtot;

"range coder" :

range = range_high - range_low;
range_high = range_low + symhigh * (range / symtot);
range_low += symlow * (range / symtot);

This small rearrangement of the math does in fact make a huge difference, but dude look, it's literally just moving where you put the parentheses.

Of course it means you save a divide, and it means range can go up to 31 bits since you never multiply by it. Getting range up to 31 bits means you can afford to shuffle out whole bytes at a time. In CACM the multiplicataion has to fit in 32 bits so your range can only be 16 bits so outputting bytes is a no go; range stays between 15 and 16 bits all the time in CACM while a "range coder" can go down to 23 bits; when it's down at 23 bits you only have 9 bits of coding precision because of the approximation of doing the divide before the multiply.

I'm sick of all these web pages that are like "LZMA uses a statistical algorithm called range coding". No, it uses a fucking entropy coder called arithmetic coding.

Okay, this was just an angry rant, but it reminds me of something interesting. I recently stumbled on some new arithmetic coders and they blew my mind for a minute until I figured out how they work. (BTW For reference, the canonical "range coder" is by Michael Schindler )

(BTW #2 it's sort of curious to me historically that I missed the range coder; I was so close to inventing it myself, but somehow got right next to it and didn't see it. All the pieces of the range coder were thought of previously by various people, including the original CACM guys, such as trying to use 32 bits, shifting out bytes, approximating the division, and the virtual queue, but on their own each of those ideas doesn't quite work or make a big difference, nobody put those ideas together and realized that together they make this "range coder" thing that's a big improvement).

The new arithmetic coders I've discovered are the fpaq0p binary range coder and "rc" the "russian range coder". FPAQ0p is by Ilia Muraviev, who's done lots of good stuff; you can find it on Matt Mahoney's PAQ page . The "russian range coder" used to have a web page but it appears to have disappeared. I found the code in MRP (Minimum Rate Predictors, a decent lossless image coder).

The "fpaq0p" arithmetic encoder is brutally simple. It's what you would write if you just heard of arithmetic encoding; in fact it was my first idea for a coder ever, but it didn't work, so everybody dropped it. Well, it's back. If you remember the super basic idea of an arithmetic coder is you're specifying a code stream by writing an infinite precision number; at any time you know your desired number is in some interval, low and high, you shrink the interval down to specify the symbol, and to do it in finite precision you shift out top bits of low & high as they become the same. (See my ancient write up of basic arithcoding )

So what would you write?

range_low = 0;
range_high = 0xFFFFFFFF; // 32 bits on

range = range_high - range_low;
range_high = range_low + symhigh * (range / symtot);
range_low += symlow * (range / symtot);

while( (range_low>>24) == (range_high>>24) )
{
 *outPtr+= = (range_low>>24);
 range_low <<= 8;
 range_high <<= 8;
 range_high += 0xFF;
}

Okay, good job, but this is badly broken. First of all, you're overflowing your 32 bit code. Second of all, it doesn't handle carry propagation, and it doesn't handle underflow. The problem with carries is that range can step up to the next bit and push a bit over code size. The problem with underflow is that we aren't handling the case of the range getting very very small but while the top bytes are different.

In particular, when range_high comes down close to range_low from above and range_low comes up close to range_high from below, but the top bytes are different, eg, they're like 0x800000AA and 0x7FFFFFBB , the top bytes are different so they don't get shifted out, but the range can get arbitrarily small. (in particular the top byte of "low" is X, the top byte of "hi" is X+1, the second byte of low is 0xFF and the second byte of hi is 0x00).

But fpaq0p works, so how does it do it?

The main trick is to only do binary coding. Then you just don't worry about underflow. With the underflow problem, your range can get as small as only 1 , with 0x01000000 and 0x00FFFFFF, but you implicitly add 1 to the top, so you still have enough range to code 2 symbols, which is what you need for your binary alphabet to be decodable. Any alphabet larger than binary is not codable there.

The other minor trick is just about where you do your "+1". The high is always implicitly +1 above what you actually store in range_high. We started it with 0xFFFFFFFF but we want an implicit FFFFF... repeating to infinity after it. That's why we shift in FF, but we also have to do it right in the range update in a way that doesn't blow the 32 bits or ever generate a carry.

So the working fpaq0p coder is :


range_low = 0;
range_high = 0xFFFFFFFF; // 32 bits on

range = range_high - range_low;
range_mid = range_low + symp0 * (range / symtot);
if ( bit )
 range_low = range_mid+1;
else
 range_high = range_mid;

while( (range_low>>24) == (range_high>>24) )
{
 *outPtr+= = (range_low>>24);
 range_low <<= 8;
 range_high <<= 8;
 range_high += 0xFF;
}

Which looks like it's doing the same trivial thing, but is subtly different.

Note that in the horrible underflow case your coding precision can get as bad as only 1 bit (that is, the probabilities are forced to 50/50 regardless of what they really are). Obviously that hurts coding, but it's actually not that bad, because it's statistically incredibly rare. In fact it happens something like 2^-24 of the time, since all spots on the code range are equally likely. And even when it does happen you only lose a few bits of coding efficiency.

BTW :


To do this faster :

 while( (range_low>>24) == (range_high>>24) )

You can do :

 while( ((x1 ^ x2)>>24) == 0 )

or

 while( ((x1 ^ x2)&0xFF000000) == 0 )

or

 while( (x1 ^ x2) < (1<<24) )

but in practice it doesn't really matter because you're stalled out on branches anyway.

Also in a binary arithmetic coder you never actually divide by symtot. You use symtot = some fixed power of 2, and you do a probability update that keeps tot the same. My code's got this old rung/ladder thing that does different styles of adaptation with a power of 2 total. Or you can just do the super simple thing that everybody seems to do these days :

#define symbits  (12) // or whatever < 14
#define symtot  (1<< symbits)
#define updshift (6) // or anything in {1,symbits-1}

void update(int & p0,bool bit)
{

 if ( bit )
 {
  p0 -= p0 >> updshift;
 }
 else
 {
  p0 += (symtot - p0) >> updshift;
 }
}

Here "updshift" controls how fast your adaptation is vs. how precise you become in the long term. This update makes no distinction between the initial rollup when you have no stats and the adapted state, which the rung/ladder does. This is an approximation of a normal n0/n1 counter with an implicit max count of (1 << updshift). Compare to an n0/n1 update :


void update(int & p0,bool bit)
{
 int n0 = p0;
 int n1 = symtot - p0;
 int increment = symtot >> updshift;
 if ( bit )
 {
  n1 += increment;
 }
 else
 {
  n0 += increment;
 }
 p0 = (n0 * symtot) / (n0 + n1);
}

We can see the equivalence and approximation by looking at the bit on case :


 I'll use := to mean assignment while I do maths :

 p0 := (n0 * symtot) / (n0 + n1);
 p0 := (p0 * symtot) / (symtot + incr);
 p0 := (p0 * symtot)*(symtot - incr) / (symtot + incr)*(symtot - incr);
  now approximate incr^2 is much less than symtot^2
 p0 := (p0 * symtot)*(symtot - incr) / (symtot^2);
 p0 := (p0)*(symtot - incr) / (symtot);
 p0 := p0 - (p0*incr)/symtot);
 p0 -= p0/(symtot/incr);
 p0 -= p0>>updshift;

BTW this approximation goes back to the ancient stuff by Howard & Vitter on fast arithmetic coding. Of course in practice you also want to merge your modelling with arithmetic coding since they do the same random branch on bit.

Most people these days are good at designing their compressor to drive the binary arithmetic coder well. A well driven binary arithmetic coder is generally coding probabilities close to 50%. That's better because it keeps precision high and keeps you in the accurate part of the coarse probability update approximation. It's also better because it means you're calling the coder many fewer times. If you only call the arithmetic coder with probablities close to 50% it means you're only doing coding roughly as many times as the number of bits you output, which is low. You accomplish this by coding symbols that are "natural" for the problem. For example, for LZ77 offsets you code in the log2 of the offset (similarly with BWT MTF codes). etc. etc.

So I also found this range coder implementation called the "Russian Range Coder". It's extremely similar to the standard Michael Schindler coder except the renorm & bit output loop is slightly different :

"Russian Range Coder" :

    while((rc->low ^ (rc->low + rc->range)) < (1<<24))
    {
  *rc->ptr++ = (U8) (rc->low >> 24);
  rc->range <<= 8;
  rc->low <<= 8;
    }
    
    while(rc->range < (1 << 16))
    {
  *rc->ptr++ = (U8) (rc->low >> 24);
  rc->range = ((~rc->low) & 0xFFFF) << 8;
  rc->low <<= 8;
    }

So what's going on here? The first part we can recognize is exactly the simple loop from fpaq0p that just outputs the top byte when it's the same. But if you just do that you're fucked for multisymbol alphabets because of underflow, so looky here, the second part handles that. It checks if the range is getting too small to resolve symbols. But normally in the virtual queue / schindler coder that would mean setting up a carry queue and such. This coder avoids that. It does it by just giving up a huge chunk of the range.

The bad case occurs when the range is small and the top bytes are off by 1. The second bytes are 0x00 and 0xFF. With a virtual queue, you would put the top byte of "low" in the virtual queue spot, and count the number of 0xFF bytes but don't write them yet. If you get a carry, you propagate the carry by writing the top of low+1, then implicitly flipping the 0xFF's to 0x00's and writing them. This is the virtual Q / schindler way.

Here we don't check for underflow and do the carry and such, we just smash down the "high" to the "low". We know the top bytes are not equal, so the next bytes must be 00 and FF, eg. we have something like 0x0B00XXXX and 0x0AFFXXXX , we just smash the high down to 0x0AFFFFFF. This bit :


 rc->range = ((~rc->low) & 0xFFFF) << 8;
 rc->low <<= 8;

is the same as :

 high = low | 0xFFFF;
 range = high - low;
 range <<= 8;
 low <<= 8;

which I think makes it clearer what's happening. You just chop down to make your top bytes the same, eliminating the bad straddle situation which caused underflow. Now you can renormalize the simple way by shifting out equal top bytes and get a big enough range back.

Now obviously this is an approximation that loses some efficiency, you're throwing away on average half your range in the cases that you have this bad underflow, but again it's pretty rare. I don't know the exact probabilities, but the measured efficiency is very close to the full coder.

In practice, however, this coder is not any faster than the full correct schindler coder, so there's no point to using it. It is however, very simple and elegant once you see how it works.

ps. the other new thing is the Jarek Duda lattice coder thing which I have yet to understand. It seems to offer no advantages and many disadvantages, but I would like to grok it for my own brain's happiness.

10/01/2008

10-01-08 : First Look at LZMA

So I looked into LZMA / 7-Zip a bit. So far as I know there's no real description of the algorithm available, but there is source code here . This is from just browsing the source code for a bit, so take it with a bit of skepticism.

LZMA is an arithmetic coded LZ + context coder. It has a couple of clever ideas that make it perform very well. In fact it compresses better than Quantum and is also faster (at compression - Quantum is faster at decompression, in fact the only disadvantage of LZMA is that its decomp is not super fast).

The match finder is very fast. I can't figure out what it's doing exactly. It makes strong hashes using the CRC. The docs claim it then goes to a patricia trie, but my god I can't see that in the code.

It does lots of binary arithcoding. In fact I don't think it has a multisymbol arithcoder at all, it just breaks larger alphabets into a series of binary codes. He does a pretty good job of breaking up the binary tree so that the common cases of short lengths and low offsets get written in very few binary coding ops (that is, the coding choices of 0 or 1 are close to equal probability).

There are some clever models. In general it's well tweaked to use few contexts but capture the salient information in those contexts. For example, the order-1 literal model only uses the top N bits of the previous char as context, since for most data the top bits are the good predictors and the bottom bits are semi-random. One clever model is using the bottom 2 bits of position as part of the literal context. This is great for capturing the very common pattern on binary data of a 0 every fourth byte, or any every-2 or every-4 bytes kind of patterns (such as when you write a bunch of 32 bit floats and the exponent part is the same on all of them).

It also does a semi-optimal parse. From what I can tell it looks like it finds the match possibilities N steps ahead, and then does a kind of semi-backward cost evaluation from the end of that sequence to pick the best choice at the current coding spot. I believe N is only 4, so this is sort of like the old "lazy matching" but it looks ahead 4, and rather than a rough heuristic like most people use in lazy matching he actually evaluates a coding cost like a true optimal parse.

On text it's very good, but on binary it really kicks ass, I'm guessing the biggest win is because of the pos-conditioned contexts which are really good for binary.

9/27/2008

09-27-08 : On LZ and ACB


09-27-08

On LZ and ACB

LZ77 coders are hugely inefficient. Everybody knows the basic idea of LZ77. You compress data using back references to previous data. You either write a match, consisting of an offset & length, or a literal, consisting of one raw character.

The inefficiencies are intuitively obvious. If the coding was perfect, then if you were the decoder, the encoded stream should be totally random, you shouldn't be able to predict the code stream, but of course you can. One way is with context modeling. If you see a context like "what the ", then it's highly likely the next sequence is "fuck" so offsets which point to that are far more likely. Similarly, certain offsets point at certain words, and there are natural likely match lengths for those as well.

Someday soon I'm going to write about how I'm doing "Optimal Parsing". But the very fact that you can do optimal parsing is a sign of the flaw of LZ77. The encoder has many choices of different ways to write the code stream, which means there is massive redundancy in the code space. There are many different LZ77 code streams which all correspond to the same input. The optimal parser tries to pick the best one of these, but it doesn't change the fact that all the ways to encode that you did not choose are just wasted probability space in the coded bits.

Let's look at how you might reduce the redundancy in LZ77 if you for some reason had to do LZ coding and wanted to go as far with it as you can :

First of all, whether or not a match will be found is highly predictable. Most high compression LZ coders do a lot of context modeling for the match flag, because you can tell by looking at the local neighborhood whether it's an area that's likely to match. There are also finite-state patterns that arise in many types of data (particularly binary data), where it may be very likely that the data goes match-literal-match-literal , or mat-lit-lit-mat-lit-lit-mat , or whatever. Apparently LZMA does this.

Modeling the redundancy in the offset and the length are more interesting and difficult. The first thing we see is that they are not really two separate parameters, they are highly correlated, but not in a trivial way. For example, say we simply find the longest possible match and send the {offset,length} in the normal LZ77 way. There were very similar shorter matches that we passed up. When we send just the {offset} , the decoder knows what the match string we want looks like, and it knows that we passed up all the other matches, so this offset must be the longest one. That means it can mostly infer the match length. The decoder gets the offset, that specifies the match string, then the decoder finds the other string in the window that has the longest prefix length with the match string. The actual match must be at least the prefix length + 1. You can then either just output that match, or you could send the amount that the lengths exceeds that.

Let me show an example for clarity :

T A B C X X A B C D T T * A B C D ...

current code pointer is at (*)

encoder send match offset "-6"

decoder sees the match string "ABCDTT"

it looks for the longest other matching prefix and finds "ABCXX" with shared length 3

therefore we must match of at least length 4

decoder outputs "ABCD"
(then possibly also an additional match length starting at 0)

Now, we're still sending the offsets raw, but of course the offset is highly predictable too. Given the current context (or whatever type of model you want to use), you're much more likely to match certain offsets than others. You shouldn't just code the offset as a backward index into the window, you should assign a probability to each. The "ROLZ" and "PMR" LZ coders of today do this to some extent.

(aside : ROLZ = reduced offset LZ; basically means only code matches that share some amount of context, perhaps also MTF the offsets within the contexts; you code many fewer matches, but that's a good thing because ROLZ coders generally use a strong context model for literals; in fact if you use a powerful PPM for literals then the fewer matches you code the better your compression is. PMR = previous match reference, just codes references to previous matches in fewer bits because they are more likely, encoder also favors choosing PMR's over other offsets).

We can code an arbitrary offset with some modeling by reordering them. Make offset 0 be the most likely, up to offset N the least likely. One way to do this is to order by context match length. Say the longest context match is L symbols, then all the offsets that match with a context of length L are first, then all the offsets that match at length (L-1), down to length (0). Within each context, order so that more recently seen & used offsets are lower. So to code an offset, you still find the longest match, then you see where that offset is in the list ordered by likelihood, and output that index. (alternatively you could just context code the offset in the normal context way).

This is very closely related to LZP, Block Sorting, and Fenwick's Symbol Ranking stuff ( for example ).

We can do something similar in a slightly different way by sorting. Consider all previous possible match spots. Sort them by their context string, backwards, like for PPM. Now to match your current string, find the longest match with your forward string. This is the offset you want to send. To send it, find your current context in the sorted list - you'll find the longest match (actually you'll be between two strings in the list). Send the difference between your position in the list and the position of the longest match. Hopefully this delta is usually small, because the best match usually comes from a context similar to yours. The length of match is like before - the offset implicility send the length, you send the additional length after the implicit.

Obviously our goal has been to make the offset and match length both very close to zero. I mentioned block sort and symbol ranking before; the statistics of the codes we're generating here should look very much like those statistics.

Well, holy crap, this is exactly ACB !! (ACB = Associative Coding by Buyanovsky). I found a paper on ACB by Tomas Skopal or you can read this or this on my site.

ACB was never given much attention for a few reasons. 1) George's english is awful and he could never write a decent paper on it, so it never got into academia, his terminology is all non-standard and very confusing, 2) it's way the hell off the optimal curve in terms of speed vs. compression. It's slower than PPM and can't compress as well. It is however, interesting to me in a theoretical sense, because it's sort of like as good as you can do with dictionary compression, and as we see it's extremely similar to LZP, block sorting, and symbol ranking. In fact all of these coders wind up sucking in the same way, because by simply turning your correlation knowledge into an index with low mean, you're giving away a lot of good local information that a plain context modeller could do more with.

old rants