11-02-11 - StringMatchTest Release

Code for my string match testbed discussed previously. I'm not gonna do the work to turn this into a clean standalone, so it's a big mess and you can take what you like out of it.

stringmatchtest.zip (45k)

Note : the stringmatchtest.vcproj project refers to some files that are not included in this distribution. Just delete them from the project.

Requires cblib.zip (633k)

You may also need STLPort (I haven't tried building with the VC STL; I use STLPort 5.1.5 or 5.2.1). BTW, I had to modify the STLPort headers to make it build on VS 2008; the mods should be obvious.

Tested with VC 2005 and 2008. Does not build with VC 2010 currently.

The most interesting bit is probably in test_suffixarray, which implements the three suffix-array based string searchers previously described on this blog. See previous posts :

cbloom rants 06-17-10 - Suffix Array Neighboring Pair Match Lens
cbloom rants 09-23-11 - Morphing Matching Chain
cbloom rants 09-25-11 - More on LZ String Matching
cbloom rants 09-27-11 - String Match Stress Test
cbloom rants 09-28-11 - Algorithm - Next Index with Lower Value
cbloom rants 09-28-11 - String Matching with Suffix Arrays
cbloom rants 10-02-11 - How to walk binary interval tree
cbloom rants 09-24-11 - Suffix Tries 1
cbloom rants 09-24-11 - Suffix Tries 2
cbloom rants 09-26-11 - Tiny Suffix Note
cbloom rants 09-29-11 - Suffix Tries 3 - On Follows with Path Compression

cbloom rants 09-30-11 - String Match Results Part 1
cbloom rants 09-30-11 - String Match Results Part 2
cbloom rants 09-30-11 - String Match Results Part 2b
cbloom rants 09-30-11 - String Match Results Part 3
cbloom rants 09-30-11 - String Match Results Part 4
cbloom rants 09-30-11 - String Match Results Part 5 + Conclusion
cbloom rants 10-01-11 - String Match Results Part 6
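
For anyone who has not read those posts, here is a minimal sketch of roughly the idea named in the first title: sort every suffix of the buffer, then measure how many leading bytes each pair of neighbours in sorted order has in common. This is not the code from test_suffixarray. It assumes a naive comparison sort instead of divsufsort, and the names `BuildSuffixArrayNaive` and `NeighborMatchLens` are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Naive suffix array: sort suffix start positions by comparing the suffixes
// directly. Fine for a demo, far too slow for real data (divsufsort exists
// for a reason).
std::vector<int> BuildSuffixArrayNaive(const std::string& text) {
    std::vector<int> sa(text.size());
    for (std::size_t i = 0; i < sa.size(); ++i) sa[i] = static_cast<int>(i);
    std::sort(sa.begin(), sa.end(), [&text](int a, int b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

// For each adjacent pair in the sorted order, count the shared prefix length.
// Result has sa.size() - 1 entries (or none for an empty/one-byte buffer).
std::vector<int> NeighborMatchLens(const std::string& text, const std::vector<int>& sa) {
    std::vector<int> lens;
    for (std::size_t i = 0; i + 1 < sa.size(); ++i) {
        std::size_t a = static_cast<std::size_t>(sa[i]);
        std::size_t b = static_cast<std::size_t>(sa[i + 1]);
        std::size_t len = 0;
        while (a + len < text.size() && b + len < text.size() &&
               text[a + len] == text[b + len]) {
            ++len;
        }
        lens.push_back(static_cast<int>(len));
    }
    return lens;
}
```

For the buffer "banana" the sorted suffix order is 5, 3, 1, 0, 4, 2 and the neighbour match lengths are 1, 3, 0, 0, 2. The best match for a position is always found next to it in this order, which is what makes the structure useful for string matching.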

StringMatchTest includes :

 * divsufsort.c for libdivsufsort-lite
 * Copyright (c) 2003-2008 Yuta Mori All Rights Reserved.

/* LzFind.c -- Match finder for LZ algorithms
2009-04-22 : Igor Pavlov : Public domain */

    MMC (Morphing Match Chain)
    Match Finder
    Copyright (C) Yann Collet 2010-2011

StringMatchTest, like all cbloom.com software, is released under the zlib license (basically free for all uses).


Cyan said...

For your information, I had some trouble compiling cblib with VS2008, unfortunately. Stringmatchtest itself went fine, but obviously linking cannot be completed without cblib.

Oh well, I can understand that you have better things to do.

cbloom said...

Did you try again? I updated cblib since you sent me that report; it builds with 2008 for me.

I try to fix any build problems reported but it can take me a little while to get around to it. VC 2010 looks like a huge pain in the butt.

I was gonna rant about the stupid fucking change to the Directories settings, but plenty of other people have already ranted about it :


Cyan said...

Yes, indeed. I've been trying to compile the latest release posted in your blog.
My first issue was with the .vcproj file, which seems incompatible with (or does not work with) VS2008. Not a big issue: I created a new project which includes basically all the *.c and *.h files in the cblib directory.

Then, trying to compile cblib, I get a lot of errors, too many to summarize here.

A lot of them have something to do with "W" Wide format, such as :

error C2664: 'CreateWindowExW' : impossible de convertir le paramètre 2 de 'const char *' en 'LPCWSTR'
(in English: cannot convert parameter 2 from 'const char *' to 'LPCWSTR')

I'll send a copy of the error list by email. Sorry for the French.

jfb said...

You're compiling with UNICODE defined.

cbloom said...

The vcproj is 2005; for me it converts automatically without any problems.

The W problem is definitely the stupid Windows TCHAR / #define UNICODE thing.

But yeah email me any errors.
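
The mechanism behind that error can be shown without windows.h. The sketch below mimics how the Windows headers map one generic function name onto an A (char) or W (wchar_t) version depending on a macro. `DemoCreateWindowA`, `DemoCreateWindowW` and `DEMO_UNICODE` are hypothetical stand-ins for illustration, not real Win32 names.

```cpp
#include <string>

// Hypothetical stand-ins for a Win32 A/W function pair (NOT real Windows APIs).
inline std::string DemoCreateWindowA(const char* title) {
    return std::string("A:") + title;
}
inline std::string DemoCreateWindowW(const wchar_t* title) {
    (void)title;  // unused in this demo
    return "W";
}

// Same pattern the Windows headers use: the generic name resolves to the
// wide version when the Unicode macro is defined, else to the narrow one.
#ifdef DEMO_UNICODE
#define DemoCreateWindow DemoCreateWindowW
#else
#define DemoCreateWindow DemoCreateWindowA
#endif

// Compiles only while DEMO_UNICODE is NOT defined. With it defined, the
// narrow literal is passed to the wchar_t version and the compiler reports
// a conversion error, the same shape as the C2664 above.
inline std::string OpenViaGenericName() {
    return DemoCreateWindow("cblib");
}

// Naming the narrow variant explicitly works under either setting.
inline std::string OpenViaExplicitNarrow() {
    return DemoCreateWindowA("cblib");
}
```

In a real VC 2005/2008 project the switch is the "Character Set" entry under the project's General properties; "Use Multi-Byte Character Set" or "Not Set" leaves UNICODE undefined. Code that has to build under either setting can call the A variant (for example CreateWindowExA) directly when it passes char strings.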

Cyan said...

Indeed, with Unicode turned off (that one was difficult to find...), it works much better.
Still not perfect, though: I have several errors remaining in cblib.
The list is too long to post here, so I'll send an email. But for example :

A simple one :
1> bmpimagejpeg.cpp(17) : fatal error C1083: Impossible d'ouvrir le fichier include : 'C:\src\cblib\External\libjpg/jpeglib.h' : No such file or directory
(in English: cannot open include file)

Hard-coded path. Plus, jpeglib is not in External anyway.

A more complex one :
1>c:\program files\microsoft visual studio 9.0\vc\include\xutility(315) : error C2664: 'bool cb::pair_key_bool_binary_functor::operator ()(const Pair &,const cb::Token &) const' : impossible de convertir le paramètre 1 de 'const cb::Token' en 'const std::pair<_Ty1,_Ty2> &'
(in English: cannot convert parameter 1 from 'const cb::Token' to 'const std::pair<_Ty1,_Ty2> &')

cbloom said...

Okay, new version of cblib should work with the MSVC STL under VC 2005 and 2008. STLPort is still recommended though.

Also, options are gathered in cblib_config.h so you can disable pulling in png & jpeglib.
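
For reference, the xutility error quoted earlier in the thread looks like the usual trouble with one-directional comparison functors under the MSVC STL. Most likely the debug checks in VC 2005/2008 call the predicate with its arguments swapped (and on pairs of elements) to validate the ordering, so a functor that only accepts (pair, key) fails to compile. The sketch below shows one common way to write such a functor so it is callable in every order. It is an assumption about the cause, not the actual change made in cblib, and `Entry`, `KeyLess` and `Lookup` are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> Entry;  // hypothetical (key, value) pair

// Comparison functor for searching a key-sorted vector of pairs by key alone.
// All three overloads are provided so a standard library that also calls
// pred(key, element) or pred(element, element) still compiles.
struct KeyLess {
    bool operator()(const Entry& e, const std::string& k) const { return e.first < k; }
    bool operator()(const std::string& k, const Entry& e) const { return k < e.first; }
    bool operator()(const Entry& a, const Entry& b) const { return a.first < b.first; }
};

// Binary search by key; returns `missing` when the key is not present.
// `sorted` must already be ordered by key.
inline int Lookup(const std::vector<Entry>& sorted, const std::string& key, int missing) {
    std::vector<Entry>::const_iterator it =
        std::lower_bound(sorted.begin(), sorted.end(), key, KeyLess());
    if (it != sorted.end() && it->first == key) return it->second;
    return missing;
}
```

The extra overloads cost nothing at runtime, and the same functor then works with std::upper_bound and std::equal_range, which need the (key, element) order anyway.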
