5/31/2011

05-31-11 - STB style code

I wrote a couple of LZP1 implementations (see previous) in "STB style", that is, plain ANSI C in single headers you can just include and use. It's sort of wonderfully simple and easy to use. Certainly I understand the benefit - if I'm grabbing somebody else's code to put in my project, I want it to be STB style, I don't want some huge damn library.
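For readers who haven't seen the pattern, here is a minimal sketch of a single-header library. The library name, the `TINY_SUM_IMPLEMENTATION` macro, and the function are all hypothetical and exist only for illustration; the implementation-macro convention is the one the stb libraries use, not necessarily what the LZP headers do.

```c
/* Sketch of a hypothetical "tiny_sum.h" in STB style.
   In real use, exactly one .c file defines TINY_SUM_IMPLEMENTATION
   before including the header; every other file just includes it. */
#define TINY_SUM_IMPLEMENTATION   /* normally written by the user, not the header */

/* ---- tiny_sum.h begins ---- */
#ifndef TINY_SUM_H
#define TINY_SUM_H

#include <stddef.h>

/* Declarations: visible to every file that includes the header. */
unsigned long tiny_sum_bytes(const unsigned char *data, size_t len);

#endif /* TINY_SUM_H */

/* Definitions: compiled only where the implementation macro is set. */
#ifdef TINY_SUM_IMPLEMENTATION

unsigned long tiny_sum_bytes(const unsigned char *data, size_t len)
{
    unsigned long total = 0;
    size_t i;
    for (i = 0; i < len; i++)
        total += data[i];   /* add up the byte values */
    return total;
}

#endif /* TINY_SUM_IMPLEMENTATION */
/* ---- tiny_sum.h ends ---- */
```

The point is that the user adds one file and one `#define`, with no build system changes and no other dependencies, which is exactly why anything pulled in from "mem.h" or "CircularBuffer.h" would have to be copied into the header itself.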

(For example, I actually use the James Howse "lsqr.c", which is one file, and "divsufsort.c", which is a delightful single file. Those are beautiful little pieces of code that do something difficult very well. But I would never use some beast like the GNU Triangulated Surface lib, or OpenCV, or any of those big bloated libs.)

But I just struggle to write code that way. Even with something as simple as the LZPs: okay, fine, you write an ANSI version and it works. But it's not fast and it's not very friendly.

I want to add prefetching. Well, I have a module "mem.h" that does platform-independent prefetching, so I want to include that. I also want fast memsets and memcpys that I already wrote, so do I just copy all that code in? Yuck.

Then I want to support streaming in and out. Well I already have "CircularBuffer.h" that does that for me. Sure I could just rewrite that code again from scratch, but this is going backwards in programming style and efficiency, I'm duplicating and rewriting code and that makes unsafe buggy code.

And of course I want my assert. And if I'm going to actually make an EXE that's fast I want my async IO.

I just don't see how you can write good code this way. I can't do it; it totally goes against my style, and I find it very difficult and painful. I wish I could, it would make the code that I give away much more useful to the world.

At RAD we're trying to write code in a sort of hierarchy of levels. Something like:


very low level : includes absolutely nothing (not even stdlib)
low level : includes only low level (or lower) (can use stdlib)
              low level stuff should run on all platforms
medium level : includes only medium level (or lower)
               may run only on newer platforms
high level : do whatever you want (may be PC only)

This makes a lot of sense and serves us well, but I just have so much trouble with it.

Like, where do I put my assert? I like my assert to do some nice things for me, like log to file, check if a debugger is present and int 3 only if it is (otherwise do an interactive dialog). So that's got to be at least "medium level" - so now I'm writing some low level code and I can't use my assert!

Today I'm trying to make a low level logging facility that I can call from threads and it will stick the string into a lock-free queue to be flushed later. Well, I've already got a bunch of nice lock-free queues and stuff ready to go, that are safe and assert and have unit tests - but those live in my medium level lib, so I can't use them in the low level code that I want to log.

What happens to me is I wind up promoting all my code to the lowest level so that it can be accessible to the place that I want it.

I've always sort of struggled with separated libs in general. I know it's a nice idea in theory to build your game out of a few independent (or hierarchical) libs, but in practice I've always found that it creates more friction than it removes. I find it much easier to just throw all my code in a big bag and let each bit of code call any other bit of code.

8 comments:

Joel Bernstein said...

You could do it SQLite-style, where you structure your code however you like, and then a script jams it all together into one big amalgamation.

They actually claim it improves performance on certain compilers that won't do cross-file optimizations.

cbloom said...

Yeah, that winds up being what I do by hand - I write the code in full library style, and then to make an STB version I go and copy-paste out all the dependencies I need.

I could actually probably automate that without hand-writing scripts. You just need a mini compiler to go and find everything you depended on and copy it into one file.

One evil way to do it would be for me to just compile my code into OBJ's and then run a disassembler and ASM to C converter to make really nasty plain C versions.

Arseny Kapoulkine said...

You'll just need a quick'n'dirty preprocessor - really a #include handler that does not look at other preprocessing directives and skips #include <> (system includes). That's easy to write and should work in almost all cases.

cbloom said...

"You'll just need a quick'n'dirty preprocessor"

Well, not exactly. I want it to strip away the things that I don't use.

eg. if I #include cblib/Base.h just to get my definition of "uint8" I don't want the whole of Base.h to get splatted in, just that one line.

It would be a pretty nice utility to have though. Given a C file make a version which is stand-alone. Maybe it's worth spending some time to write that.

And actually the same code could be used to generate my own Browse Info, since the core thing needed is a "find definition" function.

Cyan said...

This SQLite amalgamation process looks like the right way to do it. I really like the philosophy behind that construction.
Is this "amalgamation utility" strictly internal to the SQLite team?

cbloom said...

The SQLite amalgamation just takes all their files and cats them together.

They have a TCL script that you can download which just copies all the source code into the amalg file, and when it sees a #include it puts the text of the #include inline, but only the first time.

It's a decent easy way to "STB-ify" an entire library, but it doesn't do what I really want, which is to extract a piece of my big monolithic home codebase.
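The cat-and-inline approach described above is small enough to sketch in C. This is not the SQLite script or any tool from this thread, just an illustration under stated assumptions: the function names and the fixed-size tables are made up, include paths are resolved relative to the current directory, lines are assumed to fit in 1023 characters, and `#include <...>` lines are passed through untouched, as suggested earlier in the comments. It does no dead-code stripping, so it does not solve the "extract just the bits I use" problem.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SEEN 256   /* assumed limit on distinct files */
#define MAX_NAME 260   /* assumed limit on path length */

static char seen[MAX_SEEN][MAX_NAME];
static int  seen_count = 0;

/* Returns 1 if `name` was already inlined; otherwise records it and returns 0. */
static int already_seen(const char *name)
{
    int i;
    for (i = 0; i < seen_count; i++)
        if (strcmp(seen[i], name) == 0) return 1;
    if (seen_count < MAX_SEEN) {
        strncpy(seen[seen_count], name, MAX_NAME - 1);
        seen[seen_count][MAX_NAME - 1] = '\0';
        seen_count++;
    }
    return 0;
}

/* If `line` is  #include "file" , copies the file name into `name` and returns 1. */
static int parse_quoted_include(const char *line, char *name, size_t cap)
{
    const char *p = line, *end;
    while (*p == ' ' || *p == '\t') p++;
    if (*p++ != '#') return 0;
    while (*p == ' ' || *p == '\t') p++;
    if (strncmp(p, "include", 7) != 0) return 0;
    p += 7;
    while (*p == ' ' || *p == '\t') p++;
    if (*p++ != '"') return 0;          /* <...> system includes are rejected here */
    end = strchr(p, '"');
    if (!end || (size_t)(end - p) >= cap) return 0;
    memcpy(name, p, (size_t)(end - p));
    name[end - p] = '\0';
    return 1;
}

/* Appends `path` to `out`, expanding quoted includes recursively.
   A file that was already emitted is skipped entirely.
   Returns 0 on success, -1 if a file could not be opened. */
int amalgamate(const char *path, FILE *out)
{
    char line[1024], name[MAX_NAME];
    FILE *in;
    if (already_seen(path)) return 0;
    in = fopen(path, "r");
    if (!in) return -1;
    while (fgets(line, sizeof line, in)) {
        if (parse_quoted_include(line, name, sizeof name)) {
            if (amalgamate(name, out) != 0) { fclose(in); return -1; }
        } else {
            fputs(line, out);           /* ordinary line: copy through */
        }
    }
    fclose(in);
    return 0;
}
```

Calling `amalgamate("main.c", out)` with an open output file writes one flat file. Because it ignores `#if` and every other directive, it will also inline includes that sit inside disabled conditional blocks, which is part of why this only works in "almost all cases".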

johnb said...

I have this same problem, and I so often end up just copying a bunch of code from one project to the next, and usually having to fix up #includes and what-not as well. It's horrible. It's because C and C++ have no good module system, no standard build system and no standard package management system.

Overall this means the overhead of creating an independently distributable module in C or C++ is just horribly, horribly high.

Haskell has Hackage, Perl has CPAN, Ruby has... I dunno, gems or something. Python has PyPI. Most TeX distributions have some massive and standard repository you can grab things from.

We need a good C/C++ package management tool that's cross-platform and integrates with all common build systems and version control systems.

cbloom said...

Well I just wrote an "amalgamate" but it doesn't actually work on any kind of large code base.

I'll post my code anyway and explain why it don't work.
