8/19/2012

08-19-12 - Packages in Standard C

So we've talked about DLL's a few times and how fucked up they are. What you really want is something like a DLL that you can statically put in your app so it's not a separate file. We'll call this a "package". But I was reminded the other day that C libs are also fucked up.

Last week at RAD we discovered that some of our Xenon libs were several megabytes bigger than they need to be simply because we included "xtl.h" in a few files. What was happening was that xtl.h has a ton of inline functions in it, and the compiler goes ahead and compiles all of them and sticks them in your OBJ even if you don't use them (of course this is another problem with C that I'd like to see fixed - there's no need to waste all that time compiling functions that I don't use - but that's another rant).

Of course when you make a lib, all it does is cram together your obj's. It doesn't strip the uncalled functions (that's left to the linker, later on).

So DLL's are fucked and we wanted them to be "packages" ; and libs are also fucked and we want them to be packages too!

What I think a "package" in C should be :

  • 1. You provide an explicit list of exports (perhaps by adding an __export decorator). Only exported symbols are accessible from outside the package. (you may also have some explicit imports if you are not a leaf package).

  • 2. All internal references in the package are resolved at package build time. This lets you find "link errors" without having to make an exe that pulls in every obj in the lib to test for missing references.

    This also lets you strip all un-referenced and un-exported symbols.

  • 3. LTCG or anything funny like that which is delayed to the link stage could be done on the package.

  • 4. Symbols of the same name that are in different packages do not conflict. This is such a huge disaster in C and a source of very unpredictable and hard to deal with bugs. Just because some lib used a global "int x" and my lib uses a global "int x" doesn't mean I want them to be the same variable.

  • 5. If your package uses some libs, they should (optionally) be linked into the package and made "private" not exported; that way multiple packages that use different variants of libs can be put together in one app without conflict.

So that's all background material. What this post is about is this : it occurs to me that you can get most of this in standard C by making your own "libpackager" tool.

libpackager should take a lib and output a lib. You have to also provide it a list of exports (or you could use some decorator that it can parse to mark the exports). It can parse the obj's in the lib and find all the symbols and do its own "link" step to eliminate unreferenced symbols, then remake the obj's without those symbols. So this gives us #2, which is a pretty big win.

You could also do #4 by having libpackager decorate all the internal symbol names that are neither import nor export. This is roughly equivalent to if you had put all your internal symbols in some namespace.

You could even do #5 ; make libpackage go and grab the libs you reference, stuff their obj's into your lib also. Then your copy of the lib and your references get name-decorated so they don't conflict with someone else. eg. say you want to make "oodle.lib" as a package and you use "radmemset" from "radutil.lib" , packager could grab radutil.lib and stuff it in; then since radmemset is now an internal reference, it gets changed to "oodlelib_radmemset". Now when you put "oodle.lib" and "bink.lib" both into your app, if they used different versions of radmemset, they will not cross-link because the libs have been made into fake "packages". (this step should be optional because sometimes you do want cross-links).

One annoying complication is that this doesn't work with the stdlib in a straightforward way. I would very much like to be able to "package" all references to stdlib in this way, but stdlib is not just a normal lib, it also has some special cheating connections to the crt0 startup code, so you can't just go and rename all its symbols to oodlelib_memset and such. Perhaps this could be resolved, which would be nice to avoid all those garbage problems that arise because some lib was built for libc and some other lib was built for libcmtd , etc.

I think this all is pretty straightforward (other than the stdlib issues). The only hard part is parsing the lib and obj formats on every platform and build variant that you need to support.

(BTW a bit of web searching indicates that the gcc tools on some platforms (Mac) provide some of this; there seems to be some special attributes for exports from libs and perhaps a lib tool that does dead strips; it's hard to follow gcc docs)

3 comments:

won3d said...

The new(-ish) linker, gold, does dead section stripping with the flag --gc-sections. You can tell GCC to put each function in its own section using -ffunction-sections (there's also -fdata-sections).

One thing to note about GCC is that, unlike MSVC, methods are external by default, which is important if you're making a .SO.

Ji said...

I don't know how MSVC implements inline with C, but for C99 compilers can and do skip compiling unused inline functions. Actually, they're required to unless it's static inline, and for that I don't know of any C99 compilers that compile static inline when unused even with no optimizations.

I think you're right that ELF shared objects provide everything you want, except the private bundling (that sounds more like an OS X framework). But you can have versioned symbols, so you get what you wanted just without combining the two libs.

cbloom said...

I think it was OS X gcc where I saw something about manually marking up an export list.

old rants