I Hate Makefiles (and other short stories)
January 12, 2008

I Hate Makefiles.... due to a long, boring, bad history (from 1995) supporting a giant nested 18-layer-deep rats' nest, where a simple 'make' that did nothing took 3 minutes. So the minimum edit/compile/build/debug cycle started with a 3-minute delay, and the mess was so incomprehensible that it took a full-time engineer to make any changes.  This on a 750KLOC project with maybe ~400 files.  I figure checking timestamps on 800 files (400 .c and 400 more .o) should take maybe a second on cold caches, and the dependencies should be no harder to read than a 400-line list of files.  I finally got fed up with the mess and wrote my own flavor of make.

This 'make replacement' was a great success (from my point of view) - it was instantly quick to do an empty make, it was all in 1 file (and that file was some boiler-plate then a simple list of dependencies), it supported parallel builds (without interleaving output from the different build steps), it was cross-platform, it built cross-platform, it sang, it danced and it fit on the head of a pin. To be fair, it was well received by the project community (and especially the engineer tasked with maintaining the old stuff in his spare time) and eventually everybody moved over to the new setup.  So when high-scale-lib needed a build system I decided to ignore Ant and all other project-build systems and repeat this success in Java (yes I'm being tongue-in-cheek here with some NIH thrown in).

Here's what I did...

  • It's All One File - no hiding tedious-but-required details in files buried deep in obtuse include-file paths.  It's called build.java here.
  • It's A Common Programming Language - (was C, this time Java); no exciting new syntax to learn (e.g., make's "all whitespace is ok, except for tabs, which are very special" - OR - Ant's piles-o-XML-brackets harkening (or is it horkening?) back to the LISP days).  The project builder itself thus comes with a complete set of debugging & performance tools.  If I need a special kind of build-step, I can hack it in.  Example: some silly tool I'm using doesn't properly set its exit status, so I have to 'grep' its output for 'error' to tell if the build-step failed.
  • It's In the Obvious Place (root of the project)
  • It Stands Alone (except for a handful of well-chosen tools: was 'cc', this time a JDK).  Things like 'rm', 'touch', 'cp', 'cmp', 'grep', etc. are all built-in and need no configuration.
  • Up-to-date is Based on File Timestamps - and nothing else.  I can tell the project is good with nothing more than my eyeball and 'ls'.  Sometimes this invariant is annoying and may cause some small dummy files to be needed... but it's worth it in the long run because the definition of 'up-to-date' is easily understood by all
  • Dependencies Are Obvious - They are written as a data-structure.  I'm trading off some verbosity for explicit clarity here.  Searching in an editor is instant (lines of nearly-replicated stuff cost nothing to search), and it's always immediately clear what each entry means.  Here's the definition of the Java source file NonBlockingHashMap.java (package org.cliffc.high_scale_lib) and the build-step that produces its class file:

  static final Q _nbhm_j   = new Q (HSL+"/NonBlockingHashMap.java");
  static final Q _nbhm_cls = new QS(HSL+"/NonBlockingHashMap.class", javac, _nbhm_j );

The dependencies have some boilerplate (e.g. "static final Q"), but it's all obvious stuff including the boilerplate.  No new syntax.  Full editor support for writing dependencies, etc.
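For readers wondering what `Q` and `QS` could look like underneath, here is a minimal sketch, assuming `Q` is a plain file node and `QS` is a derived file plus the build-step that makes it from its sources. Only the names `Q` and `QS` and the constructor shape come from the snippet above; the wrapper class `DepGraph`, the `Step` interface, `make()` and `isStale()` are hypothetical, not the actual build.java code. Staleness is decided purely by file timestamps.

```java
import java.io.File;
import java.util.List;

/** Hypothetical sketch of a timestamp-driven dependency graph. */
final class DepGraph {
  /** A build-step that turns the sources into the target file (hypothetical). */
  interface Step {
    void build(File target, List<Q> srcs) throws Exception;
  }

  /** A plain file; its only interesting property is its timestamp. */
  static class Q {
    final File file;
    Q(String path) { file = new File(path); }
    /** Last-modified time in ms, or 0L if the file does not exist. */
    long mtime() { return file.lastModified(); }
    /** Source files are always up to date: nothing to do. */
    void make() throws Exception { }
  }

  /** A derived file: rebuilt when missing or older than any of its sources. */
  static class QS extends Q {
    final Step step;
    final List<Q> srcs;
    QS(String path, Step step, Q... srcs) {
      super(path);
      this.step = step;
      this.srcs = List.of(srcs);
    }
    /** Up-to-date is decided by timestamps and nothing else. */
    boolean isStale() {
      if (!file.exists()) return true;
      for (Q s : srcs) if (s.mtime() > mtime()) return true;
      return false;
    }
    @Override void make() throws Exception {
      for (Q s : srcs) s.make();       // bring the inputs up to date first
      if (isStale()) step.build(file, srcs);
    }
  }
}
```

With this shape, calling `make()` on the `.class` node rebuilds it only when it is missing or older than its `.java` source, which is exactly the "eyeball and 'ls'" rule.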

Makefiles change (or at least they do if you can understand what the heck is going on) - and when they do, the set of rules that defines the 'goodness' of your build gets whacked, so anything built under the old rules is suspect.  So my uber-make has to build itself:

  • It Builds Itself - a quick check of build.java vs build.class timestamps; if build.java is newer, apply javac to build.java, then fork 'java build $*' to do the build in the New World Order.
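The self-build check can be sketched with the JDK's in-process compiler API: `javax.tools.ToolProvider.getSystemJavaCompiler()` returns a `JavaCompiler` (or null on a bare JRE), and its `run(...)` takes ordinary javac arguments. This is a minimal sketch, not the real build.java; the `SelfBuild` class, its method names, and the assumption that build.class lands in `classDir` are all hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical sketch: recompile the builder, then re-run it under the new rules. */
final class SelfBuild {
  /** True when build.java is newer than build.class, or the class is missing. */
  static boolean needsRebuild(File src, File cls) {
    return !cls.exists() || src.lastModified() > cls.lastModified();
  }

  /** Command line that re-runs the freshly compiled builder with the same args. */
  static List<String> relaunchCommand(String classDir, String[] args) {
    String javaExe = System.getProperty("java.home")
        + File.separator + "bin" + File.separator + "java";
    List<String> cmd = new ArrayList<>(List.of(javaExe, "-cp", classDir, "build"));
    cmd.addAll(List.of(args));
    return cmd;
  }

  /** Recompiles build.java in-process and hands off to it; returns the exit code. */
  static int rebuildAndRelaunch(File src, String classDir, String[] args) throws Exception {
    JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
    if (javac == null) throw new IllegalStateException("no system Java compiler (need a JDK)");
    int rc = javac.run(null, null, null, "-d", classDir, src.getPath());
    if (rc != 0) return rc;                 // compile errors already went to stderr
    Process p = new ProcessBuilder(relaunchCommand(classDir, args)).inheritIO().start();
    return p.waitFor();
  }
}
```

A `main` would begin with something like `if (SelfBuild.needsRebuild(new File("build.java"), new File("build.class"))) System.exit(SelfBuild.rebuildAndRelaunch(new File("build.java"), ".", args));`. The relaunched process sees a fresh build.class, so the check passes and it does not loop.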

That's the Big Picture.  After that I add features as I need 'em.  Here's what I got so far:

  • -n - list build steps without building
  • -k - keep going after errors
  • -v - verbosely list what's going on. 
  • -clean - Nuke buildable files
  • Shortcuts to build with javac, build javadocs, build jars, and build, run & verify with JUnit tests.  In short, short-cuts are easy to add.  Anytime I see I'm repeating the same kind of build-step over and over again, I make a shortcut.
  • Limit output of noisy build-steps.  Default is 1 line of output per build-step on success (just an echo of the line), and full spewage on failure.  This makes log files tidy and easy to read.  I'll probably put this on a flag someday for people who like to see a zillion lines of successful build log, but I like the 1-line-per-success as the default.
  • Sanity checks that build steps actually build what they claim to build, and do not muck with other files (common error in hacking makefiles is to get the build-step messed up and have it produce the wrong file in the wrong place, or worse whack a source-file by mistake).
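The quiet-on-success output and the "did it build what it claims" sanity check might look roughly like this. It is a sketch under assumptions: `QuietStep.run` is a hypothetical name, it deletes the target up front so that afterwards it can tell whether the step really produced it, and the optional `grepForErrors` flag is the "grep the output for 'error'" workaround for tools that don't set their exit status. Checking that a step leaves *other* files alone is not shown.

```java
import java.io.File;
import java.io.InputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of one quiet, self-checking build-step. */
final class QuietStep {
  static boolean run(List<String> cmd, File target, boolean grepForErrors, PrintStream log)
      throws Exception {
    Files.deleteIfExists(target.toPath());      // so we can tell whether the step built it
    Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
    String text;
    try (InputStream in = p.getInputStream()) {  // capture stdout+stderr, don't echo yet
      text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
    boolean ok = p.waitFor() == 0;
    // Workaround for tools that exit 0 even when they fail.
    if (grepForErrors && text.toLowerCase(Locale.ROOT).contains("error")) ok = false;
    // Sanity check: the step must actually produce the file it claims to build.
    if (ok && !target.exists()) {
      text += "build-step did not produce " + target + System.lineSeparator();
      ok = false;
    }
    log.println(String.join(" ", cmd));         // 1 line per step...
    if (!ok) log.print(text);                   // ...plus full spewage on failure
    return ok;
  }
}
```

Deleting the target first also means a failed step never leaves a stale-but-newer output behind for the next run to trust.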

Things I've had in the past and will probably get around to adding eventually:

  • Use 'gcc -M' to automatically track and gather include-file dependencies.  No user interaction required, other than adding or removing '#include' directives.
  • Parallel builds, caching of .class files, and a javac bean server (auto-launched on the first compile)
  • Build slowest files first (means you don't end a 200-file parallel 'make' on an 18-machine build farm by compiling the largest file last - guaranteeing another 5 minutes of build with 17 machines idle)
  • Build recently failing files first - these are most likely to fail again instantly (assuming you didn't get all the syntax errors out in the 1st pass)
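Both orderings boil down to sorting the ready build-steps by what happened last time: recent failures first, then longest-running first (the classic longest-processing-time-first heuristic for keeping a farm busy). A minimal sketch; the `BuildOrder` class and the `History` record of per-step duration and failure flag are hypothetical, something the builder would have to save between runs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "fail fast, then slowest first" scheduling. */
final class BuildOrder {
  /** Per-step facts saved from the previous build (hypothetical). */
  record History(long millis, boolean failedLastTime) { }

  /** Steps with no history count as fast and not failing. */
  private static final History NONE = new History(0, false);

  static List<String> order(List<String> steps, Map<String, History> hist) {
    List<String> sorted = new ArrayList<>(steps);
    sorted.sort(Comparator
        // false sorts before true, so steps that failed last time come first
        .comparing((String s) -> !hist.getOrDefault(s, NONE).failedLastTime())
        // then the longest previous run time first (negated so bigger sorts earlier)
        .thenComparingLong(s -> -hist.getOrDefault(s, NONE).millis()));
    return sorted;                              // List.sort is stable: ties keep input order
  }
}
```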

Something I'd like to add this go around:

  • Verify the build-step doesn't access any files other than it claims it needs.  This is another sanity check to cover another big source of makefile errors: forgetting a dependency.  I probably need to get 'last access time' for files to do this, and I don't know if this is portably available.
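Java does expose access time portably through `java.nio.file.attribute.BasicFileAttributes.lastAccessTime()`, but whether the OS actually updates it is another matter: Linux mounts often use `relatime` or `noatime`, and when a file system doesn't track access time the method returns an implementation-specific default (often the last-modified time). So treat this as a best-effort sketch: snapshot access times before and after a step, and flag any file whose time moved (or that appeared) but that the step did not declare. `AccessAudit` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

/** Hypothetical, best-effort check for undeclared file accesses. */
final class AccessAudit {
  /** Records the last-access time of every regular file under root. */
  static Map<Path, FileTime> snapshot(Path root) throws IOException {
    List<Path> files;
    try (Stream<Path> walk = Files.walk(root)) {
      files = walk.filter(Files::isRegularFile).toList();
    }
    Map<Path, FileTime> times = new HashMap<>();
    for (Path p : files)
      times.put(p, Files.readAttributes(p, BasicFileAttributes.class).lastAccessTime());
    return times;
  }

  /** Files that appeared or were accessed during the step but were not declared. */
  static Set<Path> undeclared(Map<Path, FileTime> before, Map<Path, FileTime> after,
                              Set<Path> declared) {
    Set<Path> suspects = new TreeSet<>();
    for (Map.Entry<Path, FileTime> e : after.entrySet()) {
      FileTime old = before.get(e.getKey());
      boolean touched = old == null || e.getValue().compareTo(old) > 0;
      if (touched && !declared.contains(e.getKey())) suspects.add(e.getKey());
    }
    return suspects;
  }
}
```

A clean result is only weak evidence (atime may simply not have been updated), but a non-empty result is a strong hint of a forgotten dependency.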

Look for build.java in high-scale-lib and tell me what you think!

Cliff

Category: Web/Tech


Comments

Did you ever make either system available to the general public?

Incidentally, dependencies can be found for C files by passing a command-line option to gcc. I don't know if this is available for Java, but I guess it must be since Eclipse seems to work it all out.

Posted by: Chris Purcell | Jan 12, 2008 1:05:39 PM

 

I had the 'gcc -M' auto-tracking of #include files working in that older system, and I hacked my post to reflect that. I bet there's a similar thing somewhere in javac.

The older stuff belongs to Motorola; I'm sure it'll never see the light of day. This version is available now from high-scale-lib (but of course is lacking lots of features).

Cliff

Posted by: Cliff Click | Jan 12, 2008 1:47:41 PM
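For the C side, `gcc -M` prints a make-style rule naming every file the source includes (`-MM` leaves out headers found in system directories). Turning that rule into a dependency list takes only a few lines. This is a minimal sketch that assumes one rule per invocation and no escaped spaces in file names; `GccDeps.prerequisites` is a hypothetical helper.

```java
import java.util.List;

/** Hypothetical helper for reading `gcc -M` output. */
final class GccDeps {
  /**
   * Extracts the prerequisites from one rule printed by gcc -M / gcc -MM, e.g.
   *   foo.o: foo.c foo.h \
   *    /usr/include/stdio.h
   * Simplified: assumes a single rule and no escaped spaces in file names.
   */
  static List<String> prerequisites(String rule) {
    // Undo backslash-newline line continuations.
    String joined = rule.replace("\\\r\n", " ").replace("\\\n", " ");
    // Split "target: prereqs" at the first colon followed by whitespace,
    // which skips drive-letter colons such as C:\ on Windows.
    String[] parts = joined.split(":\\s", 2);
    if (parts.length < 2) throw new IllegalArgumentException("not a make rule: " + rule);
    String rest = parts[1].trim();
    return rest.isEmpty() ? List.of() : List.of(rest.split("\\s+"));
  }
}
```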

 

A 3-minute null make for a 400 file project is pretty egregious; I'd be curious what a profiling session turned up. My last project had nearly 50K files (~15M LOC) and the decrepit old Microsoft NMAKE took just over a minute for a null build. I did have my own dependency-generator in AWK that led the nightly build and took about 2 minutes to run on a cold cache.

Also, you forgot to mention the Amiga make that you wrote back in '89. The Amiga make was of the stupid variety, and to make things worse their filesystem stored the metadata in the sector ahead of the file data itself, presumably so a disk recovery utility would have an easier time. But this meant that a simple directory listing would pound the snot out of the drive as it ran the seek head out to every single file to read the metadata. Between the stupid make and the stupid filesystem a null make took about half as long as a full build. The bizarre filesystem was only half of the weirdness of that machine though. The main part of the OS was written in C in the US and took normal pointers, the filesystem was written in BCPL in the UK (based on the TRIPOS OS I believe) and took right-shifted pointers. It still boggles my mind that Amiga exposed this to developers instead of using a thunking layer - it's not like an extra hundred cycles per call were gonna kill filesystem performance.

Posted by: Michael Parker | Jan 18, 2008 5:29:18 AM

 

Which one of your JavaOne talks were accepted? The concurrent programming one sounds interesting to me, so I kinda hope that one got accepted :)

Posted by: Azeem Jiva | Feb 11, 2008 11:01:39 AM

 

Why do you even feel the need to use make on a 400 file project?

For medium to small sized projects, I do not believe in make/ant/whatever. Instead, I use a build process in which compiling involves gutting my class file directory and then compiling everything fresh from scratch.

This approach is simple and rock solid reliable; nothing can ever go wrong.

The sole problem, of course, is that it does not scale: if you have a big project, stick with make/ant/etc.

How big is too big for it? Well, it takes me 4 seconds on my commodity PC to build my personal code library which contains 135 .java files. Assuming linear scaling, it should take 3X that (12 seconds) to compile a 400 file java project. If you can live with a 12 second compile cycle, then 400 files is not too big. Obviously, at some point a pain threshold will be reached, and you will want a more sophisticated solution.

My build process, incidentally, is contained in a few script files (.bat or .sh) that do things like java compile, javadoc generation, jar creation, and run java programs. For a few operations, the script files also call a single java class (mainly written to overcome the brain deadness that is DOS bat files).

Lemme know if you want a copy.

Posted by: Brent | Feb 23, 2008 8:08:47 AM

 

The problem was the EDG C++ front-end parser code: 400 files, but 750KLOC of code. Each file was on average ~2000 lines. Peak single function size was >7000 lines.
This was in 1995; single-CPU machines took ~5 mins to compile that large function (a C++ compiler with full optimization does a lot more work than javac). A full from-scratch build was usually a 3-hour affair, and required running code on 3 different OSes (we cross-compiled in all directions, since some of the tools, e.g. old Mac stuff, only ran on specific platforms).

Cliff

Posted by: Cliff Click | Feb 23, 2008 10:25:26 AM

 

OK: that sounds like a perfectly valid reason to fall back on make or something like it.

Speaking of compiler speed: my understanding is that javac is not multithreaded. In particular, javac does not take advantage of multiple CPUs/cores to do work concurrently and speed up compilation.

First of all, is this understanding true?

Assuming that it is, do you happen to know the guys at Sun who work on javac and whether or not they plan to add such support?

Posted by: Brent | Feb 25, 2008 7:23:25 AM

 

Heh - Secret Compiler-Writers' Lore: it's nearly impossible to get a compiler to run faster with concurrency. It's been tried repeatedly, by very smart guys over long periods of time. There Be Dragons!

On the other hand, parallel make works marvelously well. So if you got spare CPUs the Right Answer is to use a parallel make (which I did).

If you got javac then the Right Answer is to keep it warmed up & running all the time. My java/emacs/jde IDE does exactly this: javac run from a Bean; it's always warmed up and basically instantly fast to get started. If I had any Large Java Programs to compile I'd use parallel copies of a Bean'd javac spanning many machines. On Azul, I'd like a single JVM version of javac that could run multiple complete compiles concurrently: saves on overhead of replicating the JVM.

I know guys who USED to work on javac; they are now at Google. No plans to make javac concurrent that I know of although I believe you can call javac as a function call - so you ought to be able to make a clever daemon process that does parallel compiles, & maybe caches .class files, etc - all the clever stuff C build farms have been doing for years.

Cliff

Posted by: Cliff Click | Feb 25, 2008 8:16:49 AM
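Calling javac as a function does exist: `javax.tools.ToolProvider.getSystemJavaCompiler()` returns a `JavaCompiler` whose `run(...)` takes the usual command-line arguments. A daemon could keep one JVM warm and run several compiles at once. This sketch assumes the sources are already split into groups with no cross-group dependencies, and that independent compiles in one JVM don't interfere with each other; `WarmJavac` and `compileGroups` are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical sketch: several javac runs at once inside one warm JVM. */
final class WarmJavac {
  private static final JavaCompiler JAVAC = ToolProvider.getSystemJavaCompiler();

  /** Compiles each independent group as its own javac run; true if all succeed. */
  static boolean compileGroups(List<List<File>> groups, File outDir, int threads)
      throws Exception {
    if (JAVAC == null) throw new IllegalStateException("needs a JDK, not a JRE");
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Integer>> results = new ArrayList<>();
      for (List<File> group : groups) {
        List<String> args = new ArrayList<>(List.of("-d", outDir.getPath()));
        for (File f : group) args.add(f.getPath());
        // Same arguments as the javac command line; returns 0 on success.
        results.add(pool.submit(() -> JAVAC.run(null, null, null, args.toArray(new String[0]))));
      }
      boolean ok = true;
      for (Future<Integer> r : results) ok &= r.get() == 0;
      return ok;
    } finally {
      pool.shutdown();
    }
  }
}
```

Spreading the same idea across machines, as with a C build farm, would add a transport layer on top; the in-JVM version only removes startup cost and uses spare cores.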

 

>it's nearly impossible to get a compiler
>to run faster with concurrency.

Nuts! All this modern multicore hardware lying idle...


>It's been tried repeatedly, by very smart
>guys over long periods of time.
>There Be Dragons!

Knowing nothing about compiler internals, I guess that the issue is all the complicated dependencies? (E.g. to compile class A might require compiling class B, but if you are compiling class C which also depends on B then one of A or C has to block until B is available?)

Or are there worse issues?

Looks like a fun problem to work on.

Someone told me that Jikes was a concurrent compiler, but doing a web search just now, I can find no information backing that up.

Clever idea about keeping javac warmed up--by how much can you reduce compile times versus compiling from a cold start? I would have thought that the javac startup cost is pretty small--but I have never measured it either.

I am confused by what IDE you are using, and am also not clear how "parallel copies of a Bean'd javac spanning many machines" could possibly help--unless you have somehow determined subsets of classes that have no cross dependencies, and have each machine work on a distinct subset.

Posted by: Brent | Feb 26, 2008 2:31:54 AM

 

Oops - wrong meaning for "compile".

javac converts Java to bytecodes, but I'd barely classify it as a compiler.

C/C++/JIT's make machine code from some input language (be it C or bytecodes). THIS stuff is way complex, full of subtle dependencies. Any compiler that turns into a production compiler (i.e., not academic toys) tends to be structured as pass-after-pass-after-pass, and each pass is itself serial.

What javac does is some combination of parsing (also a very serial job) and 'make'. The 'make' portion can probably figure out decent parallelizations, a la pmake. The C+pmake guys have long been able to do parallel compiles across many machines (a "build farm"). No reason javac could not do the same: "compiling" to bytecodes across many machines. It's only an issue if you got a big project of course.

Keeping javac warmed up: I dunno, but it's "blink compiled" fast this way. So <0.5sec. Again, the C+pmake guys have done this for years: keep the C compiler process up in a daemon. All IDEs do this. No need to load from disk, it's all hot in CPU caches, etc.

As for the IDE I am using? The One True IDE! Plain Olde Emacs!!!! With a java flavor-enhancer, of course (well, lots of java flavor-enhancers). ;-)


Cliff

Posted by: Cliff Click | Feb 26, 2008 8:33:11 AM
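The pmake-style parallelization comes down to a topological sort of the dependency graph. Grouping steps into "waves" (Kahn's algorithm, one level at a time) shows which steps have no dependencies on each other and can run at the same time. A minimal sketch; `Waves.waves` and the map-of-lists graph format are assumptions, and a real scheduler would start each step as soon as its own inputs finish rather than waiting for a whole wave.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch: split a dependency graph into parallel waves. */
final class Waves {
  /** deps maps each step to the steps it needs; every dependency must also be a key. */
  static List<List<String>> waves(Map<String, List<String>> deps) {
    Map<String, Integer> pending = new HashMap<>();      // inputs not yet built, per step
    Map<String, List<String>> users = new HashMap<>();   // reverse edges: who needs me
    for (Map.Entry<String, List<String>> e : deps.entrySet()) {
      pending.put(e.getKey(), e.getValue().size());
      for (String d : e.getValue())
        users.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
    }
    TreeSet<String> ready = new TreeSet<>();             // sorted only for stable output
    pending.forEach((step, n) -> { if (n == 0) ready.add(step); });
    List<List<String>> result = new ArrayList<>();
    int scheduled = 0;
    while (!ready.isEmpty()) {
      List<String> wave = new ArrayList<>(ready);        // everything here can run in parallel
      ready.clear();
      for (String done : wave)
        for (String u : users.getOrDefault(done, List.of()))
          if (pending.merge(u, -1, Integer::sum) == 0) ready.add(u);
      result.add(wave);
      scheduled += wave.size();
    }
    if (scheduled != deps.size())
      throw new IllegalArgumentException("dependency cycle (or undeclared step)");
    return result;
  }
}
```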

 

Glad to hear that javac could be parallelized just like pmake is. I suspected that it would not be that bad. Hope that Sun improves javac in the future.

As for JIT compilers, I am sure that any given method's compilation is serial as you described, but is it the case that different methods--especially if they are from different classes--can be compiled independently? If so, then multiple threads could be concurrently JIT compiling different methods.

>As for the IDE I am using?
>The One True IDE! Plain Olde Emacs!!!!

You are a man after my own heart. My first coding tools of choice are TextPad plus my own command line build system (which is just script file wrappers around the jdk command line tools).

I need to look at JEdit as an editor at some point.

And I do intend to give NetBeans a better look in the future, as I could use its code lookup and autorefactor capabilities from time to time.

But there is something so nice about a clean, lean, and mean dev env that is free of all the usual IDE clutter.

Posted by: Brent | Feb 26, 2008 8:52:33 AM

 

wow. i was never a big ant fan, but i have to say that maven2 has really warmed my heart. it really is easy to use, and the syntax makes sense, the dependency management is fantastic if you're working on frameworks that others will be using, it's great for doing multiple modules with dependencies between them.... i'm sure i could code something else up, but, i think you are maybe missing out....

ooh, and one other cool thing, if you package the pom.xml files with the modules, you can use them within a classloader to create isolated classloaders which allow the program to run with conflicting dependencies in different modules where each module sees the version of the external library it expects. good stuff (though, of course, not without its own perils...)

Posted by: Jacy | Mar 28, 2008 11:31:27 PM

 

I promise to take another look at maven, but last time I checked it was full of XML.

Cliff

Posted by: Cliff Click | Mar 29, 2008 9:02:12 AM

 

:) it is at that. but the other nice thing is, i don't have to maintain it, and can focus on the problems i'm more interested in.

Posted by: Jacy | Mar 29, 2008 9:43:18 AM

 
