Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Wednesday, December 17, 2008

Automatically Deducing Build Dependencies => Cross Tool Recursive Builds Work

Recursive make sucks, as argued in Peter Miller's classic paper "Recursive Make Considered Harmful", http://aegis.sourceforge.net/auug97.pdf

Tools such as Peter's Aegis and Cons and SCons "automatically" deduce dependencies for an entire project - but only semi-automatically, since they rely on knowledge of the tools you are using. E.g. they know about C++ #include files, but not about languages that are not on their list. Nor do they know about things in your shell scripts, unless you tell them, e.g. in the SConstruct file or by writing a special scanner for your language.

I have long been a fan of truly automatically deducing dependencies: run a build, and use a tool like strace to observe the files that get read and written.
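
As a rough illustration of the strace idea, here is a minimal sketch in Python. It assumes the build was run under something like `strace -f -e trace=open,openat -o build.trace make`, and that the trace lines look like the samples in the comments. The function name `observed_files` and the regular expression are my own illustration, not part of any existing tool, and a real tracer would also have to handle renames, unlinks, exec, relative directories, and so on.

```python
import re

# Matches open()/openat() lines roughly as strace prints them, e.g.
#   openat(AT_FDCWD, "foo.h", O_RDONLY|O_CLOEXEC) = 3
#   1234 openat(AT_FDCWD, "foo.o", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
# This pattern is an assumption about the output format, not a full parser.
OPEN_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'   # optional pid prefix when tracing children
    r'open(?:at)?\((?:[^,"]+,\s*)?'    # open(  or  openat(AT_FDCWD,
    r'"(?P<path>[^"]*)",\s*'           # the path argument
    r'(?P<flags>[A-Z_|0-9x]+)'         # the O_* flags
    r'[^)]*\)\s*=\s*(?P<ret>-?\d+)'    # optional mode, then the return value
)

def observed_files(trace_lines):
    """Return (reads, writes): sets of paths successfully opened by the build."""
    reads, writes = set(), set()
    for line in trace_lines:
        m = OPEN_RE.match(line)
        if m is None or int(m.group("ret")) < 0:
            continue  # not an open call, or the open failed
        flags = m.group("flags").split("|")
        if "O_WRONLY" in flags or "O_RDWR" in flags:
            writes.add(m.group("path"))
        if "O_RDONLY" in flags or "O_RDWR" in flags:
            reads.add(m.group("path"))
    return reads, writes
```

The reads are the observed dependencies and the writes are the observed targets, with no knowledge of which language or tool produced them.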

Moreover, the whole-project dependency analysis of tools like Cons and SCons only works if all of the project uses the same build tool.

Unfortunately, that is not true for large projects that have subprojects. I am working on a simulator that was originally built using Make. Later it started using Cons, but kept some make around - i.e. the project starts off as a hybrid of Make and Cons.

But now I want to import tools and libraries that I have written. Reuse, eh? But I used SCons.

Having to rewrite all of my SCons into the project-wide Cons is a barrier to reuse.
Similarly, getting the whole project to use SCons is a barrier to reuse.

Basically, any statement of the form
"All of our problems would be solved if everybody did things the same way"
(a) might be true,
but (b) is bigoted, stupid, intolerant of diversity, etc.
Such statements are the root of all evil, and the cause of much ethnic cleansing.
But, ahem, we are talking about programming, aren't we?

I just realized that true automatic generation of build dependencies
would allow projects to be built up out of different build tools.

Best if all of the build tools used "true observation of dependencies". Then tool #1 would just have to know that it should call tool #2 to do the build.

But I think that it can also be used to build a true, project wide, dependency system out of
not-fully-automatic build tools. E.g. this meta-builder could call less intelligent tools such as make, or cons, or scons.

It might also be possible for such a meta-build to call another build tool, in a guaranteed-build-from-scratch situation,
and then observe the commands invoked and rerun those.

I.e. such a meta-build might make recursive make correct.
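
A minimal sketch of how such a meta-build might decide what to rerun, assuming each command was observed once during a from-scratch build and recorded with the files it read and wrote. The `Step` record and `steps_to_rerun` function are hypothetical names for illustration. The sketch also assumes the recorded order is a valid build order, which holds if it is simply the order in which the commands were observed to run.

```python
from collections import namedtuple

# One observed command, whichever build tool happened to launch it.
Step = namedtuple("Step", "tool command reads writes")

def steps_to_rerun(steps, changed):
    """Return the commands, in recorded order, affected by the changed files."""
    dirty = set(changed)
    rerun = []
    for step in steps:
        if dirty & set(step.reads):
            rerun.append(step.command)
            dirty |= set(step.writes)  # outputs of a rerun step are dirty too
    return rerun

# Example: a project whose pieces are built by three different tools.
observed = [
    Step("make",  "cc -c lib.c",               {"lib.c", "lib.h"},     {"lib.o"}),
    Step("make",  "cc -c main.c",              {"main.c"},             {"main.o"}),
    Step("scons", "ar rc libfoo.a lib.o",      {"lib.o"},              {"libfoo.a"}),
    Step("cons",  "cc -o sim main.o libfoo.a", {"main.o", "libfoo.a"}, {"sim"}),
]
print(steps_to_rerun(observed, {"lib.h"}))
```

The point of the sketch is that the dependency graph crosses tool boundaries: a change to lib.h propagates from a make step through an scons step to a cons step, without any of those tools knowing about each other.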


---

John Ousterhout may have created an automatic build dependency tool,
in his company Electric Cloud.
At least the whitepaper
Solving the Dependency Problem in Software Builds
implies that it has been solved.

Unfortunately, I can't click through to the paper,
so I can't really tell.

Past conversations, either with Ousterhout or with EC sales folk,
leave me
(1) unsure as to whether they have done the truly automatic thing for dependencies
but
(2) sure that they are not "diversity tolerant"
- they are yet another tool that requires everyone, the whole project, to use their build tool.

I.e. they may have invented the automatic dependency idea - although I doubt it.

They do not seem to have realized that it can be used to allow diversity of build tools.

---

My friend Mark Charney grew frustrated with Cons and SCons, because they could not handle truly dynamic systems that create objects. Mark, therefore, wrote his own build tool that was not limited by the phase ordering restrictions of Cons and SCons.

I wonder if the automatic, diversity tolerant, approach I describe above handles such truly dynamic builds. I suspect that it does.

(With the usual caveat of there being no time dependent behaviour. It is okay to log a timestamp, but it is not okay to create different numbers of files on different days, just because the day is different. Tools must have the property: same inputs, different day => same outputs, at least same output names - although times may be embedded in the output.)

3 comments:

Scott Castle said...

Andy: I think you're right on. The dependencies are the hard, magic bit - and in large software systems with modular structures and recursive build processes, getting accurate dependencies is *hard* - so if you can get good deps the tool platform problems can be solved in lots of ways.

You have a couple of questions in your article I'd like to address, since I work for Electric Cloud:

"Past conversations, either with Ousterhout or with EC sales folk,
lead me
(1) unsure as to whether they have done the truly automatic thing for dependencies
but
(2) sure that they are not "diversity tolerant"
- they are yet another tool that requires everyone, the whole project, to use their build tool."


1) Yep, we really have done the truly automatic thing for dependencies - ElectricAccelerator tracks at the file-system level what files are read and written, collecting the data required to build the dependency graph for a build (and also letting ElectricAccelerator know when a target was prematurely fired, in order to revert it and re-run once all implicit deps are satisfied). The approach Andy indicates earlier in his article - using strace to trap this info - is a great first pass; strace slows a build down 5x or more, though, and isn't available on Windows. ElectricAccelerator is built for speed, so we're more elegant about how we do the data collection.

2) ElectricAccelerator assumes as a commercial requirement that developers are tightly bound to their tools, whether by choice, edict, or (mis)fortune. A commercial product which requires code change will fail, so it's important to us to support specific tools. So we don't actually require everyone to use our tool - we emulate *your* tool in most cases - and compatibility with the original infrastructure is maintained.

If you send me a message at scas tle [at] electric- cloud.com, I'd be happy to send you John's paper or any of the other whitepapers we've produced.

- Scott Castle

Wayne said...

You read my mind. I have had this project in the back of my head for years. Only drawback is portability. Every system is different.

On some systems you use strace, on Solaris dtrace might be the ticket. On Windows there is another solution, macos another. But you write that module once and you have an incredible build tool.

Basically you can build by example. You could even build it by hand once and from then on the tool could rebuild it as needed. And you get 'make clean' for free as well.

Also, the hooks used by stuff like Google Desktop Search to identify all files that have changed might work, if they can also flag files that are read.
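
The "'make clean' for free" point can be sketched roughly as follows, assuming the set of written paths has already been collected by observing a build. The function name `clean_targets` and the rule "only files under the project root are candidates" are assumptions for illustration. The sketch only lists candidates and deliberately deletes nothing.

```python
import os

def clean_targets(writes, project_root):
    """Return observed outputs that live under project_root, as absolute paths."""
    root = os.path.abspath(project_root)
    targets = []
    for path in sorted(writes):
        # Relative paths are taken to be relative to the project root.
        full = os.path.abspath(os.path.join(root, path))
        # Skip anything written outside the project, e.g. /tmp or /dev/null.
        if os.path.commonpath([root, full]) == root:
            targets.append(full)
    return targets
```

A real tool would presumably also want to avoid listing files that existed before the build and were merely modified by it.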

Mark Charney said...

Andy, thanks for the plug. :-)

I'm not a scons expert, but in my build system (called mbuild) I was very motivated by the expressiveness and the portability that scons derives from using a (relatively) portable scripting language. In this case, that language is Python.

I didn't like the single scan-build ordering that is inherent to scons. It is possible to subvert that, but I found that mechanism unnatural and the idiom I found on their wiki would break as scons evolved through different versions. In the end, I missed the more obvious control that is afforded by systems like make.

For make-based builds, it was annoying to have to rely on different external scripting and tools systems on each platform. (i.e., do I force the use of cygwin on windows for things like "rm"?)

So I tried to make a hybrid. I cannot say that I solved the recursive build problem, but with mbuild, I do have a portable python based build system. I spend much less time tinkering with my build system than I ever did when I used make or scons. "build" became a nonissue for me and I can focus on other problems. For me, that makes mbuild a success. Now if I could only release it...

--

I've been tripped up by scons' default inability to deal with conditional includes. It is truly hard to know what you depend on for a particular build. I've been considering adding simple CPP "#if defined(yyy)" knowledge to mbuild's source dependence scanner, but that is a slippery slope.

--

A couple of years ago when Andy mentioned the strace idea, I briefly looked at using strace and found that there are truly an enormous number of files that get included or referenced in a typical C++ build. I was pretty surprised. I didn't want to have to list dependences on the "phase of the moon" as it were nor attempt to filter out which ones were really important and which ones were ignorable. And as the other commenters have pointed out, portability is an issue and filtering on different O/Ses seemed tricky.
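
To make the filtering problem concrete, here is a minimal sketch of the crudest possible filter: drop any observed read whose path starts with a prefix on an ignore list. The prefix list and the name `project_dependencies` are hypothetical. Choosing the list is exactly the per-OS judgment call described above, and ignoring system headers means a compiler upgrade would not trigger a rebuild.

```python
# Hypothetical ignore list; what is safe to ignore differs from system to system.
IGNORED_PREFIXES = ("/usr/", "/lib/", "/etc/", "/proc/", "/dev/", "/tmp/")

def project_dependencies(reads, ignored_prefixes=IGNORED_PREFIXES):
    """Return the observed reads that are not under an ignored prefix, sorted."""
    # str.startswith accepts a tuple and matches if any one prefix matches.
    return sorted(p for p in reads if not p.startswith(ignored_prefixes))
```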