Speeding up Darwinports builds with distcc

Submitted by hagus on Wed, 2005-11-02 15:14. Apple

So the other day I tried to build Ethereal from Darwinports on my 1.5 GHz 15" PowerBook. Oh boy! When it comes to the raw CPU and IO throughput needed for compiling a lot of C code, this machine barely copes.

A partial solution (i.e. other than waiting for grunty Intel notebooks next year) is to distribute your build using the built-in copy of distcc. Distcc is a fairly rudimentary way of distributing builds, but it works.

In this article I assume you have a recent Xcode installed, as well as Darwinports.

The general plan we will follow is:

1. Set up the distcc daemon via Xcode.
2. Organise for our port in Darwinports to use distcc as the compiler front end.
3. Build some stuff!

The rough principle behind distcc is that it's a front end to your compiler. Flags given to distcc just get passed on to gcc. You set up your Makefile (or your "configure" script) to use distcc as the compiler, and ensure "make" is called with a suitable -j flag.

With -j 8, make will attempt to fork eight copies of the compiler. The general rule of thumb is that the argument to -j should be double the number of available CPUs. With distcc as the compiler, it will attempt to farm simultaneous jobs off to other machines. The other target machines are given via the DISTCC_HOSTS variable.
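As a rough sketch of that rule of thumb, the snippet below derives a -j value from the local CPU count and prints the make command it would use. The helper name suggest_jobs is made up for illustration, and only local CPUs are counted; with remote hosts in DISTCC_HOSTS you may well want a higher number.

```shell
# Hypothetical helper: apply the "double the CPU count" rule of thumb.
suggest_jobs() {
    echo $(( $1 * 2 ))
}

# hw.ncpu is the OS X sysctl for the CPU count; getconf is a fallback
# for other systems, and 1 is assumed if neither works.
ncpu=$(sysctl -n hw.ncpu 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Print the command rather than run it, so nothing gets built by accident.
echo "make -j$(suggest_jobs "$ncpu") CC=distcc"
```

Passing CC=distcc on the make command line only helps if the Makefile honours CC; for configure-based projects, CC is normally set when configure runs instead.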

The process that answers build requests on remote machines is distccd. On OS X, we can tweak some settings in Xcode, and it will handle firing up the daemon for us. Fire up Xcode on all your target machines and set the preference pane up thus:

You must also edit the /etc/compilers file. This is the list of valid compilers that distccd will invoke for us. Uncomment the compilers without a full path at the bottom of this file. Distccd should pick up these settings every time it receives a job.

So obviously we need to tweak some of the Portfiles from Darwinports to use the desired compiler and Makefile settings. I have not yet found a way to make this really elegant and simple. For instance, I can't find anywhere in the Darwinports tree to set a global override for the compiler and make flags. This creates a problem for ports that have a lot of dependencies - you have to edit every relevant Portfile, perhaps via an automated script. Bummer.
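For what it's worth, here is a minimal sketch of what such an automated script might look like. The function name patch_portfile is hypothetical, and it assumes each Portfile keeps configure.env on a single line (no backslash continuations). It leaves an existing build.args line alone, so check the result by hand and keep a backup of anything you patch.

```shell
# Hypothetical helper: patch_portfile FILE [JOBS]
# Adds CC=distcc to configure.env and a build.args -jN line to one Portfile.
patch_portfile() {
    file=$1
    jobs=${2:-10}
    tmp=$(mktemp) || return 1

    # Append CC=distcc to a configure.env line that lacks it; if the
    # Portfile has no configure.env line at all, add one at the end.
    awk '
        /^configure\.env/ && !/CC=distcc/ { print $0 " CC=distcc"; seen = 1; next }
        /^configure\.env/ { seen = 1 }
        { print }
        END { if (!seen) print "configure.env CC=distcc" }
    ' "$file" > "$tmp" || return 1

    # Add a build.args line only if the Portfile does not already have one.
    grep -q '^build\.args' "$tmp" || echo "build.args -j$jobs" >> "$tmp"

    # Write back in place so the original file permissions are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

You would call it once per Portfile, for example patch_portfile /path/to/Portfile 10, where the path is whichever port you need to touch.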

The project that I wanted to build was ethereal. The Portfile for this project was located here on my machine:

/opt/local/var/db/dports/sources/dports/net/ethereal/Portfile

The format of the Portfile is documented in detail. That's where I learned what Portfile details needed to be changed.

I changed lines 50 and 51 to read as follows:

configure.env LDFLAGS="-Wl,-search_paths_first" CC=distcc
build.args -j10

You can probably guess what's going on here. I am supplying the CC environment variable to the configure command, in this case overriding it to distcc. I am then ensuring the -j10 flag is passed to the build command, which I happen to know will be "make".

Now all that remains is to set the environment variable that tells distcc about the other hosts we wish to use:

export DISTCC_HOSTS="test6 localhost"

Distcc is fairly naive; it will just allocate jobs in a linear fashion. There is no way I can tell distcc (to my knowledge) that test6 is actually a dual core 2.0 GHz G5 and localhost is just a simple laptop.

Anyway, you should now be able to issue the 'port build' command in the same directory as the Portfile and see the results.

If you are in doubt as to whether the remote hosts are receiving jobs, you can try two things. One, run tcpdump on the remote host: "tcpdump -A tcp port 3632". Two, visit Xcode Expert Preferences and turn on logging for the distccd process.

You can also set the DISTCC_VERBOSE environment variable on your main machine to '1', which should produce some good debug output. The 'port' command can also be put into verbose mode so that you can see what commands are being used.
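If you want a quick check before starting a long build, something like the following sketch may help. Both function names are made up for illustration; the parsing only understands plain host names, optionally followed by a slash and a number, and the check assumes an nc (netcat) that supports the -z and -w options. A reachable port 3632 only shows that something is listening there, not that distccd will accept your compiler.

```shell
# Hypothetical helper: print one host name per line from a
# DISTCC_HOSTS-style string, dropping anything after a slash.
list_distcc_hosts() {
    for spec in $1; do
        echo "${spec%%/*}"
    done
}

# Hypothetical helper: report whether each remote host accepts TCP
# connections on port 3632, the port used in the tcpdump example.
check_distcc_hosts() {
    for host in $(list_distcc_hosts "$1"); do
        # localhost compiles locally, so there is nothing to probe.
        [ "$host" = "localhost" ] && continue
        if nc -z -w 2 "$host" 3632 >/dev/null 2>&1; then
            echo "$host: port 3632 reachable"
        else
            echo "$host: port 3632 NOT reachable"
        fi
    done
}
```

After defining the functions, run check_distcc_hosts "$DISTCC_HOSTS" on your main machine.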

Hope this gets you speedier builds from the Darwinports tree :)

Trackback URL for this post:

https://www.hagus.net/trackback/92

from n3th blog on Sat, 2005-11-12 07:15

Apple likes to sell Mac OS X Server, offering a whole bunch of management features and other goodies.
But for the average geek, the client version fulfils most needs just as well.

You can manage packages via Darwinports and even speed them up with dis

from mir.aculo.us on Fri, 2005-11-04 00:36

Tired of waiting for Darwinports installations to finish because of the tedious compile process? Read this guide to make Darwinports use distcc (of course you

Submitted by Anonymous (not verified) on Fri, 2005-11-04 00:35.

I'm not sure how complete the distcc implementation is in OS X, but I believe in Linux, if you want to set priorities for certain machines, you change the DISTCC_HOSTS variable by putting the number of jobs after each machine. For example: export DISTCC_HOSTS="test6/6 localhost" should spin off 6 jobs to test6 and the rest (in your example 10 - 6, so 4) of the jobs to any other computers in the list. You can also specify the number of jobs for multiple machines: export DISTCC_HOSTS="test6/5 othermachine/2 localhost".

Also, the order your computers appear in the hosts variable will be the order in which distcc will spin off jobs, so you will want more powerful computers earlier in the list.