2. Using distcc

2.1 Invoking distcc

distcc is prefixed to C compiler command lines and acts as a wrapper to invoke the compiler either on the local client machine, or on a remote volunteer host.

For example, to compile hello.c into the object file hello.o:

distcc gcc -o hello.o -c hello.c

Standard Makefiles, including those using the GNU autoconf/automake system, use the $CC variable as the name of the compiler to run. In most cases it is sufficient to override this variable, either on the make command line or, if you wish to use distcc for all compilation, in your login script. For example:

make CC='distcc gcc'

NOTE: You cannot just set CC=distcc, because distcc needs to know the name of the real compiler.
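A minimal sketch of the login-script approach, assuming a Bourne-compatible shell that reads a file such as ~/.profile (the file name is an assumption; use whatever login script your shell reads). It also includes a guard against the CC=distcc mistake described in the note.

```shell
# Sketch for a login script such as ~/.profile (file name is an assumption).
# Use distcc for all compilation by naming both distcc and the real compiler.
CC='distcc gcc'
export CC

# Guard against the mistake from the note above: "distcc" on its own
# does not tell distcc which real compiler to run.
case "$CC" in
    distcc | 'distcc ')
        echo "warning: CC must name the real compiler, e.g. CC='distcc gcc'" >&2
        ;;
esac
```

Makefiles that assign CC themselves can still override this environment value; see the note in 2.3.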

2.2 Options

distcc accepts only one option of its own, --help, which prints a usage message and exits; running distcc with no arguments does the same. All other arguments are taken as the name of a compiler, followed by the arguments and options for that compiler.

2.3 Environment Variables

The way in which distcc runs the compiler is controlled by a few environment variables.

NOTE:

Some versions of make do not export Make variables as environment variables by default, and assignments to variables within the Makefile may override their definitions in the environment. The most reliable method seems to be to set the DISTCC_* variables in the environment of Make, and to set CC on the right-hand side of the Make command line. For example:

$ DISTCC_HOSTS='localhost wistful toey'
$ export DISTCC_HOSTS
$ make CC='distcc gcc' all
          

DISTCC_HOSTS

Space-separated list of volunteer hosts, in order of preference.

DISTCC_VERBOSE

If set, distcc produces explanatory messages on the standard error stream. This can be helpful in debugging problems. Bug reports should include verbose output.

DISTCC_LOG

Log file to receive messages from distcc itself, rather than stderr.
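For a debugging session, all three variables can be set in the shell before running make. A sketch, assuming a Bourne-compatible shell; the host names are reused from the example above, and the log path is a hypothetical choice. The manual only says DISTCC_VERBOSE must be set, so the value 1 is arbitrary.

```shell
# Volunteer hosts, space-separated (names reused from the example above).
DISTCC_HOSTS='localhost wistful toey'
# Turn on explanatory messages; any value will do, 1 is arbitrary.
DISTCC_VERBOSE=1
# Hypothetical log location for distcc's own messages (instead of stderr).
DISTCC_LOG="${HOME:-/tmp}/distcc.log"
export DISTCC_HOSTS DISTCC_VERBOSE DISTCC_LOG

# Show the DISTCC_* settings that make and distcc will inherit.
env | grep '^DISTCC_'
```

Then run make CC='distcc gcc' as in the example above. When you have finished debugging, unset DISTCC_VERBOSE and DISTCC_LOG to return to the default behaviour.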

2.4 Which Jobs are Distributed?

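Building a C program on Unix involves several phases: preprocessing the source, compiling it to assembly language, assembling that into an object file, and linking object files and libraries into an executable.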

distcc only ever runs the compiler and assembler remotely. The preprocessor must always run locally because it needs to access various header files on the local machine which may not be present, or may not be the same, on the volunteer. The linker similarly needs to examine libraries and object files, and so must run locally.

The compiler and assembler take only a single input file, the preprocessed source, and produce a single output, the object file. distcc ships these two files across the network and can therefore run the compiler and assembler remotely.

Fortunately, for most programs running the preprocessor is relatively cheap, and the linker is called relatively infrequently, so most of the work can be distributed.

distcc examines its command line to determine which of these phases are being invoked, and whether the job can be distributed. The command-line scanner is intended to behave in the same way as gcc. In case of doubt, distcc runs the job locally.

In particular, this means that commands that compile and link in one go cannot be distributed. These are quite rare in realistic projects. Here is one example of a command that could not be distributed:

$ distcc gcc -o hello hello.c
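The decision described above can be modelled roughly as follows. This is a deliberately simplified sketch, not distcc's actual command-line scanner: the hypothetical classify_job function only checks for -c and exactly one .c source file and, like distcc, answers "local" whenever it is in doubt.

```shell
# Hypothetical, simplified model of distcc's "can this be distributed?" test.
# Usage: classify_job COMPILER ARGS...   prints "remote" or "local"
classify_job() {
    shift                          # drop the compiler name, e.g. "gcc"
    has_c=no                       # was -c (compile only, no link) given?
    sources=0                      # how many C source files were named?
    for arg in "$@"; do
        case "$arg" in
            -c)  has_c=yes ;;
            *.c) sources=$((sources + 1)) ;;
        esac
    done
    # Only "compile one source file, do not link" is treated as remote;
    # everything else falls back to local, as distcc does when in doubt.
    if [ "$has_c" = yes ] && [ "$sources" -eq 1 ]; then
        echo remote
    else
        echo local
    fi
}

classify_job gcc -o hello.o -c hello.c   # prints: remote
classify_job gcc -o hello hello.c        # prints: local
```

To make the second command distributable, compile and link in separate steps, for example distcc gcc -o hello.o -c hello.c followed by gcc -o hello hello.o; only the first of these can run on a volunteer.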

2.5 Running Jobs in Parallel

Moving source across the network is less efficient than compiling it locally. However, if you have access to a machine much faster than your workstation, the performance gain may outweigh the cost of transferring the source code, and it may be quicker to ship all your source across the network and compile it there.

In general, it is even better to compile on two or more machines in parallel. Any number of invocations of distcc can run at the same time, and they will distribute their work across the available hosts.

distcc does not manage parallelization, but relies on Make or some other build system to invoke compiles in parallel.

With GNU Make, you should use the -j option to specify a number of parallel tasks slightly higher than the number of available hosts. For example:

$ export DISTCC_HOSTS='angry toey wistful localhost'
$ make -j5
            
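The rule of thumb above can be computed from $DISTCC_HOSTS. A minimal sketch: the hypothetical parallel_jobs function counts the hosts and adds one, matching the example above (four hosts, -j5). How much "slightly higher" should be is a judgement call, so treat the +1 as an assumption.

```shell
# Hypothetical helper: suggest a make -j value one higher than the
# number of hosts in a space-separated host list.
parallel_jobs() {
    set -- $1              # split the list on whitespace into $1, $2, ...
    echo $(( $# + 1 ))     # number of hosts, plus one
}

DISTCC_HOSTS='angry toey wistful localhost'
export DISTCC_HOSTS
njobs=$(parallel_jobs "$DISTCC_HOSTS")
echo "make -j$njobs"       # prints: make -j5
```

In a real build you would run make -j"$njobs" CC='distcc gcc' rather than printing the command.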

2.6 Choosing a Host

The $DISTCC_HOSTS variable tells distcc which volunteer machines are available to run jobs.

distcc uses a simple locking heuristic on each client to keep track of which volunteer machines are likely to be busy. distcc prefers to distribute jobs to machines that are not already running a job from this client, and prefers machines occurring earlier in the list of hosts.

distcc does not explicitly coordinate jobs injected from multiple users or client machines.

If only one invocation of distcc runs at a time, it will normally execute on the first host in the list, although this behaviour is not guaranteed.
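The preference order described above (idle hosts first, then earlier hosts) can be sketched as below. This is a simplified model, not distcc's locking code: the hypothetical pick_host function is given the busy hosts as an explicit list, whereas distcc infers them from its own per-client locks. Falling back to the first host when every host is busy is also an assumption of this sketch.

```shell
# Hypothetical model of host selection.
#   $1: space-separated host list, in order of preference ($DISTCC_HOSTS)
#   $2: space-separated list of hosts believed to be busy
pick_host() {
    for h in $1; do
        case " $2 " in
            *" $h "*) ;;                 # busy: try the next host
            *) echo "$h"; return 0 ;;    # first idle host wins
        esac
    done
    # Assumption: if every host is busy, use the first one anyway.
    set -- $1
    echo "$1"
}

pick_host 'angry toey wistful localhost' ''        # prints: angry
pick_host 'angry toey wistful localhost' 'angry'   # prints: toey
```

With a single distcc invocation no host is busy, so the model picks the first host, matching the usual (but not guaranteed) behaviour described above.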

2.7 Diagnostic Messages

distcc prints a message when it runs a command locally or remotely. For more information, set $DISTCC_VERBOSE and look at the server's log file.

By default, distcc prints diagnostic messages to stderr. Sometimes these intrude too much on the output of the regular compiler, so distcc's own messages can be redirected separately by setting the $DISTCC_LOG environment variable to a filename.

2.8 Exit Code

The exit code of distcc is normally that of the compiler: zero for successful compilation and non-zero otherwise. Error messages from local or remote compilers are passed through to diagnostic output on the client.

If distcc fails to distribute a job to a selected volunteer machine, it will try to run the compiler locally on the client. If that fails, distcc will return exit code 1.

distcc tries to distinguish between a failure to distribute the job, and a "genuine" failure of the compiler on the remote machine, for example because of a syntax error in the program. In the second case, distcc does not re-run the compiler locally.
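The fallback rules above can be summarised in a short sketch. Everything here is hypothetical: run_remote and run_local are stand-ins whose results come from REMOTE_STATUS and LOCAL_STATUS, and using status 100 to mean "could not distribute" is an arbitrary convention for this sketch, not something distcc defines.

```shell
# Arbitrary status meaning "the job could not be distributed" (sketch only).
DIST_FAILED=100

# Simulated compilers: their results come from these variables.
: "${REMOTE_STATUS:=0}" "${LOCAL_STATUS:=0}"
LOCAL_RAN=no
run_remote() { return "$REMOTE_STATUS"; }
run_local()  { LOCAL_RAN=yes; return "$LOCAL_STATUS"; }

compile_job() {
    run_remote "$@" && status=0 || status=$?
    if [ "$status" -eq "$DIST_FAILED" ]; then
        # Distribution failed: retry with the local compiler.
        if run_local "$@"; then
            return 0
        fi
        return 1            # the local attempt failed too: exit code 1
    fi
    # Success, or a genuine compiler error such as a syntax error:
    # pass the compiler's status through and do not re-run locally.
    return "$status"
}
```

Setting REMOTE_STATUS=2 before calling compile_job models a syntax error on the volunteer: the status 2 is returned and run_local is never called.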

2.9 distcc with ccache

distcc works well with the ccache tool for caching compilation results. To use the two together, set CC with ccache first:

CC='ccache distcc gcc'

2.10 File Metadata

distcc transfers only the binary contents of source, error, and object files, without any concern for metadata, attributes, character sets or end-of-line conventions.

distcc never transmits file times across the network or modifies them, so it should not matter whether the clocks on the client and volunteer machines are synchronized. When an object file is received onto the client, its modification time will be the current time on the client machine.

