Diagnosing our recent failed bulk builds

From: John Marino <dragonflybsd@xxxxxxxxx>
Date: Fri, 09 Dec 2011 09:01:08 +0100

The last good bulk build run for x86_64 current was Oct 28. Since then, two more runs have been performed, each reporting thousands of failed packages. The failures occurred at the checksum phase, where the bulk build script *sometimes* could not find the digest. A single checksum failure can cascade to hundreds of dependent packages (see math/pari).

It's likely that something about the bulk build setup has changed since Oct 28. I read somewhere that Justin was using NFS to access the pkgsrc directory. Is that the setup being used here? If so, when was it set up like this?

Can we use rsync to make a local copy of the latest pkgsrc on each build box and take NFS out of the equation? Intermittent NFS problems could explain why the bulk build can sometimes access the digest folder and sometimes cannot.

Results on both platforms should show a significant improvement over the Oct 28 build report. It would be nice to figure out exactly what broke with the bulk builds and get some updated reports here soon.


