DragonFly BSD
DragonFly kernel List (threaded) for 2004-02

Packaging system effort

From: "Simon 'corecode' Schubert" <corecode@xxxxxxxxxxxx>
Date: Thu, 26 Feb 2004 13:36:43 +0100

Hey people,

As some of you might know, I've written up my thoughts about what a proper packaging system should provide etc. It has become a bit longish, but I just had to add everything in my mind (at least concerning packaging); otherwise, I feared, people would start bikeshedding about things I had thought about but didn't write down.

This text is not about implementation, not even a little bit (if there is some, ignore it). This is intentional. I think we first need to come to a conclusion *what* we want and after that start thinking about *how* we will implement it.

Uhm, there was some talk about a June release, no? :) So the timeline is pretty tight (if we want to have our own packaging system ready by then). I propose some weeks (1-2) for discussion of the points here; in this time we need to come to a conclusion about what we want and what of that needs to be in the first release. After that, some weeks for implementation proposals and discussion (proposals should be written down completely and then submitted to the list so that we can all discuss complete concepts rather than fragments which could be implemented). After that, start hacking [yes, me too] :)

Now here it comes. Oh yea, current version will be available at <http://chlamydia.fs.ei.tum.de/~corecode/packaging.txt> and Justin will put up the text (after it has been polished) on the blog and the main page too I guess.

Thanks for taking your time for reading and commenting(!)...


Thoughts about a packaging system
---------------------------------
$Revision: 1.2 $ $Date: 2004/02/25 15:02:27 $

A package building and installation system (referred to as the packaging system
from now on) should provide several features which come to mind after some time
of using other (partly incapable) systems and thinking about usability.

This is what I have been doing for the last few months, yet I never wrote my
thoughts down. Now I'll try to write down the mess in my head :)

My current knowledge concerning packaging systems is rather limited to Debian's
dpkg, FreeBSD's ports, NetBSD's pkgsrc and Gentoo's portage. As such I'll
only refer to these systems for comparison with the One True System (OTS)
I'll describe here.

The packaging system should be OS agnostic in almost all parts, the package
descriptions in large parts. This is a requirement if the system wants to be
OTS, or - at least - provide its functionality to a wider range of OSes (see
pkgsrc/zoularis, portage to some extent).

The packaging system must provide the necessary infrastructure to hold
descriptions for multiple versions of one ``program'' without lots of overhead
and in a way that is easy to use (read: easy, sane choice of versions). This
functionality is needed for both specialized deployment/installation strategies
and multiple OS/arch support as described below. Portage provides such a
feature, ports doesn't really (ok, there are -devel and versioned directories,
still I see this more as a bandaid).

Multiple architectures must be supported. This means build quirks for special
architectures and the need for multiple versions of one program, as e.g. i386
might be supported the best whereas amd64 might only be unstable for the same
version or won't build at all.

It is highly desirable to be able to install multiple versions of one program at
the same time. Besides means to enable this in the filesystem (symlinks,
variable symlinks, VFS voodoo, etc) - which might not be available on all target
platforms - this also adds some more questions concerning the logic of newly
installed packages. Imagine two perl versions installed: 5.6 and 5.8; which
version should the newly installed spamassassin depend upon?

This brings us to another point: Clean build environments, environments which
only contain the build and/or (to be discussed) runtime requirements (note that
dependencies are the opposite - a misnomer in ports/pkgsrc), so that there is a
guarantee that various configure scripts (or whatever) don't suck in optionally
supported components and create not registered requirements. This also needs
special filesystem voodoo, VFS might be a nice thing to use, pkgsrc does this
via buildlink's symlink system.

As the system should be easily usable and not just academic, easy tools are
strongly needed. This includes tools for all sorts of maintenance: from updating
descriptions over searching to upgrading installed packages. It is most
desirable to also provide graphical tools (ncurses, X11, web), or at least
provide infrastructure so that third parties can easily develop such.

The system should be able to track various kinds of requirements and act
differently upon them: Build time requirements might easily be garbage collected
because they are not needed once packages are built; runtime requirements (e.g.
shared libraries) might not be in use any more and could thus be cleaned too
(compare portage's world file).
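
The garbage collection idea can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function name, the `world`
set (borrowed loosely from the portage world file idea, here meaning "packages
the user asked for explicitly") and the sample data are all made up.

```python
def removable(installed, runtime_reqs, world):
    """Return installed packages not reachable from the explicitly wanted set."""
    needed = set()
    stack = [p for p in world if p in installed]
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue
        needed.add(pkg)
        # Follow runtime requirements only; build time ones are not recorded here.
        stack.extend(r for r in runtime_reqs.get(pkg, ()) if r in installed)
    return installed - needed


# Hypothetical data: bison was only pulled in to build something.
installed = {"irssi", "glib", "perl", "bison"}
runtime_reqs = {"irssi": ["glib", "perl"]}
world = {"irssi"}
print(sorted(removable(installed, runtime_reqs, world)))
```

Anything the walk from `world` never reaches is a candidate for cleanup; whether
it is removed automatically or only suggested would be a policy decision.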

Concerning shared libraries/runtime requirements: When runtime or build time
(harder case) requirements are being updated it is not always necessary to
update their dependants too. For example, this could be the case for security
fixes in shared libraries; if the shared object major version changes or a
dependant is linked statically to this library, this - of course - doesn't
apply.

Of course the system must provide an advanced requirement and collision system
which also provides room for meta requirements (MTA, web server, whatever;
compare to dpkg and portage). This also means ability to fuzzily specify version
numbers (>=2.0, everything but 1.4) and - where applicable - package flags (see
below).
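
As an illustration of fuzzy version specifications such as ">=2.0" or
"everything but 1.4", here is a small Python sketch. The constraint syntax and
the function names are assumptions made for this example, and versions are
simplified to dotted integers.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
       "==": operator.eq, ">": operator.gt, "<": operator.lt}


def parse_version(text):
    """Turn '2.0' into (2, 0); real version schemes are messier than this."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, constraints):
    """True if version meets every constraint, e.g. ['>=2.0', '!=2.3']."""
    v = parse_version(version)
    for constraint in constraints:
        # Two-character operators are tried first so '>=' is not read as '>'.
        for symbol in (">=", "<=", "!=", "==", ">", "<"):
            if constraint.startswith(symbol):
                wanted = parse_version(constraint[len(symbol):])
                if not OPS[symbol](v, wanted):
                    return False
                break
        else:
            raise ValueError("unknown constraint: " + constraint)
    return True


print(satisfies("2.1", [">=2.0"]), satisfies("1.4", ["!=1.4"]))
```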

It is also desirable that the system can dynamically include additional optional
requirements if the host system provides this (e.g. optional GNOME, IPv6 etc);
either automatically or semi-automatically. This choice could possibly
additionally be handled with package flag settings as described next.

A very strong must-have is a unified package flags system. Ports provides
package flags (e.g. USE_LDAP), but these are not unified and per port only.
Specifying them in make.conf helps a bit, but lacking a global registry this can
be painful. Portage provides a better way with its USE flags, but flags that
apply to only one package are not handled correctly. There needs to be a (small
and thoughtfully selected) global flags registry which contains more than
yes/no: On a server system, I most certainly don't want any X11 stuff being
sucked in when installing a package, so "never ever" is a needed state.
Sometimes I might not want X11 support if it is optional, but don't care when a
package requiring X11 installs it too; this corresponds to a "better not" state.
And, of course, there also is an "if it can support it, use it" state.
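
The three states can be made concrete with a short Python sketch. The names
`Pref` and `resolve` and the three outcome strings are hypothetical; the point
is only that a global flag needs more than a boolean.

```python
from enum import Enum


class Pref(Enum):
    NEVER = "never ever"
    BETTER_NOT = "better not"
    USE = "if it can support it, use it"


def resolve(pref, hard_requirement):
    """Return 'enable', 'skip' or 'refuse' for one feature of one package.

    hard_requirement is True when the package cannot work without the feature.
    """
    if hard_requirement:
        # "never ever" blocks the package; the other states tolerate it.
        return "refuse" if pref is Pref.NEVER else "enable"
    # Optional support is only switched on when the user is in favour.
    return "enable" if pref is Pref.USE else "skip"


print(resolve(Pref.BETTER_NOT, hard_requirement=True))
print(resolve(Pref.BETTER_NOT, hard_requirement=False))
```

With X11 set to "better not", a package that merely offers optional X11 support
is built without it, while a package that requires X11 may still pull it in.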

Packages themselves need the ability to use package local flags too. This might
be the case for e.g. subversion ("I don't want the DAV server stuff") or PHP
("Well, support this and that..."). Ports supports this but it's not unified and
not easy to use. Not everybody wants to run "more Makefile" to find out all the
flags that can be used, nor does this work for recursive requirements. A unified
system
is needed which allows the user to customize the packages in an easy (graphical,
for example) and unobtrusive (all questions asked before unattended
installation) way. Nevertheless, some users don't want these choices, they "just
want" some package, so there need to be sane defaults which will be used if the
user chooses not to answer any questions at all.

Package flags and binary packages don't really mix (compare portage: no binary
packages at all), so another feature is needed: package flavors (compare
OpenBSD's ports, I heard). A package flavor is a predefined, sane set of package
flags which can be automatically built into a binary package. This also gives
the user some flexibility without the need to cope with all possible flags.

All these flags of course need to be registered for all installed packages so
that this recorded preference is used in upgrades. If package flags/flavors
changed their meaning or got added/deleted it might be desirable for the user to
get asked to review the settings; if flags/flavors didn't change, the system
should be able to use old recorded settings.
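
A rough Python sketch of this upgrade behaviour follows. It assumes (purely for
illustration) that every package version ships a mapping of flag name to a
short meaning text, so that a changed text signals a changed meaning; the
function name and data are hypothetical.

```python
def flags_for_upgrade(recorded, old_known, new_known, defaults):
    """Return (flags to use, names the user should review).

    recorded:  flag name -> value chosen at the last installation
    old_known: flag name -> meaning text in the installed version
    new_known: flag name -> meaning text in the new version
    defaults:  flag name -> default value in the new version
    """
    if old_known == new_known:
        return dict(recorded), []          # nothing changed, reuse silently
    flags, review = {}, []
    for name, meaning in new_known.items():
        if old_known.get(name) == meaning and name in recorded:
            flags[name] = recorded[name]   # same flag, same meaning
        else:
            flags[name] = defaults[name]   # new or redefined flag
            review.append(name)
    removed = [name for name in old_known if name not in new_known]
    return flags, sorted(review + removed)


old = {"dav": "build the DAV server module", "ssl": "TLS support"}
new = dict(old, python="Python bindings")
print(flags_for_upgrade({"dav": False, "ssl": True}, old, new,
                        {"dav": True, "ssl": True, "python": False}))
```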

The system should also be able to support split packages: Some packages
(especially X11) are so big that it's highly desirable to split them. Ports
does this by creating several "independent" packages which just happen to use
the same source code. OpenBSD's ports natively produce several binary packages
for one port, as I heard. The way this is implemented needs to be the subject
of further discussion.

Debian's way of providing -dev packages which have been split off is always a
highly controversial point of discussion. This is why I want to comment on this
here. For a source based system as ports etc. having header files available is
unavoidable, and also when using binary packages the bloat due to header files
and static libs is small. Still there may be cases where every file must be
considered, so having the possibility to prune development files if feasible
might be a nice add-on. Same goes for foreign gettext language files etc. This
could all be implemented via global package flags. What I'm opposing is the
creation or use of additional packages for -dev headers/libs. The number of
distinct packages should be kept to a minimum.

It is desirable to have a way to import an individually defined set of packages
for easy deployment of multiple systems.

The system must support both building from source and installing from
precompiled binary packages equally well, and be able to use building from
source as a fallback method if binary packages are not available in an
individual configuration. Furthermore it should be easy to build binary packages
for installation on another system.
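
How flavors and the source fallback could interact is sketched below in Python.
The flavor names, flag names and the function are invented for this example;
a flavor is modelled simply as a named, prebuilt set of enabled flags.

```python
def plan_install(wanted_flags, flavors):
    """Pick a prebuilt flavor if one matches exactly, else fall back to source."""
    for name, flags in flavors.items():
        if flags == wanted_flags:
            return ("binary", name)
    return ("source", None)


# Hypothetical flavors for which binary packages exist.
flavors = {
    "default": frozenset({"ssl"}),
    "server": frozenset({"ssl", "dav"}),
}

print(plan_install({"ssl", "dav"}, flavors))
print(plan_install({"ssl", "ldap"}, flavors))
```

A user who wants exactly what a flavor offers gets the binary package; any
other flag combination is built from source.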

As a direct conclusion the system must have strong binary package distribution
support. In the past a lot of people demanded a streaming binary format to have
the ability to install packages straight whilst downloading without having to
wait for the whole package. This needs to be discussed further as installing
while downloading is even less atomic than installing after download which can
lead to other major problems.

A nice feature might be the availability of relative (binary) patchsets between
certain versions (individually selected) to reduce consumed bandwidth and
installation time. For binary patching systems see the bsdiff effort.

If possible, a nice addition would be the optional integration of installation
and build management of the base system. Together with an advanced and easy
binary update system this would lead to a unified system update mechanism - at
the cost of losing the clear border between system and third party products
(which is the situation with ports at the moment). Package flags could easily be
used as a way to customize the base system, as we currently use -DNO_CRYPTO etc.
The advantage for the user is clear: System and third party products appear as
the same category; the OS isn't just kernel + some userland; it appears as
everything provided via the packaging system (see the Linux world).

The system must in any case provide different update strategies which need to be
selectable both globally and per package. This means: on a critical production
server, I don't want to upgrade my software (base system and third party
products as e.g. apache) unless there is a security problem (might even be
classified into local/remote root/DoS) or I need new features only provided by a
newer version. I'll call this way of updating the very conservative way. Other
users might upgrade every now and then to a new version which has been tested
and tagged as stable working (this can be different for various architectures or
OS). Some other power users might upgrade to every newly released version
because they don't care about minor instabilities. These update strategies must
not only be selectable for all available packages as a whole; the user also
needs the ability to specify them individually for single packages or groups
thereof. Ports doesn't provide this - versions are dictated by committers;
portage provides this feature to a small extent (accept keywords). Debian
runs stable/testing/unstable versions.
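
The three strategies and the per-package override can be sketched in Python as
follows. The strategy names, the `Release` fields and the rule that the
conservative strategy jumps to the newest release carrying a security fix are
assumptions for this example, not a finished policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    version: tuple
    stable_on: frozenset = frozenset()   # architectures where tagged stable
    security_fix: bool = False


def pick_upgrade(installed, releases, strategy, arch):
    """Return the release to upgrade to, or None to stay put."""
    newer = [r for r in releases if r.version > installed]
    if strategy == "conservative":
        candidates = [r for r in newer if r.security_fix]
    elif strategy == "stable":
        candidates = [r for r in newer if arch in r.stable_on]
    elif strategy == "latest":
        candidates = newer
    else:
        raise ValueError("unknown strategy: " + strategy)
    return max(candidates, key=lambda r: r.version, default=None)


def strategy_for(package, global_strategy, per_package):
    """A per-package setting wins over the global one."""
    return per_package.get(package, global_strategy)


releases = [
    Release((2, 0, 48), frozenset({"i386", "amd64"})),
    Release((2, 0, 49), frozenset({"i386"}), security_fix=True),
    Release((2, 1, 0)),
]
print(pick_upgrade((2, 0, 48), releases, "stable", "amd64"))
```

In this made-up data set an amd64 user on the stable strategy stays at 2.0.48,
because 2.0.49 has only been tagged stable on i386.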

For this to work properly, packages need to carry information about
vulnerabilities, new features etc. so that the admin can choose whether an
upgrade is needed or not. This shouldn't be a whole changelog, just a summary of
the most interesting changes.

The use of cryptographic signatures is a hard requirement. This must be
implemented for package descriptions and for binary packages. MD5/SHA1 is no
cryptographic signature! This could mean an openssl requirement for the
packaging system itself or the need of implementing some cryptographic
functions. The distribution and extent of default trust of a certificate needs
to be discussed in this context too.

Another important aspect is a powerful build system. It should be possible that
multiple packages are being built at the same time and get synchronized for
installation etc. It's just a PITA if you're compiling KDE or OpenOffice and
can't build/install a small package like mpg123 or irssi because this might
damage the package db. Existing systems handle such cases nicely most of the
time, but that's just luck. Pkgsrc implements locking as far as I remember.
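
The basic pattern (build in parallel, serialize only the step that touches the
package db) looks roughly like this in Python. Threads and an in-memory list
stand in for separate build processes and the real package db; an actual
implementation would need a lock that works across processes, e.g. a lock
file. All names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

registry = []                     # stand-in for the package db
registry_lock = threading.Lock()


def build(name):
    """Placeholder for a long-running build; returns the built artifact."""
    return name + ".pkg"


def build_and_install(name):
    artifact = build(name)        # may overlap with other builds
    with registry_lock:           # only one installer updates the db at a time
        registry.append(artifact)
    return artifact


with ThreadPoolExecutor(max_workers=3) as pool:
    list(pool.map(build_and_install, ["kde", "mpg123", "irssi"]))

print(sorted(registry))
```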

Another very nice to have is native distributed build support. This is very much
needed if one needs to install customized packages on a slow machine
(firewall/NAT etc) or does binary package building for distribution. Portage
provides this kind of service via distcc and it just plainly rocks. You can even
build OpenOffice in reasonable time with 10 boxen compiling :) Another
possibility is the use of distributed pmake.

Display of the build progress is a nice add on for users for sure. This can be
both implemented in a macroscopic way (x of y packages built) and microscopic
(anybody wanna hack make for SIGINFO?).

Speaking of compilation for slow boxes: Cross compilation comes to mind. Is this
needed when distcc support exists? Discussion point here.

As times get harder and it's common that the source/configure of major software
get compromised the system should include the possibility (hopefully as default)
to build packages either as non-root or in a chroot/jail (who needs network
access for builds anyways?). This - of course - needs VFS magic or else to map
requirements into the chroot.

It should be possible to build and install packages as an unprivileged user.
Sometimes local security policy or laziness of an admin demands the installation
of a package into the user's home dir. A nice point would be native support for
such in the packaging system. This doesn't mean that binary packages need to be
relocatable into home dirs, but the system would need to provide an alternative
(user home) location of package registry.

An essential duty of a packaging system is the tracking of installed files. It
must be an easy task to remove a package and thus all its installed files from
the system. The system needs to provide collision management (same file
installed by several packages, VFS voodoo?) and configuration file awareness
(see below). Compare with portage (automatic list building) and ports (ugly
manually generated plists).
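
A minimal Python sketch of file tracking with collision detection is shown
below. The class and its methods are invented for illustration; a real
registry would also store checksums, file types and so on.

```python
class FileRegistry:
    """Remembers which package owns which installed path."""

    def __init__(self):
        self.owner = {}           # path -> package name

    def install(self, package, paths):
        # Refuse the whole installation if another package owns any path.
        clashes = {p: self.owner[p] for p in paths
                   if p in self.owner and self.owner[p] != package}
        if clashes:
            raise ValueError("file collision: %r" % clashes)
        for p in paths:
            self.owner[p] = package

    def remove(self, package):
        """Forget all files of a package and return them for deletion."""
        gone = sorted(p for p, o in self.owner.items() if o == package)
        for p in gone:
            del self.owner[p]
        return gone


reg = FileRegistry()
reg.install("mpg123", ["/usr/local/bin/mpg123"])
print(reg.remove("mpg123"))
```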

All config files that might be potentially modified by users (read: all) need to
be treated in a special way: they may not be overwritten, yet new versions
shouldn't be discarded. There must be an easy way for the user to merge own
changes and upstream changes. If the config didn't change since last
modification the system should be intelligent enough to suppress obsolete merge
actions. On temporary deinstall of a package, existing config files shouldn't be
removed but on the user's request the system should be able to purge remaining
config files. Compare with ports' .sample files and portage's config file
protection system (path bound, fails e.g. on TeX stuff).
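
One common way to decide when a merge is actually needed is to record a digest
of each config file as shipped and compare three values on upgrade. The Python
sketch below illustrates that idea; the function names and the three result
strings are assumptions for this example.

```python
import hashlib


def digest(text):
    return hashlib.sha256(text.encode()).hexdigest()


def config_action(shipped_old, on_disk, shipped_new):
    """Decide what to do with one config file on upgrade.

    All arguments are digests: the file as shipped with the installed version,
    the file as it is on disk now, and the file shipped with the new version.
    """
    if shipped_new == shipped_old:
        return "keep"             # upstream did not change it: nothing to merge
    if on_disk == shipped_old:
        return "replace"          # user never touched it: take the new one
    if on_disk == shipped_new:
        return "keep"             # user already has the new content
    return "merge"                # both sides changed: ask the user


old = digest("port 25\n")
mine = digest("port 2525\n")
new = digest("port 25\ntls yes\n")
print(config_action(old, mine, new))
```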

The packaging system descriptions shouldn't consume too much space in general
and inodes in particular. It's just horrible to have a myriad of small files and
directories in your /usr (or whatever) wasting a big deal of inodes. This goes
for end users. Package maintainers could have a different view of the
description which could be collapsed into fewer files later. A possible approach
could be one description file per available package and version plus
approximately one patch file for each version (vs. patch-per-file in ports).

This leads to patches. As the system should aim to be OS agnostic in most parts
this also counts for patches. These should be specially crafted so that they at
best don't interfere with the build process on other platforms/OSes. This means
extensive use of #if defined(__MyOS__) etc.

This portability is the key to a close communication and development with the
upstream authors. It should be policy that patches are to be written as cleanly
as possible and always have to be submitted upstream. The packaging system might
provide help in or even enforce this process. Having patches go in upstream
reduces needed files, enhances overall acceptance of the packaging system and
also provides people not using the packaging system with features and fixes.

The system should provide support for bug tracking so that users can easily
check for known bugs and report new ones or add followup information for
existing ones. As the bug tracking system should be closely integrated with the
system, bugs need to be associated with packages or specific package versions. This
helps maintainers and committers to follow user input better than the GNATS/CVS
decoupling ports is currently using.

It goes without saying that the packaging system implementation must only have
low/moderate requirements for needed tools and processing power. This means the
system should be buildable with only POSIX tools and a moderately new C/C++/ObjC
compiler. If a scripting language is being used it should be one of the really
popular ones: sh, perl, python, tcl.

The system should be able to bootstrap itself. This means it shouldn't depend on
its tools being included with the host OS. If the system undergoes changes
the tools need to change too. As seen with ports several times having the tools
in the base OS only complicates stuff and leads to legacy issues (e.g. tbz/tgz
issue). Pkgsrc provides this in a nice way. Bootstrapping also means
registering/tracking installation of the package system (and requirements of
it). This seems like a chicken-and-egg problem but it can and should be solved.

Package descriptions must be easy to generate and easy to be used. This could
lead to different views of the packages - one maintainer side and one consumer
side - which can be converted into each other. For ports, for example, a big
show stopper concerning format conversion (remember pkg_info) and fast
processing (see INDEX generation) is the fact that Makefiles are indeed
interpreted and not just
parsed. This means slowdown in processing and also problems in automatic
conversions - you never know how creative a maintainer was in (ab-) using make.
This leads me to one conclusion: Don't use an interpreted file format. Use a
standardized description format which only needs to get parsed. This is much
faster and more portable if multiple programs intend to work with the
descriptions. Not having a turing-complete language to use when writing a
description might at first need some change in thinking (when moving from ports
or portage) and will involve the need of writing more text/data (like no more
using ${variable:C/pattern/replacements/}) but will help overall cleanness.
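
To show what "only needs to get parsed" can mean in practice, here is a Python
sketch that reads a purely declarative description. The "key: value" layout and
the field names are invented for this example; no existing format is implied.

```python
def parse_description(text):
    """Parse 'key: value' lines into a dict of lists; nothing is executed."""
    fields = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue              # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("line %d: expected 'key: value'" % lineno)
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


EXAMPLE = """\
# hypothetical package description
name: example
version: 1.2
requires: glib >=2.0
requires: perl
"""

print(parse_description(EXAMPLE))
```

Because there are no variables, conditionals or shell escapes, any tool in any
language can read such a file the same way, which is exactly what INDEX-style
generation and format conversion need.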

To prevent the need of writing common things all over again and thus the
possibility of inconsistencies, the system needs to provide infrastructure to
group common settings as templates. I'll call them package classes for now. This
is similar to portage's eclasses and ports' special .mk files. Creation of these
classes shouldn't take place when only few consumers exist as too many existing
classes destroy cleanness and transparency. It must be possible for a package to
use more than one class at the same time.
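
A package using several classes at once could be resolved as in the following
Python sketch. The class names, the settings and the precedence rule (later
classes override earlier ones, the package's own settings override everything)
are assumptions chosen for illustration.

```python
def apply_classes(package, classes):
    """Merge class templates into a package's effective settings."""
    settings = {}
    for class_name in package.get("classes", []):
        settings.update(classes[class_name])      # later classes win
    own = {k: v for k, v in package.items() if k != "classes"}
    settings.update(own)                          # the package itself wins
    return settings


# Hypothetical class templates.
classes = {
    "gnu-configure": {"configure": "./configure", "build": "make"},
    "gettext": {"requires": ["gettext"], "build": "make all"},
}
package = {"name": "example", "classes": ["gnu-configure", "gettext"],
           "build": "gmake"}

print(apply_classes(package, classes))
```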

Basic Principle: The last instance of decision is always the user - but she
shouldn't have to be in most cases.

... still to come:


-- 
/"\   http://corecode.ath.cx/#donate
\ /
 \     ASCII Ribbon Campaign
/ \  Against HTML Mail and News

Attachment: PGP.sig
Description: This is a digitally signed message part
