DragonFly users List (threaded) for 2009-02
Re: the 'why' of pseudofs
Matthew Dillon wrote:
There are several reasons for using PFSs.
* Shared allocation space. You don't have to worry about blowing
out small filesystems and having to resize them.
* Each PFS has its own inode space, allowing mirroring to be used
to manage backups on a per-PFS basis. Thus mirroring slaves can
be conveniently created and destroyed, and masters can be used
to differentiate what you do and do not want to back up. e.g.
I want to back up /home, I don't want to back up /usr/obj.
In this respect there is actually a lot more to it... PFSs are
the primary enabler for most of the future multi-master clustering
work. Even slaves are extremely inconvenient to do without PFSs
to manage independent inode spaces.
* Each PFS can have its own history/snapshot retention policy.
For example you want to retain history on /home but who cares
about /tmp or /usr/obj ? You might want to retain only a few
days worth of snapshots for /var but hundreds of days for /home.
* Each PFS can be pruned / reblocked independently of the others.
For example /build on pkgbox is configured to spend a lot longer
pruning and reblocking than /archive.
With regards to softlinks vs null mounts, null mounts are preferred
because softlinks are not always handled properly, or handled in
the expected way, by utilities.
An example of this would be, say, /usr/src. If /usr/src is a softlink
then the /usr/obj paths generated would be the expanded softlink.
So instead of getting /usr/obj/usr/src/... you would instead get
It can get messy very quickly when the filesystem space is glued
together with softlinks instead of mounts.
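To make the softlink point concrete, here is a minimal sketch using a throwaway directory tree. All directory names are hypothetical stand-ins, not the real /usr/src layout. It only shows that the path a user typed (logical) and the path a tool gets after resolving the link (physical) can differ, which is the kind of surprise described above.

```shell
#!/bin/sh
# Throwaway tree; every name here is a hypothetical placeholder.
root=$(mktemp -d)
root=$(cd "$root" && pwd -P)   # canonicalise, in case the temp dir sits behind a symlink

mkdir -p "$root/build/src"
ln -s "$root/build/src" "$root/src"   # "src" is a softlink into "build"

cd "$root/src"
logical=$(pwd -L)    # the path as it was typed, softlink left intact
physical=$(pwd -P)   # the path with the softlink expanded

echo "logical:  $logical"
echo "physical: $physical"
```

Any utility that derives further paths from the physical form will build them under the link target rather than under the name the admin expected. A null mount avoids this because there is no link to expand.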
OK - let's say one of the goals here will be more comprehensive
documentation - not only of what 'is now' (above) but w/r the perils,
tribulations - or possibly the advantages, even if edge-case - of doing
I'll first append Michael's response - embedding a few words:
PFS is the smallest unit of mirroring, and [therefore also the smallest
| the] unit to which you can
apply specific retainment policies. For example while you do not want
to retain much history for /tmp, you might want to do so for /home.
When it comes to mirroring, you clearly do not want to mirror changes to
PFS /tmp, while you want to mirror changes to PFS /home.
Good concept. Bad choice of examples.
> If everything would lie on a single huge filesystem "/", we could not decide what to
... my /, /usr, /var, /home, /tmp are (traditionally) on separate
partitions, if not slices.
I don't need hammer there. Logs aside, these could damn near be in ROM
for as much as two years at a go.
I need HAMMER on the 500 GB single to several TB arrays where client
applications, IMAP mailstore, web sites, and other *dynamic* data reside.
mirror and what not. That's the major design decision.
You might ask, why not simply specify which directories to mirror
and which to leave out (without considering PFS)? The issue here is,
that, AFAIK, mirroring works on a very low level, where only inode
numbers are available and not full pathnames, so something like:
tar -cvzf /tmp/backup.tgz --reject="/tmp;/var/tmp"
would not work, or would be slow.
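For contrast, here is a small sketch of what a path-based filter looks like when the tool does walk pathnames. It uses tar's real `--exclude` option on a throwaway tree (all file and directory names are hypothetical). It is only meant to illustrate the kind of filtering that, per the explanation above, is not available to a mirroring layer that sees inode numbers rather than paths.

```shell
#!/bin/sh
# Hypothetical throwaway tree standing in for a filesystem root.
root=$(mktemp -d)
mkdir -p "$root/home" "$root/tmp" "$root/var/tmp"
echo keep    > "$root/home/user.txt"
echo scratch > "$root/tmp/junk.dat"
echo scratch > "$root/var/tmp/junk.dat"

# Keep the archive outside the tree so tar never archives its own output.
archive="$(mktemp -d)/backup.tgz"

# tar walks names, so it can skip every member called "tmp" and its contents.
tar -czf "$archive" --exclude='tmp' -C "$root" .

listing=$(tar -tzf "$archive")
echo "$listing"
```

This works only because tar has the full pathname of every member in hand while it walks the tree.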
Another issue is locality. Metadata from one PFS lies more close
together and as such is faster to iterate.
====================== Questions and counterpoints ================
Some GOSPEL we can agree on came at the end of Matt's reply.
Let's take these as givens:
> With regards to softlinks vs null mounts, null mounts are preferred
> *because softlinks are not always handled properly, or handled in
> the expected way*, by utilities.
> *It can get messy very quickly when the filesystem space is glued
> together with softlinks instead of mounts.*
To which I'll remind that neither (Matt's) excellent 'cpdup', nor rsync,
nor a seasoned admin working manually can be *certain* that all
softlinks will ever and always be handled as one wished they had been
... mere seconds after the bullet holes appear in the feet.
They can be problem-solvers some of the time, but a potential
maintenance hand-grenade ALL the time.
Further, a 'pseudo' mount is, IMNSHO, just another 'virtual' band-aid of
a different flavor. It is NOT (yet) 'enough' more assured to be handled
correctly. Not because the fs & utils cannot be made to do the right
thing, but because the *sysadmin* and habit of long years may not.
New game. More education needed. New habits to be built, and old ones
If/as/when a utility must actually *care* about the foundation level of
inodes, AND THEN 'virtualize' a known-to-not-match set of these, we
should be aware that hazards lurk, and have them 'boxed' ahead of time.
'virtual' means the next word is a lie, and if it hides, it bites.
So we need different utilities to avoid foot-shooting.
As in the 'version-display' capability in 'ls' already in work.
More will be needed...
I *want* the benefits of a file system that is robust in new ways that
address real needs. Enter hammerfs.
I DO NOT want to add-back fragility that negates those very benefits by
moving the failure modes from device and OS and fs into the space
between an admin's ears. Exit 'too much' dependence on softlinks and
Re-balancing seems to be in order.
I'll now motor off for a day or two, do more research,
experimentation, and preparation.
Back with specifics 'soon'.
NOTE: /, /usr, /var, /home, /tmp are not even on the radar.
Properly reserved for system and sysadmins, these are easily cloned and
kept synced, backed-up, even swapped in/out 'en bloc' - so long as one
keeps non-admin users, webish-ness, data bases, mailstore,
disaster-recovery image storage, etc. entirely OFF them.
Those needs should be mounted from different slices as a minimum,
different devices (or arrays) *preferably*.
Think 20-200 GB system device, RAID1 - and UFS/FFS is good enough.
Elsewhere - it is on the 500 GB, 1 TB, 2 TB - and up - 'working storage'
where HAMMER may rule.
- hammer mirror-stream over 100BT or Gig-E,
- the ability to up-rank the slave to master, *quickly*,
... and the incremental changes to even very large storage can be kept
in sync faster, better, and with fewer resources than by any other method.
Providing one doesn't blow it away with fat-fingers....
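With the fat-fingers worry in mind, one cautious way to rehearse the mirror-and-promote sequence is a dry-run wrapper that only prints what it would do. The sketch below assumes the `hammer mirror-stream` and `hammer pfs-upgrade` directives as described in hammer(8). The PFS paths and host name are hypothetical placeholders, and nothing is executed unless DRY_RUN is explicitly set to 0.

```shell
#!/bin/sh
# Dry-run sketch: prints HAMMER commands instead of running them.
# Paths and host name are hypothetical placeholders.
MASTER_PFS="/data/pfs/mail"          # hypothetical master PFS
SLAVE_HOST="backup.example.org"      # hypothetical remote host
SLAVE_PFS="/backup/pfs/mail"         # hypothetical slave PFS on that host

# Execute the command only when DRY_RUN=0; otherwise just print it.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Continuous mirroring from the master to the remote slave.
stream_cmd() {
    run hammer mirror-stream "$MASTER_PFS" "$SLAVE_HOST:$SLAVE_PFS"
}

# Promote the slave to a master (meant to be run on the slave host).
promote_cmd() {
    run hammer pfs-upgrade "$SLAVE_PFS"
}

stream_cmd
promote_cmd
```

Reading the printed commands before committing to them is cheap insurance on storage of this size.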