DragonFly commits List (threaded) for 2005-08
Re: cvs commit: src/sys/kern vfs_cache.c vfs_syscalls.c vfs_vnops.c vfs_vopops.c src/sys/sys namecache.h stat.h
On Thu, Aug 25, 2005 at 03:09:21PM -0700, Matthew Dillon wrote:
> The entire directory tree does not need to be in memory, only the
> pieces that lead to (cached) vnodes. DragonFly's namecache subsystem
> is able to guarantee this.
*How* can it guarantee that without reading the whole directory tree into
memory first? Unix filesystems have no way to determine which
directories an inode is linked from. If you have /dir1/link1 and
/dir2/dir3/link2 as hardlinks for the same inode, you can't correctly
update the FSMID for dir2 without having read dir3 first, simply because
no name cache entry exists.
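
To make the hardlink case concrete, here is a toy model in Python. It is
only a sketch: NCEntry and mark_modified are made-up names, and it
assumes that an FSMID update is propagated upward through whatever
parent entries happen to be in the name cache. It is not DragonFly's
actual code.

```python
import itertools

# Hypothetical global source of FSMID values (monotonically increasing).
_next_fsmid = itertools.count(1)


class NCEntry:
    """Toy name cache entry: one name inside one cached directory."""

    def __init__(self, name, parent=None, inode=None):
        self.name = name
        self.parent = parent
        self.inode = inode
        self.fsmid = 0


def mark_modified(entry):
    """Stamp the entry and every cached ancestor with a fresh FSMID."""
    stamp = next(_next_fsmid)
    while entry is not None:
        entry.fsmid = stamp
        entry = entry.parent


root = NCEntry("/")
dir1 = NCEntry("dir1", root)
link1 = NCEntry("link1", dir1, inode=42)
# dir2 is cached, but dir3 and link2 (same inode 42) were never looked up,
# so there is no chain of entries leading from inode 42 up to dir2.
dir2 = NCEntry("dir2", root)

# A write through /dir1/link1 only walks the chain that exists.
mark_modified(link1)
```

In this model dir1 and / get the new FSMID while dir2 keeps its old one,
although a file below it has changed.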
> :On a running system, it is enough to either get notification when a
> :certain vnode changed (kqueue model) or when a vnode changed (imon /
> :dnotify model). Trying to detect in-flight changes is *not* utterly
> :trivial for any model, since even accurate atime is already difficult to
> :achieve for mmapped files. Believing that you can *reliably* back up a
> :system based on VOP transactions alone is therefore a dream.
> This is not correct. It is certainly NOT enough to just be told
> when an inode changes.... you need to know where in the namespace
> the change occurred and you need to know how the change(s) affect
> the namespace. Just knowing that a file with inode BLAH has been
> modified is not nearly enough information.
The point is that the application can determine which inodes it is
interested in and reread e.g. a directory when it has changed. There are
some edge cases which might be hard to handle without additional
information (e.g. when a link is moved outside the currently supervised
area and you want to continue its supervision). That's an entirely
different question though.
> Detecting in-flight changes is trivial. You check the FSMID before
> descending into a directory or file, and you check it after you ascend
> back out of it. If it has changed, you know that something changed
> while you were processing the directory or file and you simply re-recurse
> down and rescan just the bits that now have different FSMIDs.
But it is also very limited because it doesn't allow any filtering on
what is interesting. In the worst case you just update all the FSMIDs
for nothing. It also means that, as long as there is no way to store them
persistently, you can't free namecache entries without having to deal
with exactly those cases in applications. Storing them persistently has
to deal with unrecorded changes which wouldn't be detected. Just think
about dual-booting to FreeBSD.
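
For reference, the check-before / check-after scheme quoted above could
be sketched roughly like this. It is a toy in-memory model in Python, not
a real API: Node, touch and scan are hypothetical names, and the FSMID is
assumed to be a counter that is bumped on a node and all its ancestors.

```python
import itertools

_next_fsmid = itertools.count(1)  # hypothetical FSMID source


class Node:
    """Toy file or directory carrying an FSMID."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.fsmid = 0
        if parent is not None:
            parent.children.append(self)


def touch(node):
    """Simulate a modification: bump the FSMID up to the root."""
    stamp = next(_next_fsmid)
    while node is not None:
        node.fsmid = stamp
        node = node.parent


def scan(node, visit, seen):
    """Scan node unless its FSMID equals the one recorded in seen."""
    while True:
        before = node.fsmid            # check before descending
        if seen.get(node) == before:
            return                     # unchanged since the last scan
        visit(node)
        for child in node.children:
            scan(child, visit, seen)
        if node.fsmid == before:       # check after ascending
            seen[node] = before
            return
        # Something below changed in flight: go around again. Children
        # whose FSMID still matches seen are skipped on the second pass.


root = Node("/")
a = Node("a", root)
b = Node("b", root)
log = []


def visit(node):
    log.append(node.name)
    if node is b and log.count("b") == 1:
        touch(a)  # simulate a write to "a" while "b" is being processed


seen = {}
scan(root, visit, seen)
```

Note that the scanner gets no say in which changes are worth a rescan:
any FSMID bump below a directory forces another pass over it, and the
seen table is only useful as long as the FSMIDs themselves survive.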
> For example, softupdates right now is not able to guarantee data
> consistency. If you crash while writing something out then on reboot
> you can wind up with some data blocks full of zeros, or full of old
> data, while other data blocks contain new data.
That's not so much a problem of softupdates, but of any filesystem without very
strong data journaling. ZFS is said to do something in that area, but it
can't really solve interactions which cross filesystems. The very same
problem exists for FSMIDs. This is something where a transactional database
and a normal filesystem differ: filesystems almost never have full
write-ahead log files, because it makes them awfully slow. The most
important reason is that applications have no means to specify explicit
transaction borders, so you have to assume an autocommit style usage