DragonFly kernel List (threaded) for 2005-02
Re: approach on getting nullfs to work again
:[original private post to matt, but something in my mail system must eat
:mails, so i'm hoping that this one will get through]
Nope, I'm just overloaded.
:What happens if a file or directory in the underlying filesystem is being
:renamed or deleted? Doesn't that mean that I need to adjust the namecache for
:the nullfs layer, too?
:We thought of a solution: overlay filesystems must lock their covered (i'll
:call it "shadow") parallel namecache entries, too, if they are being locked.
:Whereas this is not complicated to implement in cache_lock(), there is
:another problem: the namecache doesn't know about overlay filesystems. it
:doesn't know that there exist shadow namecache entries. so there must be some
:way of communication between namecache and vfs, maybe some
:now this got a rather long mail, thanks for your attention
:hoping for input,
Ok. We have two problems. The second is solved as you say... the
overlay filesystem itself is aware of the underlying filesystem and
must lock the underlying namecache record. That is fairly straightforward.
The rename-in-underlying-filesystem problem is a cache-coherency issue,
solved by our (not yet existent) cache coherency layer! :-)
So the question begins: Can we construct a minimal cache coherency
layer that can be used to help build nullfs and unionfs but that will
not have to be ripped out when we do the 'real' layer? I think the answer
is: yes, we can. We can create a minimal cache coherency layer
based on the vnode's v_namecache list.
Then it becomes a question of how complex a layer should we try to create?
Taking for example a rename() in the underlying filesystem... do we
want to try to propagate the rename to the overlay or do we simply want
to invalidate the overlay? I think to begin with we just want to
invalidate the overlay.
When I designed the new namecache topology I considered the possibility
of having to deal with multiple overlaid filesystems and made the
vnode's v_namecache a list of namecache records instead of a pointer to
a single record. The idea being that instead of having nullfs fake-up
vnodes (like it does in FreeBSD) we instead have it return the *actual*
vnode and only fake-up the namecache topology. The system has no problem
with multiple namecache records referencing the same vnode. This
greatly reduces the burden on nullfs to translate VOP calls... it only
has to deal with namecache related translations, it does NOT have to
deal with things like VOP_READ(). The notion of the 'current'
directory is now a namecache record in DragonFly, so we can get away
with this without confusing someone CD'd into a nullfs filesystem.
(In FreeBSD the 'current directory' is a vnode and hence nullfs and
unionfs had to fake-up the vnode. In DragonFly it is a namecache
pointer and we do NOT have to fake-up the vnode).
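
To make the "one vnode, several namecache records" idea concrete, here is a minimal user-space sketch. It is not the DragonFly kernel code: the struct names, field names other than v_namecache, and both helper functions are hypothetical and exist only for illustration. It shows a vnode carrying a list of namecache records, so that an underlying path and a nullfs path can both name the same real vnode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are NOT the kernel structures. */
struct vnode_model;

struct ncp_model {
    const char         *name;     /* path component this record names */
    struct ncp_model   *parent;   /* parent record within its own topology */
    struct vnode_model *vp;       /* the real vnode, shared across topologies */
    struct ncp_model   *vp_next;  /* next record on the vnode's list */
};

struct vnode_model {
    struct ncp_model *v_namecache; /* every namecache record naming this vnode */
};

/* Attach a not-yet-attached record to a vnode by pushing it on the list. */
static void
ncp_attach(struct ncp_model *ncp, struct vnode_model *vp)
{
    assert(ncp->vp == NULL);
    ncp->vp = vp;
    ncp->vp_next = vp->v_namecache;
    vp->v_namecache = ncp;
}

/* Count how many namecache records currently reference the vnode. */
static int
vnode_alias_count(const struct vnode_model *vp)
{
    const struct ncp_model *ncp;
    int n = 0;

    for (ncp = vp->v_namecache; ncp != NULL; ncp = ncp->vp_next)
        n++;
    return n;
}
```

In this model an overlay would attach its own record to the vnode it already found through the underlying filesystem, rather than allocating a second vnode; a "current directory" held as a record pointer then stays unambiguous even though both records share the vnode.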
Ok, so once that is dealt with we need to make sure that the cache
invalidation mechanism, our skeleton cache coherency layer, does
not deadlock when it takes a locked namecache record and has to
invalidate a namecache topology elsewhere. This case only occurs
when a filesystem operation on the UNDERLYING filesystem occurs, because
the underlying filesystem is not aware of the overlay. In the case
of the nullfs overlay the nullfs code is aware of the underlying filesystem
and will make the appropriate namecache calls to the underlying
filesystem's namecache topology.
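
The overlay-aware direction can be sketched as follows. This is a toy model under stated assumptions, not the real cache_lock(): the nc_rec type, its shadow pointer, and all four functions are made up for illustration, the "lock" is just a flag, and the overlay-first lock order is an assumption chosen so that every overlay caller takes the two locks in the same order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: a flag stands in for a real blocking lock. */
struct nc_rec {
    int            locked;  /* 1 while held */
    struct nc_rec *shadow;  /* underlying record this overlay record covers,
                               or NULL for a record with nothing beneath it */
};

static void
nc_lock(struct nc_rec *r)
{
    assert(!r->locked);     /* a real lock would block here instead */
    r->locked = 1;
}

static void
nc_unlock(struct nc_rec *r)
{
    assert(r->locked);
    r->locked = 0;
}

/* Overlay-side lock: take the overlay record, then the record it covers. */
static void
overlay_lock(struct nc_rec *over)
{
    nc_lock(over);
    if (over->shadow != NULL)
        nc_lock(over->shadow);
}

/* Release in the reverse order of acquisition. */
static void
overlay_unlock(struct nc_rec *over)
{
    if (over->shadow != NULL)
        nc_unlock(over->shadow);
    nc_unlock(over);
}
```

The point of the sketch is only that the knowledge of the shadow record lives in the overlay's record, so the generic namecache code does not need to be taught about overlays for this direction.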
For an operation being done directly on the underlying filesystem the
underlying filesystem is not aware of the overlay, but the namecache
code IS aware of the overlay because it sees multiple namecache
records attached to the vnode. So the namecache code must scan
the list of namecache structures associated with the vnode and issue
the appropriate cache_inval*() calls on the namecache records other
than the one it was called with.
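
The scan described above might look roughly like this. Again a hedged user-space sketch: struct nc, struct vn, the valid flag, and nc_inval_aliases() are hypothetical names, and clearing a flag stands in for whatever the real cache_inval*() calls would do. Locking, and therefore the deadlock question raised earlier, is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

struct vn;

/* Hypothetical namecache record hanging off a vnode's list. */
struct nc {
    int        valid;    /* cleared when the record is invalidated */
    struct vn *vp;       /* vnode this record resolves to */
    struct nc *vp_next;  /* next record on the same vnode */
};

struct vn {
    struct nc *v_namecache;  /* all records referencing this vnode */
};

/*
 * Invalidate every record attached to self's vnode except self.
 * Returns the number of records that were newly invalidated.
 */
static int
nc_inval_aliases(struct nc *self)
{
    struct nc *scan;
    int n = 0;

    assert(self != NULL);
    if (self->vp == NULL)
        return 0;
    for (scan = self->vp->v_namecache; scan != NULL; scan = scan->vp_next) {
        if (scan == self || !scan->valid)
            continue;       /* skip the caller's record and stale ones */
        scan->valid = 0;
        n++;
    }
    return n;
}
```

A rename or remove on the underlying filesystem would call something like this with its own record; any nullfs records sharing the vnode are simply marked stale and get looked up again on next use, which matches the "just invalidate the overlay" starting point.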
I think this is all very doable and, even better, does not represent
any major surgery for systems not using nullfs (which is all of them
right now), so we can keep things stable during the work. I know
there are several people interested in making nullfs work again,
especially Simon. Who has time to actually code? I would be able to
help out but I'd prefer not to do the core coding.
Questions? Interest? Simon, you want to code this up?