DragonFly BSD
DragonFly kernel List (threaded) for 2004-08

Re: VFS ROADMAP (and vfs01.patch stage 1 available for testing)


From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Fri, 13 Aug 2004 09:31:53 -0700 (PDT)

:
:On Thu, Aug 12, 2004 at 06:19:40PM -0700, Matthew Dillon wrote:
:>     Then I'll start working on stage 2 which will be to wrap all the 
:>     VOP forwarding calls (the VCALL and VOCALL macros).
:> 
:>     That will give us the infrastructure necessary to implement a 
:>     messaging interface in a later stage (probably around stage 15 :-)).
:
:Do you want to keep the message API with the structure as argument or do
:you want to switch to direct argument passing and marshalling in the
:messaging layer? In the short term, that would make the calling more
:readable, but might increase the overhead on the stack.

    I think we will have to stick with the structure, just like we do with
    the system call layer.  This will allow us to embed a message and do
    other things without having to completely rewrite every single VOP call
    in the system.
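
    Just to illustrate the shape of it (made-up names, not the real
    sys/vnode.h definitions), each per-VOP argument structure would carry
    an embedded message header:

	struct vop_msg {
		int	vm_cmd;				/* which VOP this is */
		int	vm_error;			/* completion status */
		void	(*vm_done)(struct vop_msg *);	/* async completion hook */
	};

	struct vop_read_args {
		struct vop_msg	 a_msg;		/* embedded message header */
		struct vnode	*a_vp;		/* vnode being read */
		struct uio	*a_uio;		/* where the data goes */
		int		 a_ioflag;
		struct ucred	*a_cred;
	};

    Because every caller already builds a single args structure, the VOP
    wrapper can either run the operation synchronously or, in a later
    stage, queue &ap->a_msg to a per-VFS message port without touching
    any of the callers.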

:>     The really nasty stuff starts at stage 3.  Before I can implement the
:>     messaging interface I have to:
:> 
:> 	* Lock namespaces via the namecache rather than via directory 
:> 	  vnode locking (ultimately means that directories do not have
:> 	  to be exclusively locked during create/delete/rename).  Otherwise
:> 	  even the simplest, fully cached namespace operations will wind up
:> 	  needing a message.
:
:How does this play with remote and/or dynamically created filesystems?
:Does the filesystem have to keep track of the namespace entries and
:invalidate them? Moving away from an exclusive vnode lock for modifying
:operations does fit in with internal range locking, because those could
:be implemented very well e.g. in a tree-based FS.

    It shouldn't create an issue if there is sufficient information in
    the remote filesystem VFS to use the bottom-up cache invalidation
    infrastructure (described down below), but even so I expect there
    may be collisions, especially with NFS.  However, there are *already*
    collisions with NFS, even with the current infrastructure, because 
    NFS is stateless.  I think all we can do there is maintain the existing
    recovery mechanisms in the form of a retry or late error.  
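
    i.e. something along these lines (made-up names, only to show what I
    mean by retry-or-late-error):

	/*
	 * Hypothetical sketch of the recovery path for a stateless
	 * remote FS such as NFS.  None of these helpers exist.
	 */
	static int
	remote_op_retry(struct namecache *ncp)
	{
		int error, tries;

		for (tries = 0; tries < 3; ++tries) {
			error = remote_resolve_and_run(ncp);
			if (error != ESTALE)
				break;		/* success or hard failure */
			remote_cache_toss(ncp);	/* server changed it under us */
		}
		return (error);			/* may still be a late error */
	}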

    The main thing the namespace locking will do is give the VFS layer 
    an assurance that no operations currently initiated by the kernel,
    regardless of lock state, will collide with each other.  What the
    VFS layer does with that assurance is up to it, but it means, for
    example, that a filesystem won't have to exclusively lock
    a directory vnode just to prevent a file name from being reused out 
    from under some operation.  For UFS this means that the directory
    vnode lock can eventually be reduced to just a buffer cache
    (struct buf) lock.
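
    As a rough sketch of what a create looks like under namecache-based
    locking (none of these functions exist yet, the names are only meant
    to show the flow):

	/*
	 * Hypothetical sketch -- lock the "dir/name" namecache entry
	 * instead of exclusively locking the directory vnode.
	 */
	int
	example_create(struct namecache *ncp, struct vattr *vap,
		       struct ucred *cred, struct vnode **vpp)
	{
		int error;

		cache_lock(ncp);		/* locks this name only */
		if (ncp->nc_vp != NULL) {	/* name already bound */
			cache_unlock(ncp);
			return (EEXIST);
		}
		/*
		 * Other names in the same directory can be looked up,
		 * created, or removed in parallel while we hold this one.
		 */
		error = VOP_NCREATE(ncp, vpp, cred, vap);
		cache_unlock(ncp);
		return (error);
	}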
  
:> 	  This step alone will require major changes to the arguments passed
:> 	  in just about every single VOP call because we will be switching
:> 	  from passing directory vnodes to passing namecache pointers.
:> 
:> 	* Ranged data locks will replace the vnode lock for I/O atomicity
:> 	  guarantees (ultimately means that if program #1 is
:> 	  blocked writing to a vnode program #2 can still read cached
:> 	  data from that same vnode without blocking on program #1).
:> 	  Otherwise the messaging latency will kill I/O performance.
:
:Do you plan to move the data locking into the filesystem or should it
:still be implemented in the VFS layer? Moving it down IMO makes more sense
:because it would allow us to keep simple locking for less important
:filesystems and would allow us to better exploit the internal data structures.
:E.g. if we have a special data structure to handle the byte ranges of a
:file anyway, we could attach the locking on that level.

    The atomicity guarantee for I/O operations will be a function of the
    kernel, meaning that it will cover *ALL* VFS's.   We will add VOP's
    for record locking but they will only be needed by those remote VFSs
    which have integrated cache management... which is, umm... maybe NFSv4
    (which we don't have), and perhaps coda (but maybe not).   i.e. the
    cupboards are pretty bare there.

    I actually believe that the range locks will not cost anything.  The
    vast majority of cases will have only one or two I/O range locks on a
    file at any given moment (only databases really need parallel access
    to a file) so it will cost us virtually nothing to implement it in a
    kernel layer.
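
    To give an idea of why I think it will cost virtually nothing, the
    per-vnode bookkeeping can be as simple as a short list of held ranges
    plus an overlap test (a sketch only, not a committed design):

	#include <sys/types.h>
	#include <sys/queue.h>

	struct iorange {
		TAILQ_ENTRY(iorange) entry;
		off_t	base;
		off_t	bytes;
		int	exclusive;		/* write vs read */
	};
	TAILQ_HEAD(ioranges, iorange);

	/*
	 * A read only conflicts with an exclusive (write) range, so a
	 * process reading cached data does not block behind another
	 * process's write unless the byte ranges actually intersect.
	 */
	static int
	range_conflicts(struct ioranges *head, off_t base, off_t bytes,
			int exclusive)
	{
		struct iorange *r;

		TAILQ_FOREACH(r, head, entry) {
			if (!exclusive && !r->exclusive)
				continue;	/* readers never conflict */
			if (base < r->base + r->bytes &&
			    r->base < base + bytes)
				return (1);
		}
		return (0);
	}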

:> 
:> 	* vattr information will be cached in the vnode so it can be
:> 	  accessed directly without having to enter the VFS layer.
:> 
:> 	* VM objects will become mandatory for all filesystems and will
:> 	  also be made directly accessible to the kernel without having
:> 	  to enter the VFS layer (ultimately this will result in greatly
:> 	  improved read() and write() performance).
:
:How does this affect filesystems that are not physically backed? If I want to
:support compressed files in ext2, when do I have to decompress the actual
:data?

    The data is always decompressed in the VM object, no matter what.  Same
    with the buffer cache (which is VM object backed).  But remember
    that data doesn't just appear in a VM object... something has to load
    the data into the VM object and if you are reading from a file that
    something is the VFS.  So compressed filesystems would still work as
    expected.
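
    i.e. the compression is purely the VFS's problem in its read path; by
    the time the pages land in the VM object they hold plain data.  Very
    roughly (all the ext2c_* names here are made up):

	static int
	ext2c_getpages(struct vnode *vp, struct vm_page **pages, int npages,
		       off_t offset)
	{
		void *cbuf;
		int error;

		/* read the compressed extent from the media */
		error = ext2c_read_raw(vp, offset, &cbuf);
		if (error)
			return (error);

		/*
		 * Decompress straight into the pages backing the VM
		 * object; read(), write(), and mmap() only ever see the
		 * decompressed data sitting in those pages.
		 */
		error = ext2c_inflate(cbuf, pages, npages);
		ext2c_free_raw(cbuf);
		return (error);
	}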

:> 	* Implement a bottom-up cache invalidation abstraction for 
:> 	  namespace, attribute, and file data, so layered filesystems
:> 	  work properly.
:
:The invalidation is one problem, the locking another. The separation of
:vnode and namespace locks should solve most issues though.
:
:Let's discuss the rest later :)
:
:Joerg

    Yes, I think so too.  In many respects the namespace locking is the
    single most difficult part of the work... but it is something we 
    absolutely have to have (along with bottom-up cache invalidation and
    management) if we ever want to have an efficient filesystem caching
    interface in a cluster.
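
    For reference, the bottom-up invalidation is basically a small set of
    callbacks a lower layer (or, in the cluster case, a remote node) can
    shoot upward when something changes underneath us.  A hypothetical
    API, just to show the direction -- none of this exists yet:

	struct vcache_ops {
		void	(*inval_name)(struct namecache *ncp);	/* namespace    */
		void	(*inval_attr)(struct vnode *vp);	/* cached vattr */
		void	(*inval_data)(struct vnode *vp, off_t base, off_t bytes);
	};

	/*
	 * An upper layer registers its hooks against the lower vnode;
	 * invalidations then propagate upward through however many
	 * layers happen to be stacked.
	 */
	void	vcache_register(struct vnode *lowervp, struct vcache_ops *ops);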

					-Matt
					Matthew Dillon 
					<dillon@xxxxxxxxxxxxx>


