Keyword substitution: kv
Default branch: MAIN
VFS messaging/interfacing work stage 9/99: VFS 'NEW' API WORK. NOTE: unionfs and nullfs are temporarily broken by this commit.
* Remove the old namecache API. vfs_cache_lookup(), cache_lookup(), cache_enter(), namei() and lookup() are all gone. VOP_LOOKUP() and VOP_CACHEDLOOKUP() have been collapsed into a single non-caching VOP_LOOKUP().
* Complete the new VFS CACHE (namecache) API. The new API is able to supply topological guarantees and is able to reserve namespaces, including negative cache spaces (whether the target name exists or not), which the new API uses to reserve namespace for things like NRENAME and NCREATE (and others).
* Complete the new namecache API. VOP_NRESOLVE, NLOOKUPDOTDOT, NCREATE, NMKDIR, NMKNOD, NLINK, NSYMLINK, NWHITEOUT, NRENAME, NRMDIR, NREMOVE. These new calls take (typically locked) namecache pointers rather than combinations of directory vnodes, file vnodes, and name components. The new calls are *MUCH* simpler in concept and implementation. For example, VOP_RENAME() has 8 arguments while VOP_NRENAME() has only 3 arguments.
The new namecache API uses the namecache to lock namespaces without having to lock the underlying vnodes. For example, this allows the kernel to reserve the target name of a create function trivially.
Namecache records are maintained BY THE KERNEL for both positive and negative hits. Generally speaking, the kernel layer is now responsible for resolving path elements. NRESOLVE is called when an unresolved namecache record needs to be resolved. Unlike the old VOP_LOOKUP, NRESOLVE is simply responsible for associating a vnode with a namecache record (positive hit) or telling the system that it's a negative hit. It is not responsible for handling symlinks or other special cases, or for doing any of the other path lookup work.
It should be particularly noted that the new namecache topology does not allow disconnected namecache records.
In rare cases where a vnode must be converted to a namecache pointer for new API operation via a file handle (i.e. NFS), the cache_fromdvp() function is provided, and a new API VOP, VOP_NLOOKUPDOTDOT(), is provided to allow the namecache to resolve the topology leading up to the requested vnode. These and other topological guarantees greatly reduce the complexity of the new namecache API.
The new namei() is called nlookup(). This function uses a combination of cache_n*() calls, VOP_NRESOLVE(), and standard VOP calls to resolve the supplied path, deal with symlinks, and so forth, in a nice small compact compartmentalized procedure.
* The old VFS code is no longer responsible for maintaining namecache records, a function which was mostly ad hoc cache_purge()s occurring before the VFS actually knows whether an operation will succeed or not. The new VFS code is typically responsible for adjusting the state of locked namecache records passed into it. For example, if NCREATE succeeds it must call cache_setvp() to associate the passed namecache record with the vnode representing the successfully created file. The new requirements are much less complex than the old requirements.
* Most VFSs still implement the old API calls, albeit somewhat modified, and in particular the VOP_LOOKUP function is now *MUCH* simpler. However, the kernel now uses the new API calls almost exclusively and relies on compatibility code installed in the default ops (vop_compat_*()) to convert the new calls to the old calls.
* All kernel system calls and related support functions which used to do complex and confusing namei() operations now do far less complex and far less confusing nlookup() operations.
* SPECOPS shortcutting has been implemented. User reads and writes now go directly to supporting functions which talk to the device via fileops rather than having to be routed through VOP_READ or VOP_WRITE, saving significant overhead. Note, however, that these only really affect /dev/null and /dev/zero.
Implementing this was fairly easy: we now simply pass an optional struct file pointer to VOP_OPEN() and let spec_open() handle the override.
SPECIAL NOTES: It should be noted that we must still lock a directory vnode LK_EXCLUSIVE before issuing a VOP_LOOKUP(), even for simple lookups, because a number of VFSs (including UFS) store active directory scanning information in the directory vnode. The legacy NAMEI_LOOKUP cases can be changed to use LK_SHARED once these VFS cases are fixed. In particular, we are now organized well enough to actually be able to do record locking within a directory for handling NCREATE, NDELETE, and NRENAME situations, but it hasn't been done yet.
Many thanks to all of the testers and in particular David Rhodus for finding a large number of panics and other issues.
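The NCREATE rule in the stage 9/99 message (a successful create must associate the locked namecache record it was handed with the new vnode) can be sketched in userland C. This is only a toy model under stated assumptions: every model_* type and function below is hypothetical, and model_cache_setvp() merely imitates the role the message gives to cache_setvp(); none of it is the kernel's actual code or signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct vnode or struct namecache. */
struct model_vnode { int ino; };

enum model_nc_state { NC_UNRESOLVED, NC_POSITIVE, NC_NEGATIVE };

struct model_ncp {
    const char *name;
    int locked;                  /* nonzero while the name is reserved */
    enum model_nc_state state;
    struct model_vnode *vp;      /* NULL unless state == NC_POSITIVE */
};

/*
 * Imitates the role described for cache_setvp(): bind a locked record
 * to a vnode (positive hit), or mark it negative when vp is NULL.
 */
void model_cache_setvp(struct model_ncp *ncp, struct model_vnode *vp)
{
    assert(ncp->locked);         /* caller must hold the namespace lock */
    ncp->vp = vp;
    ncp->state = (vp != NULL) ? NC_POSITIVE : NC_NEGATIVE;
}

/*
 * Models an NCREATE-style operation. fs_error stands in for whatever the
 * filesystem reports; the record is only adjusted once success is known.
 * Returns 0 on success, -1 on failure.
 */
int model_ncreate(struct model_ncp *ncp, struct model_vnode *newvp, int fs_error)
{
    assert(ncp->locked);
    if (fs_error != 0)
        return -1;               /* failure: record left as it was */
    model_cache_setvp(ncp, newvp);
    return 0;
}
```

In this model the record changes state only after the operation is known to have succeeded, which is the contrast the message draws with the old cache_purge() calls issued before the outcome was known.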
VFS messaging/interfacing work stage 6/99. Populate and maintain the namecache pointers previously attached to struct filedesc, giving the new lookup code a base from which to work. Implement the new lookup API (it is not yet being used by anything) and augment the namecache API to handle the new functions, in particular adding cache_setvp() to resolve an unresolved namecache entry into a positive or negative hit and set various flags. Note that we do not yet cache symlink data but we could very easily.
The new API is greatly simplified. Basically nlookups need only return a locked namecache pointer (guaranteeing namespace atomicity). Related vnodes are not locked. Both the leaf and governing directory vnodes can be extracted from the returned namecache pointer. Namecache pointers may also represent negative hits, which means that their namespace locking feature serves to reserve a filename that has not yet been created (e.g. open+create, rename).
The kernel is still using the old API as of this commit. This commit is primarily introducing the management infrastructure required to actually start writing code to use the new API.
VOP_RESOLVE() has been added, along with a default function which falls back to VOP_LOOKUP()/VOP_CACHEDLOOKUP(). This VOP function is not yet being used as of this commit. This VOP will be responsible for taking an unresolved but locked namecache structure (hence the namespace is locked) and actually doing the directory lookup. But unlike the far more complex VOP_LOOKUP()/VOP_CACHEDLOOKUP() API, the VOP_RESOLVE() API only needs to attach a vnode (or NULL if the entry does not exist) to the passed-in namecache structure. It is likely that timeouts, e.g. for NFS, will also be attached via this API.
This commit does not implement any of the cache-coherency infrastructure but keeps this future requirement in mind in its design.
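The narrow job the stage 6/99 message assigns to VOP_RESOLVE() (attach a vnode, or NULL, to a locked and unresolved record) can be illustrated with a small userland model. All toy_* names, the flag value, and the array-based "directory" are assumptions made for illustration; the real VOP's arguments and flags are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only. */
struct toy_vnode  { int ino; };
struct toy_dirent { const char *name; struct toy_vnode *vp; };

#define TOY_NCF_UNRESOLVED 0x01  /* hypothetical flag value */

struct toy_ncp {
    const char *name;
    int locked;                  /* namespace lock held by the caller */
    int flags;
    struct toy_vnode *vp;
};

/*
 * Resolve one locked, unresolved record against a toy directory:
 * attach the matching vnode, or NULL when the name does not exist,
 * then clear the unresolved flag. Nothing else (symlinks, "..",
 * path walking) is handled here.
 */
void toy_resolve(struct toy_ncp *ncp, const struct toy_dirent *dir, size_t n)
{
    size_t i;

    assert(ncp->locked);
    ncp->vp = NULL;
    for (i = 0; i < n; i++) {
        if (strcmp(dir[i].name, ncp->name) == 0) {
            ncp->vp = dir[i].vp;
            break;
        }
    }
    ncp->flags &= ~TOY_NCF_UNRESOLVED;
}

/* A resolved record with no vnode is a negative hit. */
int toy_is_negative(const struct toy_ncp *ncp)
{
    return (ncp->flags & TOY_NCF_UNRESOLVED) == 0 && ncp->vp == NULL;
}
```

A resolved negative record that stays locked is what lets a caller reserve a name that does not exist yet, as in the open+create and rename cases the message mentions.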
VFS messaging/interfacing work stage 5b/99. More cleanups, remove the (unused) ni_ncp and ni_dncp from struct nameidata. A new structure will be used for the new API.
namecache work stage 4a: Do some minor performance cleanups with negative caching, add a cache entry timeout feature.
Per-CPU VFS Namecache Effectiveness Statistics:
* Convert nchstats into a CPU indexed array.
* Export the per-CPU nchstats as a sysctl vfs.cache.nchstats and let user-land aggregate them.
* Add a function called kvm_nch_cpuagg() to libkvm; it is shared by systat(1) and vmstat(1) and the ncache-stats test program. As the function name suggests, it aggregates the per-CPU nchstats.
* Move struct nchstats into a separate header to avoid header file namespace pollution; sys/nchstats.h.
* Keep a cached copy of the globaldata pointer in the VFS specific LOOKUP op, and use that to increment the namecache effectiveness counters (nchstats).
* Modify systat(1) and vmstat(1) to accommodate the new behavior of accessing nchstats. Remove a (now) redundant sysctl to get the cpu count (hw.ncpu); instead we just divide the total length of the nchstats array returned by sysctl by sizeof(struct nchstats) to get the CPU count.
* Garbage-collect unused variables and fix nearby warnings in systat(1) and vmstat(1).
* Add a very-cool test program that prints the nchstats per-CPU statistics to show CPU distribution. Here is the output it generates on a 2-processor SMP machine:
    gray# ncache-stats
    VFS Name Cache Effectiveness Statistics
    4207370 total name lookups
    COUNTER      CPU-1      CPU-2      TOTAL
    goodhits     2477657    1060677    (3538334)
    neghits      107531     47294      (154825)
    badhits      28968      7720       (36688)
    falsehits    0          0          (0)
    misses       339671     137852     (477523)
    longnames    0          0          (0)
    passes 2     13104      6813       (19917)
    2-passes     25134      15257      (40391)
The SMP machine used for testing this commit was proudly presented by David Rhodus <email@example.com>. Reviewed-by: Matthew Dillon <firstname.lastname@example.org>
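The two user-land steps described in this message (derive the CPU count from the returned array length, then sum the per-CPU counters) can be sketched as follows. The struct here is a cut-down hypothetical stand-in, not the real struct nchstats, and toy_aggregate() is not the signature of kvm_nch_cpuagg(); the buffer is assumed to have been filled already, for example by a sysctl read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down counter set; the real struct has more fields. */
struct toy_nchstats {
    long goodhits;
    long neghits;
    long misses;
};

/* CPU count = total byte length of the array / size of one element. */
size_t toy_ncpus(size_t buflen)
{
    assert(buflen % sizeof(struct toy_nchstats) == 0);
    return buflen / sizeof(struct toy_nchstats);
}

/* Sum every per-CPU element into a single total. */
struct toy_nchstats toy_aggregate(const struct toy_nchstats *percpu, size_t buflen)
{
    struct toy_nchstats total = { 0, 0, 0 };
    size_t i, n = toy_ncpus(buflen);

    for (i = 0; i < n; i++) {
        total.goodhits += percpu[i].goodhits;
        total.neghits  += percpu[i].neghits;
        total.misses   += percpu[i].misses;
    }
    return total;
}
```

Fed the CPU-1 and CPU-2 columns from the sample output above, this model reproduces the TOTAL column for the three counters it tracks.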
Separate chroot() into kern_chroot(). Rename change_dir() to checkvp_chdir() and reorganize the code to avoid doing weird things to the passed vnode's lock and ref count in deep subroutines (which led to buggy code). Fix a bug in chdir()/kern_chdir() (the namei data was not being freed in all cases), and also fix a bug in symlink() (missing zfree in error case). Submitted-by: Paul Herman <email@example.com> Additional-work-by: dillon
Start separating the ucred from NDINIT.
namecache work stage 3a: Adjust the VFS APIs to include a namecache pointer where necessary. For the moment we pass NULL for these parameters (the old 'dvp' vnode pointers cannot be ripped out quite yet).
Cleanup: get rid of the CNP_NOFOLLOW pseudo-flag. #define 0'd flags are a really bad idea.
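Why a flag #defined as 0 causes trouble can be shown in a few lines. The names and the bit value below are hypothetical and chosen only for illustration; they are not the kernel's actual definitions.

```c
#include <assert.h>

#define TOY_FOLLOW    0x0040u  /* hypothetical bit value */
#define TOY_NOFOLLOW  0x0000u  /* pseudo-flag: defined as 0, sets no bit */

/* Returns nonzero when `flag` is set in `flags`. */
int toy_flag_set(unsigned flags, unsigned flag)
{
    return (flags & flag) != 0;
}

/* Exercises the failure modes of a zero-valued flag. */
void toy_zero_flag_demo(void)
{
    unsigned flags = TOY_NOFOLLOW;                /* caller "asks" for no-follow */

    assert(!toy_flag_set(flags, TOY_NOFOLLOW));   /* the request cannot be detected */
    flags |= TOY_FOLLOW;
    assert((flags | TOY_NOFOLLOW) == flags);      /* OR-ing it in changes nothing */
    assert(toy_flag_set(flags, TOY_FOLLOW));      /* a real bit can be tested */
}
```

Because the zero-valued name never changes the flag word, code that passes or tests it reads as if it does something when it does not.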
namecache work stage 1: namespace cleanups. Add a NAMEI_ prefix to CREATE, LOOKUP, DELETE, and RENAME. Add a CNP_ prefix to all the name lookup flags (nd_flags) e.g. ISDOTDOT->CNP_ISDOTDOT.
__P() != wanted, begin removal. In order to preserve white space this needs to be done by hand, as I accidentally killed a source tree that I had gotten this far on. I'm committing this now; LINT and GENERIC both build with these changes, and there are many more to come.
Fix minor buildworld issues, mainly #include file dependencies and fields that have moved from struct proc to struct thread.
proc->thread stage 4: rework the VFS and DEVICE subsystems to take thread pointers instead of process pointers as arguments, similar to what FreeBSD-5 did. Note however that ultimately both APIs are going to be message-passing which means the current thread context will not be useable for creds and descriptor access.
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids. Most ids have been removed from !lint sections and moved into comment sections.
import from FreeBSD RELENG_4 22.214.171.124