Up to [DragonFly] / src / sys / dev / disk / vn
Request diff between arbitrary revisions
Keyword substitution: kv
Default branch: MAIN
Fix numerous pageout daemon -> buffer cache deadlocks in the main system. These issues usually only occur on systems with small amounts of ram but it is possible to trigger them on any system. * Get rid of the IO_NOBWILL hack. Just have the VN device use IO_DIRECT, which will clean out the buffer on completion of the write. * Add a timeout argument to vm_wait(). * Add a thread->td_flags flag called TDF_SYSTHREAD. kmalloc()'s made from designated threads are allowed to dip into the system reserve when allocating pages. Only the pageout daemon and buf_daemon[_hw] use the flag. * Add a new static procedure, recoverbufpages(), which explicitly tries to free buffers and their backing pages on the clean queue. * Add a new static procedure, bio_page_alloc(), to do all the nasty work of allocating a page on behalf of a buffer cache buffer. This function will call vm_page_alloc() with VM_ALLOC_SYSTEM to allow it to dip into the system reserve. If the allocation fails this function will call recoverbufpages() to try to recycle from VM pages from clean buffer cache buffers, and will then attempt to reallocate using VM_ALLOC_SYSTEM | VM_ALLOC_INTERRUPT to allow it to dip into the interrupt reserve as well. Warnings will blare on the console. If the effort still fails we sleep for 1/20 of a second and retry. The idea though is for all the effort above to not result in a failure at the end. Reported-by: Gergo Szakal <firstname.lastname@example.org>
Fix some IO sequencing performance issues and reformulate the strategy we use to deal with potential buffer cache deadlocks. Generally speaking try to remove roadblocks in the vn_strategy() path. * Remove buf->b_tid (HAMMER no longer needs it) * Replace IO_NOWDRAIN with IO_NOBWILL, requesting that bwillwrite() not be called. Used by VN to try to avoid deadlocking. Remove B_NOWDRAIN. * No longer block in bwrite() or getblk() when we have a lot of dirty buffers. getblk() in particular needs to be callable by filesystems to drain dirty buffers and we don't want to deadlock. * Improve bwillwrite() by having it wake up the buffer flusher at 1/2 the dirty buffer limit but not block, and then block if the limit is reached. This should smooth out flushes during heavy filesystem activity.
Add '-l' support to vnconfig(8) and supporting VNGET ioctl to vn(4). Inspired-by: OpenBSD (with updates to vnconfig -l UI & swap support)
Fix device recognition, /dev/vn0 now uses WHOLE_SLICE_PART, not partition 'c'.
Remove #include <sys/disklabel.h> from various source files which no longer need it.
Remove the roll-your-own disklabel from CCD. Use the kernel disk manager for disklabel support instead. Make CCD a real disk device rather then a fake one. NOTE: All /dev/ccd* devices have changed and must be remade Introduce DSO_COMPATMBR. This forces an MBR sector to be reserved in front of a disklabel even when the target disk does not have slices. It is used by the CCD and VN devices to keep the disklabel aligned the same way it has been historically. Implement 64 bit block addressing for CCD. Implement a new filesystem type "ccd", and require that the devices backing the CCD use that filesystem type for safety. Fix a bug in DIOCGPART where the partinfo->media_blocks was not being set properly for partitions.
* The diskslice abstraction now stores offsets/sizes as 64 bit quantities. (NOTE: DOS partition tables and standard disklabels can't handle 64 bit sector numbers yet). For future pluggable disklabel/partitioning schemes. * The kernel panic / kernel core API is now 64 bits. * The VN device now uses 64 bit sector numbers and can handle block devices up to what is supported by the filesystem (typically 8TB). This change was made primarily so we can test future disklabel / partition table support. * Pass 64 bit LBAs to various block devices and to the SCSI layer. * Check for and assert 32 bit overflow conditions in various places, instead of wrapping.
Continue untangling the disklabel. Use the generic disk_info structure to hold template information instead of the disklabel structure. This removes all references to the disklabel structure from the MBR code and leaves mostly opaque references in the slice code.
MFC rev 1.30: pbn * vn->sc_secsize may wraparound, because both pbn and vn->sc_secsize are of type int. Cast to off_t before multiplication.
pbn * vn->sc_secsize may wraparound, because both pbn and vn->sc_secsize are of type int. Cast to off_t before multiplication.
Rename printf -> kprintf in sys/ and add some defines where necessary (files which are used in userland, too).
Change the kernel dev_t, representing a pointer to a specinfo structure, to cdev_t. Change struct specinfo to struct cdev. The name 'cdev' was taken from FreeBSD. Remove the dev_t shim for the kernel. This commit generally removes the overloading of 'dev_t' between userland and the kernel. Also fix a bug in libkvm where a kernel dev_t (now cdev_t) was not being properly converted to a userland dev_t.
Rename malloc->kmalloc, free->kfree, and realloc->krealloc. Pass 1
VNode sequencing and locking - part 3/4. VNode aliasing is handled by the namecache (aka nullfs), so there is no longer a need to have VOP_LOCK, VOP_UNLOCK, or VOP_ISSLOCKED as 'VOP' functions. Both NFS and DEADFS have been using standard locking functions for some time and are no longer special cases. Replace all uses with native calls to vn_lock, vn_unlock, and vn_islocked. We can't have these as VOP functions anyhow because of the introduction of the new SYSLINK transport layer, since vnode locks are primarily used to protect the local vnode structure itself.
MASSIVE reorganization of the device operations vector. Change cdevsw to dev_ops. dev_ops is a syslink-compatible operations vector structure similar to the vop_ops structure used by vnodes. Remove a huge number of instances where a thread pointer is still being passed as an argument to various device ops and other related routines. The device OPEN and IOCTL calls now take a ucred instead of a thread pointer, and the CLOSE call no longer takes a thread pointer.
The thread/proc pointer argument in the VFS subsystem originally existed for... well, I'm not sure *WHY* it originally existed when most of the time the pointer couldn't be anything other then curthread or curproc or the code wouldn't work. This is particularly true of lockmgr locks. Remove the pointer argument from all VOP_*() functions, all fileops functions, and most ioctl functions.
Simplify vn_lock(), VOP_LOCK(), and VOP_UNLOCK() by removing the thread_t argument. These calls now always use the current thread as the lockholder. Passing a thread_t to these functions has always been questionable at best.
Remove unused label.
Block devices generally truncate the size of I/O requests which go past EOF. This is exactly what we want when manually reading or writing a block device such as /dev/ad0s1a, but is not desired when a VFS issues I/O ops on filesystem buffers. In such cases, any EOF condition must be considered an error. Implement a new filesystem buffer flag B_BNOCLIP, which getblk() and friends automatically set. If set, block devices are guarenteed to return an error if the I/O request is at EOF or would otherwise have to be clipped to EOF. Block devices further guarentee that b_bcount will not be modified when this flag is set. Adjust all block device EOF checks to use the new flag, and clean up the code while I'm there. Also, set b_resid in a couple of degenerate cases where it was not being set.
- Clarify the definitions of b_bufsize, b_bcount, and b_resid. - Remove unnecessary assignments based on the clarified fields. - Add additional checks for premature EOF. b_bufsize is only used by buffer management entities such as getblk() and other vnode-backed buffer handling procedures. b_bufsize is not required for calls to vn_strategy() or dev_dstrategy(). A number of other subsystems use it to track the original request size. b_bcount is the I/O request size, but b_bcount() is allowed to be truncated by the device chain if the request encompasses EOF (such as on a raw disk device). A caller which needs to record the original buffer size verses the EOF-truncated buffer can compare b_bcount after the I/O against a recorded copy of the original request size. This copy can be recorded in b_bufsize for unmanaged buffers (malloced or getpbuf()'d buffers). b_resid is always relative to b_bcount, not b_bufsize. A successful read that is truncated to the device EOF will thus have a b_resid of 0 and a truncated b_bcount.
Replace the the buffer cache's B_READ, B_WRITE, B_FORMAT, and B_FREEBUF b_flags with a separate b_cmd field. Use b_cmd to test for I/O completion as well (getting rid of B_DONE in the process). This further simplifies the setup required to issue a buffer cache I/O. Remove a redundant header file, bus/isa/i386/isa_dma.h and merge any discrepancies into bus/isa/isavar.h. Give ISADMA_READ/WRITE/RAW their own independant flag definitions instead of trying to overload them on top of B_READ, B_WRITE, and B_RAW. Add a routine isa_dmabp() which takes a struct buf pointer and returns the ISA dma flags associated with the operation. Remove the 'clear_modify' argument to vfs_busy_pages(). Instead, vfs_busy_pages() asserts that the buffer's b_cmd is valid and then uses it to determine the action it must take.
Change *_pager_allocate() to take off_t instead of vm_ooffset_t. The actual underlying type (a 64 bit signed integer) is the same. Recent and upcoming work is standardizing on off_t. Move object->un_pager.vnp.vnp_size to vnode->v_filesize. As before, the field is still only valid when a VM object is associated with the vnode.
Major BUF/BIO work commit. Make I/O BIO-centric and specify the disk or file location with a 64 bit offset instead of a 32 bit block number. * All I/O is now BIO-centric instead of BUF-centric. * File/Disk addresses universally use a 64 bit bio_offset now. bio_blkno no longer exists. * Stackable BIO's hold disk offset translations. Translations are no longer overloaded onto a single structure (BUF or BIO). * bio_offset == NOOFFSET is now universally used to indicate that a translation has not been made. The old (blkno == lblkno) junk has all been removed. * There is no longer a distinction between logical I/O and physical I/O. * All driver BUFQs have been converted to BIOQs. * BMAP, FREEBLKS, getblk, bread, breadn, bwrite, inmem, cluster_*, and findblk all now take and/or return 64 bit byte offsets instead of block numbers. Note that BMAP now returns a byte range for the before and after variables.
Make the entire BUF/BIO system BIO-centric instead of BUF-centric. Vnode and device strategy routines now take a BIO and must pass that BIO to biodone(). All code which previously managed a BUF undergoing I/O now manages a BIO. The new BIO-centric algorithms allow BIOs to be stacked, where each layer represents a block translation, completion callback, or caller or device private data. This information is no longer overloaded within the BUF. Translation layer linkages remain intact as a 'cache' after I/O has completed. The VOP and DEV strategy routines no longer make assumptions as to which translated block number applies to them. The use the block number in the BIO specifically passed to them. Change the 'untranslated' constant to NOOFFSET (for bio_offset), and (daddr_t)-1 (for bio_blkno). Rip out all code that previously set the translated block number to the untranslated block number to indicate that the translation had not been made. Rip out all the cluster linkage fields for clustered VFS and clustered paging operations. Clustering now occurs in a private BIO layer using private fields within the BIO. Reformulate the vn_strategy() and dev_dstrategy() abstraction(s). These routines no longer assume that bp->b_vp == the vp of the VOP operation, and the dev_t is no longer stored in the struct buf. Instead, only the vp passed to vn_strategy() (and related *_strategy() routines for VFS ops), and the dev_t passed to dev_dstrateg() (and related *_strategy() routines for device ops) is used by the VFS or DEV code. This will allow an arbitrary number of translation layers in the future. Create an independant per-BIO tracking entity, struct bio_track, which is used to determine when I/O is in-progress on the associated device or vnode. NOTE: Unlike FreeBSD's BIO work, our struct BUF is still used to hold the fields describing the data buffer, resid, and error state. Major-testing-by: Stefan Krueger
* Ansify function definitions. * Minor style cleanup. Submitted-by: Alexey Slynko <email@example.com>
Document the dscheck(9) function and explain how it affects the slice- and driver-relative block numbers, bio->bio_blkno and bio->bio_pblkno respectively. The latter is modified by the said function after doing various checks on available information, such as disklabels, etc, while the former is provided by the VFS. Explain how said function re-transforms the block number fields when LABEL support is enabled in the VN driver via vnconfig(8).
VFS messaging/interfacing work stage 9/99: VFS 'NEW' API WORK. NOTE: unionfs and nullfs are temporarily broken by this commit. * Remove the old namecache API. Remove vfs_cache_lookup(), cache_lookup(), cache_enter(), namei() and lookup() are all gone. VOP_LOOKUP() and VOP_CACHEDLOOKUP() have been collapsed into a single non-caching VOP_LOOKUP(). * Complete the new VFS CACHE (namecache) API. The new API is able to supply topological guarentees and is able to reserve namespaces, including negative cache spaces (whether the target name exists or not), which the new API uses to reserve namespace for things like NRENAME and NCREATE (and others). * Complete the new namecache API. VOP_NRESOLVE, NLOOKUPDOTDOT, NCREATE, NMKDIR, NMKNOD, NLINK, NSYMLINK, NWHITEOUT, NRENAME, NRMDIR, NREMOVE. These new calls take (typicaly locked) namecache pointers rather then combinations of directory vnodes, file vnodes, and name components. The new calls are *MUCH* simpler in concept and implementation. For example, VOP_RENAME() has 8 arguments while VOP_NRENAME() has only 3 arguments. The new namecache API uses the namecache to lock namespaces without having to lock the underlying vnodes. For example, this allows the kernel to reserve the target name of a create function trivially. Namecache records are maintained BY THE KERNEL for both positive and negative hits. Generally speaking, the kernel layer is now responsible for resolving path elements. NRESOLVE is called when an unresolved namecache record needs to be resolved. Unlike the old VOP_LOOKUP, NRESOLVE is simply responsible for associating a vnode to a namecache record (positive hit) or telling the system that it's a negative hit, and not responsible for handling symlinks or other special cases or doing any of the other path lookup work, much unlike the old VOP_LOOKUP. It should be particularly noted that the new namecache topology does not allow disconnected namecache records. In rare cases where a vnode must be converted to a namecache pointer for new API operation via a file handle (i.e. NFS), the cache_fromdvp() function is provided and a new API VOP, VOP_NLOOKUPDOTDOT() is provided to allow the namecache to resolve the topology leading up to the requested vnode. These and other topological guarentees greatly reduce the complexity of the new namecache API. The new namei() is called nlookup(). This function uses a combination of cache_n*() calls, VOP_NRESOLVE(), and standard VOP calls resolve the supplied path, deal with symlinks, and so forth, in a nice small compact compartmentalized procedure. * The old VFS code is no longer responsible for maintaining namecache records, a function which was mostly adhoc cache_purge()s occuring before the VFS actually knows whether an operation will succeed or not. The new VFS code is typically responsible for adjusting the state of locked namecache records passed into it. For example, if NCREATE succeeds it must call cache_setvp() to associate the passed namecache record with the vnode representing the successfully created file. The new requirements are much less complex then the old requirements. * Most VFSs still implement the old API calls, albeit somewhat modified and in particular the VOP_LOOKUP function is now *MUCH* simpler. However, the kernel now uses the new API calls almost exclusively and relies on compatibility code installed in the default ops (vop_compat_*()) to convert the new calls to the old calls. * All kernel system calls and related support functions which used to do complex and confusing namei() operations now do far less complex and far less confusing nlookup() operations. * SPECOPS shortcutting has been implemented. User reads and writes now go directly to supporting functions which talk to the device via fileops rather then having to be routed through VOP_READ or VOP_WRITE, saving significant overhead. Note, however, that these only really effect /dev/null and /dev/zero. Implementing this was fairly easy, we now simply pass an optional struct file pointer to VOP_OPEN() and let spec_open() handle the override. SPECIAL NOTES: It should be noted that we must still lock a directory vnode LK_EXCLUSIVE before issuing a VOP_LOOKUP(), even for simple lookups, because a number of VFS's (including UFS) store active directory scanning information in the directory vnode. The legacy NAMEI_LOOKUP cases can be changed to use LK_SHARED once these VFS cases are fixed. In particular, we are now organized well enough to actually be able to do record locking within a directory for handling NCREATE, NDELETE, and NRENAME situations, but it hasn't been done yet. Many thanks to all of the testers and in particular David Rhodus for finding a large number of panics and other issues.
VFS messaging/interfacing work stage 8/99: Major reworking of the vnode interlock and other miscellanious things. This patch also fixes FS corruption due to prior vfs work in head. In particular, prior to this patch the namecache locking could introduce blocking conditions that confuse the old vnode deactivation and reclamation code paths. With this patch there appear to be no serious problems even after two days of continuous testing. * VX lock all VOP_CLOSE operations. * Fix two NFS issues. There was an incorrect assertion (found by David Rhodus), and the nfs_rename() code was not properly purging the target file from the cache, resulting in Stale file handle errors during, e.g. a buildworld with an NFS-mounted /usr/obj. * Fix a TTY session issue. Programs which open("/dev/tty" ,...) and then run the TIOCNOTTY ioctl were causing the system to lose track of the open count, preventing the tty from properly detaching. This is actually a very old BSD bug, but it came out of the woodwork in DragonFly because I am now attempting to track device opens explicitly. * Gets rid of the vnode interlock. The lockmgr interlock remains. * Introduced VX locks, which are mandatory vp->v_lock based locks. * Rewrites the locking semantics for deactivation and reclamation. (A ref'd VX lock'd vnode is now required for vgone(), VOP_INACTIVE, and VOP_RECLAIM). New guarentees emplaced with regard to vnode ripouts. * Recodes the mountlist scanning routines to close timing races. * Recodes getnewvnode to close timing races (it now returns a VX locked and refd vnode rather then a refd but unlocked vnode). * Recodes VOP_REVOKE- a locked vnode is now mandatory. * Recodes all VFS inode hash routines to close timing holes. * Removes cache_leaf_test() - vnodes representing intermediate directories are now held so the leaf test should no longer be necessary. * Splits the over-large vfs_subr.c into three additional source files, broken down by major function (locking, mount related, filesystem syncer). * Changes splvm() protection to a critical-section in a number of places (bleedover from another patch set which is also about to be committed). Known issues not yet resolved: * Possible vnode/namecache deadlocks. * While most filesystems now use vp->v_lock, I haven't done a final pass to make vp->v_lock mandatory and to clean up the few remaining inode based locks (nwfs I think and other obscure filesystems). * NullFS gets confused when you hit a mount point in the underlying filesystem. * Only UFS and NFS have been well tested * NFS is not properly timing out namecache entries, causing changes made on the server to not be properly detected on the client if the client already has a negative-cache hit for the filename in question. Testing-by: David Rhodus <firstname.lastname@example.org>, Peter Kadau <email@example.com>, walt <firstname.lastname@example.org>, others
Cleanup cdevsw and dev reference counting ops to fix a warning that occurs when a system is halted or rebooted.
Device layer rollup commit. * cdevsw_add() is now required. cdevsw_add() and cdevsw_remove() may specify a mask/match indicating the range of supported minor numbers. Multiple cdevsw_add()'s using the same major number, but distinctly different ranges, may be issued. All devices that failed to call cdevsw_add() before now do. * cdevsw_remove() now automatically marks all devices within its supported range as being destroyed. * vnode->v_rdev is no longer resolved when the vnode is created. Instead, only v_udev (a newly added field) is resolved. v_rdev is resolved when the vnode is opened and cleared on the last close. * A great deal of code was making rather dubious assumptions with regards to the validity of devices associated with vnodes, primarily due to the persistence of a device structure due to being indexed by (major, minor) instead of by (cdevsw, major, minor). In particular, if you run a program which connects to a USB device and then you pull the USB device and plug it back in, the vnode subsystem will continue to believe that the device is open when, in fact, it isn't (because it was destroyed and recreated). In particular, note that all the VFS mount procedures now check devices via v_udev instead of v_rdev prior to calling VOP_OPEN(), since v_rdev is NULL prior to the first open. * The disk layer's device interaction has been rewritten. The disk layer (i.e. the slice and disklabel management layer) no longer overloads its data onto the device structure representing the underlying physical disk. Instead, the disk layer uses the new cdevsw_add() functionality to register its own cdevsw using the underlying device's major number, and simply does NOT register the underlying device's cdevsw. No confusion is created because the device hash is now based on (cdevsw,major,minor) rather then (major,minor). NOTE: This also means that underlying raw disk devices may use the entire device minor number instead of having to reserve the bits used by the disk layer, and also means that can we (theoretically) stack a fully disklabel-supported 'disk' on top of any block device. * The new reference counting scheme prevents this by associating a device with a cdevsw and disconnecting the device from its cdevsw when the cdevsw is removed. Additionally, all udev2dev() lookups run through the cdevsw mask/match and only successfully find devices still associated with an active cdevsw. * Major work on MFS: MFS no longer shortcuts vnode and device creation. It now creates a real vnode and a real device and implements real open and close VOPs. Additionally, due to the disk layer changes, MFS is no longer limited to 255 mounts. The new limit is 16 million. Since MFS creates a real device node, mount_mfs will now create a real /dev/mfs<PID> device that can be read from userland (e.g. so you can dump an MFS filesystem). * BUF AND DEVICE STRATEGY changes. The struct buf contains a b_dev field. In order to properly handle stacked devices we now require that the b_dev field be initialized before the device strategy routine is called. This required some additional work in various VFS implementations. To enforce this requirement, biodone() now sets b_dev to NODEV. The new disk layer will adjust b_dev before forwarding a request to the actual physical device. * A bug in the ISO CD boot sequence which resulted in a panic has been fixed. Testing by: lots of people, but David Rhodus found the most aggregious bugs.
device switch 1/many: Remove d_autoq, add d_clone (where d_autoq was). d_autoq was used to allow the device port dispatch to mix old-style synchronous calls with new style messaging calls within a particular device. It was never used for that purpose. d_clone will be more fully implemented as work continues. We are going to install d_port in the dev_t (struct specinfo) structure itself and d_clone will be needed to allow devices to 'revector' the port on a minor-number by minor-number basis, in particular allowing minor numbers to be directly dispatched to distinct threads. This is something we will be needing later on.
Newtoken commit. Change the token implementation as follows: (1) Obtaining a token no longer enters a critical section. (2) tokens can be held through schedular switches and blocking conditions and are effectively released and reacquired on resume. Thus tokens serialize access only while the thread is actually running. Serialization is not broken by preemptive interrupts. That is, interrupt threads which preempt do no release the preempted thread's tokens. (3) Unlike spl's, tokens will interlock w/ interrupt threads on the same or on a different cpu. The vnode interlock code has been rewritten and the API has changed. The mountlist vnode scanning code has been consolidated and all known races have been fixed. The vnode interlock is now a pool token. The code that frees unreferenced vnodes whos last VM page has been freed has been moved out of the low level vm_page_free() code and moved to the periodic filesystem sycer code in vfs_msycn(). The SMP startup code and the IPI code has been cleaned up considerably. Certain early token interactions on AP cpus have been moved to the BSP. The LWKT rwlock API has been cleaned up and turned on. Major testing by: David Rhodus
namecache work stage 1: namespace cleanups. Add a NAMEI_ prefix to CREATE, LOOKUP, DELETE, and RENAME. Add a CNP_ prefix too all the name lookup flags (nd_flags) e.g. ISDOTDOT->CNP_ISDOTDOT.
DEV messaging stage 1/4: Rearrange struct cdevsw and add a message port and auto-queueing mask. The mask will tell us which message functions can be safely queued to another thread and which still need to run in the context of the caller. Primary configuration fields (name, cmaj, flags, port, autoq mask) are now at the head of the structure. Function vectors, which may eventually go away, are at the end. The port and autoq fields are non-functional in this stage. The old BDEV device major number support has also been removed from cdevsw, and code has been added to translate the bootdev passed from the boot code (the boot code has always passed the now defunct block device major numbers and we obviously need to keep that compatibility intact).
proc->thread stage 5: BUF/VFS clearance! Remove the ucred argument from vop_close, vop_getattr, vop_fsync, and vop_createvobject. These VOPs can be called from multiple contexts so the cred is fairly useless, and UFS ignorse it anyway. For filesystems (like NFS) that sometimes need a cred we use proc0.p_ucred for now. This removal also removed the need for a 'proc' reference in the related VFS procedures, which greatly helps our proc->thread conversion. bp->b_wcred and bp->b_rcred have also been removed, and for the same reason. It makes no sense to have a particular cred when multiple users can access a file. This may create issues with certain types of NFS mounts but if it does we will solve them in a way that doesn't pollute the struct buf.
proc->thread stage 4: rework the VFS and DEVICE subsystems to take thread pointers instead of process pointers as arguments, similar to what FreeBSD-5 did. Note however that ultimately both APIs are going to be message-passing which means the current thread context will not be useable for creds and descriptor access.
proc->thread stage 2: MAJOR revamping of system calls, ucred, jail API, and some work on the low level device interface (proc arg -> thread arg). As -current did, I have removed p_cred and incorporated its functions into p_ucred. p_prison has also been moved into p_ucred and adjusted accordingly. The jail interface tests now uses ucreds rather then processes. The syscall(p,uap) interface has been changed to just (uap). This is inclusive of the emulation code. It makes little sense to pass a proc pointer around which confuses the MP readability of the code, because most system call code will only work with the current process anyway. Note that eventually *ALL* syscall emulation code will be moved to a kernel-protected userland layer because it really makes no sense whatsoever to implement these emulations in the kernel. suser() now takes no arguments and only operates with the current process. The process argument has been removed from suser_xxx() so it now just takes a ucred and flags. The sysctl interface was adjusted somewhat.
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids. Most ids have been removed from !lint sections and moved into comment sections.
import from FreeBSD RELENG_4 18.104.22.168