Keyword substitution: kv
Default branch: MAIN
MFC numerous features from HEAD. * Bounce buffer fixes for physio. * Disk flush support in scsi and nata subsystems. * Dead bio handling
Add BUF_CMD_FLUSH support - issue flush command to mass storage device.
Implement a bounce buffer for physio if the buffer passed from userland is not at least 16-byte aligned. Reported-by: "Steve O'Hara-Smith" <firstname.lastname@example.org>, and others
Code documentation only: Describe B_NOCACHE
Kernel support for HAMMER: * Add another type to the bio->bio_caller_info1 union * Add two new flags to getblk(), used by the cluster code. GETBLK_SZMATCH - Tell getblk() to fail and return NULL if a pre-existing buffer's size does not match the requested size (this prevents getblk() from doing a potentially undesired bwrite() sequence). GETBLK_NOWAIT - Tell getblk() to use a non-blocking lock. * pop_bio() now returns the previous BIO (or NULL if there is no previous BIO). This allows HAMMER to chain bio_done()'s * Fix a bug in cluster_read(). The cluster code's read-ahead at the end could go past the caller-specified limit and force a block to the wrong block size.
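The effect of the two getblk() flags in the entry above can be sketched with a toy decision function. This is only an illustration, not the kernel's getblk(): the `sk_` names, the bit values, and the outcome enum are all hypothetical, and only the flag names GETBLK_SZMATCH and GETBLK_NOWAIT come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical bit values; only the flag names come from the commit message. */
#define SK_GETBLK_SZMATCH 0x01  /* fail if the existing buffer's size differs */
#define SK_GETBLK_NOWAIT  0x02  /* fail instead of sleeping on the buffer lock */

/* Toy stand-in for a buffer that is already present in the cache. */
struct sk_buf {
    int  b_bufsize;  /* current size of the cached buffer */
    bool locked;     /* true if another thread holds the buffer lock */
};

enum sk_outcome {
    SK_RETURN_BUF,   /* buffer can be handed back as-is */
    SK_RETURN_NULL,  /* a flag forbade the slow path, caller gets NULL */
    SK_WOULD_SLEEP,  /* without GETBLK_NOWAIT the caller would block */
    SK_WOULD_RESIZE  /* without GETBLK_SZMATCH the buffer would be resized */
};

/* Decide what a lookup of an existing buffer would do under the given flags. */
static enum sk_outcome
sk_getblk_existing(const struct sk_buf *bp, int size, int blkflags)
{
    if (bp->locked)
        return (blkflags & SK_GETBLK_NOWAIT) ? SK_RETURN_NULL : SK_WOULD_SLEEP;
    if (bp->b_bufsize != size)
        return (blkflags & SK_GETBLK_SZMATCH) ? SK_RETURN_NULL : SK_WOULD_RESIZE;
    return SK_RETURN_BUF;
}
```

The resize path is the one the commit message warns may involve an undesired bwrite() sequence, which is why a caller such as the cluster code would prefer a NULL return.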
Fix hopefully all possible deadlocks that can occur when mixed block sizes are used with the buffer cache. The fix is simply to base the limiting and flushing code on a byte count rather than a buffer count. This will allow UFS to utilize a greater number of dirty buffers and will cause HAMMER to use fewer. This also makes tuning the buffer cache a whole lot easier.
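A minimal sketch of byte-based limiting, assuming a single dirty-byte counter and one high-water mark. The structure and function names are hypothetical and the sizes used in the usage note are arbitrary; the point is only that the limit is reached after the same number of bytes regardless of how large each buffer is.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting state; not the kernel's actual variables. */
struct sk_dirty_acct {
    size_t dirty_bytes;     /* total bytes currently held in dirty buffers */
    size_t hi_dirty_bytes;  /* byte limit at which writers should be held off */
};

/* A buffer of 'size' bytes has become dirty. */
static void
sk_acct_dirty(struct sk_dirty_acct *a, size_t size)
{
    a->dirty_bytes += size;
}

/* A dirty buffer of 'size' bytes has been written out. */
static void
sk_acct_clean(struct sk_dirty_acct *a, size_t size)
{
    a->dirty_bytes -= size;
}

/* The limit is expressed in bytes, so buffer count alone never trips it. */
static bool
sk_over_limit(const struct sk_dirty_acct *a)
{
    return a->dirty_bytes >= a->hi_dirty_bytes;
}
```

With a 1 MiB limit, sixteen 64 KiB buffers reach the limit while sixteen 8 KiB buffers do not, so a filesystem using small blocks may hold many more dirty buffers than one using large blocks.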
Replace the bwillwrite() subsystem to make it more fair to processes. * Add new API functions, bwillread(), bwillwrite(), bwillinode() which the kernel calls when it intends to read, write, or make inode modifications. * Redo the backend. Add bd_heatup() and bd_wait(). bd_heatup() heats up the buf_daemon, starting it flushing before we hit any blocking conditions (similar to the previous algorithm). * The new bwill*() blocking functions no longer introduce escalating delays to keep the number of dirty buffers under control. Instead it takes a page from HAMMER and estimates the load caused by the caller, then waits for a specific number of dirty buffers to complete their write I/O's before returning. If the buffers can be retired quickly these functions will return more quickly.
Miscellaneous performance adjustments to the kernel * Add an argument to VOP_BMAP so VFSs can discern the type of operation the BMAP is being done for. * Normalize the variable name denoting the blocksize to 'blksize' in vfs_cluster.c. * Fix a bug in the cluster code where a stale bp->b_error could wind up getting returned when B_ERROR is not set. * Do not B_AGE cluster bufs. * Pass the block size to both cluster_read() and cluster_write() instead of those routines getting the block size from vp->v_mount->mnt_stat.f_iosize. This allows different areas of a file to use a different block size. * Properly initialize bp->b_bio2.bio_offset to doffset in cluster_read(). This fixes an issue where VFSs were making an extra, unnecessary call to BMAP. * Do not recycle vnodes on the free list until numvnodes has reached desiredvnodes. Vnodes were being recycled when their resident page count had dropped to zero, but this is actually too early as the VFS may cache important information in the vnode that would otherwise require a number of I/O's to re-acquire. This mainly helps HAMMER (whose inode lookups are fairly expensive). * Do not VAGE vnodes. * Remove the minvnodes test. There is no reason not to load the vnode cache all the way through to its max. * buf_cmd_t visibility for the new BMAP argument.
Reimplement B_AGE. Have it cycle the buffer in the queue twice instead of placing buffers at the head of the queue (which causes them to be run-down backwards). Leave B_AGE set through the write cycle and have the bufdaemon set the flag when flushing dirty buffers. B_AGE no longer affects the ordering of the actual write and is allowed to slide through to the clean queue when the write completes.
Add an extern for hidirtybuffers. Reported-by: Michael Neumann <email@example.com>
Keep track of the number of buffers undergoing IO, and include that number in calculations involving numdirtybuffers. This prevents the kernel from believing that there are only a few dirty buffers when, in fact, all the dirty buffers are running IOs.
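The accounting problem described above can be illustrated with a small sketch. It assumes that a buffer leaves the dirty count when its write is started; the `sk_` names and the `runningbufcount` field are hypothetical, while numdirtybuffers and hidirtybuffers are the names used in the log.

```c
#include <stdbool.h>

/* Hypothetical counters modelled on the names used in the log. */
struct sk_bufstats {
    int numdirtybuffers;  /* dirty buffers not yet handed to a device */
    int runningbufcount;  /* buffers whose write I/O is still in flight */
};

/* A dirty buffer has been handed to the device: it is no longer counted as
 * dirty, but it has not been retired either. */
static void
sk_start_write(struct sk_bufstats *s)
{
    s->numdirtybuffers--;
    s->runningbufcount++;
}

/* The device finished writing one buffer. */
static void
sk_write_done(struct sk_bufstats *s)
{
    s->runningbufcount--;
}

/* Throttle on dirty plus in-flight buffers so that starting every write at
 * once does not make the system look idle. */
static bool
sk_need_throttle(const struct sk_bufstats *s, int hidirtybuffers)
{
    return s->numdirtybuffers + s->runningbufcount >= hidirtybuffers;
}
```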
Fix some IO sequencing performance issues and reformulate the strategy we use to deal with potential buffer cache deadlocks. Generally speaking try to remove roadblocks in the vn_strategy() path. * Remove buf->b_tid (HAMMER no longer needs it) * Replace IO_NOWDRAIN with IO_NOBWILL, requesting that bwillwrite() not be called. Used by VN to try to avoid deadlocking. Remove B_NOWDRAIN. * No longer block in bwrite() or getblk() when we have a lot of dirty buffers. getblk() in particular needs to be callable by filesystems to drain dirty buffers and we don't want to deadlock. * Improve bwillwrite() by having it wake up the buffer flusher at 1/2 the dirty buffer limit but not block, and then block if the limit is reached. This should smooth out flushes during heavy filesystem activity.
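The bwillwrite() policy in the last bullet above (wake the flusher at half the limit, block only at the limit) can be sketched as a pure decision function. The enum and function names are hypothetical, and the real routine performs the wakeup and sleep itself rather than returning an action.

```c
/* Hypothetical outcome of a bwillwrite()-style check. */
enum sk_bwill_action {
    SK_PROCEED,       /* plenty of headroom, do nothing */
    SK_WAKE_FLUSHER,  /* past 1/2 the limit: start flushing, do not block */
    SK_BLOCK          /* limit reached: the writer must wait */
};

/* Decide what a writer should do given the current dirty buffer count. */
static enum sk_bwill_action
sk_bwillwrite_action(int numdirty, int dirty_limit)
{
    if (numdirty >= dirty_limit)
        return SK_BLOCK;
    if (numdirty >= dirty_limit / 2)
        return SK_WAKE_FLUSHER;
    return SK_PROCEED;
}
```

Starting the flusher early is what smooths out flushes: by the time a writer would otherwise hit the limit, write I/O is already retiring buffers.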
HAMMER 25/many: Pruning code * Add b_tid to struct buf so dirty buffer cache buffers can be tagged with a transaction id to try to retain consistency when doing as-of queries on files that change size (so the data records have a TID <= the inode record). This is also an issue when a file is created and immediately written to. This may be temporary, a more sophisticated solution is needed. * Fix a bug in the special handling of create_tid for as-of queries in btree_search(). An assignment was off by one, causing historical queries to not be able to find bits of data here and there. * Freeze the transaction id for newly created inodes until the initial inode record is laid down on disk, so the transaction id matches the transaction id of the related directory entry. * Major work on the pruning code. When pruning the tree to a particular granularity the create_tid and delete_tid of related records must be aligned to that granularity in order to avoid creating 'holes' at various time points. This requires some serious B-Tree manipulation because the right-hand boundary may need to be shifted when the create_tid of an existing record is forward aligned. This work is still in progress but it works in basic testing. Prune the tree in the reverse direction instead of in the forward direction. This keeps the B-Tree consistent when we have to adjust the right-hand boundary to accommodate the realignment of create_tid.
Fix buffer cache deadlocks by splitting dirty buffers into two categories: Light weight dirty buffers and heavy weight dirty buffers. Add a second buffer cache flushing daemon to deal with the heavy weight dirty buffers. Currently only HAMMER uses the new feature, but it can also easily be used by UFS in the future. Buffer cache deadlocks can occur in low memory situations where the buffer cache tries to flush out dirty buffers and deadlocks when the act of flushing a dirty buffer requires additional buffers to be acquired. Because there was only one buffer flushing daemon, a deadlock on a heavy weight buffer prevented any further buffer flushes, whether light or heavy weight, and wound up deadlocking the entire system. Giving the heavy weight buffers their own daemon solves the problem by allowing light weight buffers to continue to be flushed even if a stall occurs on a heavy weight buffer. The numbers of dirty heavy weight buffers is limited to ensure that enough light weight buffers are available. This is primarily implemented by changing getblk()'s mostly unused slpflag parameter to a new blkflags parameter and adding a new buffer cache queue called BQUEUE_DIRTY_HW.
Add bio_ops->io_checkread and io_checkwrite - a read and write pre-check which gives HAMMER a chance to set B_LOCKED if the kernel wants to write out a passively held buffer. Change B_LOCKED semantics slightly. B_LOCKED buffers will not be written until B_LOCKED is cleared. This allows HAMMER to hold off B_DELWRI writes on passively held buffers.
Add regetblk() - reacquire a buffer lock. The buffer must be B_LOCKED or must be interlocked with bio_ops. Used by HAMMER. Further changes to B_LOCKED buffers. A B_LOCKED|B_DELWRI buffer will be placed on the dirty queue and then returned to the locked queue once the I/O completes. That is, B_LOCKED does not interfere with B_DELWRI operation.
Convert the global 'bioops' into per-mount bio_ops. For now we also have to have a per buffer b_ops as well since the controlling filesystem cannot be located from information in struct buf (b_vp could be the backing store so that can't be used). This change allows HAMMER to use bio_ops. Change the ordering of the bio_ops.io_deallocate call so it occurs before the buffer's B_LOCKED is checked. This allows the deallocate call to set B_LOCKED to retain the buffer in situations where the target filesystem is unable to immediately disassociate the buffer. Also keep VMIO intact for B_LOCKED buffers (in addition to B_DELWRI buffers). HAMMER will use this feature to keep buffers passively associated with other filesystem structures and thus be able to avoid constantly brelse()ing and getblk()ing them.
MASSIVE reorganization of the device operations vector. Change cdevsw to dev_ops. dev_ops is a syslink-compatible operations vector structure similar to the vop_ops structure used by vnodes. Remove a huge number of instances where a thread pointer is still being passed as an argument to various device ops and other related routines. The device OPEN and IOCTL calls now take a ucred instead of a thread pointer, and the CLOSE call no longer takes a thread pointer.
Add some diagnostic messages to try to catch a ufs_dirbad panic before it happens. MFC: Reorder BUF_UNLOCK() - it must occur after b_flags is modified, not before. A newly created non-VMIO buffer is now marked B_INVAL. Callers of getblk() now always clear B_INVAL before issuing a READ I/O or when clearing or overwriting the buffer. Before this change, a getblk() (getnewbuf), brelse(), getblk() sequence on a non-VMIO buffer would result in a buffer with B_CACHE set yet containing uninitialized data. MFC: B_NOCACHE cannot be set on a clean VMIO-backed buffer as this will destroy the VM backing store, which might be dirty. MFC: Reorder vnode_pager_setsize() calls to close a race condition.
Fix several buffer cache issues related to B_NOCACHE. * Do not set B_NOCACHE when calling vinvalbuf(... V_SAVE). This will destroy dirty VM backing store associated with clean buffers before the VM system has a chance to check for and flush them. Taken-from: FreeBSD * Properly set B_NOCACHE when destroying buffers related to truncated data. * Fix a bug in vnode_pager_setsize() that was recently introduced. v_filesize was being set before a new/old size comparison, causing a file truncation to not destroy related VM pages past the new EOF. * Remove a bogus B_NOCACHE|B_DIRTY test in brelse(). This was originally intended to be a B_NOCACHE|B_DELWRITE test which then cleared B_NOCACHE, but now that B_NOCACHE operation has been fixed it really does indicate that the buffer, its contents, and its backing store are to be destroyed, even if the buffer is marked B_DELWRI. Instead of clearing B_NOCACHE when B_DELWRITE is found to be set, clear B_DELWRITE when B_NOCACHE is found to be set. Note that B_NOCACHE is still cleared when bdirty() is called in order to ensure that data is not lost when softupdates and other code do a 'B_NOCACHE + bwrite' sequence. Softupdates can redirty a buffer in its io completion hook and a write error can also redirty a buffer. * The VMIO buffer rundown seems to have morphed into a state where the distinction between NFS and non-NFS buffers can be removed. Remove the test.
Clean up more #include files. Create an internal __boolean_t so two or three sys/ header files don't have to juggle the type. Use _KERNEL_STRUCTURES in various pieces of user code that delve into kvm. Reported-by: Rumko <firstname.lastname@example.org>, walt <email@example.com>
Block devices generally truncate the size of I/O requests which go past EOF. This is exactly what we want when manually reading or writing a block device such as /dev/ad0s1a, but is not desired when a VFS issues I/O ops on filesystem buffers. In such cases, any EOF condition must be considered an error. Implement a new filesystem buffer flag B_BNOCLIP, which getblk() and friends automatically set. If set, block devices are guaranteed to return an error if the I/O request is at EOF or would otherwise have to be clipped to EOF. Block devices further guarantee that b_bcount will not be modified when this flag is set. Adjust all block device EOF checks to use the new flag, and clean up the code while I'm there. Also, set b_resid in a couple of degenerate cases where it was not being set.
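A sketch of the EOF check described above, under stated assumptions: the structure is a cut-down stand-in for struct buf, the bit value of the flag is invented, and the use of EINVAL as the error code is an assumption rather than something the commit message specifies.

```c
#include <errno.h>
#include <stdint.h>

#define SK_B_BNOCLIP 0x1u  /* hypothetical bit value standing in for B_BNOCLIP */

/* Cut-down stand-in for the fields of struct buf that matter here. */
struct sk_buf {
    uint32_t b_flags;
    int64_t  b_bcount;  /* I/O request size in bytes */
    int64_t  b_resid;   /* bytes not transferred */
    int      b_error;
};

/*
 * Check a request against the size of the media.  Returns 0 if the I/O may
 * proceed (b_bcount possibly clipped), or an errno value on failure.
 */
static int
sk_check_eof(struct sk_buf *bp, int64_t offset, int64_t media_size)
{
    int64_t avail;

    if (offset < 0 || offset > media_size)
        goto bad;
    avail = media_size - offset;

    if (bp->b_flags & SK_B_BNOCLIP) {
        /* Filesystem buffer: EOF or clipping is an error, b_bcount untouched. */
        if (avail == 0 || bp->b_bcount > avail)
            goto bad;
        return 0;
    }

    /* Raw device access: silently clip the request to EOF. */
    if (bp->b_bcount > avail)
        bp->b_bcount = avail;
    return 0;

bad:
    bp->b_error = EINVAL;        /* assumed error code */
    bp->b_resid = bp->b_bcount;  /* nothing was transferred */
    return EINVAL;
}
```

A 512-byte request at offset 768 of a 1000-byte device is clipped to 232 bytes when the flag is clear, and fails with b_bcount still 512 when the flag is set.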
- Clarify the definitions of b_bufsize, b_bcount, and b_resid. - Remove unnecessary assignments based on the clarified fields. - Add additional checks for premature EOF. b_bufsize is only used by buffer management entities such as getblk() and other vnode-backed buffer handling procedures. b_bufsize is not required for calls to vn_strategy() or dev_dstrategy(). A number of other subsystems use it to track the original request size. b_bcount is the I/O request size, but b_bcount is allowed to be truncated by the device chain if the request encompasses EOF (such as on a raw disk device). A caller which needs to record the original buffer size versus the EOF-truncated buffer can compare b_bcount after the I/O against a recorded copy of the original request size. This copy can be recorded in b_bufsize for unmanaged buffers (malloced or getpbuf()'d buffers). b_resid is always relative to b_bcount, not b_bufsize. A successful read that is truncated to the device EOF will thus have a b_resid of 0 and a truncated b_bcount.
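The relationship between the three fields can be sketched with two helpers, assuming an unmanaged buffer whose b_bufsize holds the recorded original request size as the entry above suggests. The structure and helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cut-down stand-in for the three fields discussed above. */
struct sk_buf {
    int64_t b_bufsize;  /* here: recorded copy of the original request size */
    int64_t b_bcount;   /* request size, possibly truncated at EOF by the device */
    int64_t b_resid;    /* bytes left untransferred, relative to b_bcount */
};

/* Bytes actually transferred: b_resid is relative to b_bcount, not b_bufsize. */
static int64_t
sk_bytes_done(const struct sk_buf *bp)
{
    return bp->b_bcount - bp->b_resid;
}

/* Detect an EOF truncation by comparing against the recorded request size. */
static bool
sk_truncated_at_eof(const struct sk_buf *bp)
{
    return bp->b_bcount < bp->b_bufsize;
}
```

For a 4096-byte read that the device truncated to 2048 bytes, b_resid is 0, sk_bytes_done() yields 2048, and sk_truncated_at_eof() reports the truncation.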
Remove buf->b_saveaddr, assert that vmapbuf() is only called on pbuf's. Pass the user pointer and length to vmapbuf() rather than having it try to pull the information out of the buffer. vmapbuf() is now responsible for setting b_data, b_bufsize, and b_bcount. Also fix a bug in cam_periph_mapmem(). The procedure was failing to unmap earlier vmapped bufs if later vmapbuf() calls in the loop failed.
Remove b_xflags. Fold BX_VNCLEAN and BX_VNDIRTY into b_flags as B_VNCLEAN and B_VNDIRTY. Remove BX_AUTOCHAINDONE and recode the swap pager to use one of the caller data fields in the BIO instead.
Replace the buffer cache's B_READ, B_WRITE, B_FORMAT, and B_FREEBUF b_flags with a separate b_cmd field. Use b_cmd to test for I/O completion as well (getting rid of B_DONE in the process). This further simplifies the setup required to issue a buffer cache I/O. Remove a redundant header file, bus/isa/i386/isa_dma.h and merge any discrepancies into bus/isa/isavar.h. Give ISADMA_READ/WRITE/RAW their own independent flag definitions instead of trying to overload them on top of B_READ, B_WRITE, and B_RAW. Add a routine isa_dmabp() which takes a struct buf pointer and returns the ISA dma flags associated with the operation. Remove the 'clear_modify' argument to vfs_busy_pages(). Instead, vfs_busy_pages() asserts that the buffer's b_cmd is valid and then uses it to determine the action it must take.
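The idea of a single command field that also encodes completion can be sketched as follows. The enumerator names and values are hypothetical stand-ins (the log itself only names the old B_* flags and, elsewhere, BUF_CMD_FLUSH), and the helpers are not the kernel's functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command values; "done" doubles as the completion test. */
typedef enum sk_buf_cmd {
    SK_BUF_CMD_DONE = 0,  /* no I/O in progress, or I/O has completed */
    SK_BUF_CMD_READ,
    SK_BUF_CMD_WRITE,
    SK_BUF_CMD_FORMAT,
    SK_BUF_CMD_FREEBUF
} sk_buf_cmd_t;

struct sk_buf {
    sk_buf_cmd_t b_cmd;
};

/* Issuing an I/O only requires setting the command, not juggling flag bits. */
static void
sk_start_io(struct sk_buf *bp, sk_buf_cmd_t cmd)
{
    assert(cmd != SK_BUF_CMD_DONE);  /* a valid command is required */
    bp->b_cmd = cmd;
}

/* Completion resets the command, which replaces a separate B_DONE flag. */
static void
sk_io_complete(struct sk_buf *bp)
{
    bp->b_cmd = SK_BUF_CMD_DONE;
}

static bool
sk_io_is_done(const struct sk_buf *bp)
{
    return bp->b_cmd == SK_BUF_CMD_DONE;
}
```

Because exactly one command is in effect at a time, code such as vfs_busy_pages() can switch on the command instead of taking an extra argument that restates it.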
Get rid of pbgetvp() and pbrelvp(). Instead fold the B_PAGING flag directly into getpbuf() (the only type of buffer that pbgetvp() could be called on anyway). Change related b_flags assignments from '=' to '|='. Get rid of remaining dependencies on b_vp. vn_strategy() now relies solely on the vp passed to it as an argument. Remove buffer cache code that sets b_vp for anonymous pbuf's. Add a stopgap 'vp' argument to vfs_busy_pages(). This is only really needed by NFS and the clustering code due to the severely hackish nature of the NFS and clustering code. Fix a bug in the ext2fs inode code where vfs_busy_pages() was being called on B_CACHE buffers. Add an assertion to vfs_busy_pages() to panic if it encounters a B_CACHE buffer.
Get rid of the remaining buffer background bitmap code. It's been turned off for a while, and it represents a fairly severe hack to the buffer cache code that just complicates further development.
Remove the buffer cache's B_PHYS flag. This flag was originally used as part of a severe hack to treat buffers containing 'user' addresses differently, in particular by using b_offset instead of b_blkno. Now that buffer cache buffers only HAVE b_offset (b_*blkno is gone for good), there is literally no difference between B_PHYS I/O and non-B_PHYS I/O once the buffer has been handed off to the device.
Move most references to the buffer cache array (buf) to kern/vfs_bio.c. Implement a procedure which scans all buffers, called scan_all_buffers(). Cleanup unused debugging code referencing buf.
Clean up the extended lookup features in the red-black tree code.
Major BUF/BIO work commit. Make I/O BIO-centric and specify the disk or file location with a 64 bit offset instead of a 32 bit block number. * All I/O is now BIO-centric instead of BUF-centric. * File/Disk addresses universally use a 64 bit bio_offset now. bio_blkno no longer exists. * Stackable BIO's hold disk offset translations. Translations are no longer overloaded onto a single structure (BUF or BIO). * bio_offset == NOOFFSET is now universally used to indicate that a translation has not been made. The old (blkno == lblkno) junk has all been removed. * There is no longer a distinction between logical I/O and physical I/O. * All driver BUFQs have been converted to BIOQs. * BMAP, FREEBLKS, getblk, bread, breadn, bwrite, inmem, cluster_*, and findblk all now take and/or return 64 bit byte offsets instead of block numbers. Note that BMAP now returns a byte range for the before and after variables.
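Two of the points above, 64 bit byte offsets and NOOFFSET as the "not yet translated" marker, can be sketched as follows. The `sk_` names are hypothetical, the value chosen for the sentinel is an assumption, and the translation shown (adding a base offset) is just the simplest possible mapping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for NOOFFSET; the exact value used here is an assumption. */
#define SK_NOOFFSET ((int64_t)-1)

/* One translation layer: holds the offset as seen by that layer. */
struct sk_bio {
    int64_t bio_offset;
};

/* Convert a logical block number into a 64 bit byte offset. */
static int64_t
sk_lblk_to_offset(int64_t lblkno, int32_t blksize)
{
    return lblkno * (int64_t)blksize;
}

/*
 * Translate an upper-layer offset into the lower layer if that has not been
 * done yet.  Returns true if a translation was performed, false if the
 * cached translation was reused.
 */
static bool
sk_translate(struct sk_bio *lower, int64_t upper_offset, int64_t base)
{
    if (lower->bio_offset != SK_NOOFFSET)
        return false;
    lower->bio_offset = base + upper_offset;
    return true;
}
```

Block 3000000 at a 16384-byte block size lies at byte offset 49152000000, which does not fit in 32 bits; byte offsets in a 64 bit field have no such problem.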
Replace the global buffer cache hash table with a per-vnode red-black tree. Add a B_HASHED b_flags bit as a sanity check. Remove the invalhash junk and replace with assertions in several cases where the buffer must already not be hashed. Get rid of incore() and gbincore() and replace with a new function called findblk(). Merge the new RB management with bgetvp(), the two are now fully integrated. Previous work has turned reassignbuf() into a mostly degenerate call, simplify its arguments and functionality to match. Remove an unnecessary reassignbuf() call from the NFS code. Get rid of pbreassignbuf(). Adjust the code in several places where it was assumed that calling BUF_LOCK() with LK_SLEEPFAIL after previously failing with LK_NOWAIT would always fail. This code was used to sleep before a retry. Instead, if the second lock unexpectedly succeeds, simply issue an unlock and retry anyway. Testing-by: Stefan Krueger <firstname.lastname@example.org>
buftimespinlock is utterly useless since the spinlock is released within lockmgr(). The only real problem was with lk_prio, which no longer exists, so get rid of the spin lock and document the remaining passive races.
Make the entire BUF/BIO system BIO-centric instead of BUF-centric. Vnode and device strategy routines now take a BIO and must pass that BIO to biodone(). All code which previously managed a BUF undergoing I/O now manages a BIO. The new BIO-centric algorithms allow BIOs to be stacked, where each layer represents a block translation, completion callback, or caller or device private data. This information is no longer overloaded within the BUF. Translation layer linkages remain intact as a 'cache' after I/O has completed. The VOP and DEV strategy routines no longer make assumptions as to which translated block number applies to them. They use the block number in the BIO specifically passed to them. Change the 'untranslated' constant to NOOFFSET (for bio_offset), and (daddr_t)-1 (for bio_blkno). Rip out all code that previously set the translated block number to the untranslated block number to indicate that the translation had not been made. Rip out all the cluster linkage fields for clustered VFS and clustered paging operations. Clustering now occurs in a private BIO layer using private fields within the BIO. Reformulate the vn_strategy() and dev_dstrategy() abstraction(s). These routines no longer assume that bp->b_vp == the vp of the VOP operation, and the dev_t is no longer stored in the struct buf. Instead, only the vp passed to vn_strategy() (and related *_strategy() routines for VFS ops), and the dev_t passed to dev_dstrategy() (and related *_strategy() routines for device ops) is used by the VFS or DEV code. This will allow an arbitrary number of translation layers in the future. Create an independent per-BIO tracking entity, struct bio_track, which is used to determine when I/O is in-progress on the associated device or vnode. NOTE: Unlike FreeBSD's BIO work, our struct BUF is still used to hold the fields describing the data buffer, resid, and error state. Major-testing-by: Stefan Krueger
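The stacking described above can be sketched with a fixed array of layers embedded in a buffer. This is a simplified model under stated assumptions: the layer count, the field set, the `sk_` function names, and the loop-style completion are all hypothetical, and only the ideas of pushing a layer, popping back to the previous one, per-layer offsets, and per-layer completion callbacks come from the log.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_NOOFFSET ((int64_t)-1)  /* assumed "not translated" marker */
#define SK_NLAYERS  3              /* hypothetical number of embedded layers */

struct sk_bio {
    struct sk_bio *bio_prev;              /* layer above, set by a push */
    struct sk_bio *bio_next;              /* layer below, or NULL */
    int64_t        bio_offset;            /* offset as seen by this layer */
    void         (*bio_done)(struct sk_bio *);
    int            done_calls;            /* illustration only */
};

struct sk_buf {
    struct sk_bio layers[SK_NLAYERS];
};

/* Link the embedded layers and mark every translation as not yet made. */
static void
sk_initbuf(struct sk_buf *bp)
{
    for (int i = 0; i < SK_NLAYERS; i++) {
        struct sk_bio *bio = &bp->layers[i];
        bio->bio_prev = NULL;
        bio->bio_next = (i + 1 < SK_NLAYERS) ? &bp->layers[i + 1] : NULL;
        bio->bio_offset = SK_NOOFFSET;
        bio->bio_done = NULL;
        bio->done_calls = 0;
    }
}

/* Move down one layer, remembering where we came from. */
static struct sk_bio *
sk_push_bio(struct sk_bio *bio)
{
    struct sk_bio *nbio = bio->bio_next;
    if (nbio != NULL)
        nbio->bio_prev = bio;
    return nbio;
}

/* Return the previous layer, or NULL at the top of the stack. */
static struct sk_bio *
sk_pop_bio(struct sk_bio *bio)
{
    return bio->bio_prev;
}

/* Complete I/O at the given layer and unwind through the layers above it. */
static void
sk_biodone(struct sk_bio *bio)
{
    while (bio != NULL) {
        if (bio->bio_done != NULL)
            bio->bio_done(bio);
        bio = sk_pop_bio(bio);
    }
}

/* Example completion callback: just count invocations. */
static void
sk_count_done(struct sk_bio *bio)
{
    bio->done_calls++;
}
```

After completion the lower layer still holds its translated offset, which is the "cache" behaviour the commit message mentions.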
Convert the lockmgr interlock from a token to a spinlock. This fixes a problem on SMP boxes where the MP lock would unexpectedly lose atomicity for a short period of time due to token acquisition. Add a tsleep_interlock() call which takes advantage of tsleep()'s cpu locality of reference to provide a helper function which allows us to atomically spin_unlock() and tsleep() in an MP safe manner with only a critical section. Basically all it does is set a cpumask bit for the ident hash index to cause other cpu's issuing a wakeup to notify our cpu. Any actual wakeup occurring during the race period after the spin_unlock but before the tsleep() call will be delayed by the critical section until after the tsleep has queued the thread. Cleanup some unused junk in vm_map.h.
Move bio_lblkno (logical blockno in a file) field back to its rightful place, which is in struct buf. Lower levels have no knowledge of this little critter. Suggested-by: dillon
Move the bswlist symbol into vm/vm_pager.c because PBUFs are the only consumer of the latter. The PBUF abstraction is just a clever hack, this code will be redone at some point so this measure is temporary.
BUF/BIO cleanup 7/99: First attempt at separating low-level information from BUF structure into the new BIO structure. The latter will be used to represent the actual I/O underlying the buffer cache, other subsystems and device drivers. Other information from the BUF structure will be moved eventually once their place in the grand scheme is determined. For now, preprocess macros have been added to reduce widespread changes; this is a temporary measure by all means until more of the BIO and BUF API is formalised. Remove compatibility preprocessor macros in the AAC driver because our BUF/BIO system is mutating; not to mention they were getting in the way. NB the name BIO has been used because it's quite appropriate and known among kernel developers from other operating system groups, be it BSD or Linux. This change should not have any operational affect (famous last words). Reviewed by: Matthew Dillon <email@example.com>
Put unused flag space definitions back to their original position in order to avoid confusion. Requested-by: Matt
Bring name of an unused flag field in line with the rest.
BUF/BIO cleanup 3/99: Retire the B_CALL flag in favour of checking the bp->b_iodone pointer directly, thus simplifying the BUF interface even more. Move scattered B_UNUSED* flag space definitions into one place, that is below the rest of the definitions.
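A minimal sketch of completion handling once B_CALL is gone: the presence of a callback is expressed by the pointer alone. The structure is a cut-down stand-in, and clearing the pointer before invoking the callback is an assumption about a reasonable implementation, not something stated in the commit message.

```c
#include <stddef.h>

/* Cut-down stand-in for struct buf; 'woken' and 'callbacks' are for
 * illustration only. */
struct sk_buf {
    void (*b_iodone)(struct sk_buf *);  /* NULL means "no callback" */
    int   woken;                        /* default path: waiter was woken */
    int   callbacks;                    /* number of callback invocations */
};

/* Complete an I/O.  No separate flag is consulted, only the pointer. */
static void
sk_biodone(struct sk_buf *bp)
{
    if (bp->b_iodone != NULL) {
        void (*fn)(struct sk_buf *) = bp->b_iodone;

        bp->b_iodone = NULL;  /* assumed: one-shot callback */
        fn(bp);
        return;
    }
    bp->woken = 1;            /* default: wake whoever waits on the buffer */
}

/* Example callback that records it was called. */
static void
sk_note_callback(struct sk_buf *bp)
{
    bp->callbacks++;
}
```

Dropping the flag removes one way for the two pieces of state to disagree, such as a set flag with a NULL pointer.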
BUF/BIO cleanup 2/99: Localise buffer queue information into kern/vfs_bio.c, it should not be messed with outside of the named file. Convert the QUEUE_* #defines into enum bufq_type, prefix the names with 'B'. Move vfs_bufstats() from kern/vfs_syscalls.c into kern/vfs_bio.c since that's where it should really belong, at least till its use is cleaned. Move bufqueues extern from sys/buf.h into kern/vfs_bio.c as it shouldn't be messed with by anything else. It was only sitting in sys/buf.h because of vfs_bufstats(). Note the change to initpbuf() is acceptable since they are a hack anyway, not to mention that the said function and friends should probably reside in kern/vfs_bio.c.
Implement Red-Black trees for the vnode clean/dirty buffer lists. Implement ranged fsyncs and adjust the syncer to use the new capability. This capability will also soon be used to replace the write_behind heuristic. Rewrite the fsync code for all VFSs to use the new APIs (generally simplifying them). Get rid of B_WRITEINPROG, it is no longer useful or needed. Get rid of B_SCANNED, it is no longer useful or needed. Rewrite the NFS 2-phase commit protocol to take advantage of the new Red-Black tree topology. Add RB_SCAN() for callback-scanning of Red-Black trees. Give RB_SCAN the ability to track the 'next' scan node and automatically fix it up if the callback directly or indirectly or through blocking indirectly deletes nodes in the tree while the scan is in progress. Remove most related loop restart conditions, they are no longer necessary. Disable filesystem background bitmap writes. This really needs to be solved a different way and the concept does not work well with red-black trees.
BUF/BIO stage 2: o Remove remaining source references to b_caller2 and b_driver2 field members of the BUF structure. o Remove b_caller2 and b_driver2 field members from the BUF structure. Discussed-with: Matthew Dillon <firstname.lastname@example.org>
Annotate the b_xio field member of the BUF structure.
BUF/BIO work, for removing the requirement of KVA mappings for I/O requests. Stage 1 of 8: o Replace the b_pages member of the BUF structure with an embedded XIO (b_xio). The XIO will be used for managing the BUF's page lists. o Initialize the XIO at two main (only) points: 1) the pbuf code, which is used by the NFS code to create a temporary buffer; and 2) bufinit(9), which is used by the rest of the BUF/BIO consumers. Discussed-with: Matthew Dillon <email@example.com>
Make buftimetoken an extern so it is not declared as a common variable. Modules were compiling up with their own local copy of buftimetoken rather than linking against the kernel's buftimetoken, causing modules to crash. Reported-by: Adam K Kirchhoff <firstname.lastname@example.org>
__P() != wanted, begin removal, in order to preserve white space this needs to be done by hand, as I accidentally killed a source tree that I had gotten this far on. I'm committing this now, LINT and GENERIC both build with these changes, there are many more to come.
DEV messaging stage 2/4: In this stage all DEV commands are now being funneled through the message port for action by the port's beginmsg function. CONSOLE and DISK device shims replace the port with their own and then forward to the original. FB (Frame Buffer) shims supposedly do the same thing but I haven't been able to test it. I don't expect instability in mainline code but there might be easy-to-fix, and some drivers still need to be converted. See primarily: kern/kern_device.c (new dev_*() functions and inherits cdevsw code from kern/kern_conf.c), sys/device.h, and kern/subr_disk.c for the high points. In this stage all DEV messages are still acted upon synchronously in the context of the caller. We cannot create a separate handler thread until the copyin's (primarily in ioctl functions) are made thread-aware. Note that the messaging shims are going to look rather messy in these early days but as more subsystems are converted over we will begin to use pre-initialized messages and message forwarding to avoid having to constantly rebuild messages prior to use. Note that DEV itself is a mess owing to its 4.x roots and will be cleaned up in subsequent passes. e.g. the way sub-devices inherit the main device's cdevsw was always a bad hack and it still is, and several functions (mmap, kqfilter, psize, poll) return results rather than error codes, which will be fixed since now we have a message to store the result in :-)
MP Implementation 1/2: Get the APIC code working again, sweetly integrate the MP lock into the LWKT scheduler, replace the old simplelock code with tokens or spin locks as appropriate. In particular, the vnode interlock (and most other interlocks) are now tokens. Also clean up a few curproc/cred sequences that are no longer needed. The APs are left in degenerate state with non IPI interrupts disabled as additional LWKT work must be done before we can really make use of them, and FAST interrupts are not managed by the MP lock yet. The main thing for this stage was to get the system working with an APIC again. buildworld tested on UP and 2xCPU/MP (Dell 2550)
proc->thread stage 5: BUF/VFS clearance! Remove the ucred argument from vop_close, vop_getattr, vop_fsync, and vop_createvobject. These VOPs can be called from multiple contexts so the cred is fairly useless, and UFS ignores it anyway. For filesystems (like NFS) that sometimes need a cred we use proc0.p_ucred for now. This removal also removed the need for a 'proc' reference in the related VFS procedures, which greatly helps our proc->thread conversion. bp->b_wcred and bp->b_rcred have also been removed, and for the same reason. It makes no sense to have a particular cred when multiple users can access a file. This may create issues with certain types of NFS mounts but if it does we will solve them in a way that doesn't pollute the struct buf.
thread stage 5: Separate the inline functions out of sys/buf.h, creating sys/buf2.h (A methodology that will continue as time passes). This solves inline vs struct ordering problems. Do a major cleanup of the globaldata access methodology. Create a gcc-cacheable 'mycpu' macro & inline to access per-cpu data. Atomicity is not required because we will never change cpus out from under a thread, even if it gets preempted by an interrupt thread, because we want to be able to implement per-cpu caches that do not require locked bus cycles or special instructions.
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids. Most ids have been removed from !lint sections and moved into comment sections.
import from FreeBSD RELENG_4 22.214.171.124