Up to [DragonFly] / src / sys / bus / cam / scsi
Keyword substitution: kv
Default branch: MAIN
MFC CAM fixes for the 2.0 release.
Fix multiple bugs in CAM related devices which go away unexpectedly. This fixes numerous panics when pulling a USB mass media device in the midst of heavy I/O.
* The SIM lock was being unlocked via the periph->sim path after periph was unheld. periph can become free and blow up the unlock, so get the sim into a local variable first, then release periph.
* The code which waits for CCB completion needs to be a while loop, not an if. It worked anyway, but wasn't very robust.
* Add CAM_SIM_DEREGISTERED to flag when a sim is undergoing deregistration.
* Beef up cam_dead_sim so it works more like a real sim.
* Properly install &cam_dead_sim in the device and periph structures related to a SCSI bus, when deregistering the bus.
* Disallow the addition of new devices when deregistering a bus.
* NULL out periph->softc when freeing it.
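The first two items above can be sketched in plain userland C. This is only a model under stated assumptions: every `model_*` type and function is hypothetical and stands in for the real CAM periph, SIM and CCB code, which is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Userland model only: these are hypothetical types, not the CAM structures. */
struct model_sim {
    int lock_depth;                 /* > 0 while the SIM lock is held */
};

struct model_periph {
    struct model_sim *sim;
    int refcount;
};

struct model_ccb {
    int done;                       /* set once the request has completed */
    int wakeups;                    /* how many times the waiter was woken */
};

static void model_sim_unlock(struct model_sim *sim)
{
    assert(sim->lock_depth > 0);
    sim->lock_depth--;
}

/* Drop one reference; the periph is freed when the last one goes away. */
static void model_periph_release(struct model_periph *periph)
{
    if (--periph->refcount == 0)
        free(periph);
}

/*
 * Copy the sim pointer out first, release the periph, then unlock through
 * the local copy. Reading periph->sim after the release could touch freed
 * memory.
 */
static void model_release_and_unlock(struct model_periph *periph)
{
    struct model_sim *sim = periph->sim;

    model_periph_release(periph);
    model_sim_unlock(sim);
}

/* Simulated sleep: the first two wakeups are spurious, the third is real. */
static void model_sleep(struct model_ccb *ccb)
{
    if (++ccb->wakeups >= 3)
        ccb->done = 1;
}

/* Re-check the condition after every wakeup; an `if` would return early. */
static int model_wait_for_ccb(struct model_ccb *ccb)
{
    while (!ccb->done)
        model_sleep(ccb);
    return ccb->wakeups;
}

/* Returns the SIM lock depth after the last periph reference is dropped. */
static int model_release_demo(void)
{
    struct model_sim sim = { 1 };   /* lock already held */
    struct model_periph *periph = malloc(sizeof(*periph));

    if (periph == NULL)
        return -1;
    periph->sim = &sim;
    periph->refcount = 1;
    model_release_and_unlock(periph);
    return sim.lock_depth;
}

/* Returns how many wakeups it took before the CCB was really done. */
static int model_wait_demo(void)
{
    struct model_ccb ccb = { 0, 0 };

    return model_wait_for_ccb(&ccb);
}
```

In the model, the unlock never touches the periph after its last reference is gone, and the waiter only returns once the completion flag is actually set.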
Sync CAM with FreeBSD using lockmgr locks instead of mutexes. Note: This is mostly a code sync with FreeBSD which improves stability in addition to the items listed below. This provides a framework for releasing the mplock, but for now it's still there. Add an xpt_print function to reduce most of the xpt_print_path/printf pairs. Convert the core code to use it. Initial cut at Basic Domain Validation. Make cam_xpt's pronouncements match camcontrol (Tagged -> Command) Queueing. Pay attention to return value from xpt_bus_register in xpt_init. Add an xpt_rescan function and a thread that will field rescan requests. The purpose of this is to allow a SIM (or other entities) to request a bus rescan and have it then fielded in a different (process) context from the caller. Check the return value from cam_periph_acquire. Drop the periph/sim lock when calling disk_destroy(). Drop the topology lock before calling the periph oninvalidate and dtor vectors. For the XPT_SASYNC_CB operation, only decouple the broadcast to the bus and device lists instead of decoupling the whole operation. This avoids problems with SIMs going away. Split the camisr into per-SIM done queues. This optimizes the locking a little bit and allows for direct dispatch of the doneq from certain contexts that would otherwise face recursive locking problems. Zero the CCBs when mallocing them. Only schedule the xpt_finishconfig_task once. Eliminate the use of M_TEMP. Add a helper function for registering async callbacks. Release the bus reference that is acquired when doing a CAMIOCOMMAND ioctl. Zero scsi_readcapacity allocations so we can really tell if there has been data returned. Remove duplicate includes and fix typos. Add a bunch of definitions and structures to support newer drivers. When probing a newly found device, don't automatically assume that the device supports retrieving a serial number. 
Instead, first query the list of VPD pages it does support, and only query the serial number if it's supported, else silently move on. This eliminates a lot of noise during verbose booting, and will likely eliminate the need for most NOSERIAL quirks. Reduce diffs from FreeBSD. Obtained-from: FreeBSD
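The probe change described above (check the supported VPD page list before asking for the serial number) can be illustrated with a small standalone parser. This is a sketch, not the code from the commit: the function names are hypothetical, and it assumes the simple layout where byte 1 is the page code, byte 3 is the list length and the page codes start at byte 4. Page codes 0x00 (supported pages) and 0x80 (unit serial number) are the standard SCSI values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VPD_SUPPORTED_PAGES 0x00    /* "supported VPD pages" page code */
#define VPD_UNIT_SERIAL     0x80    /* "unit serial number" page code */

/*
 * Return 1 if `page` is listed in a Supported VPD Pages response.
 * Assumed layout: byte 1 = page code, byte 3 = list length,
 * bytes 4.. = supported page codes.
 */
static int vpd_page_supported(const uint8_t *buf, size_t len, uint8_t page)
{
    size_t i, count;

    assert(buf != NULL);
    if (len < 4 || buf[1] != VPD_SUPPORTED_PAGES)
        return 0;
    count = buf[3];
    if (count > len - 4)
        count = len - 4;            /* never read past what was returned */
    for (i = 0; i < count; i++) {
        if (buf[4 + i] == page)
            return 1;
    }
    return 0;
}

/* Hypothetical probe step: only ask for the serial number if it is listed. */
static int should_query_serial(const uint8_t *buf, size_t len)
{
    return vpd_page_supported(buf, len, VPD_UNIT_SERIAL);
}

/* Device that lists pages 0x00, 0x80 and 0x83. */
static int demo_with_serial(void)
{
    static const uint8_t resp[] = { 0x00, 0x00, 0x00, 0x03, 0x00, 0x80, 0x83 };

    return should_query_serial(resp, sizeof(resp));
}

/* Device that lists only pages 0x00 and 0x83: silently skip the serial. */
static int demo_without_serial(void)
{
    static const uint8_t resp[] = { 0x00, 0x00, 0x00, 0x02, 0x00, 0x83 };

    return should_query_serial(resp, sizeof(resp));
}

/* Response cut short after the header: treated as "not supported". */
static int demo_truncated(void)
{
    static const uint8_t resp[] = { 0x00, 0x00, 0x00, 0x03 };

    return should_query_serial(resp, sizeof(resp));
}
```

A device that does not list page 0x80 is simply skipped, which is the "silently move on" behaviour the log describes.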
Sync quirk tables with FreeBSD. Obtained-from: FreeBSD
Incorporate the O_NONBLOCK open semantics of Linux and Solaris. Obtained-from: FreeBSD
Fine-grained malloc statistics - replace some M_DEVBUF with module-specific types. Obtained-from: FreeBSD
Make sure we look at the correct sub op codes when deciding whether it's an operation we can perform via the control device. Obtained-from: FreeBSD
Avoid use after free. Obtained-from: FreeBSD
Spelling fix in comment. Obtained-from: FreeBSD
Only set single initiator buffered mode if we've recorded in our softc that we should set it. Obtained-from: FreeBSD
Minor debug output changes. Also, the previous fallthrough was not intentional, so move the code around to perform correctly. Obtained-from: FreeBSD
Remove unused variables. Remove break after return. Add XXX comment where intent is unclear. Obtained-from: FreeBSD
panic() shouldn't have a \n. Obtained-from: FreeBSD
Rewrite of the CAM error recovery code. Some of the major changes include:
- The SCSI error handling portion of cam_periph_error() has been broken out into a number of subfunctions to better modularize the code that handles the hierarchy of SCSI errors. As a result, the code is now much easier to read.
- String handling and error printing has been significantly revamped. We now use sbufs to do string formatting instead of using printfs (for the kernel) and snprintf/strncat (for userland) as before. There is a new catchall error printing routine, cam_error_print() and its string-based counterpart, cam_error_string() that allow the kernel and userland applications to pass in a CCB and have errors printed out properly, whether or not they're SCSI errors. Among other things, this helped eliminate a fair amount of duplicate code in camcontrol. We now print out more information than before, including the CAM status and SCSI status and the error recovery action taken to remedy the problem.
Obtained-from: FreeBSD
Don't use /dev/rXXX names. Obtained-from: FreeBSD
Change the peripheral driver list from a linker set to module driven driver registration. This should allow things like da, sa, cd etc to be in separate KLDs to the cam core and make them preloadable. Obtained-from: FreeBSD
Rename printf -> kprintf in sys/ and add some defines where necessary (files which are used in userland, too).
Change the kernel dev_t, representing a pointer to a specinfo structure, to cdev_t. Change struct specinfo to struct cdev. The name 'cdev' was taken from FreeBSD. Remove the dev_t shim for the kernel. This commit generally removes the overloading of 'dev_t' between userland and the kernel. Also fix a bug in libkvm where a kernel dev_t (now cdev_t) was not being properly converted to a userland dev_t.
Rename malloc->kmalloc, free->kfree, and realloc->krealloc. Pass 1
MASSIVE reorganization of the device operations vector. Change cdevsw to dev_ops. dev_ops is a syslink-compatible operations vector structure similar to the vop_ops structure used by vnodes. Remove a huge number of instances where a thread pointer is still being passed as an argument to various device ops and other related routines. The device OPEN and IOCTL calls now take a ucred instead of a thread pointer, and the CLOSE call no longer takes a thread pointer.
Replace the buffer cache's B_READ, B_WRITE, B_FORMAT, and B_FREEBUF b_flags with a separate b_cmd field. Use b_cmd to test for I/O completion as well (getting rid of B_DONE in the process). This further simplifies the setup required to issue a buffer cache I/O. Remove a redundant header file, bus/isa/i386/isa_dma.h and merge any discrepancies into bus/isa/isavar.h. Give ISADMA_READ/WRITE/RAW their own independent flag definitions instead of trying to overload them on top of B_READ, B_WRITE, and B_RAW. Add a routine isa_dmabp() which takes a struct buf pointer and returns the ISA dma flags associated with the operation. Remove the 'clear_modify' argument to vfs_busy_pages(). Instead, vfs_busy_pages() asserts that the buffer's b_cmd is valid and then uses it to determine the action it must take.
b_resid and b_bcount are int, so use %d.
Make the entire BUF/BIO system BIO-centric instead of BUF-centric. Vnode and device strategy routines now take a BIO and must pass that BIO to biodone(). All code which previously managed a BUF undergoing I/O now manages a BIO. The new BIO-centric algorithms allow BIOs to be stacked, where each layer represents a block translation, completion callback, or caller or device private data. This information is no longer overloaded within the BUF. Translation layer linkages remain intact as a 'cache' after I/O has completed. The VOP and DEV strategy routines no longer make assumptions as to which translated block number applies to them. They use the block number in the BIO specifically passed to them. Change the 'untranslated' constant to NOOFFSET (for bio_offset), and (daddr_t)-1 (for bio_blkno). Rip out all code that previously set the translated block number to the untranslated block number to indicate that the translation had not been made. Rip out all the cluster linkage fields for clustered VFS and clustered paging operations. Clustering now occurs in a private BIO layer using private fields within the BIO. Reformulate the vn_strategy() and dev_dstrategy() abstraction(s). These routines no longer assume that bp->b_vp == the vp of the VOP operation, and the dev_t is no longer stored in the struct buf. Instead, only the vp passed to vn_strategy() (and related *_strategy() routines for VFS ops), and the dev_t passed to dev_dstrategy() (and related *_strategy() routines for device ops) is used by the VFS or DEV code. This will allow an arbitrary number of translation layers in the future. Create an independent per-BIO tracking entity, struct bio_track, which is used to determine when I/O is in-progress on the associated device or vnode. NOTE: Unlike FreeBSD's BIO work, our struct BUF is still used to hold the fields describing the data buffer, resid, and error state. Major-testing-by: Stefan Krueger
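The layering idea in this entry can be pictured with a small userland model. It is a sketch under loud assumptions: the `model_*` names, the fixed three-layer array and the way completion unwinds every layer in one call are all simplifications invented for illustration, not the kernel's struct bio or biodone().

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NOOFFSET (-1LL)       /* stand-in for the "untranslated" marker */
#define MODEL_LAYERS   3

struct model_bio {
    struct model_bio *prev;         /* layer above us, NULL at the top */
    long long offset;               /* this layer's own translated offset */
    void (*done)(struct model_bio *);
};

struct model_buf {
    struct model_bio layers[MODEL_LAYERS];  /* layers[0] is the logical layer */
    int resid;                      /* data state stays in the buf itself */
};

static int model_callbacks_run;

static void model_count_done(struct model_bio *bio)
{
    (void)bio;
    model_callbacks_run++;
}

static void model_buf_init(struct model_buf *bp, long long logical_offset)
{
    int i;

    for (i = 0; i < MODEL_LAYERS; i++) {
        bp->layers[i].prev = (i > 0) ? &bp->layers[i - 1] : NULL;
        bp->layers[i].offset = MODEL_NOOFFSET;
        bp->layers[i].done = NULL;
    }
    bp->layers[0].offset = logical_offset;
    bp->resid = 0;
}

/* Return the next layer down; translate by `delta` only if not yet cached. */
static struct model_bio *model_push_bio(struct model_buf *bp,
                                        struct model_bio *bio, long long delta)
{
    ptrdiff_t idx = bio - bp->layers;
    struct model_bio *next;

    assert(idx >= 0 && idx + 1 < MODEL_LAYERS);
    next = &bp->layers[idx + 1];
    if (next->offset == MODEL_NOOFFSET)
        next->offset = bio->offset + delta;
    return next;
}

/* Simplified completion: unwind toward the top, running each callback. */
static void model_biodone(struct model_bio *bio)
{
    for (; bio != NULL; bio = bio->prev) {
        if (bio->done != NULL)
            bio->done(bio);
    }
}

/* Logical offset 100, shifted by a 1000-unit partition start, then passed on. */
static long long model_translate_demo(void)
{
    struct model_buf buf;
    struct model_bio *bio;

    model_buf_init(&buf, 100);
    bio = model_push_bio(&buf, &buf.layers[0], 1000);
    bio = model_push_bio(&buf, bio, 0);
    return bio->offset;
}

/* A second push from the same layer reuses the cached translation. */
static long long model_cache_demo(void)
{
    struct model_buf buf;

    model_buf_init(&buf, 100);
    (void)model_push_bio(&buf, &buf.layers[0], 1000);
    return model_push_bio(&buf, &buf.layers[0], 5000)->offset;
}

/* Two layers carry callbacks; completing the bottom layer runs both. */
static int model_done_demo(void)
{
    struct model_buf buf;
    struct model_bio *bio;

    model_buf_init(&buf, 0);
    buf.layers[0].done = model_count_done;
    buf.layers[1].done = model_count_done;
    bio = model_push_bio(&buf, &buf.layers[0], 0);
    bio = model_push_bio(&buf, bio, 0);
    model_callbacks_run = 0;
    model_biodone(bio);
    return model_callbacks_run;
}
```

Each layer keeps its own offset, so no layer has to guess which translated block number applies to it, and the untranslated marker doubles as the "not cached yet" test.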
* Move function types to a separate line.
* Ansify function definitions.
* Remove (void) casts for discarded return values.
In collaboration with: Alexey Slynko <firstname.lastname@example.org>
Remove spl*() calls from the bus/ infrastructure, replacing them with critical sections. Remove splusb() from everywhere, replacing it with critical sections.
Device layer rollup commit.
* cdevsw_add() is now required. cdevsw_add() and cdevsw_remove() may specify a mask/match indicating the range of supported minor numbers. Multiple cdevsw_add()'s using the same major number, but distinctly different ranges, may be issued. All devices that failed to call cdevsw_add() before now do.
* cdevsw_remove() now automatically marks all devices within its supported range as being destroyed.
* vnode->v_rdev is no longer resolved when the vnode is created. Instead, only v_udev (a newly added field) is resolved. v_rdev is resolved when the vnode is opened and cleared on the last close.
* A great deal of code was making rather dubious assumptions with regards to the validity of devices associated with vnodes, primarily due to the persistence of a device structure due to being indexed by (major, minor) instead of by (cdevsw, major, minor). In particular, if you run a program which connects to a USB device and then you pull the USB device and plug it back in, the vnode subsystem will continue to believe that the device is open when, in fact, it isn't (because it was destroyed and recreated). In particular, note that all the VFS mount procedures now check devices via v_udev instead of v_rdev prior to calling VOP_OPEN(), since v_rdev is NULL prior to the first open.
* The disk layer's device interaction has been rewritten. The disk layer (i.e. the slice and disklabel management layer) no longer overloads its data onto the device structure representing the underlying physical disk. Instead, the disk layer uses the new cdevsw_add() functionality to register its own cdevsw using the underlying device's major number, and simply does NOT register the underlying device's cdevsw. No confusion is created because the device hash is now based on (cdevsw,major,minor) rather than (major,minor). NOTE: This also means that underlying raw disk devices may use the entire device minor number instead of having to reserve the bits used by the disk layer, and also means that we can (theoretically) stack a fully disklabel-supported 'disk' on top of any block device.
* The new reference counting scheme prevents this by associating a device with a cdevsw and disconnecting the device from its cdevsw when the cdevsw is removed. Additionally, all udev2dev() lookups run through the cdevsw mask/match and only successfully find devices still associated with an active cdevsw.
* Major work on MFS: MFS no longer shortcuts vnode and device creation. It now creates a real vnode and a real device and implements real open and close VOPs. Additionally, due to the disk layer changes, MFS is no longer limited to 255 mounts. The new limit is 16 million. Since MFS creates a real device node, mount_mfs will now create a real /dev/mfs<PID> device that can be read from userland (e.g. so you can dump an MFS filesystem).
* BUF AND DEVICE STRATEGY changes. The struct buf contains a b_dev field. In order to properly handle stacked devices we now require that the b_dev field be initialized before the device strategy routine is called. This required some additional work in various VFS implementations. To enforce this requirement, biodone() now sets b_dev to NODEV. The new disk layer will adjust b_dev before forwarding a request to the actual physical device.
* A bug in the ISO CD boot sequence which resulted in a panic has been fixed.
Testing by: lots of people, but David Rhodus found the most egregious bugs.
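The mask/match lookup described in this entry can be modelled in a few lines of userland C. Everything here is an assumption made for illustration: the `model_*` names, the flat table and the example major number 13 are hypothetical and do not reflect the real cdevsw_add() signature or data structures.

```c
#include <assert.h>
#include <stddef.h>

struct model_devsw {
    const char *name;
    int major;
    unsigned mask;                  /* which minor bits are compared */
    unsigned match;                 /* required value of those bits */
    int active;                     /* cleared when the switch is removed */
};

#define MODEL_MAX_SW 8
static struct model_devsw *model_table[MODEL_MAX_SW];
static size_t model_count;

static void model_devsw_add(struct model_devsw *sw)
{
    assert(model_count < MODEL_MAX_SW);
    sw->active = 1;
    model_table[model_count++] = sw;
}

/* Removal disconnects every minor in the range in one step. */
static void model_devsw_remove(struct model_devsw *sw)
{
    sw->active = 0;
}

/* Lookups only succeed for minors covered by a still-active switch. */
static struct model_devsw *model_lookup(int major, unsigned minor)
{
    size_t i;

    for (i = 0; i < model_count; i++) {
        struct model_devsw *sw = model_table[i];

        if (sw->active && sw->major == major &&
            (minor & sw->mask) == sw->match)
            return sw;
    }
    return NULL;
}

/*
 * Two switches share major 13 but own different minor ranges:
 * "low" takes minors with bit 0x80 clear, "high" takes those with it set.
 * Returns 0 for no device, 1 for "low", 2 for "high".
 */
static int model_demo(unsigned minor, int remove_high)
{
    static struct model_devsw low  = { "low",  13, 0x80, 0x00, 0 };
    static struct model_devsw high = { "high", 13, 0x80, 0x80, 0 };
    struct model_devsw *found;

    model_count = 0;
    model_devsw_add(&low);
    model_devsw_add(&high);
    if (remove_high)
        model_devsw_remove(&high);
    found = model_lookup(13, minor);
    if (found == NULL)
        return 0;
    return (found == &low) ? 1 : 2;
}
```

Once a switch is removed, a lookup for one of its minors fails instead of returning a stale device, which is the property the pulled-USB-device scenario needs.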
device switch 1/many: Remove d_autoq, add d_clone (where d_autoq was). d_autoq was used to allow the device port dispatch to mix old-style synchronous calls with new style messaging calls within a particular device. It was never used for that purpose. d_clone will be more fully implemented as work continues. We are going to install d_port in the dev_t (struct specinfo) structure itself and d_clone will be needed to allow devices to 'revector' the port on a minor-number by minor-number basis, in particular allowing minor numbers to be directly dispatched to distinct threads. This is something we will be needing later on.
Do some M_WAITOK<->M_INTWAIT cleanups. Code entered from userland, such as device open and device ioctl, generally use M_WAITOK, while low level structures such as the capacity structure are allocated using M_INTWAIT.
Change M_NOWAIT to M_INTWAIT or M_WAITOK. CAM does a mediocre job checking for NULL returns from malloc() and even when it does it generally causes the device operation to fail instead of retrying, resulting in unacceptable behavior. M_NOWAIT semantics allow NULL to be returned during normal system operation. This is especially true in DragonFly. Also remove much of the code that previously checked for NULL. By using M_INTWAIT or M_WAITOK, malloc() will panic rather than return NULL. Only the addition of M_NULLOK allows a blocking malloc() to return NULL, and we do not use that flag in CAM. Add M_ZERO to a number of malloc()'s and remove subsequent bzero()'s, and add M_ZERO to a few mallocs (primarily for the read capacity data structure) that did not bother zeroing out the structure before. While the data is supposed to be overwritten, read-capacity is often quite fragile due to the SCSI simulation layer, so we do not take any chances.
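A userland model can make the flag semantics in this entry concrete. It is a sketch only: the `MODEL_M_*` flag values, `model_malloc()` and the failure hook are hypothetical, abort() merely stands in for a kernel panic, and the eight-byte structure is an illustrative stand-in for the read capacity data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag values for this model; the kernel's values differ. */
#define MODEL_M_NOWAIT 0x1
#define MODEL_M_WAITOK 0x2
#define MODEL_M_ZERO   0x4

static int model_fail_next;         /* test hook: fail the next allocation */

static void *model_malloc(size_t size, int flags)
{
    void *p = NULL;

    assert(size > 0);
    if (model_fail_next)
        model_fail_next = 0;
    else
        p = malloc(size);
    if (p == NULL) {
        if (flags & MODEL_M_WAITOK)
            abort();                /* stands in for panic(): never NULL */
        return NULL;                /* non-blocking callers must check */
    }
    if (flags & MODEL_M_ZERO)
        memset(p, 0, size);         /* replaces a separate bzero() call */
    return p;
}

/* Illustrative stand-in for the read capacity data structure. */
struct model_rcap {
    unsigned char addr[4];
    unsigned char length[4];
};

/* With the zero flag, stale bytes can never pass for device data. */
static int model_rcap_is_zeroed(void)
{
    static const struct model_rcap zero;
    struct model_rcap *rcap;
    int ok;

    rcap = model_malloc(sizeof(*rcap), MODEL_M_WAITOK | MODEL_M_ZERO);
    ok = (memcmp(rcap, &zero, sizeof(*rcap)) == 0);
    free(rcap);
    return ok;
}

/* A non-blocking request may legitimately come back NULL. */
static int model_nowait_can_fail(void)
{
    void *p;

    model_fail_next = 1;
    p = model_malloc(16, MODEL_M_NOWAIT);
    if (p == NULL)
        return 1;
    free(p);
    return 0;
}
```

The blocking path needs no NULL check at the call site, which is why the commit could drop that code, while the non-blocking path shows the failure mode the old callers handled poorly.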
Do some fairly major include file cleanups to further separate kernelland from userland.
* Do not allow userland to include sys/proc.h directly, it must use sys/user.h instead. This is because sys/proc.h has a huge number of kernel header file dependencies.
* Do cleanups and work in lwkt_thread.c and lwkt_msgport.c to allow these files to be directly compiled in an upcoming userland thread support library.
* sys/lock.h is inappropriately included by a number of third party programs so we can't disallow its inclusion, but do not include any kernel structures unless _KERNEL or _KERNEL_STRUCTURES are defined.
* <ufs/ufs/inode.h> is often included by userland to get at the on-disk inode structure. Only include the on-disk components and do not include kernel structural components unless _KERNEL or _KERNEL_STRUCTURES is defined.
* Various usr.bin programs include sys/proc.h unnecessarily.
* The slab allocator has no concept of malloc buckets. Remove malloc buckets structures and VMSTAT support from the system.
* Make adjustments to sys/thread.h and sys/msgport.h such that the upcoming userland thread support library can include these files directly rather than copy them.
* Use low level __int types in sys/globaldata.h, sys/msgport.h, sys/slaballoc.h, sys/thread.h, and sys/malloc.h, instead of high level sys/types.h types, reducing include dependencies.
kernel tree reorganization stage 1: Major cvs repository work (not logged as commits) plus a major reworking of the #include's to accommodate the relocations.
* CVS repository files manually moved. Old directories left intact and empty (temporary).
* Reorganize all filesystems into vfs/, most devices into dev/, sub-divide devices by function.
* Begin to move device-specific architecture files to the device subdirs rather than throwing them all into, e.g. i386/include.
* Reorganize files related to system busses, placing the related code in a new bus/ directory. Also move cam to bus/cam though this may not have been the best idea in retrospect.
* Reorganize emulation code and place it in a new emulation/ directory.
* Remove the -I- compiler option in order to allow #include file localization, rename all config generated X.h files to use_X.h to clean up the conflicts.
* Remove /usr/src/include (or /usr/include) dependencies during the kernel build, beyond what is normally needed to compile helper programs.
* Make config create 'machine' softlinks for architecture specific directories outside of the standard <arch>/include.
* Bump the config rev.
WARNING! after this commit /usr/include and /usr/src/sys/compile/* should be regenerated from scratch.
DEV messaging stage 1/4: Rearrange struct cdevsw and add a message port and auto-queueing mask. The mask will tell us which message functions can be safely queued to another thread and which still need to run in the context of the caller. Primary configuration fields (name, cmaj, flags, port, autoq mask) are now at the head of the structure. Function vectors, which may eventually go away, are at the end. The port and autoq fields are non-functional in this stage. The old BDEV device major number support has also been removed from cdevsw, and code has been added to translate the bootdev passed from the boot code (the boot code has always passed the now defunct block device major numbers and we obviously need to keep that compatibility intact).
Remove the priority part of the priority|flags argument to tsleep(). Only flags are passed now. The priority was a user scheduler thingy that is not used by the LWKT subsystem. For process statistics assume sleeps without P_SINTR set to be disk-waits, and sleeps with it set to be normal sleeps. This commit should not contain any operational changes.
proc->thread stage 2: MAJOR revamping of system calls, ucred, jail API, and some work on the low level device interface (proc arg -> thread arg). As -current did, I have removed p_cred and incorporated its functions into p_ucred. p_prison has also been moved into p_ucred and adjusted accordingly. The jail interface tests now use ucreds rather than processes. The syscall(p,uap) interface has been changed to just (uap). This is inclusive of the emulation code. It makes little sense to pass a proc pointer around which confuses the MP readability of the code, because most system call code will only work with the current process anyway. Note that eventually *ALL* syscall emulation code will be moved to a kernel-protected userland layer because it really makes no sense whatsoever to implement these emulations in the kernel. suser() now takes no arguments and only operates with the current process. The process argument has been removed from suser_xxx() so it now just takes a ucred and flags. The sysctl interface was adjusted somewhat.
thread stage 5: Separate the inline functions out of sys/buf.h, creating sys/buf2.h (A methodology that will continue as time passes). This solves inline vs struct ordering problems. Do a major cleanup of the globaldata access methodology. Create a gcc-cacheable 'mycpu' macro & inline to access per-cpu data. Atomicity is not required because we will never change cpus out from under a thread, even if it gets preempted by an interrupt thread, because we want to be able to implement per-cpu caches that do not require locked bus cycles or special instructions.
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids. Most ids have been removed from !lint sections and moved into comment sections.
import from FreeBSD RELENG_4 220.127.116.11