Up to [DragonFly] / src / sys / dev / disk / ata
Keyword substitution: kv
Default branch: MAIN
Implement raw extensions for WHOLE_DISK_SLICE device accesses for acd0. Disallow special accesses on devices that do not support the extensions. Implement direct track reading via /dev/acd0 or /dev/acd0t* (use MAKEDEV acd0t to create per-track devices). Fix a few bugs with the minor device numbers generated by MAKEDEV for /dev/acd*. /dev/acd0a and /dev/acd0c were improperly specifying the WHOLE_DISK_SLICE instead of the compatibility slice. Change all mountroot operations that were trying to access disks via RAW_PART to instead access them via WHOLE_SLICE_PART (removing more dependencies on the old disklabel structure). Replace the unconditional sector sanity check in dsopen() with better sanity checks in dscheck(). The checks are not made for special WHOLE_DISK_SLICE accesses, allowing weird sector sizes to feed through to the device.
The normal ATA driver is capable of handling 48 bit block addressing, but the blockaddr field in the ad_request structure was only 32 bits. Expand it to 64 bits. Note that NATA didn't have this problem.
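The bug fixed here is plain integer truncation: a 48-bit LBA stored in a 32-bit field silently loses its upper 16 bits, so any request beyond sector 2^32 (2 TiB with 512-byte sectors) lands at the wrong place. A minimal sketch; `struct old_request` and `struct new_request` are hypothetical stand-ins, not the real ad_request layout, and only the width of blockaddr matters.

```c
#include <stdint.h>

/* Hypothetical before/after request layouts; only the width of the
 * block address field is modeled. */
struct old_request { uint32_t blockaddr; };
struct new_request { uint64_t blockaddr; };

/* Largest address expressible with 48-bit LBA. */
#define LBA48_MAX ((UINT64_C(1) << 48) - 1)

/* Store an LBA in the old 32-bit field: the upper bits are dropped. */
static uint64_t store_old(uint64_t lba)
{
    struct old_request r;

    r.blockaddr = (uint32_t)lba;   /* same truncation the implicit conversion performs */
    return r.blockaddr;
}

/* Store an LBA in the widened 64-bit field: the value survives intact. */
static uint64_t store_new(uint64_t lba)
{
    struct new_request r;

    r.blockaddr = lba;
    return r.blockaddr;
}
```

An LBA of 0x100000005 comes back from the 32-bit field as 5, which is why a large disk could be read or written at a completely unrelated location.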
Continue untangling the disklabel. Have most disk device drivers fill out and install a generic disk_info structure instead of filling out random fields in the disklabel. The generic disk_info structure uses a 64 bit integer to represent the media size in bytes or total sector count.
Rename printf -> kprintf in sys/ and add some defines where necessary (files which are used in userland, too).
Do a major clean-up of the BUSDMA architecture. A large number of essentially machine-independent drivers use structures and definitions located in machine-dependent directories even though those definitions are really machine-independent in nature. Split <machine/bus_dma.h> into machine-dependent and machine-independent parts and make the primary access run through <sys/bus_dma.h>. Remove <machine/bus.h>, <machine/bus_memio.h> and <machine/bus_pio.h>. The optimizations related to bus_memio.h and bus_pio.h made a huge mess, introduced machine-specific knowledge into essentially machine-independent drivers, and required specific #include file orderings to do their job. They may be reintroduced in some other form later on. Move <machine/resource.h> to <sys/bus_resource.h>. The contents of the file are machine-independent or can be made a superset across many platforms. Make <sys/bus.h> include <sys/bus_dma.h> and <sys/bus_resource.h> and include <sys/bus.h> where necessary. Remove all #include's of <machine/resource.h> and <machine/bus.h>. That is, make the BUSDMA infrastructure integral to I/O-mapped and memory-mapped accesses to devices and remove a large chunk of machine-specific dependencies from drivers. bus_if.h and device_if.h are now required to be present when using <sys/bus.h>.
Change the kernel dev_t, representing a pointer to a specinfo structure, to cdev_t. Change struct specinfo to struct cdev. The name 'cdev' was taken from FreeBSD. Remove the dev_t shim for the kernel. This commit generally removes the overloading of 'dev_t' between userland and the kernel. Also fix a bug in libkvm where a kernel dev_t (now cdev_t) was not being properly converted to a userland dev_t.
Rename malloc->kmalloc, free->kfree, and realloc->krealloc. Pass 1
MASSIVE reorganization of the device operations vector. Change cdevsw to dev_ops. dev_ops is a syslink-compatible operations vector structure similar to the vop_ops structure used by vnodes. Remove a huge number of instances where a thread pointer is still being passed as an argument to various device ops and other related routines. The device OPEN and IOCTL calls now take a ucred instead of a thread pointer, and the CLOSE call no longer takes a thread pointer.
Replace the buffer cache's B_READ, B_WRITE, B_FORMAT, and B_FREEBUF b_flags with a separate b_cmd field. Use b_cmd to test for I/O completion as well (getting rid of B_DONE in the process). This further simplifies the setup required to issue a buffer cache I/O. Remove a redundant header file, bus/isa/i386/isa_dma.h, and merge any discrepancies into bus/isa/isavar.h. Give ISADMA_READ/WRITE/RAW their own independent flag definitions instead of trying to overload them on top of B_READ, B_WRITE, and B_RAW. Add a routine isa_dmabp() which takes a struct buf pointer and returns the ISA DMA flags associated with the operation. Remove the 'clear_modify' argument to vfs_busy_pages(). Instead, vfs_busy_pages() asserts that the buffer's b_cmd is valid and then uses it to determine the action it must take.
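The gist of the change is that the operation lives in one enumerated field and everything else, including ISA DMA direction and completion, is derived from it. A minimal sketch under stated assumptions: the enum names, the flag values, the `b_raw` field, and the `sk_*` helpers are all hypothetical and only mirror the idea behind b_cmd and isa_dmabp(), not their actual definitions.

```c
/* Hypothetical command field replacing overloaded b_flags bits. */
typedef enum {
    CMD_DONE = 0,   /* no I/O in progress; replaces testing a B_DONE bit */
    CMD_READ,
    CMD_WRITE,
    CMD_FORMAT,
    CMD_FREEBLKS
} sk_buf_cmd_t;

/* Independent ISA DMA flag bits (values are assumptions), no longer
 * aliased onto B_READ/B_WRITE/B_RAW. */
#define SK_ISADMA_WRITE 0x0001
#define SK_ISADMA_READ  0x0002
#define SK_ISADMA_RAW   0x0004

struct sk_buf {
    sk_buf_cmd_t b_cmd;
    int          b_raw;   /* hypothetical: nonzero for raw transfers */
};

/* isa_dmabp()-style helper: derive ISA DMA flags from the command.
 * Returns -1 for commands that have no DMA direction. */
static int sk_isa_dmabp(const struct sk_buf *bp)
{
    int flags;

    switch (bp->b_cmd) {
    case CMD_READ:
        flags = SK_ISADMA_READ;
        break;
    case CMD_WRITE:
        flags = SK_ISADMA_WRITE;
        break;
    default:
        return -1;
    }
    if (bp->b_raw)
        flags |= SK_ISADMA_RAW;
    return flags;
}

/* Completion test based on b_cmd instead of a separate flag bit. */
static int sk_buf_done(const struct sk_buf *bp)
{
    return bp->b_cmd == CMD_DONE;
}
```

Because the command is a single value rather than a set of bits, a buffer can no longer carry contradictory flags such as read and write at once.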
Major BUF/BIO work commit. Make I/O BIO-centric and specify the disk or file location with a 64 bit offset instead of a 32 bit block number.
* All I/O is now BIO-centric instead of BUF-centric.
* File/Disk addresses universally use a 64 bit bio_offset now. bio_blkno no longer exists.
* Stackable BIO's hold disk offset translations. Translations are no longer overloaded onto a single structure (BUF or BIO).
* bio_offset == NOOFFSET is now universally used to indicate that a translation has not been made. The old (blkno == lblkno) junk has all been removed.
* There is no longer a distinction between logical I/O and physical I/O.
* All driver BUFQs have been converted to BIOQs.
* BMAP, FREEBLKS, getblk, bread, breadn, bwrite, inmem, cluster_*, and findblk all now take and/or return 64 bit byte offsets instead of block numbers. Note that BMAP now returns a byte range for the before and after variables.
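Moving from block numbers to byte offsets is mostly a matter of doing the arithmetic in 64 bits and reserving one value to mean "not yet translated". A minimal sketch, assuming NOOFFSET is -1 (modeled by the hypothetical SK_NOOFFSET) and using int64_t in place of the kernel's off_t; the helper names are illustrative.

```c
#include <stdint.h>

/* Sentinel meaning "no translation has been made yet"; assumed to be -1
 * for this sketch. */
#define SK_NOOFFSET ((int64_t)-1)

/* Convert an old-style block number to a byte offset.  The multiply is
 * done in 64 bits so offsets past 4 GiB do not wrap. */
static int64_t blkno_to_offset(uint32_t blkno, uint32_t blksize)
{
    return (int64_t)blkno * (int64_t)blksize;
}

/* A translation is pending while the offset still holds the sentinel,
 * replacing the old (blkno == lblkno) convention. */
static int offset_is_translated(int64_t off)
{
    return off != SK_NOOFFSET;
}
```

Block 8388608 with 512-byte blocks maps to byte offset 4294967296, a value that would already overflow a 32-bit byte count.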
Make the entire BUF/BIO system BIO-centric instead of BUF-centric. Vnode and device strategy routines now take a BIO and must pass that BIO to biodone(). All code which previously managed a BUF undergoing I/O now manages a BIO. The new BIO-centric algorithms allow BIOs to be stacked, where each layer represents a block translation, completion callback, or caller or device private data. This information is no longer overloaded within the BUF. Translation layer linkages remain intact as a 'cache' after I/O has completed. The VOP and DEV strategy routines no longer make assumptions as to which translated block number applies to them. They use the block number in the BIO specifically passed to them. Change the 'untranslated' constant to NOOFFSET (for bio_offset), and (daddr_t)-1 (for bio_blkno). Rip out all code that previously set the translated block number to the untranslated block number to indicate that the translation had not been made. Rip out all the cluster linkage fields for clustered VFS and clustered paging operations. Clustering now occurs in a private BIO layer using private fields within the BIO. Reformulate the vn_strategy() and dev_dstrategy() abstraction(s). These routines no longer assume that bp->b_vp == the vp of the VOP operation, and the dev_t is no longer stored in the struct buf. Instead, only the vp passed to vn_strategy() (and related *_strategy() routines for VFS ops), and the dev_t passed to dev_dstrategy() (and related *_strategy() routines for device ops) is used by the VFS or DEV code. This will allow an arbitrary number of translation layers in the future. Create an independent per-BIO tracking entity, struct bio_track, which is used to determine when I/O is in-progress on the associated device or vnode. NOTE: Unlike FreeBSD's BIO work, our struct BUF is still used to hold the fields describing the data buffer, resid, and error state. Major-testing-by: Stefan Krueger
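The key property of stacked BIOs is that each layer carries its own offset, so a translation (for example, adding a slice's starting offset) never overwrites the caller's view. A simplified model, assuming a small fixed array of layers embedded in the buffer; `struct sk_bio`, `sk_push_bio()`, and the slice helper are hypothetical and much smaller than the real structures.

```c
#include <stdint.h>
#include <stddef.h>

#define SK_NOOFFSET   ((int64_t)-1)   /* assumed "untranslated" sentinel */
#define SK_MAXLAYERS  4               /* arbitrary depth for this sketch */

/* One translation layer. */
struct sk_bio {
    int64_t        bio_offset;  /* offset as seen by this layer */
    struct sk_bio *bio_prev;    /* layer above (closer to the caller) */
    struct sk_bio *bio_next;    /* layer below (closer to the device) */
};

/* Fixed pool standing in for the BIOs embedded in a buf. */
struct sk_buf {
    struct sk_bio layers[SK_MAXLAYERS];
    int           depth;
};

/* Start I/O: the top layer holds the caller's untranslated offset. */
static struct sk_bio *sk_bio_start(struct sk_buf *bp, int64_t off)
{
    bp->depth = 1;
    bp->layers[0].bio_offset = off;
    bp->layers[0].bio_prev = NULL;
    bp->layers[0].bio_next = NULL;
    return &bp->layers[0];
}

/* Push a layer below 'bio'.  Its offset starts as the sentinel until the
 * translating code fills it in.  Returns NULL when the pool is full. */
static struct sk_bio *sk_push_bio(struct sk_buf *bp, struct sk_bio *bio)
{
    struct sk_bio *nbio;

    if (bp->depth >= SK_MAXLAYERS)
        return NULL;
    nbio = &bp->layers[bp->depth++];
    nbio->bio_offset = SK_NOOFFSET;
    nbio->bio_prev = bio;
    nbio->bio_next = NULL;
    bio->bio_next = nbio;
    return nbio;
}

/* Example translation: a slice layer adds the slice's start offset. */
static struct sk_bio *sk_slice_translate(struct sk_buf *bp,
                                         struct sk_bio *bio,
                                         int64_t slice_start)
{
    struct sk_bio *nbio = sk_push_bio(bp, bio);

    if (nbio != NULL)
        nbio->bio_offset = bio->bio_offset + slice_start;
    return nbio;
}
```

The device strategy routine would act only on the lowest layer it is handed, while the upper layers still hold the slice-relative and file-relative offsets for completion handling.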
Add a comment on top of ad_start, mentioning that it is called with a critical section held.
Remove spl*() in disk/{ata,buslogic,ccd} and replace them with
critical sections.
Unbreak addump(). request.callout must be callout_init()'ed before a call to ad_transfer(). Submitted by: YONETANI Tomokazu <qhwt+dragonfly-commits@les.ath.cx>
timeout/untimeout ==> callout_*
Add some robustness to the error-requeue code. FreeBSD-5's (new) ata driver had an issue with the donecount not being properly reset. This issue is not believed to occur with the old code but add sanity checks to be sure.
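The class of bug described here is a retried request that keeps the progress count from its failed attempt, so the retry starts part-way through or miscounts completion. A minimal sketch of resetting that state on requeue with a sanity check; `struct sk_request` and `sk_requeue()` are hypothetical, and only the donecount/bytecount idea comes from the text.

```c
#include <stdint.h>

/* Hypothetical request fields; the layout is an assumption. */
struct sk_request {
    uint64_t blockaddr;   /* starting block of the whole request */
    uint32_t bytecount;   /* total bytes in the request */
    uint32_t donecount;   /* bytes completed so far */
    int      retries;     /* retry attempts already made */
};

/* Prepare a failed request for requeueing so the retry restarts from
 * the beginning.  Returns -1 if the counts are inconsistent or the
 * retry budget is exhausted, 0 if the request may be requeued. */
static int sk_requeue(struct sk_request *req, int max_retries)
{
    if (req->donecount > req->bytecount)   /* sanity check: should never happen */
        return -1;
    if (req->retries >= max_retries)
        return -1;
    req->retries++;
    req->donecount = 0;                    /* forget partial progress */
    return 0;
}
```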
Fix an improper DELAY in the ata tag code (but nobody should be using the ata tag code anyway).
Clean up some misuses of bp->b_dev after a strategy function has completed (the field cannot be used after biodone() has been called). Add a separate dev_t argument to diskerr() to take care of the issue and get rid of some FD error reporting hacks at the same time. Reported-by: David Rhodus
Device layer rollup commit.
* cdevsw_add() is now required. cdevsw_add() and cdevsw_remove() may specify a mask/match indicating the range of supported minor numbers. Multiple cdevsw_add()'s using the same major number, but distinctly different ranges, may be issued. All devices that failed to call cdevsw_add() before now do.
* cdevsw_remove() now automatically marks all devices within its supported range as being destroyed.
* vnode->v_rdev is no longer resolved when the vnode is created. Instead, only v_udev (a newly added field) is resolved. v_rdev is resolved when the vnode is opened and cleared on the last close.
* A great deal of code was making rather dubious assumptions with regard to the validity of devices associated with vnodes, primarily due to the persistence of a device structure due to being indexed by (major, minor) instead of by (cdevsw, major, minor). In particular, if you run a program which connects to a USB device and then you pull the USB device and plug it back in, the vnode subsystem will continue to believe that the device is open when, in fact, it isn't (because it was destroyed and recreated). In particular, note that all the VFS mount procedures now check devices via v_udev instead of v_rdev prior to calling VOP_OPEN(), since v_rdev is NULL prior to the first open.
* The disk layer's device interaction has been rewritten. The disk layer (i.e. the slice and disklabel management layer) no longer overloads its data onto the device structure representing the underlying physical disk. Instead, the disk layer uses the new cdevsw_add() functionality to register its own cdevsw using the underlying device's major number, and simply does NOT register the underlying device's cdevsw. No confusion is created because the device hash is now based on (cdevsw,major,minor) rather than (major,minor). NOTE: This also means that underlying raw disk devices may use the entire device minor number instead of having to reserve the bits used by the disk layer, and also means that we can (theoretically) stack a fully disklabel-supported 'disk' on top of any block device.
* The new reference counting scheme prevents the stale-device problem described above by associating a device with a cdevsw and disconnecting the device from its cdevsw when the cdevsw is removed. Additionally, all udev2dev() lookups run through the cdevsw mask/match and only successfully find devices still associated with an active cdevsw.
* Major work on MFS: MFS no longer shortcuts vnode and device creation. It now creates a real vnode and a real device and implements real open and close VOPs. Additionally, due to the disk layer changes, MFS is no longer limited to 255 mounts. The new limit is 16 million. Since MFS creates a real device node, mount_mfs will now create a real /dev/mfs<PID> device that can be read from userland (e.g. so you can dump an MFS filesystem).
* BUF AND DEVICE STRATEGY changes. The struct buf contains a b_dev field. In order to properly handle stacked devices we now require that the b_dev field be initialized before the device strategy routine is called. This required some additional work in various VFS implementations. To enforce this requirement, biodone() now sets b_dev to NODEV. The new disk layer will adjust b_dev before forwarding a request to the actual physical device.
* A bug in the ISO CD boot sequence which resulted in a panic has been fixed.
Testing by: lots of people, but David Rhodus found the most egregious bugs.
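The mask/match registration lets several switch tables share one major number, each owning a disjoint range of minors, and a lookup succeeds only while the owning entry is still registered. A minimal sketch; `struct sk_devsw_entry` and `sk_devsw_lookup()` are hypothetical and model only the matching rule (minor & mask) == match, not the real cdevsw_add() signature.

```c
#include <stddef.h>

/* Hypothetical registration entry: several entries may share a major
 * number as long as their (mask, match) pairs select disjoint minors. */
struct sk_devsw_entry {
    int          major;
    unsigned int mask;
    unsigned int match;
    const char  *name;   /* stands in for the cdevsw pointer */
};

/* Find the entry owning (major, minor): a minor belongs to an entry when
 * (minor & mask) == match.  Returns NULL if no registered entry matches,
 * which is how lookups fail once the owning entry has been removed. */
static const struct sk_devsw_entry *
sk_devsw_lookup(const struct sk_devsw_entry *tab, size_t n,
                int major, unsigned int minor)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tab[i].major == major &&
            (minor & tab[i].mask) == tab[i].match)
            return &tab[i];
    }
    return NULL;
}
```

In a usage like the test below, major 30 (an arbitrary number) is split on its top minor bit, so one entry can act as the disk layer and another as the raw device without either reserving bits in the other's range.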
device switch 1/many: Remove d_autoq, add d_clone (where d_autoq was). d_autoq was used to allow the device port dispatch to mix old-style synchronous calls with new style messaging calls within a particular device. It was never used for that purpose. d_clone will be more fully implemented as work continues. We are going to install d_port in the dev_t (struct specinfo) structure itself and d_clone will be needed to allow devices to 'revector' the port on a minor-number by minor-number basis, in particular allowing minor numbers to be directly dispatched to distinct threads. This is something we will be needing later on.
General ata malloc() flags cleanup. Use M_INTWAIT where appropriate and get rid of unnecessary NULL checks.
Bring in a bunch of well-tested MPIPE changes. Preallocate a minimum number of mpipe elements when it is initialized. Use an array to cache free MPIPE buffers and remove the data structure overloading that was previously occurring on the buffer itself. Add a destructor. Separate the blocking and non-blocking allocation APIs into their own functions. The new code still needs Giant, but it's getting a lot closer to being lock free.
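The array-of-free-pointers design keeps all bookkeeping outside the elements, and preallocation guarantees a floor of available buffers. A single-threaded userland sketch of that structure; the `sk_mpipe_*` names are hypothetical, and the blocking allocation path (which in the kernel would sleep until an element is freed) is deliberately omitted.

```c
#include <stdlib.h>

/* Hypothetical fixed-size allocation pipe. */
struct sk_mpipe {
    void  **free_array;  /* cache of free elements, one slot per element */
    int     free_count;  /* valid entries in free_array */
    int     total;       /* elements owned by the pipe */
    size_t  bytes;       /* size of each element */
};

/* Initialize and preallocate 'nmin' (> 0) elements.  Returns 0 on
 * success; on failure sk_mpipe_done() can still be called to clean up. */
static int sk_mpipe_init(struct sk_mpipe *mp, size_t bytes, int nmin)
{
    mp->bytes = bytes;
    mp->total = 0;
    mp->free_count = 0;
    mp->free_array = malloc(sizeof(void *) * (size_t)nmin);
    if (mp->free_array == NULL)
        return -1;
    while (mp->total < nmin) {
        void *p = malloc(bytes);

        if (p == NULL)
            return -1;
        mp->free_array[mp->free_count++] = p;
        mp->total++;
    }
    return 0;
}

/* Non-blocking allocation: take a cached element or fail immediately. */
static void *sk_mpipe_alloc_nowait(struct sk_mpipe *mp)
{
    if (mp->free_count == 0)
        return NULL;
    return mp->free_array[--mp->free_count];
}

/* Return an element obtained from this pipe; it always fits because the
 * array has one slot for every element the pipe owns. */
static void sk_mpipe_free(struct sk_mpipe *mp, void *p)
{
    mp->free_array[mp->free_count++] = p;
}

/* Destructor: assumes every element has been returned first. */
static void sk_mpipe_done(struct sk_mpipe *mp)
{
    while (mp->free_count > 0)
        free(mp->free_array[--mp->free_count]);
    free(mp->free_array);
    mp->free_array = NULL;
    mp->total = 0;
}
```

Keeping the free list in a separate array means a buffer handed to a driver is never partially occupied by allocator metadata.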
The cam_sim structure was being deallocated unconditionally by device driver detach routines. The problem with this is that part of the CAM bus structure may still be active (for example, with pending timeout()'s), and even though the bus, target, and device are freed, since the sim IS freed, any accesses through the sim will hit 0xdeadc0de. This case most often occurs with USB UMASS devices. The CAM_XPT and CAM_SIM layer has been revamped. CAM_DEV_UNCONFIGURED is now accounted for in the device->refcount, and the cam_sim structure is now ref-counted as well. Additionally, the cam_simq* code which handles the device queues has been revamped to refcount as well, so shared device queues (raid and multi-channel devices) are not free()'d before all references have gone away. scsi_low free'd its cam_sim twice. Fixed. USB was improperly using M_NOWAIT. All M_NOWAIT instances have been renamed to M_INTWAIT.
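Reference counting replaces "free on detach" with "free on last release", so a pending user such as a queued timeout keeps the structure alive until it is done. A minimal single-threaded sketch; `struct sk_sim` and its helpers are hypothetical, and a real kernel version would need atomic or locked counter updates.

```c
#include <stdlib.h>

/* Hypothetical refcounted SIM. */
struct sk_sim {
    int refcount;
    /* driver-private state would live here */
};

/* Allocate with one reference, owned by the driver. */
static struct sk_sim *sk_sim_alloc(void)
{
    struct sk_sim *sim = calloc(1, sizeof(*sim));

    if (sim != NULL)
        sim->refcount = 1;
    return sim;
}

/* Take an extra reference, e.g. before arming a timeout that uses it. */
static void sk_sim_hold(struct sk_sim *sim)
{
    sim->refcount++;
}

/* Drop a reference; free only when the last one goes away.
 * Returns 1 if the sim was freed, 0 otherwise. */
static int sk_sim_release(struct sk_sim *sim)
{
    if (--sim->refcount > 0)
        return 0;
    free(sim);
    return 1;
}
```

With this pattern the detach routine calls the release function instead of freeing directly, and the structure survives until the timeout handler drops its own reference.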
ATAng stage 5: sync additional function API changes from FBsd-4. We now have everything except the dma chipset changes and the busdma changes. Note that we retain our MPIPE code as it is far superior to what is in 4.x and 5.x.
ATAng stage 3: sync additional ATAng from 4.x, mostly non-operational changes, changes in procedure args, etc.
ATAng stage 2: sync part of the ata_dma*() API. No operational changes.
ATAng stage 1: synch ad_attach() and atapi_attach(), including a fix for a recursive lock issue.
FreeBSD-5.x removed the 'read interrupt arrived early' check code, for undocumented reasons. As far as I can tell, only the ATA_S_READY bit needs to be set when a read interrupt arrives for a PIO read operation so reduce the test from ATA_S_READY|ATA_S_DSC|ATA_S_DRQ to just ATA_S_READY. Change the ATA RESET timing. Instead of polling up to 310000 times in 100us intervals poll 3100 times in 10000us intervals. Note that 5.x polls 310 times in 100ms intervals, which is silly (an unnecessarily long polling delay). Add a mandatory 50ms delay after all BUSY bits have been released. Some ATA devices release BUSY before they are ready to accept commands. Note that the ATA code waits 100ms after releasing RESET, before checking BUSY, which is probably overkill, but this does not cover the mandatory delay that must occur after BUSY is observed to have been released. (This patch primarily removes bogus 'read interrupt arrived early' warnings on the console for things like CF card IDE adapters and other badly designed IDE devices).
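All three polling schemes share the same overall budget (310000 x 100us, 3100 x 10000us, and 310 x 100ms are each about 31 seconds); they differ only in how often the status register is read and how soon a ready device is noticed. A userland sketch of the reset wait with the added 50ms settle delay; the callbacks, the fake device, and the `sk_*` names are hypothetical stand-ins for register reads and DELAY(), while 0x80 is the standard BSY bit of the ATA status register.

```c
#include <stdint.h>

#define SK_ATA_S_BUSY  0x80   /* BSY bit in the ATA status register */
#define SK_ATA_S_READY 0x40   /* DRDY bit */

/* Callbacks standing in for a status register read and DELAY(). */
typedef uint8_t (*sk_status_fn)(void *ctx);
typedef void    (*sk_delay_fn)(void *ctx, unsigned int usec);

/* Poll up to 'tries' times, 'interval_us' apart, for BUSY to clear, then
 * wait a further 50ms because some devices drop BUSY before they accept
 * commands.  Returns 0 on success, -1 on timeout. */
static int sk_ata_wait_reset(void *ctx, sk_status_fn status,
                             sk_delay_fn delay,
                             unsigned int tries, unsigned int interval_us)
{
    unsigned int i;

    for (i = 0; i < tries; i++) {
        if ((status(ctx) & SK_ATA_S_BUSY) == 0) {
            delay(ctx, 50000);   /* mandatory post-BUSY settle time */
            return 0;
        }
        delay(ctx, interval_us);
    }
    return -1;
}

/* Simulated device for exercising the loop outside the kernel. */
struct sk_fake_dev {
    unsigned int  busy_polls;  /* polls that still report BUSY */
    unsigned long waited_us;   /* total simulated delay */
};

static uint8_t sk_fake_status(void *ctx)
{
    struct sk_fake_dev *d = ctx;

    if (d->busy_polls > 0) {
        d->busy_polls--;
        return SK_ATA_S_BUSY;
    }
    return SK_ATA_S_READY;
}

static void sk_fake_delay(void *ctx, unsigned int usec)
{
    struct sk_fake_dev *d = ctx;

    d->waited_us += usec;
}
```

A device that reports BUSY for three polls costs 3 x 10ms of polling plus the 50ms settle delay, while a device stuck in BUSY gives up after the full 3100 x 10ms budget.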
Add the MPIPE subsystem. This subsystem is used for 'pipelining' fixed-size allocations. Pipelining is used to avoid lack-of-resource deadlocks by still allowing resource allocations to 'block' by guaranteeing that an already in-progress operation will soon free memory that will be immediately used to satisfy the blocked resource. Adjust the ATAold code to use the new mechanism and remove the code that tried to back-off into PIO mode when resources were lacking.
kernel tree reorganization stage 1: Major cvs repository work (not logged as
commits) plus a major reworking of the #include's to accommodate the
relocations.
* CVS repository files manually moved. Old directories left intact
and empty (temporary).
* Reorganize all filesystems into vfs/, most devices into dev/,
sub-divide devices by function.
* Begin to move device-specific architecture files to the device
subdirs rather than throwing them all into, e.g. i386/include
* Reorganize files related to system busses, placing the related code
in a new bus/ directory. Also move cam to bus/cam though this may
not have been the best idea in retrospect.
* Reorganize emulation code and place it in a new emulation/ directory.
* Remove the -I- compiler option in order to allow #include file
localization, rename all config generated X.h files to use_X.h to
clean up the conflicts.
* Remove /usr/src/include (or /usr/include) dependencies during the
kernel build, beyond what is normally needed to compile helper
programs.
* Make config create 'machine' softlinks for architecture specific
directories outside of the standard <arch>/include.
* Bump the config rev.
WARNING! after this commit /usr/include and /usr/src/sys/compile/*
should be regenerated from scratch.
DEV messaging stage 2/4: In this stage all DEV commands are now being funneled through the message port for action by the port's beginmsg function. CONSOLE and DISK device shims replace the port with their own and then forward to the original. FB (Frame Buffer) shims supposedly do the same thing but I haven't been able to test it. I don't expect instability in mainline code but there might be easy-to-fix problems, and some drivers still need to be converted. See primarily: kern/kern_device.c (new dev_*() functions and inherits cdevsw code from kern/kern_conf.c), sys/device.h, and kern/subr_disk.c for the high points. In this stage all DEV messages are still acted upon synchronously in the context of the caller. We cannot create a separate handler thread until the copyin's (primarily in ioctl functions) are made thread-aware. Note that the messaging shims are going to look rather messy in these early days but as more subsystems are converted over we will begin to use pre-initialized messages and message forwarding to avoid having to constantly rebuild messages prior to use. Note that DEV itself is a mess owing to its 4.x roots and will be cleaned up in subsequent passes. e.g. the way sub-devices inherit the main device's cdevsw was always a bad hack and it still is, and several functions (mmap, kqfilter, psize, poll) return results rather than error codes, which will be fixed since now we have a message to store the result in :-)
DEV messaging stage 1/4: Rearrange struct cdevsw and add a message port and auto-queueing mask. The mask will tell us which message functions can be safely queued to another thread and which still need to run in the context of the caller. Primary configuration fields (name, cmaj, flags, port, autoq mask) are now at the head of the structure. Function vectors, which may eventually go away, are at the end. The port and autoq fields are non-functional in this stage. The old BDEV device major number support has also been removed from cdevsw, and code has been added to translate the bootdev passed from the boot code (the boot code has always passed the now defunct block device major numbers and we obviously need to keep that compatibility intact).
proc->thread stage 2: MAJOR revamping of system calls, ucred, jail API, and some work on the low level device interface (proc arg -> thread arg). As -current did, I have removed p_cred and incorporated its functions into p_ucred. p_prison has also been moved into p_ucred and adjusted accordingly. The jail interface tests now use ucreds rather than processes. The syscall(p,uap) interface has been changed to just (uap). This is inclusive of the emulation code. It makes little sense to pass a proc pointer around which confuses the MP readability of the code, because most system call code will only work with the current process anyway. Note that eventually *ALL* syscall emulation code will be moved to a kernel-protected userland layer because it really makes no sense whatsoever to implement these emulations in the kernel. suser() now takes no arguments and only operates with the current process. The process argument has been removed from suser_xxx() so it now just takes a ucred and flags. The sysctl interface was adjusted somewhat.
thread stage 5: Separate the inline functions out of sys/buf.h, creating sys/buf2.h (A methodology that will continue as time passes). This solves inline vs struct ordering problems. Do a major cleanup of the globaldata access methodology. Create a gcc-cacheable 'mycpu' macro & inline to access per-cpu data. Atomicity is not required because we will never change cpus out from under a thread, even if it gets preempted by an interrupt thread, because we want to be able to implement per-cpu caches that do not require locked bus cycles or special instructions.
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids. Most ids have been removed from !lint sections and moved into comment sections.
import from FreeBSD RELENG_4 1.60.2.24