Keyword substitution: kv
Default branch: MAIN
Make some adjustments to clean up structural field names. Add type and storage UUIDs to the partinfo structure for the DIOCGPART ioctl and load the fields up for GPT slices and disklabel64 partitions.
Implement non-booting support for the DragonFly 64 bit disklabel:
* Add full kernel support. Both 32 and 64 bit labels will be probed.
* Add a new program, disklabel64, which allows you to create and edit the new disklabel.
* Add some logic to prevent foot shooting.
DragonFly's 64 bit disklabels start at byte offset 0 on the disk slice or GPT partition and operate in a slice-relative fashion. No translation is required when going from on-disk to in-core or vice versa, unlike the existing 32 bit disklabels. 512 bytes at the beginning of the label are reserved for legacy boot code. Specifically, the label starts at sector 0, NOT sector 1, which means its location on the disk is the same regardless of the sector size. The label has a UUID to uniquely identify the storage and a type and object UUID for each partition. All location specifications are 64 bit byte offsets, NOT logical blocks. The label enforces an alignment requirement for label-related I/O and partitions which defaults to 4K regardless of the sector size. This makes the label 100% portable across media with different sector sizes within the constraints of the alignment requirement. All partitions are specified using byte offsets and sizes, constrained by the alignment requirement, relative to the base of the label (i.e. offset 0 in the slice). disklabel64 will adjust the offsets for display purposes to be relative to the partition table area. The label headers, partition table, and boot2 areas come BEFORE the partition table area and partitions which overlap any of those objects are not allowed. By default, a virgin 64 bit disklabel will reserve 32K for boot2. As of this writing, boot1 and boot2 blocks have not yet been implemented.
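The alignment and overlap rules in the 64 bit disklabel entry can be sketched as below. This is only an illustration under stated assumptions, not the kernel's code: `LABEL64_ALIGN`, `struct part64`, and `part64_valid()` are hypothetical names, and `reserved_end` stands for the byte offset at which the label headers, partition table, and boot2 area end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: the default 4K alignment mentioned in the log. */
#define LABEL64_ALIGN 4096ULL

/* Hypothetical record: slice-relative 64 bit byte offset and size. */
struct part64 {
	uint64_t offset;	/* bytes from offset 0 of the slice */
	uint64_t size;		/* bytes */
};

/*
 * Return true if the partition honors the alignment requirement, does not
 * overlap the reserved area [0, reserved_end), and fits inside the slice.
 */
static bool
part64_valid(const struct part64 *p, uint64_t reserved_end, uint64_t slice_size)
{
	assert(p != NULL);
	if (p->size == 0)
		return false;
	if (p->offset % LABEL64_ALIGN != 0 || p->size % LABEL64_ALIGN != 0)
		return false;
	if (p->offset < reserved_end)
		return false;
	/* Compare against the remaining space so offset + size cannot overflow. */
	if (p->offset > slice_size || p->size > slice_size - p->offset)
		return false;
	return true;
}
```

Because everything is expressed in bytes, the same check works unchanged for 512 byte and 4K sector media, which is the portability argument the entry makes.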
Move all the code related to handling the current 32 bit disklabel to subr_disklabel32.c. Move the header file from sys/disklabel.h to sys/disklabel32.h. Rename all the related structures and constants and retire 'struct disklabel'. Redo the sys/disklabel.h header file to implement a generic disklabel abstraction. Modify kern/subr_diskslice.c to use this abstraction, with some shims for the ops dispatch at the moment which will be cleaned up later. Adjust all auxiliary code that directly accesses 32 bit disklabels to use the new structure and constant names. Remove the snoop-adjust code. The kernel would snoop reads and writes to the disklabel area via the raw slice device (e.g. ad0s1) and convert the disklabel from the in-core format to the on-disk format and vice versa. The reads and writes made by disklabel -r and the kernel's own internal readdisklabel and writedisklabel code used the snooping. Rearrange the kernel's internal code to manually convert the disklabel when reading and writing. Rearrange the /sbin/disklabel program to do the same when the -r option is used. Have the disklabel program also check which DragonFly OS it is running under so it can be run on older systems. Note that the disklabel binary prior to these changes will NOT operate on the disklabel properly if running on a NEW kernel. Introduce skeleton files for 64 bit disklabel support.
Disklabel separation work - Generally shift all disklabel-specific procedures for the kernel proper to a new source file, subr_disklabel32.c. Move the DTYPE_ and FS_ defines out of sys/disklabel.h and into a new header file, sys/dtype.h. Make adjustments to the uuids file, renaming "DragonFly Label" to "DragonFly Label32" and creating a "DragonFly Label64" uuid.
Expand the diskslice->ds_openmask from 8 bits to 256 bits to cover all possible partitions. Partitions from 'i' on, and the whole-disk partition, were not being properly tracked, resulting in multiple device opens and device closes to the underlying device. In particular, this caused USB memory sticks to connect to the CAM driver with ever-increasing DA#n unit numbers because CAM's reference counting got seriously corrupted. Reported-by: "Simon 'corecode' Schubert" <firstname.lastname@example.org>
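A 256 bit open mask of the kind described in the ds_openmask entry can be sketched as an array of eight 32 bit words. The type and function names here are hypothetical and are not taken from the kernel; the point is only that partition 'i' (index 8) and the whole-disk partition (index 255) each get their own bit, which an 8 bit mask cannot provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAXPARTS	256			/* one bit per partition */
#define SKETCH_WORDS	(SKETCH_MAXPARTS / 32)

/* Hypothetical 256 bit mask tracking which partitions are open. */
struct openmask {
	uint32_t bits[SKETCH_WORDS];
};

static void
openmask_set(struct openmask *m, unsigned part)
{
	assert(part < SKETCH_MAXPARTS);
	m->bits[part >> 5] |= (uint32_t)1 << (part & 31);
}

static void
openmask_clear(struct openmask *m, unsigned part)
{
	assert(part < SKETCH_MAXPARTS);
	m->bits[part >> 5] &= ~((uint32_t)1 << (part & 31));
}

static bool
openmask_test(const struct openmask *m, unsigned part)
{
	assert(part < SKETCH_MAXPARTS);
	return (m->bits[part >> 5] >> (part & 31)) & 1;
}

/* True when no partition is open, i.e. the last close has happened. */
static bool
openmask_empty(const struct openmask *m)
{
	for (int i = 0; i < SKETCH_WORDS; i++) {
		if (m->bits[i] != 0)
			return false;
	}
	return true;
}
```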
Continue untangling the disklabel.
* Remove numerous #include <sys/disklabel.h> lines that are no longer used.
* Move DIOCWLABEL from sys/disklabel.h to sys/diskslice.h.
* Modify ffsinfo, fsirand, growfs, and newfs_msdos to use DIOCGPART instead of DIOCGDINFO to obtain disk geometry information. Add defaults where necessary to allow these programs to run on files instead of devices. Also, change ffsinfo to output to stdout by default.
Continue untangling the disklabel.
* Move dk*() inline functions and other related stuff not directly related to the BSD disklabel out of sys/disklabel.h and into sys/diskslice.h. Add additional functions to sys/diskslice.h.
* Extend the slice and partition fields in the device minor number. We now support up to 128 slices and up to 256 partitions.
* Implement new minor device numbers for 'raw slices', such as ad0s1. Previously raw slices used the same minor number as partition c within the slice, e.g. ad0s1 and ad0s1c had the same device number. This made it impossible to distinguish between the two. The 'whole disk' device's minor number has also changed. Our new whole-slice and whole-disk devices specify a partition number of (DKMAXPARTITIONS - 1) (aka 255).
* Completely disable disklabel related operations on the raw disk, e.g. da0, and on partitions, e.g. da0s1a. Only allow disklabel operations on whole slices, e.g. da0s1. NOTE!! For compatibility while booting drivers which set DSO_COMPATLABEL, the compat disklabel may be read, but not written, via the whole-disk device, e.g. acd0. NOTE!! For compatibility we have no choice but to continue to snoop read/write operations on raw slices (e.g. da0s1) because the disklabel program and the kernel still depend on the snooping to modify the in-core version of the disklabel to the on-disk version. No snooping will occur on the whole-disk device (e.g. da0). No snooping will occur on raw slices (e.g. da0s1) if the disk is unlabeled and no in-core label was set. Note that disklabel -r -w DOES set an in-core label before writing to a raw-slice, so it is still ok.
* dsopen() no longer attempts to scan the MBR or slice table when the whole-disk device (e.g. da0) is opened, and no longer attempts to read the disklabel when the whole-slice device is opened (e.g. da0s1). The disklabel is only read when a partition is explicitly opened or the label is explicitly read via an ioctl.
* The virgin disklabel is stored in the struct diskslice for WHOLE_DISK_SLICE (slice 1).
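The minor number change in the entry above (128 slices, 256 partitions, raw slices using partition 255 instead of sharing partition c's number) can be illustrated with a small packing sketch. The bit layout and every `sketch_` name below are invented for illustration; the log does not state the real layout used by sys/diskslice.h.

```c
#include <assert.h>
#include <stdint.h>

/* Limits taken from the log entry; the bit positions below are assumptions. */
#define SKETCH_MAXSLICES	128u
#define SKETCH_MAXPARTITIONS	256u
#define SKETCH_WHOLE_PART	(SKETCH_MAXPARTITIONS - 1)	/* 255 */

/* Hypothetical packing: bits 0-7 partition, bits 8-14 slice, bits 15+ unit. */
static uint32_t
sketch_dkmakeminor(uint32_t unit, uint32_t slice, uint32_t part)
{
	assert(unit < (1u << 17));
	assert(slice < SKETCH_MAXSLICES);
	assert(part < SKETCH_MAXPARTITIONS);
	return (unit << 15) | (slice << 8) | part;
}

static uint32_t
sketch_dkpart(uint32_t minor)
{
	return minor & 0xffu;
}

static uint32_t
sketch_dkslice(uint32_t minor)
{
	return (minor >> 8) & 0x7fu;
}

static uint32_t
sketch_dkunit(uint32_t minor)
{
	return minor >> 15;
}
```

With any layout of this shape, a raw slice built with `SKETCH_WHOLE_PART` and partition c (index 2) of the same slice decode to the same unit and slice but differ in the partition field, so the two devices are no longer indistinguishable.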
Add getdisktabbyname() to libc. This will soon replace getdiskbyname(). Move _PATH_DISKTAB to <disktab.h>, remove #define DISKTAB entirely.
Remove the roll-your-own disklabel from CCD. Use the kernel disk manager for disklabel support instead. Make CCD a real disk device rather than a fake one. NOTE: All /dev/ccd* devices have changed and must be remade. Introduce DSO_COMPATMBR. This forces an MBR sector to be reserved in front of a disklabel even when the target disk does not have slices. It is used by the CCD and VN devices to keep the disklabel aligned the same way it has been historically. Implement 64 bit block addressing for CCD. Implement a new filesystem type "ccd", and require that the devices backing the CCD use that filesystem type for safety. Fix a bug in DIOCGPART where the partinfo->media_blocks was not being set properly for partitions.
Continue untangling the disklabel. Add sector index reservation fields to the diskslice and partinfo structures. These fields will replace the hardcoded LABELSECTOR constant and also help manage reserved areas in the disklabel.
Continue untangling the disklabel. Reorganize struct partinfo and the DIOCGPART ioctl to extract the required information directly, and fix the DIOCGPART ioctl direction so userland can use it. This removes numerous disklabel references, particularly from the filesystem code which was doing silly indirections just to figure out the sector size. NOTE: The absolute byte offset of the slice or partition (relative to the base of the raw disk) is also made available, but is not currently used by the kernel.
Continue untangling the disklabel. Have most disk device drivers fill out and install a generic disk_info structure instead of filling out random fields in the disklabel. The generic disk_info structure uses a 64 bit integer to represent the media size in bytes or total sector count.
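The idea of a driver supplying either a byte count or a sector count and letting the disk layer derive the other can be sketched as follows. The structure and field names are hypothetical stand-ins, not the actual disk_info definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a generic per-disk description. */
struct sketch_disk_info {
	uint64_t media_size;	/* total size in bytes, 0 if not supplied */
	uint64_t media_blocks;	/* total sector count, 0 if not supplied */
	uint32_t media_blksize;	/* sector size in bytes, must be non-zero */
};

/*
 * Fill in whichever of media_size / media_blocks the driver left as 0.
 * Returns 0 on success, -1 if the sector size is missing.
 */
static int
sketch_disk_info_fixup(struct sketch_disk_info *di)
{
	assert(di != NULL);
	if (di->media_blksize == 0)
		return -1;
	if (di->media_size == 0 && di->media_blocks != 0)
		di->media_size = di->media_blocks * di->media_blksize;
	else if (di->media_blocks == 0 && di->media_size != 0)
		di->media_blocks = di->media_size / di->media_blksize;
	return 0;
}
```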
Start untangling the disklabel from various bits of code with the goal of introducing support for a new 64 bit disklabel. Remove the D_* flags for disklabel.d_flags. These sorts of flags just do not belong in the disk image. Relabel the partition sub-structure in disktab.h, and remove other ancient compatibility defines in disklabel.h.
Replace NOCDEV with NULL. NOCDEV was ((void *)-1) and was inherited from *BSD a long time ago due to the device pointer / device number duality. Now that the pointer and device number have been separated, we can just use NULL to indicate no-pointer. Replace si_refs with si_sysref. Use SYSREF to ref-count cdev_t. Enable cdev_t reclamation on deletion.
Change the kernel dev_t, representing a pointer to a specinfo structure, to cdev_t. Change struct specinfo to struct cdev. The name 'cdev' was taken from FreeBSD. Remove the dev_t shim for the kernel. This commit generally removes the overloading of 'dev_t' between userland and the kernel. Also fix a bug in libkvm where a kernel dev_t (now cdev_t) was not being properly converted to a userland dev_t.
I'm growing tired of having to add #include lines for header files that the include file(s) I really want depend on. Go through nearly all major system include files and add appropriately #ifndef'd #include lines to include all dependent header files. Kernel source files now only need to #include the header files they directly depend on. So, for example, if I wanted to add a SYSCTL to a kernel source file, I would only have to #include <sys/sysctl.h> to bring in the support for it, rather than four or five header files in addition to <sys/sysctl.h>.
Major BUF/BIO work commit. Make I/O BIO-centric and specify the disk or file location with a 64 bit offset instead of a 32 bit block number.
* All I/O is now BIO-centric instead of BUF-centric.
* File/Disk addresses universally use a 64 bit bio_offset now. bio_blkno no longer exists.
* Stackable BIOs hold disk offset translations. Translations are no longer overloaded onto a single structure (BUF or BIO).
* bio_offset == NOOFFSET is now universally used to indicate that a translation has not been made. The old (blkno == lblkno) junk has all been removed.
* There is no longer a distinction between logical I/O and physical I/O.
* All driver BUFQs have been converted to BIOQs.
* BMAP, FREEBLKS, getblk, bread, breadn, bwrite, inmem, cluster_*, and findblk all now take and/or return 64 bit byte offsets instead of block numbers. Note that BMAP now returns a byte range for the before and after variables.
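The move from 32 bit block numbers to 64 bit byte offsets, with a sentinel for "not yet translated", can be sketched as below. `SKETCH_NOOFFSET` and the other names are hypothetical; the value -1 is an assumption chosen for the sketch and is not quoted from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel meaning "no translation has been made". */
#define SKETCH_NOOFFSET	((int64_t)-1)

/* Hypothetical holder for one translated location. */
struct sketch_xlate {
	int64_t offset;		/* byte offset, or SKETCH_NOOFFSET */
};

/* Convert a block number and sector size to a 64 bit byte offset. */
static int64_t
sketch_blkno_to_offset(int64_t blkno, uint32_t secsize)
{
	assert(blkno >= 0);
	return blkno * (int64_t)secsize;
}

/* A location is usable only once a translation has been stored. */
static bool
sketch_translated(const struct sketch_xlate *x)
{
	return x->offset != SKETCH_NOOFFSET;
}
```

A separate sentinel avoids the older convention of comparing the translated block number against the logical one, which cannot distinguish "untranslated" from a translation that happens to yield the same number.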
Make the entire BUF/BIO system BIO-centric instead of BUF-centric. Vnode and device strategy routines now take a BIO and must pass that BIO to biodone(). All code which previously managed a BUF undergoing I/O now manages a BIO. The new BIO-centric algorithms allow BIOs to be stacked, where each layer represents a block translation, completion callback, or caller or device private data. This information is no longer overloaded within the BUF. Translation layer linkages remain intact as a 'cache' after I/O has completed. The VOP and DEV strategy routines no longer make assumptions as to which translated block number applies to them. They use the block number in the BIO specifically passed to them. Change the 'untranslated' constant to NOOFFSET (for bio_offset), and (daddr_t)-1 (for bio_blkno). Rip out all code that previously set the translated block number to the untranslated block number to indicate that the translation had not been made. Rip out all the cluster linkage fields for clustered VFS and clustered paging operations. Clustering now occurs in a private BIO layer using private fields within the BIO. Reformulate the vn_strategy() and dev_dstrategy() abstraction(s). These routines no longer assume that bp->b_vp == the vp of the VOP operation, and the dev_t is no longer stored in the struct buf. Instead, only the vp passed to vn_strategy() (and related *_strategy() routines for VFS ops), and the dev_t passed to dev_dstrategy() (and related *_strategy() routines for device ops) is used by the VFS or DEV code. This will allow an arbitrary number of translation layers in the future. Create an independent per-BIO tracking entity, struct bio_track, which is used to determine when I/O is in-progress on the associated device or vnode. NOTE: Unlike FreeBSD's BIO work, our struct BUF is still used to hold the fields describing the data buffer, resid, and error state. Major-testing-by: Stefan Krueger
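The stacking idea in the entry above, where each layer carries its own translation and completion callback and the linkage survives completion, can be sketched with a simple linked structure. All names are hypothetical and the real struct bio has different fields; this only shows the shape of the technique.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_bio;
typedef void (*sketch_done_t)(struct sketch_bio *);

/* Hypothetical layer: one translation plus an optional callback. */
struct sketch_bio {
	struct sketch_bio *prev;	/* layer above, NULL for the top layer */
	int64_t offset;			/* byte offset as seen by this layer */
	sketch_done_t done;		/* completion callback, may be NULL */
	int completed;			/* set once this layer has completed */
};

/* Set up the top layer with the caller's view of the offset. */
static void
sketch_bio_init(struct sketch_bio *top, int64_t offset, sketch_done_t done)
{
	top->prev = NULL;
	top->offset = offset;
	top->done = done;
	top->completed = 0;
}

/* Push a lower layer whose offset is the upper offset shifted by base. */
static void
sketch_bio_push(struct sketch_bio *lower, struct sketch_bio *upper,
    int64_t base, sketch_done_t done)
{
	assert(upper != NULL);
	lower->prev = upper;
	lower->offset = upper->offset + base;
	lower->done = done;
	lower->completed = 0;
}

/* Complete from the given layer upward, running each layer's callback. */
static void
sketch_biodone(struct sketch_bio *bio)
{
	for (; bio != NULL; bio = bio->prev) {
		bio->completed = 1;
		if (bio->done != NULL)
			bio->done(bio);
	}
}
```

Each layer keeps its own offset, so after completion the upper layer still holds the caller's offset and the lower layer still holds the translated one.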
Remove DEC Alpha support.
Now that const'ification of users of dktypenames also const'ified most of users of fstypenames, finish the last bit of const'ification.
Make dktypenames const.
Remove PC98 support.
Clean up some misuses of bp->b_dev after a strategy function has completed (the field cannot be used after biodone() has been called). Add a separate dev_t argument to diskerr() to take care of the issue and get rid of some FD error reporting hacks at the same time. Reported-by: David Rhodus
Device layer rollup commit.
* cdevsw_add() is now required. cdevsw_add() and cdevsw_remove() may specify a mask/match indicating the range of supported minor numbers. Multiple cdevsw_add()'s using the same major number, but distinctly different ranges, may be issued. All devices that failed to call cdevsw_add() before now do.
* cdevsw_remove() now automatically marks all devices within its supported range as being destroyed.
* vnode->v_rdev is no longer resolved when the vnode is created. Instead, only v_udev (a newly added field) is resolved. v_rdev is resolved when the vnode is opened and cleared on the last close.
* A great deal of code was making rather dubious assumptions with regards to the validity of devices associated with vnodes, primarily due to the persistence of a device structure due to being indexed by (major, minor) instead of by (cdevsw, major, minor). In particular, if you run a program which connects to a USB device and then you pull the USB device and plug it back in, the vnode subsystem will continue to believe that the device is open when, in fact, it isn't (because it was destroyed and recreated). In particular, note that all the VFS mount procedures now check devices via v_udev instead of v_rdev prior to calling VOP_OPEN(), since v_rdev is NULL prior to the first open.
* The disk layer's device interaction has been rewritten. The disk layer (i.e. the slice and disklabel management layer) no longer overloads its data onto the device structure representing the underlying physical disk. Instead, the disk layer uses the new cdevsw_add() functionality to register its own cdevsw using the underlying device's major number, and simply does NOT register the underlying device's cdevsw. No confusion is created because the device hash is now based on (cdevsw,major,minor) rather than (major,minor). NOTE: This also means that underlying raw disk devices may use the entire device minor number instead of having to reserve the bits used by the disk layer, and also means that we can (theoretically) stack a fully disklabel-supported 'disk' on top of any block device.
* The new reference counting scheme prevents this by associating a device with a cdevsw and disconnecting the device from its cdevsw when the cdevsw is removed. Additionally, all udev2dev() lookups run through the cdevsw mask/match and only successfully find devices still associated with an active cdevsw.
* Major work on MFS: MFS no longer shortcuts vnode and device creation. It now creates a real vnode and a real device and implements real open and close VOPs. Additionally, due to the disk layer changes, MFS is no longer limited to 255 mounts. The new limit is 16 million. Since MFS creates a real device node, mount_mfs will now create a real /dev/mfs<PID> device that can be read from userland (e.g. so you can dump an MFS filesystem).
* BUF AND DEVICE STRATEGY changes. The struct buf contains a b_dev field. In order to properly handle stacked devices we now require that the b_dev field be initialized before the device strategy routine is called. This required some additional work in various VFS implementations. To enforce this requirement, biodone() now sets b_dev to NODEV. The new disk layer will adjust b_dev before forwarding a request to the actual physical device.
* A bug in the ISO CD boot sequence which resulted in a panic has been fixed.
Testing by: lots of people, but David Rhodus found the most egregious bugs.
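The mask/match registration described in the device layer rollup can be sketched as a table lookup: several registrations share a major number, and a minor number belongs to the one whose masked bits equal its match value. The structure and function below are hypothetical and do not reproduce the real cdevsw_add() interface; the `active` flag stands in for the rule that lookups only find devices whose cdevsw is still registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical registration record for one minor-number range. */
struct sketch_reg {
	int major;
	unsigned mask;		/* which minor bits select the range */
	unsigned match;		/* required value of those bits */
	bool active;		/* false once the registration is removed */
};

/*
 * Find the active registration covering (major, minor), or NULL if no
 * active registration claims it.
 */
static const struct sketch_reg *
sketch_lookup(const struct sketch_reg *tab, size_t n, int major, unsigned minor)
{
	assert(tab != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tab[i].active && tab[i].major == major &&
		    (minor & tab[i].mask) == tab[i].match)
			return &tab[i];
	}
	return NULL;
}
```

Marking a registration inactive makes every minor in its range unresolvable at once, which is the behavior the entry relies on when a device is pulled and later recreated.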
Partitions>8: Increase the number of supported partitions from 8 to 16. Decrease the number of supported slices from 32 to 16. Note that the 5.x boot2 code, which we adopted, was being installed just after the old disklabel. This commit moves the boot code install to the next logical sector (aka 4.x) in order to accommodate the larger label. Fix newfs to not hardcode 'h' as the last partition. Also modify 'disklabel' to not complain about preexisting garbage past partition #8, and to detect and refuse to overwrite the old bootcode with the new larger label until after you have installed new boot code.
Fully synchronize sys/boot from FreeBSD-5.x, but add / to the module path so /kernel will be found and loaded instead of /boot/kernel. This will give us all the capabilities of the FreeBSD-5 boot code including AMD64 and ELF64 support. As part of this work, rather than try to adjust ufs/fs.h and friends to get UFS2 info I instead copied the fs.h and friends from FreeBSD-5 into the sys/boot subtree. Additionally, import Peter Wemm's linker set improvements from FreeBSD-5.x. They happen to be compatible with GCC 2.95.x and it allows very few changes to be made to the boot code. Additionally import a number of other elements from FreeBSD-5 including sys/diskmbr.h separation.
__P() != wanted; begin removal. In order to preserve white space this needs to be done by hand, as I accidentally killed a source tree that I had gotten this far on. I'm committing this now; LINT and GENERIC both build with these changes, and there are many more to come.
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids. Most ids have been removed from !lint sections and moved into comment sections.
import from FreeBSD RELENG_4 220.127.116.11