Up to [DragonFly] / src / sys / vfs / isofs / cd9660
Request diff between arbitrary revisions
Keyword substitution: kv
Default branch: MAIN
Miscellanious performance adjustments to the kernel * Add an argument to VOP_BMAP so VFSs can discern the type of operation the BMAP is being done for. * Normalize the variable name denoting the blocksize to 'blksize' in vfs_cluster.c. * Fix a bug in the cluster code where a stale bp->b_error could wind up getting returned when B_ERROR is not set. * Do not B_AGE cluster bufs. * Pass the block size to both cluster_read() and cluster_write() instead of those routines getting the block size from vp->v_mount->mnt_stat.f_iosize. This allows different areas of a file to use a different block size. * Properly initialize bp->b_bio2.bio_offset to doffset in cluster_read(). This fixes an issue where VFSs were making an extra, unnecessary call to BMAP. * Do not recycle vnodes on the free list until numvnodes has reached desiredvnodes. Vnodes were being recycled when their resident page count had dropped to zero, but this is actually too early as the VFS may cache important information in the vnode that would otherwise require a number of I/O's to re-acquire. This mainly helps HAMMER (whos inode lookups are fairly expensive). * Do not VAGE vnodes. * Remove the minvnodes test. There is no reason not to load the vnode cache all the way through to its max. * buf_cmd_t visibility for the new BMAP argument.
Make necessary changes to readdir/getdirentries to support HAMMER. HAMMER needs to use 64 bit directory cookies. Adjust libc's DIR structure and change readdir to acquire the directory position via lseek() instead of using the basep argument to getdirentries(). The basep argument is a long, which is 32 bits on IA32, and it just isn't wide enough. The seek position is 64 bits and is wide enough. Sizeof(DIR) has changed, but hopefully won't cause any major issues since libc is responsible for allocating it. The APIs remain the same. Adjust the VOP_READIR() VFS interface routine to return 64 bit cookies. All VFSs have been reworked, requiring only minor adjustments.
Modify struct vattr: Increase va_nlink, va_fileid (the inode number), and va_gen from 32 bit to 64 bit integers. Add va_uid_uuid, va_gid_uuid, and va_fsid_uuid, and flags to indicate that these fields are valid. The original va_uid and va_gid are retained. This change has no external visibility. Modify struct statvfs: Use spare fields to add f_fsid_uuid and f_uid_uuid to the structure, and flags indicating that those fields are valid. This change has minimal external visibility. The size of the structure has not changed. Modify struct stat: Add a new file type S_IFDB. DB files are like regular files but access data on a record by record basis. The seek position is a 64 bit record key and not a byte offset. Further work in this area will be done later on to support related UIO operations. This change has minimal external visibility. The size of the structure has not changed.
Add vop_stdgetpages() and vop_stdputpages() and replace those filesystem getpages and putpages routines which were doing the same thing.
Remove the vpp (returned underlying device vnode) argument from VOP_BMAP(). VOP_BMAP() may now only be used to determine linearity and clusterability of the blocks underlying a filesystem object. The meaning of the returned block number (other then being contiguous as a means of indicating linearity or clusterability) is now up to the VFS. This removes visibility into the device(s) underlying a filesystem from the rest of the kernel.
Adjust some comments with reality.
Give the device major / minor numbers their own separate 32 bit fields in the kernel. Change dev_ops to use a RB tree to index major device numbers and remove the 256 device major number limitation. Build a dynamic major number assignment feature into dev_ops_add() and adjust ASR (which already had a hand-rolled one), and MFS to use the feature. MFS at least does not require any filesystem visibility to access its backing device. Major devices numbers >= 256 are used for dynamic assignment. Retain filesystem compatibility for device numbers that fall within the range that can be represented in UFS or struct stat (which is a single 32 bit field supporting 8 bit major numbers and 24 bit minor numbers).
Rename printf -> kprintf in sys/ and add some defines where necessary (files which are used in userland, too).
Rename malloc->kmalloc, free->kfree, and realloc->krealloc. Pass 1
VNode sequencing and locking - part 4/4 - subpart 1 of many. Move the vnode lock for VOP_READDIR out of the kernel upper layers and into the filesystem.
VNode sequencing and locking - part 3/4. VNode aliasing is handled by the namecache (aka nullfs), so there is no longer a need to have VOP_LOCK, VOP_UNLOCK, or VOP_ISSLOCKED as 'VOP' functions. Both NFS and DEADFS have been using standard locking functions for some time and are no longer special cases. Replace all uses with native calls to vn_lock, vn_unlock, and vn_islocked. We can't have these as VOP functions anyhow because of the introduction of the new SYSLINK transport layer, since vnode locks are primarily used to protect the local vnode structure itself.
Introduce sys/syslink.h, the beginnings of a VOP-compatible RPC-like communications infrastructure that will be used for userland VFS and communications between hosts in a cluster. Begin merging the vnode operations vector code with syslink by replacing vnodeop_desc with syslink_desc. Also get rid of a lot of junk related to vnodeop_desc that is no longer used.
Remove several layers in the vnode operations vector init code. Declare the operations vector directly instead of via a descriptor array. Remove most of the recalculation code, it stopped being needed over a year ago. This work is similar to what FreeBSD now does, but was developed along a different line. Ultimately our vop_ops will become SYSLINK ops for userland VFS and clustering support.
Use the MP friendly objcache instead of zalloc to allocate temporary MAXPATHLEN space.
* Fix a number of cases where too much kernel memory might be allocated to satisfy a directory read operation. * Calculate a minimum of (1) allocated directory cookie and limit the maximum to 1024. * Rewrite ufs_readdir() (part 1/2) to use the buffer cache instead of allocating a kernel buffer and to do better validation of the scanned directory entries. * Use a simpler fix for EXT2FS. Reported-by: [NetBSD.org #7471]
The thread/proc pointer argument in the VFS subsystem originally existed for... well, I'm not sure *WHY* it originally existed when most of the time the pointer couldn't be anything other then curthread or curproc or the code wouldn't work. This is particularly true of lockmgr locks. Remove the pointer argument from all VOP_*() functions, all fileops functions, and most ioctl functions.
Due to continuing issues with VOP_READ/VOP_WRITE ops being called without a VOP_OPEN, particularly by NFS, redo the way VM objects are associated with vnodes. * The size of the object is now passed to vinitvmio(). vinitvmio() no longer calls VOP_GETATTR(). * Instead of trying to call vinitvmio() conditionally in various places, we now call it unconditionally when a vnode is instantiated if the filesystem at any time in the future intends to use the buffer cache to access that vnode's dataspace. * Specfs 'disk' devices are an exception. Since we cannot safely do I/O on such vnodes if they have not been VOP_OPEN()'ed anyhow, the VM objects for those vnodes are still only associated on open. The performance impact is limited to the case where large numbers of vnodes are being created and destroyed. This case only occurs when a large directory topology (number of files > kernel's vnode cache) is traversed and all related inodes are cached by the system. Being a pure-cpu case the slight loss of performance due to the VM object allocations is not really a big dael.
Clone cd9660_blkatoff() into a new procedure, cd9660_devblkatoff(), which returns a devvp-relative buffer rather then the vp-relative buffer. This allows us to access meta-data relative to a vnode without having to instantiate a VM object for that vnode. The new function is used for all directory scans and (negative offset) meta-data access. This fixes a panic due to recent buffer cache commits that formalized the requirements for using the buffer cache. Also, prior to this change, the CD9660 filesystem was using B_MALLOC buffers for a great deal of meta-data access that could very easily have been backed by the device vnode's VM object instead. B_MALLOC buffers have severe caching limitations. This commit fixes all of that as well.
Use the vnode v_opencount and v_writecount universally. They were previously only used by specfs. Require that VOP_OPEN and VOP_CLOSE calls match. Assert on boundary errors. Clean up umount's FORCECLOSE mode. Adjust deadfs to allow duplicate closes (which can happen due to a forced unmount or revoke). Add vop_stdopen() and vop_stdclose() and adjust the default vnode ops to call them. All VFSs except DEADFS which supply their own vop_open and vop_close now call vop_stdopen() and vop_stdclose() to handle v_opencount and v_writecount adjustments. Change the VOP_OPEN/fp specs. VOP_OPEN (aka vop_stdopen) is now responsible for filling in the file pointer information, rather than the caller of VOP_OPEN. Additionally, when supplied a file pointer, VOP_OPEN is now allowed to populate the file pointer with a different vnode then the one passed to it, which will be used later on to allow filesystems which synthesize different vnodes on open, for example so we can create a generic tty/pty pairing devices rather than scanning for an unused pty, and so we can create swap-backed generic anonymous file descriptors rather than having to use /tmp. And for other purposes as well. Fix UFS's mount/remount/unmount code to make the proper VOP_OPEN and VOP_CLOSE calls when a filesystem is remounted read-only or read-write.
Remove VOP_GETVOBJECT, VOP_DESTROYVOBJECT, and VOP_CREATEVOBJECT. Rearrange the VFS code such that VOP_OPEN is now responsible for associating a VM object with a vnode. Add the vinitvmio() helper routine.
Major BUF/BIO work commit. Make I/O BIO-centric and specify the disk or file location with a 64 bit offset instead of a 32 bit block number. * All I/O is now BIO-centric instead of BUF-centric. * File/Disk addresses universally use a 64 bit bio_offset now. bio_blkno no longer exists. * Stackable BIO's hold disk offset translations. Translations are no longer overloaded onto a single structure (BUF or BIO). * bio_offset == NOOFFSET is now universally used to indicate that a translation has not been made. The old (blkno == lblkno) junk has all been removed. * There is no longer a distinction between logical I/O and physical I/O. * All driver BUFQs have been converted to BIOQs. * BMAP, FREEBLKS, getblk, bread, breadn, bwrite, inmem, cluster_*, and findblk all now take and/or return 64 bit byte offsets instead of block numbers. Note that BMAP now returns a byte range for the before and after variables.
Make the entire BUF/BIO system BIO-centric instead of BUF-centric. Vnode and device strategy routines now take a BIO and must pass that BIO to biodone(). All code which previously managed a BUF undergoing I/O now manages a BIO. The new BIO-centric algorithms allow BIOs to be stacked, where each layer represents a block translation, completion callback, or caller or device private data. This information is no longer overloaded within the BUF. Translation layer linkages remain intact as a 'cache' after I/O has completed. The VOP and DEV strategy routines no longer make assumptions as to which translated block number applies to them. The use the block number in the BIO specifically passed to them. Change the 'untranslated' constant to NOOFFSET (for bio_offset), and (daddr_t)-1 (for bio_blkno). Rip out all code that previously set the translated block number to the untranslated block number to indicate that the translation had not been made. Rip out all the cluster linkage fields for clustered VFS and clustered paging operations. Clustering now occurs in a private BIO layer using private fields within the BIO. Reformulate the vn_strategy() and dev_dstrategy() abstraction(s). These routines no longer assume that bp->b_vp == the vp of the VOP operation, and the dev_t is no longer stored in the struct buf. Instead, only the vp passed to vn_strategy() (and related *_strategy() routines for VFS ops), and the dev_t passed to dev_dstrateg() (and related *_strategy() routines for device ops) is used by the VFS or DEV code. This will allow an arbitrary number of translation layers in the future. Create an independant per-BIO tracking entity, struct bio_track, which is used to determine when I/O is in-progress on the associated device or vnode. NOTE: Unlike FreeBSD's BIO work, our struct BUF is still used to hold the fields describing the data buffer, resid, and error state. Major-testing-by: Stefan Krueger
* Remove (void) casts for discarded return values. * Put function types on separate lines. * Ansify function definitions. In-collaboration-with: Alexey Slynko <firstname.lastname@example.org>
Rename all the functions and structures for the old VOP namespace API functions from vop_* to vop_old_*. e.g. vop_lookup -> vop_old_lookup. This will make it easier to identify areas containing old VOP API code. Remove vop_old_*_ap() functions, they are not used (and not allowed to be used). The old API is only allowed at the leaf of a VFS stack.
Make struct dirent contain a full 64bit inode. Allow more than 255 byte filenames by increasing d_namlen to 16bit. Remove UFS specific macros from sys/dirent.h, programs which really need them should include vfs/ufs/dir.h. MAXNAMLEN should not be used, but replaced by NAME_MAX. To keep the impact for older BSD code small, d_ino and d_fileno are kept in the old meaning when __BSD_VISIBLE is defined, otherwise the POSIX version d_ino is used. This will be changed later to always define only d_ino and make d_fileno a compatiblity macro for __BSD_VISIBLE. d_name is left with hard-coded 256 byte space, this will be changed at some point in the future and doesn't affect the ABI. Programs should correctly allocate space themselve, since the maximum directory entry length can be > 256 byte. For allocating dirents (e.g. for readdir_r), _DIRENT_RECLEN and _DIRENT_DIRSIZ should be used. NetBSD has choosen the same names. Revamp the compatibility code to always use a local kernel buffer and write out the entries. This will be changed later by passing down the output function to vop_readdir, elimininating the redundant copy. Change NFS and CD9660 to use to use vop_write_dirent, for CD9660 ensure that the buffers are big enough by prepending char arrays of the right size. Tested-by & discussed-with: dillon
Make nlink_t 32bit and ino_t 64bit. Implement the old syscall numbers for *stat by wrapping the new syscalls and truncation of the values. Add a hack for boot2 to keep ino_t 32bit, otherwise we would have to link the 64bit math code in and that would most likely overflow boot2. Bump libc major to annotate changed ABI and work around a problem with strip during installworld. strip is dynamically linked and doesn't play well with the new libc otherwise. Support for 64bit inode numbers is still incomplete, because the dirent limited to 32bit. The checks for nlink_t have to be redone too.
Introduce vnodepv_entry_t as type for the vnodeopv_entry functions. This is slightly better than casting all the functions to void *, which is a data pointer.
VFS messaging/interfacing work stage 9/99: VFS 'NEW' API WORK. NOTE: unionfs and nullfs are temporarily broken by this commit. * Remove the old namecache API. Remove vfs_cache_lookup(), cache_lookup(), cache_enter(), namei() and lookup() are all gone. VOP_LOOKUP() and VOP_CACHEDLOOKUP() have been collapsed into a single non-caching VOP_LOOKUP(). * Complete the new VFS CACHE (namecache) API. The new API is able to supply topological guarentees and is able to reserve namespaces, including negative cache spaces (whether the target name exists or not), which the new API uses to reserve namespace for things like NRENAME and NCREATE (and others). * Complete the new namecache API. VOP_NRESOLVE, NLOOKUPDOTDOT, NCREATE, NMKDIR, NMKNOD, NLINK, NSYMLINK, NWHITEOUT, NRENAME, NRMDIR, NREMOVE. These new calls take (typicaly locked) namecache pointers rather then combinations of directory vnodes, file vnodes, and name components. The new calls are *MUCH* simpler in concept and implementation. For example, VOP_RENAME() has 8 arguments while VOP_NRENAME() has only 3 arguments. The new namecache API uses the namecache to lock namespaces without having to lock the underlying vnodes. For example, this allows the kernel to reserve the target name of a create function trivially. Namecache records are maintained BY THE KERNEL for both positive and negative hits. Generally speaking, the kernel layer is now responsible for resolving path elements. NRESOLVE is called when an unresolved namecache record needs to be resolved. Unlike the old VOP_LOOKUP, NRESOLVE is simply responsible for associating a vnode to a namecache record (positive hit) or telling the system that it's a negative hit, and not responsible for handling symlinks or other special cases or doing any of the other path lookup work, much unlike the old VOP_LOOKUP. It should be particularly noted that the new namecache topology does not allow disconnected namecache records. In rare cases where a vnode must be converted to a namecache pointer for new API operation via a file handle (i.e. NFS), the cache_fromdvp() function is provided and a new API VOP, VOP_NLOOKUPDOTDOT() is provided to allow the namecache to resolve the topology leading up to the requested vnode. These and other topological guarentees greatly reduce the complexity of the new namecache API. The new namei() is called nlookup(). This function uses a combination of cache_n*() calls, VOP_NRESOLVE(), and standard VOP calls resolve the supplied path, deal with symlinks, and so forth, in a nice small compact compartmentalized procedure. * The old VFS code is no longer responsible for maintaining namecache records, a function which was mostly adhoc cache_purge()s occuring before the VFS actually knows whether an operation will succeed or not. The new VFS code is typically responsible for adjusting the state of locked namecache records passed into it. For example, if NCREATE succeeds it must call cache_setvp() to associate the passed namecache record with the vnode representing the successfully created file. The new requirements are much less complex then the old requirements. * Most VFSs still implement the old API calls, albeit somewhat modified and in particular the VOP_LOOKUP function is now *MUCH* simpler. However, the kernel now uses the new API calls almost exclusively and relies on compatibility code installed in the default ops (vop_compat_*()) to convert the new calls to the old calls. * All kernel system calls and related support functions which used to do complex and confusing namei() operations now do far less complex and far less confusing nlookup() operations. * SPECOPS shortcutting has been implemented. User reads and writes now go directly to supporting functions which talk to the device via fileops rather then having to be routed through VOP_READ or VOP_WRITE, saving significant overhead. Note, however, that these only really effect /dev/null and /dev/zero. Implementing this was fairly easy, we now simply pass an optional struct file pointer to VOP_OPEN() and let spec_open() handle the override. SPECIAL NOTES: It should be noted that we must still lock a directory vnode LK_EXCLUSIVE before issuing a VOP_LOOKUP(), even for simple lookups, because a number of VFS's (including UFS) store active directory scanning information in the directory vnode. The legacy NAMEI_LOOKUP cases can be changed to use LK_SHARED once these VFS cases are fixed. In particular, we are now organized well enough to actually be able to do record locking within a directory for handling NCREATE, NDELETE, and NRENAME situations, but it hasn't been done yet. Many thanks to all of the testers and in particular David Rhodus for finding a large number of panics and other issues.
The inode size must be unsigned-extended to a quad, not sign-extended, so the size of files in the 2-4GB range on DVDs are properly reported. Taken-from: Jun Kuriyama <email@example.com> post on freebsd-current@
VFS messaging/interfacing work stage 2/99. This stage retools the vnode ops vector dispatch, making the vop_ops a per-mount structure rather then a per-filesystem structure. Filesystem mount code, typically in blah_vfsops.c, must now register various vop_ops pointers in the struct mount to compile its VOP operations set. This change will allow us to begin adding per-mount hooks to VFSes to support things like kernel-level journaling, various forms of cache coherency management, and so forth. In addition, the vop_*() calls now require a struct vop_ops pointer as the first argument instead of a vnode pointer (note: in this commit the VOP_*() macros currently just pull the vop_ops pointer from the vnode in order to call the vop_*() procedures). This change is intended to allow us to divorce ourselves from the requirement that a vnode pointer always be part of a VOP call. In particular, this will allow namespace based routines such as remove(), mkdir(), stat(), and so forth to pass namecache pointers rather then locked vnodes and is a very important precursor to the goal of using the namecache for namespace locking.
VFS messaging/interfacing work stage 1/99. This stage replaces the old dynamic VFS descriptor and inlined wrapper mess with a fixed structure and fixed procedural wrappers. Most of the work is straightforward except for vfs_init, which was basically rewritten (and greatly simplified). It is my intention to make the vop_*() call wrappers eventually handle range locking and cache coherency issues as well as implementing the direct call -> messaging interface layer. The call wrappers will also API translation as we shift the APIs over to new, more powerful mechanisms in order to allow the work to be incrementally committed. This is the first stage of what is likely to be a huge number of stages to modernize the VFS subsystem.
Implement advisory locking support for the cd9660 filesystem.
Style(9) cleanup to src/sys/vfs, stage 7/21: isofs. - Convert K&R-style function definitions to ANSI style. Submitted-by: Andre Nathan <firstname.lastname@example.org> Additional-reformatting-by: cpressey
__P()!=wanted, remove old style prototypes from the vfs subtree
kernel tree reorganization stage 1: Major cvs repository work (not logged as commits) plus a major reworking of the #include's to accomodate the relocations. * CVS repository files manually moved. Old directories left intact and empty (temporary). * Reorganize all filesystems into vfs/, most devices into dev/, sub-divide devices by function. * Begin to move device-specific architecture files to the device subdirs rather then throwing them all into, e.g. i386/include * Reorganize files related to system busses, placing the related code in a new bus/ directory. Also move cam to bus/cam though this may not have been the best idea in retrospect. * Reorganize emulation code and place it in a new emulation/ directory. * Remove the -I- compiler option in order to allow #include file localization, rename all config generated X.h files to use_X.h to clean up the conflicts. * Remove /usr/src/include (or /usr/include) dependancies during the kernel build, beyond what is normally needed to compile helper programs. * Make config create 'machine' softlinks for architecture specific directories outside of the standard <arch>/include. * Bump the config rev. WARNING! after this commit /usr/include and /usr/src/sys/compile/* should be regenerated from scratch.
Register keyword removal Approved by: Matt Dillon
proc->thread stage 5: BUF/VFS clearance! Remove the ucred argument from vop_close, vop_getattr, vop_fsync, and vop_createvobject. These VOPs can be called from multiple contexts so the cred is fairly useless, and UFS ignorse it anyway. For filesystems (like NFS) that sometimes need a cred we use proc0.p_ucred for now. This removal also removed the need for a 'proc' reference in the related VFS procedures, which greatly helps our proc->thread conversion. bp->b_wcred and bp->b_rcred have also been removed, and for the same reason. It makes no sense to have a particular cred when multiple users can access a file. This may create issues with certain types of NFS mounts but if it does we will solve them in a way that doesn't pollute the struct buf.
proc->thread stage 4: rework the VFS and DEVICE subsystems to take thread pointers instead of process pointers as arguments, similar to what FreeBSD-5 did. Note however that ultimately both APIs are going to be message-passing which means the current thread context will not be useable for creds and descriptor access.
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids. Most ids have been removed from !lint sections and moved into comment sections.
import from FreeBSD RELENG_4 1.62