DragonFly BSD

CVS log for src/sys/vfs/mfs/mfs_vfsops.c


Keyword substitution: kv
Default branch: MAIN


Revision 1.41
Sat Jul 26 22:31:54 2008 UTC (6 years, 3 months ago) by mneumann
Branches: MAIN
CVS tags: HEAD
Changes since revision 1.40: +1 -1 lines
Fix style(9).

Revision 1.40
Wed May 9 00:53:35 2007 UTC (7 years, 6 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_2_0_Slip, DragonFly_RELEASE_2_0, DragonFly_RELEASE_1_12_Slip, DragonFly_RELEASE_1_12, DragonFly_RELEASE_1_10_Slip, DragonFly_RELEASE_1_10, DragonFly_Preview
Changes since revision 1.39: +2 -4 lines
Give the device major / minor numbers their own separate 32 bit fields
in the kernel.  Change dev_ops to use a RB tree to index major device
numbers and remove the 256 device major number limitation.

Build a dynamic major number assignment feature into dev_ops_add() and
adjust ASR (which already had a hand-rolled one), and MFS to use the
feature.  MFS at least does not require any filesystem visibility to
access its backing device.  Major device numbers >= 256 are used for
dynamic assignment.

Retain filesystem compatibility for device numbers that fall within the
range that can be represented in UFS or struct stat (which is a single
32 bit field supporting 8 bit major numbers and 24 bit minor numbers).
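
As a rough illustration of the compatibility rule above, the sketch below
packs a major/minor pair into a single 32 bit value only when it fits.  The
bit layout (major in bits 8-15, minor in the remaining 24 bits) follows the
traditional BSD major()/minor() macros and is an assumption here, as are
all of the legacy_* names.  This is not the kernel code from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed legacy layout: major in bits 8-15, minor in bits 0-7 and 16-31. */
#define LEGACY_MINOR_MASK	0xffff00ffu

/* True if the pair can be represented in the single 32 bit field. */
static bool
legacy_dev_fits(uint32_t major, uint32_t minor)
{
	return (major < 256 && (minor & ~LEGACY_MINOR_MASK) == 0);
}

/* Pack a pair; callers are expected to check legacy_dev_fits() first. */
static uint32_t
legacy_makedev(uint32_t major, uint32_t minor)
{
	return ((major << 8) | minor);
}

/* Recover the 8 bit major number. */
static uint32_t
legacy_major(uint32_t dev)
{
	return ((dev >> 8) & 0xff);
}

/* Recover the 24 bit minor number. */
static uint32_t
legacy_minor(uint32_t dev)
{
	return (dev & LEGACY_MINOR_MASK);
}
```

A dynamically assigned major (>= 256) fails legacy_dev_fits(), which matches
the statement that such devices need no filesystem visibility.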

Revision 1.39
Sun May 6 19:23:34 2007 UTC (7 years, 6 months ago) by dillon
Branches: MAIN
Changes since revision 1.38: +1 -1 lines
Use SYSREF to reference count struct vnode.  v_usecount is now
v_sysref(.refcnt).  v_holdcnt is now v_auxrefs.  SYSREF's termination state
(using a negative reference count from -0x40000000+) now places the vnode in
a VCACHED or VFREE state and deactivates it.  The vnode is now assigned a
64 bit unique id via SYSREF.

vhold() (which manipulates v_auxrefs) no longer reactivates a vnode and
is explicitly used only to track references from auxiliary structures
and references to prevent premature destruction of the vnode.  vdrop()
will now only move a vnode from VCACHED to VFREE on the 1->0 transition
of v_auxrefs if the vnode is in a termination state.

vref() will now panic if used on a vnode in a termination state.  vget()
must now be used to explicitly reactivate a vnode.  These requirements
existed before but are now explicitly asserted.

vlrureclaim() and allocvnode() should now interact a bit better.  In
particular, vlrureclaim() will do a better job of finding vnodes to flush
and transition from VCACHED to VFREE, and allocvnode() will do a better
job finding vnodes to reuse without getting blocked by a flush.

allocvnode now uses a real VX lock to sequence vnodes into VRECLAIMED.  All
vnode special state processing now uses a VX lock.

Vnodes are now able to be slowly returned to the memory pool when
kern.maxvnodes is reduced at run time.

Various initialization elements have been moved to CTOR/DTOR and are
no longer in the critical path, improving performance.  However, since
SYSREF uses atomic_cmpset_int() (aka cmpxchgl), which reduces performance
somewhat, overall performance tends to be about the same.
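
The reference counting scheme described above can be sketched in userland
with C11 atomics.  This is an assumption-laden illustration, not the SYSREF
API: the sketch_* names are made up, atomic_compare_exchange_weak() stands
in for atomic_cmpset_int(), and the only behaviour modelled is that the 1->0
drop parks the count at -0x40000000 and that a plain "ref" refuses to
reactivate an object in that state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_TERM_BASE	(-0x40000000)	/* termination state marker */

struct sketch_ref {
	atomic_int refcnt;
};

/* Take a reference; fails if the object is in a termination state. */
static bool
sketch_ref_get(struct sketch_ref *r)
{
	int old = atomic_load(&r->refcnt);

	while (old > 0) {
		/* On failure 'old' is reloaded and the test is repeated. */
		if (atomic_compare_exchange_weak(&r->refcnt, &old, old + 1))
			return (true);
	}
	return (false);
}

/* Drop a reference; returns true if this was the 1->0 transition. */
static bool
sketch_ref_put(struct sketch_ref *r)
{
	int old = atomic_load(&r->refcnt);

	for (;;) {
		assert(old > 0);	/* dropping an unheld ref is a bug */
		int next = (old == 1) ? SKETCH_TERM_BASE : old - 1;
		if (atomic_compare_exchange_weak(&r->refcnt, &old, next))
			return (old == 1);
	}
}
```

The compare-and-swap loop is the cost the commit message refers to: every
get and put becomes a locked read-modify-write instead of a plain increment.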

Revision 1.38
Sun Feb 25 23:17:13 2007 UTC (7 years, 8 months ago) by corecode
Branches: MAIN
Changes since revision 1.37: +1 -0 lines
Get rid of struct user/UAREA.

Merge procsig with sigacts and replace usage of procsig with
sigacts, like it used to be in 4.4BSD.

Put signal-related inline functions in sys/signal2.h.

Reviewed-by: Thomas E. Spanjaard <tgen@netphreax.net>

Revision 1.37
Wed Feb 21 15:46:48 2007 UTC (7 years, 9 months ago) by corecode
Branches: MAIN
Changes since revision 1.36: +1 -1 lines
1:1 Userland threading stage 2.20/4:

Unify access to pending signals with a new function, lwp_sigpend(), which
returns pending signals for the lwp, which includes both lwp-specific
signals and signals pending on the process.  The new function lwp_delsig()
is used to remove a certain signal from the pending set of both process and
lwp.

Rework the places which access the pending signal list to either use those
two functions or, where not possible, to work on both lwp and proc signal
lists.

Revision 1.36
Sat Feb 3 17:05:59 2007 UTC (7 years, 9 months ago) by corecode
Branches: MAIN
Changes since revision 1.35: +1 -1 lines
1:1 Userland threading stage 2.11/4:

Move signals into lwps, take p_lwp out of proc.

Originally-Submitted-by:  David Xu <davidxu@freebsd.org>
Reviewed-by: Thomas E. Spanjaard <tgen@netphreax.net>

Revision 1.35
Sun Sep 10 01:26:41 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_8_Slip, DragonFly_RELEASE_1_8
Changes since revision 1.34: +3 -3 lines
Change the kernel dev_t, representing a pointer to a specinfo structure,
to cdev_t.  Change struct specinfo to struct cdev.  The name 'cdev' was taken
from FreeBSD.  Remove the dev_t shim for the kernel.

This commit generally removes the overloading of 'dev_t' between userland and
the kernel.

Also fix a bug in libkvm where a kernel dev_t (now cdev_t) was not being
properly converted to a userland dev_t.

Revision 1.34
Fri Jul 28 02:17:41 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Changes since revision 1.33: +21 -25 lines
MASSIVE reorganization of the device operations vector.  Change cdevsw
to dev_ops.  dev_ops is a syslink-compatible operations vector structure
similar to the vop_ops structure used by vnodes.

Remove a huge number of instances where a thread pointer is still being
passed as an argument to various device ops and other related routines.
The device OPEN and IOCTL calls now take a ucred instead of a thread pointer,
and the CLOSE call no longer takes a thread pointer.

Revision 1.33
Tue Jul 18 22:22:15 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Changes since revision 1.32: +2 -3 lines
Remove several layers in the vnode operations vector init code.  Declare
the operations vector directly instead of via a descriptor array.  Remove
most of the recalculation code, it stopped being needed over a year ago.

This work is similar to what FreeBSD now does, but was developed along a
different line.  Ultimately our vop_ops will become SYSLINK ops for userland
VFS and clustering support.

Revision 1.32
Thu May 11 08:23:20 2006 UTC (8 years, 6 months ago) by swildner
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_6_Slip, DragonFly_RELEASE_1_6
Changes since revision 1.31: +0 -2 lines
* Remove the following obsolete options from the system:

  AAC_COMPAT_LINUX
  ACPI_MAX_THREADS
  AVM_A1_PCI
  CD9660_ROOT
  CPU_UPGRADE_HW_CACHE
  DEBUG_LINUX
  DEBUG_TOKENS
  DPT_ALLOW_MEMIO
  IDE_DELAY
  INVARIANT_SUPPORT
  KERNFS
  MFS_ROOT
  MFS_ROOT_SIZE
  NTIMECOUNTER
  OLTR_NO_BULLSEYE_MAC
  OLTR_NO_HAWKEYE_MAC
  OLTR_NO_TMS_MAC
  UGEN_DEBUG
  UHCI_DEBUG
  UHID_DEBUG
  UHUB_DEBUG
  UKBD_DEBUG
  ULPT_DEBUG
  UMASS_DEBUG
  UMS_DEBUG
  VM_KMEM_SIZE
  VM_KMEM_SIZE_MAX
  VM_KMEM_SIZE_SCALE

* Add numerous options to LINT

* Fix typo in options: TWA_FLASH_FIREWARE -> TWA_FLASH_FIRMWARE

* Fix typo in dgb.c: opt_depricated.h -> opt_deprecated.h

* Fix some minor manpage issues

Revision 1.31
Sat May 6 18:48:53 2006 UTC (8 years, 6 months ago) by dillon
Branches: MAIN
Changes since revision 1.30: +14 -11 lines
Remove the thread argument from all mount->vfs_* function vectors,
replacing it with a ucred pointer when applicable.  This cleans up a
considerable amount of VFS function code that previously delved into
the process structure to get the cred, though some code remains.

Get rid of the compatibility thread argument for hpfs and nwfs.  Our
lockmgr calls are now mostly compatible with NetBSD (which doesn't use a
thread argument either).

Get rid of some complex junk in fdesc_statfs() that nobody uses.

Remove the thread argument from dounmount() as well as various other
filesystem specific procedures (quota calls primarily) which no longer
need it due to the lockmgr, VOP, and VFS cleanups.  These cleanups also
have the effect of making the VFS code slightly less dependent on the
calling thread's context.

Revision 1.30
Sat May 6 16:20:18 2006 UTC (8 years, 6 months ago) by dillon
Branches: MAIN
Changes since revision 1.29: +2 -2 lines
Remove the thread argument from ffs_flushfiles(), ffs_mountfs(),
softdep_flushfiles(), ffs_reload(), ufs_quotaon(), and ufs_quotaoff().

Revision 1.29
Thu May 4 18:32:23 2006 UTC (8 years, 6 months ago) by dillon
Branches: MAIN
Changes since revision 1.28: +37 -23 lines
Block devices generally truncate the size of I/O requests which go past EOF.
This is exactly what we want when manually reading or writing a block device
such as /dev/ad0s1a, but is not desired when a VFS issues I/O ops on
filesystem buffers.  In such cases, any EOF condition must be considered an
error.

Implement a new filesystem buffer flag B_BNOCLIP, which getblk() and friends
automatically set.  If set, block devices are guaranteed to return an error
if the I/O request is at EOF or would otherwise have to be clipped to EOF.
Block devices further guarantee that b_bcount will not be modified when this
flag is set.

Adjust all block device EOF checks to use the new flag, and clean up the code
while I'm there.  Also, set b_resid in a couple of degenerate cases where
it was not being set.
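
The EOF rule above can be sketched as a small bounds check.  Everything in
the sketch is hypothetical: the sk_* names, the flag value, the choice of
EINVAL as the error, and the convention that resid records the bytes that
will not be transferred.  It only mirrors the two behaviours the message
describes (clip for raw access, fail without touching the count when the
no-clip flag is set).

```c
#include <errno.h>
#include <stdint.h>

#define SK_NOCLIP	0x01	/* stand-in for B_BNOCLIP */

struct sk_req {
	int64_t offset;		/* byte offset on the device */
	int64_t bcount;		/* bytes requested */
	int64_t resid;		/* bytes that will not be transferred */
	int flags;
};

/* Returns 0 if the (possibly clipped) request may proceed, else an errno. */
static int
sk_check_eof(struct sk_req *req, int64_t devsize)
{
	int64_t end = req->offset + req->bcount;

	if (req->offset >= 0 && end <= devsize) {
		req->resid = 0;		/* entirely inside the device */
		return (0);
	}
	if ((req->flags & SK_NOCLIP) || req->offset < 0 ||
	    req->offset > devsize) {
		req->resid = req->bcount;	/* bcount is left untouched */
		return (EINVAL);
	}
	/* Raw access: shrink the request; at exact EOF it becomes empty. */
	req->resid = end - devsize;
	req->bcount = devsize - req->offset;
	return (0);
}
```

A 20 byte request at offset 90 of a 100 byte device is shortened to 10 bytes
for raw access, while the same request with SK_NOCLIP set fails outright.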

Revision 1.28
Sun Apr 2 01:35:34 2006 UTC (8 years, 7 months ago) by dillon
Branches: MAIN
Changes since revision 1.27: +21 -1 lines
Give the MFS pseudo block device vnode a VM object, as is now required
for buffer cache operations.  Do not try to optimize it for now
(i.e. MFS will still double-cache everything).

Revision 1.27
Sat Apr 1 20:46:53 2006 UTC (8 years, 7 months ago) by dillon
Branches: MAIN
Changes since revision 1.26: +3 -1 lines
Use the vnode v_opencount and v_writecount universally.  They were previously
only used by specfs.  Require that VOP_OPEN and VOP_CLOSE calls match.
Assert on boundary errors.

Clean up umount's FORCECLOSE mode.  Adjust deadfs to allow duplicate closes
(which can happen due to a forced unmount or revoke).

Add vop_stdopen() and vop_stdclose() and adjust the default vnode ops to
call them.  All VFSs except DEADFS which supply their own vop_open and
vop_close now call vop_stdopen() and vop_stdclose() to handle v_opencount
and v_writecount adjustments.

Change the VOP_OPEN/fp specs.  VOP_OPEN (aka vop_stdopen) is now responsible
for filling in the file pointer information, rather than the caller of
VOP_OPEN.  Additionally, when supplied a file pointer, VOP_OPEN is now
allowed to populate the file pointer with a different vnode than the one
passed to it.  This will be used later on to support filesystems which
synthesize different vnodes on open, for example so we can create generic
tty/pty pairing devices rather than scanning for an unused pty, and so we
can create swap-backed generic anonymous file descriptors rather than having
to use /tmp.  And for other purposes as well.

Fix UFS's mount/remount/unmount code to make the proper VOP_OPEN and
VOP_CLOSE calls when a filesystem is remounted read-only or read-write.
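
The open/close accounting described in this commit might look roughly like
the following.  The oc_* names and the OC_FWRITE flag value are invented for
the sketch, and the real vop_stdopen()/vop_stdclose() take VOP argument
structures rather than these simplified parameters.

```c
#include <assert.h>

#define OC_FWRITE	0x02	/* hypothetical "opened for writing" flag */

struct oc_vnode {
	int v_opencount;	/* number of outstanding opens */
	int v_writecount;	/* how many of those are writers */
};

/* Account for one open. */
static void
oc_stdopen(struct oc_vnode *vp, int mode)
{
	if (mode & OC_FWRITE)
		++vp->v_writecount;
	++vp->v_opencount;
}

/* Account for the matching close; assert on boundary errors. */
static void
oc_stdclose(struct oc_vnode *vp, int mode)
{
	assert(vp->v_opencount > 0);
	if (mode & OC_FWRITE) {
		assert(vp->v_writecount > 0);
		--vp->v_writecount;
	}
	--vp->v_opencount;
}
```

Because every close must name the same mode as its open, a remount from
read-write to read-only has to close the writer and reopen as a reader,
which is the UFS fix mentioned above.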

Revision 1.26
Fri Mar 24 18:35:34 2006 UTC (8 years, 8 months ago) by dillon
Branches: MAIN
Changes since revision 1.25: +8 -3 lines
Major BUF/BIO work commit.  Make I/O BIO-centric and specify the disk or
file location with a 64 bit offset instead of a 32 bit block number.

* All I/O is now BIO-centric instead of BUF-centric.

* File/Disk addresses universally use a 64 bit bio_offset now.  bio_blkno
  no longer exists.

* Stackable BIO's hold disk offset translations.  Translations are no longer
  overloaded onto a single structure (BUF or BIO).

* bio_offset == NOOFFSET is now universally used to indicate that a
  translation has not been made.  The old (blkno == lblkno) junk has all
  been removed.

* There is no longer a distinction between logical I/O and physical I/O.

* All driver BUFQs have been converted to BIOQs.

* BMAP, FREEBLKS, getblk, bread, breadn, bwrite, inmem, cluster_*,
  and findblk all now take and/or return 64 bit byte offsets instead
  of block numbers.  Note that BMAP now returns a byte range for the before
  and after variables.
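
To make the offset change concrete, here is a small sketch of the
conversion and the "not yet translated" marker.  The off_* names, the 512
byte sector size, and the value chosen for the marker are assumptions for
illustration.  The commit itself only states that bio_offset is 64 bits and
that NOOFFSET means no translation has been made.

```c
#include <stdint.h>

#define OFF_SECTOR_SIZE	512		/* assumed device block size */
#define OFF_NOOFFSET	((int64_t)-1)	/* stand-in for NOOFFSET */

struct off_bio {
	int64_t bio_offset;	/* byte offset, or OFF_NOOFFSET */
};

/* A fresh BIO layer starts out untranslated. */
static void
off_bio_init(struct off_bio *bio)
{
	bio->bio_offset = OFF_NOOFFSET;
}

/* Replaces the old (blkno == lblkno) style of test. */
static int
off_bio_translated(const struct off_bio *bio)
{
	return (bio->bio_offset != OFF_NOOFFSET);
}

/* Convert an old-style non-negative 32 bit block number to a byte offset. */
static int64_t
off_from_blkno(int32_t blkno)
{
	return ((int64_t)blkno * OFF_SECTOR_SIZE);
}
```

With 512 byte sectors the largest signed 32 bit block number maps to an
offset just under 1 TiB, which is the ceiling a byte offset removes.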

Revision 1.25
Fri Feb 17 19:18:07 2006 UTC (8 years, 9 months ago) by dillon
Branches: MAIN
Changes since revision 1.24: +15 -15 lines
Make the entire BUF/BIO system BIO-centric instead of BUF-centric.  Vnode
and device strategy routines now take a BIO and must pass that BIO to
biodone().  All code which previously managed a BUF undergoing I/O now
manages a BIO.

The new BIO-centric algorithms allow BIOs to be stacked, where each layer
represents a block translation, completion callback, or caller or device
private data.  This information is no longer overloaded within the BUF.
Translation layer linkages remain intact as a 'cache' after I/O has completed.

The VOP and DEV strategy routines no longer make assumptions as to which
translated block number applies to them.  They use the block number in the
BIO specifically passed to them.

Change the 'untranslated' constant to NOOFFSET (for bio_offset), and
(daddr_t)-1 (for bio_blkno).  Rip out all code that previously set the
translated block number to the untranslated block number to indicate
that the translation had not been made.

Rip out all the cluster linkage fields for clustered VFS and clustered
paging operations.  Clustering now occurs in a private BIO layer using
private fields within the BIO.

Reformulate the vn_strategy() and dev_dstrategy() abstraction(s).  These
routines no longer assume that bp->b_vp == the vp of the VOP operation, and
the dev_t is no longer stored in the struct buf.  Instead, only the vp passed
to vn_strategy() (and related *_strategy() routines for VFS ops), and
the dev_t passed to dev_dstrategy() (and related *_strategy() routines for
device ops) is used by the VFS or DEV code.  This will allow an arbitrary
number of translation layers in the future.

Create an independent per-BIO tracking entity, struct bio_track, which
is used to determine when I/O is in-progress on the associated device
or vnode.

NOTE: Unlike FreeBSD's BIO work, our struct BUF is still used to hold
the fields describing the data buffer, resid, and error state.

Major-testing-by: Stefan Krueger
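
The stacking idea in this commit can be pictured with a much simplified
model.  The st_* names and fields below are hypothetical, and the unwinding
loop in st_biodone() is a simplification chosen for the sketch.  It is not
claimed to be how the kernel's biodone() propagates completion.

```c
#include <stddef.h>
#include <stdint.h>

struct st_bio {
	struct st_bio *bio_prev;	/* layer above us, NULL at the top */
	int64_t bio_offset;		/* offset valid at this layer only */
	void (*bio_done)(struct st_bio *);	/* optional completion hook */
	int done_seen;			/* set once completion reached us */
};

/* Push a lower layer whose offset is the upper one shifted by 'base'. */
static void
st_bio_push(struct st_bio *upper, struct st_bio *lower, int64_t base)
{
	lower->bio_prev = upper;
	lower->bio_offset = upper->bio_offset + base;
	lower->bio_done = NULL;
	lower->done_seen = 0;
}

/* Complete the BIO a strategy routine was handed, then unwind upward. */
static void
st_biodone(struct st_bio *bio)
{
	for (; bio != NULL; bio = bio->bio_prev) {
		bio->done_seen = 1;
		if (bio->bio_done != NULL)
			bio->bio_done(bio);
	}
}
```

Each layer keeps its own offset, so the upper layer's value survives the
translation and can act as the 'cache' the message mentions.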

Revision 1.24
Fri Jan 13 21:09:27 2006 UTC (8 years, 10 months ago) by swildner
Branches: MAIN
Changes since revision 1.23: +1 -1 lines
* Remove (void) casts for discarded return values.

* Put function types on separate lines.

* Ansify function definitions.

In-collaboration-with: Alexey Slynko <slynko@tronet.ru>

Revision 1.23
Tue Jul 26 15:43:35 2005 UTC (9 years, 4 months ago) by hmp
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_4_Slip, DragonFly_RELEASE_1_4
Changes since revision 1.22: +12 -14 lines
Clean the VFS operations vector and related code:

* take advantage of C99 sparse structure initialisation, this allows
  us to initialise left out vfsops entries cleanly when vfs_register()
  is called; any vfsop entries that are not specified will be assigned
  vfs_std* functions.  the only exception to this rule is VFS_SYNC
  which is assigned vfs_stdnosync() since a file system may not have
  support for it.  file systems can simply assign vfs_stdsync if they
  do not have their own sync operation.

* add KKASSERTS to make sure that the VFS_ROOT, VFS_MOUNT and VFS_UNMOUNT
  vfs operations are provided by a file system being registered.  all of
  the above are necessary to ensure a minimally working file system.

* remove scattered no-op definitions of VFS_START() vfsop vector entry
  and take advantage of sparse vfsop initialisation.  VFS_START is only
  used by MFS to ensure the calling process is not swapped out when
  I/O is initialised.  The entry point is called from the mount path,
  before the file system is marked ready.

* remove scattered no-op definitions of VFS_QUOTACTL() vfsop vector entry
  and take advantage of sparse vfsop initialisation.

* give UFS a VFS_UNINIT vfsop entry and make use of it in ext2fs when
  ripping down the hash tables.

* many file systems in the kernel seem to not implement the complementing
  VFS_UNINIT() vfsop entry, this is not so much of a problem when the
  file system is compiled into the kernel, but it can leave leakage when
  compiled as KLD modules.  add uninitialisation code and entry points
  for ext2fs, ufs, fdescfs.  grab the ufs_ihash_token when free'ing the
  inode hash table at ripping time.

* add typedefs for all the vfsop entry points, make use of it in definition
  of struct vfsops; this results in clean and consolidated code.  use the
  typedefs for vfs_std* function prototypes.
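
The sparse initialisation scheme in the list above can be sketched as
follows.  All reg_* names are invented for the example, the operations
vector is cut down to five entries, and the EOPNOTSUPP return from the
"nosync" default is an assumption.  Only the pattern (designated
initialisers, defaults filled in at registration, assertions on the
mandatory entries) is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct reg_mount;
typedef int reg_vfsop_t(struct reg_mount *);

struct reg_vfsops {
	reg_vfsop_t *vfs_mount;
	reg_vfsop_t *vfs_unmount;
	reg_vfsop_t *vfs_root;
	reg_vfsop_t *vfs_start;
	reg_vfsop_t *vfs_sync;
};

/* Defaults, standing in for the vfs_std* family. */
static int reg_stdstart(struct reg_mount *mp) { (void)mp; return (0); }
static int reg_stdnosync(struct reg_mount *mp) { (void)mp; return (EOPNOTSUPP); }

/* Fill in whatever the filesystem left out; insist on the essentials. */
static void
reg_vfs_register(struct reg_vfsops *ops)
{
	assert(ops->vfs_mount != NULL);
	assert(ops->vfs_unmount != NULL);
	assert(ops->vfs_root != NULL);
	if (ops->vfs_start == NULL)
		ops->vfs_start = reg_stdstart;
	if (ops->vfs_sync == NULL)
		ops->vfs_sync = reg_stdnosync;
}

/* A filesystem names only what it implements; the rest stays NULL. */
static int reg_example_op(struct reg_mount *mp) { (void)mp; return (0); }

static struct reg_vfsops reg_example_vfsops = {
	.vfs_mount = reg_example_op,
	.vfs_unmount = reg_example_op,
	.vfs_root = reg_example_op,
};
```

Because unnamed members of a designated initialiser are zeroed, the
registration code can tell "not provided" from "provided" with a NULL test.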

Revision 1.22
Mon Jun 6 15:09:38 2005 UTC (9 years, 5 months ago) by drhodus
Branches: MAIN
Changes since revision 1.21: +6 -5 lines
Replace spl with critical sections.

Revision 1.21
Wed Feb 2 21:34:18 2005 UTC (9 years, 9 months ago) by joerg
Branches: MAIN
CVS tags: DragonFly_Stable, DragonFly_RELEASE_1_2_Slip, DragonFly_RELEASE_1_2
Changes since revision 1.20: +2 -15 lines
Don't use the statfs field f_mntonname in filesystems. For the userland
export code, it can be synthesized from mnt_ncp.
For debugging code, use f_mntfromname, it should be enough to find the
culprit. The vfs_unmountall doesn't use code_fullpath to avoid problems
with resource allocation and to make it more likely that a call from ddb
succeeds.
Change getfsstat and fhstatfs to not show directories outside a chroot
path, with the exception of the filesystem containing the chroot root itself.

Revision 1.20
Fri Dec 17 00:18:25 2004 UTC (9 years, 11 months ago) by dillon
Branches: MAIN
Changes since revision 1.19: +1 -1 lines
VFS messaging/interfacing work stage 10/99:

Start adding the journaling, range locking, and (very slightly) cache
coherency infrastructure.  Continue cleaning up the VOP operations vector.

Expand on past commits that gave each mount structure its own set of VOP
operations vectors by adding additional vector sets for journaling or
cache coherency operations.  Remove the vv_jops and vv_cops fields
from the vnode operations vector in favor of placing those vop_ops directly
in the mount structure.  Reorganize the VOP calls as a double-indirect
and add a field to the mount structure which represents the current
vnode operations set (which will change when e.g. journaling is turned on
or off).  This creates the infrastructure necessary to allow us to stack
a generic journaling implementation on top of a filesystem.

Introduce a hard range-locking API for vnodes.   This API will be used by
high level system/vfs calls in order to handle atomicity guarantees.  It is
a prerequisite for: (1) being able to break I/O's up into smaller pieces
for the vm_page list/direct-to-DMA-without-mapping goal, (2) to support
the parallel write operations on a vnode goal, (3) to support the clustered
(remote) cache coherency goal, and (4) to support massive parallelism in
dispatching operations for the upcoming threaded VFS work.

This commit represents only infrastructure and skeleton/API work.
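
The double-indirect dispatch described earlier in this message can be
sketched like this.  The di_* structures, field names, and the counter used
to show journaling taking effect are all hypothetical.  The only idea taken
from the commit is that vnodes reach their ops through a per-mount "current
set" pointer which can be switched.

```c
#include <stddef.h>

struct di_vop_ops {
	int (*vop_write)(int nbytes);
};

struct di_mount {
	struct di_vop_ops *mnt_norm_ops;	/* plain operations */
	struct di_vop_ops *mnt_jrnl_ops;	/* journaling operations */
	struct di_vop_ops *mnt_cur_ops;		/* whichever set is active */
};

struct di_vnode {
	struct di_vop_ops **v_ops;	/* points at mp->mnt_cur_ops */
};

static int di_journal_records;

static int di_norm_write(int nbytes) { return (nbytes); }

/* Journaling layer: record the operation, then do the normal write. */
static int
di_jrnl_write(int nbytes)
{
	++di_journal_records;
	return (di_norm_write(nbytes));
}

static struct di_vop_ops di_norm_ops = { di_norm_write };
static struct di_vop_ops di_jrnl_ops = { di_jrnl_write };

/* The VOP call dereferences twice: vnode -> mount slot -> ops vector. */
static int
di_vop_write(struct di_vnode *vp, int nbytes)
{
	return ((*vp->v_ops)->vop_write(nbytes));
}

/* Switching the mount's current set retargets every vnode at once. */
static void
di_set_journaling(struct di_mount *mp, int on)
{
	mp->mnt_cur_ops = on ? mp->mnt_jrnl_ops : mp->mnt_norm_ops;
}
```

No vnode is touched when journaling is toggled, which is the point of the
extra level of indirection.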

Revision 1.19
Tue Oct 12 19:20:59 2004 UTC (10 years, 1 month ago) by dillon
Branches: MAIN
Changes since revision 1.18: +1 -0 lines
VFS messaging/interfacing work stage 8/99: Major reworking of the vnode
interlock and other miscellaneous things.  This patch also fixes FS
corruption due to prior vfs work in head.  In particular, prior to this
patch the namecache locking could introduce blocking conditions that
confuse the old vnode deactivation and reclamation code paths.  With
this patch there appear to be no serious problems even after two days
of continuous testing.

* VX lock all VOP_CLOSE operations.
* Fix two NFS issues.  There was an incorrect assertion (found by
  David Rhodus), and the nfs_rename() code was not properly
  purging the target file from the cache, resulting in Stale file
  handle errors during, e.g. a buildworld with an NFS-mounted /usr/obj.
* Fix a TTY session issue.  Programs which open("/dev/tty" ,...) and
  then run the TIOCNOTTY ioctl were causing the system to lose track
  of the open count, preventing the tty from properly detaching.
  This is actually a very old BSD bug, but it came out of the woodwork
  in DragonFly because I am now attempting to track device opens
  explicitly.
* Gets rid of the vnode interlock.  The lockmgr interlock remains.
* Introduced VX locks, which are mandatory vp->v_lock based locks.
* Rewrites the locking semantics for deactivation and reclamation.
  (A ref'd VX lock'd vnode is now required for vgone(), VOP_INACTIVE,
  and VOP_RECLAIM).  New guarantees emplaced with regard to vnode
  ripouts.
* Recodes the mountlist scanning routines to close timing races.
* Recodes getnewvnode to close timing races (it now returns a
  VX locked and refd vnode rather than a refd but unlocked vnode).
* Recodes VOP_REVOKE- a locked vnode is now mandatory.
* Recodes all VFS inode hash routines to close timing holes.
* Removes cache_leaf_test() - vnodes representing intermediate
  directories are now held so the leaf test should no longer be
  necessary.
* Splits the over-large vfs_subr.c into three additional source
  files, broken down by major function (locking, mount related,
  filesystem syncer).

* Changes splvm() protection to a critical-section in a number of
  places (bleedover from another patch set which is also about to be
  committed).

Known issues not yet resolved:

* Possible vnode/namecache deadlocks.
* While most filesystems now use vp->v_lock, I haven't done a final
  pass to make vp->v_lock mandatory and to clean up the few remaining
  inode based locks (nwfs I think and other obscure filesystems).
* NullFS gets confused when you hit a mount point in the underlying
  filesystem.
* Only UFS and NFS have been well tested
* NFS is not properly timing out namecache entries, causing changes made
  on the server to not be properly detected on the client if the client
  already has a negative-cache hit for the filename in question.

Testing-by: David Rhodus <sdrhodus@gmail.com>,
	    Peter Kadau <peter.kadau@tuebingen.mpg.de>,
	    walt <wa1ter@myrealbox.com>,
	    others

Revision 1.18
Thu Sep 30 19:00:01 2004 UTC (10 years, 1 month ago) by dillon
Branches: MAIN
Changes since revision 1.17: +2 -4 lines
VFS messaging/interfacing work stage 7/99.  BEGIN DESTABILIZATION!

Implement the infrastructure required to allow us to begin switching to the
new nlookup() VFS API.

	filedesc->fd_ncdir, fd_nrdir, fd_njdir

	    File descriptors (associated with processes) now record the
	    namecache pointer related to the current directory, root directory,
	    and jail directory, in addition to the vnode pointers.  These
	    pointers are used as the basis for the new path lookup code
	    (nlookup() and friends).

	file->f_ncp

	    File pointers may now have a referenced+unlocked namecache
	    pointer associated with them.  All fp's representing directories
	    have this attached.  This allows fchdir() to properly record
	    the ncp in fdp->fd_ncdir and friends.

	mount->mnt_ncp

	    The namecache topology for crossing a mount point works as
	    follows: when looking up a path element which is a mount point,
	    cache_nlookup() will locate the ncp for the vnode-under the
	    mount point.  mount->mnt_ncp represents the root of the mount,
	    that is the vnode-over.  nlookup() detects the mount point and
	    accesses mount->mnt_ncp to skip past the vnode-under.  When going
	    backwards (..), nlookup() detects the case and skips backwards.

	    The ncp linkages are: ncp->ncp->ncp[vnode_under]->ncp[vnode_over].
	    That is, when going forwards or backwards nlookup must explicitly
	    skip over the double-ncp when crossing a mount point.  This allows
	    us to keep the namecache topology intact across mount points.
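
	    A toy model of the double-ncp skip, for illustration only.
	    The mc_* structure and helpers are hypothetical and ignore
	    locking, references, and resolution.  They show just the
	    forward hop from vnode-under to vnode-over and the matching
	    hop back for "..".

```c
#include <stddef.h>
#include <string.h>

struct mc_ncp {
	const char *name;
	struct mc_ncp *parent;
	struct mc_ncp *children;	/* first child, linked through 'next' */
	struct mc_ncp *next;
	struct mc_ncp *mnt_root;	/* set on a vnode-under: the vnode-over */
	int is_mnt_root;		/* set on a vnode-over */
};

/* Forward step: find 'name' under 'dir', hopping onto a mount's root. */
static struct mc_ncp *
mc_step_forward(struct mc_ncp *dir, const char *name)
{
	struct mc_ncp *scan;

	for (scan = dir->children; scan != NULL; scan = scan->next) {
		if (strcmp(scan->name, name) == 0)
			return (scan->mnt_root != NULL ? scan->mnt_root : scan);
	}
	return (NULL);
}

/* Backward step (".."): leave a mount root via its vnode-under. */
static struct mc_ncp *
mc_step_back(struct mc_ncp *cur)
{
	if (cur->is_mnt_root)
		cur = cur->parent;	/* now at the vnode-under */
	return (cur->parent != NULL ? cur->parent : cur);
}
```

	    The vnode-over stays linked beneath the vnode-under, so the
	    topology is unbroken and only the two step routines need to
	    know about mount points.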

NEW CACHE level API functions:

	cache_get()	Reference and lock a namecache entry
	cache_put()	Dereference and unlock a namecache entry
	cache_lock()	lock an already-referenced namecache entry
	cache_unlock()	unlock a locked namecache entry

	    NOTE: namecache locks are exclusive and recursive.  These are
	    the 'namespace' locks that we will be using to guarantee namespace
	    operations such as in a CREATE, RENAME, or REMOVE.

	vfs_cache_setroot() 	Set the new system-wide root directory
	cache_allocroot()   	System bootstrap helper function to allocate
			    	 the root namecache node.

	cache_resolve()		Resolve a NCF_UNRESOLVED namecache node.  The
				namecache node should be locked on call.

	cache_setvp()		(resolver) associate a VP or create a negative
				cache entry representation for a namecache
				pointer and clear NCF_UNRESOLVED.  The
				namecache node should be locked on call.

	cache_setunresolved()	Revert a resolved namecache entry back to an
				unresolved state, disassociating any vnode
				but leaving the topology intact.  The
				namecache node should be locked on call.

	cache_vget()		Obtain the locked+refd vnode related to
				a namecache entry, resolving the entry if
				necessary.  Return ENOENT if the entry
				represents a negative cache hit.

	cache_vref()		Obtain a refd (not locked) vnode related to
				a namecache entry, as above.

	cache_nlookup()		The new namecache lookup routine.  This routine
				does a lookup and allocates a new namecache
				node (into an unresolved state) if necessary.
				Returns a namecache record whether or not
				the item can be found and whether or not it
				represents a positive or negative hit.

	cache_lookup()		OLD API CODE DEPRECATED, but must be maintained
				until everything has been converted over.
	cache_enter()		OLD API CODE DEPRECATED, but must be maintained
				until everything has been converted over.

NEW default VOPs

	vop_noresolve()		Implements a namecache resolver for VFSs
				which are still using the old VOP_LOOKUP/
				VOP_CACHEDLOOKUP API (which is all of them
				still).

	VOP_LOOKUP		OLD API CODE DEPRECATED, but must be maintained
				until everything has been converted over.
	VOP_CACHEDLOOKUP	OLD API CODE DEPRECATED, but must be maintained
				until everything has been converted over.

NEW PATHNAME LOOKUP CODE

	nlookup_init()		Similar to NDINIT, initialize a nlookupdata
				structure for nlookup() and nlookup_done().

	nlookup()		Lookup a path.  Unlike the old namei/lookup
				code the new lookup code does not do any
				fancy pre-disposition of the cache for
				create/delete, it simply looks up the requested
				path and returns the appropriate locked
				namecache pointer.  The caller can obtain the
				vnode and directory vnode, as applicable, from
				the one namecache structure that is returned.

				Access checks are done on directories leading
				up to the result but not done on the returned
				namecache node.

	nlookup_done()		Mandatory routine to cleanup a nlookupdata
				structure after it has been initialized and
				all operations have been completed on it.

	nlookup_simple()	(in progress) all-in-one wrapped new lookup.

	nlookup_mp()		helper call for resolving a mount point's
				glue NCP.  hackish, will be cleaned up later.

	nreadsymlink()		helper call to resolve a symlink.  Note that
				the namecache does not yet cache symlink data
				but the intention is to eventually do so to
				avoid having to do VFS ops to get the data.

	naccess()		Perform access checks on a namecache node
				given a mode and cred.

	naccess_va()		Perform access checks on a vattr given a
				mode and cred.

Begin switching VFS operations from using namei to using nlookup.
In this batch:

	* mount 	(install mnt_ncp for cross-mount-point handling in
			nlookup, simplify the vfs_mount() API to no longer
			pass a nameidata structure)
	* [l]stat	(use nlookup)
	* [f]chdir	(use nlookup, use recorded f_ncp)
	* [f]chroot	(use nlookup, use recorded f_ncp)

Revision 1.17
Sat Aug 28 19:02:17 2004 UTC (10 years, 2 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Snap29Sep2004, DragonFly_Snap13Sep2004
Changes since revision 1.16: +1 -1 lines
VFS messaging/interfacing work stage 4/99.  This stage goes a long ways
towards allowing us to move the vnode locking into a kernel layer.  It
gets rid of a lot of cruft from FreeBSD-4.  FreeBSD-5 has done some of this
stuff too (such as changing the default locking to stdlock from nolock),
but DragonFly is going further.

* Consolidate vnode locks into the vnode structure, add an embedded v_lock,
  and get rid of both v_vnlock and v_data based head-of-structure locks.

* Change the default vops to use a standard vnode lock rather than a fake
  non-lock.

* Get rid of vop_nolock() and friends, we no longer support non-locking
  vnodes.

* Get rid of vop_sharedlock(), we no longer support non standard shared-only
  locks (only NFS was using it and the mount-crossing lookup code should
  now prevent races to root from dead NFS volumes).

* Integrate lock initialization into getnewvnode().  We do not yet
  incorporate automatic locking into getnewvnode().  getnewvnode()
  now has two additional arguments, lktimeout and lkflags, for lock
  structure initialization.

* Change the sync vnode lock from nolock to stdlock.  This may require more
  tuning down the line.  Fix various sync_inactive() to properly unlock
  the lock as per the VOP API.

* Properly flag the 'rename' vop operation regarding required tdvp and tvp
  unlocks (the flags are only used by nullfs).

* Get rid of all inode-embedded vnode locks

* Remove manual lockinit and use new getnewvnode() args instead.
  Lock the vnode prior to doing anything that might block in
  order to avoid synclist access before the vnode has been properly
  initialized.

* Generally change inode hash insertion to also check
  for a hash collision and return failure if it occurs,
  rather than doing (often non-atomic) relookups and
  other checks.  These sorts of collisions can occur
  if a vnode is being destroyed at the same time a new
  vnode is being created from an inode.  A new vnode is
  not generally accessible, except by the sync code (from
  the mountlist) until its underlying inode has been hashed
  so dealing with a hash collision should be as simple as
  throwing away the vnode with a vput().

* Do not initialize a new vnode's v_data until after
  the associated inode has been successfully added to
  the hash, and make the xxx_inactive() and xxx_reclaim()
  code friendly towards vnodes with a NULL v_data.

* NFS now uses standard locks rather than shared-only locks.

* PROCFS now uses standard locks rather than non-locks, and PROCFS's
  lookup code now understands VOP lookup semantics.  PROCFS now uses
  a real hash table for its node search rather than a single singly-linked
  list (which should better scale to systems with thousands of processes).

* NULLFS should now properly handle lookup() and rename() locks.  NULLFS's
  node handling code has been rewritten.  NULLFS's bypass code now understands
  vnode unlocks (rename case).

* UFS no longer needs the ffs_inode_hash_lock hacks.  It now uses the new
  collision-on-hash-add methodology.   This will speed up UFS when operating
  on lots of small files (reported by David Rhodus).
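
The collision-on-hash-add approach described above can be sketched roughly
as follows.  This is a minimal userland illustration, not the kernel code:
struct demo_inode, demo_ihash_insert(), the table size, and the EBUSY
return value are all hypothetical names and choices made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_IHASH_SIZE 64	/* hypothetical table size (power of two) */

/* Hypothetical stand-in for an in-core inode; not the kernel structure. */
struct demo_inode {
	unsigned long		ino;	/* inode number, used as hash key */
	struct demo_inode	*next;	/* hash chain link */
};

static struct demo_inode *demo_ihash[DEMO_IHASH_SIZE];

/*
 * Add ip to the hash.  If an inode with the same number is already
 * hashed, report the collision to the caller instead of doing a
 * relookup.  Returns 0 on success, EBUSY on collision (the error
 * code is an assumption made for this sketch).
 */
int
demo_ihash_insert(struct demo_inode *ip)
{
	struct demo_inode **slot;
	struct demo_inode *scan;

	assert(ip != NULL);
	slot = &demo_ihash[ip->ino & (DEMO_IHASH_SIZE - 1)];
	for (scan = *slot; scan != NULL; scan = scan->next) {
		if (scan->ino == ip->ino)
			return (EBUSY);
	}
	ip->next = *slot;
	*slot = ip;
	return (0);
}
```

On a non-zero return the caller would discard the half-constructed vnode
(the commit message suggests a vput()) and retry or fail the operation.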

Revision 1.16: download - view: text, markup, annotated - select for diffs
Fri Aug 13 17:51:11 2004 UTC (10 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.15: preferred, unified
Changes since revision 1.15: +2 -2 lines
VFS messaging/interfacing work stage 1/99.  This stage replaces the old
dynamic VFS descriptor and inlined wrapper mess with a fixed structure
and fixed procedural wrappers.  Most of the work is straightforward except
for vfs_init, which was basically rewritten (and greatly simplified).

It is my intention to make the vop_*() call wrappers eventually handle
range locking and cache coherency issues as well as implementing the
direct call -> messaging interface layer.  The call wrappers will also handle
API translation as we shift the APIs over to new, more powerful mechanisms
in order to allow the work to be incrementally committed.

This is the first stage of what is likely to be a huge number of stages
to modernize the VFS subsystem.

Revision 1.15: download - view: text, markup, annotated - select for diffs
Wed May 19 22:53:04 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_1_0_REL, DragonFly_1_0_RC1, DragonFly_1_0A_REL
Diff to: previous 1.14: preferred, unified
Changes since revision 1.14: +70 -9 lines
Device layer rollup commit.

* cdevsw_add() is now required.  cdevsw_add() and cdevsw_remove() may specify
  a mask/match indicating the range of supported minor numbers.  Multiple
  cdevsw_add()'s using the same major number, but distinctly different
  ranges, may be issued.  All devices that failed to call cdevsw_add() before
  now do.

* cdevsw_remove() now automatically marks all devices within its supported
  range as being destroyed.

* vnode->v_rdev is no longer resolved when the vnode is created.  Instead,
  only v_udev (a newly added field) is resolved.  v_rdev is resolved when
  the vnode is opened and cleared on the last close.

* A great deal of code was making rather dubious assumptions with regards
  to the validity of devices associated with vnodes, primarily due to
  the persistence of a device structure due to being indexed by (major, minor)
  instead of by (cdevsw, major, minor).  In particular, if you run a program
  which connects to a USB device and then you pull the USB device and plug
  it back in, the vnode subsystem will continue to believe that the device
  is open when, in fact, it isn't (because it was destroyed and recreated).

  In particular, note that all the VFS mount procedures now check devices
  via v_udev instead of v_rdev prior to calling VOP_OPEN(), since v_rdev
  is NULL prior to the first open.

* The disk layer's device interaction has been rewritten.  The disk layer
  (i.e. the slice and disklabel management layer) no longer overloads
  its data onto the device structure representing the underlying physical
  disk.  Instead, the disk layer uses the new cdevsw_add() functionality
  to register its own cdevsw using the underlying device's major number,
  and simply does NOT register the underlying device's cdevsw.  No
  confusion is created because the device hash is now based on
  (cdevsw,major,minor) rather than (major,minor).

  NOTE: This also means that underlying raw disk devices may use the entire
  device minor number instead of having to reserve the bits used by the disk
  layer, and also means that we can (theoretically) stack a fully
  disklabel-supported 'disk' on top of any block device.

* The new reference counting scheme prevents this by associating a device
  with a cdevsw and disconnecting the device from its cdevsw when the cdevsw
  is removed.  Additionally, all udev2dev() lookups run through the cdevsw
  mask/match and only successfully find devices still associated with an
  active cdevsw.

* Major work on MFS:  MFS no longer shortcuts vnode and device creation.  It
  now creates a real vnode and a real device and implements real open and
  close VOPs.  Additionally, due to the disk layer changes, MFS is no longer
  limited to 255 mounts.  The new limit is 16 million.  Since MFS creates a
  real device node, mount_mfs will now create a real /dev/mfs<PID> device
  that can be read from userland (e.g. so you can dump an MFS filesystem).

* BUF AND DEVICE STRATEGY changes.  The struct buf contains a b_dev field.
  In order to properly handle stacked devices we now require that the b_dev
  field be initialized before the device strategy routine is called.  This
  required some additional work in various VFS implementations.  To enforce
  this requirement, biodone() now sets b_dev to NODEV.  The new disk layer
  will adjust b_dev before forwarding a request to the actual physical
  device.

* A bug in the ISO CD boot sequence which resulted in a panic has been fixed.

Testing by: lots of people, but David Rhodus found the most egregious bugs.
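
The mask/match idea from the cdevsw_add() item above can be illustrated
with a small userland sketch.  It assumes that a registration covers a
minor number when (minor & mask) == match; the commit message does not
spell out the exact test, so treat that as an assumption.  struct
demo_devsw and demo_lookup() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical registration record: one major, one minor-number range. */
struct demo_devsw {
	const char	*name;
	unsigned int	major;
	unsigned int	mask;	/* which minor bits are compared */
	unsigned int	match;	/* required value of those bits */
};

/* Assumed range test: the masked minor must equal the match value. */
static int
demo_covers(const struct demo_devsw *sw, unsigned int major,
    unsigned int minor)
{
	return (sw->major == major && (minor & sw->mask) == sw->match);
}

/*
 * Return the first registration covering (major, minor), or NULL if
 * none does.  Several entries may share a major number as long as
 * their mask/match ranges are distinct.
 */
const struct demo_devsw *
demo_lookup(const struct demo_devsw *tab, size_t n, unsigned int major,
    unsigned int minor)
{
	size_t i;

	assert(tab != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (demo_covers(&tab[i], major, minor))
			return (&tab[i]);
	}
	return (NULL);
}
```

With two entries on the same major, one using mask 0x80/match 0x00 and the
other mask 0x80/match 0x80, minors 0-127 resolve to the first entry and
minors 128-255 to the second.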

Revision 1.14: download - view: text, markup, annotated - select for diffs
Thu May 13 23:49:26 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.13: preferred, unified
Changes since revision 1.13: +1 -1 lines
device switch 1/many: Remove d_autoq, add d_clone (where d_autoq was).

d_autoq was used to allow the device port dispatch to mix old-style synchronous
calls with new style messaging calls within a particular device.  It was never
used for that purpose.

d_clone will be more fully implemented as work continues.  We are going to
install d_port in the dev_t (struct specinfo) structure itself and d_clone
will be needed to allow devices to 'revector' the port on a minor-number
by minor-number basis, in particular allowing minor numbers to be directly
dispatched to distinct threads.  This is something we will be needing later
on.

Revision 1.13: download - view: text, markup, annotated - select for diffs
Thu Apr 15 00:59:41 2004 UTC (10 years, 7 months ago) by cpressey
Branches: MAIN
Diff to: previous 1.12: preferred, unified
Changes since revision 1.12: +3 -10 lines
Style(9) cleanup to src/sys/vfs, stage 8/21: mfs.

- Convert K&R-style function definitions to ANSI style.

Submitted-by: Andre Nathan <andre@digirati.com.br>
Additional-reformatting-by: cpressey

Revision 1.12: download - view: text, markup, annotated - select for diffs
Mon Dec 1 04:38:26 2003 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.11: preferred, unified
Changes since revision 1.11: +1 -0 lines
Add a missing PRELE() when the mfs_mount kernel process exits.  Because
DragonFly does not teardown zombie processes while p_lock is non-zero this
prevented such processes from being reaped and deadlocked the init process.

Revision 1.11: download - view: text, markup, annotated - select for diffs
Wed Aug 20 09:56:32 2003 UTC (11 years, 3 months ago) by rob
Branches: MAIN
Diff to: previous 1.10: preferred, unified
Changes since revision 1.10: +6 -6 lines
__P()!=wanted, remove old style prototypes from the vfs subtree

Revision 1.10: download - view: text, markup, annotated - select for diffs
Thu Aug 7 21:17:41 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.9: preferred, unified
Changes since revision 1.9: +8 -9 lines
kernel tree reorganization stage 1: Major cvs repository work (not logged as
commits) plus a major reworking of the #include's to accommodate the
relocations.

    * CVS repository files manually moved.  Old directories left intact
      and empty (temporary).

    * Reorganize all filesystems into vfs/, most devices into dev/,
      sub-divide devices by function.

    * Begin to move device-specific architecture files to the device
      subdirs rather than throwing them all into, e.g. i386/include

    * Reorganize files related to system busses, placing the related code
      in a new bus/ directory.  Also move cam to bus/cam though this may
      not have been the best idea in retrospect.

    * Reorganize emulation code and place it in a new emulation/ directory.

    * Remove the -I- compiler option in order to allow #include file
      localization, rename all config generated X.h files to use_X.h to
      clean up the conflicts.

    * Remove /usr/src/include (or /usr/include) dependencies during the
      kernel build, beyond what is normally needed to compile helper
      programs.

    * Make config create 'machine' softlinks for architecture specific
      directories outside of the standard <arch>/include.

    * Bump the config rev.

    WARNING! after this commit /usr/include and /usr/src/sys/compile/*
    should be regenerated from scratch.

Revision 1.9: download - view: text, markup, annotated - select for diffs
Sat Jul 26 22:04:26 2003 UTC (11 years, 4 months ago) by rob
Branches: MAIN
Diff to: previous 1.8: preferred, unified
Changes since revision 1.8: +5 -5 lines
Register keyword removal

Approved by: Matt Dillon

Revision 1.8: download - view: text, markup, annotated - select for diffs
Thu Jul 24 20:43:18 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.7: preferred, unified
Changes since revision 1.7: +1 -1 lines
Have MFS register a device as a VCHR instead of VBLK, fixing a panic.

Report-by: Joerg Sonnenberger <joerg@britannica.bec.de>

Revision 1.7: download - view: text, markup, annotated - select for diffs
Mon Jul 21 05:50:47 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.6: preferred, unified
Changes since revision 1.6: +9 -5 lines
DEV messaging stage 1/4: Rearrange struct cdevsw and add a message port
and auto-queueing mask.  The mask will tell us which message functions
can be safely queued to another thread and which still need to run in the
context of the caller.   Primary configuration fields (name, cmaj, flags,
port, autoq mask) are now at the head of the structure.  Function vectors,
which may eventually go away, are at the end.  The port and autoq fields
are non-functional in this stage.

The old BDEV device major number support has also been removed from cdevsw,
and code has been added to translate the bootdev passed from the boot code
(the boot code has always passed the now defunct block device major numbers
and we obviously need to keep that compatibility intact).

Revision 1.6: download - view: text, markup, annotated - select for diffs
Sat Jul 19 21:14:52 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.5: preferred, unified
Changes since revision 1.5: +2 -5 lines
Remove the priority part of the priority|flags argument to tsleep().  Only
flags are passed now.  The priority was a user scheduler thingy that is not
used by the LWKT subsystem.  For process statistics assume sleeps without
P_SINTR set to be disk-waits, and sleeps with it set to be normal sleeps.

This commit should not contain any operational changes.

Revision 1.5: download - view: text, markup, annotated - select for diffs
Sun Jul 6 21:23:55 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.4: preferred, unified
Changes since revision 1.4: +2 -1 lines
MP Implementation 1/2: Get the APIC code working again, sweetly integrate the
MP lock into the LWKT scheduler, replace the old simplelock code with
tokens or spin locks as appropriate.  In particular, the vnode interlock
(and most other interlocks) are now tokens.  Also clean up a few curproc/cred
sequences that are no longer needed.

The APs are left in degenerate state with non IPI interrupts disabled as
additional LWKT work must be done before we can really make use of them,
and FAST interrupts are not managed by the MP lock yet.  The main thing
for this stage was to get the system working with an APIC again.

buildworld tested on UP and 2xCPU/MP (Dell 2550)

Revision 1.4: download - view: text, markup, annotated - select for diffs
Wed Jun 25 03:56:12 2003 UTC (11 years, 5 months ago) by dillon
Branches: MAIN
CVS tags: PRE_MP
Diff to: previous 1.3: preferred, unified
Changes since revision 1.3: +17 -22 lines
proc->thread stage 4: rework the VFS and DEVICE subsystems to take thread
pointers instead of process pointers as arguments, similar to what FreeBSD-5
did.  Note however that ultimately both APIs are going to be message-passing
which means the current thread context will not be useable for creds and
descriptor access.

Revision 1.3: download - view: text, markup, annotated - select for diffs
Thu Jun 19 01:55:08 2003 UTC (11 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.2: preferred, unified
Changes since revision 1.2: +3 -1 lines
thread stage 5: Separate the inline functions out of sys/buf.h, creating
sys/buf2.h (A methodology that will continue as time passes).  This solves
inline vs struct ordering problems.

Do a major cleanup of the globaldata access methodology.  Create a
gcc-cacheable 'mycpu' macro & inline to access per-cpu data.  Atomicity is not
required because we will never change cpus out from under a thread, even if
it gets preempted by an interrupt thread, because we want to be able to
implement per-cpu caches that do not require locked bus cycles or special
instructions.

Revision 1.2: download - view: text, markup, annotated - select for diffs
Tue Jun 17 04:28:59 2003 UTC (11 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.1: preferred, unified
Changes since revision 1.1: +1 -0 lines
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids.  Most
ids have been removed from !lint sections and moved into comment sections.

Revision 1.1: download - view: text, markup, annotated - select for diffs
Tue Jun 17 02:55:54 2003 UTC (11 years, 5 months ago) by dillon
Branches: MAIN
CVS tags: FREEBSD_4_FORK
import from FreeBSD RELENG_4 1.81.2.3
