DragonFly BSD

CVS log for src/sys/kern/vfs_subr.c

Keyword substitution: kv
Default branch: MAIN


Revision 1.116.2.2
Thu Sep 25 02:20:46 2008 UTC by dillon
Branches: DragonFly_RELEASE_2_0
CVS tags: DragonFly_RELEASE_2_0_Slip
Diff to: previous 1.116.2.1; branchpoint 1.116; next MAIN 1.117
Changes since revision 1.116.2.1: +2 -0 lines
MFC numerous features from HEAD.

* NFS export support for nullfs mounted filesystems,
  intended for nullfs mounted hammer PFSs.

* Each nullfs mount constructs a unique fsid based on
  the underlying mount.

* Each nullfs mount maintains its own netexport structure.

* The mount pointer in the nch (namecache handle) is passed
  into FHTOVP and friends, allowing operations to occur
  on the underlying vnodes but still go through the nullfs
  mount.

Revision 1.118
Wed Sep 17 21:44:18 2008 UTC by dillon
Branches: MAIN
CVS tags: HEAD
Diff to: previous 1.117
Changes since revision 1.117: +2 -0 lines
* Implement the ability to export NULLFS mounts via NFS.

* Enforce PFS isolation when exporting a HAMMER PFS via a NULLFS mount.

NOTE: Exporting anything other than a HAMMER PFS root via nullfs does
NOT protect the parent of the exported directory from being accessed via NFS.

Generally speaking this feature is implemented by giving each nullfs mount
a synthesized fsid based on what is being mounted and implementing the
NFS export infrastructure in the nullfs code instead of just bypassing those
functions to the underlying VFS.
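One way to synthesize a per-mount fsid is to hash the underlying mount's fsid together with the path being mounted. The log does not say how nullfs actually derives its fsid, so the FNV-1a hash, the struct fsid_sketch type, and null_synth_fsid() below are all assumptions for illustration. A hash only makes collisions unlikely, so real code would still need to check for a duplicate fsid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-word fsid, shaped like a BSD fsid_t. */
struct fsid_sketch {
    uint32_t val[2];
};

/* 32-bit FNV-1a over a byte range, continuing from hash h. */
static uint32_t fnv1a32(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;                 /* FNV 32-bit prime */
    }
    return h;
}

/* Hypothetical: derive a per-mount fsid from the underlying fsid plus path. */
static struct fsid_sketch null_synth_fsid(struct fsid_sketch under,
                                          const char *path)
{
    uint32_t h = 2166136261u;           /* FNV 32-bit offset basis */
    struct fsid_sketch out;

    h = fnv1a32(h, under.val, sizeof(under.val));
    h = fnv1a32(h, path, strlen(path));
    out.val[0] = h;
    out.val[1] = under.val[1];          /* hypothetical: keep the second word */
    return out;
}
```

Because the result depends on the mounted path, two nullfs mounts of different PFSs on the same underlying filesystem get different fsids, which is what lets each mount carry its own NFS export state.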

Revision 1.116.2.1
Sat Aug 2 14:34:29 2008 UTC by dillon
Branches: DragonFly_RELEASE_2_0
Diff to: previous 1.116
Changes since revision 1.116: +8 -10 lines
MFC 1.117 - fix desiredvnodes calculation for machines with >2G ram.

Requested-by: Francois Tigeot <ftigeot@wolfpond.org>

Revision 1.117
Sun Jul 27 17:37:52 2008 UTC by dillon
Branches: MAIN
Diff to: previous 1.116
Changes since revision 1.116: +8 -10 lines
Adjust the desiredvnodes (kern.maxvnodes) calculation for machines
with 3G+ of ram to prevent it from blowing out KVM.

Reported-by: Michael Neumann <mneumann@ntecs.de>

Revision 1.116
Sat Jul 12 02:44:59 2008 UTC by dillon
Branches: MAIN
CVS tags: DragonFly_Preview
Branch point for: DragonFly_RELEASE_2_0
Diff to: previous 1.115
Changes since revision 1.115: +1 -1 lines
Correct a bug in the last commit.

Revision 1.115
Sat Jul 12 01:09:46 2008 UTC by dillon
Branches: MAIN
Diff to: previous 1.114
Changes since revision 1.114: +17 -0 lines
Add a vclean_unlocked() call that allows HAMMER to try to get rid of a
vnode.

Revision 1.114
Sun May 18 05:54:25 2008 UTC by dillon
Branches: MAIN
Diff to: previous 1.113
Changes since revision 1.113: +16 -0 lines
Fix a number of core kernel issues related to HAMMER operation.

* The cluster code was incorrectly using the maximum IO size from
  the filesystem on which /dev is mounted instead of the maximum
  IO size of the block device.  This became evident when HAMMER
  (with 16K blocks) tried to call cluster_read() via /dev/ad6s1h
  (on UFS with 8K blocks).

* Change the way the VNLRU code works to avoid an infinite loop in
  vmntvnodescan().  The vnode LRU recycling code was cycling vnodes
  from the head of mp->mnt_nvnodelist to the tail.  Under certain heavy
  load conditions this could cause a vmntvnodescan() to never finish
  running and eventually hit a count assertion (at 1,000,000 vnodes scanned).

  Instead of cycling the vnodes in the mnt_nvnodelist, use the syncer
  vnode (mount->mnt_syncer) as a placemarker and move *IT* within the
  list to represent the LRU scan.  By not cycling vnodes to the end
  of the list, vmntvnodescan() can no longer get into an infinite loop.

* Change the mount->mnt_syncer logic slightly to avoid races against
  a background sync while unmounting.  The field is no longer cleared
  by the sync_reclaim() call but is instead cleared by the unmount code
  before vrele()ing the special vnode.
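The placemarker technique from the second bullet above can be sketched in user-space C. This is a minimal model, not kernel code: a hand-rolled doubly linked list stands in for mnt_nvnodelist, a node with id -1 stands in for the mnt_syncer vnode, and lru_step() and scan_count() are hypothetical names. Because only the marker moves, the real nodes never change order, so a head-to-tail scan visits each of them exactly once and always terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; id == -1 marks the placemarker. */
struct node {
    struct node *prev, *next;
    int id;
};

struct list {
    struct node *head, *tail;
};

static void list_insert_tail(struct list *l, struct node *n)
{
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail) l->tail->next = n; else l->head = n;
    l->tail = n;
}

static void list_remove(struct list *l, struct node *n)
{
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = NULL;
}

static void list_insert_after(struct list *l, struct node *after,
                              struct node *n)
{
    n->prev = after;
    n->next = after->next;
    if (after->next) after->next->prev = n; else l->tail = n;
    after->next = n;
}

/* One LRU step: examine the node after the marker (wrapping to the head),
 * then move the MARKER past it.  Returns the examined id, -1 if none. */
static int lru_step(struct list *l, struct node *marker)
{
    struct node *victim = marker->next ? marker->next : l->head;

    if (victim == marker)               /* only the marker is on the list */
        return -1;
    list_remove(l, marker);
    list_insert_after(l, victim, marker);
    return victim->id;
}

/* A full scan in the style of vmntvnodescan(): counts real nodes. */
static int scan_count(const struct list *l)
{
    int n = 0;

    for (const struct node *p = l->head; p != NULL; p = p->next)
        if (p->id >= 0)
            n++;
    return n;
}
```

Starting from the list marker, 0, 1, 2, four calls to lru_step() examine 0, 1, 2, and then 0 again, while node 0 stays at the head throughout.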

Revision 1.113
Thu May 8 01:41:05 2008 UTC by dillon
Branches: MAIN
Diff to: previous 1.112
Changes since revision 1.112: +5 -0 lines
Fix a race between the namecache and the vnode recycler.  A vnode cannot be
recycled if its namecache entry represents a directory with locked children.
The various VOP_N*() functions require the parent dvp to be stable.

The main fix is in vrecycle() (kern/vfs_subr.c).  Do not vgone() the vnode
if we can't clean out the children.

Also create an API to assert that the parent dvp is stable, and make it
vhold/vdrop the dvp.

The race primarily affected HAMMER, which uses the VOP_N*() API.

Revision 1.112
Wed Apr 30 17:34:11 2008 UTC by dillon
Branches: MAIN
Diff to: previous 1.111
Changes since revision 1.111: +9 -0 lines
Have vfsync() call buf_checkwrite() on buffers with bioops to determine
whether it is ok to write out a buffer or not.  Used by HAMMER to prevent
specfs from syncing out meta-data at the wrong time.

Revision 1.111
Tue Feb 5 20:49:49 2008 UTC by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_12_Slip, DragonFly_RELEASE_1_12
Diff to: previous 1.110
Changes since revision 1.110: +5 -4 lines
* Implement a mountctl() op for setting export control on a filesystem.

* Adjust mountd to try to use the mountctl() op BEFORE calling a UFS-style
  mount() to set export ops for a filesystem.

* Add a prototype for the mountctl() system call in sys/mountctl.h

* Cleanup WARNS for the mountctl utility.

Revision 1.110
Sat Jan 5 14:02:38 2008 UTC by swildner
Branches: MAIN
Diff to: previous 1.109
Changes since revision 1.109: +1 -2 lines
For kmalloc(), MALLOC() and contigmalloc(), use M_ZERO instead of
explicitly bzero()ing.

Reviewed-by: sephe

Revision 1.109
Wed Nov 7 00:46:36 2007 UTC by dillon
Branches: MAIN
Diff to: previous 1.108
Changes since revision 1.108: +3 -0 lines
Add bio_ops->io_checkread and io_checkwrite - a read and write pre-check
which gives HAMMER a chance to set B_LOCKED if the kernel wants to write out
a passively held buffer.

Change B_LOCKED semantics slightly.  B_LOCKED buffers will not be written
until B_LOCKED is cleared.  This allows HAMMER to hold off B_DELWRI writes
on passively held buffers.
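The write pre-check can be sketched as a flush pass that consults an optional per-buffer hook before writing each delayed-write buffer and skips anything the hook marks B_LOCKED. Only the flag names come from the log; struct buf_sketch, the hook signature, flush_pass(), and hold_buffer() are hypothetical, and the bit values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Flag names from the log; bit values are hypothetical. */
#define B_DELWRI 0x01
#define B_LOCKED 0x02

struct buf_sketch {
    int flags;
    int writes;                                   /* times written out */
    void (*io_checkwrite)(struct buf_sketch *bp); /* optional pre-check hook */
};

/* Write out delayed-write buffers, honoring B_LOCKED.  Returns count written. */
static int flush_pass(struct buf_sketch *bufs, size_t n)
{
    int written = 0;

    for (size_t i = 0; i < n; i++) {
        struct buf_sketch *bp = &bufs[i];

        if ((bp->flags & B_DELWRI) == 0)
            continue;
        if (bp->io_checkwrite != NULL)
            bp->io_checkwrite(bp);      /* owner may set B_LOCKED here */
        if (bp->flags & B_LOCKED)
            continue;                   /* held off until B_LOCKED clears */
        bp->flags &= ~B_DELWRI;
        bp->writes++;
        written++;
    }
    return written;
}

/* Example hook: the owner is passively holding this buffer. */
static void hold_buffer(struct buf_sketch *bp)
{
    bp->flags |= B_LOCKED;
}
```

A held buffer keeps B_DELWRI, so once its owner clears B_LOCKED a later pass writes it out normally.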

Revision 1.108
Fri Nov 2 19:52:25 2007 UTC by dillon
Branches: MAIN
Diff to: previous 1.107
Changes since revision 1.107: +1 -0 lines
Modify struct vattr:
	Increase va_nlink, va_fileid (the inode number), and va_gen from
	32 bit to 64 bit integers.

	Add va_uid_uuid, va_gid_uuid, and va_fsid_uuid, and flags to
	indicate that these fields are valid.  The original va_uid and
	va_gid are retained.

	This change has no external visibility.

Modify struct statvfs:
	Use spare fields to add f_fsid_uuid and f_uid_uuid to the
	structure, and flags indicating that those fields are valid.

	This change has minimal external visibility. The size of the
	structure has not changed.

Modify struct stat:
	Add a new file type S_IFDB.  DB files are like regular files but
	access data on a record by record basis.  The seek position is a
	64 bit record key and not a byte offset.  Further work in this
	area will be done later on to support related UIO operations.

	This change has minimal external visibility. The size of the
	structure has not changed.

Revision 1.107
Wed Oct 24 21:56:41 2007 UTC by dillon
Branches: MAIN
Diff to: previous 1.106
Changes since revision 1.106: +11 -2 lines
Reactivate a vnode after associating it with deadfs following a forced unmount.
This fixes numerous system panics that can occur due to the vnode's
unexpected change in state.

Submitted-by: "Nicolas Thery" <nthery@gmail.com>

Revision 1.105.2.1
Tue Jul 31 22:40:50 2007 UTC by dillon
Branches: DragonFly_RELEASE_1_10
CVS tags: DragonFly_RELEASE_1_10_Slip
Diff to: previous 1.105; next MAIN 1.106
Changes since revision 1.105: +1 -1 lines
Synchronize all changes made in HEAD to date with the 1.10 release branch.

* usbdevs update
* header file fixes
* vinum root
* vinum device I/O fixes
* MD fixes
* New PCI ids for netif rum and ural
* New USB uplcom ids
* linux exec memory leak
* devclass ordering fixes (sound devices)
* rate-limited kprintf support (stops filesystem-full messages from spamming the console)
* msdosfs fixes
* Manual page work

Revision 1.106
Tue Jul 31 01:14:50 2007 UTC by dillon
Branches: MAIN
Diff to: previous 1.105
Changes since revision 1.105: +1 -1 lines
vrecycle() is typically called from a VFS's inactive function, which
in turn is called when the sysref reference count transitions from
1->0xc0000000.  Fix a test that was causing the vnode to not be immediately
recycled.

msdosfs depended on the recycling behavior due to rename-over issues -
msdosfs is not allowed to reuse an 'inode' (which is basically the position
of a directory entry in a directory) until the underlying file is entirely
reclaimed.

This also affected ufs somewhat by preventing inodes from being reused as
quickly as they could be, but ufs allocates inodes dynamically and masked
the problem.

Reported-by: walt <wa1ter@myrealbox.com>

Revision 1.105
Fri Jun 8 02:00:45 2007 UTC by dillon
Branches: MAIN
Branch point for: DragonFly_RELEASE_1_10
Diff to: previous 1.104
Changes since revision 1.104: +1 -1 lines
Formalize the object sleep/wakeup code when waiting on a dead VM object and
remove spurious calls to wakeup().

Revision 1.104
Wed May 9 00:53:34 2007 UTC by dillon
Branches: MAIN
Diff to: previous 1.103
Changes since revision 1.103: +19 -13 lines
Give the device major / minor numbers their own separate 32 bit fields
in the kernel.  Change dev_ops to use a RB tree to index major device
numbers and remove the 256 device major number limitation.

Build a dynamic major number assignment feature into dev_ops_add() and
adjust ASR (which already had a hand-rolled one) and MFS to use the
feature.  MFS at least does not require any filesystem visibility to
access its backing device.  Major device numbers >= 256 are used for
dynamic assignment.
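Dynamic major assignment amounts to handing out the lowest unused number at or above 256. The commit indexes majors with an RB tree; the sketch below swaps in a flat array for brevity, and major_assign(), MAJOR_LIMIT, and the "negative request means dynamic" convention are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAJOR_DYN_BASE 256   /* from the log: majors >= 256 are dynamic */
#define MAJOR_LIMIT    1024  /* hypothetical table size */

static bool major_used[MAJOR_LIMIT];

/*
 * Hypothetical interface: requested >= 0 claims that exact major;
 * requested < 0 asks for the lowest free dynamic major.
 * Returns the assigned major, or -1 if none is available.
 */
static int major_assign(int requested)
{
    if (requested >= 0) {
        if (requested >= MAJOR_LIMIT || major_used[requested])
            return -1;
        major_used[requested] = true;
        return requested;
    }
    for (int m = MAJOR_DYN_BASE; m < MAJOR_LIMIT; m++) {
        if (!major_used[m]) {
            major_used[m] = true;
            return m;
        }
    }
    return -1;
}
```

A tree keyed by major number avoids both the fixed upper bound and the linear scan, which is presumably why the commit chose one.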

Retain filesystem compatibility for device numbers that fall within the
range that can be represented in UFS or struct stat (which is a single
32 bit field supporting 8 bit major numbers and 24 bit minor numbers).
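The compatibility rule in the last paragraph reduces to a range check: a device is representable in the legacy 32-bit field only if its major fits in 8 bits and its minor in 24 bits. The packing below (major in the top byte) is an assumption chosen for clarity; the actual UFS and struct stat encoding may arrange the bits differently, and dev_fits_compat(), dev_pack(), dev_major(), and dev_minor() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if (major, minor) fits the legacy 8-bit/24-bit split. */
static bool dev_fits_compat(uint32_t major, uint32_t minor)
{
    return major <= 0xffu && minor <= 0xffffffu;
}

/* Hypothetical layout: major in bits 24-31, minor in bits 0-23. */
static uint32_t dev_pack(uint32_t major, uint32_t minor)
{
    assert(dev_fits_compat(major, minor));
    return (major << 24) | minor;
}

static uint32_t dev_major(uint32_t dev) { return dev >> 24; }
static uint32_t dev_minor(uint32_t dev) { return dev & 0xffffffu; }
```

Dynamically assigned majors (>= 256) always fail this check, which fits the note above that MFS does not need filesystem visibility for its backing device.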

Revision 1.103
Tue May 8 02:31:42 2007 UTC by dillon
Branches: MAIN
Diff to: previous 1.102
Changes since revision 1.102: +8 -8 lines
Replace NOCDEV with NULL.  NOCDEV was ((void *)-1) and was inherited
from *BSD a long time ago due to the device pointer / device number
duality.  Now that the pointer and device number have been separated, we
can just use NULL to indicate no-pointer.

Replace si_refs with si_sysref.  Use SYSREF to reference-count cdev_t.  Enable
cdev_t reclamation on deletion.

Revision 1.102
Sun May 6 19:23:31 2007 UTC by dillon
Branches: MAIN
Diff to: previous 1.101
Changes since revision 1.101: +26 -37 lines
Use SYSREF to reference count struct vnode.  v_usecount is now
v_sysref(.refcnt).  v_holdcnt is now v_auxrefs.  SYSREF's termination state
(using a negative reference count from -0x40000000+) now places the vnode in
a VCACHED or VFREE state and deactivates it.  The vnode is now assigned a
64 bit unique id via SYSREF.

vhold() (which manipulates v_auxrefs) no longer reactivates a vnode and
is explicitly used only to track references from auxiliary structures
and references to prevent premature destruction of the vnode.  vdrop()
will now only move a vnode from VCACHED to VFREE on the 1->0 transition
of v_auxrefs if the vnode is in a termination state.
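The vhold()/vdrop() rules above can be modeled as a small state machine. This is a hypothetical model: struct vnode_sketch, the V_* states, and the terminating flag are stand-ins for the kernel's vnode fields and SYSREF state, not real definitions.

```c
#include <assert.h>
#include <stdbool.h>

enum vstate_sketch { V_ACTIVE, V_CACHED, V_FREE };

struct vnode_sketch {
    int auxrefs;               /* models v_auxrefs */
    bool terminating;          /* models the SYSREF termination state */
    enum vstate_sketch state;
};

/* vhold(): bump auxrefs only; it never reactivates the vnode. */
static void vhold_sketch(struct vnode_sketch *vp)
{
    vp->auxrefs++;
}

/* vdrop(): on the 1->0 auxrefs transition, a terminating VCACHED vnode
 * moves to VFREE; otherwise the state is left alone. */
static void vdrop_sketch(struct vnode_sketch *vp)
{
    assert(vp->auxrefs > 0);
    if (--vp->auxrefs == 0 && vp->terminating && vp->state == V_CACHED)
        vp->state = V_FREE;
}
```

Keeping vhold() free of reactivation side effects is what lets auxiliary structures pin a vnode against destruction without pulling it back into active use.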

vref() will now panic if used on a vnode in a termination state.  vget()
must now be used to explicitly reactivate a vnode.  These requirements
existed before but are now explicitly asserted.
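The SYSREF rules described above (positive counts for active vnodes, a negative termination value, vref() only on active vnodes, vget() to reactivate) can be modeled with C11 atomics, using compare-and-swap much as the log says SYSREF does with atomic_cmpset_int(). This is a minimal single-object sketch: ref_get_active(), ref_put(), and ref_reactivate() are hypothetical names, and treating -0x40000000 as a single fixed termination value is a simplification of the range the log describes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REF_TERMINATING (-0x40000000)   /* simplified termination value */

struct refobj {
    atomic_int refcnt;  /* > 0: active; REF_TERMINATING: terminated */
};

static void ref_init(struct refobj *o)
{
    atomic_init(&o->refcnt, 1);
}

/* vref()-like: only legal on an active object, so a plain atomic
 * increment suffices; a terminating object trips the assertion. */
static void ref_get_active(struct refobj *o)
{
    int old = atomic_fetch_add(&o->refcnt, 1);
    assert(old > 0);
}

/* vrele()-like: returns true if this call made the 1 -> termination
 * transition, in which case the caller deactivates the object. */
static bool ref_put(struct refobj *o)
{
    int old = atomic_load(&o->refcnt);

    for (;;) {
        assert(old > 0);
        int next = (old == 1) ? REF_TERMINATING : old - 1;
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&o->refcnt, &old, next))
            return old == 1;
    }
}

/* vget()-like: explicitly pull the object out of the termination state. */
static bool ref_reactivate(struct refobj *o)
{
    int expected = REF_TERMINATING;
    return atomic_compare_exchange_strong(&o->refcnt, &expected, 1);
}
```

The compare-and-swap loop in ref_put() is what makes the final release race-free: exactly one caller observes the 1 -> termination transition and runs the deactivation path.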

vlrureclaim() and allocvnode() should now interact a bit better.  In
particular, vlrureclaim() will do a better job of finding vnodes to flush
and transition from VCACHED to VFREE, and allocvnode() will do a better
job finding vnodes to reuse without getting blocked by a flush.

allocvnode now uses a real VX lock to sequence vnodes into VRECLAIMED.  All
vnode special state processing now uses a VX lock.

Vnodes are now able to be slowly returned to the memory pool when
kern.maxvnodes is reduced at run time.

Various initialization elements have been moved to CTOR/DTOR and are
no longer in the critical path, improving performance.  However, since
SYSREF uses atomic_cmpset_int() (aka cmpxchgl), which reduces performance
somewhat, overall performance tends to be about the same.

Revision 1.101
Thu Dec 28 18:29:03 2006 UTC by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_8_Slip, DragonFly_RELEASE_1_8
Diff to: previous 1.100
Changes since revision 1.100: +2 -2 lines
Introduce globals: KvaStart, KvaEnd, and KvaSize.  Used by the kernel
instead of the nutty VADDR and VM_*_KERNEL_ADDRESS macros.  Move extern
declarations for these variables as well as for virtual_start, virtual_end,
and phys_avail[] from MD headers to MI headers.

Make kernel_object a global structure instead of a pointer.

Remove kmem_object and all related code (none of it is used any more).

Revision 1.100
Sat Dec 23 00:35:04 2006 UTC by swildner
Branches: MAIN
Diff to: previous 1.99
Changes since revision 1.99: +17 -17 lines
Rename printf -> kprintf in sys/ and add some defines where necessary
(files which are used in userland, too).

Revision 1.99
Tue Sep 19 11:47:36 2006 UTC by corecode
Branches: MAIN
Diff to: previous 1.98
Changes since revision 1.98: +0 -4 lines
1:1 Userland threading stage 2.9/4:

Push out p_thread a little bit more

Revision 1.98
Sun Sep 10 01:26:39 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.97
Changes since revision 1.97: +12 -12 lines
Change the kernel dev_t, representing a pointer to a specinfo structure,
to cdev_t.  Change struct specinfo to struct cdev.  The name 'cdev' was taken
from FreeBSD.  Remove the dev_t shim for the kernel.

This commit generally removes the overloading of 'dev_t' between userland and
the kernel.

Also fix a bug in libkvm where a kernel dev_t (now cdev_t) was not being
properly converted to a userland dev_t.

Revision 1.97
Sat Sep 9 19:34:46 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.96
Changes since revision 1.96: +8 -8 lines
Rename the kernel NODEV to NOCDEV to avoid conflicts with the userland NODEV.

Revision 1.96
Sat Sep 9 19:07:28 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.95
Changes since revision 1.95: +4 -4 lines
Rename struct specinfo into struct cdev.  Add a new typedef 'cdev_t' for cdev
pointers.  Temporarily retain dev_t for cdev pointers until the kernel can
be converted over to cdev_t.

Revision 1.95
Tue Sep 5 00:55:45 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.94
Changes since revision 1.94: +6 -6 lines
Rename malloc->kmalloc, free->kfree, and realloc->krealloc.  Pass 1

Revision 1.94
Sat Aug 12 00:26:20 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.93
Changes since revision 1.93: +3 -25 lines
VNode sequencing and locking - part 3/4.

VNode aliasing is handled by the namecache (aka nullfs), so there is no
longer a need to have VOP_LOCK, VOP_UNLOCK, or VOP_ISSLOCKED as 'VOP'
functions.  Both NFS and DEADFS have been using standard locking functions
for some time and are no longer special cases.  Replace all uses with
native calls to vn_lock, vn_unlock, and vn_islocked.

We can't have these as VOP functions anyhow because of the introduction of
the new SYSLINK transport layer, since vnode locks are primarily used to
protect the local vnode structure itself.

Revision 1.93
Fri Aug 11 01:54:59 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.92
Changes since revision 1.92: +24 -11 lines
VNode sequencing and locking - part 2/4.

Control access to v_usecount and v_holdcnt with the vnode's lock's spinlock.
Use the spinlock to interlock the VRECLAIMED and VINACTIVE flags during
1->0 and 0->1 transitions.  N->N+1 transitions do not need to obtain the
spinlock and simply use a locked bus cycle increment.  Vnode operations
are still not MP safe but this gets further along that road.

The lockmgr can no longer fail when obtaining an exclusive lock, so remove
the error code return from vx_lock() and vx_get().  Add special lockmgr
support routines to atomically acquire and release an exclusive lock
when the caller is already holding the spinlock.

The removal of vnodes from the vnode free list is now deferred.  Removal
only occurs when allocvnode() encounters a vnode on the list which should
not be on it.  This improves critical code paths for vget(), vput() and
vrele() by removing unnecessary manipulation of the freelist.
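The deferred removal described above can be sketched as a free list that vget() no longer touches; allocvnode() discards stale entries only when it encounters them. struct fvnode, vget_sketch(), and allocvnode_sketch() are hypothetical, and all locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vnode on a singly linked free list. */
struct fvnode {
    struct fvnode *next;
    int refs;     /* > 0: re-referenced (e.g. by vget()) while still listed */
    int onfree;   /* 1 while physically linked on the free list */
};

/* vget(): take a reference but leave the vnode on the free list. */
static void vget_sketch(struct fvnode *vp)
{
    vp->refs++;
}

/* allocvnode(): pop entries, lazily discarding any that are referenced. */
static struct fvnode *allocvnode_sketch(struct fvnode **freelist)
{
    while (*freelist != NULL) {
        struct fvnode *vp = *freelist;

        *freelist = vp->next;
        vp->next = NULL;
        vp->onfree = 0;
        if (vp->refs > 0)
            continue;           /* should not have been listed; just drop it */
        return vp;              /* unreferenced: safe to reuse */
    }
    return NULL;
}
```

The hot vget()/vrele() paths do a single counter update, and the cost of unlinking moves to the comparatively rare allocation path.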

Fix a lockmgr bug where wakeup() was being called with a spinlock held.
Instead, defer the wakeup until after the spinlock is released.

Revision 1.92
Wed Aug 9 22:47:32 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.91
Changes since revision 1.91: +4 -7 lines
VNode sequencing and locking - part 1/4.

Separate vref() for the case where the ref count is already non-zero (which
is nearly all uses of vref()), vs the case where it might be zero.  Clean
up the code in preparation for putting it under a spinlock.

Revision 1.91
Tue Jul 18 22:22:12 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.90
Changes since revision 1.90: +2 -3 lines
Remove several layers in the vnode operations vector init code.  Declare
the operations vector directly instead of via a descriptor array.  Remove
most of the recalculation code, it stopped being needed over a year ago.

This work is similar to what FreeBSD now does, but was developed along a
different line.  Ultimately our vop_ops will become SYSLINK ops for userland
VFS and clustering support.

Revision 1.90
Mon Jul 10 04:42:56 2006 UTC by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_6_Slip, DragonFly_RELEASE_1_6
Diff to: previous 1.89
Changes since revision 1.89: +22 -12 lines
Disassociate the VM object after calling VOP_INACTIVE instead of before.
VOP_INACTIVE may have to do some work on the vnode that requires a
functional buffer cache.  For example, UFS may have to truncate a removed
file.

Revision 1.89
Mon Jun 5 21:03:02 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.88
Changes since revision 1.88: +2 -2 lines
Cleanup crit_*() usage to reduce bogus warnings printed to the console
when a kernel is compiled with DEBUG_CRIT_SECTIONS.

NOTE: DEBUG_CRIT_SECTIONS does a direct pointer comparison rather than a
strcmp in order to reduce overhead.  Supply a string constant in cases
where the string identifier might be (intentionally) different otherwise.
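Why a shared string constant matters: C leaves it unspecified whether two identical string literals share one address, so a pointer-comparing checker can flag a correctly paired enter/exit as mismatched. The tracker below is a hypothetical illustration (crit_enter_id_sketch() and crit_exit_id_sketch() are not the kernel API); passing the same named constant to both calls guarantees the pointers match.

```c
#include <assert.h>
#include <stddef.h>

#define CRIT_MAX_DEPTH 16

static const char *crit_stack[CRIT_MAX_DEPTH];
static int crit_depth;
static int crit_mismatches;

static void crit_enter_id_sketch(const char *id)
{
    assert(crit_depth < CRIT_MAX_DEPTH);
    crit_stack[crit_depth++] = id;
}

static void crit_exit_id_sketch(const char *id)
{
    assert(crit_depth > 0);
    /* Pointer comparison only: cheap, but equal text stored at different
     * addresses still counts as a mismatch. */
    if (crit_stack[--crit_depth] != id)
        crit_mismatches++;
}
```

Two distinct arrays holding the same text always compare unequal here, which is exactly the kind of spurious console complaint the commit avoids.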

Revision 1.88
Mon Jun 5 20:56:54 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.87
Changes since revision 1.87: +8 -4 lines
Remove an inappropriate crit_exit() in ehci.c and add a missing crit_exit()
in kern/vfs_subr.c.  Specify string IDs in vfsync_bp() so we don't get
complaints on the console when the kernel is compiled with
DEBUG_CRIT_SECTIONS.

The missing crit_exit() in kern/vfs_subr.c was causing the kernel to leave
threads in a critical section, causing interrupts to stop operating and
cpu-bound userland programs to lock up the rest of the system.

Reported-by: Sascha Wildner <saw@online.de>, others.

Revision 1.65.2.2
Mon Jun 5 14:51:29 2006 UTC by dillon
Branches: DragonFly_RELEASE_1_4
CVS tags: DragonFly_RELEASE_1_4_Slip
Diff to: previous 1.65.2.1; next MAIN 1.66
Changes since revision 1.65.2.1: +10 -0 lines
Add some diagnostic messages to try to catch a ufs_dirbad panic before it
happens.

MFC: Reorder BUF_UNLOCK() - it must occur after b_flags is modified, not
before.

A newly created non-VMIO buffer is now marked B_INVAL.  Callers of getblk()
now always clear B_INVAL before issuing a READ I/O or when clearing or
overwriting the buffer.  Before this change, a getblk() (getnewbuf),
brelse(), getblk() sequence on a non-VMIO buffer would result in a buffer
with B_CACHE set yet containing uninitialized data.

MFC: B_NOCACHE cannot be set on a clean VMIO-backed buffer as this will
destroy the VM backing store, which might be dirty.

MFC: Reorder vnode_pager_setsize() calls to close a race condition.

Revision 1.87
Thu May 25 19:31:13 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.86
Changes since revision 1.86: +19 -7 lines
Fix several buffer cache issues related to B_NOCACHE.

* Do not set B_NOCACHE when calling vinvalbuf(... V_SAVE).  This will
  destroy dirty VM backing store associated with clean buffers before
  the VM system has a chance to check for and flush them.

  Taken-from: FreeBSD

* Properly set B_NOCACHE when destroying buffers related to truncated data.

* Fix a bug in vnode_pager_setsize() that was recently introduced.
  v_filesize was being set before a new/old size comparison, causing a
  file truncation to not destroy related VM pages past the new EOF.

* Remove a bogus B_NOCACHE|B_DIRTY test in brelse().  This was originally
  intended to be a B_NOCACHE|B_DELWRI test which then cleared B_NOCACHE,
  but now that B_NOCACHE operation has been fixed it really does indicate that
  the buffer, its contents, and its backing store are to be destroyed, even
  if the buffer is marked B_DELWRI.

  Instead of clearing B_NOCACHE when B_DELWRI is found to be set, clear
  B_DELWRI when B_NOCACHE is found to be set.

  Note that B_NOCACHE is still cleared when bdirty() is called in order to
  ensure that data is not lost when softupdates and other code do a
  'B_NOCACHE + bwrite' sequence.  Softupdates can redirty a buffer in its
  io completion hook and a write error can also redirty a buffer.

* The VMIO buffer rundown seems to have morphed into a state where the
  distinction between NFS and non-NFS buffers can be removed.  Remove
  the test.
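The revised brelse()/bdirty() flag precedence from this commit, reduced to bit manipulation. Flag names come from the log; the bit values and the brelse_flags()/bdirty_flags() helpers are hypothetical, and everything except the flag precedence is omitted.

```c
#include <assert.h>

/* Flag names from the log; bit values are hypothetical. */
#define B_NOCACHE 0x01
#define B_DELWRI  0x02

/* brelse(): B_NOCACHE now wins, so a pending delayed write is dropped
 * along with the buffer, its contents, and its backing store. */
static int brelse_flags(int flags)
{
    if (flags & B_NOCACHE)
        flags &= ~B_DELWRI;
    return flags;
}

/* bdirty(): still clears B_NOCACHE, so a buffer redirtied during I/O
 * completion (softupdates, write error) is not destroyed. */
static int bdirty_flags(int flags)
{
    flags &= ~B_NOCACHE;
    flags |= B_DELWRI;
    return flags;
}
```

For a 'B_NOCACHE + bwrite' sequence whose completion hook redirties the buffer, bdirty_flags() runs first, so the later brelse_flags() sees plain B_DELWRI and the data survives.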

Revision 1.86
Tue May 16 18:20:30 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.85
Changes since revision 1.85: +34 -4 lines
Attempt to interlock races between the buffer cache and VM backing store
that might cause new buffers to be instantiated beyond the new file EOF
during a truncate operation.

Truncate the VM object size before attempting to flush the pages and buffers
in order to prevent new VM pages from being created beyond EOF during the
flush.  Add an extra pass on the buffer cache after truncation to make sure
the buffers have been cleaned out.  Generate a warning to the console if
buffers are found during the extra pass.

If an old buffer were left intact during a truncate, then a re-extension of
the file or directory could have resulted in granting access to the old
buffer which might have had an incorrect cached block number translation
(vs the new block allocated by the extension of the file or directory),
causing new data to be written to the wrong disk block and resulting in
file or directory corruption.  The regular file truncation/extension
code had other checks in it prior to this patch, so if this problem could
occur at all before, it would have been in the directory code.

There is a small chance that this race was related to reported
ufs: dirbad panics.  The M.O. matches but unfortunately there is still
no smoking gun.
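The ordering described above (shrink the size first, then flush everything past EOF, then verify) can be sketched for a single file with a fixed-size table of cached block translations. struct file_sketch, truncate_sketch(), and first_dead_block() are hypothetical. In a single thread the verification pass always finds nothing; it only matters in the kernel, where another thread may instantiate a buffer between the flush and the check.

```c
#include <assert.h>
#include <stdint.h>

#define NBLOCKS 16

/* Hypothetical per-file cache: physical block per logical block, -1 if none. */
struct file_sketch {
    int64_t size;
    int     bsize;
    int64_t cached_pblk[NBLOCKS];
};

/* First logical block lying entirely beyond EOF. */
static int64_t first_dead_block(int64_t size, int bsize)
{
    return (size + bsize - 1) / bsize;
}

/* Returns the number of stale buffers found by the extra pass (expected 0). */
static int truncate_sketch(struct file_sketch *fp, int64_t newsize)
{
    int64_t dead = first_dead_block(newsize, fp->bsize);
    int leftovers = 0;

    fp->size = newsize;                        /* 1: shrink before flushing */
    for (int64_t b = dead; b < NBLOCKS; b++)   /* 2: discard past EOF */
        fp->cached_pblk[b] = -1;
    for (int64_t b = dead; b < NBLOCKS; b++)   /* 3: extra verification pass */
        if (fp->cached_pblk[b] != -1)
            leftovers++;                       /* kernel would warn here */
    return leftovers;
}
```

Truncating to 5000 bytes with 4096-byte blocks keeps block 1 (it still holds byte 4999) and discards block 2 onward, so a later re-extension cannot find a stale translation for block 2.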

Revision 1.85
Tue May 16 18:09:20 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.84
Changes since revision 1.84: +2 -2 lines
Remove vnode lock assertions that are no longer used.  Remove the
IS_LOCKING_VFS() macro.  All VFS's are required to be locking VFSs now.

Revision 1.84
Sat May 6 18:48:52 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.83
Changes since revision 1.83: +2 -3 lines
Remove the thread argument from all mount->vfs_* function vectors,
replacing it with a ucred pointer when applicable.  This cleans up a
considerable amount of VFS function code that previously delved into
the process structure to get the cred, though some code remains.

Get rid of the compatibility thread argument for hpfs and nwfs.  Our
lockmgr calls are now mostly compatible with NetBSD (which doesn't use a
thread argument either).

Get rid of some complex junk in fdesc_statfs() that nobody uses.

Remove the thread argument from dounmount() as well as various other
filesystem specific procedures (quota calls primarily) which no longer
need it due to the lockmgr, VOP, and VFS cleanups.  These cleanups also
have the effect of making the VFS code slightly less dependent on the
calling thread's context.

Revision 1.83
Sat May 6 02:43:12 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.82
Changes since revision 1.82: +15 -14 lines
The thread/proc pointer argument in the VFS subsystem originally existed
for...  well, I'm not sure *WHY* it originally existed when most of the
time the pointer couldn't be anything other than curthread or curproc or
the code wouldn't work.  This is particularly true of lockmgr locks.

Remove the pointer argument from all VOP_*() functions, all fileops functions,
and most ioctl functions.

Revision 1.82
Fri May 5 21:15:09 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.81
Changes since revision 1.81: +2 -3 lines
Simplify vn_lock(), VOP_LOCK(), and VOP_UNLOCK() by removing the thread_t
argument.  These calls now always use the current thread as the lockholder.
Passing a thread_t to these functions has always been questionable at best.

Revision 1.81
Fri May 5 16:35:00 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.80
Changes since revision 1.80: +5 -5 lines
Remove VOP_BWRITE().  This function provided a way for a VFS to override
the bwrite() function and was used *only* by NFS in order to allow NFS to
handle the B_NEEDCOMMIT flag as part of NFSv3's 2-phase commit operation.
However, over time, the handling of this flag was moved to the strategy code.
Additionally, the kernel now fully supports the redirtying of buffers
during an I/O (which both softupdates and NFS need to be able to do).

The override is no longer needed.  All former calls to VOP_BWRITE() now
simply call bwrite().

Revision 1.80
Sun Apr 30 18:25:35 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.79
Changes since revision 1.79: +13 -14 lines
Remove b_xflags.  Fold BX_VNCLEAN and BX_VNDIRTY into b_flags as
B_VNCLEAN and B_VNDIRTY.  Remove BX_AUTOCHAINDONE and recode the
swap pager to use one of the caller data fields in the BIO instead.

Revision 1.79
Fri Apr 28 16:34:01 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.78
Changes since revision 1.78: +0 -30 lines
Get rid of pbgetvp() and pbrelvp().  Instead fold the B_PAGING flag directly
into getpbuf() (the only type of buffer that pbgetvp() could be called on
anyway).  Change related b_flags assignments from '=' to '|='.

Get rid of remaining dependencies on b_vp.  vn_strategy() now relies solely
on the vp passed to it as an argument.  Remove buffer cache code that sets
b_vp for anonymous pbuf's.

Add a stopgap 'vp' argument to vfs_busy_pages().  This is only really needed
by NFS and the clustering code due to the severely hackish nature of the
NFS and clustering code.

Fix a bug in the ext2fs inode code where vfs_busy_pages() was being called
on B_CACHE buffers.  Add an assertion to vfs_busy_pages() to panic if it
encounters a B_CACHE buffer.

Revision 1.78
Tue Apr 25 22:11:28 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.77
Changes since revision 1.77: +8 -3 lines
Get rid of the weird FSMID update path in the vnode and namecache code.
Instead, mark the vnode as needing an FSMID update when the vnode is
disconnected from the namecache.

This fixes a bug where FSMID updates were being lost at unmount time.

Revision 1.77: download - view: text, markup, annotated - select for diffs
Mon Apr 24 22:01:18 2006 UTC (7 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.76: preferred, unified
Changes since revision 1.76: +1 -16 lines
vfsync() is not in the business of removing buffers beyond the file EOF.
Remove the procedural argument and related code.

Revision 1.76: download - view: text, markup, annotated - select for diffs
Sun Apr 23 00:47:10 2006 UTC (7 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.75: preferred, unified
Changes since revision 1.75: +0 -1 lines
Remove unused code label.

Revision 1.65.2.1: download - view: text, markup, annotated - select for diffs
Tue Apr 18 17:12:25 2006 UTC (7 years, 11 months ago) by dillon
Branches: DragonFly_RELEASE_1_4
Diff to: previous 1.65: preferred, unified
Changes since revision 1.65: +0 -2 lines
MFC vfs_bio.c 1.57, vfs_subr.c 1.69 - fix race condition in vfs_bio_awrite().

Revision 1.75: download - view: text, markup, annotated - select for diffs
Fri Apr 7 06:38:27 2006 UTC (8 years ago) by dillon
Branches: MAIN
Diff to: previous 1.74: preferred, unified
Changes since revision 1.74: +5 -23 lines
Due to continuing issues with VOP_READ/VOP_WRITE ops being called without
a VOP_OPEN, particularly by NFS, redo the way VM objects are associated
with vnodes.

* The size of the object is now passed to vinitvmio().  vinitvmio() no
  longer calls VOP_GETATTR().

* Instead of trying to call vinitvmio() conditionally in various places,
  we now call it unconditionally when a vnode is instantiated if
  the filesystem at any time in the future intends to use the buffer
  cache to access that vnode's dataspace.

* Specfs 'disk' devices are an exception.  Since we cannot safely do I/O
  on such vnodes if they have not been VOP_OPEN()'ed anyhow, the VM objects
  for those vnodes are still only associated on open.

The performance impact is limited to the case where large numbers of vnodes
are being created and destroyed.  This case only occurs when a large
directory topology (number of files > kernel's vnode cache) is traversed
and all related inodes are cached by the system.  Being a pure-cpu case
the slight loss of performance due to the VM object allocations is
not really a big deal.

Revision 1.74: download - view: text, markup, annotated - select for diffs
Sat Apr 1 20:46:47 2006 UTC (8 years ago) by dillon
Branches: MAIN
Diff to: previous 1.73: preferred, unified
Changes since revision 1.73: +17 -11 lines
Use the vnode v_opencount and v_writecount universally.  They were previously
only used by specfs.  Require that VOP_OPEN and VOP_CLOSE calls match.
Assert on boundary errors.

Clean up umount's FORCECLOSE mode.  Adjust deadfs to allow duplicate closes
(which can happen due to a forced unmount or revoke).

Add vop_stdopen() and vop_stdclose() and adjust the default vnode ops to
call them.  All VFSs except DEADFS which supply their own vop_open and
vop_close now call vop_stdopen() and vop_stdclose() to handle v_opencount
and v_writecount adjustments.
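
The open/close pairing rule can be pictured with a minimal sketch. The structure, the FWRITE_SKETCH mode bit, and the function names below are hypothetical; they show only the counting and the boundary assertions described above, not the real vop_stdopen()/vop_stdclose() signatures.

```c
#include <assert.h>

#define FWRITE_SKETCH 0x2   /* hypothetical "opened for writing" mode bit */

/* Hypothetical subset of a vnode: only the two counters the log describes. */
struct vnode_sketch {
    int v_opencount;
    int v_writecount;
};

/* What a vop_stdopen()-style helper is responsible for (sketch). */
static void stdopen_sketch(struct vnode_sketch *vp, int mode)
{
    vp->v_opencount++;
    if (mode & FWRITE_SKETCH)
        vp->v_writecount++;
}

/* Matching close: assert on boundary errors such as a close without an open. */
static void stdclose_sketch(struct vnode_sketch *vp, int mode)
{
    assert(vp->v_opencount > 0);
    vp->v_opencount--;
    if (mode & FWRITE_SKETCH) {
        assert(vp->v_writecount > 0);
        vp->v_writecount--;
    }
}
```

Because every VOP_OPEN must be matched by a VOP_CLOSE with the same mode, the counters return to zero once all opens are closed, and an unmatched close trips an assertion instead of silently going negative.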

Change the VOP_OPEN/fp specs.  VOP_OPEN (aka vop_stdopen) is now responsible
for filling in the file pointer information, rather than the caller of
VOP_OPEN.  Additionally, when supplied a file pointer, VOP_OPEN is now
allowed to populate the file pointer with a different vnode than the one
passed to it.  This will be used later on to allow filesystems to
synthesize different vnodes on open: for example, so we can create generic
tty/pty pairing devices rather than scanning for an unused pty, and so we
can create swap-backed generic anonymous file descriptors rather than having
to use /tmp.  And for other purposes as well.

Fix UFS's mount/remount/unmount code to make the proper VOP_OPEN and
VOP_CLOSE calls when a filesystem is remounted read-only or read-write.

Revision 1.73: download - view: text, markup, annotated - select for diffs
Wed Mar 29 20:46:05 2006 UTC (8 years ago) by dillon
Branches: MAIN
Diff to: previous 1.72: preferred, unified
Changes since revision 1.72: +2 -0 lines
A VM object is now required for vnode-based buffer cache ops.  This
is usually handled by VOP_OPEN but there are a few cases where UFS issues
buffer cache ops on vnodes that have not been opened, such as when creating
a new directory or softlink.

Revision 1.72: download - view: text, markup, annotated - select for diffs
Wed Mar 29 18:44:50 2006 UTC (8 years ago) by dillon
Branches: MAIN
Diff to: previous 1.71: preferred, unified
Changes since revision 1.71: +70 -21 lines
Remove VOP_GETVOBJECT, VOP_DESTROYVOBJECT, and VOP_CREATEVOBJECT.  Rearrange
the VFS code such that VOP_OPEN is now responsible for associating a VM
object with a vnode.  Add the vinitvmio() helper routine.

Revision 1.71: download - view: text, markup, annotated - select for diffs
Fri Mar 24 18:35:33 2006 UTC (8 years ago) by dillon
Branches: MAIN
Diff to: previous 1.70: preferred, unified
Changes since revision 1.70: +32 -22 lines
Major BUF/BIO work commit.  Make I/O BIO-centric and specify the disk or
file location with a 64 bit offset instead of a 32 bit block number.

* All I/O is now BIO-centric instead of BUF-centric.

* File/Disk addresses universally use a 64 bit bio_offset now.  bio_blkno
  no longer exists.

* Stackable BIO's hold disk offset translations.  Translations are no longer
  overloaded onto a single structure (BUF or BIO).

* bio_offset == NOOFFSET is now universally used to indicate that a
  translation has not been made.  The old (blkno == lblkno) junk has all
  been removed.

* There is no longer a distinction between logical I/O and physical I/O.

* All driver BUFQs have been converted to BIOQs.

* BMAP, FREEBLKS, getblk, bread, breadn, bwrite, inmem, cluster_*,
  and findblk all now take and/or return 64 bit byte offsets instead
  of block numbers.  Note that BMAP now returns a byte range for the before
  and after variables.
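
A small sketch of why the switch to 64-bit byte offsets matters: a 32-bit block number scaled to bytes in 32-bit arithmetic wraps at 4 GiB, while a 64-bit offset does not. The helper names are hypothetical, and the NOOFFSET value of -1 is an assumption used only to illustrate the "translation not yet made" test.

```c
#include <stdint.h>

/* Sentinel meaning "no translation has been made yet"; -1 is assumed here. */
#define NOOFFSET ((int64_t)-1)

/* Old style: 32-bit block number scaled in 32-bit arithmetic.
 * Unsigned math is used so the wraparound past 4 GiB is well defined. */
static uint32_t offset_from_blkno32(uint32_t blkno, uint32_t bsize)
{
    return blkno * bsize;
}

/* New style: a 64-bit byte offset, no wraparound for realistic sizes. */
static int64_t offset_from_blkno64(int64_t blkno, int64_t bsize)
{
    return blkno * bsize;
}

/* A BIO whose offset is still NOOFFSET has not been translated yet. */
static int bio_is_translated(int64_t bio_offset)
{
    return bio_offset != NOOFFSET;
}
```

For example, 8388608 blocks of 512 bytes is exactly 2^32 bytes, so the 32-bit version wraps to 0 while the 64-bit version returns 4294967296.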

Revision 1.70: download - view: text, markup, annotated - select for diffs
Sun Mar 5 18:38:34 2006 UTC (8 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.69: preferred, unified
Changes since revision 1.69: +61 -61 lines
Replace the global buffer cache hash table with a per-vnode red-black tree.
Add a B_HASHED b_flags bit as a sanity check.  Remove the invalhash junk
and replace with assertions in several cases where the buffer must already
not be hashed.  Get rid of incore() and gbincore() and replace with a new
function called findblk().

Merge the new RB management with bgetvp(), the two are now fully integrated.

Previous work has turned reassignbuf() into a mostly degenerate call, simplify
its arguments and functionality to match.  Remove an unnecessary reassignbuf()
call from the NFS code.  Get rid of pbreassignbuf().

Adjust the code in several places where it was assumed that calling
BUF_LOCK() with LK_SLEEPFAIL after previously failing with LK_NOWAIT
would always fail.  This code was used to sleep before a retry.  Instead,
if the second lock unexpectedly succeeds, simply issue an unlock and retry
anyway.

Testing-by: Stefan Krueger <skrueger@meinberlikomm.de>

Revision 1.69: download - view: text, markup, annotated - select for diffs
Thu Mar 2 20:28:49 2006 UTC (8 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.68: preferred, unified
Changes since revision 1.68: +0 -2 lines
vfs_bio_awrite() was unconditionally locking a buffer without checking
for races, potentially resulting in the wrong buffer, an invalid buffer,
or a recently replaced buffer being written out.  Change the call semantics
to require a locked buffer to be passed into the function rather than
locking the buffer in the function.

Revision 1.68: download - view: text, markup, annotated - select for diffs
Thu Mar 2 19:26:14 2006 UTC (8 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.67: preferred, unified
Changes since revision 1.67: +4 -1 lines
buftimespinlock is utterly useless since the spinlock is released
within lockmgr().  The only real problem was with lk_prio, which no longer
exists, so get rid of the spin lock and document the remaining passive
races.

Revision 1.67: download - view: text, markup, annotated - select for diffs
Thu Mar 2 19:07:59 2006 UTC (8 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.66: preferred, unified
Changes since revision 1.66: +6 -5 lines
Pass LK_PCATCH instead of trying to store tsleep flags in the lock
structure, so multiple entities competing for the same lock do not
use unexpected flags when sleeping.

Only NFS really uses PCATCH with lockmgr locks.

Revision 1.66: download - view: text, markup, annotated - select for diffs
Fri Feb 17 19:18:06 2006 UTC (8 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.65: preferred, unified
Changes since revision 1.65: +15 -38 lines
Make the entire BUF/BIO system BIO-centric instead of BUF-centric.  Vnode
and device strategy routines now take a BIO and must pass that BIO to
biodone().  All code which previously managed a BUF undergoing I/O now
manages a BIO.

The new BIO-centric algorithms allow BIOs to be stacked, where each layer
represents a block translation, completion callback, or caller or device
private data.  This information is no longer overloaded within the BUF.
Translation layer linkages remain intact as a 'cache' after I/O has completed.
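
The stacking idea can be sketched with a heavily simplified layer structure. Each pushed layer starts untranslated, a translation step fills in its offset, and completion unwinds from the bottom layer back toward the caller, leaving each layer's offset in place as a cache. The struct, its field names, and the helper functions are hypothetical; this is not DragonFly's struct bio, and the NOOFFSET value is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOOFFSET ((int64_t)-1)   /* assumed "untranslated" marker */

/* Hypothetical, much simplified BIO layer. */
struct bio_layer {
    int64_t bio_offset;                    /* offset meaningful at this layer */
    struct bio_layer *bio_prev;            /* layer above, closer to the caller */
    void (*bio_done)(struct bio_layer *);  /* optional completion callback */
    int done_calls;                        /* counts callback invocations */
};

/* Push an untranslated layer below 'parent'. */
static void push_layer(struct bio_layer *parent, struct bio_layer *child,
                       void (*done)(struct bio_layer *))
{
    assert(parent != NULL);
    child->bio_offset = NOOFFSET;
    child->bio_prev = parent;
    child->bio_done = done;
    child->done_calls = 0;
}

/* On completion at the bottom, unwind toward the caller, running each
 * layer's callback. Offsets are not cleared, so they act as a cache. */
static void complete_io(struct bio_layer *bottom)
{
    for (struct bio_layer *b = bottom; b != NULL; b = b->bio_prev)
        if (b->bio_done != NULL)
            b->bio_done(b);
}

static void count_done(struct bio_layer *b)
{
    b->done_calls++;
}
```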

The VOP and DEV strategy routines no longer make assumptions as to which
translated block number applies to them.  They use the block number in the
BIO specifically passed to them.

Change the 'untranslated' constant to NOOFFSET (for bio_offset), and
(daddr_t)-1 (for bio_blkno).  Rip out all code that previously set the
translated block number to the untranslated block number to indicate
that the translation had not been made.

Rip out all the cluster linkage fields for clustered VFS and clustered
paging operations.  Clustering now occurs in a private BIO layer using
private fields within the BIO.

Reformulate the vn_strategy() and dev_dstrategy() abstraction(s).  These
routines no longer assume that bp->b_vp == the vp of the VOP operation, and
the dev_t is no longer stored in the struct buf.  Instead, only the vp passed
to vn_strategy() (and related *_strategy() routines for VFS ops), and
the dev_t passed to dev_dstrategy() (and related *_strategy() routines for
device ops) is used by the VFS or DEV code.  This will allow an arbitrary
number of translation layers in the future.

Create an independent per-BIO tracking entity, struct bio_track, which
is used to determine when I/O is in-progress on the associated device
or vnode.

NOTE: Unlike FreeBSD's BIO work, our struct BUF is still used to hold
the fields describing the data buffer, resid, and error state.

Major-testing-by: Stefan Krueger

Revision 1.65: download - view: text, markup, annotated - select for diffs
Mon Oct 31 21:48:53 2005 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
Branch point for: DragonFly_RELEASE_1_4
Diff to: previous 1.64: preferred, unified
Changes since revision 1.64: +14 -3 lines
An exclusive lock on the vnode is required when running vm_object_page_clean(),
otherwise a balloc may occur without the vnode/inode held locked.

There is a possibility that this bug was responsible for some filesystem
corruption.

Reported-by: numerous people after a sanity assertion was committed to the
	     ffs_balloc code.

Revision 1.64: download - view: text, markup, annotated - select for diffs
Sat Sep 17 07:43:00 2005 UTC (8 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.63: preferred, unified
Changes since revision 1.63: +4 -2 lines
Add an argument to vfs_add_vnodeops() to specify VVF_* flags for the vop_ops
structure.  Add a new flag called VVF_SUPPORTS_FSMID to indicate filesystems
which support persistent storage of FSMIDs.  Rework the FSMID code a bit
to reduce overhead.

Use the spare field in the UFS inode structure to implement a persistent
FSMID.  The FSMID is recursively marked in the namecache but not adjusted
until the next getattr() call on the related inode(s), or when the vnode
is reclaimed.

Revision 1.63: download - view: text, markup, annotated - select for diffs
Sat Aug 27 20:23:05 2005 UTC (8 years, 7 months ago) by joerg
Branches: MAIN
Diff to: previous 1.62: preferred, unified
Changes since revision 1.62: +11 -11 lines
Make struct dirent contain a full 64bit inode. Allow more than 255 byte
filenames by increasing d_namlen to 16bit. Remove UFS specific macros
from sys/dirent.h, programs which really need them should include
vfs/ufs/dir.h. MAXNAMLEN should not be used, but replaced by NAME_MAX.

To keep the impact for older BSD code small, d_ino and d_fileno are kept
in the old meaning when __BSD_VISIBLE is defined, otherwise the POSIX
version d_ino is used. This will be changed later to always define only
d_ino and make d_fileno a compatibility macro for __BSD_VISIBLE.

d_name is left with hard-coded 256 byte space, this will be changed at
some point in the future and doesn't affect the ABI. Programs should
correctly allocate space themselves, since the maximum directory entry
length can be > 256 byte.

For allocating dirents (e.g. for readdir_r), _DIRENT_RECLEN and
_DIRENT_DIRSIZ should be used. NetBSD has chosen the same names.
Revamp the compatibility code to always use a local kernel buffer and
write out the entries. This will be changed later by passing down the
output function to vop_readdir, eliminating the redundant copy.

Change NFS and CD9660 to use vop_write_dirent; for CD9660, ensure
that the buffers are big enough by prepending char arrays of the right
size.

Tested-by & discussed-with: dillon

Revision 1.62: download - view: text, markup, annotated - select for diffs
Sun Aug 14 18:38:27 2005 UTC (8 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.61: preferred, unified
Changes since revision 1.61: +4 -2 lines
Add a sanity check for the length of the file name to vop_write_dirent().

Revision 1.61: download - view: text, markup, annotated - select for diffs
Thu Aug 11 09:27:00 2005 UTC (8 years, 8 months ago) by joerg
Branches: MAIN
Diff to: previous 1.60: preferred, unified
Changes since revision 1.60: +1 -1 lines
Fix merge bug. d_namlen is used by GENERIC_DIRSIZ; when it isn't
initialised, the argument to bzero is wrong.

Revision 1.60: download - view: text, markup, annotated - select for diffs
Wed Aug 10 14:28:34 2005 UTC (8 years, 8 months ago) by joerg
Branches: MAIN
Diff to: previous 1.59: preferred, unified
Changes since revision 1.59: +23 -0 lines
Add vop_write_dirent helper functions, which isolate the caller from
the layout and setup of struct dirent.
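
A helper of this kind can be sketched as follows: the caller passes only the inode number, type, and name, and the helper computes the padded record length, checks the name length and the remaining space, and copies a zero-padded record. The struct layout (64-bit inode, 16-bit name length, per revision 1.63), the 8-byte rounding, and all names here are assumptions for illustration; the real layout and _DIRENT_RECLEN() come from sys/dirent.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, not sys/dirent.h. */
struct dirent_sketch {
    uint64_t d_ino;
    uint16_t d_namlen;
    uint8_t  d_type;
    uint8_t  d_pad1;
    uint32_t d_pad2;
    char     d_name[256];
};

/* Record length: header + name + NUL, rounded up to 8 bytes (assumed). */
#define SKETCH_RECLEN(namlen) \
    ((offsetof(struct dirent_sketch, d_name) + (size_t)(namlen) + 1 + 7) & ~(size_t)7)

/* Append one entry at buf[*pos] (requires *pos <= buflen).
 * Returns 0 on success, 1 if the record does not fit, -1 if the name is too long. */
static int write_dirent_sketch(char *buf, size_t buflen, size_t *pos,
                               uint64_t ino, uint8_t type, const char *name)
{
    struct dirent_sketch d;
    size_t namlen = strlen(name);
    size_t reclen;

    if (namlen >= sizeof(d.d_name))
        return -1;                      /* sanity check on the name length */
    reclen = SKETCH_RECLEN(namlen);
    if (buflen - *pos < reclen)
        return 1;                       /* caller stops and returns what it has */

    memset(&d, 0, sizeof(d));           /* no stale bytes in the padding */
    d.d_ino = ino;
    d.d_namlen = (uint16_t)namlen;
    d.d_type = type;
    memcpy(d.d_name, name, namlen);     /* terminating NUL comes from memset */
    memcpy(buf + *pos, &d, reclen);     /* copy only the used record bytes */
    *pos += reclen;
    return 0;
}
```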

Revision 1.59: download - view: text, markup, annotated - select for diffs
Tue Aug 9 19:26:59 2005 UTC (8 years, 8 months ago) by joerg
Branches: MAIN
Diff to: previous 1.58: preferred, unified
Changes since revision 1.58: +7 -2 lines
When allocating memory for the index file, query the filesystem for the
maximum entry name first and use that.
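
The same "ask the filesystem first" idea has a userland analogue in POSIX pathconf() with _PC_NAME_MAX. The sketch below is that userland illustration, not the kernel code; the function name is hypothetical, and the fallback of 255 used when no limit is reported is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Used only when the filesystem reports no limit; 255 is an assumed value. */
#define FALLBACK_NAME_MAX 255L

/* Allocate a zeroed buffer for one file name (plus NUL) from directory dir,
 * sized from the filesystem's own limit. Returns NULL on error. */
static char *alloc_name_buf(const char *dir, size_t *lenp)
{
    long max;

    errno = 0;
    max = pathconf(dir, _PC_NAME_MAX);
    if (max < 0) {
        if (errno != 0)
            return NULL;            /* real error, e.g. dir does not exist */
        max = FALLBACK_NAME_MAX;    /* no limit reported */
    }
    *lenp = (size_t)max + 1;
    return calloc(1, *lenp);
}
```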

Revision 1.58: download - view: text, markup, annotated - select for diffs
Tue Aug 9 16:53:34 2005 UTC (8 years, 8 months ago) by joerg
Branches: MAIN
Diff to: previous 1.57: preferred, unified
Changes since revision 1.57: +13 -0 lines
Add vn_get_namelen to simplify correct emulation of statfs with maximum
name length field.

Discussed-with: hmp

Revision 1.57: download - view: text, markup, annotated - select for diffs
Mon Jun 6 15:02:28 2005 UTC (8 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.56: preferred, unified
Changes since revision 1.56: +19 -20 lines
Remove spl*() calls from kern, replacing them with critical sections.
Change the meaning of safepri from a cpl mask to a thread priority.
Make a minor adjustment to tests within one of the buffer cache's
critical sections.
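
The conceptual difference can be sketched in a few lines: instead of raising a priority mask, a critical section is a per-thread nesting count, and anything that arrives while the count is nonzero is deferred until the outermost exit. The counters and function names below are hypothetical and single-threaded; this is not the kernel's crit_enter()/crit_exit() implementation.

```c
#include <assert.h>

/* Hypothetical per-thread critical-section state. */
static int crit_count;     /* nesting depth */
static int pending_intr;   /* interrupts deferred while inside a section */
static int intr_handled;   /* interrupts that have actually run */

static void sketch_crit_enter(void)
{
    crit_count++;
}

static void run_pending(void)
{
    while (pending_intr > 0) {
        pending_intr--;
        intr_handled++;
    }
}

/* Deferred work runs only when the outermost section is left. */
static void sketch_crit_exit(void)
{
    assert(crit_count > 0);
    if (--crit_count == 0)
        run_pending();
}

/* An arriving interrupt runs at once unless a critical section is active. */
static void sketch_interrupt(void)
{
    if (crit_count > 0)
        pending_intr++;
    else
        intr_handled++;
}
```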

Revision 1.53.2.1: download - view: text, markup, annotated - select for diffs
Mon May 23 18:32:52 2005 UTC (8 years, 10 months ago) by dillon
Branches: DragonFly_RELEASE_1_2
CVS tags: DragonFly_RELEASE_1_2_Slip
Diff to: previous 1.53: preferred, unified; next MAIN 1.54: preferred, unified
Changes since revision 1.53: +1 -0 lines
MFC 1.56.  Minor kernel stack memory disclosure.

Security: FreeBSD-SA-05:08.kmem

Revision 1.56: download - view: text, markup, annotated - select for diffs
Fri May 6 11:52:02 2005 UTC (8 years, 11 months ago) by corecode
Branches: MAIN
Diff to: previous 1.55: preferred, unified
Changes since revision 1.55: +1 -0 lines
Bring in fix from FreeBSD/cperciva:
  Log:
  If we are going to
  1. Copy a NULL-terminated string into a fixed-length buffer, and
  2. copyout that buffer to userland,
  we really ought to
  0. Zero the entire buffer
  first.

  Security: FreeBSD-SA-05:08.kmem

Thanks to Colin Percival for notifying us!
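
The three-step rule in the quoted log can be sketched as follows. The function name is hypothetical, and copyout() appears only as a comment because it exists only inside the kernel.

```c
#include <string.h>

/* Copy a NUL-terminated string into a fixed-length kernel buffer that will
 * later be copied out to userland in its entirety. Requires len >= 1. */
static void fill_fixed_name(char *kbuf, size_t len, const char *src)
{
    size_t n = strlen(src);

    memset(kbuf, 0, len);       /* 0. zero the whole buffer first */
    if (n > len - 1)
        n = len - 1;            /* truncate, leaving room for the NUL */
    memcpy(kbuf, src, n);       /* 1. copy the string; NUL comes from step 0 */
    /* 2. copyout(kbuf, ..., len) would follow: every byte is now defined,
     *    so no stale kernel stack memory reaches userland. */
}
```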

Revision 1.55: download - view: text, markup, annotated - select for diffs
Tue Apr 19 17:54:42 2005 UTC (8 years, 11 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Stable
Diff to: previous 1.54: preferred, unified
Changes since revision 1.54: +42 -107 lines
Abstract out the routines which manipulate the mountlist.

Introduce an MP-safe mountlist scanning function.  This function keeps track
of scans which are in-progress and properly handles ripouts that occur during
the callback by advancing the matching pointers being tracked.  The callback
can safely block without confusing the scan.

This algorithm has already been successfully used for the buffer cache and
will soon be used for the vnode lists hanging off the mount point.
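
The scan-tracking idea can be shown with a minimal single-threaded sketch: each in-progress scan registers the node it will visit next, and the removal routine advances any scan that points at the node being removed. The list, structures, and function names are hypothetical, and the locking that a real MP-safe version needs is omitted.

```c
#include <stddef.h>

/* Hypothetical doubly linked list standing in for the mountlist. */
struct node {
    int id;
    struct node *next;
    struct node *prev;
};

/* One record per in-progress scan: the node that scan will visit next. */
struct scan_info {
    struct node *next_node;
    struct scan_info *link;
};

static struct node *list_head;
static struct scan_info *active_scans;

static void list_insert_tail(struct node *n)
{
    struct node **pp = &list_head;
    struct node *prev = NULL;

    while (*pp != NULL) {
        prev = *pp;
        pp = &(*pp)->next;
    }
    n->prev = prev;
    n->next = NULL;
    *pp = n;
}

/* Unlink n, first advancing any in-progress scan that would visit n next. */
static void list_remove(struct node *n)
{
    for (struct scan_info *s = active_scans; s != NULL; s = s->link)
        if (s->next_node == n)
            s->next_node = n->next;
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        list_head = n->next;
    if (n->next != NULL)
        n->next->prev = n->prev;
    n->next = n->prev = NULL;
}

/* Call cb on every node. cb may remove nodes, even the next one, because
 * list_remove() fixes up this scan's saved successor. */
static int list_scan(void (*cb)(struct node *, void *), void *arg)
{
    struct scan_info info;
    struct node *n;
    int visited = 0;

    info.next_node = list_head;
    info.link = active_scans;
    active_scans = &info;                   /* register the scan */

    while ((n = info.next_node) != NULL) {
        info.next_node = n->next;           /* save successor before callback */
        cb(n, arg);
        visited++;
    }

    for (struct scan_info **pp = &active_scans; *pp != NULL; pp = &(*pp)->link)
        if (*pp == &info) {                 /* unregister the scan */
            *pp = info.link;
            break;
        }
    return visited;
}
```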

Revision 1.54: download - view: text, markup, annotated - select for diffs
Fri Apr 15 19:08:11 2005 UTC (9 years ago) by dillon
Branches: MAIN
Diff to: previous 1.53: preferred, unified
Changes since revision 1.53: +449 -202 lines
Implement Red-Black trees for the vnode clean/dirty buffer lists.

Implement ranged fsyncs and adjust the syncer to use the new capability.
This capability will also soon be used to replace the write_behind
heuristic.  Rewrite the fsync code for all VFSs to use the new APIs
(generally simplifying them).

Get rid of B_WRITEINPROG, it is no longer useful or needed.
Get rid of B_SCANNED, it is no longer useful or needed.

Rewrite the NFS 2-phase commit protocol to take advantage of the new
Red-Black tree topology.

Add RB_SCAN() for callback-scanning of Red-Black trees.  Give RB_SCAN
the ability to track the 'next' scan node and automatically fix it up
if the callback deletes nodes in the tree while the scan is in progress,
whether directly, indirectly, or indirectly by blocking.

Remove most related loop restart conditions, they are no longer necessary.

Disable filesystem background bitmap writes.  This really needs to be
solved a different way and the concept does not work well with red-black
trees.

Revision 1.53: download - view: text, markup, annotated - select for diffs
Fri Mar 4 02:21:48 2005 UTC (9 years, 1 month ago) by hsu
Branches: MAIN
Branch point for: DragonFly_RELEASE_1_2
Diff to: previous 1.52: preferred, unified
Changes since revision 1.52: +1 -1 lines
Convert the struct domain next pointer to an SLIST.

Revision 1.52: download - view: text, markup, annotated - select for diffs
Sat Feb 12 18:56:46 2005 UTC (9 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.51: preferred, unified
Changes since revision 1.51: +4 -1 lines
Fix a cache_resolve() vs cache_inval() race which can result in a livelock.
The namecache invalidation code was being a bit overzealous when asked to
invalidate a subhierarchy.  It was retrying until the subhierarchy was
completely invalidated, re-invalidating new entries created after the initial
call to cache_inval().  This can occur if the filesystem is heavily loaded
and a high level directory is being recursively invalidated.

It is unnecessary to retry in this case... the purpose is to invalidate
as-of the call to cache_inval(), so it is acceptable to allow new entries
to be resolved within the subhierarchy undergoing the invalidation.

Certain higher level entities (rename and vnode reclamation) require
complete invalidation.  The retry has been moved to a higher level for
these entities.  The basic cache_inval() code is now single-pass.

Reported-by: Richard Nyberg <rnyberg@it.su.se>

Revision 1.51: download - view: text, markup, annotated - select for diffs
Wed Feb 2 21:34:18 2005 UTC (9 years, 2 months ago) by joerg
Branches: MAIN
Diff to: previous 1.50: preferred, unified
Changes since revision 1.50: +1 -2 lines
Don't use the statfs field f_mntonname in filesystems. For the userland
export code, it can be synthesized from mnt_ncp.
For debugging code, use f_mntfromname; it should be enough to find the
culprit. vfs_unmountall doesn't use code_fullpath, to avoid problems
with resource allocation and to make it more likely that a call from ddb
succeeds.
Change getfsstat and fhstatfs to not show directories outside a chroot
path, with the exception of the filesystem counting the chroot root itself.

Revision 1.50: download - view: text, markup, annotated - select for diffs
Fri Dec 17 00:18:07 2004 UTC (9 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.49: preferred, unified
Changes since revision 1.49: +2 -2 lines
VFS messaging/interfacing work stage 10/99:

Start adding the journaling, range locking, and (very slightly) cache
coherency infrastructure.  Continue cleaning up the VOP operations vector.

Expand on past commits that gave each mount structure its own set of VOP
operations vectors by adding additional vector sets for journaling or
cache coherency operations.  Remove the vv_jops and vv_cops fields
from the vnode operations vector in favor of placing those vop_ops directly
in the mount structure.  Reorganize the VOP calls as a double-indirect
and add a field to the mount structure which represents the current
vnode operations set (which will change when e.g. journaling is turned on
or off).  This creates the infrastructure necessary to allow us to stack
a generic journaling implementation on top of a filesystem.

Introduce a hard range-locking API for vnodes.   This API will be used by
high level system/vfs calls in order to handle atomicity guarantees.  It is
a prerequisite for: (1) being able to break I/O's up into smaller pieces
for the vm_page list/direct-to-DMA-without-mapping goal, (2) to support
the parallel write operations on a vnode goal, (3) to support the clustered
(remote) cache coherency goal, and (4) to support massive parallelism in
dispatching operations for the upcoming threaded VFS work.
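
To make the idea of a hard range lock concrete, here is a minimal non-blocking sketch: a vnode keeps a list of held byte ranges, and a request succeeds only if it overlaps none of them. The structures and function names are hypothetical, only exclusive ranges are modeled, and a real implementation would sleep on conflict rather than fail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-range lock record; not DragonFly's actual API. */
struct range_lock {
    int64_t start;              /* inclusive */
    int64_t end;                /* exclusive */
    struct range_lock *next;
};

struct vnode_ranges {
    struct range_lock *held;
};

static bool ranges_overlap(int64_t s1, int64_t e1, int64_t s2, int64_t e2)
{
    return s1 < e2 && s2 < e1;
}

/* Try to take [start, end) exclusively; fail instead of sleeping on conflict. */
static bool range_trylock(struct vnode_ranges *vr, struct range_lock *rl,
                          int64_t start, int64_t end)
{
    for (struct range_lock *r = vr->held; r != NULL; r = r->next)
        if (ranges_overlap(start, end, r->start, r->end))
            return false;
    rl->start = start;
    rl->end = end;
    rl->next = vr->held;
    vr->held = rl;
    return true;
}

static void range_unlock(struct vnode_ranges *vr, struct range_lock *rl)
{
    for (struct range_lock **pp = &vr->held; *pp != NULL; pp = &(*pp)->next)
        if (*pp == rl) {
            *pp = rl->next;
            return;
        }
}
```

Adjacent ranges such as [0, 4096) and [4096, 8192) do not conflict, so two writers to neighbouring regions of one file could proceed in parallel, which is the kind of concurrency goal (2) above refers to.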

This commit represents only infrastructure and skeleton/API work.

Revision 1.49: download - view: text, markup, annotated - select for diffs
Tue Dec 14 18:46:08 2004 UTC (9 years, 4 months ago) by hsu
Branches: MAIN
Diff to: previous 1.48: preferred, unified
Changes since revision 1.48: +2 -2 lines
Clean up routing code before I parallelize it.

Revision 1.48: download - view: text, markup, annotated - select for diffs
Thu Nov 18 20:04:24 2004 UTC (9 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.47: preferred, unified
Changes since revision 1.47: +1 -1 lines
Clean up some dangling issues with cache_inval().  A lot of hard work went
into guaranteeing that the namecache topology would remain connected, but
there were two cases (basically rmdir and rename-over-empty-target-dir)
which disconnected a portion of the hierarchy.

This fixes the remaining cases by having cache_inval() simply mark the
namecache entry as destroyed without actually disconnecting it from the
topology.  The flag tells cache_nlookup() and ".." handlers that a node has
been destroyed and is no longer connected to any parent directory.

The new cache_inval() also now has the ability to mark an entire subhierarchy
as being unresolved, which can be a useful feature to have.

In-discussion-with: Richard Nyberg <rnyberg@it.su.se>

Revision 1.47: download - view: text, markup, annotated - select for diffs
Fri Nov 12 10:58:59 2004 UTC (9 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.46: preferred, unified
Changes since revision 1.46: +1 -1 lines
Default vfs.fastdev to 1 for wider testing, so the vnode bypass for device
read and write is now the default.

This is a precursor to the continued work on a kernel-managed cache layer
on top of the VFS layer.  That is, the intention is to eventually switch the
VM page cache to be ABOVE the VFS layer rather than BELOW the VFS layer for
standard read() and write() calls, with potentially major performance
benefits.

Revision 1.46: download - view: text, markup, annotated - select for diffs
Fri Nov 12 00:09:24 2004 UTC (9 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.45: preferred, unified
Changes since revision 1.45: +2 -35 lines
VFS messaging/interfacing work stage 9/99: VFS 'NEW' API WORK.

NOTE: unionfs and nullfs are temporarily broken by this commit.

* Remove the old namecache API.  vfs_cache_lookup(), cache_lookup(),
  cache_enter(), namei() and lookup() are all gone.  VOP_LOOKUP() and
  VOP_CACHEDLOOKUP() have been collapsed into a single non-caching
  VOP_LOOKUP().

* Complete the new VFS CACHE (namecache) API.  The new API is able to
  supply topological guarantees and is able to reserve namespaces,
  including negative cache spaces (whether the target name exists or not),
  which the new API uses to reserve namespace for things like NRENAME
  and NCREATE (and others).

* Complete the new namecache API.  VOP_NRESOLVE, NLOOKUPDOTDOT, NCREATE,
  NMKDIR, NMKNOD, NLINK, NSYMLINK, NWHITEOUT, NRENAME, NRMDIR, NREMOVE.
  These new calls take (typically locked) namecache pointers rather than
  combinations of directory vnodes, file vnodes, and name components.  The
  new calls are *MUCH* simpler in concept and implementation.  For example,
  VOP_RENAME() has 8 arguments while VOP_NRENAME() has only 3 arguments.

  The new namecache API uses the namecache to lock namespaces without having
  to lock the underlying vnodes.  For example, this allows the kernel
  to reserve the target name of a create function trivially.  Namecache
  records are maintained BY THE KERNEL for both positive and negative hits.

  Generally speaking, the kernel layer is now responsible for resolving
  path elements.  NRESOLVE is called when an unresolved namecache record
  needs to be resolved.  Unlike the old VOP_LOOKUP, NRESOLVE is simply
  responsible for associating a vnode to a namecache record (positive hit)
  or telling the system that it's a negative hit, and not responsible for
  handling symlinks or other special cases or doing any of the other
  path lookup work, much unlike the old VOP_LOOKUP.

  It should be particularly noted that the new namecache topology does not
  allow disconnected namecache records.  In rare cases where a vnode must
  be converted to a namecache pointer for new API operation via a file handle
  (i.e. NFS), the cache_fromdvp() function is provided and a new API VOP,
  VOP_NLOOKUPDOTDOT() is provided to allow the namecache to resolve the
  topology leading up to the requested vnode.  These and other topological
  guarantees greatly reduce the complexity of the new namecache API.

  The new namei() is called nlookup().  This function uses a combination
  of cache_n*() calls, VOP_NRESOLVE(), and standard VOP calls to resolve the
  supplied path, deal with symlinks, and so forth, in a nice small compact
  compartmentalized procedure.

* The old VFS code is no longer responsible for maintaining namecache records,
  a function which was mostly ad hoc cache_purge()s occurring before the VFS
  actually knows whether an operation will succeed or not.

  The new VFS code is typically responsible for adjusting the state of
  locked namecache records passed into it.  For example, if NCREATE succeeds
  it must call cache_setvp() to associate the passed namecache record with
  the vnode representing the successfully created file.  The new requirements
  are much less complex than the old requirements.

* Most VFSs still implement the old API calls, albeit somewhat modified
  and in particular the VOP_LOOKUP function is now *MUCH* simpler.  However,
  the kernel now uses the new API calls almost exclusively and relies on
  compatibility code installed in the default ops (vop_compat_*()) to
  convert the new calls to the old calls.

* All kernel system calls and related support functions which used to do
  complex and confusing namei() operations now do far less complex and
  far less confusing nlookup() operations.

* SPECOPS shortcutting has been implemented.  User reads and writes now go
  directly to supporting functions which talk to the device via fileops
  rather than having to be routed through VOP_READ or VOP_WRITE, saving
  significant overhead.  Note, however, that these only really affect
  /dev/null and /dev/zero.

  Implementing this was fairly easy, we now simply pass an optional
  struct file pointer to VOP_OPEN() and let spec_open() handle the
  override.
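
The positive/negative caching described in the namecache items above can be sketched in a few lines: a lookup first finds or creates a record, asks the filesystem to resolve it only while it is still unresolved, and remembers the answer even when the name does not exist. Every name here is hypothetical; fs_resolve() merely stands in for the role VOP_NRESOLVE plays, and the real namecache also handles locking, hashing, and topology.

```c
#include <stddef.h>
#include <string.h>

#define NC_MAX 8

/* Hypothetical record: resolved with vp == NULL means a negative hit. */
struct nc_sketch {
    const char *name;   /* caller's string must outlive the cache */
    void *vp;
    int resolved;
};

static struct nc_sketch ncache[NC_MAX];
static int nc_used;
static int resolve_calls;   /* counts trips to the "filesystem" */

/* Stand-in for VOP_NRESOLVE: this "filesystem" only contains "a". */
static void *fs_resolve(const char *name)
{
    static int file_a;
    resolve_calls++;
    return strcmp(name, "a") == 0 ? (void *)&file_a : NULL;
}

static struct nc_sketch *nc_find_or_create(const char *name)
{
    for (int i = 0; i < nc_used; i++)
        if (strcmp(ncache[i].name, name) == 0)
            return &ncache[i];
    if (nc_used == NC_MAX)
        return NULL;
    ncache[nc_used].name = name;
    ncache[nc_used].vp = NULL;
    ncache[nc_used].resolved = 0;   /* new records start unresolved */
    return &ncache[nc_used++];
}

/* Resolve only unresolved records; cache positive and negative results. */
static void *nc_lookup(const char *name)
{
    struct nc_sketch *ncp = nc_find_or_create(name);

    if (ncp == NULL)
        return fs_resolve(name);    /* cache full: resolve without caching */
    if (!ncp->resolved) {
        ncp->vp = fs_resolve(name); /* may be NULL: a negative hit */
        ncp->resolved = 1;
    }
    return ncp->vp;
}
```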

SPECIAL NOTES: It should be noted that we must still lock a directory vnode
LK_EXCLUSIVE before issuing a VOP_LOOKUP(), even for simple lookups, because
a number of VFS's (including UFS) store active directory scanning information
in the directory vnode.  The legacy NAMEI_LOOKUP cases can be changed to
use LK_SHARED once these VFS cases are fixed.  In particular, we are now
organized well enough to actually be able to do record locking within a
directory for handling NCREATE, NDELETE, and NRENAME situations, but it hasn't
been done yet.

Many thanks to all of the testers and in particular David Rhodus for
finding a large number of panics and other issues.

Revision 1.45: download - view: text, markup, annotated - select for diffs
Mon Oct 25 19:14:32 2004 UTC (9 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.44: preferred, unified
Changes since revision 1.44: +0 -5 lines
Remove the vfs page replacement optimization and its ENABLE_VFS_IOOPT option.
This never worked properly... that is, the semantics are broken compared to
a normal read or write in that the read 'buffer' will be modified out from
under the caller if the underlying file is.

What is really needed here is a copy-on-write feature that works in both
directions, similar to how a shared buffer is copied after a fork() if either
the parent or child modify it.  The optimization will eventually be rewritten
with that in mind but not right now.

Revision 1.44: download - view: text, markup, annotated - select for diffs
Fri Oct 22 18:03:50 2004 UTC (9 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.43: preferred, unified
Changes since revision 1.43: +5 -4 lines
The old lookup() API is extremely complex.  Even though it will be ripped out
soon, I'm documenting the procedure so I don't have to keep running through
it to figure out what is going on.  Do a better job describing the new
vgone() API (the old API required the vnode to be in a very weird state.
The new API requires the vnode to be VX locked and refd and returns with the
vnode in the same state).

Revision 1.43: download - view: text, markup, annotated - select for diffs
Tue Oct 12 19:20:46 2004 UTC (9 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.42: preferred, unified
Changes since revision 1.42: +165 -1692 lines
VFS messaging/interfacing work stage 8/99: Major reworking of the vnode
interlock and other miscellaneous things.  This patch also fixes FS
corruption due to prior vfs work in head.  In particular, prior to this
patch the namecache locking could introduce blocking conditions that
confuse the old vnode deactivation and reclamation code paths.  With
this patch there appear to be no serious problems even after two days
of continuous testing.

* VX lock all VOP_CLOSE operations.
* Fix two NFS issues.  There was an incorrect assertion (found by
  David Rhodus), and the nfs_rename() code was not properly
  purging the target file from the cache, resulting in Stale file
  handle errors during, e.g. a buildworld with an NFS-mounted /usr/obj.
* Fix a TTY session issue.  Programs which open("/dev/tty", ...) and
  then run the TIOCNOTTY ioctl were causing the system to lose track
  of the open count, preventing the tty from properly detaching.
  This is actually a very old BSD bug, but it came out of the woodwork
  in DragonFly because I am now attempting to track device opens
  explicitly.
* Gets rid of the vnode interlock.  The lockmgr interlock remains.
* Introduced VX locks, which are mandatory vp->v_lock based locks.
* Rewrites the locking semantics for deactivation and reclamation.
  (A ref'd VX lock'd vnode is now required for vgone(), VOP_INACTIVE,
  and VOP_RECLAIM).  New guarantees are in place with regard to vnode
  ripouts.
* Recodes the mountlist scanning routines to close timing races.
* Recodes getnewvnode to close timing races (it now returns a
  VX locked and refd vnode rather than a refd but unlocked vnode).
* Recodes VOP_REVOKE- a locked vnode is now mandatory.
* Recodes all VFS inode hash routines to close timing holes.
* Removes cache_leaf_test() - vnodes representing intermediate
  directories are now held so the leaf test should no longer be
  necessary.
* Splits the over-large vfs_subr.c into three additional source
  files, broken down by major function (locking, mount related,
  filesystem syncer).

* Changes splvm() protection to a critical-section in a number of
  places (bleedover from another patch set which is also about to be
  committed).

Known issues not yet resolved:

* Possible vnode/namecache deadlocks.
* While most filesystems now use vp->v_lock, I haven't done a final
  pass to make vp->v_lock mandatory and to clean up the few remaining
  inode based locks (nwfs I think and other obscure filesystems).
* NullFS gets confused when you hit a mount point in the underlying
  filesystem.
* Only UFS and NFS have been well tested.
* NFS is not properly timing out namecache entries, causing changes made
  on the server to not be properly detected on the client if the client
  already has a negative-cache hit for the filename in question.

Testing-by: David Rhodus <sdrhodus@gmail.com>,
	    Peter Kadau <peter.kadau@tuebingen.mpg.de>,
	    walt <wa1ter@myrealbox.com>,
	    others

Revision 1.42: download - view: text, markup, annotated - select for diffs
Tue Oct 5 03:24:09 2004 UTC (9 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.41: preferred, unified
Changes since revision 1.41: +3 -3 lines
VFS messaging/interfacing work stage 7d/99: More firming up of stage 7.

Additional work to deal with old-api/new-api issues.  Cut more stuff
out of the old-api's cache_enter() routine to deal with deadlocks, at
the cost of some performance loss (temporary until the VFS's start using
the new APIs).  Change UFS and NFS to not purge whole directories in
*_rename() and *_rmdir().

Add some minor breakage to the API which will not be fixed until the VFS's
get new rename implementations - renaming a directory in which a process
has chdir'd will create problems for that process.  This doesn't happen
normally anyway so this temporary breakage should not cause any significant
problems.

Bug-reports-by: walt, Sascha Wildner, others

Revision 1.41: download - view: text, markup, annotated - select for diffs
Sun Sep 26 20:14:20 2004 UTC (9 years, 6 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Snap29Sep2004
Diff to: previous 1.40: preferred, unified
Changes since revision 1.40: +0 -12 lines
VFS messaging/interfacing work stage 5b/99.  More cleanups, remove the
(unused) ni_ncp and ni_dncp from struct nameidata.  A new structure will
be used for the new API.

Revision 1.40: download - view: text, markup, annotated - select for diffs
Thu Sep 23 01:55:15 2004 UTC (9 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.39: preferred, unified
Changes since revision 1.39: +0 -1 lines
Remove unused variable.

Revision 1.39: download - view: text, markup, annotated - select for diffs
Sat Sep 4 23:12:54 2004 UTC (9 years, 7 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Snap13Sep2004
Diff to: previous 1.38: preferred, unified
Changes since revision 1.38: +29 -2 lines
Fix a bug in sillyrename handling in nfs_inactive().  The code was improperly
ignoring the lock state of the passed vp and recursing nfs_inactive() by
calling vrele() from within nfs_inactive().  Since NFS uses real vnode
locking now, this resulted in a panic.

KDE startup problems reported by:  Emiel Kollof <coolvibe@hackerheaven.org>

Revision 1.38: download - view: text, markup, annotated - select for diffs
Sat Aug 28 19:02:05 2004 UTC (9 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.37: preferred, unified
Changes since revision 1.37: +15 -10 lines
VFS messaging/interfacing work stage 4/99.  This stage goes a long ways
towards allowing us to move the vnode locking into a kernel layer.  It
gets rid of a lot of cruft from FreeBSD-4.  FreeBSD-5 has done some of this
stuff too (such as changing the default locking to stdlock from nolock),
but DragonFly is going further.

* Consolidate vnode locks into the vnode structure, add an embedded v_lock,
  and get rid of both v_vnlock and v_data based head-of-structure locks.

* Change the default vops to use a standard vnode lock rather than a fake
  non-lock.

* Get rid of vop_nolock() and friends, we no longer support non-locking
  vnodes.

* Get rid of vop_sharedlock(), we no longer support non standard shared-only
  locks (only NFS was using it and the mount-crossing lookup code should
  now prevent races to root from dead NFS volumes).

* Integrate lock initialization into getnewvnode().  We do not yet
  incorporate automatic locking into getnewvnode().  getnewvnode()
  now has two additional arguments, lktimeout and lkflags, for lock
  structure initialization.

* Change the sync vnode lock from nolock to stdlock.  This may require more
  tuning down the line.  Fix various sync_inactive() to properly unlock
  the lock as per the VOP API.

* Properly flag the 'rename' vop operation regarding required tdvp and tvp
  unlocks (the flags are only used by nullfs).

* Get rid of all inode-embedded vnode locks

* Remove manual lockinit and use new getnewvnode() args instead.
  Lock the vnode prior to doing anything that might block in
  order to avoid synclist access before the vnode has been properly
  initialized.

* Generally change inode hash insertion to also check
  for a hash collision and return failure if it occurs,
  rather than doing (often non-atomic) relookups and
  other checks.  These sorts of collisions can occur
  if a vnode is being destroyed at the same time a new
  vnode is being created from an inode.  A new vnode is
  not generally accessible, except by the sync code (from
  the mountlist) until its underlying inode has been hashed
  so dealing with a hash collision should be as simple as
  throwing away the vnode with a vput().

* Do not initialize a new vnode's v_data until after
  the associated inode has been successfully added to
  the hash, and make the xxx_inactive() and xxx_reclaim()
  code friendly towards vnodes with a NULL v_data.

* NFS now uses standard locks rather than shared-only locks.

* PROCFS now uses standard locks rather than non-locks, and PROCFS's
  lookup code now understands VOP lookup semantics.  PROCFS now uses
  a real hash table for its node search rather than a single singly-linked
  list (which should better scale to systems with thousands of processes).

* NULLFS should now properly handle lookup() and rename() locks.  NULLFS's
  node handling code has been rewritten.  NULLFS's bypass code now understands
  vnode unlocks (rename case).

* UFS no longer needs the ffs_inode_hash_lock hacks.  It now uses the new
  collision-on-hash-add methodology.   This will speed up UFS when operating
  on lots of small files (reported by David Rhodus).
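The collision-on-hash-add methodology described above can be sketched in userspace C. The `struct inode` layout, `IHASH_SIZE`, and `ihash_insert()` are hypothetical stand-ins, not the UFS code. In the kernel the check and the insert would run under the hash lock/token so they are atomic with respect to other threads. This single-threaded sketch omits that locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define IHASH_SIZE 64   /* hypothetical table size, must be a power of two */

/* Hypothetical stand-in for an in-core inode. */
struct inode {
    unsigned long  i_number;   /* inode number: the hash key */
    struct inode  *i_next;     /* hash chain link */
};

static struct inode *ihashtbl[IHASH_SIZE];

static struct inode **ihash_bucket(unsigned long ino)
{
    return &ihashtbl[ino & (IHASH_SIZE - 1)];
}

/*
 * Insert ip unless an inode with the same number is already hashed.
 * Checking and inserting in one step means the caller never has to
 * re-look-up after blocking; on EEXIST it simply throws its new vnode
 * away (vput() in the kernel) and retries the normal lookup.
 */
static int ihash_insert(struct inode *ip)
{
    struct inode **bp = ihash_bucket(ip->i_number);

    for (struct inode *cur = *bp; cur != NULL; cur = cur->i_next) {
        if (cur->i_number == ip->i_number)
            return EEXIST;
    }
    ip->i_next = *bp;
    *bp = ip;
    return 0;
}
```

Only after `ihash_insert()` returns 0 would the caller attach the inode to the vnode's v_data, matching the ordering described in the v_data item.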

Revision 1.37: download - view: text, markup, annotated - select for diffs
Tue Aug 17 18:57:32 2004 UTC (9 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.36: preferred, unified
Changes since revision 1.36: +6 -6 lines
VFS messaging/interfacing work stage 2/99.  This stage retools the vnode ops
vector dispatch, making the vop_ops a per-mount structure rather than a
per-filesystem structure.  Filesystem mount code, typically in blah_vfsops.c,
must now register various vop_ops pointers in the struct mount to compile
its VOP operations set.

This change will allow us to begin adding per-mount hooks to VFSes to support
things like kernel-level journaling, various forms of cache coherency
management, and so forth.

In addition, the vop_*() calls now require a struct vop_ops pointer as the
first argument instead of a vnode pointer (note: in this commit the VOP_*()
macros currently just pull the vop_ops pointer from the vnode in order to
call the vop_*() procedures).  This change is intended to allow us to divorce
ourselves from the requirement that a vnode pointer always be part of a VOP
call.  In particular, this will allow namespace based routines such as
remove(), mkdir(), stat(), and so forth to pass namecache pointers rather than
locked vnodes and is a very important precursor to the goal of using the
namecache for namespace locking.

Revision 1.36: download - view: text, markup, annotated - select for diffs
Fri Aug 13 17:51:09 2004 UTC (9 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.35: preferred, unified
Changes since revision 1.35: +19 -19 lines
VFS messaging/interfacing work stage 1/99.  This stage replaces the old
dynamic VFS descriptor and inlined wrapper mess with a fixed structure
and fixed procedural wrappers.  Most of the work is straightforward except
for vfs_init, which was basically rewritten (and greatly simplified).

It is my intention to make the vop_*() call wrappers eventually handle
range locking and cache coherency issues as well as implementing the
direct call -> messaging interface layer.  The call wrappers will also
handle API translation as we shift the APIs over to new, more powerful mechanisms
in order to allow the work to be incrementally committed.

This is the first stage of what is likely to be a huge number of stages
to modernize the VFS subsystem.

Revision 1.35: download - view: text, markup, annotated - select for diffs
Sat Jul 10 16:29:45 2004 UTC (9 years, 9 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_1_0_REL, DragonFly_1_0A_REL
Diff to: previous 1.34: preferred, unified
Changes since revision 1.34: +8 -3 lines
There was a mountlist race in getnewvnode() whereby the system could block
obtaining the mountlist token while adding a vnode to the mountlist prior
to initializing the vnode's v_usecount and v_data fields.  This bug is
possibly responsible for or related to occasional reports of duplicate
inodes in the system.

Fix the potential problem by more completely initializing the vnode prior
to adding it to the mountlist.  Note that FreeBSD-5 also rearranged their
code along the same lines (though this change is independent of their
work).
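The ordering fix can be sketched as follows. This assumes a hypothetical, heavily simplified `struct vnode`/`struct mount` pair and an invented `vnode_publish()` helper; it is not the real getnewvnode(). Every field a mountlist scanner might read is initialized before the vnode is linked onto the mount's list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified vnode and per-mount vnode list. */
struct vnode {
    int            v_usecount;
    void          *v_data;
    struct vnode  *v_nmntvnodes;   /* next vnode on the mount's list */
};

struct mount {
    struct vnode *mnt_vnodelist;
};

/*
 * Initialize the fields first, then make the vnode reachable.  In the
 * kernel, obtaining the mountlist token may block, and once the vnode is
 * on the list other threads scanning it can see the vnode.
 */
static void vnode_publish(struct mount *mp, struct vnode *vp, void *data)
{
    vp->v_usecount = 1;
    vp->v_data = data;
    /* kernel: acquire the mountlist token here (may block) */
    vp->v_nmntvnodes = mp->mnt_vnodelist;
    mp->mnt_vnodelist = vp;
    /* kernel: release the mountlist token */
}
```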

Revision 1.34: download - view: text, markup, annotated - select for diffs
Sun Jul 4 05:16:30 2004 UTC (9 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.33: preferred, unified
Changes since revision 1.33: +2 -1 lines
When booting from CD, check cd1a and acd1a after cd0a and acd0a, allowing
a CD to be booted off of a second CDRom drive.

When checking for the same rootdev selection as already has been set,
only the major number of the existing rootdev was being checked.  This
prevented other unit numbers from being tried properly (e.g. trying cd1 after
having already tried cd0).

Completely parse the rootdev's (unit, slice, partition) tuple before
trying to look up the device, rather than assuming that devname(0,0,0)
will exist (this is no longer necessarily true).
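Here is a hypothetical parser for boot device names such as `acd1a` or `ad0s1a` that illustrates parsing the whole tuple before any lookup. The `rootdev_spec` structure, the `parse_rootdev()` name, the "slice 0 when absent" convention, and the accepted partition range (`a` through `p`) are all assumptions for this sketch. None of them comes from the kernel's rootdev code.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical result of parsing a boot device name such as "ad0s1a". */
struct rootdev_spec {
    char name[16];   /* driver name, e.g. "acd" */
    int  unit;       /* e.g. 1 for "acd1a" */
    int  slice;      /* 0 when no "sN" component is present (sketch convention) */
    int  part;       /* partition index: 'a' -> 0 */
};

/* Returns 0 on success, -1 unless s looks like name<unit>[s<slice>]<part>. */
static int parse_rootdev(const char *s, struct rootdev_spec *out)
{
    size_t n = 0;

    while (s[n] != '\0' && isalpha((unsigned char)s[n]))
        n++;
    if (n == 0 || n >= sizeof(out->name) || !isdigit((unsigned char)s[n]))
        return -1;
    memcpy(out->name, s, n);
    out->name[n] = '\0';
    s += n;

    out->unit = 0;
    while (isdigit((unsigned char)*s))
        out->unit = out->unit * 10 + (*s++ - '0');

    out->slice = 0;
    if (*s == 's' && isdigit((unsigned char)s[1])) {
        s++;
        while (isdigit((unsigned char)*s))
            out->slice = out->slice * 10 + (*s++ - '0');
    }

    /* Assumed partition range a..p; exactly one letter must remain. */
    if (*s < 'a' || *s > 'p' || s[1] != '\0')
        return -1;
    out->part = *s - 'a';
    return 0;
}
```

Once the full (unit, slice, partition) tuple is known, the caller can look up exactly that device instead of assuming unit 0 exists.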

Revision 1.33: download - view: text, markup, annotated - select for diffs
Tue Jun 15 00:30:53 2004 UTC (9 years, 10 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_1_0_RC1
Diff to: previous 1.32: preferred, unified
Changes since revision 1.32: +9 -3 lines
Fix a race with the clearing of p->p_session->s_ttyvp.  NULL the pointer
out before calling vrele() rather than after.

Fix a bug with v_opencount accounting on revoke().  The underlying device
was being closed properly but v_opencount was being decremented, which caused
it to go negative when close() was called on the descriptor later on.  To
fix the bug we zero out v_opencount when the underlying vnode's device
is disassociated and spec_close() now only decrements it when the device is
associated.

Reported-by: GeekGod
Testing-by: GeekGod, Hiten, David Rhodus.
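The s_ttyvp fix follows a common pattern: detach the shared pointer first, then drop the reference. The sketch below is minimal and uses hypothetical structures. `vrele_sketch()` is an invented stand-in that only decrements a counter, whereas the real vrele() can block. As this sketch reads the race, clearing the pointer first means nothing can find s_ttyvp and release the same reference again while the release is in progress.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down structures. */
struct vnode   { int v_usecount; };
struct session { struct vnode *s_ttyvp; };

/* Stand-in for vrele(); the real one may block (e.g. in VOP_INACTIVE). */
static void vrele_sketch(struct vnode *vp)
{
    assert(vp->v_usecount > 0);
    vp->v_usecount--;
}

/* Clear the shared pointer before dropping the reference it represents. */
static void session_clear_ttyvp(struct session *sp)
{
    struct vnode *vp = sp->s_ttyvp;

    if (vp != NULL) {
        sp->s_ttyvp = NULL;
        vrele_sketch(vp);
    }
}
```

Calling `session_clear_ttyvp()` twice is harmless here: the second call sees NULL and does nothing.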

Revision 1.32: download - view: text, markup, annotated - select for diffs
Wed May 26 01:29:58 2004 UTC (9 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.31: preferred, unified
Changes since revision 1.31: +98 -143 lines
ANSIfication and general cleanup.  No operational changes.

Revision 1.31: download - view: text, markup, annotated - select for diffs
Fri May 21 15:41:23 2004 UTC (9 years, 10 months ago) by drhodus
Branches: MAIN
Diff to: previous 1.30: preferred, unified
Changes since revision 1.30: +0 -27 lines
Cleanup pass. Removed code that is not needed anymore.
Cleanup VOP_LEASE() uses and document.

Add in a debug function for buffer pool statistical information which can
be toggled via debug.syncprt.

Revision 1.30: download - view: text, markup, annotated - select for diffs
Wed May 19 22:52:58 2004 UTC (9 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.29: preferred, unified
Changes since revision 1.29: +110 -74 lines
Device layer rollup commit.

* cdevsw_add() is now required.  cdevsw_add() and cdevsw_remove() may specify
  a mask/match indicating the range of supported minor numbers.  Multiple
  cdevsw_add()'s using the same major number, but distinctly different
  ranges, may be issued.  All devices that failed to call cdevsw_add() before
  now do.

* cdevsw_remove() now automatically marks all devices within its supported
  range as being destroyed.

* vnode->v_rdev is no longer resolved when the vnode is created.  Instead,
  only v_udev (a newly added field) is resolved.  v_rdev is resolved when
  the vnode is opened and cleared on the last close.

* A great deal of code was making rather dubious assumptions with regards
  to the validity of devices associated with vnodes, primarily due to
  the persistence of a device structure due to being indexed by (major, minor)
  instead of by (cdevsw, major, minor).  In particular, if you run a program
  which connects to a USB device and then you pull the USB device and plug
  it back in, the vnode subsystem will continue to believe that the device
  is open when, in fact, it isn't (because it was destroyed and recreated).

  In particular, note that all the VFS mount procedures now check devices
  via v_udev instead of v_rdev prior to calling VOP_OPEN(), since v_rdev
  is NULL prior to the first open.

* The disk layer's device interaction has been rewritten.  The disk layer
  (i.e. the slice and disklabel management layer) no longer overloads
  its data onto the device structure representing the underlying physical
  disk.  Instead, the disk layer uses the new cdevsw_add() functionality
  to register its own cdevsw using the underlying device's major number,
  and simply does NOT register the underlying device's cdevsw.  No
  confusion is created because the device hash is now based on
  (cdevsw,major,minor) rather than (major,minor).

  NOTE: This also means that underlying raw disk devices may use the entire
  device minor number instead of having to reserve the bits used by the disk
  layer, and also means that we can (theoretically) stack a fully
  disklabel-supported 'disk' on top of any block device.

* The new reference counting scheme prevents this by associating a device
  with a cdevsw and disconnecting the device from its cdevsw when the cdevsw
  is removed.  Additionally, all udev2dev() lookups run through the cdevsw
  mask/match and only successfully find devices still associated with an
  active cdevsw.

* Major work on MFS:  MFS no longer shortcuts vnode and device creation.  It
  now creates a real vnode and a real device and implements real open and
  close VOPs.  Additionally, due to the disk layer changes, MFS is no longer
  limited to 255 mounts.  The new limit is 16 million.  Since MFS creates a
  real device node, mount_mfs will now create a real /dev/mfs<PID> device
  that can be read from userland (e.g. so you can dump an MFS filesystem).

* BUF AND DEVICE STRATEGY changes.  The struct buf contains a b_dev field.
  In order to properly handle stacked devices we now require that the b_dev
  field be initialized before the device strategy routine is called.  This
  required some additional work in various VFS implementations.  To enforce
  this requirement, biodone() now sets b_dev to NODEV.  The new disk layer
  will adjust b_dev before forwarding a request to the actual physical
  device.

* A bug in the ISO CD boot sequence which resulted in a panic has been fixed.

Testing by: lots of people, but David Rhodus found the most egregious bugs.
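The mask/match registration in the first item can be modelled in a few lines. This is a hypothetical sketch, not the kernel's cdevsw_add(). `cdev_add()` and `cdev_lookup()` are invented names, and an integer id stands in for the cdevsw pointer. Rejecting overlapping ranges for the same major is an assumption drawn from the "distinctly different ranges" wording. A minor number m belongs to a registration when `(m & mask) == match`.

```c
#include <assert.h>

/* Hypothetical registration record: one cdevsw for a subset of minors. */
struct cdev_reg {
    int      major;
    unsigned mask;    /* bits of the minor number that are compared */
    unsigned match;   /* required value of those bits */
    int      id;      /* stands in for the registered cdevsw pointer */
};

#define MAXREGS 8
static struct cdev_reg regs[MAXREGS];
static int nregs;

/* Two mask/match sets intersect unless they disagree on a bit both compare. */
static int ranges_overlap(const struct cdev_reg *a, const struct cdev_reg *b)
{
    return ((a->match ^ b->match) & a->mask & b->mask) == 0;
}

static int cdev_add(int major, unsigned mask, unsigned match, int id)
{
    struct cdev_reg r = { major, mask, match, id };

    if ((match & ~mask) != 0 || nregs == MAXREGS)
        return -1;
    for (int i = 0; i < nregs; i++) {
        if (regs[i].major == major && ranges_overlap(&regs[i], &r))
            return -1;   /* same major, overlapping minor range */
    }
    regs[nregs++] = r;
    return 0;
}

/* Returns the id registered for (major, minor), or -1 if none matches. */
static int cdev_lookup(int major, unsigned minor)
{
    for (int i = 0; i < nregs; i++) {
        if (regs[i].major == major && (minor & regs[i].mask) == regs[i].match)
            return regs[i].id;
    }
    return -1;
}
```

For example, a layer could claim the even minors of major 4 (mask 0x1, match 0x0) while a different cdevsw claims the odd ones (mask 0x1, match 0x1).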

Revision 1.29: download - view: text, markup, annotated - select for diffs
Thu Apr 8 17:56:48 2004 UTC (10 years ago) by dillon
Branches: MAIN
Diff to: previous 1.28: preferred, unified
Changes since revision 1.28: +0 -1 lines
namecache work stage 4:

(1) Remove vnode->v_dd, vnode->v_ddid, namecache->nc_dvp_data, and
namecache->nc_dvp_id.  These identifiers were being used to detect stale
parent directory linkages in the namecache and were leftovers from the
original FreeBSD-4.x namecache topology.  The new namecache topology
actively discards such linkages and does not require them.

(2) Cleanup kern/vfs_cache.c, abstracting out allocation and parent
link/unlink operations into their own procedures.

(3) Formally allow a disjoint topology.  That is, allow the case where
nc_parent is NULL.  When constructing namecache entries (dvp,vp), require
that that dvp be associated with a namecache record so we can create the
proper parent->child linkage.  Since no naming information is known for
dbp, formally allow unnamed namecache records to be created in order to
create the association.

(4) Properly relink parent namecache entries when ".." is entered into
the cache.  This is what relinks a disjoint namecache topology after it
has been partially purged or when the namecache is instantiated in the
middle of the logical topology (and thus disjoint).

Note that the original plan was to not allow a disjoint topology, but after
much hair pulling I've come to the conclusion that it is impossible to do
this.  So the work now formally allows a disjoint topology but also, unlike
the original FreeBSD code, takes pains to try to keep the topology intact
by only recycling 'leaf' vnodes.  This is accomplished by vref()ing a vnode
when its namecache records have children.
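The "only recycle leaves" rule can be sketched like this: a directory's vnode gains a reference when its namecache record gets its first child, and loses it when the last child goes away. The structures and the `nc_link()`/`nc_unlink()`/`vnode_recyclable()` helpers below are hypothetical simplifications, not vfs_cache.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down vnode and namecache records. */
struct vnode {
    int v_usecount;
};

struct namecache {
    struct namecache *nc_parent;
    int               nc_nchildren;
    struct vnode     *nc_vp;
};

/* Link child under parent; the first child pins the parent's vnode. */
static void nc_link(struct namecache *parent, struct namecache *child)
{
    child->nc_parent = parent;
    if (parent->nc_nchildren++ == 0 && parent->nc_vp != NULL)
        parent->nc_vp->v_usecount++;     /* vref() in the kernel */
}

/* Unlink a linked child; the last child leaving unpins the parent's vnode. */
static void nc_unlink(struct namecache *child)
{
    struct namecache *parent = child->nc_parent;

    child->nc_parent = NULL;
    if (--parent->nc_nchildren == 0 && parent->nc_vp != NULL)
        parent->nc_vp->v_usecount--;     /* vrele() in the kernel */
}

/* Only unreferenced vnodes, which in practice means leaves, may be recycled. */
static int vnode_recyclable(const struct vnode *vp)
{
    return vp->v_usecount == 0;
}
```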

Revision 1.28: download - view: text, markup, annotated - select for diffs
Sun Mar 28 07:54:00 2004 UTC (10 years ago) by dillon
Branches: MAIN
Diff to: previous 1.27: preferred, unified
Changes since revision 1.27: +5 -2 lines
Protect v_usecount with a critical section for now (we depend on the BGL),
and assert that it does not drop below 0.

Suggested-by: David Rhodus <drhodus@machdep.com>

Revision 1.27: download - view: text, markup, annotated - select for diffs
Sun Mar 7 12:09:04 2004 UTC (10 years, 1 month ago) by eirikn
Branches: MAIN
Diff to: previous 1.26: preferred, unified
Changes since revision 1.26: +23 -0 lines
Move the ASSERT_VOP_LOCKED and ASSERT_VOP_UNLOCKED macros into their own
functions.

Idea taken from: FreeBSD

Revision 1.26: download - view: text, markup, annotated - select for diffs
Mon Mar 1 06:33:17 2004 UTC (10 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.25: preferred, unified
Changes since revision 1.25: +597 -402 lines
Newtoken commit.  Change the token implementation as follows:  (1) Obtaining
a token no longer enters a critical section.  (2) tokens can be held through
scheduler switches and blocking conditions and are effectively released and
reacquired on resume.  Thus tokens serialize access only while the thread
is actually running.  Serialization is not broken by preemptive interrupts.
That is, interrupt threads which preempt do not release the preempted thread's
tokens.  (3) Unlike spl's, tokens will interlock w/ interrupt threads on
the same or on a different cpu.

The vnode interlock code has been rewritten and the API has changed.  The
mountlist vnode scanning code has been consolidated and all known races have
been fixed.  The vnode interlock is now a pool token.

The code that frees unreferenced vnodes whose last VM page has been freed has
been moved out of the low level vm_page_free() code and moved to the
periodic filesystem syncer code in vfs_msync().

The SMP startup code and the IPI code has been cleaned up considerably.
Certain early token interactions on AP cpus have been moved to the BSP.

The LWKT rwlock API has been cleaned up and turned on.

Major testing by: David Rhodus

Revision 1.25: download - view: text, markup, annotated - select for diffs
Tue Feb 10 07:34:42 2004 UTC (10 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.24: preferred, unified
Changes since revision 1.24: +2 -0 lines
Use a globaldata_t instead of a cpuid in the lwkt_token structure.  The
LWKT subsystem already uses globaldata_t instead of cpuid for its thread
td_gd reference, and the IPI messaging code will soon be converted to take
a globaldata_t instead of a cpuid as well.  This reduces the number of
memory indirections we have to make to access the per-cpu globaldata space
in various procedures.

Revision 1.24: download - view: text, markup, annotated - select for diffs
Tue Jan 27 23:56:48 2004 UTC (10 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.23: preferred, unified
Changes since revision 1.23: +12 -8 lines
Try to work-around a DFly-specific crash that can occur in ufs_ihashget()
if the underlying vnode is being reclaimed at the same time.  Bump the
vnode's ref count to interlock against vget's VXLOCK test.

Revision 1.23: download - view: text, markup, annotated - select for diffs
Mon Nov 3 17:11:21 2003 UTC (10 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.22: preferred, unified
Changes since revision 1.22: +13 -1 lines
64 bit address space cleanups which are a prerequisite for future 64 bit
address space work and PAE.  Note: this is not PAE.  This patch basically
adds vm_paddr_t, which represents a 'physical address'.  Physical addresses
may be larger than virtual addresses and on IA32 we make vm_paddr_t a 64
bit quantity.

Submitted-by: Hiten Pandya <hmp@backplane.com>

Revision 1.22: download - view: text, markup, annotated - select for diffs
Thu Oct 9 22:27:19 2003 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.21: preferred, unified
Changes since revision 1.21: +17 -2 lines
namecache work stage 3a: Adjust the VFS APIs to include a namecache pointer
where necessary.  For the moment we pass NULL for these parameters (the old
'dvp' vnode pointers cannot be ripped out quite yet).

Revision 1.21: download - view: text, markup, annotated - select for diffs
Sun Sep 28 03:44:02 2003 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.20: preferred, unified
Changes since revision 1.20: +8 -31 lines
namecache work stage 2: move struct namecache to its own header file and
have vnode.h include it for now.  Re-engineer the namecache topology to make
it possible to track different parent directories and to make it possible
to namei/lookup paths using the namecache structure as the primary placeholder
rather than a directory vnode.  Add a few minor hacks to stabilize the system
that will be removed (no longer be necessary) in stage 3.  Get rid of the
leafonly sysctl and make its effect the default, but in order to avoid
doing too much in this stage it is still possible to disassociate a vnode
from its namecache entry, which a lot of filesystems (e.g. NFS) depend on
as a poor-man's way of invalidating entries.  The namecache topology itself,
however, will be left intact even if a vnode is disassociated in the middle
of a path.

Revision 1.20: download - view: text, markup, annotated - select for diffs
Tue Sep 23 05:03:51 2003 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.19: preferred, unified
Changes since revision 1.19: +6 -6 lines
namecache work stage 1: namespace cleanups.  Add a NAMEI_ prefix to
CREATE, LOOKUP, DELETE, and RENAME.  Add a CNP_ prefix to all the name
lookup flags (nd_flags) e.g. ISDOTDOT->CNP_ISDOTDOT.

Revision 1.19: download - view: text, markup, annotated - select for diffs
Mon Sep 1 00:35:29 2003 UTC (10 years, 7 months ago) by hmp
Branches: MAIN
Diff to: previous 1.18: preferred, unified
Changes since revision 1.18: +8 -3 lines
1) Add new tunable, kern.syncdelay:

	kern.syncdelay can be used to change the delay time between
	file system data synchronization.  This is useful on
	notebooks.

2) Document the following sysctls:

	kern.dirdelay, kern.metadelay and kern.filedelay

Revision 1.18: download - view: text, markup, annotated - select for diffs
Tue Aug 26 21:09:02 2003 UTC (10 years, 7 months ago) by rob
Branches: MAIN
Diff to: previous 1.17: preferred, unified
Changes since revision 1.17: +17 -17 lines
__P() removal

Revision 1.17: download - view: text, markup, annotated - select for diffs
Tue Aug 19 18:36:38 2003 UTC (10 years, 7 months ago) by hsu
Branches: MAIN
Diff to: previous 1.16: preferred, unified
Changes since revision 1.16: +4 -2 lines
Properly handle an error return from udev2dev().

Reviewed by:	dillon

Revision 1.16: download - view: text, markup, annotated - select for diffs
Mon Aug 18 16:45:30 2003 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.15: preferred, unified
Changes since revision 1.15: +8 -8 lines
The make release process tries to stat/open a non-existent device, which
causes addalias() to be called with a NULL dev.  Add code to addaliasu()
to check that the device actually exists before trying to add the vnode to
its alias list.

Found by: ROBERT GARRETT <rg70@sbcglobal.net>, Jeffrey Hsu <hsu@FreeBSD.org>

Revision 1.15: download - view: text, markup, annotated - select for diffs
Sat Jul 26 19:42:11 2003 UTC (10 years, 8 months ago) by rob
Branches: MAIN
Diff to: previous 1.14: preferred, unified
Changes since revision 1.14: +27 -27 lines
Register keyword removal

Approved by: Matt Dillon

Revision 1.14: download - view: text, markup, annotated - select for diffs
Tue Jul 22 17:03:33 2003 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.13: preferred, unified
Changes since revision 1.13: +2 -2 lines
DEV messaging stage 2/4: In this stage all DEV commands are now being
funneled through the message port for action by the port's beginmsg function.
CONSOLE and DISK device shims replace the port with their own and then
forward to the original.  FB (Frame Buffer) shims supposedly do the same
thing but I haven't been able to test it.  I don't expect instability
in mainline code but there might be some easy-to-fix issues, and some drivers still need
to be converted.  See primarily: kern/kern_device.c (new dev_*() functions and
inherits cdevsw code from kern/kern_conf.c), sys/device.h, and kern/subr_disk.c
for the high points.

In this stage all DEV messages are still acted upon synchronously in the
context of the caller.  We cannot create a separate handler thread until
the copyin's (primarily in ioctl functions) are made thread-aware.

Note that the messaging shims are going to look rather messy in these early
days but as more subsystems are converted over we will begin to use
pre-initialized messages and message forwarding to avoid having to constantly
rebuild messages prior to use.

Note that DEV itself is a mess owing to its 4.x roots and will be cleaned
up in subsequent passes.  e.g. the way sub-devices inherit the main device's
cdevsw was always a bad hack and it still is, and several functions
(mmap, kqfilter, psize, poll) return results rather than error codes, which
will be fixed since now we have a message to store the result in :-)

Revision 1.13: download - view: text, markup, annotated - select for diffs
Tue Jul 22 05:04:41 2003 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.12: preferred, unified
Changes since revision 1.12: +3 -1 lines
Throw better sanity checks into vfs_hang_addrlist() for argp->ex_addrlen
and argp->ex_masklen which are otherwise totally unchecked from userland.
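The following sketches the kind of bounds check meant here. It assumes a hypothetical `export_args_sketch` structure and uses `sizeof(struct sockaddr_storage)` as an assumed upper bound. The kernel's actual limit and error handling may differ. Both lengths are validated before either one is used to size an allocation or a copyin().

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical subset of the userland-supplied export arguments. */
struct export_args_sketch {
    int ex_addrlen;   /* length of the address supplied by userland */
    int ex_masklen;   /* length of the mask supplied by userland */
};

/* Assumed upper bound for this sketch; the kernel's limit may differ. */
#define EX_SOCKADDR_MAX ((int)sizeof(struct sockaddr_storage))

/* Reject negative or oversized lengths before they are trusted. */
static int check_export_lengths(const struct export_args_sketch *ap)
{
    if (ap->ex_addrlen < 0 || ap->ex_addrlen > EX_SOCKADDR_MAX)
        return EINVAL;
    if (ap->ex_masklen < 0 || ap->ex_masklen > EX_SOCKADDR_MAX)
        return EINVAL;
    return 0;
}
```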

Revision 1.12: download - view: text, markup, annotated - select for diffs
Sat Jul 19 21:14:39 2003 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.11: preferred, unified
Changes since revision 1.11: +13 -13 lines
Remove the priority part of the priority|flags argument to tsleep().  Only
flags are passed now.  The priority was a user scheduler thingy that is not
used by the LWKT subsystem.  For process statistics assume sleeps without
P_SINTR set to be disk-waits, and sleeps with it set to be normal sleeps.

This commit should not contain any operational changes.

Revision 1.11: download - view: text, markup, annotated - select for diffs
Tue Jul 8 17:21:53 2003 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.10: preferred, unified
Changes since revision 1.10: +10 -7 lines
The syncer is not a process any more, deal with it as a thread.

Revision 1.10: download - view: text, markup, annotated - select for diffs
Sun Jul 6 21:23:51 2003 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.9: preferred, unified
Changes since revision 1.9: +180 -132 lines
MP Implementation 1/2: Get the APIC code working again, sweetly integrate the
MP lock into the LWKT scheduler, replace the old simplelock code with
tokens or spin locks as appropriate.  In particular, the vnode interlock
(and most other interlocks) are now tokens.  Also clean up a few curproc/cred
sequences that are no longer needed.

The APs are left in degenerate state with non IPI interrupts disabled as
additional LWKT work must be done before we can really make use of them,
and FAST interrupts are not managed by the MP lock yet.  The main thing
for this stage was to get the system working with an APIC again.

buildworld tested on UP and 2xCPU/MP (Dell 2550)

Revision 1.9: download - view: text, markup, annotated - select for diffs
Thu Jul 3 17:24:02 2003 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.8: preferred, unified
Changes since revision 1.8: +3 -3 lines
Split the struct vmmeter cnt structure into a global vmstats structure and
a per-cpu cnt structure.  Adjust the sysctls to accumulate statistics
over all cpus.
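Here is a minimal sketch of the per-cpu counter split. It assumes a hypothetical `pcpu_cnt` structure and a fixed CPU count, neither of which comes from the real vmmeter code. Each cpu increments only its own slot, so writers on different cpus never touch the same counter. A sysctl-style reader sums the slots on demand.

```c
#include <assert.h>

#define NCPUS_SKETCH 4   /* hypothetical CPU count */

/* Hypothetical per-cpu event counter (a tiny subset of a vmmeter). */
struct pcpu_cnt {
    unsigned long v_intr;   /* interrupts taken on this cpu */
};

static struct pcpu_cnt cnt_percpu[NCPUS_SKETCH];

/* Called on the given cpu; touches only that cpu's slot. */
static void count_intr(int cpu)
{
    cnt_percpu[cpu].v_intr++;
}

/* Reader side: accumulate the statistic over all cpus. */
static unsigned long total_intr(void)
{
    unsigned long sum = 0;

    for (int i = 0; i < NCPUS_SKETCH; i++)
        sum += cnt_percpu[i].v_intr;
    return sum;
}
```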

Revision 1.8: download - view: text, markup, annotated - select for diffs
Fri Jun 27 01:53:25 2003 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
CVS tags: PRE_MP
Diff to: previous 1.7: preferred, unified
Changes since revision 1.7: +7 -8 lines
proc->thread stage 6: kernel threads now create processless LWKT threads.
A number of obvious curproc cases were removed, tsleep/wakeup was made to
work with threads (wmesg, ident, and timeout features moved to threads).
There are probably a few curproc cases left to fix.

Revision 1.7: download - view: text, markup, annotated - select for diffs
Thu Jun 26 05:55:14 2003 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.6: preferred, unified
Changes since revision 1.6: +13 -17 lines
proc->thread stage 5:  BUF/VFS clearance!  Remove the ucred argument from
vop_close, vop_getattr, vop_fsync, and vop_createvobject.  These VOPs can
be called from multiple contexts so the cred is fairly useless, and UFS
ignores it anyway.  For filesystems (like NFS) that sometimes need a cred
we use proc0.p_ucred for now.

This removal also removed the need for a 'proc' reference in the related
VFS procedures, which greatly helps our proc->thread conversion.

bp->b_wcred and bp->b_rcred have also been removed, and for the same reason.
It makes no sense to have a particular cred when multiple users can
access a file.  This may create issues with certain types of NFS mounts
but if it does we will solve them in a way that doesn't pollute the
struct buf.

Revision 1.6: download - view: text, markup, annotated - select for diffs
Wed Jun 25 03:55:57 2003 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.5: preferred, unified
Changes since revision 1.5: +81 -114 lines
proc->thread stage 4: rework the VFS and DEVICE subsystems to take thread
pointers instead of process pointers as arguments, similar to what FreeBSD-5
did.  Note however that ultimately both APIs are going to be message-passing
which means the current thread context will not be usable for creds and
descriptor access.

Revision 1.5: download - view: text, markup, annotated - select for diffs
Mon Jun 23 17:55:41 2003 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.4: preferred, unified
Changes since revision 1.4: +2 -2 lines
proc->thread stage 2: MAJOR revamping of system calls, ucred, jail API,
and some work on the low level device interface (proc arg -> thread arg).
As -current did, I have removed p_cred and incorporated its functions
into p_ucred.  p_prison has also been moved into p_ucred and adjusted
accordingly.  The jail interface tests now use ucreds rather than processes.

The syscall(p,uap) interface has been changed to just (uap).  This includes
the emulation code.  It makes little sense to pass a proc pointer around,
which confuses the MP readability of the code, because most system call code
will only work with the current process anyway.  Note that eventually
*ALL* syscall emulation code will be moved to a kernel-protected userland
layer because it really makes no sense whatsoever to implement these
emulations in the kernel.

suser() now takes no arguments and only operates with the current process.
The process argument has been removed from suser_xxx() so it now just takes
a ucred and flags.

The sysctl interface was adjusted somewhat.

Revision 1.4: download - view: text, markup, annotated - select for diffs
Sun Jun 22 17:39:42 2003 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.3: preferred, unified
Changes since revision 1.3: +18 -16 lines
proc->thread stage 1: change kproc_*() API to take and return threads.  Note:
we won't be able to turn off the underlying proc until we have a clean thread
path all the way through, which isn't the case yet.

Revision 1.3: download - view: text, markup, annotated - select for diffs
Thu Jun 19 01:55:06 2003 UTC (10 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.2: preferred, unified
Changes since revision 1.2: +3 -1 lines
thread stage 5: Separate the inline functions out of sys/buf.h, creating
sys/buf2.h (a methodology that will continue as time passes).  This solves
inline vs struct ordering problems.

Do a major cleanup of the globaldata access methodology.  Create a
gcc-cacheable 'mycpu' macro & inline to access per-cpu data.  Atomicity is not
required because we will never change cpus out from under a thread, even if
it gets preempted by an interrupt thread, because we want to be able to
implement per-cpu caches that do not require locked bus cycles or special
instructions.

Revision 1.2: download - view: text, markup, annotated - select for diffs
Tue Jun 17 04:28:42 2003 UTC (10 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.1: preferred, unified
Changes since revision 1.1: +1 -0 lines
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids.  Most
ids have been removed from !lint sections and moved into comment sections.

Revision 1.1: download - view: text, markup, annotated - select for diffs
Tue Jun 17 02:55:07 2003 UTC (10 years, 10 months ago) by dillon
Branches: MAIN
CVS tags: FREEBSD_4_FORK
import from FreeBSD RELENG_4 1.249.2.30
