DragonFly BSD

CVS log for src/sys/kern/vfs_syscalls.c


Keyword substitution: kv
Default branch: MAIN


Revision 1.135: download - view: text, markup, annotated - select for diffs
Tue Nov 11 00:55:49 2008 UTC (5 years, 8 months ago) by pavalos
Branches: MAIN
CVS tags: HEAD
Diff to: previous 1.134: preferred, unified
Changes since revision 1.134: +30 -0 lines
Add the lchflags() syscall.

This is essentially the same as chflags(), but it operates on the symlink,
not on the underlying file.

Documentation-from: FreeBSD
Reviewed-by: dillon

Revision 1.133.2.1: download - view: text, markup, annotated - select for diffs
Thu Sep 25 02:20:46 2008 UTC (5 years, 10 months ago) by dillon
Branches: DragonFly_RELEASE_2_0
CVS tags: DragonFly_RELEASE_2_0_Slip
Diff to: previous 1.133: preferred, unified; next MAIN 1.134: preferred, unified
Changes since revision 1.133: +22 -9 lines
MFC numerous features from HEAD.

* NFS export support for nullfs mounted filesystems,
  intended for nullfs mounted hammer PFSs.

* Each nullfs mount constructs a unique fsid based on
  the underlying mount.

* Each nullfs mount maintains its own netexport structure.

* The mount pointer in the nch (namecache handle) is passed
  into FHTOVP and friends, allowing operations to occur
  on the underlying vnodes but still go through the nullfs
  mount.

Revision 1.134: download - view: text, markup, annotated - select for diffs
Wed Sep 17 21:44:18 2008 UTC (5 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.133: preferred, unified
Changes since revision 1.133: +22 -9 lines
* Implement the ability to export NULLFS mounts via NFS.

* Enforce PFS isolation when exporting a HAMMER PFS via a NULLFS mount.

NOTE: Exporting anything other than HAMMER PFS roots via nullfs does
NOT protect the parent of the exported directory from being accessed via NFS.

Generally speaking this feature is implemented by giving each nullfs mount
a synthesized fsid based on what is being mounted and implementing the
NFS export infrastructure in the nullfs code instead of just bypassing those
functions to the underlying VFS.

Revision 1.133: download - view: text, markup, annotated - select for diffs
Sat Jun 28 17:59:49 2008 UTC (6 years, 1 month ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Preview
Branch point for: DragonFly_RELEASE_2_0
Diff to: previous 1.132: preferred, unified
Changes since revision 1.132: +9 -9 lines
Replace the bwillwrite() subsystem to make it more fair to processes.

* Add new API functions, bwillread(), bwillwrite(), bwillinode() which
  the kernel calls when it intends to read, write, or make inode
  modifications.

* Redo the backend.  Add bd_heatup() and bd_wait().  bd_heatup() heats up
  the buf_daemon, starting it flushing before we hit any blocking conditions
  (similar to the previous algorithm).

* The new bwill*() blocking functions no longer introduce escalating delays
  to keep the number of dirty buffers under control.  Instead they take a page
  from HAMMER and estimate the load caused by the caller, then wait for a
  specific number of dirty buffers to complete their write I/Os before
  returning.  If the buffers can be retired quickly these functions will
  return more quickly.

Revision 1.132: download - view: text, markup, annotated - select for diffs
Mon Jun 23 17:21:58 2008 UTC (6 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.131: preferred, unified
Changes since revision 1.131: +4 -0 lines
Support S_IFDIR mknod() calls for HAMMER.  This is used by the Hammer
utility program to create pseudo-filesystem directories inside HAMMER.

Revision 1.131: download - view: text, markup, annotated - select for diffs
Tue Jun 3 16:16:40 2008 UTC (6 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.130: preferred, unified
Changes since revision 1.130: +2 -1 lines
Do not update f_offset on EINVAL.

Reported-by: VOROSKOI Andras <voroskoi@gmail.com>

Revision 1.130: download - view: text, markup, annotated - select for diffs
Mon Jun 2 20:06:36 2008 UTC (6 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.129: preferred, unified
Changes since revision 1.129: +22 -6 lines
Disallow negative seek positions for regular files, directories, and
character-special devices to conform to OpenGroup specifications.

Reported-by: VOROSKOI Andras <voroskoi@gmail.com>
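
The behavior described in revisions 1.130 and 1.131 can be observed from
userland.  The following is a minimal sketch, not code from either commit:
the helper name negative_seek_rejected() is hypothetical, and it relies only
on the POSIX tmpfile(), fileno() and lseek() calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userland check (not part of the commit).
 * Returns 1 if a negative absolute seek on a regular file fails with
 * EINVAL and leaves the file offset untouched, 0 if it does not, and
 * -1 if the temporary file could not be set up.
 */
int
negative_seek_rejected(void)
{
	FILE *fp = tmpfile();
	int fd, saved_errno;
	off_t result, pos;

	if (fp == NULL)
		return (-1);
	fd = fileno(fp);
	assert(fd >= 0);

	/* Establish a known, valid offset first. */
	if (lseek(fd, 10, SEEK_SET) != 10) {
		fclose(fp);
		return (-1);
	}

	/* A negative resulting offset should be refused with EINVAL. */
	errno = 0;
	result = lseek(fd, -1, SEEK_SET);
	saved_errno = errno;

	/* The failed seek should not have moved the offset. */
	pos = lseek(fd, 0, SEEK_CUR);
	fclose(fp);

	return (result == (off_t)-1 && saved_errno == EINVAL && pos == 10);
}
```

A caller would treat a return value of 1 as "conforming" and anything else
as either a setup problem or a kernel that still accepts negative offsets.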

Revision 1.129: download - view: text, markup, annotated - select for diffs
Sun Jun 1 19:55:30 2008 UTC (6 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.128: preferred, unified
Changes since revision 1.128: +107 -0 lines
Implement a new system call: getvfsstat().  This system call returns
an array of statfs and statvfs structures.  Unfortunately there is no way
to just return an array of statvfs structures because the statvfs structure
does not have sufficient information in it to identify the mount point.

    getvfsstat(struct statfs *buf, struct statvfs *vbuf,
               long vbufsize, int flags);

Revision 1.128: download - view: text, markup, annotated - select for diffs
Sun Jun 1 19:27:35 2008 UTC (6 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.127: preferred, unified
Changes since revision 1.127: +148 -0 lines
* Implement new system calls in the kernel:  statvfs(), fstatvfs(),
  fhstatvfs().

* Implement a new VFS op, VFS_STATVFS().  Implement a default for this new
  op for VFSs which do not implement VFS_STATVFS(), which calls VFS_STATFS()
  and converts the structure (using Joerg's conversion procedure from libc).

* Remove statvfs(), fstatvfs(), and fhstatvfs() from libc.  These functions
  are now system calls.
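
From the caller's side the move from libc to the kernel should be invisible,
since the interface is the POSIX one.  As a rough illustration (the helper
name report_free_space() is made up for this sketch and is not part of the
commit), a program might use statvfs() like this:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/statvfs.h>

/*
 * Illustrative helper (hypothetical name): report the space available
 * on the filesystem containing `path` using POSIX statvfs().
 * Returns 0 on success, -1 if statvfs() fails.
 */
int
report_free_space(const char *path)
{
	struct statvfs vb;

	assert(path != NULL);
	if (statvfs(path, &vb) != 0)
		return (-1);

	/* Block counts are expressed in units of f_frsize. */
	printf("%s: %llu of %llu bytes available\n", path,
	    (unsigned long long)vb.f_bavail * vb.f_frsize,
	    (unsigned long long)vb.f_blocks * vb.f_frsize);
	return (0);
}
```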

Revision 1.127: download - view: text, markup, annotated - select for diffs
Sun May 18 05:54:25 2008 UTC (6 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.126: preferred, unified
Changes since revision 1.126: +10 -2 lines
Fix a number of core kernel issues related to HAMMER operation.

* The cluster code was incorrectly using the maximum IO size from
  the filesystem on which /dev is mounted instead of the maximum
  IO size of the block device.  This became evident when HAMMER
  (with 16K blocks) tried to call cluster_read() via /dev/ad6s1h
  (on UFS with 8K blocks).

* Change the way the VNLRU code works to avoid an infinite loop in
  vmntvnodescan().  The vnode LRU recycling code was cycling vnodes
  from the head of mp->mnt_nvnodelist to the tail.  Under certain heavy
  load conditions this could cause a vmntvnodescan() to never finish
  running and eventually hit a count assertion (at 1,000,000 vnodes scanned).

  Instead of cycling the vnodes in the mnt_nvnodelist, use the syncer
  vnode (mount->mnt_syncer) as a placemarker and move *IT* within the
  list to represent the LRU scan.  By not cycling vnodes to the end
  of the list, vmntvnodescan() can no longer get into an infinite loop.

* Change the mount->mnt_syncer logic slightly to avoid races against
  a background sync while unmounting.  The field is no longer cleared
  by the sync_reclaim() call but is instead cleared by the unmount code
  before vrele()ing the special vnode.
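
The placemarker idea from the VNLRU change can be sketched outside the
kernel.  This is a toy model under stated assumptions, not the actual
vmntvnodescan() or mnt_nvnodelist code: the toy_node and toy_list types and
the toy_lru_scan() function are hypothetical names, and a plain circular
list stands in for the mount's vnode list.

```c
#include <assert.h>
#include <stddef.h>

/* Toy list element; the marker is an ordinary node that is never "visited". */
struct toy_node {
	struct toy_node *prev, *next;
	int visits;
};

/* Circular doubly linked list with a sentinel head. */
struct toy_list {
	struct toy_node head;
};

static void
toy_list_init(struct toy_list *l)
{
	l->head.prev = l->head.next = &l->head;
	l->head.visits = 0;
}

static void
toy_insert_after(struct toy_node *pos, struct toy_node *n)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

static void
toy_unlink(struct toy_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/*
 * Visit up to `count` nodes that follow the marker, moving only the
 * marker.  The real nodes keep their relative order, so a concurrent
 * front-to-back scan of the list always terminates.  When the marker
 * reaches the end it is moved back to the front and the pass stops.
 */
static int
toy_lru_scan(struct toy_list *l, struct toy_node *marker, int count)
{
	int visited = 0;

	while (visited < count) {
		struct toy_node *n = marker->next;

		if (n == &l->head) {
			toy_unlink(marker);
			toy_insert_after(&l->head, marker);
			break;
		}
		n->visits++;
		toy_unlink(marker);
		toy_insert_after(n, marker);
		visited++;
	}
	return (visited);
}
```

The design point is that the scan position lives in the marker rather than
in the ordering of the real elements, so nothing is ever cycled to the tail.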

Revision 1.126: download - view: text, markup, annotated - select for diffs
Fri May 9 17:52:17 2008 UTC (6 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.125: preferred, unified
Changes since revision 1.125: +27 -61 lines
Fix a HAMMER assertion which turned out to be a bug in VOP_N*().  Sometimes
the dvp passed to these functions can be reclaimed.  The locked leaf
namecache node is not sufficient to prevent its parent directory from
being reclaimed under heavy loads.

Instead of trying to play cute tricks, actually do a formal reference of
the dvp.  We don't have to lock it, though.

Revision 1.125: download - view: text, markup, annotated - select for diffs
Thu May 8 01:41:05 2008 UTC (6 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.124: preferred, unified
Changes since revision 1.124: +26 -35 lines
Fix a race between the namecache and the vnode recycler.  A vnode cannot be
recycled if its namecache entry represents a directory with locked children.
The various VOP_N*() functions require the parent dvp to be stable.

The main fix is in vrecycle() (kern/vfs_subr.c).  Do not vgone() the vnode
if we can't clean out the children.

Also create an API to assert that the parent dvp is stable, and make it
vhold/vdrop the dvp.

The race primarily affected HAMMER, which uses the VOP_N*() API.

Revision 1.124: download - view: text, markup, annotated - select for diffs
Fri Jan 4 12:16:19 2008 UTC (6 years, 6 months ago) by matthias
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_12_Slip, DragonFly_RELEASE_1_12
Diff to: previous 1.123: preferred, unified
Changes since revision 1.123: +2 -2 lines
Move the following entries from kern to security

- kern.ps_showallprocs
- kern.ps_showallthreads
- kern.unprivileged_read_msgbuf
- kern.hardlink_check_uid
- kern.hardlink_check_gid

This is only a cosmetic change helping users to find the right sysctls
more easily.  It could also help if we want to add more security-related
functions (e.g. a MAC framework).

While here, add missing descriptions for three of them.

Revision 1.123: download - view: text, markup, annotated - select for diffs
Tue Nov 20 18:35:46 2007 UTC (6 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.122: preferred, unified
Changes since revision 1.122: +10 -4 lines
Adjust getdirentries() to allow basep to be NULL.  Use off_t for the loff
calculation, but we can't change basep's documented type yet.

Revision 1.122: download - view: text, markup, annotated - select for diffs
Tue Nov 6 03:49:58 2007 UTC (6 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.121: preferred, unified
Changes since revision 1.121: +4 -5 lines
Convert the global 'bioops' into per-mount bio_ops.  For now we also have
to have a per buffer b_ops as well since the controlling filesystem cannot
be located from information in struct buf (b_vp could be the backing store
so that can't be used).  This change allows HAMMER to use bio_ops.

Change the ordering of the bio_ops.io_deallocate call so it occurs before
the buffer's B_LOCKED is checked.  This allows the deallocate call to set
B_LOCKED to retain the buffer in situations where the target filesystem
is unable to immediately disassociate the buffer.  Also keep VMIO intact
for B_LOCKED buffers (in addition to B_DELWRI buffers).

HAMMER will use this feature to keep buffers passively associated with
other filesystem structures and thus be able to avoid constantly brelse()ing
and getblk()ing them.

Revision 1.118.2.1: download - view: text, markup, annotated - select for diffs
Mon Sep 10 15:11:55 2007 UTC (6 years, 10 months ago) by dillon
Branches: DragonFly_RELEASE_1_10
Diff to: previous 1.118: preferred, unified; next MAIN 1.119: preferred, unified
Changes since revision 1.118: +6 -5 lines
MFC 1.120 and 1.121 - fix deadlocks when handling ESTALE from NFS mounts.

Reported-by: elekktretterr@exemail.com.au

Revision 1.121: download - view: text, markup, annotated - select for diffs
Mon Sep 10 15:08:43 2007 UTC (6 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.120: preferred, unified
Changes since revision 1.120: +2 -1 lines
kern_access() had the same bug kern_stat() had with regard to a
vnode/namecache deadlock when dealing with stale NFS mounts.

Reported-by: elekktretterr@exemail.com.au

Revision 1.120: download - view: text, markup, annotated - select for diffs
Mon Sep 3 17:06:21 2007 UTC (6 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.119: preferred, unified
Changes since revision 1.119: +9 -4 lines
Add a MNTK_ flag to the mount structure allowing a VFS to specify that
no submounts under the VFS are to be allowed.  Adjust procfs and linprocfs
to use the feature.

Submitted-by: "Nicolas Thery" <nthery@gmail.com>

Revision 1.119: download - view: text, markup, annotated - select for diffs
Mon Aug 13 17:43:55 2007 UTC (6 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.118: preferred, unified
Changes since revision 1.118: +72 -15 lines
The new VOP_N*() (namespace) operations pass a pointer to a namecache
record.  This information is sufficient for resolving the namespace
operation.  In all cases the parent namecache record already had to have
a resolved vnode so the related directory vnode could be easily extracted
by the VFS.  But this also means the target VFSs had to traverse the
namecache topology up one level which introduced API pollution that
is not compatible with directly translating a VOP to a RPC.

To solve this we now pass a directory vnode along with the namecache pointer.
This vnode is only held, not referenced or vget()d so the target VFS must
still vget() the vnode and/or do whatever it needs to do to validate it.
This gives the target VFS full control over directory locking when performing
namespace operations.  The namespaces themselves are already guaranteed
to be locked due to the fact that the related namecache records are locked.

This change is being made to accommodate USERFS, so we can directly translate
the related VOPs to RPCs without having to reproduce the namecache topology
in the target VFS running in userland.

Revision 1.118: download - view: text, markup, annotated - select for diffs
Thu Jul 19 01:16:39 2007 UTC (7 years ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_10_Slip
Branch point for: DragonFly_RELEASE_1_10
Diff to: previous 1.117: preferred, unified
Changes since revision 1.117: +4 -1 lines
Be a little more verbose when reporting unmount errors.

Revision 1.117: download - view: text, markup, annotated - select for diffs
Tue Jun 26 20:39:33 2007 UTC (7 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.116: preferred, unified
Changes since revision 1.116: +8 -4 lines
A file descriptor of -1 is legal when accessing journal status.  Just allow
it generally; the journal command switch will recheck it on a per-command
basis.

Revision 1.116: download - view: text, markup, annotated - select for diffs
Wed May 9 00:53:34 2007 UTC (7 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.115: preferred, unified
Changes since revision 1.115: +8 -5 lines
Give the device major / minor numbers their own separate 32 bit fields
in the kernel.  Change dev_ops to use a RB tree to index major device
numbers and remove the 256 device major number limitation.

Build a dynamic major number assignment feature into dev_ops_add() and
adjust ASR (which already had a hand-rolled one), and MFS to use the
feature.  MFS at least does not require any filesystem visibility to
access its backing device.  Major device numbers >= 256 are used for
dynamic assignment.

Retain filesystem compatibility for device numbers that fall within the
range that can be represented in UFS or struct stat (which is a single
32 bit field supporting 8 bit major numbers and 24 bit minor numbers).
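
The compatibility constraint above amounts to a range check before packing.
The sketch below is only an illustration: the function names are
hypothetical, and the bit layout (major in the top 8 bits, minor in the low
24) is an assumption chosen for simplicity, since the log does not state
where the bits actually live in the legacy field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pack separate 32 bit major/minor values into one legacy 32 bit field.
 * ASSUMED layout for illustration: major in bits 24..31, minor in bits
 * 0..23.  Returns 0 on success, -1 if the pair cannot be represented
 * (for example a dynamically assigned major number >= 256).
 */
static int
toy_dev_pack32(uint32_t major, uint32_t minor, uint32_t *out)
{
	assert(out != NULL);
	if (major > 0xffu || minor > 0xffffffu)
		return (-1);
	*out = (major << 24) | minor;
	return (0);
}

/* Recover the 8 bit major number from the packed field. */
static uint32_t
toy_dev_major32(uint32_t dev)
{
	return (dev >> 24);
}

/* Recover the 24 bit minor number from the packed field. */
static uint32_t
toy_dev_minor32(uint32_t dev)
{
	return (dev & 0xffffffu);
}
```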

Revision 1.115: download - view: text, markup, annotated - select for diffs
Sun May 6 19:23:31 2007 UTC (7 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.114: preferred, unified
Changes since revision 1.114: +1 -1 lines
Use SYSREF to reference count struct vnode.  v_usecount is now
v_sysref(.refcnt).  v_holdcnt is now v_auxrefs.  SYSREF's termination state
(using a negative reference count from -0x40000000+) now places the vnode in
a VCACHED or VFREE state and deactivates it.  The vnode is now assigned a
64 bit unique id via SYSREF.

vhold() (which manipulates v_auxrefs) no longer reactivates a vnode and
is explicitly used only to track references from auxiliary structures
and references to prevent premature destruction of the vnode.  vdrop()
will now only move a vnode from VCACHED to VFREE on the 1->0 transition
of v_auxrefs if the vnode is in a termination state.

vref() will now panic if used on a vnode in a termination state.  vget()
must now be used to explicitly reactivate a vnode.  These requirements
existed before but are now explicitly asserted.

vlrureclaim() and allocvnode() should now interact a bit better.  In
particular, vlrureclaim() will do a better job of finding vnodes to flush
and transition from VCACHED to VFREE, and allocvnode() will do a better
job finding vnodes to reuse without getting blocked by a flush.

allocvnode now uses a real VX lock to sequence vnodes into VRECLAIMED.  All
vnode special state processing now uses a VX lock.

Vnodes are now able to be slowly returned to the memory pool when
kern.maxvnodes is reduced at run time.

Various initialization elements have been moved to CTOR/DTOR and are
no longer in the critical path, improving performance.  However, since
SYSREF uses atomic_cmpset_int() (aka cmpxchgl), which reduces performance
somewhat, overall performance tends to be about the same.

Revision 1.114: download - view: text, markup, annotated - select for diffs
Sun Feb 18 07:12:19 2007 UTC (7 years, 5 months ago) by swildner
Branches: MAIN
Diff to: previous 1.113: preferred, unified
Changes since revision 1.113: +0 -1 lines
Remove unused variable.

Revision 1.112.2.1: download - view: text, markup, annotated - select for diffs
Fri Jan 26 18:55:31 2007 UTC (7 years, 6 months ago) by dillon
Branches: DragonFly_RELEASE_1_8
CVS tags: DragonFly_RELEASE_1_8_Slip
Diff to: previous 1.112: preferred, unified; next MAIN 1.113: preferred, unified
Changes since revision 1.112: +35 -5 lines
MFC 1.113 - generate fake "/" entry when chrooted into a subdirectory.

Revision 1.113: download - view: text, markup, annotated - select for diffs
Fri Jan 26 18:05:23 2007 UTC (7 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.112: preferred, unified
Changes since revision 1.112: +35 -5 lines
Fix generation of the mount path for "/" when a process is chrooted into
a subdirectory of a mount point.

Reported-by: YONETANI Tomokazu <qhwt+dfly@les.ath.cx>

Revision 1.112: download - view: text, markup, annotated - select for diffs
Wed Jan 24 01:25:47 2007 UTC (7 years, 6 months ago) by dillon
Branches: MAIN
Branch point for: DragonFly_RELEASE_1_8
Diff to: previous 1.111: preferred, unified
Changes since revision 1.111: +43 -22 lines
checkdirs() was being passed the wrong mount point, resulting in a panic
when mounting over an already existing mountpoint and improperly adjusting
the current or root directory for processes when finding a matching ncp
whether they were relative to the old mount or not.

Rewrite and document checkdirs() to fix the problems.

Reported-by: "Vincent Labrecque" <vnc@hush.ai>

Revision 1.111: download - view: text, markup, annotated - select for diffs
Sat Dec 23 23:47:54 2006 UTC (7 years, 7 months ago) by swildner
Branches: MAIN
Diff to: previous 1.110: preferred, unified
Changes since revision 1.110: +1 -2 lines
Ansify function declarations and fix some minor style issues.

In-collaboration-with: Alexey Slynko <slynko@tronet.ru>

Revision 1.110: download - view: text, markup, annotated - select for diffs
Sat Dec 23 00:35:04 2006 UTC (7 years, 7 months ago) by swildner
Branches: MAIN
Diff to: previous 1.109: preferred, unified
Changes since revision 1.109: +5 -5 lines
Rename printf -> kprintf in sys/ and add some defines where necessary
(files which are used in userland, too).

Revision 1.109: download - view: text, markup, annotated - select for diffs
Mon Dec 18 20:41:01 2006 UTC (7 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.108: preferred, unified
Changes since revision 1.108: +2 -2 lines
Rename kvprintf  -> kvcprintf (call-back version)
Rename vprintf   -> kvprintf
Rename vsprintf  -> kvsprintf
Rename vsnprintf -> kvsnprintf

Revision 1.108: download - view: text, markup, annotated - select for diffs
Fri Nov 17 22:20:31 2006 UTC (7 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.107: preferred, unified
Changes since revision 1.107: +1 -0 lines
Unresolve the vnode associated with the namecache entry for a mount point
before trying to set a new vnode.  This avoids a panic if you are CD'd into
a mount point before the mount occurs.

Revision 1.107: download - view: text, markup, annotated - select for diffs
Fri Oct 27 04:56:31 2006 UTC (7 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.106: preferred, unified
Changes since revision 1.106: +321 -283 lines
Major namecache work primarily to support NULLFS.

* Move the nc_mount field out of the namecache{} record and use a new
  namecache handle structure called nchandle { mount, ncp } for all
  API accesses to the namecache.

* Remove all mount point linkages from the namecache topology.  Each mount
  now has its own namecache topology rooted at the root of the mount point.

  Mount points are flagged in their underlying filesystem's namecache
  topology but instead of linking the mount into the topology, the flag
  simply triggers a mountlist scan to locate the mount.  ".." is handled
  the same way... when the root of a topology is encountered the scan
  can traverse to the underlying filesystem via a field stored in the
  mount structure.

* Ref the mount structure based on the number of nchandle structures
  referencing it, and do not kfree() the mount structure during a forced
  unmount if refs remain.

These changes have the following effects:

* Traversal across mount points no longer requires locking of any sort,
  preventing process blockages occurring in one mount from leaking across
  a mount point to another mount.

* Aliased namespaces such as occur with NULLFS no longer duplicate the
  namecache topology of the underlying filesystem.  Instead, a NULLFS
  mount simply shares the underlying topology, differentiating between
  it and the underlying topology by the fact that the namecache
  handles { mount, ncp } contain NULLFS's mount pointer.

  This saves an immense amount of memory and allows NULLFS to be used
  heavily within a system without creating any adverse impact on kernel
  memory or performance.

* Since the namecache topology for a NULLFS mount is shared with the
  underlying mount, the namecache records are in fact the same records
  and thus full coherency between the NULLFS mount and the underlying
  filesystem is maintained by design.

* Future efforts, such as a unionfs or shadow fs implementation, now
  have a mount structure to work with.  The new API is a lot more
  flexible than the old one.
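
The { mount, ncp } handle described in this revision can be pictured with a
small model.  This is a sketch under assumptions, not the kernel's
definitions: all toy_* types, fields and functions are hypothetical, and the
read-only flag stands in for whatever per-mount state a caller consults.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures named in the log. */
struct toy_mount {
	const char *path;
	int rdonly;
};

struct toy_namecache {
	const char *name;
};

/* A handle pairs a shared namecache record with the mount it was reached via. */
struct toy_nchandle {
	struct toy_mount *mount;
	struct toy_namecache *ncp;
};

/* Two handles alias the same name if they point at the same record. */
static int
toy_nch_same_record(const struct toy_nchandle *a, const struct toy_nchandle *b)
{
	return (a->ncp == b->ncp);
}

/* Write permission is decided by the handle's mount, not by the record. */
static int
toy_nch_writable(const struct toy_nchandle *nch)
{
	assert(nch->mount != NULL);
	return (!nch->mount->rdonly);
}
```

Because the record is shared, coherency comes for free, while per-mount
policy such as a read-only NULLFS alias is carried by the handle alone.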

Revision 1.106: download - view: text, markup, annotated - select for diffs
Tue Sep 19 18:17:46 2006 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.105: preferred, unified
Changes since revision 1.105: +42 -10 lines
Check that namecache references to the mount point are no longer present
before unmounting a filesystem.  Forced unmounts ignore the check but will
print a warning.

This patch is primarily designed to prevent nullfs partitions from being
unmounted while processes are still present within them.  The normal vnode
check does not work for nullfs mounts since nullfs mounts do not hold any
vnodes of their own.

Note that this will cause a warning to be reported for the root filesystem
when rebooting.

Revision 1.105: download - view: text, markup, annotated - select for diffs
Tue Sep 19 16:06:11 2006 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.104: preferred, unified
Changes since revision 1.104: +47 -17 lines
Remove the last bits of code that stored mount point linkages in vnodes.
Mount point linkages are now ENTIRELY a function of the namecache topology,
made possible by DragonFly's advanced namecache.

This fixes a number of problems with NULLFS and adds two major features to
our NULLFS mounting capabilities.

NULLFS mounting paths NO LONGER NEED TO BE DISTINCT.  For example, you
can now safely do things like 'mount_null -o ro / /fubar/jail1' without
creating a recursion and you can now create SUB-MOUNTS within nullfs
mounts, such as 'mount_null -o ro /usr /fubar/jail1/usr', without creating
problems in the original master partitions.

The result is that NULLFS can now be used to glue arbitrary pieces of
filesystems together using a mixture of read-only and read-write NULLFS
mounts for situations where localhost NFS mounts had to be used before.
Jail or chroot construction is now utterly trivial.

With-input-from: Joerg Sonnenberger <joerg@britannica.bec.de>

Revision 1.104: download - view: text, markup, annotated - select for diffs
Mon Sep 18 18:19:33 2006 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.103: preferred, unified
Changes since revision 1.103: +60 -7 lines
Set f_ncp in the struct file unconditionally.  Previously we only set it
when opening directories.  This allows the f*() system calls such as
fchmod() to check the actual mount point instead of the aliased mount
point (in the case of a NULLFS mount).  Also, the fstat program will
properly report the path for descriptors opened via nullfs mounts.

Add code to all f*() system calls such as fchmod() to check f_ncp
in order to detect read-only nullfs mounts.

Revision 1.103: download - view: text, markup, annotated - select for diffs
Mon Sep 18 17:42:27 2006 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.102: preferred, unified
Changes since revision 1.102: +5 -4 lines
Disallow writes to filesystems mounted read-only via NULLFS.  In this case
the ncp->nc_mount in the namecache must be checked since the vnode's
mount point is the actual filesystem and not the NULLFS mount.

Reported-by: Joerg Sonnenberger <joerg@britannica.bec.de>

Revision 1.102: download - view: text, markup, annotated - select for diffs
Tue Sep 5 00:55:45 2006 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.101: preferred, unified
Changes since revision 1.101: +11 -11 lines
Rename malloc->kmalloc, free->kfree, and realloc->krealloc.  Pass 1

Revision 1.101: download - view: text, markup, annotated - select for diffs
Sat Aug 19 17:27:23 2006 UTC (7 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.100: preferred, unified
Changes since revision 1.100: +0 -3 lines
VNode sequencing and locking - part 4/4 - subpart 1 of many.

Move the vnode lock for VOP_READDIR out of the kernel upper layers and
into the filesystem.

Revision 1.100: download - view: text, markup, annotated - select for diffs
Sat Aug 12 00:26:20 2006 UTC (7 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.99: preferred, unified
Changes since revision 1.99: +11 -11 lines
VNode sequencing and locking - part 3/4.

VNode aliasing is handled by the namecache (aka nullfs), so there is no
longer a need to have VOP_LOCK, VOP_UNLOCK, or VOP_ISSLOCKED as 'VOP'
functions.  Both NFS and DEADFS have been using standard locking functions
for some time and are no longer special cases.  Replace all uses with
native calls to vn_lock, vn_unlock, and vn_islocked.

We can't have these as VOP functions anyhow because of the introduction of
the new SYSLINK transport layer, since vnode locks are primarily used to
protect the local vnode structure itself.

Revision 1.99: download - view: text, markup, annotated - select for diffs
Fri Aug 11 01:54:59 2006 UTC (7 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.98: preferred, unified
Changes since revision 1.98: +4 -4 lines
VNode sequencing and locking - part 2/4.

Control access to v_usecount and v_holdcnt with the vnode's lock's spinlock.
Use the spinlock to interlock the VRECLAIMED and VINACTIVE flags during
1->0 and 0->1 transitions.  N->N+1 transitions do not need to obtain the
spinlock and simply use a locked bus cycle increment.  Vnode operations
are still not MP safe but this gets further along that road.

The lockmgr can no longer fail when obtaining an exclusive lock, so remove
the error code return from vx_lock() and vx_get().  Add special lockmgr
support routines to atomically acquire and release an exclusive lock
when the caller is already holding the spinlock.

The removal of vnodes from the vnode free list is now deferred.  Removal
only occurs when allocvnode() encounters a vnode on the list which should
not be on it.  This improves critical code paths for vget(), vput() and
vrele() by removing unnecessary manipulation of the freelist.

Fix a lockmgr bug where wakeup() was being called with a spinlock held.
Instead, defer the wakeup until after the spinlock is released.
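
The split between lock-free N->N+1 transitions and interlocked 0->1 / 1->0
transitions described in this revision can be sketched with C11 atomics.
This is a userland illustration under assumptions, not the vnode code: the
toy_vnode type and toy_* functions are hypothetical, an atomic_flag stands in
for the vnode lock's spinlock, and the fast path uses a compare-and-swap
loop rather than a bare locked increment so that it never races past zero.

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_vnode {
	atomic_int  usecount;
	atomic_flag spin;      /* stand-in for the lock's spinlock */
	int         inactive;  /* state flag guarded by `spin` */
};

static void
toy_vnode_init(struct toy_vnode *vp)
{
	atomic_init(&vp->usecount, 0);
	atomic_flag_clear(&vp->spin);
	vp->inactive = 1;
}

static void
toy_spin_lock(atomic_flag *f)
{
	while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
		;	/* busy-wait */
}

static void
toy_spin_unlock(atomic_flag *f)
{
	atomic_flag_clear_explicit(f, memory_order_release);
}

static void
toy_ref(struct toy_vnode *vp)
{
	int n = atomic_load(&vp->usecount);

	/* N->N+1 with N > 0: no spinlock needed. */
	while (n > 0) {
		if (atomic_compare_exchange_weak(&vp->usecount, &n, n + 1))
			return;
	}
	/* Possible 0->1: interlock the state flag with the spinlock. */
	toy_spin_lock(&vp->spin);
	if (atomic_fetch_add(&vp->usecount, 1) == 0)
		vp->inactive = 0;
	toy_spin_unlock(&vp->spin);
}

static void
toy_rel(struct toy_vnode *vp)
{
	int n = atomic_load(&vp->usecount);

	assert(n > 0);
	/* N->N-1 with N > 1: no spinlock needed. */
	while (n > 1) {
		if (atomic_compare_exchange_weak(&vp->usecount, &n, n - 1))
			return;
	}
	/* Possible 1->0: interlock the state flag with the spinlock. */
	toy_spin_lock(&vp->spin);
	if (atomic_fetch_sub(&vp->usecount, 1) == 1)
		vp->inactive = 1;
	toy_spin_unlock(&vp->spin);
}
```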

Revision 1.98: download - view: text, markup, annotated - select for diffs
Tue Aug 8 03:52:40 2006 UTC (7 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.97: preferred, unified
Changes since revision 1.97: +1 -1 lines
LK_NOPAUSE no longer serves a purpose, scrap it.

Revision 1.97: download - view: text, markup, annotated - select for diffs
Tue Jul 18 22:22:12 2006 UTC (8 years ago) by dillon
Branches: MAIN
Diff to: previous 1.96: preferred, unified
Changes since revision 1.96: +10 -10 lines
Remove several layers in the vnode operations vector init code.  Declare
the operations vector directly instead of via a descriptor array.  Remove
most of the recalculation code, it stopped being needed over a year ago.

This work is similar to what FreeBSD now does, but was developed along a
different line.  Ultimately our vop_ops will become SYSLINK ops for userland
VFS and clustering support.

Revision 1.96: download - view: text, markup, annotated - select for diffs
Mon Jun 5 07:26:10 2006 UTC (8 years, 1 month ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_6_Slip, DragonFly_RELEASE_1_6
Diff to: previous 1.95: preferred, unified
Changes since revision 1.95: +53 -53 lines
Modify kern/makesyscall.sh to prefix all kernel system call procedures
with "sys_".  Modify all related kernel procedures to use the new naming
convention.  This gets rid of most of the namespace overloading between
the kernel and standard header files.

Revision 1.95: download - view: text, markup, annotated - select for diffs
Thu Jun 1 06:10:50 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.94: preferred, unified
Changes since revision 1.94: +5 -5 lines
Use the MP friendly objcache instead of zalloc to allocate temporary
MAXPATHLEN space.

Revision 1.94: download - view: text, markup, annotated - select for diffs
Thu May 25 07:36:34 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.93: preferred, unified
Changes since revision 1.93: +70 -19 lines
Convert almost all of the remaining manual traversals of the allproc
list over to allproc_scan().

The allproc_scan() code is MPSAFE, and code which before just cached
a proc pointer now PHOLD's it as well, but access to the various proc
fields is *NOT* yet MPSAFE.  Still, we are closer now.

Revision 1.93: download - view: text, markup, annotated - select for diffs
Wed May 24 03:23:31 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.92: preferred, unified
Changes since revision 1.92: +65 -40 lines
Spinlock more of the file descriptor code.  No appreciable difference in
performance on buildworld tests.

Change getvnode() to holdvnode() and use semantics similar to holdsock().
The old getvnode() code wasn't fhold()ing the file pointer.  The new
holdvnode() code does.

Revision 1.92: download - view: text, markup, annotated - select for diffs
Mon May 22 21:21:21 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.91: preferred, unified
Changes since revision 1.91: +27 -49 lines
Do a major cleanup of the file descriptor handling code in preparation for
making the descriptor table MPSAFE.  Introduce a new feature that allows a
file descriptor number to be reserved without having to assign a file
pointer to it.  This allows code such as open(), dup(), etc to reserve
descriptors to work with without having to worry about the related file
being ripped out from under them by another thread sharing the descriptor
table.

falloc() -	This function allocates the file pointer and descriptor as
		before, but does NOT associate the file pointer with the
		descriptor.

		Before this change another thread could access the file
		pointer while the system call creating it was blocked,
		before the system call had a chance to completely initialize
		the file pointer.

		The caller must call fsetfd() to assign or clear the
		reserved descriptor.

fsetfd() -	Is now responsible for associating a file pointer with a
		previously reserved descriptor or clearing the reservation.

fdealloc() -	This hack existed to deal with open/dup races against other
		threads.  The above changes remove the possibility so this
		routine has been deleted.

dup code -	kern_dup() and dupfdopen() have been completely rewritten.
		They are much cleaner and less obtuse now.  Additional race
		conditions in the original code were also found and fixed.

funsetfd() -	Now returns the file pointer that was cleared and takes
		responsibility for adjusting fd_lastfile.

		NOTE: fd_lastfile is inclusive of any reserved descriptors.

fdcopy() -	While not yet MPSAFE, fdcopy now properly handles races
		against other threads.

fdp->fd_lastfile -
		This field was not being properly updated in certain failure
		cases.  This commit fixes that.  Also, if all a process's
		descriptors were closed this field was incorrectly left at
		0 when it should have been set to -1.

fdp->fd_files -	A number of code blocks were trying to optimize a for()
		loop over all file descriptors by caching a pointer to
		fd_files.  This is a problem because fd_files can be
		reallocated if code within the loop blocks.  These loops
		have been rewritten.
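
The reserve-then-assign pattern described above for falloc() and fsetfd() can
be illustrated with a small user-space sketch.  This is not the kernel code:
the names fdtable_init(), fd_reserve(), fd_set() and fd_lookup() and both
structures are hypothetical, and locking is left out entirely.  It only shows
why a reserved slot stays invisible to lookups until a fully initialized file
pointer is attached, and how fd_lastfile stays inclusive of reservations.

```c
#include <assert.h>
#include <stddef.h>

#define NFILES 8

/* Hypothetical stand-in for one descriptor table entry. */
struct fdslot {
    void *fp;        /* file pointer, NULL until assigned */
    int   reserved;  /* nonzero once the descriptor number is taken */
};

struct fdtable {
    struct fdslot slots[NFILES];
    int lastfile;    /* highest reserved or in-use index, -1 if none */
};

static void fdtable_init(struct fdtable *t)
{
    for (int i = 0; i < NFILES; i++) {
        t->slots[i].fp = NULL;
        t->slots[i].reserved = 0;
    }
    t->lastfile = -1;
}

/* Reserve the lowest free descriptor number without publishing a file
 * pointer.  Returns -1 if the table is full. */
static int fd_reserve(struct fdtable *t)
{
    for (int fd = 0; fd < NFILES; fd++) {
        if (!t->slots[fd].reserved) {
            t->slots[fd].reserved = 1;
            if (fd > t->lastfile)
                t->lastfile = fd;
            return fd;
        }
    }
    return -1;
}

/* Attach fp to a reserved descriptor, or pass NULL to drop the reservation. */
static void fd_set(struct fdtable *t, int fd, void *fp)
{
    assert(fd >= 0 && fd < NFILES && t->slots[fd].reserved);
    t->slots[fd].fp = fp;
    if (fp == NULL) {
        t->slots[fd].reserved = 0;
        /* Pull lastfile back down past any now-unused tail entries. */
        while (t->lastfile >= 0 && !t->slots[t->lastfile].reserved)
            t->lastfile--;
    }
}

/* What another thread would see: reserved-but-unassigned slots look empty. */
static void *fd_lookup(const struct fdtable *t, int fd)
{
    if (fd < 0 || fd >= NFILES)
        return NULL;
    return t->slots[fd].fp;
}
```

In this sketch an open()-like caller would call fd_reserve(), finish setting
up its file object, and only then call fd_set(); on failure it calls fd_set()
with NULL, which also returns lastfile to -1 when the table becomes empty.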

Revision 1.91: download - view: text, markup, annotated - select for diffs
Fri May 19 07:33:45 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.90: preferred, unified
Changes since revision 1.90: +17 -27 lines
Convert most manual accesses to filedesc->fd_files[] into the appropriate
holdfp() call.  Fix a number of places where ops were being executed
on the file pointer without holding a private reference to it (mainly
fo_ioctl(), revoke(), and lseek()).

Create procedures in kern_descrip.c to set and clear descriptor flags
and to handle the bootstrap filedesc for proc0.  Replace manual code
elsewhere with calls to the new procedures.

Move getvnode() to kern_descrip.c.  Remove nsmb_getfp().  Use holdfp()
instead.

Revision 1.90: download - view: text, markup, annotated - select for diffs
Fri May 19 05:15:35 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.89: preferred, unified
Changes since revision 1.89: +5 -20 lines
Consolidate the file descriptor destruction code used when a newly created
file descriptor must be destroyed due to an error into a new procedure,
fdealloc(), rather than manually repeating it over and over again.

Move holdsock() and holdfp() into kern/kern_descrip.c.

Revision 1.89: download - view: text, markup, annotated - select for diffs
Sun May 7 19:17:13 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.88: preferred, unified
Changes since revision 1.88: +8 -6 lines
Remove the internal F_FLOCK flag.  Either F_POSIX or F_FLOCK must be set,
so just use F_POSIX to indicate whether it's a posix style lock or an flock
style lock.

Revision 1.88: download - view: text, markup, annotated - select for diffs
Sat May 6 18:48:52 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.87: preferred, unified
Changes since revision 1.87: +18 -18 lines
Remove the thread argument from all mount->vfs_* function vectors,
replacing it with a ucred pointer when applicable.  This cleans up a
considerable amount of VFS function code that previously delved into
the process structure to get the cred, though some code remains.

Get rid of the compatibility thread argument for hpfs and nwfs.  Our
lockmgr calls are now mostly compatible with NetBSD (which doesn't use a
thread argument either).

Get rid of some complex junk in fdesc_statfs() that nobody uses.

Remove the thread argument from dounmount() as well as various other
filesystem specific procedures (quota calls primarily) which no longer
need it due to the lockmgr, VOP, and VFS cleanups.  These cleanups also
have the effect of making the VFS code slightly less dependent on the
calling thread's context.

Revision 1.87: download - view: text, markup, annotated - select for diffs
Sat May 6 06:38:38 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.86: preferred, unified
Changes since revision 1.86: +15 -15 lines
The fdrop() procedure no longer needs a thread argument, remove it.

Revision 1.86: download - view: text, markup, annotated - select for diffs
Sat May 6 02:43:12 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.85: preferred, unified
Changes since revision 1.85: +37 -39 lines
The thread/proc pointer argument in the VFS subsystem originally existed
for...  well, I'm not sure *WHY* it originally existed when most of the
time the pointer couldn't be anything other than curthread or curproc or
the code wouldn't work.  This is particularly true of lockmgr locks.

Remove the pointer argument from all VOP_*() functions, all fileops functions,
and most ioctl functions.

Revision 1.85: download - view: text, markup, annotated - select for diffs
Fri May 5 21:27:53 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.84: preferred, unified
Changes since revision 1.84: +6 -6 lines
Remove the thread_t argument from vfs_busy() and vfs_unbusy().  Passing a
thread_t to these functions has always been questionable at best.

Revision 1.84: download - view: text, markup, annotated - select for diffs
Fri May 5 21:15:09 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.83: preferred, unified
Changes since revision 1.83: +20 -20 lines
Simplify vn_lock(), VOP_LOCK(), and VOP_UNLOCK() by removing the thread_t
argument.  These calls now always use the current thread as the lockholder.
Passing a thread_t to these functions has always been questionable at best.

Revision 1.83: download - view: text, markup, annotated - select for diffs
Fri May 5 20:15:01 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.82: preferred, unified
Changes since revision 1.82: +3 -3 lines
Remove the thread pointer argument to lockmgr().  All lockmgr() ops use the
current thread.

Move the lockmgr code in BUF_KERNPROC to lockmgr_kernproc().  This code
allows the lock owner to be set to a special value so any thread can unlock
the lock and is required for B_ASYNC I/O so biodone() can release the lock.

Revision 1.82: download - view: text, markup, annotated - select for diffs
Sun Apr 23 03:08:02 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.81: preferred, unified
Changes since revision 1.81: +3 -3 lines
Remove the now unused interlock argument to the lockmgr() procedure.
This argument has been abused over the years by kernel programmers
attempting to optimize certain locking and data modification sequences,
resulting in virtually unreadable code in some cases.  The interlock
also made porting between BSDs difficult as each BSD implemented their
interlock differently.  DragonFly has slowly removed use of the interlock
argument and we can now finally be rid of it entirely.

Revision 1.81: download - view: text, markup, annotated - select for diffs
Sun Apr 23 00:54:11 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.80: preferred, unified
Changes since revision 1.80: +4 -3 lines
Get rid of LK_DRAIN in dounmount().  LK_DRAIN locks are not SMP friendly and
can lead to structural pointer races against free() operations.  Struct mount
is protected by the MNTK_UNMOUNT flag.

Revision 1.80: download - view: text, markup, annotated - select for diffs
Sat Apr 1 20:46:47 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.79: preferred, unified
Changes since revision 1.79: +2 -15 lines
Use the vnode v_opencount and v_writecount universally.  They were previously
only used by specfs.  Require that VOP_OPEN and VOP_CLOSE calls match.
Assert on boundary errors.

Clean up umount's FORCECLOSE mode.  Adjust deadfs to allow duplicate closes
(which can happen due to a forced unmount or revoke).

Add vop_stdopen() and vop_stdclose() and adjust the default vnode ops to
call them.  All VFSs except DEADFS which supply their own vop_open and
vop_close now call vop_stdopen() and vop_stdclose() to handle v_opencount
and v_writecount adjustments.

Change the VOP_OPEN/fp specs.  VOP_OPEN (aka vop_stdopen) is now responsible
for filling in the file pointer information, rather than the caller of
VOP_OPEN.  Additionally, when supplied a file pointer, VOP_OPEN is now
allowed to populate the file pointer with a different vnode than the one
passed to it.  This will be used later on to support filesystems which
synthesize different vnodes on open, for example so we can create generic
tty/pty pairing devices rather than scanning for an unused pty, and so we
can create swap-backed generic anonymous file descriptors rather than having
to use /tmp.  And for other purposes as well.

Fix UFS's mount/remount/unmount code to make the proper VOP_OPEN and
VOP_CLOSE calls when a filesystem is remounted read-only or read-write.

Revision 1.79: download - view: text, markup, annotated - select for diffs
Wed Mar 29 18:44:50 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.78: preferred, unified
Changes since revision 1.78: +14 -13 lines
Remove VOP_GETVOBJECT, VOP_DESTROYVOBJECT, and VOP_CREATEVOBJECT.  Rearrange
the VFS code such that VOP_OPEN is now responsible for associating a VM
object with a vnode.  Add the vinitvmio() helper routine.

Revision 1.78: download - view: text, markup, annotated - select for diffs
Mon Mar 27 16:18:34 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.77: preferred, unified
Changes since revision 1.77: +0 -7 lines
Remove NQNFS support.  The mechanisms are too crude to co-exist with
upcoming cache coherency management work and the original implementation
hacked up the NFS code pretty severely.

Move nqnfs_clientd() out of nfs_nqlease.c to a new file, nfs_kerb.c,
and rename it nfs_clientd().

Revision 1.77: download - view: text, markup, annotated - select for diffs
Thu Mar 2 19:07:59 2006 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.76: preferred, unified
Changes since revision 1.76: +1 -1 lines
Pass LK_PCATCH instead of trying to store tsleep flags in the lock
structure, so multiple entities competing for the same lock do not
use unexpected flags when sleeping.

Only NFS really uses PCATCH with lockmgr locks.

Revision 1.76: download - view: text, markup, annotated - select for diffs
Wed Jan 4 18:11:26 2006 UTC (8 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.75: preferred, unified
Changes since revision 1.75: +15 -18 lines
Clean up unmount() by removing the vnode resolution requirement.  Just
use the namecache to validate the mount point.

Submitted-by: Csaba Henk <csaba.henk@creo.hu>

Revision 1.75: download - view: text, markup, annotated - select for diffs
Fri Dec 2 17:29:45 2005 UTC (8 years, 8 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_4_Slip, DragonFly_RELEASE_1_4
Diff to: previous 1.74: preferred, unified
Changes since revision 1.74: +2 -1 lines
cred may be NULL due to a prior error code.  crhold() handles NULL creds,
but crfree() does not.  Check for NULL.

Reported-by: Stefan Krueger <skrueger@meinberlikomm.de>

Revision 1.74: download - view: text, markup, annotated - select for diffs
Sun Oct 9 18:07:55 2005 UTC (8 years, 9 months ago) by corecode
Branches: MAIN
Diff to: previous 1.73: preferred, unified
Changes since revision 1.73: +5 -4 lines
1:1 Userland threading stage 2.5/4:

Remove compatibility p_dupfd and use the per-lwp one.

Revision 1.73: download - view: text, markup, annotated - select for diffs
Thu Sep 29 20:59:30 2005 UTC (8 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.72: preferred, unified
Changes since revision 1.72: +54 -1 lines
Implement sysctls to restrict a user's ability to hardlink files owned by
other users or groups.  These sysctls are in addition to checks already made
(that the user must also be able to write to the file via user, group,
or world perms).

kern.hardlink_check_uid		If set the user must own the file to
				be able to create a hardlink, or be root.

kern.hardlink_check_gid		If set the user must either own the file
				or be a member of the same group as the
				file, or be root.

				Setting both flags is equivalent to just
				setting the uid flag.

Taken from FreeBSD with slightly different semantics for hardlink_check_gid.
In DragonFly, if hardlink_check_gid is set, the file can still be hardlinked
when the user is not a member of the file's group, provided the user owns the
file.  Non-group membership is quite common due to group inheritance from the
parent directory when a file or directory is created by the user, and
disallowing the case would make hardlink_check_gid non-useful.

Submitted-by: Matthias Schmidt <schmidtm@mathematik.uni-marburg.de>
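
The policy described in this commit message can be sketched as a small
predicate.  This is an illustration written from the description only, not
the code that was committed: struct cred_sketch, struct file_owner, in_group()
and can_hardlink_sketch() are hypothetical names, and the pre-existing
write-permission check is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified credential and ownership records. */
struct cred_sketch {
    unsigned        uid;
    const unsigned *groups;
    size_t          ngroups;
};

struct file_owner {
    unsigned uid;
    unsigned gid;
};

static bool in_group(const struct cred_sketch *c, unsigned gid)
{
    for (size_t i = 0; i < c->ngroups; i++) {
        if (c->groups[i] == gid)
            return true;
    }
    return false;
}

/* check_uid and check_gid stand in for the two sysctl values. */
static bool can_hardlink_sketch(const struct cred_sketch *c,
                                const struct file_owner *f,
                                bool check_uid, bool check_gid)
{
    assert(c != NULL && f != NULL);
    if (c->uid == 0)
        return true;                    /* root is always allowed */
    if (check_uid && c->uid != f->uid)
        return false;                   /* must own the file */
    if (check_gid && c->uid != f->uid && !in_group(c, f->gid))
        return false;                   /* owner, or member of file's group */
    return true;
}
```

Because the gid test is skipped for the owner, enabling both flags collapses
to the uid rule alone, which matches the note in the table above.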

Revision 1.72: download - view: text, markup, annotated - select for diffs
Sat Sep 17 07:43:00 2005 UTC (8 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.71: preferred, unified
Changes since revision 1.71: +0 -11 lines
Add an argument to vfs_add_vnodeops() to specify VVF_* flags for the vop_ops
structure.  Add a new flag called VVF_SUPPORTS_FSMID to indicate filesystems
which support persistent storage of FSMIDs.  Rework the FSMID code a bit
to reduce overhead.

Use the spare field in the UFS inode structure to implement a persistent
FSMID.  The FSMID is recursively marked in the namecache but not adjusted
until the next getattr() call on the related inode(s), or when the vnode
is reclaimed.

Revision 1.60.2.1: download - view: text, markup, annotated - select for diffs
Thu Sep 15 19:01:34 2005 UTC (8 years, 10 months ago) by dillon
Branches: DragonFly_RELEASE_1_2
CVS tags: DragonFly_RELEASE_1_2_Slip
Diff to: previous 1.60: preferred, unified; next MAIN 1.61: preferred, unified
Changes since revision 1.60: +15 -3 lines
MFC 1.69 - Fix a rename bug when renaming a hardlink over itself by a
different name.

Revision 1.71: download - view: text, markup, annotated - select for diffs
Fri Sep 2 07:16:58 2005 UTC (8 years, 11 months ago) by hsu
Branches: MAIN
Diff to: previous 1.70: preferred, unified
Changes since revision 1.70: +3 -3 lines
Now that the C language has a "void *", use it instead of caddr_t.

Revision 1.70: download - view: text, markup, annotated - select for diffs
Thu Aug 25 18:34:14 2005 UTC (8 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.69: preferred, unified
Changes since revision 1.69: +11 -0 lines
Implement FSMID.  Use one of the spare 64 bit fields in the stat structure
for the FSMID.   The FSMID is a recursively updated field which allows one
to determine whether a subdirectory hierarchy has changed simply by checking
the base directory of the desired hierarchy.  The new field is st_fsmid.

The initial implementation stores the FSMID in the namecache, which means that
the FSMID will indicate a false change if a namecache entry is destroyed and
recreated.  A more deterministic test can be made by holding a file or
directory descriptor open.  However, it should be noted that DragonFly
implements a coherent and hierarchically consistent namecache so simply having
a subdirectory or file open will prevent the namecache records from that point
through to the root from being destroyed.

The FSMID can be used to greatly reduce the directories that must be searched
when synchronizing a filesystem.  The immediate intention is to use it to
provide a more efficient way to resynchronize a mirror (to generate journal
records 'diff'ing the current filesystem against a mirror), to improve
filesystem mirroring utilities, and to provide for an alternative backup
strategy that involves generating a diff set between two filesystems.
Normally such schemes would require the entire filesystem to be scanned, but
with FSMID the number of directories that must be searched can be greatly
reduced.

TODO: It is desirable for the FSMID information to be stored more permanently
in the inode to survive reboots and to not return false hits due to namecache
thrash.

Note that the FSMID facility does not work on an NFS client if the NFS server
or some other client modifies the filesystem.
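
The recursive update described above can be pictured with a toy tree.  This
is a sketch under assumptions, not the namecache implementation: struct
node_sketch, node_modified() and subtree_changed() are hypothetical names, and
a simple global counter stands in for however the kernel generates FSMID
values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a namecache record with a parent link. */
struct node_sketch {
    struct node_sketch *parent;
    uint64_t            fsmid;
};

static uint64_t next_fsmid = 1;

/* Stamp the modified node and every ancestor with a fresh id. */
static void node_modified(struct node_sketch *n)
{
    assert(n != NULL);
    uint64_t id = next_fsmid++;
    for (; n != NULL; n = n->parent)
        n->fsmid = id;
}

/* A synchronizer only needs to compare the id saved for a directory. */
static int subtree_changed(const struct node_sketch *dir, uint64_t saved)
{
    return dir->fsmid != saved;
}
```

A mirroring tool built on this idea would save the id of each directory it
has synchronized and descend only into directories whose id differs, skipping
untouched siblings without scanning them.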

Revision 1.69: download - view: text, markup, annotated - select for diffs
Mon Aug 15 07:26:47 2005 UTC (8 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.68: preferred, unified
Changes since revision 1.68: +15 -3 lines
UFS sometimes reports: 'ufs_rename: fvp == tvp (can't happen)'.  The case
is not supposed to be able to happen, and UFS ignores the rename operation
when it sees it.   This is true in both FreeBSD and DragonFly.

But, in fact, the case CAN happen if you rename a file to another that
happens to be a hardlink to the first.  The rename operation appears to
succeed but winds up being a NOP because UFS incorrectly believes that the
case represents renaming a file to itself when it doesn't.  Both files
remain in existence when the source file should have been removed.

Detect the condition and issue VOP_NREMOVE instead of VOP_NRENAME when
the source and target represent different namespaces but wind up pointing
to the same physical vnode.

Reported-by: Tomaž Borštnar <tomaz.borstnar@amis.net>

Revision 1.68: download - view: text, markup, annotated - select for diffs
Sun Aug 14 18:41:13 2005 UTC (8 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.67: preferred, unified
Changes since revision 1.67: +11 -2 lines
Fix a race in rename when relocking the source namecache entry.  Since we
may have blocked previously it is possible for the namecache entry to become
invalid (not destroyed since we hold a ref, but invalid).  For example, if
the source was removed.  This case only occurs when rename() is racing
against a remove() or another rename that is overwriting the target that
represents our 'from' name.

The race resulted in a NULL pointer dereference.

Reported-by: Tomaž Borštnar <tomaz.borstnar@amis.net>

Revision 1.67: download - view: text, markup, annotated - select for diffs
Tue Aug 9 20:14:16 2005 UTC (8 years, 11 months ago) by joerg
Branches: MAIN
Diff to: previous 1.66: preferred, unified
Changes since revision 1.66: +5 -4 lines
Pass the direction to kern_getdirentries, it will be used by the
emulation layer soon without transferring the data to userland first.

Revision 1.66: download - view: text, markup, annotated - select for diffs
Wed Aug 3 04:59:53 2005 UTC (9 years ago) by hmp
Branches: MAIN
Diff to: previous 1.65: preferred, unified
Changes since revision 1.65: +0 -33 lines
BUF/BIO cleanup 2/99:

Localise buffer queue information into kern/vfs_bio.c, it should not be
messed with outside of the named file.  Convert the QUEUE_* #defines
into enum bufq_type, prefix the names with 'B'.

Move vfs_bufstats() from kern/vfs_syscalls.c into kern/vfs_bio.c since
that's where it should really belong, at least until its use is cleaned up.

Move bufqueues extern from sys/buf.h into kern/vfs_bio.c as it shouldn't
be messed with by anything else.  It was only sitting in sys/buf.h
because of vfs_bufstats().

Note the change to initpbuf() is acceptable since they are a hack anyway,
not to mention that the said function and friends should probably reside
in kern/vfs_bio.c.

Revision 1.65: download - view: text, markup, annotated - select for diffs
Sat Jul 23 23:26:50 2005 UTC (9 years ago) by joerg
Branches: MAIN
Diff to: previous 1.64: preferred, unified
Changes since revision 1.64: +0 -90 lines
Remove partial NetBSD support. It's pointless to have an emulation of
three syscalls (stat, lstat and fstat), the rest was never finished.

Discussed-with: dillon

Revision 1.64: download - view: text, markup, annotated - select for diffs
Wed Jun 22 01:33:21 2005 UTC (9 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.63: preferred, unified
Changes since revision 1.63: +9 -9 lines
File descriptor cleanup stage 2, remove the separate arrays for file
pointers, fileflags, and allocation counts and replace the mess with a
single structural array.  Also revamp the code that checks whether the
file descriptor array is built-in or allocated.

Note that the removed malloc's were doing something weird, allocating
'nf * OFILESIZE + 1' bytes instead of 'nf * OFILESIZE' bytes.  I could
not find any reason at all why it was doing that.  It's gone now anyway.

Revision 1.63: download - view: text, markup, annotated - select for diffs
Tue Jun 21 23:58:53 2005 UTC (9 years, 1 month ago) by hsu
Branches: MAIN
Diff to: previous 1.62: preferred, unified
Changes since revision 1.62: +3 -3 lines
Replace the linear search in file descriptor allocation with an O(log N)
algorithm based on full in-place binary search trees augmented with
subtree free file descriptor counts.

Idea from:	Solaris
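
The idea of finding the lowest free descriptor in O(log N) using subtree free
counts can be sketched as follows.  Note that this sketch uses a plain
heap-style implicit tree kept in a separate array, which is a simplification
and not the in-place layout the commit refers to; NFD, freecnt[],
fdtree_init(), fdtree_alloc() and fdtree_free() are all hypothetical names.

```c
#include <assert.h>

#define NFD 8   /* number of descriptors; must be a power of two here */

/* freecnt[1] is the root; node i has children 2i and 2i+1; the leaves
 * NFD..2*NFD-1 hold 1 when the matching descriptor is free. */
static int freecnt[2 * NFD];

static void fdtree_init(void)
{
    for (int i = NFD; i < 2 * NFD; i++)
        freecnt[i] = 1;
    for (int i = NFD - 1; i >= 1; i--)
        freecnt[i] = freecnt[2 * i] + freecnt[2 * i + 1];
}

/* Return the lowest free descriptor, or -1 if none is left. */
static int fdtree_alloc(void)
{
    if (freecnt[1] == 0)
        return -1;
    int i = 1;
    while (i < NFD) {
        freecnt[i]--;                       /* one fewer free below here */
        /* Prefer the left subtree so the lowest number wins. */
        i = (freecnt[2 * i] > 0) ? 2 * i : 2 * i + 1;
    }
    freecnt[i] = 0;
    return i - NFD;
}

/* Mark a descriptor free again and fix the counts on the path to the root. */
static void fdtree_free(int fd)
{
    assert(fd >= 0 && fd < NFD && freecnt[NFD + fd] == 0);
    for (int i = NFD + fd; i >= 1; i /= 2)
        freecnt[i]++;
}
```

Both operations touch one node per tree level, so the cost grows with the
logarithm of the table size rather than with the number of open descriptors.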

Revision 1.62: download - view: text, markup, annotated - select for diffs
Mon Jun 6 15:02:28 2005 UTC (9 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.61: preferred, unified
Changes since revision 1.61: +3 -3 lines
Remove spl*() calls from kern, replacing them with critical sections.
Change the meaning of safepri from a cpl mask to a thread priority.
Make a minor adjustment to tests within one of the buffer cache's
critical sections.

Revision 1.61: download - view: text, markup, annotated - select for diffs
Tue Apr 19 17:54:42 2005 UTC (9 years, 3 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Stable
Diff to: previous 1.60: preferred, unified
Changes since revision 1.60: +122 -109 lines
Abstract out the routines which manipulate the mountlist.

Introduce an MP-safe mountlist scanning function.  This function keeps track
of scans which are in-progress and properly handles ripouts that occur during
the callback by advancing the matching pointers being tracked.  The callback
can safely block without confusing the scan.

This algorithm has already been successfully used for the buffer cache and
will soon be used for the vnode lists hanging off the mount point.
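
The scan technique described above, where removals advance the pointers held
by in-progress scans, can be sketched in single-threaded form.  This is only
an illustration of the idea: all names below (mnt_sketch, scan_info,
mnt_insert_tail(), mnt_remove(), mnt_scan(), rip_cb()) are hypothetical, and
the locking that makes the real function MP-safe is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct mnt_sketch { struct mnt_sketch *next; int id; };

/* One record per scan in progress; cursor is the next element to visit. */
struct scan_info { struct scan_info *next; struct mnt_sketch *cursor; };

static struct mnt_sketch *mnt_head;
static struct scan_info  *active_scans;

static void mnt_insert_tail(struct mnt_sketch *m)
{
    struct mnt_sketch **pp = &mnt_head;
    while (*pp != NULL)
        pp = &(*pp)->next;
    m->next = NULL;
    *pp = m;
}

/* Rip an element out, first advancing any scan that was about to visit it. */
static void mnt_remove(struct mnt_sketch *m)
{
    for (struct scan_info *s = active_scans; s != NULL; s = s->next) {
        if (s->cursor == m)
            s->cursor = m->next;
    }
    struct mnt_sketch **pp = &mnt_head;
    while (*pp != NULL && *pp != m)
        pp = &(*pp)->next;
    assert(*pp == m);
    *pp = m->next;
    m->next = NULL;
}

/* Call cb on every element; cb may remove any element, including its own. */
static int mnt_scan(int (*cb)(struct mnt_sketch *, void *), void *arg)
{
    struct scan_info info = { active_scans, mnt_head };
    int count = 0;

    active_scans = &info;                 /* register this scan */
    while (info.cursor != NULL) {
        struct mnt_sketch *m = info.cursor;
        info.cursor = m->next;            /* advance before calling out */
        count += cb(m, arg);
    }
    struct scan_info **pp = &active_scans;
    while (*pp != &info)
        pp = &(*pp)->next;
    *pp = info.next;                      /* unregister */
    return count;
}

/* Example callback: records visits and rips out a victim at trigger_id. */
struct rip_arg { int trigger_id; struct mnt_sketch *victim;
                 int visited[8]; int nvisited; };

static int rip_cb(struct mnt_sketch *m, void *arg)
{
    struct rip_arg *ra = arg;
    ra->visited[ra->nvisited++] = m->id;
    if (m->id == ra->trigger_id && ra->victim != NULL) {
        mnt_remove(ra->victim);
        ra->victim = NULL;
    }
    return 1;
}
```

With three elements 1, 2, 3 and a callback that removes element 2 while
visiting element 1, the scan continues with element 3 instead of following a
stale pointer.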

Revision 1.60: download - view: text, markup, annotated - select for diffs
Tue Mar 29 00:35:55 2005 UTC (9 years, 4 months ago) by drhodus
Branches: MAIN
Branch point for: DragonFly_RELEASE_1_2
Diff to: previous 1.59: preferred, unified
Changes since revision 1.59: +44 -44 lines
Remove some uses of the SCARG macro.

Revision 1.59: download - view: text, markup, annotated - select for diffs
Tue Mar 22 22:13:28 2005 UTC (9 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.58: preferred, unified
Changes since revision 1.58: +8 -0 lines
Start working on the full-duplex journaling feature, where the target can
acknowledge the sequence space to prevent information loss if a journaling
stream is interrupted.  Implement a skeleton for the receiver thread.

Delete journals associated with a mount point that is undergoing an unmount.
(reported-by: Fabian <fabian.duelli@bluewin.ch>)

Revision 1.58: download - view: text, markup, annotated - select for diffs
Wed Feb 2 21:34:18 2005 UTC (9 years, 5 months ago) by joerg
Branches: MAIN
Diff to: previous 1.57: preferred, unified
Changes since revision 1.57: +90 -0 lines
Don't use the statfs field f_mntonname in filesystems. For the userland
export code, it can be synthesized from mnt_ncp.
For debugging code, use f_mntfromname, it should be enough to find the
culprit. The vfs_unmountall doesn't use code_fullpath to avoid problems
with resource allocation and to make it more likely that a call from ddb
succeeds.
Change getfsstat and fhstatfs to not show directories outside a chroot
path, with the exception of the filesystem counting the chroot root itself.

Revision 1.57: download - view: text, markup, annotated - select for diffs
Tue Feb 1 21:52:11 2005 UTC (9 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.56: preferred, unified
Changes since revision 1.56: +1 -1 lines
Fix bug in last commit that broke 'df'.  'sfsp' is now a structural pointer
so we increment it by one, not by sizeof(*sp).
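
The class of bug fixed here is ordinary C pointer arithmetic: adding to a
structure pointer already scales by the structure size.  The sketch below
shows the corrected form; struct statfs_sketch and fill_entries() are
hypothetical, and the assumption that the earlier code stepped a byte-typed
pointer by sizeof() is inferred from the commit message, not from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct statfs. */
struct statfs_sketch {
    long f_blocks;
    char f_mntonname[80];
};

/* Fill up to max consecutive entries and return how many were written. */
static size_t fill_entries(struct statfs_sketch *sfsp, size_t max)
{
    assert(sfsp != NULL);
    size_t n = 0;
    while (n < max) {
        sfsp->f_blocks = (long)n;
        /* Correct for a structure pointer: moves sizeof(*sfsp) bytes.
         * Writing "sfsp += sizeof(*sfsp)" here would instead skip
         * sizeof(*sfsp) whole elements and run off the buffer. */
        sfsp += 1;
        n++;
    }
    return n;
}
```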

Revision 1.56: download - view: text, markup, annotated - select for diffs
Tue Feb 1 13:55:49 2005 UTC (9 years, 6 months ago) by joerg
Branches: MAIN
Diff to: previous 1.55: preferred, unified
Changes since revision 1.55: +5 -6 lines
Remove SCARG junk.

Suggested-by: drhodus

Revision 1.55: download - view: text, markup, annotated - select for diffs
Mon Jan 31 17:20:48 2005 UTC (9 years, 6 months ago) by joerg
Branches: MAIN
Diff to: previous 1.54: preferred, unified
Changes since revision 1.54: +10 -10 lines
Uncomment the entry for kern_chroot in kern_syscall.h and change the
implementation to take the namecache entry directly.

Revision 1.54: download - view: text, markup, annotated - select for diffs
Thu Jan 27 19:46:48 2005 UTC (9 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.53: preferred, unified
Changes since revision 1.53: +1 -0 lines
Mount points use a special empty namecache entry to transition from one
filesystem to another.  It is possible for a stale entry to remain intact
from a prior mount so be sure the new entry is set to an unresolved state
so it is properly re-resolved on later access.

Revision 1.53: download - view: text, markup, annotated - select for diffs
Sun Jan 9 03:04:51 2005 UTC (9 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.52: preferred, unified
Changes since revision 1.52: +2 -2 lines
Add support for retrieving the journal status via mountctl.  Increase some
of the buffer limits.

Revision 1.52: download - view: text, markup, annotated - select for diffs
Wed Dec 29 02:40:02 2004 UTC (9 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.51: preferred, unified
Changes since revision 1.51: +23 -5 lines
Journaling layer work.

* Adjust the new mountctl syscall to make the passed file descriptor an
  explicit argument rather than storing the fd in the control structure.
  Convert the fd to a file pointer to make kern_mountctl() callable from
  a pure thread.

* Get rid of vop_stdmountctl and just have the VOP default ops call
  journal_mountctl(), which makes things less confusing.

* Get more of the journaling infrastructure working.  Basic installation
  and removal of the journaling structure and the creation and destruction
  of the worker thread and stream file pointer now works (with lots of XXX's).

* Add a journaling vector for VOP_NMKDIR to test the journaling VOP ops shim.

Revision 1.51: download - view: text, markup, annotated - select for diffs
Tue Dec 28 04:39:59 2004 UTC (9 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.50: preferred, unified
Changes since revision 1.50: +1 -1 lines
Fix a range check bug in lseek()

Revision 1.50: download - view: text, markup, annotated - select for diffs
Fri Dec 24 05:00:17 2004 UTC (9 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.49: preferred, unified
Changes since revision 1.49: +113 -0 lines
Journaling layer work.  Add a new system call, mountctl, which will be used
to manage the journaling layer.  Add a new VOP, VOP_MOUNTCTL, which will
be used to pass mountctl operations down into the VFS layer.

Revision 1.49: download - view: text, markup, annotated - select for diffs
Fri Dec 17 00:18:07 2004 UTC (9 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.48: preferred, unified
Changes since revision 1.48: +6 -2 lines
VFS messaging/interfacing work stage 10/99:

Start adding the journaling, range locking, and (very slightly) cache
coherency infrastructure.  Continue cleaning up the VOP operations vector.

Expand on past commits that gave each mount structure its own set of VOP
operations vectors by adding additional vector sets for journaling or
cache coherency operations.  Remove the vv_jops and vv_cops fields
from the vnode operations vector in favor of placing those vop_ops directly
in the mount structure.  Reorganize the VOP calls as a double-indirect
and add a field to the mount structure which represents the current
vnode operations set (which will change when e.g. journaling is turned on
or off).  This creates the infrastructure necessary to allow us to stack
a generic journaling implementation on top of a filesystem.

Introduce a hard range-locking API for vnodes.   This API will be used by
high level system/vfs calls in order to handle atomicity guarantees.  It is
a prerequisite for: (1) being able to break I/O's up into smaller pieces
for the vm_page list/direct-to-DMA-without-mapping goal, (2) to support
the parallel write operations on a vnode goal, (3) to support the clustered
(remote) cache coherency goal, and (4) to support massive parallelism in
dispatching operations for the upcoming threaded VFS work.

This commit represents only infrastructure and skeleton/API work.

Revision 1.48: download - view: text, markup, annotated - select for diffs
Wed Nov 24 08:37:16 2004 UTC (9 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.47: preferred, unified
Changes since revision 1.47: +29 -0 lines
Clean up some ESTALE issues on the client when files are replaced on
an NFS server.  Even though the attribute cache has expired DragonFly
still maintains a vnode in the namecache.  If a file is replaced on the
server the vnode's file handle will become invalid.

Force re-resolution of the namecache entry rather than replacing the vnode's
file handle, so programs with open descriptors to the dead file continue to
get a proper error return while lookups succeed in finding the new version
of the file.

In this patch ESTALE is checked in strategic places:  stat(), access(), and
open().  It is an imperfect solution at the moment but it seems to work pretty
well.  This should bring NFS client operations back up to FreeBSD-4.x
standards.

Revision 1.47: download - view: text, markup, annotated - select for diffs
Tue Nov 23 04:03:26 2004 UTC (9 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.46: preferred, unified
Changes since revision 1.46: +44 -24 lines
Fix a bug in chown, chmod, and chflags.  When the setfflags(), setffown(),
and setfmode() API was cleaned up to not remove vrefs maintained by the
caller it resulted in an incorrect vref+vn_lock combination which fails
to clear the VINACTIVE bit on the vnode.  vget() clears this bit as part of
its work.  This prevented the filesystem from synchronizing the changes
out to the inode unless other modifications were made to the file as well,
which resulted in weird errors such as ./MAKEDEV all creating /dev/null with
perms 600 (the chmod it does afterwards doesn't always take effect), and
other things.

Additional thanks to walt for providing the information that led to the
diagnosis.

Reported-by: "Simon 'corecode' Schubert" <corecode@fs.ei.tum.de>,
             walt <wa1ter@myrealbox.com>,
	     Andreas Hauser <andy@splashground.de>

Revision 1.46: download - view: text, markup, annotated - select for diffs
Fri Nov 12 00:09:24 2004 UTC (9 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.45: preferred, unified
Changes since revision 1.45: +742 -774 lines
VFS messaging/interfacing work stage 9/99: VFS 'NEW' API WORK.

NOTE: unionfs and nullfs are temporarily broken by this commit.

* Remove the old namecache API.  vfs_cache_lookup(), cache_lookup(),
  cache_enter(), namei() and lookup() are all gone.  VOP_LOOKUP() and
  VOP_CACHEDLOOKUP() have been collapsed into a single non-caching
  VOP_LOOKUP().

* Complete the new VFS CACHE (namecache) API.  The new API is able to
  supply topological guarantees and is able to reserve namespaces,
  including negative cache spaces (whether the target name exists or not),
  which the new API uses to reserve namespace for things like NRENAME
  and NCREATE (and others).

* Complete the new namecache API.  VOP_NRESOLVE, NLOOKUPDOTDOT, NCREATE,
  NMKDIR, NMKNOD, NLINK, NSYMLINK, NWHITEOUT, NRENAME, NRMDIR, NREMOVE.
  These new calls take (typically locked) namecache pointers rather than
  combinations of directory vnodes, file vnodes, and name components.  The
  new calls are *MUCH* simpler in concept and implementation.  For example,
  VOP_RENAME() has 8 arguments while VOP_NRENAME() has only 3 arguments.

  The new namecache API uses the namecache to lock namespaces without having
  to lock the underlying vnodes.  For example, this allows the kernel
  to reserve the target name of a create function trivially.  Namecache
  records are maintained BY THE KERNEL for both positive and negative hits.

  Generally speaking, the kernel layer is now responsible for resolving
  path elements.  NRESOLVE is called when an unresolved namecache record
  needs to be resolved.  Unlike the old VOP_LOOKUP, NRESOLVE is simply
  responsible for associating a vnode to a namecache record (positive hit)
  or telling the system that it's a negative hit, and not responsible for
  handling symlinks or other special cases or doing any of the other
  path lookup work, much unlike the old VOP_LOOKUP.

  It should be particularly noted that the new namecache topology does not
  allow disconnected namecache records.  In rare cases where a vnode must
  be converted to a namecache pointer for new API operation via a file handle
  (i.e. NFS), the cache_fromdvp() function is provided and a new API VOP,
  VOP_NLOOKUPDOTDOT() is provided to allow the namecache to resolve the
  topology leading up to the requested vnode.  These and other topological
  guarantees greatly reduce the complexity of the new namecache API.

  The new namei() is called nlookup().  This function uses a combination
  of cache_n*() calls, VOP_NRESOLVE(), and standard VOP calls to resolve the
  supplied path, deal with symlinks, and so forth, in a nice small compact
  compartmentalized procedure.

* The old VFS code is no longer responsible for maintaining namecache records,
  a function which was mostly ad hoc cache_purge()s occurring before the VFS
  actually knows whether an operation will succeed or not.

  The new VFS code is typically responsible for adjusting the state of
  locked namecache records passed into it.  For example, if NCREATE succeeds
  it must call cache_setvp() to associate the passed namecache record with
  the vnode representing the successfully created file.  The new requirements
  are much less complex than the old requirements.

* Most VFSs still implement the old API calls, albeit somewhat modified
  and in particular the VOP_LOOKUP function is now *MUCH* simpler.  However,
  the kernel now uses the new API calls almost exclusively and relies on
  compatibility code installed in the default ops (vop_compat_*()) to
  convert the new calls to the old calls.

* All kernel system calls and related support functions which used to do
  complex and confusing namei() operations now do far less complex and
  far less confusing nlookup() operations.

* SPECOPS shortcutting has been implemented.  User reads and writes now go
  directly to supporting functions which talk to the device via fileops
  rather than having to be routed through VOP_READ or VOP_WRITE, saving
  significant overhead.  Note, however, that these only really affect
  /dev/null and /dev/zero.

  Implementing this was fairly easy: we now simply pass an optional
  struct file pointer to VOP_OPEN() and let spec_open() handle the
  override.

SPECIAL NOTES: It should be noted that we must still lock a directory vnode
LK_EXCLUSIVE before issuing a VOP_LOOKUP(), even for simple lookups, because
a number of VFS's (including UFS) store active directory scanning information
in the directory vnode.  The legacy NAMEI_LOOKUP cases can be changed to
use LK_SHARED once these VFS cases are fixed.  In particular, we are now
organized well enough to actually be able to do record locking within a
directory for handling NCREATE, NDELETE, and NRENAME situations, but it hasn't
been done yet.

Many thanks to all of the testers and in particular David Rhodus for
finding a large number of panics and other issues.
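
The namespace reservation and the cache_setvp() requirement described above
can be sketched in userland as follows.  This is a toy model, not kernel code:
all toy_* names are hypothetical, the "filesystem" is empty until something is
created, and a lock conflict returns NULL where the kernel would block.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 8

struct toy_ncp {
    char name[32];
    bool used;
    bool locked;      /* the namespace lock */
    bool unresolved;  /* models an unresolved namecache record */
    int  vp;          /* 0 = negative hit, otherwise a fake vnode id */
};

static struct toy_ncp toy_cache[TOY_MAX];
static int toy_next_vp = 1;

/* Find or allocate the record for name and return it locked.
 * Returns NULL if the name is already locked or the table is full. */
static struct toy_ncp *toy_lookup_locked(const char *name)
{
    struct toy_ncp *free_slot = NULL;

    for (int i = 0; i < TOY_MAX; i++) {
        struct toy_ncp *n = &toy_cache[i];
        if (n->used && strcmp(n->name, name) == 0) {
            if (n->locked)
                return NULL;
            n->locked = true;
            return n;
        }
        if (!n->used && free_slot == NULL)
            free_slot = n;
    }
    if (free_slot == NULL)
        return NULL;
    strncpy(free_slot->name, name, sizeof(free_slot->name) - 1);
    free_slot->name[sizeof(free_slot->name) - 1] = '\0';
    free_slot->used = true;
    free_slot->locked = true;
    free_slot->unresolved = true;
    free_slot->vp = 0;
    return free_slot;
}

/* Models cache_setvp(): attach a vnode (or 0 for a negative hit). */
static void toy_setvp(struct toy_ncp *ncp, int vp)
{
    assert(ncp->locked);  /* the record must be locked on call */
    ncp->vp = vp;
    ncp->unresolved = false;
}

/* Models an NCREATE on a locked record. */
static int toy_ncreate(struct toy_ncp *ncp)
{
    if (ncp->unresolved)
        toy_setvp(ncp, 0);  /* nothing on the toy disk: negative hit */
    if (ncp->vp != 0)
        return EEXIST;
    toy_setvp(ncp, toy_next_vp++);  /* success: record becomes positive */
    return 0;
}

static void toy_unlock(struct toy_ncp *ncp) { ncp->locked = false; }
```

The point of the sketch is that the name is reserved by locking the record
itself, before any vnode exists, and that a successful create turns the same
record into a positive hit.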

Revision 1.45: download - view: text, markup, annotated - select for diffs
Tue Oct 12 19:20:46 2004 UTC (9 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.44: preferred, unified
Changes since revision 1.44: +45 -54 lines
VFS messaging/interfacing work stage 8/99: Major reworking of the vnode
interlock and other miscellaneous things.  This patch also fixes FS
corruption due to prior vfs work in head.  In particular, prior to this
patch the namecache locking could introduce blocking conditions that
confuse the old vnode deactivation and reclamation code paths.  With
this patch there appear to be no serious problems even after two days
of continuous testing.

* VX lock all VOP_CLOSE operations.
* Fix two NFS issues.  There was an incorrect assertion (found by
  David Rhodus), and the nfs_rename() code was not properly
  purging the target file from the cache, resulting in Stale file
  handle errors during, e.g. a buildworld with an NFS-mounted /usr/obj.
* Fix a TTY session issue.  Programs which open("/dev/tty", ...) and
  then run the TIOCNOTTY ioctl were causing the system to lose track
  of the open count, preventing the tty from properly detaching.
  This is actually a very old BSD bug, but it came out of the woodwork
  in DragonFly because I am now attempting to track device opens
  explicitly.
* Gets rid of the vnode interlock.  The lockmgr interlock remains.
* Introduced VX locks, which are mandatory vp->v_lock based locks.
* Rewrites the locking semantics for deactivation and reclamation.
  (A ref'd VX lock'd vnode is now required for vgone(), VOP_INACTIVE,
  and VOP_RECLAIM).  New guarantees emplaced with regard to vnode
  ripouts.
* Recodes the mountlist scanning routines to close timing races.
* Recodes getnewvnode to close timing races (it now returns a
  VX locked and refd vnode rather than a refd but unlocked vnode).
* Recodes VOP_REVOKE- a locked vnode is now mandatory.
* Recodes all VFS inode hash routines to close timing holes.
* Removes cache_leaf_test() - vnodes representing intermediate
  directories are now held so the leaf test should no longer be
  necessary.
* Splits the over-large vfs_subr.c into three additional source
  files, broken down by major function (locking, mount related,
  filesystem syncer).

* Changes splvm() protection to a critical-section in a number of
  places (bleedover from another patch set which is also about to be
  committed).

Known issues not yet resolved:

* Possible vnode/namecache deadlocks.
* While most filesystems now use vp->v_lock, I haven't done a final
  pass to make vp->v_lock mandatory and to clean up the few remaining
  inode based locks (nwfs I think and other obscure filesystems).
* NullFS gets confused when you hit a mount point in the underlying
  filesystem.
* Only UFS and NFS have been well tested.
* NFS is not properly timing out namecache entries, causing changes made
  on the server to not be properly detected on the client if the client
  already has a negative-cache hit for the filename in question.

Testing-by: David Rhodus <sdrhodus@gmail.com>,
	    Peter Kadau <peter.kadau@tuebingen.mpg.de>,
	    walt <wa1ter@myrealbox.com>,
	    others

Revision 1.44: download - view: text, markup, annotated - select for diffs
Thu Oct 7 04:20:26 2004 UTC (9 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.43: preferred, unified
Changes since revision 1.43: +8 -0 lines
VFS messaging/interfacing work stage 7f/99: More firming up of stage 7.

unlink, rmdir, rename, and whiteout removal functions use NAMEI_DELETE
namei() lookups.  With the old API this zap'd the namecache entry before
the system actually runs the operation.  If the operation fails we can be
left with a broken namecache hierarchy which is not allowed in the new API.

Change old API cache_lookup() semantics to *NOT* zap the namecache entry and
add explicit zaps after calls to VOP_UNLINK(), VOP_RMDIR(), etc. to replace
the functionality.

rename() attempts to issue a NAMEI_RENAME lookup which zaps the target, but
the same problem occurs if the target is preexisting and being overwritten.
A similar solution is employed for renames.
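
A toy comparison of the two orderings, as a sketch only.  The toy_* names are
hypothetical, and the model assumes the explicit zap is issued only when the
operation succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_entry {
    bool cached;   /* namecache entry present */
    bool on_disk;  /* file still exists in the filesystem */
    bool busy;     /* makes the toy unlink fail */
};

/* Models VOP_UNLINK(): fails with EBUSY when the entry is busy. */
static int toy_vop_unlink(struct toy_entry *e)
{
    if (e->busy)
        return EBUSY;
    e->on_disk = false;
    return 0;
}

/* Models zapping the namecache entry. */
static void toy_zap(struct toy_entry *e)
{
    assert(e->cached);  /* only a cached entry can be zapped */
    e->cached = false;
}

/* Old ordering: the lookup zaps the entry before the operation runs. */
static int toy_unlink_old(struct toy_entry *e)
{
    toy_zap(e);
    return toy_vop_unlink(e);
}

/* New ordering: run the operation first, zap only on success. */
static int toy_unlink_new(struct toy_entry *e)
{
    int error = toy_vop_unlink(e);

    if (error == 0)
        toy_zap(e);
    return error;
}
```

With the old ordering a failed unlink leaves a file on disk with no namecache
entry; with the new ordering the entry survives the failure.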

Revision 1.43: download - view: text, markup, annotated - select for diffs
Tue Oct 5 07:57:40 2004 UTC (9 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.42: preferred, unified
Changes since revision 1.42: +7 -2 lines
VFS messaging/interfacing work stage 7e/99: More firming up of stage 7.

Fix the linux emulation code for [l]stat(), it was not properly
disposing of the nlookupdata structure.

Fix chroot()'s use of the new api, it was horribly broken.

Cleanup cache_alloc().  Rewrite __getcwd() and vn_fullpath() to use newapi
namecache data.  Cleanup nlookup().  Fix bugs in nlookup() related to
stacked mount points.  Fix a bug related to VFS_ROOT() mount errors.

Linux-bugs-reported-by: Bartek Stalewski, walt

Revision 1.42: download - view: text, markup, annotated - select for diffs
Tue Oct 5 03:24:09 2004 UTC (9 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.41: preferred, unified
Changes since revision 1.41: +4 -1 lines
VFS messaging/interfacing work stage 7d/99: More firming up of stage 7.

Additional work to deal with old-api/new-api issues.  Cut more stuff
out of the old-api's cache_enter() routine to deal with deadlocks, at
the cost of some performance loss (temporary until the VFS's start using
the new APIs).  Change UFS and NFS to not purge whole directories in
*_rename() and *_rmdir().

Add some minor breakage to the API which will not be fixed until the VFS's
get new rename implementations - renaming a directory in which a process
has chdir'd will create problems for that process.  This doesn't happen
normally anyway so this temporary breakage should not cause any significant
problems.

Bug-reports-by: walt, Sascha Wildner, others

Revision 1.41: download - view: text, markup, annotated - select for diffs
Sat Oct 2 03:18:26 2004 UTC (9 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.40: preferred, unified
Changes since revision 1.40: +1 -0 lines
VFS messaging/interfacing work stage 7b/99: More firming up of stage 7.

(1) Enhance cache_resolve() to go up the directory chain as far as necessary
    to resolve the chain.  Previously I wimped out and returned an error.

(2) Be sure not to use the parent of a mount point to obtain the vnode
    operations vector for a child of a mount point (which resides on a
    different filesystem!).

Generally speaking the namecache directory chain should contain resolved
vnodes due to the fact that the vnode associated with a namecache entry is
held if any children exist, preventing the vnode from being recycled.
However, the NFS client code as originally written wimps out and does
wholesale namecache flushing of directories when it isn't sure about the
state of things (which is quite often, especially when you are rm'ing
files), and this breaks that assumption and causes some intermediate NFS
directory nodes to revert back into an 'unresolved' state.  This will
eventually be fixed, but not right now.

Add a nc_mount pointer to the namecache structure.  For the moment this is
only used to get at the mount point associated with a NCF_MOUNTPT namecache
node (whether resolved or unresolved), making it easier for us to resolve
the vnode.  But eventually it will be used as the basis for obtaining the
v_ops for (new style) VOP calls on an unresolved namecache node, saving us
a few indirections so I don't consider it a hack.

Bugs-and-cores-by: drhodus

Revision 1.40: download - view: text, markup, annotated - select for diffs
Thu Sep 30 18:59:48 2004 UTC (9 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.39: preferred, unified
Changes since revision 1.39: +231 -97 lines
VFS messaging/interfacing work stage 7/99.  BEGIN DESTABILIZATION!

Implement the infrastructure required to allow us to begin switching to the
new nlookup() VFS API.

	filedesc->fd_ncdir, fd_nrdir, fd_njdir

	    File descriptors (associated with processes) now record the
	    namecache pointer related to the current directory, root directory,
	    and jail directory, in addition to the vnode pointers.  These
	    pointers are used as the basis for the new path lookup code
	    (nlookup() and friends).

	file->f_ncp

	    File pointers may now have a referenced+unlocked namecache
	    pointer associated with them.  All fp's representing directories
	    have this attached.  This allows fchdir() to properly record
	    the ncp in fdp->fd_ncdir and friends.

	mount->mnt_ncp

	    The namecache topology for crossing a mount point works as
	    follows: when looking up a path element which is a mount point,
	    cache_nlookup() will locate the ncp for the vnode-under the
	    mount point.  mount->mnt_ncp represents the root of the mount,
	    that is the vnode-over.  nlookup() detects the mount point and
	    accesses mount->mnt_ncp to skip past the vnode-under.  When going
	    backwards (..), nlookup() detects the case and skips backwards.

	    The ncp linkages are: ncp->ncp->ncp[vnode_under]->ncp[vnode_over].
	    That is, when going forwards or backwards nlookup must explicitly
	    skip over the double-ncp when crossing a mount point.  This allows
	    us to keep the namecache topology intact across mount points.
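
The double-ncp skip can be sketched with a small userland tree.  The structure
layout and all toy_* names below are hypothetical; only the
ncp[vnode_under]->ncp[vnode_over] linkage is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_mount;

struct toy_ncp {
    const char       *name;
    struct toy_ncp   *parent;
    struct toy_ncp   *children[4];
    struct toy_mount *covered_by;  /* set on the vnode-under ncp */
    struct toy_mount *mount_root;  /* set on the vnode-over ncp */
};

struct toy_mount {
    struct toy_ncp *mnt_ncp;       /* root of the mount (vnode-over) */
};

/* Forward step: look up name in dir, skipping past a covered ncp. */
static struct toy_ncp *toy_step_forward(struct toy_ncp *dir, const char *name)
{
    assert(dir != NULL);
    for (size_t i = 0; i < 4; i++) {
        struct toy_ncp *c = dir->children[i];
        if (c != NULL && strcmp(c->name, name) == 0) {
            if (c->covered_by != NULL)
                return c->covered_by->mnt_ncp;  /* under -> over */
            return c;
        }
    }
    return NULL;
}

/* Backward step (".."): skip the vnode-under when leaving a mount root. */
static struct toy_ncp *toy_step_back(struct toy_ncp *cur)
{
    assert(cur != NULL);
    if (cur->mount_root != NULL && cur->parent != NULL)
        cur = cur->parent;                      /* over -> under */
    return cur->parent != NULL ? cur->parent : cur;
}
```

The vnode-over ncp stays linked under the vnode-under ncp, so the chain of
parents is never broken; only the walker has to know about the extra hop.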

NEW CACHE level API functions:

	cache_get()	Reference and lock a namecache entry
	cache_put()	Dereference and unlock a namecache entry
	cache_lock()	lock an already-referenced namecache entry
	cache_unlock()	unlock a locked namecache entry

	    NOTE: namecache locks are exclusive and recursive.  These are
	    the 'namespace' locks that we will be using to guarantee namespace
	    operations such as in a CREATE, RENAME, or REMOVE.
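
"Exclusive and recursive" can be modelled in a few lines.  This is a sketch
with hypothetical names; thread identity is a plain integer and a conflicting
request returns false where the kernel would block.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_nclock {
    int owner;  /* 0 = unowned, otherwise a fake thread id */
    int count;  /* recursion depth held by owner */
};

/* Acquire: succeeds if unowned or already held by the same thread. */
static bool toy_cache_lock(struct toy_nclock *l, int thread)
{
    if (l->count > 0 && l->owner != thread)
        return false;  /* exclusive: another thread would have to wait */
    l->owner = thread;
    l->count++;        /* recursive: the owner may nest acquisitions */
    return true;
}

/* Release one level; the lock frees up when the depth reaches zero. */
static void toy_cache_unlock(struct toy_nclock *l, int thread)
{
    assert(l->count > 0 && l->owner == thread);
    if (--l->count == 0)
        l->owner = 0;
}
```

A second thread is refused until the owner has released every nested
acquisition.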

	vfs_cache_setroot() 	Set the new system-wide root directory
	cache_allocroot()   	System bootstrap helper function to allocate
			    	 the root namecache node.

	cache_resolve()		Resolve a NCF_UNRESOLVED namecache node.  The
				namecache node should be locked on call.

	cache_setvp()		(resolver) associate a VP or create a negative
				cache entry representation for a namecache
				pointer and clear NCF_UNRESOLVED.  The
				namecache node should be locked on call.

	cache_setunresolved()	Revert a resolved namecache entry back to an
				unresolved state, disassociating any vnode
				but leaving the topology intact.  The
				namecache node should be locked on call.

	cache_vget()		Obtain the locked+refd vnode related to
				a namecache entry, resolving the entry if
				necessary.  Return ENOENT if the entry
				represents a negative cache hit.

	cache_vref()		Obtain a refd (not locked) vnode related to
				a namecache entry, as above.

	cache_nlookup()		The new namecache lookup routine.  This routine
				does a lookup and allocates a new namecache
				node (into an unresolved state) if necessary.
				Returns a namecache record whether or not
				the item can be found and whether or not it
				represents a positive or negative hit.

	cache_lookup()		OLD API CODE DEPRECATED, but must be maintained
				until everything has been converted over.
	cache_enter()		OLD API CODE DEPRECATED, but must be maintained
				until everything has been converted over.

NEW default VOPs

	vop_noresolve()		Implements a namecache resolver for VFSs
				which are still using the old VOP_LOOKUP/
				VOP_CACHEDLOOKUP API (which is all of them
				still).

	VOP_LOOKUP		OLD API CODE DEPRECATED, but must be maintained
				until everything has been converted over.
	VOP_CACHEDLOOKUP	OLD API CODE DEPRECATED, but must be maintained
				until everything has been converted over.

NEW PATHNAME LOOKUP CODE

	nlookup_init()		Similar to NDINIT, initialize a nlookupdata
				structure for nlookup() and nlookup_done().

	nlookup()		Lookup a path.  Unlike the old namei/lookup
				code the new lookup code does not do any
				fancy pre-disposition of the cache for
				create/delete, it simply looks up the requested
				path and returns the appropriate locked
				namecache pointer.  The caller can obtain the
				vnode and directory vnode, as applicable, from
				the one namecache structure that is returned.

				Access checks are done on directories leading
				up to the result but not done on the returned
				namecache node.

	nlookup_done()		Mandatory routine to cleanup a nlookupdata
				structure after it has been initialized and
				all operations have been completed on it.

	nlookup_simple()	(in progress) all-in-one wrapped new lookup.

	nlookup_mp()		helper call for resolving a mount point's
				glue NCP.  hackish, will be cleaned up later.

	nreadsymlink()		helper call to resolve a symlink.  Note that
				the namecache does not yet cache symlink data
				but the intention is to eventually do so to
				avoid having to do VFS ops to get the data.

	naccess()		Perform access checks on a namecache node
				given a mode and cred.

	naccess_va()		Perform access checks on a vattr given a
				mode and cred.
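
The init/lookup/done discipline can be sketched in userland.  The toy_*
functions below are hypothetical stand-ins with simplified signatures, not the
kernel routines; the sketch assumes the cleanup call is safe on every path,
including after a failed init.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_nlookupdata {
    const char *path;
    bool        active;  /* resources are held and must be released */
    int         vp;      /* fake vnode id of the result */
};

static int toy_live_lookups;  /* counts structures not yet cleaned up */

/* Stand-in for nlookup_init(): always leaves nd in a known state. */
static int toy_nlookup_init(struct toy_nlookupdata *nd, const char *path)
{
    memset(nd, 0, sizeof(*nd));
    if (path == NULL || path[0] == '\0')
        return ENOENT;
    nd->path = path;
    nd->active = true;
    toy_live_lookups++;
    return 0;
}

/* Stand-in for nlookup(): only one path exists in this toy namespace. */
static int toy_nlookup(struct toy_nlookupdata *nd)
{
    assert(nd->active);
    if (strcmp(nd->path, "/etc/motd") == 0) {
        nd->vp = 42;
        return 0;
    }
    return ENOENT;
}

/* Stand-in for nlookup_done(): mandatory, and harmless if init failed. */
static void toy_nlookup_done(struct toy_nlookupdata *nd)
{
    if (nd->active) {
        nd->active = false;
        toy_live_lookups--;
    }
}

/* A stat-like caller: one exit path, cleanup always runs. */
static int toy_stat(const char *path, int *vp_out)
{
    struct toy_nlookupdata nd;
    int error = toy_nlookup_init(&nd, path);

    if (error == 0)
        error = toy_nlookup(&nd);
    if (error == 0)
        *vp_out = nd.vp;
    toy_nlookup_done(&nd);
    return error;
}
```

Funnelling every outcome through the single cleanup call is what keeps
toy_live_lookups at zero regardless of which step failed.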

Begin switching VFS operations from using namei to using nlookup.
In this batch:

	* mount 	(install mnt_ncp for cross-mount-point handling in
			nlookup, simplify the vfs_mount() API to no longer
			pass a nameidata structure)
	* [l]stat	(use nlookup)
	* [f]chdir	(use nlookup, use recorded f_ncp)
	* [f]chroot	(use nlookup, use recorded f_ncp)

Revision 1.39: download - view: text, markup, annotated - select for diffs
Tue Sep 28 00:25:29 2004 UTC (9 years, 10 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Snap29Sep2004
Diff to: previous 1.38: preferred, unified
Changes since revision 1.38: +13 -1 lines
VFS messaging/interfacing work stage 6/99.  Populate and maintain the
namecache pointers previously attached to struct filedesc, giving the new
lookup code a base from which to work.

Implement the new lookup API (it is not yet being used by anything) and
augment the namecache API to handle the new functions, in particular
adding cache_setvp() to resolve an unresolved namecache entry into a
positive or negative hit and set various flags.  Note that we do not yet
cache symlink data but we could very easily.

The new API is greatly simplified.  Basically nlookups need only return
a locked namecache pointer (guaranteeing namespace atomicity).  Related
vnodes are not locked.  Both the leaf and governing directory vnodes can
be extracted from the returned namecache pointer.  namecache pointers may
also represent negative hits, which means that their namespace locking
feature serves to reserve a filename that has not yet been created (e.g.
open+create, rename).

The kernel is still using the old API as of this commit.  This commit is
primarily introducing the management infrastructure required to actually
start writing code to use the new API.

VOP_RESOLVE() has been added, along with a default function which falls back
to VOP_LOOKUP()/VOP_CACHEDLOOKUP().  This VOP function is not yet being used
as of this commit.  This VOP will be responsible for taking an unresolved
but locked namecache structure (hence the namespace is locked), and actually
does the directory lookup.  But unlike the far more complex
VOP_LOOKUP()/VOP_CACHEDLOOKUP() API the VOP_RESOLVE() API only needs to
attach a vnode (or NULL if the entry does not exist) to the passed-in
namecache structure.  It is likely that timeouts, e.g. for NFS, will also
be attached via this API.

This commit does not implement any of the cache-coherency infrastructure
but keeps this future requirement in mind in its design.

Revision 1.38: download - view: text, markup, annotated - select for diffs
Tue Aug 17 18:57:32 2004 UTC (9 years, 11 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Snap13Sep2004
Diff to: previous 1.37: preferred, unified
Changes since revision 1.37: +13 -2 lines
VFS messaging/interfacing work stage 2/99.  This stage retools the vnode ops
vector dispatch, making the vop_ops a per-mount structure rather than a
per-filesystem structure.  Filesystem mount code, typically in blah_vfsops.c,
must now register various vop_ops pointers in the struct mount to compile
its VOP operations set.

This change will allow us to begin adding per-mount hooks to VFSes to support
things like kernel-level journaling, various forms of cache coherency
management, and so forth.

In addition, the vop_*() calls now require a struct vop_ops pointer as the
first argument instead of a vnode pointer (note: in this commit the VOP_*()
macros currently just pull the vop_ops pointer from the vnode in order to
call the vop_*() procedures).  This change is intended to allow us to divorce
ourselves from the requirement that a vnode pointer always be part of a VOP
call.  In particular, this will allow namespace based routines such as
remove(), mkdir(), stat(), and so forth to pass namecache pointers rather than
locked vnodes and is a very important precursor to the goal of using the
namecache for namespace locking.
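
A minimal sketch of the dispatch shape described above.  All names are
hypothetical and the ops vector is reduced to one entry; the sketch only shows
an ops pointer as the first argument, a macro that pulls it from the vnode,
and per-mount state living in the ops structure.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vop_ops {
    int (*getattr)(struct toy_vop_ops *ops, int ino);
    int calls;  /* per-mount state, e.g. something a journaling hook tracks */
};

struct toy_vnode {
    struct toy_vop_ops *v_ops;  /* the ops vector of the vnode's mount */
    int                 ino;
};

/* A filesystem's implementation; it sees the per-mount ops structure. */
static int toy_fs_getattr(struct toy_vop_ops *ops, int ino)
{
    ops->calls++;
    return ino * 2;  /* arbitrary stand-in for an attribute */
}

/* The vop_*() procedure: the ops pointer comes first, no vnode required. */
static int toy_vop_getattr(struct toy_vop_ops *ops, int ino)
{
    assert(ops->getattr != NULL);
    return ops->getattr(ops, ino);
}

/* The VOP_*() macro: for now it just pulls the ops pointer from the vnode. */
#define TOY_VOP_GETATTR(vp) toy_vop_getattr((vp)->v_ops, (vp)->ino)
```

Two mounts of the same filesystem type get separate toy_vop_ops instances, so
state or hooks attached to one do not affect the other, and a caller holding
only an ops pointer can dispatch without a vnode.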

Revision 1.37: download - view: text, markup, annotated - select for diffs
Wed May 26 19:09:04 2004 UTC (10 years, 2 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_1_0_REL, DragonFly_1_0_RC1, DragonFly_1_0A_REL
Diff to: previous 1.36: preferred, unified
Changes since revision 1.36: +1 -0 lines
Cleanup warnings.  No operational changes.

Revision 1.36: download - view: text, markup, annotated - select for diffs
Fri May 21 16:21:57 2004 UTC (10 years, 2 months ago) by drhodus
Branches: MAIN
Diff to: previous 1.35: preferred, unified
Changes since revision 1.35: +12 -12 lines
Remove unneeded typecast.

Revision 1.35: download - view: text, markup, annotated - select for diffs
Fri May 21 15:41:23 2004 UTC (10 years, 2 months ago) by drhodus
Branches: MAIN
Diff to: previous 1.34: preferred, unified
Changes since revision 1.34: +37 -7 lines
Cleanup pass. Removed code that is not needed anymore.
Cleanup VOP_LEASE() uses and document.

Add in a debug function for buffer pool statistical information which can
be toggled via debug.syncprt.

Revision 1.34: download - view: text, markup, annotated - select for diffs
Wed May 19 22:52:58 2004 UTC (10 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.33: preferred, unified
Changes since revision 1.33: +1 -1 lines
Device layer rollup commit.

* cdevsw_add() is now required.  cdevsw_add() and cdevsw_remove() may specify
  a mask/match indicating the range of supported minor numbers.  Multiple
  cdevsw_add()'s using the same major number, but distinctly different
  ranges, may be issued.  All devices that failed to call cdevsw_add() before
  now do.

* cdevsw_remove() now automatically marks all devices within its supported
  range as being destroyed.

* vnode->v_rdev is no longer resolved when the vnode is created.  Instead,
  only v_udev (a newly added field) is resolved.  v_rdev is resolved when
  the vnode is opened and cleared on the last close.

* A great deal of code was making rather dubious assumptions with regards
  to the validity of devices associated with vnodes, primarily due to
  the persistence of a device structure due to being indexed by (major, minor)
  instead of by (cdevsw, major, minor).  In particular, if you run a program
  which connects to a USB device and then you pull the USB device and plug
  it back in, the vnode subsystem will continue to believe that the device
  is open when, in fact, it isn't (because it was destroyed and recreated).

  In particular, note that all the VFS mount procedures now check devices
  via v_udev instead of v_rdev prior to calling VOP_OPEN(), since v_rdev
  is NULL prior to the first open.

* The disk layer's device interaction has been rewritten.  The disk layer
  (i.e. the slice and disklabel management layer) no longer overloads
  its data onto the device structure representing the underlying physical
  disk.  Instead, the disk layer uses the new cdevsw_add() functionality
  to register its own cdevsw using the underlying device's major number,
  and simply does NOT register the underlying device's cdevsw.  No
  confusion is created because the device hash is now based on
  (cdevsw,major,minor) rather than (major,minor).

  NOTE: This also means that underlying raw disk devices may use the entire
  device minor number instead of having to reserve the bits used by the disk
  layer, and also means that we can (theoretically) stack a fully
  disklabel-supported 'disk' on top of any block device.

* The new reference counting scheme prevents this by associating a device
  with a cdevsw and disconnecting the device from its cdevsw when the cdevsw
  is removed.  Additionally, all udev2dev() lookups run through the cdevsw
  mask/match and only successfully find devices still associated with an
  active cdevsw.

* Major work on MFS:  MFS no longer shortcuts vnode and device creation.  It
  now creates a real vnode and a real device and implements real open and
  close VOPs.  Additionally, due to the disk layer changes, MFS is no longer
  limited to 255 mounts.  The new limit is 16 million.  Since MFS creates a
  real device node, mount_mfs will now create a real /dev/mfs<PID> device
  that can be read from userland (e.g. so you can dump an MFS filesystem).

* BUF AND DEVICE STRATEGY changes.  The struct buf contains a b_dev field.
  In order to properly handle stacked devices we now require that the b_dev
  field be initialized before the device strategy routine is called.  This
  required some additional work in various VFS implementations.  To enforce
  this requirement, biodone() now sets b_dev to NODEV.  The new disk layer
  will adjust b_dev before forwarding a request to the actual physical
  device.

* A bug in the ISO CD boot sequence which resulted in a panic has been fixed.

Testing by: lots of people, but David Rhodus found the most egregious bugs.
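
The mask/match registration can be sketched as a small table.  This is a
userland toy with hypothetical names and signatures; it assumes a minor number
belongs to a registration when (minor & mask) == match, which is one plausible
reading of "mask/match" and not something the log spells out.

```c
#include <assert.h>
#include <stddef.h>

struct toy_cdevsw {
    const char *name;
    int         major;
    unsigned    mask;   /* which minor bits select this registration */
    unsigned    match;  /* required value of those bits */
};

#define TOY_MAX_SW 8
static struct toy_cdevsw toy_table[TOY_MAX_SW];
static size_t toy_count;

/* Register a device switch for a range of minors under one major. */
static int toy_cdevsw_add(const char *name, int major,
                          unsigned mask, unsigned match)
{
    assert((match & ~mask) == 0);  /* match bits must lie inside the mask */
    if (toy_count == TOY_MAX_SW)
        return -1;
    toy_table[toy_count++] = (struct toy_cdevsw){ name, major, mask, match };
    return 0;
}

/* Resolve (major, minor) to the registration covering that minor. */
static const struct toy_cdevsw *toy_find(int major, unsigned minor)
{
    for (size_t i = 0; i < toy_count; i++) {
        const struct toy_cdevsw *sw = &toy_table[i];
        if (sw->major == major && (minor & sw->mask) == sw->match)
            return sw;
    }
    return NULL;
}
```

Two registrations can share a major number as long as their mask/match pairs
select disjoint sets of minors, which is how the toy keeps "disk" and "raw"
apart under one major.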

Revision 1.33: download - view: text, markup, annotated - select for diffs
Sat Apr 24 04:32:03 2004 UTC (10 years, 3 months ago) by drhodus
Branches: MAIN
Diff to: previous 1.32: preferred, unified
Changes since revision 1.32: +8 -8 lines
Remove the VREF() macro and uses of it.
Remove uses of 0x20 before ^I inside vnode.h

Revision 1.32: download - view: text, markup, annotated - select for diffs
Wed Apr 21 04:49:00 2004 UTC (10 years, 3 months ago) by hmp
Branches: MAIN
Diff to: previous 1.31: preferred, unified
Changes since revision 1.31: +1 -0 lines
Add a KKASSERT to mount(2) to make sure we have a proc pointer.

Revision 1.31: download - view: text, markup, annotated - select for diffs
Wed Apr 21 04:47:28 2004 UTC (10 years, 3 months ago) by hmp
Branches: MAIN
Diff to: previous 1.30: preferred, unified
Changes since revision 1.30: +7 -0 lines
Merge: FreeBSD (RELENG_4) vfs_syscalls.c rev. 1.151.2.19

	* Prohibit mount/umount operations inside a jail.

	* Respect vfs.usermount sysctl for umount(2).

Revision 1.30: download - view: text, markup, annotated - select for diffs
Tue Mar 16 17:53:53 2004 UTC (10 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.29: preferred, unified
Changes since revision 1.29: +72 -49 lines
Separate chroot() into kern_chroot().  Rename change_dir() to checkvp_chdir()
and reorganize the code to avoid doing weird things to the passed vnode's
lock and ref count in deep subroutines (which led to buggy code).

Fix a bug in chdir()/kern_chdir() (the namei data was not being freed in all
cases), and also fix a bug in symlink() (missing zfree in error case).

Submitted-by: Paul Herman <pherman@frenchfries.net>
Additional-work-by: dillon

Revision 1.29: download - view: text, markup, annotated - select for diffs
Mon Mar 1 06:33:17 2004 UTC (10 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.28: preferred, unified
Changes since revision 1.28: +73 -77 lines
Newtoken commit.  Change the token implementation as follows:  (1) Obtaining
a token no longer enters a critical section.  (2) tokens can be held through
scheduler switches and blocking conditions and are effectively released and
reacquired on resume.  Thus tokens serialize access only while the thread
is actually running.  Serialization is not broken by preemptive interrupts.
That is, interrupt threads which preempt do not release the preempted thread's
tokens.  (3) Unlike spl's, tokens will interlock w/ interrupt threads on
the same or on a different cpu.

The vnode interlock code has been rewritten and the API has changed.  The
mountlist vnode scanning code has been consolidated and all known races have
been fixed.  The vnode interlock is now a pool token.

The code that frees unreferenced vnodes whose last VM page has been freed has
been moved out of the low level vm_page_free() code and moved to the
periodic filesystem syncer code in vfs_msync().

The SMP startup code and the IPI code has been cleaned up considerably.
Certain early token interactions on AP cpus have been moved to the BSP.

The LWKT rwlock API has been cleaned up and turned on.

Major testing by: David Rhodus

Revision 1.28: download - view: text, markup, annotated - select for diffs
Fri Nov 14 19:31:22 2003 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.27: preferred, unified
Changes since revision 1.27: +1 -1 lines
Fix bug in last syscall separation commit, an extra semicolon was causing
mkfifo to return early and leave vnodes locked.

Revision 1.27: download - view: text, markup, annotated - select for diffs
Thu Nov 13 04:04:42 2003 UTC (10 years, 8 months ago) by daver
Branches: MAIN
Diff to: previous 1.26: preferred, unified
Changes since revision 1.26: +35 -23 lines
Split mkfifo().

Trash the CHECKALT{CREAT,EXIST} macros and friends.  Implement
linux_copyin_path() and linux_free_path() for path translation without
using the stackgap.

Use the above and recently split syscalls to remove stackgap allocations
from linux_creat(), linux_open(), linux_lseek(), linux_llseek(),
linux_access(), linux_unlink(), linux_chdir(), linux_chmod(),
linux_mkdir(), linux_rmdir(), linux_rename(), linux_symlink(),
linux_readlink(), linux_truncate(), linux_link(), linux_chown(),
linux_lchown(), linux_uselib(), linux_utime(), linux_mknod(),
linux_newstat(), linux_newlstat(), linux_statfs(), linux_stat64(),
linux_lstat64(), linux_chown16(), linux_lchown16(), linux_execve().

Use split syscalls to reimplement linux_fstatfs().

Implement linux_translate_path() for use in exec_linux_imgact_try().

Revision 1.26: download - view: text, markup, annotated - select for diffs
Wed Nov 12 10:11:09 2003 UTC (10 years, 8 months ago) by daver
Branches: MAIN
Diff to: previous 1.25: preferred, unified
Changes since revision 1.25: +5 -0 lines

Remind myself and others that kern_readlink() isn't properly split yet.
There are copyin() calls buried in VOP_READLINK().

Revision 1.25: download - view: text, markup, annotated - select for diffs
Tue Nov 11 14:33:23 2003 UTC (10 years, 8 months ago) by daver
Branches: MAIN
Diff to: previous 1.24: preferred, unified
Changes since revision 1.24: +3 -3 lines

The big syscall split commit broke utimes(), lutimes() and futimes() when
passed a NULL timeval structure.
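
For reference, the userland contract at stake: utimes() with a NULL times
argument sets both timestamps to the current time.  The helper below is a
small sketch that exercises this from a program; the helper name and the
temporary file template are arbitrary choices.

```c
#define _DEFAULT_SOURCE  /* for mkstemp() and utimes() on glibc */
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Returns the file's mtime after utimes(path, NULL), or (time_t)-1 on error. */
static time_t touch_with_null_times(void)
{
    char path[] = "/tmp/utimes_demo_XXXXXX";
    struct timeval old[2] = { { 1000, 0 }, { 1000, 0 } };
    struct stat st;
    time_t result = (time_t)-1;
    int fd = mkstemp(path);

    if (fd < 0)
        return (time_t)-1;
    close(fd);

    /* Backdate the file first so the effect of the NULL call is visible. */
    if (utimes(path, old) == 0 &&
        stat(path, &st) == 0 && st.st_mtime == 1000 &&
        utimes(path, NULL) == 0 &&   /* NULL means "use the current time" */
        stat(path, &st) == 0)
        result = st.st_mtime;

    unlink(path);
    return result;
}
```

The returned mtime should be close to time(NULL) at the moment of the call
rather than the backdated value.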

Revision 1.24: download - view: text, markup, annotated - select for diffs
Mon Nov 10 20:57:18 2003 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.23: preferred, unified
Changes since revision 1.23: +2 -2 lines
The last major syscall separation commit completely broke our lseek() as well
as the linux emulated lseek().  It's sheer luck that the system works at
all :-).  Fix lseek's 64 bit return value.
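
The log gives no detail on how the return value was broken. As a generic illustration only, the sketch below shows what is lost when a 64-bit file offset is passed back through a 32-bit slot. Both function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical: returns the offset through a 32-bit slot, dropping the high half. */
uint32_t
ex_return_narrow(int64_t offset)
{
	return (uint32_t)offset;
}

/* Hypothetical: returns the offset through a 64-bit slot, preserving it. */
int64_t
ex_return_wide(int64_t offset)
{
	return offset;
}
```

For any offset at or beyond 4 GB the narrow path returns a different number than the caller asked for, which is why lseek() needs the wide path.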

Revision 1.23: download - view: text, markup, annotated - select for diffs
Mon Nov 3 18:49:23 2003 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.22: preferred, unified
Changes since revision 1.22: +1 -1 lines
Fix a minor compile-time bug introduced in 1.22 when DEBUG_VFS_LOCKS is
turned on.

Revision 1.22: download - view: text, markup, annotated - select for diffs
Mon Nov 3 15:57:33 2003 UTC (10 years, 9 months ago) by daver
Branches: MAIN
Diff to: previous 1.21: preferred, unified
Changes since revision 1.21: +622 -675 lines
Split wait4(), setrlimit(), getrlimit(), statfs(), fstatfs(), chdir(),
open(), mknod(), link(), symlink(), unlink(), lseek(), access(), stat(),
lstat(), readlink(), chmod(), chown(), lchown(), utimes(), lutimes(),
futimes(), truncate(), rename(), mkdir(), rmdir(), getdirentries(),
getdents().

Trash the 4.3BSD numeric filesystem type support in mount().

Move ocreat(), olseek(), otruncate(), ostat(), olstat(), owait(),
ogetrlimit(), and osetrlimit() to the 43bsd subtree and reimplement
using split syscalls.  Move ogetdirentries() to the subtree without
change because it is such a mess.

Convince linux_waitpid(), linux_wait(), linux_setrlimit(),
linux_old_getrlimit(), and linux_getrlimit() to use split syscalls.

The file kern/vfs_syscalls.c is now completely free of COMPAT_43 code.
I believe that execve() is the only pending split before I can tackle
stackgap usage in the linux emulator's CHECKALT{EXIST,CREAT}() macros.

Revision 1.21: download - view: text, markup, annotated - select for diffs
Tue Oct 21 01:05:09 2003 UTC (10 years, 9 months ago) by daver
Branches: MAIN
Diff to: previous 1.20: preferred, unified
Changes since revision 1.20: +20 -29 lines
Create the kern_fstat() and kern_ftruncate() in-kernel syscalls.

Implement fstat(), nfstat() and ftruncate() using the in-kernel syscalls.

Move ofstat() and oftruncate() to the 43bsd emulation tree and implement
with in-kernel syscalls.

Create the linux_ftruncate() syscall in the linux emulation layer.  This
replaces a direct use of oftruncate() in the linux syscall map.  Rewrite
linux_newfstat() and linux_fstat64() with the in-kernel syscalls.
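
A minimal sketch of the split pattern described above, assuming the in-kernel half works only on kernel memory while each wrapper handles its own user-visible layout. Every name and structure here is hypothetical, and memcpy() stands in for copyout():

```c
#include <errno.h>
#include <string.h>

struct ex_stat { long size; };		/* hypothetical native layout */
struct ex_linux_stat { int size32; };	/* hypothetical emulation layout */

/* In-kernel half: no user pointers, so any caller inside the kernel can use it. */
int
ex_kern_fstat(int fd, struct ex_stat *st)
{
	if (fd < 0)
		return EBADF;
	st->size = 4096L * fd;		/* made-up data for the example */
	return 0;
}

/* Native wrapper: call the in-kernel half, then copy the result out. */
int
ex_fstat(int fd, void *user_buf)
{
	struct ex_stat st;
	int error;

	error = ex_kern_fstat(fd, &st);
	if (error == 0)
		memcpy(user_buf, &st, sizeof(st));	/* stands in for copyout() */
	return error;
}

/* Emulation wrapper: same in-kernel half, different user-visible layout. */
int
ex_linux_fstat(int fd, void *user_buf)
{
	struct ex_stat st;
	struct ex_linux_stat lst;
	int error;

	error = ex_kern_fstat(fd, &st);
	if (error == 0) {
		lst.size32 = (int)st.size;
		memcpy(user_buf, &lst, sizeof(lst));
	}
	return error;
}
```

Because the emulation wrapper calls the in-kernel half directly, it never has to stage a native structure in user memory first.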

Revision 1.20: download - view: text, markup, annotated - select for diffs
Thu Oct 9 22:27:19 2003 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.19: preferred, unified
Changes since revision 1.19: +11 -11 lines
namecache work stage 3a: Adjust the VFS APIs to include a namecache pointer
where necessary.  For the moment we pass NULL for these parameters (the old
'dvp' vnode pointers cannot be ripped out quite yet).

Revision 1.19: download - view: text, markup, annotated - select for diffs
Mon Sep 29 18:52:06 2003 UTC (10 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.18: preferred, unified
Changes since revision 1.18: +7 -9 lines
Cleanup: get rid of the CNP_NOFOLLOW pseudo-flag.  #define 0'd flags are a
really bad idea.
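
A short illustration of the problem, using hypothetical flag names and values rather than the kernel's definitions. A flag defined as 0 occupies no bit, so a mask test for it can never succeed:

```c
#include <stdbool.h>

/* Hypothetical values for illustration; not the kernel's definitions. */
#define EX_FOLLOW	0x0040	/* a real flag: occupies one bit */
#define EX_NOFOLLOW	0x0000	/* a pseudo-flag: occupies no bit */

/* Returns true if any bit of `flag` is set in `flags`. */
bool
ex_flag_is_set(unsigned int flags, unsigned int flag)
{
	return (flags & flag) != 0;
}
```

ex_flag_is_set(flags, EX_NOFOLLOW) is false for every value of flags, including the case where the caller passed EX_NOFOLLOW explicitly, so the name suggests a test that the code cannot perform.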

Revision 1.18: download - view: text, markup, annotated - select for diffs
Sun Sep 28 03:44:02 2003 UTC (10 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.17: preferred, unified
Changes since revision 1.17: +2 -1 lines
namecache work stage 2: move struct namecache to its own header file and
have vnode.h include it for now.  Re-engineer the namecache topology to make
it possible to track different parent directories and to make it possible
to namei/lookup paths using the namecache structure as the primary placeholder
rather than a directory vnode.  Add a few minor hacks to stabilize the system
that will be removed (no longer be necessary) in stage 3.  Get rid of the
leafonly sysctl and make its effect the default, but in order to avoid
doing too much in this stage it is still possible to disassociate a vnode
from its namecache entry, which a lot of filesystems (e.g. NFS) depend on
as a poor-man's way of invalidating entries.  The namecache topology itself,
however, will be left intact even if a vnode is disassociated in the middle
of a path.

Revision 1.17: download - view: text, markup, annotated - select for diffs
Tue Sep 23 05:03:51 2003 UTC (10 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.16: preferred, unified
Changes since revision 1.16: +72 -56 lines
namecache work stage 1: namespace cleanups.  Add a NAMEI_ prefix to
CREATE, LOOKUP, DELETE, and RENAME.  Add a CNP_ prefix to all the name
lookup flags (nd_flags) e.g. ISDOTDOT->CNP_ISDOTDOT.

Revision 1.16: download - view: text, markup, annotated - select for diffs
Tue Aug 26 21:09:02 2003 UTC (10 years, 11 months ago) by rob
Branches: MAIN
Diff to: previous 1.15: preferred, unified
Changes since revision 1.15: +9 -9 lines
__P() removal
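
For context, __P() is the BSD macro that wrapped prototype argument lists so that pre-ANSI compilers could drop them. The sketch below shows the before and after forms of a declaration. It uses a stand-in macro, EX_P(), to avoid clashing with the system's own definition, and both function names are hypothetical:

```c
/* Stand-in for __P() as it expands on an ANSI compiler. */
#define EX_P(protos)	protos

/* Before: argument list wrapped in the macro, hence the double parentheses. */
int ex_old_style EX_P((int, int));

/* After: a plain ANSI prototype. */
int ex_new_style(int, int);

int
ex_old_style(int a, int b)
{
	return a + b;
}

int
ex_new_style(int a, int b)
{
	return a + b;
}
```

On an ANSI compiler the two declarations are equivalent, so the removal changes no behavior.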

Revision 1.15: download - view: text, markup, annotated - select for diffs
Thu Aug 7 21:17:23 2003 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.14: preferred, unified
Changes since revision 1.14: +1 -1 lines
kernel tree reorganization stage 1: Major cvs repository work (not logged as
commits) plus a major reworking of the #include's to accommodate the
relocations.

    * CVS repository files manually moved.  Old directories left intact
      and empty (temporary).

    * Reorganize all filesystems into vfs/, most devices into dev/,
      sub-divide devices by function.

    * Begin to move device-specific architecture files to the device
      subdirs rather than throwing them all into, e.g. i386/include.

    * Reorganize files related to system busses, placing the related code
      in a new bus/ directory.  Also move cam to bus/cam though this may
      not have been the best idea in retrospect.

    * Reorganize emulation code and place it in a new emulation/ directory.

    * Remove the -I- compiler option in order to allow #include file
      localization, rename all config generated X.h files to use_X.h to
      clean up the conflicts.

    * Remove /usr/src/include (or /usr/include) dependencies during the
      kernel build, beyond what is normally needed to compile helper
      programs.

    * Make config create 'machine' softlinks for architecture specific
      directories outside of the standard <arch>/include.

    * Bump the config rev.

    WARNING! after this commit /usr/include and /usr/src/sys/compile/*
    should be regenerated from scratch.

Revision 1.14: download - view: text, markup, annotated - select for diffs
Sun Aug 3 10:07:41 2003 UTC (11 years ago) by hmp
Branches: MAIN
Diff to: previous 1.13: preferred, unified
Changes since revision 1.13: +1 -1 lines
Use FOREACH_PROC_IN_SYSTEM() throughout.

Revision 1.13: download - view: text, markup, annotated - select for diffs
Wed Jul 30 00:19:14 2003 UTC (11 years ago) by dillon
Branches: MAIN
Diff to: previous 1.12: preferred, unified
Changes since revision 1.12: +14 -14 lines
syscall messaging 3: Expand the 'header' that goes in front of the syscall
arguments in the kernel copy.  The header was previously just an lwkt_msg.
The header is now a 'union sysmsg'.  'union sysmsg' contains an lwkt_msg
plus space for the additional meta data required to asynchronize various
system calls.   We haven't actually asynchronized anything yet and will not
be able to until the reply port and abort processing infrastructure is
in place.  See sys/sysmsg.h for more information on the new header.

Also cleanup syscall generation somewhat and add some ibcs2 stuff I missed.

Revision 1.12: download - view: text, markup, annotated - select for diffs
Sat Jul 26 19:42:11 2003 UTC (11 years ago) by rob
Branches: MAIN
Diff to: previous 1.11: preferred, unified
Changes since revision 1.11: +1 -1 lines
Register keyword removal

Approved by: Matt Dillon

Revision 1.11: download - view: text, markup, annotated - select for diffs
Sat Jul 26 18:12:44 2003 UTC (11 years ago) by dillon
Branches: MAIN
Diff to: previous 1.10: preferred, unified
Changes since revision 1.10: +14 -16 lines
syscall messaging 2: Change the standard return value storage for system
calls from proc->p_retval[] to the message structure embedded in the syscall.
System calls used to set their non-error return value in p_retval[] but
must now set it in the message structure.  This is a necessary precursor to
any sort of asynchronization, for obvious reasons.
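
A minimal sketch of the convention described above, assuming a message header embedded at the base of each argument structure. The structure layouts and the ex_sys_add() handler are hypothetical and do not reproduce the kernel's definitions:

```c
/* Hypothetical stand-in for the message header embedded in the syscall. */
struct ex_sysmsg {
	int	error;		/* error code, 0 on success */
	long	result;		/* non-error return value */
};

/* Hypothetical argument structure: header first, then the arguments. */
struct ex_add_args {
	struct ex_sysmsg msg;
	int	a;
	int	b;
};

/*
 * The handler returns only an error code.  The non-error result is
 * stored in the message rather than in per-process state.
 */
int
ex_sys_add(struct ex_add_args *uap)
{
	uap->msg.result = (long)uap->a + uap->b;
	return 0;
}
```

Keeping the result in the message means it travels with the request, so it does not depend on which process context happens to be current when the call completes.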

This work was particularly annoying because all the emulation code declares
and manually fills in syscall argument structures.

This commit could potentially destabilize some of the emulation code but I
went through the most important Linux emulation code three times and tested it
with linux-mozilla, so I am fairly confident that I got it right.

Note: proper linux emulation requires setting the fallback elf brand to 3 or
it will default to SVR4.  It really ought to default to linux (3), not SVR4.

    sysctl -w kern.fallback_elf_brand=3

Revision 1.10: download - view: text, markup, annotated - select for diffs
Thu Jul 24 01:41:25 2003 UTC (11 years ago) by dillon
Branches: MAIN
Diff to: previous 1.9: preferred, unified
Changes since revision 1.9: +0 -19 lines
Preliminary syscall messaging work.  Adjust all <syscall>_args structures
to include an lwkt_msg at their base which will eventually allow syscalls
to run asynch.  Note that this is for the kernel copy of the arguments, the
userland argument format has not changed for the standard syscall entry
point.

Begin abstracting a messaging syscall interface (#if 0'd out at the moment).

Change the syscall2 entry point to take the new expanded argument structure
into account.  Change sysent argument calculation (AS macro) to take the
new expanded argument structure into account.

Note: existing linux, svr4, and ibcs2 emulation may break with this commit,
though it is not intentional.

Revision 1.9: download - view: text, markup, annotated - select for diffs
Sat Jul 19 21:14:39 2003 UTC (11 years ago) by dillon
Branches: MAIN
Diff to: previous 1.8: preferred, unified
Changes since revision 1.8: +2 -2 lines
Remove the priority part of the priority|flags argument to tsleep().  Only
flags are passed now.  The priority was a user scheduler thingy that is not
used by the LWKT subsystem.  For process statistics assume sleeps without
P_SINTR set to be disk-waits, and sleeps with it set to be normal sleeps.

This commit should not contain any operational changes.

Revision 1.8: download - view: text, markup, annotated - select for diffs
Sun Jul 6 21:23:51 2003 UTC (11 years ago) by dillon
Branches: MAIN
Diff to: previous 1.7: preferred, unified
Changes since revision 1.7: +30 -30 lines
MP Implementation 1/2: Get the APIC code working again, sweetly integrate the
MP lock into the LWKT scheduler, replace the old simplelock code with
tokens or spin locks as appropriate.  In particular, the vnode interlock
(and most other interlocks) are now tokens.  Also clean up a few curproc/cred
sequences that are no longer needed.

The APs are left in degenerate state with non IPI interrupts disabled as
additional LWKT work must be done before we can really make use of them,
and FAST interrupts are not managed by the MP lock yet.  The main thing
for this stage was to get the system working with an APIC again.

buildworld tested on UP and 2xCPU/MP (Dell 2550)

Revision 1.7: download - view: text, markup, annotated - select for diffs
Fri Jun 27 01:53:25 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
CVS tags: PRE_MP
Diff to: previous 1.6: preferred, unified
Changes since revision 1.6: +1 -4 lines
proc->thread stage 6: kernel threads now create processless LWKT threads.
A number of obvious curproc cases were removed, tsleep/wakeup was made to
work with threads (wmesg, ident, and timeout features moved to threads).
There are probably a few curproc cases left to fix.

Revision 1.6: download - view: text, markup, annotated - select for diffs
Thu Jun 26 05:55:14 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.5: preferred, unified
Changes since revision 1.5: +11 -14 lines
proc->thread stage 5:  BUF/VFS clearance!  Remove the ucred argument from
vop_close, vop_getattr, vop_fsync, and vop_createvobject.  These VOPs can
be called from multiple contexts so the cred is fairly useless, and UFS
ignores it anyway.  For filesystems (like NFS) that sometimes need a cred
we use proc0.p_ucred for now.

This removal also removed the need for a 'proc' reference in the related
VFS procedures, which greatly helps our proc->thread conversion.

bp->b_wcred and bp->b_rcred have also been removed, and for the same reason.
It makes no sense to have a particular cred when multiple users can
access a file.  This may create issues with certain types of NFS mounts
but if it does we will solve them in a way that doesn't pollute the
struct buf.

Revision 1.5: download - view: text, markup, annotated - select for diffs
Wed Jun 25 05:22:32 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.4: preferred, unified
Changes since revision 1.4: +1 -3 lines
proc->thread stage 4: post commit cleanup.  Fix minor issues when recompiling
with GENERIC.

Revision 1.4: download - view: text, markup, annotated - select for diffs
Wed Jun 25 03:55:57 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.3: preferred, unified
Changes since revision 1.3: +316 -265 lines
proc->thread stage 4: rework the VFS and DEVICE subsystems to take thread
pointers instead of process pointers as arguments, similar to what FreeBSD-5
did.  Note however that ultimately both APIs are going to be message-passing
which means the current thread context will not be useable for creds and
descriptor access.

Revision 1.3: download - view: text, markup, annotated - select for diffs
Mon Jun 23 17:55:41 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.2: preferred, unified
Changes since revision 1.2: +325 -799 lines
proc->thread stage 2: MAJOR revamping of system calls, ucred, jail API,
and some work on the low level device interface (proc arg -> thread arg).
As -current did, I have removed p_cred and incorporated its functions
into p_ucred.  p_prison has also been moved into p_ucred and adjusted
accordingly.  The jail interface tests now use ucreds rather than processes.

The syscall(p,uap) interface has been changed to just (uap).  This is inclusive
of the emulation code.  It makes little sense to pass a proc pointer around
which confuses the MP readability of the code, because most system call code
will only work with the current process anyway.  Note that eventually
*ALL* syscall emulation code will be moved to a kernel-protected userland
layer because it really makes no sense whatsoever to implement these
emulations in the kernel.

suser() now takes no arguments and only operates with the current process.
The process argument has been removed from suser_xxx() so it now just takes
a ucred and flags.

The sysctl interface was adjusted somewhat.

Revision 1.2: download - view: text, markup, annotated - select for diffs
Tue Jun 17 04:28:42 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.1: preferred, unified
Changes since revision 1.1: +1 -0 lines
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids.  Most
ids have been removed from !lint sections and moved into comment sections.

Revision 1.1: download - view: text, markup, annotated - select for diffs
Tue Jun 17 02:55:08 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
CVS tags: FREEBSD_4_FORK
import from FreeBSD RELENG_4 1.151.2.18
