DragonFly BSD

CVS log for src/sys/kern/kern_exec.c

Keyword substitution: kv
Default branch: MAIN


Revision 1.64
Sun Oct 26 04:29:19 2008 UTC by sephe
Branches: MAIN
CVS tags: HEAD
Diff to: previous 1.63
Changes since revision 1.63: +4 -1 lines
- Return the real cluster limit used by the objcache
- For mbuf objcaches, raise backing kmalloc pools' limit according to the
  cluster limits.
  Suggested-by: dillon@

Reviewed-by: aggelos@, nth@

Revision 1.63
Sun Jan 6 16:55:51 2008 UTC by swildner
Branches: MAIN
CVS tags: DragonFly_RELEASE_2_0_Slip, DragonFly_RELEASE_2_0, DragonFly_RELEASE_1_12_Slip, DragonFly_RELEASE_1_12, DragonFly_Preview
Diff to: previous 1.62
Changes since revision 1.62: +0 -4 lines
Remove bogus checks after kmalloc(M_WAITOK) which never returns NULL.

Reviewed-by: hasso

Revision 1.62
Tue Aug 28 01:09:24 2007 UTC by dillon
Branches: MAIN
Diff to: previous 1.61
Changes since revision 1.61: +6 -2 lines
Fix a bug in vnode_pager_generic_getpages().  This function was improperly
setting m->valid to 0 and was also improperly trying to free the page after
it had potentially become wired by the buffer cache.

Add a sysctl to UFS that allows us to force it to call vop_stdgetpages()
for debugging purposes.

Revision 1.59.2.1
Tue Jul 31 22:40:50 2007 UTC by dillon
Branches: DragonFly_RELEASE_1_10
CVS tags: DragonFly_RELEASE_1_10_Slip
Diff to: previous 1.59; next MAIN 1.60
Changes since revision 1.59: +19 -4 lines
Synchronize all changes made in HEAD to date with the 1.10 release branch.

* usbdevs update
* header file fixes
* vinum root
* vinum device I/O fixes
* MD fixes
* New PCI ids for netif rum and ural
* New USB uplcom ids
* linux exec memory leak
* devclass ordering fixes (sound devices)
* rate-limited kprintf support (filesystem full console spams)
* msdosfs fixes
* Manual page work

Revision 1.61
Mon Jul 30 17:41:23 2007 UTC by pavalos
Branches: MAIN
Diff to: previous 1.60
Changes since revision 1.60: +3 -3 lines
Spelling corrections.

Revision 1.60
Mon Jul 30 14:52:40 2007 UTC by corecode
Branches: MAIN
Diff to: previous 1.59
Changes since revision 1.59: +19 -4 lines
Fix a memory leak when kern_execve() fails fatally.

The callers of execve() are taking care of releasing the argument
buffer for exec.  However, there might be situations where exec
ran into a dead end and can't come out again because it already
has destroyed the calling process image.  In these cases kern_execve
was calling exit1() to destroy the current process.  This however means
that the argument buffers were never freed, thus causing the leak.

In combination with the recent change to using the objcache with a
hard upper allocation limit, this meant that at some point no binaries
could be execed anymore, stalling callers of sys_exec() in objcache_get.

To fix this, kern_execve() no longer calls exit1() on lethal errors;
instead it returns -1 (in contrast to valid errno values, which are
>= 0) to indicate that the current process has
to commit suicide after cleaning up used resources.
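
The return convention above can be illustrated with a small user-space
sketch. Everything in it is hypothetical: struct exec_args,
args_pool_get(), args_pool_put(), fake_kern_execve() and fake_sys_execve()
are made-up stand-ins for the objcache-backed argument buffers,
kern_execve() and its caller, and the pool limit of 2 is arbitrary. It only
shows the idea that the caller returns the argument buffer on every path,
including the fatal -1 path, before the process is told to exit.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical bounded pool standing in for the exec-argument objcache. */
#define ARGS_POOL_LIMIT 2
static int args_in_use = 0;

struct exec_args {
    char strings[256];          /* placeholder for copied-in argv/envv */
};

static struct exec_args *args_pool_get(void)
{
    struct exec_args *args;

    if (args_in_use >= ARGS_POOL_LIMIT)
        return NULL;            /* the real objcache_get() would sleep here */
    args = calloc(1, sizeof(*args));
    if (args != NULL)
        args_in_use++;
    return args;
}

static void args_pool_put(struct exec_args *args)
{
    assert(args_in_use > 0);
    args_in_use--;
    free(args);
}

/*
 * Hypothetical stand-in for kern_execve(): 0 on success, a positive errno
 * for a recoverable failure, or -1 when the old image is already gone.
 * 'outcome' lets the caller simulate each case.
 */
static int fake_kern_execve(struct exec_args *args, int outcome)
{
    (void)args;
    return outcome;
}

/*
 * Hypothetical stand-in for the exec system call: the buffer goes back to
 * the pool on every path, and only then is *must_exit set (the point at
 * which the kernel would call exit1()).
 */
static int fake_sys_execve(int outcome, int *must_exit)
{
    struct exec_args *args = args_pool_get();
    int error;

    *must_exit = 0;
    if (args == NULL)
        return EAGAIN;          /* pool exhausted: the stall the leak caused */
    error = fake_kern_execve(args, outcome);
    args_pool_put(args);
    if (error == -1) {
        *must_exit = 1;
        error = 0;
    }
    return error;
}
```

If the -1 path skipped args_pool_put(), args_in_use would climb to
ARGS_POOL_LIMIT after two fatal failures and every later call would fail to
get a buffer, which mirrors the stall in objcache_get() described above.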

Revision 1.59
Thu Jul 12 21:56:22 2007 UTC by dillon
Branches: MAIN
Branch point for: DragonFly_RELEASE_1_10
Diff to: previous 1.58
Changes since revision 1.58: +35 -17 lines
Fix LWP support on exec.  exec now properly kills all LWPs.

Revision 1.58
Fri Jun 29 21:54:08 2007 UTC by dillon
Branches: MAIN
Diff to: previous 1.57
Changes since revision 1.57: +6 -0 lines
Implement struct lwp->lwp_vmspace.  Leave p_vmspace intact.  This allows
vkernels to run threaded and to run emulated VM spaces on a per-thread basis.
struct proc->p_vmspace is left intact, making it easy to switch into and out
of an emulated VM space.  This is needed for the virtual kernel SMP work.

This also gives us the flexibility to run emulated VM spaces in their own
threads, or in a limited number of separate threads.  Linux does this and
they say it improved performance.  I don't think it necessarily improved
performance but it's nice to have the flexibility to do it in the future.

Revision 1.57
Thu Jun 7 23:14:25 2007 UTC by dillon
Branches: MAIN
Diff to: previous 1.56
Changes since revision 1.56: +21 -3 lines
Entirely remove exec_map from the kernel.  Use the new vm_fault_object_page()
for the data/bss special case in the elf loader, and use the objcache to
cache arguments for exec.

This in turn removes nearly all of the SMP page invalidation IPIs that
occur during fork/exec/exit/wait sequences by virtue of not needing to
map and unmap so much KVM.

Revision 1.56
Sun Apr 29 18:25:34 2007 UTC by dillon
Branches: MAIN
Diff to: previous 1.55
Changes since revision 1.55: +2 -1 lines
* Use SYSREF for vmspace structures.  This replaces the vmspace structure's
  roll-your-own refcnt implementation and replaces its zalloc backing store.
  Numerous procedures have been added to handle termination and DTOR
  operations and to properly interlock with vm_exitingcnt, all centered
  around the vmspace_sysref_class declaration.

* Replace pmap_activate() and pmap_deactivate() with pmap_replacevm().
  This replaces numerous instances where roll-your-own deactivate/activate
  sequences were being used, creating small windows of opportunity where
  an update to the kernel pmap would not be visible to running code.

* Properly deactivate pmaps and add assertions to that effect in the teardown
  code.  Cases had to be fixed in cpu_exit_switch(), the exec code, the
  AIO code, and a few other places.

* Add pmap_puninit() which is called as part of the DTOR sequence for
  vmspaces, allowing the kmem mapping and VM object to be recovered.
  We could not do this with the previous zalloc() implementation.

* Properly initialize the per-cpu sysid allocator (globaldata->gd_sysid_alloc).

Make the following adjustments to the LWP exiting code.

* P_WEXIT interlocks the master exiting thread, eliminating races which can
  occur when it is signaling the 'other' threads.

* LWP_WEXIT interlocks individual exiting threads, eliminating races which
  can occur there and streamlining some of the tests.

* Don't bother queueing the last LWP to the reaper.  Instead, just leave it
  in the p_lwps list (but still decrement nthreads), and add code to
  kern_wait() to reap the last thread.  This improves exit/wait performance
  for unthreaded applications.

* Fix a VMSPACE teardown race in the LWP code.  It turns out that it was
  still possible for the VMSPACE for an exiting LWP to be ripped out from
  under it by the reaper (due to a conditional that was really supposed to
  be a loop), or by kern_wait() (due to not waiting for all the LWPs to
  enter an exiting state).  The fix is to have the LWPs PHOLD() the process
  and then PRELE() it when they are reaped.

This is a little mixed up because the addition of SYSREF revealed a number
of other semi-related bugs in the pmap and LWP code which also had to be
fixed.

Revision 1.55
Sun Feb 25 23:17:12 2007 UTC by corecode
Branches: MAIN
Diff to: previous 1.54
Changes since revision 1.54: +7 -12 lines
Get rid of struct user/UAREA.

Merge procsig with sigacts and replace usage of procsig with
sigacts, like it used to be in 4.4BSD.

Put signal-related inline functions in sys/signal2.h.

Reviewed-by: Thomas E. Spanjaard <tgen@netphreax.net>

Revision 1.54
Sat Feb 24 14:25:06 2007 UTC by corecode
Branches: MAIN
Diff to: previous 1.53
Changes since revision 1.53: +2 -3 lines
1:1 Userland threading stage 4.4/4:

Implement killlwps() and flesh out lwp_exit().  Lwps which have set the
LWP_WEXIT flag will terminate themselves by calling lwp_exit() from
userret().

Reap lwps in a per-CPU taskqueue.  NOTE: Even the last lwp of an exiting
proc will be reaped by the task queue, effectively leaving a BARE proc
to reap in wait4().

In-collaboration-with: Thomas E. Spanjaard <tgen@netphreax.net>

Revision 1.53
Sat Feb 3 17:05:57 2007 UTC by corecode
Branches: MAIN
Diff to: previous 1.52
Changes since revision 1.52: +8 -3 lines
1:1 Userland threading stage 2.11/4:

Move signals into lwps, take p_lwp out of proc.

Originally-Submitted-by:  David Xu <davidxu@freebsd.org>
Reviewed-by: Thomas E. Spanjaard <tgen@netphreax.net>

Revision 1.52
Thu Dec 28 21:24:01 2006 UTC by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_8_Slip, DragonFly_RELEASE_1_8
Diff to: previous 1.51
Changes since revision 1.51: +2 -2 lines
Make kernel_map, buffer_map, clean_map, exec_map, and pager_map direct
structural declarations instead of pointers.  Clean up all related code,
in particular kmem_suballoc().

Remove the offset calculation for kernel_object.  kernel_object's page
indices used to be relative to the start of kernel virtual memory in order
to improve the performance of VM page scanning algorithms.  The optimization
is no longer needed now that VM objects use Red-Black trees.  Removal of
the offset simplifies a number of calculations and makes the code more
readable.

Revision 1.51
Sat Dec 23 00:35:04 2006 UTC by swildner
Branches: MAIN
Diff to: previous 1.50
Changes since revision 1.50: +5 -5 lines
Rename printf -> kprintf in sys/ and add some defines where necessary
(files which are used in userland, too).

Revision 1.50
Tue Nov 7 20:48:14 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.49
Changes since revision 1.49: +1 -1 lines
More Machine-dependent/Machine-independent code and header file separation.
Numerous machine interfaces have MI APIs and should be declared in MI headers
even though the routines are defined in MD sources.

* Improve rdtsc()'s API so it can be used in MI code.
* Add an explicit enable in machine/${MACHINE}/Makefile.inc for syscons/apm.
* Abstract <machine/reg.h> and provide a MI API for it via <sys/reg.h>.
* Move additional MI API calls from <machine/md_var.h> to <sys/systm.h>.

Revision 1.49
Tue Nov 7 17:51:23 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.48
Changes since revision 1.48: +3 -2 lines
Misc cleanups and CVS surgery.  Move a number of header and source files
from machine/pc32 to cpu/i386 as part of the ongoing architectural separation
work and do a bit of cleanup.

Revision 1.48
Fri Oct 27 04:56:31 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.47
Changes since revision 1.47: +4 -4 lines
Major namecache work primarily to support NULLFS.

* Move the nc_mount field out of the namecache{} record and use a new
  namecache handle structure called nchandle { mount, ncp } for all
  API accesses to the namecache.

* Remove all mount point linkages from the namecache topology.  Each mount
  now has its own namecache topology rooted at the root of the mount point.

  Mount points are flagged in their underlying filesystem's namecache
  topology but instead of linking the mount into the topology, the flag
  simply triggers a mountlist scan to locate the mount.  ".." is handled
  the same way... when the root of a topology is encountered the scan
  can traverse to the underlying filesystem via a field stored in the
  mount structure.

* Ref the mount structure based on the number of nchandle structures
  referencing it, and do not kfree() the mount structure during a forced
  unmount if refs remain.

These changes have the following effects:

* Traversal across mount points no longer requires locking of any sort,
  preventing process blockages occurring in one mount from leaking across
  a mount point to another mount.

* Aliased namespaces such as occurs with NULLFS no longer duplicate the
  namecache topology of the underlying filesystem.  Instead, a NULLFS
  mount simply shares the underlying topology (differentiating between
  it and the underlying topology by the fact that the name cache
  handles { mount, ncp } contain NULLFS's mount pointer).

  This saves an immense amount of memory and allows NULLFS to be used
  heavily within a system without creating any adverse impact on kernel
  memory or performance.

* Since the namecache topology for a NULLFS mount is shared with the
  underlying mount, the namecache records are in fact the same records
  and thus full coherency between the NULLFS mount and the underlying
  filesystem is maintained by design.

* Future efforts, such as a unionfs or shadow fs implementation, now
  have a mount structure to work with.  The new API is a lot more
  flexible than the old one.

Revision 1.47
Fri Oct 20 17:02:16 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.46
Changes since revision 1.46: +2 -4 lines
Add a ton of infrastructure for VKERNEL support.   Add code for intercepting
traps and system calls, for switching to and executing a foreign VM space,
and for accessing trap frames.

Revision 1.46
Tue Sep 19 11:47:35 2006 UTC by corecode
Branches: MAIN
Diff to: previous 1.45
Changes since revision 1.45: +1 -1 lines
1:1 Userland threading stage 2.9/4:

Push out p_thread a little bit more

Revision 1.45
Sun Sep 17 21:07:32 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.44
Changes since revision 1.44: +10 -0 lines
Make some adjustments to low level madvise/mcontrol/mmap support code to
accommodate vmspace_*() calls.

Reformulate the new vmspace_*() calls so they operate similarly to the
MAP_VPAGETABLE and mcontrol() calls.  This also makes vmspace's more
'programmable' in the sense that it will be possible to mix virtual
pagetable mmap()ings with other mmap()ing in a vmspace.

Fill in the code for all the new vmspace_*() calls except for
vmspace_ctl().  NOTE: vmspace calls are effectively disabled unless
vm.vkernel_enable is turned on, just like MAP_VPAGETABLE.

Renumber the new mcontrol() and vmspace_*() calls and regenerate.

Revision 1.44
Tue Sep 5 00:55:45 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.43
Changes since revision 1.43: +4 -4 lines
Rename malloc->kmalloc, free->kfree, and realloc->krealloc.  Pass 1

Revision 1.43
Sun Sep 3 18:29:16 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.42
Changes since revision 1.42: +1 -1 lines
Rename functions to avoid conflicts with libc.

Revision 1.42
Sun Sep 3 17:31:54 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.41
Changes since revision 1.41: +2 -2 lines
Rename functions to avoid conflicts with libc.

Revision 1.41
Sat Aug 12 00:26:20 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.40
Changes since revision 1.40: +2 -2 lines
VNode sequencing and locking - part 3/4.

VNode aliasing is handled by the namecache (aka nullfs), so there is no
longer a need to have VOP_LOCK, VOP_UNLOCK, or VOP_ISSLOCKED as 'VOP'
functions.  Both NFS and DEADFS have been using standard locking functions
for some time and are no longer special cases.  Replace all uses with
native calls to vn_lock, vn_unlock, and vn_islocked.

We can't have these as VOP functions anyhow because of the introduction of
the new SYSLINK transport layer, since vnode locks are primarily used to
protect the local vnode structure itself.

Revision 1.40
Mon Jun 5 07:26:10 2006 UTC by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_6_Slip, DragonFly_RELEASE_1_6
Diff to: previous 1.39
Changes since revision 1.39: +1 -1 lines
Modify kern/makesyscall.sh to prefix all kernel system call procedures
with "sys_".  Modify all related kernel procedures to use the new naming
convention.  This gets rid of most of the namespace overloading between
the kernel and standard header files.

Revision 1.39
Wed May 17 20:20:49 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.38
Changes since revision 1.38: +4 -8 lines
The ktracing code was not properly matching up VOP_OPEN and VOP_CLOSE calls.

Replace the p_tracep tracing vnode in struct proc with a pointer to
a ref-counted ktrace_node.  Ref the node instead of the vnode to prevent
the destruction of the vnode.
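
A minimal sketch of the ref-counting idea, under a simplified single-threaded
model. The names struct ktrace_node, ktrace_node_alloc(), ktrace_node_hold(),
ktrace_node_drop(), struct fake_vnode and fake_vnode_close() are illustrative
assumptions, not the actual DragonFly definitions. Every holder references
the node rather than the vnode, so the one open performed when the node is
created is matched by exactly one close when the last reference goes away.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the traced vnode: counts opens and closes. */
struct fake_vnode {
    int opens;
    int closes;
};

static void fake_vnode_close(struct fake_vnode *vp)
{
    vp->closes++;               /* stands in for VOP_CLOSE() plus release */
}

/* Hypothetical ref-counted wrapper shared by all processes tracing to vp. */
struct ktrace_node {
    struct fake_vnode *vp;
    int refs;                   /* not atomic: single-threaded sketch only */
};

static struct ktrace_node *ktrace_node_alloc(struct fake_vnode *vp)
{
    struct ktrace_node *tn = malloc(sizeof(*tn));

    if (tn == NULL)
        return NULL;
    vp->opens++;                /* exactly one open per node */
    tn->vp = vp;
    tn->refs = 1;
    return tn;
}

static struct ktrace_node *ktrace_node_hold(struct ktrace_node *tn)
{
    tn->refs++;
    return tn;
}

static void ktrace_node_drop(struct ktrace_node *tn)
{
    assert(tn->refs > 0);
    if (--tn->refs == 0) {
        fake_vnode_close(tn->vp);   /* the matching close, on last release */
        free(tn);
    }
}
```

Usage, again hypothetical: a second holder (for instance a traced child
process) calls ktrace_node_hold(); the vnode is closed only after both
holders have called ktrace_node_drop().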

Revision 1.38
Sat May 6 02:43:12 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.37
Changes since revision 1.37: +3 -4 lines
The thread/proc pointer argument in the VFS subsystem originally existed
for...  well, I'm not sure *WHY* it originally existed when most of the
time the pointer couldn't be anything other than curthread or curproc or
the code wouldn't work.  This is particularly true of lockmgr locks.

Remove the pointer argument from all VOP_*() functions, all fileops functions,
and most ioctl functions.

Revision 1.37
Fri May 5 21:15:08 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.36
Changes since revision 1.36: +2 -2 lines
Simplify vn_lock(), VOP_LOCK(), and VOP_UNLOCK() by removing the thread_t
argument.  These calls now always use the current thread as the lockholder.
Passing a thread_t to these functions has always been questionable at best.

Revision 1.36
Wed Mar 29 18:44:50 2006 UTC by dillon
Branches: MAIN
Diff to: previous 1.35
Changes since revision 1.35: +5 -13 lines
Remove VOP_GETVOBJECT, VOP_DESTROYVOBJECT, and VOP_CREATEVOBJECT.  Rearrange
the VFS code such that VOP_OPEN is now responsible for associating a VM
object with a vnode.  Add the vinitvmio() helper routine.

Revision 1.35
Sun Oct 9 20:12:34 2005 UTC by corecode
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_4_Slip, DragonFly_RELEASE_1_4
Diff to: previous 1.34
Changes since revision 1.34: +1 -1 lines
1:1 Userland threading stage 2.6/4:

Retire p_upcall compat and use lwp_upcall instead.

Revision 1.34
Sat Oct 8 14:31:26 2005 UTC by corecode
Branches: MAIN
Diff to: previous 1.33
Changes since revision 1.33: +1 -0 lines
1:1 Userland threading stage 2.3/4:

Use p_comm instead of p_thread->td_comm.

Revision 1.33
Wed Jun 22 19:40:35 2005 UTC by dillon
Branches: MAIN
Diff to: previous 1.32
Changes since revision 1.32: +31 -2 lines
Randomize the initial stack pointer for a user process.    Introduce a
sysctl, kern.stackgap_random, to specify the random range.  The value must
be a power of two.

Submitted-by: Craig Dooley <xlnxminusx@gmail.com>
Adapted-from: OpenBSD
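
Requiring a power of two lets the random range be applied with a mask
instead of a division. The sketch below is a user-space illustration under
stated assumptions: rand() is used as the entropy source, STACK_ALIGN is an
assumed 16-byte alignment, and is_power_of_two() and randomized_stack_top()
are hypothetical helpers, not code from kern_exec.c. It simply skips
randomization when the range is not a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define STACK_ALIGN 16          /* assumed stack alignment for this sketch */

/* Nonzero and a power of two, as the commit message requires. */
static int is_power_of_two(unsigned long x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper: lower an aligned initial stack pointer by a random
 * amount in [0, gap_range). Masking with (gap_range - 1) keeps the value
 * inside the range only because gap_range is a power of two.
 */
static uintptr_t randomized_stack_top(uintptr_t stack_top, unsigned long gap_range)
{
    unsigned long gap = 0;

    assert(stack_top % STACK_ALIGN == 0);
    if (is_power_of_two(gap_range))
        gap = (unsigned long)rand() & (gap_range - 1);
    gap &= ~(unsigned long)(STACK_ALIGN - 1);   /* keep the result aligned */
    return stack_top - gap;
}
```

With a range of 1024 the returned pointer lies between stack_top - 1008 and
stack_top and stays 16-byte aligned; a non-power-of-two range such as 1000
leaves the stack top unchanged in this sketch.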

Revision 1.32
Wed Apr 20 16:37:09 2005 UTC by cpressey
Branches: MAIN
CVS tags: DragonFly_Stable
Diff to: previous 1.31
Changes since revision 1.31: +4 -8 lines
Style(9) cleanup: use ANSI format for function definitions.

Revision 1.31
Wed Mar 2 18:42:08 2005 UTC by hmp
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_2_Slip, DragonFly_RELEASE_1_2
Diff to: previous 1.30
Changes since revision 1.30: +1 -1 lines
Rename the flags for sf_buf_alloc(9) to be in line with FreeBSD:

	SFBA_PCATCH	-> SFB_CATCH
	SFBA_QUICK	-> SFB_CPUPRIVATE

Discussed-with:	Matthew Dillon <dillon at apollo.backplane.com>

Revision 1.30
Sat Jan 29 20:54:20 2005 UTC by dillon
Branches: MAIN
Diff to: previous 1.29
Changes since revision 1.29: +38 -35 lines
Redo argv processing to better conform to standards.  A NULL argv is no
longer allowed.  If argv[0] is NULL, we still pass a filename for argv[0]
to the underlying program but we no longer attempt to process any further
arguments.  Also rewrite the loop and get rid of the goto to make the code
more readable.

Suggested-by: Maxim Sobolev <sobomax@portaone.com>
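
The argv rules above can be sketched as a small user-space function. The
name copy_exec_argv(), the EFAULT returned for a NULL argv, the E2BIG
returned on overflow, and the fixed-capacity output array are assumptions
made for illustration; the commit message only says that a NULL argv is
rejected, that a NULL argv[0] is replaced by the filename with no further
arguments processed, and that the loop no longer uses a goto.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical sketch: fill out[] (capacity max_out) with the argument
 * vector the new program will see and store its length in *argc_out.
 * Returns 0 or an errno value.
 */
static int copy_exec_argv(const char *fname, char *const argv[],
                          const char *out[], size_t max_out, size_t *argc_out)
{
    size_t n = 0;

    assert(fname != NULL && out != NULL && argc_out != NULL);
    if (argv == NULL)
        return EFAULT;                  /* a NULL argv is not allowed */
    if (max_out == 0)
        return E2BIG;
    if (argv[0] == NULL) {
        out[n++] = fname;               /* pass the filename as argv[0] only */
    } else {
        while (argv[n] != NULL) {       /* plain loop instead of a goto */
            if (n == max_out)
                return E2BIG;
            out[n] = argv[n];
            n++;
        }
    }
    *argc_out = n;
    return 0;
}
```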

Revision 1.29
Fri Nov 12 00:09:23 2004 UTC by dillon
Branches: MAIN
Diff to: previous 1.28
Changes since revision 1.28: +45 -29 lines
VFS messaging/interfacing work stage 9/99: VFS 'NEW' API WORK.

NOTE: unionfs and nullfs are temporarily broken by this commit.

* Remove the old namecache API.  Remove vfs_cache_lookup(), cache_lookup(),
  cache_enter(), namei() and lookup() are all gone.  VOP_LOOKUP() and
  VOP_CACHEDLOOKUP() have been collapsed into a single non-caching
  VOP_LOOKUP().

* Complete the new VFS CACHE (namecache) API.  The new API is able to
  supply topological guarantees and is able to reserve namespaces,
  including negative cache spaces (whether the target name exists or not),
  which the new API uses to reserve namespace for things like NRENAME
  and NCREATE (and others).

* Complete the new namecache API.  VOP_NRESOLVE, NLOOKUPDOTDOT, NCREATE,
  NMKDIR, NMKNOD, NLINK, NSYMLINK, NWHITEOUT, NRENAME, NRMDIR, NREMOVE.
  These new calls take (typically locked) namecache pointers rather than
  combinations of directory vnodes, file vnodes, and name components.  The
  new calls are *MUCH* simpler in concept and implementation.  For example,
  VOP_RENAME() has 8 arguments while VOP_NRENAME() has only 3 arguments.

  The new namecache API uses the namecache to lock namespaces without having
  to lock the underlying vnodes.  For example, this allows the kernel
  to reserve the target name of a create function trivially.  Namecache
  records are maintained BY THE KERNEL for both positive and negative hits.

  Generally speaking, the kernel layer is now responsible for resolving
  path elements.  NRESOLVE is called when an unresolved namecache record
  needs to be resolved.  Unlike the old VOP_LOOKUP, NRESOLVE is simply
  responsible for associating a vnode to a namecache record (positive hit)
  or telling the system that it's a negative hit, and not responsible for
  handling symlinks or other special cases or doing any of the other
  path lookup work, much unlike the old VOP_LOOKUP.

  It should be particularly noted that the new namecache topology does not
  allow disconnected namecache records.  In rare cases where a vnode must
  be converted to a namecache pointer for new API operation via a file handle
  (i.e. NFS), the cache_fromdvp() function is provided and a new API VOP,
  VOP_NLOOKUPDOTDOT() is provided to allow the namecache to resolve the
  topology leading up to the requested vnode.  These and other topological
  guarantees greatly reduce the complexity of the new namecache API.

  The new namei() is called nlookup().  This function uses a combination
  of cache_n*() calls, VOP_NRESOLVE(), and standard VOP calls to resolve the
  supplied path, deal with symlinks, and so forth, in a nice small compact
  compartmentalized procedure.

* The old VFS code is no longer responsible for maintaining namecache records,
  a function which was mostly ad hoc cache_purge()s occurring before the VFS
  actually knows whether an operation will succeed or not.

  The new VFS code is typically responsible for adjusting the state of
  locked namecache records passed into it.  For example, if NCREATE succeeds
  it must call cache_setvp() to associate the passed namecache record with
  the vnode representing the successfully created file.  The new requirements
  are much less complex than the old requirements.

* Most VFSs still implement the old API calls, albeit somewhat modified
  and in particular the VOP_LOOKUP function is now *MUCH* simpler.  However,
  the kernel now uses the new API calls almost exclusively and relies on
  compatibility code installed in the default ops (vop_compat_*()) to
  convert the new calls to the old calls.

* All kernel system calls and related support functions which used to do
  complex and confusing namei() operations now do far less complex and
  far less confusing nlookup() operations.

* SPECOPS shortcutting has been implemented.  User reads and writes now go
  directly to supporting functions which talk to the device via fileops
  rather than having to be routed through VOP_READ or VOP_WRITE, saving
  significant overhead.  Note, however, that these only really affect
  /dev/null and /dev/zero.

  Implementing this was fairly easy; we now simply pass an optional
  struct file pointer to VOP_OPEN() and let spec_open() handle the
  override.

SPECIAL NOTES: It should be noted that we must still lock a directory vnode
LK_EXCLUSIVE before issuing a VOP_LOOKUP(), even for simple lookups, because
a number of VFS's (including UFS) store active directory scanning information
in the directory vnode.  The legacy NAMEI_LOOKUP cases can be changed to
use LK_SHARED once these VFS cases are fixed.  In particular, we are now
organized well enough to actually be able to do record locking within a
directory for handling NCREATE, NDELETE, and NRENAME situations, but it hasn't
been done yet.

Many thanks to all of the testers and in particular David Rhodus for
finding a large number of panics and other issues.

Revision 1.28
Tue Oct 12 19:20:46 2004 UTC by dillon
Branches: MAIN
Diff to: previous 1.27
Changes since revision 1.27: +10 -9 lines
VFS messaging/interfacing work stage 8/99: Major reworking of the vnode
interlock and other miscellaneous things.  This patch also fixes FS
corruption due to prior vfs work in head.  In particular, prior to this
patch the namecache locking could introduce blocking conditions that
confuse the old vnode deactivation and reclamation code paths.  With
this patch there appear to be no serious problems even after two days
of continuous testing.

* VX lock all VOP_CLOSE operations.
* Fix two NFS issues.  There was an incorrect assertion (found by
  David Rhodus), and the nfs_rename() code was not properly
  purging the target file from the cache, resulting in Stale file
  handle errors during, e.g., a buildworld with an NFS-mounted /usr/obj.
* Fix a TTY session issue.  Programs which open("/dev/tty", ...) and
  then run the TIOCNOTTY ioctl were causing the system to lose track
  of the open count, preventing the tty from properly detaching.
  This is actually a very old BSD bug, but it came out of the woodwork
  in DragonFly because I am now attempting to track device opens
  explicitly.
* Gets rid of the vnode interlock.  The lockmgr interlock remains.
* Introduced VX locks, which are mandatory vp->v_lock based locks.
* Rewrites the locking semantics for deactivation and reclamation.
  (A ref'd VX lock'd vnode is now required for vgone(), VOP_INACTIVE,
  and VOP_RECLAIM).  New guarantees are in place with regard to vnode
  ripouts.
* Recodes the mountlist scanning routines to close timing races.
* Recodes getnewvnode to close timing races (it now returns a
  VX locked and refd vnode rather than a refd but unlocked vnode).
* Recodes VOP_REVOKE- a locked vnode is now mandatory.
* Recodes all VFS inode hash routines to close timing holes.
* Removes cache_leaf_test() - vnodes representing intermediate
  directories are now held so the leaf test should no longer be
  necessary.
* Splits the over-large vfs_subr.c into three additional source
  files, broken down by major function (locking, mount related,
  filesystem syncer).

* Changes splvm() protection to a critical-section in a number of
  places (bleedover from another patch set which is also about to be
  committed).

Known issues not yet resolved:

* Possible vnode/namecache deadlocks.
* While most filesystems now use vp->v_lock, I haven't done a final
  pass to make vp->v_lock mandatory and to clean up the few remaining
  inode based locks (nwfs I think and other obscure filesystems).
* NullFS gets confused when you hit a mount point in the underlying
  filesystem.
* Only UFS and NFS have been well tested.
* NFS is not properly timing out namecache entries, causing changes made
  on the server to not be properly detected on the client if the client
  already has a negative-cache hit for the filename in question.

Testing-by: David Rhodus <sdrhodus@gmail.com>,
	    Peter Kadau <peter.kadau@tuebingen.mpg.de>,
	    walt <wa1ter@myrealbox.com>,
	    others

Revision 1.27
Thu May 13 17:40:15 2004 UTC by dillon
Branches: MAIN
CVS tags: DragonFly_Snap29Sep2004, DragonFly_Snap13Sep2004, DragonFly_1_0_REL, DragonFly_1_0_RC1, DragonFly_1_0A_REL
Diff to: previous 1.26
Changes since revision 1.26: +32 -18 lines
Close an interrupt race between vm_page_lookup() and (typically) a
vm_page_sleep_busy() check by using the correct spl protection.
An interrupt can occur in between the two operations and unbusy/free
the page in question, causing the busy check to fail and for the code
to fall through and then operate on a page that may have been freed
and possibly even reused.   Also note that vm_page_grab() had the same
issue between the lookup, busy check, and vm_page_busy() call.

Close an interrupt race when scanning a VM object's memq.  Interrupts
can free pages, removing them from memq, which interferes with memq scans
and can cause a page unassociated with the object to be processed as if it
were associated with the object.

Calls to vm_page_hold() and vm_page_unhold() require spl protection.

Rename the passed socket descriptor argument in sendfile() to make the
code more readable.

Fix several serious bugs in procfs_rwmem().  In particular, force it to
block if a page is busy and then retry.

Get rid of vm_pager_map_page() and vm_pager_unmap_page(), and make the functions
that used to use these routines use SFBUFs instead.

Get rid of the (userland?) 4MB page mapping feature in pmap_object_init_pt()
for now.  The code appears to not track the page directory properly and
could result in a non-zero page being freed as PG_ZERO.

This commit also includes updated code comments and some additional
non-operational code cleanups.

Revision 1.26: download - view: text, markup, annotated - select for diffs
Mon May 10 10:37:46 2004 UTC (10 years, 5 months ago) by hmp
Branches: MAIN
Diff to: previous 1.25: preferred, unified
Changes since revision 1.25: +1 -1 lines
Remove redundant newline in a call to panic(9).

Revision 1.25: download - view: text, markup, annotated - select for diffs
Sat Apr 24 04:32:03 2004 UTC (10 years, 6 months ago) by drhodus
Branches: MAIN
Diff to: previous 1.24: preferred, unified
Changes since revision 1.24: +1 -1 lines
Remove the VREF() macro and uses of it.
Remove spaces (0x20) that appear before tabs (^I) in vnode.h.

Revision 1.24: download - view: text, markup, annotated - select for diffs
Sat Apr 24 04:09:21 2004 UTC (10 years, 6 months ago) by drhodus
Branches: MAIN
Diff to: previous 1.23: preferred, unified
Changes since revision 1.23: +4 -1 lines
Count statistics for exec calls.

Revision 1.23: download - view: text, markup, annotated - select for diffs
Mon Apr 19 20:07:16 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.22: preferred, unified
Changes since revision 1.22: +2 -2 lines
Use vm_page_hold() instead of vm_page_wire() for exec's mapping of the first
text page.  vm_page_hold() is cheaper.

Taken-From: Alan Cox / FreeBSD

Revision 1.22: download - view: text, markup, annotated - select for diffs
Sun Apr 11 00:10:30 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.21: preferred, unified
Changes since revision 1.21: +13 -27 lines
Use the sf_buf facility rather than kmem_alloc_wait/pmap_kenter/kmem_free
to map the first page of a binary into memory during an exec.  This results
in 5-10% lower execl() overhead and approximately 2% lower sys time usage
in a buildworld (~15-20 second build time reduction on an AMD64/3200+),
with less code.  And it's a nice cleanup as well.

Patch-by: Alan Cox <alc@cs.rice.edu>

Revision 1.21: download - view: text, markup, annotated - select for diffs
Fri Mar 12 23:09:36 2004 UTC (10 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.20: preferred, unified
Changes since revision 1.20: +6 -1 lines
In an rfork'd or vfork'd situation where multiple processes are sharing
the same vmspace, and one process goes zombie, the vmspace's vm_exitingcnt
will be non-zero.  If another process then forks or execs the exitingcnt will
be improperly inherited by the new vmspace.  The solution is to not copy
exitingcnt when copying to a new vmspace.
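
A minimal C sketch of the idea behind this fix, assuming a hypothetical, heavily simplified struct vmspace_sketch and helper vmspace_copy_sketch() (neither is the real DragonFly code): ordinary state is copied into the new vmspace, but the exiting count describes sharers of the old vmspace and therefore starts at zero in the copy.

```c
#include <string.h>

/* Hypothetical stand-in for struct vmspace; only illustrative fields. */
struct vmspace_sketch {
	unsigned long vm_tsize;   /* state that is legitimately copied */
	int vm_exitingcnt;        /* exited-but-unreaped sharers of THIS vmspace */
};

/* Copy src into dst without inheriting src's exiting count. */
static void
vmspace_copy_sketch(struct vmspace_sketch *dst,
    const struct vmspace_sketch *src)
{
	memcpy(dst, src, sizeof(*dst));
	dst->vm_exitingcnt = 0;   /* the new vmspace has no zombie sharers yet */
}
```

The old vmspace keeps its own count untouched, so its eventual teardown still waits for its own zombies.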

Additionally, for DragonFly, I also had to fix a few cases where the upcall
list was also being improperly inherited.

Heads-up-by: Xin LI <delphij@frontfree.net>
Obtained-From: Peter Wemm <peter@wemm.org> (FreeBSD-5)

Revision 1.20: download - view: text, markup, annotated - select for diffs
Mon Mar 1 06:33:17 2004 UTC (10 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.19: preferred, unified
Changes since revision 1.19: +2 -2 lines
Newtoken commit.  Change the token implementation as follows:  (1) Obtaining
a token no longer enters a critical section.  (2) tokens can be held through
scheduler switches and blocking conditions and are effectively released and
reacquired on resume.  Thus tokens serialize access only while the thread
is actually running.  Serialization is not broken by preemptive interrupts.
That is, interrupt threads which preempt do not release the preempted thread's
tokens.  (3) Unlike spl's, tokens will interlock w/ interrupt threads on
the same or on a different cpu.

The vnode interlock code has been rewritten and the API has changed.  The
mountlist vnode scanning code has been consolidated and all known races have
been fixed.  The vnode interlock is now a pool token.

The code that frees unreferenced vnodes whose last VM page has been freed has
been moved out of the low level vm_page_free() code and into the
periodic filesystem syncer code in vfs_msync().

The SMP startup code and the IPI code have been cleaned up considerably.
Certain early token interactions on AP cpus have been moved to the BSP.

The LWKT rwlock API has been cleaned up and turned on.

Major testing by: David Rhodus

Revision 1.19: download - view: text, markup, annotated - select for diffs
Tue Jan 20 18:41:51 2004 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.18: preferred, unified
Changes since revision 1.18: +25 -8 lines
Resident executable support stage 1/4: Add kernel bits and syscall support
for in-kernel caching of vmspace structures.  The main purpose of this
feature is to make it possible to run dynamically linked programs as fast
as if they were statically linked, by vmspace_fork()ing their vmspace and
saving the copy in the kernel, then using that whenever the program is
exec'd.

Revision 1.18: download - view: text, markup, annotated - select for diffs
Thu Jan 8 18:39:18 2004 UTC (10 years, 9 months ago) by asmodai
Branches: MAIN
Diff to: previous 1.17: preferred, unified
Changes since revision 1.17: +1 -1 lines
Spell 'separate' and its siblings the way they are supposed to be spelled.

Revision 1.17: download - view: text, markup, annotated - select for diffs
Tue Nov 18 01:15:42 2003 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.16: preferred, unified
Changes since revision 1.16: +2 -5 lines
Backout part of 1.16.  It is not necessary to align the stack at this
point.  A hack to align the stack already exists in lib/csu/i386-elf/crt1.c.

After-Discussions-With: Bruce Evans <bde@zeta.org.au>

Revision 1.16: download - view: text, markup, annotated - select for diffs
Sun Nov 16 19:32:31 2003 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.15: preferred, unified
Changes since revision 1.15: +13 -10 lines
Clean up aux args and 32-byte align the initial user stack pointer.  Note that
newer GCCs use masking ops on the stack pointer and do not need an aligned
stack pointer, but older GCCs will benefit and, besides, it doesn't hurt.
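
The alignment itself is a single mask operation. A minimal C sketch, assuming a hypothetical helper name and a power-of-two alignment; rounding down is safe because the user stack grows toward lower addresses, so the adjusted pointer never overlaps data already copied out above it. (Revision 1.17 later backed this part out in favor of the existing hack in lib/csu/i386-elf/crt1.c.)

```c
#include <stdint.h>

/*
 * Round a downward-growing stack pointer down to an 'align'-byte
 * boundary.  'align' must be a power of two for the mask to be valid.
 */
static uintptr_t
align_stack_down(uintptr_t sp, uintptr_t align)
{
	return sp & ~(align - 1);   /* clear the low log2(align) bits */
}
```

With align = 32 the low five bits are cleared, so at most 31 bytes of padding are introduced below the copied-out arguments.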

Submitted-by: Alexander Leidinger

Revision 1.15: download - view: text, markup, annotated - select for diffs
Sun Nov 16 02:37:39 2003 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.14: preferred, unified
Changes since revision 1.14: +27 -19 lines
Correct bugs introduced in the last syscall separation commit.  The
argument list for shell scripts was not being properly rewritten, corrupting
the argv list and environment.

Revision 1.14: download - view: text, markup, annotated - select for diffs
Wed Nov 12 01:00:33 2003 UTC (10 years, 11 months ago) by daver
Branches: MAIN
Diff to: previous 1.13: preferred, unified
Changes since revision 1.13: +160 -87 lines
Split execve().  This required some interesting changes to the shell
image activation code and the image_params structure.

Userland pointers are no longer passed in the image_params structure.
The exec_copyin_args() function now pulls the arguments, environment
and filename of the target being execve()'d into a kernel space buffer
before calling kern_execve().

The exec_shell_imgact() function does some magic to prepend the
interpreter arguments.
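
The argument rewrite performed for a "#!" script can be sketched as follows. This is a hypothetical helper, not the actual exec_shell_imgact() code; it assumes the conventional BSD behavior in which the interpreter, its optional argument, and the script path replace the original argv[0], with the remaining original arguments following unchanged. For simplicity it handles at most one interpreter argument. For example, running ./script a b where the script begins with #!/bin/sh -x yields /bin/sh -x ./script a b.

```c
#include <stddef.h>

/*
 * Hypothetical helper: build the argv for an interpreter script into
 * 'out'.  Returns the new argc, or -1 if 'out' has too few slots.
 */
static int
shell_argv_sketch(const char *interp, const char *interp_arg,
    const char *script_path, int argc, const char *const argv[],
    const char *out[], int out_max)
{
	int n = 0;
	/* interpreter + script path, optional interp arg, argv[1..] */
	int need = 2 + (interp_arg != NULL) + (argc > 1 ? argc - 1 : 0);

	if (need > out_max)
		return -1;
	out[n++] = interp;
	if (interp_arg != NULL)
		out[n++] = interp_arg;
	out[n++] = script_path;           /* replaces the original argv[0] */
	for (int i = 1; i < argc; i++)
		out[n++] = argv[i];       /* original arguments, in order */
	return n;
}
```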

Revision 1.13: download - view: text, markup, annotated - select for diffs
Wed Nov 5 23:26:20 2003 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.12: preferred, unified
Changes since revision 1.12: +5 -0 lines
Variant symlink support stage 1/2: Implement support for storing and retrieving
system-specific, user-specific, and process-specific variables.

Revision 1.12: download - view: text, markup, annotated - select for diffs
Tue Sep 23 05:03:51 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.11: preferred, unified
Changes since revision 1.11: +3 -2 lines
namecache work stage 1: namespace cleanups.  Add a NAMEI_ prefix to
CREATE, LOOKUP, DELETE, and RENAME.  Add a CNP_ prefix to all the name
lookup flags (nd_flags) e.g. ISDOTDOT->CNP_ISDOTDOT.

Revision 1.11: download - view: text, markup, annotated - select for diffs
Tue Aug 26 21:09:02 2003 UTC (11 years, 2 months ago) by rob
Branches: MAIN
Diff to: previous 1.10: preferred, unified
Changes since revision 1.10: +2 -2 lines
__P() removal

Revision 1.10: download - view: text, markup, annotated - select for diffs
Wed Aug 20 04:44:54 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.9: preferred, unified
Changes since revision 1.9: +7 -0 lines
Linux needs %edx to be 0 on entry.  It registers it as an atexit function if
it isn't NULL.  I thought I had maintained this but I forgot that the
syscall exit code is loaded into %eax,%edx after an execve().  This fixes the
problem.
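
Why a stale value in %edx matters can be seen from the startup side. A minimal C sketch of what the commit message describes, assuming a hypothetical crt_register_fini_sketch() that receives the %edx value as a function pointer (on i386 Linux this is the dynamic linker's finalizer): a non-NULL value is registered with atexit(), so leftover syscall return data would later be called as a function at exit.

```c
#include <stdlib.h>

/* Type of the finalizer the %edx value is assumed to hold. */
typedef void (*fini_fn)(void);

/*
 * Hypothetical sketch of Linux-style startup code: register the value
 * that arrived in %edx as an exit handler unless it is NULL.
 * Returns 1 if a handler was registered, 0 otherwise.
 */
static int
crt_register_fini_sketch(fini_fn edx_value)
{
	if (edx_value == NULL)
		return 0;                 /* kernel zeroed %edx: nothing to do */
	return atexit(edx_value) == 0;
}
```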

Revision 1.9: download - view: text, markup, annotated - select for diffs
Fri Aug 8 21:47:49 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.8: preferred, unified
Changes since revision 1.8: +6 -2 lines
Add a few missing cratom() calls.  In particular the code in kern_exec()
was modifying the svuid of the parent process when a child exec()'d, which
broke the 'man' program (and probably other things).

Reported-by: Jeroen Ruigrok/asmodai <asmodai@wxs.nl>

Revision 1.8: download - view: text, markup, annotated - select for diffs
Thu Jul 24 01:41:25 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.7: preferred, unified
Changes since revision 1.7: +0 -8 lines
Preliminary syscall messaging work.  Adjust all <syscall>_args structures
to include an lwkt_msg at their base which will eventually allow syscalls
to run asynch.  Note that this is for the kernel copy of the arguments, the
userland argument format has not changed for the standard syscall entry
point.

Begin abstracting a messaging syscall interface (#if 0'd out at the moment).

Change the syscall2 entry point to take the new expanded argument structure
into account.  Change sysent argument calculation (AS macro) to take the
new expanded argument structure into account.

Note: existing linux, svr4, and ibcs2 emulation may break with this commit,
though it is not intentional.

Revision 1.7: download - view: text, markup, annotated - select for diffs
Wed Jul 23 07:14:18 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.6: preferred, unified
Changes since revision 1.6: +2 -3 lines
2003-07-22 Hiten Pandya <hmp@nxad.com>

        * MFC FreeBSD rev. 1.189 of kern_exit.c (DONE)
          (shmexit to take vmspace instead of proc)
          (sort the sys/lock.h include in vm_map.c too)

        * MFC FreeBSD rev. 1.143 of kern_sysctl.c (DONE)
          (don't panic if sysctl is unregistrable)

        * Don't panic when enumerating SYSCTL_NODE()
          without children nodes. (DONE)

        * MFC FreeBSD rev. 1.113 of kern_sysctl.c (DONE)
          (Fix ogetkerninfo() handling  for KINFO_BSD_SYSINFO)

        * MFC FreeBSD rev. 1.103 of kern_sysctl.c (DONE)
          (Never reuse AUTO_OID values)

        * MFC FreeBSD rev 1.21 of i386/include/bus_dma.h
          (BUS_DMAMEM_NOSYNC -> BUS_DMA_COHERENT)

        * MFC FreeBSD rev. 1.19 of i386/include/bus_dma.h (DONE)
          (Implement real read/write barriers for i386)

Submitted by: Hiten Pandya <hmp@FreeBSD.ORG>

Revision 1.6: download - view: text, markup, annotated - select for diffs
Thu Jun 26 05:55:14 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
CVS tags: PRE_MP
Diff to: previous 1.5: preferred, unified
Changes since revision 1.5: +2 -2 lines
proc->thread stage 5:  BUF/VFS clearance!  Remove the ucred argument from
vop_close, vop_getattr, vop_fsync, and vop_createvobject.  These VOPs can
be called from multiple contexts so the cred is fairly useless, and UFS
ignores it anyway.  For filesystems (like NFS) that sometimes need a cred
we use proc0.p_ucred for now.

This removal also removed the need for a 'proc' reference in the related
VFS procedures, which greatly helps our proc->thread conversion.

bp->b_wcred and bp->b_rcred have also been removed, and for the same reason.
It makes no sense to have a particular cred when multiple users can
access a file.  This may create issues with certain types of NFS mounts
but if it does we will solve them in a way that doesn't pollute the
struct buf.

Revision 1.5: download - view: text, markup, annotated - select for diffs
Thu Jun 26 02:17:45 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.4: preferred, unified
Changes since revision 1.4: +2 -2 lines
Introduce cratom(), remove crcopy().

Revision 1.4: download - view: text, markup, annotated - select for diffs
Wed Jun 25 03:55:57 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.3: preferred, unified
Changes since revision 1.3: +13 -10 lines
proc->thread stage 4: rework the VFS and DEVICE subsystems to take thread
pointers instead of process pointers as arguments, similar to what FreeBSD-5
did.  Note however that ultimately both APIs are going to be message-passing
which means the current thread context will not be usable for creds and
descriptor access.

Revision 1.3: download - view: text, markup, annotated - select for diffs
Mon Jun 23 17:55:41 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.2: preferred, unified
Changes since revision 1.2: +12 -14 lines
proc->thread stage 2: MAJOR revamping of system calls, ucred, jail API,
and some work on the low level device interface (proc arg -> thread arg).
As -current did, I have removed p_cred and incorporated its functions
into p_ucred.  p_prison has also been moved into p_ucred and adjusted
accordingly.  The jail interface tests now use ucreds rather than processes.

The syscall(p,uap) interface has been changed to just (uap).  This is inclusive
of the emulation code.  It makes little sense to pass a proc pointer around
which confuses the MP readability of the code, because most system call code
will only work with the current process anyway.  Note that eventually
*ALL* syscall emulation code will be moved to a kernel-protected userland
layer because it really makes no sense whatsoever to implement these
emulations in the kernel.

suser() now takes no arguments and only operates with the current process.
The process argument has been removed from suser_xxx() so it now just takes
a ucred and flags.

The sysctl interface was adjusted somewhat.

Revision 1.2: download - view: text, markup, annotated - select for diffs
Tue Jun 17 04:28:41 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.1: preferred, unified
Changes since revision 1.1: +1 -0 lines
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids.  Most
ids have been removed from !lint sections and moved into comment sections.

Revision 1.1: download - view: text, markup, annotated - select for diffs
Tue Jun 17 02:55:00 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
CVS tags: FREEBSD_4_FORK
import from FreeBSD RELENG_4 1.107.2.15
