DragonFly BSD

CVS log for src/sys/vm/vm_map.c

Keyword substitution: kv
Default branch: MAIN


Revision 1.56: download - view: text, markup, annotated - select for diffs
Sun Apr 29 18:25:41 2007 UTC (7 years, 7 months ago) by dillon
Branches: MAIN
CVS tags: HEAD, DragonFly_RELEASE_2_0_Slip, DragonFly_RELEASE_2_0, DragonFly_RELEASE_1_12_Slip, DragonFly_RELEASE_1_12, DragonFly_RELEASE_1_10_Slip, DragonFly_RELEASE_1_10, DragonFly_Preview
Diff to: previous 1.55: preferred, unified
Changes since revision 1.55: +113 -72 lines
* Use SYSREF for vmspace structures.  This replaces the vmspace structure's
  roll-your-own refcnt implementation and its zalloc backing store.
  Numerous procedures have been added to handle termination and DTOR
  operations and to properly interlock with vm_exitingcnt, all centered
  around the vmspace_sysref_class declaration.

* Replace pmap_activate() and pmap_deactivate() with a new pmap_replacevm().
  This replaces numerous instances where roll-your-own deactivate/activate
  sequences were being used, creating small windows of opportunity where
  an update to the kernel pmap would not be visible to running code.

* Properly deactivate pmaps and add assertions to that effect in the teardown
  code.  Cases had to be fixed in cpu_exit_switch(), the exec code, the
  AIO code, and a few other places.

* Add pmap_puninit() which is called as part of the DTOR sequence for
  vmspaces, allowing the kmem mapping and VM object to be recovered.
  We could not do this with the previous zalloc() implementation.

* Properly initialize the per-cpu sysid allocator (globaldata->gd_sysid_alloc).

Make the following adjustments to the LWP exiting code.

* P_WEXIT interlocks the master exiting thread, eliminating races which can
  occur when it is signaling the 'other' threads.

* LWP_WEXIT interlocks individual exiting threads, eliminating races which
  can occur there and streamlining some of the tests.

* Don't bother queueing the last LWP to the reaper.  Instead, just leave it
  in the p_lwps list (but still decrement nthreads), and add code to
  kern_wait() to reap the last thread.  This improves exit/wait performance
  for unthreaded applications.

* Fix a VMSPACE teardown race in the LWP code.  It turns out that it was
  still possible for the VMSPACE for an exiting LWP to be ripped out from
  under it by the reaper (due to a conditional that was really supposed to
  be a loop), or by kern_wait() (due to not waiting for all the LWPs to
  enter an exiting state).  The fix is to have the LWPs PHOLD() the process
  and then PRELE() it when they are reaped (see the sketch below).

This is a little mixed up because the addition of SYSREF revealed a number
of other semi-related bugs in the pmap and LWP code which also had to be
fixed.
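
The PHOLD()/PRELE() interlock can be modeled in a few lines of userland C.
A toy sketch only - the struct layout and the teardown test are invented
for illustration, just the hold-count idea mirrors the kernel:

    #include <assert.h>
    #include <stdio.h>

    struct proc {
            int p_holds;            /* outstanding PHOLD() references */
            int p_nthreads;         /* LWPs not yet reaped */
    };

    #define PHOLD(p)        ((p)->p_holds++)
    #define PRELE(p)        ((p)->p_holds--)

    /* The reaper may tear down the vmspace only when nothing holds us. */
    static int
    teardown_ok(const struct proc *p)
    {
            return (p->p_holds == 0 && p->p_nthreads == 0);
    }

    int
    main(void)
    {
            struct proc p = { 0, 1 };

            PHOLD(&p);                      /* exiting LWP pins the process */
            p.p_nthreads = 0;               /* ...the LWP finishes exiting */
            assert(!teardown_ok(&p));       /* reaper must wait */
            PRELE(&p);                      /* LWP is reaped */
            assert(teardown_ok(&p));
            printf("teardown now safe\n");
            return 0;
    }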

Revision 1.55: download - view: text, markup, annotated - select for diffs
Sun Jan 7 08:37:37 2007 UTC (7 years, 11 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_8_Slip, DragonFly_RELEASE_1_8
Diff to: previous 1.54: preferred, unified
Changes since revision 1.54: +3 -0 lines
Implement nearly all the remaining items required to allow the virtual kernel
to actually execute code on behalf of a virtualized user process.  The
virtual kernel is now able to execute the init binary through to the point
where it sets up a TLS segment.

* Create a pseudo tf_trapno called T_SYSCALL80 to indicate system call traps.

* Add MD shims when creating or destroying a struct vmspace, allowing the
  virtual kernel to create and destroy real-kernel vmspaces along with them.

  Add appropriate calls to vmspace_mmap() and vmspace_mcontrol() to map
  memory inside the user process vmspace.  The memory is mapped VPAGETABLE
  and the page table directory is set to point to the pmap page directory.

* Clean up user_trap() and handle T_PAGEFLT properly.

* Implement go_user().  It calls vmspace_ctl(... VMSPACE_CTL_RUN) and
  user_trap() in a loop, allowing the virtual kernel to 'run' a user
  mode context under its control (see the sketch below).

* Reduce VM_MAX_USER_ADDRESS to 0xb8000000 for now, until I figure out the
  best way to have the virtual kernel query the actual max user address from
  the real kernel.

* Correct a pm_pdirpte assignment.  We can't look up the PTE until after
  we have entered it into the kernel pmap.
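
The go_user() loop can be sketched as a compilable toy.  VMSPACE_CTL_RUN,
T_SYSCALL80 and T_PAGEFLT are the names from this commit; the
vmspace_ctl()/user_trap() signatures and bodies below are stand-ins, not
the real kernel interfaces:

    #include <stdio.h>

    enum { VMSPACE_CTL_RUN = 1, T_SYSCALL80 = 0x80, T_PAGEFLT = 12 };
    struct trapframe { int tf_trapno; };

    /* Stand-in: pretend the real kernel ran the guest until it trapped. */
    static int
    vmspace_ctl(int cmd, struct trapframe *tf)
    {
            static const int traps[] = { T_PAGEFLT, T_SYSCALL80, 0 };
            static int i;

            (void)cmd;
            tf->tf_trapno = traps[i++];
            return tf->tf_trapno;
    }

    static void
    user_trap(struct trapframe *tf)
    {
            if (tf->tf_trapno == T_SYSCALL80)
                    printf("system call from the user context\n");
            else if (tf->tf_trapno == T_PAGEFLT)
                    printf("page fault, resolve via the VPAGETABLE mapping\n");
    }

    /* 'Run' the user mode context until it stops trapping. */
    static void
    go_user(void)
    {
            struct trapframe tf;

            while (vmspace_ctl(VMSPACE_CTL_RUN, &tf) != 0)
                    user_trap(&tf);
    }

    int
    main(void)
    {
            go_user();
            return 0;
    }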

Revision 1.54: download - view: text, markup, annotated - select for diffs
Thu Dec 28 21:24:02 2006 UTC (7 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.53: preferred, unified
Changes since revision 1.53: +8 -9 lines
Make kernel_map, buffer_map, clean_map, exec_map, and pager_map direct
structural declarations instead of pointers.  Clean up all related code,
in particular kmem_suballoc().

Remove the offset calculation for kernel_object.  kernel_object's page
indices used to be relative to the start of kernel virtual memory in order
to improve the performance of VM page scanning algorithms.  The optimization
is no longer needed now that VM objects use Red-Black trees.  Removal of
the offset simplifies a number of calculations and makes the code more
readable.

Revision 1.53: download - view: text, markup, annotated - select for diffs
Thu Dec 28 18:29:08 2006 UTC (7 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.52: preferred, unified
Changes since revision 1.52: +1 -1 lines
Introduce globals: KvaStart, KvaEnd, and KvaSize.  Used by the kernel
instead of the nutty VADDR and VM_*_KERNEL_ADDRESS macros.  Move extern
declarations for these variables as well as for virtual_start, virtual_end,
and phys_avail[] from MD headers to MI headers.

Make kernel_object a global structure instead of a pointer.

Remove kmem_object and all related code (none of it is used any more).

Revision 1.52: download - view: text, markup, annotated - select for diffs
Tue Nov 7 17:51:24 2006 UTC (8 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.51: preferred, unified
Changes since revision 1.51: +1 -1 lines
Misc cleanups and CVS surgery.  Move a number of header and source files
from machine/pc32 to cpu/i386 as part of the ongoing architectural separation
work and do a bit of cleanup.

Revision 1.51: download - view: text, markup, annotated - select for diffs
Sat Oct 21 04:28:22 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.50: preferred, unified
Changes since revision 1.50: +2 -0 lines
Reformulate the way the kernel updates the PMAPs in the system when adding
a new page table page to expand kernel memory.  Keep track of the PMAPs in
their own list rather than scanning the process list to locate them.  This
allows PMAPs managed on behalf of virtual kernels to be properly updated.

VM spaces can now be allocated from scratch and may not have a parent
template to inherit certain fields from.  Make sure these fields are
properly cleared.
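
A minimal model of the list-based approach (the names here are invented;
only the list-instead-of-process-scan idea comes from this commit):

    #include <stdio.h>
    #include <sys/queue.h>

    struct pmap {
            int pm_id;
            TAILQ_ENTRY(pmap) pm_entry;
    };

    static TAILQ_HEAD(, pmap) pmap_list = TAILQ_HEAD_INITIALIZER(pmap_list);

    /* On adding a kernel page table page, update every tracked pmap. */
    static void
    pmap_update_all(void)
    {
            struct pmap *pm;

            TAILQ_FOREACH(pm, &pmap_list, pm_entry)
                    printf("updating pmap %d\n", pm->pm_id);
    }

    int
    main(void)
    {
            struct pmap a = { .pm_id = 1 };  /* a process pmap */
            struct pmap b = { .pm_id = 2 };  /* a virtual kernel's pmap */

            TAILQ_INSERT_TAIL(&pmap_list, &a, pm_entry);
            TAILQ_INSERT_TAIL(&pmap_list, &b, pm_entry);
            pmap_update_all();
            return 0;
    }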

Revision 1.50: download - view: text, markup, annotated - select for diffs
Wed Sep 13 22:25:00 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.49: preferred, unified
Changes since revision 1.49: +67 -50 lines
Collapse some bits of repetitive code into their own procedures and
allocate a maximally sized default object to back MAP_VPAGETABLE
mappings, allowing us to access logical memory beyond the size of the
original mmap() call by programming the page table to point at it.

This gives us an abstraction and capability similar to a real kernel's
ability to map e.g. 2GB of physical memory into its 1GB address space.

Revision 1.49: download - view: text, markup, annotated - select for diffs
Wed Sep 13 17:10:42 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.48: preferred, unified
Changes since revision 1.48: +73 -19 lines
MAP_VPAGETABLE support part 3/3.

Implement a new system call called mcontrol() which is an extension of
madvise(), adding an additional 64 bit argument.  Add two new advisories,
MADV_INVAL and MADV_SETMAP.

MADV_INVAL will invalidate the pmap for the specified virtual address
range.  You need to do this for the virtual addresses affected by changes
made in a virtual page table.

MADV_SETMAP sets the top-level page table entry for the virtual page table
governing the mapped range.  It only works for memory governed by a virtual
page table and strange things will happen if you only set the root
page table entry for part of the virtual range.

Further refine the virtual page table format.  Stick with 32 bit VPTEs for
the moment, but properly implement VPTE_PS and VPTE_V.  VPTE_PS can be
used to support 4MB linear maps in the top level page table and it can also
be used when specifying the 'root' VPTE to disable the page table entirely
and just linear map the backing store.  VPTE_V is the 'valid' bit (before
it was inverted, now it is normal).
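
Userland usage of the above would look something like the following
sketch.  It assumes the mcontrol() prototype is madvise() plus a trailing
off_t value and that the MAP_VPAGETABLE/MADV_* flags are visible via
<sys/mman.h>; the root VPTE value is a placeholder:

    #include <sys/types.h>
    #include <sys/mman.h>

    #include <err.h>
    #include <stdlib.h>

    int
    main(void)
    {
            size_t len = 1024 * 1024;
            off_t root_vpte = 0;    /* placeholder page directory VPTE */
            void *base;

            /* Memory governed by a userland virtual page table. */
            base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_PRIVATE | MAP_VPAGETABLE, -1, 0);
            if (base == MAP_FAILED)
                    err(1, "mmap");

            /* Set the top-level VPTE for the whole governed range. */
            if (mcontrol(base, len, MADV_SETMAP, root_vpte) < 0)
                    err(1, "mcontrol(MADV_SETMAP)");

            /* After editing VPTEs, invalidate the affected pmap range. */
            if (mcontrol(base, len, MADV_INVAL, 0) < 0)
                    err(1, "mcontrol(MADV_INVAL)");
            return 0;
    }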

Revision 1.48: download - view: text, markup, annotated - select for diffs
Tue Sep 12 18:41:32 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.47: preferred, unified
Changes since revision 1.47: +11 -2 lines
MAP_VPAGETABLE support part 1/3.

Reorganize vm_fault() to get more direct access to the VM page resolved by
a VM fault.  Move vm_fault()'s core shadow object traversal and fault I/O
code to a new procedure called vm_fault_object().

Begin adding support for memory mappings which are backed by a virtualized
page table under userland control.

Revision 1.47: download - view: text, markup, annotated - select for diffs
Mon Sep 11 20:25:31 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.46: preferred, unified
Changes since revision 1.46: +117 -52 lines
Move flag(s) representing the type of vm_map_entry into its own vm_maptype_t
type.  This is a precursor to adding a new VM mapping type for virtualized
page tables.

Revision 1.46: download - view: text, markup, annotated - select for diffs
Sat Aug 12 00:26:22 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.45: preferred, unified
Changes since revision 1.45: +1 -1 lines
VNode sequencing and locking - part 3/4.

VNode aliasing is handled by the namecache (aka nullfs), so there is no
longer a need to have VOP_LOCK, VOP_UNLOCK, or VOP_ISLOCKED as 'VOP'
functions.  Both NFS and DEADFS have been using standard locking functions
for some time and are no longer special cases.  Replace all uses with
native calls to vn_lock, vn_unlock, and vn_islocked.

We can't have these as VOP functions anyhow because of the introduction of
the new SYSLINK transport layer, since vnode locks are primarily used to
protect the local vnode structure itself.

Revision 1.45: download - view: text, markup, annotated - select for diffs
Tue Aug 8 03:52:45 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.44: preferred, unified
Changes since revision 1.44: +1 -1 lines
LK_NOPAUSE no longer serves a purpose, scrap it.

Revision 1.44: download - view: text, markup, annotated - select for diffs
Wed May 17 17:47:58 2006 UTC (8 years, 7 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_6_Slip, DragonFly_RELEASE_1_6
Diff to: previous 1.43: preferred, unified
Changes since revision 1.43: +0 -76 lines
Remove the (unused) copy-on-write support for a vnode's VM object.  This
support originally existed to support the badly implemented and severely
hacked ENABLE_VFS_IOOPT I/O optimization which was removed long ago.

This also removes a bunch of cross-module pollution in UFS.

Revision 1.43: download - view: text, markup, annotated - select for diffs
Fri May 5 21:15:11 2006 UTC (8 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.42: preferred, unified
Changes since revision 1.42: +2 -4 lines
Simplify vn_lock(), VOP_LOCK(), and VOP_UNLOCK() by removing the thread_t
argument.  These calls now always use the current thread as the lockholder.
Passing a thread_t to these functions has always been questionable at best.

Revision 1.42: download - view: text, markup, annotated - select for diffs
Mon Mar 27 01:54:18 2006 UTC (8 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.41: preferred, unified
Changes since revision 1.41: +2 -2 lines
Change *_pager_allocate() to take off_t instead of vm_ooffset_t.  The
actual underlying type (a 64 bit signed integer) is the same.   Recent and
upcoming work is standardizing on off_t.

Move object->un_pager.vnp.vnp_size to vnode->v_filesize.  As before, the
field is still only valid when a VM object is associated with the vnode.

Revision 1.41: download - view: text, markup, annotated - select for diffs
Thu Mar 2 19:08:00 2006 UTC (8 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.40: preferred, unified
Changes since revision 1.40: +1 -1 lines
Pass LK_PCATCH instead of trying to store tsleep flags in the lock
structure, so multiple entities competing for the same lock do not
use unexpected flags when sleeping.

Only NFS really uses PCATCH with lockmgr locks.

Revision 1.40: download - view: text, markup, annotated - select for diffs
Fri Jan 13 20:45:30 2006 UTC (8 years, 11 months ago) by swildner
Branches: MAIN
Diff to: previous 1.39: preferred, unified
Changes since revision 1.39: +1 -1 lines
* Remove (void) casts for discarded return values.

* Ansify function definitions.

In-collaboration-with: Alexey Slynko <slynko@tronet.ru>

Revision 1.39: download - view: text, markup, annotated - select for diffs
Sun Mar 13 15:58:56 2005 UTC (9 years, 9 months ago) by eirikn
Branches: MAIN
CVS tags: DragonFly_Stable, DragonFly_RELEASE_1_4_Slip, DragonFly_RELEASE_1_4, DragonFly_RELEASE_1_2_Slip, DragonFly_RELEASE_1_2
Diff to: previous 1.38: preferred, unified
Changes since revision 1.38: +0 -1 lines
There is no need to set *entry on each entry traversed in the red-black tree
when looking up a record.

Revision 1.38: download - view: text, markup, annotated - select for diffs
Mon Feb 7 20:39:01 2005 UTC (9 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.37: preferred, unified
Changes since revision 1.37: +2 -1 lines
gdb-6 uses /dev/kmem exclusively for kernel addresses when gdb'ing a live
kernel, but the globaldata mapping is outside the bounds of kernel_map.
Make sure that the globaldata mapping is visible to it.

Revision 1.37: download - view: text, markup, annotated - select for diffs
Thu Jan 20 18:00:38 2005 UTC (9 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.36: preferred, unified
Changes since revision 1.36: +82 -69 lines
Replace the cache-point linear search algorithm for VM map entries with
a red-black tree.  This makes VM map lookups O(log N) in all cases.

Note that FreeBSD seems to have gone the splay-tree route, but I really
dislike the fact that splay trees are constantly writing to memory even
for simple lookups.  This would also limit our ability to implement a
separate hinting/caching mechanism.  A red-black tree is basically a
binary tree with internal nodes containing real data in addition to the leaves,
similar to a B+Tree.  A red-black tree is very similar to a splay tree but it
does not attempt to modify the data structure for pure lookups.

Caveat: we tried to revive the map->hint mechanism but there is currently
a serious crash/lockup bug related to it so it is disabled in this commit.

Submitted-by: Eirik Nygaard <eirikn@kerneled.com>
Using-Red-Black-Macros-From: NetBSD (sys/tree.h)
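
The lookup itself is easy to model with the same sys/tree.h macros.  A
self-contained sketch (the entry layout is invented, not vm_map's):

    #include <stdio.h>
    #include <sys/tree.h>

    struct entry {
            RB_ENTRY(entry) rb_entry;
            unsigned long start, end;       /* [start, end) */
    };

    static int
    entry_cmp(struct entry *a, struct entry *b)
    {
            if (a->start < b->start)
                    return (-1);
            return (a->start > b->start);
    }

    RB_HEAD(entry_tree, entry);
    RB_GENERATE(entry_tree, entry, rb_entry, entry_cmp)

    /* O(log N): walk down the tree to the entry containing addr. */
    static struct entry *
    entry_lookup(struct entry_tree *head, unsigned long addr)
    {
            struct entry *ent = RB_ROOT(head);

            while (ent != NULL) {
                    if (addr < ent->start)
                            ent = RB_LEFT(ent, rb_entry);
                    else if (addr >= ent->end)
                            ent = RB_RIGHT(ent, rb_entry);
                    else
                            return (ent);
            }
            return (NULL);
    }

    int
    main(void)
    {
            struct entry_tree head = RB_INITIALIZER(&head);
            struct entry e1 = { .start = 0x1000, .end = 0x3000 };
            struct entry e2 = { .start = 0x8000, .end = 0x9000 };

            RB_INSERT(entry_tree, &head, &e1);
            RB_INSERT(entry_tree, &head, &e2);
            printf("%lx\n", entry_lookup(&head, 0x2000)->start); /* 1000 */
            printf("%d\n", entry_lookup(&head, 0x4000) == NULL); /* 1 */
            return 0;
    }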

Revision 1.36: download - view: text, markup, annotated - select for diffs
Tue Dec 21 02:42:41 2004 UTC (9 years, 11 months ago) by hsu
Branches: MAIN
Diff to: previous 1.35: preferred, unified
Changes since revision 1.35: +1 -1 lines
Fix whitespace.

Revision 1.35: download - view: text, markup, annotated - select for diffs
Tue Oct 26 04:33:11 2004 UTC (10 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.34: preferred, unified
Changes since revision 1.34: +40 -26 lines
Fix bugs in the vm_map_entry reservation and zalloc code.  This code is a bit
sticky because zalloc must be able to call kmem_alloc*() in order to extend
mapentzone to allocate a new chunk of vm_map_entry structures, and
kmem_alloc*() *needs* two vm_map_entry structures in order to map the new
data block into the kernel.  To avoid a chicken-and-egg recursion there must
already be some vm_map_entry structures available for kmem_alloc*() to use.

To ensure that structures are available the vm_map_entry cache maintains
a 'reserve'.  This cache is initially populated from the vm_map_entry
structures allocated via zbootinit() in vm_map.c.  However, since this is a per-cpu
cache there are situations where the vm_map subsystem will be used on other
cpus before the cache can be populated on those cpus, but after the static
zbootinit structures have all been used up.  To fix this we statically
allocate two vm_map_entry structures for each cpu which is sufficient for
zalloc to call kmem_alloc*() to allocate the remainder of the reserve.
Having a lot of preloaded modules seems to be able to trigger the bug.

Also get rid of gd_vme_kdeficit which was a confusing methodology to
keep track of kernel reservations.  Now we just have gd_vme_avail and
a negative count indicates a deficit (the reserve is being dug into).

From-panic-reported-by: Adam K Kirchhoff <adamk@voicenet.com>
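
The accounting boils down to something like this single-cpu sketch
(function names from the related commits, logic heavily simplified):

    #include <stdio.h>

    /* Statically preallocated entries make up the initial reserve. */
    static int gd_vme_avail = 2;

    /* Dig into the reserve; a negative count is the old gd_vme_kdeficit. */
    static void
    vm_map_entry_kreserve(int count)
    {
            gd_vme_avail -= count;
    }

    static void
    vm_map_entry_krelease(int count)
    {
            gd_vme_avail += count;
    }

    int
    main(void)
    {
            /* kmem_alloc*() needs two entries to map a new chunk... */
            vm_map_entry_kreserve(2);
            printf("avail %d%s\n", gd_vme_avail,
                gd_vme_avail < 0 ? " (deficit)" : "");
            /* ...then zalloc refills the cache and the entries return. */
            vm_map_entry_krelease(2);
            printf("avail %d\n", gd_vme_avail);
            return 0;
    }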

Revision 1.34: download - view: text, markup, annotated - select for diffs
Mon Oct 25 19:14:33 2004 UTC (10 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.33: preferred, unified
Changes since revision 1.33: +0 -249 lines
Remove the vfs page replacement optimization and its ENABLE_VFS_IOOPT option.
This never worked properly... that is, the semantics are broken compared to
a normal read or write in that the read 'buffer' will be modified out from
under the caller if the underlying file is.

What is really needed here is a copy-on-write feature that works in both
directions, similar to how a shared buffer is copied after a fork() if either
the parent or child modify it.  The optimization will eventually be rewritten
with that in mind but not right now.

Revision 1.33: download - view: text, markup, annotated - select for diffs
Tue Oct 12 19:21:16 2004 UTC (10 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.32: preferred, unified
Changes since revision 1.32: +18 -19 lines
VFS messaging/interfacing work stage 8/99: Major reworking of the vnode
interlock and other miscellaneous things.  This patch also fixes FS
corruption due to prior vfs work in head.  In particular, prior to this
patch the namecache locking could introduce blocking conditions that
confuse the old vnode deactivation and reclamation code paths.  With
this patch there appear to be no serious problems even after two days
of continuous testing.

* VX lock all VOP_CLOSE operations.
* Fix two NFS issues.  There was an incorrect assertion (found by
  David Rhodus), and the nfs_rename() code was not properly
  purging the target file from the cache, resulting in Stale file
  handle errors during, e.g. a buildworld with an NFS-mounted /usr/obj.
* Fix a TTY session issue.  Programs which open("/dev/tty" ,...) and
  then run the TIOCNOTTY ioctl were causing the system to lose track
  of the open count, preventing the tty from properly detaching.
  This is actually a very old BSD bug, but it came out of the woodwork
  in DragonFly because I am now attempting to track device opens
  explicitly.
* Gets rid of the vnode interlock.  The lockmgr interlock remains.
* Introduced VX locks, which are mandatory vp->v_lock based locks.
* Rewrites the locking semantics for deactivation and reclamation.
  (A ref'd VX lock'd vnode is now required for vgone(), VOP_INACTIVE,
  and VOP_RECLAIM).  New guarantees emplaced with regard to vnode
  ripouts.
* Recodes the mountlist scanning routines to close timing races.
* Recodes getnewvnode to close timing races (it now returns a
  VX locked and refd vnode rather than a refd but unlocked vnode).
* Recodes VOP_REVOKE - a locked vnode is now mandatory.
* Recodes all VFS inode hash routines to close timing holes.
* Removes cache_leaf_test() - vnodes representing intermediate
  directories are now held so the leaf test should no longer be
  necessary.
* Splits the over-large vfs_subr.c into three additional source
  files, broken down by major function (locking, mount related,
  filesystem syncer).

* Changes splvm() protection to a critical-section in a number of
  places (bleedover from another patch set which is also about to be
  committed).

Known issues not yet resolved:

* Possible vnode/namecache deadlocks.
* While most filesystems now use vp->v_lock, I haven't done a final
  pass to make vp->v_lock mandatory and to clean up the few remaining
  inode based locks (nwfs I think and other obscure filesystems).
* NullFS gets confused when you hit a mount point in the underlying
  filesystem.
* Only UFS and NFS have been well tested.
* NFS is not properly timing out namecache entries, causing changes made
  on the server to not be properly detected on the client if the client
  already has a negative-cache hit for the filename in question.

Testing-by: David Rhodus <sdrhodus@gmail.com>,
	    Peter Kadau <peter.kadau@tuebingen.mpg.de>,
	    walt <wa1ter@myrealbox.com>,
	    others

Revision 1.32: download - view: text, markup, annotated - select for diffs
Tue Aug 17 18:57:36 2004 UTC (10 years, 4 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Snap29Sep2004, DragonFly_Snap13Sep2004
Diff to: previous 1.31: preferred, unified
Changes since revision 1.31: +2 -1 lines
VFS messaging/interfacing work stage 2/99.  This stage retools the vnode ops
vector dispatch, making the vop_ops a per-mount structure rather than a
per-filesystem structure.  Filesystem mount code, typically in blah_vfsops.c,
must now register various vop_ops pointers in the struct mount to compile
its VOP operations set.

This change will allow us to begin adding per-mount hooks to VFSes to support
things like kernel-level journaling, various forms of cache coherency
management, and so forth.

In addition, the vop_*() calls now require a struct vop_ops pointer as the
first argument instead of a vnode pointer (note: in this commit the VOP_*()
macros currently just pull the vop_ops pointer from the vnode in order to
call the vop_*() procedures).  This change is intended to allow us to divorce
ourselves from the requirement that a vnode pointer always be part of a VOP
call.  In particular, this will allow namespace based routines such as
remove(), mkdir(), stat(), and so forth to pass namecache pointers rather than
locked vnodes and is a very important precursor to the goal of using the
namecache for namespace locking.

Revision 1.31: download - view: text, markup, annotated - select for diffs
Wed Jul 28 20:40:35 2004 UTC (10 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.30: preferred, unified
Changes since revision 1.30: +1 -1 lines
(From Alan):
 Correct a very old error in both vm_object_madvise() (originating in
 vm/vm_object.c revision 1.88) and vm_object_sync() (originating in
 vm/vm_map.c revision 1.36): When descending a chain of backing objects,
 both use the wrong object's backing offset.  Consequently, both may
 operate on the wrong pages.

(From Matt):
 In DragonFly the code needing correction is in vm_object_madvise() and
 vm_map_clean() (that code in vm_map_clean() was moved to vm_object_sync()
 in FreeBSD-5, hence the FreeBSD-5 correction made by Alan was slightly
 different).

 The madvise case could produce corrupted user memory when MADV_FREE was
 used, primarily on server-forked processes (where shadow objects exist)
 PLUS a special set of additional circumstances:  (1) The deeper shadow
 layers had to no longer be shared, (2) Either the memory had been swapped
 out in deeper shadow layers (not just the first shadow layer), resulting
 in the wrong swap space being freed, or (3) the forked memory had not yet
 been COW'd (and the deeper shadow layer is no longer shared) AND also had
 not yet been collapsed back into the parent (e.g. the original parent
 and/or other forks had exited and/or the memory had been isolated from
 them already).

 This bug could be responsible for all of the sporadic madvise oddness
 that has been reported over the years, especially in earlier days when
 systems had less memory and paged to swap a lot more than they do today.
 These weird failure cases have led people to generally not use MADV_FREE
 (in particular the 'H' malloc.conf option) as much as they could.  Also
 note that I tightened up the VM object collapse code considerably in
 FreeBSD-4.x making the failure cases above even less likely to occur.

 The vm_map_clean() (vm_object_sync() in FreeBSD-5) case is not likely
 to produce failures and it might not even be possible for it to occur
 in the first place since it requires PROT_WRITE mapped vnodes to exist
 in a backing object, which either might not be possible or might only occur
 under extraordinary circumstances.  Plus the worst that happens is that the
 vnode's data doesn't get written out immediately (but always will later on).

 Kudos to Alan for finding this old bug!

Noticed and corrected by: Alan Cox <alc@cs.rice.edu>
See also: FreeBSD vm_object.c/1.329

Revision 1.30: download - view: text, markup, annotated - select for diffs
Sat Jul 24 20:25:47 2004 UTC (10 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.29: preferred, unified
Changes since revision 1.29: +2 -2 lines
Adjust gd_vme_avail after ensuring that sufficient entries exist rather
than before.  This should solve a panic where the userland
vm_map_entry_reserve() was eating out of the kernel's reserve and causing
a recursive zalloc() to panic.

Revision 1.29: download - view: text, markup, annotated - select for diffs
Wed Jul 21 01:25:18 2004 UTC (10 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.28: preferred, unified
Changes since revision 1.28: +3 -1 lines
Fix a device pager leak for the case where the page already exists in the
VM object (typical case: multiple mappings of the device?).  If the page
already exists we simply update its physical address.  It is unclear whether
the physical address would ever actually be different, however.

This is an untested patch.

Original-patch-written-by: Christian Zander @ NVIDIA
Workaround-suggested-by: Tor Egge <tegge@freebsd.org>
Submitted-by: Emiel Kollof <coolvibe@hackerheaven.org>

Revision 1.28: download - view: text, markup, annotated - select for diffs
Thu May 27 00:38:58 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_1_0_REL, DragonFly_1_0_RC1, DragonFly_1_0A_REL
Diff to: previous 1.27: preferred, unified
Changes since revision 1.27: +18 -26 lines
Bring in the fictitious page wiring bug fixes from FreeBSD-5.  Make additional
major changes to the APIs to clean them up (so this commit is substantially
different than what was committed to FreeBSD-5).

Obtained-from: Alan Cox <alc@cs.rice.edu> (FreeBSD-5)

Revision 1.27: download - view: text, markup, annotated - select for diffs
Thu May 13 17:40:19 2004 UTC (10 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.26: preferred, unified
Changes since revision 1.26: +30 -5 lines
Close an interrupt race between vm_page_lookup() and (typically) a
vm_page_sleep_busy() check by using the correct spl protection.
An interrupt can occur in between the two operations and unbusy/free
the page in question, causing the busy check to fail and for the code
to fall through and then operate on a page that may have been freed
and possibly even reused.   Also note that vm_page_grab() had the same
issue between the lookup, busy check, and vm_page_busy() call.

Close an interrupt race when scanning a VM object's memq.  Interrupts
can free pages, removing them from memq, which interferes with memq scans
and can cause a page unassociated with the object to be processed as if it
were associated with the object.

Calls to vm_page_hold() and vm_page_unhold() require spl protection.

Rename the passed socket descriptor argument in sendfile() to make the
code more readable.

Fix several serious bugs in procfs_rwmem().  In particular, force it to
block if a page is busy and then retry.

Get rid of vm_pager_map_page() and vm_pager_unmap_page(); make the functions
that used to use these routines use SFBUF's instead.

Get rid of the (userland?) 4MB page mapping feature in pmap_object_init_pt()
for now.  The code appears to not track the page directory properly and
could result in a non-zero page being freed as PG_ZERO.

This commit also includes updated code comments and some additional
non-operational code cleanups.

Revision 1.26: download - view: text, markup, annotated - select for diffs
Mon Apr 26 20:26:59 2004 UTC (10 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.25: preferred, unified
Changes since revision 1.25: +3 -2 lines
Bring in the following revs from FreeBSD-4:

    1.250.2.25  +3 -2      src/sys/i386/i386/pmap.c
    1.33.2.6    +2 -2      src/sys/vm/pmap.h
    1.187.2.25  +3 -2      src/sys/vm/vm_map.c

Suggested-by: Alan Cox <alc@cs.rice.edu>

Revision 1.25: download - view: text, markup, annotated - select for diffs
Fri Apr 23 06:23:45 2004 UTC (10 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.24: preferred, unified
Changes since revision 1.24: +1 -1 lines
msync(..., MS_INVALIDATE) will incorrectly remove dirty pages without
synchronizing them to their backing store under certain circumstances,
and can also cause struct buf's to become inconsistent.  This can be
particularly gruesome when MS_INVALIDATE is used on a range of memory that
is mmap()'d to be read-only.

Fix MS_INVALIDATE's operation (1) by making UFS honor the invalidation
request when flushing to backing store to destroy the related struct buf
and (2) by never removing pages wired into the buffer cache and never
removing pages that are found to still be dirty.

Note that NFS was already coded to honor invalidation requests in
nfs_write().  Filesystems other than NFS and UFS do not currently support
buffer-invalidation-on-write but all that means now is that the pages
will remain in cache, rather than be incorrectly removed and cause corruption.

Reported-by: Stephan Uphoff <ups@tree.com>, Julian Elischer <julian@elischer.org>
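
For reference, the userland sequence this fixes looks like the sketch
below (the path is illustrative and the file must already exist):

    #include <sys/mman.h>

    #include <err.h>
    #include <fcntl.h>
    #include <unistd.h>

    int
    main(void)
    {
            int fd = open("/tmp/data", O_RDWR);
            char *p;

            if (fd < 0)
                    err(1, "open");
            p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");
            p[0] = 'x';             /* dirty the page */

            /*
             * With the fix, the dirty page is flushed to backing store
             * (and pages wired into the buffer cache are left alone)
             * rather than being discarded while still dirty.
             */
            if (msync(p, 4096, MS_SYNC | MS_INVALIDATE) < 0)
                    err(1, "msync");
            munmap(p, 4096);
            close(fd);
            return 0;
    }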

Revision 1.24: download - view: text, markup, annotated - select for diffs
Tue Mar 23 22:54:32 2004 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.23: preferred, unified
Changes since revision 1.23: +11 -28 lines
ANSIfication (procedure args) cleanup.

Submitted-by: Andre Nathan <andre@digirati.com.br>

Revision 1.23: download - view: text, markup, annotated - select for diffs
Fri Mar 12 23:09:37 2004 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.22: preferred, unified
Changes since revision 1.22: +12 -4 lines
In an rfork'd or vfork'd situation where multiple processes are sharing
the same vmspace, and one process goes zombie, the vmspace's vm_exitingcnt
will be non-zero.  If another process then forks or execs the exitingcnt will
be improperly inherited by the new vmspace.  The solution is to not copy
exitingcnt when copying to a new vmspace.

Additionally, for DragonFly, I also had to fix a few cases where the upcall
list was also being improperly inherited.

Heads-up-by: Xin LI <delphij@frontfree.net>
Obtained-From: Peter Wemm <peter@wemm.org> (FreeBSD-5)

Revision 1.22: download - view: text, markup, annotated - select for diffs
Mon Mar 1 06:33:24 2004 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.21: preferred, unified
Changes since revision 1.21: +19 -2 lines
Newtoken commit.  Change the token implementation as follows:  (1) Obtaining
a token no longer enters a critical section.  (2) tokens can be held through
scheduler switches and blocking conditions and are effectively released and
reacquired on resume.  Thus tokens serialize access only while the thread
is actually running.  Serialization is not broken by preemptive interrupts.
That is, interrupt threads which preempt do not release the preempted thread's
tokens.  (3) Unlike spl's, tokens will interlock w/ interrupt threads on
the same or on a different cpu.

The vnode interlock code has been rewritten and the API has changed.  The
mountlist vnode scanning code has been consolidated and all known races have
been fixed.  The vnode interlock is now a pool token.

The code that frees unreferenced vnodes whose last VM page has been freed has
been moved out of the low level vm_page_free() code and moved to the
periodic filesystem syncer code in vfs_msync().

The SMP startup code and the IPI code has been cleaned up considerably.
Certain early token interactions on AP cpus have been moved to the BSP.

The LWKT rwlock API has been cleaned up and turned on.

Major testing by: David Rhodus

Revision 1.21: download - view: text, markup, annotated - select for diffs
Tue Jan 20 18:41:52 2004 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.20: preferred, unified
Changes since revision 1.20: +15 -6 lines
Resident executable support stage 1/4: Add kernel bits and syscall support
for in-kernel caching of vmspace structures.  The main purpose of this
feature is to make it possible to run dynamically linked programs as fast
as if they were statically linked, by vmspace_fork()ing their vmspace and
saving the copy in the kernel, then using that whenever the program is
exec'd.

Revision 1.20: download - view: text, markup, annotated - select for diffs
Tue Jan 20 05:04:08 2004 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.19: preferred, unified
Changes since revision 1.19: +2 -2 lines
Retool the M_* flags to malloc() and the VM_ALLOC_* flags to
vm_page_alloc(), and vm_page_grab() and friends.

The M_* flags now have more flexibility, with the intent that we will start
using some of it to deal with NULL pointer return problems in the codebase
(CAM is especially bad at dealing with unexpected return values).  In
particular, add M_USE_INTERRUPT_RESERVE and M_FAILSAFE, and redefine
M_NOWAIT as a combination of M_ flags instead of its own flag.

The VM_ALLOC_* macros are now flags (0x01, 0x02, 0x04) rather than states
(1, 2, 3), which allows us to create combinations that the old interface
could not handle.
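
Schematically (the old state-style names are invented for contrast; the
new values are the ones from this commit):

    /* Old: mutually exclusive states; note 3 aliases 1|2, nothing combines. */
    #define VM_ALLOC_NORMAL_OLD     1
    #define VM_ALLOC_SYSTEM_OLD     2
    #define VM_ALLOC_INTERRUPT_OLD  3

    /* New: distinct bits, so requests can be or'd together, e.g.
     * vm_page_alloc(obj, idx, VM_ALLOC_NORMAL | VM_ALLOC_SYSTEM). */
    #define VM_ALLOC_NORMAL         0x01
    #define VM_ALLOC_SYSTEM         0x02
    #define VM_ALLOC_INTERRUPT      0x04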

Revision 1.19: download - view: text, markup, annotated - select for diffs
Sun Jan 18 12:32:04 2004 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.18: preferred, unified
Changes since revision 1.18: +4 -0 lines
vm_uiomove() is a VFS_IOOPT related procedure, conditionalize it
appropriately.

Revision 1.18: download - view: text, markup, annotated - select for diffs
Wed Jan 14 23:26:14 2004 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.17: preferred, unified
Changes since revision 1.17: +14 -11 lines
Clean up the vm_map_entry_[k]reserve/[k]release() API.  This API is used to
guarantee that sufficient vm_map_entry structures are available for certain
atomic VM operations.

The kreserve/krelease API is only supposed to be used to dig into the kernel
reserve, used when zalloc() must recurse into kmem_alloc() in order to
allocate a new page.  Without this we can get into a kmem_alloc -> zalloc ->
kmem_alloc deadlock.

kreserve/krelease was being used improperly in my original work, causing it
to be unable to guarantee the reserve and resulting in an occasional panic.
This commit converts the improper usage back to using the non-k version of
the API and (should) properly handle the zalloc() recursion case.

Reported-by: David Rhodus <drhodus@catpa.com>

Revision 1.17: download - view: text, markup, annotated - select for diffs
Sat Dec 27 05:13:32 2003 UTC (10 years, 11 months ago) by hsu
Branches: MAIN
Diff to: previous 1.16: preferred, unified
Changes since revision 1.16: +1 -2 lines
Merge from FreeBSD:

alc         2003/12/26 13:54:45 PST

  FreeBSD src repository

  Modified files:
    sys/vm               vm_map.c
  Log:
  Minor correction to revision 1.258: Use the proc pointer that is passed to
  vm_map_growstack() in the RLIMIT_VMEM check rather than curthread.

  Revision  Changes    Path
  1.324     +1 -2      src/sys/vm/vm_map.c

Revision 1.16: download - view: text, markup, annotated - select for diffs
Sat Nov 29 18:56:22 2003 UTC (11 years ago) by drhodus
Branches: MAIN
Diff to: previous 1.15: preferred, unified
Changes since revision 1.15: +11 -0 lines

*	Prevent leakage of wired pages by setting start_entry
	during vm_map_wire().

Revision 1.15: download - view: text, markup, annotated - select for diffs
Fri Nov 21 05:29:08 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.14: preferred, unified
Changes since revision 1.14: +2 -0 lines
Implement an upcall mechanism to support userland LWKT.  This mechanism will
allow multiple processes sharing the same VM space (aka clone/threading)
to send each other what are basically IPIs.

Two new system calls have been added, upc_register() and upc_control().
Documentation is forthcoming.  The upcalls are nicely abstracted and a
program can register as many as it wants up to the kernel limit (which
is 32 at the moment).

The upcalls will be used for passing asynch data from kernel to userland,
such as asynch syscall message replies, for thread preemption timing,
software interrupts, IPIs between virtual cpus (e.g. between the processes
that are sharing the single VM space).

Revision 1.14: download - view: text, markup, annotated - select for diffs
Sun Oct 19 00:23:30 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.13: preferred, unified
Changes since revision 1.13: +0 -37 lines
Entirely remove the old kernel malloc and kmem_map code.  The slab allocator
is now mandatory.  Also remove the related conf options, USE_KMEM_MAP and
NO_SLAB_ALLOCATOR.

Revision 1.13: download - view: text, markup, annotated - select for diffs
Thu Oct 2 21:00:20 2003 UTC (11 years, 2 months ago) by hmp
Branches: MAIN
Diff to: previous 1.12: preferred, unified
Changes since revision 1.12: +3 -3 lines
Rename:

	- vm_map_pageable() -> vm_map_wire()
	- vm_map_user_pageable() -> vm_map_unwire()

Revision 1.12: download - view: text, markup, annotated - select for diffs
Fri Sep 26 19:23:34 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.11: preferred, unified
Changes since revision 1.11: +9 -9 lines
Remove the NO_KMEM_MAP and USE_SLAB_ALLOCATOR kernel options.  Temporarily
add the USE_KMEM_MAP and NO_SLAB_ALLOCATOR kernel options, which developers
should generally not use.

We now use the slab allocator (and no kmem_map) by default.

Revision 1.11: download - view: text, markup, annotated - select for diffs
Wed Aug 27 01:43:08 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.10: preferred, unified
Changes since revision 1.10: +333 -163 lines
SLAB ALLOCATOR Stage 1.  This brings in a slab allocator written from scratch
by yours truly.  A detailed explanation of the allocator is included but
first, other changes:

* Instead of having vm_map_entry_insert*() and friends allocate the
  vm_map_entry structures a new mechanism has been emplaced whereby
  the vm_map_entry structures are reserved at a higher level, then
  expected to exist in the free pool in deep vm_map code.  This preliminary
  implementation may eventually turn into something more sophisticated that
  includes things like pmap entries and so forth.  The idea is to convert
  what should be low level routines (VM object and map manipulation)
  back into low level routines.

* vm_map_entry structures are now per-cpu cached, which is integrated into
  the reservation model above.

* The zalloc 'kmapentzone' has been removed.  We now only have 'mapentzone'.

* There were race conditions between vm_map_findspace() and actually
  entering the map_entry with vm_map_insert().  These have been closed
  through the vm_map_entry reservation model described above.

* Two new kernel config options now work.  NO_KMEM_MAP has been fleshed out
  a bit more and a number of deadlocks related to having only the kernel_map
  now have been fixed.  The USE_SLAB_ALLOCATOR option will cause the kernel
  to compile-in the slab allocator instead of the original malloc allocator.
  If you specify USE_SLAB_ALLOCATOR you must also specify NO_KMEM_MAP.

* vm_poff_t and vm_paddr_t integer types have been added.  These are meant
  to represent physical addresses and offsets (physical memory might be
  larger than virtual memory, for example Intel PAE).  They are not heavily
  used yet but the intention is to separate physical representation from
  virtual representation.

			    SLAB ALLOCATOR FEATURES

The slab allocator breaks allocations up into approximately 80 zones based
on their size.  Each zone has a chunk size (alignment).  For example, all
allocations in the 1-8 byte range will allocate in chunks of 8 bytes.  Each
size zone is backed by one or more blocks of memory.  The size of these
blocks is fixed at ZoneSize, which is calculated at boot time to be between
32K and 128K.  The use of a fixed block size allows us to locate the zone
header given a memory pointer with a simple masking operation.
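
That masking operation is worth a sketch (the constant and the header
layout below are placeholders, not the real ones):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define ZONE_SIZE       (64 * 1024)     /* power of 2, 32K-128K at boot */

    typedef struct SLZone { int z_magic; } SLZone;  /* placeholder header */

    static SLZone *
    ptr_to_zone(void *ptr)
    {
            return (SLZone *)((uintptr_t)ptr & ~(uintptr_t)(ZONE_SIZE - 1));
    }

    int
    main(void)
    {
            void *block;

            /* Zone blocks are allocated aligned to their own size... */
            if (posix_memalign(&block, ZONE_SIZE, ZONE_SIZE) != 0)
                    return 1;
            /* ...so any chunk pointer masks back to the zone header. */
            printf("%d\n", ptr_to_zone((char *)block + 777) == block);
            free(block);
            return 0;
    }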

The slab allocator operates on a per-cpu basis.  The cpu that allocates a
zone block owns it.  free() checks the cpu that owns the zone holding the
memory pointer being freed and forwards the request to the appropriate cpu
through an asynchronous IPI.  This request is not currently optimized but it
can theoretically be heavily optimized ('queued') to the point where the
overhead becomes inconsequential.  As of this commit the malloc_type
information is not MP safe, but the core slab allocation and deallocation
algorithms, not including the allocation of the backing block itself,
*ARE* MP safe.  The core code requires no mutexes or locks, only a critical
section.

Each zone contains N allocations of a fixed chunk size.  For example, a
128K zone can hold approximately 16000 8 byte allocations.  The zone
is initially zero'd and new allocations are simply allocated linearly out
of the zone.  When a chunk is freed it is entered into a linked list and
the next allocation request will reuse it.  The slab allocator heavily
optimizes M_ZERO operations at both the page level and the chunk level.

The slab allocator maintains various undocumented malloc quirks such as
ensuring that small power-of-2 allocations are aligned to their size,
and malloc(0) requests are also allowed and return a non-NULL result.
kern_tty.c depends heavily on the power-of-2 alignment feature and ahc
depends on the malloc(0) feature.  Eventually we may remove the malloc(0)
feature.

			    PROBLEMS AS OF THIS COMMIT

NOTE!  This commit may destabilize the kernel a bit.  There are issues
with the ISA DMA area ('bounce' buffer allocation) due to the large backing
block size used by the slab allocator and there are probably some deadlock
issues due to the removal of kmem_map that have not yet been resolved.

Revision 1.10: download - view: text, markup, annotated - select for diffs
Mon Aug 25 19:50:33 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.9: preferred, unified
Changes since revision 1.9: +18 -2 lines
Add the NO_KMEM_MAP kernel configuration option.  This is a temporary option
that will allow developers to test kmem_map removal and also the upcoming
(not this commit) slab allocator.  Currently this option removes kmem_map
and causes the malloc and zalloc subsystems to use kernel_map exclusively.

Change gd_intr_nesting_level.  This variable is now only bumped while we
are in a FAST interrupt or processing an IPIQ message.  This variable is
not bumped while we are in a normal interrupt or software interrupt thread.

Add warning printf()s if malloc() and related functions detect attempts to
use them from within a FAST interrupt or IPIQ.

Remove references to the no-longer-used zalloci() and zfreei() functions.

Revision 1.9: download - view: text, markup, annotated - select for diffs
Mon Aug 25 17:01:13 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.8: preferred, unified
Changes since revision 1.8: +40 -10 lines
Add an alignment feature to vm_map_findspace().  This feature will be used
primarily by the upcoming slab allocator but has many applications.

Use the alignment feature in the buffer cache to hopefully reduce
fragmentation.
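
The underlying computation is the standard power-of-2 round-up (a generic
sketch, not the kernel's code):

    #include <stdio.h>

    /* Round addr up to the next multiple of align (a power of 2). */
    static unsigned long
    align_up(unsigned long addr, unsigned long align)
    {
            return ((addr + align - 1) & ~(align - 1));
    }

    int
    main(void)
    {
            /* e.g. start a buffer on a 64K boundary to cut fragmentation */
            printf("%#lx\n", align_up(0x12345, 0x10000));   /* 0x20000 */
            return 0;
    }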

Revision 1.8: download - view: text, markup, annotated - select for diffs
Wed Aug 20 08:03:01 2003 UTC (11 years, 4 months ago) by rob
Branches: MAIN
Diff to: previous 1.7: preferred, unified
Changes since revision 1.7: +10 -10 lines
__P()!=wanted, clean up the vm subsystem

Revision 1.7: download - view: text, markup, annotated - select for diffs
Wed Jul 23 07:14:19 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.6: preferred, unified
Changes since revision 1.6: +8 -1 lines
2003-07-22 Hiten Pandya <hmp@nxad.com>

        * MFC FreeBSD rev. 1.189 of kern_exit.c (DONE)
          (shmexit to take vmspace instead of proc)
          (sort the sys/lock.h include in vm_map.c too)

        * MFC FreeBSD rev. 1.143 of kern_sysctl.c (DONE)
          (don't panic if sysctl is unregistrable)

        * Don't panic when enumerating SYSCTL_NODE()
          without children nodes. (DONE)

        * MFC FreeBSD rev. 1.113 of kern_sysctl.c (DONE)
          (Fix ogetkerninfo() handling  for KINFO_BSD_SYSINFO)

        * MFC FreeBSD rev. 1.103 of kern_sysctl.c (DONE)
          (Never reuse AUTO_OID values)

        * MFC FreeBSD rev 1.21 of i386/include/bus_dma.h
          (BUS_DMAMEM_NOSYNC -> BUS_DMA_COHERENT)

        * MFC FreeBSD rev. 1.19 of i386/include/bus_dma.h (DONE)
          (Implement real read/write barriers for i386)

Submitted by: Hiten Pandya <hmp@FreeBSD.ORG>

Revision 1.6: download - view: text, markup, annotated - select for diffs
Sat Jul 19 21:14:53 2003 UTC (11 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.5: preferred, unified
Changes since revision 1.5: +3 -3 lines
Remove the priority part of the priority|flags argument to tsleep().  Only
flags are passed now.  The priority was a user scheduler thingy that is not
used by the LWKT subsystem.  For process statistics assume sleeps without
P_SINTR set to be disk-waits, and sleeps with it set to be normal sleeps.

This commit should not contain any operational changes.

Revision 1.5: download - view: text, markup, annotated - select for diffs
Sun Jul 6 21:23:56 2003 UTC (11 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.4: preferred, unified
Changes since revision 1.4: +22 -20 lines
MP Implementation 1/2: Get the APIC code working again, sweetly integrate the
MP lock into the LWKT scheduler, replace the old simplelock code with
tokens or spin locks as appropriate.  In particular, the vnode interlock
(and most other interlocks) are now tokens.  Also clean up a few curproc/cred
sequences that are no longer needed.

The APs are left in a degenerate state with non-IPI interrupts disabled as
additional LWKT work must be done before we can really make use of them,
and FAST interrupts are not managed by the MP lock yet.  The main thing
for this stage was to get the system working with an APIC again.

buildworld tested on UP and 2xCPU/MP (Dell 2550)

Revision 1.4: download - view: text, markup, annotated - select for diffs
Thu Jul 3 17:24:04 2003 UTC (11 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.3: preferred, unified
Changes since revision 1.3: +9 -9 lines
Split the struct vmmeter cnt structure into a global vmstats structure and
a per-cpu cnt structure.  Adjust the sysctls to accumulate statistics
over all cpus.

Revision 1.3: download - view: text, markup, annotated - select for diffs
Wed Jun 25 03:56:12 2003 UTC (11 years, 5 months ago) by dillon
Branches: MAIN
CVS tags: PRE_MP
Diff to: previous 1.2: preferred, unified
Changes since revision 1.2: +3 -3 lines
proc->thread stage 4: rework the VFS and DEVICE subsystems to take thread
pointers instead of process pointers as arguments, similar to what FreeBSD-5
did.  Note however that ultimately both APIs are going to be message-passing
which means the current thread context will not be useable for creds and
descriptor access.

Revision 1.2: download - view: text, markup, annotated - select for diffs
Tue Jun 17 04:29:00 2003 UTC (11 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.1: preferred, unified
Changes since revision 1.1: +1 -0 lines
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids.  Most
ids have been removed from !lint sections and moved into comment sections.

Revision 1.1: download - view: text, markup, annotated - select for diffs
Tue Jun 17 02:55:55 2003 UTC (11 years, 6 months ago) by dillon
Branches: MAIN
CVS tags: FREEBSD_4_FORK
import from FreeBSD RELENG_4 1.187.2.19
