DragonFly BSD

CVS log for src/sys/vm/vm_fault.c

Keyword substitution: kv
Default branch: MAIN


Revision 1.47: download - view: text, markup, annotated - select for diffs
Tue Jul 1 02:02:56 2008 UTC (6 years, 4 months ago) by dillon
Branches: MAIN
CVS tags: HEAD, DragonFly_RELEASE_2_0_Slip, DragonFly_RELEASE_2_0, DragonFly_Preview
Diff to: previous 1.46: preferred, unified
Changes since revision 1.46: +1 -1 lines
Fix numerous pageout daemon -> buffer cache deadlocks in the main system.
These issues usually only occur on systems with small amounts of RAM,
but it is possible to trigger them on any system.

* Get rid of the IO_NOBWILL hack.  Just have the VN device use IO_DIRECT,
  which will clean out the buffer on completion of the write.

* Add a timeout argument to vm_wait().

* Add a thread->td_flags flag called TDF_SYSTHREAD.  kmalloc()'s made
  from designated threads are allowed to dip into the system reserve
  when allocating pages.  Only the pageout daemon and buf_daemon[_hw] use
  the flag.

* Add a new static procedure, recoverbufpages(), which explicitly tries to
  free buffers and their backing pages on the clean queue.

* Add a new static procedure, bio_page_alloc(), to do all the nasty work
  of allocating a page on behalf of a buffer cache buffer.

  This function will call vm_page_alloc() with VM_ALLOC_SYSTEM to allow
  it to dip into the system reserve.  If the allocation fails this
  function will call recoverbufpages() to try to recycle VM pages
  from clean buffer cache buffers, and will then attempt to reallocate
  using VM_ALLOC_SYSTEM | VM_ALLOC_INTERRUPT to allow it to dip into
  the interrupt reserve as well.

  Warnings will blare on the console.  If the effort still fails we
  sleep for 1/20 of a second and retry.  The idea, though, is that the
  fallbacks above (sketched below) keep the allocation from ever failing
  outright.

Reported-by: Gergo Szakal <bastyaelvtars@gmail.com>
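
A rough sketch of the fallback sequence described in this entry (helper
names are from the log; the control flow is simplified and the exact
signatures are assumed):

    static vm_page_t
    bio_page_alloc(vm_object_t obj, vm_pindex_t pg)
    {
        vm_page_t p;

        /* Normal attempt, allowed to dip into the system reserve. */
        p = vm_page_alloc(obj, pg, VM_ALLOC_NORMAL | VM_ALLOC_SYSTEM);
        if (p)
            return (p);

        /*
         * Recycle pages from clean buffer cache buffers, warn on the
         * console, and retry, now also allowed to dip into the
         * interrupt reserve.
         */
        recoverbufpages();
        kprintf("bio_page_alloc: WARNING emergency page allocation\n");
        p = vm_page_alloc(obj, pg, VM_ALLOC_SYSTEM | VM_ALLOC_INTERRUPT);
        return (p);    /* on NULL the caller sleeps 1/20s and retries */
    }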

Revision 1.46: download - view: text, markup, annotated - select for diffs
Fri May 9 07:24:48 2008 UTC (6 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.45: preferred, unified
Changes since revision 1.45: +40 -28 lines
Fix many bugs and issues in the VM system, particularly related to
heavy paging.

* (cleanup) PG_WRITEABLE is now set by the low level pmap code and not by
  high level code.  It means 'This page may contain a managed page table
  mapping which is writeable', meaning that hardware can dirty the page
  at any time.  The page must be tested via appropriate pmap calls before
  being disposed of.

* (cleanup) PG_MAPPED is now handled by the low level pmap code and only
  applies to managed mappings.  There is still a bit of cruft left over
  related to the pmap code's page table pages but the high level code is now
  clean.

* (bug) Various XIO, SFBUF, and MSFBUF routines which bypass normal paging
  operations were not properly dirtying pages when the caller intended
  to write to them.

* (bug) vfs_busy_pages in kern/vfs_bio.c had a busy race.  Separate the code
  out to ensure that we have marked all the pages as undergoing IO before we
  call vm_page_protect().  vm_page_protect(... VM_PROT_NONE) can block
  under very heavy paging conditions and if the pages haven't been marked
  for IO that could blow up the code.

* (optimization) Make a minor optimization.  When busying pages for write
  IO, downgrade the page table mappings to read-only instead of removing
  them entirely.

* (bug) In platform/pc32/i386/pmap.c fix various places where
  pmap_inval_add() was being called at the wrong point.  Only one was
  critical, in pmap_enter(), where pmap_inval_add() was being called so far
  away from the pmap entry being modified that it could wind up being flushed
  out prior to the modification, breaking the cpusync required.

  pmap.c also contains most of the work involved in the PG_MAPPED and
  PG_WRITEABLE changes.

* (bug) Close numerous pte updating races with hardware setting the
  modified bit.  There is still one race left (in pmap_enter()).

* (bug) Disable pmap_copy() entirely.   Fix most of the bugs anyway, but
  there is still one left in the handling of the srcmpte variable.

* (cleanup) Change vm_page_dirty() from an inline to a real procedure, and
  move the code which set the object to writeable/maybedirty into
  vm_page_dirty().

* (bug) Calls to vm_page_protect(... VM_PROT_NONE) can block.  Fix all cases
  where this call was made with a non-busied page.  All such calls are
  now made with a busied page, preventing blocking races from re-dirtying
  or remapping the page unexpectedly (see the sketch after this list).

  (Such blockages could only occur during heavy paging activity where the
  underlying page table pages are being actively recycled).

* (bug) Fix the pageout code to properly mark pages as undergoing I/O before
  changing their protection bits.

* (bug) Busy pages undergoing zeroing or partial zeroing in the vnode pager
  (vm/vnode_pager.c) to avoid unexpected effects.
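
The vm_page_protect() rule above as a pattern (illustrative only; the
busying API of the era is assumed):

    /*
     * Busy the page first: vm_page_protect(..., VM_PROT_NONE) can
     * block under heavy paging, and an unbusied page could be
     * re-dirtied or remapped while we sleep.
     */
    vm_page_busy(m);
    vm_page_protect(m, VM_PROT_NONE);
    vm_page_wakeup(m);    /* clear the busy bit and wake waiters */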

Revision 1.44.2.1: download - view: text, markup, annotated - select for diffs
Wed Apr 16 18:05:09 2008 UTC (6 years, 7 months ago) by dillon
Branches: DragonFly_RELEASE_1_12
CVS tags: DragonFly_RELEASE_1_12_Slip
Diff to: previous 1.44: preferred, unified; next MAIN 1.45: preferred, unified
Changes since revision 1.44: +2 -0 lines
MFC - Fix a bug in umtx_sleep().

Revision 1.45: download - view: text, markup, annotated - select for diffs
Mon Apr 14 20:00:29 2008 UTC (6 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.44: preferred, unified
Changes since revision 1.44: +2 -0 lines
Fix a bug in umtx_sleep().  This function sleeps on the mutex's physical
address and will get lost if the physical page underlying the VM address is
copied on write.  This case can occur when a threaded program fork()'s.

Introduce a VM page event notification mechanism and use it to wake up
the umtx_sleep() if the underlying page takes a COW fault (sketched below).

Reported-by: Jordan Gordeev <jgordeev@dir.bg>,
	     "Simon 'corecode' Schubert" <corecode@xxxxxxxxxxxx>

Revision 1.44: download - view: text, markup, annotated - select for diffs
Tue Aug 28 01:09:07 2007 UTC (7 years, 2 months ago) by dillon
Branches: MAIN
Branch point for: DragonFly_RELEASE_1_12
Diff to: previous 1.43: preferred, unified
Changes since revision 1.43: +6 -2 lines
Fix a bug in vnode_pager_generic_getpages().  This function was improperly
setting m->valid to 0 and was also improperly trying to free the page after
it had potentially become wired by the buffer cache.

Add a sysctl to UFS that allows us to force it to call vop_stdgetpages()
for debugging purposes.

Revision 1.43: download - view: text, markup, annotated - select for diffs
Fri Jun 29 21:54:15 2007 UTC (7 years, 4 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_10_Slip, DragonFly_RELEASE_1_10
Diff to: previous 1.42: preferred, unified
Changes since revision 1.42: +2 -1 lines
Implement struct lwp->lwp_vmspace.  Leave p_vmspace intact.  This allows
vkernels to run threaded and to run emulated VM spaces on a per-thread basis.
struct proc->p_vmspace is left intact, making it easy to switch into and out
of an emulated VM space.  This is needed for the virtual kernel SMP work.

This also gives us the flexibility to run emulated VM spaces in their own
threads, or in a limited number of separate threads.  Linux does this and
they say it improved performance.  I don't think it necessarily improved
performance, but it's nice to have the flexibility to do it in the future.

Revision 1.42: download - view: text, markup, annotated - select for diffs
Thu Jun 7 23:00:39 2007 UTC (7 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.41: preferred, unified
Changes since revision 1.41: +140 -3 lines
Implement vm_fault_object_page().  This function returns a held VM page
for the specified offset in the specified object and does all I/O necessary
to validate the page (as if it had been faulted in).

This function allows us to bypass the vm_map*() code when all we want is
the VM page.

Revision 1.41: download - view: text, markup, annotated - select for diffs
Fri Jan 12 22:12:53 2007 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_8_Slip, DragonFly_RELEASE_1_8
Diff to: previous 1.40: preferred, unified
Changes since revision 1.40: +2 -1 lines
Fix the recently committed (and described) page writability nerf.  The real
kernel was unconditionally mapping writable pages read-only on read faults
in order to be able to take another fault on a write attempt.  This was needed
for early virtual kernel support in order to set the Modify bit in the
virtualized page table, but was being applied to ALL mappings rather than
just those installed by the virtual kernel.

Now the real kernel only does this for virtual kernel mappings.  Additionally,
the real kernel no longer makes the page read-only when clearing the Modify
bit in the real page table (in order to rearm the write fault).  When this
case occurs VPTE_M has already been set in the virtual page table and no
re-fault is required.

The virtual kernel now only needs to invalidate the real kernel's page
mapping when clearing the virtualized Modify bit in the virtual page table
(VPTE_M), in order to rearm the real kernel's write fault so it can detect
future modifications via the virtualized Modify bit.  Also, the virtual kernel
no longer needs to install read-only pages to detect the write fault.  This
allows the real kernel to do ALL the work required to handle VPTE_M and
make the actual page writable.  This greatly reduces the number of real
page faults that occur and greatly reduces the number of page faults which
have to be passed through to the virtual kernel.

This fix reduces fork() overhead for processes running under a virtual
kernel by 70%, from around 2100uS to around 650uS.

Revision 1.40: download - view: text, markup, annotated - select for diffs
Thu Jan 11 20:53:42 2007 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.39: preferred, unified
Changes since revision 1.39: +0 -15 lines
Replace remaining uses of vm_fault_quick() with vm_fault_page_quick().
Do not directly access userland virtual addresses in the kernel UMTX code.

Revision 1.39: download - view: text, markup, annotated - select for diffs
Thu Jan 11 10:15:21 2007 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.38: preferred, unified
Changes since revision 1.38: +12 -12 lines
Fix a bug in vm_fault_page().  PG_MAPPED was not getting set, causing the
system to fail to remove pmap entries related to a VM page when reusing
the VM page.

General cleaning of vm_fault*() routines.  These routines now expect all
appropriate VM_PROT_* flags to be specified instead of just one.  Also
clean up the VM_FAULT_* flags.

Remove VM_FAULT_HOLD - it is no longer used.  vm_fault_page() handles the
functionality in a far cleaner fashion than vm_fault().

Revision 1.38: download - view: text, markup, annotated - select for diffs
Mon Jan 8 23:41:31 2007 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.37: preferred, unified
Changes since revision 1.37: +7 -0 lines
Add a missing pmap_enter() in vm_fault_page().  If a write fault does a COW
and must replace a read-only page, the pmap must be updated so the process
sees the new page.

Revision 1.37: download - view: text, markup, annotated - select for diffs
Mon Jan 8 19:41:01 2007 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.36: preferred, unified
Changes since revision 1.36: +26 -3 lines
Implement vm_fault_page_quick(), which will soon be replacing
vm_fault_quick().  vm_fault_quick() does not hold the underlying page
in any way and is not SMP friendly.  It also uses architecture-specific
tricks to force the page into a pmap which do not work with the VKERNEL.

Revision 1.36: download - view: text, markup, annotated - select for diffs
Mon Jan 8 03:33:43 2007 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.35: preferred, unified
Changes since revision 1.35: +81 -11 lines
Modify the trapframe sigcontext, ucontext, etc.  Add %gs to the trapframe
and xflags and an expanded floating point save area to sigcontext/ucontext
so traps can be fully specified.

Remove all the %gs hacks in the system code and signal trampoline and handle
%gs faults natively, like we do %fs faults.

Implement writebacks to the virtual page table to set VPTE_M and VPTE_A and
add checks for VPTE_R and VPTE_W.

Consolidate the TLS save area into a MD structure that can be accessed by MI
code.

Reformulate the vmspace_ctl() system call to allow an extended context to be
passed (for TLS info and soon the FP and eventually the LDT).

Adjust the GDB patches to recognize the new location of %gs.

Properly detect non-exception returns to the virtual kernel when the virtual
kernel is running an emulated user process and receives a signal.

And misc other work on the virtual kernel.

Revision 1.35: download - view: text, markup, annotated - select for diffs
Sat Jan 6 22:35:47 2007 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.34: preferred, unified
Changes since revision 1.34: +166 -0 lines
Add a new procedure, vm_fault_page(), which does all actions related to
faulting in a VM page given a vm_map and virtual address, including any
necessary I/O, but returns the held page instead of entering it into a pmap.

Use the new function in procfs_rwmem, allowing gdb to 'see' memory that
is governed by a virtual page table.
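
Typical use, as in procfs_rwmem (a sketch: error handling is trimmed
and the exact signature is assumed):

    int error;
    vm_page_t m;

    /*
     * Resolve the page for 'va' in 'map', performing any necessary
     * I/O, and get it back held rather than entered into a pmap.
     */
    m = vm_fault_page(map, trunc_page(va), VM_PROT_READ,
                      VM_FAULT_NORMAL, &error);
    if (m == NULL)
        return (error);
    /* ... copy through a temporary kernel mapping (e.g. an SFBUF) ... */
    vm_page_unhold(m);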

Revision 1.34: download - view: text, markup, annotated - select for diffs
Mon Jan 1 22:51:18 2007 UTC (7 years, 10 months ago) by corecode
Branches: MAIN
Diff to: previous 1.33: preferred, unified
Changes since revision 1.33: +3 -4 lines
1:1 Userland threading stage 2.10/4:

Separate p_stats into p_ru and lwp_ru.

proc.p_ru keeps track of all statistics directly related to a proc.  This
consists of RSS usage and nswap information and aggregate numbers for all
former lwps of this proc.

proc.p_cru is the sum of all stats of reaped children.

lwp.lwp_ru contains the stats directly related to one specific lwp, meaning
packet, scheduler switch or page fault counts, etc.  This information gets
added to lwp.lwp_proc.p_ru when the lwp exits.
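
The aggregation step is essentially this (assuming the usual BSD
ruadd() helper):

    /* lwp exit: fold the lwp's private stats into the proc total. */
    ruadd(&lp->lwp_proc->p_ru, &lp->lwp_ru);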

Revision 1.33: download - view: text, markup, annotated - select for diffs
Thu Dec 28 21:24:02 2006 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.32: preferred, unified
Changes since revision 1.32: +1 -1 lines
Make kernel_map, buffer_map, clean_map, exec_map, and pager_map direct
structural declarations instead of pointers.  Clean up all related code,
in particular kmem_suballoc().

Remove the offset calculation for kernel_object.  kernel_object's page
indices used to be relative to the start of kernel virtual memory in order
to improve the performance of VM page scanning algorithms.  The optimization
is no longer needed now that VM objects use Red-Black trees.  Removal of
the offset simplifies a number of calculations and makes the code more
readable.

Revision 1.32: download - view: text, markup, annotated - select for diffs
Thu Dec 28 18:29:08 2006 UTC (7 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.31: preferred, unified
Changes since revision 1.31: +1 -2 lines
Introduce globals: KvaStart, KvaEnd, and KvaSize.  Used by the kernel
instead of the nutty VADDR and VM_*_KERNEL_ADDRESS macros.  Move extern
declarations for these variables as well as for virtual_start, virtual_end,
and phys_avail[] from MD headers to MI headers.

Make kernel_object a global structure instead of a pointer.

Remove kmem_object and all related code (none of it is used any more).

Revision 1.31: download - view: text, markup, annotated - select for diffs
Sat Dec 23 00:41:31 2006 UTC (7 years, 11 months ago) by swildner
Branches: MAIN
Diff to: previous 1.30: preferred, unified
Changes since revision 1.30: +4 -4 lines
Rename printf -> kprintf in sys/ and add some defines where necessary
(files which are used in userland, too).

Revision 1.30: download - view: text, markup, annotated - select for diffs
Wed Sep 13 22:25:00 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.29: preferred, unified
Changes since revision 1.29: +2 -5 lines
Collapse some bits of repetitive code into their own procedures and
allocate a maximally sized default object to back MAP_VPAGETABLE
mappings, allowing us to access logical memory beyond the size of the
original mmap() call by programming the page table to point at it.

This gives us an abstraction and capability similar to a real kernel's
ability to map e.g. 2GB of physical memory into its 1GB address space.

Revision 1.29: download - view: text, markup, annotated - select for diffs
Wed Sep 13 18:12:18 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.28: preferred, unified
Changes since revision 1.28: +44 -40 lines
More cleanups + fix a bug when taking a write fault on a mapping that uses
a virtual page table.  The page was not being pmap'd with the correct
permissions.

Revision 1.28: download - view: text, markup, annotated - select for diffs
Wed Sep 13 17:10:42 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.27: preferred, unified
Changes since revision 1.27: +65 -66 lines
MAP_VPAGETABLE support part 3/3.

Implement a new system call called mcontrol() which is an extension of
madvise(), adding an additional 64 bit argument.  Add two new advisories,
MADV_INVAL and MADV_SETMAP.

MADV_INVAL will invalidate the pmap for the specified virtual address
range.  You need to do this for the virtual addresses affected by changes
made in a virtual page table.

MADV_SETMAP sets the top-level page table entry for the virtual page table
governing the mapped range.  It only works for memory governed by a virtual
page table and strange things will happen if you only set the root
page table entry for part of the virtual range.

Further refine the virtual page table format.  Keep with 32 bit VPTE's for
the moment, but properly implement VPTE_PS and VPTE_V.  VPTE_PS can be
used to support 4MB linear maps in the top level page table and it can also
be used when specifying the 'root' VPTE to disable the page table entirely
and just linear map the backing store.  VPTE_V is the 'valid' bit (before
it was inverted, now it is normal).
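
A userland usage sketch (mcontrol() taken as an madvise() variant with
an extra 64 bit value; 'root_vpte' is a hypothetical top-level entry
and error checks are omitted):

    #include <sys/mman.h>

    /* Create a mapping governed by a userland virtual page table. */
    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_ANON | MAP_VPAGETABLE, -1, 0);

    /* Install the top-level VPTE governing the whole range... */
    mcontrol(base, len, MADV_SETMAP, root_vpte);

    /* ...and invalidate the pmap after any virtual page table edit. */
    mcontrol(base, len, MADV_INVAL, 0);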

Revision 1.27: download - view: text, markup, annotated - select for diffs
Tue Sep 12 22:03:12 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.26: preferred, unified
Changes since revision 1.26: +102 -105 lines
MAP_VPAGETABLE support part 2/3.

Implement preliminary virtual page table handling code in vm_fault.  This
code is strictly temporary so subsystem and userland interactions can be
tested, but the real code will be very similar.

Revision 1.26: download - view: text, markup, annotated - select for diffs
Tue Sep 12 18:41:32 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.25: preferred, unified
Changes since revision 1.25: +370 -260 lines
MAP_VPAGETABLE support part 1/3.

Reorganize vm_fault() to get more direct access to the VM page resolved by
a VM fault.  Move vm_fault()'s core shadow object traversal and fault I/O
code to a new procedure called vm_fault_object().

Begin adding support for memory mappings which are backed by a virtualized
page table under userland control.

Revision 1.25: download - view: text, markup, annotated - select for diffs
Wed May 17 17:47:58 2006 UTC (8 years, 6 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_6_Slip, DragonFly_RELEASE_1_6
Diff to: previous 1.24: preferred, unified
Changes since revision 1.24: +0 -6 lines
Remove the (unused) copy-on-write support for a vnode's VM object.  This
support originally existed to support the badly implemented and severely
hacked ENABLE_VFS_IOOPT I/O optimization which was removed long ago.

This also removes a bunch of cross-module pollution in UFS.

Revision 1.24: download - view: text, markup, annotated - select for diffs
Sat May 6 23:53:34 2006 UTC (8 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.23: preferred, unified
Changes since revision 1.23: +7 -4 lines
Fix a null pointer indirection; the VM fault rate limiting code only
applies to processes.

Revision 1.23: download - view: text, markup, annotated - select for diffs
Fri May 5 20:15:02 2006 UTC (8 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.22: preferred, unified
Changes since revision 1.22: +1 -1 lines
Remove the thread pointer argument to lockmgr().  All lockmgr() ops use the
current thread.

Move the lockmgr code in BUF_KERNPROC to lockmgr_kernproc().  This code
allows the lock owner to be set to a special value so any thread can unlock
the lock and is required for B_ASYNC I/O so biodone() can release the lock.

Revision 1.22: download - view: text, markup, annotated - select for diffs
Sun Apr 23 03:08:04 2006 UTC (8 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.21: preferred, unified
Changes since revision 1.21: +1 -1 lines
Remove the now unused interlock argument to the lockmgr() procedure.
This argument has been abused over the years by kernel programmers
attempting to optimize certain locking and data modification sequences,
resulting in virtually unreadable code in some cases.  The interlock
also made porting between BSDs difficult as each BSD implemented their
interlock differently.  DragonFly has slowly removed use of the interlock
argument and we can now finally be rid of it entirely.

Revision 1.21: download - view: text, markup, annotated - select for diffs
Wed Mar 15 07:58:37 2006 UTC (8 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.20: preferred, unified
Changes since revision 1.20: +53 -0 lines
Implement a VM load heuristic.  sysctl vm.vm_load will return an indication
of the load on the VM system in the range 0-1000.

Implement a page allocation rate limit in vm_fault which is based on
vm_load, and enabled via vm.vm_load_enable (default on).  As the system
becomes more and more memory bound, those processes whose page faults
require a page allocation will start to allocate pages in smaller bursts
and with greater and greater enforced delays, up to 1/10 of a second (see
the sketch below).

Implement vm.vm_load_debug (for kernels with INVARIANTS), which outputs
the burst calculations to the console when enabled.

Increase the minimum guaranteed run time without swapping from 2 to 15
seconds.
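
In outline the limiter behaves like this (the thresholds and scaling
here are invented for illustration, not taken from the commit):

    /*
     * vm.vm_load runs 0-1000.  As it climbs, shrink the burst of
     * pages a faulting process may allocate and stretch the enforced
     * delay, capped at 1/10 second.
     */
    if (vm_load_enable && vm_load > 0) {
        int burst = 8 - (8 * vm_load) / 1000;     /* pages per burst */
        int delay = ((hz / 10) * vm_load) / 1000; /* sleep, in ticks */

        if (burst < 1)
            burst = 1;
        /* ... allocate at most 'burst' pages in this fault ... */
        if (delay > 0)
            tsleep(&vm_load, 0, "vmload", delay);
    }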

Revision 1.20: download - view: text, markup, annotated - select for diffs
Mon Nov 14 18:50:15 2005 UTC (9 years ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_4_Slip, DragonFly_RELEASE_1_4
Diff to: previous 1.19: preferred, unified
Changes since revision 1.19: +2 -1 lines
Make tsleep/wakeup() MP SAFE for kernel threads and get us closer to
making it MP SAFE for user processes.  Currently the code is operating
under the rule that access to a thread structure requires cpu locality of
reference, and access to a proc structure requires the Big Giant Lock.  The
two are not mutually exclusive so, for example, tsleep/wakeup on a proc
needs both cpu locality of reference *AND* the BGL.  This was true with the
old tsleep/wakeup and has now been documented.

The new tsleep/wakeup algorithm is quite simple in concept.  Each cpu has its
own ident-based hash table and each hash slot has a cpu mask which tells
wakeup() which cpus might have the ident.  A wakeup iterates through all
candidate cpus simply by chaining the IPI message through them until either
all candidate cpus have been serviced, or (with wakeup_one()) the requested
number of threads have been woken up.  (A data structure sketch appears at
the end of this entry.)

Other changes made in this patch set:

* The sense of P_INMEM has been reversed.  It is now P_SWAPPEDOUT.  Also,
  P_SWAPPING, P_SWAPINREQ are no longer relevant and have been removed.

* The swapping code has been cleaned up and seriously revamped.  The new
  swapin code staggers swapins to give the VM system a chance to respond
  to new conditions.  Also some lwp-related fixes were made (more
  p_rtprio vs lwp_rtprio confusion).

* As mentioned above, tsleep/wakeup have been rewritten.  The process
  p_stat no longer does crazy transitions from SSLEEP to SSTOP.  There is
  now only SSLEEP and SSTOP is synthesized from P_SWAPPEDOUT for userland
  consumption.  Additionally, tsleep() with PCATCH will NO LONGER STOP THE
  PROCESS IN THE TSLEEP CALL.  Instead, the actual stop is deferred until
  the process tries to return to userland.  This removes all remaining cases
  where a stopped process can hold a locked kernel resource.

* A P_BREAKTSLEEP flag has been added.  This flag indicates when an event
  occurs that is allowed to break a tsleep with PCATCH.  All the weird
  undocumented setrunnable() rules have been removed and replaced with a
  very simple algorithm based on this flag.

* Since the UAREA is no longer swapped, we no longer faultin() on PHOLD().
  This also incidentally fixes the 'ps' command's tendency to try to swap
  all processes back into memory.

* speedup_syncer() no longer does hackish checks on proc0's tsleep channel
  (td_wchan).

* Userland scheduler acquisition and release has now been tightened up and
  KKASSERT's have been added (one of the bugs Stefan found was related
  to an improper lwkt_schedule() that was found by one of the new assertions).
  We also have added other assertions related to expected conditions.

* A serious race in pmap_release_free_page() has been corrected.  We
  no longer couple the object generation check with a failed
  pmap_release_free_page() call.  Instead the two conditions are checked
  independently.  We no longer loop when pmap_release_free_page() succeeds
  (it is unclear how that could ever have worked properly).

Major testing by: Stefan Krueger <skrueger@meinberlikomm.de>
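
A sketch of the data structure described above (field and constant
names assumed):

    /*
     * One ident-hashed sleep queue per cpu; a per-slot cpu mask
     * records which cpus might have threads sleeping on idents
     * hashing to that slot, so wakeup() can chain its IPI through
     * only the candidate cpus.
     */
    struct tslpque {
        TAILQ_HEAD(, thread) queue;    /* this cpu's local sleepers */
    };
    static struct tslpque tsleep_hash[MAXCPU][TSLEEP_HASHSIZE];
    static cpumask_t      slpque_cpumasks[TSLEEP_HASHSIZE];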

Revision 1.19: download - view: text, markup, annotated - select for diffs
Mon Oct 24 20:02:09 2005 UTC (9 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.18: preferred, unified
Changes since revision 1.18: +9 -0 lines
Avoid a recursive kernel fault and subsequent double fault if the VM fault
code gets a KVM map_entry with a NULL object.  Such entries exist in system
maps managed directly by the kernel, such as the buffer cache and kernel_map.
Instead, we check for the condition and panic immediately.  Programs which
access /dev/[k]mem can hit this race/failure.

Reported-by: Stefan Krüger <skrueger@meinberlikomm.de>

Revision 1.18: download - view: text, markup, annotated - select for diffs
Tue Oct 12 19:29:34 2004 UTC (10 years, 1 month ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Stable, DragonFly_RELEASE_1_2_Slip, DragonFly_RELEASE_1_2
Diff to: previous 1.17: preferred, unified
Changes since revision 1.17: +18 -13 lines
Try to close an occasional VM page related panic that is believed to occur
due to the VM page queues or free lists being indirectly manipulated by
interrupts that are not protected by splvm().  Do this by replacing splvm()'s
with critical sections in a number of places.

Note: some of this work bled over into the "VFS messaging/interfacing work
stage 8/99" commit.

Revision 1.17: download - view: text, markup, annotated - select for diffs
Mon May 31 11:43:49 2004 UTC (10 years, 5 months ago) by hmp
Branches: MAIN
CVS tags: DragonFly_Snap29Sep2004, DragonFly_Snap13Sep2004, DragonFly_1_0_REL, DragonFly_1_0_RC1, DragonFly_1_0A_REL
Diff to: previous 1.16: preferred, unified
Changes since revision 1.16: +0 -13 lines
Remove an unimplemented advisory function, pmap_pageable(); there is
no pmap implementation in existence that requires it to be implemented.

Discussed-with: 	Alan Cox <alc at freebsd.org>,
                	Matthew Dillon <dillon at backplane.com>

Revision 1.16: download - view: text, markup, annotated - select for diffs
Thu May 27 00:38:58 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.15: preferred, unified
Changes since revision 1.15: +46 -59 lines
Bring in the fictitious page wiring bug fixes from FreeBSD-5.  Make additional
major changes to the APIs to clean them up (so this commit is substantially
different than what was committed to FreeBSD-5).

Obtained-from: Alan Cox <alc@cs.rice.edu> (FreeBSD-5)

Revision 1.15: download - view: text, markup, annotated - select for diffs
Thu May 20 22:42:25 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.14: preferred, unified
Changes since revision 1.14: +3 -3 lines
Get rid of VM_WAIT and VM_WAITPFAULT crud, replace with calls to
vm_wait() and vm_waitpfault().  This is a non-operational change.

vm_page.c now uses the _vm_page_list_find() inline (which itself is only
in vm_page.c) for various critical path operations.

Revision 1.14: download - view: text, markup, annotated - select for diffs
Thu May 13 17:40:19 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.13: preferred, unified
Changes since revision 1.13: +45 -14 lines
Close an interrupt race between vm_page_lookup() and (typically) a
vm_page_sleep_busy() check by using the correct spl protection.
An interrupt can occur in between the two operations and unbusy/free
the page in question, causing the busy check to fail and the code to
fall through and operate on a page that may have been freed and
possibly even reused.  Also note that vm_page_grab() had the same
issue between the lookup, busy check, and vm_page_busy() call.  (A
sketch of the fixed pattern appears at the end of this entry.)

Close an interrupt race when scanning a VM object's memq.  Interrupts
can free pages, removing them from memq, which interferes with memq scans
and can cause a page unassociated with the object to be processed as if it
were associated with the object.

Calls to vm_page_hold() and vm_page_unhold() require spl protection.

Rename the passed socket descriptor argument in sendfile() to make the
code more readable.

Fix several serious bugs in procfs_rwmem().  In particular, force it to
block if a page is busy and then retry.

Get rid of vm_pager_map_page() and vm_pager_unmap_page(), make the functions
that used to use these routines use SFBUF's instead.

Get rid of the (userland?) 4MB page mapping feature in pmap_object_init_pt()
for now.  The code appears to not track the page directory properly and
could result in a non-zero page being freed as PG_ZERO.

This commit also includes updated code comments and some additional
non-operational code cleanups.
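
The fixed lookup pattern, schematically (the era's spl API is shown;
a critical section, as in the 1.18 entry above, works the same way):

    vm_page_t m;
    int s;

    s = splvm();    /* keep interrupts from freeing/reusing the page
                     * between the lookup and the busy check */
    do {
        m = vm_page_lookup(object, pindex);
        /*
         * vm_page_sleep_busy() returns TRUE if it blocked; the page
         * may have been freed or reused while we slept, so redo the
         * lookup in that case.
         */
    } while (m != NULL && vm_page_sleep_busy(m, TRUE, "pgwait"));
    splx(s);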

Revision 1.13: download - view: text, markup, annotated - select for diffs
Mon Mar 29 17:30:23 2004 UTC (10 years, 7 months ago) by drhodus
Branches: MAIN
Diff to: previous 1.12: preferred, unified
Changes since revision 1.12: +15 -0 lines
Move vm_fault_quick() out from the machine specific location
as the function is now cpu agnostic.

Revision 1.12: download - view: text, markup, annotated - select for diffs
Tue Mar 23 22:54:32 2004 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.11: preferred, unified
Changes since revision 1.11: +7 -20 lines
ANSIfication (procedure args) cleanup.

Submitted-by: Andre Nathan <andre@digirati.com.br>

Revision 1.11: download - view: text, markup, annotated - select for diffs
Mon Mar 1 06:33:24 2004 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.10: preferred, unified
Changes since revision 1.10: +1 -1 lines
Newtoken commit.  Change the token implementation as follows:  (1) Obtaining
a token no longer enters a critical section.  (2) tokens can be held through
scheduler switches and blocking conditions and are effectively released and
reacquired on resume.  Thus tokens serialize access only while the thread
is actually running.  Serialization is not broken by preemptive interrupts.
That is, interrupt threads which preempt do not release the preempted thread's
tokens.  (3) Unlike spl's, tokens will interlock w/ interrupt threads on
the same or on a different cpu.  (Usage is sketched at the end of this
entry.)

The vnode interlock code has been rewritten and the API has changed.  The
mountlist vnode scanning code has been consolidated and all known races have
been fixed.  The vnode interlock is now a pool token.

The code that frees unreferenced vnodes whose last VM page has been freed has
been moved out of the low level vm_page_free() code and moved to the
periodic filesystem syncer code in vfs_msync().

The SMP startup code and the IPI code has been cleaned up considerably.
Certain early token interactions on AP cpus have been moved to the BSP.

The LWKT rwlock API has been cleaned up and turned on.

Major testing by: David Rhodus
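
Usage under the new scheme looks roughly like this (the newtoken API
of this era passed an explicit reference structure; details assumed):

    static struct lwkt_token mytok;    /* hypothetical token */
    lwkt_tokref ilock;

    lwkt_gettoken(&ilock, &mytok);     /* no critical section entered */
    /*
     * The token may be released and reacquired across any blocking
     * condition here; it serializes access only while we are running.
     */
    lwkt_reltoken(&ilock);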

Revision 1.10: download - view: text, markup, annotated - select for diffs
Tue Jan 20 05:04:08 2004 UTC (10 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.9: preferred, unified
Changes since revision 1.9: +1 -1 lines
Retool the M_* flags to malloc() and the VM_ALLOC_* flags to
vm_page_alloc(), and vm_page_grab() and friends.

The M_* flags now have more flexibility, with the intent that we will start
using some of it to deal with NULL pointer return problems in the codebase
(CAM is especially bad at dealing with unexpected return values).  In
particular, add M_USE_INTERRUPT_RESERVE and M_FAILSAFE, and redefine
M_NOWAIT as a combination of M_ flags instead of its own flag.

The VM_ALLOC_* macros are now flags (0x01, 0x02, 0x04) rather than states
(1, 2, 3), which allows us to create combinations that the old interface
could not handle.
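
The practical difference (values illustrative):

    /* Before: mutually exclusive states 1, 2, 3.
     * After: OR-able bits, permitting combinations. */
    #define VM_ALLOC_NORMAL     0x01
    #define VM_ALLOC_SYSTEM     0x02
    #define VM_ALLOC_INTERRUPT  0x04

    m = vm_page_alloc(object, pindex,
                      VM_ALLOC_SYSTEM | VM_ALLOC_INTERRUPT);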

Revision 1.9: download - view: text, markup, annotated - select for diffs
Mon Nov 3 17:11:23 2003 UTC (11 years ago) by dillon
Branches: MAIN
Diff to: previous 1.8: preferred, unified
Changes since revision 1.8: +3 -2 lines
64 bit address space cleanups which are a prerequisite for future 64 bit
address space work and PAE.  Note: this is not PAE.  This patch basically
adds vm_paddr_t, which represents a 'physical address'.  Physical addresses
may be larger than virtual addresses and on IA32 we make vm_paddr_t a 64
bit quantity.

Submitted-by: Hiten Pandya <hmp@backplane.com>

Revision 1.8: download - view: text, markup, annotated - select for diffs
Thu Oct 2 21:00:20 2003 UTC (11 years, 1 month ago) by hmp
Branches: MAIN
Diff to: previous 1.7: preferred, unified
Changes since revision 1.7: +1 -1 lines
Rename:

	- vm_map_pageable() -> vm_map_wire()
	- vm_map_user_pageable() -> vm_map_unwire()

Revision 1.7: download - view: text, markup, annotated - select for diffs
Wed Aug 27 01:43:08 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.6: preferred, unified
Changes since revision 1.6: +1 -1 lines
SLAB ALLOCATOR Stage 1.  This brings in a slab allocator written from scratch
by yours truly.  A detailed explanation of the allocator is included but
first, other changes:

* Instead of having vm_map_entry_insert*() and friends allocate the
  vm_map_entry structures, a new mechanism has been put in place whereby
  the vm_map_entry structures are reserved at a higher level, then
  expected to exist in the free pool in deep vm_map code.  This preliminary
  implementation may eventually turn into something more sophisticated that
  includes things like pmap entries and so forth.  The idea is to convert
  what should be low level routines (VM object and map manipulation)
  back into low level routines.

* vm_map_entry structures are now per-cpu cached, which is integrated into
  the reservation model above.

* The zalloc 'kmapentzone' has been removed.  We now only have 'mapentzone'.

* There were race conditions between vm_map_findspace() and actually
  entering the map_entry with vm_map_insert().  These have been closed
  through the vm_map_entry reservation model described above.

* Two new kernel config options now work.  NO_KMEM_MAP has been fleshed out
  a bit more and a number of deadlocks related to having only the kernel_map
  now have been fixed.  The USE_SLAB_ALLOCATOR option will cause the kernel
  to compile-in the slab allocator instead of the original malloc allocator.
  If you specify USE_SLAB_ALLOCATOR you must also specify NO_KMEM_MAP.

* vm_poff_t and vm_paddr_t integer types have been added.  These are meant
  to represent physical addresses and offsets (physical memory might be
  larger than virtual memory, for example Intel PAE).  They are not heavily
  used yet but the intention is to separate physical representation from
  virtual representation.

			    SLAB ALLOCATOR FEATURES

The slab allocator breaks allocations up into approximately 80 zones based
on their size.  Each zone has a chunk size (alignment).  For example, all
allocations in the 1-8 byte range will allocate in chunks of 8 bytes.  Each
size zone is backed by one or more blocks of memory.  The size of these
blocks is fixed at ZoneSize, which is calculated at boot time to be between
32K and 128K.  The use of a fixed block size allows us to locate the zone
header given a memory pointer with a simple masking operation.
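
The masking trick, concretely (a sketch; the real SLZone header has
many more fields):

    extern uintptr_t ZoneSize;    /* power of 2, 32K-128K, set at boot */

    typedef struct SLZone {
        int z_cpu;                /* cpu owning this zone block */
        int z_chunksize;          /* allocation size served here */
        /* ... free list, chunk counts, etc ... */
    } SLZone;

    /* Any chunk pointer maps to its zone header with a single mask. */
    static __inline SLZone *
    ptr_to_zone(void *ptr)
    {
        return ((SLZone *)((uintptr_t)ptr & ~(ZoneSize - 1)));
    }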

The slab allocator operates on a per-cpu basis.  The cpu that allocates a
zone block owns it.  free() checks the cpu that owns the zone holding the
memory pointer being freed and forwards the request to the appropriate cpu
through an asynchronous IPI.  This request is not currently optimized but it
can theoretically be heavily optimized ('queued') to the point where the
overhead becomes inconsequential.  As of this commit the malloc_type
information is not MP safe, but the core slab allocation and deallocation
  algorithms, exclusive of having to allocate the backing block,
*ARE* MP safe.  The core code requires no mutexes or locks, only a critical
section.

Each zone contains N allocations of a fixed chunk size.  For example, a
128K zone can hold approximately 16000 8 byte allocations.  The zone
is initially zero'd and new allocations are simply allocated linearly out
of the zone.  When a chunk is freed it is entered into a linked list and
the next allocation request will reuse it.  The slab allocator heavily
optimizes M_ZERO operations at both the page level and the chunk level.

The slab allocator maintains various undocumented malloc quirks such as
ensuring that small power-of-2 allocations are aligned to their size,
and malloc(0) requests are also allowed and return a non-NULL result.
kern_tty.c depends heavily on the power-of-2 alignment feature and ahc
depends on the malloc(0) feature.  Eventually we may remove the malloc(0)
feature.

			    PROBLEMS AS OF THIS COMMIT

NOTE!  This commit may destabilize the kernel a bit.  There are issues
with the ISA DMA area ('bounce' buffer allocation) due to the large backing
block size used by the slab allocator and there are probably some deadlock
issues due to the removal of kmem_map that have not yet been resolved.

Revision 1.6: download - view: text, markup, annotated - select for diffs
Wed Aug 20 08:03:01 2003 UTC (11 years, 3 months ago) by rob
Branches: MAIN
Diff to: previous 1.5: preferred, unified
Changes since revision 1.5: +2 -2 lines
__P()!=wanted, clean up the vm subsystem

Revision 1.5: download - view: text, markup, annotated - select for diffs
Sat Jul 26 22:10:02 2003 UTC (11 years, 4 months ago) by rob
Branches: MAIN
Diff to: previous 1.4: preferred, unified
Changes since revision 1.4: +7 -7 lines
Register keyword removal

Approved by: Matt Dillon

Revision 1.4: download - view: text, markup, annotated - select for diffs
Thu Jul 3 17:24:04 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.3: preferred, unified
Changes since revision 1.3: +9 -8 lines
Split the struct vmmeter cnt structure into a global vmstats structure and
a per-cpu cnt structure.  Adjust the sysctls to accumulate statistics
over all cpus.

Revision 1.3: download - view: text, markup, annotated - select for diffs
Wed Jun 25 03:56:12 2003 UTC (11 years, 5 months ago) by dillon
Branches: MAIN
CVS tags: PRE_MP
Diff to: previous 1.2: preferred, unified
Changes since revision 1.2: +2 -2 lines
proc->thread stage 4: rework the VFS and DEVICE subsystems to take thread
pointers instead of process pointers as arguments, similar to what FreeBSD-5
did.  Note however that ultimately both APIs are going to be message-passing
which means the current thread context will not be useable for creds and
descriptor access.

Revision 1.2: download - view: text, markup, annotated - select for diffs
Tue Jun 17 04:29:00 2003 UTC (11 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.1: preferred, unified
Changes since revision 1.1: +1 -0 lines
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids.  Most
ids have been removed from !lint sections and moved into comment sections.

Revision 1.1: download - view: text, markup, annotated - select for diffs
Tue Jun 17 02:55:55 2003 UTC (11 years, 5 months ago) by dillon
Branches: MAIN
CVS tags: FREEBSD_4_FORK
import from FreeBSD RELENG_4 1.108.2.8
