DragonFly BSD

CVS log for src/sys/sys/thread.h

Keyword substitution: kv
Default branch: MAIN


Revision 1.97: download - view: text, markup, annotated - select for diffs
Sat Sep 20 04:31:02 2008 UTC (6 years, 1 month ago) by sephe
Branches: MAIN
CVS tags: HEAD
Diff to: previous 1.96: preferred, unified
Changes since revision 1.96: +1 -0 lines
Add TDF_NETWORK lwkt flag, so various assertions can be performed to make sure
that packets are processed in network threads (i.e. a controlled environment)
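
A rough sketch of such an assertion site (the helper name is hypothetical;
only KKASSERT, curthread and TDF_NETWORK come from the tree):

    /*
     * Hypothetical helper: assert that the calling thread is a
     * designated network thread before it touches a packet.
     */
    static __inline void
    assert_netisr_thread(void)
    {
            KKASSERT(curthread->td_flags & TDF_NETWORK);
    }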

Revision 1.96: download - view: text, markup, annotated - select for diffs
Tue Sep 9 07:21:57 2008 UTC (6 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.95: preferred, unified
Changes since revision 1.95: +1 -0 lines
Add a MSGF_NORESCHED feature for lwkt thread-based message ports.  The
idea is to use it to allow certain async messages to be queued to higher
priority system threads and schedule those threads without forcing an
immediate reschedule.

The feature will be used by the new socket code to prevent cavitation
between a user process and a system protocol thread when the user process
is write()ing a lot of data over the network.

Revision 1.95: download - view: text, markup, annotated - select for diffs
Tue Sep 9 04:06:20 2008 UTC (6 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.94: preferred, unified
Changes since revision 1.94: +1 -2 lines
Fix issues with the scheduler that were causing unnecessary reschedules
between tightly coupled processes as well as inefficient reschedules under
heavy loads.

The basic problem is that a process entering the kernel is 'passively
released', meaning its thread priority is left at TDPRI_USER_NORM.  The
thread priority is only raised to TDPRI_KERN_USER if the thread switches
out.  This has the side effect of forcing a LWKT reschedule when any other
user process woke up from a blocked condition in the kernel, regardless of
its user priority, because its LWKT thread was at the higher
TDPRI_KERN_USER priority.   This resulted in some significant switching
cavitation under load.

There is a twist here because we do not want to starve threads running in
the kernel acting on behalf of a very low priority user process, because
doing so can deadlock the namecache or other kernel elements that sleep with
lockmgr locks held.  In addition, the 'other' LWKT thread might be associated
with a much higher priority user process that we *DO* in fact want to give
cpu to.

The solution is elegant.  First, do not force a LWKT reschedule for the
above case.  Second, force a LWKT reschedule on every hard clock.  Remove
all the old hacks.  That's it!

The result is that the current thread is allowed to return to user
mode and run until the next hard clock even if other LWKT threads (running
on behalf of a user process) are runnable.  Pure kernel LWKT threads still
get absolute priority, of course.  When the hard clock occurs the other LWKT
threads get the cpu and at the end of that whole mess most of those
LWKT threads will be trying to return to user mode and the user scheduler
will be able to select the best one.  Doing this on a hardclock boundary
prevents cavitation from occurring at the syscall enter and return boundary.

With this change the TDF_NORESCHED and PNORESCHED flags and their associated
code hacks have also been removed, along with lwkt_checkpri_self() which
is no longer needed.

Revision 1.94: download - view: text, markup, annotated - select for diffs
Tue Jul 1 02:02:55 2008 UTC (6 years, 3 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_2_0_Slip, DragonFly_RELEASE_2_0, DragonFly_Preview
Diff to: previous 1.93: preferred, unified
Changes since revision 1.93: +1 -1 lines
Fix numerous pageout daemon -> buffer cache deadlocks in the main system.
These issues usually only occur on systems with small amounts of ram
but it is possible to trigger them on any system.

* Get rid of the IO_NOBWILL hack.  Just have the VN device use IO_DIRECT,
  which will clean out the buffer on completion of the write.

* Add a timeout argument to vm_wait().

* Add a thread->td_flags flag called TDF_SYSTHREAD.  kmalloc()'s made
  from designated threads are allowed to dip into the system reserve
  when allocating pages.  Only the pageout daemon and buf_daemon[_hw] use
  the flag.

* Add a new static procedure, recoverbufpages(), which explicitly tries to
  free buffers and their backing pages on the clean queue.

* Add a new static procedure, bio_page_alloc(), to do all the nasty work
  of allocating a page on behalf of a buffer cache buffer.

  This function will call vm_page_alloc() with VM_ALLOC_SYSTEM to allow
  it to dip into the system reserve.  If the allocation fails this
  function will call recoverbufpages() to try to recycle VM pages
  from clean buffer cache buffers, and will then attempt to reallocate
  using VM_ALLOC_SYSTEM | VM_ALLOC_INTERRUPT to allow it to dip into
  the interrupt reserve as well.

  Warnings will blare on the console.  If the effort still fails we
  sleep for 1/20 of a second and retry.  The idea, though, is that all
  the effort above should keep the allocation from ever failing in the end.

Reported-by: Gergo Szakal <bastyaelvtars@gmail.com>
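
A condensed sketch of the bio_page_alloc() strategy described above; the
control flow follows the commit text, but the argument lists and messages
are illustrative, not the committed code:

    static vm_page_t
    bio_page_alloc(vm_object_t obj, vm_pindex_t pg)
    {
            vm_page_t p;

            for (;;) {
                    /* First attempt may dip into the system reserve. */
                    p = vm_page_alloc(obj, pg, VM_ALLOC_SYSTEM);
                    if (p)
                            return (p);

                    /* Recycle clean buffer cache pages, then retry,
                     * allowing the interrupt reserve to be dipped
                     * into as well. */
                    recoverbufpages();
                    p = vm_page_alloc(obj, pg,
                                      VM_ALLOC_SYSTEM | VM_ALLOC_INTERRUPT);
                    if (p)
                            return (p);

                    /* Still failing: warn and sleep 1/20 of a second. */
                    kprintf("bio_page_alloc: low on memory, retrying\n");
                    vm_wait(hz / 20);
            }
    }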

Revision 1.93: download - view: text, markup, annotated - select for diffs
Mon May 26 17:11:09 2008 UTC (6 years, 4 months ago) by nth
Branches: MAIN
Diff to: previous 1.92: preferred, unified
Changes since revision 1.92: +3 -11 lines
Allocate lwkt threads from the objcache instead of a custom per-cpu cache
backed by the zone allocator.

Reviewed-by: dillon@

Revision 1.90.2.1: download - view: text, markup, annotated - select for diffs
Fri May 9 15:38:32 2008 UTC (6 years, 5 months ago) by dillon
Branches: DragonFly_RELEASE_1_12
Diff to: previous 1.90: preferred, unified; next MAIN 1.91: preferred, unified
Changes since revision 1.90: +1 -0 lines
MFC - Fix a nasty memory corruption issue related to the kernel's use
of the FP registers for large copies.

Revision 1.92: download - view: text, markup, annotated - select for diffs
Fri May 9 06:35:10 2008 UTC (6 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.91: preferred, unified
Changes since revision 1.91: +1 -0 lines
Fix a nasty memory corruption issue which can occur due to the kernel bcopy's
use of the FP unit.  If the destination address faults the NPX code can
lose track of the fact that the kernel was using the FP unit.  When the
fault is resolved the kernel bcopy resumes with corrupted FP registers.

The most common situation where this could occur is with pipes, and generally
only when the system is paging heavily and causing multiple processes to
fault in the kernel FP bcopy code.

Revision 1.91: download - view: text, markup, annotated - select for diffs
Sat Mar 1 06:21:26 2008 UTC (6 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.90: preferred, unified
Changes since revision 1.90: +14 -16 lines
Clean up the token code and implement lwkt_token_is_stale().  Users of
the token code are now able to detect if the token was acquired and released
by someone else while they were blocked.

Submitted-by: Michael Neumann <mneumann@ntecs.de>
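
A usage sketch, assuming the tokref-based token API of this era; the token
name and the revalidation step are invented for illustration:

    lwkt_tokref ilock;

    lwkt_gettoken(&ilock, &my_token);       /* my_token: illustrative */
    /*
     * Calls made while holding the token may block.  If another
     * thread acquired and released the token while we slept, any
     * state we cached before blocking may be stale even though we
     * hold the token again now.
     */
    if (lwkt_token_is_stale(&ilock))
            revalidate_cached_state();      /* hypothetical */
    lwkt_reltoken(&ilock);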

Revision 1.90: download - view: text, markup, annotated - select for diffs
Wed Dec 12 23:49:24 2007 UTC (6 years, 10 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_12_Slip
Branch point for: DragonFly_RELEASE_1_12
Diff to: previous 1.89: preferred, unified
Changes since revision 1.89: +1 -0 lines
Save and restore the FP context in the signal stack frame.

Revision 1.89: download - view: text, markup, annotated - select for diffs
Sun Nov 18 09:53:19 2007 UTC (6 years, 11 months ago) by sephe
Branches: MAIN
Diff to: previous 1.88: preferred, unified
Changes since revision 1.88: +1 -0 lines
Add a new lightweight function to synchronize IPI queues on other CPUs by
broadcasting a NOP IPI to them; this is used to make sure that all
IPIs before the NOP one are processed.

Use this new function to fix a possible race between kfree() and
malloc_uninit():
kfree() may be in transit when malloc_uninit() is running.

Ideas-from: dillon@
Reviewed-by: dillon@

Revision 1.88: download - view: text, markup, annotated - select for diffs
Wed Apr 25 11:45:28 2007 UTC (7 years, 5 months ago) by swildner
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_10_Slip, DragonFly_RELEASE_1_10
Diff to: previous 1.87: preferred, unified
Changes since revision 1.87: +45 -50 lines
style(9) cleanup: Remove parameter names from prototypes.

Submitted-by: Hasso Tepper <hasso@estpak.ee>

Revision 1.87: download - view: text, markup, annotated - select for diffs
Mon Jan 22 19:37:05 2007 UTC (7 years, 9 months ago) by corecode
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_8_Slip, DragonFly_RELEASE_1_8
Diff to: previous 1.86: preferred, unified
Changes since revision 1.86: +1 -1 lines
Pass structs by reference if you expect the callee to modify them.

This fixes kernel boot with gcc41.  The gpfault people were seeing comes from
vm86_bioscall() in init386().  The cause is that the assembler code passes the
struct vm86frame by value, i.e. simply creating it on the stack.  This worked
up to gcc34, but gcc41 now optimizes stores to unused memory locations away,
which is allowed per the standards.  This led to an uninitialized stack frame
which in turn panicked the box.

Oooohh...-please-commit-by: dillon@
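
Schematically (the by-value prototype is shown only for contrast and is
hypothetical; the real frame setup lives in assembly):

    struct vm86frame;

    /*
     * Broken with gcc41: passing the struct by value lets the compiler
     * optimize away stores into a copy it believes is dead, so the
     * callee may see an uninitialized frame.
     */
    int     vm86_bioscall_byval(struct vm86frame vmf);

    /*
     * Correct: pass by reference, so the caller's stores are visible
     * side effects that cannot be optimized away.
     */
    int     vm86_bioscall(struct vm86frame *vmf);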

Revision 1.86: download - view: text, markup, annotated - select for diffs
Sun Jun 4 21:09:50 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_6_Slip, DragonFly_RELEASE_1_6
Diff to: previous 1.85: preferred, unified
Changes since revision 1.85: +0 -39 lines
Remove LWKT reader-writer locks (kern/lwkt_rwlock.c).  Remove lwkt_wait
queues (only RW locks used them).  Convert remaining uses of RW locks to
LOCKMGR locks.

In recent months lockmgr locks have been simplified to the point where we
no longer need a lighter-weight fully blocking lock.  The removal also
simplifies lwkt_schedule() in that it no longer needs a special case to
deal with wait lists.

Revision 1.85: download - view: text, markup, annotated - select for diffs
Thu Jun 1 05:38:46 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.84: preferred, unified
Changes since revision 1.84: +1 -0 lines
gd_tdallq is not protected by the BGL any more, it can only be manipulated
on the current cpu.  Remove the thread when it exits rather than when it is
freed.

Revision 1.84: download - view: text, markup, annotated - select for diffs
Mon May 29 22:57:24 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.83: preferred, unified
Changes since revision 1.83: +1 -0 lines
Fix numerous bugs in the BSD4 scheduler introduced in recent commits.
Primarily, do not try to get a spinlock from a hard interrupt (e.g. IPI)
if spinlocks are already being held by the cpu.

This will probably have to be made an absolute rule - no spinlocks at all
in a hard interrupt / IPI (vs an interrupt thread).

Revision 1.83: download - view: text, markup, annotated - select for diffs
Mon May 29 07:29:15 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.82: preferred, unified
Changes since revision 1.82: +1 -0 lines
Add two KTR (kernel trace) options: KTR_GIANT_CONTENTION and
KTR_SPIN_CONTENTION.  These will cause MP lock contention and spin lock
contention to be KTR-logged.

Revision 1.82: download - view: text, markup, annotated - select for diffs
Mon May 29 03:57:21 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.81: preferred, unified
Changes since revision 1.81: +2 -1 lines
Further isolate the user process scheduler data by moving more variables
from the globaldata structure to the scheduler module(s).

Make the user process scheduler MP safe.  Make the LWKT 'pull thread'
(to a different cpu) feature MP safe.  Streamline the user process
scheduler API.

Do a near complete rewrite of the BSD4 scheduler.  Remote reschedules
(reschedules to other cpus), cpu pickup of queued processes, and locality
of reference handling should make the new BSD4 scheduler a lot more
responsive.

Add a demonstration user process scheduler called 'dummy'
(kern/usched_dummy.c).  Add a kenv variable 'kern.user_scheduler' that
can be set to the desired scheduler on boot (i.e. 'bsd4' or 'dummy').

NOTE: Until more of the system is taken out from under the MP lock,
these changes actually slow things down slightly.  Buildworlds are
about ~2.7% slower.

Revision 1.81: download - view: text, markup, annotated - select for diffs
Sun May 21 20:23:27 2006 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.80: preferred, unified
Changes since revision 1.80: +1 -1 lines
Implement a much faster spinlock.

* Spinlocks can't conflict with FAST interrupts without deadlocking anyway,
  so instead of using a critical section simply do not allow an interrupt
  thread to preempt the current thread if it is holding a spinlock.  This
  cuts spinlock overhead in half.

* Implement shared spinlocks in addition to exclusive spinlocks.  Shared
  spinlocks would be used, e.g. for file descriptor table lookups.

* Cache a shared spinlock by using the spinlock's lock field as a bitfield,
  one bit for each cpu (bit 31 for exclusive locks).  A shared spinlock sets
  its cpu's shared bit and does not bother clearing it on unlock.

  This means that multiple, parallel shared spinlock accessors do NOT incur
  a cache conflict on the spinlock.  ALL parallel shared accessors operate
  at full speed (~10ns vs ~40-100ns in overhead).  90% of the 10ns in
  overhead is due to a necessary MFENCE to interlock against exclusive
  spinlocks on the mutex.  However, this MFENCE only has to play with
  pending cpu-local memory writes so it will always run at near full speed.

* Exclusive spinlocks in the face of previously cached shared spinlocks
  are now slightly more expensive because they have to clear the cached
  shared spinlock bits by checking the globaldata structure for each
  conflicting cpu to see if it is still holding a shared spinlock.  However,
  only the initial (unavoidable) atomic swap involves potential cache
  conflicts.  The shared bit checks involve only memory reads and the
  situation should be self-correcting from a performance standpoint since
  the shared bits then get cleared.

* Add sysctls for basic spinlock performance testing.  Setting
  debug.spin_lock_test issues a test.  Tests #2 and #3 loop
  debug.spin_test_count times.  p.s. these tests will stall the whole
  machine.

	1       Test the indefinite wait code
	2       Time the best-case exclusive lock overhead
	3       Time the best-case shared lock overhead

* TODO: A shared->exclusive spinlock upgrade inline with positive feedback,
  and an exclusive->shared spinlock downgrade inline.
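
A simplified sketch of the cached shared-bit scheme; helper names are
illustrative and the committed code differs in detail:

    #define SPINLOCK_EXCLUSIVE      0x80000000      /* bit 31 */

    static __inline void
    spin_lock_shared_sketch(struct spinlock *spin)
    {
            u_int mybit = 1U << mycpu->gd_cpuid;
            u_int lock;

            for (;;) {
                    lock = spin->lock;
                    if ((lock & SPINLOCK_EXCLUSIVE) == 0 &&
                        atomic_cmpset_int(&spin->lock, lock, lock | mybit))
                            break;
                    /* exclusive holder present; keep spinning */
            }
            cpu_mfence();           /* interlock against exclusive lockers */
    }

    /*
     * A shared unlock leaves our bit set, so repeat shared
     * acquisitions by this cpu cost no cache-line transfer; an
     * exclusive locker clears stale bits by consulting each
     * conflicting cpu's globaldata.
     */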

Revision 1.80: download - view: text, markup, annotated - select for diffs
Sat May 20 02:42:13 2006 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.79: preferred, unified
Changes since revision 1.79: +3 -0 lines
I'm growing tired of having to add #include lines for header files that
the include file(s) I really want depend on.

Go through nearly all major system include files and add appropriately
#ifndef'd #include lines to include all dependent header files.  Kernel
source files now only need to #include the header files they directly
depend on.

So, for example, if I wanted to add a SYSCTL to a kernel source file,
I would only have to #include <sys/sysctl.h> to bring in the support for
it, rather than four or five header files in addition to <sys/sysctl.h>.
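
The pattern, taking <sys/sysctl.h> as an example (its real dependency list
is longer):

    /* In <sys/sysctl.h>: pull in the headers this file depends on,
     * guarded so repeated inclusion stays cheap. */
    #ifndef _SYS_TYPES_H_
    #include <sys/types.h>
    #endif
    #ifndef _SYS_QUEUE_H_
    #include <sys/queue.h>
    #endif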

Revision 1.79: download - view: text, markup, annotated - select for diffs
Fri May 19 18:26:29 2006 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.78: preferred, unified
Changes since revision 1.78: +20 -3 lines
Recent lwkt_token work broke UP builds.  Fix the token code to operate
properly for both UP and SMP builds.  The SMP build uses spinlocks to
control access and also to do the preemption check.  The tokens are
explicitly obtained when a thread is switched in and released when a
thread is (non-preemptively) switched out.  Spinlocks cannot be
used for this purpose on UP because they are coded to a degenerate
case on a UP build.

On a UP build an explicit preemption check is needed, but no spinlock or
per-thread counter is required because the definition of a token is that
it is only 'held' while a thread is actually running or preempted.  So,
by definition, a token can always be obtained and held by a thread on UP
EXCEPT in the case where a preempting thread is trying to obtain a token
held by the preempted thread.

Conditionalize elements in the lwkt_token structure definition to guarantee
that SMP fields cannot be used in UP builds or vice versa.  The lwkt_token
structure is made the same size for both builds.  Also remove some of
the degenerate spinlock functions (spin_trylock() and spin_tryunlock())
for UP builds to force a compile-time error if an attempt is made to use
them.  spin_lock*() and spin_unlock*() are retained as degenerate cases
on UP.

Reported-by: Sascha Wildner <saw@online.de>, walt <wa1ter@myrealbox.com>

Revision 1.78: download - view: text, markup, annotated - select for diffs
Thu May 18 16:25:20 2006 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.77: preferred, unified
Changes since revision 1.77: +12 -32 lines
Replace the LWKT token code's passive management of token ownership with
active management based on Jeff's spin locks (which themselves are an
adaptation of Sun spinlocks, I think).

LWKT tokens still have the same behavior.  That is, even though tokens now
use a spinlock internally, they are still active only while the thread
is running (or preempted).  When a thread non-preemptively switches away
all held tokens are released as before and when a thread
switches back in all held tokens are reacquired.

Use spinlocks instead of tokens to manage access to LWKT RW lock structures.
Use spinlocks instead of tokens to manage LWKT wait lists.

Tokens are designed to fill a niche between spinlocks and lockmgr locks.
Spinlocks are only to be used for short bits of low level code.  Tokens
are designed to be used when broad serialization is desired but when the
caller may be making calls to procedures which might block.  Lockmgr locks
are designed to be used when strict serialization is desired even across
blocking conditions.

It should be noted that token overhead is only slightly greater than
core spinlock overhead.  The only real difference is due to the extra
structural management required to record the token in the thread structure
so it can be released and reacquired.  The overhead of saving and restoring
tokens in a thread switch is very rarely exercised (i.e. only when the
underlying code actually blocks while holding a token).

This patch reduces buildworld -j 8 times by about 5 seconds (1400->1395
seconds on my test box), about 0.3%, but is expected to have a more
pronounced effect as further MP work is accomplished.
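
A sketch of the niche tokens fill, i.e. serialization that survives
blocking; all names here are illustrative:

    lwkt_tokref ilock;
    struct item *item;
    int error;

    lwkt_gettoken(&ilock, &list_token);
    item = list_find(&list, key);           /* serialized by the token */

    /*
     * This call may block.  The token is transparently released while
     * we sleep and reacquired before we resume, unlike a spinlock
     * (which may not block at all) or a lockmgr lock (which stays
     * held across the block).
     */
    error = item_io(item);

    lwkt_reltoken(&ilock);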

Revision 1.77: download - view: text, markup, annotated - select for diffs
Tue Jan 31 19:05:44 2006 UTC (8 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.76: preferred, unified
Changes since revision 1.76: +1 -0 lines
Bring in the parallel route table code and clean up ARP.  The
route table is now replicated across all cpus (ncpus, not ncpus2).
Note that cloned routes are not replicated.

This removes one of the few remaining obstacles to being able
to run the network protocol stacks without the BGL.

Primary-Design-by: Jeffrey Hsu
Work-by: Jeffrey Hsu and Matthew Dillon

Revision 1.76: download - view: text, markup, annotated - select for diffs
Fri Dec 2 22:02:20 2005 UTC (8 years, 10 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_4_Slip, DragonFly_RELEASE_1_4
Diff to: previous 1.75: preferred, unified
Changes since revision 1.75: +4 -1 lines
Fix a process exit/wait race.  The wait*() code was making a faulty test
to determine that the exiting process had completely exited and was no
longer running.  Testing the TDF_RUNNING flag is insufficient because an
exiting process may block at various points after becoming a Zombie, but
before it deschedules itself for the last time.

Add a new flag, TDF_EXITING, which is set just prior to a thread descheduling
itself for the last time.  The reaper then checks that TDF_EXITING is set
and TDF_RUNNING is clear.

Fix a second faulty test in both the exit and the thread cpu migration
code.  If a thread gets preempted, TDF_RUNNING will be temporarily cleared,
so testing TDF_RUNNING is not sufficient by itself.  We must also test
the TDF_PREEMPT_LOCK flag to be sure that it is also clear.

So the grand result is that to really be sure the zombie process has been
completely descheduled and is no longer running or will ever run again,
the TDF_EXITING, TDF_RUNNING, *and* TDF_PREEMPT_LOCK flags must be tested
and all must be clear except for TDF_EXITING.

It should be noted that TDF_RUNNING on the previously scheduled process
is always cleared AFTER we have context-switched into the next scheduled
thread or the idle thread, so seeing a cleared TDF_RUNNING along with the
appropriate state for the other flags does in fact guarantee that the thread
in question is no longer using its stack in any way.

Reported-by: Stefan Krueger <skrueger@meinberlikomm.de>
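
Condensed into code, the final test reads roughly as follows (the flag
names are from the commit; the surrounding reaper code is not shown in
this log):

    /*
     * The zombie is truly gone only when TDF_EXITING is set and both
     * TDF_RUNNING and TDF_PREEMPT_LOCK are clear.
     */
    if ((td->td_flags & (TDF_EXITING | TDF_RUNNING | TDF_PREEMPT_LOCK)) ==
        TDF_EXITING) {
            /* safe to reap: the thread no longer uses its stack */
    }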

Revision 1.75: download - view: text, markup, annotated - select for diffs
Tue Nov 22 08:41:05 2005 UTC (8 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.74: preferred, unified
Changes since revision 1.74: +1 -1 lines
Consolidate the initialization of td_mpcount into lwkt_init_thread().

Fix a bug in kern.trap_mpsafe, the mplock was not being properly released
when operating in vm86 mode (when kern.trap_mpsafe was set to 1).

Revision 1.74: download - view: text, markup, annotated - select for diffs
Mon Nov 21 18:49:27 2005 UTC (8 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.73: preferred, unified
Changes since revision 1.73: +1 -0 lines
Add a thread flag, TDF_MPSAFE, which is used during thread creation to
determine whether the thread should initially be holding the MP lock or not.

Revision 1.73: download - view: text, markup, annotated - select for diffs
Mon Nov 14 18:50:11 2005 UTC (8 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.72: preferred, unified
Changes since revision 1.72: +7 -0 lines
Make tsleep/wakeup() MP SAFE for kernel threads and get us closer to
making it MP SAFE for user processes.  Currently the code is operating
under the rule that access to a thread structure requires cpu locality of
reference, and access to a proc structure requires the Big Giant Lock.  The
two are not mutually exclusive so, for example, tsleep/wakeup on a proc
needs both cpu locality of reference *AND* the BGL.  This was true with the
old tsleep/wakeup and has now been documented.

The new tsleep/wakeup algorithm is quite simple in concept.  Each cpu has its
own ident based hash table and each hash slot has a cpu mask which tells
wakeup() which cpus might have the ident.  A wakeup iterates through all
candidate cpus simply by chaining the IPI message through them until either
all candidate cpus have been serviced, or (with wakeup_one()) the requested
number of threads have been woken up.
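
Conceptually, per cpu (the structure and field names are invented for
illustration):

    /*
     * Each cpu owns a private hash table.  A slot's cpumask records
     * which cpus might have sleepers on idents hashing to that slot,
     * and wakeup() chains an IPI through exactly those cpus.
     */
    struct tslpque {
            TAILQ_HEAD(, thread) tq_queue;          /* this cpu's sleepers */
            cpumask_t            tq_cpumask;        /* candidate cpus */
    };

    struct tslpque gd_tsleep_hash[TSLEEP_HASHSIZE]; /* hypothetical size */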

Other changes made in this patch set:

* The sense of P_INMEM has been reversed.  It is now P_SWAPPEDOUT.  Also,
  P_SWAPPING and P_SWAPINREQ are no longer relevant and have been removed.

* The swapping code has been cleaned up and seriously revamped.  The new
  swapin code staggers swapins to give the VM system a chance to respond
  to new conditions.  Also some lwp-related fixes were made (more
  p_rtprio vs lwp_rtprio confusion).

* As mentioned above, tsleep/wakeup have been rewritten.  The process
  p_stat no longer does crazy transitions from SSLEEP to SSTOP.  There is
  now only SSLEEP, and SSTOP is synthesized from P_SWAPPEDOUT for userland
  consumption.  Additionally, tsleep() with PCATCH will NO LONGER STOP THE
  PROCESS IN THE TSLEEP CALL.  Instead, the actual stop is deferred until
  the process tries to return to userland.  This removes all remaining cases
  where a stopped process can hold a locked kernel resource.

* A P_BREAKTSLEEP flag has been added.  This flag indicates when an event
  occurs that is allowed to break a tsleep with PCATCH.  All the weird
  undocumented setrunnable() rules have been removed and replaced with a
  very simple algorithm based on this flag.

* Since the UAREA is no longer swapped, we no longer faultin() on PHOLD().
  This also incidentally fixes the 'ps' command's tendency to try to swap
  all processes back into memory.

* speedup_syncer() no longer does hackish checks on proc0's tsleep channel
  (td_wchan).

* Userland scheduler acquisition and release has now been tightened up and
  KKASSERT's have been added (one of the bugs Stefan found was related
  to an improper lwkt_schedule() that was found by one of the new assertions).
  We also have added other assertions related to expected conditions.

* A serious race in pmap_release_free_page() has been corrected.  We
  no longer couple the object generation check with a failed
  pmap_release_free_page() call.  Instead the two conditions are checked
  independently.  We no longer loop when pmap_release_free_page() succeeds
  (it is unclear how that could ever have worked properly).

Major testing by: Stefan Krueger <skrueger@meinberlikomm.de>

Revision 1.72: download - view: text, markup, annotated - select for diffs
Tue Nov 8 22:40:00 2005 UTC (8 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.71: preferred, unified
Changes since revision 1.71: +2 -1 lines
Turn around the spinlock code to reduce the chance of programmer error.
Remove spin_lock_crit() and spin_unlock_crit().  Instead make the primary
spinlock API, spin_lock() and spin_unlock(), enter and exit a critical
section.  Add two API functions, spin_lock_quick() and spin_unlock_quick()
which assume the caller is already in a critical section or that the spinlock
will never be used by a preempting thread (hardware interrupt or software
interrupt).
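
The resulting layering, roughly (the inline bodies are schematic; the real
spin_lock_quick() may take additional arguments):

    static __inline void
    spin_lock(struct spinlock *spin)
    {
            crit_enter();                   /* now implied by the API */
            spin_lock_quick(spin);          /* raw lock, no crit section */
    }

    static __inline void
    spin_unlock(struct spinlock *spin)
    {
            spin_unlock_quick(spin);
            crit_exit();
    }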

Revision 1.71: download - view: text, markup, annotated - select for diffs
Tue Oct 25 17:26:58 2005 UTC (8 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.70: preferred, unified
Changes since revision 1.70: +24 -10 lines
Remove the dummy IPI messaging routines for UP builds and properly
conditionalize the use of IPI messages in various core kernel modules.

Change the callback from func(arg, frameptr) to func(arg1, arg2, frameptr),
where the new argument (arg2) is an integer supplied by the originator.

Create wrappers for simpler versions of the callback: func(arg1), and
func(arg1, arg2) (for the moment we presume that GCC will generate code
for the full-sized three-argument callback which is compatible with one
and two-argument function pointers).

This extension to the IPI messaging code is needed to properly implement
MP-safe tsleep/wakeup code.  Although the extra argument is superfluous in
most cases, the overhead of doing an IPI is such that there should be no
noticeable impact on performance.
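
The shape of the change, as a sketch (the typedef names follow the ipifunc
pattern but are shown here illustratively):

    /* The full-sized callback form: func(arg1, arg2, frameptr). */
    typedef void (*ipifunc3_t)(void *arg1, int arg2,
                               struct intrframe *frame);

    /* Simpler forms are wrapped; for now gcc is presumed to generate
     * calls for these that are compatible with the three-argument
     * form. */
    typedef void (*ipifunc1_t)(void *arg1);
    typedef void (*ipifunc2_t)(void *arg1, int arg2);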

Revision 1.70: download - view: text, markup, annotated - select for diffs
Thu Oct 13 00:02:23 2005 UTC (9 years ago) by dillon
Branches: MAIN
Diff to: previous 1.69: preferred, unified
Changes since revision 1.69: +0 -3 lines
Major cleanup of the interrupt registration subsystem.

* Collapse the separate registrations in the kernel interrupt thread and
  i386 layers into a single machine-independent kernel interrupt thread layer
  in kern/kern_intr.c.  Get rid of the i386 layer's 'MUX' code entirely.

* Have the interrupt vector assembly code (icu_vector.s and apic_vector.s)
  call a machine-independent function in the kernel interrupt thread
  layer to figure out how to process an interrupt.

* Move a lot of assembly into the new C interrupt processing function.

* Add support for INTR_MPSAFE.  If a device driver registers an interrupt
  as being MPSAFE, the Big Giant Lock will not be obtained or required.

* Temporarily just schedule the ithread if a FAST interrupt cannot be executed
  due to its serializer being locked.

* Add LWKT serialization support for a non-blocking 'try' function.

* Get rid of ointhand2_t and adjust all old ISA code to use inthand2_t.

* Supply a frame pointer as a pointer rather than embedding it on the stack.

* Allow FAST and SLOW interrupts to be mixed on the same IRQ, though this
  will not necessarily result in optimal operation.

* Remove direct APIC/ICU vector calls from the apic/icu vector assembly code.
  Everything goes through the new routine in kern/kern_intr.c now.

* Add a new flag, INTR_NOPOLL.  Interrupts registered with the flag will
  not be polled by the upcoming emergency general interrupt polling
  sysctl (e.g. ATA cannot be safely polled due to the way ATA register
  access interferes with ATA DMA).

* Remove most of the distinction in the i386 assembly layers between FAST
  and SLOW interrupts (part 1/2).

* Revamp the interrupt name array returned to userland to list multiple
  drivers associated with the same IRQ.

Revision 1.69: download - view: text, markup, annotated - select for diffs
Tue Oct 11 09:59:56 2005 UTC (9 years ago) by corecode
Branches: MAIN
Diff to: previous 1.68: preferred, unified
Changes since revision 1.68: +1 -1 lines
1:1 Userland threading stage 2.8/4:

Switch the userland scheduler to use lwps instead of procs.

Revision 1.68: download - view: text, markup, annotated - select for diffs
Wed Oct 5 21:53:41 2005 UTC (9 years ago) by corecode
Branches: MAIN
Diff to: previous 1.67: preferred, unified
Changes since revision 1.67: +2 -0 lines
Userland 1:1 threading changes step 1/4+:

o Move thread-local members from struct proc into new struct lwp.

o Add a LIST_HEAD(lwp) p_lwps to struct proc.  This links a proc
  with its lwps.

o Add a td_lwp member to struct thread which links a thread to its lwp,
  if it exists.  This won't replace td_proc completely to save indirections.

o For now embed one struct lwp into struct proc and set up preprocessor
  linkage so that semantics don't change for the rest of the kernel.
  Once all consumers are converted to take a struct lwp instead of a struct
  proc, this will go away.

Reviewed-by: dillon, davidxu
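
Schematically, the new linkage (only the members discussed above; the
list-entry name is illustrative):

    struct lwp {
            LIST_ENTRY(lwp)  lwp_entry;     /* link in p_lwps */
            /* ... thread-local members moved from struct proc ... */
    };

    struct proc {
            LIST_HEAD(, lwp) p_lwps;        /* this proc's lwps */
            /* ... */
    };

    struct thread {
            struct lwp      *td_lwp;        /* our lwp, if any */
            struct proc     *td_proc;       /* kept to save indirections */
            /* ... */
    };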

Revision 1.67: download - view: text, markup, annotated - select for diffs
Tue Jul 26 20:53:55 2005 UTC (9 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.66: preferred, unified
Changes since revision 1.66: +7 -0 lines
Add a new kernel compile debugging option, DEBUG_CRIT_SECTIONS.  This fairly
invasive debugging option compiles matching code into the critical section
inlines and reports mismatches at run-time.   It is used to detect
missing/forgotten crit_exit() calls.

Note that because there are a number of places where critical sections are
manipulated outside the procedures that entered them, this code will
generate a number of false hits and should only be used under the direction
of experienced developers.

Note that the thread structure will be extended by this option.

Revision 1.66: download - view: text, markup, annotated - select for diffs
Wed Jul 20 20:21:31 2005 UTC (9 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.65: preferred, unified
Changes since revision 1.65: +1 -0 lines
When a cpu is stopped due to a panic or the debugger, it can be in virtually
any state, including possibly holding a critical section.   IPIQ interrupts
must still be processed while we are in this state (even though we could be
racing IPIQ processing if we were interrupted at just the wrong time).  In
particular, dumping is not likely to work if a panic occurs on a cpu != 0
unless we process the IPIQ on the stopped cpus.  There are simply too many
interactions between cpus.

Interrupt threads are LWKT scheduled entities and will generally still not
work during a panic while dumping.  The dumping code expects this.  However,
call splz() anyway.

We may in the future have to allow certain threads to run while dumping.
For example, to allow dumping over the network.  There are various ways this
can be done, such as by masking gd_runqmask or flagging special threads to
be runnable while in a panicked or dumping state.

Revision 1.65: download - view: text, markup, annotated - select for diffs
Wed Jul 20 04:33:42 2005 UTC (9 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.64: preferred, unified
Changes since revision 1.64: +1 -0 lines
Limit switch-from-interrupt warnings to once per thread to avoid an endless
loop.  Generate a DDB backtrace when it occurs.

Note from Peter's report that it is possible for the idle thread to panic
if e.g. an IPI or FAST interrupt running in the idle thread's context panics.
This can result in highly unexpected operation and needs to be addressed.

Reported-by: Peter Avalos <pavalos@theshell.com>

Revision 1.64: download - view: text, markup, annotated - select for diffs
Thu Jul 7 20:28:26 2005 UTC (9 years, 3 months ago) by hmp
Branches: MAIN
Diff to: previous 1.63: preferred, unified
Changes since revision 1.63: +6 -0 lines
Add counters for recording Token/MPlock contention; this helps in
determining the number of times contention has occurred in the system.

The contention counters have been made 64-bit quantities because they
are situated within a tight loop.

KTR tracepoints have been added to mark the start and stop of a token's
contention.  A new field, tr_flags, was added to struct lwkt_tokref.  Adding
tracepoints in lwkt_chktokens(9) gives us interesting data on MP
machines when it indirectly sends a passive IPI to the remote CPU to
gain ownership of a token.  It would be interesting to see KTR dumps
for a 4-CPU or an 8-CPU system.

Discussed-with: 	Matthew Dillon <dillon@apollo.backplane.com>

Revision 1.63: download - view: text, markup, annotated - select for diffs
Sun Jun 19 22:07:17 2005 UTC (9 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.62: preferred, unified
Changes since revision 1.62: +10 -0 lines
Add more magic numbers for the token code.

Revision 1.62: download - view: text, markup, annotated - select for diffs
Mon Apr 18 01:03:33 2005 UTC (9 years, 6 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Stable
Diff to: previous 1.61: preferred, unified
Changes since revision 1.61: +0 -1 lines
staticize lwkt_reqtoken_remote().

Revision 1.61: download - view: text, markup, annotated - select for diffs
Wed Apr 13 04:00:56 2005 UTC (9 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.60: preferred, unified
Changes since revision 1.60: +2 -1 lines
Optimize lwkt_send_ipiq() - the IPI based inter-cpu messaging routine.

* Add a passive version which does not initiate any actual hardware IPI.
  The message will be handled the next time the target cpu polls the
  queue (on each tick typically).  Adjust the free() path to use this
  version when freeing memory owned by another cpu.

* Add an interlock to avoid reissuing and unnecessarily stalling on
  the hardware IPI if a prior hardware IPI to the target cpu has not
  yet completed processing.

  This feature theoretically means that two cpus can tightly couple a
  large number of pipelined messages with only a single actual IPI being
  sent.

* Reorganize the hysteresis points in the IPIQ FIFOs.

* Change a token livelock warning into a panic if it occurs 10 times in
  a row.

* Add a call to lwkt_process_ipiq() just after the AP startup code enables
  a cpu, to process any messages that might have built up during startup.
  There shouldn't be any, but this may avoid surprises later.

Revision 1.60: download - view: text, markup, annotated - select for diffs
Fri Jan 14 02:20:24 2005 UTC (9 years, 9 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_2_Slip, DragonFly_RELEASE_1_2
Diff to: previous 1.59: preferred, unified
Changes since revision 1.59: +1 -2 lines
Add syscall primitives for generic userland accessible sleep/wakeup
functions.  These functions are capable of sleeping and waking up based on
a generic user VM address.  Programs capable of sharing memory are also
capable of interaction through these functions.

Also regenerate our system calls.

umtx_sleep(ptr, matchvalue, timeout)

    If *(int *)ptr (userland pointer) does not match the matchvalue,
    sleep for timeout microseconds.  Access to the contents of *ptr plus
    entering the sleep is interlocked against calls to umtx_wakeup().
    Various error codes are returned depending on what causes the function
    to return.  Note that the timeout may not exceed 1 second.

umtx_wakeup(ptr, count)

    Wakeup at least count processes waiting on the specified userland
    address.  A count of 0 wakes all waiting processes up.  This function
    interlocks against umtx_sleep().

The typical race case showing resolution between two userland processes is
shown below.  A process releasing a contested mutex may adjust the contents
of the pointer after the kernel has tested *ptr in umtx_sleep(), but this does
not matter because the first process will see that the mutex is set to a
contested state and will call wakeup after changing the contents of the
pointer.  Thus, the kernel itself does not have to execute any
compare-and-exchange operations in order to support userland mutexes.

    PROCESS 1			PROCESS 2		******** RACE#1 ******

    cmp_exg(ptr, FREE, HELD)
	.			cmp_exg(ptr, HELD, CONTESTED)
	.			umtx_sleep(ptr, CONTESTED, 0)
	.			[kernel tests *ptr]     <<<< COMPARE vs
    cmp_exg(CONTESTED, FREE)		.		<<<< CHANGE
	.			tsleep(....)
    umtx_wakeup(ptr, 1)			.
	.				.
	.				.



    PROCESS 1			PROCESS 2		******** RACE#2 ******

    cmp_exg(ptr, FREE, HELD)
				cmp_exg(ptr, HELD, CONTESTED)
				umtx_sleep(ptr, CONTESTED, 0)
    cmp_exg(CONTESTED, FREE)				<<<< CHANGE vs
    umtx_wakeup(ptr, 1)
				[kernel tests *ptr]	<<<< COMPARE
				[MISMATCH, DO NOT TSLEEP]


These functions are very loosely based on Jeff Roberson's umtx work in
FreeBSD.  These functions are greatly simplified relative to that work in
order to provide a more generic mechanism.

This is precursor work for a port of David Xu's 1:1 userland threading
library.
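
Putting the two syscalls together, a minimal userland mutex might be built
as sketched below; the protocol is adapted from the race discussion above,
and the atomic_cmpset_int()/atomic_swap_int() helpers are assumed to be
available to userland:

    #define MTX_FREE        0
    #define MTX_HELD        1
    #define MTX_CONTESTED   2

    void
    mtx_lock(volatile int *mtx)
    {
            /* Fast path: uncontested acquisition. */
            if (atomic_cmpset_int(mtx, MTX_FREE, MTX_HELD))
                    return;
            /*
             * Slow path: advertise contention before sleeping and
             * reacquire directly to CONTESTED so a wakeup is never
             * lost.  umtx_sleep() returns immediately if *mtx no
             * longer matches MTX_CONTESTED (0 timeout, as in the
             * example above).
             */
            while (atomic_swap_int(mtx, MTX_CONTESTED) != MTX_FREE)
                    umtx_sleep(mtx, MTX_CONTESTED, 0);
    }

    void
    mtx_unlock(volatile int *mtx)
    {
            /* Wake a waiter only if someone advertised contention. */
            if (atomic_swap_int(mtx, MTX_FREE) == MTX_CONTESTED)
                    umtx_wakeup(mtx, 1);
    }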

Revision 1.59: download - view: text, markup, annotated - select for diffs
Wed Oct 13 19:51:31 2004 UTC (10 years ago) by dillon
Branches: MAIN
Diff to: previous 1.58: preferred, unified
Changes since revision 1.58: +2 -0 lines
Avoid redefined symbol warning when libcaps uses thread.h with its own
stack specification.

Submitted-by: Eirik Nygaard <eirikn@kerneled.com>

Revision 1.58: download - view: text, markup, annotated - select for diffs
Tue Sep 14 07:41:49 2004 UTC (10 years, 1 month ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Snap29Sep2004
Diff to: previous 1.57: preferred, unified
Changes since revision 1.57: +2 -2 lines
Give the MP fields in the thread structure useful names for UP builds so
programs like 'ps' (where SMP is not defined during compilation) can pick
out the MP info.

Revision 1.57: download - view: text, markup, annotated - select for diffs
Thu Jul 29 08:55:02 2004 UTC (10 years, 2 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Snap13Sep2004
Diff to: previous 1.56: preferred, unified
Changes since revision 1.56: +7 -3 lines
Add a stack-size argument to the LWKT threading code so threads can be
created with different-sized stacks.  Adjust libcaps to match.

This is a prerequisite to adding NDIS support.  NDIS threads need larger
stacks because Microsoft drivers expect larger stacks.

Revision 1.56: download - view: text, markup, annotated - select for diffs
Sat Jul 24 20:21:35 2004 UTC (10 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.55: preferred, unified
Changes since revision 1.55: +2 -0 lines
Update the userland scheduler.  Fix scheduler interactions which were
previously resulting in the wrong process sometimes getting a full 1/10
second slice, which under heavy load resulted in serious glitching.
Introduce a new dynamic 'p_interactive' heuristic and allow it to affect
priority +/- by a few nice levels.

With this patch batch operations such as buildworlds, setiathome should not
interfere with X / interactive operations as much as they did before.

Note that we are talking about the userland scheduler here, not the
LWKT scheduler.  Also note that the userland scheduler needs a complete
rewrite.

Revision 1.55: download - view: text, markup, annotated - select for diffs
Sun Jun 20 22:29:10 2004 UTC (10 years, 4 months ago) by hmp
Branches: MAIN
CVS tags: DragonFly_1_0_REL, DragonFly_1_0_RC1, DragonFly_1_0A_REL
Diff to: previous 1.54: preferred, unified
Changes since revision 1.54: +4 -0 lines
Move the 'p_start' field from struct pstats (Process Statistics) into the
thread structure and call it 'td_start'.  The behavior of vm_fork(9) is
retained, i.e., it still copies the start time from the parent process just
as it did before.

The 'td_start' will later be used by pure threads to indicate their start
time.  It has not been committed in this round because use of the microtime()
function at such a early point in the boot process might be unsafe.

Note: there should be no problem in accessing the td_start field unless
the process is a Zombie.  Due to the way Zombies are reaped, the thread
is decoupled in kern_wait1() but the process remains around for a while,
during which it will not be possible to access the td_start field.  A
little note about this has been added on top of struct proc
in <sys/proc.h> for future reference.

This work was a collaboration of Hiten Pandya <hmp@backplane.com> and
Matthew Dillon <dillon@apollo.backplane.com>

Revision 1.54: download - view: text, markup, annotated - select for diffs
Thu Jun 17 01:30:27 2004 UTC (10 years, 4 months ago) by hmp
Branches: MAIN
Diff to: previous 1.53: preferred, unified
Changes since revision 1.53: +1 -1 lines
Spell 'written' properly.

Revision 1.53: download - view: text, markup, annotated - select for diffs
Thu Jun 10 22:11:36 2004 UTC (10 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.52: preferred, unified
Changes since revision 1.52: +1 -0 lines
Both 'ps' and the loadav calculations got broken by thread sleeps, which
occur without the proc's knowledge, and so ps/loadav thought processes
sitting in e.g. accept() were in a 'R'un state when they were actually
sleeping.

Make ps and the loadav calculator thread-aware.

Revision 1.52: download - view: text, markup, annotated - select for diffs
Fri May 28 08:37:32 2004 UTC (10 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.51: preferred, unified
Changes since revision 1.51: +2 -0 lines
Add lwkt_setcpu_self(), a function which migrates the current thread to
the specified cpu.  This will soon be used by sysctl_kern_proc() to
collect thread information across all available cpus (because it is only
legal to manipulate a thread on the cpu it belongs to).

Yes, you heard that right and, yes, the overhead is nasty... one whole
microsecond per cpu at least, possibly even two.  But who cares for
something like 'ps'?

In-conversation-with: Hiten Pandya <hmp@freebsd.org>

Revision 1.51: download - view: text, markup, annotated - select for diffs
Sat Apr 10 20:55:24 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.50: preferred, unified
Changes since revision 1.50: +2 -2 lines
Do some minor critical path performance improvements in the scheduler
and at the user/system boundary.  Avoid some unnecessary segment prefix ops,
remove some unnecessary memory ops by using more optimal critical
section inlines, and use 32 bit arithmetic instead of 64 bit arithmetic
when calculating system tick overheads in userret().

This saves a whopping 5ns worth of syscall overhead, which just proves
how silly I am sometimes.

Revision 1.50: download - view: text, markup, annotated - select for diffs
Wed Mar 31 20:23:40 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.49: preferred, unified
Changes since revision 1.49: +1 -1 lines
Cleanup libcaps to support recent LWKT changes.  Add TDF_SYSTHREAD back
to sys/thread.h (libcaps needs it).

Revision 1.49: download - view: text, markup, annotated - select for diffs
Tue Mar 30 19:14:13 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.48: preferred, unified
Changes since revision 1.48: +1 -1 lines
Second major scheduler patch.  This corrects interactive issues that were
introduced in the pipe sf_buf patch.

Split need_resched() into need_user_resched() and need_lwkt_resched().
Userland reschedules are requested when a process is scheduled with a higher
priority then the currently running process, and LWKT reschedules are
requested when a thread is scheduled with a higher priority then the
currently running thread.  As before, these are ASTs, LWKTs are not
preemptively switch while running in the kernel.

Exclusively use the resched wanted flags to determine whether to reschedule
or call lwkt_switch() upon return to user mode.  We were previously also
testing the LWKT run queue for higher priority threads, but this was causing
inefficient scheduler interactions when two processes are doing tightly
bound synchronous IPC (e.g. using PIPEs) because in DragonFly the LWKT
priority of a thread is raised when it enters the kernel, and lowered when
it tries to return to userland.  The wakeups occurring in the pipe code
were causing extra quick-flip thread switches.

Introduce a new tsleep() flag which disables the need_lwkt_resched() call
when the sleeping thread is woken up.   This is used by the PIPE code in
the synchronous direct-write PIPE case to avoid the above problem.

Redocument and revamp the ESTCPU code.  The original changes reduced the
interrupt rate from 100Hz (FBsd-4 and FBsd-5) to 20Hz, but did not compensate
for the slower ramp-up time.  This commit introduces a 'virtual' ESTCPU
frequency which compensates without us having to bump up the actual systimer
interrupt rate.

Redo the P_CURPROC methodology, which is used by the userland scheduler
to manage processes running in userland.  Create a globaldata->gd_uschedcp
process pointer which represents the current running-in-userland (or about
to be running in userland) process, and carefully recode acquire_curproc()
to allow this gd_uschedcp designation to be stolen from other threads trying
to return to userland without having to request a reschedule (which would
have to switch back to those threads to release the designation).  This
reduces the number of unnecessary context switches that occur due to
scheduler interactions.  Also note that this specifically solves the case
where there might be several threads running in the kernel which are trying
to return to userland at the same time.  A heuristic check against gd_upri
is used to select the correct thread for scheduling to userland 'most of the
time'.  When the correct thread is not selected, we fall back to the old
behavior of forcing a reschedule.

Add debugging sysctl variables to better track userland scheduler efficiency.

With these changes pipe statistics are further improved.  Though some
scheduling aberrations still exist(1), the previous scheduler had totally
broken interactive processes and this one does not.

	BLKSIZE	BEFORE		NEWPIPE		NOW	    Tests on AMD64
		MBytes/s	MBytes/s	MBytes/s	3200+ FN85MB
							    (64KB L1, 1MB L2)
	256KB	1900		2200		2250
	 64KB	1800		2200		2250
	 32KB	-		-		3300
	 16KB	1650		2500-3000	2600-3200
	  8KB	1400		2300		2000-2400(1)
	  4KB	1300		1400-1500	1500-1700

Revision 1.48: download - view: text, markup, annotated - select for diffs
Sun Mar 14 20:54:02 2004 UTC (10 years, 7 months ago) by hmp
Branches: MAIN
Diff to: previous 1.47: preferred, unified
Changes since revision 1.47: +1 -1 lines
Turn TDF_SYSTHREAD into TDF_RESERVED0100 since the flag is never used
and such a flag is not required.

Discussed with: 	Matt Dillon

Revision 1.47: download - view: text, markup, annotated - select for diffs
Mon Mar 1 06:33:19 2004 UTC (10 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.46: preferred, unified
Changes since revision 1.46: +61 -14 lines
Newtoken commit.  Change the token implementation as follows:  (1) Obtaining
a token no longer enters a critical section.  (2) tokens can be held through
scheduler switches and blocking conditions and are effectively released and
reacquired on resume.  Thus tokens serialize access only while the thread
is actually running.  Serialization is not broken by preemptive interrupts.
That is, interrupt threads which preempt do not release the preempted thread's
tokens.  (3) Unlike spl's, tokens will interlock w/ interrupt threads on
the same or on a different cpu.

The vnode interlock code has been rewritten and the API has changed.  The
mountlist vnode scanning code has been consolidated and all known races have
been fixed.  The vnode interlock is now a pool token.

The code that frees unreferenced vnodes whose last VM page has been freed has
been moved out of the low level vm_page_free() code and moved to the
periodic filesystem syncer code in vfs_msync().

The SMP startup code and the IPI code has been cleaned up considerably.
Certain early token interactions on AP cpus have been moved to the BSP.

The LWKT rwlock API has been cleaned up and turned on.

Major testing by: David Rhodus

Revision 1.46: download - view: text, markup, annotated - select for diffs
Tue Feb 17 19:38:50 2004 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.45: preferred, unified
Changes since revision 1.45: +2 -0 lines
Introduce an MI cpu synchronization API, redo the SMP AP startup code,
and start cleaning up deprecated IPI and clock code.  Add a MMU/TLB page
table invalidation API (pmap_inval.c) which properly synchronizes page
table changes with other cpus in SMP environments.

    * removed (unused) gd_cpu_lockid
    * remove confusing invltlb() and friends, normalize use of cpu_invltlb()
      and smp_invltlb().
    * redo the SMP AP startup code to make the system work better in
      situations where all APs do not startup.
    * add memory barrier API, cpu_mb1() and cpu_mb2().
    * remove (obsolete, no longer used) old IPI hard and stat clock forwarding
      code.
    * add a cpu synchronization API which is capable of handling multiple
      simultaneous requests without deadlocking or livelocking.
    * major changes to the PMAP code to use the new invalidation API.
    * remove (unused) all_procs_ipi() and self_ipi().
    * only use all_but_self_ipi() if it is known that all AP's started up,
      otherwise use a mask.
    * remove (obsolete, no longer used) BETTER_CLOCK code
    * remove (obsolete, no longer used) Xcpucheckstate IPI code

Testing-by: David Rhodus and others

Revision 1.45: download - view: text, markup, annotated - select for diffs
Sun Feb 15 05:15:27 2004 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.44: preferred, unified
Changes since revision 1.44: +7 -3 lines
Cleanup and augment the cpu synchronization API a bit.  Embed the maxcount
in the structure rather than returning it and requiring it to be passed
again, and document the procedures a bit more.

Revision 1.44: download - view: text, markup, annotated - select for diffs
Sun Feb 15 02:14:42 2004 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.43: preferred, unified
Changes since revision 1.43: +23 -2 lines
Split the IPIQ messaging out of lwkt_thread.c and move it to its own file,
lwkt_ipiq.c.

Add a MI synchronous cpu rendezvous API lwkt_cpusync_*().  This API allows the
kernel to synchronize an operation across any number of cpus.  Multiple cpus
can initiate synchronization operations simultaneously without creating a
deadlock.  The API utilizes the IPI messaging core and guarantees that
other synchronization and IPI messaging operations will continue to work
during any given synchronization op.  The API is a spin-blocking API, meaning
that it will not switch threads and can be used by mainline code, interrupts,
and other sensitive code.

This API is intended to replace smp_rendezvous(), Xcpustop, and other
hardwired IPI ops.  It will also be used to fix our TLB shootdown code.

As of this commit the API has not yet been connected to anything and has
been tested only a little.

Revision 1.43: download - view: text, markup, annotated - select for diffs
Sat Feb 14 20:34:33 2004 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.42: preferred, unified
Changes since revision 1.42: +1 -1 lines
Create a new machine type, cpumask_t, to represent a mask of cpus, and
replace earlier uses of __uint32_t for cpu masks with cpumask_t.
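
For example (bsfl() is the i386 lowest-set-bit helper; the iteration helper
is illustrative):

    typedef __uint32_t      cpumask_t;      /* one bit per cpu */

    #define CPUMASK(cpu)    ((cpumask_t)1 << (cpu))

    /* Illustrative: visit every cpu named in a mask. */
    static __inline void
    cpumask_foreach_sketch(cpumask_t mask)
    {
            while (mask) {
                    int cpu = bsfl(mask);   /* lowest set bit */
                    mask &= ~CPUMASK(cpu);
                    /* ... e.g. queue an IPI to cpu ... */
            }
    }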

Revision 1.42: download - view: text, markup, annotated - select for diffs
Thu Feb 12 20:43:57 2004 UTC (10 years, 8 months ago) by drhodus
Branches: MAIN
Diff to: previous 1.41: preferred, unified
Changes since revision 1.41: +4 -3 lines
*	Update function defines to match up with the work from
	this morning so as to fix the kernel build process.

Revision 1.41: download - view: text, markup, annotated - select for diffs
Tue Feb 10 07:34:43 2004 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.40: preferred, unified
Changes since revision 1.40: +2 -2 lines
Use a globaldata_t instead of a cpuid in the lwkt_token structure.  The
LWKT subsystem already uses globaldata_t instead of cpuid for its thread
td_gd reference, and the IPI messaging code will soon be converted to take
a globaldata_t instead of a cpuid as well.  This reduces the number of
memory indirections we have to make to access the per-cpu globaldata space
in various procedures.

Revision 1.40: download - view: text, markup, annotated - select for diffs
Fri Jan 30 05:42:18 2004 UTC (10 years, 8 months ago) by dillon
Branches: MAIN
Diff to: previous 1.39: preferred, unified
Changes since revision 1.39: +15 -1 lines
This commit represents a major revamping of the clock interrupt and timebase
infrastructure in DragonFly.

* Rip out the existing 8254 timer 0 code, and also disable the use of
  Timer 2 (which means that the PC speaker will no longer go beep).  Timer 0
  used to represent a periodic interrupt and a great deal of code was in
  place to attempt to obtain a timebase off of that periodic interrupt.

  Timer 0 is now used in software retriggerable one-shot mode to produce
  variable-delay interrupts.  A new hardware interrupt clock abstraction
  called SYSTIMERS has been introduced which allows threads to register
  periodic or one-shot interrupt/IPI callbacks at approximately 1uS
  granularity.

  Timer 2 is now set in continuous periodic mode with a period of 65536
  and provides the timebase for the system, abstracted to 32 bits.

  All the old platform-integrated hardclock() and statclock() code has
  been rewritten.  The old IPI forwarding code has been #if 0'd out and
  will soon be entirely removed (the systimer abstraction takes care of
  multi-cpu registrations now).  The architecture-specific clkintr() now
  simply calls an entry point into the systimer and provides a Timer 0
  reload and Timer 2 timebase function API.

* On both UP and SMP systems, cpus register systimer interrupts for the Hz
  interrupt, the stat interrupt, and the scheduler round-robin interrupt.
  The abstraction is carefully designed to allow multiple interrupts occurring
  at the same time to be processed in a single hardware interrupt.  While
  we currently use IPIs to distribute requested interrupts from other cpus,
  the intent is to use the abstraction to take advantage of per-cpu timers
  when available (e.g. on the LAPIC) in the future.

  systimer interrupts run OUTSIDE THE MP LOCK.  Entry points may be called
  from the hard interrupt or via an IPI message (IPI messages have always
  run outside the MP lock).

* Rip out timecounters and disable alternative timecounter code for other
  time sources.  This is temporary.  Eventually other time sources, such as
  the TSC, will be reintegrated as independent, parallel-running entities.
  There will be no 'time switching' per se; subsystems will be able to
  select which timebase they wish to use.  It is desirable to reintegrate
  at least the TSC to improve [get]{micro,nano}[up]time() performance.

  WARNING: PPS events may not work properly.  They were not removed, but
  they have not been retested with the new code either.

* Remove spl protection around [get]{micro,nano}[up]time() calls, they are
  now internally protected.

* Use uptime instead of realtime in certain CAM timeout tests.

* Remove struct clockframe.  Use struct intrframe everywhere where clockframe
  used to be used.

* Replace most splstatclock() protections with crit_*() protections, because
  such protections must now also protect against IPI messaging interrupts.

* Add fields to the per-cpu globaldata structure to access timebase related
  information using only a critical section rather than a mutex.  However,
  the 8254 Timer 2 access code still uses spin locks.  More work needs to
  be done here, the 'realtime' correction is still done in a single global
  'struct timespec basetime' structure.

* Remove the CLKINTR_PENDING icu and apic interrupt hacks.

* Augment the IPI Messaging code to make an intrframe available to callbacks.

* Document 8254 timing modes in i386/isa/timerreg.h.  Note that at the
  moment we assume an 8254 instead of an 8253 as we are using TIMER_SWSTROBE
  mode.  This may or may not have to be changed to an 8253 mode.

* Integrate the NTP correction code into the new timebase subsystem.

* Separate boottime from basetime.  Once boottime is believed to be stable
  it is no longer affected by NTP or other time corrections.

CAVEATS:

	* PC speaker no longer works

	* Profiling interrupt rate not increased (it needs work to be
	  made operational on a per-cpu basis rather than system-wide).

	* The native timebase API is function-based, but currently hardwired.

	* There might or might not be issues with 486 systems due to the
	  timer mode I am using.
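
As a rough illustration of the SYSTIMERS registration model described in
this entry, a periodic callback might be installed along the following
lines.  The names and signatures here follow the later sys/systimer.h
interface and are assumptions, not a quote of this revision:

    #include <sys/systimer.h>

    static struct systimer my_timer;

    /*
     * Runs from the timer hard interrupt or an IPI, outside the MP lock,
     * so it may only touch per-cpu or otherwise interlocked data.
     */
    static void
    my_tick(systimer_t info, struct intrframe *frame)
    {
	/* short, bounded work only */
    }

    static void
    my_tick_start(void)
    {
	/* request 100 callbacks per second on the registering cpu */
	systimer_init_periodic(&my_timer, my_tick, NULL, 100);
    }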

Revision 1.39: download - view: text, markup, annotated - select for diffs
Sun Jan 18 12:29:50 2004 UTC (10 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.38: preferred, unified
Changes since revision 1.38: +2 -0 lines
CAPS IPC library stage 1/3: The core CAPS IPC code, providing system calls
to create and connect to named rendezvous points.  The CAPS interface
implements a many-to-1 (client:server) capability and is totally
self-contained.  The messaging is designed to support single and multi-threading,
synchronous or asynchronous (as of this commit: polling and synchronous only).

Message data is 100% opaque and so while the intention is to integrate it into
a userland LWKT messaging subsystem, the actual system calls do not depend
on any LWKT structures.

Since these system calls are experimental and may contain root holes,
they must be enabled via the sysctl kern.caps_enabled.
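
For testing, the interface can be switched on via the sysctl named above:

    sysctl kern.caps_enabled=1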

Revision 1.38: download - view: text, markup, annotated - select for diffs
Sun Dec 7 04:20:38 2003 UTC (10 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.37: preferred, unified
Changes since revision 1.37: +1 -1 lines
Add additional functionality to the upcall support to allow us to wait for
an upcall instead of spinning.

Also fix a bug in the trap code.  %gs faults have to be handled in nested
interrupts because %gs is not saved and restored.  It is also possible that
%fs may have to be handled the same way, but I am not sure yet.

Revision 1.37: download - view: text, markup, annotated - select for diffs
Fri Nov 21 22:46:13 2003 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.36: preferred, unified
Changes since revision 1.36: +28 -10 lines
Do some fairly major include file cleanups to further separate kernelland
from userland.

    * Do not allow userland to include sys/proc.h directly; it must use
      sys/user.h instead.   This is because sys/proc.h has a huge number
      of kernel header file dependencies.

    * Do cleanups and work in lwkt_thread.c and lwkt_msgport.c to allow
      these files to be directly compiled in an upcoming userland thread
      support library.

    * sys/lock.h is inappropriately included by a number of third party
      programs so we can't disallow its inclusion, but do not include
      any kernel structures unless _KERNEL or _KERNEL_STRUCTURES are
      defined.

    * <ufs/ufs/inode.h> is often included by userland to get at the
      on-disk inode structure.  Only include the on-disk components and do
      not include kernel structural components unless _KERNEL or
      _KERNEL_STRUCTURES is defined (see the sketch after this list).

    * Various usr.bin programs include sys/proc.h unnecessarily.

    * The slab allocator has no concept of malloc buckets.  Remove malloc
      buckets structures and VMSTAT support from the system.

    * Make adjustments to sys/thread.h and sys/msgport.h such that the
      upcoming userland thread support library can include these files
      directly rather than copy them.

    * Use low level __int types in sys/globaldata.h, sys/msgport.h,
      sys/slaballoc.h, sys/thread.h, and sys/malloc.h, instead of
      high level sys/types.h types, reducing include dependencies.
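
The guard pattern referred to above looks like this in practice (a generic
sketch, not a quote of any particular header):

    /* public / on-disk / userland-visible declarations go here */

    #if defined(_KERNEL) || defined(_KERNEL_STRUCTURES)
    /* kernel structural definitions, visible to the kernel proper and
     * to userland tools that explicitly opt in (e.g. ps, top, libkvm) */
    #endif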

Revision 1.36: download - view: text, markup, annotated - select for diffs
Mon Nov 3 02:08:36 2003 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.35: preferred, unified
Changes since revision 1.35: +3 -3 lines
Augment the LWKT thread creation APIs to allow a cpu to be specified.  This
will be used by upcoming netisr and interrupt thread work to create protocol
and interrupt threads on specified cpus rather than cpu #0.
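
A sketch of a cpu-targeted creation call.  The argument list follows the
later lwkt_create() prototype and the thread body is hypothetical:

    static void
    my_service_loop(void *arg)	/* hypothetical per-cpu service thread */
    {
	for (;;)
		;		/* a real thread would run a work loop */
    }

    static thread_t my_td;

    static void
    my_service_start(void)
    {
	/* the cpu argument (here 1) places the thread on that cpu */
	lwkt_create(my_service_loop, NULL, &my_td, NULL, 0, 1,
		    "myservice/%d", 1);
    }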

Revision 1.35: download - view: text, markup, annotated - select for diffs
Thu Oct 16 22:26:42 2003 UTC (11 years ago) by dillon
Branches: MAIN
Diff to: previous 1.34: preferred, unified
Changes since revision 1.34: +1 -0 lines
Fix the userland scheduler.  When the scheduler released the P_CURPROC
designation it unconditionally handed it off to the highest priority
process on the userland process queue, ignoring the fact that the 'current'
process might have had a higher priority.  There was also a missing call to
lwkt_maybe_switch() in the resched_wanted() case that could cause interrupt
threads to stall for a long period of time when they could not preempt.

In SMP there are still some issues.  Niced processes work better, but at
the moment the P_CURPROC handoff does not take into account the fact that
the new higher priority process might better be handed off to another cpu
that is running a lower priority process than the current cpu.

Revision 1.34: download - view: text, markup, annotated - select for diffs
Wed Oct 15 23:27:05 2003 UTC (11 years ago) by dillon
Branches: MAIN
Diff to: previous 1.33: preferred, unified
Changes since revision 1.33: +1 -1 lines
Have lwkt_reltoken() return the generation number to facilitate checks
for stolen tokens.  Clean up, optimize, and better document lwkt_gentoken().
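
A sketch of the stolen-token check this enables; the exact calling
conventions here are assumptions based only on the commit text:

    int gen;

    gen = lwkt_gettoken(&my_token);
    /* ... work during which another cpu may steal the token ... */
    if (lwkt_reltoken(&my_token) != gen) {
	/* generation changed: the token was stolen, redo the work */
    }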

Revision 1.33: download - view: text, markup, annotated - select for diffs
Thu Oct 2 22:27:00 2003 UTC (11 years ago) by dillon
Branches: MAIN
Diff to: previous 1.32: preferred, unified
Changes since revision 1.32: +1 -0 lines
Fix a number of interrupt related issues.

* Don't access kernel_map in free(); defer such operations to malloc()

* Fix a slab allocator panic due to mishandling of malloc size slab
  limit checks on machines with small amounts of memory (the slab allocator
  reduces the size of the zone on low-memory machines but did not handle the
  reduced size properly).

* Add thread->td_nest_count to prevent splz recursions from underflowing
  the kernel stack.  This can occur because we drop the critical section
  when calling sched_ithd() in order to allow it to preempt (see the sketch below).

* Properly adjust intr_nesting_level around FAST interrupts

* Adjust the debugging printf() in lockmgr to only complain about blockable
  lock requests from interrupts.
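
The nest-count guard mentioned above, sketched with an assumed depth limit:

    /* bail out of recursive splz() processing before the kernel
     * stack runs out; the limit value here is illustrative */
    if (curthread->td_nest_count >= 2)
	return;
    ++curthread->td_nest_count;
    /* ... process pending interrupts, may preempt ... */
    --curthread->td_nest_count;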

Revision 1.32: download - view: text, markup, annotated - select for diffs
Wed Sep 24 18:37:51 2003 UTC (11 years ago) by dillon
Branches: MAIN
Diff to: previous 1.31: preferred, unified
Changes since revision 1.31: +2 -0 lines
Clean up thread priority and critical section handling during boot.  The
initial kernel threads (e.g. thread0/proc0) had a priority lower than userland!
Default them to the minimum kernel thread priority.

Thread0 was also unnecessarily left in a critical section, which prevented
certain device probes, such as the APIC 8254 timer test code, from working.

Revision 1.31: download - view: text, markup, annotated - select for diffs
Mon Aug 25 19:50:33 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.30: preferred, unified
Changes since revision 1.30: +1 -0 lines
Add the NO_KMEM_MAP kernel configuration option.  This is a temporary option
that will allow developers to test kmem_map removal and also the upcoming
(not this commit) slab allocator.  Currently this option removes kmem_map
and causes the malloc and zalloc subsystems to use kernel_map exclusively.

Change gd_intr_nesting_level.  This variable is now only bumped while we
are in a FAST interrupt or processing an IPIQ message.  This variable is
not bumped while we are in a normal interrupt or software interrupt thread.

Add warning printf()s if malloc() and related functions detect attempts to
use them from within a FAST interrupt or IPIQ.
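
The sort of check being added, as a sketch (gd_intr_nesting_level is from
the text above; mycpu as the per-cpu accessor and the wording of the
message are assumptions):

    /* gd_intr_nesting_level is now only non-zero inside a FAST
     * interrupt or IPIQ processing, so this catches illegal callers */
    if (mycpu->gd_intr_nesting_level)
	printf("warning: malloc() called from FAST interrupt or IPIQ\n");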

Remove references to the no-longer-used zalloci() and zfreei() functions.

Revision 1.30: download - view: text, markup, annotated - select for diffs
Sun Aug 24 22:36:43 2003 UTC (11 years, 1 month ago) by hsu
Branches: MAIN
Diff to: previous 1.29: preferred, unified
Changes since revision 1.29: +1 -2 lines
Fix typos in comments.

Revision 1.29: download - view: text, markup, annotated - select for diffs
Wed Aug 20 07:31:21 2003 UTC (11 years, 2 months ago) by rob
Branches: MAIN
Diff to: previous 1.28: preferred, unified
Changes since revision 1.28: +1 -1 lines
__P() != wanted; begin removal.  In order to preserve white space this needs
to be done by hand, as I accidentally killed a source tree that I had gotten
this far on.  I'm committing this now; LINT and GENERIC both build with
these changes, and there are many more to come.

Revision 1.28: download - view: text, markup, annotated - select for diffs
Fri Jul 25 05:26:52 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.27: preferred, unified
Changes since revision 1.27: +4 -6 lines
Fix a minor bug in lwkt_init_thread() (the thread was being added to the
wrong td_allq).

Remove thread->td_cpu.  thread->td_gd (which points to the globaldata
structure) is sufficient.  Add e_cpuid to eproc to compensate.

Revision 1.27: download - view: text, markup, annotated - select for diffs
Thu Jul 24 23:52:39 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.26: preferred, unified
Changes since revision 1.26: +1 -0 lines
Syscall messaging work 2: Continue with the implementation of sendsys(),
using int 0x81.  This entry point will be responsible for sending system
call messages or waiting for messages / port activity.

With this commit system call messages can be run through 0x81 but at the
moment they will always run synchronously. Here's the core interface
code for IA32:

    static __inline int
    sendsys(void *port, void *msg, int msgsize)
    {
	int error;
	__asm __volatile("int $0x81" : "=a"(error) :
			"a"(port), "c"(msg), "d"(msgsize) : "memory");
	return(error);
    }

Performance versus a direct system call is currently excellent considering
that this is my initial attempt.

		600MHz C3	1.2GHz P3x2 (SMP)

getuid()	1300 ns		 909 ns
getuid_msg()	1700 ns		1077 ns

Revision 1.26: download - view: text, markup, annotated - select for diffs
Tue Jul 22 17:03:34 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.25: preferred, unified
Changes since revision 1.25: +0 -1 lines
DEV messaging stage 2/4: In this stage all DEV commands are now being
funneled through the message port for action by the port's beginmsg function.
CONSOLE and DISK device shims replace the port with their own and then
forward to the original.  FB (Frame Buffer) shims supposedly do the same
thing but I haven't been able to test it.   I don't expect instability
in mainline code but there might be easy-to-fix breakage, and some drivers still need
to be converted.  See primarily: kern/kern_device.c (new dev_*() functions and
inherits cdevsw code from kern/kern_conf.c), sys/device.h, and kern/subr_disk.c
for the high points.

In this stage all DEV messages are still acted upon synchronously in the
context of the caller.  We cannot create a separate handler thread until
the copyin's (primarily in ioctl functions) are made thread-aware.

Note that the messaging shims are going to look rather messy in these early
days but as more subsystems are converted over we will begin to use
pre-initialized messages and message forwarding to avoid having to constantly
rebuild messages prior to use.

Note that DEV itself is a mess owing to its 4.x roots and will be cleaned
up in subsequent passes.  e.g. the way sub-devices inherit the main device's
cdevsw was always a bad hack and it still is, and several functions
(mmap, kqfilter, psize, poll) return results rather than error codes, which
will be fixed since now we have a message to store the result in :-)

Revision 1.25: download - view: text, markup, annotated - select for diffs
Sun Jul 20 01:37:22 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.24: preferred, unified
Changes since revision 1.24: +8 -29 lines
This is the initial implementation of the LWKT messaging infrastructure.
Messages are sent to message ports and typically replied to a message port
embedded in the originating thread's thread structure (td_msgport).
The port functions match up and optimize client synch/asynch requests
versus target synch/asynch responses.

In this initial implementation a port must be owned by a particular thread,
and we use *asynch* IPI messaging to forward queueing and dequeueing operations
to the correct cpu.  Most of the IPI overhead will be absorbed by the fact
that these same IPIs also tend to schedule the threads in question, which
costs nothing when the thread is already on the correct cpu (which it will be).

Message ports have in-context dispatch functions for initiating, aborting,
and replying to a message which can be overridden and will queue by default.

This code compiles but is as yet unreferenced, and almost certainly needs more
work.
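
In later terms the round trip looks roughly like this; the function names
follow the eventual lwkt msgport API and are assumptions measured against
this initial commit:

    struct lwkt_msg msg;
    int error;

    /* client: replies come back to our own thread's td_msgport */
    lwkt_initmsg(&msg, &curthread->td_msgport, 0);
    error = lwkt_domsg(target_port, &msg, 0);	/* synchronous send + wait */

    /* server: finish a message, queueing the reply back to the sender */
    lwkt_replymsg(&msg, 0);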

Revision 1.24: download - view: text, markup, annotated - select for diffs
Sat Jul 12 17:54:36 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.23: preferred, unified
Changes since revision 1.23: +3 -4 lines
Collapse gd_astpending and gd_reqpri together into gd_reqflags.  gd_reqflags
now rolls up requests made pending for doreti.  Clean up a number of scheduling
primitives and note that we do not need to use locked bus cycles on per-cpu
variables.

Note that the awful idelayed hack for certain softints (used only by the TTY
subsystem, BTW) gets slightly broken in this commit because idelayed has become
per-cpu and the clock ints aren't yet distributed.

Revision 1.23: download - view: text, markup, annotated - select for diffs
Fri Jul 11 17:42:11 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.22: preferred, unified
Changes since revision 1.22: +6 -5 lines
MP Implementation 4/4: Final cleanup for this stage.  Deal with a race
that occurs due to not having to hold the MP lock through an lwkt_switch()
where another cpu may pull off a process from the userland scheduler and
schedule its thread before the original cpu has completely switched it out.
Oddly enough, latencies were such that this bug never caused a crash!

Clean up the scheduling code and in particular the switch assembly code, save
and restore eflags (cli/sti state) when switching heavy weight processes
(this is already done for light weight threads), add some counters, and
optimize fork() to (statistically) stay on the current cpu for a short while
to take advantage of locality of cache reference, which greatly improves
fork/exec times.  Note that synchronous pipe operations between two processes
already (statistically) stick to the same cpu (which is what we want).

Revision 1.22: download - view: text, markup, annotated - select for diffs
Fri Jul 11 01:23:24 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.21: preferred, unified
Changes since revision 1.21: +4 -1 lines
MP Implementation 3B/4: Remove Xcpuast and Xforward_irq, replacing them
with IPI messaging functions.  Fix user scheduling issues so user processes
are dependably scheduled on available cpus.

Revision 1.21: download - view: text, markup, annotated - select for diffs
Thu Jul 10 04:47:55 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.20: preferred, unified
Changes since revision 1.20: +9 -3 lines
MP Implementation 3/4: MAJOR progress on SMP, full userland MP is now working!
A number of issues relating to MP lock operation have been fixed, primarily
that we have to read %cr2 before get_mplock() since get_mplock() may switch
away.  Idlethreads can now safely HLT without any performance detriment.
The userland scheduler has been almost completely rewritten and is now
using an extremely flexible abstraction with a lot of room to grow.  pgeflag
has been removed from mapdev (without per-page invalidation it isn't safe
to use PG_G even on UP).  Necessary locked bus cycles have been added for
the pmap->pm_active field in swtch.s.  CR3 has been unoptimized for the
moment (see comment in swtch.s).  Since the switch code runs without the
MP lock we have to adjust pm_active PRIOR to loading %cr3.
Additional sanity checks have been added to the code (see PARANOID_INVLTLB
and ONLY_ONE_USER_CPU in the code), plus many more in kern_switch.c.
A passive release mechanism has been implemented to optimize P_CURPROC/lwkt
priority shifting when going from user->kernel and kernel->user.
Note: preemptive interrupts don't care due to the way preemption works so
no additional complexity there.  Non-locking atomic functions to protect
only against local interrupts have been added.  astpending now uses
non-locking atomic functions to set and clear bits.  private_tss has been
moved to a per-cpu variable.   The LWKT thread module has been considerably
enhanced and cleaned up, including some fixes to handle MPLOCKED vs td_mpcount
races (so eventually we can do MP locking without a pushfl/cli/popfl combo).
stopevent() needs critical section protection, maybe.

Revision 1.20: download - view: text, markup, annotated - select for diffs
Tue Jul 8 06:27:28 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.19: preferred, unified
Changes since revision 1.19: +28 -3 lines
MP Implementation 2/4: Implement a poor-man's IPI messaging subsystem,
get both cpus arbitrating the BGL for interrupts, IPIing foreign
cpu LWKT scheduling requests without crashing, and dealing with the cpl.

The APs are in a slightly less degenerate state now, but hardclock and
statclock distribution is broken, only one user process is being scheduled
at a time, and priorities are all messed up.
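
A sketch of the primitive this introduces; lwkt_send_ipiq() and
globaldata_find() are the names this grew into and are assumptions here:

    static void
    remote_note(void *arg)
    {
	/* runs on the target cpu from its IPI processing */
    }

    static void
    kick_cpu1(void)
    {
	/* queue a function call to cpu 1, interrupting it if needed */
	lwkt_send_ipiq(globaldata_find(1), remote_note, NULL);
    }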

Revision 1.19: download - view: text, markup, annotated - select for diffs
Sun Jul 6 21:23:54 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.18: preferred, unified
Changes since revision 1.18: +15 -3 lines
MP Implementation 1/2: Get the APIC code working again, sweetly integrate the
MP lock into the LWKT scheduler, replace the old simplelock code with
tokens or spin locks as appropriate.  In particular, the vnode interlock
(and most other interlocks) are now tokens.  Also clean up a few curproc/cred
sequences that are no longer needed.

The APs are left in degenerate state with non-IPI interrupts disabled as
additional LWKT work must be done before we can really make use of them,
and FAST interrupts are not managed by the MP lock yet.  The main thing
for this stage was to get the system working with an APIC again.

buildworld tested on UP and 2xCPU/MP (Dell 2550)

Revision 1.18: download - view: text, markup, annotated - select for diffs
Fri Jul 4 00:32:32 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.17: preferred, unified
Changes since revision 1.17: +2 -1 lines
Generic MP rollup work.

Revision 1.17: download - view: text, markup, annotated - select for diffs
Mon Jun 30 23:54:04 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
CVS tags: PRE_MP
Diff to: previous 1.16: preferred, unified
Changes since revision 1.16: +8 -1 lines
Add threads to the process-retrieval sysctls so they show up in top, ps, etc.
Reorder the boot sequence a little to add a TAILQ for all threads.   Add
a td_refs field to prevent a thread from disappearing on us.

Revision 1.16: download - view: text, markup, annotated - select for diffs
Mon Jun 30 19:50:32 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.15: preferred, unified
Changes since revision 1.15: +10 -3 lines
Misc interrupts/LWKT 1/2: threaded interrupts 2: Major work on the
user scheduler, separate it completely from the LWKT scheduler and make
user priorities, including idprio, normal, and rtprio, work properly.
This includes fixing the priority inversion problem that 4.x had.
Also complete the work on interrupt preemption.  There were a few things
I wasn't doing correctly including not protecting the initial call
to cpu_heavy_restore when a process is just starting up.  Enhance DDB a
bit (threads don't show up in PS yet).

This is a major milestone.

Revision 1.15: download - view: text, markup, annotated - select for diffs
Sun Jun 29 07:37:07 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.14: preferred, unified
Changes since revision 1.14: +8 -2 lines
Misc interrupts/LWKT 1/2: interlock the idle thread.  Put execution of
fast interrupts inside a critical section.  Make the hardclock and statclock
INTR_FAST.  Implement the strict priority queue mechanism for LWKTs.
Implement prioritized preemption for interrupt and softint preemption.
Keep better stats.

Note: this commit hacks up the userland scheduler, in particular the
notion of 'curproc' because threaded interrupts really mess up the userland
scheduler's idea of curproc, which it uses to assume that the process is not
on a run queue even though it is runnable.  The next step will be to
separate out and cleanup the userland scheduler.

Revision 1.14: download - view: text, markup, annotated - select for diffs
Sun Jun 29 05:29:31 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.13: preferred, unified
Changes since revision 1.13: +2 -2 lines
Implement interrupt thread preemption + minor cleanup.

Revision 1.13: download - view: text, markup, annotated - select for diffs
Sun Jun 29 03:28:46 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.12: preferred, unified
Changes since revision 1.12: +4 -3 lines
threaded interrupts 1: Rewrite the ICU interrupt code, splz, and doreti code.
The APIC code hasn't been done yet.   Consolidate many interrupt thread
related functions into MI code, especially software interrupts.  All normal
interrupts and software interrupts are now threaded, and I'm almost ready
to deal with interrupt-thread-only preemption.  At the moment I run
interrupt threads in a critical section and probably will continue to do
so until I can make them MP safe.

Revision 1.12: download - view: text, markup, annotated - select for diffs
Sat Jun 28 04:16:05 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.11: preferred, unified
Changes since revision 1.11: +5 -3 lines
smp/up collapse stage 2 of 2:  clean up the globaldata structure, clean up
and separate machine-dependent portions of thread, proc, and globaldata,
and reduce the need to include lots of MD header files.

Revision 1.11: download - view: text, markup, annotated - select for diffs
Fri Jun 27 03:30:43 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.10: preferred, unified
Changes since revision 1.10: +16 -10 lines
Clean up lwkt threads a bit; change the exit/reap interlock.

Revision 1.10: download - view: text, markup, annotated - select for diffs
Fri Jun 27 01:53:26 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.9: preferred, unified
Changes since revision 1.9: +6 -1 lines
proc->thread stage 6: kernel threads now create processless LWKT threads.
A number of obvious curproc cases were removed, tsleep/wakeup was made to
work with threads (wmesg, ident, and timeout features moved to threads).
There are probably a few curproc cases left to fix.

Revision 1.9: download - view: text, markup, annotated - select for diffs
Wed Jun 25 03:56:10 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.8: preferred, unified
Changes since revision 1.8: +3 -1 lines
proc->thread stage 4: rework the VFS and DEVICE subsystems to take thread
pointers instead of process pointers as arguments, similar to what FreeBSD-5
did.  Note however that ultimately both APIs are going to be message-passing
which means the current thread context will not be useable for creds and
descriptor access.

Revision 1.8: download - view: text, markup, annotated - select for diffs
Mon Jun 23 23:36:14 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.7: preferred, unified
Changes since revision 1.7: +4 -1 lines
proc->thread stage3: make time accounting threads based and rework it for
performance.

Clean up user/sys/interrupt time accounting.  Get rid of the microputime and
equivalent support code in mi_switch() (it was really a bad idea to put that
in the critical path IMHO).  Instead account for time statistically
from the statclock, which produces time accounting that is just as accurate
in the long haul.  Remove the u/s/iticks fields from the proc structure and
put a slightly different version in the thread structure, so time can be
accounted for both threads and processes.

Revision 1.7: download - view: text, markup, annotated - select for diffs
Sun Jun 22 20:32:18 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.6: preferred, unified
Changes since revision 1.6: +2 -1 lines
Optimize lwkt_rwlock.c a bit

Revision 1.6: download - view: text, markup, annotated - select for diffs
Sun Jun 22 04:30:43 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.5: preferred, unified
Changes since revision 1.5: +4 -2 lines
thread stage 10: (note stage 9 was the kern/lwkt_rwlock commit).  Clean up
thread and process creation functions.  Check the spl against ipending in
cpu_lwkt_restore (so the idle loop does not lock up the machine).  Remove
the old VM object kstack allocation and freeing code.  Leave newly created
processes in a stopped state to fix wakeup/fork_handler races.  Normalize
the lwkt_init_*() functions.

Add a sysctl debug.untimely_switch which will cause the last crit_exit()
to yield, which causes a task switch to occur in wakeup() and catches a
lot of 4.x-isms that can be found and fixed on UP.
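
The debug aid can then be toggled at run time:

    sysctl debug.untimely_switch=1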

Revision 1.5: download - view: text, markup, annotated - select for diffs
Sat Jun 21 17:31:22 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.4: preferred, unified
Changes since revision 1.4: +14 -3 lines
Add kern/lwkt_rwlock.c -- reader/writer locks.  Clean up the process exit &
reaping interlock code to allow context switches to occur.  Clean up and
make operational the lwkt_block/signaling code.

Revision 1.4: download - view: text, markup, annotated - select for diffs
Sat Jun 21 07:54:57 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.3: preferred, unified
Changes since revision 1.3: +138 -23 lines
thread stage 8: add crit_enter(), per-thread cpl handling, fix deferred
interrupt handling for critical sections, add some basic passive token code,
and blocking/signaling code.   Add structural definitions for additional
LWKT mechanisms.
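
The usage pattern being introduced, as a minimal sketch:

    crit_enter();
    /* interrupts and preemption on this cpu are deferred here ... */
    crit_exit();	/* the last exit runs splz() to catch up */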

Remove asleep/await.  Add generation number based xsleep/xwakeup.

Note that when exiting the last crit_exit() we run splz() to catch up
on blocked interrupts.  There is also some #if 0'd code that will cause
a thread switch to occur 'at odd times'... primarily wakeup()->
lwkt_schedule()->critical_section->switch.  This will be useful for testing
purposes down the line.

The passive token code is mostly disabled at the moment.  Its primary use
will be under SMP and its primary advantage is very low overhead on UP and,
if used properly, should also have good characteristics under SMP.

Revision 1.3: download - view: text, markup, annotated - select for diffs
Fri Jun 20 02:09:59 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.2: preferred, unified
Changes since revision 1.2: +23 -3 lines
thread stage 7: Implement basic LWKTs, use a straight round-robin model for
the moment.  Also continue consolidating the globaldata structure so both UP
and SMP use it with more commonality.  Temporarily match user processes up
with scheduled LWKTs on a 1:1 basis.  Eventually user processes will have
LWKTs, but they will not all be scheduled 1:1 with the user process's
runnability.

With this commit work can potentially start to fan out, but I'm not ready
to announce yet.

Revision 1.2: download - view: text, markup, annotated - select for diffs
Thu Jun 19 06:26:10 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.1: preferred, unified
Changes since revision 1.1: +4 -3 lines
thread stage 6:  Move thread stack management from the proc structure to
the thread structure, cleanup the pmap_new_*() and pmap_dispose_*()
functions, and disable UPAGES swapping (if we eventually separate the kstack
from the UPAGES we can reenable it).  Also LIFO/4 cache thread structures
which improves fork() performance by 40% (when used in typical fork/exec/exit
or fork/subshell/exit situations).

Revision 1.1: download - view: text, markup, annotated - select for diffs
Wed Jun 18 23:05:12 2003 UTC (11 years, 4 months ago) by dillon
Branches: MAIN
Oops commit the thread.h file.
