Up to [DragonFly] / src / sys / i386 / i386
Reorganize the way machine architectures are handled. Consolidate the kernel configurations into a single generic directory. Move machine-specific Makefiles and loader scripts into the appropriate architecture directory. Kernel and module builds also generally add sys/arch to the include path so source files that include architecture-specific headers do not have to be adjusted.

sys/<ARCH> -> sys/arch/<ARCH>
sys/conf/*.<ARCH> -> sys/arch/<ARCH>/conf/*.<ARCH>
sys/<ARCH>/conf/<KERNEL> -> sys/config/<KERNEL>
ICU/APIC cleanup part 9/many. Get rid of machine/smptests.h, remove or implement the related #defines. Distinguish between boot-time vector initialization and interrupt setup and teardown in the MACHINTR ABI. Get rid of the ISR test for APIC-generated interrupts and all related support code. Just generate the EOI and pray. Document more of the IO APIC redirection register(s).

Intel sure screwed up the LAPIC and IO APIC royally. There is no simple way to poll the actual signal level on a pin, no simple way to manually EOI interrupts or EOI them in the order we desire, and no simple way to poll the LAPIC for the vector that will be EOI'd when we send the EOI. We can't mask the interrupt on the IO APIC without triggering stupid legacy code on some machines. We can't even program the IO APIC linearly; it uses a stupid register/data sequence that makes it impossible to access on an SMP system without serialization. It's a goddamn mess, and it is all Intel's fault.
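The register/data sequence complained about above can be sketched in user space. This is a simulation, not DragonFly's actual IO APIC code: the chip exposes an index register (IOREGSEL) and a data window (IOWIN), so every access is a two-step select-then-read/write pair that must be serialized on SMP or two cpus can clobber each other's selected index. The lock here is a stand-in for a kernel spinlock.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t ioregsel;            /* simulated select register     */
static uint32_t iowin[0x40];         /* simulated register file       */
static volatile int ioapic_lock;     /* stand-in for a spinlock       */

static void spin_lock(volatile int *l)   { while (__sync_lock_test_and_set(l, 1)) ; }
static void spin_unlock(volatile int *l) { __sync_lock_release(l); }

static uint32_t
ioapic_read(uint32_t reg)
{
    uint32_t val;

    spin_lock(&ioapic_lock);
    ioregsel = reg;                 /* step 1: select the register   */
    val = iowin[ioregsel];          /* step 2: read through the window */
    spin_unlock(&ioapic_lock);
    return (val);
}

static void
ioapic_write(uint32_t reg, uint32_t val)
{
    spin_lock(&ioapic_lock);
    ioregsel = reg;                 /* same two-step dance for writes */
    iowin[ioregsel] = val;
    spin_unlock(&ioapic_lock);
}
```

Without the lock, a second cpu writing IOREGSEL between the two steps would redirect the first cpu's read to the wrong register, which is exactly why linear access is impossible.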
ICU/APIC cleanup part 2/many. Break the long chain of #include's based at exception.s.
ICU/APIC cleanup part 1/many. Move ICU and APIC support files into their own subdirectory, bump the required config version for the build since this move also requires the use of the new arch/ symlink.
Implement TLS support, tls manual pages, and link the umtx and tls manual pages together. TLS stands for 'thread local storage' and is used to support efficient userland threading and threaded data access models. Three TLS segments are supported in order to (eventually) support GCC3's __thread qualifier. David Xu's thread library only uses one descriptor for now. The system calls implement a mostly machine-independent API which returns architecture-specific results. Rather than pass the actual descriptor structure, which unnecessarily pollutes the userland implementation, we pass a more generic (base,size) and the system call returns the %gs load value for IA32. For AMD64 and other architectures, the returned value will be whatever those architectures require. The current low level assembly support is not as efficient as it could be, but it is good enough for now. The heavy weight switch code for processes does the work. The light weight switch code for pure kernel threads has not been changed (since the kernel doesn't use TLS descriptors we can just ignore them). Based on work by David Xu <email@example.com> and Matthew Dillon <firstname.lastname@example.org>
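The API shape described above can be modeled as follows. This is an illustrative sketch only: the function name, descriptor slot base, and selector math are assumptions, not DragonFly's actual implementation. The point it demonstrates is that userland hands in a generic (base, size) pair and gets back an architecture-specific cookie, which on IA32 is a segment selector suitable for loading into %gs.

```c
#include <assert.h>
#include <stddef.h>

struct tls_info {
    void  *base;        /* base address of the TLS block */
    size_t size;        /* size of the TLS block         */
};

#define NTLS          3 /* three TLS segments, per the commit text    */
#define GDT_TLS_FIRST 8 /* hypothetical first TLS descriptor slot     */

/*
 * Model of the syscall: validate the slot, pretend to install a
 * descriptor, and return the IA32 %gs load value.  A real selector
 * is (descriptor index << 3) | RPL, with RPL 3 for userland.
 */
static int
set_tls_area(int which, const struct tls_info *info)
{
    if (which < 0 || which >= NTLS || info->size == 0)
        return (-1);
    return (((GDT_TLS_FIRST + which) << 3) | 3);
}
```

On another architecture the same (base, size) request would return a different kind of value (e.g. a base register setting on AMD64), which is why the API avoids exposing the descriptor structure itself.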
Remove bogus DIAGNOSTIC code that checked if the process was SZOMB or SRUN during cpu_heavy_restore(). In fact, the process structure may be in virtually any state since a preemption will call the restore function to return to the preempted thread. The process state is really more a function of the userland scheduler and not so much related to the LWKT scheduler.
Update all my personal copyrights to the Dragonfly Standard Copyright.
Another major mmx/xmm/FP commit. This is a combination of several patches, but since the earlier patches didn't actually fix the crashing and corruption issues we were seeing, everything has been rolled into one well-tested commit.

Make the FP more deterministic by requiring that npxthread and the FP state be properly synchronized, and that the FP be in a 'safe' state (meaning that mmx/xmm registers be useable) when npxthread is NULL. Allow the FP save area to be revectored. Kernel entities which use the FP unit, such as the bcopy code, must save the app state if it hasn't already been saved, then revector the save area.

Note that combinations of operations must be protected by a critical section or interrupt disablement. Any clearing or setting of npxthread combined with an fxsave/fnsave/frstor/fxrstor/fninit must be protected as an atomic entity. Since interrupts are threads and can preempt, such preemption will cause a thread switch to occur and thus cause npxthread and the FP state to be manipulated. The kernel can only depend on the FP state being stable for its use after it has revectored the FP save area.

This commit fixes a number of issues, including potential filesystem corruption and kernel crashes.
Correct a bug in the last FPU optimized bcopy commit. The user FPU state was being corrupted by interrupts. Fix the bug by implementing a feature described as missing in the original FreeBSD comments... add a pointer to the FP saved state in the thread structure so routines which 'borrow' the FP unit can simply revector the pointer temporarily to avoid corruption of the original user FP state. The MMX_*_BLOCK macros in bcopy.s have also been simplified somewhat. We can simplify them even more (in the future) by reserving FPU save space in the per-cpu structure instead of on the stack.
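The revectoring trick can be sketched in user space. Structure and field names here are simplified stand-ins, not the kernel's actual layout: the thread carries a pointer to its FP save area, and a routine that borrows the FPU points that pointer at a scratch area for the duration, so an interrupt that saves FP state lands in the scratch area instead of corrupting the user state.

```c
#include <assert.h>
#include <stddef.h>

struct fpu_state { double regs[8]; };

struct thread {
    struct fpu_state *td_savefpu;   /* where FP state gets saved     */
    struct fpu_state  td_userfpu;   /* the real user save area       */
};

/*
 * Borrow the FPU for 'func'.  A critical section is assumed around
 * the whole sequence in the real kernel, since an interrupt thread
 * could otherwise preempt between the pointer swaps.
 */
static void
borrow_fpu(struct thread *td, void (*func)(void))
{
    struct fpu_state  scratch;
    struct fpu_state *saved = td->td_savefpu;

    td->td_savefpu = &scratch;      /* revector: saves now land here */
    if (func)
        func();                     /* body that uses mmx/xmm regs   */
    td->td_savefpu = saved;         /* point back at the user area   */
}
```

The per-cpu save space mentioned at the end of the commit would simply replace the on-stack `scratch` with a field in the per-cpu structure.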
Rewrite the optimized memcpy/bcopy/bzero support subsystem. Rip out the old FreeBSD code almost entirely.

* Add support for stacked ONFAULT routines, allowing copyin and copyout to call the general memcpy entry point instead of rolling their own.
* Split memcpy/bcopy and bzero into their own files.
* Add support for XMM (128 bit) and MMX (64 bit) media instruction copies.
* Rewrite the integer code. Also note that most of the previous integer and FP special case support had been ripped out of DragonFly long ago, in that the assembly was no longer being referenced. It doesn't make sense to have a dozen different zeroing/copying routines, so focus on the ones that work well with recent (last ~5 years) cpus.
* Rewrite the FP state handling code. Instead of restoring the FP state, let it hang, which allows userland to make multiple syscalls and/or the system to make multiple bcopy()/memcpy() calls without having to save/restore the FP state on each call. Userland will take a fault when it needs the FP again. Note that FP optimized copies only occur for block sizes >= 2048 bytes, so this is not something that userland, or the kernel, will trip up on every time it tries to do a bcopy().
* LWKT threads need to be able to save the FP state; add the simple conditional and 5 lines of assembly required to do that.

AMD Athlon notes: 64 bit media instructions will get us 90% of the way there. It is possible to squeeze out slightly more memory bandwidth from the 128 bit XMM instructions (SSE2). While it does not exist in this commit, there are two additional features that can be used: prefetching and non-temporal writes. Prefetching is a 3dNOW instruction and can squeeze out significant additional performance if you fetch ~128 bytes ahead of the game, but I believe it is AMD-only. Non-temporal writes can double UNCACHED memory bandwidth, but they have a horrible effect on L1/L2 performance, and you can't mix non-temporal writes with normal writes without completely destroying memory performance (e.g. multiple GB/s -> less than 100 MBytes/sec). Neither prefetching nor non-temporal writes are implemented in this commit.
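The size-based dispatch implied by the >= 2048 byte threshold can be sketched as below. This is a model, not the actual swtch.s/bcopy.s code: the function names and counters are hypothetical, and `memmove` stands in for both the integer and media-instruction paths. The point is that the FP save/revector overhead is only worth paying for large blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FP_COPY_MIN 2048    /* FP-optimized copies only for blocks >= 2048 bytes */

static size_t generic_copies;   /* sketch counters, for illustration */
static size_t media_copies;

static void
bcopy_dispatch(const void *src, void *dst, size_t len)
{
    if (len >= FP_COPY_MIN) {
        /* would save/revector FP state and run the MMX/XMM loop */
        media_copies++;
    } else {
        /* plain integer copy; no FP setup cost */
        generic_copies++;
    }
    memmove(dst, src, len);     /* stand-in for both code paths */
}
```

Small copies (the overwhelmingly common case) never touch the FP unit, so the "let the FP state hang" policy described above only ever kicks in on large transfers.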
Do some major performance tuning of the userland scheduler.

When determining whether to reschedule, use a relative priority comparison against PPQ rather than a queue index comparison, to avoid the edge case where two processes are only a p_priority of 1 apart but fall into different queues. This reduces unnecessary preemptive context switches. Also change the sense of test_resched() and document it. Properly increment p_ru.ru_nivcsw (the involuntary context switch stat counter).

Fix uio_yield(). We have to call lwkt_setpri_self() to cycle our thread to the end of its runq, and we do not need to call acquire_curproc() and release_curproc() after switching.

When returning to userland, lower our priority and call lwkt_maybe_switch() BEFORE acquiring P_CURPROC. Before, we called lwkt_maybe_switch() after we acquired P_CURPROC, which could result in us holding P_CURPROC, switching to another thread which itself returns to usermode at a higher priority, and that thread having to switch back to us to release P_CURPROC and then us back to the other thread again. This reduces the number of unnecessary context switches that occur in certain situations. In particular, this cuts the number of context switches in PIPE situations by 50-75% (1/2 to 2/3).
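The edge case can be shown with a tiny model. The PPQ value here is illustrative (not DragonFly's actual constant): with a queue-index comparison, priorities 3 and 4 straddle a queue boundary and trigger a reschedule even though they differ by only 1; a relative comparison against PPQ does not fire until the difference spans a full queue.

```c
#include <assert.h>

#define PPQ 4   /* hypothetical priorities-per-queue value */

/* Old test: any difference in run queue index forces a resched. */
static int
need_resched_by_queue(int p1, int p2)
{
    return ((p1 / PPQ) != (p2 / PPQ));
}

/* New test: resched only when priorities differ by at least PPQ. */
static int
need_resched_relative(int p1, int p2)
{
    return ((p1 - p2 >= PPQ) || (p2 - p1 >= PPQ));
}
```

Priorities 3 and 4 land in queues 0 and 1 respectively, so the old test reschedules; the relative test sees a difference of 1 < PPQ and stays put, which is where the reduction in preemptive context switches comes from.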
Synchronize a bunch of things from FreeBSD-5 in preparation for the new ACPICA driver support.

* Bring in a lot of new bus and pci DEV_METHODs from FreeBSD-5.
* Split apic.h into apicreg.h and apicio.h.
* Rename INTR_TYPE_FAST -> INTR_FAST and move the #define.
* Rename INTR_TYPE_EXCL -> INTR_EXCL and move the #define.
* Rename some PCIR_ registers and add additional macros from FreeBSD-5.
* Note: new pcib bus call, host_pcib_get_busno(), imported.
* kern/subr_power.c no longer optional.

Other changes:

* machine/smp.h and machine/smptests.h can now be #included unconditionally, and some APIC_IO vs SMP separation has been done as well.
* gd_acpi_id and gd_apic_id added to machine/globaldata.h in prep for new ACPI code.

Despite all the changes, the generated code should be virtually the same. These were mostly additions which the pre-existing code does not (yet) use.
USER_LDT is now required by a number of packages as well as our upcoming user threads support. Make it non-optional. USER_LDT breaks SysV emulated sysarch(... SVR4_SYSARCH_DSCR) support. For now just #if 0 out the support (which is what FreeBSD-5.x does). Submitted-by: Craig Dooley <email@example.com>
Fix a number of mp_lock issues. I had outsmarted myself trying to deal with td->td_mpcount / mp_lock races. The new rule is: you first modify td->td_mpcount, then you deal with mp_lock assuming that an interrupt might have already dealt with it for you, and various other pieces of code deal with the race if an interrupt occurs in the middle of the above two data accesses.
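The ordering rule stated above can be modeled in user space. Structures and the acquire path are deliberately simplified (no atomics, single-threaded): the essential point is that td_mpcount is bumped *before* touching mp_lock, so an interrupt firing between the two operations sees a nonzero count, knows the lock is spoken for, and may even complete the acquisition on the thread's behalf.

```c
#include <assert.h>

struct thread { int td_mpcount; };      /* simplified thread struct */

static int mp_lock = -1;                /* -1 == unowned, else owner cpu */

static void
get_mplock(struct thread *td, int cpuid)
{
    td->td_mpcount++;                   /* rule: modify td_mpcount FIRST */
    /*
     * Then deal with mp_lock, assuming an interrupt may have already
     * dealt with it for us (in which case we already own it).
     */
    if (mp_lock != cpuid)
        mp_lock = cpuid;                /* stand-in for the atomic acquire */
}

static void
rel_mplock(struct thread *td)
{
    if (--td->td_mpcount == 0)          /* release only on the last ref */
        mp_lock = -1;
}
```

Doing it in the other order (lock first, count second) is the race the commit is fixing: an interrupt between the two steps would find td_mpcount == 0 and conclude the lock is free to steal.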
Fix a DIAGNOSTIC check in the heavy-weight switch code. A thread's process is normally in the SRUN state but can also be preempted while moving into the SZOMB (zombie) state. Report-by: "David P. Reese, Jr." <firstname.lastname@example.org>
kernel tree reorganization stage 1: Major cvs repository work (not logged as commits) plus a major reworking of the #include's to accommodate the relocations.

* CVS repository files manually moved. Old directories left intact and empty (temporary).
* Reorganize all filesystems into vfs/, most devices into dev/, and sub-divide devices by function.
* Begin to move device-specific architecture files to the device subdirs rather than throwing them all into, e.g., i386/include.
* Reorganize files related to system busses, placing the related code in a new bus/ directory. Also move cam to bus/cam, though this may not have been the best idea in retrospect.
* Reorganize emulation code and place it in a new emulation/ directory.
* Remove the -I- compiler option in order to allow #include file localization; rename all config-generated X.h files to use_X.h to clean up the conflicts.
* Remove /usr/src/include (or /usr/include) dependencies during the kernel build, beyond what is normally needed to compile helper programs.
* Make config create 'machine' softlinks for architecture-specific directories outside of the standard <arch>/include.
* Bump the config rev.

WARNING! After this commit /usr/include and /usr/src/sys/compile/* should be regenerated from scratch.
Profiling cleanup 1/2: fix crashes (all registers need to be left intact from assembly), and fix a few syntax issues, etc. It isn't ticking away properly yet but at least it isn't crashing. Submitted-by: Kip Macy <email@example.com> Additional-work: dillon
MP Implementation 4/4: Final cleanup for this stage. Deal with a race that occurs due to not having to hold the MP lock through an lwkt_switch(), where another cpu may pull off a process from the userland scheduler and schedule its thread before the original cpu has completely switched it out. Oddly enough, latencies were such that this bug never caused a crash! Cleanup the scheduling code and in particular the switch assembly code, save and restore eflags (cli/sti state) when switching heavy weight processes (this is already done for light weight threads), add some counters, and optimize fork() to (statistically) stay on the current cpu for a short while to take advantage of locality of cache reference, which greatly improves fork/exec times. Note that synchronous pipe operations between two processes already (statistically) stick to the same cpu (which is what we want).
MP Implementation 3/4: MAJOR progress on SMP, full userland MP is now working! A number of issues relating to MP lock operation have been fixed, primarily that we have to read %cr2 before get_mplock() since get_mplock() may switch away. Idlethreads can now safely HLT without any performance detriment. The userland scheduler has been almost completely rewritten and is now using an extremely flexible abstraction with a lot of room to grow.

pgeflag has been removed from mapdev (without per-page invalidation it isn't safe to use PG_G even on UP). Necessary locked bus cycles have been added for the pmap->pm_active field in swtch.s. CR3 has been unoptimized for the moment (see comment in swtch.s). Since the switch code runs without the MP lock we have to adjust pm_active PRIOR to loading %cr3. Additional sanity checks have been added to the code (see PARANOID_INVLTLB and ONLY_ONE_USER_CPU in the code), plus many more in kern_switch.c.

A passive release mechanism has been implemented to optimize P_CURPROC/lwkt priority shifting when going from user->kernel and kernel->user. Note: preemptive interrupts don't care due to the way preemption works, so no additional complexity there. Non-locking atomic functions to protect only against local interrupts have been added; astpending now uses non-locking atomic functions to set and clear bits. private_tss has been moved to a per-cpu variable.

The LWKT thread module has been considerably enhanced and cleaned up, including some fixes to handle MPLOCKED vs td_mpcount races (so eventually we can do MP locking without a pushfl/cli/popfl combo). stopevent() needs critical section protection, maybe.
MP Implementation 2/4: Implement a poor-man's IPI messaging subsystem, get both cpus arbitrating the BGL for interrupts, IPIing foreign cpu LWKT scheduling requests without crashing, and dealing with the cpl. The APs are in a slightly less degenerate state now, but hardclock and statclock distribution is broken, only one user process is being scheduled at a time, and priorities are all messed up.
MP Implementation 1/2: Get the APIC code working again, sweetly integrate the MP lock into the LWKT scheduler, replace the old simplelock code with tokens or spin locks as appropriate. In particular, the vnode interlock (and most other interlocks) are now tokens. Also clean up a few curproc/cred sequences that are no longer needed. The APs are left in degenerate state with non-IPI interrupts disabled as additional LWKT work must be done before we can really make use of them, and FAST interrupts are not managed by the MP lock yet. The main thing for this stage was to get the system working with an APIC again. buildworld tested on UP and 2xCPU/MP (Dell 2550)
fix a bug in the exit td_switch function, curthread was not always being loaded correctly.
Generic MP rollup work.
Remove pre-ELF underscore prefix and asnames macro hacks.
Misc interrupts/LWKT 1/2: threaded interrupts 2: Major work on the user scheduler, separate it completely from the LWKT scheduler and make user priorities, including idprio, normal, and rtprio, work properly. This includes fixing the priority inversion problem that 4.x had. Also complete the work on interrupt preemption. There were a few things I wasn't doing correctly including not protecting the initial call to cpu_heavy_restore when a process is just starting up. Enhance DDB a bit (threads don't show up in PS yet). This is a major milestone.
threaded interrupts 1: Rewrite the ICU interrupt code, splz, and doreti code. The APIC code hasn't been done yet. Consolidate many interrupt thread related functions into MI code, especially software interrupts. All normal interrupts and software interrupts are now threaded, and I'm almost ready to deal with interrupt-thread-only preemption. At the moment I run interrupt threads in a critical section and probably will continue to do so until I can make them MP safe.
smp/up collapse stage 1 of 2: Make UP use the globaldata structure the same way SMP does, and start removing all the bad macros and hacks that existed before.
Cleanup lwkt threads a bit, change the exit/reap interlock.
proc->thread stage 6: kernel threads now create processless LWKT threads. A number of obvious curproc cases were removed, tsleep/wakeup was made to work with threads (wmesg, ident, and timeout features moved to threads). There are probably a few curproc cases left to fix.
minor code optimization.
Finish migrating the cpl into the thread structure.
thread stage 10: (note stage 9 was the kern/lwkt_rwlock commit). Cleanup thread and process creation functions. Check the spl against ipending in cpu_lwkt_restore (so the idle loop does not lockup the machine). Remove the old VM object kstack allocation and freeing code. Leave newly created processes in a stopped state to fix wakeup/fork_handler races. Normalize the lwkt_init_*() functions. Add a sysctl debug.untimely_switch which will cause the last crit_exit() to yield, which causes a task switch to occur in wakeup() and catches a lot of 4.x-isms that can be found and fixed on UP.
Add kern/lwkt_rwlock.c -- reader/writer locks. Clean up the process exit & reaping interlock code to allow context switches to occur. Clean up and make operational the lwkt_block/signaling code.
thread stage 8: add crit_enter(), per-thread cpl handling, fix deferred interrupt handling for critical sections, add some basic passive token code, and blocking/signaling code. Add structural definitions for additional LWKT mechanisms. Remove asleep/await. Add generation number based xsleep/xwakeup. Note that when exiting the last crit_exit() we run splz() to catch up on blocked interrupts. There is also some #if 0'd code that will cause a thread switch to occur 'at odd times'... primarily wakeup()-> lwkt_schedule()->critical_section->switch. This will be useful for testing purposes down the line. The passive token code is mostly disabled at the moment. Its primary use will be under SMP; its primary advantage is very low overhead on UP and, if used properly, it should also have good characteristics under SMP.
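The critical section nesting described above (deferred interrupts caught up on the last crit_exit() via splz()) can be modeled minimally. Field and counter names are simplified stand-ins for the kernel's actual state:

```c
#include <assert.h>

struct thread { int td_critcount; };    /* nesting depth, simplified */

static int deferred_ints;   /* interrupts blocked while in a crit section */
static int serviced_ints;   /* interrupts actually run                    */

static void
splz(void)
{
    /* catch up on everything that was deferred */
    serviced_ints += deferred_ints;
    deferred_ints = 0;
}

static void
crit_enter(struct thread *td)
{
    td->td_critcount++;
}

static void
crit_exit(struct thread *td)
{
    /* only the outermost exit runs splz() */
    if (--td->td_critcount == 0 && deferred_ints)
        splz();
}
```

Nested sections simply bump the count; nothing is serviced until the outermost crit_exit() drops the count to zero, which is exactly the "when exiting the last crit_exit() we run splz()" behavior.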
thread stage 7: Implement basic LWKTs, use a straight round-robin model for the moment. Also continue consolidating the globaldata structure so both UP and SMP use it with more commonality. Temporarily match user processes up with scheduled LWKTs on a 1:1 basis. Eventually user processes will have LWKTs, but they will not all be scheduled 1:1 with the user process's runnability. With this commit work can potentially start to fan out, but I'm not ready to announce yet.
thread stage 4: remove curpcb, use td_pcb reference instead. Move the pcb to the end of the thread stack, and note that a pcb will always exist because a thread context will always exist. Also note that vm86 replaces td_pcb temporarily and we really need to rip that out and instead make a copy on the stack, because assumptions are made in regards to the pcb's location.
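The pcb-at-the-end-of-the-stack layout can be sketched as follows. The stack size and field names are illustrative, not the kernel's exact values: because every thread has a kernel stack, carving the pcb out of the stack's high end guarantees a pcb always exists without any separate allocation.

```c
#include <assert.h>

#define KSTACK_BYTES (4 * 4096)     /* hypothetical kernel stack size */

struct pcb { char regs[64]; };      /* placeholder for saved context  */

struct thread {
    char       *td_kstack;          /* base of the kernel stack       */
    struct pcb *td_pcb;             /* lives at the stack's high end  */
};

static void
thread_init_pcb(struct thread *td)
{
    /* place the pcb at the very top of the stack, growing downward */
    td->td_pcb = (struct pcb *)(td->td_kstack + KSTACK_BYTES) - 1;
}
```

The vm86 caveat in the commit follows directly from this layout: anything that temporarily replaces td_pcb breaks the assumption that the pcb sits at this fixed location relative to the stack.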
thread stage 3: create independent thread structure, unembed from proc.
thread stage 2: convert npxproc to npxthread.
thread stage 1: convert curproc to curthread, embed struct thread in proc.
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids. Most ids have been removed from !lint sections and moved into comment sections.
import from FreeBSD RELENG_4 220.127.116.11