Up to [DragonFly] / src / sys / i386 / i386
Request diff between arbitrary revisions
Keyword substitution: kv
Default branch: MAIN
Reorganize the way machine architectures are handled. Consolidate the kernel configurations into a single generic directory. Move machine-specific Makefile's and loader scripts into the appropriate architecture directory. Kernel and module builds also generally add sys/arch to the include path so source files that include architecture-specific headers do not have to be adjusted. sys/<ARCH> -> sys/arch/<ARCH> sys/conf/*.<ARCH> -> sys/arch/<ARCH>/conf/*.<ARCH> sys/<ARCH>/conf/<KERNEL> -> sys/config/<KERNEL>
We shouldn't have to fninit to make the FP unit usable for MMX based copies. fnclex should be sufficient. Reported-by: "Attilio Rao" <attilio@freebsd.org> Info-originally-from: Bruce Evans
Introduce an ultra-simple, non-overlapping, int-aligned bcopy called bcopyi().
Update all my personal copyrights to the Dragonfly Standard Copyright.
Another major mmx/xmm/FP commit. This is a combination of several patches but since the earlier patches didn't actually fix the crashing and corruption issues we were seeing everything has been rolled into one well tested commit. Make the FP more deterministic by requiring that npxthread and the FP state be properly synchronized, and that the FP be in a 'safe' state (meaning that mmx/xmm registers be useable) when npxthread is NULL. Allow the FP save area to be revectored. Kernel entities which use the FP unit, such as the bcopy code, must save the app state if it hasn't already been saved, then revector the save area. Note that combinations of operations must be protected by a critical section or interrupt disablement. Any clearing or setting npxthread combined with an fxsave/fnsave/frstor/fxrstor/fninit must be protected as an atomic entity. Since interrupts are threads and can preempt, such preemption will cause a thread switch to occur and thus cause npxthread and the FP state to be manipulated. The kernel can only depend on the FP state being stable for its use after it has revectored the FP save area. This commit fixes a number of issues, including potential filesystem corruption and kernel crashes.
Add bcopyb() back in for the PCVT driver. bcopyb() is explicitly byte-granular for the few (one?) memory mapped device which cannot always handle 16 or 32 bit ops. Reported-by: David Rhodus
Fix a race in the FP copy code. If we setup our temporary FP save area before we check npxthread it is possible for a one-instruction-window interrupt to come along and save the application FP state to our temporary area and then clear npxthread, causing the application FP state to be thrown away. Also, if there is no app FP state (npxthread is NULL), it is possible once we set npxthread=curthread for an interrupt to come along and save bogus FP state to our temporary save area before we have a chance to fninit (one instruction window since we clts just prior to the fninit), causing the fninit to fault and npxdna to restore the bogus state. Use a critical section to prevent these cases from occuring.
Correct a bug in the last FPU optimized bcopy commit. The user FPU state was being corrupted by interrupts. Fix the bug by implementing a feature described as a missif in the original FreeBSD comments... add a pointer to the FP saved state in the thread structure so routines which 'borrow' the FP unit can simply revector the pointer temporarily to avoid corruption of the original user FP state. The MMX_*_BLOCK macros in bcopy.s have also been simplified somewhat. We can simplify them even more (in the future) by reserving FPU save space in the per-cpu structure instead of on the stack.
Rewrite the optimized memcpy/bcopy/bzero support subsystem. Rip out the old FreeBSD code almost entirely. * Add support for stacked ONFAULT routines, allowing copyin and copyout to call the general memcpy entry point instead of rolling their own. * Split memcpy/bcopy and bzero into their own files * Add support for XMM (128 bit) and MMX (64 bit) media instruction copies * Rewrite the integer code. Also note that most of the previous integer and FP special case support had been ripped out of DragonFly long ago in that the assembly was no longer being referenced. It doesn't make sense to have a dozen different zeroing/copying routines so focus on the ones that work well with recent (last ~5 years) cpus. * Rewrite the FP state handling code. Instead of restoring the FP state let it hang, which allows userland to make multiple syscalls and/or for the system to make multiple bcopy()/memcpy() calls without having to save/restore the FP state on each call. Userland will take a fault when it needs the FP again. Note that FP optimized copies only occur for block sizes >= 2048 bytes, so this is not something that userland, or the kernel, will trip up on every time it tries to do a bcopy(). * LWKT threads need to be able to save the FP state, add the simple conditional and 5 lines of assembly required to do that. AMD Athlon notes: 64 bit media instructions will get us 90% of the way there. It is possible to squeeze out slightly more memory bandwidth from the 128 bit XMM instructions (SSE2). While it does not exist in this commit there are two additional features that can be used: prefetching and non-temporal writes. Prefetching is a 3dNOW instruction and can squeeze out significant additionaL performance if you fetch ~128 bytes ahead of the game, but I believe it is AMD-only. Non-temporal writes can double UNCACHED memory bandwidth, but they have a horrible effect on L1/L2 performance and you can't mix non-temporal writes with normal writes without completely destroying memory performance (e.g. multiple GB/s -> less then 100 MBytes/sec). Neither prefetching nor non-temporal writes are implemented in this commit.