Thu Apr 29 17:24:58 2004 UTC
(9 years, 11 months ago) by dillon
CVS tags: HEAD
Rewrite the optimized memcpy/bcopy/bzero support subsystem. Rip out the
old FreeBSD code almost entirely.
* Add support for stacked ONFAULT routines, allowing copyin and copyout to
call the general memcpy entry point instead of rolling their own.
* Split memcpy/bcopy and bzero into their own files
* Add support for XMM (128 bit) and MMX (64 bit) media instruction copies
* Rewrite the integer code. Also note that most of the previous integer
and FP special case support had been ripped out of DragonFly long ago
in that the assembly was no longer being referenced. It doesn't make
sense to have a dozen different zeroing/copying routines so focus on
the ones that work well with recent (last ~5 years) cpus.
* Rewrite the FP state handling code. Instead of restoring the FP state
let it hang, which allows userland to make multiple syscalls and/or for
the system to make multiple bcopy()/memcpy() calls without having to
save/restore the FP state on each call. Userland will take a fault when
it needs the FP again.
Note that FP optimized copies only occur for block sizes >= 2048 bytes,
so this is not something that userland, or the kernel, will trip up on
every time it tries to do a bcopy().
* LWKT threads need to be able to save the FP state, so add the simple
conditional and 5 lines of assembly required to do that.
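As a rough illustration of the size cutoff and the deferred FP restore described in the list above, the following C sketch models only the decision logic. The names FP_COPY_THRESHOLD, choose_copy_path, sketch_bcopy and fp_state_left_hanging are hypothetical and do not appear in the commit; the real routines are i386 assembly, and memmove() merely stands in for both copy loops.

```c
#include <stddef.h>
#include <string.h>

/* Cutoff quoted in the commit message: FP copies start at 2048 bytes. */
#define FP_COPY_THRESHOLD 2048

enum copy_path { COPY_INTEGER, COPY_MEDIA };

/*
 * Hypothetical flag: set once the media registers have been used by a
 * copy and the previous owner's FP state has not been restored yet.
 */
static int fp_state_left_hanging;

/* Pick the copy loop purely from the block size. */
static enum copy_path
choose_copy_path(size_t len)
{
	return (len >= FP_COPY_THRESHOLD ? COPY_MEDIA : COPY_INTEGER);
}

/* Model of a bcopy() call: note the (src, dst, len) argument order. */
static enum copy_path
sketch_bcopy(const void *src, void *dst, size_t len)
{
	enum copy_path path = choose_copy_path(len);

	if (path == COPY_MEDIA) {
		/*
		 * A real kernel would save the owner's FP state here if it
		 * is not saved already.  The state is then left hanging on
		 * return, so later large copies skip the save/restore.
		 */
		fp_state_left_hanging = 1;
	}
	memmove(dst, src, len);		/* stand-in for the actual copy loop */
	return (path);
}
```

Small copies never touch the flag in this model, which mirrors the note that only blocks of 2048 bytes or more take the FP path.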
AMD Athlon notes: 64 bit media instructions will get us 90% of the way
there. It is possible to squeeze out slightly more memory bandwidth from
the 128 bit XMM instructions (SSE2). While it does not exist in this commit
there are two additional features that can be used: prefetching and
non-temporal writes. Prefetching is a 3DNow! instruction and can squeeze
out significant additional performance if you fetch ~128 bytes ahead of
the game, but I believe it is AMD-only. Non-temporal writes can double
UNCACHED memory bandwidth, but they have a horrible effect on L1/L2
performance and you can't mix non-temporal writes with normal writes without
completely destroying memory performance (e.g. multiple GB/s -> less than
Neither prefetching nor non-temporal writes are implemented in this commit.
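For readers unfamiliar with the "fetch ~128 bytes ahead" idea mentioned in the notes above, here is a minimal C sketch of a copy loop that issues a prefetch hint ahead of the read position. It is not part of this commit. It assumes a GCC or Clang compiler for __builtin_prefetch, a 64-byte chunk size chosen for illustration, and the hypothetical names PREFETCH_AHEAD, COPY_CHUNK and copy_with_prefetch.

```c
#include <stddef.h>
#include <string.h>

#define PREFETCH_AHEAD	128	/* distance suggested in the commit message */
#define COPY_CHUNK	64	/* assumed chunk size, for illustration only */

/*
 * Copy len bytes in fixed-size chunks, hinting the source data that
 * will be needed PREFETCH_AHEAD bytes later.  The hint never changes
 * the result; it can only affect timing.
 */
static void
copy_with_prefetch(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t off = 0;

	while (off < len) {
		size_t left = len - off;
		size_t n = left < COPY_CHUNK ? left : COPY_CHUNK;

		/* Only hint addresses that are still inside the source. */
		if (left > PREFETCH_AHEAD)
			__builtin_prefetch(s + off + PREFETCH_AHEAD, 0, 0);
		memcpy(d + off, s + off, n);
		off += n;
	}
}
```

Whether such a hint helps depends on the CPU and on whether the data is already cached, so any gain would have to be measured.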
/*
 * Copyright (c) 1990 The Regents of the University of California.
 * All rights reserved.
 * LWKT threads Copyright (c) 2003 Matthew Dillon
 *
 * This code is derived from software contributed to Berkeley by
 * William Jolitz.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *	This product includes software developed by the University of
 *	California, Berkeley and its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * $FreeBSD: src/sys/i386/i386/swtch.s,v 18.104.22.168 2003/01/23 03:36:24 ps Exp $
 * $DragonFly: src/sys/i386/i386/swtch.s,v 1.32 2004/04/29 17:24:58 dillon Exp $
 */

#include "use_npx.h"

#include <sys/rtprio.h>

#include <machine/asmacros.h>
#include <machine/ipl.h>

#include <machine/pmap.h>
#include <machine/smptests.h>		/** GRAB_LOPRIO */
#include <machine/apicreg.h>
#include <machine/lock.h>

#include "assym.s"

#if defined(SMP)
#define MPLOCKED	lock ;
#else
#define MPLOCKED
#endif

	.data

	.globl	panic

#if defined(SWTCH_OPTIM_STATS)
	.globl	swtch_optim_stats, tlb_flush_count
swtch_optim_stats:	.long	0		/* number of _swtch_optims */
tlb_flush_count:	.long	0
#endif

	.text

/*
 * cpu_heavy_switch(next_thread)
 *
 * Switch from the current thread to a new thread.  This entry
 * is normally called via the thread->td_switch function, and will
 * only be called when the current thread is a heavy weight process.
 *
 * Some instructions have been reordered to reduce pipeline stalls.
 *
 * YYY disable interrupts once giant is removed.
 */
ENTRY(cpu_heavy_switch)
	/*
	 * Save general regs
	 */
	movl	PCPU(curthread),%ecx
	movl	(%esp),%eax			/* (reorder optimization) */
	movl	TD_PCB(%ecx),%edx		/* EDX = PCB */
	movl	%eax,PCB_EIP(%edx)		/* return PC may be modified */
	movl	%ebx,PCB_EBX(%edx)
	movl	%esp,PCB_ESP(%edx)
	movl	%ebp,PCB_EBP(%edx)
	movl	%esi,PCB_ESI(%edx)
	movl	%edi,PCB_EDI(%edx)
	movl	%gs,PCB_GS(%edx)

	movl	%ecx,%ebx			/* EBX = curthread */
	movl	TD_PROC(%ecx),%ecx
	movl	PCPU(cpuid), %eax
	movl	P_VMSPACE(%ecx), %ecx		/* ECX = vmspace */
	MPLOCKED btrl	%eax, VM_PMAP+PM_ACTIVE(%ecx)

	/*
	 * Push the LWKT switch restore function, which resumes a heavy
	 * weight process.  Note that the LWKT switcher is based on
	 * TD_SP, while the heavy weight process switcher is based on
	 * PCB_ESP.  TD_SP is usually two ints pushed relative to
	 * PCB_ESP.  We push the flags for later restore by cpu_heavy_restore.
	 */
	pushfl
	pushl	$cpu_heavy_restore
	movl	%esp,TD_SP(%ebx)

	/*
	 * Save debug regs if necessary
	 */
	movb	PCB_FLAGS(%edx),%al
	andb	$PCB_DBREGS,%al
	jz	1f				/* no, skip over */
	movl	%dr7,%eax			/* yes, do the save */
	movl	%eax,PCB_DR7(%edx)
	andl	$0x0000fc00, %eax		/* disable all watchpoints */
	movl	%eax,%dr7
	movl	%dr6,%eax
	movl	%eax,PCB_DR6(%edx)
	movl	%dr3,%eax
	movl	%eax,PCB_DR3(%edx)
	movl	%dr2,%eax
	movl	%eax,PCB_DR2(%edx)
	movl	%dr1,%eax
	movl	%eax,PCB_DR1(%edx)
	movl	%dr0,%eax
	movl	%eax,PCB_DR0(%edx)
1:

#if NNPX > 0
	/*
	 * Save the FP state if we have used the FP.  Note that calling
	 * npxsave will NULL out PCPU(npxthread).
	 */
	cmpl	%ebx,PCPU(npxthread)
	jne	1f
	addl	$PCB_SAVEFPU,%edx
	pushl	%edx
	call	npxsave			/* do it in a big C function */
	addl	$4,%esp			/* EAX, ECX, EDX trashed */
1:
#endif	/* NNPX > 0 */

	/*
	 * Switch to the next thread, which was passed as an argument
	 * to cpu_heavy_switch().  Due to the eflags and switch-restore
	 * function we pushed, the argument is at 12(%esp).  Set the current
	 * thread, load the stack pointer, and 'ret' into the switch-restore
	 * function.
	 *
	 * The switch restore function expects the new thread to be in %eax
	 * and the old one to be in %ebx.
	 *
	 * There is a one-instruction window where curthread is the new
	 * thread but %esp still points to the old thread's stack, but
	 * we are protected by a critical section so it is ok.
	 */
	movl	12(%esp),%eax		/* EAX = newtd, EBX = oldtd */
	movl	%eax,PCPU(curthread)
	movl	TD_SP(%eax),%esp
	ret

/*
 * cpu_exit_switch()
 *
 * The switch function is changed to this when a thread is going away
 * for good.  We have to ensure that the MMU state is not cached, and
 * we don't bother saving the existing thread state before switching.
 *
 * At this point we are in a critical section and this cpu owns the
 * thread's token, which serves as an interlock until the switchout is
 * complete.
 */
ENTRY(cpu_exit_switch)
	/*
	 * Get us out of the vmspace
	 */
	movl	IdlePTD,%ecx
	movl	%cr3,%eax
	cmpl	%ecx,%eax
	je	1f
	movl	%ecx,%cr3
1:
	movl	PCPU(curthread),%ebx
	/*
	 * Switch to the next thread.  RET into the restore function, which
	 * expects the new thread in EAX and the old in EBX.
	 *
	 * There is a one-instruction window where curthread is the new
	 * thread but %esp still points to the old thread's stack, but
	 * we are protected by a critical section so it is ok.
	 */
	movl	4(%esp),%eax
	movl	%eax,PCPU(curthread)
	movl	TD_SP(%eax),%esp
	ret

/*
 * cpu_heavy_restore()	(current thread in %eax on entry)
 *
 * Restore the thread after an LWKT switch.  This entry is normally
 * called via the LWKT switch restore function, which was pulled
 * off the thread stack and jumped to.
 *
 * This entry is only called if the thread was previously saved
 * using cpu_heavy_switch() (the heavy weight process thread switcher),
 * or when a new process is initially scheduled.  The first thing we
 * do is clear the TDF_RUNNING bit in the old thread and set it in the
 * new thread.
 *
 * YYY theoretically we do not have to restore everything here, a lot
 * of this junk can wait until we return to usermode.  But for now
 * we restore everything.
 *
 * YYY the PCB crap is really crap, it makes startup a bitch because
 * we can't switch away.
 *
 * YYY note: spl check is done in mi_switch when it splx()'s.
 */
ENTRY(cpu_heavy_restore)
	popfl
	movl	TD_PCB(%eax),%edx		/* EDX = PCB */
	movl	TD_PROC(%eax),%ecx
#ifdef	DIAGNOSTIC
	/*
	 * A heavy weight process will normally be in an SRUN state
	 * but can also be preempted while it is entering a SZOMB
	 * (zombie) state.
	 */
	cmpb	$SRUN,P_STAT(%ecx)
	je	1f
	cmpb	$SZOMB,P_STAT(%ecx)
	jne	badsw2
1:
#endif

#if defined(SWTCH_OPTIM_STATS)
	incl	_swtch_optim_stats
#endif
	/*
	 * Tell the pmap that our cpu is using the VMSPACE now.  We cannot
	 * safely test/reload %cr3 until after we have set the bit in the
	 * pmap (remember, we do not hold the MP lock in the switch code).
	 */
	movl	P_VMSPACE(%ecx), %ecx		/* ECX = vmspace */
	movl	PCPU(cpuid), %esi
	MPLOCKED btsl	%esi, VM_PMAP+PM_ACTIVE(%ecx)

	/*
	 * Restore the MMU address space.  If it is the same as the last
	 * thread we don't have to invalidate the tlb (i.e. reload cr3).
	 * YYY which naturally also means that the PM_ACTIVE bit had better
	 * already have been set before we set it above, check? YYY
	 */
	movl	%cr3,%esi
	movl	PCB_CR3(%edx),%ecx
	cmpl	%esi,%ecx
	je	4f
#if defined(SWTCH_OPTIM_STATS)
	decl	_swtch_optim_stats
	incl	_tlb_flush_count
#endif
	movl	%ecx,%cr3
4:
	/*
	 * Clear TDF_RUNNING flag in old thread only after cleaning up
	 * %cr3.  The target thread is already protected by being TDF_RUNQ
	 * so setting TDF_RUNNING isn't as big a deal.
	 */
	andl	$~TDF_RUNNING,TD_FLAGS(%ebx)
	orl	$TDF_RUNNING,TD_FLAGS(%eax)

	/*
	 * Deal with the PCB extension, restore the private tss
	 */
	movl	PCB_EXT(%edx),%edi	/* check for a PCB extension */
	movl	$1,%ebx			/* maybe mark use of a private tss */
	testl	%edi,%edi
	jnz	2f

	/*
	 * Going back to the common_tss.  We may need to update TSS_ESP0
	 * which sets the top of the supervisor stack when entering from
	 * usermode.  The PCB is at the top of the stack but we need another
	 * 16 bytes to take vm86 into account.
	 */
	leal	-16(%edx),%ebx
	movl	%ebx, PCPU(common_tss) + TSS_ESP0

	cmpl	$0,PCPU(private_tss)	/* don't have to reload if */
	je	3f			/* already using the common TSS */

	subl	%ebx,%ebx		/* unmark use of private tss */

	/*
	 * Get the address of the common TSS descriptor for the ltr.
	 * There is no way to get the address of a segment-accessed variable
	 * so we store a self-referential pointer at the base of the per-cpu
	 * data area and add the appropriate offset.
	 */
	movl	$gd_common_tssd, %edi
	addl	%fs:0, %edi
2:
	/*
	 * Move the correct TSS descriptor into the GDT slot, then reload
	 * ltr.
	 */
	movl	%ebx,PCPU(private_tss)	/* mark/unmark private tss */
	movl	PCPU(tss_gdt), %ebx	/* entry in GDT */
	movl	0(%edi), %eax
	movl	%eax, 0(%ebx)
	movl	4(%edi), %eax
	movl	%eax, 4(%ebx)
	movl	$GPROC0_SEL*8, %esi	/* GSEL(entry, SEL_KPL) */
	ltr	%si

3:
	/*
	 * Restore general registers.
	 */
	movl	PCB_EBX(%edx),%ebx
	movl	PCB_ESP(%edx),%esp
	movl	PCB_EBP(%edx),%ebp
	movl	PCB_ESI(%edx),%esi
	movl	PCB_EDI(%edx),%edi
	movl	PCB_EIP(%edx),%eax
	movl	%eax,(%esp)

	/*
	 * Restore the user LDT if we have one
	 */
	cmpl	$0, PCB_USERLDT(%edx)
	jnz	1f
	movl	_default_ldt,%eax
	cmpl	PCPU(currentldt),%eax
	je	2f
	lldt	_default_ldt
	movl	%eax,PCPU(currentldt)
	jmp	2f
1:	pushl	%edx
	call	set_user_ldt
	popl	%edx
2:
	/*
	 * Restore the %gs segment register, which must be done after
	 * loading the user LDT.  Since user processes can modify the
	 * register via procfs, this may result in a fault which is
	 * detected by checking the fault address against cpu_switch_load_gs
	 * in i386/i386/trap.c
	 */
	.globl	cpu_switch_load_gs
cpu_switch_load_gs:
	movl	PCB_GS(%edx),%gs

	/*
	 * Restore the DEBUG register state if necessary.
	 */
	movb	PCB_FLAGS(%edx),%al
	andb	$PCB_DBREGS,%al
	jz	1f				/* no, skip over */
	movl	PCB_DR6(%edx),%eax		/* yes, do the restore */
	movl	%eax,%dr6
	movl	PCB_DR3(%edx),%eax
	movl	%eax,%dr3
	movl	PCB_DR2(%edx),%eax
	movl	%eax,%dr2
	movl	PCB_DR1(%edx),%eax
	movl	%eax,%dr1
	movl	PCB_DR0(%edx),%eax
	movl	%eax,%dr0
	movl	%dr7,%eax		/* load dr7 so as not to disturb */
	andl	$0x0000fc00,%eax	/*   reserved bits               */
	pushl	%ebx
	movl	PCB_DR7(%edx),%ebx
	andl	$~0x0000fc00,%ebx
	orl	%ebx,%eax
	popl	%ebx
	movl	%eax,%dr7
1:

	ret

badsw2:
	pushl	$sw0_2
	call	panic

sw0_2:	.asciz	"cpu_switch: not SRUN"

/*
 * savectx(pcb)
 * Update pcb, saving current processor state.
 */
ENTRY(savectx)
	/* fetch PCB */
	movl	4(%esp),%ecx

	/* caller's return address - child won't execute this routine */
	movl	(%esp),%eax
	movl	%eax,PCB_EIP(%ecx)

	movl	%cr3,%eax
	movl	%eax,PCB_CR3(%ecx)

	movl	%ebx,PCB_EBX(%ecx)
	movl	%esp,PCB_ESP(%ecx)
	movl	%ebp,PCB_EBP(%ecx)
	movl	%esi,PCB_ESI(%ecx)
	movl	%edi,PCB_EDI(%ecx)
	movl	%gs,PCB_GS(%ecx)

#if NNPX > 0
	/*
	 * If npxthread == NULL, then the npx h/w state is irrelevant and the
	 * state had better already be in the pcb.  This is true for forks
	 * but not for dumps (the old book-keeping with FP flags in the pcb
	 * always lost for dumps because the dump pcb has 0 flags).
	 *
	 * If npxthread != NULL, then we have to save the npx h/w state to
	 * npxthread's pcb and copy it to the requested pcb, or save to the
	 * requested pcb and reload.  Copying is easier because we would
	 * have to handle h/w bugs for reloading.  We used to lose the
	 * parent's npx state for forks by forgetting to reload.
	 */
	movl	PCPU(npxthread),%eax
	testl	%eax,%eax
	je	1f

	pushl	%ecx
	movl	TD_PCB(%eax),%eax
	leal	PCB_SAVEFPU(%eax),%eax
	pushl	%eax
	pushl	%eax
	call	npxsave
	addl	$4,%esp
	popl	%eax
	popl	%ecx

	pushl	$PCB_SAVEFPU_SIZE
	leal	PCB_SAVEFPU(%ecx),%ecx
	pushl	%ecx
	pushl	%eax
	call	bcopy
	addl	$12,%esp
#endif	/* NNPX > 0 */

1:
	ret

/*
 * cpu_idle_restore()	(current thread in %eax on entry) (one-time execution)
 *
 * Don't bother setting up any regs other than %ebp so backtraces
 * don't die.  This restore function is used to bootstrap into the
 * cpu_idle() LWKT only, after that cpu_lwkt_*() will be used for
 * switching.
 *
 * Clear TDF_RUNNING in old thread only after we've cleaned up %cr3.
 *
 * If we are an AP we have to call ap_init() before jumping to
 * cpu_idle().  ap_init() will synchronize with the BP and finish
 * setting up various ncpu-dependent globaldata fields.  This may
 * happen on UP as well as SMP if we happen to be simulating multiple
 * cpus.
 */
ENTRY(cpu_idle_restore)
	/* cli */
	movl	IdlePTD,%ecx
	movl	$0,%ebp
	pushl	$0
	movl	%ecx,%cr3
	andl	$~TDF_RUNNING,TD_FLAGS(%ebx)
	orl	$TDF_RUNNING,TD_FLAGS(%eax)
#ifdef SMP
	cmpl	$0,PCPU(cpuid)
	je	1f
	call	ap_init
1:
#endif
	sti
	jmp	cpu_idle

/*
 * cpu_kthread_restore() (current thread is %eax on entry) (one-time execution)
 *
 * Don't bother setting up any regs other than %ebp so backtraces
 * don't die.  This restore function is used to bootstrap into an
 * LWKT based kernel thread only.  cpu_lwkt_switch() will be used
 * after this.
 *
 * Since all of our context is on the stack we are reentrant and
 * we can release our critical section and enable interrupts early.
 */
ENTRY(cpu_kthread_restore)
	sti
	movl	IdlePTD,%ecx
	movl	TD_PCB(%eax),%edx
	movl	$0,%ebp
	movl	%ecx,%cr3
	andl	$~TDF_RUNNING,TD_FLAGS(%ebx)
	orl	$TDF_RUNNING,TD_FLAGS(%eax)
	subl	$TDPRI_CRIT,TD_PRI(%eax)
	popl	%eax		/* kthread exit function */
	pushl	PCB_EBX(%edx)	/* argument to ESI function */
	pushl	%eax		/* set exit func as return address */
	movl	PCB_ESI(%edx),%eax
	jmp	*%eax

/*
 * cpu_lwkt_switch()
 *
 * Standard LWKT switching function.  Only non-scratch registers are
 * saved and we don't bother with the MMU state or anything else.
 *
 * This function is always called while in a critical section.
 *
 * There is a one-instruction window where curthread is the new
 * thread but %esp still points to the old thread's stack, but
 * we are protected by a critical section so it is ok.
 *
 * YYY BGL, SPL
 */
ENTRY(cpu_lwkt_switch)
	pushl	%ebp	/* note: GDB hacked to locate ebp relative to td_sp */
	pushl	%ebx
	movl	PCPU(curthread),%ebx
	pushl	%esi
	pushl	%edi
	pushfl
	/* warning: adjust movl into %eax below if you change the pushes */

#if NNPX > 0
	/*
	 * Save the FP state if we have used the FP.  Note that calling
	 * npxsave will NULL out PCPU(npxthread).
	 *
	 * We have to deal with the FP state for LWKT threads in case they
	 * happen to get preempted or block while doing an optimized
	 * bzero/bcopy/memcpy.
	 */
	cmpl	%ebx,PCPU(npxthread)
	jne	1f
	movl	TD_PCB(%ebx),%edx		/* EDX = PCB */
	addl	$PCB_SAVEFPU,%edx
	pushl	%edx
	call	npxsave			/* do it in a big C function */
	addl	$4,%esp			/* EAX, ECX, EDX trashed */
1:
#endif	/* NNPX > 0 */

	movl	4+20(%esp),%eax		/* switch to this thread */
	pushl	$cpu_lwkt_restore
	movl	%esp,TD_SP(%ebx)
	movl	%eax,PCPU(curthread)
	movl	TD_SP(%eax),%esp

	/*
	 * eax contains new thread, ebx contains old thread.
	 */
	ret

/*
 * cpu_lwkt_restore()	(current thread in %eax on entry)
 *
 * Standard LWKT restore function.  This function is always called
 * while in a critical section.
 *
 * Warning: due to preemption the restore function can be used to
 * 'return' to the original thread.  Interrupt disablement must be
 * protected through the switch so we cannot run splz here.
 *
 * YYY we theoretically do not need to load IdlePTD into cr3, but if
 * so we need a way to detect when the PTD we are using is being
 * deleted due to a process exiting.
 */
ENTRY(cpu_lwkt_restore)
	movl	IdlePTD,%ecx	/* YYY borrow but beware desched/cpuchg/exit */
	movl	%cr3,%edx
	cmpl	%ecx,%edx
	je	1f
	movl	%ecx,%cr3
1:
	andl	$~TDF_RUNNING,TD_FLAGS(%ebx)
	orl	$TDF_RUNNING,TD_FLAGS(%eax)
	popfl
	popl	%edi
	popl	%esi
	popl	%ebx
	popl	%ebp
	ret
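
The %dr7 handling in cpu_heavy_switch() and cpu_heavy_restore() above is plain bit masking, and a small C model may make it easier to follow. The mask 0x0000fc00 is taken from the listing, whose own comments describe those bits as reserved. The names DR7_KEEP_MASK, dr7_on_switch_out and dr7_on_restore are hypothetical, and the functions operate on ordinary integers rather than the real debug registers.

```c
#include <stdint.h>

/* Bits the listing preserves in the live %dr7 (its "reserved bits"). */
#define DR7_KEEP_MASK	0x0000fc00u

/*
 * Switch-out: after saving %dr7 to the PCB, the listing writes back only
 * the preserved bits, which disables every watchpoint.
 */
static uint32_t
dr7_on_switch_out(uint32_t live_dr7)
{
	return (live_dr7 & DR7_KEEP_MASK);
}

/*
 * Restore: keep the preserved bits from the live register and take all
 * other bits from the value saved in the PCB.
 */
static uint32_t
dr7_on_restore(uint32_t live_dr7, uint32_t saved_dr7)
{
	return ((live_dr7 & DR7_KEEP_MASK) | (saved_dr7 & ~DR7_KEEP_MASK));
}
```

For example, under this model dr7_on_restore(0x0000a400, 0xffff03ff) yields 0xffffa7ff: the 0xfc00 field comes from the live value and everything else from the saved one.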