DragonFly kernel List (threaded) for 2004-04
Re: pipe testing and kernel copyin/copyout/bcopy performance
:Matthew Dillon wrote:
:> Just to let people know, in case anyone is wondering why I have been so
:> quiet lately :-)
:> I've been running some major pipe benchmarks to compare various pipe
:> optimizations as part of a paper (FreeBSD's) Alan Cox and I are writing.
:> At the same time I've delved deeply into the AMD64 and have been working
:> on optimizing the kernel bcopy, memcpy, copyin, and copyout to use
:> XMM instructions when possible.
: Didn't you mention to me something about FPU context switch
: overhead? Secondly, wouldn't the XMM based copyin, bcopy etc
: make small transfers slow?
Yah, the code has a check for small copies and just runs an integer
loop. xmm/mmx is only beneficial for larger buffers.
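
To make the small-copy check concrete, here is a rough userland sketch in C. It is not the actual assembly from support.s. Every name in it (sketch_copy, SKETCH_XMM_MIN_BYTES, and so on) is made up for illustration, and the 2048-byte cutoff is an assumption loosely borrowed from the "> 2K" figure mentioned further down. The wide path uses SSE2 intrinsics when the compiler provides them and falls back to memcpy otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical cutoff; the real kernel threshold may well differ. */
#define SKETCH_XMM_MIN_BYTES 2048

/* Records which path the last call took (illustration only). */
enum sketch_path { SKETCH_PATH_INTEGER, SKETCH_PATH_WIDE };
static enum sketch_path sketch_last_path;

/* Plain byte loop: no FP/XMM state is touched. */
static void sketch_copy_integer(unsigned char *dst, const unsigned char *src,
                                size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}

/* 16 bytes per iteration through an XMM register, byte loop for the tail. */
static void sketch_copy_wide(unsigned char *dst, const unsigned char *src,
                             size_t len)
{
#if defined(__SSE2__)
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
    sketch_copy_integer(dst + i, src + i, len - i);
#else
    memcpy(dst, src, len);  /* no SSE2 available: portable fallback */
#endif
}

/* Dispatch on size: small copies never pay for the wide-register path. */
void sketch_copy(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len < SKETCH_XMM_MIN_BYTES) {
        sketch_last_path = SKETCH_PATH_INTEGER;
        sketch_copy_integer(dst, src, len);
    } else {
        sketch_last_path = SKETCH_PATH_WIDE;
        sketch_copy_wide(dst, src, len);
    }
}
```

In userland the compiler manages the XMM registers for you, so this only shows the dispatch. The state-saving cost that makes the cutoff matter exists only on the kernel side.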
That said, I came up with a neat solution that allows the kernel to
avoid the fxsave/fxrstor. Since the kernel is likely to make multiple
copyout() calls to break down larger buffers, and (when dealing with
larger buffers) the userland code is not likely to execute any FP ops
in the core of the read/write loop, I have the kernel's optimized
bcopy/copyin/copyout code save off the FP state from userland and *not*
attempt to restore it. That is, userland will take a fault to restore
its fpstate in that particular situation. This means that multiple
entries into the kernel can be made and/or the kernel can make multiple
bcopy/copyin/copyout calls (w/ buffers > 2K) and use the FP registers
at the cost of only a single fxsave.
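
The bookkeeping behind that can be modelled in a few lines of C. This is only a toy simulation under stated assumptions: struct sketch_thread, its fields, and the function names are hypothetical and are not DragonFly's real structures. The counters stand in for the actual fxsave, fninit, and restore-fault events, and the 2048-byte cutoff mirrors the "> 2K" figure above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread bookkeeping (illustration only). */
struct sketch_thread {
    bool user_fp_live;  /* userland's FP state currently sits in the registers */
    int  fxsave_count;  /* number of times a save was paid for */
    int  fault_count;   /* number of restore faults userland took */
};

#define SKETCH_FP_MIN_BYTES 2048  /* assumed cutoff for using FP registers */

/* Kernel side: run before a large copy touches the FP registers. */
static void sketch_kernel_claim_fpu(struct sketch_thread *td)
{
    if (td->user_fp_live) {
        td->fxsave_count++;        /* stands in for fxsave */
        td->user_fp_live = false;  /* deliberately NOT restored on return */
    }
    /* An fninit would go here; the registers now belong to the kernel. */
}

/* One bcopy/copyin/copyout call; the copy itself is omitted. */
void sketch_kernel_copy(struct sketch_thread *td, size_t len)
{
    assert(td != NULL);
    if (len > SKETCH_FP_MIN_BYTES)
        sketch_kernel_claim_fpu(td);
}

/* Userland side: the first FP op after the kernel took over faults. */
void sketch_user_fp_op(struct sketch_thread *td)
{
    assert(td != NULL);
    if (!td->user_fp_live) {
        td->fault_count++;         /* fault handler restores the saved state */
        td->user_fp_live = true;
    }
}
```

In this model, eight back-to-back 8K copies cost one save in total. Userland pays one fault the next time it touches FP, and only after that does a further large copy need another save.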
I should be able to commit that tomorrow. I basically rewrote nearly
all of i386/i386/support.s, and broke out the zeroing and copying
routines into their own .s files.
This allows the kernel to use the FP registers with basically only
an 'fninit' call, and it would even be possible to avoid that with some
additional logic. Unfortunately, there are a lot of other overheads
involved that, while small, do add up. The minimum buffer size
where kernel use of FP registers begins to make sense is around