DragonFly kernel List (threaded) for 2010-05
Re: VM idle page zeroing
:On a 2.4 GHz P4 with 2GB of RAM - buildkernel took 1048 sec on a kernel
:immediately preceding the idlezero patch; on a kernel with the idlezero
:patch but with it disabled, 1054 sec; with idlezero on, 1051 sec; with
:idlezero and nocache on, 1052 sec. So as to whether it improved
:performance, 'too close to call'.
:In the non-idlezero and idlezero runs, there were ~6.5M zero-fill faults;
:less than 1% were ozfod (found a zero page available) in the former,
:approximately 40% in the latter. At various points during the build in the
:idlezero case, we just ran out of zero pages and it would be some time
:before they were restored.
:Not sure what to make of this.
You can calculate the actual time. Compile and run the mbwtest
program from /usr/src/test/sysperf/mbwtest.c and take the non-cache
bandwidth, then divide the total bytes zeroed (6.5M faults x 4K) by it.
On my test box the non-cache bandwidth is 4672 MB/sec, so:
6.5M x 4K / 4672 MB/sec = 5.6 seconds.
If only 40% of those are pre-zeroed then the difference will only
be about 2.2 seconds (0.4 x 5.6) best case. And that's only if the
execution of the build is completely serialized, barring other issues.
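The estimate above can be written out as a small C sketch. The function names are made up for illustration, MB is taken as 10^6 bytes, and the page size is assumed to be 4096 bytes, so the result lands slightly above the rounded figure in the text (roughly 5.7 seconds total):

```c
#include <assert.h>
#include <stdint.h>

/* Seconds needed to zero `faults` pages of `page_bytes` bytes each at a
 * memory bandwidth of `bw_mb_per_sec` (MB = 10^6 bytes, an assumption). */
double zero_fill_seconds(uint64_t faults, uint64_t page_bytes,
                         double bw_mb_per_sec)
{
    assert(bw_mb_per_sec > 0.0);
    double total_mb = (double)faults * (double)page_bytes / 1e6;
    return total_mb / bw_mb_per_sec;
}

/* Upper bound on the time saved when `hit_fraction` of the zero-fill
 * faults find a pre-zeroed page (fully serialized, nothing else lost). */
double best_case_savings(double total_seconds, double hit_fraction)
{
    assert(hit_fraction >= 0.0 && hit_fraction <= 1.0);
    return total_seconds * hit_fraction;
}
```

Calling zero_fill_seconds(6500000, 4096, 4672.0) and passing the result to best_case_savings() with a hit fraction of 0.40 gives the best-case saving for the buildkernel numbers quoted above. Plug in the non-cache bandwidth mbwtest reports on your own box.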
If pre-zeroing makes a difference anywhere it will be with serialized
programs... programs which do not use parallelism and take a lot of
faults. Shell scripts might be one example. Perhaps a pkgsrc build.
Perhaps application startup (firefox, maybe). Regardless, it would
take a considerable load to see anything noticeable.
Another thing to note is that an inline zfod fault zeros the page
through the cache, meaning the bytes in the page will already be
'hot' on return from the fault. Since the program is about to access
the page anyway a lot of this overhead winds up being useful to the
program as the page will already be in the cpu cache.
When the program uses a pre-zeroed page instead, the contents of the
page will not be 'hot' on return from the fault and the program itself
will have to load the data from ram into the cpu cache. Not only
that, but it actually has to issue memory reads to the ram, whereas in
the inline zeroing case the zeroing operation that heats up the
cache issues only writes.
This eats away at the advantages of pre-zeroing as well.
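A rough user-space analogy of the hot versus cold effect, as a sketch only. This is not the kernel's fault path: the function names and the 4096-byte page size are assumptions, and ordinary memset() stands in for both the inline zeroing and the idle zeroer.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES 4096   /* assumed page size */
#define LINE_BYTES 64     /* assumed cache line size */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Read one byte per cache line of a page.  The volatile pointer keeps
 * the compiler from dropping the loads. */
uint64_t touch_page(const unsigned char *page)
{
    const volatile unsigned char *p = page;
    uint64_t sum = 0;
    for (size_t i = 0; i < PAGE_BYTES; i += LINE_BYTES)
        sum += p[i];
    return sum;
}

/* "Inline" case: each page is zeroed right before it is used, so the
 * reads hit lines the memset just brought into the cache. */
uint64_t run_inline(unsigned char *buf, size_t npages, double *elapsed)
{
    assert(buf != NULL && elapsed != NULL);
    uint64_t sum = 0;
    double t0 = now_sec();
    for (size_t i = 0; i < npages; i++) {
        unsigned char *page = buf + i * PAGE_BYTES;
        memset(page, 0, PAGE_BYTES);
        sum += touch_page(page);
    }
    *elapsed = now_sec() - t0;
    return sum;
}

/* "Pre-zeroed" case: all pages are zeroed up front (untimed), then only
 * the later use is timed.  With a buffer much larger than the cache the
 * early pages have been evicted by then, so the reads go to ram. */
uint64_t run_prezeroed(unsigned char *buf, size_t npages, double *elapsed)
{
    assert(buf != NULL && elapsed != NULL);
    memset(buf, 0, npages * PAGE_BYTES);
    uint64_t sum = 0;
    double t0 = now_sec();
    for (size_t i = 0; i < npages; i++)
        sum += touch_page(buf + i * PAGE_BYTES);
    *elapsed = now_sec() - t0;
    return sum;
}
```

To try it, malloc() a buffer several times larger than the last-level cache, run both functions once untimed so the pages are already faulted in, then compare the two elapsed values. The gap between them is what pre-zeroing buys in this toy setup. If the reasoning above holds, it should come out smaller than the cost of the memset alone, but the numbers will vary a lot from machine to machine.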
I do think there are still going to be cases where pre-zeroing does
in fact help, and it certainly doesn't hurt, so it is probably worth
running it at a low or medium burn rate. You can mess around with the
parameters to remove as much of the downside as possible.
Pre-zeroing might also do much, much better on machines with very
low memory bandwidth, such as (possibly) netbooks.