DragonFly kernel List (threaded) for 2011-07
Re: Blogbench RAID benchmarks
Since DragonFly has high-res timers...
Would it make any sense to use two different metrics to help balance the
loading of the read and write pipelines? In some sense, we want to keep
the full pipeline (read and write) and the disk I/O path utilized, but we
don't want to starve one type of request for the other. We want all I/O
requests to complete in a reasonable amount of time, both on the controller
and while waiting to get onto a disk queue.
Can we use the timer to timestamp requests when they are first constructed,
when they hit the I/O controller queue, and when they complete, to help
balance the I/O load and latency in an acceptable way?
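As a rough sketch of the timestamping idea (userspace C, not actual DragonFly kernel code; `struct io_stamp`, `now_ns()`, `wait_ns()`, `service_ns()` and `is_starving()` are all names made up for illustration, and `clock_gettime()` stands in for whatever kernel timer would really be used), something along these lines might work:

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict C modes */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical request kinds; not DragonFly's real bio/buf types. */
enum io_kind { IO_READ, IO_WRITE };

/* Hypothetical per-request record carrying the three timestamps. */
struct io_stamp {
    enum io_kind kind;
    uint64_t created_ns;  /* request first constructed */
    uint64_t queued_ns;   /* request hit the controller queue */
    uint64_t done_ns;     /* request completed */
};

/* Monotonic time in nanoseconds (userspace stand-in for a kernel timer). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Time spent waiting to get onto the controller queue. */
static uint64_t wait_ns(const struct io_stamp *s)
{
    assert(s->queued_ns >= s->created_ns);
    return s->queued_ns - s->created_ns;
}

/* Time spent on the controller/disk once queued. */
static uint64_t service_ns(const struct io_stamp *s)
{
    assert(s->done_ns >= s->queued_ns);
    return s->done_ns - s->queued_ns;
}

/*
 * Nonzero if the oldest pending request of some kind has been alive
 * longer than budget_ns, i.e. that kind is being starved and a
 * dispatcher might want to favor it next.
 */
static int is_starving(const struct io_stamp *oldest, uint64_t now,
                       uint64_t budget_ns)
{
    return now - oldest->created_ns > budget_ns;
}
```

A dispatcher could check `is_starving()` for the oldest pending read and the oldest pending write before picking the next request, so neither side waits past its budget. The budget value itself would need tuning; nothing here suggests what it should be.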
Note, if we have such timestamps, it would enable us to also make smarter
decisions in certain situations (which mirror disk do I issue a read/write
to first?), and may allow us to keep summary statistics about the
performance of a disk, and complain if things get too far outta whack.
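To make the mirror/statistics idea a bit more concrete, here is a minimal sketch in plain C. Everything in it is an assumption for illustration: `struct disk_stats`, `disk_record()`, `mirror_pick()` and `disk_is_outlier()` are hypothetical names, and the 1/8 smoothing weight is just a common choice, not something measured on DragonFly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-disk summary statistics. */
struct disk_stats {
    uint64_t ewma_ns;  /* smoothed completion latency */
    uint64_t samples;  /* number of latencies recorded so far */
};

/* Fold one completion latency into the moving average (new weight 1/8). */
static void disk_record(struct disk_stats *d, uint64_t latency_ns)
{
    if (d->samples == 0)
        d->ewma_ns = latency_ns;  /* first sample seeds the average */
    else
        d->ewma_ns = d->ewma_ns - d->ewma_ns / 8 + latency_ns / 8;
    d->samples++;
}

/*
 * Return the index of the mirror member with the lowest smoothed latency.
 * Ties go to the lowest index; a disk with no samples has ewma_ns == 0
 * and is therefore tried before any disk that has been measured.
 */
static int mirror_pick(const struct disk_stats *m, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (m[i].ewma_ns < m[best].ewma_ns)
            best = i;
    return best;
}

/* Nonzero if disk d is more than `factor` times slower than its peer. */
static int disk_is_outlier(const struct disk_stats *d,
                           const struct disk_stats *peer, uint64_t factor)
{
    return d->ewma_ns > factor * peer->ewma_ns;
}
```

The "complain" part would then be a check like `disk_is_outlier(&m[0], &m[1], 2)` after each update, with the factor picked by whoever decides how far out of whack is too far.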
Naturally, if HAMMER (or anything) is generating a certain type of I/O in
the course of servicing some benchmark, it's going to be rather hard for
the I/O path to be fair when it is overwhelmed with one type of I/O, which
may be necessary for the upper layers (HAMMER, etc.) to make forward progress.
On Mon, Jul 18, 2011 at 8:06 PM, Matthew Dillon wrote:
> Ok, well this is interesting. Basically it comes down to whether we
> want to starve read operations or whether we want to starve write
> The FreeBSD results starve read operations, while the DragonFly results
> starve write operations. That's the entirety of the difference between
> the two tests.
> The final numbers don't do justice to this... if you look at the
> raw numbers though it is apparent. When the blogbench test blows out
> system caches the read activity on FreeBSD drops into the ~600 range
> while on DragonFly the read activity drops to the ~25000 range. At
> the same time FreeBSD's write activity stays in the ~4000 range while
> DragonFly's write activity drops into the ~50's.
> I tracked the reason for the DragonFly write activity dropping. It
> basically comes down to the backlog of inodes in HAMMER needing
> reclamation. Due to the heavy concurrent read load the HAMMER flusher
> is constantly stuck in B-Tree locks and cannot flush inode meta-data
> out quickly enough to keep up with blogbench. Once it hits the inode
> backlog limit (25000) write throughput goes down drastically.
> While one can increase the limit (vfs.hammer.limit_reclaim), all that
> happens is that HAMMER takes a little longer before it hits it, at
> least in the blogbench test. For more bursty bulk write operations
> increasing the limit would be a good tuning parameter.
> Frankly both FreeBSD's and DragonFly's results are incorrect. FreeBSD is
> killing read performance way way way too much while DragonFly is killing
> write performance way way way too much.
> I'm not sure how it could be fixed, though. I can definitely reduce
> B-Tree deadlocks in HAMMER by unlocking b-tree nodes during synchronous
> read I/O (for meta-data), but the result that we really want is more
> balanced read vs write performance, not these extreme tilts that we see.
> Also note that blogbench's 'final' results are worthless. The read
> performance is mostly counting the pre-cache-blowout numbers. DragonFly's
> read performance is 41x FreeBSD's once the caches are blown out,
> while FreeBSD's write performance is 80x DragonFly's write performance
> once the caches are blown out. Reads tend to be less localized than
> writes so, generally speaking, the disk bandwidth *IS* being used fairly
> efficiently in both cases. But neither result is really acceptable.
> This is all with swapcache turned off. The only way to test in a
> fair manner with swapcache turned on (with a SSD) is if the FreeBSD
> test used a similar setup w/ZFS.
> Matthew Dillon