DragonFly kernel List (threaded) for 2011-07
Re: Blogbench RAID benchmarks
Ok, well this is interesting. Basically it comes down to whether we
want to starve read operations or whether we want to starve write
operations.
The FreeBSD results starve read operations, while the DragonFly results
starve write operations. That's the entirety of the difference between
the two tests.
The final numbers don't do justice to this, but it is apparent if you
look at the raw numbers. When the blogbench test blows out
system caches the read activity on FreeBSD drops into the ~600 range
while on DragonFly the read activity drops to the ~25000 range. At
the same time FreeBSD's write activity stays in the ~4000 range while
DragonFly's write activity drops into the ~50 range.
I tracked down the reason for the DragonFly write activity dropping. It
basically comes down to the backlog of inodes in HAMMER needing
reclamation. Due to the heavy concurrent read load the HAMMER flusher
is constantly stuck in B-Tree locks and cannot flush inode meta-data
out quickly enough to keep up with blogbench. Once it hits the inode
backlog limit (25000), write throughput goes down drastically.
While one can increase the limit (vfs.hammer.limit_reclaim), all that
happens is that HAMMER takes a little longer before it hits it, at
least in the blogbench test. For more bursty bulk write operations
increasing the limit would be a good tuning parameter.
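
As a rough illustration of why raising the limit only postpones the stall
under sustained load but can help bursty writes, here is a toy model in
Python. The create and flush rates are made-up numbers, not measurements,
and the function names are hypothetical. Only the 25000 limit comes from
the test above.

```python
# Toy model of an inode reclaim backlog. All rates are hypothetical.

def seconds_until_limit(limit, create_rate, flush_rate):
    """Seconds until the backlog reaches `limit`, or None if it never does."""
    net_growth = create_rate - flush_rate  # inodes/sec added to the backlog
    if net_growth <= 0:
        return None  # flusher keeps up, backlog never builds
    return limit / net_growth


def burst_hits_limit(limit, burst_seconds, create_rate, flush_rate):
    """True if a write burst lasting `burst_seconds` fills the backlog."""
    t = seconds_until_limit(limit, create_rate, flush_rate)
    return t is not None and t <= burst_seconds


# Sustained load: a 4x larger limit only postpones the stall.
print(seconds_until_limit(25000, create_rate=4000, flush_rate=3500))   # 50.0
print(seconds_until_limit(100000, create_rate=4000, flush_rate=3500))  # 200.0

# A 60 second burst hits the 25000 limit but fits under the larger one.
print(burst_hits_limit(25000, 60, 4000, 3500))   # True
print(burst_hits_limit(100000, 60, 4000, 3500))  # False
```

The model ignores the lock contention that slows the flusher in the first
place. It only shows that a larger limit buys time, not throughput.
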
Frankly, both FreeBSD's and DragonFly's results are incorrect. FreeBSD is
killing read performance way way way too much while DragonFly is killing
write performance way way way too much.
I'm not sure how it could be fixed, though. I can definitely reduce
B-Tree deadlocks in HAMMER by unlocking B-Tree nodes during synchronous
read I/O (for meta-data), but the result that we really want is more
balanced read vs write performance, not these extreme tilts that we see.
Also note that blogbench's 'final' results are worthless. The read
performance is mostly counting the pre-cache-blowout numbers. DragonFly's
read performance is 41x FreeBSD's once the caches are blown out,
while FreeBSD's write performance is 80x DragonFly's write performance
once the caches are blown out. Reads tend to be less localized than
writes so, generally speaking, the disk bandwidth *IS* being used fairly
efficiently in both cases. But neither result is really acceptable.
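
For reference, the 41x and 80x figures follow directly from the approximate
post-blowout numbers quoted earlier. A quick check in Python, using only
those rounded values (the dictionary layout is just for illustration):

```python
# Approximate post-cache-blowout activity, as quoted earlier in this message.
post_blowout = {
    "FreeBSD":   {"read": 600,   "write": 4000},
    "DragonFly": {"read": 25000, "write": 50},
}

# How many times more reads DragonFly completes than FreeBSD.
read_ratio = post_blowout["DragonFly"]["read"] / post_blowout["FreeBSD"]["read"]

# How many times more writes FreeBSD completes than DragonFly.
write_ratio = post_blowout["FreeBSD"]["write"] / post_blowout["DragonFly"]["write"]

print(f"DragonFly reads vs FreeBSD:  {read_ratio:.1f}x")   # 41.7x
print(f"FreeBSD writes vs DragonFly: {write_ratio:.1f}x")  # 80.0x
```
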
This is all with swapcache turned off. The only way to test in a
fair manner with swapcache turned on (with an SSD) is if the FreeBSD
test used a similar setup w/ZFS.