DragonFly kernel List (threaded) for 2008-05
Re: HEADS UP - Final HAMMER on-disk structural changes being made today
:I guess this is probably a good time to ask.
:Have you thought at all about how HAMMER will scale to the current
:of NAND-based disks, and solid state storage in general? From my high level
:understanding of the media and HAMMER in general I can hypothesize that it
:looks like it will map fairly well in many respects, at least, much better
:than any traditional filesystem intended for magnetic storage. Purpose-specific
:filesystems all gravitate toward being log-based and do buffering and
:in a fairly similar manner to HAMMER. Current NAND (as far as I understand)
:chips also operate on 16k blocks (you have to issue an erasure before a
:There are various other things that these purpose-specific filesystems try
:which is more media-specific, like spreading erasures/writes evenly over the
:in an attempt to extend the life of the disk, re-mapping bad blocks, etc.
:LogFS,JFS2,YAFFS). Considering how well much of this seems to map to
:HAMMER's on-disk layout, and considering any of the extra higher-level bits
:could likely be integrated "when the time is right" without much pain. It makes me
:wonder if you took solid state disks into consideration or if things just
:worked out that way. Also, if you have any plans or ideas to see HAMMER be
:performant on SSD's?
:I wanted to go into more detail here and a couple weeks ago I even emailed
:a couple of SSD manufacturers to inquire as to whether there were any (or
:proposed) standards or methods of device inspection. For attempting to do
:things like block allocation to exploit per-nand-chip performance, etc.
:(None of them have gotten back to me). At any rate, a higher level email is
:probably more palatable anyway :)
Well I've thought about it quite a bit over the last month or two
and I think the answer is that HAMMER would not scale any better
than, say, UFS.
The reason is that even though HAMMER uses 16K blocks and even though
HAMMER doesn't 'delete' data or overwrite file data, it *DOES* modify
meta-data in-place. In addition to modifying meta-data in-place
HAMMER also uses an UNDO log for the meta-data, and it tracks
allocations and frees in the blockmaps in-place as well.
The UNDO log itself is fairly small... actually very small because
it does not contain file data (which doesn't get modified in-place),
only meta data.
But things like B-Tree and record elements do get modified in-place
in HAMMER, and that means it will wind up having to do block replacement
at least as often as something like UFS would on a SSD.
But I'll also put forth the fact that insofar as SSDs go, and NAND
in particular, you have to think about it from two different points
of view:
(1) A storage subsystem with limited or no static-ram caching.
(2) A storage subsystem with extensive ram (either battery backed
or NOT battery backed), for caching purposes.
In the first case, if no front-end caching is available, then the only
way to get performance out of NAND is to write a filesystem explicitly
built to NAND.
In the second case, when front-end caching is available, I personally
do not believe that it matters as much. With even a moderate amount
of caching you can run any standard filesystem on top and still eke
out most of the performance. Maybe not 100% of the performance a
custom filesystem would give you, but it would be fairly close.
Let me give you an example of case (2), particularly as it applies to
HAMMER.
Remember that UNDO log I mentioned? For meta-data? Well, let's say
I wanted to improve HAMMER's performance on SSD/NAND devices. The
big problem is all the in-place modification of the meta-data.
But I have that UNDO log. And a memory cache. HMMM. So, what if I
logged not only the UNDO information, but also the DO information. That
is, if HAMMER had to update a field in some meta-data somewhere it
would lay down an UNDO record with the old contents of that field,
and I would also have it lay down a record with the NEW contents
of that same field.
If I were to do that, then I wouldn't actually have to update the
meta-data on-media for a very long period of time. Instead I could
simply cache the modified meta-data buffers in memory and lay down
the UNDO+DO records on the flash. If the system crashes or is shut down,
I don't have to flush the dirty buffers in memory. When the system is
booted up again and the filesystem mounted, all I would have to do is
play back the UNDO+DO records and regenerate the dirty buffers in
memory.
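To make the UNDO+DO idea concrete, here is a minimal sketch in Python (not HAMMER's C, and not HAMMER's actual on-disk format). Every name in it (LogRecord, Media, MetaCache, replay) is hypothetical and exists only for illustration. It assumes updates stay within one block, and it only replays the DO images; a real implementation would also need the UNDO images to roll back records belonging to an incomplete flush.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    block: int    # meta-data block number (hypothetical addressing)
    offset: int   # byte offset of the field within the block
    old: bytes    # UNDO image: field contents before the change
    new: bytes    # DO image: field contents after the change


@dataclass
class Media:
    """Stand-in for the flash device: meta-data blocks plus an append-only log."""
    blocks: dict = field(default_factory=dict)  # block number -> bytearray
    log: list = field(default_factory=list)     # sequentially appended records


class MetaCache:
    """Dirty meta-data buffers held in RAM; only log records reach the media."""

    def __init__(self, media, block_size=16384):
        self.media = media
        self.block_size = block_size
        self.dirty = {}  # block number -> modified in-memory copy

    def _buffer(self, block):
        # Load a private copy of the block on first touch.
        if block not in self.dirty:
            on_media = self.media.blocks.get(block, bytearray(self.block_size))
            self.dirty[block] = bytearray(on_media)
        return self.dirty[block]

    def update(self, block, offset, new):
        buf = self._buffer(block)
        old = bytes(buf[offset:offset + len(new)])
        # Append UNDO+DO to the log, then change the in-memory copy only.
        self.media.log.append(LogRecord(block, offset, old, bytes(new)))
        buf[offset:offset + len(new)] = new

    def flush(self):
        # The eventual block replacement: one write per dirty block,
        # after which the log records are no longer needed.
        for block, buf in self.dirty.items():
            self.media.blocks[block] = bytearray(buf)
        self.dirty.clear()
        self.media.log.clear()


def replay(media, block_size=16384):
    """At mount time, regenerate the dirty buffers from the logged DO images."""
    cache = MetaCache(media, block_size)
    for rec in media.log:
        buf = cache._buffer(rec.block)
        buf[rec.offset:rec.offset + len(rec.new)] = rec.new
    return cache
```

In this sketch a crash simply means the MetaCache object is lost; calling replay(media) on the surviving Media rebuilds the same dirty buffers without any meta-data block having been rewritten in the meantime.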
Eventually the meta-data would have to be flushed to the media,
meaning block replacement of course, but with a reasonable amount of
memory a great deal of work could be cached before that had to happen
which means that multiple meta-data modifications could build up and
be written out far more efficiently.
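As a rough illustration of why the build-up helps, the sketch below counts block replacements for a stream of meta-data updates under two policies. The function names and the flush_every parameter are hypothetical, and the model is deliberately crude: it ignores the cost of the log appends themselves, which are small sequential writes.

```python
def block_writes_in_place(updates):
    """Every meta-data update rewrites its block on the media immediately."""
    return len(updates)


def block_writes_deferred(updates, flush_every):
    """Updates accumulate in RAM; each flush rewrites every dirty block once."""
    writes = 0
    dirty = set()
    for count, block in enumerate(updates, start=1):
        dirty.add(block)
        if count % flush_every == 0:
            writes += len(dirty)
            dirty.clear()
    # Final flush of whatever is still dirty.
    return writes + len(dirty)
```

For example, with updates = [1, 2, 1, 1, 3, 2, 1, 1] (eight updates touching three distinct blocks), in-place updating costs 8 block replacements, while deferring everything to a single flush (flush_every=8) costs 3.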
So that is an example of case (2), where memory is available for
caching. Such a scheme would not work in case (1), where very little
memory is available for caching. I would theorize a significant
increase in performance on NAND/SSD devices were I to make that change.
You also have to consider the relative value of writing a filesystem
completely from scratch designed explicitly for NAND. I personally
do not have much of an interest in designing such a beast. If I were
to do it I would keep it simple stupid, with none of the bells and
whistles you see in HAMMER. It would be a turnkey product designed
for applications which are aware they are running on a flash-backed