DragonFly kernel List (threaded) for 2008-05
DragonFly BSD

Re: HEADS UP - Final HAMMER on-disk structural changes being made today

From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Mon, 5 May 2008 21:03:46 -0700 (PDT)

:I guess this is probably a good time to ask.
:Have you thought at all about how HAMMER will scale to the current 
:of NAND-based disks, and solid state storage in general? From my high level
:understanding of the media and HAMMER in general I can hypothesize that it
:looks like it will map fairly well in many respects, at least, much better
:than most/any traditional filesystem intended for magnetic storage. Purpose-specific 
:filesystems all gravitate toward being log-based and do buffering and 
:in a fairly similar manner to HAMMER. Current NAND (as far as I understand)
:chips also operate on 16k blocks (you have to issue an erasure before a 
:There are various other things that these purpose-specific filesystems try to do
:which is more media-specific, like spreading erasures/writes evenly over the 
:in an attempt to extend the life of the disk, re-mapping bad blocks, etc. 
:LogFS,JFS2,YAFFS). Considering how well much of this seems to map to
:HAMMER's on-disk layout, and considering any of the extra higher-level bits
:could likely be integrated "when the time is right" without much pain. It made me
:wonder if you took solid state disks into consideration or if things just 
:worked out that way. Also, if you have any plans or ideas to see HAMMER be
:performant on SSD's?
:I wanted to go into more detail here and a couple weeks ago I even emailed
:a couple of SSD manufacturers to inquire as to whether there were any (or
:proposed) standards or methods of device inspection. For attempting to do
:things like block allocation to exploit per-nand-chip performance, etc.
:(None of them have gotten back to me). At any rate, a higher level email is
:probably more palatable anyway :)

    Well, I've thought about it quite a bit over the last month or two
    and I think the answer is that HAMMER would not scale any better
    than, say, UFS.

    The reason is that even though HAMMER uses 16K blocks and even though
    HAMMER doesn't 'delete' data or overwrite file data, it *DOES* modify
    meta-data in-place.  In addition to modifying meta-data in-place
    HAMMER also uses an UNDO log for the meta-data, and it tracks
    allocations and frees in the blockmaps in-place as well.

    The UNDO log itself is fairly small... actually very small because
    it does not contain file data (which doesn't get modified in-place),
    only meta data.

    But things like B-Tree and record elements do get modified in-place
    in HAMMER, and that means it will wind up having to do block replacement
    at least as often as something like UFS would on a SSD.
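
    As a rough illustration of why in-place meta-data updates are costly
    on flash, here is a minimal Python sketch.  It is not HAMMER's code:
    Media, UndoLog, modify_in_place and rollback are hypothetical names,
    and the 16-byte block stands in for a real 16K block.  The point is
    only that every small field update rewrites a whole block, even
    though the UNDO record itself is tiny.

```python
# Hypothetical sketch: in-place meta-data update protected by an UNDO log.
# None of these names come from HAMMER; they only illustrate the idea.

class Media:
    """Toy block device. Every write_block() stands for one block replacement."""

    def __init__(self, nblocks, block_size=16):
        self.blocks = [bytearray(block_size) for _ in range(nblocks)]
        self.block_writes = 0

    def write_block(self, blkno, data):
        self.blocks[blkno] = bytearray(data)
        self.block_writes += 1


class UndoLog:
    """Small log holding only the OLD contents of modified fields."""

    def __init__(self):
        self.records = []

    def append(self, blkno, offset, old_bytes):
        self.records.append((blkno, offset, bytes(old_bytes)))


def modify_in_place(media, undo, blkno, offset, new_bytes):
    """Record the old field contents, then overwrite the block in place."""
    block = bytearray(media.blocks[blkno])
    end = offset + len(new_bytes)
    undo.append(blkno, offset, block[offset:end])
    block[offset:end] = new_bytes
    media.write_block(blkno, block)   # whole block is rewritten


def rollback(media, undo):
    """Apply UNDO records newest-first to restore the old meta-data."""
    for blkno, offset, old in reversed(undo.records):
        block = bytearray(media.blocks[blkno])
        block[offset:offset + len(old)] = old
        media.write_block(blkno, block)
    undo.records.clear()
```

    Two two-byte updates to the same block cost two full block writes
    here, which is the behavior that hurts on NAND.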


    But I'll also put forth the fact that insofar as SSDs go, and NAND
    in particular, you have to think about it from two different points
    of view:

    (1) A storage subsystem with limited or no static-ram caching.

    (2) A storage subsystem with extensive ram (either battery backed
	or NOT battery backed), for caching purposes.

    In the first case, if no front-end caching is available, then the only
    way to get performance out of NAND is to write a filesystem explicitly
    built to NAND.  

    In the second case, when front-end caching is available, I personally
    do not believe that it matters as much.  With even a moderate amount
    of caching you can run any standard filesystem on top and still eke
    out most of the performance.  Maybe not 100% of the performance a
    custom filesystem would give you, but it would be fairly close.


    Let me give you an example of case (2), particularly as it applies to

    Remember that UNDO log I mentioned?  For meta-data?  Well, let's say 
    I wanted to improve HAMMER's performance on SSD/NAND devices.  The
    big problem is all the in-place modification of the meta-data.

    But I have that UNDO log.  And a memory cache.  HMMM.  So, what if I
    logged not only the UNDO information, but also the DO information?  That
    is, if HAMMER had to update a field in some meta-data somewhere it
    would lay down an UNDO record with the old contents of that field,
    and I would also have it lay down a record with the NEW contents
    of that same field.

    If I were to do that, then I wouldn't actually have to update the
    meta-data on-media for a very long period of time.  Instead I could
    simply cache the modified meta-data buffers in memory and lay down 
    the UNDO+DO records on the flash.  If the system crashes or is shut
    down, I don't have to flush the dirty buffers in memory.  All I would
    have to do, when the system is booted up again and the filesystem
    mounted, is play back the UNDO+DO records and regenerate the dirty
    buffers in memory.
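
    The UNDO+DO idea can be sketched roughly as follows.  This is a
    minimal Python illustration under stated assumptions, not an actual
    or planned HAMMER implementation: Journal and CachedMeta are
    hypothetical names, the on-media meta-data is modeled as a list of
    byte strings, and the journal is assumed to survive a crash intact.

```python
# Hypothetical sketch of UNDO+DO logging with replay at mount time.
# Journal and CachedMeta are illustrative names, not HAMMER structures.

class Journal:
    """Append-only log on flash: (block, offset, old bytes, new bytes)."""

    def __init__(self):
        self.records = []

    def append(self, blkno, offset, old, new):
        self.records.append((blkno, offset, bytes(old), bytes(new)))


class CachedMeta:
    """Meta-data cache that defers in-place media writes."""

    def __init__(self, media_blocks, journal):
        self.media = media_blocks   # on-media meta-data, left untouched here
        self.journal = journal
        self.dirty = {}             # blkno -> modified in-memory buffer

    def _buf(self, blkno):
        # Load the block into memory the first time it is touched.
        if blkno not in self.dirty:
            self.dirty[blkno] = bytearray(self.media[blkno])
        return self.dirty[blkno]

    def update(self, blkno, offset, new):
        """Log old (UNDO) and new (DO) contents; modify only the cache."""
        buf = self._buf(blkno)
        end = offset + len(new)
        self.journal.append(blkno, offset, buf[offset:end], new)
        buf[offset:end] = new

    def replay(self):
        """After a crash: rebuild the dirty buffers from the DO records.
        (The UNDO half would be used to back out incomplete operations;
        that part is omitted from this sketch.)"""
        for blkno, offset, _old, new in self.journal.records:
            buf = self._buf(blkno)
            buf[offset:offset + len(new)] = new
```

    In this model a crash simply discards the CachedMeta object.  A new
    one built over the same media and journal, followed by replay(),
    ends up with the same dirty buffers, and the media itself was never
    rewritten in between.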

    Eventually the meta-data would have to be flushed to the media,
    meaning block replacement of course, but with a reasonable amount of
    memory a great deal of work could be cached before that had to happen
    which means that multiple meta-data modifications could build up and
    be written out far more efficiently.
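
    To put rough numbers on that, the following small Python model counts
    block writes for a stream of meta-data updates.  It is only a sketch:
    block_writes is a hypothetical helper, the cache evicts the least
    recently dirtied block when full, and the cost of writing the log
    records themselves is ignored.

```python
from collections import OrderedDict


def block_writes(updates, cache_blocks):
    """Count media block writes for a sequence of meta-data updates.

    updates      -- sequence of block numbers, one per field update
    cache_blocks -- how many dirty blocks memory can hold (0 = no cache,
                    every update is written through in place)
    """
    if cache_blocks == 0:
        return len(updates)

    dirty = OrderedDict()   # blkno -> True, oldest entry first
    writes = 0
    for blkno in updates:
        if blkno in dirty:
            dirty.move_to_end(blkno)    # absorbed by the cache, no write
            continue
        if len(dirty) >= cache_blocks:
            dirty.popitem(last=False)   # evict oldest dirty block: 1 write
            writes += 1
        dirty[blkno] = True
    return writes + len(dirty)          # final flush of what remains


# 1000 updates that keep hitting the same 10 meta-data blocks.
updates = [i % 10 for i in range(1000)]
uncached = block_writes(updates, 0)
cached = block_writes(updates, 16)
```

    With no cache every update is a block write.  With room for all ten
    hot blocks, each block is written once at flush time.  If the cache
    is too small for the working set the benefit can disappear, which is
    why the scheme depends on a reasonable amount of memory.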

    So that is an example of case (2), where memory is available for
    caching.   Such a scheme would not work in case (1), where very little
    memory is available for caching.  I would theorize a significant
    increase in performance on NAND/SSD devices were I to make that change.


    You also have to consider the relative value of writing a filesystem
    completely from scratch designed explicitly for NAND.  I personally
    do not have much of an interest in designing such a beast.  If I were
    to do it I would keep it simple stupid, with none of the bells and
    whistles you see in HAMMER.  It would be a turnkey product designed
    for applications which are aware they are running on a flash-backed

					Matthew Dillon 
