DragonFly BSD
DragonFly kernel List (threaded) for 2009-11

Re: hammer errors

From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 10 Nov 2009 08:41:18 -0800 (PST)

:ever since the last time I had CRC problems on my router box, I've 
:developed the habit of doing a daily 'hammer -f /dev/ad4s1d show |& grep 
:"^B"' to see if any new errors crept up, and today I found:
:yoyodyne# hammer -f /dev/ad4s1d show |& grep "^B"
:B                dataoff=a00000714d120000/65536 crc=7e4f7545
:B                dataoff=a000007171380000/65536 crc=616b1cc1

    The question is whether it is real or not.  If the filesystem is
    mounted live then the show command could be catching things in
    odd states.
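The daily check described above can be taken one step further. It can separate newly flagged records from ones already seen, and it can show whether a flag persists across runs taken while the filesystem is idle, which bears on whether the error is real. The Python sketch below assumes the 'B' lines keep the `dataoff=<hex>/<length> crc=<hex>` layout quoted above. The function names `parse_bad_records` and `compare_runs` are hypothetical and are not part of the hammer utility. Treating a record flagged in two idle runs as more likely genuine is a heuristic, not something HAMMER guarantees.

```python
import re

# Matches the flagged lines quoted above, e.g.
#   B                dataoff=a00000714d120000/65536 crc=7e4f7545
# Assumption: only the dataoff=<hex>/<length> and crc=<hex> fields are relied on.
B_LINE = re.compile(r"^B\s+dataoff=([0-9a-fA-F]+)/(\d+)\s+crc=([0-9a-fA-F]+)")


def parse_bad_records(lines):
    """Return the set of (dataoff, length, crc) tuples found on 'B' lines."""
    found = set()
    for line in lines:
        m = B_LINE.match(line)
        if m:
            dataoff, length, crc = m.groups()
            found.add((int(dataoff, 16), int(length), int(crc, 16)))
    return found


def compare_runs(previous, current):
    """Return (persistent, new, gone) record sets for two runs' output lines.

    Heuristic (assumption): records flagged in both runs, taken while the
    filesystem was idle, are more likely real than ones seen only once.
    """
    prev = parse_bad_records(previous)
    curr = parse_bad_records(current)
    return prev & curr, curr - prev, prev - curr


if __name__ == "__main__":
    yesterday = []
    today = [
        "B                dataoff=a00000714d120000/65536 crc=7e4f7545",
        "B                dataoff=a000007171380000/65536 crc=616b1cc1",
    ]
    persistent, new, gone = compare_runs(yesterday, today)
    for dataoff, length, crc in sorted(new):
        print(f"new: dataoff={dataoff:016x}/{length} crc={crc:08x}")
```

To use it, save each day's `hammer -f /dev/ad4s1d show |& grep "^B"` output to a file. Then pass the previous and current files to `compare_runs` as lists of lines.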

:Console log for the recent days is:
:Nov  7 03:15:19 <kern.crit> yoyodyne kernel: HAMMER: Warning: rebalance 
:caught race against propagate

    None of those are serious.  They are basically just debug messages that
    will be removed soon.  The emergency page allocation for BIO is unrelated
    to the filesystem code.  It is also just a warning (telling me that
    something is eating too many free VM pages).

:So my question is: What are my next steps in order to help resolve this 
:issue? Is there any way to get to, e.g., the names of the files affected
:by this problem from the data which is output by 'hammer show'?
:So far the only thing I've done is to disable nightly hammer cleanup 
:because DragonFly, upon encountering a CRC error, will unfortunately 
:simply drop to the debugger without panicking, so this doesn't get caught
:by DDB_UNATTENDED as far as I can tell (Matt, are there any plans to 
:change this unpleasant behavior?). And I won't be near that box until 
:next weekend.

    I fixed the behavior in current.  There is now a sysctl which
    controls whether it drops into the debugger or not (and it does not
    by default).  Though it doesn't panic... maybe the sysctl should be
    modified to give it the ability to panic instead of propagating an
    error code up the call chain.  The filesystem still drops into
    read-only mode if an error is encountered.

    What you want to do now is run 'hammer -f ... show | less -B' and
    search for B, as in '/^B'.  less -B uses a fixed buffer so if you
    scroll down you basically cannot scroll back up (by much), which allows
    you to pipe gigabytes and gigabytes of text through it without it
    malloc()ing itself into oblivion.  You want to try to find the problem
    area and get more context out of it, such as the object id.  And also
    to determine whether the problem area is real or not.
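A bounded-memory filter can also pull out context around a flagged line, in the same spirit as less -B's fixed buffer. It keeps only the last N lines and prints them whenever a line starting with 'B' appears. Many grep implementations offer the same thing through `grep -B NUM '^B'`. The Python sketch below is illustrative only. The window size of 20 and the function name `flagged_with_context` are arbitrary assumptions. Which of the preceding lines carries the object id depends on the `hammer show` output format, and this thread does not show that format.

```python
from collections import deque


def flagged_with_context(lines, before=20, marker="B"):
    """Yield (line_number, context) for each line that starts with `marker`.

    `context` is up to `before` preceding lines plus the flagged line itself.
    Only a fixed-size window is kept, so memory use does not grow with the
    amount of input streamed through.
    """
    window = deque(maxlen=before)
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith(marker):
            yield lineno, list(window) + [line]
        window.append(line)


if __name__ == "__main__":
    # Placeholder input; real input would be 'hammer show' output lines.
    sample = [
        "context line 1",
        "context line 2",
        "B                dataoff=a00000714d120000/65536 crc=7e4f7545",
    ]
    for lineno, context in flagged_with_context(sample, before=2):
        print(f"--- flagged line {lineno} ---")
        print("\n".join(context))
```

When it is fed from a pipe, for example as `flagged_with_context(sys.stdin)`, memory use depends on the window size rather than on the volume of `hammer show` output.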

    Again, the filesystem has to be idle, and it would be even better if it
    were offline entirely.

					Matthew Dillon 
