DragonFly BSD
DragonFly bugs List (threaded) for 2004-06

Re: Boot hangs starting postfix


From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Mon, 14 Jun 2004 13:58:24 -0700 (PDT)

    Ah shoot, too bad.  Definitely save the postfix state (just tar it up)
    the next time it happens.

    However, we have gleaned a lot of information about this issue from your
    reports.

    We know it's a livelock issue rather than a deadlock issue, because
    otherwise the addition of the tsleep would not have allowed you to
    ssh in or kill or otherwise signal the postfix processes.

    We know it's a livelock issue because the scheduler is not getting a
    chance to deschedule the postfix processes that are bouncing between
    each other, which likely means that the livelock is occurring in the
    kernel and neither process is returning to usermode.

    We know it's not stuck in a critical section because interrupts still
    work.

    The output lines you were getting continuously provided a surprise...
    I expected the process to be passed as 'owner' but it looks like
    &proc0 is passed, and the flags indicate F_WAIT|F_FLOCK, so we know
    the issue is occurring with flock() rather than with POSIX locks, which
    really narrows down the code cases.
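
    As a rough illustration of how the flags field reads as F_WAIT|F_FLOCK,
    here is a minimal sketch that decodes one of the quoted lf_setlock lines.
    The constant values are assumed from the BSD <sys/fcntl.h> headers of
    that era, and the regular expression and the decode() helper are
    hypothetical names made up for this example, not part of any patch.

```python
import re

# Assumed values, taken from BSD <sys/fcntl.h>; verify against your own tree.
F_RDLCK, F_UNLCK, F_WRLCK = 1, 2, 3            # lock types
F_WAIT, F_FLOCK, F_POSIX = 0x010, 0x020, 0x040  # kernel-internal lock flags

LOCK_TYPES = {F_RDLCK: "F_RDLCK", F_UNLCK: "F_UNLCK", F_WRLCK: "F_WRLCK"}
FLAG_NAMES = [(F_WAIT, "F_WAIT"), (F_FLOCK, "F_FLOCK"), (F_POSIX, "F_POSIX")]

# Hypothetical pattern matching the debug output quoted in this thread.
LINE_RE = re.compile(
    r"lf_setlock: (0x[0-9a-f]+) pid (\d+) type (\d+) flags ([0-9a-f]+)"
)

def decode(line):
    """Return the decoded fields of an lf_setlock debug line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    flags = int(m.group(4), 16)  # the flags field is printed in hex
    return {
        "lock": m.group(1),
        "pid": int(m.group(2)),
        "type": LOCK_TYPES.get(int(m.group(3)), "unknown"),
        "flags": [name for bit, name in FLAG_NAMES if flags & bit],
    }

sample = ("Jun 14 10:46:55 fred /kernel: lf_setlock: 0xcf8df6d4 "
          "pid 0 type 3 flags 00000030")
print(decode(sample))
```

    Under those assumed values, flags 00000030 has the F_WAIT and F_FLOCK
    bits set and the F_POSIX bit clear, and type 3 would correspond to
    F_WRLCK, i.e. an exclusive lock request.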

    So we know a lot now even though we haven't found the smoking gun yet.

					-Matt
					Matthew Dillon 
					<dillon@xxxxxxxxxxxxx>

:[joseph, if you happen to have corrupted messages in the queue, please don't
:remove it before giving other people a clue to fix this lock up.]
:
:Yes, this at least keeps ssh alive, and the following messages repeated
:until I removed corrupted messages (files in /var/spool/postfix/corrupt/)
:
:Jun 14 10:46:55 fred /kernel: lf_setlock: 0xcf8df6d4 pid 0 type 3 flags 00000030
: [00000000,7fffffffffffffff]
:Jun 14 10:46:55 fred /kernel: lf_setlock: 0xcf8df7f4 pid 0 type 3 flags 00000030
: [00000000,7fffffffffffffff]
:
:I was so stupid that I didn't keep the corrupted messages, and
:now the older kernel (without your patch) doesn't lock up anymore!
:Just creating a pair of empty files in the corrupt/ directory doesn't
:reproduce it.


