DragonFly bugs List (threaded) for 2004-06
Re: Boot hangs starting postfix
Ah shoot, too bad. Definitely save the postfix state (just tar it up)
the next time it happens.
However, we have gleaned a lot of information about this issue from your
We know it's a livelock issue rather than a deadlock issue, because
otherwise the addition of the tsleep would not have allowed you to
ssh in or kill or otherwise signal the postfix processes.
We know it's a livelock issue because the scheduler is not getting a
chance to deschedule the postfix processes that are bouncing between
each other, which likely means that the livelock is occurring in the
kernel and neither process is returning to usermode.
We know it's not stuck in a critical section because interrupts still
The output lines you were getting continuously provided a surprise...
I expected the process to be passed as 'owner' but it looks like
&proc0 is passed, and the flags indicate F_WAIT|F_FLOCK, so we know
the issue is occurring with flock() rather than with POSIX locks, which
really narrows down the code cases.
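
For anyone following along, here is a rough sketch of how those debug lines
can be decoded. It assumes the flags field is printed in hex and that the
constants have the usual BSD <sys/fcntl.h> values (F_WAIT 0x010, F_FLOCK
0x020, F_POSIX 0x040; lock types F_RDLCK 1, F_UNLCK 2, F_WRLCK 3), so check
them against your own tree. The decode() helper and the regular expression
are just illustrative names, not part of the patch.

```python
import re

# Assumed values from BSD <sys/fcntl.h>; verify against the kernel source in use.
LOCK_TYPES = {1: "F_RDLCK", 2: "F_UNLCK", 3: "F_WRLCK"}
LOCK_FLAGS = {0x010: "F_WAIT", 0x020: "F_FLOCK", 0x040: "F_POSIX"}

# Matches the debug output quoted in this thread, e.g.
# "lf_setlock: 0xcf8df6d4 pid 0 type 3 flags 00000030"
LINE_RE = re.compile(
    r"lf_setlock: (0x[0-9a-f]+) pid (\d+) type (\d+) flags ([0-9a-f]+)"
)


def decode(line):
    """Return the decoded fields of one lf_setlock line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    addr, pid, ltype, flags = m.groups()
    bits = int(flags, 16)  # assumption: the flags field is hexadecimal
    names = [name for bit, name in sorted(LOCK_FLAGS.items()) if bits & bit]
    return {
        "lock": addr,
        "pid": int(pid),
        "type": LOCK_TYPES.get(int(ltype), ltype),
        "flags": "|".join(names),
    }
```

Under those assumptions, the lines in the report decode to an exclusive
(F_WRLCK) request with F_WAIT|F_FLOCK set and pid 0, which is consistent
with a blocking flock() call where &proc0 is passed as the owner.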
So we know a lot now even though we haven't found the smoking gun yet.
:[joseph, if you happen to have corrupted messages in the queue, please don't
:remove it before giving other people a clue to fix this lock up.]
:Yes, this at least keeps ssh alive, and the following messages repeated
:until I removed corrupted messages (files in /var/spool/postfix/corrupt/)
:Jun 14 10:46:55 fred /kernel: lf_setlock: 0xcf8df6d4 pid 0 type 3 flags 00000030
:Jun 14 10:46:55 fred /kernel: lf_setlock: 0xcf8df7f4 pid 0 type 3 flags 00000030
:I was so stupid that I didn't keep the corrupted messages, and
:now the older kernel (without your patch) doesn't lock up anymore!
:Just creating a pair of empty files in the corrupt/ directory doesn't