DragonFly bugs List (threaded) for 2009-11
Re: panic: assertion: leaf->base.obj_id == ip->obj_id in hammer_ip_delete_range
:On Thu, Oct 22, 2009 at 06:19:01PM -0700, Matthew Dillon wrote:
:> fetch http://apollo.backplane.com/DFlyMisc/hammer06.patch
:It seems like btree_remove() sets cursor->parent to NULL in its
:recursion path starting at hammer_btree.c:2226 but somehow returns 0
:which ends up hitting the first KKASSERT() in hammer_cursor_removed_node().
Regarding your vkernel panic: the panic message will be correct,
but the symbols in the backtrace are clearly messed up. For some
reason the vkernel reports the symbols incorrectly; I don't know why.
But given that panic message:
panic: assertion: parent != NULL in hammer_cursor_removed_node
The only possible path is via btree_remove(). I'm a bit at a loss
here. I don't see how that panic could still occur with the recent
patches. I went through your emails again and found this comment:
"By the way, I caught a different panic on vkernel. I think the last
I ran `hammer cleanup' on /usr/obj was before applying hammer05.patch
or hammer06.patch to vkernel."
Was that vkernel backtrace a pre-patch panic?
Going back to crash dumps .10 and .11, which panicked at:
panic: assertion: s <= 0 in hammer_btree_iterate
I think I see a possible issue. If hammer_btree_remove() fails with
EDEADLK, hammer_btree_delete() ignores the error on line 897. This
is correct, we WANT to ignore the error because it is ok for
hammer_btree_remove()'s recursion to fail... it just means we could
not recursively delete the internal nodes to get rid of the empty
leaf. The leaf is simply left empty.
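To make the intent concrete, here is a minimal sketch of that "ignore EDEADLK and leave the leaf empty" pattern. It is not the actual HAMMER code: struct toy_leaf, toy_remove_empty_leaf() and toy_delete_element() are hypothetical stand-ins, and the "parent_locked" flag is just an assumed way of simulating a lock conflict.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in; not the real HAMMER leaf structure. */
struct toy_leaf {
    int count;          /* number of elements still in the leaf */
    int parent_locked;  /* nonzero: someone else holds the parent (assumed) */
    int removed;        /* set once the empty leaf has been unlinked */
};

/* Try to unlink an empty leaf from its parent. May fail with EDEADLK. */
int toy_remove_empty_leaf(struct toy_leaf *leaf)
{
    assert(leaf != NULL);
    if (leaf->parent_locked)
        return EDEADLK;  /* cannot take the parent lock safely */
    leaf->removed = 1;
    return 0;
}

/* Delete one element; if the leaf becomes empty, try to prune it. */
int toy_delete_element(struct toy_leaf *leaf)
{
    int error;

    assert(leaf != NULL);
    if (leaf->count == 0)
        return ENOENT;
    leaf->count--;
    if (leaf->count == 0) {
        error = toy_remove_empty_leaf(leaf);
        if (error == EDEADLK)
            error = 0;   /* deliberately ignored: the leaf just stays empty */
        return error;
    }
    return 0;
}
```

In this toy the delete succeeds either way; the only difference is whether the empty leaf stays linked into the tree afterwards.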
However, I think this opens up an error path where the cursor can wind
up in a bad state when EDEADLK is returned, leading to the assertion.
It's just a guess at the moment. The normal case is clearly not
causing any problems, otherwise you'd get the panic instantly. It
takes CPU/disk load and time for the panic to occur, so it has
to be in the EDEADLK handling somewhere.
Another possibility is via hammer_btree_do_propagation(), which is also
called indirectly inside that loop. This code pushes the cursor,
does some work, then pops the cursor. But pushing a cursor unlocks it,
so some other third party operation can wind up adjusting it. It
is possible that some other deletion caused the node under the cursor
to be removed, causing the cursor to be adjusted to the parent node.
If the node that was removed was a node under the root node, then the
new cursor->node will become the root node and the cursor->parent will
become NULL. This should be ok (we are talking about the s <= 0 panic
here, not the cursor->parent != NULL panic).
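The cursor adjustment described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the real hammer_cursor_removed_node(): struct toy_node, struct toy_cursor and toy_cursor_removed_node() are hypothetical names, and the toy assumes the root itself is never removed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a B-Tree node and a cursor. */
struct toy_node {
    struct toy_node *parent;  /* NULL for the root */
};

struct toy_cursor {
    struct toy_node *node;    /* node the cursor currently points into */
    struct toy_node *parent;  /* parent of that node, NULL at the root */
    int index;                /* element index within node */
};

/*
 * Called by whoever removes 'removed' while this cursor is unlocked.
 * 'parent_index' is the slot the removed node occupied in its parent.
 * A cursor sitting on the removed node is moved up to the parent.
 */
void toy_cursor_removed_node(struct toy_cursor *c,
                             struct toy_node *removed, int parent_index)
{
    assert(removed->parent != NULL);  /* toy assumption: root is never removed */
    if (c->node != removed)
        return;                       /* cursor is elsewhere, nothing to do */
    c->node = removed->parent;
    c->parent = c->node->parent;      /* NULL if the new node is the root */
    c->index = parent_index;
}
```

If the removed node sat directly under the root, the cursor ends up with node pointing at the root and parent set to NULL, which is the case described above.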
But I'm thinking one of the above two conditions is causing the cursor
to get whacked out of shape badly enough to hit the s <= 0 assertion
in the iteration.
For the assertion to fail the cursor would have to be indexed to
BEFORE the beginning of the range. This can only occur if the
cursor gets adjusted while unlocked or is munged beyond hope in
the hammer_btree_do_propagation() or hammer_btree_remove() sequence.
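As a rough illustration of what "indexed before the beginning of the range" means, here is a toy check with integer keys. It is only a sketch of the invariant: the real code compares HAMMER B-Tree keys, and toy_cmp(), toy_index_ok() and the bounds layout are assumptions made up for the example.

```c
#include <assert.h>

/* Three-way compare on toy integer keys: <0, 0 or >0. */
int toy_cmp(long a, long b)
{
    return (a > b) - (a < b);
}

/*
 * Toy internal node: element i covers keys [bounds[i], bounds[i + 1]).
 * Returns 1 if the element at 'index' can still contain keys at or
 * after key_beg, i.e. the toy equivalent of "s <= 0" holds.
 * Returns 0 if the element lies entirely before the start of the range.
 */
int toy_index_ok(const long *bounds, int nbounds, int index, long key_beg)
{
    int s;

    assert(nbounds >= 2);
    if (index < 0 || index + 1 >= nbounds)
        return 0;                            /* index is out of the node */
    s = toy_cmp(key_beg, bounds[index + 1]); /* range start vs right edge */
    return s <= 0;
}
```

With bounds {0, 10, 20, 30} and a scan starting at key 15, index 1 (covering 10..20) passes the check, while index 0 (covering 0..10) fails it. An iterator only ever moves forward, so landing on index 0 here would mean something moved the cursor backwards behind its back.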
I haven't found the smoking gun yet. This code is terribly complex.