DragonFly BSD
DragonFly bugs List (threaded) for 2009-10
[Date Prev][Date Next]  [Thread Prev][Thread Next]  [Date Index][Thread Index]

Re: Re: panic: assertion: leaf->base.obj_id == ip->obj_id in hammer_ip_delete_range

From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Sun, 18 Oct 2009 12:53:47 -0700 (PDT)

:It's at ~y0netan1/crash/{kernel,vmcore}.9 on leaf.

    The file it blew up on was a nohistory-flagged file being deleted
    about 2 seconds after it was created.  The parts of the B-Tree that
    the hammer cursor was sitting on look fine, so it's a software bug
    in the iteration somewhere that the assertion caught.

    I think I may have found the issue but it's a very rare and difficult
    edge case so I just can't be sure yet.  I will need to do some testing
    before I'm even willing to post the patch.

    Basically we have this situation in the B-Tree leaf:

    F1 = records related to file 1  (cached records not yet inserted)
    F2 = records related to file 2  (cached records not yet deleted,
				     and not on media)
    F3 = records related to file 3  (records on-media)

    We are trying to delete F2, so hammer_ip_delete_range() is called
    for F2's records.  There happen to be no records on-disk but it
    doesn't know that yet.

	F3 F3 F3
	^ index

    The iteration looking for records for F2 is essentially at the end,
    having not found any records for F2.  But then before the iteration
    can finish the following happens:

	* All the records for F3 are deleted due to the rm -rf (i.e.
	  some other inode flushing its deletion at the same time F2

	* Because the node is now empty the btree tries to remove the
	  entire node.

	* This pushes the cursor for the F2 delete_range iterations up to
	  the parent.

	* BUT btree_remove() is unsucessful at removing the entire node
	  due to hitting a deadlock, so now the cursor for F2 is pointing
	  at the parent instead of pointing at the node.

	* Records for file F1 are inserted via another unrelated parallel
	  flush op.  Since the node could not be deleted the records
	  are inserted in the node.  And since the cursor for F2 is now
	  pointing at the parent, it is effectively pointing at the
	  records for F1 (which are indexed prior to the F2 cursor's begin

	* On the next iteration of the cursor it pushes down from the parent
	  node to the node that could not deleted and finds the records for
	  F1.  This is the bug.  The cursor for F2's deletion is now indexed
	  at F1 which is prior to the cursor begin point.

	F1 F1 F1
	^ index for F2's iteration

	F1 F1 F1
	         ^ index for F2's iteration

    The iteration code assumes that the index already points to a record
    greater or equal to the starting point, so it only checks that the index
    has not gone past the ending point.  But due to the insertion of the
    records for F1 the index is now pointing to a point BEFORE the starting
    point of the range we want to delete.

    The KKASSERT catches the situation and quite properly panics the system
    in order to avoid corrupting the on-disk media.

    The solution is basically that I cannot call hammer_cursor_removed_node()
    until the node is actually removed.  That is, if F2's cursor is still
    pointing at the node instead of the parent, then when F1's records are
    inserted F2's cursor's node index will be incremented and stay past them,
    instead of pointing at the parent which then pushes down into the node
    and 'finds' the F1 records.

    This creates a fairly severe complication in btree_remove() due to the
    depth-first recursion it does.  The patch looks trivial but if I make
    a mistake we wave goodbye to the B-Tree.  Hence I have to carefully
    test the patch before I'm willing to even post it.  So... it is going
    to be a day or two I think.

    I think this bug has been hit occassionally by several people over the
    last year, including myself.  Fortunately that assertion prevented
    the media from getting corrupted (at the cost of panicing the system :-)).

					Matthew Dillon 

[Date Prev][Date Next]  [Thread Prev][Thread Next]  [Date Index][Thread Index]