DragonFly users List (threaded) for 2008-06
Re: HAMMER lockup
:> HAMMER is reserving space in the strategy_write() code and must
:> also allocate a memory-record to placemark the operation. This means
:> HAMMER must do various getblks and breads. The buffer cache is
:> supposed to have enough clean buffers to satisfy those operations but
:> sometimes it doesn't.
:> Matthew Dillon
:(you forgot to CC users@)
:Thanks, I'll try it out as soon as you commit it.
:By the way, for some reason rtorrent is a great filesystem testing
:application. There has been an ext3 bug caught with it and I also
:caught a bug in FreeBSD's ZFS implementation with rtorrent, and now
:Gergo Szakal MD <email@example.com>
If you could email me your rtorrent rc and a config file I can use
to test with, I'd appreciate it.
I'm making good progress. I've fixed another 3 deadlocks in my
local tree and I will commit them tonight. I can't commit them now
because one of the fixes also involved a major rewrite of the low
level storage allocator. The media format is still the same, but
I had to carefully reorder the way the blockmap lock is handled
and the change is too dangerous to commit without at least a good
day's worth of testing.
I was scratching my head wondering how, with all the work I have done,
the buffer cache could STILL get stuck in "newbuf". Turns out I was
chasing my tail. Last week I had changed HAMMER's VOP_WRITE so that,
when called from the pageout daemon, it would not block if there were
too many dirty buffers in the buffer cache. The idea was that not blocking
would prevent HAMMER from deadlocking the pageout daemon. The
result was that the pageout daemon happily queued out so many dirty
pages that the buffer cache filled up with dirty buffers and
deadlocked against other processes trying to read data from disk
instead. HAMMER needs to be able to issue I/O reads in order to
reserve the space needed for the writes, so, boom, it deadlocked.
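The failure mode above can be reproduced with a toy model. This is only a sketch under loud assumptions: `ToyBufferCache`, `pageout()` and the `dirty_limit` throttle are hypothetical names invented for illustration, not DragonFly's buffer cache or HAMMER code, and "reserve space" is reduced to "read one metadata buffer". With no limit the cache fills with dirty buffers, a flush cannot get a buffer to read into, and the model wedges. Blocking the writer below the cache size leaves room for those reads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyBufferCache:
    """Hypothetical fixed-size buffer cache; each buffer is free, clean or dirty."""
    capacity: int
    clean: int = 0
    dirty: int = 0

    def _grab(self) -> bool:
        """Get a buffer for new data: use a free slot, else recycle a clean one."""
        if self.clean + self.dirty < self.capacity:
            return True
        if self.clean > 0:
            self.clean -= 1  # recycle a clean buffer
            return True
        return False  # only dirty buffers left: caller would sleep in "newbuf"

    def write_page(self) -> bool:
        """The pageout path queues one dirty page into the cache."""
        if not self._grab():
            return False
        self.dirty += 1
        return True

    def flush_one(self) -> bool:
        """Flush one dirty buffer. The filesystem must first READ metadata to
        reserve space (one more buffer); only then can the write go out."""
        if self.dirty == 0:
            return True
        if not self._grab():
            return False  # no buffer to read into -> no reservation -> no flush
        self.clean += 1  # the metadata buffer that was read in
        self.dirty -= 1
        self.clean += 1  # the flushed buffer is now clean
        return True


def pageout(cache: ToyBufferCache, pages: int, dirty_limit: Optional[int]) -> bool:
    """Push `pages` dirty pages; return False if the model deadlocks.

    With dirty_limit set, the writer blocks (modeled as flushing synchronously)
    once that many buffers are dirty. With None it never blocks on the dirty count.
    """
    if dirty_limit is not None and dirty_limit < 1:
        raise ValueError("dirty_limit must be at least 1")
    for _ in range(pages):
        if dirty_limit is not None:
            while cache.dirty >= dirty_limit:
                if not cache.flush_one():
                    return False
        if not cache.write_page():
            # Cache is full; a flush is needed, and a flush needs a read.
            if not cache.flush_one() or not cache.write_page():
                return False
    return True


print(pageout(ToyBufferCache(capacity=8), pages=20, dirty_limit=None))  # False
print(pageout(ToyBufferCache(capacity=8), pages=20, dirty_limit=6))     # True
```

The throttle here is just the simplest way to show why letting the writer block helps; in this model any `dirty_limit` at or above `capacity` deadlocks just like the unthrottled case.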
So now I've fixed that, but it means I have to deal with potential
vnode deadlocks. If I focus on keeping getblk()/bread() from getting
stuck in "newbuf", that should break the chain reaction: VOP_READ will
then not get stuck. But there still may be cases where a kmalloc()
gets stuck holding a vnode lock which then prevents the pageout
daemon from being able to page-out pages from that vnode.
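One way to reason about this kind of chain is as a wait-for graph, where an edge A -> B means "A cannot make progress until B does", and a deadlock is a cycle. The sketch below is a generic depth-first cycle check; the node names are an informal, hypothetical model of the kmalloc()/vnode-lock/pageout scenario described above, not real DragonFly kernel structures.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph as a list of nodes, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {node: WHITE for node in waits_for}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        stack.append(node)
        for nxt in waits_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # nxt is on the current path: path from nxt back to it is a cycle.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(waits_for):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


waits_for = {
    "pageout daemon": ["vnode lock"],        # wants to page out this vnode's pages
    "vnode lock": ["process in kmalloc"],    # held by a process that is...
    "process in kmalloc": ["free memory"],   # ...sleeping in kmalloc()
    "free memory": ["pageout daemon"],       # memory is only freed by paging out
}
print(find_cycle(waits_for))
# ['pageout daemon', 'vnode lock', 'process in kmalloc', 'free memory', 'pageout daemon']
```

Breaking any single edge, for example never sleeping in kmalloc() while holding the vnode lock, removes the cycle, and `find_cycle` then returns `None`.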
It's a big merry-go-round involving careful attention to what
locks are needed for what operation. I feel like I've been working
on this problem for 10+ years now :-(.