DragonFly bugs List (threaded) for 2009-10
Re: many processes stuck in "hmrrcm", system unusable
:Shouldn't we rather try to fix the issue, i.e. make hammer work just a
:little bit performant and capable of concurrent use? I think now that
:the code is stable we should start investigating performance (latency)
:issues and address them.
I think the main culprit here is the background flusher. With UFS,
a modifying operation can block the process context responsible for
it. With HAMMER, *ALL* modifying operations are asynchronous and do
not block the process context responsible for them. Thus when
resources reach their limit, ANY process trying to make a modification,
or even just to load a new inode (hmrrcm), winds up taking the hit
instead of the one process that ate up all the resources in the first
place.
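To make the failure mode above concrete, here is a minimal single-threaded C model. It is not HAMMER code. It assumes one filesystem-wide pool of queued modifications with a fixed cap, and the names fs_model, fs_modify(), fs_flush(), DIRTY_LIMIT and NPROC are all hypothetical. Because callers only queue work and never wait on their own I/O, whichever process happens to arrive once the pool is full is the one that blocks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a single, filesystem-wide pool of pending
 * (not yet flushed) modifications. Frontend operations only queue
 * work; the background flusher drains it. */
#define DIRTY_LIMIT 4   /* hypothetical cap on queued modifications */
#define NPROC       8   /* hypothetical number of tracked process ids */

struct fs_model {
    int dirty;              /* modifications queued for the flusher */
    int throttled[NPROC];   /* how often each process id was blocked */
};

/* Queue one modification on behalf of process `pid`. The caller does
 * not wait for its own change to reach the media; it only waits when
 * the shared pool is already full, no matter who filled it.
 * Returns true if the change was queued, false if `pid` would have
 * blocked waiting for the flusher. */
static bool fs_modify(struct fs_model *fs, int pid)
{
    assert(pid >= 0 && pid < NPROC);
    if (fs->dirty >= DIRTY_LIMIT) {
        fs->throttled[pid]++;
        return false;
    }
    fs->dirty++;
    return true;
}

/* The background flusher retires up to `n` queued modifications. */
static void fs_flush(struct fs_model *fs, int n)
{
    fs->dirty = (fs->dirty > n) ? fs->dirty - n : 0;
}
```

With a cap of 4, a process that queues four modifications never waits, and a different process issuing a single modification right afterwards is the one fs_modify() reports as blocked until fs_flush() frees space.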
These limits are hit quickly when rm -rf'ing or untarring tens of
thousands of files, but are typically not hit otherwise.
In both cases the disk winds up being banged up, but with UFS it is
easier to prevent the resource starvation issue from bleeding over
into other processes. HAMMER can't really distinguish between
modifying operations belonging to a heavy-handed process versus
modifying operations incidental to processes which otherwise have
a light touch.
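For contrast, the following C sketch shows what distinguishing the two kinds of processes would involve: each queued modification is charged to the process that made it, and a process is throttled once it exceeds its own share. This is a hypothetical accounting scheme for illustration only, not how UFS or HAMMER is implemented; fs_acct, acct_modify(), acct_retire(), PROC_SHARE, DIRTY_LIMIT and NPROC are invented names.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC       8   /* hypothetical number of tracked processes */
#define DIRTY_LIMIT 4   /* hypothetical global cap on queued changes */
#define PROC_SHARE  3   /* hypothetical per-process share of the cap */

struct fs_acct {
    int dirty;             /* total queued modifications */
    int charged[NPROC];    /* queued modifications charged per process */
};

/* Charge each queued modification to the process that made it, and
 * refuse a process once it reaches its own share, so it tends to be
 * stopped before the shared pool runs dry. Returns false if `pid`
 * would block. */
static bool acct_modify(struct fs_acct *fs, int pid)
{
    assert(pid >= 0 && pid < NPROC);
    if (fs->charged[pid] >= PROC_SHARE || fs->dirty >= DIRTY_LIMIT)
        return false;
    fs->charged[pid]++;
    fs->dirty++;
    return true;
}

/* When the flusher retires a modification, credit its owner back. */
static void acct_retire(struct fs_acct *fs, int pid)
{
    assert(pid >= 0 && pid < NPROC);
    if (fs->charged[pid] > 0) {
        fs->charged[pid]--;
        fs->dirty--;
    }
}
```

Under these assumptions a process that has used its share of 3 is refused, while another process with nothing queued can still proceed. The catch in HAMMER's case is knowing whom to charge when the work is actually done later by the flusher.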
I do believe it is possible to solve the problem, but it isn't a
quick fix. Essentially we have to move meta-data modification
out of the backend flusher and into the frontend. This will shift
the CPU and buffer cache burden back to the processes responsible.
But it isn't easy to do this because those meta-data buffers cannot
be flushed to the media without first synchronizing the UNDO space.
Synchronizing the UNDO space and still maintaining a pipeline requires
double-buffering dirty meta-data buffers (because new changes to
meta-data that was dirtied by a previous operation, and is now
undergoing a flush, cannot be made in place).
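The double-buffering requirement can be sketched as follows. This is a minimal C model under stated assumptions, not HAMMER's buffer code: each hypothetical meta_buf keeps a live copy that the frontend modifies and a frozen copy that is being written, and a flush may only start once the UNDO space has been synchronized up to the sequence number recorded for the pending changes. All names and the 16-byte buffer size are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define META_SIZE 16   /* hypothetical meta-data buffer size */

struct meta_buf {
    char data[META_SIZE];     /* live copy the frontend modifies */
    char flushing[META_SIZE]; /* frozen image being written to media */
    bool in_flush;            /* true while `flushing` is in flight */
    bool dirty;               /* `data` has changes not yet flushed */
    unsigned undo_seq;        /* UNDO sequence covering `data`'s changes */
};

/* Modify the buffer. If a flush is in progress the frozen image is left
 * untouched, so the change lands only in `data` (the second buffer). */
static void meta_modify(struct meta_buf *b, size_t off, char c,
                        unsigned undo_seq)
{
    assert(off < META_SIZE);
    b->data[off] = c;
    b->dirty = true;
    b->undo_seq = undo_seq;
}

/* Freeze the live contents for writing. Refuse if a write is already in
 * flight, if there is nothing dirty, or if the UNDO records covering
 * these changes (up to b->undo_seq) are not yet on the media. */
static bool meta_start_flush(struct meta_buf *b, unsigned undo_synced)
{
    if (b->in_flush || !b->dirty || undo_synced < b->undo_seq)
        return false;
    memcpy(b->flushing, b->data, META_SIZE);
    b->in_flush = true;
    b->dirty = false;
    return true;
}

/* Called when the media write of the frozen image completes. */
static void meta_end_flush(struct meta_buf *b)
{
    b->in_flush = false;
}
```

Copying the whole buffer when a flush starts is the simplest way to show the idea; an implementation could instead copy only when a change arrives mid-flush. Either way the extra copies need memory of their own, which is the buffer-resource management problem the post goes on to mention.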
I would have to abandon using the buffer cache entirely for meta-data
buffers and go with a roll-my-own scheme. That might make porters
happier but it won't make me happier as it opens a whole new can of
worms on how to manage the buffer resources.
I would much rather work on the clustering, but if people are going
to constantly complain about HAMMER's performance, I guess I will have
to take 2-3 months and deal with this issue first.