DragonFly kernel List (threaded) for 2004-08
Re: VFS ROADMAP (and vfs01.patch stage 1 available for testing)
-On [20040817 07:32], Matthew Dillon (dillon@xxxxxxxxxxxxxxxxxxxx) wrote:
> This is a question with a very, very complex answer. If I were to
> try to simplify it and put it in layman's terms, we will almost
> certainly have to use the resource accessibility model. So what
> happens would depend on what was running on the node that went down
> and whether the resources are recoverable or not. Processes beholden
> to the dead node or needing critical resources (like memory) on the dead
> node would probably be killed by default.
Most high-availability clusters have proprietary memory interconnects for
this purpose. Wasn't there some development work on sharing memory state over
(dedicated) Ethernet links?
Otherwise you might need to work with process checkpointing and migrate
those states to centralised controller boxes every once in a while so as not
to lose work. But then you need to make sure you can safely migrate processes
to other boxes anyway.
One big hurdle is: overcoming different hardware configurations.
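
A rough sketch of the checkpointing idea, purely for illustration. Nothing here is DragonFly code: `CheckpointStore` stands in for the centralised controller box, the "process" is just a counter, and all names are made up. It also assumes the state is trivially serialisable, which sidesteps exactly the hardware-difference hurdle mentioned above.

```python
import pickle


class CheckpointStore:
    """Hypothetical stand-in for a centralised controller box."""

    def __init__(self):
        self._snapshots = {}

    def save(self, pid, state):
        # Serialise so the snapshot is independent of the live object.
        self._snapshots[pid] = pickle.dumps(state)

    def restore(self, pid):
        # Return the most recent snapshot, or None if never checkpointed.
        blob = self._snapshots.get(pid)
        return None if blob is None else pickle.loads(blob)


def run_with_checkpoints(store, pid, state, steps, interval):
    """Advance a toy counter 'process', checkpointing every `interval` steps."""
    for _ in range(steps):
        state["counter"] += 1
        if state["counter"] % interval == 0:
            store.save(pid, state)
    return state


store = CheckpointStore()
live = run_with_checkpoints(store, pid=42, state={"counter": 0},
                            steps=7, interval=3)
# Pretend the node dies here; another node resumes from the last snapshot.
recovered = store.restore(42)
```

The work done since the last snapshot is what gets lost when the node goes down, so the checkpoint interval is a direct trade-off between overhead and lost work.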
> Recoverable resources, such as a block device representing a physical
> disk, could result in dependent processes blocking until the resource
> becomes available again. If the block device is part of a RAID then
> theoretically dependent processes would still be able to run as long as
> the RAID as a whole remains intact.
I think with hardware RAID we should not need to worry about that. And
software RAID is also a separate subsystem.
The blocking issue reminds me of NFS shares just going away and leaving a box
wedged until they come back online. :S
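
To make the kill/block/run distinction from the quoted text concrete, here is a minimal sketch of how such a per-process decision might look. This is only one reading of the description, not the actual design; the `available`/`recoverable` flags, the `Action` names and `raid_intact` are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    RUN = "run"
    BLOCK = "block"
    KILL = "kill"


def raid_intact(members_alive, members_total, tolerated_failures):
    """A RAID set stays usable while failures do not exceed its tolerance."""
    return members_total - members_alive <= tolerated_failures


def decide(resources):
    """Pick an action for a process from the resources it depends on.

    `resources` is a list of dicts with boolean keys 'available' and
    'recoverable' (hypothetical representation).
    """
    action = Action.RUN
    for res in resources:
        if res["available"]:
            continue
        if not res["recoverable"]:
            # E.g. memory that lived on the dead node: nothing to wait for.
            return Action.KILL
        # E.g. a disk that may come back: wait for it.
        action = Action.BLOCK
    return action


# A two-way mirror that lost one member is still available as a whole.
mirror = {"available": raid_intact(1, 2, tolerated_failures=1),
          "recoverable": True}
lost_disk = {"available": False, "recoverable": True}
lost_memory = {"available": False, "recoverable": False}
```

With these inputs, a process depending only on `mirror` keeps running, one depending on `lost_disk` blocks, and one depending on `lost_memory` is killed.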
Most clusters I've seen also typically work with quorum disks.
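
For readers unfamiliar with quorum disks, a minimal sketch of the usual majority-vote rule follows. The vote counts and the function name are assumptions for illustration; real cluster products differ in the details.

```python
def partition_has_quorum(node_votes, alive_nodes, disk_votes, owns_disk):
    """Return True if this partition holds a strict majority of all votes.

    node_votes:  dict mapping node name -> number of votes
    alive_nodes: names of the nodes reachable in this partition
    disk_votes:  votes carried by the quorum disk
    owns_disk:   whether this partition holds the quorum disk
    """
    total = sum(node_votes.values()) + disk_votes
    present = sum(node_votes[name] for name in alive_nodes)
    if owns_disk:
        present += disk_votes
    # Strict majority, so two partitions can never both win.
    return 2 * present > total


# Two nodes with one vote each, plus a one-vote quorum disk.
votes = {"a": 1, "b": 1}
side_a = partition_has_quorum(votes, {"a"}, disk_votes=1, owns_disk=True)
side_b = partition_has_quorum(votes, {"b"}, disk_votes=1, owns_disk=False)
```

In a two-node cluster the disk acts as the tie-breaker: after a split, only the side that holds it keeps running.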
Jeroen Ruigrok van der Werven <asmodai(at)wxs.nl> / asmodai / kita no mono
Free Tibet! http://www.savetibet.org/ | http://www.tibet.nu/
http://www.tendra.org/ | http://www.in-nomine.org/
You yourself, as much as anybody in the entire universe, deserve your love