DragonFly kernel List (threaded) for 2006-09
Re: Cache coherency, clustering, and Kernel virtualization
Please excuse my newbness, but how does this differ from UML (User-Mode Linux)?
On Sat, Sep 02, 2006 at 11:49:36AM -0700, Matthew Dillon wrote:
> As people may have noticed, I managed to get the first cut of the
> cache coherency subsystem in place. Unfortunately, a great deal more
> work is needed to get it working fully. I need to flesh out syslink and
> work on the cross-machine cache coherency algorithms themselves. This
> work is going to be very heavily integrated into the kernel, and it is
> very complex... so much so that debugging it in an actual kernel (even
> via VMware) would not be all that much fun. For that matter, it gets
> even worse once I get to the point where I need to test communication
> between living systems.
> So for the last week or two I've been considering my options, and I
> have finally come up with a plan that will not only make development
> a whole lot easier, but also give us a nice feather in our cap for our
> December release!
> Consider what we want to accomplish. We want to be able to cut up
> system resources and link them into 'clusters', with the whole mess
> tied together on the internet. Originally I envisioned cutting up
> memory, disk, and cpu resources and connecting them to a cluster
> individually, but now I believe what we need to do is connect an
> entire kernel to the cluster and basically operate as a single system
> image.
> Now consider the problem of tying an entire kernel into an internet-based
> cluster. Does that sound like something that would be 'safe' to
> integrate into your real kernel? NO WAY! It is virtually impossible
> to 'secure' a kernel which is operating as a single system image in
> a cluster of machines connected together via the internet.
> So what do we do? Well, I finally figured it out. It may seem obvious
> but there were some severe problems I had to work out before I could be
> sure that it would work. The coding for it isn't even all that
> difficult.
> What we do is we make it so a DragonFly kernel can be compiled and run
> as a userland application running under the real DragonFly kernel. As
> a userland application the virtual kernel can be completely firewalled
> off from the rest of the system. The virtual kernel can then be
> associated with the 'cluster', and managing and controlling memory, cpu,
> and disk resources is a whole lot easier when you have an entire kernel
> as your funnel into the real system's resources. If you want to tie
> into multiple clusters you just create multiple virtual kernels! More
> to the point, the technology could be used to partition off major
> services and EVEN USER LOGINS(!) on a large machine.
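To make the "funnel" idea concrete, here is a minimal Python sketch, assuming a toy model in which memory is counted in pages. `HostKernel`, `VirtualKernel`, `alloc()`, and `release()` are hypothetical names invented for illustration, not DragonFly interfaces. The point it shows is that every request from a cluster or a group of user logins passes through one virtual kernel, so a single per-vkernel cap bounds all of them.

```python
class HostKernel:
    """Hypothetical model of the real kernel's pool of physical pages."""

    def __init__(self, total_pages):
        self.free_pages = total_pages


class VirtualKernel:
    """Hypothetical virtual kernel: all of its guests' memory requests
    funnel through this one object, so one cap covers all of them."""

    def __init__(self, host, name, page_limit):
        self.host = host
        self.name = name
        self.page_limit = page_limit
        self.used_pages = 0

    def alloc(self, npages):
        # Enforce this vkernel's own cap before touching the host pool.
        if self.used_pages + npages > self.page_limit:
            return False
        # Then make sure the real machine actually has the pages.
        if npages > self.host.free_pages:
            return False
        self.host.free_pages -= npages
        self.used_pages += npages
        return True

    def release(self, npages):
        # Never return more pages than this vkernel holds.
        npages = min(npages, self.used_pages)
        self.used_pages -= npages
        self.host.free_pages += npages


host = HostKernel(total_pages=1000)
cluster_a = VirtualKernel(host, "cluster-a", page_limit=300)
logins = VirtualKernel(host, "user-logins", page_limit=200)

print(cluster_a.alloc(250))  # True
print(cluster_a.alloc(100))  # False: would exceed cluster-a's 300-page cap
print(logins.alloc(200))     # True
print(host.free_pages)       # 550
```

Tying into a second cluster in this model is just another `VirtualKernel` instance with its own cap; nothing in one vkernel can allocate past its limit regardless of what its guests do.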
> Sounds kinda like what IBM did with Linux on its mainframes, eh? But I
> am going to do it with DragonFly and my expectation is that performance
> within a virtual kernel will be within 20% of the performance of a
> real kernel.
> In order to be able to have a virtual kernel running as a userland
> application the virtual kernel must be able to manipulate other VM
> spaces. Manipulating other VM spaces means I have to develop new system
> calls to control VM spaces. These VM spaces will represent the user
> processes running under the virtual kernel. This is where I have been
> stuck for the last week, trying to figure out how to be able to map
> memory between the virtual kernel and user processes running under
> the virtual kernel without blowing away the REAL kernel's memory with
> millions of VM map and VM object structures.
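A rough sketch of what such VM-space control calls might look like, assuming each process under the virtual kernel is named by an opaque id that the virtual kernel picks. `vmspace_create`, `vmspace_destroy`, `vmspace_map`, and `translate` are hypothetical names for illustration and are not meant to reproduce any real DragonFly system call; the model ignores protection bits and assumes a 4096-byte page.

```python
PAGE_SIZE = 4096


class RealKernel:
    """Hypothetical real kernel offering VM-space control calls to a
    userland virtual kernel. Each VM space stands for one process
    running under the virtual kernel."""

    def __init__(self):
        # Opaque id chosen by the vkernel -> {virtual page number: host frame}
        self.spaces = {}

    def vmspace_create(self, space_id):
        if space_id in self.spaces:
            raise ValueError(f"VM space {space_id!r} already exists")
        self.spaces[space_id] = {}

    def vmspace_destroy(self, space_id):
        # Dropping the space discards every mapping it held.
        del self.spaces[space_id]

    def vmspace_map(self, space_id, vaddr, frame):
        # Install one page mapping on behalf of the vkernel.
        if vaddr % PAGE_SIZE:
            raise ValueError("vaddr must be page aligned")
        self.spaces[space_id][vaddr // PAGE_SIZE] = frame

    def translate(self, space_id, vaddr):
        # What the MMU would do: page number -> frame, keep the offset.
        frame = self.spaces[space_id].get(vaddr // PAGE_SIZE)
        if frame is None:
            return None  # unmapped: this access would page fault
        return frame * PAGE_SIZE + vaddr % PAGE_SIZE


rk = RealKernel()
rk.vmspace_create("vk0/pid12")
rk.vmspace_map("vk0/pid12", 0x1000, frame=7)
print(rk.translate("vk0/pid12", 0x1234))  # 29236 (7 * 4096 + 0x234)
print(rk.translate("vk0/pid12", 0x5000))  # None: unmapped, would fault
rk.vmspace_destroy("vk0/pid12")
```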
> I finally figured it out, and the answer is so simple that I am surprised
> it took a week for me to figure it out. The answer is: You simply do
> not attempt to represent the memory maps in the VM spaces being
> controlled by the virtual kernel with real-kernel objects. Instead,
> you map the memory into those VM spaces directly via the PMAP subsystem.
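The following Python sketch illustrates the idea under simplifying assumptions: the virtual kernel keeps the authoritative list of regions for a guest process in its own memory, and the real kernel holds nothing but page-table entries for pages that have actually been touched. `RealPmap`, `VkernelSpace`, `touch()`, and `GuestFault` are hypothetical names; a real PMAP would also track protection and referenced/modified bits.

```python
PAGE_SIZE = 4096


class GuestFault(Exception):
    """Raised when a guest touches an address outside every region."""


class RealPmap:
    """Hypothetical real-kernel page table for one guest VM space. It holds
    plain vpage -> frame entries; there are no VM map or VM object records."""

    def __init__(self):
        self.entries = {}

    def enter(self, vpage, frame):
        self.entries[vpage] = frame


class VkernelSpace:
    """Hypothetical vkernel-side description of one guest process. The
    region list lives in the virtual kernel's own userland memory."""

    def __init__(self, pmap):
        self.pmap = pmap
        self.regions = []  # (first_vpage, npages, first_frame)

    def add_region(self, first_vpage, npages, first_frame):
        # Only the vkernel records this; the real kernel learns nothing yet.
        self.regions.append((first_vpage, npages, first_frame))

    def touch(self, vaddr):
        # Resolve the page from the vkernel's own map, then push the result
        # straight into the real pmap.
        vpage = vaddr // PAGE_SIZE
        for first, npages, frame0 in self.regions:
            if first <= vpage < first + npages:
                frame = frame0 + (vpage - first)
                self.pmap.enter(vpage, frame)
                return frame
        raise GuestFault(f"no region covers {vaddr:#x}")


pmap = RealPmap()
proc = VkernelSpace(pmap)
proc.add_region(first_vpage=16, npages=1000, first_frame=500)
for vpage in (16, 17, 400):
    proc.touch(vpage * PAGE_SIZE)
print(len(pmap.entries))  # 3: only touched pages reach the real kernel
print(pmap.entries[400])  # 884
```

A 1000-page region costs the real kernel nothing until it is used, and even then only one plain table entry per touched page.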
> As some of you may know, the PMAP in a BSD kernel (unlike a Linux kernel)
> is ephemeral... the real kernel can remove mappings at any time and
> simply take a page fault to fill them in again. This means that the
> real kernel can theoretically support hundreds of thousands of PMAPs
> and thus allow us to run pretty much as many virtual kernels and
> as many virtual processes under those virtual kernels as we wish without
> blowing up our real kernel.
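Because nothing authoritative lives in the PMAP, the real kernel can treat it as a bounded cache. The sketch below assumes an LRU eviction policy purely for illustration (the message does not say how the real kernel chooses what to drop); `EphemeralPmap` and its `resolve` callback are hypothetical. Dropped entries are rebuilt on the next fault, so translations stay correct and the only cost is extra faults.

```python
from collections import OrderedDict


class EphemeralPmap:
    """Hypothetical bounded pmap: the real kernel may throw entries away
    whenever it likes, because the authoritative mapping lives elsewhere
    and a page fault can always rebuild a dropped entry."""

    def __init__(self, capacity, resolve):
        self.capacity = capacity
        self.resolve = resolve        # fault handler: vpage -> frame
        self.entries = OrderedDict()  # vpage -> frame, in LRU order
        self.faults = 0

    def lookup(self, vpage):
        if vpage in self.entries:
            self.entries.move_to_end(vpage)  # mark as recently used
            return self.entries[vpage]
        # Missing entry: take a "page fault" and refill it.
        self.faults += 1
        frame = self.resolve(vpage)
        self.entries[vpage] = frame
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop least recently used
        return frame

    def flush(self):
        # Discarding everything is always safe; it only costs faults later.
        self.entries.clear()


authoritative = {vpage: 1000 + vpage for vpage in range(10)}
pmap = EphemeralPmap(capacity=2, resolve=authoritative.__getitem__)

for vpage in (1, 2, 1, 3, 2):
    assert pmap.lookup(vpage) == authoritative[vpage]
print(pmap.faults)  # 4
```

The memory used by the pmap stays bounded by `capacity` no matter how many guest pages exist, which is the property that lets the real kernel host very many of them.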
> The cost of this method is that when a virtual process running under a
> virtual kernel takes a page fault, it must chain through the virtual
> kernel and cannot short-cut directly to the real kernel to handle the
> page fault. I do not think this is a big deal considering the number
> of page table optimizations we already have.
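The extra cost can be seen by tracing the two fault paths side by side. This is a simplified, hypothetical model: the step names are invented, and a real implementation may merge or batch some of these transitions. It counts how many times control crosses a protection boundary.

```python
def fault_direct():
    """Ordinary process: the real kernel resolves the fault itself."""
    return [
        "guest -> real kernel (trap)",
        "real kernel resolves fault and updates pmap",
        "real kernel -> guest (resume)",
    ]


def fault_via_vkernel():
    """Process under a virtual kernel: only the vkernel knows the guest's
    memory map, so the real kernel must hand the fault up to it."""
    return [
        "guest -> real kernel (trap)",
        "real kernel -> vkernel (deliver fault)",
        "vkernel resolves fault in its own map",
        "vkernel -> real kernel (map-page syscall)",
        "real kernel updates guest pmap",
        "real kernel -> vkernel (syscall returns)",
        "vkernel -> real kernel (resume-guest syscall)",
        "real kernel -> guest (resume)",
    ]


def crossings(trace):
    # Every step containing "->" moves control across a protection boundary.
    return sum("->" in step for step in trace)


print(crossings(fault_direct()))       # 2
print(crossings(fault_via_vkernel()))  # 6
```

In this model a guest fault costs three times as many boundary crossings as an ordinary one, but that only matters when faults are frequent; once an entry is in the pmap, later accesses to that page do not involve the virtual kernel at all.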
> This is going to be my goal for our December release... to have userland
> kernels fully operational. Development of the syslink and cache coherency
> technology will go a lot faster once we have virtual kernels.