DragonFly kernel List (threaded) for 2004-02
Re: lkwt in DragonFly
On Fri, 06 Feb 2004 18:33:47 -0800, Matthew Dillon wrote:
> First, Bosko, thanks for asking your question! You really got me
> thinking about our token implementation. I've been whining about it
> being broken on our mailing lists for quite a while and as I was
> formulating a reply to your email I had an epiphany. So I hope you
> don't mind me having CC'd your email and my reply with the kernel@
> : Reading this:
> :Token Passing Primitives Not Mutexes :...
> : Do you intend to have a single pcpu token on a given cpu protecting
> : all data structures (or a class of data structures)? The tradeoff
> of : the lesser-granularity vs the admittedly much lower cost for a
> common : token acquisition may or may not be worth it in the end.
> (The other : part to the tradeoff is that you can never go with
> full-preemption). : I did understand this correctly, right? :
> :Bosko Milekic * bmilekic@xxxxxxxxxxxxxxxx * bmilekic@xxxxxxxxxxx
> Not entirely. First, please note that the current token
> implementation does not work very well... it's a mess to use because
> cpu #B can steal away cpu #A's token by sending cpu #A an IPI
> message requesting the token. This means cpu #A has to be in a
> critical section to 'protect' the token, and must re-acquire the
> token (in case it was lost) after it blocks.
> After thinking about this quite a bit I have come up with a solution
> that, in a moment of enlightment, would make our tokens operate like
> self-contained little RCU implementations (superior to Linux's
> implementation, in fact!).
RCU = Remote Copy Update, right? Just asking, so I can find the right
paper to read more about it.
> I intend to fix our tokens by not allowing an IPI to steal a token
> until the owning cpu has gone through a thread switch, or by
> explicit release. This means that, once fixed, our token
> implementation will allow us to abstract RCU-like operations, which
> I think is very important. It is not what I originally intended
> tokens to be but it is turning out to be what they should be.
> The nice thing about the revamp I intend to do on the token code is
> that the RCU abstraction will wind up being a lot better then
> Linux's RCU abstraction, because it will be compartmentalized. Only
> cpus competing for the same token are effected rather then all cpus
> in the system. We would not have the serialization problem that
> Linux has.
> Tokens will be used in a manner similar to how mutex are used in
> FreeBSD-5, except that in our case (with the revamped token code)
> DFly will be able to hold any number of tokens across a blocking
> situation without creating deadlocks (but also not guarenteeing
> atomicy across the blocking situation). I think this is exactly the
> sort of abstraction we need. Mutexes in FBSD-5 and tokens in DFly
> are primarily used as interlocks on structures, for which no
> blocking occurs, or in situations where the blocking condition is
> well manageed (e.g. lockmgr). I feel that my new token (RCU-like)
> implementation will wind up being far superior to FreeBSD-5's mutex
> implementation for 90% of these situations and it will not suffer
> from the deadlock potential that mutexes have because it will not
> guarentee atomicy across a blocking call.
> Also note that several major DragonFly subsystems (LWKT scheduler
> and SLAB allocator) are totally mutex & token free and things like
> the network stack will also be mostly mutex & token free because it
> will assign PCBs to particular threads (so operations on those PCBs
> are only done by a particular thread which in turn means no locking
> of the PCB is required at all).
> The performance trade off is basically the ability for the cpu
> owning a token to operate on the token without having to use
> bus-locked instructions, verses having to send an IPI message to the
> owning cpu when the requesting cpu does not own the token. This is
> not the major reason for using tokens, though. Still, judging by
> the benchmark comparisons we've done between 4.x and DFly, the
> overhead of a 'miss' appears to be very small. It turns out that
> most locks (like vnode locks) hit the same-cpu case a *LOT* more
> then they hit the foreign-cpu case so the overall benefit is at
> least equivalent to FreeBSD-5's mutex overhead. It certainly isn't
> worse. We do lose when it comes to IPC operations between two
> processes running on difference CPUs, but if this is the only thing
> that winds up being a problem it is a simple enough case to fix.
> However, the major reason for using tokens is that it creates an
> abstraction that allows us to execute many types of operations as if
> the system had only one cpu. Since a token is owned by a cpu and
> not a thread, if thread #1 obtains a token and blocks, thread #2
> running on the same cpu can obtain that same token optimally.
> Likewise, thread #3 running on a different cpu can 'steal' the token
> (but it only gets transfered on a scheduler boundary of the first
> cpu once I fix the current token code). We also have the advantage
> of being able to use this abstraction heavily without compromising
> UP performance, because on a UP box a blocking situations implies a
> scheduler switch and since tokens are per-cpu... well, the whole
> thing devolves into a NOP.
> I have some work to do to move my current token implementation into
> this new paradigm, but I do not expect it to be difficult. For
> example, if cpu #B steals a token from cpu #A then the threads that
> own the token on cpu #A can't be scheduled to run again until cpu #B
> is through with the token. This means actively accounting for and
> structurally linking the dependancy information.
> Whew! That may seem complex, and in a mutexed environment it would
> be very complex. But remember in DFly the whole basis of using this
> abstraction is to be able to operate almost as if one were on a
> single cpu even when one is on a 16-cpu system, so the actual
> structural accounting does not really have to worry about interrupts
> or stale data or other cpus. A critical section is sufficient to
> operate on token meta data on the owning cpu, and we can actually
> defer structural linkages within tokens we own until a thread switch
> occurs (which won't happen in most cases since like a mutex a token
> is obtained and released without blocking most cases).
> I think I rambled on there. Please feel free to ask additional
bsdforums.org 's moderator, mezz.