DragonFly BSD
DragonFly kernel List (threaded) for 2010-06
[Date Prev][Date Next]  [Thread Prev][Thread Next]  [Date Index][Thread Index]

Re: HEADS UP - major structure size changes in HEAD

From: Aggelos Economopoulos <aoiko@xxxxxxxxxxxxxx>
Date: Thu, 10 Jun 2010 02:19:03 +0300

On 09/06/2010 08:57 μμ, Matthew Dillon wrote:
:I fully agree that tokens are a more convenient API than, say, lockmgr
:locks, but AFAIK nobody has investigated their behavior under heavy
:contention (I suspect you have Matt, so please share any information you
:might have collected).

     Tokens and the big-giant-lock behave the same way when under contention.
     Both basically call lwkt_switch().  The LWKT scheduler then essentially
     spins looking for schedulable threads until it finds one for which it
     can restore the state of both the MP lock and the token(s).

:Right now, the lwkt_switch() routine is responsible for getting all
:tokens that a thread needs to run. If it can't get them (because some
:token is owned by a thread running on a different cpu), it moves on to
:the next runnable thread.
:However, lwkt_getalltokens() tries to get the tokens in the same order
:that they were acquired by the thread, which means that if you lose a
:race to another thread that needs your Nth token, you just wasted time
:on taking (N - 1) tokens, possibly obstructing other threads that could
:use those tokens to move forward. I admit that in practice most threads
:will take the same tokens in the same order, so I don't expect that to
:be a big issue. Some concrete data would be nice though :)

[description of issues with taking the tokens in reverse order]
     If you think about it the meet-in-the-middle collision can create a
     horrible cascading effect on systems with a lot of cpus, where everyone
     contends until all cpus in question wind up in the scheduler.

Indeed, which is why I specifically wrote that, since I expect most paths that contend for a set of tokens will try to take them in the same order, trying to take the tokens in forward order is the sane approach. There will only be an issue if that assumption is false, i.e. contending code paths do not try to take a set of tokens in the same order (then you may often have the meet-in-the-middle collision). I hope it will be mostly true in practice, but if it's not we'll definitely notice :)

:Another thing that makes me sceptical about token scalability is that,
:AFAICT, the scheduler needs to try to acquire them *every time*. So if
:you have M threads on a cpu all waiting for, say, the VM token, then
:while it is held on another cpu you'll try (and fail) to get that token
:M times every time you lwkt_switch(). This is time you could have spent
:running a task that potentially *could* get some work done (nevermind
:the cmp. That is in contrast to sleep locks where you only become
:runnable when a lock you wanted is released.
:I'm sure you've thought about all this; I suppose you're seeing tokens
:as a good trade off between simplicity and scalability?

     If many kernel threads are contending on the same token the requesting
     cpu has nothing to do until the holding cpu releases the token,
     no matter how many threads are stuck on the requesting cpu.  When the
     token is released by the holding cpu then the requesting cpu will
     instantly be able to schedule whatever thread (of the many contending
     threads) it is currently polling.

Err, I'm not seeing this. AFAICT if there are M threads that need token X to run, the lwkt_switch() will have to loop through all M threads. So the cpu has some work to do.

As a more realistic example, if another CPU owns the VM token and you have M + 1 runnable threads, M of which need the VM token and 1 which doesn't, on average (assumming random order on the run queue) you'll try to get the VM token M/2 times before finding the one thread that doesn't need it. The issue is that you don't get a notification when that token is released (and as you said IPIs have big latencies, so that's not really doable). And of course if M threads are waiting for the token, you really do want to try hard to get it; if you just kept track of busy tokens as you went through the runnable threads and avoided trying to acquire them a second time in the for_each_runnable_thread loop you'd incur much higher latencies. You could perhaps get ultra-clever in which token you try to take next (though I can't think of an obviously reasonable approach ATM), but this means more effort before each context switch.

     The spin (spinlocks) or the hybrid spin-in-scheduler (MP lock, tokens)
     approach gives us a solution which has lower overheads for locks which
     are held for very short periods of time.

The subsystem tokens will be coarse-grained at first so I'm not sure why e.g. the VM tokens can't be held for a considerable (if bounded) amount of time. This is not really a problem; our SMP scalability will improve considerably. My concern is the next step and whether lwkt_switch() can (or will be able to) efficiently juggle a large number of tokens. I'm not saying it *will* be a problem. It's just that, to me at least, it is not entirely obvious how the current algorithm would perform in that scenario.

I'm all for using tokens to lock down subsystems however.


     The lockmgr/mutex design gives us a solution which has lower overheads
     for locks which are held for longer periods of time and/or must be held
     across blocking conditions.

[Date Prev][Date Next]  [Thread Prev][Thread Next]  [Date Index][Thread Index]