DragonFly kernel List (threaded) for 2003-07
Re: token passing & IPIs
: It's the lock-less SMP stuff that Linux is using. The idea
: originated from IBM's Paul McKenney; and Rusty Russel of Linux
: Camp actual made it work for Linux 2.4.x and later versions 
: It will be a good idea to do lock-less file system lookups,
: since it is a lookup, there is no write to it, thus we can avoid
: locks as much as we can. So can we for device state tables, and
: They have seen a significant scalability gain according to the
: results, and the basic theory behind how
: Read-Copy-Update/Lock-Less works .
:  - http://lse.sourceforge.net/locking/dcache/dcache.html
:  - http://lse.sourceforge.net/locking/rcupdate.html
: -- Hiten (hmp@xxxxxxxxxxx)
Ok, I now understand read-copy-update. It is a very interesting approach
and, indeed, it is very similar to MESI but taylored to a strict SMP
environment. I don't think I like the synchronization methodology of
waiting for all the cpus to go through a scheduling switch to synchronize
the update, but it is certainly an innovative solution that avoids having
to message the other cpus and it does allow a single 'shared' cache.
I think the MESI model is a better abstraction because it can be used
to manage a single shared cache as well as multiple independant (but
coherent) caches. The RCU method only works with a single shared cache
topology. But the problem remains: how to synchronize updates
and invalidations. In particular, how to update the globally shared
linkages between cached objects.
Hmm. I think it's possible to do without creating stall points. If
the global cache looks something like this:
then when you want to replace B with Bnew you would obtain a mutex and
then hang Bnew off a dedicated pointer in B, like this:
So far so good. Now in order to be able to change the A->B linkage
to A->Bnew, which would complete the update, you have to be sure
that all cpus see the B->new_ptr field as non-NULL. Once you know
they see B->new_ptr as non-NULL you can safely change A->B to A->Bnew.
Other cpus will either traverse A->B->Bnew, or they will see
A->Bnew, so it doesn't matter when they see the A->B to A->Bnew update.
Then once you know that all cpus see A->Bnew instead of A->B, and
that no other cpus are inside the old B any more, you can free the old B.
So in this scheme we have two 'synchronization' points we must resolve.
The first synchronization can be resolved with a combination of a
memory barrier and making B->new_ptr a volatile field. The second
synchronization point can be resolved by delaying freeing the old B,
which could work by counting thread switches until all the cpus have
switched at least once... but it wouldn't have to stall, it could be
I'm sure I can think of other ways of doing it that aren't as costly
as the RCU algorithm which fit into both a single shared global cache
MESI model and a multiple-independant-caches MESI model.