DragonFly kernel List (threaded) for 2004-02
Re: lkwt in DragonFly
:> RCU = Remote Copy Update, right? Just asking, so I can find the right
:> paper to read more about it.
:Read-Copy-Update. It's a read-mostly algorithm that's used to cluster data
:structures writes/changes, so that it minimizes bus traffic across processors.
:What Matt is talk about is merging a kind of quiescience operation into dfBSD
:tokens if I understand correctly.
Right. My new idea for the token API gives us a scheduler-level
serialization abstraction that works across the whole system. It
would kinda be like having multiple 'MP' locks in that it would have
the same sort of side effects as an MP lock.
The traditional serialization abstraction is the concept of a 'mutex',
which is what FreeBSD-5 uses. Linux tries to avoid mutex overheads
by using RCU (anyone can read data without having to worry about it
being changed out from under, but modifications must be synchronized
across all cpus at a scheduler boundary). Right now we try to avoid
mutex-like overheads through our per-cpu model (e.g. LWKT scheduler
and SLAB allocator) and through our per-cpu tokens.
The problem with our current token abstraction is very similar to the
problem FreeBSD-5 has with its mutex abstraction. In our case, a
token can be 'lost' when a thread blocks or gets interrupted outside
of a critical section and the thread must carefully code re-acquisition
of potentially lost tokens because of this. In FreeBSD-5's case a mutex
is not allowed to be held across a blocking operation at all and so
the use of mutexes must be carefully coded... adding significant complexity
to deeply nested APIs. Mutexes also have lock ordering issues (deadlock
My new token idea takes the original per-cpu idea and abstracts it
into a serialization model. We could, in fact, even support 'reader'
and 'writer' refs to tokens in order to allow multiple readers to
hold the same token at the same time. The key difference between
the new token idea and, say, a standard lock, is that the token is
only in effect while the thread that acquired it is running. When
the thread blocks or switches away the tokens that thread has acquired
can be borrowed by other threads (on the same or on other cpus) and
must be returned before the original thread can resume.
So, unlike mutexes or locks, a token only guarentees run-time
serialization. A token also has no deadlock issues and no lock
ordering issues at all. Like mutexes, code that uses tokens must
be aware of potential blocking conditions (because atomicy is lost
across a blocking operation), but unlike mutexes the code that
uses tokens does not have to release the tokens across the blocking
operation. I believe this to be a VERY IMPORTANT feature of the
new token design because it means we can (A) hold multiple tokens
across a blocking situation, (B) higher level procedures do not need
to pass token references to lower level procedures that 'might block',
they need only be aware that the lower level procedure might block,
and (C) the new token abstraction leaves open the possibility of
major implementation optimizations because it devolves nearly into
a NOP on UP systems. e.g. we can support per-cpu 'caching' of the
token after it has been released, allowing us to avoid locked bus
cycles as well as memory barriers in the cache case.
So how does this tie into the RCU concept? Simple, really... RCU
turns out to be a special case of the new token abstraction, that's
all. In the new abstraction if you had a token T and every thread
in the system obtained 'read access' on T, then any thread which
then obtains 'write access' on T would be serialized against all
the threads with read access at the scheduler level. If we wanted to
implement Linux's 'queued' RCU API then we would queue structural
changes directly to a thread that obtains 'write access' on T
and this translates directly to updating the data during
I actualy don't like Linux's queued RCU API, I am just saying that
it could be implemented natively on top of the new token abstraction.
We would use our tokens in a more active manner. e.g. instead of
holding a read token and queueing write udpates to another entity that
holds the write token we would obtain a write token and do the
update in real time, then release the write token. Nothing prevents
us from doing it both ways, though :-)
So what does this mean? This means that the new token abstraction
will not be a replacement for standard reader/writer (lockmgr) locks,
but it certainly would be an excellent replacement for mutexes
because it has far fewer side effects, requires less knowledge of
the lock state to be passed down the subroutine chain, and is far more