DragonFly kernel List (threaded) for 2003-07
Re: Call for Developers! Userland threading
Matthew Dillon wrote:
Ok, this is an official call for developers to begin working on userland
threading. I've come up with a timetable and infrastructure that
should be sufficient for those developers interested in the work to
actually begin working! I would like to hand out some commit bits for
those doing the work, and I would like to find someone to head up this
sub-project (i.e. not me, I will be focusing on the kernel-side support).
I was kinda thinking of Peter da Silva or Jeffrey Hsu to lead this effort,
but I don't know what kind of time commitment people have.
Here's the infrastructure idea I've come up with:
* We throw away libc_r. Er, that is, we keep it around but base all
new development on a copy of the original libc. We would call
it, oh, libcr (as a pun on libc_r). It wouldn't be an 'alias'
of libc like libc_r is, it would be an actual physical copy of
When all is said and done, several months from now, the new libcr
will *become* our libc, i.e. it will be responsible for both
non-threaded and threaded programs. Don't worry about non-threaded
overhead, it won't be that big a deal because LWKTs can be made
quite optimal in non-threaded environments.
* We (temporarily) throw away POSIX compatibility. I believe that
the userland threading implementation should be based around LWKTs
and LWKT messaging - i.e. a direct port of the LWKT modules now
in the kernel. The problem with trying to maintain the POSIX
infrastructure is that the signal handling will bog down the
development. I believe the signal handling can be dealt with with
supporting kernel infrastructure that does not yet exist. So for
now we throw away POSIX. Later on we will re-implement it.
I think you are making a mistake here..
Your conclusion that what is being done in FreeBSD 5 is not compatible with
what you are doing here is wrong. You yourself said that async syscalls with
upcalls would be easy to do.
"kse" (misnamed) threading uses two 'orthogonal' approaches.
1/ All syscalls can be async through callbacks. This allows even a single
cpu machine to run multiple threads. I believe this approach is still
optimal for allowing more than one thread to run on a single cpu.
2/ adding virtual cpus to the 'group' allows more than one execution unit to
be used by the Userland scheduler at the same time.
This is basically what you call "rfork" except that rather than
making sub-structures such as file descriptor tables be shared by multiple
'processes' as in the rfork method, there is a single structure (the proc
struct) that acts as a rendezvous point for all threads in the same process.
The 'rfork' method is in the end more complicated than to have a single
process structure. For example, you need to store the 'pid' somewhere,
but if you rfork multiple proc structs then you have to have a hierarchy set
up to find the correct pid to return. Then when you decide to start removing
process structures, you have to make sure that the hierarchy is correctl
Direct LWKT port but: maybe rename struct thread to struct
I will happily provide the userland assembly bits for I386 for
the initial entry and switching functions for LWKTs. They're
really easy to do, basically just pushal, stack switch, popal, ret.
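
The "save registers, switch stacks, restore registers, return" sequence
can be tried out portably before any assembly exists. The following is
only a sketch, assuming the POSIX ucontext calls (getcontext, makecontext,
swapcontext) as a stand-in for the hand-written I386 switch code; the names
thread_body, run_switch_demo and trace are made up for the illustration and
are not part of LWKT.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)
#define TRACE_MAX  4

static ucontext_t main_ctx, thread_ctx;
static char thread_stack[STACK_SIZE];	/* private stack for the thread */
static int trace[TRACE_MAX];		/* order in which steps ran */
static int trace_len;

static void
record(int step)
{
	assert(trace_len < TRACE_MAX);
	trace[trace_len++] = step;
}

/* Body of the (hypothetical) userland thread. */
static void
thread_body(void)
{
	record(1);
	/* Yield: save our registers and stack, resume the main context. */
	swapcontext(&thread_ctx, &main_ctx);
	record(3);
	/* Returning here resumes uc_link, i.e. main_ctx. */
}

/* Switch into the thread twice; returns the number of steps recorded. */
int
run_switch_demo(void)
{
	trace_len = 0;
	if (getcontext(&thread_ctx) != 0)
		return (-1);
	thread_ctx.uc_stack.ss_sp = thread_stack;
	thread_ctx.uc_stack.ss_size = sizeof(thread_stack);
	thread_ctx.uc_link = &main_ctx;
	makecontext(&thread_ctx, thread_body, 0);

	if (swapcontext(&main_ctx, &thread_ctx) != 0)	/* run until yield */
		return (-1);
	record(2);
	if (swapcontext(&main_ctx, &thread_ctx) != 0)	/* run to completion */
		return (-1);
	record(4);
	return (trace_len);
}
```

Calling run_switch_demo() interleaves the two contexts, so trace ends up
holding 1, 2, 3, 4. A real LWKT switch would replace swapcontext with the
few instructions described above and would not touch the signal mask.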
* We (temporarily) build the new system call emulation layer into
libcr with an eye towards eventually separating it out into its own
This new layer is very simple in concept. Basically you will begin
implementing system calls which convert to messages. For example,
in libcr read() would be:
read(int fd, void *buf, size_t nbytes)
{
    /*
     * Use the convenient mostly pre-built message stored in the
     * userthread structure
     */
    msg = &curthread->td_sysmsg;
    msg->fd = fd;
    msg->buf = buf;
    msg->nbytes = nbytes;
    if ((error = lwkt_domsg(&syscall_port, msg)) != 0) {
        curthread->td_errno = error;
        msg->result = -1;
    }
    return (msg->result);
}
This is a synchronous syscall. Async syscalls are much more interesting.
The actual int 0x80 would be done by syscall_port's beginmsg
function (it would point to a bit of assembly). And, yes, that
means you can theoretically shim the syscall port if you want
(mantra fodder: flexibility!), and it also means that errno
handling is done in userland (more mantra fodder: flexibility!).
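
To make the port and shim idea concrete, here is a small sketch that can
be compiled on its own. It is not the libcr code: struct sysmsg_read,
struct msg_port, domsg(), msg_read(), thread_errno and eof_shim_beginmsg()
are all hypothetical names, and the "kernel" side simply calls the existing
read(2) where the real beginmsg function would issue int 0x80.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical message layout; the real td_sysmsg is not shown above. */
struct sysmsg_read {
	int	fd;
	void	*buf;
	size_t	nbytes;
	ssize_t	result;
};

/* A port is little more than a replaceable "begin message" function. */
struct msg_port {
	/* Returns 0 on success or an errno value on failure. */
	int	(*beginmsg)(struct msg_port *, struct sysmsg_read *);
};

/* Stand-in for the assembly stub that would trap into the kernel. */
static int
kernel_beginmsg(struct msg_port *port, struct sysmsg_read *msg)
{
	ssize_t n;

	(void)port;
	n = read(msg->fd, msg->buf, msg->nbytes);
	if (n < 0)
		return (errno);
	msg->result = n;
	return (0);
}

/* Example shim: every read through the port reports end-of-file. */
static int
eof_shim_beginmsg(struct msg_port *port, struct sysmsg_read *msg)
{
	(void)port;
	msg->result = 0;
	return (0);
}

static struct msg_port syscall_port = { kernel_beginmsg };
static int thread_errno;	/* stands in for curthread->td_errno */

/* Synchronous send: hand the message to the port and wait for it. */
static int
domsg(struct msg_port *port, struct sysmsg_read *msg)
{
	assert(port->beginmsg != NULL);
	return (port->beginmsg(port, msg));
}

/* read() expressed as a message; errno handling stays in userland. */
ssize_t
msg_read(int fd, void *buf, size_t nbytes)
{
	struct sysmsg_read msg = { fd, buf, nbytes, 0 };
	int error;

	if ((error = domsg(&syscall_port, &msg)) != 0) {
		thread_errno = error;
		msg.result = -1;
	}
	return (msg.result);
}
```

Pointing syscall_port.beginmsg at eof_shim_beginmsg changes what every
msg_read() caller sees without touching the callers, which is the kind of
shimming the port indirection is meant to allow.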
I think this would be a great project for developers to really sink their
teeth into, because there is so much to do it can be worked on by
several people in parallel, and because the breakages will not affect
the stability of the development environment, and for all the reasons
above it means I can start handing out commit bits.
I would like to find one developer to act as the head-honcho for the
userland work, and any number of developers to work on the pieces. The
piecemeal work is:
* Messaging for individual syscalls (i.e. each system call, like
read() above, needs to be coded for the messaging interface).
* The LWKT threading port (I can help with the assembly bits).
* Implementation of the per-cpu-area abstraction (becomes per-rfork).
Why throw away everything that has been learned so far?
* (later on) Use %fs or %gs (kernel-supported?) to aid in access
to per-cpu areas? Anyone have any ideas here? It isn't necessary
for the initial threading work.
It MUST be %gs on x86.. the ELF TLS spec mandates it.
It must also describe an LDT entry that points to a POINTER to the thread
i.e. movl %gs:0, %eax
loads the address of the thread control block into %eax
For OTHER architectures the thread pointer is a normal pointer register and
points directly to the Thread control block.
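
The extra level of indirection on x86 can be pictured in plain C. This is
only a sketch of the layout as I understand it: the first word of the
thread control block holds the block's own address, so a load from offset
0 of the segment yields the thread pointer. struct tcb, tcb_init(),
seg_load() and current_tcb() are made-up names, and the segment base is
passed in as an ordinary pointer because portable C cannot load %gs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical thread control block. */
struct tcb {
	struct tcb	*self;		/* word 0: address of this block */
	int		td_errno;	/* example per-thread field */
};

/* Set up the self pointer that the segment load will find. */
void
tcb_init(struct tcb *t)
{
	assert(t != NULL);
	t->self = t;
	t->td_errno = 0;
}

/*
 * Simulates "movl %gs:off, %eax": read one pointer-sized word at the
 * given offset from the segment base.
 */
static uintptr_t
seg_load(const void *seg_base, size_t off)
{
	uintptr_t word;

	memcpy(&word, (const char *)seg_base + off, sizeof(word));
	return (word);
}

/* What a curthread-style lookup would do with the real %gs segment. */
struct tcb *
current_tcb(const void *gs_base)
{
	return ((struct tcb *)seg_load(gs_base, 0));
}
```

With two blocks a and b set up by tcb_init(), current_tcb(&a) returns &a
and current_tcb(&b) returns &b; the only thing that changes per thread is
where the segment base points.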
* (later on) Thread migration between rforks (i.e. more sophisticated
Your decision to rfork multiple process structures is, I believe, misguided.
You have the opportunity to change everything.. why stick with old thinking?
* (later on) development of a kernel-supported signal infrastructure
for proper POSIX signal handling.
Check what David Xu has done for POSIX thread support.
We treat the upcalls as an upgoing message interface.. should just get
simpler if moved to your system.
I should have a basic syscall messaging syscall working this week,
even though it will initially operate synchronously.