DragonFly commits List (threaded) for 2005-03
Re: cvs commit: src/sys/sys tls.h src/lib/libc/gen tls.c src/lib/libthread_xu/arch/amd64/amd64 pthread_md.c src/lib/libthread_xu/arch/i386/i386 pthread_md.c src/libexec/rtld-elf rtld.c rtld.h rtld_tls.h src/libexec/rtld-elf/i386 reloc.c
:I'd like to get rid of the size argument too. This should be split into
:machine/tls.h (with e.g. the struct tcb define) and sys/tls.h with the
:general system call.
This is viable, but I think it might be best to retool it so the thread
library has full control over the size of the TCB rather than hardwiring
it into the OS headers.
:> * Gets rid of the Variant I code (we can add it in later, it just gets
:> in the way).
:That's why I wanted to use Variant I always. For the archive, there is one
:nasty thing -- statically linked binaries. For those, ld itself does the
:relocation and therefore it would have to be changed to go directly to
I would say that this is not a viable option. Using positive offsets
locks the program into a particular TCB size. This may not matter so
much for static programs, but it is information that would have to be
communicated to the linker through some out-of-band method (like via
the linker map) and that seems a bit too hackish for my tastes.
For dynamically linked programs it is out of the question... the
rtld and libc can statically code the size of the TCB, but the
program binary cannot because it would tie our hands from an ABI
point of view.
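
To make the offset argument concrete, here is a minimal sketch in C of how a thread-pointer-relative offset would be computed under each layout. The function names and the simplified layout are assumptions made for illustration (alignment padding is ignored, and the block size is taken as already rounded up); this is not the rtld code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only.
 *
 * Variant II: the static TLS block ends at the thread pointer, so a
 * variable at 'var_offset' inside a block of 'tls_block_size' bytes
 * sits at a negative offset from the thread pointer.  The TCB size
 * does not appear in the result.
 */
static inline intptr_t
variant2_tp_offset(size_t tls_block_size, size_t var_offset)
{
    return -(intptr_t)(tls_block_size - var_offset);
}

/*
 * Positive-offset layout (data placed after the TCB): the offset a
 * binary would have to encode includes the TCB size, so changing the
 * TCB changes every offset.
 */
static inline intptr_t
variant1_tp_offset(size_t tcb_size, size_t var_offset)
{
    return (intptr_t)(tcb_size + var_offset);
}
```

With a 64-byte block and a variable at block offset 16, the negative-offset form gives -48 no matter how large the TCB is, while the positive-offset form gives 24 for a 16-byte TCB and 40 for a 32-byte one.
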
:> * Retools the Variant II code to support %gs:OFFSET (negative offset)
:> AND %gs:0 relative accesses, supporting both -mtls-direct-seg-refs and
:As Doug mentioned, this means m:n implementation has to do a syscall for
Yup, or a linux-like kernel-supported thread switch (which I would
prefer NOT to do). I did a quick timing test on sys_set_tls_area()
and it costs around 339ns on my AMD64 test cube. But this is still
going to be far higher performing than having to call __tls_get_addr
all the time. The procedure setup cost for figuring out the GOT offset
alone is 17ns on the same box.
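
As a rough back-of-the-envelope check on those two numbers, the sketch below computes how many TLS accesses per thread switch it takes before the per-switch syscall pays for itself. It assumes the only per-access saving is the 17ns GOT setup and that a direct segment reference adds no comparable cost; the function name is hypothetical.

```c
/*
 * Hypothetical helper: smallest number of TLS accesses per thread
 * switch at which a fixed per-switch cost is recovered by saving
 * 'per_access_ns' on each access.  Integer ceiling division.
 */
static inline unsigned
breakeven_accesses(unsigned switch_cost_ns, unsigned per_access_ns)
{
    return (switch_cost_ns + per_access_ns - 1) / per_access_ns;
}
```

Calling breakeven_accesses(339, 17) gives 20, so under these assumptions a thread that touches TLS about twenty times or more between switches already comes out ahead.
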
I think this problem goes away in 64 bit mode. It's a little confusing,
I don't know why they made the segment load instructions only 32 bits,
but it appears you can load a 32 bit base address into %fs or %gs from
:> * Retains the DTV methodology.
:> * Retains the TCB methodology, but note that the area 'after' the tcb
:> is now available for future use (at least with Variant I removed).
:> Frankly I'm not sure we would ever want to support having the
:> 'data' area after the TCB instead of before the TCB, at least not
:> for i386.
:Placing the TLS area after the TCB solves a lot of nasty problems :)
It would appear to be cleaner, but it's not worth doing if we have
to make serious hacks to the compiler to support it.