DragonFly kernel List (threaded) for 2003-09
Re: SLAB allocator now the default.
:> to allocate whole pages. The slab allocator does this for power-of-2
:> sized requests beyond PAGE_SIZE but does NOT page-align oddly sized
:> requests (like a 6K request) beyond PAGE_SIZE, at least until the requests
:> get large (greater than 16K).
:> So keeping the power-of-2-allocation-is-power-of-2-aligned characteristic
:> is reasonable for power-of-2-sized requests.
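
The alignment behavior described in the quoted text above can be sketched as a small predicate. This is only an illustration built from that description, not code from the allocator: the 4096-byte page size, the names SKETCH_PAGE_SIZE, SKETCH_BIG_LIMIT, is_pow2() and expect_page_aligned(), and the exact handling of the 16K boundary are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u           /* assumed page size */
#define SKETCH_BIG_LIMIT (16u * 1024u)   /* the "greater than 16K" threshold */

static_assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0,
              "page size must be a power of 2");

/* True if n is a nonzero power of 2. */
static bool is_pow2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper: can a request of `size` bytes expect to be
 * page-aligned, going only by the description in the text?
 *   - below a page: no (power-of-2 sizes are aligned to their own size)
 *   - power-of-2 sizes of a page or more: yes
 *   - odd sizes such as 6K: no, until the request exceeds 16K
 */
static bool expect_page_aligned(size_t size)
{
    if (size < SKETCH_PAGE_SIZE)
        return false;
    return is_pow2(size) || size > SKETCH_BIG_LIMIT;
}
```

Under these assumptions an 8K request is expected to be page-aligned, a 6K request is not, and a 20K request is.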
:structures smaller than say 128 bytes should be rounded up to the next larger
:2^n size though.
Well, I don't think you can point to any one thing and say that it
will magically solve all the problems. It takes an integrated approach
to make things operate smoothly.
For example, there is a rather severe memory and cache efficiency
tradeoff here that cannot be ignored. If one is allocating 32 byte
structures and wasting 128 bytes of memory on each one, the result is
that 80% of your memory accesses wind up using only 20% of your available
L2 cache, which makes your cache only 1/3 as effective as it would be
if you had compacted the allocations to spread them over the entire L2
cache.
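
The arithmetic behind this tradeoff can be checked with a rough sketch. The helper names round_up() and useful_percent() are made up for illustration, and the exact percentage depends on how the padding is counted: a 32 byte structure with 128 bytes wasted behind it gives 20%, while a 32 byte structure rounded up to a 128 byte boundary gives 25%.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (align: power of 2). */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/*
 * Percentage of each slot that holds live data when an object of
 * obj_size bytes occupies slot_size bytes of memory (and therefore
 * of cache). Integer division truncates toward zero.
 */
static unsigned useful_percent(size_t obj_size, size_t slot_size)
{
    assert(slot_size > 0 && slot_size >= obj_size);
    return (unsigned)(obj_size * 100 / slot_size);
}
```

For example, useful_percent(32, round_up(32, 128)) covers the rounded-up case and useful_percent(32, 32 + 128) covers the case of 128 wasted bytes per structure. Compact allocation, as in useful_percent(32, round_up(32, 8)), leaves no padding at all.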
In DragonFly we do several things, and taken together they form a far
more effective solution:
(1) Our slab allocator is per-cpu.
(2) Because it is per-cpu our slab allocator can make compact
allocations without severe cache line contention.
(3) We forward modifications to structures to the cpu owning the
structure (the structure that was also allocated on that cpu,
typically), to reduce modifying cache contention and avoid the use
of mutexes (mutexes virtually guarantee cache contention).
(4) We intend to isolate subsystems in their own cpu-locked threads
so the related data structures remain local to the cpu.
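
Point (3) above can be illustrated with a toy, single-threaded simulation. Everything here is hypothetical and is not DragonFly's actual interface: NCPU, QLEN, struct counter, counter_add() and cpu_drain() are names invented for the sketch, and a real kernel would need a cross-cpu-safe way to enqueue the message (an interrupt-driven message queue, for instance), which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU 2   /* assumed cpu count for the sketch */
#define QLEN 8   /* assumed inbox depth per cpu */

struct counter {            /* example structure owned by one cpu */
    int  owner_cpu;
    long value;
};

struct msg {                /* a forwarded modification request */
    struct counter *target;
    long            delta;
};

struct cpu_state {
    struct msg inbox[QLEN];
    int        pending;     /* number of queued messages */
};

static struct cpu_state cpus[NCPU];

/*
 * Called "on" cpu `self`. The owner modifies the structure directly;
 * any other cpu queues a message for the owner instead of touching
 * the structure's cache line. Returns 0 on success, -1 if the
 * owner's inbox is full.
 */
static int counter_add(int self, struct counter *c, long delta)
{
    assert(c->owner_cpu >= 0 && c->owner_cpu < NCPU);
    if (c->owner_cpu == self) {
        c->value += delta;
        return 0;
    }
    struct cpu_state *owner = &cpus[c->owner_cpu];
    if (owner->pending == QLEN)
        return -1;
    owner->inbox[owner->pending].target = c;
    owner->inbox[owner->pending].delta = delta;
    owner->pending++;
    return 0;
}

/* Run on the owning cpu: apply queued modifications, return the count. */
static int cpu_drain(int self)
{
    struct cpu_state *me = &cpus[self];
    int n = me->pending;
    for (int i = 0; i < n; i++)
        me->inbox[i].target->value += me->inbox[i].delta;
    me->pending = 0;
    return n;
}
```

Because only the owning cpu ever writes the structure, no mutex is needed around value in this model. The cost is that a forwarded change is not visible until the owner drains its inbox.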
We don't do everything perfectly... right now the cpu allocating a
structure is not necessarily the cpu that is going to use it, for example,
but it is simply not possible to cover all the bases right from the
start. As long as the infrastructure and programming model allow for
it to be done properly, as a goal, then we can eventually achieve the
goal.
So, at least in regard to DragonFly, aligning memory requests on 128
byte boundaries would be detrimental.