DragonFly kernel List (threaded) for 2003-09
Re: SLAB allocator now the default.
On Sun, 28 Sep 2003, Matthew Dillon wrote:
> :> to allocate whole pages. The slab allocator does this for power-of-2
> :> sized requests beyond PAGE_SIZE but does NOT page-align oddly sized
> :> requests (like a 6K request) beyond PAGE_SIZE, at least until the requests
> :> get large (greater than 16K).
> :> So keeping the power-of-2-allocation-is-power-of-2-aligned characteristic
> :> is reasonable for power-of-2-sized requests.
> :structures smaller than say 128 bytes should be rounded up to the next larger
> :2^n size though.
> : Sander
> Well, I don't think you can point to any one thing and say that it
> will magically solve all the problems. It takes an integrated approach
> to make things operate smoothly.
There is never a single "solve it all" solution, except in some trivial cases.
> For example, there is a rather severe memory and cache efficiency
> tradeoff here that cannot be ignored. If one is allocating 32 byte
> structures and wasting 128 bytes of memory on each one the result is
> that 80% of your memory accesses wind up using only 20% of your available
> L2 cache, which makes your cache only 1/3 as effective as it would be
> if you had compacted the allocations to spread them over the entire L2
> cache evenly.
Yes, that would be rather wasteful - you need different strategies for
small and non-small allocations.
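
To put rough numbers on the tradeoff quoted above, here is a small C sketch.
The helper names round_up_pow2() and used_percent() are made up for this
illustration and are not taken from any kernel source. It only computes how
much of an allocation slot holds useful payload under a given padding or
rounding policy; it says nothing about how either allocator really behaves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round n up to the next power of two (n >= 1). */
static size_t round_up_pow2(size_t n)
{
    size_t p = 1;

    assert(n >= 1);
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical helper: percentage (rounded down) of a slot that is payload. */
static unsigned used_percent(size_t payload, size_t slot)
{
    assert(slot != 0 && slot >= payload);
    return (unsigned)(payload * 100 / slot);
}
```

A 32-byte structure with 128 bytes of padding sits in a 160-byte slot, so
used_percent(32, 160) gives 20, which is the figure in the quoted text.
Rounding small sizes up to the next 2^n instead keeps a 32-byte structure in
a 32-byte slot (100 percent), while a 100-byte structure lands in a 128-byte
slot (78 percent).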
> In DragonFly we do several things, and taken together they form a far
> more effective solution:
> (1) Our slab allocator is per-cpu.
> (2) Because it is per-cpu our slab allocator can make compact
> allocations without severe cache line contention.
> (3) We forward modifications to structures to the cpu owning the
> structure (the structure that was also allocated on that cpu,
> typically), to reduce modifying cache contention and avoid the use
> of mutexes (mutexes virtually guarantee cache contention).
> (4) We intend to isolate subsystems in their own cpu-locked threads
> so the related data structures remain local to the cpu.
> We don't do everything perfectly... right now the cpu allocating a
> structure is not necessarily the cpu that is going to use it, for example,
> but it is simply not possible to cover all the bases right from the
> start. As long as the infrastructure and programming model allow for
> it to be done properly, as a goal, then we can eventually achieve the
> So, at least in regard to DragonFly, aligning memory requests on 128
> byte boundaries would be detrimental.
Allocating an 800-byte structure on a 64- or 128-byte boundary imposes a
pretty negligible additional overhead, and you can figure out the right
alignment at kernel startup at the latest.
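
A minimal sketch of what I mean, in C. Everything here is hypothetical: the
alloc_align variable, the alloc_align_init() startup hook and the
aligned_size() helper are names invented for the example, and the cache line
size is assumed to be handed in by whatever CPU detection code runs at boot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global: alignment for non-small allocations, set at startup. */
static size_t alloc_align = 64;        /* assumed default until detection runs */

/* Hypothetical startup hook: record the detected cache line size. */
static void alloc_align_init(size_t cache_line_size)
{
    /* The mask trick below only works for a nonzero power of two. */
    assert(cache_line_size != 0 &&
           (cache_line_size & (cache_line_size - 1)) == 0);
    alloc_align = cache_line_size;
}

/* Round a request size up to the next multiple of alloc_align. */
static size_t aligned_size(size_t size)
{
    return (size + alloc_align - 1) & ~(alloc_align - 1);
}
```

With a 64-byte line an 800-byte request becomes 832 bytes (32 bytes of
padding, 4 percent), and with a 128-byte line it becomes 896 bytes (96 bytes
of padding, 12 percent). Whether that counts as negligible depends on the
workload, but it is a long way from the small-structure case.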
> Matthew Dillon
+++ Out of cheese error +++