DragonFly kernel List (threaded) for 2003-09
Re: SLAB allocator now the default.
Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx> wrote:
> It depends what power of 2 you are talking about. Generally speaking,
> there is a benefit to be had when data objects fit entirely in cache
> lines. A cache line is typically 8, 16, or 32 bytes wide depending on the
> architecture. There is also a benefit to the location of the initial
> data access within the cache line... that is, accessing the first word
> of a multi-word burst being loaded into the cache line from external
> memory will often unclog instruction flow earlier, but whether
> the 'first' word is the low address of the cache line or the high address
> depends on the architecture. e.g. cache lines are loaded backwards on
More precisely - there is a performance *penalty* when two items wanted by
different CPUs are in the same cache line (a line is increasingly something like
128 bytes in the case of L2/L3 these days), compared with that cache line being
held exclusively by one CPU. The penalty is considerably worse should either of
the data structures get modified. It definitely more than offsets any benefit from
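
To make the shared-line penalty concrete, here is a rough C sketch of two per-CPU counters, once packed together and once padded out to separate cache lines. The 128-byte line size is just the L2/L3 figure mentioned above, and the struct and function names are made up for illustration; none of this is taken from the DragonFly sources.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 128-byte cache line (the L2/L3 figure quoted above). */
#define CACHE_LINE_SIZE 128

/* Both counters land in one line, so CPU 0 and CPU 1 fight over it. */
struct counters_shared {
    unsigned long cpu0_hits;
    unsigned long cpu1_hits;
};

/* Each counter is forced onto its own cache line. */
struct counters_padded {
    alignas(CACHE_LINE_SIZE) unsigned long cpu0_hits;
    alignas(CACHE_LINE_SIZE) unsigned long cpu1_hits;
};

/*
 * Returns nonzero when two byte offsets fall in the same cache line.
 * Assumes the enclosing object itself starts on a line boundary.
 */
static int same_cache_line(size_t off_a, size_t off_b)
{
    return off_a / CACHE_LINE_SIZE == off_b / CACHE_LINE_SIZE;
}
```

The padded layout costs 256 bytes instead of 16 (on a typical 64-bit target), which is the trade being argued about: wasted memory against lines bouncing between CPUs.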
> Larger alignments can create performance penalties and this is the
> performance penalty being talked about above. By larger alignments I am
> talking about the case where, say, you try to allocate 800 bytes and the
> allocation is thrown into a 1K block (which is what the old kernel
You can very easily get non-obvious results by not aligning such on cache line
boundaries though, as you would get by aligning 800 bytes on 4-byte boundaries.
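
One way to read the 800-byte case is to count how many cache lines the object touches at a given start address. The sketch below does only that; the 32-byte line size is the upper figure from the quoted text, and `lines_touched` is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte cache line, the largest size in the quoted text. */
#define CACHE_LINE_SIZE 32

/* Number of cache lines covered by `size` bytes starting at `addr`. */
static size_t lines_touched(uintptr_t addr, size_t size)
{
    if (size == 0)
        return 0;

    uintptr_t first = addr / CACHE_LINE_SIZE;              /* line of first byte */
    uintptr_t last = (addr + size - 1) / CACHE_LINE_SIZE;  /* line of last byte */

    return (size_t)(last - first + 1);
}
```

Under these assumptions, 800 bytes starting on a line boundary cover 25 lines, while the same 800 bytes starting 4 bytes in cover 26, with the first and last lines shared with whatever sits next to the object.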
An additional small troop of goblins comes out when things are on more than
> malloc did). If you are trying to allocate 1024 bytes then presumably
> you intend to use all 1K and you might as well 1K align it since there
> is no data loss and no likely performance loss either. Also, once you
> reach PAGE_SIZE you almost always want to take advantage of the VM system
> to allocate whole pages. The slab allocator does this for power-of-2
> sized requests beyond PAGE_SIZE but does NOT page-align oddly sized
> requests (like a 6K request) beyond PAGE_SIZE, at least until the requests
> get large (greater than 16K).
> So keeping the power-of-2-allocation-is-power-of-2-aligned characteristic
> is reasonable for power-of-2-sized requests.
Structures smaller than, say, 128 bytes should be rounded up to the next larger
2^n size though.
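
As a rough sketch of that rounding rule in C: the 128-byte cutoff is the "say 128 bytes" figure from the sentence above, the helper names are invented here, and a size that is already a power of two is assumed to stay as it is. This is not how the slab allocator actually picks its zones.

```c
#include <stddef.h>

/* Assumption: cutoff below which sizes get rounded up to 2^n. */
#define SMALL_LIMIT 128

/*
 * Smallest power of two >= n. Only meant for small n: the shift
 * would wrap for values above the largest power of two in size_t.
 */
static size_t round_up_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Small requests are rounded up to 2^n; larger ones are left alone. */
static size_t small_alloc_size(size_t n)
{
    return (n < SMALL_LIMIT) ? round_up_pow2(n) : n;
}
```

With this, a 24-byte structure would take a 32-byte slot and a 100-byte one a 128-byte slot, while an 800-byte request keeps its size.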
> Matthew Dillon
+++ Out of cheese error +++