DragonFly kernel List (threaded) for 2003-09
Re: SLAB allocator now the default.
:> I haven't looked at the Slab-alloc code recently, but I am
:> wondering if you are planning to remove the power-of-2 alignment
:> stuff out of malloc()?
:Doesn't power of two alignment make it easier on some CPUs internally
:data? If a person wanted to access a word overlapping two word's
:alignment would it
:not take two bus accesses to get that data [or two cache reads... what
:Do the benefits of not having power of two alignment outweigh this?
:> From the many papers I have been reading in the last couple of
:> weeks, it seems that power-of-2 alignment causes a lot
:> of performance degradation in SMP cases, and cache issues.
It depends on what power of 2 you are talking about. Generally speaking,
there is a benefit to be had when data objects fit entirely in cache
lines. A cache line is typically 8, 16, or 32 bytes wide depending on the
architecture. There is also a benefit to the location of the initial
data access within the cache line... that is, accessing the first word
of a multi-word burst being loaded into the cache line from external
memory will often unclog instruction flow earlier, but whether
the 'first' word is the low address of the cache line or the high address
depends on the architecture. e.g. cache lines are loaded backwards on
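To make the "fits entirely in a cache line" point concrete, here is a minimal
sketch that counts how many cache lines an object touches. The helper name
cache_lines_touched and the 32-byte line size used in the note below are
assumptions for illustration only; nothing here is taken from the kernel
sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of cache lines touched by an object of
 * `size` bytes starting at address `addr`.  `line_size` must be a
 * power of 2 (the text mentions 8, 16, or 32 bytes as typical).
 */
static size_t
cache_lines_touched(uintptr_t addr, size_t size, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    if (size == 0)
        return 0;

    uintptr_t first = addr / line_size;              /* line holding the first byte */
    uintptr_t last = (addr + size - 1) / line_size;  /* line holding the last byte */

    return (size_t)(last - first + 1);
}
```

With a 32-byte line, a 4-byte word at offset 30 touches two lines, while the
same word at offset 32 touches one. That is the cost the quoted question about
overlapping accesses is getting at.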
Larger alignments can create performance penalties, and that is the
penalty being discussed above. By larger alignments I am
talking about the case where, say, you try to allocate 800 bytes and the
allocation is thrown into a 1K block (which is what the old kernel
malloc did). If you are trying to allocate 1024 bytes then presumably
you intend to use all 1K and you might as well 1K align it since there
is no data loss and no likely performance loss either. Also, once you
reach PAGE_SIZE you almost always want to take advantage of the VM system
to allocate whole pages. The slab allocator does this for power-of-2
sized requests beyond PAGE_SIZE but does NOT page-align oddly sized
requests (like a 6K request) beyond PAGE_SIZE, at least until the requests
get large (greater than 16K).
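The arithmetic behind the 800-byte example, and the page-alignment rule
described above, can be sketched roughly as follows. This is a model of the
description in this message, not the allocator's actual code: the names
round_up_pow2, pow2_waste, and model_page_aligned, the 4096-byte page size,
and treating "beyond PAGE_SIZE" as "at least PAGE_SIZE" are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SIZE   4096u          /* assumed page size */
#define EX_LARGE_LIMIT (16u * 1024u)  /* "large" threshold named in the text */

static bool
is_pow2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Round n up to the next power of 2, as a pure power-of-2 bucket scheme would. */
static size_t
round_up_pow2(size_t n)
{
    size_t p = 1;

    assert(n <= (SIZE_MAX >> 1) + 1);  /* avoid overflow in the shift below */
    while (p < n)
        p <<= 1;
    return p;
}

/* Bytes lost when an n-byte request is placed in a power-of-2 block. */
static size_t
pow2_waste(size_t n)
{
    return round_up_pow2(n) - n;
}

/* Model of the rule in the text: is a request of n bytes page-aligned? */
static bool
model_page_aligned(size_t n)
{
    if (n > EX_LARGE_LIMIT)
        return true;                      /* large requests */
    return n >= EX_PAGE_SIZE && is_pow2(n);  /* power-of-2 sizes from a page up */
}
```

Under these assumptions an 800-byte request wastes 224 bytes in a 1K block, a
1024-byte request wastes nothing, an 8K request is page-aligned, and a 6K
request is not.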
So keeping the power-of-2-allocation-is-power-of-2-aligned characteristic
is reasonable for power-of-2-sized requests.
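As a rough illustration of that property, the check below tests whether a
returned pointer honors "power-of-2 size implies power-of-2 alignment". The
function name pow2_request_is_aligned is hypothetical, and the check says
nothing about how any particular allocator achieves the alignment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: a request whose size is a power of 2 should come
 * back aligned to that size.  Other sizes are not constrained by this
 * property, so they always pass.
 */
static bool
pow2_request_is_aligned(const void *ptr, size_t size)
{
    if (size == 0 || (size & (size - 1)) != 0)
        return true;  /* not a power-of-2 size: nothing to check */

    return ((uintptr_t)ptr & (size - 1)) == 0;
}
```

For example, a 1024-byte allocation at address 1024 passes and one at address
1536 fails, while an 800-byte allocation at address 1536 passes because the
property does not apply to it.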