DragonFly BSD
DragonFly kernel List (threaded) for 2003-09

Re: SLAB allocator now the default.

From: Sander Vesik <sander@xxxxxxxxxxxxxxxxxxx>
Date: 28 Sep 2003 22:26:32 GMT
Cache-post-path: haldjas.folklore.ee!unknown@localhost

Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx> wrote:
>    It depends what power of 2 you are talking about.  Generally speaking,
>    there is a benefit to be had when data objects fit entirely in cache
>    lines.  A cache line is typically 8, 16, or 32 bytes wide depending on the 
>    architecture.  There is also a benefit to the location of the initial
>    data access within the cache line... that is, accessing the first word
>    of a multi-word burst being loaded into the cache line from external
>    memory will often unclog instruction flow earlier, but whether
>    the 'first' word is the low address of the cache line or the high address
>    depends on the architecture.  e.g. cache lines are loaded backwards on
>    MIPS.

More precisely, there is a performance *penalty* when two items wanted by
different CPUs are in the same cache line (a line is increasingly something like
128 bytes for L2/L3 these days), compared with such a cache line being marked
exclusive. The penalty is considerably worse should either of the data
structures get modified. It definitely more than offsets any benefit from
packing.
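
A minimal sketch of the sharing problem, in C11. The 128-byte CACHE_LINE
value and the struct and function names are assumptions made for
illustration; nothing here is taken from the DragonFly sources. It compares a
packed per-CPU counter with one padded out to a full line:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed line size for illustration; real hardware varies. */
#define CACHE_LINE 128

/* Packed: counters for neighbouring CPUs land in the same cache line. */
struct packed_counter {
    unsigned long hits;
};

/* Padded: alignas forces each counter to occupy a whole line by itself. */
struct padded_counter {
    alignas(CACHE_LINE) unsigned long hits;
};

/* Index of the cache line holding byte offset `off` from a line-aligned base. */
static size_t line_of(size_t off)
{
    return off / CACHE_LINE;
}

/*
 * Returns 1 if elements i and j of an array start in the same cache line,
 * assuming the array base is line-aligned and elements are elem_size bytes.
 */
static int same_line(size_t elem_size, size_t i, size_t j)
{
    assert(elem_size > 0);
    return line_of(i * elem_size) == line_of(j * elem_size);
}
```

With an array of packed_counter, slots 0 and 1 start in the same line, so two
CPUs updating their own slots still contend for that line. With
padded_counter each slot starts a new line, at the cost of the padding.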

>    Larger alignments can create performance penalties and this is the
>    performance penalty being talked about above.  By larger alignments I am
>    talking about the case where, say, you try to allocate 800 bytes and the
>    allocation is thrown into a 1K block (which is what the old kernel

You can very easily get non-obvious results by not aligning such allocations
on cache-line boundaries though, as you would by aligning 800 bytes on 4-byte
boundaries. An additional small troop of goblins comes out when things span
more than one page.
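
To make the alignment effect concrete, here is a small C sketch that counts
how many cache lines or pages an object touches for a given start address.
The helper name units_spanned is hypothetical, and the 128-byte line and
4096-byte page used in the note below are assumed values:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of `unit`-sized blocks (cache lines or pages) touched by an object
 * of `size` bytes starting at byte address `addr`.
 */
static size_t units_spanned(size_t addr, size_t size, size_t unit)
{
    assert(size > 0 && unit > 0);
    /* Block index of the last byte minus block index of the first, plus one. */
    return (addr + size - 1) / unit - addr / unit + 1;
}
```

Under those assumptions, an 800-byte object starting on a line boundary
touches 7 lines, while the same object at offset 100 touches 8. Starting at
byte 4000 of a page, it also straddles two pages instead of one.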

>    malloc did).  If you are trying to allocate 1024 bytes then presumably
>    you intend to use all 1K and you might as well 1K align it since there
>    is no data loss and no likely performance loss either.  Also, once you
>    reach PAGE_SIZE you almost always want to take advantage of the VM system
>    to allocate whole pages.  The slab allocator does this for power-of-2
>    sized requests beyond PAGE_SIZE but does NOT page-align oddly sized
>    requests (like a 6K request) beyond PAGE_SIZE, at least until the requests
>    get large (greater than 16K).
>    So keeping the power-of-2-allocation-is-power-of-2-aligned characteristic
>    is reasonable for power-of-2-sized requests.

Structures smaller than, say, 128 bytes should be rounded up to the next larger
2^n size though.
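
A sketch of that rounding rule in C. SMALL_LIMIT and both function names are
hypothetical, the 128-byte cutoff is the "say 128" from above, and leaving
exact powers of two unchanged is an assumption; this is not how the slab
allocator itself picks its zone sizes:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cutoff below which requests are rounded up to a power of two. */
#define SMALL_LIMIT 128

/* Smallest power of two that is >= n (n itself if it already is one). */
static size_t round_up_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Size actually reserved for a request of n bytes under the rule above. */
static size_t small_alloc_size(size_t n)
{
    assert(n > 0);
    return n < SMALL_LIMIT ? round_up_pow2(n) : n;
}
```

For example, a 24-byte request would reserve 32 bytes and a 100-byte request
128 bytes, while an 800-byte request is left at 800.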

>                                        -Matt
>                                        Matthew Dillon 
>                                        <dillon@xxxxxxxxxxxxx>


+++ Out of cheese error +++
