DragonFly BSD
DragonFly commits List (threaded) for 2010-09

git: kernel - Optimize kfree() to greatly reduce IPI traffic


From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 18 Sep 2010 13:38:01 -0700 (PDT)

commit 5fee07e60dc4c041779e46c199f144a7f7c550ee
Author: Matthew Dillon <dillon@apollo.backplane.com>
Date:   Sat Sep 18 13:23:41 2010 -0700

    kernel - Optimize kfree() to greatly reduce IPI traffic
    
    * Instead of IPIing the chunk being freed to the originating cpu we
      use atomic ops to directly link the chunk onto the target slab.
      We then notify the target cpu via an IPI message only in the case where
      we believe the slab has to be entered back onto the target cpu's
      ZoneAry.
    
      This reduces the IPI messaging load by a factor of 100 or more.
      kfree() sends virtually no IPIs any more.
    
    * Move malloc_type accounting to the cpu issuing the kmalloc or kfree
      (kfree used to forward the accounting to the target cpu).  The
      accounting is done using the per-cpu malloc_type accounting array,
      so large deltas will likely accumulate in individual per-cpu
      entries, but they should all cancel out properly in the summation.
    
    * Use the kmemusage array and kup->ku_pagecnt to track whether a
      SLAB is active or not, which allows the handler for the asynchronous IPI
      to validate that the SLAB still exists before trying to access it.
    
      This is necessary because once the cpu doing the kfree() successfully
      links the chunk into z_RChunks, the target slab can get ripped out
      from under it by the owning cpu.
    
    * The special cpu-competing linked list is different from the linked list
      normally used to find free chunks, so the localized code and the
      MP code are segregated.
    
      We pay special attention to list ordering to try to avoid unnecessary
      cache mastership changes, though it should be noted that the c_Next
      link field in the chunk creates an issue no matter what we do.
    
      A 100% lockless algorithm is used.  atomic_cmpset_ptr() is used
      to manage the z_RChunks singly-linked list.
    
    * Remove the page localization code for now.  Over the life of a
      typical chunk of memory I don't think this provided much of
      an advantage.
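
The per-cpu accounting described in the malloc_type bullet above can be illustrated with a small userland sketch. This is not the kernel code: `NCPUS`, `struct type_stats`, `ts_inuse`, and the helper functions are hypothetical names chosen for illustration. The sketch only shows why a chunk allocated on one cpu and freed on another leaves large offsetting deltas in individual slots while the sum across cpus stays correct.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 4			/* hypothetical cpu count for the sketch */

/* Hypothetical per-type statistics: one signed byte counter per cpu. */
struct type_stats {
	int64_t ts_inuse[NCPUS];
};

/* Charge an allocation to the cpu that issued it. */
static void
account_alloc(struct type_stats *ts, int cpu, int64_t size)
{
	assert(cpu >= 0 && cpu < NCPUS);
	ts->ts_inuse[cpu] += size;
}

/*
 * Credit a free to the cpu that issued the free, which may differ
 * from the allocating cpu.  That cpu's slot may go negative.
 */
static void
account_free(struct type_stats *ts, int cpu, int64_t size)
{
	assert(cpu >= 0 && cpu < NCPUS);
	ts->ts_inuse[cpu] -= size;
}

/* Only the sum over all cpus is meaningful. */
static int64_t
total_inuse(const struct type_stats *ts)
{
	int64_t sum = 0;
	int i;

	for (i = 0; i < NCPUS; ++i)
		sum += ts->ts_inuse[i];
	return sum;
}
```

In this sketch each cpu touches only its own slot, so no cross-cpu message is needed for accounting; a reader of the statistics has to sum all slots to get a usable figure.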
    
    Prodded-by: Venkatesh Srinivas
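
The lockless z_RChunks list described above can be sketched in userland with C11 atomics standing in for the kernel's atomic_cmpset_ptr(). The field names `z_RChunks` and `c_Next` come from the commit message; `struct chunk`, `struct zone`, `remote_free_push()`, and `remote_free_drain()` are hypothetical, and the IPI notification and the kup->ku_pagecnt validation are not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical free chunk: the link lives inside the freed memory. */
struct chunk {
	struct chunk *c_Next;
};

/* Hypothetical slab zone holding the remotely-freed chunk list. */
struct zone {
	_Atomic(struct chunk *) z_RChunks;
};

/*
 * Called by a cpu that does not own the zone.  Links the chunk onto
 * the head of z_RChunks with a compare-and-swap retry loop.
 */
static void
remote_free_push(struct zone *z, struct chunk *c)
{
	struct chunk *head;

	assert(c != NULL);
	head = atomic_load(&z->z_RChunks);
	do {
		/* On CAS failure 'head' is reloaded with the current value. */
		c->c_Next = head;
	} while (!atomic_compare_exchange_weak(&z->z_RChunks, &head, c));
}

/*
 * Called by the owning cpu.  Detaches the entire list in one atomic
 * exchange and returns it for private processing.
 */
static struct chunk *
remote_free_drain(struct zone *z)
{
	return atomic_exchange(&z->z_RChunks, (struct chunk *)NULL);
}
```

Draining with a single exchange means the owner never pops individual entries while other cpus are pushing, which sidesteps the ABA problem in this sketch. That is a simplification made for the example and not a statement about how kern_slaballoc.c consumes the list. Note that writing `c_Next` dirties the cache line of the freed chunk on the freeing cpu, which is the cache mastership issue the commit message mentions.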

Summary of changes:
 sys/kern/kern_slaballoc.c |  483 ++++++++++++++++++++++++++++++---------------
 sys/sys/malloc.h          |    5 +-
 sys/sys/slaballoc.h       |   12 +-
 3 files changed, 338 insertions(+), 162 deletions(-)

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/5fee07e60dc4c041779e46c199f144a7f7c550ee


-- 
DragonFly BSD source repository


