    slub: make dead caches discard free slabs immediately · d6e0b7fa
    Vladimir Davydov authored
    To speed up further allocations, SLUB may store empty slabs in per-cpu/node
    partial lists instead of freeing them immediately.  This prevents the
    destruction of per-memcg caches, because kmem caches created for a memory
    cgroup are only destroyed after the last page charged to the cgroup is
    freed.
    
    To fix this issue, this patch resurrects the approach first proposed in
    [1].  It forbids SLUB from caching empty slabs after the memory cgroup
    that the cache belongs to has been destroyed.  This is achieved by
    setting kmem_cache's cpu_partial and min_partial constants to 0 and
    tuning put_cpu_partial() so that it drops frozen empty slabs immediately
    if cpu_partial = 0.
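
    The effect of zeroing the two limits can be illustrated with a toy
    userspace model.  This is only a sketch, not kernel code: struct
    toy_cache, toy_cache_deactivate() and toy_cache_put_empty_slab() are
    hypothetical names, the limits are counted in slabs for simplicity, and
    flushing of slabs that were cached before deactivation is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a cache; NOT the kernel's struct kmem_cache. */
struct toy_cache {
	unsigned int cpu_partial;      /* limit for the per-cpu partial list */
	unsigned long min_partial;     /* limit for the per-node partial list */
	unsigned int nr_cpu_partial;   /* empty slabs cached per cpu */
	unsigned long nr_node_partial; /* empty slabs cached per node */
	unsigned long nr_discarded;    /* empty slabs freed immediately */
};

/*
 * Hypothetical hook for "the owning memory cgroup went offline":
 * with both limits at 0, no empty slab can be cached any more.
 */
void toy_cache_deactivate(struct toy_cache *c)
{
	assert(c != NULL);
	c->cpu_partial = 0;
	c->min_partial = 0;
}

/*
 * An empty slab has just been released.  Cache it if one of the
 * limits allows it, otherwise discard it right away.
 * Returns true if the slab was cached, false if it was discarded.
 */
bool toy_cache_put_empty_slab(struct toy_cache *c)
{
	assert(c != NULL);
	if (c->nr_cpu_partial < c->cpu_partial) {
		c->nr_cpu_partial++;
		return true;
	}
	if (c->nr_node_partial < c->min_partial) {
		c->nr_node_partial++;
		return true;
	}
	c->nr_discarded++;
	return false;
}
```

    In this model, once toy_cache_deactivate() has run, every call to
    toy_cache_put_empty_slab() takes the discard path, which mirrors the
    behaviour described above for dead caches.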
    
    The runtime overhead is minimal.  Of all the hot functions, we only
    touch the relatively cold put_cpu_partial(): we make it call
    unfreeze_partials() after freezing a slab that belongs to an offline
    memory cgroup.  Since slab freezing exists to avoid moving slabs from/to a
    partial list on free/alloc, and there can't be allocations from dead
    caches, it shouldn't cause any overhead.  We do have to disable preemption
    for put_cpu_partial() to achieve that, though.
    
    The original patch was well received and was even merged into the mm
    tree.  However, I decided to withdraw it due to changes happening to the
    memcg core at that time.  I had an idea of introducing per-memcg
    shrinkers for kmem caches, but now, as memcg has finally settled down, I
    do not see it as an option, because a SLUB shrinker would be too costly
    to call since SLUB does not keep free slabs on a separate list.  Besides,
    we currently do not even call per-memcg shrinkers for offline memcgs.
    Overall, it would introduce much more complexity to both SLUB and memcg
    than this small patch.
    
    As for SLAB, there's no problem with it, because it shrinks
    per-cpu/node caches periodically.  Thanks to list_lru reparenting, we no
    longer keep entries for offline cgroups in per-memcg arrays (such as
    memcg_cache_params->memcg_caches), so we do not have to worry if a
    per-memcg cache is shrunk a bit later than it could be.
    
    [1] http://thread.gmane.org/gmane.linux.kernel.mm/118649/focus=118650

    Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>