mm: make counting of list_lru_one::nr_items lockless

When reclaiming the slab of a memcg, shrink_slab() iterates over all
registered shrinkers in the system and tries to count and consume the
objects related to the cgroup.  Under memory pressure this behaves
badly: on a RHEL7 kernel I observe high system time, with many
processes spending their time in list_lru_count_one().
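
For context, the counting path that dominates the first profile below
took the per-node spinlock just to read a counter, so every shrinker
counting concurrently serializes on nlru->lock.  Roughly (a simplified
sketch, not verbatim mm/list_lru.c):

    unsigned long list_lru_count_one(struct list_lru *lru, int nid,
                                     struct mem_cgroup *memcg)
    {
            struct list_lru_node *nlru = &lru->node[nid];
            struct list_lru_one *l;
            unsigned long count;

            /* Taken by every counting shrinker, though nothing is modified. */
            spin_lock(&nlru->lock);
            l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg));
            count = l->nr_items;
            spin_unlock(&nlru->lock);

            return count;
    }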
    
This patch makes list_lru_node::memcg_lrus RCU-protected, which allows
list_lru_count_one() to skip taking the spinlock.
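
A minimal sketch of the resulting pattern (simplified; the callback
name free_old_lrus_rcu is illustrative, not taken from the patch).
Readers dereference the per-memcg array inside an RCU read-side
section instead of under the spinlock; the writer still takes
nlru->lock to swap the array with rcu_assign_pointer(), and frees the
old copy only after a grace period, so a lockless reader never touches
freed memory:

    /* Read side: no spinlock, just an RCU read-side critical section. */
    unsigned long list_lru_count_one(struct list_lru *lru, int nid,
                                     struct mem_cgroup *memcg)
    {
            struct list_lru_node *nlru = &lru->node[nid];
            struct list_lru_one *l;
            unsigned long count;

            rcu_read_lock();
            l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg));
            count = l->nr_items;    /* an approximate count is fine for shrinkers */
            rcu_read_unlock();

            return count;
    }

    /* Write side (array-resize path): publish under the lock, free later. */
    spin_lock_irq(&nlru->lock);
    rcu_assign_pointer(nlru->memcg_lrus, new);  /* publish the new array */
    spin_unlock_irq(&nlru->lock);
    call_rcu(&old->rcu, free_old_lrus_rcu);     /* free the old array after a grace period */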
    
With the patch applied, Shakeel Butt observed a significant change in
the perf profile.  He says:
    
    ========================================================================
Setup: running a fork-bomb in a memcg of 200MiB on an 8GiB, 4-vCPU
VM and recording the trace with 'perf record -g -a'.
    
    The trace without the patch:
    
    +  34.19%     fb.sh  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
    +  30.77%     fb.sh  [kernel.kallsyms]  [k] _raw_spin_lock
    +   3.53%     fb.sh  [kernel.kallsyms]  [k] list_lru_count_one
    +   2.26%     fb.sh  [kernel.kallsyms]  [k] super_cache_count
    +   1.68%     fb.sh  [kernel.kallsyms]  [k] shrink_slab
    +   0.59%     fb.sh  [kernel.kallsyms]  [k] down_read_trylock
    +   0.48%     fb.sh  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
    +   0.38%     fb.sh  [kernel.kallsyms]  [k] shrink_node_memcg
    +   0.32%     fb.sh  [kernel.kallsyms]  [k] queue_work_on
    +   0.26%     fb.sh  [kernel.kallsyms]  [k] count_shadow_nodes
    
    With the patch:
    
    +   0.16%     swapper  [kernel.kallsyms]    [k] default_idle
    +   0.13%     oom_reaper  [kernel.kallsyms]    [k] mutex_spin_on_owner
    +   0.05%     perf  [kernel.kallsyms]    [k] copy_user_generic_string
    +   0.05%     init.real  [kernel.kallsyms]    [k] wait_consider_task
    +   0.05%     kworker/0:0  [kernel.kallsyms]    [k] finish_task_switch
    +   0.04%     kworker/2:1  [kernel.kallsyms]    [k] finish_task_switch
    +   0.04%     kworker/3:1  [kernel.kallsyms]    [k] finish_task_switch
    +   0.04%     kworker/1:0  [kernel.kallsyms]    [k] finish_task_switch
    +   0.03%     binary  [kernel.kallsyms]    [k] copy_page
    ========================================================================
    
Thanks to Shakeel for the testing.
    
    [ktkhai@virtuozzo.com: v2]
      Link: http://lkml.kernel.org/r/151203869520.3915.2587549826865799173.stgit@localhost.localdomain
    Link: http://lkml.kernel.org/r/150583358557.26700.8490036563698102569.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>