    bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4 · 695ba265
    Martin KaFai Lau authored
    After doing map_perf_test with a much bigger
    BPF_F_NO_COMMON_LRU map, the perf report shows a
    lot of time spent rotating the inactive list:
    > map_perf_test 32 8 10000 1000000 | awk '{sum += $3}END{print sum}'
    19644783 (19M/s)
    > map_perf_test 32 8 10000000 10000000 |  awk '{sum += $3}END{print sum}'
    6283930 (6.28M/s)
    An element being on the inactive list usually means it is
    not in the cache.  Hence, there is a need to tune the
    PERCPU_NR_SCANS value.
    This patch finds a better number of elements to
    scan during each list rotation.  The PERCPU_NR_SCANS (which
    is defined the same as PERCPU_FREE_TARGET) decreases
    from 16 elements to 4 elements.  This change only
    affects the BPF_F_NO_COMMON_LRU map.
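    The effect of the scan budget can be sketched in userspace.  This is a
    simplified, illustrative model, not the kernel's bpf_lru_list.c code:
    the `node` struct, `rotate_inactive()` helper, and the second-chance
    logic are assumptions for demonstration; only the PERCPU_NR_SCANS name
    and its new value of 4 come from the patch.

    ```c
    /* Illustrative sketch: a scan budget like PERCPU_NR_SCANS bounds the
     * work done per rotation of an inactive list.  Names and logic are
     * simplified stand-ins for the kernel implementation. */
    #include <stdio.h>

    #define PERCPU_NR_SCANS 4  /* the value this patch lowers from 16 to 4 */

    struct node {
        int ref;                /* "referenced" bit: set on a lookup hit */
        struct node *next;
    };

    /* Scan at most nr_scans nodes from the head of the inactive list.
     * A referenced node gets a second chance (its ref bit is cleared);
     * an unreferenced node counts as a free candidate.  Returns the
     * number of free candidates found within the budget. */
    static int rotate_inactive(struct node *head, int nr_scans)
    {
        int freed = 0;

        for (struct node *n = head; n && nr_scans--; n = n->next) {
            if (n->ref)
                n->ref = 0;     /* second chance: clear and keep */
            else
                freed++;        /* candidate to reuse */
        }
        return freed;
    }

    int main(void)
    {
        struct node nodes[16] = {0};

        for (int i = 0; i < 15; i++)
            nodes[i].next = &nodes[i + 1];
        nodes[3].ref = nodes[7].ref = 1;

        /* With a budget of 4, only nodes 0..3 are touched; node 3 is
         * referenced, so 3 free candidates are found. */
        printf("free candidates: %d\n",
               rotate_inactive(&nodes[0], PERCPU_NR_SCANS));
        return 0;
    }
    ```

    A smaller budget means each rotation touches fewer elements, which is
    the time the perf report above attributed to rotation; the trade-off
    is finding fewer free candidates per pass.
    
    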
    The test_lru_dist does not show a meaningful difference
    between 16 and 4.  Our production L4 load balancer, which uses
    the LRU map for conntracking, also shows little change in cache
    hit rate.  Since both the benchmark and the production data show no
    cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4.
    We can consider making it configurable if we find a usecase
    later that shows another value works better and/or use
    a different rotation strategy.
    After this change:
    > map_perf_test 32 8 10000000 10000000 |  awk '{sum += $3}END{print sum}'
    9240324 (9.2M/s)
    i.e. 6.28M/s -> 9.2M/s
    The test_lru_dist has not shown a meaningful difference:
    > test_lru_dist zipf.100k.a1_01.out 4000 1
    nr_misses: 31575 (Before) vs 31566 (After)
    > test_lru_dist zipf.100k.a0_01.out 40000 1
    nr_misses: 67036 (Before) vs 67031 (After)
    Signed-off-by: Martin KaFai Lau <kafai@fb.com>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
bpf_lru_list.c 17.5 KB