  • bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K · fdadd049
    Daniel Borkmann authored
    Michael and Sandipan report:
    
      Commit ede95a63 introduced a bpf_jit_limit tuneable to limit BPF
      JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
      and is adjusted again at init time if MODULES_VADDR is defined.
    
      For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
      the compile-time default at boot-time, which is 0x9c400000 when
      using 64K page size. This overflows the signed 32-bit bpf_jit_limit
      value:
    
      root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
      -1673527296
    
      and can cause various unexpected failures throughout the network
      stack. In one case `strace dhclient eth0` reported:
    
      setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
                 16) = -1 ENOTSUPP (Unknown error 524)
    
      and similar failures can be seen with tools like tcpdump. This doesn't
      always reproduce however, and I'm not sure why. The more consistent
      failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9
      host would time out on systemd/netplan configuring a virtio-net NIC
      with no noticeable errors in the logs.
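The arithmetic behind the negative sysctl value in the report can be checked with a small userspace sketch. It is not kernel code: `JIT_LIMIT_PAGES`, `jit_limit_default()` and `as_int32()` are hypothetical names used only for illustration, and the sketch assumes the default is computed as PAGE_SIZE * 40000 as the report states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the page multiplier named in the report. */
#define JIT_LIMIT_PAGES 40000ULL

/* Compile-time default in bytes, computed in 64 bits so nothing is lost. */
static uint64_t jit_limit_default(uint64_t page_size)
{
	assert(page_size > 0);
	return page_size * JIT_LIMIT_PAGES;
}

/*
 * Value seen when only the low 32 bits are kept and read back as a
 * signed (two's complement) integer, which models an int-typed sysctl.
 */
static int32_t as_int32(uint64_t v)
{
	uint32_t low = (uint32_t)v;

	if (low >= 0x80000000u)
		return (int32_t)((int64_t)low - 0x100000000LL);
	return (int32_t)low;
}
```

With a 64K page, `jit_limit_default(65536)` is 0x9c400000 (2621440000), which exceeds INT32_MAX, so `as_int32()` yields the -1673527296 shown in the report. With a 4K page the product is 163840000 and fits without wrapping.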
    
    Given this, and given that in the near future some architectures like
    arm64 will have a custom area for BPF JIT image allocations, we should
    get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
    4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec(),
    so add another overridable bpf_jit_alloc_exec_limit() helper
    function which returns the possible size of the memory area for deriving
    the default heuristic in bpf_jit_charge_init().
    
    Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
    bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
    JIT memory provider. Therefore, when archs implement their own
    module_alloc(), we use MODULES_{END,VADDR} for the limits; otherwise, for
    vmalloc_exec() cases like on ppc64, we use VMALLOC_{END,START}.
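The selection logic described above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel implementation: `struct arch_layout`, `jit_alloc_exec_limit()` and `jit_charge_init()` are hypothetical names, the layout is passed in at run time instead of coming from arch macros, and the "one quarter of the area, rounded up to a page and capped at LONG_MAX" heuristic is an assumption chosen for illustration, since the text only says that a default is derived from the area size.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of an architecture's address-space layout. */
struct arch_layout {
	int has_modules_area;          /* models "MODULES_VADDR is defined" */
	uint64_t modules_vaddr, modules_end;
	uint64_t vmalloc_start, vmalloc_end;
	uint64_t page_size;
};

/* Size of the area JIT images could come from: modules area if present,
 * otherwise the whole vmalloc range. */
static uint64_t jit_alloc_exec_limit(const struct arch_layout *a)
{
	assert(a != NULL);
	if (a->has_modules_area)
		return a->modules_end - a->modules_vaddr;
	return a->vmalloc_end - a->vmalloc_start;
}

/* Assumed heuristic: a quarter of the area, rounded up to a whole page,
 * capped so that it always fits in a (signed) long. */
static long jit_charge_init(const struct arch_layout *a)
{
	uint64_t quarter = jit_alloc_exec_limit(a) >> 2;
	uint64_t ps = a->page_size;
	uint64_t rounded = (quarter + ps - 1) / ps * ps;

	return rounded > (uint64_t)LONG_MAX ? LONG_MAX : (long)rounded;
}
```

Because the result is derived from the size of the area and stored as a long, the default no longer depends on a fixed page multiplier and cannot go negative for large page sizes.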
    
    Additionally, for archs supporting large page sizes, we should change
    the sysctl to be handled as long so that we do not run into sysctl
    restrictions in the future.
    
    Fixes: ede95a63 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
    Reported-by: Sandipan Das <sandipan@linux.ibm.com>
    Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>