• Daniel Borkmann's avatar
    kbuild: disable clang's default use of -fmerge-all-constants · 87e0d4f0
    Daniel Borkmann authored
    Prasad reported that he has seen crashes in BPF subsystem with netd
    on Android with arm64 in the form of (note, the taint is unrelated):
    
      [ 4134.721483] Unable to handle kernel paging request at virtual address 800000001
      [ 4134.820925] Mem abort info:
      [ 4134.901283]   Exception class = DABT (current EL), IL = 32 bits
      [ 4135.016736]   SET = 0, FnV = 0
      [ 4135.119820]   EA = 0, S1PTW = 0
      [ 4135.201431] Data abort info:
      [ 4135.301388]   ISV = 0, ISS = 0x00000021
      [ 4135.359599]   CM = 0, WnR = 0
      [ 4135.470873] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffe39b946000
      [ 4135.499757] [0000000800000001] *pgd=0000000000000000, *pud=0000000000000000
      [ 4135.660725] Internal error: Oops: 96000021 [#1] PREEMPT SMP
      [ 4135.674610] Modules linked in:
      [ 4135.682883] CPU: 5 PID: 1260 Comm: netd Tainted: G S      W       4.14.19+ #1
      [ 4135.716188] task: ffffffe39f4aa380 task.stack: ffffff801d4e0000
      [ 4135.731599] PC is at bpf_prog_add+0x20/0x68
      [ 4135.741746] LR is at bpf_prog_inc+0x20/0x2c
      [ 4135.751788] pc : [<ffffff94ab7ad584>] lr : [<ffffff94ab7ad638>] pstate: 60400145
      [ 4135.769062] sp : ffffff801d4e3ce0
      [...]
      [ 4136.258315] Process netd (pid: 1260, stack limit = 0xffffff801d4e0000)
      [ 4136.273746] Call trace:
      [...]
      [ 4136.442494] 3ca0: ffffff94ab7ad584 0000000060400145 ffffffe3a01bf8f8 0000000000000006
      [ 4136.460936] 3cc0: 0000008000000000 ffffff94ab844204 ffffff801d4e3cf0 ffffff94ab7ad584
      [ 4136.479241] [<ffffff94ab7ad584>] bpf_prog_add+0x20/0x68
      [ 4136.491767] [<ffffff94ab7ad638>] bpf_prog_inc+0x20/0x2c
      [ 4136.504536] [<ffffff94ab7b5d08>] bpf_obj_get_user+0x204/0x22c
      [ 4136.518746] [<ffffff94ab7ade68>] SyS_bpf+0x5a8/0x1a88
    
    Android's netd was basically pinning the uid cookie BPF map in BPF
    fs (/sys/fs/bpf/traffic_cookie_uid_map) and later on retrieving it
    again resulting in above panic. Issue is that the map was wrongly
    identified as a prog! Above kernel was compiled with clang 4.0,
    and it turns out that clang decided to merge the bpf_prog_iops and
    bpf_map_iops into a single memory location, such that the two i_ops
    could then not be distinguished anymore.
    
    Reason for this miscompilation is that clang has the more aggressive
    -fmerge-all-constants enabled by default. In fact, clang source code
    has a comment about it in lib/AST/ExprConstant.cpp on why it is okay
    to do so:
    
      Pointers with different bases cannot represent the same object.
      (Note that clang defaults to -fmerge-all-constants, which can
      lead to inconsistent results for comparisons involving the address
      of a constant; this generally doesn't matter in practice.)
    
    The issue never appeared with gcc however, since gcc does not enable
    -fmerge-all-constants by default and even *explicitly* states in
    it's option description that using this flag results in non-conforming
    behavior, quote from man gcc:
    
      Languages like C or C++ require each variable, including multiple
      instances of the same variable in recursive calls, to have distinct
      locations, so using this option results in non-conforming behavior.
    
    There are also various clang bug reports open on that matter [1],
    where clang developers acknowledge the non-conforming behavior,
    and refer to disabling it with -fno-merge-all-constants. But even
    if this gets fixed in clang today, there are already users out there
    that triggered this. Thus, fix this issue by explicitly adding
    -fno-merge-all-constants to the kernel's Makefile to generically
    disable this optimization, since potentially other places in the
    kernel could subtly break as well.
    
    Note, there is also a flag called -fmerge-constants (not supported
    by clang), which is more conservative and only applies to strings
    and it's enabled in gcc's -O/-O2/-O3/-Os optimization levels. In
    gcc's code, the two flags -fmerge-{all-,}constants share the same
    variable internally, so when disabling it via -fno-merge-all-constants,
    then we really don't merge any const data (e.g. strings), and text
    size increases with gcc (14,927,214 -> 14,942,646 for vmlinux.o).
    
      $ gcc -fverbose-asm -O2 foo.c -S -o foo.S
        -> foo.S lists -fmerge-constants under options enabled
      $ gcc -fverbose-asm -O2 -fno-merge-all-constants foo.c -S -o foo.S
        -> foo.S doesn't list -fmerge-constants under options enabled
      $ gcc -fverbose-asm -O2 -fno-merge-all-constants -fmerge-constants foo.c -S -o foo.S
        -> foo.S lists -fmerge-constants under options enabled
    
    Thus, as a workaround we need to set both -fno-merge-all-constants
    *and* -fmerge-constants in the Makefile in order for text size to
    stay as is.
    
      [1] https://bugs.llvm.org/show_bug.cgi?id=18538
    
    
    
    Reported-by: default avatarPrasad Sodagudi <psodagud@codeaurora.org>
    Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Chenbo Feng <fengc@google.com>
    Cc: Richard Smith <richard-llvm@metafoo.co.uk>
    Cc: Chandler Carruth <chandlerc@gmail.com>
    Cc: linux-kernel@vger.kernel.org
    Tested-by: default avatarPrasad Sodagudi <psodagud@codeaurora.org>
    Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
    Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    87e0d4f0