  1. Dec 08, 2021
  2. Dec 07, 2021
  3. Dec 04, 2021
    • net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero · 2be6d4d1
      Lee Jones authored
      
      Currently, due to the sequential use of min_t() and clamp_t() macros,
      in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is not set, the logic
      sets tx_max to 0.  This is then used to allocate the data area of the
      SKB requested later in cdc_ncm_fill_tx_frame().
      
      This does not cause an issue presently because when memory is
      allocated during initialisation phase of SKB creation, more memory
      (512b) is allocated than is required for the SKB headers alone (320b),
      leaving some space (512b - 320b = 192b) for CDC data (172b).
      
      However, if more elements (for example 3 x u64 = [24b]) were added to
      one of the SKB header structs, say 'struct skb_shared_info',
      increasing its original size (320b [320b aligned]) to something larger
      (344b [384b aligned]), then suddenly the CDC data (172b) no longer
      fits in the spare SKB data area (512b - 384b = 128b).
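
      The arithmetic above can be checked with a small sketch. The 64-byte
      alignment is an assumption chosen because it reproduces the quoted
      figures (320b stays 320b, 344b becomes 384b); the names align_up and
      spare_bytes are illustrative and are not kernel functions.

```python
ALIGNMENT = 64    # assumption: alignment that matches the figures in the message
ALLOCATED = 512   # bytes allocated at SKB creation (from the message)
CDC_DATA = 172    # bytes of CDC data that must fit (from the message)


def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def spare_bytes(header_size: int) -> int:
    """Data space left once the aligned header struct is accounted for."""
    return ALLOCATED - align_up(header_size, ALIGNMENT)


def cdc_data_fits(header_size: int) -> bool:
    """True when the spare area can still hold the CDC data."""
    return spare_bytes(header_size) >= CDC_DATA
```

      With a 320b header the spare area is 192b and the data fits; with a
      344b header it shrinks to 128b and the data no longer fits.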
      
      Consequently the SKB bounds check fails and the kernel panics:
      
        skbuff: skb_over_panic: text:ffffffff830a5b5f len:184 put:172   \
           head:ffff888119227c00 data:ffff888119227c00 tail:0xb8 end:0x80 dev:<NULL>
      
        ------------[ cut here ]------------
        kernel BUG at net/core/skbuff.c:110!
        RIP: 0010:skb_panic+0x14f/0x160 net/core/skbuff.c:106
        <snip>
        Call Trace:
         <IRQ>
         skb_over_panic+0x2c/0x30 net/core/skbuff.c:115
         skb_put+0x205/0x210 net/core/skbuff.c:1877
         skb_put_zero include/linux/skbuff.h:2270 [inline]
         cdc_ncm_ndp16 drivers/net/usb/cdc_ncm.c:1116 [inline]
         cdc_ncm_fill_tx_frame+0x127f/0x3d50 drivers/net/usb/cdc_ncm.c:1293
         cdc_ncm_tx_fixup+0x98/0xf0 drivers/net/usb/cdc_ncm.c:1514
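
      The numbers in the panic line can be replayed with a simplified model
      of the bounds check. This is a sketch, not the kernel's skb_put(): the
      starting tail of 12 bytes is inferred from len:184 minus put:172, and
      SkbOverPanic is a hypothetical stand-in for skb_over_panic().

```python
class SkbOverPanic(Exception):
    """Hypothetical stand-in for the kernel's skb_over_panic() BUG."""


def skb_put(tail: int, end: int, length: int) -> int:
    """Advance tail by length, refusing to run past the end of the data area."""
    new_tail = tail + length
    if new_tail > end:
        raise SkbOverPanic(f"put:{length} tail:{new_tail:#x} end:{end:#x}")
    return new_tail
```

      Calling skb_put(12, 0x80, 172) raises, because the new tail (0xb8)
      lies beyond the end of the data area (0x80), matching the log above.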
      
      By overriding the max value with the default CDC_NCM_NTB_MAX_SIZE_TX
      when not offered through the system provided params, we ensure enough
      data space is allocated to handle the CDC data, meaning no crash will
      occur.
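
      A minimal user-space model of the clamping logic, to illustrate why an
      unset dwNtbOutMaxSize yields tx_max == 0 and how falling back to the
      default avoids it. The function shape, the min_needed parameter and
      the value used for CDC_NCM_NTB_MAX_SIZE_TX are assumptions made for
      illustration; the driver code itself is not quoted in this message.

```python
CDC_NCM_NTB_MAX_SIZE_TX = 65536  # assumed value of the driver default


def clamp(value: int, low: int, high: int) -> int:
    """Mirror the kernel-style clamp: min(max(value, low), high)."""
    return min(max(value, low), high)


def check_tx_max(new_tx: int, ntb_out_max_size: int, min_needed: int,
                 apply_fix: bool) -> int:
    """Return the tx_max that results from a min() followed by a clamp()."""
    upper = min(CDC_NCM_NTB_MAX_SIZE_TX, ntb_out_max_size)
    if apply_fix and upper == 0:
        # dwNtbOutMaxSize was unset or zero: fall back to the default
        upper = CDC_NCM_NTB_MAX_SIZE_TX
    return clamp(new_tx, min_needed, upper)
```

      Without the fallback, check_tx_max(16384, 0, 2000, False) returns 0;
      with it, the requested 16384 is kept.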
      
      Cc: Oliver Neukum <oliver@neukum.org>
      Fixes: 289507d3 ("net: cdc_ncm: use sysfs for rx/tx aggregation tuning")
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Reviewed-by: Bjørn Mork <bjorn@mork.no>
      Link: https://lore.kernel.org/r/20211202143437.1411410-1-lee.jones@linaro.org
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2be6d4d1
    • qede: validate non LSO skb length · 8e227b19
      Manish Chopra authored
      
      Although it is unlikely that the stack would transmit a non-LSO
      skb with length > MTU, in some environments such occurrences have
      resulted in firmware asserts because the packet length exceeded
      the maximum supported by the device (~9700B).
      
      This patch adds the safeguard for such odd cases to avoid firmware
      asserts.
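
      The safeguard can be pictured as a length check on the transmit path.
      This is only a sketch: the function name, the is_lso flag and the
      exact limit are assumptions (the message gives the limit only as
      roughly 9700B), and the real driver may compute it differently.

```python
MAX_NON_LSO_LEN = 9700  # approximate device limit from the message; exact value assumed


def should_drop(skb_len: int, is_lso: bool,
                max_len: int = MAX_NON_LSO_LEN) -> bool:
    """Drop oversized non-LSO packets before they reach the firmware."""
    return (not is_lso) and skb_len > max_len
```

      LSO packets are exempt because the device segments them itself, so
      only non-LSO frames are compared against the limit.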
      
      v2: Added a "Fixes" tag pointing at one of the initial driver commits
          that actually enabled TX traffic (this has probably been an issue
          since day one and was only discovered recently in a customer
          environment)
      
      Fixes: a2ec6172 ("qede: Add support for link")
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Alok Prasad <palok@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Link: https://lore.kernel.org/r/20211203174413.13090-1-manishc@marvell.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8e227b19
  4. Dec 03, 2021
    • net: altera: set a couple error code in probe() · badd7857
      Dan Carpenter authored
      
      There are two error paths which accidentally return success instead of
      a negative error code.
      
      Fixes: bbd2190c ("Altera TSE: Add main and header file for Altera Ethernet Driver")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      badd7857
    • net: bcm4908: Handle dma_set_coherent_mask error codes · 128f6ec9
      Jiasheng Jiang authored
      
      dma_set_coherent_mask() does not always return 0.
      Check its return value to catch the case where DMA does not
      support the mask.
      
      Fixes: 9d61d138 ("net: broadcom: rename BCM4908 driver & update DT binding")
      Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      128f6ec9
    • selftests: net/fcnal-test.sh: add exit code · 0f8a3b48
      Li Zhijian authored
      
      Previously, the selftest framework always treated the run as *ok* even
      when some of the tests had actually failed, because the script always
      returned 0.
      
      It now supports PASS/FAIL/SKIP exit codes.
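
      One way to derive such an exit code from test counters is sketched
      below. The numeric values follow the kselftest convention (0 = pass,
      1 = fail, 4 = skip); the mapping from counters to a code, and the
      function name, are assumptions rather than the script's actual logic.

```python
KSFT_PASS = 0
KSFT_FAIL = 1
KSFT_SKIP = 4


def summary_exit_code(n_pass: int, n_fail: int) -> int:
    """Map pass/fail counters to a kselftest-style exit code."""
    if n_fail > 0:
        return KSFT_FAIL   # any failure fails the whole run
    if n_pass == 0:
        return KSFT_SKIP   # nothing ran successfully: report a skip
    return KSFT_PASS
```

      A script would finish with sys.exit(summary_exit_code(n_pass, n_fail))
      so that the framework sees the real outcome instead of a constant 0.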
      
      CC: Philip Li <philip.li@intel.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0f8a3b48
    • bonding: make tx_rebalance_counter an atomic · dac8e00f
      Eric Dumazet authored
      
      KCSAN reported a data-race [1] around tx_rebalance_counter
      which can be accessed from different contexts, without
      the protection of a lock/mutex.
      
      [1]
      BUG: KCSAN: data-race in bond_alb_init_slave / bond_alb_monitor
      
      write to 0xffff888157e8ca24 of 4 bytes by task 7075 on cpu 0:
       bond_alb_init_slave+0x713/0x860 drivers/net/bonding/bond_alb.c:1613
       bond_enslave+0xd94/0x3010 drivers/net/bonding/bond_main.c:1949
       do_set_master net/core/rtnetlink.c:2521 [inline]
       __rtnl_newlink net/core/rtnetlink.c:3475 [inline]
       rtnl_newlink+0x1298/0x13b0 net/core/rtnetlink.c:3506
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5571
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2491
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5589
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x5fc/0x6c0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x6e1/0x7d0 net/netlink/af_netlink.c:1916
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2409
       ___sys_sendmsg net/socket.c:2463 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2492
       __do_sys_sendmsg net/socket.c:2501 [inline]
       __se_sys_sendmsg net/socket.c:2499 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2499
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888157e8ca24 of 4 bytes by task 1082 on cpu 1:
       bond_alb_monitor+0x8f/0xc00 drivers/net/bonding/bond_alb.c:1511
       process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
       worker_thread+0x616/0xa70 kernel/workqueue.c:2445
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00000001 -> 0x00000064
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 1082 Comm: kworker/u4:3 Not tainted 5.16.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: bond1 bond_alb_monitor
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dac8e00f
    • tcp: fix another uninit-value (sk_rx_queue_mapping) · 03cfda4f
      Eric Dumazet authored
      KMSAN is still not happy [1].
      
      I missed that passive connections do not inherit their
      sk_rx_queue_mapping values from the request socket,
      but instead tcp_child_process() is calling
      sk_mark_napi_id(child, skb)
      
      We have many sk_mark_napi_id() callers, so I am providing
      a new helper that forces the setting of both sk_rx_queue_mapping
      and sk_napi_id.
      
      Note that we had no KMSAN report for sk_napi_id because
      passive connections got a copy of this field from the listener.
      sk_rx_queue_mapping, on the other hand, is inside the
      sk_dontcopy_begin/sk_dontcopy_end section, so sk_clone_lock()
      leaves this field uninitialized.
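
      The effect of a "dontcopy" region can be illustrated with a toy model
      of cloning. Everything here is hypothetical (the Sock class, the
      DONTCOPY set, the UNINIT marker and the helper name); it only shows
      why a field excluded from the copy must be set explicitly afterwards.

```python
from dataclasses import dataclass, fields

UNINIT = object()  # marker for "never written", standing in for uninitialized memory


@dataclass
class Sock:
    napi_id: object = UNINIT            # copied from the listener on clone
    rx_queue_mapping: object = UNINIT   # lives in the "dontcopy" region


DONTCOPY = {"rx_queue_mapping"}


def clone(parent: Sock) -> Sock:
    """Copy every field except those in the dontcopy region."""
    child = Sock()
    for f in fields(Sock):
        if f.name not in DONTCOPY:
            setattr(child, f.name, getattr(parent, f.name))
    return child


def mark_napi_id_set(sk: Sock, napi_id: int, rx_queue: int) -> None:
    """Hypothetical helper that unconditionally sets both fields."""
    sk.napi_id = napi_id
    sk.rx_queue_mapping = rx_queue
```

      After clone(), napi_id carries the listener's value while
      rx_queue_mapping is still UNINIT until the helper writes it.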
      
      We might remove dead code populating req->sk_rx_queue_mapping
      in the future.
      
      [1]
      
      BUG: KMSAN: uninit-value in __sk_rx_queue_set include/net/sock.h:1924 [inline]
      BUG: KMSAN: uninit-value in sk_rx_queue_update include/net/sock.h:1938 [inline]
      BUG: KMSAN: uninit-value in sk_mark_napi_id include/net/busy_poll.h:136 [inline]
      BUG: KMSAN: uninit-value in tcp_child_process+0...
      03cfda4f
    • inet: use #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING consistently · a9418924
      Eric Dumazet authored
      
      Since commit 4e1beecc ("net/sock: Add kernel config
      SOCK_RX_QUEUE_MAPPING"),
      sk_rx_queue_mapping access is guarded by CONFIG_SOCK_RX_QUEUE_MAPPING.
      
      Fixes: 54b92e84 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Tariq Toukan <tariqt@nvidia.com>
      Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9418924
    • selftests/tc-testing: Fix cannot create /sys/bus/netdevsim/new_device: Directory nonexistent · db925bca
      Li Zhijian authored
      
      Install netdevsim to provide /sys/bus/netdevsim/new_device interface.
      
      It helps to fix:
       # ok 97 9a7d - Change ETS strict band without quantum # skipped - skipped - previous setup failed 11 ce7d
       #
       #
       # -----> prepare stage *** Could not execute: "echo "1 1 4" > /sys/bus/netdevsim/new_device"
       #
       # -----> prepare stage *** Error message: "/bin/sh: 1: cannot create /sys/bus/netdevsim/new_device: Directory nonexistent
       # "
       #
       # -----> prepare stage *** Aborting test run.
       #
       #
       # <_io.BufferedReader name=5> *** stdout ***
       #
      
      Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      db925bca
    • selftests/tc-testing: add missing config · a8c9505c
      Li Zhijian authored
      
      qdiscs/fq_pie requires CONFIG_NET_SCH_FQ_PIE, otherwise tc will fail
      to create a fq_pie qdisc.
      
      It fixes the following issue:
       # not ok 57 83be - Create FQ-PIE with invalid number of flows
       #       Command exited with 2, expected 0
       # Error: Specified qdisc not found.
      
      Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a8c9505c
    • selftests/tc-testing: add exit code · 96f38967
      Li Zhijian authored
      
      Mark the summary result as FAIL so that the selftest framework is
      not misled when some of the tests have failed.
      
      Previously, the selftest framework always treated the run as *ok* even
      when some of the tests had actually failed, because the script tdc.sh
      always returned 0.
      
       # All test results:
       #
       # 1..97
       # ok 1 83be - Create FQ-PIE with invalid number of flows
       # ok 2 8b6e - Create RED with no flags
      [...snip]
       # ok 6 5f15 - Create RED with flags ECN, harddrop
       # ok 7 53e8 - Create RED with flags ECN, nodrop
       # ok 8 d091 - Fail to create RED with only nodrop flag
       # ok 9 af8e - Create RED with flags ECN, nodrop, harddrop
       # not ok 10 ce7d - Add mq Qdisc to multi-queue device (4 queues)
       #       Could not match regex pattern. Verify command output:
       # qdisc mq 1: root
       # qdisc fq_codel 0: parent 1:4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
       # qdisc fq_codel 0: parent 1:3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
      [...snip]
       # ok 96 6979 - Change quantum of a strict ETS band
       # ok 97 9a7d - Change ETS strict band without quantum
       #
       #
       #
       #
       ok 1 selftests: tc-testing: tdc.sh <<< summary result
      
      CC: Philip Li <philip.li@intel.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
      Acked-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      96f38967
    • selftests/fib_tests: Rework fib_rp_filter_test() · f6071e5e
      Peilin Ye authored
      Currently rp_filter tests in fib_tests.sh:fib_rp_filter_test() are
      failing.  ping sockets are bound to dummy1 using the "-I" option
      (SO_BINDTODEVICE), but socket lookup is failing when receiving ping
      replies, since the routing table thinks they belong to dummy0.
      
      For example, suppose ping is using a SOCK_RAW socket for ICMP messages.
      When receiving ping replies, in __raw_v4_lookup(), sk->sk_bound_dev_if
      is 3 (dummy1), but dif (skb_rtable(skb)->rt_iif) says 2 (dummy0), so the
      raw_sk_bound_dev_eq() check fails.  Similar things happen in
      ping_lookup() for SOCK_DGRAM sockets.
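
      The failing comparison can be reduced to a small predicate. This is a
      simplified sketch that assumes an unbound socket (index 0) matches any
      device and ignores other details of the real lookup; the function
      name is illustrative and is not the kernel helper.

```python
def bound_dev_matches(bound_dev_if: int, dif: int) -> bool:
    """An unbound socket (0) matches any device; a bound one must match dif."""
    return bound_dev_if == 0 or bound_dev_if == dif
```

      With the values from the example, bound_dev_matches(3, 2) is False, so
      the reply is never delivered to the socket bound to dummy1.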
      
      These tests used to pass due to a bug [1] in iputils, where "ping -I"
      did not actually bind ICMP message sockets to the device.  The bug was
      fixed by iputils commit f455fee41c07 ("ping: also bind the ICMP socket
      to the specific device") in 2016, which is why our rp_filter tests
      started to fail.  See [2].
      
      Fixing the tests while keeping everything in one netns turns out to be
      nontrivial.  Rework the tests and build the following topology:
      
       ┌─────────────────────────────┐    ┌─────────────────────────────┐
       │  network namespace 1 (ns1)  │    │  network namespace 2 (ns2)  │
       │                             │    │                             │
       │  ┌────┐     ┌─────┐         │    │  ┌─────┐            ┌────┐  │
       │  │ lo │<───>│veth1│<────────┼────┼─>│veth2│<──────────>│ lo │  │
       │  └────┘     ├─────┴──────┐  │    │  ├─────┴──────┐     └────┘  │
       │             │192.0.2.1/24│  │    │  │192.0.2.1/24│             │
       │             └────────────┘  │    │  └────────────┘             │
       └─────────────────────────────┘    └─────────────────────────────┘
      
      Consider sending an ICMP_ECHO packet A in ns2.  Both source and
      destination IP addresses are 192.0.2.1, and we use strict mode rp_filter
      in both ns1 and ns2:
      
        1. A is routed to lo since its destination IP address is one of ns2's
           local addresses (veth2);
        2. A is redirected from lo's egress to veth2's egress using mirred;
        3. A arrives at veth1's ingress in ns1;
        4. A is redirected from veth1's ingress to lo's ingress, again, using
           mirred;
        5. In __fib_validate_source(), fib_info_nh_uses_dev() returns false,
           since A was received on lo, but reverse path lookup says veth1;
        6. However A is not dropped since we have relaxed this check for lo in
           commit 66f82095 ("fib: relax source validation check for loopback
           packets");
      
      Making sure A is not dropped here in this corner case is the whole point
      of having this test.
      
        7. As A reaches the ICMP layer, an ICMP_ECHOREPLY packet, B, is
           generated;
        8. Similarly, B is redirected from lo's egress to veth1's egress (in
           ns1), then redirected once again from veth2's ingress to lo's
           ingress (in ns2), using mirred.
      
      Also test "ping 127.0.0.1" from ns2.  It does not trigger the relaxed
      check in __fib_validate_source(); it is there just to make sure the
      topology works with loopback addresses.
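
      The relaxed check exercised by steps 5 and 6 can be sketched as a
      predicate over the ingress device and the device found by the reverse
      path lookup. This is a deliberately simplified model; the function
      name and the relax_loopback switch are assumptions for illustration.

```python
def source_is_valid(ingress_dev: str, reverse_path_dev: str,
                    strict: bool = True, relax_loopback: bool = True) -> bool:
    """Strict rp_filter: accept only if the reverse path uses the ingress
    device, except that packets received on lo are let through when the
    loopback relaxation is in effect."""
    if not strict:
        return True
    if ingress_dev == reverse_path_dev:
        return True
    return relax_loopback and ingress_dev == "lo"
```

      Packet A arrives on lo while the reverse path says veth1, so it is
      accepted only because of the loopback relaxation.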
      
      Tested with ping from iputils 20210722-41-gf9fb573:
      
      $ ./fib_tests.sh -t rp_filter
      
      IPv4 rp_filter tests
          TEST: rp_filter passes local packets		[ OK ]
          TEST: rp_filter passes loopback packets		[ OK ]
      
      [1] https://github.com/iputils/iputils/issues/55
      [2] https://github.com/iputils/iputils/commit/f455fee41c077d4b700a473b2f5b3487b8febc1d
      
      
      
      Reported-by: Hangbin Liu <liuhangbin@gmail.com>
      Fixes: adb701d6 ("selftests: add a test case for rp_filter")
      Reviewed-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Acked-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20211201004720.6357-1-yepeilin.cs@gmail.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f6071e5e
  5. Dec 02, 2021