1. 28 Feb, 2019 1 commit
    • Sheng Lan's avatar
      net: netem: fix skb length BUG_ON in __skb_to_sgvec · 5845f706
      Sheng Lan authored
      It can be reproduced by following steps:
      1. virtio_net NIC is configured with gso/tso on
      2. configure nginx as http server with an index file bigger than 1M bytes
      3. use tc netem to produce duplicate packets and delay:
         tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90%
      4. continually curl the nginx http server to get index file on client
      5. BUG_ON is seen quickly
      
      [10258690.371129] kernel BUG at net/core/skbuff.c:4028!
      [10258690.371748] invalid opcode: 0000 [#1] SMP PTI
      [10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.0.0-rc6 #2
      [10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
      [10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
      [10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
      [10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
      [10258690.372094] FS:  0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
      [10258690.372094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
      [10258690.372094] Call Trace:
      [10258690.372094]  <IRQ>
      [10258690.372094]  skb_to_sgvec+0x11/0x40
      [10258690.372094]  start_xmit+0x38c/0x520 [virtio_net]
      [10258690.372094]  dev_hard_start_xmit+0x9b/0x200
      [10258690.372094]  sch_direct_xmit+0xff/0x260
      [10258690.372094]  __qdisc_run+0x15e/0x4e0
      [10258690.372094]  net_tx_action+0x137/0x210
      [10258690.372094]  __do_softirq+0xd6/0x2a9
      [10258690.372094]  irq_exit+0xde/0xf0
      [10258690.372094]  smp_apic_timer_interrupt+0x74/0x140
      [10258690.372094]  apic_timer_interrupt+0xf/0x20
      [10258690.372094]  </IRQ>
      
      In __skb_to_sgvec(), the skb->len is not equal to the sum of the skb's
      linear data size and nonlinear data size, thus BUG_ON triggered.
      Because the skb is cloned and a part of nonlinear data is split off.
      
      Duplicate packet is cloned in netem_enqueue() and may be delayed
      some time in qdisc. When qdisc len reached the limit and returns
      NET_XMIT_DROP, the skb will be retransmit later in write queue.
      the skb will be fragmented by tso_fragment(), the limit size
      that depends on cwnd and mss decrease, the skb's nonlinear
      data will be split off. The length of the skb cloned by netem
      will not be updated. When we use virtio_net NIC and invoke skb_to_sgvec(),
      the BUG_ON trigger.
      
      To fix it, netem returns NET_XMIT_SUCCESS to upper stack
      when it clones a duplicate packet.
      
      Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()")
      Signed-off-by: default avatarSheng Lan <lansheng@huawei.com>
      Reported-by: default avatarQin Ji <jiqin.ji@huawei.com>
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5845f706
  2. 25 Feb, 2019 3 commits
    • Vlad Buslov's avatar
      net: sched: act_tunnel_key: fix NULL pointer dereference during init · a3df633a
      Vlad Buslov authored
      Metadata pointer is only initialized for action TCA_TUNNEL_KEY_ACT_SET, but
      it is unconditionally dereferenced in tunnel_key_init() error handler.
      Verify that metadata pointer is not NULL before dereferencing it in
      tunnel_key_init error handling code.
      
      Fixes: ee28bb56 ("net/sched: fix memory leak in act_tunnel_key_init()")
      Signed-off-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Reviewed-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a3df633a
    • Davide Caratti's avatar
      net/sched: act_skbedit: fix refcount leak when replace fails · 6191da98
      Davide Caratti authored
      when act_skbedit was converted to use RCU in the data plane, we added an
      error path, but we forgot to drop the action refcount in case of failure
      during a 'replace' operation:
      
       # tc actions add action skbedit ptype otherhost pass index 100
       # tc action show action skbedit
       total acts 1
      
               action order 0: skbedit  ptype otherhost pass
                index 100 ref 1 bind 0
       # tc actions replace action skbedit ptype otherhost drop index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action show action skbedit
       total acts 1
      
               action order 0: skbedit  ptype otherhost pass
                index 100 ref 2 bind 0
      
      Ensure we call tcf_idr_release(), in case 'params_new' allocation failed,
      also when the action is being replaced.
      
      Fixes: c749cdda ("net/sched: act_skbedit: don't use spinlock in the data path")
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6191da98
    • Davide Caratti's avatar
      net/sched: act_ipt: fix refcount leak when replace fails · 8f67c90e
      Davide Caratti authored
      After commit 4e8ddd7f ("net: sched: don't release reference on action
      overwrite"), the error path of all actions was converted to drop refcount
      also when the action was being overwritten. But we forgot act_ipt_init(),
      in case allocation of 'tname' was not successful:
      
       # tc action add action xt -j LOG --log-prefix hello index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "hello" index 100
       # tc action show action xt
       total acts 1
      
               action order 0: tablename: mangle  hook: NF_IP_POST_ROUTING
               target  LOG level warning prefix "hello"
               index 100 ref 1 bind 0
       # tc action replace action xt -j LOG --log-prefix world index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "world" index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action show action xt
       total acts 1
      
               action order 0: tablename: mangle  hook: NF_IP_POST_ROUTING
               target  LOG level warning prefix "hello"
               index 100 ref 2 bind 0
      
      Ensure we call tcf_idr_release(), in case 'tname' allocation failed, also
      when the action is being replaced.
      
      Fixes: 4e8ddd7f ("net: sched: don't release reference on action overwrite")
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8f67c90e
  3. 12 Feb, 2019 3 commits
    • Cong Wang's avatar
      net_sched: fix two more memory leaks in cls_tcindex · 1db817e7
      Cong Wang authored
      struct tcindex_filter_result contains two parts:
      struct tcf_exts and struct tcf_result.
      
      For the local variable 'cr', its exts part is never used but
      initialized without being released properly on success path. So
      just completely remove the exts part to fix this leak.
      
      For the local variable 'new_filter_result', it is never properly
      released if not used by 'r' on success path.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1db817e7
    • Cong Wang's avatar
      net_sched: fix a memory leak in cls_tcindex · 033b228e
      Cong Wang authored
      When tcindex_destroy() destroys all the filter results in
      the perfect hash table, it invokes the walker to delete
      each of them. However, results with class==0 are skipped
      in either tcindex_walk() or tcindex_delete(), which causes
      a memory leak reported by kmemleak.
      
      This patch fixes it by skipping the walker and directly
      deleting these filter results so we don't miss any filter
      result.
      
      As a result of this change, we have to initialize exts->net
      properly in tcindex_alloc_perfect_hash(). For net-next, we
      need to consider whether we should initialize ->net in
      tcf_exts_init() instead, before that just directly test
      CONFIG_NET_CLS_ACT=y.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      033b228e
    • Cong Wang's avatar
      net_sched: fix a race condition in tcindex_destroy() · 8015d93e
      Cong Wang authored
      tcindex_destroy() invokes tcindex_destroy_element() via
      a walker to delete each filter result in its perfect hash
      table, and tcindex_destroy_element() calls tcindex_delete()
      which schedules tcf RCU works to do the final deletion work.
      Unfortunately this races with the RCU callback
      __tcindex_destroy(), which could lead to use-after-free as
      reported by Adrian.
      
      Fix this by migrating this RCU callback to tcf RCU work too,
      as that workqueue is ordered, we will not have use-after-free.
      
      Note, we don't need to hold netns refcnt because we don't call
      tcf_exts_destroy() here.
      
      Fixes: 27ce4f05 ("net_sched: use tcf_queue_work() in tcindex filter")
      Reported-by: default avatarAdrian <bugs@abtelecom.ro>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8015d93e
  4. 11 Feb, 2019 1 commit
  5. 04 Feb, 2019 1 commit
  6. 17 Jan, 2019 1 commit
  7. 16 Jan, 2019 5 commits
  8. 20 Dec, 2018 1 commit
  9. 15 Dec, 2018 1 commit
  10. 14 Dec, 2018 1 commit
  11. 10 Dec, 2018 1 commit
  12. 09 Dec, 2018 1 commit
  13. 06 Dec, 2018 1 commit
  14. 05 Dec, 2018 2 commits
  15. 01 Dec, 2018 2 commits
    • Paul E. McKenney's avatar
      net/sched: Replace call_rcu_bh() and rcu_barrier_bh() · ae0e3349
      Paul E. McKenney authored
      Now that call_rcu()'s callback is not invoked until after bh-disable
      regions of code have completed (in addition to explicitly marked
      RCU read-side critical sections), call_rcu() can be used in place
      of call_rcu_bh().  Similarly, rcu_barrier() can be used in place o
      frcu_barrier_bh().  This commit therefore makes these changes.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.ibm.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: <netdev@vger.kernel.org>
      ae0e3349
    • Davide Caratti's avatar
      net/sched: act_police: fix memory leak in case of invalid control action · fd6d4338
      Davide Caratti authored
      when users set an invalid control action, kmemleak complains as follows:
      
       # echo clear >/sys/kernel/debug/kmemleak
       # ./tdc.py -e b48b
       Test b48b: Add police action with exceed goto chain control action
       All test results:
      
       1..1
       ok 1 - b48b # Add police action with exceed goto chain control action
       about to flush the tap output if tests need to be skipped
       done flushing skipped test tap output
       # echo scan >/sys/kernel/debug/kmemleak
       # cat /sys/kernel/debug/kmemleak
       unreferenced object 0xffffa0fafbc3dde0 (size 96):
        comm "tc", pid 2358, jiffies 4294922738 (age 17.022s)
        hex dump (first 32 bytes):
          2a 00 00 20 00 00 00 00 00 00 7d 00 00 00 00 00  *.. ......}.....
          f8 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000648803d2>] tcf_action_init_1+0x384/0x4c0
          [<00000000cb69382e>] tcf_action_init+0x12b/0x1a0
          [<00000000847ef0d4>] tcf_action_add+0x73/0x170
          [<0000000093656e14>] tc_ctl_action+0x122/0x160
          [<0000000023c98e32>] rtnetlink_rcv_msg+0x263/0x2d0
          [<000000003493ae9c>] netlink_rcv_skb+0x4d/0x130
          [<00000000de63f8ba>] netlink_unicast+0x209/0x2d0
          [<00000000c3da0ebe>] netlink_sendmsg+0x2c1/0x3c0
          [<000000007a9e0753>] sock_sendmsg+0x33/0x40
          [<00000000457c6d2e>] ___sys_sendmsg+0x2a0/0x2f0
          [<00000000c5c6a086>] __sys_sendmsg+0x5e/0xa0
          [<00000000446eafce>] do_syscall_64+0x5b/0x180
          [<000000004aa871f2>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
          [<00000000450c38ef>] 0xffffffffffffffff
      
      change tcf_police_init() to avoid leaking 'new' in case TCA_POLICE_RESULT
      contains TC_ACT_GOTO_CHAIN extended action.
      
      Fixes: c08f5ed5 ("net/sched: act_police: disallow 'goto chain' on fallback control action")
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fd6d4338
  16. 30 Nov, 2018 1 commit
    • Christoph Paasch's avatar
      net: Prevent invalid access to skb->prev in __qdisc_drop_all · 9410d386
      Christoph Paasch authored
      __qdisc_drop_all() accesses skb->prev to get to the tail of the
      segment-list.
      
      With commit 68d2f84a ("net: gro: properly remove skb from list")
      the skb-list handling has been changed to set skb->next to NULL and set
      the list-poison on skb->prev.
      
      With that change, __qdisc_drop_all() will panic when it tries to
      dereference skb->prev.
      
      Since commit 992cba7e ("net: Add and use skb_list_del_init().")
      __list_del_entry is used, leaving skb->prev unchanged (thus,
      pointing to the list-head if it's the first skb of the list).
      This will make __qdisc_drop_all modify the next-pointer of the list-head
      and result in a panic later on:
      
      [   34.501053] general protection fault: 0000 [#1] SMP KASAN PTI
      [   34.501968] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.20.0-rc2.mptcp #108
      [   34.502887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
      [   34.504074] RIP: 0010:dev_gro_receive+0x343/0x1f90
      [   34.504751] Code: e0 48 c1 e8 03 42 80 3c 30 00 0f 85 4a 1c 00 00 4d 8b 24 24 4c 39 65 d0 0f 84 0a 04 00 00 49 8d 7c 24 38 48 89 f8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 04
      [   34.507060] RSP: 0018:ffff8883af507930 EFLAGS: 00010202
      [   34.507761] RAX: 0000000000000007 RBX: ffff8883970b2c80 RCX: 1ffff11072e165a6
      [   34.508640] RDX: 1ffff11075867008 RSI: ffff8883ac338040 RDI: 0000000000000038
      [   34.509493] RBP: ffff8883af5079d0 R08: ffff8883970b2d40 R09: 0000000000000062
      [   34.510346] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000000
      [   34.511215] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8883ac338008
      [   34.512082] FS:  0000000000000000(0000) GS:ffff8883af500000(0000) knlGS:0000000000000000
      [   34.513036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   34.513741] CR2: 000055ccc3e9d020 CR3: 00000003abf32000 CR4: 00000000000006e0
      [   34.514593] Call Trace:
      [   34.514893]  <IRQ>
      [   34.515157]  napi_gro_receive+0x93/0x150
      [   34.515632]  receive_buf+0x893/0x3700
      [   34.516094]  ? __netif_receive_skb+0x1f/0x1a0
      [   34.516629]  ? virtnet_probe+0x1b40/0x1b40
      [   34.517153]  ? __stable_node_chain+0x4d0/0x850
      [   34.517684]  ? kfree+0x9a/0x180
      [   34.518067]  ? __kasan_slab_free+0x171/0x190
      [   34.518582]  ? detach_buf+0x1df/0x650
      [   34.519061]  ? lapic_next_event+0x5a/0x90
      [   34.519539]  ? virtqueue_get_buf_ctx+0x280/0x7f0
      [   34.520093]  virtnet_poll+0x2df/0xd60
      [   34.520533]  ? receive_buf+0x3700/0x3700
      [   34.521027]  ? qdisc_watchdog_schedule_ns+0xd5/0x140
      [   34.521631]  ? htb_dequeue+0x1817/0x25f0
      [   34.522107]  ? sch_direct_xmit+0x142/0xf30
      [   34.522595]  ? virtqueue_napi_schedule+0x26/0x30
      [   34.523155]  net_rx_action+0x2f6/0xc50
      [   34.523601]  ? napi_complete_done+0x2f0/0x2f0
      [   34.524126]  ? kasan_check_read+0x11/0x20
      [   34.524608]  ? _raw_spin_lock+0x7d/0xd0
      [   34.525070]  ? _raw_spin_lock_bh+0xd0/0xd0
      [   34.525563]  ? kvm_guest_apic_eoi_write+0x6b/0x80
      [   34.526130]  ? apic_ack_irq+0x9e/0xe0
      [   34.526567]  __do_softirq+0x188/0x4b5
      [   34.527015]  irq_exit+0x151/0x180
      [   34.527417]  do_IRQ+0xdb/0x150
      [   34.527783]  common_interrupt+0xf/0xf
      [   34.528223]  </IRQ>
      
      This patch makes sure that skb->prev is set to NULL when entering
      netem_enqueue.
      
      Cc: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Fixes: 68d2f84a ("net: gro: properly remove skb from list")
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9410d386
  17. 23 Nov, 2018 1 commit
    • Davide Caratti's avatar
      net/sched: act_police: add missing spinlock initialization · 484afd1b
      Davide Caratti authored
      commit f2cbd485 ("net/sched: act_police: fix race condition on state
      variables") introduces a new spinlock, but forgets its initialization.
      Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police'
      action is newly created, to avoid the following lockdep splat:
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       <...>
       Call Trace:
        dump_stack+0x85/0xcb
        register_lock_class+0x581/0x590
        __lock_acquire+0xd4/0x1330
        ? tcf_police_init+0x2fa/0x650 [act_police]
        ? lock_acquire+0x9e/0x1a0
        lock_acquire+0x9e/0x1a0
        ? tcf_police_init+0x2fa/0x650 [act_police]
        ? tcf_police_init+0x55a/0x650 [act_police]
        _raw_spin_lock_bh+0x34/0x40
        ? tcf_police_init+0x2fa/0x650 [act_police]
        tcf_police_init+0x2fa/0x650 [act_police]
        tcf_action_init_1+0x384/0x4c0
        tcf_action_init+0xf6/0x160
        tcf_action_add+0x73/0x170
        tc_ctl_action+0x122/0x160
        rtnetlink_rcv_msg+0x2a4/0x490
        ? netlink_deliver_tap+0x99/0x400
        ? validate_linkmsg+0x370/0x370
        netlink_rcv_skb+0x4d/0x130
        netlink_unicast+0x196/0x230
        netlink_sendmsg+0x2e5/0x3e0
        sock_sendmsg+0x36/0x40
        ___sys_sendmsg+0x280/0x2f0
        ? _raw_spin_unlock+0x24/0x30
        ? handle_pte_fault+0xafe/0xf30
        ? find_held_lock+0x2d/0x90
        ? syscall_trace_enter+0x1df/0x360
        ? __sys_sendmsg+0x5e/0xa0
        __sys_sendmsg+0x5e/0xa0
        do_syscall_64+0x60/0x210
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7f1841c7cf10
       Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
       RSP: 002b:00007ffcf9df4d68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1841c7cf10
       RDX: 0000000000000000 RSI: 00007ffcf9df4dc0 RDI: 0000000000000003
       RBP: 000000005bf56105 R08: 0000000000000002 R09: 00007ffcf9df8edc
       R10: 00007ffcf9df47e0 R11: 0000000000000246 R12: 0000000000671be0
       R13: 00007ffcf9df4e84 R14: 0000000000000008 R15: 0000000000000000
      
      Fixes: f2cbd485 ("net/sched: act_police: fix race condition on state variables")
      Reported-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      484afd1b
  18. 20 Nov, 2018 5 commits
  19. 17 Nov, 2018 8 commits