1. 08 Jan, 2019 1 commit
    • Ido Schimmel's avatar
      net: bridge: Fix VLANs memory leak · 27973793
      Ido Schimmel authored
      When adding / deleting VLANs to / from a bridge port, the bridge driver
      first tries to propagate the information via switchdev and falls back to
      the 8021q driver in case the underlying driver does not support
      switchdev. This can result in a memory leak [1] when VXLAN and mlxsw
      ports are enslaved to the bridge:
      
      $ ip link set dev vxlan0 master br0
      # No mlxsw ports are enslaved to 'br0', so mlxsw ignores the switchdev
      # notification and the bridge driver adds the VLAN on 'vxlan0' via the
      # 8021q driver
      $ bridge vlan add vid 10 dev vxlan0 pvid untagged
      # mlxsw port is enslaved to the bridge
      $ ip link set dev swp1 master br0
      # mlxsw processes the switchdev notification and the 8021q driver is
      # skipped
      $ bridge vlan del vid 10 dev vxlan0
      
      This results in 'struct vlan_info' and 'struct vlan_vid_info' being
      leaked, as they were allocated by the 8021q driver during VLAN addition,
      but never freed as the 8021q driver was skipped during deletion.
      
      Fix this by introducing a new VLAN private flag that indicates whether
      the VLAN was added on the port by switchdev or the 8021q driver. If the
      VLAN was added by the 8021q driver, then we make sure to delete it via
      the 8021q driver as well.
      
      [1]
      unreferenced object 0xffff88822d20b1e8 (size 256):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.830s)
        hex dump (first 32 bytes):
          e0 42 97 ce 81 88 ff ff 00 00 00 00 00 00 00 00  .B..............
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<00000000e0178b02>] vlan_vid_add+0x661/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      unreferenced object 0xffff888227454308 (size 32):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.882s)
        hex dump (first 32 bytes):
          88 b2 20 2d 82 88 ff ff 88 b2 20 2d 82 88 ff ff  .. -...... -....
          81 00 0a 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<0000000018050631>] vlan_vid_add+0x3e6/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      
      Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarPetr Machata <petrm@mellanox.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: bridge@lists.linux-foundation.org
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      27973793
  2. 16 Dec, 2018 1 commit
  3. 13 Dec, 2018 2 commits
  4. 06 Dec, 2018 4 commits
  5. 27 Nov, 2018 2 commits
    • Nikolay Aleksandrov's avatar
      net: bridge: add no_linklocal_learn bool option · 70e4272b
      Nikolay Aleksandrov authored
      Use the new boolopt API to add an option which disables learning from
      link-local packets. The default is kept as before and learning is
      enabled. This is a simple map from a boolopt bit to a bridge private
      flag that is tested before learning.
      
      v2: pass NULL for extack via sysfs
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      70e4272b
    • Nikolay Aleksandrov's avatar
      net: bridge: add support for user-controlled bool options · a428afe8
      Nikolay Aleksandrov authored
      We have been adding many new bridge options, a big number of which are
      boolean but still take up netlink attribute ids and waste space in the skb.
      Recently we discussed learning from link-local packets[1] and decided
      yet another new boolean option will be needed, thus introducing this API
      to save some bridge nl space.
      The API supports changing the value of multiple boolean options at once
      via the br_boolopt_multi struct which has an optmask (which options to
      set, bit per opt) and optval (options' new values). Future boolean
      options will only be added to the br_boolopt_id enum and then will have
      to be handled in br_boolopt_toggle/get. The API will automatically
      add the ability to change and export them via netlink, sysfs can use the
      single boolopt function versions to do the same. The behaviour with
      failing/succeeding is the same as with normal netlink option changing.
      
      If an option requires mapping to internal kernel flag or needs special
      configuration to be enabled then it should be handled in
      br_boolopt_toggle. It should also be able to retrieve an option's current
      state via br_boolopt_get.
      
      v2: WARN_ON() on unsupported option as that shouldn't be possible and
          also will help catch people who add new options without handling
          them for both set and get. Pass down extack so if an option desires
          it could set it on error and be more user-friendly.
      
      [1] https://www.spinics.net/lists/netdev/msg532698.htmlSigned-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a428afe8
  6. 18 Nov, 2018 1 commit
    • Nikolay Aleksandrov's avatar
      net: bridge: fix vlan stats use-after-free on destruction · 9d332e69
      Nikolay Aleksandrov authored
      Syzbot reported a use-after-free of the global vlan context on port vlan
      destruction. When I added per-port vlan stats I missed the fact that the
      global vlan context can be freed before the per-port vlan rcu callback.
      There're a few different ways to deal with this, I've chosen to add a
      new private flag that is set only when per-port stats are allocated so
      we can directly check it on destruction without dereferencing the global
      context at all. The new field in net_bridge_vlan uses a hole.
      
      v2: cosmetic change, move the check to br_process_vlan_info where the
          other checks are done
      v3: add change log in the patch, add private (in-kernel only) flags in a
          hole in net_bridge_vlan struct and use that instead of mixing
          user-space flags with private flags
      
      Fixes: 9163a0fc ("net: bridge: add support for per-port vlan stats")
      Reported-by: syzbot+04681da557a0e49a52e5@syzkaller.appspotmail.com
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9d332e69
  7. 09 Nov, 2018 1 commit
  8. 18 Oct, 2018 1 commit
  9. 12 Oct, 2018 1 commit
    • Nikolay Aleksandrov's avatar
      net: bridge: add support for per-port vlan stats · 9163a0fc
      Nikolay Aleksandrov authored
      This patch adds an option to have per-port vlan stats instead of the
      default global stats. The option can be set only when there are no port
      vlans in the bridge since we need to allocate the stats if it is set
      when vlans are being added to ports (and respectively free them
      when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the
      largest user of options. The current stats design allows us to add
      these without any changes to the fast-path, it all comes down to
      the per-vlan stats pointer which, if this option is enabled, will
      be allocated for each port vlan instead of using the global bridge-wide
      one.
      
      CC: bridge@lists.linux-foundation.org
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9163a0fc
  10. 26 Sep, 2018 9 commits
  11. 13 Sep, 2018 1 commit
  12. 23 Jul, 2018 1 commit
    • Nikolay Aleksandrov's avatar
      net: bridge: add support for backup port · 2756f68c
      Nikolay Aleksandrov authored
      This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
      allows to set a backup port to be used for known unicast traffic if the
      port has gone carrier down. The backup pointer is rcu protected and set
      only under RTNL, a counter is maintained so when deleting a port we know
      how many other ports reference it as a backup and we remove it from all.
      Also the pointer is in the first cache line which is hot at the time of
      the check and thus in the common case we only add one more test.
      The backup port will be used only for the non-flooding case since
      it's a part of the bridge and the flooded packets will be forwarded to it
      anyway. To remove the forwarding just send a 0/non-existing backup port.
      This is used to avoid numerous scalability problems when using MLAG most
      notably if we have thousands of fdbs one would need to change all of them
      on port carrier going down which takes too long and causes a storm of fdb
      notifications (and again when the port comes back up). In a Multi-chassis
      Link Aggregation setup usually hosts are connected to two different
      switches which act as a single logical switch. Those switches usually have
      a control and backup link between them called peerlink which might be used
      for communication in case a host loses connectivity to one of them.
      We need a fast way to failover in case a host port goes down and currently
      none of the solutions (like bond) cannot fulfill the requirements because
      the participating ports are actually the "master" devices and must have the
      same peerlink as their backup interface and at the same time all of them
      must participate in the bridge device. As Roopa noted it's normal practice
      in routing called fast re-route where a precalculated backup path is used
      when the main one is down.
      Another use case of this is with EVPN, having a single vxlan device which
      is backup of every port. Due to the nature of master devices it's not
      currently possible to use one device as a backup for many and still have
      all of them participate in the bridge (which is master itself).
      More detailed information about MLAG is available at the link below.
      https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
      
      Further explanation and a diagram by Roopa:
      Two switches acting in a MLAG pair are connected by the peerlink
      interface which is a bridge port.
      
      the config on one of the switches looks like the below. The other
      switch also has a similar config.
      eth0 is connected to one port on the server. And the server is
      connected to both switches.
      
      br0 -- team0---eth0
            |
            -- switch-peerlink
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2756f68c
  13. 21 Jul, 2018 1 commit
  14. 31 May, 2018 1 commit
  15. 25 May, 2018 1 commit
  16. 03 May, 2018 2 commits
  17. 30 Apr, 2018 1 commit
  18. 01 Apr, 2018 2 commits
  19. 23 Mar, 2018 1 commit
  20. 22 Jan, 2018 1 commit
  21. 13 Dec, 2017 1 commit
    • Nikolay Aleksandrov's avatar
      net: bridge: use rhashtable for fdbs · eb793583
      Nikolay Aleksandrov authored
      Before this patch the bridge used a fixed 256 element hash table which
      was fine for small use cases (in my tests it starts to degrade
      above 1000 entries), but it wasn't enough for medium or large
      scale deployments. Modern setups have thousands of participants in a
      single bridge, even only enabling vlans and adding a few thousand vlan
      entries will cause a few thousand fdbs to be automatically inserted per
      participating port. So we need to scale the fdb table considerably to
      cope with modern workloads, and this patch converts it to use a
      rhashtable for its operations thus improving the bridge scalability.
      Tests show the following results (10 runs each), at up to 1000 entries
      rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
      is 2 times faster and at 30000 it is 50 times faster.
      Obviously this happens because of the properties of the two constructs
      and is expected, rhashtable keeps pretty much a constant time even with
      10000000 entries (tested), while the fixed hash table struggles
      considerably even above 10000.
      As a side effect this also reduces the net_bridge struct size from 3248
      bytes to 1344 bytes. Also note that the key struct is 8 bytes.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eb793583
  22. 10 Nov, 2017 1 commit
  23. 02 Nov, 2017 1 commit
    • Nikolay Aleksandrov's avatar
      net: bridge: add notifications for the bridge dev on vlan change · 92899063
      Nikolay Aleksandrov authored
      Currently the bridge device doesn't generate any notifications upon vlan
      modifications on itself because it doesn't use the generic bridge
      notifications.
      With the recent changes we know if anything was modified in the vlan config
      thus we can generate a notification when necessary for the bridge device
      so add support to br_ifinfo_notify() similar to how other combined
      functions are done - if port is present it takes precedence, otherwise
      notify about the bridge. I've explicitly marked the locations where the
      notification should be always for the port by setting bridge to NULL.
      I've also taken the liberty to rearrange each modified function's local
      variables in reverse xmas tree as well.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      92899063
  24. 29 Oct, 2017 1 commit
  25. 09 Oct, 2017 1 commit