Skip to content
Snippets Groups Projects
  1. Jun 28, 2021
    • Vladimir Oltean's avatar
      net: bridge: allow the switchdev replay functions to be called for deletion · 7e8c1858
      Vladimir Oltean authored
      
      When a switchdev port leaves a LAG that is a bridge port, the switchdev
      objects and port attributes offloaded to that port are not removed:
      
      ip link add br0 type bridge
      ip link add bond0 type bond mode 802.3ad
      ip link set swp0 master bond0
      ip link set bond0 master br0
      bridge vlan add dev bond0 vid 100
      ip link set swp0 nomaster
      
      VLAN 100 will remain installed on swp0 despite it going into standalone
      mode, because as far as the bridge is concerned, nothing ever happened
      to its bridge port.
      
      Let's extend the bridge vlan, fdb and mdb replay functions to take a
      'bool adding' argument, and make DSA and ocelot call the replay
      functions with 'adding' as false from the switchdev unsync path, for the
      switch port that leaves the bridge.
      
      Note that this patch in itself does not salvage anything, because in the
      current pull mode of operation, DSA still needs to call the replay
      helpers with adding=false. This will be done in another patch.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7e8c1858
    • Vladimir Oltean's avatar
      net: bridge: ignore switchdev events for LAG ports which didn't request replay · 0d2cfbd4
      Vladimir Oltean authored
      
      There is a slight inconvenience in the switchdev replay helpers added
      recently, and this is when:
      
      ip link add br0 type bridge
      ip link add bond0 type bond
      ip link set bond0 master br0
      bridge vlan add dev bond0 vid 100
      ip link set swp0 master bond0
      ip link set swp1 master bond0
      
      Since the underlying driver (currently only DSA) asks for a replay of
      VLANs when swp0 and swp1 join the LAG because it is bridged, what will
      happen is that DSA will try to react twice on the VLAN event for swp0.
      This is not really a huge problem right now, because most drivers accept
      duplicates since the bridge itself does, but it will become a problem
      when we add support for replaying switchdev object deletions.
      
      Let's fix this by adding a blank void *ctx in the replay helpers, which
      will be passed on by the bridge in the switchdev notifications. If the
      context is NULL, everything is the same as before. But if the context is
      populated with a valid pointer, the underlying switchdev driver
      (currently DSA) can use the pointer to 'see through' the bridge port
      (which in the example above is bond0) and 'know' that the event is only
      for a particular physical port offloading that bridge port, and not for
      all of them.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d2cfbd4
  2. Jun 21, 2021
    • Vladimir Oltean's avatar
      net: dsa: targeted MTU notifiers should only match on one port · 88faba20
      Vladimir Oltean authored
      
      dsa_slave_change_mtu() calls dsa_port_mtu_change() twice:
      - it sends a cross-chip notifier with the MTU of the CPU port which is
        used to update the DSA links.
      - it sends one targeted MTU notifier which is supposed to only match the
        user port on which we are changing the MTU. The "propagate_upstream"
        variable is used here to bypass the cross-chip notifier system from
        switch.c
      
      But due to a mistake, the second, targeted notifier matches not only on
      the user port, but also on the DSA link which is a member of the same
      switch, if that exists.
      
      And because the DSA links of the entire dst were programmed in a
      previous round to the largest_mtu via a "propagate_upstream == true"
      notification, then the dsa_port_mtu_change(propagate_upstream == false)
      call that is immediately upcoming will break the MTU on the one DSA link
      which is chip-wise local to the dp whose MTU is changing right now.
      
      Example given this daisy chain topology:
      
         sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
      [  cpu  ] [  user ] [  user ] [  dsa  ] [  user ]
      [   x   ] [       ] [       ] [   x   ] [       ]
                                        |
                                        +---------+
                                                  |
         sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
      [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
      [       ] [       ] [       ] [       ] [   x   ]
      
      ip link set sw0p1 mtu 9000
      ip link set sw1p1 mtu 9000 # at this stage, sw0p1 and sw1p1 can talk
                                 # to one another using jumbo frames
      ip link set sw0p2 mtu 1500 # this programs the sw0p3 DSA link first to
                                 # the largest_mtu of 9000, then reprograms it to
                                 # 1500 with the "propagate_upstream == false"
                                 # notifier, breaking communication between
                                 # sw0p1 and sw1p1
      
      To escape from this situation, make the targeted match really match on a
      single port - the user port, and rename the "propagate_upstream"
      variable to "targeted_match" to clarify the intention and avoid future
      issues.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      88faba20
  3. Apr 21, 2021
  4. Mar 23, 2021
  5. Feb 16, 2021
  6. Feb 15, 2021
  7. Feb 13, 2021
    • Vladimir Oltean's avatar
      net: dsa: act as passthrough for bridge port flags · a8b659e7
      Vladimir Oltean authored
      
      There are multiple ways in which a PORT_BRIDGE_FLAGS attribute can be
      expressed by the bridge through switchdev, and not all of them can be
      emulated by DSA mid-layer API at the same time.
      
      One possible configuration is when the bridge offloads the port flags
      using a mask that has a single bit set - therefore only one feature
      should change. However, DSA currently groups together unicast and
      multicast flooding in the .port_egress_floods method, which limits our
      options when we try to add support for turning off broadcast flooding:
      do we extend .port_egress_floods with a third parameter which b53 and
      mv88e6xxx will ignore? But that means that the DSA layer, which
      currently implements the PRE_BRIDGE_FLAGS attribute all by itself, will
      see that .port_egress_floods is implemented, and will report that all 3
      types of flooding are supported - not necessarily true.
      
      Another configuration is when the user specifies more than one flag at
      the same time, in the same netlink message. If we were to create one
      individual function per offloadable bridge port flag, we would limit the
      expressiveness of the switch driver of refusing certain combinations of
      flag values. For example, a switch may not have an explicit knob for
      flooding of unknown multicast, just for flooding in general. In that
      case, the only correct thing to do is to allow changes to BR_FLOOD and
      BR_MCAST_FLOOD in tandem, and never allow mismatched values. But having
      a separate .port_set_unicast_flood and .port_set_multicast_flood would
      not allow the driver to possibly reject that.
      
      Also, DSA doesn't consider it necessary to inform the driver that a
      SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
      just calls .port_egress_floods for the CPU port. When we'll add support
      for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
      problem because the flood settings will need to be held statefully in
      the DSA middle layer, otherwise changing the mrouter port attribute will
      impact the flooding attribute. And that's _assuming_ that the underlying
      hardware doesn't have anything else to do when a multicast router
      attaches to a port than flood unknown traffic to it.  If it does, there
      will need to be a dedicated .port_set_mrouter anyway.
      
      So we need to let the DSA drivers see the exact form that the bridge
      passes this switchdev attribute in, otherwise we are standing in the
      way. Therefore we also need to use this form of language when
      communicating to the driver that it needs to configure its initial
      (before bridge join) and final (after bridge leave) port flags.
      
      The b53 and mv88e6xxx drivers are converted to the passthrough API and
      their implementation of .port_egress_floods is split into two: a
      function that configures unicast flooding and another for multicast.
      The mv88e6xxx implementation is quite hairy, and it turns out that
      the implementations of unknown unicast flooding are actually the same
      for 6185 and for 6352:
      
      behind the confusing names actually lie two individual bits:
      NO_UNKNOWN_MC -> FLOOD_UC = 0x4 = BIT(2)
      NO_UNKNOWN_UC -> FLOOD_MC = 0x8 = BIT(3)
      
      so there was no reason to entangle them in the first place.
      
      Whereas the 6185 writes to MV88E6185_PORT_CTL0_FORWARD_UNKNOWN of
      PORT_CTL0, which has the exact same bit index. I have left the
      implementations separate though, for the only reason that the names are
      different enough to confuse me, since I am not able to double-check with
      a user manual. The multicast flooding setting for 6185 is in a different
      register than for 6352 though.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a8b659e7
    • Vladimir Oltean's avatar
      net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes · e18f4c18
      Vladimir Oltean authored
      
      This switchdev attribute offers a counterproductive API for a driver
      writer, because although br_switchdev_set_port_flag gets passed a
      "flags" and a "mask", those are passed piecemeal to the driver, so while
      the PRE_BRIDGE_FLAGS listener knows what changed because it has the
      "mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
      value. But certain drivers can offload only certain combinations of
      settings, like for example they cannot change unicast flooding
      independently of multicast flooding - they must be both on or both off.
      The way the information is passed to switchdev makes drivers not
      expressive enough, and unable to reject this request ahead of time, in
      the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
      the deferred BRIDGE_FLAGS attribute, where the rejection is currently
      ignored.
      
      This patch also changes drivers to make use of the "mask" field for edge
      detection when possible.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e18f4c18
    • Vladimir Oltean's avatar
      net: dsa: configure better brport flags when ports leave the bridge · 5e38c158
      Vladimir Oltean authored
      For a DSA switch port operating in standalone mode, address learning
      doesn't make much sense since that is a bridge function. In fact,
      address learning even breaks setups such as this one:
      
         +---------------------------------------------+
         |                                             |
         | +-------------------+                       |
         | |        br0        |    send      receive  |
         | +--------+-+--------+ +--------+ +--------+ |
         | |        | |        | |        | |        | |
         | |  swp0  | |  swp1  | |  swp2  | |  swp3  | |
         | |        | |        | |        | |        | |
         +-+--------+-+--------+-+--------+-+--------+-+
                |         ^           |          ^
                |         |           |          |
                |         +-----------+          |
                |                                |
                +--------------------------------+
      
      because if the switch has a single FDB (can offload a single bridge)
      then source address learning on swp3 can "steal" the source MAC address
      of swp2 from br0's FDB, because learning frames coming from swp2 will be
      done twice: first on the swp1 ingress port, second on the swp3 ingress
      port. So the hardware FDB will become out of sync with the software
      bridge, and when swp2 tries to send one more packet towards swp1, the
      ASIC will attempt to short-circuit the forwarding path and send it
      directly to swp3 (since that's the last port it learned that address on),
      which it obviously can't, because swp3 operates in standalone mode.
      
      So DSA drivers operating in standalone mode should still configure a
      list of bridge port flags even when they are standalone. Currently DSA
      attempts to call dsa_port_bridge_flags with 0, which disables egress
      flooding of unknown unicast and multicast, something which doesn't make
      much sense. For the switches that implement .port_egress_floods - b53
      and mv88e6xxx, it probably doesn't matter too much either, since they
      can possibly inject traffic from the CPU into a standalone port,
      regardless of MAC DA, even if egress flooding is turned off for that
      port, but certainly not all DSA switches can do that - sja1105, for
      example, can't. So it makes sense to use a better common default there,
      such as "flood everything".
      
      It should also be noted that what DSA calls "dsa_port_bridge_flags()"
      is a degenerate name for just calling .port_egress_floods(), since
      nothing else is implemented - not learning, in particular. But disabling
      address learning, something that this driver is also coding up for, will
      be supported by individual drivers once .port_egress_floods is replaced
      with a more generic .port_bridge_flags.
      
      Previous attempts to code up this logic have been in the common bridge
      layer, but as pointed out by Ido Schimmel, there are corner cases that
      are missed when doing that:
      https://patchwork.kernel.org/project/netdevbpf/patch/20210209151936.97382-5-olteanv@gmail.com/
      
      
      
      So, at least for now, let's leave DSA in charge of setting port flags
      before and after the bridge join and leave.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5e38c158
  8. Feb 11, 2021
  9. Jan 30, 2021
    • Vladimir Oltean's avatar
      net: dsa: allow changing the tag protocol via the "tagging" device attribute · 53da0eba
      Vladimir Oltean authored
      
      Currently DSA exposes the following sysfs:
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot
      
      which is a read-only device attribute, introduced in the kernel as
      commit 98cdb480 ("net: dsa: Expose tagging protocol to user-space"),
      and used by libpcap since its commit 993db3800d7d ("Add support for DSA
      link-layer types").
      
      It would be nice if we could extend this device attribute by making it
      writable:
      $ echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
      
      This is useful with DSA switches that can make use of more than one
      tagging protocol. It may be useful in dsa_loop in the future too, to
      perform offline testing of various taggers, or for changing between dsa
      and edsa on Marvell switches, if that is desirable.
      
      In terms of implementation, drivers can support this feature by
      implementing .change_tag_protocol, which should always leave the switch
      in a consistent state: either with the new protocol if things went well,
      or with the old one if something failed. Teardown of the old protocol,
      if necessary, must be handled by the driver.
      
      Some things remain as before:
      - The .get_tag_protocol is currently only called at probe time, to load
        the initial tagging protocol driver. Nonetheless, new drivers should
        report the tagging protocol in current use now.
      - The driver should manage by itself the initial setup of tagging
        protocol, no later than the .setup() method, as well as destroying
        resources used by the last tagger in use, no earlier than the
        .teardown() method.
      
      For multi-switch DSA trees, error handling is a bit more complicated,
      since e.g. the 5th out of 7 switches may fail to change the tag
      protocol. When that happens, a revert to the original tag protocol is
      attempted, but that may fail too, leaving the tree in an inconsistent
      state despite each individual switch implementing .change_tag_protocol
      transactionally. Since the intersection between drivers that implement
      .change_tag_protocol and drivers that support D in DSA is currently the
      empty set, the possibility for this error to happen is ignored for now.
      
      Testing:
      
      $ insmod mscc_felix.ko
      [   79.549784] mscc_felix 0000:00:00.5: Adding to iommu group 14
      [   79.565712] mscc_felix 0000:00:00.5: Failed to register DSA switch: -517
      $ insmod tag_ocelot.ko
      $ rmmod mscc_felix.ko
      $ insmod mscc_felix.ko
      [   97.261724] libphy: VSC9959 internal MDIO bus: probed
      [   97.267363] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 0
      [   97.274998] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 1
      [   97.282561] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 2
      [   97.289700] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 3
      [   97.599163] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   97.862034] mscc_felix 0000:00:00.5 swp1 (uninitialized): PHY [0000:00:00.3:11] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   97.950731] mscc_felix 0000:00:00.5 swp0: configuring for inband/qsgmii link mode
      [   97.964278] 8021q: adding VLAN 0 to HW filter on device swp0
      [   98.146161] mscc_felix 0000:00:00.5 swp2 (uninitialized): PHY [0000:00:00.3:12] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   98.238649] mscc_felix 0000:00:00.5 swp1: configuring for inband/qsgmii link mode
      [   98.251845] 8021q: adding VLAN 0 to HW filter on device swp1
      [   98.433916] mscc_felix 0000:00:00.5 swp3 (uninitialized): PHY [0000:00:00.3:13] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   98.485542] mscc_felix 0000:00:00.5: configuring for fixed/internal link mode
      [   98.503584] mscc_felix 0000:00:00.5: Link is Up - 2.5Gbps/Full - flow control rx/tx
      [   98.527948] device eno2 entered promiscuous mode
      [   98.544755] DSA: tree 0 setup
      
      $ ping 10.0.0.1
      PING 10.0.0.1 (10.0.0.1): 56 data bytes
      64 bytes from 10.0.0.1: seq=0 ttl=64 time=2.337 ms
      64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.754 ms
      ^C
       -  10.0.0.1 ping statistics  -
      2 packets transmitted, 2 packets received, 0% packet loss
      round-trip min/avg/max = 0.754/1.545/2.337 ms
      
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot
      $ cat ./test_ocelot_8021q.sh
              #!/bin/bash
      
              ip link set swp0 down
              ip link set swp1 down
              ip link set swp2 down
              ip link set swp3 down
              ip link set swp5 down
              ip link set eno2 down
              echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
              ip link set eno2 up
              ip link set swp0 up
              ip link set swp1 up
              ip link set swp2 up
              ip link set swp3 up
              ip link set swp5 up
      $ ./test_ocelot_8021q.sh
      ./test_ocelot_8021q.sh: line 9: echo: write error: Protocol not available
      $ rmmod tag_ocelot.ko
      rmmod: can't unload module 'tag_ocelot': Resource temporarily unavailable
      $ insmod tag_ocelot_8021q.ko
      $ ./test_ocelot_8021q.sh
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot-8021q
      $ rmmod tag_ocelot.ko
      $ rmmod tag_ocelot_8021q.ko
      rmmod: can't unload module 'tag_ocelot_8021q': Resource temporarily unavailable
      $ ping 10.0.0.1
      PING 10.0.0.1 (10.0.0.1): 56 data bytes
      64 bytes from 10.0.0.1: seq=0 ttl=64 time=0.953 ms
      64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.787 ms
      64 bytes from 10.0.0.1: seq=2 ttl=64 time=0.771 ms
      $ rmmod mscc_felix.ko
      [  645.544426] mscc_felix 0000:00:00.5: Link is Down
      [  645.838608] DSA: tree 0 torn down
      $ rmmod tag_ocelot_8021q.ko
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      53da0eba
    • Vladimir Oltean's avatar
      net: dsa: document the existing switch tree notifiers and add a new one · 886f8e26
      Vladimir Oltean authored
      The existence of dsa_broadcast has generated some confusion in the past:
      https://www.mail-archive.com/netdev@vger.kernel.org/msg365042.html
      
      
      
      So let's document the existing dsa_port_notify and dsa_broadcast
      functions and explain when each of them should be used.
      
      Also, in fact, the in-between function has always been there but was
      lacking a name, and is the main reason for this patch: dsa_tree_notify.
      Refactor dsa_broadcast to use it.
      
      This patch also moves dsa_broadcast (a top-level function) to dsa2.c,
      where it really belonged in the first place, but had no companion so it
      stood with dsa_port_notify.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      886f8e26
  10. Jan 15, 2021
    • Tobias Waldekranz's avatar
      net: dsa: Link aggregation support · 058102a6
      Tobias Waldekranz authored
      
      Monitor the following events and notify the driver when:
      
      - A DSA port joins/leaves a LAG.
      - A LAG, made up of DSA ports, joins/leaves a bridge.
      - A DSA port in a LAG is enabled/disabled (enabled meaning
        "distributing" in 802.3ad LACP terms).
      
      When a LAG joins a bridge, the DSA subsystem will treat that as each
      individual port joining the bridge. The driver may look at the port's
      LAG device pointer to see if it is associated with any LAG, if that is
      required. This is analogue to how switchdev events are replicated out
      to all lower devices when reaching e.g. a LAG.
      
      Drivers can optionally request that DSA maintain a linear mapping from
      a LAG ID to the corresponding netdev by setting ds->num_lag_ids to the
      desired size.
      
      In the event that the hardware is not capable of offloading a
      particular LAG for any reason (the typical case being use of exotic
      modes like broadcast), DSA will take a hands-off approach, allowing
      the LAG to be formed as a pure software construct. This is reported
      back through the extended ACK, but is otherwise transparent to the
      user.
      
      Signed-off-by: default avatarTobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Tested-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      058102a6
  11. Jan 12, 2021
    • Vladimir Oltean's avatar
      net: dsa: remove the transactional logic from ageing time notifiers · 77b61365
      Vladimir Oltean authored
      
      Remove the shim introduced in DSA for offloading the bridge ageing time
      from switchdev, by first checking whether the ageing time is within the
      range limits requested by the driver.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Acked-by: default avatarJiri Pirko <jiri@nvidia.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      77b61365
    • Vladimir Oltean's avatar
      net: switchdev: remove the transaction structure from port attributes · bae33f2b
      Vladimir Oltean authored
      
      Since the introduction of the switchdev API, port attributes were
      transmitted to drivers for offloading using a two-step transactional
      model, with a prepare phase that was supposed to catch all errors, and a
      commit phase that was supposed to never fail.
      
      Some classes of failures can never be avoided, like hardware access, or
      memory allocation. In the latter case, merely attempting to move the
      memory allocation to the preparation phase makes it impossible to avoid
      memory leaks, since commit 91cf8ece ("switchdev: Remove unused
      transaction item queue") which has removed the unused mechanism of
      passing on the allocated memory between one phase and another.
      
      It is time we admit that separating the preparation from the commit
      phase is something that is best left for the driver to decide, and not
      something that should be baked into the API, especially since there are
      no switchdev callers that depend on this.
      
      This patch removes the struct switchdev_trans member from switchdev port
      attribute notifier structures, and converts drivers to not look at this
      member.
      
      In part, this patch contains a revert of my previous commit 2e554a7a
      ("net: dsa: propagate switchdev vlan_filtering prepare phase to
      drivers").
      
      For the most part, the conversion was trivial except for:
      - Rocker's world implementation based on Broadcom OF-DPA had an odd
        implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
        done mechanically, by pasting the implementation twice, then only
        keeping the code that would get executed during prepare phase on top,
        then only keeping the code that gets executed during the commit phase
        on bottom, then simplifying the resulting code until this was obtained.
      - DSA's offloading of STP state, bridge flags, VLAN filtering and
        multicast router could be converted right away. But the ageing time
        could not, so a shim was introduced and this was left for a further
        commit.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Acked-by: default avatarJiri Pirko <jiri@nvidia.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
      Reviewed-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      bae33f2b
    • Vladimir Oltean's avatar
      net: switchdev: remove the transaction structure from port object notifiers · ffb68fc5
      Vladimir Oltean authored
      
      Since the introduction of the switchdev API, port objects were
      transmitted to drivers for offloading using a two-step transactional
      model, with a prepare phase that was supposed to catch all errors, and a
      commit phase that was supposed to never fail.
      
      Some classes of failures can never be avoided, like hardware access, or
      memory allocation. In the latter case, merely attempting to move the
      memory allocation to the preparation phase makes it impossible to avoid
      memory leaks, since commit 91cf8ece ("switchdev: Remove unused
      transaction item queue") which has removed the unused mechanism of
      passing on the allocated memory between one phase and another.
      
      It is time we admit that separating the preparation from the commit
      phase is something that is best left for the driver to decide, and not
      something that should be baked into the API, especially since there are
      no switchdev callers that depend on this.
      
      This patch removes the struct switchdev_trans member from switchdev port
      object notifier structures, and converts drivers to not look at this
      member.
      
      Where driver conversion is trivial (like in the case of the Marvell
      Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
      done in this patch.
      
      Where driver conversion needs more attention (DSA, Mellanox Spectrum),
      the conversion is left for subsequent patches and here we only fake the
      prepare/commit phases at a lower level, just not in the switchdev
      notifier itself.
      
      Where the code has a natural structure that is best left alone as a
      preparation and a commit phase (as in the case of the Ocelot switch),
      that structure is left in place, just made to not depend upon the
      switchdev transactional model.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Acked-by: default avatarJiri Pirko <jiri@nvidia.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ffb68fc5
  12. Oct 05, 2020
    • Vladimir Oltean's avatar
      net: dsa: propagate switchdev vlan_filtering prepare phase to drivers · 2e554a7a
      Vladimir Oltean authored
      
      A driver may refuse to enable VLAN filtering for any reason beyond what
      the DSA framework cares about, such as:
      - having tc-flower rules that rely on the switch being VLAN-aware
      - the particular switch does not support VLAN, even if the driver does
        (the DSA framework just checks for the presence of the .port_vlan_add
        and .port_vlan_del pointers)
      - simply not supporting this configuration to be toggled at runtime
      
      Currently, when a driver rejects a configuration it cannot support, it
      does this from the commit phase, which triggers various warnings in
      switchdev.
      
      So propagate the prepare phase to drivers, to give them the ability to
      refuse invalid configurations cleanly and avoid the warnings.
      
      Since we need to modify all function prototypes and check for the
      prepare phase from within the drivers, take that opportunity and move
      the existing driver restrictions within the prepare phase where that is
      possible and easy.
      
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Cc: Hauke Mehrtens <hauke@hauke-m.de>
      Cc: Woojung Huh <woojung.huh@microchip.com>
      Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
      Cc: Sean Wang <sean.wang@mediatek.com>
      Cc: Landen Chao <Landen.Chao@mediatek.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Vivien Didelot <vivien.didelot@gmail.com>
      Cc: Jonathan McDowell <noodles@earth.li>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2e554a7a
  13. Sep 21, 2020
    • Vladimir Oltean's avatar
      net: dsa: allow 8021q uppers while the bridge has vlan_filtering=0 · adb256eb
      Vladimir Oltean authored
      
      When the bridge has VLAN awareness disabled there isn't any duplication
      of functionality, since the bridge does not process VLAN. Don't deny
      adding 8021q uppers to DSA switch ports in that case. The switch is
      supposed to simply pass traffic leaving the VLAN tag as-is, and the
      stack will happily strip the VLAN tag for all 8021q uppers that exist.
      
      We need to ensure that there are no 8021q uppers when the user attempts
      to enable bridge vlan_filtering.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      adb256eb
    • Vladimir Oltean's avatar
      net: dsa: refuse configuration in prepare phase of dsa_port_vlan_filtering() · 707ec383
      Vladimir Oltean authored
      
      The current logic beats me a little bit. The comment that "bridge skips
      -EOPNOTSUPP, so skip the prepare phase" was introduced in commit
      fb2dabad ("net: dsa: support VLAN filtering switchdev attr").
      
      I'm not sure:
      (a) ok, the bridge skips -EOPNOTSUPP, but, so what, where are we
          returning -EOPNOTSUPP?
      (b) even if we are, and I'm just not seeing it, what is the causality
          relationship between the bridge skipping -EOPNOTSUPP and DSA
          skipping the prepare phase, and just returning zero?
      
      One thing is certain beyond doubt though, and that is that DSA currently
      refuses VLAN filtering from the "commit" phase instead of "prepare", and
      that this is not a good thing:
      
      ip link add br0 type bridge
      ip link add br1 type bridge vlan_filtering 1
      ip link set swp2 master br0
      ip link set swp3 master br1
      [ 3790.379389] 001: sja1105 spi0.1: VLAN filtering is a global setting
      [ 3790.379399] 001: ------------[ cut here ]------------
      [ 3790.379403] 001: WARNING: CPU: 1 PID: 515 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4
      [ 3790.379420] 001: swp3: Commit of attribute (id=6) failed.
      [ 3790.379533] 001: [<c11ff588>] (switchdev_port_attr_set_now) from [<c11b62e4>] (nbp_vlan_init+0x84/0x148)
      [ 3790.379544] 001: [<c11b62e4>] (nbp_vlan_init) from [<c11a2ff0>] (br_add_if+0x514/0x670)
      [ 3790.379554] 001: [<c11a2ff0>] (br_add_if) from [<c1031b5c>] (do_setlink+0x38c/0xab0)
      [ 3790.379565] 001: [<c1031b5c>] (do_setlink) from [<c1036fe8>] (__rtnl_newlink+0x44c/0x748)
      [ 3790.379573] 001: [<c1036fe8>] (__rtnl_newlink) from [<c1037328>] (rtnl_newlink+0x44/0x60)
      [ 3790.379580] 001: [<c1037328>] (rtnl_newlink) from [<c10315fc>] (rtnetlink_rcv_msg+0x124/0x2f8)
      [ 3790.379590] 001: [<c10315fc>] (rtnetlink_rcv_msg) from [<c10926b8>] (netlink_rcv_skb+0xb8/0x110)
      [ 3790.379806] 001: ---[ end trace 0000000000000002 ]---
      [ 3790.379819] 001: sja1105 spi0.1 swp3: failed to initialize vlan filtering on this port
      
      So move the current logic that may fail (except ds->ops->port_vlan_filtering,
      that is way harder) into the prepare stage of the switchdev transaction.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      707ec383
  14. Sep 12, 2020
  15. May 12, 2020
    • Russell King's avatar
      net: dsa: provide an option for drivers to always receive bridge VLANs · 54a0ed0d
      Russell King authored
      
      DSA assumes that a bridge which has vlan filtering disabled is not
      vlan aware, and ignores all vlan configuration. However, the kernel
      software bridge code allows configuration in this state.
      
      This causes the kernel's idea of the bridge vlan state and the
      hardware state to disagree, so "bridge vlan show" indicates a correct
      configuration but the hardware lacks all configuration. Even worse,
      enabling vlan filtering on a DSA bridge immediately blocks all traffic
      which, given the output of "bridge vlan show", is very confusing.
      
      Provide an option that drivers can set to indicate they want to receive
      vlan configuration even when vlan filtering is disabled. At the very
      least, this is safe for Marvell DSA bridges, which do not look up
      ingress traffic in the VTU if the port is in 8021Q disabled state. It is
      also safe for the Ocelot switch family. Whether this change is suitable
      for all DSA bridges is not known.
      
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      54a0ed0d
  16. May 11, 2020
    • Vladimir Oltean's avatar
      net: dsa: permit cross-chip bridging between all trees in the system · f66a6a69
      Vladimir Oltean authored
      
      One way of utilizing DSA is by cascading switches which do not all have
      compatible taggers. Consider the following real-life topology:
      
            +---------------------------------------------------------------+
            | LS1028A                                                       |
            |               +------------------------------+                |
            |               |      DSA master for Felix    |                |
            |               |(internal ENETC port 2: eno2))|                |
            |  +------------+------------------------------+-------------+  |
            |  | Felix embedded L2 switch                                |  |
            |  |                                                         |  |
            |  | +--------------+   +--------------+   +--------------+  |  |
            |  | |DSA master for|   |DSA master for|   |DSA master for|  |  |
            |  | |  SJA1105 1   |   |  SJA1105 2   |   |  SJA1105 3   |  |  |
            |  | |(Felix port 1)|   |(Felix port 2)|   |(Felix port 3)|  |  |
            +--+-+--------------+---+--------------+---+--------------+--+--+
      
      +-----------------------+ +-----------------------+ +-----------------------+
      |   SJA1105 switch 1    | |   SJA1105 switch 2    | |   SJA1105 switch 3    |
      +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
      |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| |sw3p0|sw3p1|sw3p2|sw3p3|
      +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
      
      The above can be described in the device tree as follows (obviously not
      complete):
      
      mscc_felix {
      	dsa,member = <0 0>;
      	ports {
      		port@4 {
      			ethernet = <&enetc_port2>;
      		};
      	};
      };
      
      sja1105_switch1 {
      	dsa,member = <1 1>;
      	ports {
      		port@4 {
      			ethernet = <&mscc_felix_port1>;
      		};
      	};
      };
      
      sja1105_switch2 {
      	dsa,member = <2 2>;
      	ports {
      		port@4 {
      			ethernet = <&mscc_felix_port2>;
      		};
      	};
      };
      
      sja1105_switch3 {
      	dsa,member = <3 3>;
      	ports {
      		port@4 {
      			ethernet = <&mscc_felix_port3>;
      		};
      	};
      };
      
      Basically we instantiate one DSA switch tree for every hardware switch
      in the system, but we still give them globally unique switch IDs (will
      come back to that later). Having 3 disjoint switch trees makes the
      tagger drivers "just work", because net devices are registered for the
      3 Felix DSA master ports, and they are also DSA slave ports to the ENETC
      port. So packets received on the ENETC port are stripped of their
      stacked DSA tags one by one.
      
      Currently, hardware bridging between ports on the same sja1105 chip is
      possible, but switching between sja1105 ports on different chips is
      handled by the software bridge. This is fine, but we can do better.
      
      In fact, the dsa_8021q tag used by sja1105 is compatible with cascading.
      In other words, a sja1105 switch can correctly parse and route a packet
      containing a dsa_8021q tag. So if we could enable hardware bridging on
      the Felix DSA master ports, cross-chip bridging could be completely
      offloaded.
      
      Such as system would be used as follows:
      
      ip link add dev br0 type bridge && ip link set dev br0 up
      for port in sw0p0 sw0p1 sw0p2 sw0p3 \
      	    sw1p0 sw1p1 sw1p2 sw1p3 \
      	    sw2p0 sw2p1 sw2p2 sw2p3; do
      	ip link set dev $port master br0
      done
      
      The above makes switching between ports on the same row be performed in
      hardware, and between ports on different rows in software. Now assume
      the Felix switch ports are called swp0, swp1, swp2. By running the
      following extra commands:
      
      ip link add dev br1 type bridge && ip link set dev br1 up
      for port in swp0 swp1 swp2; do
      	ip link set dev $port master br1
      done
      
      the CPU no longer sees packets which traverse sja1105 switch boundaries
      and can be forwarded directly by Felix. The br1 bridge would not be used
      for any sort of traffic termination.
      
      For this to work, we need to give drivers an opportunity to listen for
      bridging events on DSA trees other than their own, and pass that other
      tree index as argument. I have made the assumption, for the moment, that
      the other existing DSA notifiers don't need to be broadcast to other
      trees. That assumption might turn out to be incorrect. But in the
      meantime, introduce a dsa_broadcast function, similar in purpose to
      dsa_port_notify, which is used only by the bridging notifiers.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f66a6a69
  17. Apr 14, 2020
    • Andrew Lunn's avatar
      net: dsa: Down cpu/dsa ports phylink will control · 3be98b2d
      Andrew Lunn authored
      
      DSA and CPU ports can be configured in two ways. By default, the
      driver should configure such ports to there maximum bandwidth. For
      most use cases, this is sufficient. When this default is insufficient,
      a phylink instance can be bound to such ports, and phylink will
      configure the port, e.g. based on fixed-link properties. phylink
      assumes the port is initially down. Given that the driver should have
      already configured it to its maximum speed, ask the driver to down
      the port before instantiating the phylink instance.
      
      Fixes: 30c4a5b0 ("net: mv88e6xxx: use resolved link config in mac_link_up()")
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3be98b2d
  18. Mar 27, 2020
    • Vladimir Oltean's avatar
      net: dsa: configure the MTU for switch ports · bfcb8132
      Vladimir Oltean authored
      
      It is useful be able to configure port policers on a switch to accept
      frames of various sizes:
      
      - Increase the MTU for better throughput from the default of 1500 if it
        is known that there is no 10/100 Mbps device in the network.
      - Decrease the MTU to limit the latency of high-priority frames under
        congestion, or work around various network segments that add extra
        headers to packets which can't be fragmented.
      
      For DSA slave ports, this is mostly a pass-through callback, called
      through the regular ndo ops and at probe time (to ensure consistency
      across all supported switches).
      
      The CPU port is called with an MTU equal to the largest configured MTU
      of the slave ports. The assumption is that the user might want to
      sustain a bidirectional conversation with a partner over any switch
      port.
      
      The DSA master is configured the same as the CPU port, plus the tagger
      overhead. Since the MTU is by definition L2 payload (sans Ethernet
      header), it is up to each individual driver to figure out if it needs to
      do anything special for its frame tags on the CPU port (it shouldn't
      except in special cases). So the MTU does not contain the tagger
      overhead on the CPU port.
      However the MTU of the DSA master, minus the tagger overhead, is used as
      a proxy for the MTU of the CPU port, which does not have a net device.
      This is to avoid uselessly calling the .change_mtu function on the CPU
      port when nothing should change.
      
      So it is safe to assume that the DSA master and the CPU port MTUs are
      apart by exactly the tagger's overhead in bytes.
      
      Some changes were made around dsa_master_set_mtu(), function which was
      now removed, for 2 reasons:
        - dev_set_mtu() already calls dev_validate_mtu(), so it's redundant to
          do the same thing in DSA
        - __dev_set_mtu() returns 0 if ops->ndo_change_mtu is an absent method
      That is to say, there's no need for this function in DSA, we can safely
      call dev_set_mtu() directly, take the rtnl lock when necessary, and just
      propagate whatever errors get reported (since the user probably wants to
      be informed).
      
      Some inspiration (mainly in the MTU DSA notifier) was taken from a
      vaguely similar patch from Murali and Florian, who are credited as
      co-developers down below.
      
      Co-developed-by: default avatarMurali Krishna Policharla <murali.policharla@broadcom.com>
      Signed-off-by: default avatarMurali Krishna Policharla <murali.policharla@broadcom.com>
      Co-developed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bfcb8132
  19. Mar 16, 2020
  20. Mar 12, 2020
    • Andrew Lunn's avatar
      net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed · a20f9970
      Andrew Lunn authored
      
      By default, DSA drivers should configure CPU and DSA ports to their
      maximum speed. In many configurations this is sufficient to make the
      link work.
      
      In some cases it is necessary to configure the link to run slower,
      e.g. because of limitations of the SoC it is connected to. Or back to
      back PHYs are used and the PHY needs to be driven in order to
      establish link. In this case, phylink is used.
      
      Only instantiate phylink if it is required. If there is no PHY, or no
      fixed link properties, phylink can upset a link which works in the
      default configuration.
      
      Fixes: 0e279218 ("net: dsa: Use PHYLINK for the CPU/DSA ports")
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a20f9970
  21. Mar 03, 2020
  22. Feb 27, 2020
  23. Jan 06, 2020
  24. Dec 17, 2019
  25. Nov 24, 2019
  26. Nov 04, 2019
    • Andrew Lunn's avatar
      net: of_get_phy_mode: Change API to solve int/unit warnings · 0c65b2b9
      Andrew Lunn authored
      
      Before this change of_get_phy_mode() returned an enum,
      phy_interface_t. On error, -ENODEV etc, is returned. If the result of
      the function is stored in a variable of type phy_interface_t, and the
      compiler has decided to represent this as an unsigned int, comparision
      with -ENODEV etc, is a signed vs unsigned comparision.
      
      Fix this problem by changing the API. Make the function return an
      error, or 0 on success, and pass a pointer, of type phy_interface_t,
      where the phy mode should be stored.
      
      v2:
      Return with *interface set to PHY_INTERFACE_MODE_NA on error.
      Add error checks to all users of of_get_phy_mode()
      Fixup a few reverse christmas tree errors
      Fixup a few slightly malformed reverse christmas trees
      
      v3:
      Fix 0-day reported errors.
      
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0c65b2b9
  27. Aug 28, 2019
Loading