Skip to content
  • Jerry Chu's avatar
    net-gre-gro: Add GRE support to the GRO stack · bf5a755f
    Jerry Chu authored
    This patch built on top of Commit 299603e8
    ("net-gro: Prepare GRO stack for the upcoming tunneling support") to add
    the support of the standard GRE (RFC1701/RFC2784/RFC2890) to the GRO
    stack. It also serves as an example for supporting other encapsulation
    protocols in the GRO stack in the future.
    
    The patch supports version 0 and all the flags (key, csum, seq#) but
    will flush any pkt with the S (seq#) flag. This is because the S flag
    is not support by GSO, and a GRO pkt may end up in the forwarding path,
    thus requiring GSO support to break it up correctly.
    
    Currently the "packet_offload" structure only contains L3 (ETH_P_IP/
    ETH_P_IPV6) GRO offload support so the encapped pkts are limited to
    IP pkts (i.e., w/o L2 hdr). But support for other protocol type can
    be easily added, so is the support for GRE variations like NVGRE.
    
    The patch also support csum offload. Specifically if the csum flag is on
    and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE),
    the code will take advantage of the csum computed by the h/w when
    validating the GRE csum.
    
    Note that commit 60769a5d
    
     "ipv4: gre:
    add GRO capability" already introduces GRO capability to IPv4 GRE
    tunnels, using the gro_cells infrastructure. But GRO is done after
    GRE hdr has been removed (i.e., decapped). The following patch applies
    GRO when pkts first come in (before hitting the GRE tunnel code). There
    is some performance advantage for applying GRO as early as possible.
    Also this approach is transparent to other subsystem like Open vSwitch
    where GRE decap is handled outside of the IP stack hence making it
    harder for the gro_cells stuff to apply. On the other hand, some NICs
    are still not capable of hashing on the inner hdr of a GRE pkt (RSS).
    In that case the GRO processing of pkts from the same remote host will
    all happen on the same CPU and the performance may be suboptimal.
    
    I'm including some rough preliminary performance numbers below. Note
    that the performance will be highly dependent on traffic load, mix as
    usual. Moreover it also depends on NIC offload features hence the
    following is by no means a comprehesive study. Local testing and tuning
    will be needed to decide the best setting.
    
    All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs.
    (super_netperf 50 -H 192.168.1.18 -l 30)
    
    An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1
    mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123)
    is configured.
    
    The GRO support for pkts AFTER decap are controlled through the device
    feature of the GRE device (e.g., ethtool -K gre1 gro on/off).
    
    1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off
    thruput: 9.16Gbps
    CPU utilization: 19%
    
    1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off
    thruput: 5.9Gbps
    CPU utilization: 15%
    
    1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on
    thruput: 9.26Gbps
    CPU utilization: 12-13%
    
    1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on
    thruput: 9.26Gbps
    CPU utilization: 10%
    
    The following tests were performed on a different NIC that is capable of
    csum offload. I.e., the h/w is capable of computing IP payload csum
    (CHECKSUM_COMPLETE).
    
    2.1 ethtool -K gre1 gro on (hence will use gro_cells)
    
    2.1.1 ethtool -K eth0 gro off; csum offload disabled
    thruput: 8.53Gbps
    CPU utilization: 9%
    
    2.1.2 ethtool -K eth0 gro off; csum offload enabled
    thruput: 8.97Gbps
    CPU utilization: 7-8%
    
    2.1.3 ethtool -K eth0 gro on; csum offload disabled
    thruput: 8.83Gbps
    CPU utilization: 5-6%
    
    2.1.4 ethtool -K eth0 gro on; csum offload enabled
    thruput: 8.98Gbps
    CPU utilization: 5%
    
    2.2 ethtool -K gre1 gro off
    
    2.2.1 ethtool -K eth0 gro off; csum offload disabled
    thruput: 5.93Gbps
    CPU utilization: 9%
    
    2.2.2 ethtool -K eth0 gro off; csum offload enabled
    thruput: 5.62Gbps
    CPU utilization: 8%
    
    2.2.3 ethtool -K eth0 gro on; csum offload disabled
    thruput: 7.69Gbps
    CPU utilization: 8%
    
    2.2.4 ethtool -K eth0 gro on; csum offload enabled
    thruput: 8.96Gbps
    CPU utilization: 5-6%
    
    Signed-off-by: default avatarH.K. Jerry Chu <hkchu@google.com>
    Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    bf5a755f