- Sep 05, 2019
-
-
Shannon Nelson authored
The current tag set is still rather small and needs a couple more tags to help with ASIC identification and to have a more generic FW version. Cc: Jiri Pirko <jiri@resnulli.us> Acked-by:
Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by:
Shannon Nelson <snelson@pensando.io> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Sep 03, 2019
-
-
Maor Gottlieb authored
Add flow steering actions: modify header and packet reformat to the fs_cmd shim layer. This allows each namespace to define possibly different functionality for alloc/dealloc action commands. Signed-off-by:
Maor Gottlieb <maorg@mellanox.com> Reviewed-by:
Mark Bloch <markb@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
- Sep 02, 2019
-
-
Maor Gottlieb authored
Return MLX5_ESWITCH_NONE when CONFIG_MLX5_ESWITCH is not selected. Signed-off-by:
Maor Gottlieb <maorg@mellanox.com> Reviewed-by:
Mark Bloch <markb@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Alex Vesker authored
Add the required Software Steering hardware definitions and bits to mlx5_ifc. Signed-off-by:
Alex Vesker <valex@mellanox.com> Signed-off-by:
Yevgeny Klitenik <kliten@mellanox.com> Reviewed-by:
Mark Bloch <markb@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Ariel Levkovich authored
Move the device memory allocation and deallocation commands SW ICM memory to mlx5_core to expose this API for all mlx5_core users. This comes as preparation for supporting SW steering in kernel where it will be required to allocate and register device memory for direct rule insertion. In addition, an API to register this device memory for future remote access operations is introduced using the create_mkey commands. Signed-off-by:
Ariel Levkovich <lariel@mellanox.com> Reviewed-by:
Mark Bloch <markb@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
- Sep 01, 2019
-
-
Parav Pandit authored
Devlink port index attribute is returned to users as u32 through netlink response. Change index data type from 'unsigned' to 'unsigned int' to avoid below checkpatch.pl warning. WARNING: Prefer 'unsigned int' to bare use of 'unsigned' 81: FILE: include/net/devlink.h:81: + unsigned index; Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
Parav Pandit <parav@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Davide Caratti authored
When an application configures kernel TLS on top of a TCP socket, it's now possible for inet_diag_handler() to collect information regarding the protocol version, the cipher type and TX / RX configuration, in case INET_DIAG_INFO is requested. Signed-off-by:
Davide Caratti <dcaratti@redhat.com> Reviewed-by:
Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Davide Caratti authored
currently, only getsockopt(TCP_ULP) can be invoked to know if a ULP is on top of a TCP socket. Extend idiag_get_aux() and idiag_get_aux_size(), introduced by commit b37e8840 ("inet_diag: allow protocols to provide additional data"), to report the ULP name and other information that can be made available by the ULP through optional functions. Users having CAP_NET_ADMIN privileges will then be able to retrieve this information through inet_diag_handler, if they specify INET_DIAG_INFO in the request. Signed-off-by:
Davide Caratti <dcaratti@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
We need to make sure context does not get freed while diag code is interrogating it. Free struct tls_context with kfree_rcu(). We add the __rcu annotation directly in icsk, and cast it away in the datapath accessor. Presumably all ULPs will do a similar thing. Signed-off-by:
Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Aug 31, 2019
-
-
Sudarsana Reddy Kalluru authored
The patch adds driver support for configuring the grc dump config flags. Signed-off-by:
Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by:
Ariel Elior <aelior@marvell.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Sudarsana Reddy Kalluru authored
The patch adds driver support for reading the config id attributes from NVM flash partition. Signed-off-by:
Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by:
Ariel Elior <aelior@marvell.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Denis Efremov authored
The function ftrace_set_clr_event is declared static and marked EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because the function was decided to be a part of API, this commit removes the static attribute and adds the declaration to the header. Link: http://lkml.kernel.org/r/20190704172110.27041-1-efremov@linux.com Fixes: f45d1225 ("tracing: Kernel access to Ftrace instances") Reviewed-by:
Joe Jin <joe.jin@oracle.com> Signed-off-by:
Denis Efremov <efremov@linux.com> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Denis Efremov authored
"unlikely(IS_ERR_OR_NULL(x))" is excessive. IS_ERR_OR_NULL() already uses unlikely() internally. Signed-off-by:
Denis Efremov <efremov@linux.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Joe Perches <joe@perches.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: netdev@vger.kernel.org Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Roman Gushchin authored
I've noticed that the "slab" value in memory.stat is sometimes 0, even if some children memory cgroups have a non-zero "slab" value. The following investigation showed that this is the result of the kmem_cache reparenting in combination with the per-cpu batching of slab vmstats. At the offlining some vmstat value may leave in the percpu cache, not being propagated upwards by the cgroup hierarchy. It means that stats on ancestor levels are lower than actual. Later when slab pages are released, the precise number of pages is substracted on the parent level, making the value negative. We don't show negative values, 0 is printed instead. To fix this issue, let's flush percpu slab memcg and lruvec stats on memcg offlining. This guarantees that numbers on all ancestor levels are accurate and match the actual number of outstanding slab pages. Link: http://lkml.kernel.org/r/20190819202338.363363-3-guro@fb.com Fixes: fb2f2b0a ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal") Signed-off-by:
Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Aug 29, 2019
-
-
Gustavo A. R. Silva authored
Mark switch cases where we are expecting to fall through. This patch fixes the following warnings (Building: allmodconfig nds32): include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/nds32/kernel/signal.c:362:20: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/nds32/kernel/signal.c:315:7: warning: this statement may fall through [-Wimplicit-fallthrough=] include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=] include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=] include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=] include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=] include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=] include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=] include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=] include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=] include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=] include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=] include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=] include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=] include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=] Reported-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Gustavo A. R. Silva <gustavo@embeddedor.com>
-
- Aug 28, 2019
-
-
Vlad Buslov authored
Action sample doesn't properly handle psample_group pointer in overwrite case. Following issues need to be fixed: - In tcf_sample_init() function RCU_INIT_POINTER() is used to set s->psample_group, even though we neither setting the pointer to NULL, nor preventing concurrent readers from accessing the pointer in some way. Use rcu_swap_protected() instead to safely reset the pointer. - Old value of s->psample_group is not released or deallocated in any way, which results resource leak. Use psample_group_put() on non-NULL value obtained with rcu_swap_protected(). - The function psample_group_put() that released reference to struct psample_group pointed by rcu-pointer s->psample_group doesn't respect rcu grace period when deallocating it. Extend struct psample_group with rcu head and use kfree_rcu when freeing it. Fixes: 5c5670fa ("net/sched: Introduce sample tc action") Signed-off-by:
Vlad Buslov <vladbu@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Remove two holes on 64bit arches, to bring the size to one cache line exactly. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Michael Guralnik authored
In mlx5_core initialization, query max ODP capabilities for DC transport from FW and set as current capabilities. Signed-off-by:
Michael Guralnik <michaelgur@mellanox.com> Signed-off-by:
Leon Romanovsky <leonro@mellanox.com>
-
Voon Weifeng authored
EHL DW EQOS is running on a 200MHz clock. Setting up stmmac-clk, ptp clock and ptp_max_adj to 200MHz. Signed-off-by:
Voon Weifeng <weifeng.voon@intel.com> Signed-off-by:
Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Xin Long authored
SCTP_ECN_SUPPORTED sockopt will be added to allow users to change ep ecn flag, and it's similar with other feature flags. Signed-off-by:
Xin Long <lucien.xin@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Xin Long authored
This patch is to add ecn flag for both netns_sctp and sctp_endpoint, net->sctp.ecn_enable is set 1 by default, and ep->ecn_enable will be initialized with net->sctp.ecn_enable. asoc->peer.ecn_capable will be set during negotiation only when ep->ecn_enable is set on both sides. Signed-off-by:
Xin Long <lucien.xin@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Marco Hartmann authored
Commit 34786005 ("net: phy: prevent PHYs w/o Clause 22 regs from calling genphy_config_aneg") introduced a check that aborts phy_config_aneg() if the phy is a C45 phy. This causes phy_state_machine() to call phy_error() so that the phy ends up in PHY_HALTED state. Instead of returning -EOPNOTSUPP, call genphy_c45_config_aneg() (analogous to the C22 case) so that the state machine can run correctly. genphy_c45_config_aneg() closely resembles mv3310_config_aneg() in drivers/net/phy/marvell10g.c, excluding vendor specific configurations for 1000BaseT. Fixes: 22b56e82 ("net: phy: replace genphy_10g_driver with genphy_c45_driver") Signed-off-by:
Marco Hartmann <marco.hartmann@nxp.com> Reviewed-by:
Andrew Lunn <andrew@lunn.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
The bitmap operations were introduced to simplify the switch drivers in the future, since most of them could implement the common VLAN and MDB operations (add, del, dump) with simple functions taking all target ports at once, and thus limiting the number of hardware accesses. Programming an MDB or VLAN this way in a single operation would clearly simplify the drivers a lot but would require a new get-set interface in DSA. The usage of such bitmap from the stack also raised concerned in the past, leading to the dynamic allocation of a new ds->_bitmap member in the dsa_switch structure. So let's get rid of them for now. This commit nicely wraps the ds->ops->port_{mdb,vlan}_{prepare,add} switch operations into new dsa_switch_{mdb,vlan}_{prepare,add} variants not using any bitmap argument anymore. New dsa_switch_{mdb,vlan}_match helpers have been introduced to make clear which local port of a switch must be programmed with the target object. While the targeted user port is an obvious candidate, the DSA links must also be programmed, as well as the CPU port for VLANs. While at it, also remove local variables that are only used once. Signed-off-by:
Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Aug 27, 2019
-
-
Cong Wang authored
The net pointer in struct xt_tgdtor_param is not explicitly initialized therefore is still NULL when dereferencing it. So we have to find a way to pass the correct net pointer to ipt_destroy_target(). The best way I find is just saving the net pointer inside the per netns struct tcf_idrinfo, which could make this patch smaller. Fixes: 0c66dc1e ("netfilter: conntrack: register hooks in netns when needed by ruleset") Reported-and-tested-by:
<itugrok@yahoo.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David Howells authored
The in-place decryption routines in AF_RXRPC's rxkad security module currently call skb_cow_data() to make sure the data isn't shared and that the skb can be written over. This has a problem, however, as the softirq handler may be still holding a ref or the Rx ring may be holding multiple refs when skb_cow_data() is called in rxkad_verify_packet() - and so skb_shared() returns true and __pskb_pull_tail() dislikes that. If this occurs, something like the following report will be generated. kernel BUG at net/core/skbuff.c:1463! ... RIP: 0010:pskb_expand_head+0x253/0x2b0 ... Call Trace: __pskb_pull_tail+0x49/0x460 skb_cow_data+0x6f/0x300 rxkad_verify_packet+0x18b/0xb10 [rxrpc] rxrpc_recvmsg_data.isra.11+0x4a8/0xa10 [rxrpc] rxrpc_kernel_recv_data+0x126/0x240 [rxrpc] afs_extract_data+0x51/0x2d0 [kafs] afs_deliver_fs_fetch_data+0x188/0x400 [kafs] afs_deliver_to_call+0xac/0x430 [kafs] afs_wait_for_call_to_complete+0x22f/0x3d0 [kafs] afs_make_call+0x282/0x3f0 [kafs] afs_fs_fetch_data+0x164/0x300 [kafs] afs_fetch_data+0x54/0x130 [kafs] afs_readpages+0x20d/0x340 [kafs] read_pages+0x66/0x180 __do_page_cache_readahead+0x188/0x1a0 ondemand_readahead+0x17d/0x2e0 generic_file_read_iter+0x740/0xc10 __vfs_read+0x145/0x1a0 vfs_read+0x8c/0x140 ksys_read+0x4a/0xb0 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by using skb_unshare() instead in the input path for DATA packets that have a security index != 0. Non-DATA packets don't need in-place encryption and neither do unencrypted DATA packets. Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code") Reported-by:
Julian Wollrath <jwollrath@web.de> Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Use the previously-added transmit-phase skbuff private flag to simplify the socket buffer tracing a bit. Which phase the skbuff comes from can now be divined from the skb rather than having to be guessed from the call state. We can also reduce the number of rxrpc_skb_trace values by eliminating the difference between Tx and Rx in the symbols. Signed-off-by:
David Howells <dhowells@redhat.com>
-
- Aug 26, 2019
-
-
Vlad Buslov authored
In order to remove dependency on rtnl lock, modify tc_setup_flow_action() to copy tunnel info, instead of just saving pointer to tunnel_key action tunnel info. This is necessary to prevent concurrent action overwrite from releasing tunnel info while it is being used by rtnl-unlocked driver. Implement helper tcf_tunnel_info_copy() that is used to copy tunnel info with all its options to dynamically allocated memory block. Modify tc_cleanup_flow_action() to free dynamically allocated tunnel info. Signed-off-by:
Vlad Buslov <vladbu@mellanox.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Vlad Buslov authored
In order to remove dependency on rtnl lock when calling hardware offload API, take reference to action mirred dev when initializing flow_action structure in tc_setup_flow_action(). Implement function tc_cleanup_flow_action(), use it to release the device after hardware offload API is done using it. Signed-off-by:
Vlad Buslov <vladbu@mellanox.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Vlad Buslov authored
In order to allow using new flow_action infrastructure from unlocked classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held' argument. Take rtnl lock before accessing tc_action data. This is necessary to protect from concurrent action replace. Signed-off-by:
Vlad Buslov <vladbu@mellanox.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Vlad Buslov authored
Extend struct flow_block_offload with "unlocked_driver_cb" flag to allow registering and unregistering block hardware offload callbacks that do not require caller to hold rtnl lock. Extend tcf_block with additional lockeddevcnt counter that is incremented for each non-unlocked driver callback attached to device. This counter is necessary to conditionally obtain rtnl lock before calling hardware callbacks in following patches. Register mlx5 tc block offload callbacks as "unlocked". Signed-off-by:
Vlad Buslov <vladbu@mellanox.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Vlad Buslov authored
To remove dependency on rtnl lock, extend classifier ops with new ops->hw_add() and ops->hw_del() callbacks. Call them from cls API while holding cb_lock every time filter if successfully added to or deleted from hardware. Implement the new API in flower classifier. Use it to manage hw_filters list under cb_lock protection, instead of relying on rtnl lock to synchronize with concurrent fl_reoffload() call. Signed-off-by:
Vlad Buslov <vladbu@mellanox.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Vlad Buslov authored
Without rtnl lock protection filters can no longer safely manage block offloads counter themselves. Refactor cls API to protect block offloadcnt with tcf_block->cb_lock that is already used to protect driver callback list and nooffloaddevcnt counter. The counter can be modified by concurrent tasks by new functions that execute block callbacks (which is safe with previous patch that changed its type to atomic_t), however, block bind/unbind code that checks the counter value takes cb_lock in write mode to exclude any concurrent modifications. This approach prevents race conditions between bind/unbind and callback execution code but allows for concurrency for tc rule update path. Move block offload counter, filter in hardware counter and filter flags management from classifiers into cls hardware offloads API. Make functions tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update() to be cls API private. Implement following new cls API to be used instead: tc_setup_cb_add() - non-destructive filter add. If filter that wasn't already in hardware is successfully offloaded, increment block offloads counter, set filter in hardware counter and flag. On failure, previously offloaded filter is considered to be intact and offloads counter is not decremented. tc_setup_cb_replace() - destructive filter replace. Release existing filter block offload counter and reset its in hardware counter and flag. Set new filter in hardware counter and flag. On failure, previously offloaded filter is considered to be destroyed and offload counter is decremented. tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block offloads counter. tc_setup_cb_reoffload() - reoffload filter to single cb. Execute cb() and call tc_cls_offload_cnt_update() if cb() didn't return an error. Refactor all offload-capable classifiers to atomically offload filters to hardware, change block offload counter, and set filter in hardware counter and flag by means of the new cls API functions. Signed-off-by:
Vlad Buslov <vladbu@mellanox.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Vlad Buslov authored
As a preparation for running proto ops functions without rtnl lock, change offload counter type to atomic. This is necessary to allow updating the counter by multiple concurrent users when offloading filters to hardware from unlocked classifiers. Signed-off-by:
Vlad Buslov <vladbu@mellanox.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Vlad Buslov authored
In order to remove dependency on rtnl lock, extend tcf_block with 'cb_lock' rwsem and use it to protect flow_block->cb_list and related counters from concurrent modification. The lock is taken in read mode for read-only traversal of cb_list in tc_setup_cb_call() and write mode in all other cases. This approach ensures that: - cb_list is not changed concurrently while filters is being offloaded on block. - block->nooffloaddevcnt is checked while holding the lock in read mode, but is only changed by bind/unbind code when holding the cb_lock in write mode to prevent concurrent modification. Signed-off-by:
Vlad Buslov <vladbu@mellanox.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Trond Myklebust authored
This reverts commit a79f194a. The mechanism for aborting I/O is racy, since we are not guaranteed that the request is asleep while we're changing both task->tk_status and task->tk_action. Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v5.1
-
Mischa Jonker authored
This adds support for an optional extra interrupt cell to specify edge vs level triggered. It is backward compatible with dts files with only one cell, and will default to level-triggered in such a case. Note that I had to make a change to idu_irq_set_affinity as well, as this function was setting the interrupt type to "level" unconditionally, since this was the only type supported previously. Signed-off-by:
Mischa Jonker <mischa.jonker@synopsys.com> Reviewed-by:
Vineet Gupta <vgupta@synopsys.com> Signed-off-by:
Vineet Gupta <vgupta@synopsys.com>
-
- Aug 25, 2019
-
-
David Ahern authored
Donald reported this sequence: ip next add id 1 blackhole ip next add id 2 blackhole ip ro add 1.1.1.1/32 nhid 1 ip ro add 1.1.1.2/32 nhid 2 would cause a crash. Backtrace is: [ 151.302790] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 151.304043] CPU: 1 PID: 277 Comm: ip Not tainted 5.3.0-rc5+ #37 [ 151.305078] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014 [ 151.306526] RIP: 0010:fib_add_nexthop+0x8b/0x2aa [ 151.307343] Code: 35 f7 81 48 8d 14 01 c7 02 f1 f1 f1 f1 c7 42 04 01 f4 f4 f4 48 89 f2 48 c1 ea 03 65 48 8b 0c 25 28 00 00 00 48 89 4d d0 31 c9 <80> 3c 02 00 74 08 48 89 f7 e8 1a e8 53 ff be 08 00 00 00 4c 89 e7 [ 151.310549] RSP: 0018:ffff888116c27340 EFLAGS: 00010246 [ 151.311469] RAX: dffffc0000000000 RBX: ffff8881154ece00 RCX: 0000000000000000 [ 151.312713] RDX: 0000000000000004 RSI: 0000000000000020 RDI: ffff888115649b40 [ 151.313968] RBP: ffff888116c273d8 R08: ffffed10221e3757 R09: ffff888110f1bab8 [ 151.315212] R10: 0000000000000001 R11: ffff888110f1bab3 R12: ffff888115649b40 [ 151.316456] R13: 0000000000000020 R14: ffff888116c273b0 R15: ffff888115649b40 [ 151.317707] FS: 00007f60b4d8d800(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000 [ 151.319113] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 151.320119] CR2: 0000555671ffdc00 CR3: 00000001136ba005 CR4: 0000000000020ee0 [ 151.321367] Call Trace: [ 151.321820] ? fib_nexthop_info+0x635/0x635 [ 151.322572] fib_dump_info+0xaa4/0xde0 [ 151.323247] ? fib_create_info+0x2431/0x2431 [ 151.324008] ? napi_alloc_frag+0x2a/0x2a [ 151.324711] rtmsg_fib+0x2c4/0x3be [ 151.325339] fib_table_insert+0xe2f/0xeee ... fib_dump_info incorrectly has nhs = 0 for blackhole nexthops, so it believes the nexthop object is a multipath group (nhs != 1) and ends up down the nexthop_mpath_fill_node() path which is wrong for a blackhole. The blackhole check in nexthop_num_path is leftover from early days of the blackhole implementation which did not initialize the device. In the end the design was simpler (fewer special case checks) to set the device to loopback in nh_info, so the check in nexthop_num_path should have been removed. Fixes: 430a0491 ("nexthop: Add support for nexthop groups") Reported-by:
Donald Sharp <sharpd@cumulusnetworks.com> Signed-off-by:
David Ahern <dsahern@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Aug 24, 2019
-
-
Zhu Yanjun authored
>From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL) is used to identify different flows within an IBA subnet. It is carried in the local route header of the packet. Before this commit, run "rds-info -I". The outputs are as below: " RDS IB Connections: LocalAddr RemoteAddr Tos SL LocalDev RemoteDev 192.2.95.3 192.2.95.1 2 0 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 1 0 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9 " After this commit, the output is as below: " RDS IB Connections: LocalAddr RemoteAddr Tos SL LocalDev RemoteDev 192.2.95.3 192.2.95.1 2 2 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 1 1 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9 " The commit fe3475af ("net: rds: add per rds connection cache statistics") adds cache_allocs in struct rds_info_rdma_connection as below: struct rds_info_rdma_connection { ... __u32 rdma_mr_max; __u32 rdma_mr_size; __u8 tos; __u32 cache_allocs; }; The peer struct in rds-tools of struct rds_info_rdma_connection is as below: struct rds_info_rdma_connection { ... uint32_t rdma_mr_max; uint32_t rdma_mr_size; uint8_t tos; uint8_t sl; uint32_t cache_allocs; }; The difference between userspace and kernel is the member variable sl. In the kernel struct, the member variable sl is missing. This will introduce risks. So it is necessary to use this commit to avoid this risk. Fixes: fe3475af ("net: rds: add per rds connection cache statistics") CC: Joe Jin <joe.jin@oracle.com> CC: JUNXIAO_BI <junxiao.bi@oracle.com> Suggested-by:
Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by:
Zhu Yanjun <yanjun.zhu@oracle.com> Acked-by:
Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
An excerpt from netlink(7) man page, In multipart messages (multiple nlmsghdr headers with associated payload in one byte stream) the first and all following headers have the NLM_F_MULTI flag set, except for the last header which has the type NLMSG_DONE. but, after (ee28906f) there is a missing NLM_F_MULTI flag in the middle of a FIB dump. The result is user space applications following above man page excerpt may get confused and may stop parsing msg believing something went wrong. In the golang netlink lib [0] the library logic stops parsing believing the message is not a multipart message. Found this running Cilium[1] against net-next while adding a feature to auto-detect routes. I noticed with multiple route tables we no longer could detect the default routes on net tree kernels because the library logic was not returning them. Fix this by handling the fib_dump_info_fnhe() case the same way the fib_dump_info() handles it by passing the flags argument through the call chain and adding a flags argument to rt_fill_info(). Tested with Cilium stack and auto-detection of routes works again. Also annotated libs to dump netlink msgs and inspected NLM_F_MULTI and NLMSG_DONE flags look correct after this. Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same as is done for fib_dump_info() so this looks correct to me. [0] https://github.com/vishvananda/netlink/ [1] https://github.com/cilium/ Fixes: ee28906f ("ipv4: Dump route exceptions if requested") Signed-off-by:
John Fastabend <john.fastabend@gmail.com> Reviewed-by:
Stefano Brivio <sbrivio@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
xiaolinkui authored
This is an unlikely case, use unlikely() on it seems logical. Signed-off-by:
xiaolinkui <xiaolinkui@kylinos.cn> Signed-off-by:
David S. Miller <davem@davemloft.net>
-