Commit graph

1155346 commits

Author SHA1 Message Date
Jakub Kicinski
e8ac615fe1 Merge branch 'ipv6-fix-socket-connection-with-dscp-fib-rules'
Guillaume Nault says:

====================
ipv6: Fix socket connection with DSCP fib-rules.

The "flowlabel" field of struct flowi6 is used to store both the actual
flow label and the DS Field (or Traffic Class). However the .connect
handlers of datagram and TCP sockets don't set the DS Field part when
doing their route lookup. This breaks fib-rules that match on DSCP.
====================

Link: https://lore.kernel.org/r/cover.1675875519.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09 22:49:06 -08:00
Guillaume Nault
c21a20d9d1 selftests: fib_rule_tests: Test UDP and TCP connections with DSCP rules.
Add the fib_rule6_send and fib_rule4_send tests to verify that DSCP
values are properly taken into account when UDP or TCP sockets try to
connect().

Tests are done with nettest, which needs a new option to specify
the DS Field value of the socket being tested. This new option is
named '-Q', in reference to the similar option used by ping.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09 22:49:04 -08:00
Guillaume Nault
8230680f36 ipv6: Fix tcp socket connection with DSCP.
Take into account the IPV6_TCLASS socket option (DSCP) in
tcp_v6_connect(). Otherwise fib6_rule_match() can't properly
match the DSCP value, resulting in invalid route lookup.

For example:

  ip route add unreachable table main 2001:db8::10/124

  ip route add table 100 2001:db8::10/124 dev eth0
  ip -6 rule add dsfield 0x04 table 100

  echo test | socat - TCP6:[2001:db8::11]:54321,ipv6-tclass=0x04

Without this patch, socat fails at connect() time ("No route to host")
because the fib-rule doesn't jump to table 100 and the lookup ends up
being done in the main table.

Fixes: 2cc67cc731 ("[IPV6] ROUTE: Routing by Traffic Class.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09 22:49:04 -08:00
Guillaume Nault
e010ae08c7 ipv6: Fix datagram socket connection with DSCP.
Take into account the IPV6_TCLASS socket option (DSCP) in
ip6_datagram_flow_key_init(). Otherwise fib6_rule_match() can't
properly match the DSCP value, resulting in invalid route lookup.

For example:

  ip route add unreachable table main 2001:db8::10/124

  ip route add table 100 2001:db8::10/124 dev eth0
  ip -6 rule add dsfield 0x04 table 100

  echo test | socat - UDP6:[2001:db8::11]:54321,ipv6-tclass=0x04

Without this patch, socat fails at connect() time ("No route to host")
because the fib-rule doesn't jump to table 100 and the lookup ends up
being done in the main table.

Fixes: 2cc67cc731 ("[IPV6] ROUTE: Routing by Traffic Class.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09 22:49:04 -08:00
Jakub Kicinski
6e16e67a6b Merge branch 'nfp-fix-schedule-in-atomic-context-when-offloading-sa'
Simon Horman says:

====================
nfp: fix schedule in atomic context when offloading sa

Yinjun Zhang says:

IPsec offloading callbacks may be called in atomic context, sleep is
not allowed in the implementation. Now use workqueue mechanism to
avoid this issue.

Extend existing workqueue mechanism for multicast configuration only
to universal use, so that all configuring through mailbox asynchoronously
can utilize it.

Also fix another two incorrect use of mailbox in IPsec:
 1. Need lock for race condition when accessing mbox
 2. Offset of mbox access should depends on tlv caps
====================

Link: https://lore.kernel.org/r/20230208102258.29639-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09 22:28:09 -08:00
Yinjun Zhang
71f814cda6 nfp: fix schedule in atomic context when offloading sa
IPsec offloading callbacks may be called in atomic context, sleep is
not allowed in the implementation. Now use workqueue mechanism to
avoid this issue.

Extend existing workqueue mechanism for multicast configuration only
to universal use, so that all configuring through mailbox asynchronously
can utilize it.

Fixes: 859a497fe8 ("nfp: implement xfrm callbacks and expose ipsec offload feature to upper layer")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09 22:28:06 -08:00
Yinjun Zhang
7a13a2eef6 nfp: fix incorrect use of mbox in IPsec code
The mailbox configuration mechanism requires writing several registers,
which shouldn't be interrupted, so need lock to avoid race condition.

The base offset of mailbox configuration registers is not fixed, it
depends on TLV caps read from application firmware.

Fixes: 859a497fe8 ("nfp: implement xfrm callbacks and expose ipsec offload feature to upper layer")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09 22:28:06 -08:00
Rafał Miłecki
d61615c366 net: bgmac: fix BCM5358 support by setting correct flags
Code blocks handling BCMA_CHIP_ID_BCM5357 and BCMA_CHIP_ID_BCM53572 were
incorrectly unified. Chip package values are not unique and cannot be
checked independently. They are meaningful only in a context of a given
chip.

Packages BCM5358 and BCM47188 share the same value but then belong to
different chips. Code unification resulted in treating BCM5358 as
BCM47188 and broke its initialization.

Link: https://github.com/openwrt/openwrt/issues/8278
Fixes: cb1b0f90ac ("net: ethernet: bgmac: unify code of the same family")
Cc: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230208091637.16291-1-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09 22:25:31 -08:00
Linus Torvalds
35674e7875 Networking fixes for 6.2-rc8, including fixes from can and
ipsec subtrees
 
 Current release - regressions:
 
  - sched: fix off by one in htb_activate_prios()
 
  - eth: mana: fix accessing freed irq affinity_hint
 
  - eth: ice: fix out-of-bounds KASAN warning in virtchnl
 
 Current release - new code bugs:
 
  - eth: mtk_eth_soc: enable special tag when any MAC uses DSA
 
 Previous releases - always broken:
 
  - core: fix sk->sk_txrehash default
 
  - neigh: make sure used and confirmed times are valid
 
  - mptcp: be careful on subflow status propagation on errors
 
  - xfrm: prevent potential spectre v1 gadget in xfrm_xlate32_attr()
 
  - phylink: move phy_device_free() to correctly release phy device
 
  - eth: mlx5:
    - fix crash unsetting rx-vlan-filter in switchdev mode
    - fix hang on firmware reset
    - serialize module cleanup with reload and remove
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmPlBMISHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkAZsQAJl3UBV0kKfxcuNNf4Qwrphpcg4TiZzC
 wD21ebpjG5MzULQ/r0J0Ry3ZlQJmLMs6l2zM3QO6U6obHxzSQ5BrhCHIv4lw6ODC
 Zfv3a82wVVTd123Zykx+bd8jz2uV5nIps5MtBmf7Dqm1ldVvpNAOnXSxVi52UCk5
 gXDYtYqGmO+ZU1IUJxUrRoPPCTGEnfRQvU23Cm7WgzUHjDh7yKihdRVajLKUxVq4
 9mEE9mwu7tf7AVMQ/oMCCPyh8PVAXkzCEoCm5V9MO4CApS4jgSGkX3OEjDKtEwUA
 efhES/3OLNCaPXauc6xkudXzooYoYpa1L0m/91pIHw4+ur3pueDpmBYfpC9cmPy6
 JJtpT47RWGSXaRiP8RpzGKhKhTf33T9lGYm6PRelUr+x/dnaM87u+IwKDr0nb9+H
 P0MnoI8pcaLpEFlwpTzGiNS8vJw1bC5NBTZuoSSJSxT8ItRwgzh7txfHAveR3etW
 Nue726RG/9rTyukG7zBdZVpqffe8r2mybar82Ns+zIsU8x4ZKTNj7WIhMNx0tL0/
 26tS5BdL4DqkDKFFJ+HJDXLpYCt7KWvG1k78xBdBBsQ/CjcBi3Nhk7NyMni5FTsa
 qnZd1OR4ruUHeFDqz+3q/PfxLncfRr/ZlSIF1ogq3ij5M0CoxwAWSrgh4tch7XsL
 YtdbL1xpTSk/
 =sJO+
 -----END PGP SIGNATURE-----

Merge tag 'net-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from can and ipsec subtrees.

  Current release - regressions:

   - sched: fix off by one in htb_activate_prios()

   - eth: mana: fix accessing freed irq affinity_hint

   - eth: ice: fix out-of-bounds KASAN warning in virtchnl

  Current release - new code bugs:

   - eth: mtk_eth_soc: enable special tag when any MAC uses DSA

  Previous releases - always broken:

   - core: fix sk->sk_txrehash default

   - neigh: make sure used and confirmed times are valid

   - mptcp: be careful on subflow status propagation on errors

   - xfrm: prevent potential spectre v1 gadget in xfrm_xlate32_attr()

   - phylink: move phy_device_free() to correctly release phy device

   - eth: mlx5:
      - fix crash unsetting rx-vlan-filter in switchdev mode
      - fix hang on firmware reset
      - serialize module cleanup with reload and remove"

* tag 'net-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
  selftests: forwarding: lib: quote the sysctl values
  net: mscc: ocelot: fix all IPv6 getting trapped to CPU when PTP timestamping is used
  rds: rds_rm_zerocopy_callback() use list_first_entry()
  net: txgbe: Update support email address
  selftests: Fix failing VXLAN VNI filtering test
  selftests: mptcp: stop tests earlier
  selftests: mptcp: allow more slack for slow test-case
  mptcp: be careful on subflow status propagation on errors
  mptcp: fix locking for in-kernel listener creation
  mptcp: fix locking for setsockopt corner-case
  mptcp: do not wait for bare sockets' timeout
  net: ethernet: mtk_eth_soc: fix DSA TX tag hwaccel for switch port 0
  nfp: ethtool: fix the bug of setting unsupported port speed
  txhash: fix sk->sk_txrehash default
  net: ethernet: mtk_eth_soc: fix wrong parameters order in __xdp_rxq_info_reg()
  net: ethernet: mtk_eth_soc: enable special tag when any MAC uses DSA
  net: sched: sch: Fix off by one in htb_activate_prios()
  igc: Add ndo_tx_timeout support
  net: mana: Fix accessing freed irq affinity_hint
  hv_netvsc: Allocate memory in netvsc_dma_map() with GFP_ATOMIC
  ...
2023-02-09 09:17:38 -08:00
Linus Torvalds
0b028189d1 for-linus-2023020901
-----BEGIN PGP SIGNATURE-----
 
 iQJSBAABCAA8FiEEoEVH9lhNrxiMPSyI7MXwXhnZSjYFAmPlALYeHGJlbmphbWlu
 LnRpc3NvaXJlc0ByZWRoYXQuY29tAAoJEOzF8F4Z2Uo2mrgP/2VeORZZkTkIoWHi
 zHW40NRzf4rx+ou6ZagXvGbLQA1NCrZ1eiMceaD8P1U3s2BQ8CH09j2icWp9DtjB
 bEbMZhnn9AlW7PKvVad2U9V31EYxh7yawnyrsCK/jVnJrE7s2O+mFPPYdBT4bSgt
 lkLBy7bIIv216kNVSeoTuk7iiaer6cJkY6JFeuSVnKCNoMQyHAg2mrjV7WMjZhcx
 LMgnMLjgfAgPnb65eyLlRQotn2qaHbnKHWZ/0q7RMlRNoKnvvi+HToCChrq6E6ir
 dxGwyD4nXgZ8MQLGWOChmK1jHrf8lu325t1YWoVw8ITM9wIkL8e/P+KqvMsjDgbH
 1mm1SyiC0GtLcuelcafJOlttTJ+Kh3XlKrLTSMpYGYNbrnF2B0hfik8bWeT/Cbng
 aknJXeBU6L4Hta2t1sGVZVMXLQSl9/8sSp3LJNHSd+YPZUcHhtCx+gUglqVu0xlT
 aG7Gg/LRXneHW7XxpsnI+f40Bfc2fKz/Mz/ToUBMbHYjvrJt8Pv3j4hjUF400FFn
 /671TzBA5nMWTy4yJ3FQQReXc4nfWmCQyhOhHpK4gWck7Zk+HU6pL0yN25a3KVg2
 uTvHOzBZk6fUcOTiMYS7xBSJYGhi0GqettqBnRnikItTd84XBZnZFoy0x9TaxECs
 AnKjpOWGxxQIuHqBTQZfglFrqJf2
 =Z7v3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2023020901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Benjamin Tissoires:

 - fix potential infinite loop with a badly crafted HID device (Xin
   Zhao)

 - fix regression from 6.1 in USB logitech devices potentially making
   their mouse wheel not working (Bastien Nocera)

 - clean up in AMD sensors, which fixes a long time resume bug (Mario
   Limonciello)

 - few device small fixes and quirks

* tag 'for-linus-2023020901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: Ignore battery for ELAN touchscreen 29DF on HP
  HID: amd_sfh: if no sensors are enabled, clean up
  HID: logitech: Disable hi-res scrolling on USB
  HID: core: Fix deadloop in hid_apply_multiplier.
  HID: Ignore battery for Elan touchscreen on Asus TP420IA
  HID: elecom: add support for TrackBall 056E:011C
2023-02-09 09:09:13 -08:00
Linus Torvalds
94a1f56db6 small fix for use after free
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmPitC8ACgkQiiy9cAdy
 T1GEggv/e1xd1mYPgjCBVxbCh2GpHh+N4OSy1TiKrgRP5xNi5ZAj2y51dAqHpXbi
 m15h5aFFHN7gyxYl6lZz7gwvYX9WS6Dc46YnOH613ai6gjROKy/xKSoY9zipZ+gQ
 cm3lZuTqTctmNVFjg0HzkTZjryIoWOtrrhViK/bJYqiMOAqTyOAnzQ/01WVwsuzJ
 oLlEZhQqE7LS9OUeU3zNfUS/YD5EjqLwG9iwJUGEfxHh5h22oyFv1uroB1HBIqPw
 eodR0+H67fkuxxTGbjL44yid2nalvZB/F6X028SX0smWfAsC23L0bZaj4Cb4OIIU
 tpE5FvSeHisHPmtnh3gOmiqbxtvYmtKj9UM4JfC60qzNkAGAYYhuTWz6pdDD5I7h
 4UtYVUMa6rvZF+5zDfz7YDazu+drvcBG6Hw9acNGBdgTc/lAleb1dktpAGg1Kq+A
 0wZYlh6/8Q+r6hFxd5x5nzs4l+glyJ5UGltnt5c3/3suH4YO525j89agjAJnxxwS
 0WObkpJ6
 =OzkK
 -----END PGP SIGNATURE-----

Merge tag '6.2-rc8-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull cifx fix from Steve French:
 "Small fix for use after free"

* tag '6.2-rc8-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix use-after-free in rdata->read_into_pages()
2023-02-09 09:00:26 -08:00
Hangbin Liu
3a082086aa selftests: forwarding: lib: quote the sysctl values
When set/restore sysctl value, we should quote the value as some keys
may have multi values, e.g. net.ipv4.ping_group_range

Fixes: f5ae57784b ("selftests: forwarding: lib: Add sysctl_set(), sysctl_restore()")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20230208032110.879205-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-09 11:05:38 +01:00
Vladimir Oltean
2fcde9fe25 net: mscc: ocelot: fix all IPv6 getting trapped to CPU when PTP timestamping is used
While running this selftest which usually passes:

~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0
TEST: swp0: Unicast IPv4 to primary MAC address                     [ OK ]
TEST: swp0: Unicast IPv4 to macvlan MAC address                     [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address                     [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]
TEST: swp0: Multicast IPv4 to joined group                          [ OK ]
TEST: swp0: Multicast IPv4 to unknown group                         [ OK ]
TEST: swp0: Multicast IPv4 to unknown group, promisc                [ OK ]
TEST: swp0: Multicast IPv4 to unknown group, allmulti               [ OK ]
TEST: swp0: Multicast IPv6 to joined group                          [ OK ]
TEST: swp0: Multicast IPv6 to unknown group                         [ OK ]
TEST: swp0: Multicast IPv6 to unknown group, promisc                [ OK ]
TEST: swp0: Multicast IPv6 to unknown group, allmulti               [ OK ]

if I start PTP timestamping then run it again (debug prints added by me),
the unknown IPv6 MC traffic is seen by the CPU port even when it should
have been dropped:

~/selftests/drivers/net/dsa# ptp4l -i swp0 -2 -P -m
ptp4l[225.410]: selected /dev/ptp1 as PTP clock
[  225.445746] mscc_felix 0000:00:00.5: ocelot_l2_ptp_trap_add: port 0 adding L2 PTP trap
[  225.453815] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_add: port 0 adding IPv4 PTP event trap
[  225.462703] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_add: port 0 adding IPv4 PTP general trap
[  225.471768] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_add: port 0 adding IPv6 PTP event trap
[  225.480651] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_add: port 0 adding IPv6 PTP general trap
ptp4l[225.488]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[225.488]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
^C
~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0
TEST: swp0: Unicast IPv4 to primary MAC address                     [ OK ]
TEST: swp0: Unicast IPv4 to macvlan MAC address                     [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address                     [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]
TEST: swp0: Multicast IPv4 to joined group                          [ OK ]
TEST: swp0: Multicast IPv4 to unknown group                         [ OK ]
TEST: swp0: Multicast IPv4 to unknown group, promisc                [ OK ]
TEST: swp0: Multicast IPv4 to unknown group, allmulti               [ OK ]
TEST: swp0: Multicast IPv6 to joined group                          [ OK ]
TEST: swp0: Multicast IPv6 to unknown group                         [FAIL]
        reception succeeded, but should have failed
TEST: swp0: Multicast IPv6 to unknown group, promisc                [ OK ]
TEST: swp0: Multicast IPv6 to unknown group, allmulti               [ OK ]

The PGID_MCIPV6 is configured correctly to not flood to the CPU,
I checked that.

Furthermore, when I disable back PTP RX timestamping (ptp4l doesn't do
that when it exists), packets are RX filtered again as they should be:

~/selftests/drivers/net/dsa# hwstamp_ctl -i swp0 -r 0
[  218.202854] mscc_felix 0000:00:00.5: ocelot_l2_ptp_trap_del: port 0 removing L2 PTP trap
[  218.212656] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_del: port 0 removing IPv4 PTP event trap
[  218.222975] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_del: port 0 removing IPv4 PTP general trap
[  218.233133] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_del: port 0 removing IPv6 PTP event trap
[  218.242251] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_del: port 0 removing IPv6 PTP general trap
current settings:
tx_type 1
rx_filter 12
new settings:
tx_type 1
rx_filter 0
~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0
TEST: swp0: Unicast IPv4 to primary MAC address                     [ OK ]
TEST: swp0: Unicast IPv4 to macvlan MAC address                     [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address                     [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]
TEST: swp0: Multicast IPv4 to joined group                          [ OK ]
TEST: swp0: Multicast IPv4 to unknown group                         [ OK ]
TEST: swp0: Multicast IPv4 to unknown group, promisc                [ OK ]
TEST: swp0: Multicast IPv4 to unknown group, allmulti               [ OK ]
TEST: swp0: Multicast IPv6 to joined group                          [ OK ]
TEST: swp0: Multicast IPv6 to unknown group                         [ OK ]
TEST: swp0: Multicast IPv6 to unknown group, promisc                [ OK ]
TEST: swp0: Multicast IPv6 to unknown group, allmulti               [ OK ]

So it's clear that something in the PTP RX trapping logic went wrong.

Looking a bit at the code, I can see that there are 4 typos, which
populate "ipv4" VCAP IS2 key filter fields for IPv6 keys.

VCAP IS2 keys of type OCELOT_VCAP_KEY_IPV4 and OCELOT_VCAP_KEY_IPV6 are
handled by is2_entry_set(). OCELOT_VCAP_KEY_IPV4 looks at
&filter->key.ipv4, and OCELOT_VCAP_KEY_IPV6 at &filter->key.ipv6.
Simply put, when we populate the wrong key field, &filter->key.ipv6
fields "proto.mask" and "proto.value" remain all zeroes (or "don't care").
So is2_entry_set() will enter the "else" of this "if" condition:

	if (msk == 0xff && (val == IPPROTO_TCP || val == IPPROTO_UDP))

and proceed to ignore the "proto" field. The resulting rule will match
on all IPv6 traffic, trapping it to the CPU.

This is the reason why the local_termination.sh selftest sees it,
because control traps are stronger than the PGID_MCIPV6 used for
flooding (from the forwarding data path).

But the problem is in fact much deeper. We trap all IPv6 traffic to the
CPU, but if we're bridged, we set skb->offload_fwd_mark = 1, so software
forwarding will not take place and IPv6 traffic will never reach its
destination.

The fix is simple - correct the typos.

I was intentionally inaccurate in the commit message about the breakage
occurring when any PTP timestamping is enabled. In fact it only happens
when L4 timestamping is requested (HWTSTAMP_FILTER_PTP_V2_EVENT or
HWTSTAMP_FILTER_PTP_V2_L4_EVENT). But ptp4l requests a larger RX
timestamping filter than it needs for "-2": HWTSTAMP_FILTER_PTP_V2_EVENT.
I wanted people skimming through git logs to not think that the bug
doesn't affect them because they only use ptp4l in L2 mode.

Fixes: 96ca08c058 ("net: mscc: ocelot: set up traps for PTP packets")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230207183117.1745754-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-09 10:52:44 +01:00
Pietro Borrello
f753a68980 rds: rds_rm_zerocopy_callback() use list_first_entry()
rds_rm_zerocopy_callback() uses list_entry() on the head of a list
causing a type confusion.
Use list_first_entry() to actually access the first element of the
rs_zcookie_queue list.

Fixes: 9426bbc6de ("rds: use list structure to track information for zerocopy completion notification")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it>
Link: https://lore.kernel.org/r/20230202-rds-zerocopy-v3-1-83b0df974f9a@diag.uniroma1.it
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-09 10:37:26 +01:00
Jakub Kicinski
646be03ec4 ipsec-2023-02-08
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmPjhfIACgkQrB3Eaf9P
 W7c5cg//b8Kpjp9AVOHzQ2UfZ5YbFNC3F+goJRWtCVIfXTft2N+1gn80uXk5q1r1
 GjSAlOxh2lQcNAZR4pWAXXqDq8vQuSmuNn7H6YZg8n6a9oFBi3fX5m1tNdu+G0wu
 Hv28M1jZgmnwJXcUQax3N59NHe1dv/CdtDd9qg6stvFDW3lcD96EbJUkKNsPwpin
 RZ/CqZX8UBpc+JRQmPPHw7dRSlvUh0hmr2m4HD63RJiYDrgOG9SO/Xx/GW6t/i9u
 XbuXezJxaulOvbB77uhevPkJqyDGy7IT3Ou0s5GGswiYZ6+ctcZZTayRsKcPWjnk
 n+OS7jk0yEApN1ZAo+wZD9inYKZuYOFNs2hkSetbca9NJ7ZpnXUoqvFxbZKl8biR
 v/ecf2yd31T+Vk0mhwBlgmcVfKNQzmRZA45UnOnxuYEXWNok89uQMqHmLdbSBrpR
 T0LptPzP5Vd7N33zRIiZS7t9vrlw9Xjfbm9O/sSHh0LuRED2gfHHp85MwBhcNfYA
 AKuxpj8Q9vEf1VW0OYhXyz8Te2+6oB2JJhqNdFsjea64zEoXFeJuSMYq/9DBzIDu
 glToJME9CV9BCN+r/q96zXBJao2UU4BhSGUVActKOktCnLJOE+XRsR7tpm2hHzr1
 wY1PHvCvbpvNXPTGsAf6nijwSFSAcrErfD3ZmyvWvkmopdEajW0=
 =Bkiy
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-2023-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
ipsec 2023-02-08

1) Fix policy checks for nested IPsec tunnels when using
   xfrm interfaces. From Benedict Wong.

2) Fix netlink message expression on 32=>64-bit
   messages translators. From Anastasia Belova.

3) Prevent potential spectre v1 gadget in xfrm_xlate32_attr.
   From Eric Dumazet.

4) Always consistently use time64_t in xfrm_timer_handler.
   From Eric Dumazet.

5) Fix KCSAN reported bug: Multiple cpus can update use_time
   at the same time. From Eric Dumazet.

6) Fix SCP copy from IPv4 to IPv6 on interfamily tunnel.
   From Christian Hopps.

* tag 'ipsec-2023-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: fix bug with DSCP copy to v6 from v4 tunnel
  xfrm: annotate data-race around use_time
  xfrm: consistently use time64_t in xfrm_timer_handler()
  xfrm/compat: prevent potential spectre v1 gadget in xfrm_xlate32_attr()
  xfrm: compat: change expression for switch in xfrm_xlate64
  Fix XFRM-I support for nested ESP tunnels
====================

Link: https://lore.kernel.org/r/20230208114322.266510-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08 21:35:39 -08:00
Jiawen Wu
363d7c2298 net: txgbe: Update support email address
Update new email address for Wangxun 10Gb NIC support team.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://lore.kernel.org/r/20230208023035.3371250-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08 20:48:37 -08:00
Jakub Kicinski
ff8ced4eef mlx5-fixes-2023-02-07
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmPjEHMACgkQSD+KveBX
 +j5OQQf/f2IFOfjX7x8ENe5ax31XWo6h7YG2TQX8fP7w71e+ehlg9H1x1PwGYcRj
 phP9koGxOwBlI2zPpS4zOM13ppTphbjneuS95rg1uwhRhlPy7Ahk2KwvZ48FqZGI
 UsECJ9+XKU6huV1kbfIHghVMLPLvrBrK7yXlO2R9iGHFSmUZz+olbsEpnpI07xHo
 nV0Di4zgbZJF4JS6fxHSGoQV/CE6aRHir4nCZIexunRwPeLzArQI7KH3bFxohGQ9
 Tv0/OwHKUOTqktEzH8/GbmQPbmOuQm7kGN4usw7VTxErdbhd1dNoTMDOHjIf8NlS
 MIkew3D6dc5G7oGq5BA9Y+Z+XW9H1g==
 =HDTV
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2023-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2023-02-07

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2023-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Serialize module cleanup with reload and remove
  net/mlx5: fw_tracer, Zero consumer index when reloading the tracer
  net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers
  net/mlx5: Expose SF firmware pages counter
  net/mlx5: Store page counters in a single array
  net/mlx5e: IPoIB, Show unknown speed instead of error
  net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev mode
  net/mlx5: Bridge, fix ageing of peer FDB entries
  net/mlx5: DR, Fix potential race in dr_rule_create_rule_nic
  net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change
====================

Link: https://lore.kernel.org/r/20230208030302.95378-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08 19:23:45 -08:00
Ido Schimmel
b963d9d5b9 selftests: Fix failing VXLAN VNI filtering test
iproute2 does not recognize the "group6" and "remote6" keywords. Fix by
using "group" and "remote" instead.

Before:

 # ./test_vxlan_vnifiltering.sh
 [...]
 Tests passed:  25
 Tests failed:   2

After:

 # ./test_vxlan_vnifiltering.sh
 [...]
 Tests passed:  27
 Tests failed:   0

Fixes: 3edf5f66c1 ("selftests: add new tests for vxlan vnifiltering")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230207141819.256689-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08 16:54:03 -08:00
David S. Miller
965bffd2dd Merge branch 'mptcp-fixes'
Matthieu Baerts says:

====================
mptcp: fixes for v6.2

Patch 1 clears resources earlier if there is no more reasons to keep
MPTCP sockets alive.

Patches 2 and 3 fix some locking issues visible in some rare corner
cases: the linked issues should be quite hard to reproduce.

Patch 4 makes sure subflows are correctly cleaned after the end of a
connection.

Patch 5 and 6 improve the selftests stability when running in a slow
environment by transfering data for a longer period on one hand and by
stopping the tests when all expected events have been observed on the
other hand.

All these patches fix issues introduced before v6.2.
====================

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08 09:39:34 +00:00
Matthieu Baerts
070d6dafac selftests: mptcp: stop tests earlier
These 'endpoint' tests from 'mptcp_join.sh' selftest start a transfer in
the background and check the status during this transfer.

Once the expected events have been recorded, there is no reason to wait
for the data transfer to finish. It can be stopped earlier to reduce the
execution time by more than half.

For these tests, the exchanged data were not verified. Errors, if any,
were ignored but that's fine, plenty of other tests are looking at that.
It is then OK to mute stderr now that we are sure errors will be printed
(and still ignored) because the transfer is stopped before the end.

Fixes: e274f71540 ("selftests: mptcp: add subflow limits test-cases")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08 09:39:34 +00:00
Paolo Abeni
a635a8c3df selftests: mptcp: allow more slack for slow test-case
A test-case is frequently failing on some extremely slow VMs.
The mptcp transfer completes before the script is able to do
all the required PM manipulation.

Address the issue in the simplest possible way, making the
transfer even more slow.

Additionally dump more info in case of failures, to help debugging
similar problems in the future and init dump_stats var.

Fixes: e274f71540 ("selftests: mptcp: add subflow limits test-cases")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/323
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08 09:39:34 +00:00
Paolo Abeni
1249db44a1 mptcp: be careful on subflow status propagation on errors
Currently the subflow error report callback unconditionally
propagates the fallback subflow status to the owning msk.

If the msk is already orphaned, the above prevents the code
from correctly tracking the msk moving to the TCP_CLOSE state
and doing the appropriate cleanup.

All the above causes increasing memory usage over time and
sporadic self-tests failures.

There is a great deal of infrastructure trying to propagate
correctly the fallback subflow status to the owning mptcp socket,
e.g. via mptcp_subflow_eof() and subflow_sched_work_if_closed():
in the error propagation path we need only to cope with unorphaned
sockets.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/339
Fixes: 15cc104533 ("mptcp: deliver ssk errors to msk")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08 09:39:34 +00:00
Paolo Abeni
ad2171009d mptcp: fix locking for in-kernel listener creation
For consistency, in mptcp_pm_nl_create_listen_socket(), we need to
call the __mptcp_nmpc_socket() under the msk socket lock.

Note that as a side effect, mptcp_subflow_create_socket() needs a
'nested' lockdep annotation, as it will acquire the subflow (kernel)
socket lock under the in-kernel listener msk socket lock.

The current lack of locking is almost harmless, because the relevant
socket is not exposed to the user space, but in future we will add
more complexity to the mentioned helper, let's play safe.

Fixes: 1729cf186d ("mptcp: create the listening socket for new port")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08 09:39:34 +00:00
Paolo Abeni
21e4356968 mptcp: fix locking for setsockopt corner-case
We need to call the __mptcp_nmpc_socket(), and later subflow socket
access under the msk socket lock, or e.g. a racing connect() could
change the socket status under the hood, with unexpected results.

Fixes: 54635bd047 ("mptcp: add TCP_FASTOPEN_CONNECT socket option")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08 09:39:34 +00:00
Paolo Abeni
d4e85922e3 mptcp: do not wait for bare sockets' timeout
If the peer closes all the existing subflows for a given
mptcp socket and later the application closes it, the current
implementation let it survive until the timewait timeout expires.

While the above is allowed by the protocol specification it
consumes resources for almost no reason and additionally
causes sporadic self-tests failures.

Let's move the mptcp socket to the TCP_CLOSE state when there are
no alive subflows at close time, so that the allocated resources
will be freed immediately.

Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08 09:39:34 +00:00
Vladimir Oltean
1a3245fe0c net: ethernet: mtk_eth_soc: fix DSA TX tag hwaccel for switch port 0
Arınç reports that on his MT7621AT Unielec U7621-06 board and MT7623NI
Bananapi BPI-R2, packets received by the CPU over mt7530 switch port 0
(of which this driver acts as the DSA master) are not processed
correctly by software. More precisely, they arrive without a DSA tag
(in packet or in the hwaccel area - skb_metadata_dst()), so DSA cannot
demux them towards the switch's interface for port 0. Traffic from other
ports receives a skb_metadata_dst() with the correct port and is demuxed
properly.

Looking at mtk_poll_rx(), it becomes apparent that this driver uses the
skb vlan hwaccel area:

	union {
		u32		vlan_all;
		struct {
			__be16	vlan_proto;
			__u16	vlan_tci;
		};
	};

as a temporary storage for the VLAN hwaccel tag, or the DSA hwaccel tag.
If this is a DSA master it's a DSA hwaccel tag, and finally clears up
the skb VLAN hwaccel header.

I'm guessing that the problem is the (mis)use of API.
skb_vlan_tag_present() looks like this:

 #define skb_vlan_tag_present(__skb)	(!!(__skb)->vlan_all)

So if both vlan_proto and vlan_tci are zeroes, skb_vlan_tag_present()
returns precisely false. I don't know for sure what is the format of the
DSA hwaccel tag, but I surely know that lowermost 3 bits of vlan_proto
are 0 when receiving from port 0:

	unsigned int port = vlan_proto & GENMASK(2, 0);

If the RX descriptor has no other bits set to non-zero values in
RX_DMA_VTAG, then the call to __vlan_hwaccel_put_tag() will not, in
fact, make the subsequent skb_vlan_tag_present() return true, because
it's implemented like this:

static inline void __vlan_hwaccel_put_tag(struct sk_buff *skb,
					  __be16 vlan_proto, u16 vlan_tci)
{
	skb->vlan_proto = vlan_proto;
	skb->vlan_tci = vlan_tci;
}

What we need to do to fix this problem (assuming this is the problem) is
to stop using skb->vlan_all as temporary storage for driver affairs, and
just create some local variables that serve the same purpose, but
hopefully better. Instead of calling skb_vlan_tag_present(), let's look
at a boolean has_hwaccel_tag which we set to true when the RX DMA
descriptors have something. Disambiguate based on netdev_uses_dsa()
whether this is a VLAN or DSA hwaccel tag, and only call
__vlan_hwaccel_put_tag() if we're certain it's a VLAN tag.

Arınç confirms that the treatment works, so this validates the
assumption.

Link: https://lore.kernel.org/netdev/704f3a72-fc9e-714a-db54-272e17612637@arinc9.com/
Fixes: 2d7605a729 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08 09:11:31 +00:00
Yu Xiao
821de68c1f nfp: ethtool: fix the bug of setting unsupported port speed
Unsupported port speed can be set and cause error. Now fixing it
and return an error if setting unsupported speed.

This fix depends on the following, which was included in v6.2-rc1:
commit a61474c41e ("nfp: ethtool: support reporting link modes").

Fixes: 7c69873727 ("nfp: add support for .set_link_ksettings()")
Signed-off-by: Yu Xiao <yu.xiao@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08 09:08:12 +00:00
Kevin Yang
c11204c78d txhash: fix sk->sk_txrehash default
This code fix a bug that sk->sk_txrehash gets its default enable
value from sysctl_txrehash only when the socket is a TCP listener.

We should have sysctl_txrehash to set the default sk->sk_txrehash,
no matter TCP, nor listerner/connector.

Tested by following packetdrill:
  0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
  +0 socket(..., SOCK_DGRAM, IPPROTO_UDP) = 4
  // SO_TXREHASH == 74, default to sysctl_txrehash == 1
  +0 getsockopt(3, SOL_SOCKET, 74, [1], [4]) = 0
  +0 getsockopt(4, SOL_SOCKET, 74, [1], [4]) = 0

Fixes: 26859240e4 ("txhash: Add socket option to control TX hash rethink behavior")
Signed-off-by: Kevin Yang <yyd@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08 09:07:11 +00:00
Tariq Toukan
c966153d12 net: ethernet: mtk_eth_soc: fix wrong parameters order in __xdp_rxq_info_reg()
Parameters 'queue_index' and 'napi_id' are passed in a swapped order.
Fix it here.

Fixes: 23233e577e ("net: ethernet: mtk_eth_soc: rely on page_pool for single page buffers")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08 09:04:33 +00:00
Arınç ÜNAL
21386e6926 net: ethernet: mtk_eth_soc: enable special tag when any MAC uses DSA
The special tag is only enabled when the first MAC uses DSA. However, it
must be enabled when any MAC uses DSA. Change the check accordingly.

This fixes hardware DSA untagging not working on the second MAC of the
MT7621 and MT7623 SoCs, and likely other SoCs too. Therefore, remove the
check that disables hardware DSA untagging for the second MAC of the MT7621
and MT7623 SoCs.

Fixes: a1f47752fd ("net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC")
Co-developed-by: Richard van Schagen <richard@routerhints.com>
Signed-off-by: Richard van Schagen <richard@routerhints.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08 09:00:29 +00:00
Dan Carpenter
9cec2aaffe net: sched: sch: Fix off by one in htb_activate_prios()
The > needs be >= to prevent an out of bounds access.

Fixes: de5ca4c385 ("net: sched: sch: Bounds check priority")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/Y+D+KN18FQI2DKLq@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07 23:38:53 -08:00
Jakub Kicinski
91701f63d8 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-02-06 (ice)

This series contains updates to ice driver only.

Ani removes WQ_MEM_RECLAIM flag from workqueue to resolve
check_flush_dependency warning.

Michal fixes KASAN out-of-bounds warning.

Brett corrects behaviour for port VLAN Rx filters to prevent receiving
of unintended traffic.

Dan Carpenter fixes possible off by one issue.

Zhang Changzhong adjusts error path for switch recipe to prevent memory
leak.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: switch: fix potential memleak in ice_add_adv_recipe()
  ice: Fix off by one in ice_tc_forward_to_queue()
  ice: Fix disabling Rx VLAN filtering with port VLAN enabled
  ice: fix out-of-bounds KASAN warning in virtchnl
  ice: Do not use WQ_MEM_RECLAIM flag for workqueue
====================

Link: https://lore.kernel.org/r/20230206232934.634298-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07 22:04:44 -08:00
Sasha Neftin
9b27517627 igc: Add ndo_tx_timeout support
On some platforms, 100/1000/2500 speeds seem to have sometimes problems
reporting false positive tx unit hang during stressful UDP traffic. Likely
other Intel drivers introduce responses to a tx hang. Update the 'tx hang'
comparator with the comparison of the head and tail of ring pointers and
restore the tx_timeout_factor to the previous value (one).

This can be test by using netperf or iperf3 applications.
Example:
iperf3 -s -p 5001
iperf3 -c 192.168.0.2 --udp -p 5001 --time 600 -b 0

netserver -p 16604
netperf -H 192.168.0.2 -l 600 -p 16604 -t UDP_STREAM -- -m 64000

Fixes: b27b8dc77b ("igc: Increase timeout value for Speed 100/1000/2500")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230206235818.662384-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07 21:57:26 -08:00
Haiyang Zhang
18a048370b net: mana: Fix accessing freed irq affinity_hint
After calling irq_set_affinity_and_hint(), the cpumask pointer is
saved in desc->affinity_hint, and will be used later when reading
/proc/irq/<num>/affinity_hint. So the cpumask variable needs to be
persistent. Otherwise, we are accessing freed memory when reading
the affinity_hint file.

Also, need to clear affinity_hint before free_irq(), otherwise there
is a one-time warning and stack trace during module unloading:

 [  243.948687] WARNING: CPU: 10 PID: 1589 at kernel/irq/manage.c:1913 free_irq+0x318/0x360
 ...
 [  243.948753] Call Trace:
 [  243.948754]  <TASK>
 [  243.948760]  mana_gd_remove_irqs+0x78/0xc0 [mana]
 [  243.948767]  mana_gd_remove+0x3e/0x80 [mana]
 [  243.948773]  pci_device_remove+0x3d/0xb0
 [  243.948778]  device_remove+0x46/0x70
 [  243.948782]  device_release_driver_internal+0x1fe/0x280
 [  243.948785]  driver_detach+0x4e/0xa0
 [  243.948787]  bus_remove_driver+0x70/0xf0
 [  243.948789]  driver_unregister+0x35/0x60
 [  243.948792]  pci_unregister_driver+0x44/0x90
 [  243.948794]  mana_driver_exit+0x14/0x3fe [mana]
 [  243.948800]  __do_sys_delete_module.constprop.0+0x185/0x2f0

To fix the bug, use the persistent mask, cpumask_of(cpu#), and set
affinity_hint to NULL before freeing the IRQ, as required by free_irq().

Cc: stable@vger.kernel.org
Fixes: 71fa6887ee ("net: mana: Assign interrupts to CPUs based on NUMA nodes")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/1675718929-19565-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07 21:30:22 -08:00
Jakub Kicinski
b1f4fbabbb linux-can-fixes-for-6.2-20230207
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEDs2BvajyNKlf9TJQvlAcSiqKBOgFAmPiWbYTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRC+UBxKKooE6HT3CACcKiUxmkFfS3I2MUCLt+spz2/4q3cG
 +MX8m03fzSU8efdKN1ckUOmQfsPp2RNP7NSeI/Anu83oka40zFuejC/Dfg9V1SBF
 EKD39VIWQTuPcYJP4GlxqKTkDHSVsA5iN/4vxzCFpp7C6LR3h2f0bTwj6SA44IvD
 eatH6Vzr/Y1rG6Yj27hZzFpC8HXqHzaIcC1UJpODqNTmhrux/hskem0XqH68VQXK
 FRi8K7xHgIs4MonamWGJj/oz1NkbEje3xvT4nZRkaMLgZY0prTQfJetSDaGwkBci
 WJmqJ15PpBRedDpdTIeXkZK8U95BWt/gZEp28XwgkohP6A8DsilJFS7F
 =Tcyi
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-6.2-20230207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
can 2023-02-07

The patch is from Devid Antonio Filoni and fixes an address claiming
problem in the J1939 CAN protocol.

* tag 'linux-can-fixes-for-6.2-20230207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: j1939: do not wait 250 ms if the same addr was already claimed
====================

Link: https://lore.kernel.org/r/20230207140514.2885065-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07 20:50:30 -08:00
Michael Kelley
c6aa9d3b43 hv_netvsc: Allocate memory in netvsc_dma_map() with GFP_ATOMIC
Memory allocations in the network transmit path must use GFP_ATOMIC
so they won't sleep.

Reported-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/lkml/8a4d08f94d3e6fe8b6da68440eaa89a088ad84f9.camel@redhat.com/
Fixes: 846da38de0 ("net: netvsc: Add Isolation VM support for netvsc driver")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1675714317-48577-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07 20:39:26 -08:00
Shay Drory
8f0d1451ec net/mlx5: Serialize module cleanup with reload and remove
Currently, remove and reload flows can run in parallel to module cleanup.
This design is error prone. For example: aux_drivers callbacks are called
from both cleanup and remove flows with different lockings, which can
cause a deadlock[1].
Hence, serialize module cleanup with reload and remove.

[1]
       cleanup                        remove
       -------                        ------
   auxiliary_driver_unregister();
                                     devl_lock()
                                      auxiliary_device_delete(mlx5e_aux)
    device_lock(mlx5e_aux)
     devl_lock()
                                       device_lock(mlx5e_aux)

Fixes: 912cebf420 ("net/mlx5e: Connect ethernet part to auxiliary bus")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07 19:01:07 -08:00
Shay Drory
184e1e4474 net/mlx5: fw_tracer, Zero consumer index when reloading the tracer
When tracer is reloaded, the device will log the traces at the
beginning of the log buffer. Also, driver is reading the log buffer in
chunks in accordance to the consumer index.
Hence, zero consumer index when reloading the tracer.

Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07 19:01:06 -08:00
Shay Drory
db561fed6b net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers
Whenever the driver is reading the string DBs into buffers, the driver
is setting the load bit, but the driver never clears this bit.
As a result, in case load bit is on and the driver query the device for
new string DBs, the driver won't read again the string DBs.
Fix it by clearing the load bit when query the device for new string
DBs.

Fixes: 2d69356752 ("net/mlx5: Add support for fw live patch event")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07 19:01:06 -08:00
Maher Sanalla
9965bbebae net/mlx5: Expose SF firmware pages counter
Currently, each core device has VF pages counter which stores number of
fw pages used by its VFs and SFs.

The current design led to a hang when performing firmware reset on DPU,
where the DPU PFs stalled in sriov unload flow due to waiting on release
of SFs pages instead of waiting on only VFs pages.

Thus, Add a separate counter for SF firmware pages, which will prevent
the stall scenario described above.

Fixes: 1958fc2f07 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07 19:01:06 -08:00
Maher Sanalla
c3bdbaea65 net/mlx5: Store page counters in a single array
Currently, an independent page counter is used for tracking memory usage
for each function type such as VF, PF and host PF (DPU).

For better code-readibilty, use a single array that stores
the number of allocated memory pages for each function type.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07 19:01:05 -08:00
Dragos Tatulea
8aa5f171d5 net/mlx5e: IPoIB, Show unknown speed instead of error
ethtool is returning an error for unknown speeds for the IPoIB interface:

$ ethtool ib0
netlink error: failed to retrieve link settings
netlink error: Invalid argument
netlink error: failed to retrieve link settings
netlink error: Invalid argument
Settings for ib0:
Link detected: no

After this change, ethtool will return success and show "unknown speed":

$ ethtool ib0
Settings for ib0:
Supported ports: [  ]
Supported link modes:   Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes:  Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: Unknown!
Duplex: Full
Auto-negotiation: off
Port: Other
PHYAD: 0
Transceiver: internal
Link detected: no

Fixes: eb234ee9d5 ("net/mlx5e: IPoIB, Add support for get_link_ksettings in ethtool")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07 19:01:05 -08:00
Amir Tzin
8974aa9638 net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev mode
Moving to switchdev mode with rx-vlan-filter on and then setting it off
causes the kernel to crash since fs->vlan is freed during nic profile
cleanup flow.

RX VLAN filtering is not supported in switchdev mode so unset it when
changing to switchdev and restore its value when switching back to
legacy.

trace:
[] RIP: 0010:mlx5e_disable_cvlan_filter+0x43/0x70
[] set_feature_cvlan_filter+0x37/0x40 [mlx5_core]
[] mlx5e_handle_feature+0x3a/0x60 [mlx5_core]
[] mlx5e_set_features+0x6d/0x160 [mlx5_core]
[] __netdev_update_features+0x288/0xa70
[] ethnl_set_features+0x309/0x380
[] ? __nla_parse+0x21/0x30
[] genl_family_rcv_msg_doit.isra.17+0x110/0x150
[] genl_rcv_msg+0x112/0x260
[] ? features_reply_size+0xe0/0xe0
[] ? genl_family_rcv_msg_doit.isra.17+0x150/0x150
[] netlink_rcv_skb+0x4e/0x100
[] genl_rcv+0x24/0x40
[] netlink_unicast+0x1ab/0x290
[] netlink_sendmsg+0x257/0x4f0
[] sock_sendmsg+0x5c/0x70

Fixes: cb67b83292 ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07 19:01:05 -08:00
Vlad Buslov
da0c52426c net/mlx5: Bridge, fix ageing of peer FDB entries
SWITCHDEV_FDB_ADD_TO_BRIDGE event handler that updates FDB entry 'lastuse'
field is only executed for eswitch that owns the entry. However, if peer
entry processed packets at least once it will have hardware counter 'used'
value greater than entry 'lastuse' from that point on, which will cause FDB
entry not being aged out.

Process the event on all eswitch instances.

Fixes: ff9b752146 ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07 19:01:05 -08:00
Yevgeny Kliteynik
288d85e07f net/mlx5: DR, Fix potential race in dr_rule_create_rule_nic
Selecting builder should be protected by the lock to prevent the case
where a new rule sets a builder in the nic_matcher while the previous
rule is still using the nic_matcher.

Fixing this issue and cleaning the error flow.

Fixes: b9b81e1e93 ("net/mlx5: DR, For short chains of STEs, avoid allocating ste_arr dynamically")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07 19:01:04 -08:00
Adham Faris
1e66220948 net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change
rq->hw_mtu is used in function en_rx.c/mlx5e_skb_from_cqe_mpwrq_linear()
to catch oversized packets. If FCS is concatenated to the end of the
packet then the check should be updated accordingly.

Rx rings initialization (mlx5e_init_rxq_rq()) invoked for every new set
of channels, as part of mlx5e_safe_switch_params(), unknowingly if it
runs with default configuration or not. Current rq->hw_mtu
initialization assumes default configuration and ignores
params->scatter_fcs_en flag state.
Fix this, by accounting for params->scatter_fcs_en flag state during
rq->hw_mtu initialization.

In addition, updating rq->hw_mtu value during ingress traffic might
lead to packets drop and oversize_pkts_sw_drop counter increase with no
good reason. Hence we remove this optimization and switch the set of
channels with a new one, to make sure we don't get false positives on
the oversize_pkts_sw_drop counter.

Fixes: 102722fc68 ("net/mlx5e: Add support for RXFCS feature flag")
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07 19:01:04 -08:00
Linus Torvalds
0983f6bf2b Devicetree fixes for v6.2, part 2:
- Fix handling of multiple OF framebuffer devices
 
 - Fix booting on Socionext Synquacer with bad 'dma-ranges' entries
 
 - Add DT binding .yamllint to .gitignore
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmPiyF0ACgkQ+vtdtY28
 YcNZpA//adsF6J/7vPo9SiRj/Y+EFZ2m2J1mUubMP4UEAGtfG+JnkOmvjTgO3Rar
 aYivXNh2pqKqvOEw35XR5donzeygTxr953W4xzGRrPW282tZeWBeeKfz0bikVguv
 v7WzTlmYVL1I6P5KCNB2C5f5zZEXdDc7dA3pJmt+IM5sjEuduhqwlrw57Rd0uRdM
 OZOjlrLiwA1I9AIo081EmoPHUAcsckgxJzFqdLvh8Vz1JT3n8Rtj1GXuWQvSAueX
 p8vu771QeVJSgzEuZfA4g5Nia9T30g2XhxeD5bBtgndGoKdE8j2AzMoghBODJABa
 hvKd1DV9jP14c9HV7M/kgFb8JBEmXnKpARiSIyoP/KXLVraJS3HpbOSgbuTgdrp5
 lGxLbb44NYGGbPt1Sl25XPInr2b3BMfto/pvve9G9863fcY6SUk5aDuGIx6pNBsl
 lufniFQbaFTLdidskQ7bm187L0QEE4wGMY1ZDs77elMaMdtP6THXfim2Scq1UAKt
 tKMgYUciL/nps7rHCovmS8iXvysU7qztw6+F1nlOJMrrDld7LgrOZffWAjBnbNDa
 z1RZY1O1uyIIMam8VgeBbdMNoiqNv+2MNWH9STpY74DqS3+o2+pXyJitMS6+1U7p
 Rhlzbiwei4F1E0rFZqSGcWsgx+yy7HlgsfnEOKGkWjoFmBik26k=
 =T25l
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-fixes-for-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Fix handling of multiple OF framebuffer devices

 - Fix booting on Socionext Synquacer with bad 'dma-ranges' entries

 - Add DT binding .yamllint to .gitignore

* tag 'devicetree-fixes-for-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: interrupt-controller: arm,gic-v3: Fix typo in description of msi-controller property
  dt-bindings: Fix .gitignore
  of/address: Return an error when no valid dma-ranges are found
  of: Make OF framebuffer device names unique
2023-02-07 14:17:12 -08:00
Linus Torvalds
513c1a3d3f Fix regression in poll() and select()
With the fix that made poll() and select() block if read would block
 caused a slight regression in rasdaemon, as it needed that kind
 of behavior. Add a way to make that behavior come back by writing
 zero into the "buffer_percentage", which means to never block on read.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY+Jn3xQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qgQ6AQC30hHcPMPm8+drlH/P6wEYstRP6xbp
 nHYHcvT6qXNPtAD+OhUQR2Zav66m6cE0qvkdnZb72E0YHRTfBhN5OpshTgQ=
 =dJEF
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "Fix regression in poll() and select()

  With the fix that made poll() and select() block if read would block
  caused a slight regression in rasdaemon, as it needed that kind of
  behavior. Add a way to make that behavior come back by writing zero
  into the 'buffer_percentage', which means to never block on read"

* tag 'trace-v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw
2023-02-07 07:54:40 -08:00
Devid Antonio Filoni
4ae5e1e97c can: j1939: do not wait 250 ms if the same addr was already claimed
The ISO 11783-5 standard, in "4.5.2 - Address claim requirements", states:
  d) No CF shall begin, or resume, transmission on the network until 250
     ms after it has successfully claimed an address except when
     responding to a request for address-claimed.

But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim
prioritization" show that the CF begins the transmission after 250 ms
from the first AC (address-claimed) message even if it sends another AC
message during that time window to resolve the address contention with
another CF.

As stated in "4.4.2.3 - Address-claimed message":
  In order to successfully claim an address, the CF sending an address
  claimed message shall not receive a contending claim from another CF
  for at least 250 ms.

As stated in "4.4.3.2 - NAME management (NM) message":
  1) A commanding CF can
     d) request that a CF with a specified NAME transmit the address-
        claimed message with its current NAME.
  2) A target CF shall
     d) send an address-claimed message in response to a request for a
        matching NAME

Taking the above arguments into account, the 250 ms wait is requested
only during network initialization.

Do not restart the timer on AC message if both the NAME and the address
match and so if the address has already been claimed (timer has expired)
or the AC message has been sent to resolve the contention with another
CF (timer is still running).

Signed-off-by: Devid Antonio Filoni <devid.filoni@egluetechnologies.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20221125170418.34575-1-devid.filoni@egluetechnologies.com
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-07 15:00:22 +01:00
Jiri Pirko
565b4824c3 devlink: change port event netdev notifier from per-net to global
Currently only the network namespace of devlink instance is monitored
for port events. If netdev is moved to a different namespace and then
unregistered, NETDEV_PRE_UNINIT is missed which leads to trigger
following WARN_ON in devl_port_unregister().
WARN_ON(devlink_port->type != DEVLINK_PORT_TYPE_NOTSET);

Fix this by changing the netdev notifier from per-net to global so no
event is missed.

Fixes: 02a68a47ea ("net: devlink: track netdev with devlink_port assigned")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230206094151.2557264-1-jiri@resnulli.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-07 14:13:55 +01:00