  1. May 06, 2024
  2. May 02, 2024
  3. Apr 27, 2024
    • selftests: kselftest: Fix build failure with NOLIBC · 63cc4f14
      Oleg Nesterov authored
      
      commit 16767502aa990cca2cb7d1372b31d328c4c85b40 upstream.
      
      As Mark explains, ksft_min_kernel_version() can't be compiled with nolibc
      because nolibc doesn't implement uname().
      
      Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
      Reported-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20240412123536.GA32444@redhat.com
      Closes: https://lore.kernel.org/all/f0523b3a-ea08-4615-b0fb-5b504a2d39df@sirena.org.uk/
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • thunderbolt: Reset only non-USB4 host routers in resume · c67f926e
      Mika Westerberg authored
      
      commit 8cf9926c537ce8b0c7783afebe752e084765d553 upstream.
      
      There is no need to reset the USB4 host routers on resume because they
      are reset already and this may cause problems if the link does not come
      up soon enough. For this reason limit this to happen in non-USB4 host
      routers only (that's Apple systems with Intel Thunderbolt controllers).
      
      Fixes: 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot firmware")
      Cc: Sanath S <Sanath.S@amd.com>
      Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI/ASPM: Fix deadlock when enabling ASPM · b0f44788
      Johan Hovold authored

      commit 1e560864159d002b453da42bd2c13a1805515a20 upstream.
      
      A last minute revert in 6.7-final introduced a potential deadlock when
      enabling ASPM during probe of Qualcomm PCIe controllers as reported by
      lockdep:
      
        ============================================
        WARNING: possible recursive locking detected
        6.7.0 #40 Not tainted
        --------------------------------------------
        kworker/u16:5/90 is trying to acquire lock:
        ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
      
                    but task is already holding lock:
        ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
      
                    other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(pci_bus_sem);
          lock(pci_bus_sem);
      
                     *** DEADLOCK ***
      
        Call trace:
         print_deadlock_bug+0x25c/0x348
         __lock_acquire+0x10a4/0x2064
         lock_acquire+0x1e8/0x318
         down_read+0x60/0x184
         pcie_aspm_pm_state_change+0x58/0xdc
         pci_set_full_power_state+0xa8/0x114
         pci_set_power_state+0xc4/0x120
         qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
         pci_walk_bus+0x64/0xbc
         qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]
      
      The deadlock can easily be reproduced on machines like the Lenovo ThinkPad
      X13s by adding a delay to increase the race window during asynchronous
      probe where another thread can take a write lock.
      
      Add a new pci_set_power_state_locked() and associated helper functions that
      can be called with the PCI bus semaphore held to avoid taking the read lock
      twice.
      
      Link: https://lore.kernel.org/r/ZZu0qx2cmn7IwTyQ@hovoldconsulting.com
      Link: https://lore.kernel.org/r/20240130100243.11011-1-johan+linaro@kernel.org
      
      
      Fixes: f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
      Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: <stable@vger.kernel.org>	# 6.7
      [bhelgaas: backported to v6.6.y, which contains 8cc22ba3 ("Revert
       "PCI/ASPM: Remove pcie_aspm_pm_state_change()""), a backport of
       f93e71aea6c6.  This omits the drivers/pci/controller/dwc/pcie-qcom.c hunk
       that updates qcom_pcie_enable_aspm(), which was added by 9f4f3dfad8cf
       ("PCI: qcom: Enable ASPM for platforms supporting 1.9.0 ops"), which is not
       present in v6.6.28.]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ksmbd: common: use struct_group_attr instead of struct_group for network_open_info · 3b629239
      Namjae Jeon authored
      
      commit 0268a7cc7fdc47d90b6c18859de7718d5059f6f1 upstream.
      
      4-byte padding causes connection issues with macOS applications.
      The padding increases the smb2_close response size by 4 bytes, and the
      macOS SMB client checks the size and drops the connection. This patch
      uses struct_group_attr instead of struct_group for network_open_info so
      that __packed can be applied to avoid the padding.
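
The effect of the padding can be reproduced in userspace. The sketch below assumes an illustrative field layout (six 64-bit fields followed by one 32-bit field). The struct and field names are hypothetical and are not copied from the ksmbd headers.

```c
#include <assert.h>
#include <stdint.h>

/* Natural alignment: the compiler may add tail padding after attributes. */
struct open_info_plain {
	uint64_t creation_time;
	uint64_t last_access_time;
	uint64_t last_write_time;
	uint64_t change_time;
	uint64_t allocation_size;
	uint64_t end_of_file;
	uint32_t attributes;
};

/* Packed: the size is exactly the sum of the field sizes. */
struct open_info_packed {
	uint64_t creation_time;
	uint64_t last_access_time;
	uint64_t last_write_time;
	uint64_t change_time;
	uint64_t allocation_size;
	uint64_t end_of_file;
	uint32_t attributes;
} __attribute__((packed));

/* 6 * 8 + 4 = 52 bytes on the wire, with no trailing padding. */
static_assert(sizeof(struct open_info_packed) == 52,
	      "packed layout must not contain padding");
```

On common 64-bit ABIs such as x86-64, `sizeof(struct open_info_plain)` is 56 because the struct is padded to a multiple of its 8-byte alignment. Those 4 extra bytes match the size difference the commit message describes.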
      
      Fixes: 0015eb6e1238 ("smb: client, common: fix fortify warnings")
      Cc: stable@vger.kernel.org
      Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ksmbd: clear RENAME_NOREPLACE before calling vfs_rename · 4cbb8835
      Marios Makassikis authored
      
      commit 4973b04d3ea577db80c501c5f14e68ec69fe1794 upstream.
      
      The file overwrite case is explicitly handled, so it is not necessary to
      pass RENAME_NOREPLACE to vfs_rename.

      Clearing the flag fixes rename operations when the share is an ntfs-3g
      mount. The latter uses an older version of fuse with no support for
      flags in the ->rename op.
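
A minimal sketch of the idea, assuming the caller has already checked whether the target exists. `sketch_prepare_rename()` and `SKETCH_RENAME_NOREPLACE` are hypothetical names. The constant mirrors the Linux UAPI value of RENAME_NOREPLACE, but this is not the ksmbd code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the Linux UAPI value of RENAME_NOREPLACE; renamed to avoid clashes. */
#define SKETCH_RENAME_NOREPLACE (1u << 0)

/*
 * Decides the overwrite case up front and computes the flags that are
 * handed to the lower filesystem. Returns 0 on success or -EEXIST.
 */
static int sketch_prepare_rename(unsigned int flags, bool target_exists,
				 unsigned int *lower_flags)
{
	assert(lower_flags != NULL);

	/* The overwrite case is handled here, before the lower layer runs. */
	if ((flags & SKETCH_RENAME_NOREPLACE) && target_exists)
		return -EEXIST;

	/* The lower filesystem never sees a flag it may not understand. */
	*lower_flags = flags & ~SKETCH_RENAME_NOREPLACE;
	return 0;
}
```

Because the existence check is made by the caller, dropping the flag does not change the outcome for filesystems that do support it.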
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
      Acked-by: Namjae Jeon <linkinjeon@kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ksmbd: validate request buffer size in smb2_allocate_rsp_buf() · 5c20b242
      Namjae Jeon authored
      
      commit 17cf0c2794bdb6f39671265aa18aea5c22ee8c4a upstream.
      
      The response buffer should be allocated in smb2_allocate_rsp_buf()
      before the request is validated. But fields in the payload as well as the
      smb2 header are used in smb2_allocate_rsp_buf(). This patch adds simple
      buffer size validation to avoid a potential out-of-bounds access in the
      request buffer.
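
The kind of check described above can be sketched as follows: every field is read only after the received length is known to cover it. All names, the command code and the default size are hypothetical, and the structures are not the SMB2 wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CMD_QUERY 0x10u	/* hypothetical command code */
#define SKETCH_SMALL_RSP 448u	/* hypothetical default response size */

struct sketch_hdr {
	uint16_t command;
	uint16_t reserved;
};

struct sketch_query_req {
	struct sketch_hdr hdr;
	uint32_t output_len;
};

/* Returns 0 and stores the response size, or -1 if the request is too short. */
static int sketch_pick_rsp_size(const unsigned char *buf, size_t len,
				uint32_t *rsp_size)
{
	struct sketch_hdr hdr;

	assert(buf != NULL && rsp_size != NULL);

	/* The header must be complete before the command field is read. */
	if (len < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));

	*rsp_size = SKETCH_SMALL_RSP;
	if (hdr.command == SKETCH_CMD_QUERY) {
		struct sketch_query_req req;

		/* The payload field must also lie inside the buffer. */
		if (len < sizeof(req))
			return -1;
		memcpy(&req, buf, sizeof(req));
		*rsp_size = req.output_len;
	}
	return 0;
}
```

A truncated query request is rejected instead of having `output_len` read from memory beyond the received bytes.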
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ksmbd: fix slab-out-of-bounds in smb2_allocate_rsp_buf · 3160d973
      Namjae Jeon authored
      
      commit c119f4ede3fa90a9463f50831761c28f989bfb20 upstream.
      
      If ->ProtocolId is SMB2_TRANSFORM_PROTO_NUM, smb2 request size
      validation could be skipped. If the request size is smaller than
      sizeof(struct smb2_query_info_req), a slab-out-of-bounds read can happen in
      smb2_allocate_rsp_buf(). This patch allocates the response buffer after
      decrypting the transform request. smb3_decrypt_req() will validate the
      transform request size and avoid the slab-out-of-bounds read in
      smb2_allocate_rsp_buf().
      
      Reported-by: Norbert Szetei <norbert@doyensec.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/ftrace: Ignore ftrace locations in exit text sections · 6355b468
      Naveen N Rao authored
      
      commit ea73179e64131bcd29ba6defd33732abdf8ca14b upstream.
      
      Michael reported that we are seeing an ftrace bug on bootup when KASAN
      is enabled and we are using -fpatchable-function-entry:
      
        ftrace: allocating 47780 entries in 18 pages
        ftrace-powerpc: 0xc0000000020b3d5c: No module provided for non-kernel address
        ------------[ ftrace bug ]------------
        ftrace faulted on modifying
        [<c0000000020b3d5c>] 0xc0000000020b3d5c
        Initializing ftrace call sites
        ftrace record flags: 0
         (0)
         expected tramp: c00000000008cef4
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2180 ftrace_bug+0x3c0/0x424
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0-rc3-00120-g0f71dcfb4aef #860
        Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
        NIP:  c0000000003aa81c LR: c0000000003aa818 CTR: 0000000000000000
        REGS: c0000000033cfab0 TRAP: 0700   Not tainted  (6.5.0-rc3-00120-g0f71dcfb4aef)
        MSR:  8000000002021033 <SF,VEC,ME,IR,DR,RI,LE>  CR: 28028240  XER: 00000000
        CFAR: c0000000002781a8 IRQMASK: 3
        ...
        NIP [c0000000003aa81c] ftrace_bug+0x3c0/0x424
        LR [c0000000003aa818] ftrace_bug+0x3bc/0x424
        Call Trace:
         ftrace_bug+0x3bc/0x424 (unreliable)
         ftrace_process_locs+0x5f4/0x8a0
         ftrace_init+0xc0/0x1d0
         start_kernel+0x1d8/0x484
      
      With CONFIG_FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY=y and
      CONFIG_KASAN=y, the compiler emits nops in functions that it generates for
      registering and unregistering global variables (unlike with -pg and
      -mprofile-kernel where calls to _mcount() are not generated in those
      functions). Those functions then end up in INIT_TEXT and EXIT_TEXT
      respectively. We don't expect to see any profiled functions in
      EXIT_TEXT, so ftrace_init_nop() assumes that all addresses that aren't
      in the core kernel text belong to a module. Since these functions do
      not match that criterion, we see the above bug.
      
      Address this by having ftrace ignore all locations in the text exit
      sections of vmlinux.
      
      Fixes: 0f71dcfb ("powerpc/ftrace: Add support for -fpatchable-function-entry")
      Cc: stable@vger.kernel.org # v6.6+
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Naveen N Rao <naveen@kernel.org>
      Reviewed-by: Benjamin Gray <bgray@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240213175410.1091313-1-naveen@kernel.org
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • virtio_net: Do not send RSS key if it is not supported · 43a71c1b
      Breno Leitao authored
      
      commit 059a49aa2e25c58f90b50151f109dd3c4cdb3a47 upstream.
      
      There is a bug when setting the RSS options in virtio_net that can break
      the whole machine, getting the kernel into an infinite loop.
      
      Running the following command in any QEMU virtual machine with virtio-net
      will reproduce this problem:
      
          # ethtool -X eth0  hfunc toeplitz
      
      This is how the problem happens:
      
      1) ethtool_set_rxfh() calls virtnet_set_rxfh()
      
      2) virtnet_set_rxfh() calls virtnet_commit_rss_command()
      
      3) virtnet_commit_rss_command() populates 4 entries for the rss
      scatter-gather
      
      4) Since the command above does not have a key, the last
      scatter-gather entry will have zero length, since rss_key_size == 0:
      sg_buf_size = vi->rss_key_size;

      5) This buffer is passed to QEMU, but QEMU is not happy with a
      zero-length buffer and does the following in virtqueue_map_desc() (QEMU
      function):
      
        if (!sz) {
            virtio_error(vdev, "virtio: zero sized buffers are not allowed");
      
      6) virtio_error() (also a QEMU function) sets the device as broken
      
          vdev->broken = true;
      
      7) QEMU bails out and does not respond to this crazy kernel.
      
      8) The kernel is waiting for the response to come back (function
      virtnet_send_command())
      
      9) The kernel waits by doing the following:

            while (!virtqueue_get_buf(vi->cvq, &tmp) &&
                   !virtqueue_is_broken(vi->cvq))
                    cpu_relax();
      
      10) Neither of the conditions above ever becomes true, so the kernel
      loops here forever. Keep in mind that virtqueue_is_broken() does
      not look at QEMU's `vdev->broken`, so it never realizes that
      virtio is broken on the QEMU side.
      
      Fix it by not sending RSS commands if the feature is not available in
      the device.
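
The shape of such a guard can be sketched as below. This is a simplified model, not the driver code: `struct sketch_vnet`, its fields and `sketch_set_rxfh()` are hypothetical, and the real driver's feature checks may be structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the device state. */
struct sketch_vnet {
	bool has_rss;			/* device negotiated the RSS feature */
	unsigned int rss_key_size;	/* 0 when RSS was not negotiated */
	unsigned int commands_sent;	/* control commands queued so far */
};

/* Refuses the request up front instead of queueing a zero-length key buffer. */
static int sketch_set_rxfh(struct sketch_vnet *vi)
{
	assert(vi != NULL);

	if (!vi->has_rss)
		return -EOPNOTSUPP;

	vi->commands_sent++;	/* stand-in for committing the RSS command */
	return 0;
}
```

With the check in place, a device without RSS never receives the command, so the wait loop from step 9 is never entered for it.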
      
      Fixes: c7114b12 ("drivers/net/virtio_net: Added basic RSS support.")
      Cc: stable@vger.kernel.org
      Cc: qemu-devel@nongnu.org
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: dsa: mt7530: fix enabling EEE on MT7531 switch on all boards · bd41ee1e
      Arınç ÜNAL authored
      
      commit 06dfcd4098cfdc4d4577d94793a4f9125386da8b upstream.
      
      The commit 40b5d2f1 ("net: dsa: mt7530: Add support for EEE features")
      brought EEE support but did not enable EEE on MT7531 switch MACs. EEE is
      enabled on MT7531 switch MACs by pulling the LAN2LED0 pin low on the board
      (bootstrapping), unsetting the EEE_DIS bit on the trap register, or setting
      the internal EEE switch bit on the CORE_PLL_GROUP4 register. Thanks to
      SkyLake Huang (黃啟澤) from MediaTek for providing information on the
      internal EEE switch bit.
      
      There are existing boards that were not designed to pull the pin low.
      Because of that, the EEE status currently depends on the board design.
      
      The EEE_DIS bit on the trap pertains to the LAN2LED0 pin which is usually
      used to control an LED. Once the bit is unset, the pin will be low. That
      will make the active low LED turn on. The pin is controlled by the switch
      PHY. It seems that the PHY controls the pin in the way that it inverts the
      pin state. That means depending on the wiring of the LED connected to
      LAN2LED0 on the board, the LED may be on without an active link.
      
      To not cause this unwanted behaviour whilst enabling EEE on all boards, set
      the internal EEE switch bit on the CORE_PLL_GROUP4 register.
      
      My testing on MT7531 shows a certain amount of traffic loss when EEE is
      enabled. That said, I haven't come across a board that enables EEE. So
      enable EEE on the switch MACs but disable EEE advertisement on the switch
      PHYs. This way, we don't change the behaviour of the majority of the boards
      that have this switch. The mediatek-ge PHY driver already disables EEE
      advertisement on the switch PHYs but my testing shows that it is somehow
      enabled afterwards. Disabling EEE advertisement before the PHY driver
      initialises keeps it off.
      
      With this change, EEE can now be enabled using ethtool.
      
      Fixes: 40b5d2f1 ("net: dsa: mt7530: Add support for EEE features")
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Tested-by: Daniel Golle <daniel@makrotopia.org>
      Reviewed-by: Daniel Golle <daniel@makrotopia.org>
      Link: https://lore.kernel.org/r/20240408-for-net-mt7530-fix-eee-for-mt7531-mt7988-v3-1-84fdef1f008b@arinc9.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: dsa: mt7530: fix improper frames on all 25MHz and 40MHz XTAL MT7530 · 21b9d89d
      Arınç ÜNAL authored

      commit 5f563c31ff0c40ce395d0bae7daa94c7950dac97 upstream.
      
      The MT7530 switch after reset initialises with a core clock frequency that
      works with a 25MHz XTAL connected to it. For 40MHz XTAL, the core clock
      frequency must be set to 500MHz.
      
      The mt7530_pll_setup() function is responsible for setting the core clock
      frequency. Currently, it runs on MT7530 with both 25MHz and 40MHz XTAL.
      This causes MT7530 switches with a 25MHz XTAL to egress and ingress frames
      improperly.
      
      Introduce a check to run it only on MT7530 with 40MHz XTAL.
      
      The core clock frequency is set by writing to a switch PHY's register.
      Access to the PHY's register is done via the MDIO bus the switch is also
      on. Therefore, it works only when the switch makes switch PHYs listen on
      the MDIO bus the switch is on. This is controlled either by the state of
      the ESW_P1_LED_1 pin after reset deassertion or modifying bit 5 of the
      modifiable trap register.
      
      When ESW_P1_LED_1 is pulled high, PHY indirect access is used. That means
      accessing PHY registers via the PHY indirect access control register of the
      switch.
      
      When ESW_P1_LED_1 is pulled low, PHY direct access is used. That means
      accessing PHY registers via the MDIO bus the switch is on.
      
      For MT7530 switch with 40MHz XTAL on a board with ESW_P1_LED_1 pulled high,
      the core clock frequency won't be set to 500MHz, causing the switch to
      egress and ingress frames improperly.
      
      Run mt7530_pll_setup() after PHY direct access is set on the modifiable
      trap register.
      
      With these two changes, all MT7530 switches with 25MHz and 40MHz XTAL, and
      ESW_P1_LED_1 pulled high or low, will egress and ingress frames properly.
      
      Link: https://github.com/BPI-SINOVOIP/BPI-R2-bsp/blob/4a5dd143f2172ec97a2872fa29c7c4cd520f45b5/linux-mt/drivers/net/ethernet/mediatek/gsw_mt7623.c#L1039
      
      
      Fixes: b8f126a8 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Link: https://lore.kernel.org/r/20240320-for-net-mt7530-fix-25mhz-xtal-with-direct-phy-access-v1-1-d92f605f1160@arinc9.com
      
      
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nilfs2: fix OOB in nilfs_set_de_type · 2382eae6
      Jeongjun Park authored

      commit c4a7dc9523b59b3e73fd522c73e95e072f876b16 upstream.
      
      The size of the nilfs_type_by_mode array in the fs/nilfs2/dir.c file is
      defined as "S_IFMT >> S_SHIFT", but the nilfs_set_de_type() function,
      which uses this array, specifies the index to read from the array in the
      same way as "(mode & S_IFMT) >> S_SHIFT".
      
      static void nilfs_set_de_type(struct nilfs_dir_entry *de, struct inode
       *inode)
      {
      	umode_t mode = inode->i_mode;
      
      	de->file_type = nilfs_type_by_mode[(mode & S_IFMT)>>S_SHIFT]; // oob
      }
      
      However, when the index is determined this way, an out-of-bounds (OOB)
      error occurs when the condition "(mode & S_IFMT) == S_IFMT" is satisfied:
      the index then equals the array size, which is one past the last valid
      element. Therefore, a patch to resize the nilfs_type_by_mode array should
      be applied to prevent OOB errors.
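
The arithmetic can be checked with a small standalone sketch. The `SKETCH_` constants mirror the usual Linux values of S_IFMT, S_IFDIR, S_IFREG and a shift of 12. The table contents and names are illustrative assumptions, and adding one slot is shown as one possible way to resize the array.

```c
#include <assert.h>

/* Mirror the usual Linux values; renamed so the sketch is self-contained. */
#define SKETCH_S_IFMT	0170000u
#define SKETCH_S_IFDIR	0040000u
#define SKETCH_S_IFREG	0100000u
#define SKETCH_S_SHIFT	12

/*
 * (mode & S_IFMT) >> S_SHIFT can take any value in 0..15, so the table
 * needs (S_IFMT >> S_SHIFT) + 1 = 16 slots, not 15.
 */
static const unsigned char sketch_type_by_mode[(SKETCH_S_IFMT >> SKETCH_S_SHIFT) + 1] = {
	[SKETCH_S_IFREG >> SKETCH_S_SHIFT] = 1,	/* illustrative type codes */
	[SKETCH_S_IFDIR >> SKETCH_S_SHIFT] = 2,
};

static_assert(sizeof(sketch_type_by_mode) == 16,
	      "one slot per possible index, including S_IFMT itself");

static unsigned char sketch_de_type(unsigned int mode)
{
	return sketch_type_by_mode[(mode & SKETCH_S_IFMT) >> SKETCH_S_SHIFT];
}
```

With 16 slots, a mode whose type bits are all set indexes the last, zero-initialized element instead of reading past the end of the table.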
      
      Link: https://lkml.kernel.org/r/20240415182048.7144-1-konishi.ryusuke@gmail.com
      
      
      Reported-by: <syzbot+2e22057de05b9f3b30d8@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=2e22057de05b9f3b30d8
      
      
      Fixes: 2ba466d7 ("nilfs2: directory entry operations")
      Signed-off-by: Jeongjun Park <aha310510@gmail.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bootconfig: use memblock_free_late to free xbc memory to buddy · e46d3be7
      Qiang Zhang authored

      commit 89f9a1e876b5a7ad884918c03a46831af202c8a0 upstream.
      
      By the time xbc memory is freed in xbc_exit(), memblock may already have
      handed memory over to the buddy allocator. So it doesn't make sense to free
      memory back to memblock. memblock_free() called by xbc_exit() even causes
      UAF bugs on architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled, like x86.
      The following KASAN log shows this case.

      This patch fixes the xbc memory free problem by calling memblock_free()
      in the early xbc init error rewind path and calling memblock_free_late() in
      the xbc exit path to free memory to the buddy allocator.
      
      [    9.410890] ==================================================================
      [    9.418962] BUG: KASAN: use-after-free in memblock_isolate_range+0x12d/0x260
      [    9.426850] Read of size 8 at addr ffff88845dd30000 by task swapper/0/1
      
      [    9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G     U             6.9.0-rc3-00208-g586b5dfb51b9 #5
      [    9.446403] Hardware name: Intel Corporation RPLP LP5 (CPU:RaptorLake)/RPLP LP5 (ID:13), BIOS IRPPN02.01.01.00.00.19.015.D-00000000 Dec 28 2023
      [    9.460789] Call Trace:
      [    9.463518]  <TASK>
      [    9.465859]  dump_stack_lvl+0x53/0x70
      [    9.469949]  print_report+0xce/0x610
      [    9.473944]  ? __virt_addr_valid+0xf5/0x1b0
      [    9.478619]  ? memblock_isolate_range+0x12d/0x260
      [    9.483877]  kasan_report+0xc6/0x100
      [    9.487870]  ? memblock_isolate_range+0x12d/0x260
      [    9.493125]  memblock_isolate_range+0x12d/0x260
      [    9.498187]  memblock_phys_free+0xb4/0x160
      [    9.502762]  ? __pfx_memblock_phys_free+0x10/0x10
      [    9.508021]  ? mutex_unlock+0x7e/0xd0
      [    9.512111]  ? __pfx_mutex_unlock+0x10/0x10
      [    9.516786]  ? kernel_init_freeable+0x2d4/0x430
      [    9.521850]  ? __pfx_kernel_init+0x10/0x10
      [    9.526426]  xbc_exit+0x17/0x70
      [    9.529935]  kernel_init+0x38/0x1e0
      [    9.533829]  ? _raw_spin_unlock_irq+0xd/0x30
      [    9.538601]  ret_from_fork+0x2c/0x50
      [    9.542596]  ? __pfx_kernel_init+0x10/0x10
      [    9.547170]  ret_from_fork_asm+0x1a/0x30
      [    9.551552]  </TASK>
      
      [    9.555649] The buggy address belongs to the physical page:
      [    9.561875] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x45dd30
      [    9.570821] flags: 0x200000000000000(node=0|zone=2)
      [    9.576271] page_type: 0xffffffff()
      [    9.580167] raw: 0200000000000000 ffffea0011774c48 ffffea0012ba1848 0000000000000000
      [    9.588823] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
      [    9.597476] page dumped because: kasan: bad access detected
      
      [    9.605362] Memory state around the buggy address:
      [    9.610714]  ffff88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [    9.618786]  ffff88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [    9.626857] >ffff88845dd30000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [    9.634930]                    ^
      [    9.638534]  ffff88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [    9.646605]  ffff88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [    9.654675] ==================================================================
      
      Link: https://lore.kernel.org/all/20240414114944.1012359-1-qiang4.zhang@linux.intel.com/
      
      
      
      Fixes: 40caa127 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
      Cc: Stable@vger.kernel.org
      Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nouveau: fix instmem race condition around ptr stores · a019b44b
      Dave Airlie authored
      
      commit fff1386cc889d8fb4089d285f883f8cba62d82ce upstream.
      
      Running a lot of VK CTS in parallel against nouveau, once every
      few hours you might see something like this crash.
      
      BUG: kernel NULL pointer dereference, address: 0000000000000008
      PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP PTI
      CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
      Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
      RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
      Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
      RSP: 0000:ffffac20c5857838 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001
      RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180
      RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10
      R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c
      R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c
      FS:  00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      
      ...
      
       ? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
       ? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
       nvkm_vmm_iter+0x351/0xa20 [nouveau]
       ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
       ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
       ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
       ? __lock_acquire+0x3ed/0x2170
       ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
       nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
       ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
       ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
       nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]
      
      Adding any sort of useful debug usually makes it go away, so I hand
      wrote the function in a line, and debugged the asm.
      
      Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in
      the nv50_instobj_acquire called from nvkm_kmap.
      
      If Thread A and Thread B both get to nv50_instobj_acquire around
      the same time, and Thread A hits the refcount_set line, and in
      lockstep thread B succeeds at refcount_inc_not_zero, there is a
      chance the ptrs value won't have been stored since refcount_set
      is unordered. Force a memory barrier here, I picked smp_mb, since
      we want it on all CPUs and it's write followed by a read.
      
      v2: use paired smp_rmb/smp_wmb.
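
The pairing idea can be illustrated in userspace with C11 atomics, where a release store plays the role of the write barrier and an acquire read-modify-write plays the role of the read barrier. This is an analogy under stated assumptions, not the nouveau code: `struct sketch_obj`, `sketch_publish()` and `sketch_acquire()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_obj {
	const int *ptrs;	/* data published to other threads */
	atomic_int maps;	/* reference count, 0 means not mapped */
};

/* First mapper: store ptrs, then make the count non-zero. */
static void sketch_publish(struct sketch_obj *obj, const int *ptrs)
{
	assert(ptrs != NULL);
	obj->ptrs = ptrs;
	/* Release: the ptrs store is ordered before the count becomes 1. */
	atomic_store_explicit(&obj->maps, 1, memory_order_release);
}

/* Later mappers: increment the count only if it is non-zero. */
static const int *sketch_acquire(struct sketch_obj *obj)
{
	int old = atomic_load_explicit(&obj->maps, memory_order_relaxed);

	while (old != 0) {
		/* Acquire on success: the ptrs load cannot move before this. */
		if (atomic_compare_exchange_weak_explicit(&obj->maps, &old, old + 1,
							   memory_order_acquire,
							   memory_order_relaxed))
			return obj->ptrs;
	}
	return NULL;	/* not mapped yet; the caller must take the slow path */
}
```

A thread that observes the non-zero count through the acquire operation is guaranteed to also observe the `ptrs` value stored before the release. The unordered store described above did not provide that guarantee.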
      
      Cc: <stable@vger.kernel.org>
      Fixes: be55287a ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj")
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Danilo Krummrich <dakr@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240411011510.2546857-1-airlied@gmail.com
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/vmwgfx: Fix crtc's atomic check conditional · 5d2f587a
      Zack Rusin authored
      
      commit a60ccade88f926e871a57176e86a34bbf0db0098 upstream.
      
      The conditional was supposed to prevent enabling of a crtc state
      without a set primary plane. Accidentally, it also prevented disabling
      crtc state with a set primary plane. Neither is correct.
      
      Fix the conditional and just driver-warn when a crtc state has been
      enabled without a primary plane which will help debug broken userspace.
      
      Fixes IGT's kms_atomic_interruptible and kms_atomic_transition tests.
      
      Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
      Fixes: 06ec4190 ("drm/vmwgfx: Add and connect CRTC helper functions")
      Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
      Cc: dri-devel@lists.freedesktop.org
      Cc: <stable@vger.kernel.org> # v4.12+
      Reviewed-by: Ian Forbes <ian.forbes@broadcom.com>
      Reviewed-by: Martin Krastev <martin.krastev@broadcom.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240412025511.78553-5-zack.rusin@broadcom.com
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/vmwgfx: Sort primary plane formats by order of preference · f1769cb2
      Zack Rusin authored
      
      commit d4c972bff3129a9dd4c22a3999fd8eba1a81531a upstream.
      
      The table of primary plane formats wasn't sorted at all, leading to
      applications picking our least desirable formats by default.
      
      Sort the primary plane formats according to our order of preference.
      
      A nice side effect of this change is that it makes IGT's kms_atomic
      plane-invalid-params pass. The test picks the first format, which for
      vmwgfx was DRM_FORMAT_XRGB1555, and uses fbs with odd sizes. That makes
      Pixman, which IGT depends on, assert because our 16bpp formats aren't
      32-bit aligned as Pixman requires all formats to be.
      
      Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
      Fixes: 36cc79bc ("drm/vmwgfx: Add universal plane support")
      Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
      Cc: dri-devel@lists.freedesktop.org
      Cc: <stable@vger.kernel.org> # v4.12+
      Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240412025511.78553-6-zack.rusin@broadcom.com
      
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f1769cb2
    • Zack Rusin's avatar
      drm/vmwgfx: Fix prime import/export · 65674218
      Zack Rusin authored
      
      commit b32233accefff1338806f064fb9b62cf5bc0609f upstream.
      
      vmwgfx never supported prime import of external buffers. Furthermore the
      driver exposes two different objects to userspace: vmw_surface's and
      gem buffers but prime import/export only worked with vmw_surfaces.
      
      Because gem buffers are used through the dumb_buffer interface, this
      meant that buffers created by the driver could not be prime exported
      or imported.
      
      Fix prime import/export. Makes IGT's kms_prime pass.
      
      Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
      Fixes: 8afa13a0 ("drm/vmwgfx: Implement DRIVER_GEM")
      Cc: <stable@vger.kernel.org> # v6.6+
      Reviewed-by: Martin Krastev <martin.krastev@broadcom.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240412025511.78553-4-zack.rusin@broadcom.com

      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      65674218
    • drm/amdgpu: remove invalid resource->start check v2 · db74904a
      Christian König authored
      
      commit ca7c4507ba87e9fc22e0ecfa819c3664b3e8287b upstream.
      
      The majority of those were removed in commit aed01a68
      ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2").
      
      But this one was missed because it's working on the resource and not the
      BO. Since we also no longer use a fake start address for visible BOs
      this will now trigger invalid mapping errors.
      
      v2: also remove the unused variable
      
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Fixes: aed01a68 ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2")
      CC: stable@vger.kernel.org
      Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      db74904a
    • drm/amdkfd: Fix memory leak in create_process failure · aa02d433
      Felix Kuehling authored
      
      commit 18921b205012568b45760753ad3146ddb9e2d4e2 upstream.
      
      Fix memory leak due to a leaked mmget reference on an error handling
      code path that is triggered when attempting to create KFD processes
      while a GPU reset is in progress.
      
      Fixes: 0ab2d753 ("drm/amdkfd: prepare per-process debug enable and disable")
      CC: Xiaogang Chen <xiaogang.chen@amd.com>
      Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
      Tested-by: Harish Kasiviswanthan <Harish.Kasiviswanthan@amd.com>
      Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa02d433
    • drm/amdgpu: validate the parameters of bo mapping operations more clearly · ef13eeca
      xinhui pan authored
      
      commit 6fef2d4c00b5b8561ad68dd2b68173f5c6af1e75 upstream.
      
      Verify the parameters of
      amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place.
      
      Fixes: dc54d3d1 ("drm/amdgpu: implement AMDGPU_VA_OP_CLEAR v2")
      Cc: stable@vger.kernel.org
      Reported-by: Vlad Stolyarov <hexed@google.com>
      Suggested-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: xinhui pan <xinhui.pan@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef13eeca
    • fuse: fix leaked ENOSYS error on first statx call · 885d4c31
      Danny Lin authored
      
      commit eb4b691b9115fae4c844f5941418335575cf667f upstream.
      
      FUSE attempts to detect server support for statx by trying it once and
      setting no_statx=1 if it fails with ENOSYS, but consider the following
      scenario:
      
      - Userspace (e.g. sh) calls stat() on a file
        * succeeds
      - Userspace (e.g. lsd) calls statx(BTIME) on the same file
        - request_mask = STATX_BASIC_STATS | STATX_BTIME
        - first pass: sync=true due to differing cache_mask
        - statx fails and returns ENOSYS
        - set no_statx and retry
        - retry sets mask = STATX_BASIC_STATS
        - now mask == cache_mask; sync=false (time_before: still valid)
        - so we take the "else if (stat)" path
        - "err" is still ENOSYS from the failed statx call
      
      Fix this by zeroing "err" before retrying the failed call.
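
      The scenario above can be modelled outside the kernel. The C sketch
      below is illustrative only: server_statx() and getattr_model() are
      hypothetical names, not FUSE functions, and the cache logic is reduced
      to a few booleans. It shows how a stale ENOSYS survives the retry
      unless "err" is reset first.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a FUSE server that lacks statx support. */
static int server_statx(void)
{
	return -ENOSYS;
}

/*
 * Simplified model of the retry flow described in the commit message.
 * All names are illustrative; this is not the kernel's implementation.
 */
static int getattr_model(bool zero_err_before_retry)
{
	bool no_statx = false;   /* set once the server answers ENOSYS */
	bool want_statx = true;  /* caller asked for STATX_BTIME */
	bool cache_valid = true; /* basic stats cached by an earlier stat() */
	int err = 0;

retry:
	if (want_statx && !no_statx) {
		/* First pass: the cache does not cover BTIME, so ask the server. */
		err = server_statx();
		if (err == -ENOSYS) {
			no_statx = true;
			if (zero_err_before_retry)
				err = 0; /* the fix: drop the stale error */
			goto retry;
		}
	} else if (cache_valid) {
		/* Cached path: fills the result from the cache, never assigns err. */
	}
	return err;
}
```

      In this model getattr_model(false) reports the leaked error even though
      the cached attributes are usable, while getattr_model(true) succeeds.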
      
      Fixes: d3045530 ("fuse: implement statx")
      Cc: stable@vger.kernel.org # v6.6
      Signed-off-by: Danny Lin <danny@orbstack.dev>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      885d4c31
    • mm/shmem: inline shmem_is_huge() for disabled transparent hugepages · cc10db00
      Sumanth Korikkar authored
      commit 1f737846aa3c45f07a06fa0d018b39e1afb8084a upstream.
      
      In order to minimize code size (CONFIG_CC_OPTIMIZE_FOR_SIZE=y), the
      compiler might choose to make a regular (out-of-line) function call to
      shmem_is_huge() instead of inlining it. When transparent hugepages are
      disabled (CONFIG_TRANSPARENT_HUGEPAGE=n), this can cause a compilation
      error:
      
      mm/shmem.c: In function `shmem_getattr':
      ./include/linux/huge_mm.h:383:27: note: in expansion of macro `BUILD_BUG'
        383 | #define HPAGE_PMD_SIZE ({ BUILD_BUG(); 0; })
            |                           ^~~~~~~~~
      mm/shmem.c:1148:33: note: in expansion of macro `HPAGE_PMD_SIZE'
       1148 |                 stat->blksize = HPAGE_PMD_SIZE;
      
      To prevent the possible error, always inline shmem_is_huge() when
      transparent hugepages are disabled.
      
      Link: https://lkml.kernel.org/r/20240409155407.2322714-1-sumanthk@linux.ibm.com
      
      
      Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc10db00
    • mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled · 882e1180
      Miaohe Lin authored
      commit 1983184c22dd84a4d95a71e5c6775c2638557dc7 upstream.
      
      When I ran a hard offline test with hugetlb pages, the following
      deadlock occurred:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      6.8.0-11409-gf6cef5f8c37f #1 Not tainted
      ------------------------------------------------------
      bash/46904 is trying to acquire lock:
      ffffffffabe68910 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_dec+0x16/0x60
      
      but task is already holding lock:
      ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (pcp_batch_high_lock){+.+.}-{3:3}:
             __mutex_lock+0x6c/0x770
             page_alloc_cpu_online+0x3c/0x70
             cpuhp_invoke_callback+0x397/0x5f0
             __cpuhp_invoke_callback_range+0x71/0xe0
             _cpu_up+0xeb/0x210
             cpu_up+0x91/0xe0
             cpuhp_bringup_mask+0x49/0xb0
             bringup_nonboot_cpus+0xb7/0xe0
             smp_init+0x25/0xa0
             kernel_init_freeable+0x15f/0x3e0
             kernel_init+0x15/0x1b0
             ret_from_fork+0x2f/0x50
             ret_from_fork_asm+0x1a/0x30
      
      -> #0 (cpu_hotplug_lock){++++}-{0:0}:
             __lock_acquire+0x1298/0x1cd0
             lock_acquire+0xc0/0x2b0
             cpus_read_lock+0x2a/0xc0
             static_key_slow_dec+0x16/0x60
             __hugetlb_vmemmap_restore_folio+0x1b9/0x200
             dissolve_free_huge_page+0x211/0x260
             __page_handle_poison+0x45/0xc0
             memory_failure+0x65e/0xc70
             hard_offline_page_store+0x55/0xa0
             kernfs_fop_write_iter+0x12c/0x1d0
             vfs_write+0x387/0x550
             ksys_write+0x64/0xe0
             do_syscall_64+0xca/0x1e0
             entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(pcp_batch_high_lock);
                                     lock(cpu_hotplug_lock);
                                     lock(pcp_batch_high_lock);
        rlock(cpu_hotplug_lock);
      
       *** DEADLOCK ***
      
      5 locks held by bash/46904:
       #0: ffff98f6c3bb23f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0
       #1: ffff98f6c328e488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0
       #2: ffff98ef83b31890 (kn->active#113){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0
       #3: ffffffffabf9db48 (mf_mutex){+.+.}-{3:3}, at: memory_failure+0x44/0xc70
       #4: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40
      
      stack backtrace:
      CPU: 10 PID: 46904 Comm: bash Kdump: loaded Not tainted 6.8.0-11409-gf6cef5f8c37f #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x68/0xa0
       check_noncircular+0x129/0x140
       __lock_acquire+0x1298/0x1cd0
       lock_acquire+0xc0/0x2b0
       cpus_read_lock+0x2a/0xc0
       static_key_slow_dec+0x16/0x60
       __hugetlb_vmemmap_restore_folio+0x1b9/0x200
       dissolve_free_huge_page+0x211/0x260
       __page_handle_poison+0x45/0xc0
       memory_failure+0x65e/0xc70
       hard_offline_page_store+0x55/0xa0
       kernfs_fop_write_iter+0x12c/0x1d0
       vfs_write+0x387/0x550
       ksys_write+0x64/0xe0
       do_syscall_64+0xca/0x1e0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      RIP: 0033:0x7fc862314887
      Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      RSP: 002b:00007fff19311268 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc862314887
      RDX: 000000000000000c RSI: 000056405645fe10 RDI: 0000000000000001
      RBP: 000056405645fe10 R08: 00007fc8623d1460 R09: 000000007fffffff
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
      R13: 00007fc86241b780 R14: 00007fc862417600 R15: 00007fc862416a00
      
      In short, the following call path breaks the lock dependency chain:
      
       memory_failure
        __page_handle_poison
         zone_pcp_disable -- lock(pcp_batch_high_lock)
         dissolve_free_huge_page
          __hugetlb_vmemmap_restore_folio
           static_key_slow_dec
            cpus_read_lock -- rlock(cpu_hotplug_lock)
      
      Fix this by calling drain_all_pages() instead.
      
      This issue did not occur before commit a6b40850 ("mm: hugetlb: replace
      hugetlb_free_vmemmap_enabled with a static_key"), which introduced
      rlock(cpu_hotplug_lock) in the dissolve_free_huge_page() code path while
      lock(pcp_batch_high_lock) is already held in __page_handle_poison().
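
      The lockdep report boils down to two acquisition orders that contradict
      each other. As a rough illustration, the C sketch below records
      "held, then wanted" pairs and flags the first pair that reverses an
      earlier one. It is a toy model under stated assumptions, not lockdep's
      algorithm: the enum values and acquire_while_holding() are hypothetical
      names, and only direct two-lock inversions are detected.

```c
/* Illustrative lock identifiers named after the locks in the report. */
enum lock_id { CPU_HOTPLUG_LOCK, PCP_BATCH_HIGH_LOCK, NR_LOCKS };

/* dep[a][b] != 0 records that lock b was once acquired while a was held. */
static int dep[NR_LOCKS][NR_LOCKS];

/*
 * Record the order "held, then wanted". Returns -1 if the opposite order
 * was recorded earlier (a two-lock inversion), 0 otherwise.
 */
static int acquire_while_holding(enum lock_id held, enum lock_id wanted)
{
	if (dep[wanted][held])
		return -1;
	dep[held][wanted] = 1;
	return 0;
}
```

      Feeding it the CPU bring-up order first (cpu_hotplug_lock, then
      pcp_batch_high_lock) and the memory_failure() order second reproduces
      the inversion that lockdep reported.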
      
      [linmiaohe@huawei.com: extend comment per Oscar]
      [akpm@linux-foundation.org: reflow block comment]
      Link: https://lkml.kernel.org/r/20240407085456.2798193-1-linmiaohe@huawei.com
      
      
      Fixes: a6b40850 ("mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Jane Chu <jane.chu@oracle.com>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      882e1180
    • mm,swapops: update check in is_pfn_swap_entry for hwpoison entries · c85106fb
      Oscar Salvador authored
      commit 07a57a338adb6ec9e766d6a6790f76527f45ceb5 upstream.
      
      Tony reported that the Machine check recovery was broken in v6.9-rc1, as
      he was hitting a VM_BUG_ON when injecting uncorrectable memory errors to
      DRAM.
      
      After some more digging and debugging on his side, he realized that this
      went back to v6.1, with the introduction of 'commit 0d206b5d
      ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")'.  That
      commit, among other things, introduced swp_offset_pfn(), replacing
      hwpoison_entry_to_pfn() in its favour.
      
      The patch also introduced a VM_BUG_ON() check for is_pfn_swap_entry(), but
      is_pfn_swap_entry() never got updated to cover hwpoison entries, which
      means that we would hit the VM_BUG_ON whenever we would call
      swp_offset_pfn() for such entries on environments with CONFIG_DEBUG_VM
      set.  Fix this by updating the check to cover hwpoison entries as well,
      and update the comment while we are at it.
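
      The shape of the fix can be sketched with a simplified predicate. The
      enum and both helpers below are hypothetical and cover fewer entry
      kinds than the kernel does; they only illustrate that a check guarding
      a PFN lookup has to accept every entry kind that carries a PFN,
      hwpoison included.

```c
#include <stdbool.h>

/* Illustrative entry kinds; the kernel encodes these in the swap type. */
enum entry_kind {
	ENTRY_SWAP,           /* ordinary swap entry, offset is not a PFN */
	ENTRY_MIGRATION,      /* offset is a PFN */
	ENTRY_DEVICE_PRIVATE, /* offset is a PFN */
	ENTRY_HWPOISON,       /* offset is a PFN */
};

/* Model of the check before the fix: hwpoison entries are rejected. */
static bool is_pfn_entry_before(enum entry_kind k)
{
	return k == ENTRY_MIGRATION || k == ENTRY_DEVICE_PRIVATE;
}

/* Model of the check after the fix: hwpoison entries are accepted too. */
static bool is_pfn_entry_after(enum entry_kind k)
{
	return is_pfn_entry_before(k) || k == ENTRY_HWPOISON;
}
```

      A debug assertion built on the first predicate fires for hwpoison
      entries, which matches the VM_BUG_ON described above.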
      
      Link: https://lkml.kernel.org/r/20240407130537.16977-1-osalvador@suse.de
      
      
      Fixes: 0d206b5d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Reported-by: Tony Luck <tony.luck@intel.com>
      Closes: https://lore.kernel.org/all/Zg8kLSl2yAlA3o5D@agluck-desk3/

      Tested-by: Tony Luck <tony.luck@intel.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: <stable@vger.kernel.org> [6.1.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c85106fb
    • mm/userfaultfd: allow hugetlb change protection upon poison entry · db01bfbd
      Peter Xu authored
      commit c5977c95dff182d6ee06f4d6f60bcb0284912969 upstream.
      
      After UFFDIO_POISON, there can be two kinds of hugetlb pte markers, either
      the POISON one or UFFD_WP one.
      
      Allow change protection to run on a poisoned marker just like !hugetlb
      cases, ignoring the marker irrespective of the permission.

      Here the two bits are mutually exclusive.  For example, when installing
      a poisoned entry it must not be UFFD_WP already (by checking pte_none()
      before such an install).  It also means that if UFFD_WP is set, there
      must be no POISON bit set.  This makes sense because UFFD_WP is a bit to
      reflect permission, and permissions do not apply if the pte is poisoned
      and destined to sigbus.

      So here we simply check whether the uffd_wp bit is set first, and do
      nothing otherwise.
      
      Attach the Fixes to UFFDIO_POISON work, as before that it should not be
      possible to have poison entry for hugetlb (e.g., hugetlb doesn't do swap,
      so no chance of swapin errors).
      
      Link: https://lkml.kernel.org/r/20240405231920.1772199-1-peterx@redhat.com
      Link: https://lore.kernel.org/r/000000000000920d5e0615602dd1@google.com
      
      
      Fixes: fc71884a ("mm: userfaultfd: add new UFFDIO_POISON ioctl")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reported-by: <syzbot+b07c8ac8eee3d4d8440f@syzkaller.appspotmail.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: <stable@vger.kernel.org> [6.6+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      db01bfbd
    • init/main.c: Fix potential static_command_line memory overflow · 81cf85ae
      Yuntao Wang authored
      commit 46dad3c1e57897ab9228332f03e1c14798d2d3b9 upstream.
      
      We allocate memory of size 'xlen + strlen(boot_command_line) + 1' for
      static_command_line, but the strings copied into static_command_line are
      extra_command_line and command_line, rather than extra_command_line and
      boot_command_line.
      
      When strlen(command_line) > strlen(boot_command_line), static_command_line
      will overflow.
      
      This patch just restores strlen(command_line), which was mistakenly
      consolidated with strlen(boot_command_line) in commit f5c7310a
      ("init/main: add checks for the return value of memblock_alloc*()").
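
      The size mismatch is easy to reproduce in plain C. The sketch below is
      a userspace model under stated assumptions: the helper names are
      hypothetical, the strings are made up, and nothing is actually copied.
      It only compares the number of bytes the two strings need with the
      number of bytes that were reserved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Size as computed by the buggy code: based on boot_command_line. */
static size_t alloc_len_buggy(const char *extra, const char *boot_cmdline)
{
	return strlen(extra) + strlen(boot_cmdline) + 1;
}

/* Size after the fix: based on the string that is actually copied. */
static size_t alloc_len_fixed(const char *extra, const char *cmdline)
{
	return strlen(extra) + strlen(cmdline) + 1;
}

/* True if extra followed by cmdline, plus the NUL, exceeds the buffer. */
static bool copy_overflows(size_t buf_len, const char *extra,
			   const char *cmdline)
{
	return strlen(extra) + strlen(cmdline) + 1 > buf_len;
}
```

      With extra = "a=1 ", boot_command_line = "quiet" and command_line =
      "quiet splash", the buggy size is 10 bytes while the copy needs 17, so
      copy_overflows() reports an overflow. Sizing from command_line gives 17
      bytes and the copy fits.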
      
      Link: https://lore.kernel.org/all/20240412081733.35925-2-ytcoode@gmail.com/
      
      
      
      Fixes: f5c7310a ("init/main: add checks for the return value of memblock_alloc*()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      81cf85ae
    • arm64: hibernate: Fix level3 translation fault in swsusp_save() · 31f815cb
      Yaxiong Tian authored
      
      commit 50449ca66cc5a8cbc64749cf4b9f3d3fc5f4b457 upstream.
      
      On arm64 machines, swsusp_save() faults if it attempts to access
      MEMBLOCK_NOMAP memory ranges. This can be reproduced in QEMU using UEFI
      when booting with rodata=off debug_pagealloc=off and CONFIG_KFENCE=n:
      
        Unable to handle kernel paging request at virtual address ffffff8000000000
        Mem abort info:
          ESR = 0x0000000096000007
          EC = 0x25: DABT (current EL), IL = 32 bits
          SET = 0, FnV = 0
          EA = 0, S1PTW = 0
          FSC = 0x07: level 3 translation fault
        Data abort info:
          ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
          CM = 0, WnR = 0, TnD = 0, TagAccess = 0
          GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
        swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000eeb0b000
        [ffffff8000000000] pgd=180000217fff9803, p4d=180000217fff9803, pud=180000217fff9803, pmd=180000217fff8803, pte=0000000000000000
        Internal error: Oops: 0000000096000007 [#1] SMP
        Modules linked in: xt_multiport ipt_REJECT nf_reject_ipv4 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter bpfilter rfkill at803x snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg dwmac_generic stmmac_platform snd_hda_codec stmmac joydev pcs_xpcs snd_hda_core phylink ppdev lp parport ramoops reed_solomon ip_tables x_tables nls_iso8859_1 vfat multipath linear amdgpu amdxcp drm_exec gpu_sched drm_buddy hid_generic usbhid hid radeon video drm_suballoc_helper drm_ttm_helper ttm i2c_algo_bit drm_display_helper cec drm_kms_helper drm
        CPU: 0 PID: 3663 Comm: systemd-sleep Not tainted 6.6.2+ #76
        Source Version: 4e22ed63a0a48e7a7cff9b98b7806d8d4add7dc0
        Hardware name: Greatwall GW-XXXXXX-XXX/GW-XXXXXX-XXX, BIOS KunLun BIOS V4.0 01/19/2021
        pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
        pc : swsusp_save+0x280/0x538
        lr : swsusp_save+0x280/0x538
        sp : ffffffa034a3fa40
        x29: ffffffa034a3fa40 x28: ffffff8000001000 x27: 0000000000000000
        x26: ffffff8001400000 x25: ffffffc08113e248 x24: 0000000000000000
        x23: 0000000000080000 x22: ffffffc08113e280 x21: 00000000000c69f2
        x20: ffffff8000000000 x19: ffffffc081ae2500 x18: 0000000000000000
        x17: 6666662074736420 x16: 3030303030303030 x15: 3038666666666666
        x14: 0000000000000b69 x13: ffffff9f89088530 x12: 00000000ffffffea
        x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffffc08193f0d0
        x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 0000000000000001
        x5 : ffffffa0fff09dc8 x4 : 0000000000000000 x3 : 0000000000000027
        x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000004e
        Call trace:
         swsusp_save+0x280/0x538
         swsusp_arch_suspend+0x148/0x190
         hibernation_snapshot+0x240/0x39c
         hibernate+0xc4/0x378
         state_store+0xf0/0x10c
         kobj_attr_store+0x14/0x24
      
      The reason is swsusp_save() -> copy_data_pages() -> page_is_saveable()
      -> kernel_page_present() assuming that a page is always present when
      can_set_direct_map() is false (all of rodata_full,
      debug_pagealloc_enabled() and arm64_kfence_can_set_direct_map() false),
      irrespective of the MEMBLOCK_NOMAP ranges. Such MEMBLOCK_NOMAP regions
      should not be saved during hibernation.
      
      This problem was introduced by changes to the pfn_valid() logic in
      commit a7d9f306 ("arm64: drop pfn_valid_within() and simplify
      pfn_valid()").
      
      Similar to other architectures, drop the !can_set_direct_map() check in
      kernel_page_present() so that page_is_saveable() skips such pages.
      
      Fixes: a7d9f306 ("arm64: drop pfn_valid_within() and simplify pfn_valid()")
      Cc: <stable@vger.kernel.org> # 5.14.x
      Suggested-by: Mike Rapoport <rppt@kernel.org>
      Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
      Co-developed-by: xiongxin <xiongxin@kylinos.cn>
      Signed-off-by: xiongxin <xiongxin@kylinos.cn>
      Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
      Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Link: https://lore.kernel.org/r/20240417025248.386622-1-tianyaxiong@kylinos.cn

      [catalin.marinas@arm.com: rework commit message]
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      31f815cb
    • arm64/head: Disable MMU at EL2 before clearing HCR_EL2.E2H · e972b6a7
      Ard Biesheuvel authored
      
      commit 34e526cb7d46726b2ae5f83f2892d00ebb088509 upstream.
      
      Even though the boot protocol stipulates otherwise, an exception has
      been made for the EFI stub, and entering the core kernel with the MMU
      enabled is permitted. This allows a substantial amount of cache
      maintenance to be elided, which is significant when fast boot times are
      critical (e.g., for booting micro-VMs).
      
      Once the initial ID map has been populated, the MMU is disabled as part
      of the logic sequence that puts all system registers into a known state.
      Any code that needs to execute within the window where the MMU is off is
      cleaned to the PoC explicitly, which includes all of HYP text when
      entering at EL2.
      
      However, the current sequence of initializing the EL2 system registers
      is not safe: HCR_EL2 is set to its nVHE initial state before SCTLR_EL2
      is reprogrammed, and this means that a VHE-to-nVHE switch may occur
      while the MMU is enabled. This switch causes some system registers as
      well as page table descriptors to be interpreted in a different way,
      potentially resulting in spurious exceptions relating to MMU
      translation.
      
      So disable the MMU explicitly first when entering in EL2 with the MMU
      and caches enabled.
      
      Fixes: 61786170 ("efi: arm64: enter with MMU and caches enabled")
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: <stable@vger.kernel.org> # 6.3.x
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20240415075412.2347624-6-ardb+git@google.com

      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e972b6a7
    • KVM: x86/mmu: Write-protect L2 SPTEs in TDP MMU when clearing dirty status · cdf811a9
      David Matlack authored
      
      commit 2673dfb591a359c75080dd5af3da484b89320d22 upstream.
      
      Check kvm_mmu_page_ad_need_write_protect() when deciding whether to
      write-protect or clear D-bits on TDP MMU SPTEs, so that the TDP MMU
      accounts for any role-specific reasons for disabling D-bit dirty logging.
      
      Specifically, TDP MMU SPTEs must be write-protected when the TDP MMU is
      being used to run an L2 (i.e. L1 has disabled EPT) and PML is enabled.
      KVM always disables PML when running L2, even when L1 and L2 GPAs are in
      the same domain, so failing to write-protect TDP MMU SPTEs will cause
      writes made by L2 to not be reflected in the dirty log.
      
      Reported-by: <syzbot+900d58a45dcaab9e4821@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=900d58a45dcaab9e4821

      Fixes: 5982a539 ("KVM: x86/mmu: Use kvm_ad_enabled() to determine if TDP MMU SPTEs need wrprot")
      Cc: stable@vger.kernel.org
      Cc: Vipin Sharma <vipinsh@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Link: https://lore.kernel.org/r/20240315230541.1635322-2-dmatlack@google.com

      [sean: massage shortlog and changelog, tweak ternary op formatting]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cdf811a9
    • KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms · 947d518e
      Sandipan Das authored
      
      commit 49ff3b4aec51e3abfc9369997cc603319b02af9a upstream.
      
      On AMD and Hygon platforms, the local APIC does not automatically set
      the mask bit of the LVTPC register when handling a PMI and there is
      no need to clear it in the kernel's PMI handler.
      
      For guests, the mask bit is currently set by kvm_apic_local_deliver()
      and unless it is cleared by the guest kernel's PMI handler, PMIs stop
      arriving and break use-cases like sampling with perf record.
      
      This does not affect non-PerfMonV2 guests because PMIs are handled in
      the guest kernel by x86_pmu_handle_irq() which always clears the LVTPC
      mask bit irrespective of the vendor.
      
      Before:
      
        $ perf record -e cycles:u true
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.001 MB perf.data (1 samples) ]
      
      After:
      
        $ perf record -e cycles:u true
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.002 MB perf.data (19 samples) ]
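
      The behavioural difference can be sketched as a vendor-dependent update
      of the LVTPC value on PMI delivery. The function below is a
      hypothetical model, not KVM's kvm_apic_local_deliver(); the only
      hardware fact it relies on is that bit 16 of a local APIC LVT entry is
      the mask bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define LVT_MASKED (1u << 16) /* mask bit of a local APIC LVT entry */

/*
 * Model of what happens to LVTPC when a PMI is delivered to a guest.
 * Intel-compatible vCPUs get the mask bit set, as on Intel hardware;
 * AMD and Hygon vCPUs keep the entry unchanged.
 */
static uint32_t lvtpc_after_pmi(uint32_t lvtpc, bool intel_compatible)
{
	if (intel_compatible)
		lvtpc |= LVT_MASKED;
	return lvtpc;
}
```

      With an unmasked NMI-mode entry (0x400), the AMD-style path leaves the
      entry unmasked, so further PMIs keep arriving even if the guest handler
      never clears the bit.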
      
      Fixes: a16eb25b ("KVM: x86: Mask LVTPC when handling a PMI")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sandipan Das <sandipan.das@amd.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      [sean: use is_intel_compatible instead of !is_amd_or_hygon()]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20240405235603.1173076-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      947d518e
    • KVM: x86/pmu: Disable support for adaptive PEBS · 037e48ce
      Sean Christopherson authored
      commit 9e985cbf2942a1bb8fcef9adc2a17d90fd7ca8ee upstream.
      
      Drop support for virtualizing adaptive PEBS, as KVM's implementation is
      architecturally broken without an obvious/easy path forward, and because
      exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak
      host kernel addresses to the guest.
      
      Bug #1 is that KVM doesn't account for the upper 32 bits of
      IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g.
      fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters()
      stores local variables as u8s and truncates the upper bits too, etc.
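
      Bug #1 is an ordinary truncation problem. The sketch below assumes the
      layout used by perf's ICL_FIXED_0_ADAPTIVE definition, i.e. the
      adaptive bit for fixed counter 0 at bit 32 with 4 bits per counter;
      ctrl_field() and adaptive_requested() are hypothetical helpers written
      for this example, not KVM functions.

```c
#include <stdint.h>

/* Assumed layout: adaptive bit for fixed counter 0 is bit 32. */
#define FIXED_0_ADAPTIVE (1ULL << 32)

/*
 * Extract the 4-bit control field of fixed counter idx from the low
 * half of the register. Anything in the upper 32 bits is lost.
 */
static uint8_t ctrl_field(uint64_t ctrl_reg, int idx)
{
	return (uint8_t)((ctrl_reg >> (idx * 4)) & 0xf);
}

/* Check the upper half explicitly for the adaptive bit of counter idx. */
static int adaptive_requested(uint64_t ctrl_reg, int idx)
{
	return (ctrl_reg & (FIXED_0_ADAPTIVE << (idx * 4))) != 0;
}
```

      ctrl_field() returns the same value whether or not the adaptive bit is
      set, so code that only consults the 4-bit field cannot tell the two
      requests apart.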
      
      Bug #2 is that, because KVM _always_ sets precise_ip to a non-zero value
      for PEBS events, perf will _always_ generate an adaptive record, even if
      the guest requested a basic record.  Note, KVM will also enable adaptive
      PEBS in individual *counters*, even if adaptive PEBS isn't exposed to the
      guest, but this is benign as MSR_PEBS_DATA_CFG is guaranteed to be zero,
      i.e. the guest will only ever see Basic records.
      
      Bug #3 is in perf.  intel_pmu_disable_fixed() doesn't clear the upper
      bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set, and
      intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE
      either.  I.e. perf _always_ enables ADAPTIVE counters, regardless of what
      KVM requests.
      
      Bug #4 is that adaptive PEBS *might* effectively bypass event filters set
      by the host, as "Updated Memory Access Info Group" records information
      that might be disallowed by userspace via KVM_SET_PMU_EVENT_FILTER.
      
      Bug #5 is that KVM doesn't ensure LBR MSRs hold guest values (or at least
      zeros) when entering a vCPU with adaptive PEBS, which allows the guest
      to read host LBRs, i.e. host RIPs/addresses, by enabling "LBR Entries"
      records.
      
      Disable adaptive PEBS support as an immediate fix due to the severity of
      the LBR leak in particular, and because fixing all of the bugs will be
      non-trivial, e.g. not suitable for backporting to stable kernels.
      
      Note!  This will break live migration, but trying to make KVM play nice
      with live migration would be quite complicated, wouldn't be guaranteed to
      work (i.e. KVM might still kill/confuse the guest), and it's not clear
      that there are any publicly available VMMs that support adaptive PEBS,
      let alone live migrate VMs that support adaptive PEBS, e.g. QEMU doesn't
      support PEBS in any capacity.
      
      Link: https://lore.kernel.org/all/20240306230153.786365-1-seanjc@google.com
      Link: https://lore.kernel.org/all/ZeepGjHCeSfadANM@google.com
      
      
      Fixes: c59a1f10 ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
      Cc: stable@vger.kernel.org
      Cc: Like Xu <like.xu.linux@gmail.com>
      Cc: Mingwei Zhang <mizhang@google.com>
      Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
      Cc: Zhang Xiong <xiong.y.zhang@intel.com>
      Cc: Lv Zhiyuan <zhiyuan.lv@intel.com>
      Cc: Dapeng Mi <dapeng1.mi@intel.com>
      Cc: Jim Mattson <jmattson@google.com>
      Acked-by: Like Xu <likexu@tencent.com>
      Link: https://lore.kernel.org/r/20240307005833.827147-1-seanjc@google.com
      
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      037e48ce
    • Sean Christopherson's avatar
      KVM: x86: Snapshot if a vCPU's vendor model is AMD vs. Intel compatible · bdda0c17
      Sean Christopherson authored
      
      commit fd706c9b1674e2858766bfbf7430534c2b26fbef upstream.
      
      Add kvm_vcpu_arch.is_amd_compatible to cache if a vCPU's vendor model is
      compatible with AMD, i.e. if the vCPU vendor is AMD or Hygon, along with
      helpers to check if a vCPU is compatible with AMD vs. Intel.  To handle Intel
      vs. AMD behavior related to masking the LVTPC entry, KVM will need to
      check for vendor compatibility on every PMI injection, i.e. querying for
      AMD will soon be a moderately hot path.
      
      Note!  This subtly (or maybe not-so-subtly) makes "Intel compatible" KVM's
      default behavior, both if userspace omits (or never sets) CPUID 0x0 and if
      userspace sets a completely unknown vendor.  One could argue that KVM
      should treat such vCPUs as not being compatible with Intel *or* AMD, but
      that would add useless complexity to KVM.
      
      KVM needs to do *something* in the face of vendor specific behavior, and
      so unless KVM conjured up a magic third option, choosing to treat unknown
      vendors as neither Intel nor AMD means that checks on AMD compatibility
      would yield Intel behavior, and checks for Intel compatibility would yield
      AMD behavior.  And that's far worse as it would effectively yield random
      behavior depending on whether KVM checked for AMD vs. Intel vs. !AMD vs.
      !Intel.  And practically speaking, all x86 CPUs follow either Intel or AMD
      architecture, i.e. "supporting" an unknown third architecture adds no
      value.
      
      Deliberately don't convert any of the existing guest_cpuid_is_intel()
      checks, as the Intel side of things is messier due to some flows explicitly
      checking for exactly vendor==Intel, versus some flows assuming anything
      that isn't "AMD compatible" gets Intel behavior.  The Intel code will be
      cleaned up in the future.
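
      A minimal userspace sketch of the snapshot idea follows.  The struct and
      function names are hypothetical stand-ins rather than KVM's actual
      definitions; only the vendor strings are real CPUID values.  The vendor
      comparison happens once when the vendor is set, so the hot path reads a
      cached bool, and anything that is not AMD or Hygon (including a vendor
      that was never set) falls through to "Intel compatible".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for per-vCPU state. */
struct vcpu_model {
	char vendor[13];        /* 12-byte CPUID.0 vendor string, NUL terminated */
	bool is_amd_compatible; /* snapshot taken whenever the vendor is set */
};

/* Slow path: compare vendor strings once and cache the result. */
static void vcpu_set_vendor(struct vcpu_model *v, const char *vendor)
{
	strncpy(v->vendor, vendor, sizeof(v->vendor) - 1);
	v->vendor[sizeof(v->vendor) - 1] = '\0';
	v->is_amd_compatible = !strcmp(v->vendor, "AuthenticAMD") ||
			       !strcmp(v->vendor, "HygonGenuine");
}

/* Hot path: a plain load of the cached flag. */
static bool vcpu_is_amd_compatible(const struct vcpu_model *v)
{
	return v->is_amd_compatible;
}

/* Everything that is not AMD compatible is treated as Intel compatible. */
static bool vcpu_is_intel_compatible(const struct vcpu_model *v)
{
	return !vcpu_is_amd_compatible(v);
}
```

      Defining one helper as the negation of the other is what guarantees that
      every vCPU gets exactly one of the two behaviors.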
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20240405235603.1173076-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bdda0c17
    • Mathieu Desnoyers's avatar
      sched: Add missing memory barrier in switch_mm_cid · 7fce9f0f
      Mathieu Desnoyers authored
      
      commit fe90f3967bdb3e13f133e5f44025e15f943a99c5 upstream.
      
      Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
      which the core scheduler code has depended upon since commit:
      
          commit 223baf9d ("sched: Fix performance regression introduced by mm_cid")
      
      If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can
      unset the actively used cid when it fails to observe the active task
      after it sets lazy_put.
      
      There *is* a memory barrier between storing to rq->curr and _return to
      userspace_ (as required by membarrier), but the rseq mm_cid has stricter
      requirements: the barrier needs to be issued between the store to rq->curr
      and switch_mm_cid(), which happens earlier than:
      
        - spin_unlock(),
        - switch_to().
      
      So it's fine when the architecture switch_mm() happens to have that
      barrier already, but less so when the architecture only provides the
      full barrier in switch_to() or spin_unlock().
      
      It is a bug in the rseq switch_mm_cid() implementation. All architectures
      that don't have memory barriers in switch_mm(), but rather have the full
      barrier either in finish_lock_switch() or switch_to() have them too late
      for the needs of switch_mm_cid().
      
      Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
      generic barrier.h header, and use it in switch_mm_cid() for scheduler
      transitions where switch_mm() is expected to provide a memory barrier.
      
      Architectures can override smp_mb__after_switch_mm() if their
      switch_mm() implementation provides an implicit memory barrier.
      Override it with a no-op on x86, which implicitly provides this memory
      barrier by writing to CR3.
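
      The override mechanism can be sketched in userspace as below.  This is
      an illustrative model, not the kernel's code: every model_* name is
      hypothetical, and a C11 sequentially consistent fence stands in for
      smp_mb().  The generic definition applies only if an "architecture" has
      not already defined the hook.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace stand-in for smp_mb(); an assumption made for this sketch. */
#define model_smp_mb() atomic_thread_fence(memory_order_seq_cst)

/*
 * An "architecture" whose switch_mm() already implies a full barrier would
 * define the hook as a no-op before this point, for example:
 *   #define model_smp_mb__after_switch_mm() do { } while (0)
 */
#ifndef model_smp_mb__after_switch_mm
#define model_smp_mb__after_switch_mm() model_smp_mb()
#endif

static _Atomic int model_rq_curr;   /* models the store to rq->curr */
static _Atomic int model_lazy_put;  /* models state set by a remote clear */

/* Store the next task, order the store, then read the remote side's state. */
static int model_context_switch(int next_task)
{
	atomic_store_explicit(&model_rq_curr, next_task, memory_order_relaxed);
	/* Stand-in for a switch_mm() that is assumed not to imply a barrier. */
	model_smp_mb__after_switch_mm();
	return atomic_load_explicit(&model_lazy_put, memory_order_relaxed);
}
```

      The fence sits between the store and the later load, which is the
      ordering the text above says must not be deferred to switch_to() or
      spin_unlock().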
      
      Fixes: 223baf9d ("sched: Fix performance regression introduced by mm_cid")
      Reported-by: levi.yun <yeoreum.yun@arm.com>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> # for arm64
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for x86
      Cc: <stable@vger.kernel.org> # 6.4.x
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lore.kernel.org/r/20240415152114.59122-2-mathieu.desnoyers@efficios.com
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7fce9f0f
    • Alan Stern's avatar
      fs: sysfs: Fix reference leak in sysfs_break_active_protection() · ac107356
      Alan Stern authored
      
      commit a90bca2228c0646fc29a72689d308e5fe03e6d78 upstream.
      
      The sysfs_break_active_protection() routine has an obvious reference
      leak in its error path.  If the call to kernfs_find_and_get() fails then
      kn will be NULL, so the companion sysfs_unbreak_active_protection()
      routine won't get called (and would only cause an access violation by
      trying to dereference kn->parent if it was called).  As a result, the
      reference to kobj acquired at the start of the function will never be
      released.
      
      Fix the leak by adding an explicit kobject_put() call when kn is NULL.
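
      The shape of the fix can be modelled with a plain counter, as in the
      sketch below.  All model_* names are hypothetical and the lookup is
      reduced to a flag; the point is only that the reference taken up front
      has to be dropped on the path where no "unbreak" call will ever follow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a refcounted object and a node looked up on it. */
struct model_kobj { int refcount; };
struct model_node { int unused; };

static void model_get(struct model_kobj *k) { k->refcount++; }
static void model_put(struct model_kobj *k) { k->refcount--; }

/* Lookup that may fail; 'found' selects the outcome for the example. */
static struct model_node *model_find(struct model_node *n, int found)
{
	return found ? n : NULL;
}

static struct model_node *model_break_protection(struct model_kobj *k,
						 struct model_node *n, int found)
{
	struct model_node *kn;

	model_get(k);              /* reference acquired at the start */
	kn = model_find(n, found);
	if (!kn)
		model_put(k);      /* error path: drop it here, nobody else will */
	return kn;
}

/* Companion routine; only called when the lookup succeeded. */
static void model_unbreak_protection(struct model_kobj *k, struct model_node *kn)
{
	(void)kn;
	model_put(k);
}
```

      Without the put on the NULL path, each failed lookup would leave the
      count one higher than it started.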
      
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Fixes: 2afc9166 ("scsi: sysfs: Introduce sysfs_{un,}break_active_protection()")
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: stable@vger.kernel.org
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/r/8a4d3f0f-c5e3-4b70-a188-0ca433f9e6f9@rowland.harvard.edu
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac107356