1. 19 Jun, 2019 1 commit
  2. 23 Apr, 2019 1 commit
    • vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING" · 41be3e26
      Farhan Ali authored
      vfio_dev_present(), which is the condition passed to
      wait_event_interruptible_timeout(), calls vfio_group_get_device()
      and tries to acquire the mutex group->device_lock.
      
      wait_event_interruptible_timeout() will set the state of the current
      task to TASK_INTERRUPTIBLE, before doing the condition check. This
      means that we will try to acquire the mutex while already in a
      sleeping state. The scheduler warns us by giving the following
      warning:
      
      [ 4050.264464] ------------[ cut here ]------------
      [ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188
      [ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 __might_sleep+0x76/0x90
      ....
      
      [ 4050.264756] Call Trace:
      [ 4050.264765] ([<000000000017bbaa>] __might_sleep+0x72/0x90)
      [ 4050.264774]  [<0000000000b97edc>] __mutex_lock+0x44/0x8c0
      [ 4050.264782]  [<0000000000b9878a>] mutex_lock_nested+0x32/0x40
      [ 4050.264793]  [<000003ff800d7abe>] vfio_group_get_device+0x36/0xa8 [vfio]
      [ 4050.264803]  [<000003ff800d87c0>] vfio_del_group_dev+0x238/0x378 [vfio]
      [ 4050.264813]  [<000003ff8015f67c>] mdev_remove+0x3c/0x68 [mdev]
      [ 4050.264825]  [<00000000008e01b0>] device_release_driver_internal+0x168/0x268
      [ 4050.264834]  [<00000000008de692>] bus_remove_device+0x162/0x190
      [ 4050.264843]  [<00000000008daf42>] device_del+0x1e2/0x368
      [ 4050.264851]  [<00000000008db12c>] device_unregister+0x64/0x88
      [ 4050.264862]  [<000003ff8015ed84>] mdev_device_remove+0xec/0x130 [mdev]
      [ 4050.264872]  [<000003ff8015f074>] remove_store+0x6c/0xa8 [mdev]
      [ 4050.264881]  [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8
      [ 4050.264890]  [<00000000003c1530>] __vfs_write+0x38/0x1a8
      [ 4050.264899]  [<00000000003c187c>] vfs_write+0xb4/0x198
      [ 4050.264908]  [<00000000003c1af2>] ksys_write+0x5a/0xb0
      [ 4050.264916]  [<0000000000b9e270>] system_call+0xdc/0x2d8
      [ 4050.264925] 4 locks held by sh/35924:
      [ 4050.264933]  #0: 000000001ef90325 (sb_writers#4){.+.+}, at: vfs_write+0x9e/0x198
      [ 4050.264948]  #1: 000000005c1ab0b3 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cc/0x1f8
      [ 4050.264963]  #2: 0000000034831ab8 (kn->count#297){++++}, at: kernfs_remove_self+0x12e/0x150
      [ 4050.264979]  #3: 00000000e152484f (&dev->mutex){....}, at: device_release_driver_internal+0x5c/0x268
      [ 4050.264993] Last Breaking-Event-Address:
      [ 4050.265002]  [<000000000017bbaa>] __might_sleep+0x72/0x90
      [ 4050.265010] irq event stamp: 7039
      [ 4050.265020] hardirqs last  enabled at (7047): [<00000000001cee7a>] console_unlock+0x6d2/0x740
      [ 4050.265029] hardirqs last disabled at (7054): [<00000000001ce87e>] console_unlock+0xd6/0x740
      [ 4050.265040] softirqs last  enabled at (6416): [<0000000000b8fe26>] __udelay+0xb6/0x100
      [ 4050.265049] softirqs last disabled at (6415): [<0000000000b8fe06>] __udelay+0x96/0x100
      [ 4050.265057] ---[ end trace d04a07d39d99a9f9 ]---
      
      Let's fix this as described in the article
      https://lwn.net/Articles/628628/.
      Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
      [remove now redundant vfio_dev_present()]
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      41be3e26
  3. 22 Apr, 2019 1 commit
  4. 12 Feb, 2019 1 commit
  5. 08 Jun, 2018 1 commit
  6. 20 Dec, 2017 1 commit
  7. 25 Oct, 2017 1 commit
    • locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() · 6aa7de05
      Mark Rutland authored
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6aa7de05
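The effect of the two coccinelle rules can be illustrated with a small userspace sketch. The macro definitions below are simplified approximations that assume GCC or Clang (for `__typeof__`); the kernel's real definitions are more elaborate. The `ready` flag and both functions are hypothetical names used only for illustration.

```c
#include <assert.h>

/*
 * Simplified userspace approximations of the kernel macros
 * (assumption: GCC or Clang, for __typeof__).
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical shared flag, for illustration only. */
static int ready;

/* Before the conversion this would read: ACCESS_ONCE(ready) = 1; */
void publish(void)
{
	WRITE_ONCE(ready, 1);
}

/* Before the conversion this would read: return ACCESS_ONCE(ready); */
int poll_ready(void)
{
	return READ_ONCE(ready);
}
```

The first rule rewrites assignments through ACCESS_ONCE() into WRITE_ONCE(), and the second rewrites every remaining use into READ_ONCE(), which is why the store rule has to come first in the script.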
  8. 30 Aug, 2017 2 commits
    • vfio: Stall vfio_del_group_dev() for container group detach · 6586b561
      Alex Williamson authored
      When the user unbinds the last device of a group from a vfio bus
      driver, the devices within that group should be available for other
      purposes.  We currently have a race that makes this generally, but
      not always, true.  The device can be unbound from the vfio bus driver,
      but remaining IOMMU context of the group attached to the container
      can result in errors as the next driver configures DMA for the device.
      
      Wait for the group to be detached from the IOMMU backend before
      allowing the bus driver remove callback to complete.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      6586b561
    • vfio: fix noiommu vfio_iommu_group_get reference count · d935ad91
      Eric Auger authored
      In vfio_iommu_group_get() we want to increase the reference
      count of the iommu group.
      
      In the noiommu case, the group does not exist and is allocated.
      iommu_group_add_device() increases the group ref count. However we
      then call iommu_group_put() which decrements it.
      
      This leads to a "refcount_t: underflow WARN_ON".
      
      Only decrement the ref count in case of iommu_group_add_device
      failure.
      Signed-off-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      d935ad91
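The reference-count reasoning above can be followed in a toy userspace model. This is a sketch, not the kernel code: every `toy_*` name is hypothetical, and the assumption that a freshly allocated group starts with one reference is part of the model.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an IOMMU group; only the reference count is modelled. */
struct toy_group {
	int refs;
};

static struct toy_group the_group;

/* Models allocation: the new group starts with one reference (assumption). */
struct toy_group *toy_group_alloc(void)
{
	the_group.refs = 1;
	return &the_group;
}

/* Models adding a device: on success the device holds its own reference. */
int toy_group_add_device(struct toy_group *g, int fail)
{
	if (fail)
		return -1;
	g->refs++;
	return 0;
}

void toy_group_put(struct toy_group *g)
{
	assert(g->refs > 0);	/* a put on zero would be an underflow */
	g->refs--;
}

/* Fixed flow: drop the allocation reference only if add_device fails. */
struct toy_group *toy_group_get_noiommu(int fail)
{
	struct toy_group *g = toy_group_alloc();

	if (toy_group_add_device(g, fail)) {
		toy_group_put(g);
		return NULL;
	}
	return g;	/* caller owns one reference, the device the other */
}
```

With an unconditional put after a successful add, the caller would be handed a group whose only reference belongs to the device, so the caller's later put followed by the device's removal would drop the count below zero.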
  9. 07 Jul, 2017 1 commit
    • vfio: Remove unnecessary uses of vfio_container.group_lock · 7f56c30b
      Alex Williamson authored
      The original intent of vfio_container.group_lock is to protect
      vfio_container.group_list, however over time it's become a crutch to
      prevent changes in container composition any time we call into the
      iommu driver backend.  This introduces problems when we start to have
      more complex interactions, for example when a user's DMA unmap request
      triggers a notification to an mdev vendor driver, who responds by
      attempting to unpin mappings within that request, re-entering the
      iommu backend.  We incorrectly assume that the use of read-locks here
      allows for this nested locking behavior, but a poorly timed write-lock
      could in fact trigger a deadlock.
      
      The current use of group_lock seems to fall into the trap of locking
      code, not data.  Correct that by removing uses of group_lock that are
      not directly related to group_list.  Note that the vfio type1 iommu
      backend has its own mutex, vfio_iommu.lock, which it uses to protect
      itself for each of these interfaces anyway.  The group_lock appears to
      be a redundancy for these interfaces and type1 even goes so far as to
      release its mutex to allow for exactly the re-entrant code path above.
      Reported-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: stable@vger.kernel.org # v4.10+
      7f56c30b
  10. 28 Jun, 2017 2 commits
  11. 13 Jun, 2017 1 commit
  12. 21 Mar, 2017 1 commit
    • vfio: Rework group release notifier warning · 65b1adeb
      Alex Williamson authored
      The intent of the original warning is to make sure that the mdev vendor
      driver has removed any group notifiers at the point where the group
      is closed by the user.  Theoretically this would be through an
      orderly shutdown where any devices are released prior to the group
      release.  We can't always count on an orderly shutdown; the user can
      close the group before the notifier can be removed or the user task
      might be killed.  We'd like to add this sanity test when the group is
      idle and the only references are from the devices within the group
      themselves, but we don't have a good way to do that.  Instead check
      both when the group itself is removed and when the group is opened.
      A bit later than we'd prefer, but better than the current overly
      aggressive approach.
      
      Fixes: ccd46dba ("vfio: support notifier chain in vfio_group")
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: <stable@vger.kernel.org> # v4.10
      Cc: Jike Song <jike.song@intel.com>
      65b1adeb
  13. 22 Feb, 2017 1 commit
  14. 09 Feb, 2017 1 commit
  15. 01 Dec, 2016 3 commits
  16. 21 Nov, 2016 1 commit
  17. 17 Nov, 2016 6 commits
  18. 14 Jul, 2016 1 commit
  19. 22 Feb, 2016 2 commits
    • vfio: Add capability chain helpers · d7a8d5ed
      Alex Williamson authored
      Allow sub-modules to easily reallocate a buffer for managing
      capability chains for info ioctls.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      d7a8d5ed
    • vfio: If an IOMMU backend fails, keep looking · 7c435b46
      Alex Williamson authored
      Consider an IOMMU to be an API rather than an implementation; we might
      have multiple implementations supporting the same API, so try another
      if one fails.  The expectation here is that we'll really only have
      one implementation per device type.  For instance the existing type1
      driver works with any PCI device where the IOMMU API is available.  A
      vGPU vendor may have a virtual PCI device which provides DMA isolation
      and mapping through other mechanisms, but can re-use userspaces that
      make use of the type1 VFIO IOMMU API.  This allows that to work.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      7c435b46
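The "keep looking" idea can be sketched in userspace C as a loop over candidate backends that stops at the first one to accept the device. The descriptor struct, the two toy attach functions and the integer device types are hypothetical and stand in for whatever the real registration list holds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend descriptor; not the kernel's structure. */
struct toy_backend {
	const char *name;
	int (*attach)(int device_type);	/* returns 0 on success */
};

/* Two toy implementations of the same API, each accepting one device type. */
int type1_attach(int device_type) { return device_type == 0 ? 0 : -1; }
int vendor_attach(int device_type) { return device_type == 1 ? 0 : -1; }

const struct toy_backend backends[] = {
	{ "toy-type1", type1_attach },
	{ "toy-vendor", vendor_attach },
};

/* Try every registered backend; stop at the first that accepts the device. */
const struct toy_backend *toy_pick_backend(int device_type)
{
	size_t i;

	for (i = 0; i < sizeof(backends) / sizeof(backends[0]); i++) {
		if (backends[i].attach(device_type) == 0)
			return &backends[i];
		/* A failure here is not fatal: keep looking. */
	}
	return NULL;	/* no backend accepted the device */
}
```

Returning on the first failure instead of continuing would make the second backend unreachable for any device the first one rejects, which is the behavior the commit message argues against.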
  20. 27 Jan, 2016 1 commit
  21. 21 Dec, 2015 1 commit
    • vfio: Include No-IOMMU mode · 03a76b60
      Alex Williamson authored
      There is really no way to safely give a user full access to a DMA
      capable device without an IOMMU to protect the host system.  There is
      also no way to provide DMA translation, for use cases such as device
      assignment to virtual machines.  However, there are still those users
      that want userspace drivers even under those conditions.  The UIO
      driver exists for this use case, but does not provide the degree of
      device access and programming that VFIO has.  In an effort to avoid
      code duplication, this introduces a No-IOMMU mode for VFIO.
      
      This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
      the "enable_unsafe_noiommu_mode" option on the vfio driver.  This
      should make it very clear that this mode is not safe.  Additionally,
      CAP_SYS_RAWIO privileges are necessary to work with groups and
      containers using this mode.  Groups making use of this support are
      named /dev/vfio/noiommu-$GROUP and can only make use of the special
      VFIO_NOIOMMU_IOMMU for the container.  Use of this mode, specifically
      binding a device without a native IOMMU group to a VFIO bus driver
      will taint the kernel and should therefore not be considered
      supported.  This patch includes no-iommu support for the vfio-pci bus
      driver only.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      03a76b60
  22. 04 Dec, 2015 1 commit
  23. 21 Nov, 2015 1 commit
  24. 04 Nov, 2015 2 commits
    • vfio: Include No-IOMMU mode · 033291ec
      Alex Williamson authored
      There is really no way to safely give a user full access to a DMA
      capable device without an IOMMU to protect the host system.  There is
      also no way to provide DMA translation, for use cases such as device
      assignment to virtual machines.  However, there are still those users
      that want userspace drivers even under those conditions.  The UIO
      driver exists for this use case, but does not provide the degree of
      device access and programming that VFIO has.  In an effort to avoid
      code duplication, this introduces a No-IOMMU mode for VFIO.
      
      This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
      the "enable_unsafe_noiommu_mode" option on the vfio driver.  This
      should make it very clear that this mode is not safe.  Additionally,
      CAP_SYS_RAWIO privileges are necessary to work with groups and
      containers using this mode.  Groups making use of this support are
      named /dev/vfio/noiommu-$GROUP and can only make use of the special
      VFIO_NOIOMMU_IOMMU for the container.  Use of this mode, specifically
      binding a device without a native IOMMU group to a VFIO bus driver
      will taint the kernel and should therefore not be considered
      supported.  This patch includes no-iommu support for the vfio-pci bus
      driver only.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      033291ec
    • vfio: Fix bug in vfio_device_get_from_name() · e324fc82
      Joerg Roedel authored
      The vfio_device_get_from_name() function might return a
      non-NULL pointer, when called with a device name that is not
      found in the list. This causes undefined behavior, in my
      case calling an invalid function pointer later on:
      
       kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
       BUG: unable to handle kernel paging request at ffff8800cb3ddc08
      
      [...]
      
       Call Trace:
        [<ffffffffa03bd733>] ? vfio_group_fops_unl_ioctl+0x253/0x410 [vfio]
        [<ffffffff811efc4d>] do_vfs_ioctl+0x2cd/0x4c0
        [<ffffffff811f9657>] ? __fget+0x77/0xb0
        [<ffffffff811efeb9>] SyS_ioctl+0x79/0x90
        [<ffffffff81001bb0>] ? syscall_return_slowpath+0x50/0x130
        [<ffffffff8167f776>] entry_SYSCALL_64_fastpath+0x16/0x75
      
      Fix the issue by returning NULL when there is no device with
      the requested name in the list.
      
      Cc: stable@vger.kernel.org # v4.2+
      Fixes: 4bc94d5d ("vfio: Fix lockdep issue")
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      e324fc82
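One way a list search can end up returning a non-NULL pointer for a missing name is by returning the loop cursor of a circular list, which never becomes NULL. The userspace sketch below shows a lookup that avoids this; it is not the actual patch, and the `toy_dev` type and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy circular list with a sentinel head, loosely modelled on list_head. */
struct toy_dev {
	const char *name;
	struct toy_dev *next;
};

/*
 * Return the device called `name`, or NULL if there is none.
 * The loop cursor `it` is never NULL in a circular list, so the result
 * lives in a separate pointer that is set only on a match.
 */
struct toy_dev *toy_dev_from_name(struct toy_dev *head, const char *name)
{
	struct toy_dev *it, *found = NULL;

	for (it = head->next; it != head; it = it->next) {
		if (strcmp(it->name, name) == 0) {
			found = it;
			break;
		}
	}
	return found;
}
```

Returning `it` instead of `found` would hand back the sentinel head when nothing matches, and a caller treating that as a device could then follow an invalid function pointer, as in the trace quoted in the commit message.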
  25. 27 Oct, 2015 1 commit
  26. 24 Jul, 2015 1 commit
    • vfio: Fix lockdep issue · 4bc94d5d
      Alex Williamson authored
      When we open a device file descriptor, we currently have the
      following:
      
      vfio_group_get_device_fd()
        mutex_lock(&group->device_lock);
          open()
          ...
          if (ret)
            release()
      
      If we hit that error case, we call the backend driver release path,
      which for vfio-pci looks like this:
      
      vfio_pci_release()
        vfio_pci_disable()
          vfio_pci_try_bus_reset()
            vfio_pci_get_devs()
              vfio_device_get_from_dev()
                vfio_group_get_device()
                  mutex_lock(&group->device_lock);
      
      Whoops, we've stumbled back onto group.device_lock and created a
      deadlock.  There's a low likelihood of ever seeing this play out, but
      obviously it needs to be fixed.  To do that we can use a reference to
      the vfio_device for vfio_group_get_device_fd() rather than holding the
      lock.  There was a loop in this function, theoretically allowing
      multiple devices with the same name, but in practice we don't expect
      such a thing to happen and the code is already aborting from the loop
      with break on any sort of error rather than continuing and only
      parsing the first match anyway, so the loop was effectively unused
      already.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Fixes: 20f30017 ("vfio/pci: Fix racy vfio_device_get_from_dev() call")
      Reported-by: Joerg Roedel <joro@8bytes.org>
      Tested-by: Joerg Roedel <jroedel@suse.de>
      4bc94d5d
  27. 09 Jun, 2015 1 commit
    • vfio/pci: Fix racy vfio_device_get_from_dev() call · 20f30017
      Alex Williamson authored
      Testing the driver for a PCI device is racy; it can be all but
      complete in the release path and still report the driver as ours.
      Therefore we can't trust drvdata to be valid.  This race can sometimes
      be seen when one port of a multifunction device is being unbound from
      the vfio-pci driver while another function is being released by the
      user and attempting a bus reset.  The device in the remove path is
      found as a dependent device for the bus reset of the release path
      device, the driver is still set to vfio-pci, but the drvdata has
      already been cleared, resulting in a null pointer dereference.
      
      To resolve this, fix vfio_device_get_from_dev() to not take the
      dev_get_drvdata() shortcut and instead traverse through the
      iommu_group, vfio_group, vfio_device path to get a reference we
      can trust.  Once we have that reference, we know the device isn't
      in transition and we can test to make sure the driver is still what
      we expect, so that we don't interfere with devices we don't own.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      20f30017
  28. 01 May, 2015 1 commit
    • vfio: Fix runaway interruptible timeout · db7d4d7f
      Alex Williamson authored
      Commit 13060b64 ("vfio: Add and use device request op for vfio
      bus drivers") incorrectly makes use of an interruptible timeout.
      When interrupted, the signal remains pending resulting in subsequent
      timeouts occurring instantly.  This makes the loop spin at a much
      higher rate than intended.
      
      Instead of making this completely non-interruptible, we can change
      this into a sort of interruptible-once behavior and use the "once"
      to log debug information.  The driver API doesn't allow us to abort
      and return an error code.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Fixes: 13060b64
      Cc: stable@vger.kernel.org # v4.0
      db7d4d7f
  29. 17 Mar, 2015 1 commit
    • vfio: Split virqfd into a separate module for vfio bus drivers · 71be3423
      Alex Williamson authored
      An unintended consequence of commit 42ac9bd1 ("vfio: initialize
      the virqfd workqueue in VFIO generic code") is that the vfio module
      is renamed to vfio_core so that it can include both vfio and virqfd.
      That's a user-visible change that may break module-loading scripts
      and it imposes eventfd support as a dependency on the core vfio code,
      which it's really not.  virqfd is intended to be provided as a service
      to vfio bus drivers, so instead of wrapping it into vfio.ko, we can
      make it a stand-alone module toggled by vfio bus drivers.  This has
      the additional benefit of removing initialization and exit from the
      core vfio code.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      71be3423