  1. 17 Jul, 2019 1 commit
  2. 15 Jul, 2019 1 commit
  3. 11 Jul, 2019 1 commit
  4. 02 Jul, 2019 1 commit
  5. 19 Jun, 2019 1 commit
  6. 06 Jun, 2019 3 commits
    • vfio/mdev: Synchronize device create/remove with parent removal · 5715c4dd
      Parav Pandit authored
      In the following sequences, child devices created while the mdev parent
      device is being removed can be left behind, or a race can occur that
      removes half-initialized child mdev devices.
      
      issue-1:
      --------
             cpu-0                         cpu-1
             -----                         -----
                                        mdev_unregister_device()
                                          device_for_each_child()
                                            mdev_device_remove_cb()
                                              mdev_device_remove()
      create_store()
        mdev_device_create()                   [...]
          device_add()
                                        parent_remove_sysfs_files()
      
      /* BUG: device added by cpu-0
       * whose parent is getting removed
       * and it won't process this mdev.
       */
      
      issue-2:
      --------
      The crash below is observed when a user-initiated remove is in progress
      and mdev_unregister_device() completes parent unregistration.
      
             cpu-0                         cpu-1
             -----                         -----
      remove_store()
         mdev_device_remove()
         active = false;
                                        mdev_unregister_device()
                                        parent device removed.
         [...]
         parent->ops->remove()
       /*
        * BUG: Accessing invalid parent.
        */
      
      This race is similar to create() racing with mdev_unregister_device().
      
      BUG: unable to handle kernel paging request at ffffffffc0585668
      PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
      Oops: 0000 [#1] SMP PTI
      CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
      Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
      RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
      Call Trace:
       remove_store+0x71/0x90 [mdev]
       kernfs_fop_write+0x113/0x1a0
       vfs_write+0xad/0x1b0
       ksys_write+0x5a/0xe0
       do_syscall_64+0x5a/0x210
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Therefore, the mdev core is improved as follows to overcome the above
      issues.

      Wait for any ongoing mdev create() and remove() to finish before
      unregistering the parent device.
      This continues to allow multiple creates and removes to progress in
      parallel for different mdev devices, which is the most common case.
      At the same time, guard against parent removal while the parent is
      being accessed by the create() and remove() callbacks.
      create()/remove() and unregister_device() are synchronized by the rwsem.
      
      Refactor the device removal code into mdev_device_remove_common() so
      that the unregistration path can remove child devices without acquiring
      the parent's unreg_sem again.
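      The synchronization described above can be sketched as a userspace model, assuming POSIX rwlocks as a stand-in for the kernel rwsem: create() and remove() take the read side (assumed here to use a trylock, so they fail fast once unregistration has started), unregistration takes the write side and therefore waits for them, and the lock-free *_common() helper is what unregistration uses on each child. All struct and function names below are hypothetical and are not the mdev core's code.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <pthread.h>
      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical model of an mdev parent (not the kernel's struct mdev_parent). */
      struct parent_model {
          pthread_rwlock_t unreg_sem; /* stand-in for the parent's rwsem */
          bool unregistered;          /* models "parent is gone" */
          atomic_int nr_children;     /* live child devices */
      };

      static void parent_model_init(struct parent_model *p)
      {
          pthread_rwlock_init(&p->unreg_sem, NULL);
          p->unregistered = false;
          atomic_init(&p->nr_children, 0);
      }

      /* create(): many creators may hold the read side at the same time. */
      static int model_create(struct parent_model *p)
      {
          if (pthread_rwlock_tryrdlock(&p->unreg_sem) != 0)
              return -ENODEV;             /* unregistration in progress */
          if (p->unregistered) {
              pthread_rwlock_unlock(&p->unreg_sem);
              return -ENODEV;             /* parent already removed */
          }
          atomic_fetch_add(&p->nr_children, 1);
          pthread_rwlock_unlock(&p->unreg_sem);
          return 0;
      }

      /* Removal body shared by both paths; it takes no lock itself. */
      static void model_remove_common(struct parent_model *p)
      {
          atomic_fetch_sub(&p->nr_children, 1);
      }

      /* User-initiated remove(): same read-side protection as create(). */
      static int model_remove(struct parent_model *p)
      {
          if (pthread_rwlock_tryrdlock(&p->unreg_sem) != 0)
              return -ENODEV;
          if (p->unregistered) {
              pthread_rwlock_unlock(&p->unreg_sem);
              return -ENODEV;
          }
          model_remove_common(p);
          pthread_rwlock_unlock(&p->unreg_sem);
          return 0;
      }

      /* Unregistration: the write lock waits for in-flight create()/remove(),
       * then removes every child through the lock-free helper. */
      static void model_unregister(struct parent_model *p)
      {
          pthread_rwlock_wrlock(&p->unreg_sem);
          while (atomic_load(&p->nr_children) > 0)
              model_remove_common(p);
          p->unregistered = true;
          pthread_rwlock_unlock(&p->unreg_sem);
      }
      ```

      For example, two create() calls, one remove() and then unregistration leave no children behind, and any later create() or remove() returns -ENODEV.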
      
      Fixes: 7b96953b ("vfio: Mediated device Core driver")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/mdev: Avoid creating sysfs remove file on stale device removal · 26c9e398
      Parav Pandit authored
      If device removal is initiated by two threads as below, the mdev core
      attempts to create a sysfs remove file on a stale device.
      During this flow, the call trace [1] below is observed.
      
           cpu-0                                    cpu-1
           -----                                    -----
        mdev_unregister_device()
          device_for_each_child
             mdev_device_remove_cb
                mdev_device_remove
                                             user_syscall
                                               remove_store()
                                                 mdev_device_remove()
                                              [..]
         unregister device();
                                             /* not found in list or
                                              * active=false.
                                              */
                                                sysfs_create_file()
                                                ..Call trace
      
      Now that the mdev core follows the correct device removal sequence of
      the Linux bus model, remove shouldn't fail in normal cases. If it does
      fail, there is no point in creating a stale file or checking for a
      specific error status.
      
      kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
      kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
      kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
      kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
      kernel: Call Trace:
      kernel: remove_store+0xdc/0x100 [mdev]
      kernel: kernfs_fop_write+0x113/0x1a0
      kernel: vfs_write+0xad/0x1b0
      kernel: ksys_write+0x5a/0xe0
      kernel: do_syscall_64+0x5a/0x210
      kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
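      The resulting remove_store() behaviour can be modelled as below, assuming the attribute first removes itself (device_remove_file_self() in the kernel) and then calls the removal routine; on failure it simply returns the error instead of re-creating the file on a possibly stale device. The struct and helper names are hypothetical.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdbool.h>

      /* Hypothetical model of an mdev device and its sysfs "remove" file. */
      struct mdev_model {
          bool remove_file_present; /* is the "remove" attribute visible? */
          bool on_bus;              /* still registered? */
      };

      /* Stand-in for mdev_device_remove(): fails if another path (for example
       * parent unregistration) already removed the device. */
      static int model_device_remove(struct mdev_model *m)
      {
          if (!m->on_bus)
              return -ENODEV;
          m->on_bus = false;
          return 0;
      }

      /* remove_store() after the change: never re-create the file on error. */
      static int model_remove_store(struct mdev_model *m)
      {
          if (m->remove_file_present) {
              m->remove_file_present = false; /* models device_remove_file_self() */
              int ret = model_device_remove(m);
              if (ret)
                  return ret;                 /* no sysfs_create_file() here */
          }
          return 0;
      }
      ```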
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/mdev: Improve the create/remove sequence · 522ecce0
      Parav Pandit authored
      This patch addresses the two issues below and prepares the code to
      address the third issue listed below.

      1. The mdev device is placed on the mdev bus before it is created in the
      vendor driver. Once a device is placed on the mdev bus without its
      supporting underlying vendor device having been created, the mdev
      driver's probe() gets triggered. However, there isn't a stable mdev
      available to work on yet.
      
         create_store()
           mdev_create_device()
             device_register()
                ...
               vfio_mdev_probe()
              [...]
              parent->ops->create()
                vfio_ap_mdev_create()
                  mdev_set_drvdata(mdev, matrix_mdev);
                  /* Valid pointer set above */
      
      Because of this initialization order, an mdev driver that wants to use
      the mdev doesn't have a valid mdev to work on.
      
      2. The current creation sequence is:
         parent->ops->create()
         groups_register()

      The remove sequence is:
         parent->ops->remove()
         groups_unregister()
      
      However, the remove sequence should be the exact mirror of the creation
      sequence. Once this is achieved, all users of the mdev are terminated
      before the underlying vendor device is removed, following the standard
      Linux driver model.
      At that point the vendor's remove() op shouldn't fail, because taking
      the device off the bus should terminate any usage.
      
      3. When the remove operation fails, mdev sysfs removal attempts to add
      the file back on an already removed device. The following call trace [1]
      is observed.
      
      [1] call trace:
      kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
      kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
      kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
      kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
      kernel: Call Trace:
      kernel: remove_store+0xdc/0x100 [mdev]
      kernel: kernfs_fop_write+0x113/0x1a0
      kernel: vfs_write+0xad/0x1b0
      kernel: ksys_write+0x5a/0xe0
      kernel: do_syscall_64+0x5a/0x210
      kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Therefore, the mdev core is improved in the following ways.
      
      1. Split the device registration/deregistration sequence so that some
      things can be done between initializing the device and hooking it up to
      the bus, and, respectively, between deregistering it from the bus and
      giving up our final reference.
      In particular, this means invoking the ->create() and ->remove()
      callbacks in those new windows. This gives the vendor driver an
      initialized mdev device to work with during creation.
      At the same time, a bus driver that wishes to bind to the mdev device
      also gets an initialized mdev device.

      This follows the standard Linux kernel bus and device model.
      
      2. During the remove flow, first remove the device from the bus. This
      ensures that any bus-specific devices are removed.
      Once the device is taken off the mdev bus, invoke the vendor driver's
      remove() for the mdev.

      3. The driver core device model provides a way to register and
      automatically unregister the device's sysfs attribute groups via
      dev->groups. Make use of dev->groups to let the core create the groups,
      and eliminate the code for explicit group creation and removal.
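      The reordered lifecycle can be illustrated with a small trace model, assuming the split is device_initialize(), ->create(), device_add() on creation and device_del(), ->remove(), put_device() on removal (device_register() and device_unregister() are the combined forms), so that removal mirrors creation. The trace helpers are hypothetical; only the ordering is the point, and the real mdev core code may differ in detail.

      ```c
      #include <assert.h>
      #include <string.h>

      #define MAX_STEPS 8

      /* Hypothetical record of lifecycle steps, used to check ordering. */
      struct lifecycle_trace {
          const char *steps[MAX_STEPS];
          int n;
      };

      static void step(struct lifecycle_trace *t, const char *name)
      {
          assert(t->n < MAX_STEPS);
          t->steps[t->n++] = name;
      }

      /* Creation: initialize, let the vendor create, then publish on the bus.
       * The bus driver's probe() (and dev->groups creation) happen in device_add(). */
      static void lifecycle_create(struct lifecycle_trace *t)
      {
          step(t, "device_initialize");
          step(t, "parent->ops->create");
          step(t, "device_add");
      }

      /* Removal mirrors creation: leave the bus first (unbinds the driver and
       * removes dev->groups), then the vendor remove, then drop the reference. */
      static void lifecycle_remove(struct lifecycle_trace *t)
      {
          step(t, "device_del");
          step(t, "parent->ops->remove");
          step(t, "put_device");
      }
      ```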
      
      To verify that the new sequence is solid, the stack dump below was taken
      of a process that attempts to remove the device while the device is in
      use by the vfio driver and a user application.
      This stack dump confirms that the vfio driver guards against such device
      removal while the device is in use.
      
       cat /proc/21962/stack
      [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio]
      [<0>] mdev_remove+0x21/0x40 [mdev]
      [<0>] device_release_driver_internal+0xe8/0x1b0
      [<0>] bus_remove_device+0xf9/0x170
      [<0>] device_del+0x168/0x350
      [<0>] mdev_device_remove_common+0x1d/0x50 [mdev]
      [<0>] mdev_device_remove+0x8c/0xd0 [mdev]
      [<0>] remove_store+0x71/0x90 [mdev]
      [<0>] kernfs_fop_write+0x113/0x1a0
      [<0>] vfs_write+0xad/0x1b0
      [<0>] ksys_write+0x5a/0xe0
      [<0>] do_syscall_64+0x5a/0x210
      [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [<0>] 0xffffffffffffffff
      
      This prepares the code to eliminate calling device_create_file() in a
      subsequent patch.
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  7. 30 May, 2019 2 commits
  8. 21 May, 2019 1 commit
  9. 14 May, 2019 2 commits
    • mm/gup: change GUP fast to use flags rather than a write 'bool' · 73b0140b
      Ira Weiny authored
      To facilitate additional options to get_user_pages_fast(), change the
      singular write parameter to be gup_flags.
      
      This patch does not change any functionality.  New functionality will
      follow in subsequent patches.
      
      Some of the get_user_pages_fast() call sites were unchanged because they
      already passed FOLL_WRITE or 0 for the write parameter.
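      A minimal sketch of the mechanical conversion: callers that used to pass an int write flag now pass a bitmask, so a boolean becomes write ? FOLL_WRITE : 0. The helper names are hypothetical; FOLL_WRITE is defined locally with the value 0x01 it has in include/linux/mm.h, so the snippet compiles outside the kernel.

      ```c
      #include <assert.h>

      /* Local copy of the kernel flag so this compiles in userspace. */
      #define FOLL_WRITE 0x01

      /* Hypothetical helper: old "int write" argument -> new gup_flags. */
      static unsigned int write_to_gup_flags(int write)
      {
          return write ? FOLL_WRITE : 0;
      }

      /* Callees now test a bit instead of a bool. */
      static int gup_wants_write(unsigned int gup_flags)
      {
          return (gup_flags & FOLL_WRITE) != 0;
      }
      ```

      This also shows why call sites that already passed FOLL_WRITE or 0 needed no change: both values mean the same thing whether read as a bool or as a flag mask.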
      
      NOTE: It was suggested to change the ordering of the get_user_pages_fast()
      arguments to ensure that callers were converted.  This breaks the current
      GUP call site convention of having the returned pages be the final
      parameter.  So the suggestion was rejected.
      
      Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Mike Marshall <hubcap@omnibond.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM · 932f4a63
      Ira Weiny authored
      Patch series "Add FOLL_LONGTERM to GUP fast and use it".
      
      HFI1, qib, and mthca use get_user_pages_fast() due to its performance
      advantages.  These pages can be held for a significant time.  But
      get_user_pages_fast() does not protect against mapping FS DAX pages.
      
      Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
      retains the performance while also adding the FS DAX checks.  XDP has also
      shown interest in using this functionality.[1]
      
      In addition, we change get_user_pages() to use the new FOLL_LONGTERM flag
      and remove the specialized get_user_pages_longterm call.
      
      [1] https://lkml.org/lkml/2019/3/19/939
      
      "longterm" is a relative thing and at this point is probably a misnomer.
      This is really flagging a pin which is going to be given to hardware and
      can't move.  I've thought of a couple of alternative names but I think we
      have to settle on whether we are going to use FL_LAYOUT or something else to
      solve the "longterm" problem.  Then I think we can change the flag to a
      better name.
      
      Secondly, it depends on how often you are registering memory.  I have
      spoken with some RDMA users who consider MR in the performance path...
      For the overall application performance.  I don't have the numbers as the
      tests for HFI1 were done a long time ago.  But there was a significant
      advantage.  Some of which is probably due to the fact that you don't have
      to hold mmap_sem.
      
      Finally, architecturally I think it would be good for everyone to use
      *_fast.  There are patches submitted to the RDMA list which would allow
      the use of *_fast (they rework the use of mmap_sem) and as soon as they
      are accepted I'll submit a patch to convert the RDMA core as well.  Also
      to this point others are looking to use *_fast.
      
      As an aside, Jason pointed out in my previous submission that *_fast and
      *_unlocked look very much the same.  I agree and I think further cleanup
      will be coming.  But I'm focused on getting the final solution for DAX at
      the moment.
      
      This patch (of 7):
      
      This patch starts a series which aims to support FOLL_LONGTERM in
      get_user_pages_fast().  Some callers would like to do a longterm (user
      controlled) pin of pages with the fast variant of GUP for performance
      purposes.
      
      Rather than have a separate get_user_pages_longterm() call, introduce
      FOLL_LONGTERM and change the longterm callers to use it.
      
      This patch does not change any functionality.  In the short term,
      "longterm" or user controlled pins are unsafe for filesystems, and FS DAX
      in particular has been blocked.  However, callers of get_user_pages_fast()
      were not "protected".
      
      FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
      requires vmas to determine if DAX is in use.
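      The reason FOLL_LONGTERM needs vmas can be sketched as follows: if the flag is set and any covering VMA is FS DAX, the pin is refused. This is a userspace model; the vma struct, the local flag values and the -EOPNOTSUPP return are assumptions for illustration. In caller terms, a call like get_user_pages_longterm(start, nr, gup_flags, pages, vmas) roughly becomes get_user_pages(start, nr, gup_flags | FOLL_LONGTERM, pages, vmas).

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Local placeholder flag values so this compiles in userspace. */
      #define FOLL_WRITE    0x01
      #define FOLL_LONGTERM 0x10000

      /* Hypothetical stand-in for a VMA: only the property the check needs. */
      struct vma_model {
          bool is_fsdax;
      };

      /* Model of a longterm-aware GUP: returns the number of pages "pinned",
       * or -EOPNOTSUPP if a longterm pin would cover FS DAX memory. */
      static long model_gup(const struct vma_model *vmas, size_t nr,
                            unsigned int gup_flags)
      {
          if (gup_flags & FOLL_LONGTERM) {
              for (size_t i = 0; i < nr; i++)
                  if (vmas[i].is_fsdax)
                      return -EOPNOTSUPP;
          }
          return (long)nr;
      }
      ```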
      
      NOTE: In merging with the CMA changes we opt to change the
      get_user_pages() call in check_and_migrate_cma_pages() to a call of
      __get_user_pages_locked() on the newly migrated pages.  This makes the
      code read better in that we are calling __get_user_pages_locked() on the
      pages before and after a potential migration.
      
      As a side effect, some of the interfaces are cleaned up, but this is not the
      primary purpose of the series.
      
      In review[1] it was asked:
      
      <quote>
      > This I don't get - if you do lock down long term mappings performance
      > of the actual get_user_pages call shouldn't matter to start with.
      >
      > What do I miss?
      
      A couple of points.
      
      First "longterm" is a relative thing and at this point is probably a
      misnomer.  This is really flagging a pin which is going to be given to
      hardware and can't move.  I've thought of a couple of alternative names
      but I think we have to settle on whether we are going to use FL_LAYOUT or
      something else to solve the "longterm" problem.  Then I think we can
      change the flag to a better name.
      
      Second, it depends on how often you are registering memory.  I have spoken
      with some RDMA users who consider MR in the performance path...  For the
      overall application performance.  I don't have the numbers as the tests
      for HFI1 were done a long time ago.  But there was a significant
      advantage.  Some of which is probably due to the fact that you don't have
      to hold mmap_sem.
      
      Finally, architecturally I think it would be good for everyone to use
      *_fast.  There are patches submitted to the RDMA list which would allow
      the use of *_fast (they rework the use of mmap_sem) and as soon as they
      are accepted I'll submit a patch to convert the RDMA core as well.  Also
      to this point others are looking to use *_fast.
      
      As an aside, Jason pointed out in my previous submission that *_fast and
      *_unlocked look very much the same.  I agree and I think further cleanup
      will be coming.  But I'm focused on getting the final solution for DAX at
      the moment.
      
      </quote>
      
      [1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965
      
      [ira.weiny@intel.com: v3]
        Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 07 May, 2019 6 commits
  11. 01 May, 2019 1 commit
  12. 23 Apr, 2019 1 commit
    • vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING" · 41be3e26
      Farhan Ali authored
      vfio_dev_present(), which is the condition passed to
      wait_event_interruptible_timeout(), calls vfio_group_get_device()
      and tries to acquire the mutex group->device_lock.
      
      wait_event_interruptible_timeout() will set the state of the current
      task to TASK_INTERRUPTIBLE, before doing the condition check. This
      means that we will try to acquire the mutex while already in a
      sleeping state. The scheduler reports this with the following
      warning:
      
      [ 4050.264464] ------------[ cut here ]------------
      [ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188
      [ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 __might_sleep+0x76/0x90
      ....
      
      [ 4050.264756] Call Trace:
      [ 4050.264765] ([<000000000017bbaa>] __might_sleep+0x72/0x90)
      [ 4050.264774]  [<0000000000b97edc>] __mutex_lock+0x44/0x8c0
      [ 4050.264782]  [<0000000000b9878a>] mutex_lock_nested+0x32/0x40
      [ 4050.264793]  [<000003ff800d7abe>] vfio_group_get_device+0x36/0xa8 [vfio]
      [ 4050.264803]  [<000003ff800d87c0>] vfio_del_group_dev+0x238/0x378 [vfio]
      [ 4050.264813]  [<000003ff8015f67c>] mdev_remove+0x3c/0x68 [mdev]
      [ 4050.264825]  [<00000000008e01b0>] device_release_driver_internal+0x168/0x268
      [ 4050.264834]  [<00000000008de692>] bus_remove_device+0x162/0x190
      [ 4050.264843]  [<00000000008daf42>] device_del+0x1e2/0x368
      [ 4050.264851]  [<00000000008db12c>] device_unregister+0x64/0x88
      [ 4050.264862]  [<000003ff8015ed84>] mdev_device_remove+0xec/0x130 [mdev]
      [ 4050.264872]  [<000003ff8015f074>] remove_store+0x6c/0xa8 [mdev]
      [ 4050.264881]  [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8
      [ 4050.264890]  [<00000000003c1530>] __vfs_write+0x38/0x1a8
      [ 4050.264899]  [<00000000003c187c>] vfs_write+0xb4/0x198
      [ 4050.264908]  [<00000000003c1af2>] ksys_write+0x5a/0xb0
      [ 4050.264916]  [<0000000000b9e270>] system_call+0xdc/0x2d8
      [ 4050.264925] 4 locks held by sh/35924:
      [ 4050.264933]  #0: 000000001ef90325 (sb_writers#4){.+.+}, at: vfs_write+0x9e/0x198
      [ 4050.264948]  #1: 000000005c1ab0b3 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cc/0x1f8
      [ 4050.264963]  #2: 0000000034831ab8 (kn->count#297){++++}, at: kernfs_remove_self+0x12e/0x150
      [ 4050.264979]  #3: 00000000e152484f (&dev->mutex){....}, at: device_release_driver_internal+0x5c/0x268
      [ 4050.264993] Last Breaking-Event-Address:
      [ 4050.265002]  [<000000000017bbaa>] __might_sleep+0x72/0x90
      [ 4050.265010] irq event stamp: 7039
      [ 4050.265020] hardirqs last  enabled at (7047): [<00000000001cee7a>] console_unlock+0x6d2/0x740
      [ 4050.265029] hardirqs last disabled at (7054): [<00000000001ce87e>] console_unlock+0xd6/0x740
      [ 4050.265040] softirqs last  enabled at (6416): [<0000000000b8fe26>] __udelay+0xb6/0x100
      [ 4050.265049] softirqs last disabled at (6415): [<0000000000b8fe06>] __udelay+0x96/0x100
      [ 4050.265057] ---[ end trace d04a07d39d99a9f9 ]---
      
      Let's fix this as described in the article
      https://lwn.net/Articles/628628/.
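      The pattern from the LWN article (in the kernel: DEFINE_WAIT_FUNC() with woken_wake_function, add_wait_queue(), then wait_woken() in a loop) evaluates the possibly-blocking condition while the task is still running, and only then changes state and sleeps. The userspace model below simulates task states and the might_sleep() check to contrast the two orderings; every name in it is hypothetical, and timeouts and real wakeups are not modelled.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Simulated task states (userspace model only). */
      enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE };

      struct model_task {
          enum model_state state;
          int might_sleep_warnings; /* "blocking op while !TASK_RUNNING" */
      };

      /* Models a wait condition that takes a mutex, like vfio_group_get_device(). */
      static bool device_still_present(struct model_task *t, int *refs)
      {
          if (t->state != MODEL_RUNNING)
              t->might_sleep_warnings++; /* what __might_sleep() would flag */
          return *refs > 0;
      }

      /* Models sleeping until the user drops a reference and wakes us. */
      static void model_sleep(struct model_task *t, int *refs)
      {
          (*refs)--;
          t->state = MODEL_RUNNING;
      }

      /* Old ordering: state is set before the blocking condition runs. */
      static void wait_old(struct model_task *t, int *refs)
      {
          for (;;) {
              t->state = MODEL_INTERRUPTIBLE; /* prepare_to_wait_event() */
              if (!device_still_present(t, refs))
                  break;
              model_sleep(t, refs);
          }
          t->state = MODEL_RUNNING;
      }

      /* wait_woken()-style ordering: check while running, then sleep. */
      static void wait_woken_style(struct model_task *t, int *refs)
      {
          while (device_still_present(t, refs)) {
              t->state = MODEL_INTERRUPTIBLE; /* done inside wait_woken() */
              model_sleep(t, refs);
          }
      }
      ```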
      Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
      [remove now redundant vfio_dev_present()]
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  13. 22 Apr, 2019 1 commit
  14. 19 Apr, 2019 1 commit
  15. 12 Apr, 2019 3 commits
  16. 03 Apr, 2019 3 commits
    • vfio/type1: Limit DMA mappings per container · 49285593
      Alex Williamson authored
      Memory backed DMA mappings are accounted against a user's locked
      memory limit, including multiple mappings of the same memory.  This
      accounting bounds the number of such mappings that a user can create.
      However, DMA mappings that are not backed by memory, such as DMA
      mappings of device MMIO via mmaps, do not make use of page pinning
      and therefore do not count against the user's locked memory limit.
      These mappings still consume memory, but the memory is not well
      associated with the process for the purpose of OOM-killing a task.

      To bound this use case, we introduce a limit on the total
      number of concurrent DMA mappings that a user is allowed to create.
      This limit is exposed as a tunable module option where the default
      value of 64K is expected to be well in excess of any reasonable use
      case (a large virtual machine configuration would typically only make
      use of tens of concurrent mappings).
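      A per-container budget of this kind can be sketched as a counter that is decremented on each successful map and incremented on unmap. This is a model, not the vfio_iommu_type1 code (where, as far as we know, the tunable is the dma_entry_limit module parameter); the struct and function names are hypothetical and the -ENOSPC return is an assumption. A limit of 2 is used only to make the behaviour easy to see.

      ```c
      #include <assert.h>
      #include <errno.h>

      /* Hypothetical model of a container's remaining mapping budget. */
      struct dma_container_model {
          unsigned int dma_avail; /* mappings still allowed */
      };

      static void dma_container_init(struct dma_container_model *c, unsigned int limit)
      {
          c->dma_avail = limit;   /* e.g. 64K by default, per the text above */
      }

      /* Each map consumes one slot, whether or not it is backed by pinned memory. */
      static int model_dma_map(struct dma_container_model *c)
      {
          if (c->dma_avail == 0)
              return -ENOSPC;     /* assumed error code */
          c->dma_avail--;
          return 0;
      }

      /* Unmapping returns the slot to the budget. */
      static void model_dma_unmap(struct dma_container_model *c)
      {
          c->dma_avail++;
      }
      ```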
      
      This fixes CVE-2019-3882.
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Eric Auger <eric.auger@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/spapr_tce: Make symbol 'tce_iommu_driver_ops' static · e39dd513
      Wang Hai authored
      Fixes the following sparse warning:
      
      drivers/vfio/vfio_iommu_spapr_tce.c:1401:36: warning:
       symbol 'tce_iommu_driver_ops' was not declared. Should it be static?
      
      Fixes: 5ffd229c ("powerpc/vfio: Implement IOMMU driver for VFIO")
      Signed-off-by: Wang Hai <wanghai26@huawei.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: use correct format characters · 426b046b
      Louis Taylor authored
      When compiling with -Wformat, clang emits the following warnings:
      
      drivers/vfio/pci/vfio_pci.c:1601:5: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                      ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1601:13: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                              ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1601:21: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                      ^~~~~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1601:32: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                                 ^~~~~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:5: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                      ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:13: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                              ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:21: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                      ^~~~~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:32: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                                 ^~~~~~~~~

      The types of these arguments are unconditionally defined, so this patch
      updates the format characters to the correct ones for unsigned ints.
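      The corrected pattern is simply %x for unsigned int arguments, since %hx expects an unsigned short. The format string below is illustrative and not copied from vfio_pci.c, and format_ids() is a hypothetical helper.

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <string.h>

      /* Hypothetical helper: format a PCI ID quadruple held in unsigned ints.
       * %04x matches unsigned int; %04hx would trigger clang's -Wformat. */
      static int format_ids(char *buf, size_t len, unsigned int vendor,
                            unsigned int device, unsigned int subvendor,
                            unsigned int subdevice)
      {
          return snprintf(buf, len, "[%04x:%04x[%04x:%04x]]",
                          vendor, device, subvendor, subdevice);
      }
      ```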
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/378
      Signed-off-by: Louis Taylor <louis@kragniz.eu>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  17. 20 Mar, 2019 1 commit
  18. 18 Feb, 2019 2 commits
    • vfio_pci: Enable memory accesses before calling pci_map_rom · 0cfd027b
      Eric Auger authored
      pci_map_rom()/pci_get_rom_size() perform memory accesses to the ROM.
      If Memory Space accesses are disabled, readw() is likely to trigger a
      synchronous external abort on some platforms.

      If memory accesses are disabled, re-enable them before the call and
      disable them again immediately afterwards.
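      The enable/restore pattern can be modelled on a simulated command register. PCI_COMMAND_MEMORY (value 0x2) is the Memory Space enable bit of the PCI command register; the device struct and the ROM helper are hypothetical stand-ins for pci_map_rom()/pci_get_rom_size(). In the kernel the register would be read and written with pci_read_config_word()/pci_write_config_word() on PCI_COMMAND.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      #define PCI_COMMAND_MEMORY 0x2 /* Memory Space enable bit */

      /* Hypothetical device model: just the command register. */
      struct pci_model {
          uint16_t command;
          int rom_reads_faulted; /* ROM reads attempted with decode disabled */
      };

      /* Stand-in for a ROM access; real hardware may abort if decode is off. */
      static size_t read_rom_size(struct pci_model *d)
      {
          if (!(d->command & PCI_COMMAND_MEMORY)) {
              d->rom_reads_faulted++;
              return 0;
          }
          return 0x10000; /* pretend 64 KiB ROM */
      }

      /* Enable memory decode around the ROM access, then restore it. */
      static size_t rom_size_with_mem_enabled(struct pci_model *d)
      {
          uint16_t orig = d->command;
          size_t size;

          if (!(orig & PCI_COMMAND_MEMORY))
              d->command = orig | PCI_COMMAND_MEMORY;
          size = read_rom_size(d);
          if (!(orig & PCI_COMMAND_MEMORY))
              d->command = orig;
          return size;
      }
      ```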
      
      Fixes: 89e1f7d4 ("vfio: Add PCI device driver")
      Signed-off-by: Eric Auger <eric.auger@redhat.com>
      Suggested-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: Restore device state on PM transition · 51ef3a00
      Alex Williamson authored
      PCI core handles save and restore of device state around reset, but
      when using pci_set_power_state() we can unintentionally trigger a soft
      reset of the device, where PCI core only restores the BAR state.  If
      we're using vfio-pci's idle D3 support to try to put devices into low
      power when unused, this might trigger a reset when the device is woken
      for use.  Also power state management by the user, or within a guest,
      can put the device into D3 power state with potentially limited
      ability to restore the device if it should undergo a reset.  The PCI
      spec does not define the extent of a soft reset and many devices
      reporting soft reset on D3->D0 transition do not undergo a PCI config
      space reset.  It's therefore assumed safe to unconditionally restore
      the remainder of the state if the device indicates soft reset
      support, even on a user initiated wakeup.
      
      Implement a wrapper in vfio-pci to tag devices reporting PM reset
      support, save their state on transitions into D3 and restore on
      transitions back to D0.
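      A minimal model of such a wrapper, assuming a device is tagged at probe time if it reports soft reset on D3->D0: save state when entering D3 and restore it when returning to D0. Config space is reduced to a single integer, and all names here are hypothetical rather than vfio-pci's.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      enum pm_state { D0 = 0, D3HOT = 3 };

      /* Hypothetical device model. */
      struct pdev_model {
          enum pm_state state;
          bool soft_reset_on_d3; /* device reports soft reset on D3->D0 */
          unsigned int config;   /* stands in for config space state */
          unsigned int saved;
          bool needs_pm_restore; /* tag set at probe time */
      };

      static void pm_probe(struct pdev_model *p)
      {
          p->needs_pm_restore = p->soft_reset_on_d3;
      }

      /* Stand-in for pci_set_power_state(): D3->D0 may wipe config state. */
      static void raw_set_power_state(struct pdev_model *p, enum pm_state s)
      {
          if (p->state == D3HOT && s == D0 && p->soft_reset_on_d3)
              p->config = 0;
          p->state = s;
      }

      /* Wrapper: save on entry to D3, restore on the way back to D0. */
      static void wrapped_set_power_state(struct pdev_model *p, enum pm_state s)
      {
          bool restore = false;

          if (p->needs_pm_restore) {
              if (p->state < D3HOT && s >= D3HOT)
                  p->saved = p->config;    /* models pci_save_state() */
              else if (p->state >= D3HOT && s < D3HOT)
                  restore = true;
          }
          raw_set_power_state(p, s);
          if (restore)
              p->config = p->saved;        /* models pci_restore_state() */
      }
      ```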
      Reported-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  19. 13 Feb, 2019 1 commit
    • vfio/spapr_tce: Skip unsetting already unset table · a3906855
      Alexey Kardashevskiy authored
      VFIO TCE IOMMU v2 owns IOMMU tables. When we detach an IOMMU group from
      a container, we need to unset these tables from the group which we do by
      calling unset_window(). We also unset tables when removing a DMA window
      via the VFIO_IOMMU_SPAPR_TCE_REMOVE ioctl.
      
      The window removal checks if the table actually exists (hidden inside
      tce_iommu_find_table()), but the group detaching does not, so the user
      may see duplicate messages:
      pci 0009:03     : [PE# fd] Removing DMA window #0
      pci 0009:03     : [PE# fd] Removing DMA window #1
      pci 0009:03     : [PE# fd] Removing DMA window #0
      pci 0009:03     : [PE# fd] Removing DMA window #1
      
      At the moment this is not a problem as the second invocation
      of unset_window() writes zeroes to the HW registers again and exits early
      as there is no table.
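      The fix amounts to applying the same existence check on the detach path. In this model, a window's table is represented by a flag and unset_window() counts its invocations; the names and the two-table limit are illustrative assumptions, not the spapr_tce code.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      #define MAX_TABLES 2 /* illustrative per-group limit */

      /* Hypothetical container model. */
      struct tce_container_model {
          bool tables[MAX_TABLES]; /* does window i still have a table? */
          int unset_calls;         /* how often unset_window() ran */
      };

      static void unset_window(struct tce_container_model *c, int i)
      {
          c->unset_calls++;
          c->tables[i] = false;
      }

      /* ioctl removal path: already checks that the table exists. */
      static void remove_window(struct tce_container_model *c, int i)
      {
          if (c->tables[i])
              unset_window(c, i);
      }

      /* Group detach after the fix: skip windows already unset. */
      static void detach_group(struct tce_container_model *c)
      {
          for (int i = 0; i < MAX_TABLES; i++)
              if (c->tables[i])
                  unset_window(c, i);
      }
      ```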
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  20. 12 Feb, 2019 1 commit
  21. 05 Feb, 2019 3 commits
  22. 23 Jan, 2019 1 commit
  23. 22 Jan, 2019 1 commit
    • vfio/pci: Cleanup license mess · 33e5ee78
      Thomas Gleixner authored
      The recently added nvlink2 VFIO driver introduced a license conflict in two
      files. In both cases the SPDX license identifier is:
      
        SPDX-License-Identifier: GPL-2.0+
      
      but the files also contain the following license boilerplate text:
      
        * This program is free software; you can redistribute it and/or modify
        * it under the terms of the GNU General Public License version 2 as
        * published by the Free Software Foundation
      
      The latter is GPL-2.0-only and not GPL-2.0+.
      
      Looking deeper, the nvlink source file is derived from vfio_pci_igd.c,
      which is also licensed under GPL-2.0-only, and it can be assumed that the
      file was copied and modified. As the original file is licensed
      GPL-2.0-only, it's not possible to relicense derivative work to
      GPL-2.0-or-later.
      
      Fix the SPDX identifier and remove the boilerplate as it is redundant.
      
      Fixes: 7f928917 ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: kvm@vger.kernel.org
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  24. 08 Jan, 2019 1 commit