  1. 13 May, 2019 1 commit
    • drm/i915: Seal races between async GPU cancellation, retirement and signaling · c36beba6
      Chris Wilson authored
      Currently there is an underlying assumption that i915_request_unsubmit()
      is synchronous wrt the GPU -- that is, the request is no longer in flight
      as we remove it. In the near future that may change, and this may upset
      our signaling, as we can process an interrupt for that request while it
      is no longer in flight.
      
      CPU0					CPU1
      intel_engine_breadcrumbs_irq
      (queue request completion)
      					i915_request_cancel_signaling
      ...					...
      					i915_request_enable_signaling
      dma_fence_signal
      
      Hence in the time it took us to drop the lock to signal the request, a
      preemption event may have occurred and re-queued the request. In the
      process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
      so reused the rq->signal_link that was in use on CPU0, leading to bad
      pointer chasing in intel_engine_breadcrumbs_irq.
      
      A related issue was that if someone started listening for a signal on a
      completed but no longer in-flight request, we missed the opportunity to
      immediately signal that request.
      
      Furthermore, as intel_contexts may be released immediately during
      request retirement, to be entirely sure that
      intel_engine_breadcrumbs_irq can no longer dereference the intel_context
      (ce->signals and ce->signal_link), we must wait on the irq spinlock.
      
      In order to prevent the race, we use a bit in the fence.flags to signal
      the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
      For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
      quickly signals to any outside observer that the fence is indeed signaled.
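
      In outline (a simplified sketch rather than the exact code; the list
      and lock names are illustrative), the irq handler claims each
      completed request before moving it onto a local list, so a racing
      re-enable can no longer reuse rq->signal_link:

        spin_lock(&b->irq_lock);
        list_for_each_entry_safe(rq, rn, &ce->signals, signal_link) {
            if (!i915_request_completed(rq))
                break;

            /*
             * Claim the fence: once DMA_FENCE_FLAG_SIGNALED_BIT is set,
             * a racing i915_request_enable_signaling() backs off, so
             * rq->signal_link is ours to move onto the local list and
             * signal outside the lock.
             */
            if (!test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
                                  &rq->fence.flags))
                list_move_tail(&rq->signal_link, &signal);
        }
        spin_unlock(&b->irq_lock);

        /* now run the dma_fence callbacks for everything on &signal */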
      
      v2: Sketch out potential dma-fence API for manual signaling
      v3: And the test_and_set_bit()
      
      Fixes: 52c0fdb2 ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190508112452.18942-1-chris@chris-wilson.co.uk
      (cherry picked from commit 0152b3b3)
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
  2. 08 May, 2019 1 commit
    • drm/i915: Seal races between async GPU cancellation, retirement and signaling · 0152b3b3
      Chris Wilson authored
      Currently there is an underlying assumption that i915_request_unsubmit()
      is synchronous wrt the GPU -- that is, the request is no longer in flight
      as we remove it. In the near future that may change, and this may upset
      our signaling, as we can process an interrupt for that request while it
      is no longer in flight.
      
      CPU0					CPU1
      intel_engine_breadcrumbs_irq
      (queue request completion)
      					i915_request_cancel_signaling
      ...					...
      					i915_request_enable_signaling
      dma_fence_signal
      
      Hence in the time it took us to drop the lock to signal the request, a
      preemption event may have occurred and re-queued the request. In the
      process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
      so reused the rq->signal_link that was in use on CPU0, leading to bad
      pointer chasing in intel_engine_breadcrumbs_irq.
      
      A related issue was that if someone started listening for a signal on a
      completed but no longer in-flight request, we missed the opportunity to
      immediately signal that request.
      
      Furthermore, as intel_contexts may be released immediately during
      request retirement, to be entirely sure that
      intel_engine_breadcrumbs_irq can no longer dereference the intel_context
      (ce->signals and ce->signal_link), we must wait on the irq spinlock.
      
      In order to prevent the race, we use a bit in the fence.flags to signal
      the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
      For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
      quickly signals to any outside observer that the fence is indeed signaled.
      
      v2: Sketch out potential dma-fence API for manual signaling
      v3: And the test_and_set_bit()
      
      Fixes: 52c0fdb2 ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190508112452.18942-1-chris@chris-wilson.co.uk
  3. 02 Jul, 2018 2 commits
    • dma-fence: Allow wait_any_timeout for all fences · 796422f2
      Daniel Vetter authored
      When this was introduced in
      
      commit a519435a
      Author: Christian König <christian.koenig@amd.com>
      Date:   Tue Oct 20 16:34:16 2015 +0200
      
          dma-buf/fence: add fence_wait_any_timeout function v2
      
      there was a restriction added that this only works if the dma-fence
      uses the dma_fence_default_wait hook. That works for amdgpu, which is
      the only caller -- well, until you share some buffers with e.g. i915,
      and then you get an -EINVAL.
      
      But there's really no reason for this restriction, because all drivers
      must support callbacks. The special ->wait hook is only an optimization:
      if the driver would need to create a worker thread for an active
      callback, it can avoid doing so when it knows a process context is
      already available. In other words, ->wait is just an optimization, and
      the logic in dma_fence_default_wait() should work for all drivers.
      
      Let's remove this restriction.
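
      With the restriction gone, waiting on a mix of fences from different
      drivers just works, e.g. (illustrative snippet; the two fence
      pointers stand for fences exported by different drivers):

        struct dma_fence *fences[] = { amdgpu_fence, i915_fence };
        uint32_t first;
        signed long ret;

        /* Waits until the first of the fences signals, using the
         * callback machinery internally, so a custom ->wait hook on the
         * exporter side is no longer required. */
        ret = dma_fence_wait_any_timeout(fences, ARRAY_SIZE(fences),
                                         true /* intr */,
                                         MAX_SCHEDULE_TIMEOUT, &first);
        if (ret < 0)
            return ret;         /* interrupted */
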
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: linux-media@vger.kernel.org
      Cc: linaro-mm-sig@lists.linaro.org
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180503142603.28513-4-daniel.vetter@ffwll.ch
    • dma-fence: Make ->enable_signaling optional · c701317a
      Daniel Vetter authored
      Many drivers have a trivial implementation for ->enable_signaling.
      Let's make it optional by assuming that signalling is already
      available when the callback isn't present.
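
      With this, a trivial fence implementation only needs the two name
      callbacks, e.g. (a sketch; the my_* names are placeholders):

        static const char *my_get_driver_name(struct dma_fence *f)
        {
            return "my-driver";
        }

        static const char *my_get_timeline_name(struct dma_fence *f)
        {
            return "my-timeline";
        }

        static const struct dma_fence_ops my_fence_ops = {
            .get_driver_name   = my_get_driver_name,
            .get_timeline_name = my_get_timeline_name,
            /* no .enable_signaling: the core now assumes signaling is
             * already available for this fence */
            .wait              = dma_fence_default_wait,
        };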
      
      v2: Don't do the trick of setting the ENABLE_SIGNAL_BIT
      unconditionally; it results in an expensive spinlock take for
      everyone. Instead just check whether the callback is present.
      Suggested by Maarten.
      
      Also move misplaced kerneldoc hunk to the right patch.
      
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: linux-media@vger.kernel.org
      Cc: linaro-mm-sig@lists.linaro.org
      Link: https://patchwork.freedesktop.org/patch/msgid/20180504141034.27727-1-daniel.vetter@ffwll.ch
  4. 14 Jul, 2017 1 commit
    • dma-buf/fence: Avoid use of uninitialised timestamp · 76250f2b
      Chris Wilson authored
      [  236.821534] WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff8802538683d0)
      [  236.828642] 420000001e7f0000000000000000000000080000000000000000000000000000
      [  236.839543]  i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
      [  236.850420]                                  ^
      [  236.854123] RIP: 0010:[<ffffffff81396f07>]  [<ffffffff81396f07>] fence_signal+0x17/0xd0
      [  236.861313] RSP: 0018:ffff88024acd7ba0  EFLAGS: 00010282
      [  236.865027] RAX: ffffffff812f6a90 RBX: ffff8802527ca800 RCX: ffff880252cb30e0
      [  236.868801] RDX: ffff88024ac5d918 RSI: ffff880252f780e0 RDI: ffff880253868380
      [  236.872579] RBP: ffff88024acd7bc0 R08: ffff88024acd7be0 R09: 0000000000000000
      [  236.876407] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880253868380
      [  236.880185] R13: ffff8802538684d0 R14: ffff880253868380 R15: ffff88024cd48e00
      [  236.883983] FS:  00007f1646d1a740(0000) GS:ffff88025d000000(0000) knlGS:0000000000000000
      [  236.890959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  236.894702] CR2: ffff880251360318 CR3: 000000024ad21000 CR4: 00000000001406f0
      [  236.898481]  [<ffffffff8130d1ad>] i915_gem_request_retire+0x1cd/0x230
      [  236.902439]  [<ffffffff8130e2b3>] i915_gem_request_alloc+0xa3/0x2f0
      [  236.906435]  [<ffffffff812fb1bd>] i915_gem_do_execbuffer.isra.41+0xb6d/0x18b0
      [  236.910434]  [<ffffffff812fc265>] i915_gem_execbuffer2+0x95/0x1e0
      [  236.914390]  [<ffffffff812ad625>] drm_ioctl+0x1e5/0x460
      [  236.918275]  [<ffffffff8110d4cf>] do_vfs_ioctl+0x8f/0x5c0
      [  236.922168]  [<ffffffff8110da3c>] SyS_ioctl+0x3c/0x70
      [  236.926090]  [<ffffffff814b7a5f>] entry_SYSCALL_64_fastpath+0x17/0x93
      [  236.930045]  [<ffffffffffffffff>] 0xffffffffffffffff
      
      We only set the timestamp before we mark the fence as signaled; doing
      it first avoids a window in which observers could see the fence as
      complete but with no timestamp. However, this means the timestamp may
      be written twice, and may even be corrupted if the u64 write is not
      atomic. Instead, use a new bit to record the presence of the timestamp,
      and teach readers to wait until it is set if the fence is already
      complete. There still remains a race where the timestamp for the
      signaled fence may be shown before the fence is reported as signaled,
      but that's a pre-existing error.
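
      The scheme looks roughly like this (a sketch rather than the literal
      patch, assuming the new bit is a DMA_FENCE_FLAG_TIMESTAMP_BIT in
      fence->flags):

        static void publish_timestamp(struct dma_fence *fence)
        {
            fence->timestamp = ktime_get();
            /* readers only look at ->timestamp once this bit is set */
            set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
        }

        static ktime_t read_timestamp(struct dma_fence *fence)
        {
            /* fence is known to be signaled; wait for the signaler to
             * finish publishing the timestamp */
            while (!test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags))
                cpu_relax();
            return fence->timestamp;
        }
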
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Reported-by: Rafael Antognolli <rafael.antognolli@intel.com>
      Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170214124001.1930-1-chris@chris-wilson.co.uk
  5. 08 Jul, 2014 2 commits
    • reservation: add support for read-only access using rcu · 3c3b177a
      Maarten Lankhorst authored
      This adds some extra functions to deal with rcu.
      
      reservation_object_get_fences_rcu() will obtain the list of shared
      and exclusive fences without obtaining the ww_mutex.
      
      reservation_object_wait_timeout_rcu() will wait on all fences of the
      reservation_object, without obtaining the ww_mutex.
      
      reservation_object_test_signaled_rcu() will test if all fences of the
      reservation_object are signaled without using the ww_mutex.
      
      reservation_object_get_excl and reservation_object_get_list require
      the reservation object to be held; updating requires
      write_seqcount_begin/end. If only the exclusive fence is needed,
      rcu_dereference followed by fence_get_rcu can be used; if the shared
      fences are needed, it's recommended to use the supplied functions.
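
      For instance, a driver can now wait on everything attached to a
      buffer without holding the lock (illustrative; bo->resv stands for
      any struct reservation_object):

        /* wait for shared + exclusive fences, interruptibly, without
         * taking the ww_mutex */
        long ret = reservation_object_wait_timeout_rcu(bo->resv,
                                                       true /* wait_all */,
                                                       true /* intr */,
                                                       msecs_to_jiffies(1000));
        if (ret < 0)
            return ret;         /* interrupted */
        if (ret == 0)
            return -EBUSY;      /* timed out */
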
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
      Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
      Acked-by: Daniel Vetter <daniel@ffwll.ch>
      Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fence: dma-buf cross-device synchronization (v18) · e941759c
      Maarten Lankhorst authored
      A fence can be attached to a buffer which is being filled or consumed
      by hw, to allow userspace to pass the buffer to another device without
      waiting.  For example, userspace can call the page_flip ioctl to display the
      next frame of graphics after kicking the GPU but while the GPU is still
      rendering.  The display device sharing the buffer with the GPU would
      attach a callback to get notified when the GPU's rendering-complete IRQ
      fires, to update the scan-out address of the display, without having to
      wake up userspace.
      
      A driver must allocate a fence context for each execution ring that can
      run in parallel. The function for this takes an argument with how many
      contexts to allocate:
        + fence_context_alloc()
      
      A fence is a transient, one-shot deal.  It is allocated and attached
      to one or more dma-bufs.  When the one that attached it is done with
      the pending operation, it can signal the fence:
        + fence_signal()
      
      To get a rough approximation of whether a fence has fired, call:
        + fence_is_signaled()
      
      The dma-buf-mgr handles tracking, and waiting on, the fences associated
      with a dma-buf.
      
      Anyone waiting on the fence can add an async callback:
        + fence_add_callback()
      
      The callback can optionally be cancelled with:
        + fence_remove_callback()
      
      To wait synchronously, optionally with a timeout:
        + fence_wait()
        + fence_wait_timeout()
      
      When emitting a fence, call:
        + trace_fence_emit()
      
      To annotate that a fence is blocking on another fence, call:
        + trace_fence_annotate_wait_on(fence, on_fence)
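
      Put together, a minimal driver-side flow looks roughly like this
      (a sketch only: the my_* names and my_fence_ops are placeholders,
      and error handling is omitted):

        static unsigned my_context;     /* one per execution ring */
        static unsigned my_seqno;
        static DEFINE_SPINLOCK(my_lock);

        static void my_done(struct fence *f, struct fence_cb *cb)
        {
            /* runs once f has signaled, e.g. kick the page flip */
        }

        void my_driver_init(void)
        {
            my_context = fence_context_alloc(1);
        }

        struct fence *my_emit_fence(void)
        {
            struct fence *f = kzalloc(sizeof(*f), GFP_KERNEL);

            fence_init(f, &my_fence_ops, &my_lock, my_context, ++my_seqno);
            trace_fence_emit(f);
            return f;                   /* attach to the dma-buf(s) */
        }

        void my_wait_async(struct fence *f, struct fence_cb *cb)
        {
            if (fence_add_callback(f, cb, my_done) == -ENOENT)
                my_done(f, cb);         /* already signaled */
        }

        void my_ring_complete_irq(struct fence *f)
        {
            fence_signal(f);            /* wakes waiters, runs callbacks */
        }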
      
      A default software-only implementation is provided, which can be used
      by drivers attaching a fence to a buffer when they have no other means
      for hw sync.  But a memory-backed fence is also envisioned, because it
      is common that GPUs can write to, or poll on, some memory location for
      synchronization.  For example:
      
        fence = custom_get_fence(...);
        if ((seqno_fence = to_seqno_fence(fence)) != NULL) {
          struct dma_buf *fence_buf = seqno_fence->sync_buf;
          get_dma_buf(fence_buf);

          ... tell the hw the memory location to wait ...
          custom_wait_on(fence_buf, seqno_fence->seqno_ofs, fence->seqno);
        } else {
          /* fall-back to sw sync */
          fence_add_callback(fence, my_cb);
        }
      
      On SoC platforms, if some other hw mechanism is provided for synchronizing
      between IP blocks, it could be supported as an alternate implementation
      with its own fence ops in a similar way.
      
      The enable_signaling callback is used to provide sw signaling in case a
      cpu waiter is requested or no compatible hardware signaling could be used.
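
      A typical implementation just arms the completion interrupt so that
      it will eventually call fence_signal(), e.g. (sketch; the my_*
      helpers are hypothetical):

        static bool my_enable_signaling(struct fence *f)
        {
            /* called with f->lock held; returning false means the fence
             * has already passed and needs no sw signaling */
            return my_hw_arm_completion_irq(to_my_fence(f));
        }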
      
      The intention is to provide a userspace interface (presumably via eventfd)
      later, to be used in conjunction with dma-buf's mmap support for sw access
      to buffers (or for userspace apps that would prefer to do their own
      synchronization).
      
      v1: Original
      v2: After discussion w/ danvet and mlankhorst on #dri-devel, we decided
          that dma-fence didn't need to care about the sw->hw signaling path
          (it can be handled same as sw->sw case), and therefore the fence->ops
          can be simplified and more handled in the core.  So remove the signal,
          add_callback, cancel_callback, and wait ops, and replace with a simple
          enable_signaling() op which can be used to inform a fence supporting
          hw->hw signaling that one or more devices which do not support hw
          signaling are waiting (and therefore it should enable an irq or do
          whatever is necessary in order that the CPU is notified when the
          fence is passed).
      v3: Fix locking fail in attach_fence() and get_fence()
      v4: Remove tie-in w/ dma-buf..  after discussion w/ danvet and mlankhorst
          we decided that we need to be able to attach one fence to N dma-buf's,
          so using the list_head in dma-fence struct would be problematic.
      v5: [ Maarten Lankhorst ] Updated for dma-bikeshed-fence and dma-buf-manager.
      v6: [ Maarten Lankhorst ] I removed dma_fence_cancel_callback and some comments
          about checking if fence fired or not. This is broken by design.
          waitqueue_active during destruction is now fatal, since the signaller
          should be holding a reference in enable_signalling until it signalled
          the fence. Pass the original dma_fence_cb along, and call __remove_wait
          in the dma_fence_callback handler, so that no cleanup needs to be
          performed.
      v7: [ Maarten Lankhorst ] Set cb->func and only enable sw signaling if
          fence wasn't signaled yet, for example for hardware fences that may
          choose to signal blindly.
      v8: [ Maarten Lankhorst ] Tons of tiny fixes, moved __dma_fence_init to
          header and fixed include mess. dma-fence.h now includes dma-buf.h.
          All members are now initialized, so kmalloc can be used for
          allocating a dma-fence. More documentation added.
      v9: Change compiler bitfields to flags, change return type of
          enable_signaling to bool. Rework dma_fence_wait. Added
          dma_fence_is_signaled and dma_fence_wait_timeout.
          s/dma// and change exports to non GPL. Added fence_is_signaled and
          fence_enable_sw_signaling calls, add ability to override default
          wait operation.
      v10: remove event_queue, use a custom list, export try_to_wake_up from
          scheduler. Remove fence lock and use a global spinlock instead;
          this should hopefully remove all the locking headaches I was having
          while trying to implement this. enable_signaling is called with this
          lock held.
      v11:
          Use atomic ops for flags, lifting the need for some spin_lock_irqsaves.
          However I kept the guarantee that after fence_signal returns,
          enable_signaling has either been called to completion or will not
          be called any more.
      
          Add contexts and seqno to base fence implementation. This allows you
          to wait for fewer fences, by testing for seqno + signaled, and then
          only waiting on the later fence.
      
          Add FENCE_TRACE, FENCE_WARN, and FENCE_ERR. This makes debugging easier.
          A CONFIG_DEBUG_FENCE option will be added to turn off the FENCE_TRACE
          spam, and another runtime option can turn it off at runtime.
      v12:
          Add CONFIG_FENCE_TRACE. Add missing documentation for the fence->context
          and fence->seqno members.
      v13:
          Fixup CONFIG_FENCE_TRACE kconfig description.
          Move fence_context_alloc to fence.
          Simplify fence_later.
          Kill priv member to fence_cb.
      v14:
          Remove priv argument from fence_add_callback, oops!
      v15:
          Remove priv from documentation.
          Explicitly include linux/atomic.h.
      v16:
          Add trace events.
          Import changes required by android syncpoints.
      v17:
          Use wake_up_state instead of try_to_wake_up. (Colin Cross)
          Fix up commit description for seqno_fence. (Rob Clark)
      v18:
          Rename release_fence to fence_release.
          Move to drivers/dma-buf/.
          Rename __fence_is_signaled and __fence_signal to *_locked.
          Rename __fence_init to fence_init.
          Make fence_default_wait return a signed long, and fix wait ops too.
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
      Signed-off-by: Thierry Reding <thierry.reding@gmail.com> #use smp_mb__before_atomic()
      Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
      Acked-by: Daniel Vetter <daniel@ffwll.ch>
      Reviewed-by: Rob Clark <robdclark@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>