1. 17 Jan, 2019 1 commit
    • afs: Fix race in async call refcounting · 34fa4761
      David Howells authored
      There's a race between afs_make_call() and afs_wake_up_async_call() in the
      case that an error is returned from rxrpc_kernel_send_data() after it has
      queued the final packet.
      
      afs_make_call() will try to clean up the mess, but the call state may have
      been moved on, thereby causing afs_process_async_call() to also try to
      delete the call.
      
      Fix this by:
      
       (1) Getting an extra ref for an asynchronous call for the call itself to
           hold.  This makes sure the call doesn't evaporate on us accidentally
           and will allow the call to be retained by the caller in a future
           patch.  The ref is released on leaving afs_make_call() or
           afs_wait_for_call_to_complete().
      
       (2) In the event of an error from rxrpc_kernel_send_data():
      
           (a) Don't set the call state to AFS_CALL_COMPLETE until *after* the
           	 call has been aborted and ended.  This prevents
           	 afs_deliver_to_call() from doing anything with any notifications
           	 it gets.
      
           (b) Explicitly end the call immediately to prevent further callbacks.
      
           (c) Cancel any queued async_work and wait for the work if it's
           	 executing.  This allows us to be sure the race won't recur when we
           	 change the state.  We put the work queue's ref on the call if we
           	 managed to cancel it.
      
           (d) Put the call's ref that we got in (1).  This belongs to us as long
           	 as the call is in state AFS_CALL_CL_REQUESTING.
      
      Fixes: 341f741f ("afs: Refcount the afs_call struct")
      Signed-off-by: David Howells <dhowells@redhat.com>
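
The reference handling described in steps (1) and (2a)-(2d) above can be sketched in userspace C. This is a minimal illustration only, not the kernel code: `struct call`, `call_start_async()` and `call_send_failed()` are hypothetical stand-ins, and the sketch assumes three references exist once an async call is queued (the caller's, the call's own from step (1), and the work queue's).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the real struct afs_call. */
enum call_state { CALL_CL_REQUESTING, CALL_COMPLETE };

struct call {
	atomic_int usage;        /* reference count */
	enum call_state state;
	bool ended;              /* transport call ended: no further callbacks */
	bool destroyed;          /* stands in for freeing the object */
};

static void call_get(struct call *c)
{
	assert(!c->destroyed);
	atomic_fetch_add(&c->usage, 1);
}

/* Drop one reference; mark the call destroyed when the last one goes. */
static void call_put(struct call *c)
{
	if (atomic_fetch_sub(&c->usage, 1) == 1)
		c->destroyed = true;
}

/* Set up an async call with the caller's ref plus two more. */
static void call_start_async(struct call *c)
{
	atomic_init(&c->usage, 1);      /* caller's ref */
	c->state = CALL_CL_REQUESTING;
	c->ended = false;
	c->destroyed = false;
	call_get(c);                    /* (1) ref held by the call itself */
	call_get(c);                    /* ref held by the queued async work */
}

/* Error path after a failed send, following steps (2a)-(2d). */
static void call_send_failed(struct call *c, bool work_cancelled)
{
	c->ended = true;                /* (b) end the call first */
	if (work_cancelled)
		call_put(c);            /* (c) drop the work queue's ref */
	call_put(c);                    /* (d) drop the ref taken in (1) */
	/* The caller still holds its own ref, so touching c is safe here. */
	c->state = CALL_COMPLETE;       /* (a) change state only at the end */
}
```

After `call_send_failed()` returns, only the caller's reference remains, so the object goes away exactly once, when the caller does its final `call_put()`.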
  2. 16 Jan, 2019 1 commit
  3. 08 Jan, 2019 1 commit
  4. 07 Jan, 2019 3 commits
  5. 02 Jan, 2019 6 commits
  6. 28 Dec, 2018 1 commit
    • sunrpc: use-after-free in svc_process_common() · d4b09acf
      Vasily Averin authored
      If a node has NFSv4.1+ mounts inside several net namespaces,
      it can lead to a use-after-free in svc_process_common():
      
      svc_process_common()
              /* Setup reply header */
              rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE
      
      svc_process_common() can use an incorrect rqstp->rq_xprt, because
      its caller, bc_svc_process(), takes it from serv->sv_bc_xprt.
      The problem is that serv is a global structure, but sv_bc_xprt
      is assigned per net namespace.
      
      According to Trond, the whole "let's set up rqstp->rq_xprt
      for the back channel" is nothing but a giant hack in order
      to work around the fact that svc_process_common() uses it
      to find the xpt_ops, and perform a couple of (meaningless
      for the back channel) tests of xpt_flags.
      
      All we really need in svc_process_common() is to be able to run
      rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()
      
      J. Bruce Fields points out that this xpo_prep_reply_hdr() call
      is an awfully roundabout way just to do "svc_putnl(resv, 0);"
      in the tcp case.
      
      This patch no longer initializes rqstp->rq_xprt in bc_svc_process();
      it now calls svc_process_common() with rqstp->rq_xprt = NULL.
      
      To adjust the reply header, svc_process_common() just checks
      rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() in the tcp case.
      
      To handle the rqstp->rq_xprt = NULL case in functions called from
      svc_process_common(), the patch introduces a net namespace pointer,
      svc_rqst->rq_bc_net, and adjusts the SVC_NET() definition.
      Some other functions were also adapted to properly handle this case.
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Cc: stable@vger.kernel.org
      Fixes: 23c20ecd ("NFS: callback up - users counting cleanup")
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
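
The adjusted namespace lookup described above can be sketched as follows. This is an illustrative userspace reduction, not the kernel headers: `rq_xprt`, `rq_bc_net` and `SVC_NET()` are named in the commit message, while the `xpt_net` field, the trimmed struct layouts and the `rqst_net_id()` helper are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumed layouts). */
struct net { int id; };

struct svc_xprt {
	struct net *xpt_net;         /* namespace the transport lives in */
};

struct svc_rqst {
	struct svc_xprt *rq_xprt;    /* NULL for back-channel requests */
	struct net *rq_bc_net;       /* namespace used when rq_xprt is NULL */
};

/* Prefer the transport's namespace; fall back to rq_bc_net otherwise. */
#define SVC_NET(rqst) \
	((rqst)->rq_xprt ? (rqst)->rq_xprt->xpt_net : (rqst)->rq_bc_net)

/* Hypothetical helper: resolve the namespace without touching a NULL xprt. */
static int rqst_net_id(const struct svc_rqst *rqst)
{
	struct net *net = SVC_NET(rqst);

	assert(net != NULL);
	return net->id;
}
```

With this shape, a back-channel request never dereferences a transport pointer that may belong to another namespace; it carries its own namespace pointer instead.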
  7. 21 Dec, 2018 1 commit
  8. 19 Dec, 2018 1 commit
    • ext4: force inode writes when nfsd calls commit_metadata() · fde87268
      Theodore Ts'o authored
      Some time back, nfsd switched from calling vfs_fsync() to using a new
      commit_metadata() hook in export_operations().  If the file system did
      not provide a commit_metadata() hook, it fell back to using
      sync_inode_metadata().  Unfortunately, that doesn't work on all file
      systems.  In particular, it doesn't work on ext4 due to how the inode
      gets journalled --- the VFS writeback code will not always call
      ext4_write_inode().
      
      So we need to provide our own ext4_nfs_commit_metadata() method which
      calls ext4_write_inode() directly.
      
      Google-Bug-Id: 121195940
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
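
The hook-with-fallback dispatch described above can be sketched in plain C. This is a toy model under stated assumptions: the `struct inode` counters, `generic_sync_metadata()` and `fs_commit_metadata()` are hypothetical stand-ins, and the fallback deliberately leaves the inode dirty to mimic a generic path that never reaches the filesystem's own write routine.

```c
#include <assert.h>
#include <stddef.h>

/* Toy inode: counters record which path ran (illustrative only). */
struct inode {
	int dirty;
	int generic_syncs;
	int fs_writes;
};

struct export_operations {
	int (*commit_metadata)(struct inode *inode);   /* optional hook */
};

/* Stand-in for the generic fallback; it may not write the inode itself. */
static int generic_sync_metadata(struct inode *inode)
{
	inode->generic_syncs++;
	return 0;
}

/* Filesystem-specific hook that writes the inode directly. */
static int fs_commit_metadata(struct inode *inode)
{
	inode->fs_writes++;
	inode->dirty = 0;
	return 0;
}

/* Use the filesystem's hook when present, else the generic fallback. */
static int commit_metadata(const struct export_operations *ops,
			   struct inode *inode)
{
	assert(inode != NULL);
	if (ops && ops->commit_metadata)
		return ops->commit_metadata(inode);
	return generic_sync_metadata(inode);
}
```

A filesystem that supplies the hook gets its own write path called every time; one that does not is left depending on whatever the generic path happens to do.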
  9. 17 Dec, 2018 2 commits
  10. 13 Dec, 2018 1 commit
  11. 30 Nov, 2018 2 commits
    • net: Add trace events for all receive exit points · b0e3f1bd
      Geneviève Bastien authored
      Trace events are already present for the receive entry points, to indicate
      how the reception entered the stack.
      
      This patch adds the corresponding exit trace events that will bound the
      reception such that all events occurring between the entry and the exit
      can be considered as part of the reception context. This greatly helps
      for dependency and root cause analyses.
      
      Without this, it is not possible with tracepoint instrumentation to
      determine whether a sched_wakeup event following a netif_receive_skb
      event is the result of the packet reception or a simple coincidence after
      further processing by the thread. It is possible using other mechanisms
      like kretprobes, but considering the "entry" points are already present,
      it would be good to add the matching exit events.
      
      In addition to linking packets with wakeups, the entry/exit event pair
      can also be used to perform network stack latency analyses.
      Signed-off-by: Geneviève Bastien <gbastien@versatic.net>
      CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      CC: Steven Rostedt <rostedt@goodmis.org>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> (tracing side)
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fs/locks: rename some lists and pointers. · ada5c1da
      NeilBrown authored
      struct file_lock contains an 'fl_next' pointer which
      is used to point to the lock that this request is blocked
      waiting for.  So rename it to fl_blocker.
      
      The fl_blocked list_head in an active lock is the head of a list of
      blocked requests.  In a request it is a node in that list.
      These are two distinct uses, so replace with two list_heads
      with different names.
      fl_blocked_requests is the head of a list of blocked requests.
      fl_blocked_member is a node in that list.
      
      The two different list_heads are never used at the same time, but that
      will change in a future patch.
      
      Note that a tracepoint is changed to report fl_blocker instead
      of fl_next.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reviewed-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
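
The renamed fields can be pictured with a reduced userspace model. The three field names come from the commit message; the minimal `list_head` helpers, `lock_init()` and `lock_block_on()` are hypothetical and exist only to make the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of the kernel's. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_node(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Reduced file_lock: only the fields this commit renames or splits. */
struct file_lock {
	struct file_lock *fl_blocker;          /* lock this request waits for */
	struct list_head fl_blocked_requests;  /* head: requests blocked on us */
	struct list_head fl_blocked_member;    /* node in the blocker's list */
};

static void lock_init(struct file_lock *fl)
{
	fl->fl_blocker = NULL;
	list_init(&fl->fl_blocked_requests);
	list_init(&fl->fl_blocked_member);
}

/* Hypothetical helper: record that 'waiter' is blocked on 'blocker'. */
static void lock_block_on(struct file_lock *blocker, struct file_lock *waiter)
{
	assert(waiter->fl_blocker == NULL);
	waiter->fl_blocker = blocker;
	list_add_tail_node(&waiter->fl_blocked_member,
			   &blocker->fl_blocked_requests);
}
```

Keeping the head and the node as separate members means a lock can, in principle, be both a waiter and a blocker at once, which the commit message says a later patch will rely on.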
  12. 28 Nov, 2018 1 commit
  13. 15 Nov, 2018 2 commits
    • lib: introduce initial implementation of object aggregation manager · 0a020d41
      Jiri Pirko authored
      This lib tracks objects which could be of two types:
      1) root object
      2) nested object - with a "delta" which differentiates it from
                         the associated root object
      The objects are tracked by a hashtable and reference-counted. The user is
      responsible for implementing callbacks to create/destroy the root entity
      related to each root object and callbacks to create/destroy the nested
      object delta.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Fix life check · 7150ceaa
      David Howells authored
      The life-checking function, which is used by kAFS to make sure that a call
      is still live in the event of a pending signal, only samples the received
      packet serial number counter; it doesn't actually provoke a change in the
      counter, rather relying on the server to happen to give us a packet in the
      time window.
      
      Fix this by adding a function to force a ping to be transmitted.
      
      kAFS then keeps track of whether there's been a stall, and if so, uses the
      new function to ping the server, resetting the timeout to allow the reply
      to come back.
      
      If there's a stall, a ping and the call is *still* stalled in the same
      place after another period, then the call will be aborted.
      
      Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
      Fixes: f4d15fb6 ("rxrpc: Provide functions for allowing cleaner handling of signals")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
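
The stall handling described above (sample the counter, ping on the first stall, abort if still stalled after another period) can be sketched as a small state machine. All names here (`struct life_tracker`, `life_check()`, the `LIFE_*` actions) are hypothetical; the real kAFS and rxrpc interfaces differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the caller should do after one life-check period. */
enum life_action { LIFE_OK, LIFE_PING, LIFE_ABORT };

struct life_tracker {
	unsigned int last_serial;   /* counter value seen at the last check */
	bool stalled;               /* a ping was already sent for this stall */
};

static void life_init(struct life_tracker *t, unsigned int serial)
{
	t->last_serial = serial;
	t->stalled = false;
}

/*
 * Called once per timeout period with the current received-packet
 * serial counter. Progress clears the stall; the first period without
 * progress asks for a ping; a second one in the same place aborts.
 */
static enum life_action life_check(struct life_tracker *t, unsigned int serial)
{
	assert(t != NULL);
	if (serial != t->last_serial) {
		t->last_serial = serial;
		t->stalled = false;
		return LIFE_OK;
	}
	if (!t->stalled) {
		t->stalled = true;
		return LIFE_PING;
	}
	return LIFE_ABORT;
}
```

The ping matters because it provokes a reply from a live server, which advances the counter; without it, a quiet but healthy call would look identical to a dead one.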
  14. 12 Nov, 2018 1 commit
  15. 26 Oct, 2018 1 commit
    • mm: workingset: tell cache transitions from workingset thrashing · 1899ad18
      Johannes Weiner authored
      Refaults happen during transitions between workingsets as well as in-place
      thrashing.  Knowing the difference between the two has a range of
      applications, including measuring the impact of memory shortage on the
      system performance, as well as the ability to balance pressure more smartly
      between the filesystem cache and the swap-backed workingset.
      
      During workingset transitions, inactive cache refaults and pushes out
      established active cache.  When that active cache isn't stale, however,
      and also ends up refaulting, that's bona fide thrashing.
      
      Introduce a new page flag that tells on eviction whether the page has been
      active or not in its lifetime.  This bit is then stored in the shadow
      entry, to classify refaults as transitioning or thrashing.
      
      How many page->flags does this leave us with on 32-bit?
      
      	20 bits are always page flags
      
      	21 if you have an MMU
      
      	23 with the zone bits for DMA, Normal, HighMem, Movable
      
      	29 with the sparsemem section bits
      
      	30 if PAE is enabled
      
      	31 with this patch.
      
      So on 32-bit PAE, that leaves 1 bit for distinguishing two NUMA nodes.  If
      that's not enough, the system can switch to discontigmem and re-gain the 6
      or 7 sparsemem section bits.
      
      Link: http://lkml.kernel.org/r/20180828172258.3185-3-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Daniel Drake <drake@endlessm.com>
      Tested-by: Suren Baghdasaryan <surenb@google.com>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <jweiner@fb.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Enderborg <peter.enderborg@sony.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vinayak Menon <vinmenon@codeaurora.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
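
The 32-bit page->flags budget from the commit message can be checked with a few lines of arithmetic. The per-feature widths below are read off the running totals quoted there (20, 21, 23, 29, 30, 31); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define FLAGS_WORD_BITS 32   /* width of page->flags on 32-bit */

/* Bits consumed by each feature, derived from the running totals. */
struct flag_budget {
	int base;        /* 20: always-present page flags */
	int mmu;         /* +1 with an MMU */
	int zone;        /* +2 zone bits (DMA, Normal, HighMem, Movable) */
	int section;     /* +6 sparsemem section bits */
	int pae;         /* +1 if PAE is enabled */
	int workingset;  /* +1 for the flag added by this patch */
};

static int flag_bits_used(const struct flag_budget *b)
{
	return b->base + b->mmu + b->zone + b->section + b->pae +
	       b->workingset;
}

/* Bits left over, for example for encoding the NUMA node. */
static int flag_bits_left(const struct flag_budget *b)
{
	int left = FLAGS_WORD_BITS - flag_bits_used(b);

	assert(left >= 0);
	return left;
}
```

With the widths `{ 20, 1, 2, 6, 1, 1 }`, the total is 31 bits used, leaving the single bit that can distinguish two NUMA nodes.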
  16. 25 Oct, 2018 1 commit
    • block: add a report_zones method · e76239a3
      Christoph Hellwig authored
      Dispatching a report zones command through the request queue is a major
      pain due to the command reply payload rewriting necessary. Given that
      blkdev_report_zones() is executing everything synchronously, implement
      report zones as a block device file operation instead, allowing major
      simplification of the code in many places.
      
      Since sd, null-blk, dm-linear and dm-flakey are the only block device
      drivers that support exposing zoned block devices, these drivers are
      modified to provide the device side implementation of the
      report_zones() block device file operation.
      
      For device mappers, a new report_zones() target type operation is
      defined so that upper block layer calls to blkdev_report_zones() can
      be propagated down to the underlying devices of the dm targets.
      Implementation for this new operation is added to the dm-linear and
      dm-flakey targets.
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      [Damien]
      * Changed method block_device argument to gendisk
      * Various bug fixes and improvements
      * Added support for null_blk, dm-linear and dm-flakey.
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  17. 23 Oct, 2018 6 commits
  18. 15 Oct, 2018 2 commits
  19. 12 Oct, 2018 1 commit
    • hwmon: (core) Add trace events to _attr_show/store functions · 61b8ab2c
      Nicolin Chen authored
      Trace events are useful for people who collect data from
      Ftrace output. Some people analyse the relationship between
      cpufreq, thermal and hwmon (power/voltage/current) using
      the convenient, timestamped Ftrace output; however, unlike
      the cpufreq and thermal subsystems, hwmon does not support
      trace events yet.
      
      So this patch adds initial trace events for the hwmon core.
      To call hwmon_attr_base() for aligned attr index numbers, it
      also moves the function upward.
      
      Ftrace outputs:
       ...: hwmon_attr_show_string: index=2, attr_name=in2_label, val=VDD_5V
       ...: hwmon_attr_show: index=2, attr_name=in2_input, val=5112
       ...: hwmon_attr_show: index=2, attr_name=curr2_input, val=440
      
      Note that the _attr_show and _attr_store functions are tied
      to the _with_info API. So a hwmon driver requiring the trace
      events feature should use _with_info API to register a hwmon
      device.
      Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
  20. 08 Oct, 2018 1 commit
  21. 03 Oct, 2018 3 commits
    • xprtrdma: Squelch a sparse warning · 470443e0
      Chuck Lever authored
      linux/include/trace/events/rpcrdma.h:501:1: warning: expression using sizeof bool
      linux/include/trace/events/rpcrdma.h:501:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
      
      Fixes: ab03eff5 ("xprtrdma: Add trace points in RPC Call ... ")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • signal: Distinguish between kernel_siginfo and siginfo · ae7795bc
      Eric W. Biederman authored
      Linus recently observed that if we did not worry about the padding
      member in struct siginfo it is only about 48 bytes, and 48 bytes is
      much nicer than 128 bytes for allocating on the stack and copying
      around in the kernel.
      
      The obvious thing of only adding the padding when userspace is
      including siginfo.h won't work as there are sigframe definitions in
      the kernel that embed struct siginfo.
      
      So split siginfo in two: kernel_siginfo and siginfo.  The
      traditional name is kept for the userspace definition, while the
      version that is used internally to the kernel, and ultimately will
      not be padded to 128 bytes, is called kernel_siginfo.
      
      I have put the definition of struct kernel_siginfo in include/signal_types.h.
      
      A set of buildtime checks has been added to verify the two structures have
      the same field offsets.
      
      To make it easy to verify the change, kernel_siginfo retains the same
      size as siginfo.  The reduction in size comes in a following change.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
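
The build-time offset checks mentioned above can be illustrated with two toy structures. This is a sketch only: `struct demo_kernel_siginfo` and `struct demo_siginfo` are hypothetical three-field reductions, and the unpadded kernel-side struct shows the intended end state, whereas this commit itself keeps both structures the same size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SI_MAX_SIZE 128   /* userspace size quoted in the commit */

/* Kernel-internal form: just the fields, no padding (end state). */
struct demo_kernel_siginfo {
	int si_signo;
	int si_errno;
	int si_code;
};

/* Userspace form: same leading fields, padded out to 128 bytes. */
struct demo_siginfo {
	int si_signo;
	int si_errno;
	int si_code;
	char _pad[DEMO_SI_MAX_SIZE - 3 * sizeof(int)];
};

/* Build-time checks: the two layouts must agree field by field. */
_Static_assert(offsetof(struct demo_kernel_siginfo, si_signo) ==
	       offsetof(struct demo_siginfo, si_signo), "si_signo offset");
_Static_assert(offsetof(struct demo_kernel_siginfo, si_errno) ==
	       offsetof(struct demo_siginfo, si_errno), "si_errno offset");
_Static_assert(offsetof(struct demo_kernel_siginfo, si_code) ==
	       offsetof(struct demo_siginfo, si_code), "si_code offset");
_Static_assert(sizeof(struct demo_siginfo) == DEMO_SI_MAX_SIZE,
	       "userspace size");

/* Copy out: zero the padding, then copy the shared prefix. */
static void demo_copy_siginfo_to_user(struct demo_siginfo *to,
				      const struct demo_kernel_siginfo *from)
{
	assert(to != NULL && from != NULL);
	memset(to, 0, sizeof(*to));
	memcpy(to, from, sizeof(*from));
}
```

Because the offsets are verified at compile time, the prefix copy is safe, and any later field reordering in one structure but not the other fails the build instead of corrupting data at run time.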
    • xprtrdma: Rename rpcrdma_qp_async_error_upcall · f9521d53
      Chuck Lever authored
      Clean up: Use a function name that is consistent with the RDMA core
      API and with other consumers. Because this is a function that is
      invoked from outside the rpcrdma.ko module, add an appropriate
      documenting comment.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  22. 02 Oct, 2018 1 commit