1. 20 Feb, 2017 21 commits
  2. 18 Jan, 2017 4 commits
  3. 12 Jan, 2017 2 commits
  4. 14 Dec, 2016 2 commits
    • libceph: always signal completion when done · c297eb42
      Ilya Dryomov authored
      r_safe_completion is currently, and has always been, signaled only if an
      on-disk ack was requested.  It's there for fsync and syncfs, which wait
      for in-flight writes to flush - all data write requests set ONDISK.
      
      However, the pool perm check code introduced in 4.2 sends a write
      request with only ACK set.  An unfortunately timed syncfs can then hang
      forever: r_safe_completion won't be signaled because only an unsafe
      reply was requested.
      
      We could patch ceph_osdc_sync() to skip !ONDISK write requests, but
      that is somewhat incomplete and yet another special case.  Instead,
      rename this completion to r_done_completion and always signal it when
      the OSD client is done with the request, whether unsafe, safe, or
      error.  This is a bit cleaner and helps with the cancellation code.
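      
      A minimal sketch of the resulting behaviour (the type and function names
      here are illustrative, not the literal libceph code): the completion is
      signaled on every exit path, so a waiter no longer depends on an ONDISK
      ack arriving.
      
        #include <linux/completion.h>
        
        /* stand-in for struct ceph_osd_request */
        struct osd_request_sketch {
                struct completion r_done_completion;    /* was r_safe_completion */
                int r_result;
        };
        
        /* called for safe replies, unsafe-only replies and errors alike */
        static void finish_request_sketch(struct osd_request_sketch *req, int result)
        {
                req->r_result = result;
                complete_all(&req->r_done_completion);  /* unconditional now */
        }
        
        /* syncfs-style waiter: can no longer hang on a !ONDISK request */
        static void wait_request_sketch(struct osd_request_sketch *req)
        {
                wait_for_completion(&req->r_done_completion);
        }
      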
      Reported-by: Yan, Zheng <zyan@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • ceph: avoid creating orphan object when checking pool permission · 80e80fbb
      Yan, Zheng authored
      The pool permission check needs to write to the first object. But for
      a snapshot, the head of the first object may have already been deleted.
      Skip the check for snapshot inodes to avoid creating an orphan object.
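      
      A hedged sketch of the shape of the fix (the function below and its name
      are illustrative, not the literal patch): bail out of the permission
      probe before any object write when the inode refers to a snapshot.
      
        #include <linux/fs.h>
        #include <linux/ceph/ceph_fs.h>   /* CEPH_NOSNAP */
        
        /* sketch: skip the write probe for snapshot inodes */
        static int pool_perm_check_sketch(struct inode *inode, u64 snapid)
        {
                /* CEPH_NOSNAP means "head"; anything else is a snapshot */
                if (snapid != CEPH_NOSNAP)
                        return 0;   /* nothing to probe, no orphan object created */
        
                /* ... issue the small write/read against the first object here ... */
                return 0;
        }
      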
      
      Link: http://tracker.ceph.com/issues/18211
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
  5. 12 Dec, 2016 11 commits
    • ceph: properly set issue_seq for cap release · dc24de82
      Yan, Zheng authored
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: add flags parameter to send_cap_msg · 1e4ef0c6
      Jeff Layton authored
      Add a flags parameter to send_cap_msg, so we can request expedited
      service from the MDS when we know we'll be waiting on the result.
      
      Set that flag in the case of try_flush_caps. The callers of that
      function generally wait synchronously on the result, so it's beneficial
      to ask the server to expedite it.
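      
      Roughly, the change looks like the sketch below. The flag value and the
      trimmed-down prototypes are illustrative; only the idea of threading a
      flags word through to the cap message is taken from the patch.
      
        #include <linux/types.h>
        
        struct ceph_mds_session;                /* opaque here */
        
        #define CAP_MSG_FLAG_SYNC (1 << 0)      /* "please expedite" hint to the MDS */
        
        /* sketch: send_cap_msg() gains a flags argument */
        static int send_cap_msg_sketch(struct ceph_mds_session *session, u32 flags)
        {
                /* real code encodes 'flags' into the outgoing cap message */
                return 0;
        }
        
        static void try_flush_caps_sketch(struct ceph_mds_session *session)
        {
                /* callers of try_flush_caps wait synchronously on the result,
                 * so ask the MDS for expedited service */
                send_cap_msg_sketch(session, CAP_MSG_FLAG_SYNC);
        }
      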
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Yan, Zheng <zyan@redhat.com>
    • ceph: update cap message struct version to 10 · 43b29673
      Jeff Layton authored
      The userland ceph has MClientCaps at struct version 10. This brings the
      kernel up to the same version.
      
      For now, all of the new stuff is set to default values, including
      the flags field, which will be conditionally set in a later patch.
      
      Note that we don't need to set the change_attr and btime to anything
      since we aren't currently setting the feature flag. The MDS should
      ignore those values.
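      
      For illustration, the appended fields can simply be encoded with default
      values; a hedged sketch using the ceph_encode_* helpers (the particular
      fields and their order below are examples, not the exact v10 layout):
      
        #include <linux/ceph/decode.h>
        
        /* sketch: append newer, defaulted fields to the encoded cap message */
        static void encode_cap_msg_tail_sketch(void **p)
        {
                ceph_encode_32(p, 0);   /* e.g. pool namespace length: none */
                ceph_encode_64(p, 0);   /* e.g. change_attr: ignored without the feature bit */
                ceph_encode_32(p, 0);   /* e.g. flags: set conditionally by a later patch */
        }
      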
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Yan, Zheng <zyan@redhat.com>
    • ceph: define new argument structure for send_cap_msg · 0ff8bfb3
      Jeff Layton authored
      When we get to this many arguments, it's hard to work with positional
      parameters. send_cap_msg is already at 25 arguments, with more needed.
      
      Define a new args structure and pass a pointer to it to send_cap_msg.
      Eventually it might make sense to embed one of these inside
      ceph_cap_snap instead of tracking individual fields.
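      
      A sketch of the pattern (the struct and its fields are a small
      illustrative subset, not the full argument structure from the patch):
      
        #include <linux/types.h>
        
        struct ceph_mds_session;
        
        /* sketch: gather the former positional parameters into one struct */
        struct cap_msg_args_sketch {
                struct ceph_mds_session *session;
                u64 ino, follows;
                u32 caps, wanted, dirty, seq, issue_seq;
                /* ... remaining fields elided ... */
        };
        
        static int send_cap_msg_sketch(const struct cap_msg_args_sketch *arg)
        {
                /* real code builds and sends the cap message from *arg */
                return arg->session ? 0 : -1;
        }
        
        static void caller_sketch(struct ceph_mds_session *session)
        {
                struct cap_msg_args_sketch arg = {
                        .session = session,
                        .seq = 1,
                        /* fill in the rest, then make a single call */
                };
        
                send_cap_msg_sketch(&arg);
        }
      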
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Yan, Zheng <zyan@redhat.com>
    • ceph: move xattr initialization before the encoding past the ceph_mds_caps · 9670079f
      Jeff Layton authored
      Just for clarity. This part is inside the header, so it makes sense to
      group it with the rest of the stuff in the header.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Yan, Zheng <zyan@redhat.com>
    • ceph: fix minor typo in unsafe_request_wait · 4945a084
      Jeff Layton authored
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Yan, Zheng <zyan@redhat.com>
    • ceph: record truncate size/seq for snap data writeback · 5f743e45
      Yan, Zheng authored
      Dirty snapshot data needs to be flushed unconditionally. If it was
      created before a truncation, writeback should use the old truncate
      size/seq.
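      
      In outline (field and function names below are assumptions for the
      sketch, not the literal patch), the truncate state that was current when
      the snapshot was taken is recorded with the cap snap and used later for
      writeback of that snap's dirty pages:
      
        #include <linux/types.h>
        
        /* sketch: remember the truncate state captured at snapshot time */
        struct cap_snap_sketch {
                u64 truncate_size;
                u32 truncate_seq;
        };
        
        static void capture_truncate_state_sketch(struct cap_snap_sketch *capsnap,
                                                  u64 i_truncate_size,
                                                  u32 i_truncate_seq)
        {
                capsnap->truncate_size = i_truncate_size;
                capsnap->truncate_seq = i_truncate_seq;
        }
        
        /* writeback of the snap's dirty pages then uses
         * capsnap->truncate_{size,seq} rather than the inode's current
         * values, so a later truncate cannot clip the snapshot data */
      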
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: check availability of mds cluster on mount · e9e427f0
      Yan, Zheng authored
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: fix splice read for no Fc capability case · 7ce469a5
      Yan, Zheng authored
      When the iov_iter type is ITER_PIPE, copy_page_to_iter() takes a
      reference on the page and adds the page to a pipe_buffer. It also
      sets the pipe_buffer's ops to page_cache_pipe_buf_ops. The confirm
      callback in page_cache_pipe_buf_ops expects the page to come from the
      page cache and be uptodate; otherwise it returns an error.
      
      For the ceph_sync_read() case, pages are not from the page cache, so
      we can't call copy_page_to_iter() when the iov_iter type is ITER_PIPE.
      The fix is to use iov_iter_get_pages_alloc() to allocate pages
      for the pipe (the code is similar to default_file_splice_read).
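      
      In outline, the non-Fc splice path now allocates pages it owns for the
      pipe, much as default_file_splice_read does. A condensed sketch (error
      handling, per-page reference drops and the actual OSD read are elided):
      
        #include <linux/uio.h>
        #include <linux/mm.h>
        
        /* sketch: for ITER_PIPE, obtain pages we own instead of handing
         * page-cache pages to copy_page_to_iter() */
        static ssize_t sync_read_into_pipe_sketch(struct iov_iter *to, size_t len)
        {
                struct page **pages;
                size_t start;
                ssize_t got;
        
                got = iov_iter_get_pages_alloc(to, &pages, len, &start);
                if (got < 0)
                        return got;
        
                /* ... read the file data from the OSDs into 'pages' here ... */
        
                iov_iter_advance(to, got);  /* publish what was produced to the pipe */
                kvfree(pages);              /* free the pointer array; page refs elided */
                return got;
        }
      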
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: try getting buffer capability for readahead/fadvise · 2b1ac852
      Yan, Zheng authored
      For the readahead/fadvise cases, the caller of ceph_readpages does not
      hold the buffer capability, so pages can be added to the page cache
      while no buffer capability is held. This can cause data integrity
      issues.
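      
      In outline (the helper names below are hypothetical stand-ins, not the
      real cap API), readahead only populates the page cache while a
      buffer-cap reference is held, and simply skips the readahead when it
      cannot get one:
      
        #include <linux/fs.h>
        
        /* hypothetical helpers standing in for the real cap reference calls */
        bool try_get_buffer_cap_sketch(struct inode *inode, int *got);
        void put_buffer_cap_sketch(struct inode *inode, int got);
        
        /* sketch: only add readahead pages to the page cache under a buffer cap */
        static int readpages_sketch(struct inode *inode)
        {
                int got = 0;
        
                if (!try_get_buffer_cap_sketch(inode, &got))
                        return 0;   /* skip readahead rather than risk stale data */
        
                /* ... start the reads; pages go into the page cache ... */
        
                put_buffer_cap_sketch(inode, got);
                return 0;
        }
      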
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: fix scheduler warning due to nested blocking · 5c341ee3
      Nikolay Borisov authored
      try_get_cap_refs can be used as a condition in wait_event* calls.
      This is all fine until it has to call __ceph_do_pending_vmtruncate,
      which in turn acquires the i_truncate_mutex. This leads to a situation
      in which a task's state is !TASK_RUNNING while it is trying to acquire
      a sleeping primitive. In essence, nested sleeping primitives are being
      used. This causes the following warning:
      
      WARNING: CPU: 22 PID: 11064 at kernel/sched/core.c:7631 __might_sleep+0x9f/0xb0()
      do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109447d>] prepare_to_wait_event+0x5d/0x110
       ipmi_msghandler tcp_scalable ib_qib dca ib_mad ib_core ib_addr ipv6
      CPU: 22 PID: 11064 Comm: fs_checker.pl Tainted: G           O    4.4.20-clouder2 #6
      Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1a 10/16/2015
       0000000000000000 ffff8838b416fa88 ffffffff812f4409 ffff8838b416fad0
       ffffffff81a034f2 ffff8838b416fac0 ffffffff81052b46 ffffffff81a0432c
       0000000000000061 0000000000000000 0000000000000000 ffff88167bda54a0
      Call Trace:
       [<ffffffff812f4409>] dump_stack+0x67/0x9e
       [<ffffffff81052b46>] warn_slowpath_common+0x86/0xc0
       [<ffffffff81052bcc>] warn_slowpath_fmt+0x4c/0x50
       [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110
       [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110
       [<ffffffff8107767f>] __might_sleep+0x9f/0xb0
       [<ffffffff81612d30>] mutex_lock+0x20/0x40
       [<ffffffffa04eea14>] __ceph_do_pending_vmtruncate+0x44/0x1a0 [ceph]
       [<ffffffffa04fa692>] try_get_cap_refs+0xa2/0x320 [ceph]
       [<ffffffffa04fd6f5>] ceph_get_caps+0x255/0x2b0 [ceph]
       [<ffffffff81094370>] ? wait_woken+0xb0/0xb0
       [<ffffffffa04f2c11>] ceph_write_iter+0x2b1/0xde0 [ceph]
       [<ffffffff81613f22>] ? schedule_timeout+0x202/0x260
       [<ffffffff8117f01a>] ? kmem_cache_free+0x1ea/0x200
       [<ffffffff811b46ce>] ? iput+0x9e/0x230
       [<ffffffff81077632>] ? __might_sleep+0x52/0xb0
       [<ffffffff81156147>] ? __might_fault+0x37/0x40
       [<ffffffff8119e123>] ? cp_new_stat+0x153/0x170
       [<ffffffff81198cfa>] __vfs_write+0xaa/0xe0
       [<ffffffff81199369>] vfs_write+0xa9/0x190
       [<ffffffff811b6d01>] ? set_close_on_exec+0x31/0x70
       [<ffffffff8119a056>] SyS_write+0x46/0xa0
      
      This happens because wait_event_interruptible can interfere with the
      mutex locking code, since they both fiddle with the task state.
      
      Fix the issue by using the newly-added nested blocking infrastructure
      from 61ada528 ("sched/wait: Provide infrastructure to deal with
      nested blocking").
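      
      For reference, the pattern from that infrastructure keeps the task in
      TASK_RUNNING while the (possibly sleeping) condition is evaluated and
      uses a woken flag for the actual wait. A condensed sketch, not the
      literal ceph_get_caps() code:
      
        #include <linux/wait.h>
        #include <linux/sched.h>
        #include <linux/errno.h>
        
        /* sketch: wait_woken() based loop; cond() may itself take mutexes
         * (e.g. via __ceph_do_pending_vmtruncate) because we are in
         * TASK_RUNNING whenever it runs */
        static int wait_for_condition_sketch(wait_queue_head_t *wq,
                                             bool (*cond)(void *), void *arg)
        {
                DEFINE_WAIT_FUNC(wait, woken_wake_function);
                int err = 0;
        
                add_wait_queue(wq, &wait);
                while (!cond(arg)) {
                        if (signal_pending(current)) {
                                err = -ERESTARTSYS;
                                break;
                        }
                        wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
                }
                remove_wait_queue(wq, &wait);
                return err;
        }
      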
      
      Link: https://lwn.net/Articles/628628/
      Signed-off-by: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: Yan, Zheng <zyan@redhat.com>