1. 18 Oct, 2018 2 commits
    • Eric Sandeen's avatar
      fscache: Fix out of bound read in long cookie keys · fa520c47
      Eric Sandeen authored
      fscache_set_key() can incur an out-of-bounds read, reported by KASAN:
       BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache]
       Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615
      and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236
        BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline]
        BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171
        Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466
      This happens for any index_key_len which is not divisible by 4 and is
      larger than the size of the inline key, because the code allocates exactly
      index_key_len for the key buffer, but the hashing loop is stepping through
      it 4 bytes (u32) at a time in the buf[] array.
      Fix this by calculating how many u32 buffers we'll need by using
      DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation
      buffer to hold the index_key, then using that same count as the hashing
      index limit.
      Fixes: ec0328e4 ("fscache: Maintain a catalogue of allocated cookies")
      Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • David Howells's avatar
      fscache: Fix incomplete initialisation of inline key space · 1ff22883
      David Howells authored
      The inline key in struct rxrpc_cookie is insufficiently initialized,
      zeroing only 3 of the 4 slots, therefore an index_key_len between 13 and 15
      bytes will end up hashing uninitialized memory because the memcpy only
      partially fills the last buf[] element.
      Fix this by clearing fscache_cookie objects on allocation rather than using
      the slab constructor to initialise them.  We're going to pretty much fill
      in the entire struct anyway, so bringing it into our dcache writably
      shouldn't incur much overhead.
      This removes the need to do clearance in fscache_set_key() (where we aren't
      doing it correctly anyway).
      Also, we don't need to set cookie->key_len in fscache_set_key() as we
      already did it in the only caller, so remove that.
      Fixes: ec0328e4 ("fscache: Maintain a catalogue of allocated cookies")
      Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
      Reported-by: default avatarEric Sandeen <sandeen@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 25 Jul, 2018 2 commits
    • Kiran Kumar Modukuri's avatar
      fscache: Fix reference overput in fscache_attach_object() error handling · f29507ce
      Kiran Kumar Modukuri authored
      When a cookie is allocated that causes fscache_object structs to be
      allocated, those objects are initialised with the cookie pointer, but
      aren't blessed with a ref on that cookie unless the attachment is
      successfully completed in fscache_attach_object().
      If attachment fails because the parent object was dying or there was a
      collision, fscache_attach_object() returns without incrementing the cookie
      counter - but upon failure of this function, the object is released which
      then puts the cookie, whether or not a ref was taken on the cookie.
      Fix this by taking a ref on the cookie when it is assigned in
      fscache_object_init(), even when we're creating a root object.
      Analysis from Kiran Kumar:
      This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel
      BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277
      fscache cookie ref count updated incorrectly during fscache object
      allocation resulting in following Oops.
      kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
      kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
      Two threads are trying to do operate on a cookie and two objects.
      (1) One thread tries to unmount the filesystem and in process goes over a
          huge list of objects marking them dead and deleting the objects.
          cookie->usage is also decremented in following path:
             -> __fscache_relinquish_cookie
              ->BUG_ON(atomic_read(&cookie->usage) <= 0);
      (2) A second thread tries to lookup an object for reading data in following
          1) cachefiles_alloc_object
              -> fscache_object_init
                 -> assign cookie, but usage not bumped.
          2) fscache_attach_object -> fails in cant_attach_object because the
               cookie's backing object or cookie's->parent object are going away
          3) fscache_put_object
              -> cachefiles_put_object
                     ->BUG_ON(atomic_read(&cookie->usage) <= 0);
      [NOTE from dhowells] It's unclear as to the circumstances in which (2) can
      take place, given that thread (1) is in nfs_kill_super(), however a
      conflicting NFS mount with slightly different parameters that creates a
      different superblock would do it.  A backtrace from Kiran seems to show
      that this is a possibility:
          kernel BUG at/build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
          RIP: __fscache_cookie_put+0x3a/0x40 [fscache]
          Call Trace:
           __fscache_relinquish_cookie+0x87/0x120 [fscache]
           nfs_fscache_release_super_cookie+0x2d/0xb0 [nfs]
           nfs_kill_super+0x29/0x40 [nfs]
      [Fix] Bump up the cookie usage in fscache_object_init, when it is first
      being assigned a cookie atomically such that the cookie is added and bumped
      up if its refcount is not zero.  Remove the assignment in
      I have run ~100 hours of NFS stress tests and not seen this bug recur.
      [Regression Potential]
       - Limited to fscache/cachefiles.
      Fixes: ccc4fc3d ("FS-Cache: Implement the cookie management part of the netfs API")
      Signed-off-by: default avatarKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
    • Kiran Kumar Modukuri's avatar
      fscache: Allow cancelled operations to be enqueued · d0eb06af
      Kiran Kumar Modukuri authored
      Alter the state-check assertion in fscache_enqueue_operation() to allow
      cancelled operations to be given processing time so they can be cleaned up.
      Also fix a debugging statement that was requiring such operations to have
      an object assigned.
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Reported-by: default avatarKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
  3. 16 May, 2018 2 commits
  4. 11 Apr, 2018 1 commit
  5. 06 Apr, 2018 2 commits
  6. 04 Apr, 2018 7 commits
    • David Howells's avatar
      fscache: Attach the index key and aux data to the cookie · 402cb8dd
      David Howells authored
      Attach copies of the index key and auxiliary data to the fscache cookie so
       (1) The callbacks to the netfs for this stuff can be eliminated.  This
           can simplify things in the cache as the information is still
           available, even after the cache has relinquished the cookie.
       (2) Simplifies the locking requirements of accessing the information as we
           don't have to worry about the netfs object going away on us.
       (3) The cache can do lazy updating of the coherency information on disk.
           As long as the cache is flushed before reboot/poweroff, there's no
           need to update the coherency info on disk every time it changes.
       (4) Cookies can be hashed or put in a tree as the index key is easily
           available.  This allows:
           (a) Checks for duplicate cookies can be made at the top fscache layer
           	 rather than down in the bowels of the cache backend.
           (b) Caching can be added to a netfs object that has a cookie if the
           	 cache is brought online after the netfs object is allocated.
      A certain amount of space is made in the cookie for inline copies of the
      data, but if it won't fit there, extra memory will be allocated for it.
      The downside of this is that live cache operation requires more memory.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarAnna Schumaker <anna.schumaker@netapp.com>
      Tested-by: default avatarSteve Dickson <steved@redhat.com>
    • David Howells's avatar
      fscache: Add more tracepoints · 08c2e3d0
      David Howells authored
      Add more tracepoints to fscache, including:
       (*) fscache_page - Tracks netfs pages known to fscache.
       (*) fscache_check_page - Tracks the netfs querying whether a page is
           pending storage.
       (*) fscache_wake_cookie - Tracks cookies being woken up after a page
           completes/aborts storage in the cache.
       (*) fscache_op - Tracks operations being initialised.
       (*) fscache_wrote_page - Tracks return of the backend write_page op.
       (*) fscache_gang_lookup - Tracks lookup of pages to be stored in the write
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
    • David Howells's avatar
      fscache: Add tracepoints · a18feb55
      David Howells authored
      Add some tracepoints to fscache:
       (*) fscache_cookie - Tracks a cookie's usage count.
       (*) fscache_netfs - Logs registration of a network filesystem, including
           the pointer to the cookie allocated.
       (*) fscache_acquire - Logs cookie acquisition.
       (*) fscache_relinquish - Logs cookie relinquishment.
       (*) fscache_enable - Logs enablement of a cookie.
       (*) fscache_disable - Logs disablement of a cookie.
       (*) fscache_osm - Tracks execution of states in the object state machine.
      and cachefiles:
       (*) cachefiles_ref - Tracks a cachefiles object's usage count.
       (*) cachefiles_lookup - Logs result of lookup_one_len().
       (*) cachefiles_mkdir - Logs result of vfs_mkdir().
       (*) cachefiles_create - Logs result of vfs_create().
       (*) cachefiles_unlink - Logs calls to vfs_unlink().
       (*) cachefiles_rename - Logs calls to vfs_rename().
       (*) cachefiles_mark_active - Logs an object becoming active.
       (*) cachefiles_wait_active - Logs a wait for an old object to be
       (*) cachefiles_mark_inactive - Logs an object becoming inactive.
       (*) cachefiles_mark_buried - Logs the burial of an object.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
    • David Howells's avatar
      fscache: Fix hanging wait on page discarded by writeback · 2c984257
      David Howells authored
      If the fscache asynchronous write operation elects to discard a page that's
      pending storage to the cache because the page would be over the store limit
      then it needs to wake the page as someone may be waiting on completion of
      the write.
      The problem is that the store limit may be updated by a different
      asynchronous operation - and so may miss the write - and that the store
      limit may not even get updated until later by the netfs.
      Fix the kernel hang by making fscache_write_op() mark as written any pages
      that are over the limit.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
    • David Howells's avatar
      fscache: Detect multiple relinquishment of a cookie · d0fb31ec
      David Howells authored
      Report if an fscache cookie is relinquished multiple times by the netfs.
      Signed-off-by: default avatarDavid <dhowells@redhat.com>
    • David Howells's avatar
      fscache: Pass the correct cancelled indications to fscache_op_complete() · b27ddd46
      David Howells authored
      The last parameter to fscache_op_complete() is a bool indicating whether or
      not the operation was cancelled.  A lot of the time the inverse value is
      given or no differentiation is made.  Fix this.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
    • David Howells's avatar
      fscache, cachefiles: Fix checker warnings · bfa3837e
      David Howells authored
      Fix a couple of checker warnings in fscache and cachefiles:
       (1) fscache_n_op_requeue is never used, so get rid of it.
       (2) cachefiles_uncache_page() is passed in a lock that it releases, so
           this needs annotating.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
  7. 20 Mar, 2018 1 commit
  8. 16 Nov, 2017 1 commit
  9. 13 Nov, 2017 1 commit
    • David Howells's avatar
      Pass mode to wait_on_atomic_t() action funcs and provide default actions · 5e4def20
      David Howells authored
      Make wait_on_atomic_t() pass the TASK_* mode onto its action function as an
      extra argument and make it 'unsigned int throughout.
      Also, consolidate a bunch of identical action functions into a default
      function that can do the appropriate thing for the mode.
      Also, change the argument name in the bit_wait*() function declarations to
      reflect the fact that it's the mode and not the bit number.
      [Peter Z gives this a grudging ACK, but thinks that the whole atomic_t wait
      should be done differently, though he's not immediately sure as to how]
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      cc: Ingo Molnar <mingo@kernel.org>
  10. 02 Nov, 2017 1 commit
    • Greg Kroah-Hartman's avatar
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      How this work was done:
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
      All documentation files were explicitly excluded.
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
         For non */uapi/* files that summary was:
         SPDX license identifier                            # files
         GPL-2.0                                              11139
         and resulted in the first patch in this series.
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
         SPDX license identifier                            # files
         GPL-2.0 WITH Linux-syscall-note                        930
         and resulted in the second patch in this series.
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
         SPDX license identifier                            # files
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
         and that resulted in the third patch in this series.
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: default avatarKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: default avatarPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  11. 12 Oct, 2017 1 commit
    • Eric Biggers's avatar
      FS-Cache: fix dereference of NULL user_key_payload · d124b2c5
      Eric Biggers authored
      When the file /proc/fs/fscache/objects (available with
      CONFIG_FSCACHE_OBJECT_LIST=y) is opened, we request a user key with
      description "fscache:objlist", then access its payload.  However, a
      revoked key has a NULL payload, and we failed to check for this.
      request_key() *does* skip revoked keys, but there is still a window
      where the key can be revoked before we access its payload.
      Fix it by checking for a NULL payload, treating it like a key which was
      already revoked at the time it was requested.
      Fixes: 4fbf4291 ("FS-Cache: Allow the current state of all objects to be dumped")
      Reviewed-by: default avatarJames Morris <james.l.morris@oracle.com>
      Cc: <stable@vger.kernel.org>    [v2.6.32+]
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
  12. 14 Sep, 2017 1 commit
  13. 07 Sep, 2017 2 commits
  14. 01 Mar, 2017 1 commit
    • David Howells's avatar
      KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload() · 0837e49a
      David Howells authored
      rcu_dereference_key() and user_key_payload() are currently being used in
      two different, incompatible ways:
       (1) As a wrapper to rcu_dereference() - when only the RCU read lock used
           to protect the key.
       (2) As a wrapper to rcu_dereference_protected() - when the key semaphor is
           used to protect the key and the may be being modified.
      Fix this by splitting both of the key wrappers to produce:
       (1) RCU accessors for keys when caller has the key semaphore locked:
       (2) RCU accessors for keys when caller holds the RCU read lock:
      This should fix following warning in the NFS idmapper
        [ INFO: suspicious RCU usage. ]
        4.10.0 #1 Tainted: G        W
        ./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage!
        other info that might help us debug this:
        rcu_scheduler_active = 2, debug_locks = 0
        1 lock held by mount.nfs/5987:
          #0:  (rcu_read_lock){......}, at: [<d000000002527abc>] nfs_idmap_get_key+0x15c/0x420 [nfsv4]
        stack backtrace:
        CPU: 1 PID: 5987 Comm: mount.nfs Tainted: G        W       4.10.0 #1
        Call Trace:
          dump_stack+0xe8/0x154 (unreliable)
          nfs_idmap_get_key+0x380/0x420 [nfsv4]
          nfs_map_name_to_uid+0x2a0/0x3b0 [nfsv4]
          decode_getfattr_attrs+0xfac/0x16b0 [nfsv4]
          decode_getfattr_generic.constprop.106+0xbc/0x150 [nfsv4]
          nfs4_xdr_dec_lookup_root+0xac/0xb0 [nfsv4]
          rpcauth_unwrap_resp+0xe8/0x140 [sunrpc]
          call_decode+0x29c/0x910 [sunrpc]
          __rpc_execute+0x140/0x8f0 [sunrpc]
          rpc_run_task+0x170/0x200 [sunrpc]
          nfs4_call_sync_sequence+0x68/0xa0 [nfsv4]
          _nfs4_lookup_root.isra.44+0xd0/0xf0 [nfsv4]
          nfs4_lookup_root+0xe0/0x350 [nfsv4]
          nfs4_lookup_root_sec+0x70/0xa0 [nfsv4]
          nfs4_find_root_sec+0xc4/0x100 [nfsv4]
          nfs4_proc_get_rootfh+0x5c/0xf0 [nfsv4]
          nfs4_get_rootfh+0x6c/0x190 [nfsv4]
          nfs4_server_common_setup+0xc4/0x260 [nfsv4]
          nfs4_create_server+0x278/0x3c0 [nfsv4]
          nfs4_remote_mount+0x50/0xb0 [nfsv4]
          nfs_do_root_mount+0xb0/0x140 [nfsv4]
          nfs4_try_mount+0x60/0x100 [nfsv4]
          nfs_fs_mount+0x5ec/0xda0 [nfs]
      Reported-by: default avatarJan Stancek <jstancek@redhat.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarJan Stancek <jstancek@redhat.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
  15. 31 Jan, 2017 3 commits
    • David Howells's avatar
      fscache: Fix dead object requeue · e26bfebd
      David Howells authored
      Under some circumstances, an fscache object can become queued such that it
      fscache_object_work_func() can be called once the object is in the
      OBJECT_DEAD state.  This results in the kernel oopsing when it tries to
      invoke the handler for the state (which is hard coded to 0x2).
      The way this comes about is something like the following:
       (1) The object dispatcher is processing a work state for an object.  This
           is done in workqueue context.
       (2) An out-of-band event comes in that isn't masked, causing the object to
           be queued, say EV_KILL.
       (3) The object dispatcher finishes processing the current work state on
           that object and then sees there's another event to process, so,
           without returning to the workqueue core, it processes that event too.
           It then follows the chain of events that initiates until we reach
           OBJECT_DEAD without going through a wait state (such as
           At this point, object->events may be 0, object->event_mask will be 0
           and oob_event_mask will be 0.
       (4) The object dispatcher returns to the workqueue processor, and in due
           course, this sees that the object's work item is still queued and
           invokes it again.
       (5) The current state is a work state (OBJECT_DEAD), so the dispatcher
           jumps to it - resulting in an OOPS.
      When I'm seeing this, the work state in (1) appears to have been either
      LOOK_UP_OBJECT or CREATE_OBJECT (object->oob_table is
      The window for (2) is very small:
       (A) object->event_mask is cleared whilst the event dispatch process is
           underway - though there's no memory barrier to force this to the top
           of the function.
           The window, therefore is from the time the object was selected by the
           workqueue processor and made requeueable to the time the mask was
       (B) fscache_raise_event() will only queue the object if it manages to set
           the event bit and the corresponding event_mask bit was set.
           The enqueuement is then deferred slightly whilst we get a ref on the
           object and get the per-CPU variable for workqueue congestion.  This
           slight deferral slightly increases the probability by allowing extra
           time for the workqueue to make the item requeueable.
      Handle this by giving the dead state a processor function and checking the
      for the dead state address rather than seeing if the processor function is
      address 0x2.  The dead state processor function can then set a flag to
      indicate that it's occurred and give a warning if it occurs more than once
      per object.
      If this race occurs, an oops similar to the following is seen (note the RIP
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
      IP: [<0000000000000002>] 0x1
      PGD 0
      Oops: 0010 [#1] SMP
      Modules linked in: ...
      CPU: 17 PID: 16077 Comm: kworker/u48:9 Not tainted 3.10.0-327.18.2.el7.x86_64 #1
      Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
      Workqueue: fscache_object fscache_object_work_func [fscache]
      task: ffff880302b63980 ti: ffff880717544000 task.ti: ffff880717544000
      RIP: 0010:[<0000000000000002>]  [<0000000000000002>] 0x1
      RSP: 0018:ffff880717547df8  EFLAGS: 00010202
      RAX: ffffffffa0368640 RBX: ffff880edf7a4480 RCX: dead000000200200
      RDX: 0000000000000002 RSI: 00000000ffffffff RDI: ffff880edf7a4480
      RBP: ffff880717547e18 R08: 0000000000000000 R09: dfc40a25cb3a4510
      R10: dfc40a25cb3a4510 R11: 0000000000000400 R12: 0000000000000000
      R13: ffff880edf7a4510 R14: ffff8817f6153400 R15: 0000000000000600
      FS:  0000000000000000(0000) GS:ffff88181f420000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000002 CR3: 000000000194a000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       ffffffffa0363695 ffff880edf7a4510 ffff88093f16f900 ffff8817faa4ec00
       ffff880717547e60 ffffffff8109d5db 00000000faa4ec18 0000000000000000
       ffff8817faa4ec18 ffff88093f16f930 ffff880302b63980 ffff88093f16f900
      Call Trace:
       [<ffffffffa0363695>] ? fscache_object_work_func+0xa5/0x200 [fscache]
       [<ffffffff8109d5db>] process_one_work+0x17b/0x470
       [<ffffffff8109e4ac>] worker_thread+0x21c/0x400
       [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
       [<ffffffff810a5acf>] kthread+0xcf/0xe0
       [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
       [<ffffffff816460d8>] ret_from_fork+0x58/0x90
       [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarJeremy McNicoll <jeremymc@redhat.com>
      Tested-by: default avatarFrank Sorenson <sorenson@redhat.com>
      Tested-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Reviewed-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
    • David Howells's avatar
      fscache: Clear outstanding writes when disabling a cookie · 6bdded59
      David Howells authored
      fscache_disable_cookie() needs to clear the outstanding writes on the
      cookie it's disabling because they cannot be completed after.
      Without this, fscache_nfs_open_file() gets stuck because it disables the
      cookie when the file is opened for writing but can't uncache the pages till
      afterwards - otherwise there's a race between the open routine and anyone
      who already has it open R/O and is still reading from it.
      Looking in /proc/pid/stack of the offending process shows:
      [<ffffffffa0142883>] __fscache_wait_on_page_write+0x82/0x9b [fscache]
      [<ffffffffa014336e>] __fscache_uncache_all_inode_pages+0x91/0xe1 [fscache]
      [<ffffffffa01740fa>] nfs_fscache_open_file+0x59/0x9e [nfs]
      [<ffffffffa01ccf41>] nfs4_file_open+0x17f/0x1b8 [nfsv4]
      [<ffffffff8117350e>] do_dentry_open+0x16d/0x2b7
      [<ffffffff811743ac>] vfs_open+0x5c/0x65
      [<ffffffff81184185>] path_openat+0x785/0x8fb
      [<ffffffff81184343>] do_filp_open+0x48/0x9e
      [<ffffffff81174710>] do_sys_open+0x13b/0x1cb
      [<ffffffff811747b9>] SyS_open+0x19/0x1b
      [<ffffffff81001c44>] do_syscall_64+0x80/0x17a
      [<ffffffff8165c2da>] return_from_SYSCALL_64+0x0/0x7a
      [<ffffffffffffffff>] 0xffffffffffffffff
      Reported-by: default avatarJianhong Yin <jiyin@redhat.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarJeff Layton <jlayton@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
    • David Howells's avatar
      FS-Cache: Initialise stores_lock in netfs cookie · 62deb818
      David Howells authored
      Initialise the stores_lock in fscache netfs cookies.  Technically, it
      shouldn't be necessary, since the netfs cookie is an index and stores no
      data, but initialising it anyway adds insignificant overhead.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  16. 01 Jun, 2016 1 commit
  17. 29 May, 2016 1 commit
  18. 04 Apr, 2016 1 commit
    • Kirill A. Shutemov's avatar
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      This promise never materialized.  And unlikely will.
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      Let's stop pretending that pages in page cache are special.  They are
      The changes are pretty straight-forward:
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
       - page_cache_get() -> get_page();
       - page_cache_release() -> put_page();
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      virtual patch
      expression E;
      + E
      expression E;
      + E
      + PAGE_SHIFT
      + PAGE_SIZE
      + PAGE_MASK
      expression E;
      + PAGE_ALIGN(E)
      expression E;
      - page_cache_get(E)
      + get_page(E)
      expression E;
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  19. 11 Nov, 2015 3 commits
  20. 07 Nov, 2015 1 commit
    • Mel Gorman's avatar
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Mel Gorman authored
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimisitic allocation with a fallback option can access atomic
      This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      This patch then converts a number of sites
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and was depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  21. 21 Oct, 2015 1 commit
    • David Howells's avatar
      KEYS: Merge the type-specific data with the payload data · 146aa8b1
      David Howells authored
      Merge the type-specific data with the payload data into one four-word chunk
      as it seems pointless to keep them separate.
      Use user_key_payload() for accessing the payloads of overloaded
      user-defined keys.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: linux-cifs@vger.kernel.org
      cc: ecryptfs@vger.kernel.org
      cc: linux-ext4@vger.kernel.org
      cc: linux-f2fs-devel@lists.sourceforge.net
      cc: linux-nfs@vger.kernel.org
      cc: ceph-devel@vger.kernel.org
      cc: linux-ima-devel@lists.sourceforge.net
  22. 02 Apr, 2015 4 commits
    • David Howells's avatar
      FS-Cache: Retain the netfs context in the retrieval op earlier · 4a47132f
      David Howells authored
      Now that the retrieval operation may be disposed of by fscache_put_operation()
      before we actually set the context, the retrieval-specific cleanup operation
      can produce a NULL-pointer dereference when it tries to unconditionally clean
      up the netfs context.
      Given that it is expected that we'll get at least as far as the place where we
      currently set the context pointer and it is unlikely we'll go through the
      error handling paths prior to that point, retain the context right from the
      point that the retrieval op is allocated.
      Concomitant to this, we need to retain the cookie pointer in the retrieval op
      also so that we can call the netfs to release its context in the release
      In addition, we might now get into fscache_release_retrieval_op() with the op
      only initialised.  To this end, set the operation to DEAD only after the
      release method has been called and skip the n_pages test upon cleanup if the
      op is still in the INITIALISED state.
      Without these changes, the following oops might be seen:
      	BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
      	RIP: 0010:[<ffffffffa0089c98>] fscache_release_retrieval_op+0xae/0x100
      	Call Trace:
      	 [<ffffffffa0088560>] fscache_put_operation+0x117/0x2e0
      	 [<ffffffffa008b8f5>] __fscache_read_or_alloc_pages+0x351/0x3ac
      	 [<ffffffffa00b761f>] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
      	 [<ffffffffa00b06c5>] nfs_readpages+0x10c/0x185 [nfs]
      	 [<ffffffff81124925>] ? alloc_pages_current+0x119/0x13e
      	 [<ffffffff810ee5fd>] ? __page_cache_alloc+0xfb/0x10a
      	 [<ffffffff810f87f8>] __do_page_cache_readahead+0x188/0x22c
      	 [<ffffffff810f8b3a>] ondemand_readahead+0x29e/0x2af
      	 [<ffffffff810f8c92>] page_cache_sync_readahead+0x38/0x3a
      	 [<ffffffff810ef337>] generic_file_read_iter+0x1a2/0x55a
      	 [<ffffffffa00a9dff>] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
      	 [<ffffffffa00a6a23>] nfs_file_read+0x49/0x70 [nfs]
      	 [<ffffffff811363be>] new_sync_read+0x78/0x9c
      	 [<ffffffff81137164>] __vfs_read+0x13/0x38
      	 [<ffffffff8113721e>] vfs_read+0x95/0x121
      	 [<ffffffff811372f6>] SyS_read+0x4c/0x8a
      	 [<ffffffff81557a52>] system_call_fastpath+0x12/0x17
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarJeff Layton <jeff.layton@primarydata.com>
    • David Howells's avatar
      FS-Cache: The operation cancellation method needs calling in more places · d3b97ca4
      David Howells authored
      Any time an incomplete operation is cancelled, the operation cancellation
      function needs to be called to clean up.  This is currently being passed
      directly to some of the functions that might want to call it, but not all.
      Instead, pass the cancellation method pointer to the fscache_operation_init()
      and have that cache it in the operation struct.  Further, plug in a dummy
      cancellation handler if the caller declines to set one as this allows us to
      call the function unconditionally (the extra overhead isn't worth bothering
      about as we don't expect to be calling this typically).
      The cancellation method must thence be called everywhere the CANCELLED state
      is set.  Note that we call it *before* setting the CANCELLED state such that
      the method can use the old state value to guide its operation.
      fscache_do_cancel_retrieval() needs moving higher up in the sources so that
      the init function can use it now.
      Without this, the following oops may be seen:
      	FS-Cache: Assertion failed
      	FS-Cache: 3 == 0 is false
      	------------[ cut here ]------------
      	kernel BUG at ../fs/fscache/page.c:261!
      	RIP: 0010:[<ffffffffa0089c1b>]  fscache_release_retrieval_op+0x77/0x100
      	 [<ffffffffa008853d>] fscache_put_operation+0x114/0x2da
      	 [<ffffffffa008b8c2>] __fscache_read_or_alloc_pages+0x358/0x3b3
      	 [<ffffffffa00b761f>] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
      	 [<ffffffffa00b06c5>] nfs_readpages+0x10c/0x185 [nfs]
      	 [<ffffffff81124925>] ? alloc_pages_current+0x119/0x13e
      	 [<ffffffff810ee5fd>] ? __page_cache_alloc+0xfb/0x10a
      	 [<ffffffff810f87f8>] __do_page_cache_readahead+0x188/0x22c
      	 [<ffffffff810f8b3a>] ondemand_readahead+0x29e/0x2af
      	 [<ffffffff810f8c92>] page_cache_sync_readahead+0x38/0x3a
      	 [<ffffffff810ef337>] generic_file_read_iter+0x1a2/0x55a
      	 [<ffffffffa00a9dff>] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
      	 [<ffffffffa00a6a23>] nfs_file_read+0x49/0x70 [nfs]
      	 [<ffffffff811363be>] new_sync_read+0x78/0x9c
      	 [<ffffffff81137164>] __vfs_read+0x13/0x38
      	 [<ffffffff8113721e>] vfs_read+0x95/0x121
      	 [<ffffffff811372f6>] SyS_read+0x4c/0x8a
      	 [<ffffffff81557a52>] system_call_fastpath+0x12/0x17
      The assertion is showing that the remaining number of pages (n_pages) is not 0
      when the operation is being released.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarJeff Layton <jeff.layton@primarydata.com>
    • David Howells's avatar
      FS-Cache: Put an aborted initialised op so that it is accounted correctly · a39caadf
      David Howells authored
      Call fscache_put_operation() or a wrapper on any op that has gone through
      fscache_operation_init() so that the accounting shown in /proc is done
      correctly, specifically fscache_n_op_release.
      fscache_put_operation() therefore now allows an op in the INITIALISED state as
      well as in the CANCELLED and COMPLETE states.
      Note that this means that an operation can get put that doesn't have its
      ->object pointer filled in, so anything that depends on the object needs to be
      conditional in fscache_put_operation().
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarJeff Layton <jeff.layton@primarydata.com>
    • David Howells's avatar
      FS-Cache: Fix cancellation of in-progress operation · 73c04a47
      David Howells authored
      Cancellation of an in-progress operation needs to update the relevant counters
      and start any operations that are pending waiting on this one.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarJeff Layton <jeff.layton@primarydata.com>