1. 04 Dec, 2018 2 commits
  2. 17 Jun, 2018 1 commit
  3. 21 May, 2018 1 commit
  4. 18 Apr, 2018 1 commit
    • Theodore Ts'o's avatar
      ext4: set h_journal if there is a failure starting a reserved handle · b2569260
      Theodore Ts'o authored
      If ext4 tries to start a reserved handle via
      jbd2_journal_start_reserved(), and the journal has been aborted, this
      can result in a NULL pointer dereference.  This is because the fields
      h_journal and h_transaction in the handle structure share the same
      memory, via a union, so jbd2_journal_start_reserved() will clear
      h_journal before calling start_this_handle().  If this function fails
      due to an aborted handle, h_journal will still be NULL, and the call
      to jbd2_journal_free_reserved() will pass a NULL journal to
      sub_reserve_credits().
      
      This can be reproduced by running "kvm-xfstests -c dioread_nolock
      generic/475".
      
      Cc: stable@kernel.org # 3.11
      Fixes: 8f7d89f3 ("jbd2: transaction reservation support")
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      b2569260
  5. 10 Jan, 2018 1 commit
    • Tobin C. Harding's avatar
      jbd2: fix sphinx kernel-doc build warnings · f69120ce
      Tobin C. Harding authored
      Sphinx emits various (26) warnings when building make target 'htmldocs'.
      Currently struct definitions contain duplicate documentation, some as
      kernel-docs and some as standard c89 comments.  We can reduce
      duplication while cleaning up the kernel docs.
      
      Move all kernel-docs to right above each struct member.  Use the set of
      all existing comments (kernel-doc and c89).  Add documentation for
      missing struct members and function arguments.
      Signed-off-by: default avatarTobin C. Harding <me@tobin.cc>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      f69120ce
  6. 18 Dec, 2017 1 commit
    • Theodore Ts'o's avatar
      ext4: fix up remaining files with SPDX cleanups · f5166768
      Theodore Ts'o authored
      A number of ext4 source files were skipped due because their copyright
      permission statements didn't match the expected text used by the
      automated conversion utilities.  I've added SPDX tags for the rest.
      
      While looking at some of these files, I've noticed that we have quite
      a bit of variation on the licenses that were used --- in particular
      some of the Red Hat licenses on the jbd2 files use a GPL2+ license,
      and we have some files that have a LGPL-2.1 license (which was quite
      surprising).
      
      I've not attempted to do any license changes.  Even if it is perfectly
      legal to relicense to GPL 2.0-only for consistency's sake, that should
      be done with ext4 developer community discussion.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      
      f5166768
  7. 22 May, 2017 1 commit
    • Tahsin Erdogan's avatar
      jbd2: preserve original nofs flag during journal restart · b4709067
      Tahsin Erdogan authored
      When a transaction starts, start_this_handle() saves current
      PF_MEMALLOC_NOFS value so that it can be restored at journal stop time.
      Journal restart is a special case that calls start_this_handle() without
      stopping the transaction. start_this_handle() isn't aware that the
      original value is already stored so it overwrites it with current value.
      
      For instance, a call sequence like below leaves PF_MEMALLOC_NOFS flag set
      at the end:
      
        jbd2_journal_start()
        jbd2__journal_restart()
        jbd2_journal_stop()
      
      Make jbd2__journal_restart() restore the original value before calling
      start_this_handle().
      
      Fixes: 81378da6 ("jbd2: mark the transaction context with the scope GFP_NOFS context")
      Signed-off-by: default avatarTahsin Erdogan <tahsin@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      b4709067
  8. 16 May, 2017 2 commits
  9. 03 May, 2017 1 commit
  10. 05 Feb, 2017 1 commit
  11. 13 Oct, 2016 1 commit
  12. 22 Sep, 2016 1 commit
  13. 30 Jun, 2016 3 commits
    • Jan Kara's avatar
      jbd2: track more dependencies on transaction commit · 1eaa566d
      Jan Kara authored
      So far we were tracking only dependency on transaction commit due to
      starting a new handle (which may require commit to start a new
      transaction). Now add tracking also for other cases where we wait for
      transaction commit. This way lockdep can catch deadlocks e. g. because we
      call jbd2_journal_stop() for a synchronous handle with some locks held
      which rank below transaction start.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      1eaa566d
    • Jan Kara's avatar
      jbd2: move lockdep tracking to journal_s · ab714aff
      Jan Kara authored
      Currently lockdep map is tracked in each journal handle. To be able to
      expand lockdep support to cover also other cases where we depend on
      transaction commit and where handle is not available, move lockdep map
      into struct journal_s. Since this makes the lockdep map shared for all
      handles, we have to use rwsem_acquire_read() for acquisitions now.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      ab714aff
    • Jan Kara's avatar
      jbd2: move lockdep instrumentation for jbd2 handles · 7a4b188f
      Jan Kara authored
      The transaction the handle references is free to commit once we've
      decremented t_updates counter. Move the lockdep instrumentation to that
      place. Currently it was a bit later which did not really matter but
      subsequent improvements to lockdep instrumentation would cause false
      positives with it.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      7a4b188f
  14. 24 Apr, 2016 1 commit
    • Jan Kara's avatar
      jbd2: add support for avoiding data writes during transaction commits · 41617e1a
      Jan Kara authored
      Currently when filesystem needs to make sure data is on permanent
      storage before committing a transaction it adds inode to transaction's
      inode list. During transaction commit, jbd2 writes back all dirty
      buffers that have allocated underlying blocks and waits for the IO to
      finish. However when doing writeback for delayed allocated data, we
      allocate blocks and immediately submit the data. Thus asking jbd2 to
      write dirty pages just unnecessarily adds more work to jbd2 possibly
      writing back other redirtied blocks.
      
      Add support to jbd2 to allow filesystem to ask jbd2 to only wait for
      outstanding data writes before committing a transaction and thus avoid
      unnecessary writes.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      41617e1a
  15. 18 Apr, 2016 1 commit
  16. 04 Apr, 2016 1 commit
    • Kirill A. Shutemov's avatar
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  17. 13 Mar, 2016 1 commit
    • Michal Hocko's avatar
      jbd2: do not fail journal because of frozen_buffer allocation failure · 490c1b44
      Michal Hocko authored
      Journal transaction might fail prematurely because the frozen_buffer
      is allocated by GFP_NOFS request:
      [   72.440013] do_get_write_access: OOM for frozen_buffer
      [   72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
      [   72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory
      (...snipped....)
      [   72.495559] do_get_write_access: OOM for frozen_buffer
      [   72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
      [   72.496839] do_get_write_access: OOM for frozen_buffer
      [   72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
      [   72.505766] Aborting journal on device sda1-8.
      [   72.505851] EXT4-fs (sda1): Remounting filesystem read-only
      
      This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS
      allocations upon OOM" because small GPF_NOFS allocations never failed.
      This allocation seems essential for the journal and GFP_NOFS is too
      restrictive to the memory allocator so let's use __GFP_NOFAIL here to
      emulate the previous behavior.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      490c1b44
  18. 06 Jan, 2016 1 commit
  19. 04 Dec, 2015 1 commit
    • Junxiao Bi's avatar
      jbd2: fix null committed data return in undo_access · 087ffd4e
      Junxiao Bi authored
      introduced jbd2_write_access_granted() to improve write|undo_access
      speed, but missed to check the status of b_committed_data which caused
      a kernel panic on ocfs2.
      
      [ 6538.405938] ------------[ cut here ]------------
      [ 6538.406686] kernel BUG at fs/ocfs2/suballoc.c:2400!
      [ 6538.406686] invalid opcode: 0000 [#1] SMP
      [ 6538.406686] Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront xen_fbfront parport_pc parport pcspkr i2c_piix4 acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix cirrus ttm drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect i2c_core syscopyarea dm_mirror dm_region_hash dm_log dm_mod
      [ 6538.406686] CPU: 1 PID: 16265 Comm: mmap_truncate Not tainted 4.3.0 #1
      [ 6538.406686] Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
      [ 6538.406686] task: ffff88007c2bab00 ti: ffff880075b78000 task.ti: ffff880075b78000
      [ 6538.406686] RIP: 0010:[<ffffffffa06a286b>]  [<ffffffffa06a286b>] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2]
      [ 6538.406686] RSP: 0018:ffff880075b7b7f8  EFLAGS: 00010246
      [ 6538.406686] RAX: ffff8800760c5b40 RBX: ffff88006c06a000 RCX: ffffffffa06e6df0
      [ 6538.406686] RDX: 0000000000000000 RSI: ffff88007a6f6ea0 RDI: ffff88007a760430
      [ 6538.406686] RBP: ffff880075b7b878 R08: 0000000000000002 R09: 0000000000000001
      [ 6538.406686] R10: ffffffffa06769be R11: 0000000000000000 R12: 0000000000000001
      [ 6538.406686] R13: ffffffffa06a1750 R14: 0000000000000001 R15: ffff88007a6f6ea0
      [ 6538.406686] FS:  00007f17fde30720(0000) GS:ffff88007f040000(0000) knlGS:0000000000000000
      [ 6538.406686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6538.406686] CR2: 0000000000601730 CR3: 000000007aea0000 CR4: 00000000000406e0
      [ 6538.406686] Stack:
      [ 6538.406686]  ffff88007c2bb5b0 ffff880075b7b8e0 ffff88007a7604b0 ffff88006c640800
      [ 6538.406686]  ffff88007a7604b0 ffff880075d77390 0000000075b7b878 ffffffffa06a309d
      [ 6538.406686]  ffff880075d752d8 ffff880075b7b990 ffff880075b7b898 0000000000000000
      [ 6538.406686] Call Trace:
      [ 6538.406686]  [<ffffffffa06a309d>] ? ocfs2_read_group_descriptor+0x6d/0xa0 [ocfs2]
      [ 6538.406686]  [<ffffffffa06a3654>] _ocfs2_free_suballoc_bits+0xe4/0x320 [ocfs2]
      [ 6538.406686]  [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
      [ 6538.406686]  [<ffffffffa06a397e>] _ocfs2_free_clusters+0xee/0x210 [ocfs2]
      [ 6538.406686]  [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
      [ 6538.406686]  [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
      [ 6538.406686]  [<ffffffffa0682d50>] ? ocfs2_extend_trans+0x50/0x1a0 [ocfs2]
      [ 6538.406686]  [<ffffffffa06a3ad5>] ocfs2_free_clusters+0x15/0x20 [ocfs2]
      [ 6538.406686]  [<ffffffffa065072c>] ocfs2_replay_truncate_records+0xfc/0x290 [ocfs2]
      [ 6538.406686]  [<ffffffffa06843ac>] ? ocfs2_start_trans+0xec/0x1d0 [ocfs2]
      [ 6538.406686]  [<ffffffffa0654600>] __ocfs2_flush_truncate_log+0x140/0x2d0 [ocfs2]
      [ 6538.406686]  [<ffffffffa0654394>] ? ocfs2_reserve_blocks_for_rec_trunc.clone.0+0x44/0x170 [ocfs2]
      [ 6538.406686]  [<ffffffffa065acd4>] ocfs2_remove_btree_range+0x374/0x630 [ocfs2]
      [ 6538.406686]  [<ffffffffa017486b>] ? jbd2_journal_stop+0x25b/0x470 [jbd2]
      [ 6538.406686]  [<ffffffffa065d5b5>] ocfs2_commit_truncate+0x305/0x670 [ocfs2]
      [ 6538.406686]  [<ffffffffa0683430>] ? ocfs2_journal_access_eb+0x20/0x20 [ocfs2]
      [ 6538.406686]  [<ffffffffa067adb7>] ocfs2_truncate_file+0x297/0x380 [ocfs2]
      [ 6538.406686]  [<ffffffffa01759e4>] ? jbd2_journal_begin_ordered_truncate+0x64/0xc0 [jbd2]
      [ 6538.406686]  [<ffffffffa067c7a2>] ocfs2_setattr+0x572/0x860 [ocfs2]
      [ 6538.406686]  [<ffffffff810e4a3f>] ? current_fs_time+0x3f/0x50
      [ 6538.406686]  [<ffffffff812124b7>] notify_change+0x1d7/0x340
      [ 6538.406686]  [<ffffffff8121abf9>] ? generic_getxattr+0x79/0x80
      [ 6538.406686]  [<ffffffff811f5876>] do_truncate+0x66/0x90
      [ 6538.406686]  [<ffffffff81120e30>] ? __audit_syscall_entry+0xb0/0x110
      [ 6538.406686]  [<ffffffff811f5bb3>] do_sys_ftruncate.clone.0+0xf3/0x120
      [ 6538.406686]  [<ffffffff811f5bee>] SyS_ftruncate+0xe/0x10
      [ 6538.406686]  [<ffffffff816aa2ae>] entry_SYSCALL_64_fastpath+0x12/0x71
      [ 6538.406686] Code: 28 48 81 ee b0 04 00 00 48 8b 92 50 fb ff ff 48 8b 80 b0 03 00 00 48 39 90 88 00 00 00 0f 84 30 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
      [ 6538.406686] RIP  [<ffffffffa06a286b>] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2]
      [ 6538.406686]  RSP <ffff880075b7b7f8>
      [ 6538.691128] ---[ end trace 31cd7011d6770d7e ]---
      [ 6538.694492] Kernel panic - not syncing: Fatal exception
      [ 6538.695484] Kernel Offset: disabled
      
      Fixes: de92c8ca("jbd2: speedup jbd2_journal_get_[write|undo]_access()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      087ffd4e
  20. 24 Nov, 2015 1 commit
    • Jan Kara's avatar
      jbd2: Fix unreclaimed pages after truncate in data=journal mode · bc23f0c8
      Jan Kara authored
      Ted and Namjae have reported that truncated pages don't get timely
      reclaimed after being truncated in data=journal mode. The following test
      triggers the issue easily:
      
      for (i = 0; i < 1000; i++) {
      	pwrite(fd, buf, 1024*1024, 0);
      	fsync(fd);
      	fsync(fd);
      	ftruncate(fd, 0);
      }
      
      The reason is that journal_unmap_buffer() finds that truncated buffers
      are not journalled (jh->b_transaction == NULL), they are part of
      checkpoint list of a transaction (jh->b_cp_transaction != NULL) and have
      been already written out (!buffer_dirty(bh)). We clean such buffers but
      we leave them in the checkpoint list. Since checkpoint transaction holds
      a reference to the journal head, these buffers cannot be released until
      the checkpoint transaction is cleaned up. And at that point we don't
      call release_buffer_page() anymore so pages detached from mapping are
      lingering in the system waiting for reclaim to find them and free them.
      
      Fix the problem by removing buffers from transaction checkpoint lists
      when journal_unmap_buffer() finds out they don't have to be there
      anymore.
      Reported-and-tested-by: default avatarNamjae Jeon <namjae.jeon@samsung.com>
      Fixes: de1b7941Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      bc23f0c8
  21. 07 Nov, 2015 1 commit
    • Mel Gorman's avatar
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Mel Gorman authored
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimisitic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and was depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d0164adc
  22. 04 Aug, 2015 1 commit
    • Lukas Czerner's avatar
      jbd2: limit number of reserved credits · 6d3ec14d
      Lukas Czerner authored
      Currently there is no limitation on number of reserved credits we can
      ask for. If we ask for more reserved credits than 1/2 of maximum
      transaction size, or if total number of credits exceeds the maximum
      transaction size per operation (which is currently only possible with
      the former) we will spin forever in start_this_handle().
      
      Fix this by adding this limitation at the start of start_this_handle().
      
      This patch also removes the credit limitation 1/2 of maximum transaction
      size, since we really only want to limit the number of reserved credits.
      There is not much point to limit the credits if there is still space in
      the journal.
      
      This accidentally also fixes the online resize, where due to the
      limitation of the journal credits we're unable to grow file systems with
      1k block size and size between 16M and 32M. It has been partially fixed
      by 2c869b26, but not entirely.
      
      Thanks Jan Kara for helping me getting the correct fix.
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      6d3ec14d
  23. 12 Jul, 2015 1 commit
    • Jan Kara's avatar
      jbd2: speedup jbd2_journal_dirty_metadata() · 6e06ae88
      Jan Kara authored
      It is often the case that we mark buffer as having dirty metadata when
      the buffer is already in that state (frequent for bitmaps, inode table
      blocks, superblock). Thus it is unnecessary to contend on grabbing
      journal head reference and bh_state lock. Avoid that by checking whether
      any modification to the buffer is needed before grabbing any locks or
      references.
      
      [ Note: this is a fixed version of commit 2143c196, which was
        reverted in ebeaa8dd due to a false positive triggering of an
        assertion check. -- Ted ]
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      6e06ae88
  24. 27 Jun, 2015 1 commit
    • Linus Torvalds's avatar
      Revert "jbd2: speedup jbd2_journal_dirty_metadata()" · ebeaa8dd
      Linus Torvalds authored
      This reverts commit 2143c196.
      
      This commit seems to be the cause of the following jbd2 assertion
      failure:
      
         ------------[ cut here ]------------
         kernel BUG at fs/jbd2/transaction.c:1325!
         invalid opcode: 0000 [#1] SMP
         Modules linked in: bnep bluetooth fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 ...
         CPU: 7 PID: 5509 Comm: gcc Not tainted 4.1.0-10944-g2a298679 #1
         Hardware name:                  /DH87RL, BIOS RLH8710H.86A.0327.2014.0924.1645 09/24/2014
         task: ffff8803bf866040 ti: ffff880308528000 task.ti: ffff880308528000
         RIP: jbd2_journal_dirty_metadata+0x237/0x290
         Call Trace:
           __ext4_handle_dirty_metadata+0x43/0x1f0
           ext4_handle_dirty_dirent_node+0xde/0x160
           ? jbd2_journal_get_write_access+0x36/0x50
           ext4_delete_entry+0x112/0x160
           ? __ext4_journal_start_sb+0x52/0xb0
           ext4_unlink+0xfa/0x260
           vfs_unlink+0xec/0x190
           do_unlinkat+0x24a/0x270
           SyS_unlink+0x11/0x20
           entry_SYSCALL_64_fastpath+0x12/0x6a
         ---[ end trace ae033ebde8d080b4 ]---
      
      which is not easily reproducible (I've seen it just once, and then Ted
      was able to reproduce it once).  Revert it while Ted and Jan try to
      figure out what is wrong.
      
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ebeaa8dd
  25. 21 Jun, 2015 1 commit
    • Jan Kara's avatar
      jbd2: speedup jbd2_journal_dirty_metadata() · 2143c196
      Jan Kara authored
      It is often the case that we mark buffer as having dirty metadata when
      the buffer is already in that state (frequent for bitmaps, inode table
      blocks, superblock). Thus it is unnecessary to contend on grabbing
      journal head reference and bh_state lock. Avoid that by checking whether
      any modification to the buffer is needed before grabbing any locks or
      references.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      2143c196
  26. 08 Jun, 2015 5 commits
    • Jan Kara's avatar
      jbd2: speedup jbd2_journal_get_[write|undo]_access() · de92c8ca
      Jan Kara authored
      jbd2_journal_get_write_access() and jbd2_journal_get_create_access() are
      frequently called for buffers that are already part of the running
      transaction - most frequently it is the case for bitmaps, inode table
      blocks, and superblock. Since in such cases we have nothing to do, it is
      unfortunate we still grab reference to journal head, lock the bh, lock
      bh_state only to find out there's nothing to do.
      
      Improving this is a bit subtle though since until we find out journal
      head is attached to the running transaction, it can disappear from under
      us because checkpointing / commit decided it's no longer needed. We deal
      with this by protecting journal_head slab with RCU. We still have to be
      careful about journal head being freed & reallocated within slab and
      about exposing journal head in consistent state (in particular
      b_modified and b_frozen_data must be in correct state before we allow
      user to touch the buffer).
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      de92c8ca
    • Jan Kara's avatar
      jbd2: more simplifications in do_get_write_access() · 8b00f400
      Jan Kara authored
      Check for the simple case of unjournaled buffer first, handle it and
      bail out. This allows us to remove one if and unindent the difficult case
      by one tab. The result is easier to read.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      8b00f400
    • Jan Kara's avatar
      jbd2: simplify error path on allocation failure in do_get_write_access() · d012aa59
      Jan Kara authored
      We were acquiring bh_state_lock when allocation of buffer failed in
      do_get_write_access() only to be able to jump to a label that releases
      the lock and does all other checks that don't make sense for this error
      path. Just jump into the right label instead.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      d012aa59
    • Jan Kara's avatar
      jbd2: simplify code flow in do_get_write_access() · ee57aba1
      Jan Kara authored
      needs_copy is set only in one place in do_get_write_access(), just move
      the frozen buffer copying into that place and factor it out to a
      separate function to make do_get_write_access() slightly more readable.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      ee57aba1
    • Michal Hocko's avatar
      jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL · 6ccaf3e2
      Michal Hocko authored
      This basically reverts 47def826 (jbd2: Remove __GFP_NOFAIL from jbd2
      layer). The deprecation of __GFP_NOFAIL was a bad choice because it led
      to open coding the endless loop around the allocator rather than
      removing the dependency on the non failing allocation. So the
      deprecation was a clear failure and the reality tells us that
      __GFP_NOFAIL is not even close to go away.
      
      It is still true that __GFP_NOFAIL allocations are generally discouraged
      and new uses should be evaluated and an alternative (pre-allocations or
      reservations) should be considered but it doesn't make any sense to lie
      the allocator about the requirements. Allocator can take steps to help
      making a progress if it knows the requirements.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      6ccaf3e2
  27. 14 May, 2015 1 commit
    • Lukas Czerner's avatar
      ext4: fix NULL pointer dereference when journal restart fails · 9d506594
      Lukas Czerner authored
      Currently when journal restart fails, we'll have the h_transaction of
      the handle set to NULL to indicate that the handle has been effectively
      aborted. We handle this situation quietly in the jbd2_journal_stop() and just
      free the handle and exit because everything else has been done before we
      attempted (and failed) to restart the journal.
      
      Unfortunately there are a number of problems with that approach
      introduced with commit
      
      41a5b913 "jbd2: invalidate handle if jbd2_journal_restart()
      fails"
      
      First of all in ext4 jbd2_journal_stop() will be called through
      __ext4_journal_stop() where we would try to get a hold of the superblock
      by dereferencing h_transaction which in this case would lead to NULL
      pointer dereference and crash.
      
      In addition we're going to free the handle regardless of the refcount
      which is bad as well, because others up the call chain will still
      reference the handle so we might potentially reference already freed
      memory.
      
      Moreover it's expected that we'll get aborted handle as well as detached
      handle in some of the journalling function as the error propagates up
      the stack, so it's unnecessary to call WARN_ON every time we get
      detached handle.
      
      And finally we might leak some memory by forgetting to free reserved
      handle in jbd2_journal_stop() in the case where handle was detached from
      the transaction (h_transaction is NULL).
      
      Fix the NULL pointer dereference in __ext4_journal_stop() by just
      calling jbd2_journal_stop() quietly as suggested by Jan Kara. Also fix
      the potential memory leak in jbd2_journal_stop() and use proper
      handle refcounting before we attempt to free it to avoid use-after-free
      issues.
      
      And finally remove all WARN_ON(!transaction) from the code so that we do
      not get random traces when something goes wrong because when journal
      restart fails we will get to some of those functions.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      9d506594
  28. 16 Jul, 2014 1 commit
    • NeilBrown's avatar
      sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown authored
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible"
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack.  So
      the distinction will still be visible, only with different
      function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since first version of this patch (against 3.15) two new action
      functions appeared, on in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brownSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      74316201
  29. 05 Jul, 2014 1 commit
  30. 12 Mar, 2014 1 commit
  31. 09 Mar, 2014 2 commits