1. 28 Aug, 2013 1 commit
  2. 01 Jul, 2013 3 commits
    • jbd2: invalidate handle if jbd2_journal_restart() fails · 41a5b913
      Theodore Ts'o authored
      If jbd2_journal_restart() fails the handle will have been disconnected
      from the current transaction.  In this situation, the handle must not
      be used for any jbd2 function other than jbd2_journal_stop().
      Enforce this by treating a handle which has a NULL transaction
      pointer as an aborted handle, and issue a kernel warning if
      jbd2_journal_extend(), jbd2_journal_get_write_access(),
      jbd2_journal_dirty_metadata(), etc. is called with an invalid handle.
      
      This commit also fixes a bug where jbd2_journal_stop() would trip over
      a kernel jbd2 assertion check when trying to free an invalid handle.
      
      Also move the responsibility of setting current->journal_info to
      start_this_handle(), simplifying the three users of this function.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reported-by: Younger Liu <younger.liu@huawei.com>
      Cc: Jan Kara <jack@suse.cz>
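      A minimal userspace sketch of the rule described above, assuming
      simplified stand-in types: struct model_handle, model_get_write_access()
      and model_journal_stop() are hypothetical names, not the jbd2 API.
      A handle whose transaction pointer is NULL counts as aborted, every
      operation except stop refuses it with a warning, and stop only
      releases it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for transaction_t / handle_t. */
struct model_transaction { unsigned int t_tid; };

struct model_handle {
    struct model_transaction *h_transaction; /* NULL once detached */
    bool h_aborted;
};

/* A detached handle (NULL transaction) is treated as aborted. */
static bool model_handle_aborted(const struct model_handle *h)
{
    return h->h_aborted || h->h_transaction == NULL;
}

/* Simulates a failed restart: the handle loses its transaction. */
static void model_restart_failed(struct model_handle *h)
{
    h->h_transaction = NULL;
}

/* Any operation other than stop refuses an invalid handle and warns. */
static int model_get_write_access(struct model_handle *h)
{
    if (model_handle_aborted(h)) {
        if (h->h_transaction == NULL)
            fprintf(stderr, "WARNING: handle used after failed restart\n");
        return -EROFS;
    }
    return 0;
}

/* Stop tolerates a detached handle instead of asserting on it. */
static int model_journal_stop(struct model_handle *h)
{
    if (h->h_transaction == NULL)
        return 0; /* nothing to unwind in this model; just release it */
    h->h_transaction = NULL;
    return 0;
}
```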
    • jbd2: fix theoretical race in jbd2__journal_restart · 39c04153
      Theodore Ts'o authored
      Once we decrement transaction->t_updates, if this is the last handle
      holding the transaction from closing, and once we release the
      t_handle_lock spinlock, it's possible for the transaction to commit
      and be released.  In practice with normal kernels, this probably won't
      happen, since the commit happens in a separate kernel thread and it's
      unlikely this could all happen within the space of a few CPU cycles.
      
      On the other hand, with a real-time kernel, this could potentially
      happen, so save the tid found in transaction->t_tid before we release
      t_handle_lock.  It would require an insane configuration, such as one
      where the jbd2 thread was set to a very high real-time priority,
      perhaps because a high priority real-time thread is trying to read or
      write to a file system.  But some people who use real-time kernels
      have been known to do insane things, including controlling
      laser-wielding industrial robots.  :-)
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
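      The pattern behind the fix, sketched in userspace under stated
      assumptions: struct model_transaction and model_drop_update() are
      hypothetical, and a C11 atomic_flag stands in for the t_handle_lock
      spinlock.  The tid is copied while this handle still holds its
      t_updates reference, and only the copy is used after the unlock.

```c
#include <stdatomic.h>

/* Simplified stand-in for transaction_t; the flag models t_handle_lock. */
struct model_transaction {
    atomic_flag t_handle_lock;
    unsigned int t_tid;
    int t_updates;
};

static void model_spin_lock(atomic_flag *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ; /* busy-wait, as a spinlock would */
}

static void model_spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Drop this handle's reference and return the tid to commit.  After the
 * unlock, *t may already have been committed and freed by another thread,
 * so only the saved copy of the tid is used from then on.
 */
static unsigned int model_drop_update(struct model_transaction *t)
{
    model_spin_lock(&t->t_handle_lock);
    unsigned int tid = t->t_tid; /* snapshot while still pinned */
    t->t_updates--;              /* may have been the last pin */
    model_spin_unlock(&t->t_handle_lock);
    return tid;                  /* never re-read t->t_tid here */
}
```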
    • jbd2: move superblock checksum calculation to jbd2_write_superblock() · fe52d17c
      Theodore Ts'o authored
      Some of the functions which modify the jbd2 superblock were not
      updating the checksum before calling jbd2_write_superblock().  Move
      the call to jbd2_superblock_csum_set() to jbd2_write_superblock(), so
      that the checksum is calculated consistently.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: stable@vger.kernel.org
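      A sketch of why centralising the checksum helps: if the single write
      routine always recomputes it, no caller can forget.  The struct layout,
      model_sb_csum() and model_write_superblock() are hypothetical, and
      FNV-1a stands in for the crc32c that jbd2 actually uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for journal_superblock_t (no padding). */
struct model_sb {
    uint32_t s_sequence;
    uint32_t s_start;
    uint32_t s_checksum;
};

/* Hypothetical checksum (FNV-1a) over the block with s_checksum zeroed. */
static uint32_t model_sb_csum(const struct model_sb *sb)
{
    struct model_sb tmp = *sb;
    tmp.s_checksum = 0;
    const unsigned char *p = (const unsigned char *)&tmp;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof tmp; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* The only writer: the checksum is refreshed on every write. */
static void model_write_superblock(struct model_sb *sb, struct model_sb *disk)
{
    sb->s_checksum = model_sb_csum(sb);
    memcpy(disk, sb, sizeof *sb);
}
```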
  3. 13 Jun, 2013 6 commits
    • jbd2: remove debug dependency on debug_fs and update Kconfig help text · 75497d06
      Paul Gortmaker authored
      Commit b6e96d00 ("jbd2: use module parameters instead of debugfs
      for jbd_debug") removed any need for a dependency on DEBUG_FS.  It
      also moved the /sys variables out from underneath the typical debugfs
      mount point.  Delete the dependency and update the /sys path to where
      the debug settings are currently.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: use a single printk for jbd_debug() · 169f1a2a
      Paul Gortmaker authored
      Since jbd_debug() is implemented with two separate printk()
      calls, it can lead to corrupted and misleading debug output like
      the following (see lines marked with "*"):
      
      [  290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
      [  290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
      [  290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
      [* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
      [* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
      [* 290.339382] JBD2: starting commit of transaction 42104
      [  290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
      [  290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079
      
      i.e. the debug output from log_wait_commit and journal_commit_transaction
      have become interleaved.  The output should have been:
      
      (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
      (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104
      
      It is expected that this is not easy to replicate -- I was only able
      to cause it on preempt-rt kernels, and even then only under heavy
      I/O load.
      Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Suggested-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
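      The fix is to emit the location prefix and the message in one call.
      In the kernel, printk's %pV specifier with a struct va_format is one
      way to do that; the userspace sketch below shows the same idea by
      formatting everything into one buffer first.  model_debug_format() is
      a hypothetical helper.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "(file, line): func: message" into one buffer so the caller can
 * emit it with a single output call.  Returns the length snprintf() would
 * have produced, or a negative value on error.
 */
static int model_debug_format(char *out, size_t len, const char *file,
                              int line, const char *func, const char *fmt, ...)
{
    int n = snprintf(out, len, "(%s, %d): %s: ", file, line, func);
    if (n < 0 || (size_t)n >= len)
        return n;
    va_list ap;
    va_start(ap, fmt);
    int m = vsnprintf(out + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    return m < 0 ? m : n + m;
}
```

      The caller then writes the finished buffer once (for example with a
      single fputs()), so another thread cannot interleave between prefix
      and message.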
    • jbd2: fix duplicate debug label for phase 2 · cfc7bc89
      Paul Gortmaker authored
      Currently we see this output:
      
        $git grep phase fs/jbd2
        fs/jbd2/commit.c:       jbd_debug(3, "JBD2: commit phase 1\n");
        fs/jbd2/commit.c:       jbd_debug(3, "JBD2: commit phase 2\n");
        fs/jbd2/commit.c:       jbd_debug(3, "JBD2: commit phase 2\n");
        fs/jbd2/commit.c:       jbd_debug(3, "JBD2: commit phase 3\n");
        fs/jbd2/commit.c:       jbd_debug(3, "JBD2: commit phase 4\n");
        [...]
      
      There is clearly a duplicate label for phase 2, and they are
      both active (i.e. not in #if ... #else block).  Rename them to
      be "2a" and "2b" so the debug output is unambiguous.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: drop checkpoint mutex when waiting in __jbd2_log_wait_for_space() · 0ef54180
      Paul Gortmaker authored
      While trying to debug an issue under extreme I/O load
      on preempt-rt kernels, the following backtrace was observed
      via SysRQ output:
      
      rm              D ffff8802203afbc0  4600  4878   4748 0x00000000
       ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
       ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
       ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
      Call Trace:
       [<ffffffff8172dc34>] schedule+0x24/0x70
       [<ffffffff81225b5d>] jbd2_log_wait_commit+0xbd/0x140
       [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
       [<ffffffff81223635>] jbd2_log_do_checkpoint+0xf5/0x520
       [<ffffffff81223b09>] __jbd2_log_wait_for_space+0xa9/0x1f0
       [<ffffffff8121dc40>] start_this_handle.isra.10+0x2e0/0x530
       [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
       [<ffffffff8121e0a3>] jbd2__journal_start+0xc3/0x110
       [<ffffffff811de7ce>] ? ext4_rmdir+0x6e/0x230
       [<ffffffff8121e0fe>] jbd2_journal_start+0xe/0x10
       [<ffffffff811f308b>] ext4_journal_start_sb+0x5b/0x160
       [<ffffffff811de7ce>] ext4_rmdir+0x6e/0x230
       [<ffffffff811435c5>] vfs_rmdir+0xd5/0x140
       [<ffffffff8114370f>] do_rmdir+0xdf/0x120
       [<ffffffff8105c6b4>] ? task_work_run+0x44/0x80
       [<ffffffff81002889>] ? do_notify_resume+0x89/0x100
       [<ffffffff817361ae>] ? int_signal+0x12/0x17
       [<ffffffff81145d85>] sys_unlinkat+0x25/0x40
       [<ffffffff81735f22>] system_call_fastpath+0x16/0x1b
      
      What is interesting here is that we call log_wait_commit from
      within wait_for_space while still holding the checkpoint_mutex,
      which surrounds most of wait_for_space.  Then, while we are
      waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
      bit is set, it will also try to take the same checkpoint_mutex.
      
      It seems that we need to drop the checkpoint_mutex while sitting in
      jbd2_log_wait_commit, if we want to guarantee that progress can be made
      by jbd2_journal_commit_transaction().  There does not seem to be
      anything preempt-rt specific about this, other than perhaps increasing
      the odds of it happening.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: relocate assert after state lock in journal_commit_transaction() · 3ca841c1
      Paul Gortmaker authored
      The state lock is taken after we assert on the state value, not
      before, so we might in fact be asserting on a transient value.
      Ensure the state check is within the scope of the state lock being
      taken.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: optimize jbd2_journal_force_commit · 9ff86446
      Dmitry Monakhov authored
      The current implementation of jbd2_journal_force_commit() is suboptimal
      because it results in empty and useless commits, while callers just want
      to force any unfinished commits and wait for them.  We already have
      jbd2_journal_force_commit_nested(), which does exactly what we want,
      except that here we are guaranteed not to hold a journal transaction open.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  4. 04 Jun, 2013 8 commits
    • jbd2: transaction reservation support · 8f7d89f3
      Jan Kara authored
      In some cases we cannot start a transaction because of locking
      constraints, and passing a started transaction into those places is
      not handy either because we could block transaction commit for too
      long.  Transaction reservation is designed to solve these issues.  It
      reserves a handle with a given number of credits in the journal, and
      the handle can later be attached to the running transaction without
      blocking on commit or checkpointing.  Reserved handles do not block
      transaction commit in any way; they only reduce the maximum size of
      the running transaction (because we always have to be prepared to
      accommodate a request to attach a reserved handle).
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
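      A toy model of the credit accounting described above, assuming
      hypothetical names (struct model_journal, model_reserve(),
      model_start(), model_attach_reserved()); it is not the jbd2 code.
      Reserved credits shrink the room left for ordinary handles, so that
      attaching a reservation later can never overflow the transaction.

```c
#include <stdbool.h>

/* Hypothetical credit accounting; names are illustrative, not jbd2's. */
struct model_journal {
    int max_transaction_credits; /* cap on a running transaction */
    int outstanding_credits;     /* credits held by started handles */
    int reserved_credits;        /* credits held by reserved handles */
};

/* In this model, reserving only books credits against the cap. */
static bool model_reserve(struct model_journal *j, int credits)
{
    if (j->outstanding_credits + j->reserved_credits + credits >
        j->max_transaction_credits)
        return false;
    j->reserved_credits += credits;
    return true;
}

/* A normal handle must leave room for every reservation to be attached. */
static bool model_start(struct model_journal *j, int credits)
{
    if (j->outstanding_credits + j->reserved_credits + credits >
        j->max_transaction_credits)
        return false; /* a real implementation would typically block here */
    j->outstanding_credits += credits;
    return true;
}

/* Attaching converts reserved credits into outstanding ones, no check needed. */
static void model_attach_reserved(struct model_journal *j, int credits)
{
    j->reserved_credits -= credits;
    j->outstanding_credits += credits;
}
```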
    • jbd2: remove unused waitqueues · f29fad72
      Jan Kara authored
      j_wait_logspace and j_wait_checkpoint are unused.  Remove them.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: fix race in t_outstanding_credits update in jbd2_journal_extend() · fe1e8db5
      Jan Kara authored
      jbd2_journal_extend() first checked whether the transaction could
      accept extending the handle with more credits and then added the
      credits to t_outstanding_credits.  This can race with
      start_this_handle() adding another handle to the transaction, thus
      overbooking it.  Make jbd2_journal_extend() use atomic_add_return() to
      close the race.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
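      The "add first, then check" pattern can be sketched with C11 atomics;
      atomic_fetch_add() plus the increment plays the role of the kernel's
      atomic_add_return().  model_extend() is a hypothetical helper.  A
      separate check followed by an add would let two threads both pass
      the check and overbook the transaction.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* outstanding stands in for t_outstanding_credits, shared by handles. */
static bool model_extend(atomic_int *outstanding, int nblocks, int max_credits)
{
    /* Add first, then check the resulting total in one atomic step. */
    int wanted = atomic_fetch_add(outstanding, nblocks) + nblocks;
    if (wanted > max_credits) {
        atomic_fetch_sub(outstanding, nblocks); /* undo: extension refused */
        return false;
    }
    return true;
}
```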
    • jbd2: cleanup needed free block estimates when starting a transaction · 76c39904
      Jan Kara authored
      __jbd2_log_space_left() and jbd_space_needed() were kind of odd.
      jbd_space_needed() also accounted for credits needed by the currently
      committing transaction, while it didn't account for credits needed for
      control blocks.  __jbd2_log_space_left() then accounted for control
      blocks as a fraction of free space.  Since the results of these two
      functions are only ever compared against each other, this works
      correctly but is somewhat strange.  Move the estimates so that
      jbd_space_needed() returns the number of blocks needed for a
      transaction, including control blocks, and __jbd2_log_space_left()
      returns the free space in the journal (with the committing transaction
      already subtracted).  Rename the functions to jbd2_log_space_left() and
      jbd2_space_needed() while we are changing them.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
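      A sketch of the new split, under assumptions: the shift of 5 (one
      control block per 32 blocks) and the 32-block safety margin are
      illustrative values, and both functions are hypothetical stand-ins.
      "Needed" includes control blocks, "left" already subtracts the
      committing transaction, and a transaction may start only if needed
      fits in left.

```c
/* Assumed ratio: one control block budgeted per 32 journal blocks. */
#define MODEL_CONTROL_BLOCKS_SHIFT 5

/* Blocks a full transaction may need, control blocks included. */
static long model_space_needed(long max_transaction_buffers)
{
    return max_transaction_buffers +
           (max_transaction_buffers >> MODEL_CONTROL_BLOCKS_SHIFT);
}

/* Free journal blocks with the committing transaction already subtracted. */
static long model_space_left(long j_free, long committing_credits)
{
    long free_blocks = j_free - 32; /* assumed small safety margin */
    free_blocks -= committing_credits +
                   (committing_credits >> MODEL_CONTROL_BLOCKS_SHIFT);
    return free_blocks > 0 ? free_blocks : 0;
}
```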
    • jbd2: remove outdated comment · 2f387f84
      Jan Kara authored
      The comment about credit estimates isn't true anymore. We do what the
      comment describes now.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: refine waiting for shadow buffers · b34090e5
      Jan Kara authored
      Currently when we add a buffer to a transaction, we wait until the
      buffer is removed from BJ_Shadow list (so that we prevent any changes
      to the buffer that is just written to the journal).  This can take
      unnecessarily long as a lot happens between the time the buffer is
      submitted to the journal and the time when we remove the buffer from
      BJ_Shadow list (e.g. we wait for all data buffers in the
      transaction, we issue a cache flush, etc.).  Also, this creates a
      dependency of do_get_write_access() on transaction commit (namely
      waiting for data IO to complete) which we want to avoid when
      implementing transaction reservation.
      
      So we modify the commit code to set a new BH_Shadow flag when a
      temporary shadowing buffer is created, and we clear that flag once IO
      on that buffer is complete.  This allows do_get_write_access() to wait only
      for BH_Shadow bit and thus removes the dependency on data IO
      completion.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
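      A sketch of the flag logic only (sleeping and wakeups are omitted):
      commit sets a shadow bit when the temporary copy goes to the log and
      clears it when that IO completes, so the writer checks one bit rather
      than the progress of the whole commit.  struct model_bh and the
      helpers are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical flag bit on a simplified buffer state word. */
enum { MODEL_BH_SHADOW = 1u << 0 };

struct model_bh { unsigned int b_state; };

/* Commit: a temporary shadow copy of the buffer is being written to the log. */
static void model_start_shadow_io(struct model_bh *bh)
{
    bh->b_state |= MODEL_BH_SHADOW;
}

/* Log IO on the shadow copy finished: the original may be modified again. */
static void model_end_shadow_io(struct model_bh *bh)
{
    bh->b_state &= ~(unsigned int)MODEL_BH_SHADOW;
}

/* The writer only needs this bit, not data IO or cache-flush completion. */
static bool model_must_wait_for_shadow(const struct model_bh *bh)
{
    return (bh->b_state & MODEL_BH_SHADOW) != 0;
}
```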
    • jbd2: remove journal_head from descriptor buffers · e5a120ae
      Jan Kara authored
      As with metadata buffers, log descriptor buffers don't really need
      the journal head either.  So strip it and remove the BJ_LogCtl list.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: don't create journal_head for temporary journal buffers · f5113eff
      Jan Kara authored
      When writing metadata to the journal, we create temporary buffer heads
      for that task.  We also attach journal heads to these buffer heads but
      the only purpose of the journal heads is to keep buffers linked in
      transaction's BJ_IO list.  We remove the need for journal heads by
      reusing buffer_head's b_assoc_buffers list for that purpose.  Also,
      since the BJ_IO list is just a temporary list for transaction commit,
      we use a private list in jbd2_journal_commit_transaction() instead,
      thus removing the BJ_IO list from the transaction completely.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  5. 28 May, 2013 2 commits
  6. 22 May, 2013 1 commit
  7. 29 Apr, 2013 1 commit
  8. 21 Apr, 2013 1 commit
    • jbd2: trace when lock_buffer in do_get_write_access takes a long time · f783f091
      Theodore Ts'o authored
      While investigating interactivity problems, it was clear that processes
      sometimes stall for long periods of time if an attempt is made to
      lock a buffer which is undergoing writeback.  They would stall with
      a stack trace looking something like:
      
      [<ffffffff811a39de>] __lock_buffer+0x2e/0x30
      [<ffffffff8123a60f>] do_get_write_access+0x43f/0x4b0
      [<ffffffff8123a7cb>] jbd2_journal_get_write_access+0x2b/0x50
      [<ffffffff81220f79>] __ext4_journal_get_write_access+0x39/0x80
      [<ffffffff811f3198>] ext4_reserve_inode_write+0x78/0xa0
      [<ffffffff811f3209>] ext4_mark_inode_dirty+0x49/0x220
      [<ffffffff811f57d1>] ext4_dirty_inode+0x41/0x60
      [<ffffffff8119ac3e>] __mark_inode_dirty+0x4e/0x2d0
      [<ffffffff8118b9b9>] update_time+0x79/0xc0
      [<ffffffff8118ba98>] file_update_time+0x98/0x100
      [<ffffffff81110ffc>] __generic_file_aio_write+0x17c/0x3b0
      [<ffffffff811112aa>] generic_file_aio_write+0x7a/0xf0
      [<ffffffff811ea853>] ext4_file_write+0x83/0xd0
      [<ffffffff81172b23>] do_sync_write+0xa3/0xe0
      [<ffffffff811731ae>] vfs_write+0xae/0x180
      [<ffffffff8117361d>] sys_write+0x4d/0x90
      [<ffffffff8159d62d>] system_call_fastpath+0x1a/0x1f
      [<ffffffffffffffff>] 0xffffffffffffffff
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  9. 19 Apr, 2013 1 commit
  10. 09 Apr, 2013 1 commit
    • procfs: new helper - PDE_DATA(inode) · d9dda78b
      Al Viro authored
      The only part of proc_dir_entry the code outside of fs/proc
      really cares about is PDE(inode)->data.  Provide a helper
      for that; static inline for now, eventually will be moved
      to fs/proc, along with the knowledge of struct proc_dir_entry
      layout.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  11. 04 Apr, 2013 2 commits
    • jbd2: fix race between jbd2_journal_remove_checkpoint and ->j_commit_callback · 794446c6
      Dmitry Monakhov authored
      The following race is possible:
      
      [kjournald2]                              other_task
      jbd2_journal_commit_transaction()
        j_state = T_FINISHED;
        spin_unlock(&journal->j_list_lock);
                                               ->jbd2_journal_remove_checkpoint()
                                                 ->jbd2_journal_free_transaction();
                                                   ->kmem_cache_free(transaction)
        ->j_commit_callback(journal, transaction);
          -> USE_AFTER_FREE
      
      WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
      Hardware name:
      list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
      Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
      Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
      Call Trace:
       [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
       [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
       [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
       [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
       [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
       [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
       [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
       [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
       [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
       [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
       [<ffffffff810ac6be>] kthread+0x10e/0x120
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
      
      To demonstrate this issue, mount ext4 with the -o discard option
      on an SSD.  This makes the callback longer, so the race window
      becomes wider.
      
      To fix this, we should mark the transaction as finished only after
      the callbacks have completed.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
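      A single-threaded model of the ordering fix, with hypothetical names
      throughout: the racing remove_checkpoint task is simulated inside the
      callback by model_try_free(), which frees the transaction only once it
      is marked finished.  Because the state is published after the
      callback returns, the callback never sees a freed transaction.

```c
#include <stdbool.h>

enum model_state { MODEL_T_COMMIT, MODEL_T_FINISHED };

struct model_transaction {
    enum model_state t_state;
    bool freed;
};

typedef void (*model_commit_cb)(struct model_transaction *t);

/* Another task may free the transaction as soon as it sees T_FINISHED. */
static void model_try_free(struct model_transaction *t)
{
    if (t->t_state == MODEL_T_FINISHED)
        t->freed = true;
}

/* Fixed order: run the callback first, publish T_FINISHED afterwards. */
static void model_finish_commit(struct model_transaction *t, model_commit_cb cb)
{
    if (cb)
        cb(t); /* the transaction is still guaranteed to be alive here */
    t->t_state = MODEL_T_FINISHED;
}

/* Demo callback: simulates the racing free, then records what it saw. */
static bool model_cb_saw_freed;
static void model_demo_callback(struct model_transaction *t)
{
    model_try_free(t);
    model_cb_saw_freed = t->freed; /* true would mean a use-after-free */
}
```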
    • ext4/jbd2: don't wait (forever) for stale tid caused by wraparound · d76a3a77
      Theodore Ts'o authored
      In the case where an inode has a very stale transaction id (tid) in
      i_datasync_tid or i_sync_tid, it's possible that after a very large
      (2**31) number of transactions the tid number space might wrap,
      causing tid_geq()'s calculations to fail.
      
      Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
      by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
      attempted to fix this problem, but it only avoided kjournald spinning
      forever by fixing the logic in jbd2_log_start_commit().
      
      Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
      that might call jbd2_log_start_commit() with a stale tid, those
      functions will subsequently call jbd2_log_wait_commit() with the same
      stale tid, and then wait for a very long time.  To fix this, we
      replace the calls to jbd2_log_start_commit() and
      jbd2_log_wait_commit() with a call to a new function,
      jbd2_complete_transaction(), which will correctly handle stale tids.
      
      As a bonus, jbd2_complete_transaction() will avoid locking
      j_state_lock for writing unless a commit needs to be started.  This
      should have a small (but probably not measurable) improvement for
      ext4's scalability.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Reported-by: George Barnett <gbarnett@atlassian.com>
      Cc: stable@vger.kernel.org
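      A sketch of both halves of the problem, with hypothetical names.
      model_tid_geq() follows the usual signed-difference idea for
      wraparound-aware tid comparison, and shows how a tid more than 2**31
      behind looks like it is still in the future.  model_needs_wait()
      sketches the decision described for jbd2_complete_transaction(): only
      a tid that is running or committing is waited for; any other tid is
      treated as already committed, however stale it is.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t model_tid_t;

/* Wraparound-aware "x is at or after y" (signed difference of the tids). */
static bool model_tid_geq(model_tid_t x, model_tid_t y)
{
    int32_t difference = (int32_t)(x - y); /* two's complement wrap assumed */
    return difference >= 0;
}

/* Wait only if tid is the running or the committing transaction. */
static bool model_needs_wait(model_tid_t tid,
                             bool has_running, model_tid_t running_tid,
                             bool has_committing, model_tid_t committing_tid)
{
    if (has_running && running_tid == tid)
        return true; /* start its commit, then wait */
    if (has_committing && committing_tid == tid)
        return true; /* already committing: just wait */
    return false;    /* older (or stale) tid: nothing to wait for */
}
```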
  12. 11 Mar, 2013 1 commit
    • jbd2: fix use after free in jbd2_journal_dirty_metadata() · ad56edad
      Jan Kara authored
      jbd2_journal_dirty_metadata() didn't get a reference to the journal_head
      it was working with.  This is OK in most cases since the journal head
      should be attached to a transaction, but on rare occasions when we are
      journalling data, __ext4_journalled_writepage() can race with
      jbd2_journal_invalidatepage() stripping buffers from a page, and thus
      the journal head can be freed from under jbd2_journal_dirty_metadata().
      
      Fix the problem by getting our own journal head reference in
      jbd2_journal_dirty_metadata() (and also in jbd2_journal_set_triggers(),
      which can possibly have the same issue).
      Reported-by: Zheng Liu <gnehzuil.liu@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  13. 02 Mar, 2013 1 commit
    • jbd2: fix ERR_PTR dereference in jbd2__journal_start · df05c1b8
      Dmitry Monakhov authored
      If start_this_handle() fails, the handle will be initialized
      to ERR_PTR() and cannot be dereferenced.
      
      paging request at fffffffffffffff6
      IP: [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290
      PGD 200e067 PUD 200f067 PMD 0
      Oops: 0000 [#1] SMP
      Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
      CPU 0 journal commit I/O error
      
      Pid: 2694, comm: fio Not tainted 3.8.0-rc3+ #79                  /DQ67SW
      RIP: 0010:[<ffffffff813c073f>]  [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290
      RSP: 0018:ffff880233b8ba58  EFLAGS: 00010292
      RAX: 00000000ffffffe2 RBX: ffffffffffffffe2 RCX: 0000000000000006
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82128f48
      RBP: ffff880233b8ba98 R08: 0000000000000000 R09: ffff88021440a6e0
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
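      A userspace re-creation of the ERR_PTR convention, to show why the
      pointer must be checked before any field is touched.  The model_*
      names are hypothetical; the 4095 limit mirrors MAX_ERRNO in
      <linux/err.h>.  model_start() pretends start_this_handle() failed
      with -EROFS.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long model_ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }

/* Error pointers occupy the top MAX_ERRNO addresses, never valid objects. */
static inline bool model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_handle { int h_buffer_credits; };

/* Hypothetical: returns an error pointer when asked to fail. */
static struct model_handle *model_start(bool fail, struct model_handle *storage)
{
    return fail ? model_err_ptr(-EROFS) : storage;
}
```

      A caller must test model_is_err() and return model_ptr_err() before
      dereferencing, which is the step the fix adds.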
  14. 09 Feb, 2013 1 commit
    • jbd2: use module parameters instead of debugfs for jbd_debug · b6e96d00
      Theodore Ts'o authored
      There are multiple reasons to move away from debugfs.  First of all,
      we are only using it for a single parameter, it is much more
      complicated to set up (some 30 lines of code compared to 3), and it is
      one more thing that might fail while loading the jbd2 module.
      
      Secondly, as a module parameter it can be specified as a boot option if
      jbd2 is built into the kernel, or as a parameter when the module is
      loaded, and it can also be manipulated dynamically under
      /sys/module/jbd2/parameters/jbd2_debug.  So it is more flexible.
      
      Ultimately we want to move away from using jbd_debug() towards
      tracepoints, but for now this is still a useful simplification of the
      code base.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  15. 08 Feb, 2013 1 commit
  16. 07 Feb, 2013 1 commit
    • jbd2: track request delay statistics · 9fff24aa
      Theodore Ts'o authored
      Track the delay between when we first request that the commit begin
      and when it actually begins, so we can see how much of a gap exists.
      In theory, this should just be the remaining scheduling quantum of
      the thread which requested the commit (assuming it was not a
      synchronous operation which triggered the commit request) plus
      scheduling overhead; however, it's possible that real-time processes
      might prevent the kjournald thread from executing.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  17. 30 Jan, 2013 1 commit
    • jbd2: don't wake kjournald unnecessarily · e7b04ac0
      Eric Sandeen authored
      Don't send an extra wakeup to kjournald in the case where we
      already have the proper target in j_commit_request, i.e. that
      transaction has already been requested for commit.
      
      commit deeeaf13 "jbd2: fix fsync() tid wraparound bug" changed
      the logic leading to a wakeup, but it caused some extra wakeups
      which were found to lead to a measurable performance regression.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      [tytso@mit.edu: reworked check to make it clearer]
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
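      A simplified model of the start-commit check, not a transcription of
      the kernel function: struct model_journal and model_log_start_commit()
      are hypothetical, and "wakeups" merely counts simulated kjournald
      wakeups.  If the target tid is already recorded in j_commit_request,
      the function returns without waking anyone.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t model_tid_t;

struct model_journal {
    model_tid_t j_commit_request; /* tid most recently requested for commit */
    bool has_running;
    model_tid_t running_tid;
    int wakeups;                  /* counts simulated kjournald2 wakeups */
};

/* Returns true only when a new commit request was actually recorded. */
static bool model_log_start_commit(struct model_journal *j, model_tid_t target)
{
    if (j->j_commit_request == target)
        return false;         /* already requested: skip the extra wakeup */
    if (j->has_running && j->running_tid == target) {
        j->j_commit_request = target;
        j->wakeups++;         /* stand-in for waking the commit thread */
        return true;
    }
    return false;             /* an old tid: nothing left to commit */
}
```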
  18. 25 Dec, 2012 1 commit
    • ext4: fix deadlock in journal_unmap_buffer() · 53e87268
      Jan Kara authored
      We cannot wait for transaction commit in journal_unmap_buffer()
      because we hold the page lock, which ranks below transaction start.  We
      solve the issue by bailing out of journal_unmap_buffer() and
      jbd2_journal_invalidatepage() with -EBUSY.  The caller is then
      responsible for waiting for the transaction commit to finish and
      retrying the invalidation.  Since the issue can happen only for a page
      straddling i_size, it is simple enough to manually call
      jbd2_journal_invalidatepage() for such a page from ext4_setattr(),
      check the return value and wait if necessary.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  19. 21 Dec, 2012 1 commit
    • jbd2: fix assertion failure in jbd2_journal_flush() · d7961c7f
      Jan Kara authored
      The following race is possible between start_this_handle() and someone
      calling jbd2_journal_flush().
      
      Process A                              Process B
      start_this_handle().
        if (journal->j_barrier_count) # false
        if (!journal->j_running_transaction) { #true
          read_unlock(&journal->j_state_lock);
                                             jbd2_journal_lock_updates()
                                             jbd2_journal_flush()
                                               write_lock(&journal->j_state_lock);
                                               if (journal->j_running_transaction) {
                                                 # false
                                               ... wait for committing trans ...
                                               write_unlock(&journal->j_state_lock);
          ...
          write_lock(&journal->j_state_lock);
          if (!journal->j_running_transaction) { # true
            jbd2_get_transaction(journal, new_transaction);
          write_unlock(&journal->j_state_lock);
          goto repeat; # eventually blocks on j_barrier_count > 0
                                               ...
                                               J_ASSERT(!journal->j_running_transaction);
                                                 # fails
      
      We fix the race by rechecking j_barrier_count after reacquiring j_state_lock
      in exclusive mode.
      
      Reported-by: yjwsignal@empal.com
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
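      A sketch of the recheck, with hypothetical names.  The function is
      meant to run with j_state_lock re-taken for writing; anything checked
      under the earlier read lock may be stale, so the barrier count is
      tested again before a new transaction is created.  Locking itself is
      omitted from this single-threaded model.

```c
#include <stdbool.h>

/* Hypothetical, simplified journal state guarded by j_state_lock. */
struct model_journal {
    int j_barrier_count;        /* > 0 while updates are locked out */
    bool j_running_transaction;
};

enum model_result { MODEL_CREATED, MODEL_ALREADY_RUNNING, MODEL_BLOCKED };

/* Called with the state lock held exclusively, after it was re-acquired. */
static enum model_result model_maybe_create_transaction(struct model_journal *j)
{
    if (j->j_barrier_count > 0)
        return MODEL_BLOCKED;        /* a flush got in between: back off */
    if (j->j_running_transaction)
        return MODEL_ALREADY_RUNNING;
    j->j_running_transaction = true; /* stand-in for jbd2_get_transaction() */
    return MODEL_CREATED;
}
```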
  20. 19 Nov, 2012 1 commit
  21. 08 Nov, 2012 1 commit
  22. 27 Sep, 2012 1 commit
    • jbd2: fix assertion failure in commit code due to lacking transaction credits · b794e7a6
      Jan Kara authored
      ext4 users of data=journal mode with blocksize < pagesize were
      occasionally hitting an assertion failure in
      jbd2_journal_commit_transaction() checking whether the transaction has
      at least as many credits reserved as buffers attached.  The core of the
      problem is that when a file gets truncated, buffers that still need
      checkpointing or that are attached to the committing transaction are
      left with buffer_mapped set.  When this happens to buffers beyond i_size
      attached to a page straddling i_size, a subsequent write extending the
      file will see these buffers, and as they are mapped (but the underlying
      blocks were freed), things go awry from there.
      
      The assertion failure just coincidentally (and in this case luckily, as
      we would otherwise start corrupting the filesystem) triggers due to
      journal_head not being properly cleaned up as well.
      
      We fix the problem by unmapping buffers if possible (in lots of cases we
      just need a buffer attached to a transaction as a placeholder, but it
      must not be written out anyway).  And in one case, we just have to bite
      the bullet and wait for the transaction commit to finish.
      
      CC: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  23. 19 Aug, 2012 1 commit
    • jbd2: don't write superblock when it's empty · eeecef0a
      Eric Sandeen authored
      This sequence:
      
      # truncate --size=1g fsfile
      # mkfs.ext4 -F fsfile
      # mount -o loop,ro fsfile /mnt
      # umount /mnt
      # dmesg | tail
      
      results in an IO error when unmounting the RO filesystem:
      
      [  318.020828] Buffer I/O error on device loop1, logical block 196608
      [  318.027024] lost page write due to I/O error on loop1
      [  318.032088] JBD2: Error -5 detected when updating journal superblock for loop1-8.
      
      This was a regression introduced by commit 24bcc89c: "jbd2: split
      updating of journal superblock and marking journal empty".
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  24. 17 Aug, 2012 1 commit