1. 09 Aug, 2013 2 commits
    • Jaegeuk Kim's avatar
      f2fs: fix inconsistency between xattr node blocks and its inode · e518ff81
      Jaegeuk Kim authored
      
      
      Previously xattr node blocks are stored to the COLD_NODE log, which means that
      our roll-forward mechanism doesn't recover the xattr node blocks at all.
      Only the direct node blocks in the WARM_NODE log can be recovered.
      
      So, let's resolve the issue simply by conducting checkpoint during fsync when a
      file has a modified xattr node block.
      
      This approach is able to degrade the performance, but normally the checkpoint
      overhead is shown at the initial fsync call after the xattr entry changes.
      Once the checkpoint is done, no additional overhead would be occurred.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      e518ff81
    • Jaegeuk Kim's avatar
      f2fs: fix the use of XATTR_NODE_OFFSET · dbe6a5ff
      Jaegeuk Kim authored
      
      
      This patch fixes the use of XATTR_NODE_OFFSET.
      
      o The offset should not use several MSB bits which are used by marking node
      blocks.
      
      o IS_DNODE should handle XATTR_NODE_OFFSET to avoid potential abnormality
      during the fsync call.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      dbe6a5ff
  2. 08 Aug, 2013 1 commit
    • Jaegeuk Kim's avatar
      f2fs: fix a build failure due to missing the kobject header · c2d715d1
      Jaegeuk Kim authored
      
      
      This patch should resolve the following error reported by kbuild test robot.
      
      All error/warnings:
      
         In file included from fs/f2fs/dir.c:13:0:
         >> fs/f2fs/f2fs.h:435:17: error: field 's_kobj' has incomplete type
              struct kobject s_kobj;
      
      The failure was caused by missing the kobject header file in dir.c.
      So, this patch move the header file to the right location, f2fs.h.
      
      CC: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      c2d715d1
  3. 06 Aug, 2013 2 commits
    • Jin Xu's avatar
      f2fs: fix a deadlock in fsync · a569469e
      Jin Xu authored
      
      
      This patch fixes a deadlock bug that occurs quite often when there are
      concurrent write and fsync on a same file.
      
      Following is the simplified call trace when tasks get hung.
      
      fsync thread:
      - f2fs_sync_file
       ...
       - f2fs_write_data_pages
       ...
        - update_extent_cache
        ...
         - update_inode
          - wait_on_page_writeback
      
      bdi writeback thread
      - __writeback_single_inode
       - f2fs_write_data_pages
        - mutex_lock(sbi->writepages)
      
      The deadlock happens when the fsync thread waits on a inode page that has
      been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately,
      no one else could be able to submit the cached bio to block layer for
      writeback. This is because the fsync thread already hold a sbi->fs_lock and
      the sbi->writepages lock, causing the bdi thread being blocked when attempt
      to write data pages for the same inode. At the same time, f2fs_gc thread
      does not notice the situation and could not help. Even the sync syscall
      gets blocked.
      
      To fix it, we could submit the cached bio first before waiting on a inode page
      that is being written back.
      Signed-off-by: default avatarJin Xu <jinuxstyle@gmail.com>
      [Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      a569469e
    • Namjae Jeon's avatar
      f2fs: add sysfs support for controlling the gc_thread · b59d0bae
      Namjae Jeon authored
      
      
      Add sysfs entries to control the timing parameters for
      f2fs gc thread.
      
      Various Sysfs options introduced are:
      gc_min_sleep_time: Min Sleep time for GC in ms
      gc_max_sleep_time: Max Sleep time for GC in ms
      gc_no_gc_sleep_time: Default Sleep time for GC in ms
      
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: default avatarPankaj Kumar <pankaj.km@samsung.com>
      Reviewed-by: default avatarGu Zheng <guz.fnst@cn.fujitsu.com>
      [Jaegeuk Kim: fix an umount bug and some minor changes]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      b59d0bae
  4. 30 Jul, 2013 5 commits
  5. 01 Jul, 2013 1 commit
  6. 14 Jun, 2013 3 commits
  7. 11 Jun, 2013 3 commits
    • Jaegeuk Kim's avatar
      f2fs: sync dir->i_size with its block allocation · 699489bb
      Jaegeuk Kim authored
      
      
      If new dentry block is allocated and its i_size is updated, we should update
      its inode block together in order to sync i_size and its block allocation.
      Otherwise, we can loose additional dentry block due to the unconsistent i_size.
      
      Errorneous Scenario
      -------------------
      
      In the recovery routine,
       - recovery_dentry
       | - __f2fs_add_link
       | | - get_new_data_page
       | | | - i_size_write(new_i_size)
       | | | - mark_inode_dirty_sync(dir)
       | | - update_parent_metadata
       | | | - mark_inode_dirty(dir)
       |
       - write_checkpoint
         - sync_dirty_dir_inodes
           - filemap_flush(dentry_blocks)
             - f2fs_write_data_page
               - skip to write the last dentry block due to index < i_size
      
      In the above flow, new_i_size is not updated to its inode block so that the
      last dentry block will be lost accordingly.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      699489bb
    • Jaegeuk Kim's avatar
      f2fs: fix i_blocks translation on various types of files · 2d4d9fb5
      Jaegeuk Kim authored
      
      
      Basically an inode manages the number of allocated blocks with inode->i_blocks
      which is represented in a unit of sectors, not file system blocks.
      But, f2fs has used i_blocks in a unit of file system blocks, and f2fs_getattr
      translates it to the number of sectors when fstat is called.
      
      However, previously f2fs_file_inode_operations only has this, so this patch adds
      it to all the types of inode_operations.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      2d4d9fb5
    • Jaegeuk Kim's avatar
      f2fs: support xattr security labels · 8ae8f162
      Jaegeuk Kim authored
      This patch adds the support of security labels for f2fs, which will be used
      by Linus Security Models (LSMs).
      
      Quote from http://en.wikipedia.org/wiki/Linux_Security_Modules
      
      :
      "Linux Security Modules (LSM) is a framework that allows the Linux kernel to
      support a variety of computer security models while avoiding favoritism toward
      any single security implementation. The framework is licensed under the terms of
      the GNU General Public License and is standard part of the Linux kernel since
      Linux 2.6. AppArmor, SELinux, Smack and TOMOYO Linux are the currently accepted
      modules in the official kernel.".
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      8ae8f162
  8. 07 Jun, 2013 1 commit
    • Jaegeuk Kim's avatar
      f2fs: fix iget/iput of dir during recovery · 5deb8267
      Jaegeuk Kim authored
      
      
      It is possible that iput is skipped after iget during the recovery.
      
      In recover_dentry(),
       dir = f2fs_iget();
       ...
       if (de && inode->i_ino == le32_to_cpu(de->ino))
      	goto out;
      
      In this case, this dir is not able to be added in dirty_dir_inode_list.
      The actual linking is done only when set_page_dirty() is called.
      
      So let's add this newly got inode into the list explicitly, and put it at the
      end of the recovery routine.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      5deb8267
  9. 28 May, 2013 8 commits
    • Namjae Jeon's avatar
      f2fs: push some variables to debug part · 35b09d82
      Namjae Jeon authored
      
      
      Some, counters are needed only for the statistical information
      while debugging.
      So, those can be controlled using CONFIG_F2FS_STAT_FS,
      pushing the usage for few variables under this flag.
      Signed-off-by: default avatarNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: default avatarAmit Sahrawat <a.sahrawat@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      35b09d82
    • Jaegeuk Kim's avatar
      f2fs: align data types between on-disk and in-memory block addresses · a9841c4d
      Jaegeuk Kim authored
      
      
      The on-disk block address is defined as __le32, but in-memory block address,
      block_t, does as u64.
      
      Let's synchronize them to 32 bits.
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      a9841c4d
    • Jaegeuk Kim's avatar
      f2fs: reuse the locked dnode page and its inode · b292dcab
      Jaegeuk Kim authored
      
      
      This patch fixes the following deadlock bug during the recovery.
      
      INFO: task mount:1322 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      mount           D ffffffff81125870     0  1322   1266 0x00000000
       ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046
       ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8
       ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520
      Call Trace:
      [<ffffffff81125870>] ? __lock_page+0x70/0x70
      [<ffffffff816a92d9>] schedule+0x29/0x70
      [<ffffffff816a93af>] io_schedule+0x8f/0xd0
      [<ffffffff8112587e>] sleep_on_page+0xe/0x20
      [<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0
      [<ffffffff81125867>] __lock_page+0x67/0x70
      [<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40
      [<ffffffff81126857>] find_lock_page+0x67/0x80
      [<ffffffff8112698f>] find_or_create_page+0x3f/0xb0
      [<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs]
      [<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs]
      [<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs]
      [<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40
      [<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs]
      [<ffffffff81184cf9>] mount_bdev+0x1c9/0x210
      [<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs]
      [<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs]
      [<ffffffff81185a13>] mount_fs+0x43/0x1b0
      [<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20
      [<ffffffff811a0796>] vfs_kern_mount+0x76/0x120
      [<ffffffff811a2cb7>] do_mount+0x237/0xa10
      [<ffffffff81140b9b>] ? strndup_user+0x5b/0x80
      [<ffffffff811a3520>] SyS_mount+0x90/0xe0
      [<ffffffff816b3502>] system_call_fastpath+0x16/0x1b
      
      The bug is triggered when check_index_in_prev_nodes tries to get the direct
      node page by calling get_node_page.
      At this point, if the direct node page is already locked by get_dnode_of_data,
      its caller, we got a deadlock condition.
      
      This patch adds additional condition check for the reuse of locked direct node
      pages prior to the get_node_page call.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      b292dcab
    • Jaegeuk Kim's avatar
      f2fs: add f2fs_readonly() · 77888c1e
      Jaegeuk Kim authored
      
      
      Introduce a simple macro function for readability.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      77888c1e
    • Peter Zijlstra's avatar
      f2fs, lockdep: annotate mutex_lock_all() · bfe35965
      Peter Zijlstra authored
      
      
      Majianpeng reported a lockdep splat for f2fs. It turns out mutex_lock_all()
      acquires an array of locks (in global/local lock style).
      
      Any such operation is always serialized using cp_mutex, therefore there is no
      fs_lock[] lock-order issue; tell lockdep about this using the
      mutex_lock_nest_lock() primitive.
      Reported-by: default avatarmajianpeng <majianpeng@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      bfe35965
    • Jaegeuk Kim's avatar
      f2fs: update inode page after creation · 44a83ff6
      Jaegeuk Kim authored
      
      
      I found a bug when testing power-off-recovery as follows.
      
      [Bug Scenario]
      1. create a file
      2. fsync the file
      3. reboot w/o any sync
      4. try to recover the file
       - found its fsync mark
       - found its dentry mark
         : try to recover its dentry
          - get its file name
          - get its parent inode number
           : here we got zero value
      
      The reason why we get the wrong parent inode number is that we didn't
      synchronize the inode page with its newly created inode information perfectly.
      
      Especially, previous f2fs stores fi->i_pino and writes it to the cached
      node page in a wrong order, which incurs the zero-valued i_pino during the
      recovery.
      
      So, this patch modifies the creation flow to fix the synchronization order of
      inode page with its inode.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      44a83ff6
    • Jaegeuk Kim's avatar
      f2fs: change get_new_data_page to pass a locked node page · 64aa7ed9
      Jaegeuk Kim authored
      
      
      This patch is for passing a locked node page to get_dnode_of_data.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      64aa7ed9
    • Jaegeuk Kim's avatar
      f2fs: fix BUG_ON during f2fs_evict_inode(dir) · 74d0b917
      Jaegeuk Kim authored
      
      
      During the dentry recovery routine, recover_inode() triggers __f2fs_add_link
      with its directory inode.
      
      In the following scenario, a bug is captured.
       1. dir = f2fs_iget(pino)
       2. __f2fs_add_link(dir, name)
       3. iput(dir)
        -> f2fs_evict_inode() faces with BUG_ON(atomic_read(fi->dirty_dents))
      
      Kernel BUG at ffffffffa01c0676 [verbose debug info unavailable]
      [<ffffffffa01c0676>] f2fs_evict_inode+0x276/0x300 [f2fs]
      Call Trace:
       [<ffffffff8118ea00>] evict+0xb0/0x1b0
       [<ffffffff8118f1c5>] iput+0x105/0x190
       [<ffffffffa01d2dac>] recover_fsync_data+0x3bc/0x1070 [f2fs]
       [<ffffffff81692e8a>] ? io_schedule+0xaa/0xd0
       [<ffffffff81690acb>] ? __wait_on_bit_lock+0x7b/0xc0
       [<ffffffff8111a0e7>] ? __lock_page+0x67/0x70
       [<ffffffff81165e21>] ? kmem_cache_alloc+0x31/0x140
       [<ffffffff8118a502>] ? __d_instantiate+0x92/0xf0
       [<ffffffff812a949b>] ? security_d_instantiate+0x1b/0x30
       [<ffffffff8118a5b4>] ? d_instantiate+0x54/0x70
      
      This means that we should flush all the dentry pages between iget and iput().
      But, during the recovery routine, it is unallowed due to consistency, so we
      have to wait the whole recovery process.
      And then, write_checkpoint flushes all the dirty dentry blocks, and nicely we
      can put the stale dir inodes from the dirty_dir_inode_list.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      74d0b917
  10. 29 Apr, 2013 1 commit
    • Jaegeuk Kim's avatar
      f2fs: enhance alloc_nid and build_free_nids flows · 55008d84
      Jaegeuk Kim authored
      
      
      In order to avoid build_free_nid lock contention, let's change the order of
      function calls as follows.
      
      At first, check whether there is enough free nids.
       - If available, just get a free nid with spin_lock without any overhead.
       - Otherwise, conduct build_free_nids.
        : scan nat pages, journal nat entries, and nat cache entries.
      
      We should consider carefullly not to serve free nids intermediately made by
      build_free_nids.
      We can get stable free nids only after build_free_nids is done.
      Reviewed-by: default avatarNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      55008d84
  11. 26 Apr, 2013 1 commit
    • Jaegeuk Kim's avatar
      f2fs: give a chance to merge IOs by IO scheduler · c718379b
      Jaegeuk Kim authored
      
      
      Previously, background GC submits many 4KB read requests to load victim blocks
      and/or its (i)node blocks.
      
      ...
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
      f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
      f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
      f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
      ...
      
      However, by the fact that many IOs are sequential, we can give a chance to merge
      the IOs by IO scheduler.
      In order to do that, let's use blk_plug.
      
      ...
      f2fs_gc : f2fs_iget: ino = 143
      f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
      f2fs_gc : f2fs_iget: ino = 143
      f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
      <idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
      <idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
      <idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
      <idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
      <idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
      <idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
      <idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
      <idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
      <idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
      ...
      
      Note that this issue should be addressed in checkpoint, and some readahead
      flows too.
      Reviewed-by: default avatarNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      c718379b
  12. 09 Apr, 2013 1 commit
    • Jaegeuk Kim's avatar
      f2fs: introduce a new global lock scheme · 39936837
      Jaegeuk Kim authored
      
      
      In the previous version, f2fs uses global locks according to the usage types,
      such as directory operations, block allocation, block write, and so on.
      
      Reference the following lock types in f2fs.h.
      enum lock_type {
      	RENAME,		/* for renaming operations */
      	DENTRY_OPS,	/* for directory operations */
      	DATA_WRITE,	/* for data write */
      	DATA_NEW,	/* for data allocation */
      	DATA_TRUNC,	/* for data truncate */
      	NODE_NEW,	/* for node allocation */
      	NODE_TRUNC,	/* for node truncate */
      	NODE_WRITE,	/* for node write */
      	NR_LOCK_TYPE,
      };
      
      In that case, we lose the performance under the multi-threading environment,
      since every types of operations must be conducted one at a time.
      
      In order to address the problem, let's share the locks globally with a mutex
      array regardless of any types.
      So, let users grab a mutex and perform their jobs in parallel as much as
      possbile.
      
      For this, I propose a new global lock scheme as follows.
      
      0. Data structure
       - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
       - f2fs_sb_info -> node_write
      
      1. mutex_lock_op(sbi)
       - try to get an avaiable lock from the array.
       - returns the index of the gottern lock variable.
      
      2. mutex_unlock_op(sbi, index of the lock)
       - unlock the given index of the lock.
      
      3. mutex_lock_all(sbi)
       - grab all the locks in the array before the checkpoint.
      
      4. mutex_unlock_all(sbi)
       - release all the locks in the array after checkpoint.
      
      5. block_operations()
       - call mutex_lock_all()
       - sync_dirty_dir_inodes()
       - grab node_write
       - sync_node_pages()
      
      Note that,
       the pairs of mutex_lock_op()/mutex_unlock_op() and
       mutex_lock_all()/mutex_unlock_all() should be used together.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      39936837
  13. 03 Apr, 2013 1 commit
    • Jaegeuk Kim's avatar
      f2fs: change GC bitmaps to apply the section granularity · 5ec4e49f
      Jaegeuk Kim authored
      
      
      This patch removes a bitmap for victim segments selected by foreground GC, and
      modifies the other bitmap for victim segments selected by background GC.
      
      1) foreground GC bitmap
       : We don't need to manage this, since we just only one previous victim section
         number instead of the whole victim history.
         The f2fs uses the victim section number in order not to allocate currently
         GC'ed section to current active logs.
      
      2) background GC bitmap
       : This bitmap is used to avoid selecting victims repeatedly by background GCs.
         In addition, the victims are able to be selected by foreground GCs, since
         there is no need to read victim blocks during foreground GCs.
      
         By the fact that the foreground GC reclaims segments in a section unit, it'd
         be better to manage this bitmap based on the section granularity.
      Reviewed-by: default avatarNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      5ec4e49f
  14. 27 Mar, 2013 3 commits
  15. 20 Mar, 2013 2 commits
  16. 19 Mar, 2013 1 commit
  17. 18 Mar, 2013 1 commit
    • Jaegeuk Kim's avatar
      f2fs: introduce readahead mode of node pages · 266e97a8
      Jaegeuk Kim authored
      
      
      Previously, f2fs reads several node pages ahead when get_dnode_of_data is called
      with RDONLY_NODE flag.
      And, this flag is set by the following functions.
      - get_data_block_ro
      - get_lock_data_page
      - do_write_data_page
      - truncate_blocks
      - truncate_hole
      
      However, this readahead mechanism is initially introduced for the use of
      get_data_block_ro to enhance the sequential read performance.
      
      So, let's clarify all the cases with the additional modes as follows.
      
      enum {
      	ALLOC_NODE,	/* allocate a new node page if needed */
      	LOOKUP_NODE,	/* look up a node without readahead */
      	LOOKUP_NODE_RA,	/*
      			 * look up a node with readahead called
      			 * by get_datablock_ro.
      			 */
      }
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      Reviewed-by: default avatarNamjae Jeon <namjae.jeon@samsung.com>
      266e97a8
  18. 11 Feb, 2013 3 commits