1. 28 May, 2013 5 commits
    • f2fs: add f2fs_readonly() · 77888c1e
      Jaegeuk Kim authored
      Introduce a simple macro function for readability.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
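A minimal userspace sketch of the kind of readability helper this commit describes. The commit text does not show the macro body, so `struct super_block_stub`, `SB_FLAG_RDONLY`, `may_write()`, and the macro body below are assumptions for illustration, not the kernel definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's super_block and read-only flag. */
#define SB_FLAG_RDONLY 0x1UL

struct super_block_stub {
	unsigned long s_flags;
};

/* Assumed shape of the helper: true when the mount is read-only. */
#define f2fs_readonly(sb) (((sb)->s_flags & SB_FLAG_RDONLY) != 0)

/* A caller reads more clearly with the helper than with an open-coded mask. */
static bool may_write(const struct super_block_stub *sb)
{
	return !f2fs_readonly(sb);
}
```

The point is only that call sites test one named predicate instead of repeating the flag arithmetic.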
    • f2fs, lockdep: annotate mutex_lock_all() · bfe35965
      Peter Zijlstra authored
      Majianpeng reported a lockdep splat for f2fs. It turns out mutex_lock_all()
      acquires an array of locks (in global/local lock style).
      Any such operation is always serialized using cp_mutex, therefore there is no
      fs_lock[] lock-order issue; tell lockdep about this using the
      mutex_lock_nest_lock() primitive.
      Reported-by: majianpeng <majianpeng@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: update inode page after creation · 44a83ff6
      Jaegeuk Kim authored
      I found a bug when testing power-off-recovery as follows.
      [Bug Scenario]
      1. create a file
      2. fsync the file
      3. reboot w/o any sync
      4. try to recover the file
       - found its fsync mark
       - found its dentry mark
         : try to recover its dentry
          - get its file name
          - get its parent inode number
           : here we got zero value
      The reason we get the wrong parent inode number is that we didn't fully
      synchronize the inode page with its newly created inode information.
      In particular, f2fs previously stored fi->i_pino and wrote it to the cached
      node page in the wrong order, which incurs the zero-valued i_pino during the
      So, this patch modifies the creation flow to fix the synchronization order of
      inode page with its inode.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: change get_new_data_page to pass a locked node page · 64aa7ed9
      Jaegeuk Kim authored
      This patch is for passing a locked node page to get_dnode_of_data.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: fix BUG_ON during f2fs_evict_inode(dir) · 74d0b917
      Jaegeuk Kim authored
      During the dentry recovery routine, recover_inode() triggers __f2fs_add_link
      with its directory inode.
      In the following scenario, a bug is captured.
       1. dir = f2fs_iget(pino)
       2. __f2fs_add_link(dir, name)
       3. iput(dir)
        -> f2fs_evict_inode() hits BUG_ON(atomic_read(fi->dirty_dents))
      Kernel BUG at ffffffffa01c0676 [verbose debug info unavailable]
      [<ffffffffa01c0676>] f2fs_evict_inode+0x276/0x300 [f2fs]
      Call Trace:
       [<ffffffff8118ea00>] evict+0xb0/0x1b0
       [<ffffffff8118f1c5>] iput+0x105/0x190
       [<ffffffffa01d2dac>] recover_fsync_data+0x3bc/0x1070 [f2fs]
       [<ffffffff81692e8a>] ? io_schedule+0xaa/0xd0
       [<ffffffff81690acb>] ? __wait_on_bit_lock+0x7b/0xc0
       [<ffffffff8111a0e7>] ? __lock_page+0x67/0x70
       [<ffffffff81165e21>] ? kmem_cache_alloc+0x31/0x140
       [<ffffffff8118a502>] ? __d_instantiate+0x92/0xf0
       [<ffffffff812a949b>] ? security_d_instantiate+0x1b/0x30
       [<ffffffff8118a5b4>] ? d_instantiate+0x54/0x70
      This means that we should flush all the dentry pages between iget() and iput().
      However, during the recovery routine this is not allowed for consistency
      reasons, so we have to wait for the whole recovery process to finish.
      After that, write_checkpoint flushes all the dirty dentry blocks, and we can
      safely put the stale dir inodes from the dirty_dir_inode_list.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  2. 29 Apr, 2013 1 commit
    • f2fs: enhance alloc_nid and build_free_nids flows · 55008d84
      Jaegeuk Kim authored
      In order to avoid build_free_nid lock contention, let's change the order of
      function calls as follows.
      First, check whether there are enough free nids.
       - If available, just get a free nid with spin_lock without any overhead.
       - Otherwise, conduct build_free_nids.
        : scan nat pages, journal nat entries, and nat cache entries.
      We should take care not to serve free nids intermediately made by
      We can get stable free nids only after build_free_nids is done.
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
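The fast-path/slow-path ordering described in this commit can be sketched in userspace roughly as follows. This is a single-threaded model under stated assumptions: `struct nid_cache`, `MAX_FREE_NIDS`, and the trivial "scan" in `build_free_nids()` are hypothetical stand-ins, and the spin lock that the kernel would hold around the pop is only noted in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FREE_NIDS 8	/* hypothetical cache size */

struct nid_cache {
	unsigned int nids[MAX_FREE_NIDS];
	size_t count;		/* stable free nids ready to hand out */
	unsigned int next_scan;	/* where the next build pass starts */
	unsigned int builds;	/* how many slow-path builds have run */
};

/* Slow path: stand-in for scanning NAT pages, journal and cache entries. */
static void build_free_nids(struct nid_cache *c)
{
	while (c->count < MAX_FREE_NIDS)
		c->nids[c->count++] = c->next_scan++;
	c->builds++;
}

/* Fast path first; in the kernel the pop would happen under a spin lock. */
static bool alloc_nid(struct nid_cache *c, unsigned int *nid)
{
	if (c->count == 0)
		build_free_nids(c);	/* nids are served only after the build completes */
	if (c->count == 0)
		return false;		/* a real scan might find nothing */
	*nid = c->nids[--c->count];
	return true;
}
```

Only the first allocation pays for a build; later calls take the cheap path until the cache runs dry again.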
  3. 26 Apr, 2013 1 commit
    • f2fs: give a chance to merge IOs by IO scheduler · c718379b
      Jaegeuk Kim authored
      Previously, background GC submitted many 4KB read requests to load victim blocks
      and/or their (i)node blocks.
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
      f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
      f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
      f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
      However, since many of these IOs are sequential, we can give the IO scheduler
      a chance to merge them.
      In order to do that, let's use blk_plug.
      f2fs_gc : f2fs_iget: ino = 143
      f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
      f2fs_gc : f2fs_iget: ino = 143
      f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
      <idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
      <idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
      <idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
      <idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
      <idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
      <idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
      <idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
      <idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
      <idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
      Note that this issue should be addressed in checkpoint, and some readahead
      flows too.
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
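To illustrate why plugging helps, here is a small userspace simulation of merging back-to-back read requests. It is not the block layer API (in the kernel the plugging itself is done with blk_start_plug() and blk_finish_plug()); `struct io_req` and `merge_sequential()` are hypothetical names used only to show the effect on a request list like the one in the first trace.

```c
#include <assert.h>
#include <stddef.h>

struct io_req {
	unsigned long long sector;	/* start sector */
	unsigned int nr_sectors;	/* length in sectors */
};

/*
 * Merge back-to-back requests in place, similar in spirit to what a plugged
 * queue lets the scheduler do. Returns the number of requests left.
 */
static size_t merge_sequential(struct io_req *reqs, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out > 0 &&
		    reqs[out - 1].sector + reqs[out - 1].nr_sectors == reqs[i].sector) {
			/* Contiguous with the previous request: extend it. */
			reqs[out - 1].nr_sectors += reqs[i].nr_sectors;
		} else {
			reqs[out++] = reqs[i];
		}
	}
	return out;
}
```

Feeding in the three 8-sector reads from the first trace (starting at 499854968, 499854976, and 499854984) leaves a single 24-sector request.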
  4. 09 Apr, 2013 1 commit
    • f2fs: introduce a new global lock scheme · 39936837
      Jaegeuk Kim authored
      In the previous version, f2fs used global locks according to usage type,
      such as directory operations, block allocation, block write, and so on.
      See the following lock types in f2fs.h.
      enum lock_type {
      	RENAME,		/* for renaming operations */
      	DENTRY_OPS,	/* for directory operations */
      	DATA_WRITE,	/* for data write */
      	DATA_NEW,	/* for data allocation */
      	DATA_TRUNC,	/* for data truncate */
      	NODE_NEW,	/* for node allocation */
      	NODE_TRUNC,	/* for node truncate */
      	NODE_WRITE,	/* for node write */
      };
      In that case, we lose performance in a multi-threaded environment,
      since operations of each type must be conducted one at a time.
      In order to address the problem, let's share the locks globally with a mutex
      array regardless of type.
      So, let users grab a mutex and perform their jobs in parallel as much as
      For this, I propose a new global lock scheme as follows.
      0. Data structure
       - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
       - f2fs_sb_info -> node_write
      1. mutex_lock_op(sbi)
       - try to get an available lock from the array.
       - returns the index of the acquired lock variable.
      2. mutex_unlock_op(sbi, index of the lock)
       - unlock the given index of the lock.
      3. mutex_lock_all(sbi)
       - grab all the locks in the array before the checkpoint.
      4. mutex_unlock_all(sbi)
       - release all the locks in the array after checkpoint.
      5. block_operations()
       - call mutex_lock_all()
       - sync_dirty_dir_inodes()
       - grab node_write
       - sync_node_pages()
      Note that,
       the pairs of mutex_lock_op()/mutex_unlock_op() and
       mutex_lock_all()/mutex_unlock_all() should be used together.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
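The lock-array interface listed above can be modeled in userspace along these lines. This is a sketch under assumptions, not the kernel code: it swaps the kernel mutexes for C11 `atomic_flag` spin slots so it builds without extra libraries, and `struct sb_locks`, `locks_init()`, and the value of `NR_GLOBAL_LOCKS` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_GLOBAL_LOCKS 4	/* hypothetical array size */

/* Stand-in for the lock array in f2fs_sb_info; atomic_flag replaces a mutex. */
struct sb_locks {
	atomic_flag fs_lock[NR_GLOBAL_LOCKS];
};

static void locks_init(struct sb_locks *l)
{
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
		atomic_flag_clear(&l->fs_lock[i]);
}

/* Take any available slot and return its index; retry until one is free. */
static int mutex_lock_op(struct sb_locks *l)
{
	for (;;) {
		for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
			if (!atomic_flag_test_and_set(&l->fs_lock[i]))
				return i;
	}
}

static void mutex_unlock_op(struct sb_locks *l, int idx)
{
	atomic_flag_clear(&l->fs_lock[idx]);
}

/* Checkpoint side: take every slot, in index order, so no operation runs. */
static void mutex_lock_all(struct sb_locks *l)
{
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
		while (atomic_flag_test_and_set(&l->fs_lock[i]))
			;	/* spin until the current holder releases slot i */
}

static void mutex_unlock_all(struct sb_locks *l)
{
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
		atomic_flag_clear(&l->fs_lock[i]);
}
```

Ordinary operations hold one slot each and can run in parallel up to the array size, while the checkpoint path excludes all of them by holding every slot.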
  5. 03 Apr, 2013 1 commit
    • f2fs: change GC bitmaps to apply the section granularity · 5ec4e49f
      Jaegeuk Kim authored
      This patch removes a bitmap for victim segments selected by foreground GC, and
      modifies the other bitmap for victim segments selected by background GC.
      1) foreground GC bitmap
       : We don't need to manage this, since we only need the one previous victim
         section number instead of the whole victim history.
         f2fs uses the victim section number so that the section currently being
         GC'ed is not allocated to the current active logs.
      2) background GC bitmap
       : This bitmap is used to avoid selecting victims repeatedly by background GCs.
         In addition, the victims can still be selected by foreground GCs, since
         there is no need to read victim blocks during foreground GCs.
         Since the foreground GC reclaims segments in section units, it is
         better to manage this bitmap at section granularity.
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
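A rough sketch of what tracking victims at section granularity means: every segment maps to its section, and the bitmap carries one bit per section. The constants `SEGS_PER_SEC` and `NR_SECTIONS`, `struct victim_map`, and the helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SEGS_PER_SEC	4	/* hypothetical: segments per section */
#define NR_SECTIONS	64	/* hypothetical: sections in the main area */
#define WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS	((NR_SECTIONS + WORD_BITS - 1) / WORD_BITS)

struct victim_map {
	unsigned long secmap[MAP_WORDS];	/* one bit per section */
};

static unsigned int sec_from_seg(unsigned int segno)
{
	return segno / SEGS_PER_SEC;
}

/* Marking any segment marks its whole section as a victim. */
static void mark_victim(struct victim_map *m, unsigned int segno)
{
	unsigned int secno = sec_from_seg(segno);

	m->secmap[secno / WORD_BITS] |= 1UL << (secno % WORD_BITS);
}

static bool is_victim(const struct victim_map *m, unsigned int segno)
{
	unsigned int secno = sec_from_seg(segno);

	return (m->secmap[secno / WORD_BITS] >> (secno % WORD_BITS)) & 1UL;
}
```

With four segments per section, marking segment 5 also reports segments 4, 6, and 7 as victims, which matches a GC that reclaims whole sections.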
  6. 27 Mar, 2013 3 commits
  7. 20 Mar, 2013 2 commits
  8. 19 Mar, 2013 1 commit
  9. 18 Mar, 2013 1 commit
    • f2fs: introduce readahead mode of node pages · 266e97a8
      Jaegeuk Kim authored
      Previously, f2fs read several node pages ahead when get_dnode_of_data was called
      with the RDONLY_NODE flag.
      This flag is set by the following functions.
      - get_data_block_ro
      - get_lock_data_page
      - do_write_data_page
      - truncate_blocks
      - truncate_hole
      However, this readahead mechanism was initially introduced for the use of
      get_data_block_ro to enhance the sequential read performance.
      So, let's clarify all the cases with the additional modes as follows.
      enum {
      	ALLOC_NODE,	/* allocate a new node page if needed */
      	LOOKUP_NODE,	/* look up a node without readahead */
      	LOOKUP_NODE_RA,	/*
      			 * look up a node with readahead called
      			 * by get_data_block_ro.
      			 */
      };
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
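As a small illustration of how explicit modes replace a single read-only flag, the sketch below derives the two decisions a lookup has to make from the mode value. The enum names come from the commit message; the predicate helpers `wants_readahead()` and `may_allocate()` are hypothetical and only show the intent.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	ALLOC_NODE,	/* allocate a new node page if needed */
	LOOKUP_NODE,	/* look up a node without readahead */
	LOOKUP_NODE_RA,	/* look up a node with readahead */
};

/* Hypothetical helper: only the dedicated mode triggers node readahead. */
static bool wants_readahead(int mode)
{
	return mode == LOOKUP_NODE_RA;
}

/* Hypothetical helper: only ALLOC_NODE may create missing node pages. */
static bool may_allocate(int mode)
{
	return mode == ALLOC_NODE;
}
```

A plain lookup then no longer pays for readahead merely because it happens to be read-only.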
  10. 11 Feb, 2013 6 commits
    • f2fs: add compat_ioctl to provide backward compatability · e9750824
      Namjae Jeon authored
      Add compat_ioctl to support backward compatibility: 32-bit binary
      execution on a 64-bit kernel.
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: clarify and enhance the f2fs_gc flow · 43727527
      Jaegeuk Kim authored
      This patch clarifies the ambiguous f2fs_gc flow as follows.
      1. Remove intermediate checkpoint condition during f2fs_gc
       (i.e., should_do_checkpoint() and GC_BLOCKED)
      2. Remove unnecessary return values of f2fs_gc because of #1.
       (i.e., GC_NODE, GC_OK, etc)
      3. Simplify write_checkpoint() because of #2
      4. Clarify the main f2fs_gc flow.
       o monitor how many sections are freed during one iteration of do_garbage_collect().
       o do more GC without checkpoints if we can't get enough free sections.
       o do a checkpoint once we've got enough free sections through foreground GCs.
      5. Adopt the thread-logging (Slack-Space-Recycle) scheme more aggressively on data
        log types. See get_ssr_segment().
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
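The main loop described in item 4 can be sketched as a simple userspace model. Everything here is an assumption for illustration: `struct gc_state`, the one-section-per-round behaviour of `do_garbage_collect()`, and `f2fs_gc_sketch()` itself are hypothetical, and free sections are counted immediately rather than after the checkpoint.

```c
#include <assert.h>
#include <stdbool.h>

struct gc_state {
	unsigned int free_secs;		/* currently free sections */
	unsigned int victims_left;	/* sections that can still be reclaimed */
	unsigned int checkpoints;	/* checkpoints written so far */
};

/* Stand-in for one GC iteration; returns how many sections it freed. */
static unsigned int do_garbage_collect(struct gc_state *s)
{
	if (s->victims_left == 0)
		return 0;
	s->victims_left--;
	return 1;
}

static void write_checkpoint(struct gc_state *s)
{
	s->checkpoints++;
}

/* Keep cleaning without intermediate checkpoints, then checkpoint once. */
static bool f2fs_gc_sketch(struct gc_state *s, unsigned int needed)
{
	bool cleaned = false;

	while (s->free_secs < needed) {
		unsigned int freed = do_garbage_collect(s);

		if (freed == 0)
			return false;	/* no victim left: give up */
		s->free_secs += freed;
		cleaned = true;
	}
	if (cleaned)
		write_checkpoint(s);
	return true;
}
```

The single checkpoint at the end is the design point: intermediate checkpoints are dropped in favour of one after enough sections have been reclaimed.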
    • f2fs: make an accessor to get sections for particular block type · 5ac206cf
      Namjae Jeon authored
      Introduce an accessor to get the sections based upon the block type
      (node, dents, ...) and modify the functions should_do_checkpoint and
      has_not_enough_free_secs to use this accessor function to get
      the node sections and dent sections.
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: avoid balanc_fs during evict_inode · d4686d56
      Jaegeuk Kim authored
      1. Background
      Previously, if f2fs tried to move data blocks of an *evicting* inode during the
      cleaning process, it stopped the process incompletely and then restarted the whole
      process, since it needs a locked inode to grab victim data pages in its address
      space. In order to get a locked inode, iget_locked() via f2fs_iget() is normally
      used, but it waits if the inode is being freed.
      So, here is a deadlock scenario.
      1. f2fs_evict_inode()       <- inode "A"
        2. f2fs_balance_fs()
          3. f2fs_gc()
            4. gc_data_segment()
              5. f2fs_iget()      <- inode "A" too!
      If steps #1 and #5 treat the same inode "A", step #5 would deadlock since
      the inode "A" is being freed. In order to resolve this, f2fs_iget_nowait(), which
      skips __wait_on_freeing_inode(), was introduced in step #5; it stops f2fs_gc()
      so that f2fs_evict_inode() can complete.
      1. f2fs_evict_inode()           <- inode "A"
        2. f2fs_balance_fs()
          3. f2fs_gc()
            4. gc_data_segment()
              5. f2fs_iget_nowait()   <- inode "A", then stop f2fs_gc() w/ -ENOENT
      2. Problem and Solution
      In the above scenario, however, f2fs cannot finish f2fs_evict_inode() if:
       o there are not enough free sections, and
       o f2fs_gc() tries to move data blocks of the *evicting* inode repeatedly.
      So, the final solution is to use f2fs_iget() and remove f2fs_balance_fs() in
      f2fs_evict_inode() actually truncates all the data and node blocks, which
      means that it doesn't produce any dirty node pages.
      So, in practice, we don't need to do f2fs_balance_fs().
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: fix typo mistake for data_version description · 324ddc70
      Namjae Jeon authored
      In the f2fs_inode_info structure, the description for data_version
      has a typo: it should read "latest" instead of "lastes".
      Correct it.
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: prevent checkpoint once any IO failure is detected · 577e3495
      Jaegeuk Kim authored
      This patch enhances the checkpoint routine to cope with IO errors.
      Basically, f2fs detects IO errors from end_io_write, and the errors can
      occur during any of data, node, and meta page writes.
      In the previous code, when an IO error occurred during writes, f2fs set a
      flag, CP_ERROR_FLAG, in the raw checkpoint buffer which will be written to disk.
      Afterwards, write_checkpoint() checked the flag and remounted f2fs in
      read-only (ro) mode.
      However, even once f2fs is remounted in ro mode, dirty checkpoint pages can
      still be written to disk by the flusher or kswapd in the background.
      In such a case, after a cold reboot, f2fs would restore the checkpoint data having
      CP_ERROR_FLAG, resulting in disabling write_checkpoint and remounting f2fs in
      ro mode again.
      Therefore, let's prevent any checkpoint page (meta) writes once an IO error
      occurs, and remount f2fs in ro mode right away at that moment.
      Reported-by: Oliver Winker <oliver@oli1170.net>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
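The behaviour after this patch can be modeled in a few lines: once a write completion reports an error, the filesystem goes read-only immediately and later meta page writes are refused. This is a userspace sketch; `struct sbi_stub`, its fields, and the simplified `end_io_write()` and `write_meta_page()` signatures are assumptions, not the kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct sbi_stub {
	bool cp_error;			/* set once any write IO fails */
	bool read_only;			/* filesystem remounted ro */
	unsigned int meta_writes;	/* meta pages actually written */
};

/* Completion handler: the first error flips the fs to read-only right away. */
static void end_io_write(struct sbi_stub *sbi, int err)
{
	if (err) {
		sbi->cp_error = true;
		sbi->read_only = true;
	}
}

/* Meta (checkpoint) writes are rejected once an IO error has been seen. */
static int write_meta_page(struct sbi_stub *sbi)
{
	if (sbi->cp_error)
		return -EIO;
	sbi->meta_writes++;
	return 0;
}
```

Because the refusal happens at the meta write itself, a background flusher cannot push a checkpoint that carries the error flag to disk.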
  11. 08 Feb, 2013 3 commits
  12. 22 Jan, 2013 1 commit
  13. 15 Jan, 2013 1 commit
  14. 09 Jan, 2013 1 commit
    • f2fs: revisit the f2fs_gc flow · 408e9375
      Jaegeuk Kim authored
      I'd like to revisit the f2fs_gc flow and rewrite as follows.
      1. In practice, the nGC parameter of f2fs_gc is meaningless. So, let's
        remove it.
      2. Background GC marks victim blocks as dirty one at a time.
      3. Foreground GC should keep cleaning until it acquires enough free
        sections. Afterwards, it needs to do a checkpoint.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  15. 04 Jan, 2013 2 commits
  16. 28 Dec, 2012 1 commit
  17. 11 Dec, 2012 6 commits
    • f2fs: fix tracking parent inode number · 6666e6aa
      Jaegeuk Kim authored
      Previously, f2fs didn't correctly track the parent inode number, which is stored
      in each f2fs_inode. In the following scenario, a bug can occur.
      Let's suppose there is one directory, "/b", and two files, "/a" and "/b/a".
       - pino of "/a" is ROOT_INO.
       - pino of "/b/a" is DIR_B_INO.
       # sync
        : The inode pages of "/a" and "/b/a" contain the parent inode numbers as
          ROOT_INO and DIR_B_INO respectively.
       # mv /a /b/a
        : The parent inode number of "/a" should be changed to DIR_B_INO, but f2fs
          didn't do that. Ref. f2fs_set_link().
      In order to fix this cleanly, I added i_pino to f2fs_inode_info; whenever
      it needs to be changed, as in f2fs_add_link() and f2fs_set_link(), it is
      updated temporarily in f2fs_inode_info.
      Later, f2fs_write_inode() stores the latest information to the inode pages.
      For power-off-recovery, f2fs_sync_file() simply triggers f2fs_write_inode().
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
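A minimal sketch of the two-step flow described here: the rename path updates the in-memory parent number, and a later write-back copies it to the inode page. The stub structures, the inode numbers, and the simplified function signatures are hypothetical; only the ordering is the point.

```c
#include <assert.h>

#define ROOT_INO	3U	/* hypothetical inode numbers */
#define DIR_B_INO	10U

struct inode_page_stub {
	unsigned int i_pino;		/* copy that reaches the disk */
};

struct inode_info_stub {
	unsigned int i_pino;		/* in-memory, always up to date */
	struct inode_page_stub *page;	/* cached inode page */
};

/* Link changes (e.g. rename) only touch the in-memory field. */
static void set_link(struct inode_info_stub *fi, unsigned int new_pino)
{
	fi->i_pino = new_pino;
}

/* Write-back stores the latest in-memory value into the inode page. */
static void write_inode(struct inode_info_stub *fi)
{
	fi->page->i_pino = fi->i_pino;
}

/* fsync path: simply triggers the inode write-back. */
static void sync_file(struct inode_info_stub *fi)
{
	write_inode(fi);
}
```

Moving "/a" under "/b" would correspond to `set_link(&fi, DIR_B_INO)` followed, at fsync time, by `sync_file(&fi)`, after which the page holds the new parent.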
    • f2fs: cleanup the f2fs_bio_alloc routine · 3cd8a239
      Jaegeuk Kim authored
      Clean up further for better code readability.
      - Change the parameter set of f2fs_bio_alloc()
        This function should only allocate a bio, since it is not something like
        f2fs_bio_init(). Instead, the caller should initialize the allocated bio.
      - Introduce SECTOR_FROM_BLOCK
        This macro translates a block address to its sector address.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
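The block-to-sector translation amounts to a shift by the difference between the block and sector size exponents. The sketch below assumes 4 KB blocks and 512-byte sectors and takes a single argument; the actual macro's parameters are not shown in the commit text, so the signature and the two `LOG_*` constants here are assumptions.

```c
#include <assert.h>

#define LOG_SECTOR_SIZE	9	/* 512-byte sectors */
#define LOG_BLOCK_SIZE	12	/* 4 KB blocks (assumed) */

/* Translate a block address to the address of its first sector. */
#define SECTOR_FROM_BLOCK(blk_addr) \
	((unsigned long long)(blk_addr) << (LOG_BLOCK_SIZE - LOG_SECTOR_SIZE))
```

With these sizes each block spans eight sectors, so block 1 starts at sector 8.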
    • f2fs: adjust kernel coding style · 0a8165d7
      Jaegeuk Kim authored
      As pointed out by Randy Dunlap, this patch removes all usage of "/**" for comment
      blocks. Instead, just use "/*".
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: fix endian conversion bugs reported by sparse · 25ca923b
      Jaegeuk Kim authored
      This patch should resolve the bugs reported by the sparse tool.
      Initial reports were written by "kbuild test robot" managed by fengguang.wu.
      On my local machines, I also tested by running:
      > make C=2 CF="-D__CHECK_ENDIAN__"
      As a result, I found many warnings and bugs related to endian
      conversion, and I've fixed all of them in this patch.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
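For readers unfamiliar with this class of bug, here is a portable userspace sketch of explicit little-endian conversion, the pattern that sparse's endian checking enforces for on-disk fields. The helper names `le32_to_cpu_sketch()` and `cpu_to_le32_sketch()` are hypothetical; the kernel has its own le32_to_cpu() and cpu_to_le32() helpers operating on annotated types.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a little-endian on-disk 32-bit value, independent of host order. */
static uint32_t le32_to_cpu_sketch(const uint8_t b[4])
{
	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}

/* Encode a host value into little-endian on-disk byte order. */
static void cpu_to_le32_sketch(uint32_t v, uint8_t b[4])
{
	b[0] = (uint8_t)(v & 0xff);
	b[1] = (uint8_t)((v >> 8) & 0xff);
	b[2] = (uint8_t)((v >> 16) & 0xff);
	b[3] = (uint8_t)((v >> 24) & 0xff);
}
```

Reading or writing an on-disk field without going through a conversion like this works by accident on little-endian hosts and breaks on big-endian ones, which is the kind of mistake the checker flags.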
    • f2fs: remove unneeded version.h header file from f2fs.h · cf0e3a64
      Sachin Kamat authored
      Including <linux/version.h> is not necessary.
      Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
    • f2fs: add superblock and major in-memory structure · 39a53e0c
      Jaegeuk Kim authored
      This adds the following major in-memory structures in f2fs.
      - f2fs_sb_info:
        contains f2fs-specific information, two special inode pointers for node and
        meta address spaces, and orphan inode management.
      - f2fs_inode_info:
        contains vfs_inode and other fs-specific information.
      - f2fs_nm_info:
        contains node manager information such as NAT entry cache, free nid list,
        and NAT page management.
      - f2fs_node_info:
        represents a node as node id, inode number, block address, and its version.
      - f2fs_sm_info:
        contains segment manager information such as SIT entry cache, free segment
        map, current active logs, dirty segment management, and segment utilization.
        The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
      In addition, add F2FS_SUPER_MAGIC in magic.h.
      Signed-off-by: Chul Lee <chur.lee@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>