1. 31 Oct, 2005 1 commit
    • [PATCH] fix nr_unused accounting, and avoid recursing in iput with I_WILL_FREE set · 7f04c26d
      Andrea Arcangeli authored
      
      
       			list_move(&inode->i_list, &inode_in_use);
       		} else {
       			list_move(&inode->i_list, &inode_unused);
      +			inodes_stat.nr_unused++;
       		}
       	}
       	wake_up_inode(inode);
      
      Are you sure the above diff is correct? It was added somewhere between
      2.6.5 and 2.6.8. I think it's wrong.
      
      The only way I can imagine the i_count to be zero in the above path, is
      that I_WILL_FREE is set.  And if I_WILL_FREE is set, then we must not
      increase nr_unused.  So I believe the above change is buggy and it will
      definitely overstate the number of unused inodes and it should be backed
      out.
      
      Note that __writeback_single_inode, before calling __sync_single_inode, can
      drop the spinlock and we can have both the dirty and locked bitflags clear
      here:
      
      		spin_unlock(&inode_lock);
      		__wait_on_inode(inode);
      		iput(inode);
      XXXXXXX
      		spin_lock(&inode_lock);
      	}
      	use inode again here
      
      a construct like the above makes zero sense from a reference counting
      standpoint.
      
      Either we don't ever use the inode again after the iput, or the
      inode_lock should be taken _before_ executing the iput (i.e. a __iput
      would be required). Taking the inode_lock after iput means the iget was
      useless if we keep using the inode after the iput.
      
      So the only way 2.6 could safely call __writeback_single_inode with
      i_count == 0 is if I_WILL_FREE is set (I_WILL_FREE will prevent the VM
      from freeing the inode in XXXXX).
      
      Potentially calling the above iput with I_WILL_FREE was also wrong
      because it would recurse in iput_final (the second mainline bug).
      
      The below (untested) patch fixes the nr_unused accounting, avoids recursing
      in iput when I_WILL_FREE is set and makes sure (with the BUG_ON) that we
      don't corrupt memory and that all holders that don't set I_WILL_FREE keep
      a reference on the inode!
      Signed-off-by: Andrea Arcangeli <andrea@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7f04c26d
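The accounting argument above can be illustrated with a small userspace toy model. It is only a sketch: every `toy_` name is hypothetical, and the assumption that the final-iput path itself bumps the counter before setting the flag and drops it again afterwards is an assumption of the model, not something the commit message states.

```c
#include <assert.h>

#define TOY_I_WILL_FREE 0x1u    /* hypothetical stand-in for I_WILL_FREE */

struct toy_inode {
    int i_count;                /* reference count */
    unsigned i_state;           /* state flags */
};

static int toy_nr_unused;       /* stand-in for inodes_stat.nr_unused */

/* End of writeback: decide whether the inode counts as newly unused. */
static void toy_sync_done(struct toy_inode *inode, int old_behaviour)
{
    if (inode->i_count > 0)
        return;                 /* still referenced, stays "in use" */

    /* i_count == 0 is only expected here while I_WILL_FREE is set. */
    assert(inode->i_state & TOY_I_WILL_FREE);

    if (old_behaviour)
        toy_nr_unused++;        /* the increment the commit backs out */
}

/* Final iput in this model: account, flag, write back, unflag, unaccount. */
static void toy_final_iput(struct toy_inode *inode, int old_behaviour)
{
    toy_nr_unused++;
    inode->i_state |= TOY_I_WILL_FREE;
    toy_sync_done(inode, old_behaviour);
    inode->i_state &= ~TOY_I_WILL_FREE;
    toy_nr_unused--;
}
```

In this model the counter returns to its starting value only when the extra increment is absent; with it, each forced final iput leaves the count one too high.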
  2. 28 Oct, 2005 1 commit
    • [PATCH] gfp_t: fs/* · 27496a8c
      Al Viro authored
      
      
       - ->releasepage() annotated (s/int/gfp_t), instances updated
       - missing gfp_t in fs/* added
       - fixed misannotation from the original sweep caught by bitwise checks:
         XFS used __nocast both for gfp_t and for flags used by XFS allocator.
         The latter left with unsigned int __nocast; we might want to add a
         different type for those but for now let's leave them alone.  That,
         BTW, is a case when __nocast use had been actively confusing - it had
         been used in the same code for two different and similar types, with
         no way to catch misuses.  Switch of gfp_t to bitwise had caught that
         immediately...
      
      One tricky bit is left alone to be dealt with later - mapping->flags is
      a mix of gfp_t and error indications.  Left alone for now.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      27496a8c
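For readers unfamiliar with bitwise annotations, here is a minimal sketch of the idea. The `TOY_` and `toy_` names and the flag values are hypothetical; the attribute spellings are the ones the sparse checker understands when it defines `__CHECKER__`, and with an ordinary compiler they expand to nothing.

```c
#include <assert.h>

#ifdef __CHECKER__
#define TOY_BITWISE __attribute__((bitwise))
#define TOY_FORCE   __attribute__((force))
#else
#define TOY_BITWISE             /* plain compilers see an ordinary integer */
#define TOY_FORCE
#endif

/* A distinct integer type: a checker can flag mixing it with plain ints. */
typedef unsigned int TOY_BITWISE toy_gfp_t;

/* Flag values are made up for the example. */
#define TOY_GFP_WAIT ((TOY_FORCE toy_gfp_t)0x10u)
#define TOY_GFP_IO   ((TOY_FORCE toy_gfp_t)0x40u)

/* A callback annotated with the dedicated type instead of int. */
static int toy_releasepage(toy_gfp_t gfp_mask)
{
    return (gfp_mask & TOY_GFP_WAIT) != 0;
}
```

A separate flag family, such as the XFS allocator flags mentioned above, would need its own typedef to get the same protection; sharing one annotation between two types is what hid the misuse.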
  3. 09 Sep, 2005 1 commit
    • [PATCH] move truncate_inode_pages() into ->delete_inode() · e85b5652
      Mark Fasheh authored
      
      
      Allow file systems supporting ->delete_inode() to call
      truncate_inode_pages() on their own.  OCFS2 wants this so it can query the
      cluster before making a final decision on whether to wipe an inode from
      disk or not.  In some corner cases an inode marked on the local node via
      voting may not actually get orphaned.  A good example is node death before
      the transaction moving the inode to the orphan dir commits to the journal.
      Without this patch, the truncate_inode_pages() call in
      generic_delete_inode() would discard valid data for such inodes.
      
      During earlier discussion in the 2.6.13 merge plan thread, Christoph
      Hellwig indicated that other file systems might also find this useful.
      
      IMHO, the best solution would be to just allow ->drop_inode() to do the
      cluster query but it seems that would require a substantial reworking of
      that section of the code.  Assuming it is safe to call write_inode_now() in
      ocfs2_delete_inode() for those inodes which won't actually get wiped, this
      solution should get us by for now.
      
      Trivial testing of this patch (and a related OCFS2 update) has shown this
      to avoid the corruption I'm seeing.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
      Acked-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e85b5652
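The change in responsibility can be sketched in userspace as follows. This is a toy model, not the kernel code: all `toy_` names are hypothetical, and the global `toy_cluster_says_wipe` merely stands in for whatever cluster query a file system would perform.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode {
    int cached_pages;           /* pages held in the page cache */
    int on_disk;                /* 1 while the inode still exists on disk */
};

typedef void (*toy_delete_inode_t)(struct toy_inode *);

static void toy_truncate_inode_pages(struct toy_inode *inode)
{
    inode->cached_pages = 0;
}

static int toy_cluster_says_wipe;   /* hypothetical cluster decision */

/* File-system callback: truncate only once the wipe is really decided. */
static void toy_fs_delete_inode(struct toy_inode *inode)
{
    if (!toy_cluster_says_wipe)
        return;                 /* inode survives, keep its valid data */
    toy_truncate_inode_pages(inode);
    inode->on_disk = 0;
}

/* Generic path: truncate only when no callback takes over that duty. */
static void toy_generic_delete_inode(struct toy_inode *inode,
                                     toy_delete_inode_t delete_inode)
{
    if (delete_inode)
        delete_inode(inode);
    else
        toy_truncate_inode_pages(inode);
}
```

The point of the design is that the generic code no longer discards pages before the file system has had a chance to decide that the inode is not going away after all.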
  4. 07 Sep, 2005 1 commit
  5. 13 Jul, 2005 2 commits
    • [PATCH] Fix soft lockup due to NTFS: VFS part and explanation · 88bd5121
      Anton Altaparmakov authored
      
      
      Something has changed in the core kernel such that we now get concurrent
      inode write outs, one e.g. via pdflush and one via sys_sync or whatever.
      This causes a nasty deadlock in ntfs.  The only clean solution
      unfortunately requires a minor vfs api extension.
      
      First the deadlock analysis:
      
      Prerequisite knowledge: NTFS has a file $MFT (inode 0) loaded at mount
      time.  The NTFS driver uses the page cache for storing the file contents as
      usual.  More interestingly this file contains the table of on-disk inodes
      as a sequence of MFT_RECORDs.  Thus NTFS driver accesses the on-disk inodes
      by accessing the MFT_RECORDs in the page cache pages of the loaded inode
      $MFT.
      
      The situation: VFS inode X on a mounted ntfs volume is dirty.  For the
      same inode X, the ntfs_inode is dirty and thus so is the corresponding
      on-disk inode, which as explained above is in a dirty PAGE_CACHE_PAGE
      belonging to the table of inodes ($MFT, inode 0).
      
      What happens:
      
      Process 1: sys_sync()/umount()/whatever...  calls __sync_single_inode() for
      $MFT -> do_writepages() -> write_page for the dirty page containing the
      on-disk inode X, the page is now locked -> ntfs_write_mst_block() which
      clears PageUptodate() on the page to prevent anyone else getting hold of it
      whilst it does the write out (this is necessary as the on-disk inode needs
      "fixups" applied before the write to disk which are removed again after the
      write and PageUptodate is then set again).  It then analyses the page
      looking for dirty on-disk inodes and when it finds one it calls
      ntfs_may_write_mft_record() to see if it is safe to write this on-disk
      inode.  This then calls ilookup5() to check if the corresponding VFS inode
      is in the icache.  This in turn calls ifind() which waits on the inode lock
      via wait_on_inode whilst holding the global inode_lock.
      
      Process 2: pdflush results in a call to __sync_single_inode for the same
      VFS inode X on the ntfs volume.  This locks the inode (I_LOCK) then calls
      write-inode -> ntfs_write_inode -> map_mft_record() -> read_cache_page() of
      the page (in page cache of table of inodes $MFT, inode 0) containing the
      on-disk inode.  This page has PageUptodate() clear because of Process 1
      (see above) so read_cache_page() blocks when it tries to take the page lock
      for the page so it can call ntfs_readpage().
      
      Thus Process 1 is holding the page lock on the page containing the on-disk
      inode X and it is waiting on the inode X to be unlocked in ifind() so it
      can write the page out and then unlock the page.
      
      And Process 2 is holding the inode lock on inode X and is waiting for the
      page to be unlocked so it can call ntfs_readpage() or discover that
      Process 1 set PageUptodate() again and use the page.
      
      Thus we have a deadlock due to ifind() waiting on the inode lock.
      
      The only sensible solution: NTFS does not care whether the VFS inode is
      locked or not when it calls ilookup5() (it doesn't use the VFS inode at
      all, it just uses it to find the corresponding ntfs_inode which is of
      course attached to the VFS inode (both are one single struct); and it uses
      the ntfs_inode which is subject to its own locking so I_LOCK is irrelevant)
      hence we want a modified ilookup5_nowait() which is the same as ilookup5()
      but it does not wait on the inode lock.
      
      Without such functionality I would have to keep my own ntfs_inode cache in
      the NTFS driver just so I can find ntfs_inodes independent of their VFS
      inodes which would be slow, memory and cpu cycle wasting, and incredibly
      stupid given the icache already exists in the VFS.
      
      Below is a patch that does the ilookup5_nowait() implementation in
      fs/inode.c and exports it.
      
      ilookup5_nowait.diff:
      
      Introduce ilookup5_nowait() which is basically the same as ilookup5() but
      it does not wait on the inode's lock (i.e. it omits the wait_on_inode()
      done in ifind()).
      
      This is needed to avoid a nasty deadlock in NTFS.
      Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      88bd5121
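The difference between the waiting and non-waiting lookup can be modelled in a few lines of single-threaded C. This is a sketch under stated assumptions: the `toy_` names are hypothetical, the cache is a plain array rather than a hash, and "would sleep" is reported as a return value instead of actually blocking.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode {
    unsigned long i_ino;
    int locked;                 /* stand-in for I_LOCK being set */
};

enum toy_lookup { TOY_MISSING, TOY_FOUND, TOY_WOULD_SLEEP };

/*
 * Look up `ino` in a tiny array-backed cache.  With wait != 0 the model
 * mimics a lookup that waits on a locked inode; with wait == 0 it mimics
 * the "nowait" variant and hands the inode back regardless of the lock.
 */
static enum toy_lookup toy_ilookup(struct toy_inode *cache, size_t n,
                                   unsigned long ino, int wait,
                                   struct toy_inode **out)
{
    size_t i;

    *out = NULL;
    for (i = 0; i < n; i++) {
        if (cache[i].i_ino != ino)
            continue;
        if (wait && cache[i].locked)
            return TOY_WOULD_SLEEP;     /* the kernel would block here */
        *out = &cache[i];
        return TOY_FOUND;
    }
    return TOY_MISSING;
}
```

In the deadlock described above, Process 1 is the caller that would sleep while holding the page lock; a non-waiting lookup lets it continue and release the page, which in turn unblocks Process 2.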
    • [PATCH] inotify · 0eeca283
      Robert Love authored
      
      
      inotify is intended to correct the deficiencies of dnotify, particularly
      its inability to scale and its terrible user interface:
      
              * dnotify requires the opening of one fd per each directory
                that you intend to watch. This quickly results in too many
                open files and pins removable media, preventing unmount.
              * dnotify is directory-based. You only learn about changes to
                directories. Sure, a change to a file in a directory affects
                the directory, but you are then forced to keep a cache of
                stat structures.
              * dnotify's interface to user-space is awful.  Signals?
      
      inotify provides a more usable, simple, powerful solution to file change
      notification:
      
              * inotify's interface is a system call that returns a fd, not SIGIO.
                You get a single fd, which is select()-able.
              * inotify has an event that says "the filesystem that the item
                you were watching is on was unmounted."
              * inotify can watch directories or files.
      
      Inotify is currently used by Beagle (a desktop search infrastructure),
      Gamin (a FAM replacement), and other projects.
      
      See Documentation/filesystems/inotify.txt.
      Signed-off-by: Robert Love <rml@novell.com>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      0eeca283
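A minimal userspace sketch of the "one fd, read events from it" model follows. It assumes a Linux system with the glibc wrappers in `<sys/inotify.h>`, which postdate this commit; the helper name `inotify_create_demo`, the temporary directory and the file name are made up for the example.

```c
#define _DEFAULT_SOURCE         /* for mkdtemp() under strict -std= modes */
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Watch a fresh temporary directory, create a file in it, and read the
 * resulting event.  Returns 1 if an IN_CREATE event naming the file was
 * read, 0 if some other event arrived, and -1 on error.
 */
static int inotify_create_demo(void)
{
    char dir[] = "/tmp/inotify-demo-XXXXXX";
    char path[PATH_MAX];
    char buf[sizeof(struct inotify_event) + NAME_MAX + 1]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    const struct inotify_event *ev;
    FILE *f;
    ssize_t len;
    int fd, wd, seen = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    fd = inotify_init();                /* a single fd for all watches */
    if (fd < 0)
        goto out_rmdir;
    wd = inotify_add_watch(fd, dir, IN_CREATE);
    if (wd < 0)
        goto out_close;

    snprintf(path, sizeof(path), "%s/hello.txt", dir);
    f = fopen(path, "w");               /* triggers IN_CREATE on the dir */
    if (f == NULL)
        goto out_close;
    fclose(f);

    len = read(fd, buf, sizeof(buf));   /* blocks until an event is queued */
    if (len >= (ssize_t)sizeof(*ev)) {
        ev = (const struct inotify_event *)buf;
        seen = ev->wd == wd && (ev->mask & IN_CREATE) &&
               ev->len > 0 && strcmp(ev->name, "hello.txt") == 0;
    }
    unlink(path);
out_close:
    close(fd);
out_rmdir:
    rmdir(dir);
    return seen;
}
```

Because the descriptor is an ordinary fd, the blocking read() here could be replaced by select() or poll() in an event loop, which is the usability point the list above makes against signals.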
  6. 12 Jul, 2005 2 commits
    • [PATCH] bugfix: two read_inode() calls without clear_inode() call between · 4120db47
      Artem B. Bityuckiy authored
      
      
      Bug symptoms
      ~~~~~~~~~~~~
      For the same inode VFS calls read_inode() twice and doesn't call
      clear_inode() between the two read_inode() invocations.
      
      Bug description
      ~~~~~~~~~~~~~~~
      Suppose we have an inode which has zero reference count but is still in
      the inode cache. Suppose kswapd invokes shrink_icache_memory() to free
      some RAM. In prune_icache() inodes are removed from i_hash.
      prune_icache() is then going to call clear_inode(), but drops the
      inode_lock spinlock before this. If at this moment another task calls
      iget() for an inode which was just removed from i_hash by
      prune_icache(), then iget() invokes read_inode() for this inode,
      because it is *already removed* from i_hash.
      
      The end result is: we call iget(#N) then iput(#N); inode #N has zero
      i_count now and is in the inode cache; kswapd starts. kswapd removes the
      inode #N from i_hash and is preempted; we call iget(#N) again;
      read_inode() is invoked as the result; but we expect clear_inode()
      before.
      
      Fix
      ~~~
      To fix the bug I remove inodes from i_hash later, when clear_inode() is
      actually called. I remove them from i_hash under spinlock protection.
      Since the i_state is set to I_FREEING, it is safe to do this. The others
      will sleep waiting for the inode state change.
      
      I also postpone removing inodes from i_sb_list. It is not compulsory to
      do so but I do it for readability reasons. Inodes are added/removed to
      the lists together everywhere in the code and there is no point to
      change this rule. This is harmless because the only user of i_sb_list
      which somehow may interfere with me (invalidate_list()) is excluded by
      the iprune_sem mutex.
      
      The same race is possible in invalidate_list() so I do the same for it.
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4120db47
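The interleaving can be replayed step by step in a single-threaded toy model. Everything here is a sketch: the `toy_` names are hypothetical, one struct stands for the state of inode number #N rather than for a real `struct inode`, and "sleeping" is modelled as iget returning 0.

```c
#include <assert.h>

struct toy_ino {
    int hashed;                 /* reachable through the hash */
    int freeing;                /* stand-in for I_FREEING */
    int unhashed_early;         /* which ordering the pruner used */
    int read_calls;             /* read_inode() invocations so far */
    int clear_calls;            /* clear_inode() invocations so far */
};

/* Returns 1 if the caller got the inode, 0 if it would have to sleep. */
static int toy_iget(struct toy_ino *s)
{
    if (s->hashed)
        return s->freeing ? 0 : 1;
    s->read_calls++;            /* not in the hash: read it in again */
    s->hashed = 1;
    return 1;
}

/* First half of pruning, up to the point where the lock is dropped. */
static void toy_prune_begin(struct toy_ino *s, int unhash_early)
{
    s->freeing = 1;
    s->unhashed_early = unhash_early;
    if (unhash_early)
        s->hashed = 0;          /* old ordering: unhash before clearing */
}

/* Second half: clear the inode and, in the new ordering, unhash it now. */
static void toy_prune_finish(struct toy_ino *s)
{
    s->clear_calls++;
    if (!s->unhashed_early)
        s->hashed = 0;
    s->freeing = 0;
}
```

Calling `toy_iget()` between the two pruning halves shows the difference: with early unhashing a second read happens before any clear, while with late unhashing the caller is told to wait and the read only happens after `toy_prune_finish()`.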
    • [PATCH] __wait_on_freeing_inode fix · 168a9fd6
      Miklos Szeredi authored
      
      
      This patch fixes queer behavior in __wait_on_freeing_inode().
      
      If I_LOCK was not set it called yield(), effectively busy waiting for the
      removal of the inode from the hash.  This change was introduced within
      "[PATCH] eliminate inode waitqueue hashtable" Changeset 1.1938.166.16 last
      October by wli.
      
      The solution is to restore the old behavior, of unconditionally waiting on
      the waitqueue.  It doesn't matter if I_LOCK is not set initially, the task
      will go to sleep, and wake up when wake_up_inode() is called from
      generic_delete_inode() after removing the inode from the hash chain.
      
      Comment is also updated to better reflect current behavior.
      
      This condition is very hard to trigger normally (simultaneous clear_inode()
      with iget()) so probably only heavy stress testing can reveal any change of
      behavior.
      Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      168a9fd6
  7. 08 Jul, 2005 1 commit
    • [PATCH] export generic_drop_inode() to modules · cb2c0233
      Mark Fasheh authored
      
      
      OCFS2 wants to mark an inode which has been orphaned by another node so
      that during final iput it takes the correct path through the VFS and can
      pass through the OCFS2 delete_inode callback.  Since i_nlink can get out of
      date with other nodes, the best way I see to accomplish this is by clearing
      i_nlink on those inodes at drop_inode time.  Other than this small amount
      of work, nothing different needs to happen, so I think it would be cleanest
      to be able to just call generic_drop_inode at the end of the OCFS2
      drop_inode callback.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      cb2c0233
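The pattern described above, a file-system drop_inode callback that adjusts i_nlink and then defers to the generic helper, might look roughly like this in a userspace model. The `toy_` names and the `orphaned_elsewhere` field are hypothetical, and the generic helper is reduced to the single link-count decision.

```c
#include <assert.h>

enum toy_path { TOY_FORGET, TOY_DELETE };

struct toy_inode {
    unsigned int i_nlink;       /* local view of the link count */
    int orphaned_elsewhere;     /* hypothetical: another node orphaned it */
};

/* Generic decision: no links left means delete, otherwise keep cached. */
static enum toy_path toy_generic_drop_inode(struct toy_inode *inode)
{
    return inode->i_nlink ? TOY_FORGET : TOY_DELETE;
}

/* Cluster-aware callback: fix up the stale link count, then delegate. */
static enum toy_path toy_cluster_drop_inode(struct toy_inode *inode)
{
    if (inode->orphaned_elsewhere)
        inode->i_nlink = 0;
    return toy_generic_drop_inode(inode);
}
```

Delegating to the generic helper is what requires it to be exported; the callback itself only needs to correct the one field that can be out of date.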
  8. 23 Jun, 2005 1 commit
    • [PATCH] fix for prune_icache()/forced final iput() races · 991114c6
      Alexander Viro authored
      
      
      Based on analysis and a patch from Russ Weight <rweight@us.ibm.com>
      
      There is a race condition that can occur if an inode is allocated and then
      released (using iput) during the ->fill_super functions.  The race
      condition is between kswapd and mount.
      
      For most filesystems this can only happen in an error path when kswapd is
      running concurrently.  For isofs, however, the error can occur in a more
      common code path (which is how the bug was found).
      
      The logic here is "we want final iput() to free inode *now* instead of
      letting it sit in cache if fs is going down or had not quite come up".  The
      problem is with kswapd seeing such inodes in the middle of being killed and
      happily taking over.
      
      The clean solution would be to tell kswapd to leave those inodes alone and
      let our final iput deal with them.  I.e.  add a new flag
      (I_FORCED_FREEING), set it before write_inode_now() there and make
      prune_icache() leave those alone.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      991114c6
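The "leave those inodes alone" rule can be sketched as a filter in the pruning loop. This is a toy model only: the `toy_` names are hypothetical, the flag constant merely borrows the name used in the message above, and the unused list is a plain array.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_I_FORCED_FREEING 0x1u   /* stand-in for the proposed flag */

struct toy_inode {
    int i_count;                /* reference count */
    unsigned i_state;           /* state flags */
    int freed;                  /* set once the pruner reclaimed it */
};

/*
 * Reclaim every unreferenced inode except those whose final iput has
 * already claimed them.  Returns the number of inodes freed.
 */
static size_t toy_prune_icache(struct toy_inode *inodes, size_t n)
{
    size_t i, freed = 0;

    for (i = 0; i < n; i++) {
        if (inodes[i].freed || inodes[i].i_count > 0)
            continue;
        if (inodes[i].i_state & TOY_I_FORCED_FREEING)
            continue;           /* the final iput owns this one */
        inodes[i].freed = 1;
        freed++;
    }
    return freed;
}
```

The flagged inode stays untouched by the pruner, so exactly one path, the final iput, is responsible for tearing it down.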
  9. 05 May, 2005 2 commits
  10. 16 Apr, 2005 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4