1. 05 May, 2010 2 commits
  2. 04 May, 2010 3 commits
  3. 01 May, 2010 3 commits
  4. 29 Apr, 2010 3 commits
  5. 27 Apr, 2010 2 commits
  6. 26 Apr, 2010 3 commits
    • nfsd4: bug in read_buf · 2bc3c117
      Neil Brown authored

      When read_buf is called to move over to the next page in the pagelist
      of an NFSv4 request, it sets argp->end to essentially a random
      number, certainly not an address within the page which argp->p now
      points to.  So subsequent calls to READ_BUF will think there is much
      more than a page of spare space (the cast to u32 ensures an unsigned
      comparison) so we can expect to fall off the end of the second
      page.
      
      We never encountered this in testing because typically the only
      operations which use more than two pages are write-like operations,
      which have their own decoding logic.  Something like a getattr after a
      write may cross a page boundary, but it would be very unusual for it to
      cross another boundary after that.
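
      The effect of the unsigned comparison can be shown with a small
      user-space sketch. This is not the nfsd4 code: the helper names and the
      use of plain integer offsets in place of the argp->p and argp->end
      pointers are assumptions made for illustration. It only shows how a
      stale end value that lies before the current position looks like a huge
      amount of free space once the difference is cast to a 32-bit unsigned
      value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not kernel code. p_off and end_off stand in for
 * the current decode position and the end of the current page.
 */

/* Unsigned check: a negative difference wraps to a very large u32. */
static int fits_unsigned(long p_off, long end_off, uint32_t nbytes)
{
	return nbytes <= (uint32_t)(end_off - p_off);
}

/* Signed check, for contrast: a negative difference stays negative. */
static int fits_signed(long p_off, long end_off, uint32_t nbytes)
{
	return (long)nbytes <= end_off - p_off;
}

/* Demo: end is stale and lies before p, yet 8000 bytes "fit". */
static int stale_end_is_accepted(void)
{
	long p_off = 4096, stale_end_off = 100;

	assert(stale_end_off < p_off);
	return fits_unsigned(p_off, stale_end_off, 8000);
}
```

      With a correct end the two checks agree; they differ only when end no
      longer belongs to the page that the position points into.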
      
      Cc: stable@kernel.org
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • xfs: more swap extent fixes for dynamic fork offsets · dd77ef92
      Dave Chinner authored

      A new xfsqa test (226) with a prototype xfs_fsr change to try to
      handle dynamic fork offsets better triggers an assertion failure
      where the inode data fork is in btree format, yet there is room in
      the inode for it to be in extent format. The two inodes look like:
      
      before: ino 0x101 (target), num_extents 11, Max in-fork extents 6, broot size 40, fork offset 96
      before: ino 0x115 (temp),  num_extents 5, Max in-fork extents 3, broot size 40, fork offset 56
      after: ino 0x101 (target), num_extents 5, Max in-fork extents 6, broot size 40, fork offset 96
      after: ino 0x115 (temp), num_extents 11, Max in-fork extents 3, broot size 40, fork offset 56
      
      Basically the target inode ends up with 5 extents in btree format,
      but it had space for 6 extents in extent format, so ends up
      incorrect. Notably here the broot size is the same, and that is
      where the kernel code is going wrong - the btree root will fit, so
      it lets the swap go ahead.
      
      The check should not allow the swap to take place if the number of
      extents while in btree format is less than the number of extents
      that can fit in the inode in extent format. Adding that check will
      prevent this swap and corruption from occurring.
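
      The check described above can be sketched in user space as follows.
      The struct, enum and function names are hypothetical and do not match
      the XFS source. Treating an extent count equal to the in-fork maximum
      as "fits in extent format" is an assumption of this sketch; the text
      above only says "less than".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the on-disk fork formats. */
enum fork_format { FMT_EXTENTS, FMT_BTREE };

/* The data fork an inode would receive from the other inode in a swap. */
struct fork_info {
	enum fork_format format;
	unsigned int num_extents;
};

/*
 * Refuse the swap if the incoming fork is in btree format but its extents
 * would fit in the receiving inode in extent format.
 * max_infork_extents is the "Max in-fork extents" of the receiving inode.
 */
static bool swap_allowed(const struct fork_info *incoming,
			 unsigned int max_infork_extents)
{
	assert(incoming != NULL);
	if (incoming->format == FMT_BTREE &&
	    incoming->num_extents <= max_infork_extents)
		return false;
	return true;
}
```

      With the numbers from the dump above, the target inode (room for 6
      in-fork extents) would receive a btree-format fork with 5 extents, so
      swap_allowed() returns false and the swap is rejected. The temp inode
      (room for 3) would receive 11 extents, which on its own would pass.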

      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • btrfs: convert to using bdi_setup_and_register() · e6d086d8
      Jens Axboe authored

      It's now a provided helper, so get rid of the internal setup
      and btrfs atomic_t bdi enumerator.

      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  7. 25 Apr, 2010 4 commits
  8. 24 Apr, 2010 3 commits
    • fs/block_dev.c: fix performance regression in O_DIRECT|O_SYNC writes to block devices · b8af67e2
      Anton Blanchard authored

      We are seeing a large regression in database performance on recent
      kernels.  The database opens a block device with O_DIRECT|O_SYNC and a
      number of threads write to different regions of the file at the same time.
      
      A simple test case is below.  I haven't defined DEVICE since getting it
      wrong will destroy your data :) On a 3 disk LVM with a 64k chunk size we
      see about 17MB/sec and only a few threads in IO wait:
      
      procs  -----io---- -system-- -----cpu------
       r  b     bi    bo   in   cs us sy id wa st
       0  3      0 16170  656 2259  0  0 86 14  0
       0  2      0 16704  695 2408  0  0 92  8  0
       0  2      0 17308  744 2653  0  0 86 14  0
       0  2      0 17933  759 2777  0  0 89 10  0
      
      Most threads are blocking in vfs_fsync_range, which has:
      
              mutex_lock(&mapping->host->i_mutex);
              err = fop->fsync(file, dentry, datasync);
              if (!ret)
                      ret = err;
              mutex_unlock(&mapping->host->i_mutex);
      
      Commit 148f948b (vfs: Introduce new helpers for syncing after writing to
      O_SYNC file or IS_SYNC inode) offers some explanation of what is going on:
      
          Use these new helpers for syncing from generic VFS functions. This makes
          O_SYNC writes to block devices acquire i_mutex for syncing. If we really
          care about this, we can make block_fsync() drop the i_mutex and reacquire
          it before it returns.
      
      Thanks Jan for such a good commit message!  As well as dropping i_mutex,
      Christoph suggests we should remove the call to sync_blockdev():
      
      > sync_blockdev is an overcomplicated alias for filemap_write_and_wait on
      > the block device inode, which is exactly what we did just before calling
      > into ->fsync
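
      The "drop the lock and reacquire it before returning" idea can be
      sketched in user space like this. It is an analogy only, not the
      kernel patch: the pthread mutex stands in for i_mutex, and every
      function and variable name here is hypothetical. The lock_held flag
      exists only so that a single-threaded demo can observe the lock state.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the inode mutex. */
static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_held;              /* demo-only bookkeeping */
static int held_during_flush = -1; /* recorded by slow_flush() */

static void lock_inode(void)
{
	pthread_mutex_lock(&inode_lock);
	lock_held = 1;
}

static void unlock_inode(void)
{
	lock_held = 0;
	pthread_mutex_unlock(&inode_lock);
}

/* Stands in for the slow part of the sync; records the lock state. */
static int slow_flush(void)
{
	held_during_flush = lock_held;
	return 0;
}

/* Called with the lock held: drop it around the slow work, retake it. */
static int fsync_dropping_lock(void)
{
	int err;

	assert(lock_held);
	unlock_inode();
	err = slow_flush();
	lock_inode();
	return err;
}

/* Caller shaped like the snippet above: lock, call fsync, unlock. */
static int sync_range(void)
{
	int ret;

	lock_inode();
	ret = fsync_dropping_lock();
	unlock_inode();
	return ret;
}
```

      The caller's lock/unlock pairing stays balanced, while other threads
      calling sync_range() no longer queue up behind the slow work.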
      
      The patch below incorporates both suggestions. With it the testcase improves
      from 17MB/sec to 68MB/sec:
      
      procs  -----io---- -system-- -----cpu------
       r  b     bi    bo   in   cs us sy id wa st
       0  7      0 65536 1000 3878  0  0 70 30  0
       0 34      0 69632 1016 3921  0  1 46 53  0
       0 57      0 69632 1000 3921  0  0 55 45  0
       0 53      0 69640  754 4111  0  0 81 19  0
      
      Testcase:
      
      #define _GNU_SOURCE
      #include <stdio.h>
      #include <pthread.h>
      #include <unistd.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      
      #define NR_THREADS 64
      #define BUFSIZE (64 * 1024)
      
      #define DEVICE "/dev/mapper/XXXXXX"
      
      #define ALIGN(VAL, SIZE) (((VAL)+(SIZE)-1) & ~((SIZE)-1))
      
      static int fd;
      
      static void *doit(void *arg)
      {
      	unsigned long offset = (long)arg;
      	char *b, *buf;
      
      	b = malloc(BUFSIZE + 1024);
      	buf = (char *)ALIGN((unsigned long)b, 1024);
      	memset(buf, 0, BUFSIZE);
      
      	while (1)
      		pwrite(fd, buf, BUFSIZE, offset);
      }
      
      int main(int argc, char *argv[])
      {
      	int flags = O_RDWR|O_DIRECT;
      	int i;
      	unsigned long offset = 0;
      
      	if (argc > 1 && !strcmp(argv[1], "O_SYNC"))
      		flags |= O_SYNC;
      
      	fd = open(DEVICE, flags);
      	if (fd == -1) {
      		perror("open");
      		exit(1);
      	}
      
      	for (i = 0; i < NR_THREADS-1; i++) {
      		pthread_t tid;
      		pthread_create(&tid, NULL, doit, (void *)offset);
      		offset += BUFSIZE;
      	}
      	doit((void *)offset);
      
      	return 0;
      }

      Signed-off-by: Anton Blanchard <anton@samba.org>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • reiserfs: fix corruption during shrinking of xattrs · fb2162df
      Jeff Mahoney authored

      Commit 48b32a35 ("reiserfs: use generic
      xattr handlers") introduced a problem that causes corruption when extended
      attributes are replaced with a smaller value.
      
      The issue is that the reiserfs_setattr to shrink the xattr file was moved
      from before the write to after the write.
      
      The root issue has always been in the reiserfs xattr code, but was papered
      over by the fact that in the shrink case, the file would just be expanded
      again while the xattr was written.
      
      The end result is that the last 8 bytes of xattr data are lost.
      
      This patch fixes it to use new_size.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=14826
      
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reported-by: Christian Kujau <lists@nerdbynature.de>
      Tested-by: Christian Kujau <lists@nerdbynature.de>
      Cc: Edward Shishkin <edward.shishkin@gmail.com>
      Cc: Jethro Beekman <kernel@jbeekman.nl>
      Cc: Greg Surbey <gregsurbey@hotmail.com>
      Cc: Marco Gatti <marco.gatti@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • reiserfs: fix permissions on .reiserfs_priv · cac36f70
      Jeff Mahoney authored

      Commit 677c9b2e ("reiserfs: remove privroot hiding in lookup") removed the
      magic from the lookup code to hide
      the .reiserfs_priv directory since it was getting loaded at mount-time
      instead.  The intent was that the entry would be hidden from the user via
      a poisoned d_compare, but this was faulty.
      
      This introduced a security issue where unprivileged users could access and
      modify extended attributes or ACLs belonging to other users, including
      root.
      
      This patch resolves the issue by properly hiding .reiserfs_priv.  This was
      the intent of the xattr poisoning code, but it appears to have never
      worked as expected.  This is fixed by using d_revalidate instead of
      d_compare.
      
      This patch makes -oexpose_privroot a no-op.  I'm fine leaving it this way.
      The effort involved in working out the corner cases wrt permissions and
      caching outweighs the benefit of the feature.

      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
      Reported-by: Matt McCutchen <matt@mattmccutchen.net>
      Tested-by: Matt McCutchen <matt@mattmccutchen.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 23 Apr, 2010 2 commits
  10. 22 Apr, 2010 8 commits
  11. 21 Apr, 2010 2 commits
  12. 20 Apr, 2010 3 commits
  13. 19 Apr, 2010 2 commits