1. 27 May, 2010 4 commits
    • ipc/sem.c: use ERR_CAST · 4de85cd6
      Julia Lawall authored
      Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes the
      purpose of the operation clearer, which otherwise looks like a no-op.
      The semantic patch that makes this change is as follows:
      // <smpl>
      @@
      type T;
      T x;
      identifier f;
      @@
      T f (...) { <+...
      - ERR_PTR(PTR_ERR(x))
      + x
       ...+> }
      @@
      expression x;
      @@
      - ERR_PTR(PTR_ERR(x))
      + ERR_CAST(x)
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
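      The difference between the two spellings can be sketched in userspace.
      The helpers below are simplified re-creations of the kernel's
      error-pointer macros, and struct sem_array, struct kern_ipc_perm,
      lookup_perm() and lookup_sem() are hypothetical stand-ins used only for
      illustration; they are not the code touched by this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
/* ERR_CAST only changes the pointer type; the encoded errno is untouched. */
static inline void *ERR_CAST(const void *ptr) { return (void *)ptr; }

/* Hypothetical stand-in types, not the kernel definitions. */
struct kern_ipc_perm { int key; };
struct sem_array { int id; };

/* Hypothetical lookup that always fails with -EINVAL (22). */
static struct kern_ipc_perm *lookup_perm(int id)
{
	(void)id;
	return ERR_PTR(-22);
}

/* Forward an error from one pointer type to another. */
static struct sem_array *lookup_sem(int id)
{
	struct kern_ipc_perm *p = lookup_perm(id);

	if (IS_ERR(p))
		return ERR_CAST(p);	/* instead of ERR_PTR(PTR_ERR(p)) */
	return (struct sem_array *)p;
}
```

      Both spellings yield the same pointer value; ERR_CAST just states the
      intent (change the type, keep the error) directly.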
    • ipc/sem.c: update description of the implementation · c5cf6359
      Manfred Spraul authored
      ipc/sem.c begins with a 15 year old description about bugs in the initial
      implementation in Linux-1.0.  The patch replaces that with a top level
      description of the current code.
      A TODO could be derived from this text:
      The opengroup man page for semop() does not mandate FIFO.  Thus there is
      no need for a semaphore array list of pending operations.
      If
      - this list is removed
      - the per-semaphore array spinlock is removed (possible if there is no
        list to protect)
      - sem_otime is moved into the semaphores and calculated on demand during
      then the array would be read-mostly - which would significantly improve
      scaling for applications that use semaphore arrays with lots of entries.
      The price would be expensive semctl() calls:
      	for(i=0;i<sma->sem_nsems;i++) spin_lock(sma->sem_lock);
      	<do stuff>
      	for(i=0;i<sma->sem_nsems;i++) spin_unlock(sma->sem_lock);
      I'm not sure if the complexity is worth the effort, thus here is the
      documentation of the current behavior first.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem.c: move wake_up_process out of the spinlock section · 0a2b9d4c
      Manfred Spraul authored
      The wake-up part of semtimedop() consists of two steps:
      - the right tasks must be identified.
      - they must be woken up.
      Right now, both steps run while the array spinlock is held.  This patch
      reorders the code and moves the actual wake_up_process() behind the point
      where the spinlock is dropped.
      The code also moves setting sem->sem_otime to one place: It does not make
      sense to set the last modify time multiple times.
      [akpm@linux-foundation.org: repair kerneldoc]
      [akpm@linux-foundation.org: fix uninitialised retval]
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
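      The two-step split described in this commit (identify the tasks under
      the lock, wake them after the lock is dropped) can be illustrated with
      a small single-threaded model.  Everything here is an assumption made
      for illustration: struct sem_model, its "locked" flag standing in for
      the array spinlock, and the helper names are hypothetical and do not
      mirror the real ipc/sem.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 8

struct waiter { int need; bool woken; };

struct sem_model {
	bool locked;			/* stands in for the array spinlock */
	int value;
	struct waiter *pending[MAX_WAITERS];
	size_t npending;
	int wakeups_under_lock;		/* wake-ups issued while "locked" */
};

static void wake_up(struct sem_model *s, struct waiter *w)
{
	if (s->locked)
		s->wakeups_under_lock++;
	w->woken = true;
}

/* Step 1 (under the lock): pick the waiters that can complete. */
static size_t collect_ready(struct sem_model *s, struct waiter **ready)
{
	size_t n = 0, kept = 0;

	for (size_t i = 0; i < s->npending; i++) {
		struct waiter *w = s->pending[i];

		if (w->need <= s->value) {
			s->value -= w->need;
			ready[n++] = w;
		} else {
			s->pending[kept++] = w;
		}
	}
	s->npending = kept;
	return n;
}

/* Add to the semaphore, then wake the ready waiters after unlocking. */
static size_t sem_add(struct sem_model *s, int amount)
{
	struct waiter *ready[MAX_WAITERS];
	size_t n;

	s->locked = true;
	s->value += amount;
	n = collect_ready(s, ready);
	s->locked = false;

	for (size_t i = 0; i < n; i++)	/* Step 2: outside the lock */
		wake_up(s, ready[i]);
	return n;
}
```

      The critical section now covers only the bookkeeping; the comparatively
      expensive wake-ups no longer extend the time the lock is held.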
    • ipc/sem.c: optimize update_queue() for bulk wakeup calls · fd5db422
      Manfred Spraul authored
      The following series of patches tries to fix the spinlock contention
      reported by Chris Mason - his benchmark exposes problems of the current
      - In the worst case, the algorithm used by update_queue() is O(N^2).
        Bulk wake-up calls can enter this worst case.  The patch series fix
        Note that the benchmark app doesn't expose the problem, it just should
        be fixed: Real world apps might do the wake-ups in another order than
        perfect FIFO.
      - The part of the code that runs within the semaphore array spinlock is
        significantly larger than necessary.
        The patch series fixes that.  This change is responsible for the main
      - The cacheline with the spinlock is also used for a variable that is
        read in the hot path (sem_base) and for a variable that is unnecessarily
        written to multiple times (sem_otime).  The last step of the series
        cacheline-aligns the spinlock.
      This patch:
      The SysV semaphore code allows performing multiple operations on all
      semaphores in the array as atomic operations.  After a modification,
      update_queue() checks which of the waiting tasks can complete.
      The algorithm that is used to identify the tasks is O(N^2) in the worst
      case.  For some cases, it is simple to avoid the O(N^2).
      The patch adds detection logic for some cases, especially for the case
      of an array where all sleeping tasks are single sembuf operations and a
      multi-sembuf operation is used to wake up multiple tasks.
      A big database application uses that approach.
      The patch fixes wakeup due to semctl(,,SETALL,) - the initial version of
      the patch breaks that.
      [akpm@linux-foundation.org: make do_smart_update() static]
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
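      Why rescanning the pending queue can go quadratic, and why some cases
      allow a shortcut, can be shown with a toy model.  This is a sketch
      under stated assumptions, not the patch's detection logic: every
      pending operation is a single decrement by one, and struct pending_op,
      scan_with_restart() and scan_single_pass() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Each pending operation wants to decrement one semaphore by 1. */
struct pending_op { int sem_num; bool done; };

/* Try one operation; returns true if the decrement could be applied. */
static bool try_op(int *sems, struct pending_op *op, int *attempts)
{
	(*attempts)++;
	if (sems[op->sem_num] < 1)
		return false;
	sems[op->sem_num]--;
	op->done = true;
	return true;
}

/* Naive scan: restart from the head after every success. */
static int scan_with_restart(int *sems, struct pending_op *q, size_t n)
{
	int attempts = 0;
	size_t i = 0;

	while (i < n) {
		if (!q[i].done && try_op(sems, &q[i], &attempts))
			i = 0;	/* the array changed: rescan everything */
		else
			i++;
	}
	return attempts;
}

/*
 * Single pass: valid in this model because a completed decrement can
 * never make another pending decrement succeed.
 */
static int scan_single_pass(int *sems, struct pending_op *q, size_t n)
{
	int attempts = 0;

	for (size_t i = 0; i < n; i++)
		if (!q[i].done)
			try_op(sems, &q[i], &attempts);
	return attempts;
}
```

      With two blocked operations queued ahead of two runnable ones, the
      restarting scan retries the blocked entries after every success, while
      the single pass touches each entry once.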
  2. 25 May, 2010 1 commit
  3. 12 May, 2010 1 commit
  4. 06 Apr, 2010 1 commit
  5. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of conversion.
      The script does the following.
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
      The conversion was done in the following steps.
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
      6. percpu.h was updated not to include slab.h.
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
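      The first rule of the script (gfp.h if only gfp is used, slab.h if slab
      is used) can be sketched as a small classifier.  The conversion script
      itself is not reproduced in this log, so the token lists and the names
      classify_source() and contains_any() are purely hypothetical; the real
      script's matching rules may differ.

```c
#include <assert.h>
#include <string.h>

enum needed_header { NEED_NONE, NEED_GFP_H, NEED_SLAB_H };

/* Hypothetical token lists; the real script's patterns are not shown here. */
static const char *const slab_tokens[] = {
	"kmalloc(", "kzalloc(", "kfree(", "kmem_cache_"
};
static const char *const gfp_tokens[] = {
	"GFP_KERNEL", "GFP_ATOMIC", "alloc_pages("
};

static int contains_any(const char *src, const char *const *tokens, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strstr(src, tokens[i]))
			return 1;
	return 0;
}

/* Decide which single include a source text needs. */
static enum needed_header classify_source(const char *src)
{
	if (contains_any(src, slab_tokens,
			 sizeof(slab_tokens) / sizeof(slab_tokens[0])))
		return NEED_SLAB_H;	/* slab.h in turn includes gfp.h */
	if (contains_any(src, gfp_tokens,
			 sizeof(gfp_tokens) / sizeof(gfp_tokens[0])))
		return NEED_GFP_H;
	return NEED_NONE;
}
```

      Checking for slab usage first means a file that uses both facilities
      gets slab.h only, matching the rule that only the necessary includes
      are added.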
  6. 22 Mar, 2010 1 commit
    • ppc64 sys_ipc breakage in 2.6.34-rc2 · 45575f5a
      Anton Blanchard authored
      I chased down a fail on ppc64 on 2.6.34-rc2 where an application that
      uses shared memory was getting a SEGV.
      Commit baed7fc9 ("Add generic sys_ipc
      wrapper") changed the second argument from an unsigned long to an int.
      When we call shmget the system call wrappers for sys_ipc will sign
      extend second (ie the size) which truncates it.  It took a while to
      track down because the call succeeds and strace shows the untruncated
      size :)
      The patch below changes second from an int to an unsigned long which
      fixes shmget on ppc64 (and I assume s390, sparc64 and mips64).
      Signed-off-by: Anton Blanchard <anton@samba.org>
      I assume the function prototypes for the other IPC methods would cause us
      to sign or zero extend second where appropriate (avoiding any security
      issues). Come to think of it, the syscall wrappers for each method should do
      that for us as well.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
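      The effect of passing a size through an int parameter can be reproduced
      in userspace.  This is a minimal model of the conversion only, assuming
      a 32-bit int and a 64-bit unsigned long as on ppc64; through_int_param()
      and through_ulong_param() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Model a 64-bit value passed through a 32-bit signed parameter and then
 * widened again, as the commit describes for "second".
 */
static uint64_t through_int_param(uint64_t size)
{
	uint32_t low = (uint32_t)size;	/* upper 32 bits are dropped */
	int32_t as_int;

	memcpy(&as_int, &low, sizeof(as_int));	/* reinterpret as signed */
	return (uint64_t)(int64_t)as_int;	/* sign-extend to 64 bits */
}

/* Same path with an unsigned long parameter: nothing is lost. */
static uint64_t through_ulong_param(uint64_t size)
{
	return size;
}
```

      Small sizes survive unchanged, which is why the problem only shows up
      for large segments: a size just over 4 GiB collapses to its low 32
      bits, and a 3 GiB size turns into a value with all upper bits set.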
  7. 12 Mar, 2010 2 commits
    • ipc: use rlimit helpers · f1eb1332
      Jiri Slaby authored
      Make sure the compiler won't do weird things with limits.  E.g.  fetching them
      twice may return 2 different values after writable limits are implemented.
      I.e.  either use rlimit helpers added in
      3e10e716 ("resource: add helpers for
      fetching rlimits") or ACCESS_ONCE if not applicable.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
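      The "fetch once, then use the snapshot" idea can be sketched as
      follows.  The ACCESS_ONCE() macro below is a userspace imitation that
      relies on the GCC/Clang __typeof__ extension, and struct limit and
      charge() are hypothetical; the kernel's rlimit helpers are not
      reproduced here.

```c
#include <assert.h>

/* Userspace imitation of ACCESS_ONCE(); __typeof__ is a GCC/Clang extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for a resource limit slot that may change. */
struct limit { unsigned long cur; };

/*
 * Read the limit exactly once and use that snapshot for both the check
 * and the value reported back, so the two can never disagree.
 */
static int charge(struct limit *lim, unsigned long used, unsigned long want,
		  unsigned long *seen_limit)
{
	unsigned long snapshot = ACCESS_ONCE(lim->cur);

	*seen_limit = snapshot;
	if (used + want > snapshot)
		return -1;
	return 0;
}
```

      Without the snapshot, a compiler would be free to load lim->cur twice,
      and a concurrent writer could make the comparison and the reported
      value differ.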
    • Add generic sys_ipc wrapper · baed7fc9
      Christoph Hellwig authored
      Add a generic implementation of the ipc demultiplexer syscall.  Except for
      s390 and sparc64 all implementations of the sys_ipc are nearly identical.
      There are slight differences in the types of the parameters, where mips
      and powerpc as the only 64-bit architectures with sys_ipc use unsigned
      long for the "third" argument as it gets cast to a pointer later, while
      it traditionally is an "int" like most other parameters.  frv goes even
      further and uses unsigned long for all parameters except for "ptr" which
      is a pointer type everywhere.  The change from int to unsigned long for
      "third" and back to "int" for the others on frv should be fine due to the
      in-register calling conventions for syscalls (we already had a similar
      issue with the generic sys_ptrace), but I'd prefer to have the arch
      maintainers look over this in detail.
      Except for that h8300, m68k and m68knommu lack an implementation of the
      semtimedop sub call which this patch adds, and various architectures have
      gets used - at least on i386 it seems superfluous as the compat code on
      x86-64 and ia64 doesn't even bother to implement it.
      [akpm@linux-foundation.org: add sys_ipc to sys_ni.c]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Reviewed-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Acked-by: David Howells <dhowells@redhat.com>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
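      A demultiplexer of this kind is essentially a switch over the sub-call
      number.  The sketch below shows only the dispatch shape: the sub-call
      numbers follow the kernel's linux/ipc.h, but ipc_demux() and the stub
      handlers are hypothetical and far simpler than the generic wrapper
      this commit adds.

```c
#include <assert.h>
#include <errno.h>

/* Sub-call numbers as found in the kernel's <linux/ipc.h>. */
#define SEMOP       1
#define SEMGET      2
#define SEMCTL      3
#define SEMTIMEDOP  4
#define MSGGET     13
#define SHMGET     23

/* Hypothetical stubs standing in for the real per-call handlers. */
static long stub_semget(int key, int nsems, int flags)
{
	(void)key; (void)flags;
	return 100 + nsems;
}

static long stub_shmget(int key, unsigned long size, int flags)
{
	(void)key; (void)flags;
	return (long)size;
}

/* One entry point fans out to the individual IPC operations. */
static long ipc_demux(unsigned int call, int first, unsigned long second,
		      unsigned long third, void *ptr)
{
	(void)ptr;
	switch (call) {
	case SEMGET:
		return stub_semget(first, (int)second, (int)third);
	case SHMGET:
		/* "second" carries the size here, so it is passed on unnarrowed */
		return stub_shmget(first, second, (int)third);
	default:
		return -ENOSYS;
	}
}
```

      Because every sub-call shares one argument list, the parameter types
      have to be wide enough for the most demanding sub-call, which is where
      the int versus unsigned long differences between architectures come
      from.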
  8. 03 Mar, 2010 6 commits
  9. 16 Jan, 2010 1 commit
  10. 16 Dec, 2009 12 commits
  11. 11 Dec, 2009 1 commit
  12. 04 Dec, 2009 1 commit
  13. 12 Nov, 2009 1 commit
  14. 27 Sep, 2009 1 commit
  15. 24 Sep, 2009 1 commit
  16. 23 Sep, 2009 1 commit
  17. 22 Sep, 2009 2 commits
    • hugetlbfs: allow the creation of files suitable for MAP_PRIVATE on the vfs internal mount · 6bfde05b
      Eric B Munson authored
      This patchset adds a flag to mmap that allows the user to request that an
      anonymous mapping be backed with huge pages.  This mapping will borrow
      functionality from the huge page shm code to create a file on the kernel
      internal mount and use it to approximate an anonymous mapping.  The
      MAP_HUGETLB flag is a modifier to MAP_ANONYMOUS and will not work without
      both flags being present.
      A new flag is necessary because there is no other way to hook into huge
      pages without creating a file on a hugetlbfs mount which wouldn't be
      To userspace, this mapping will behave just like an anonymous mapping
      because the file is not accessible outside of the kernel.
      This patchset is meant to simplify the programming model.  Presently there
      is a large chunk of boilerplate code, contained in libhugetlbfs, required
      to create private, hugepage backed mappings.  This patch set would allow
      use of hugepages without linking to libhugetlbfs or having hugetlbfs
      Unification of the VM code would provide these same benefits, but it has
      been resisted each time that it has been suggested for several reasons: it
      would break PAGE_SIZE assumptions across the kernel, it makes page-table
      abstractions really expensive, and it does not provide any benefit on
      architectures that do not support huge pages, incurring fast path
      penalties without providing any benefit on these architectures.
      This patch:
      There are two means of creating mappings backed by huge pages:
              1. mmap() a file created on hugetlbfs
              2. Use shm which creates a file on an internal mount which essentially
                 maps it MAP_SHARED
      The internal mount is only used for shared mappings but there is very
      little that stops it being used for private mappings. This patch extends
      hugetlbfs_file_setup() to deal with the creation of files that will be
      mapped MAP_PRIVATE on the internal hugetlbfs mount. This extended API is
      used in a subsequent patch to implement the MAP_HUGETLB mmap() flag.
      Signed-off-by: Eric Munson <ebmunson@us.ibm.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
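      From userspace, the programming model this patchset aims for looks
      roughly like the following.  It is a sketch assuming a Linux system
      whose libc exposes MAP_HUGETLB and a length that is a multiple of the
      huge page size; try_hugetlb_mapping() is a hypothetical helper, and the
      call simply fails when no huge pages are available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Returns 0 if a huge-page-backed anonymous mapping could be created and
 * released again, -1 otherwise (for example when no huge pages are
 * reserved on this system).
 */
static int try_hugetlb_mapping(size_t len)
{
	/* MAP_HUGETLB modifies MAP_ANONYMOUS; no hugetlbfs file is opened. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	return munmap(p, len) == 0 ? 0 : -1;
}
```

      A caller would typically fall back to a regular anonymous mapping when
      the huge-page request fails.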
    • Alexey Dobriyan's avatar
  18. 15 Sep, 2009 1 commit
  19. 24 Aug, 2009 1 commit
    • mm: fix hugetlb bug due to user_shm_unlock call · 353d5c30
      Hugh Dickins authored
      2.6.30's commit 8a0bdec1 removed
      user_shm_lock() calls in hugetlb_file_setup() but left the
      user_shm_unlock call in shm_destroy().
      In detail:
      Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock()
      is not called in hugetlb_file_setup(). However, user_shm_unlock() is
      called in any case in shm_destroy() and in the following
      atomic_dec_and_lock(&up->__count) in free_uid() is executed and if
      up->__count gets zero, also cleanup_user_struct() is scheduled.
      Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set.
      However, the ref counter up->__count gets unexpectedly non-positive and
      the corresponding structs are freed even though there are live
      references to them, resulting in a kernel oops after a lot of
      shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles and CONFIG_USER_SCHED set.
      Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the
      time of shm_destroy() may give a different answer from at the time
      of hugetlb_file_setup().  And fixed newseg()'s no_id error path,
      which has missed user_shm_unlock() ever since it came in 2.6.9.
      Reported-by: Stefan Huber <shuber2@gmail.com>
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Tested-by: Stefan Huber <shuber2@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
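      The unbalanced lock/unlock pairing can be modelled with a plain
      counter.  This is only an illustration of the failure mode and of one
      way to keep the pairing balanced (recording at setup time whether the
      user was charged); struct user, struct segment and the function names
      are hypothetical and do not claim to match the actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct user { int refs; };			/* hypothetical stand-in */
struct segment { struct user *mlock_user; };	/* who was charged, or NULL */

static void user_lock(struct user *u)   { u->refs++; }
static void user_unlock(struct user *u) { u->refs--; }

/* Setup charges the user only when the privileged path is unavailable. */
static void segment_setup(struct segment *seg, struct user *u,
			  bool can_do_hugetlb)
{
	seg->mlock_user = NULL;
	if (!can_do_hugetlb) {
		user_lock(u);
		seg->mlock_user = u;
	}
}

/* Buggy teardown: unlocks unconditionally, even if setup never locked. */
static void segment_destroy_unbalanced(struct segment *seg, struct user *u)
{
	(void)seg;
	user_unlock(u);
}

/*
 * Balanced teardown: relies on what setup recorded instead of
 * re-evaluating the privilege check, which may answer differently later.
 */
static void segment_destroy_balanced(struct segment *seg)
{
	if (seg->mlock_user)
		user_unlock(seg->mlock_user);
	seg->mlock_user = NULL;
}
```

      In the unbalanced variant each create/destroy cycle on the privileged
      path drops one reference too many, which is the kind of drift that
      eventually frees a structure still in use.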