1. 02 Nov, 2017 1 commit
    • Greg Kroah-Hartman's avatar
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: default avatarKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: default avatarPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  2. 24 Dec, 2016 1 commit
  3. 21 Jan, 2016 1 commit
    • Jann Horn's avatar
      ptrace: use fsuid, fsgid, effective creds for fs access checks · caaee623
      Jann Horn authored
      By checking the effective credentials instead of the real UID / permitted
      capabilities, ensure that the calling process actually intended to use its
      credentials.
      
      To ensure that all ptrace checks use the correct caller credentials (e.g.
      in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
      flag), use two new flags and require one of them to be set.
      
      The problem was that when a privileged task had temporarily dropped its
      privileges, e.g.  by calling setreuid(0, user_uid), with the intent to
      perform following syscalls with the credentials of a user, it still passed
      ptrace access checks that the user would not be able to pass.
      
      While an attacker should not be able to convince the privileged task to
      perform a ptrace() syscall, this is a problem because the ptrace access
      check is reused for things in procfs.
      
      In particular, the following somewhat interesting procfs entries only rely
      on ptrace access checks:
      
       /proc/$pid/stat - uses the check for determining whether pointers
           should be visible, useful for bypassing ASLR
       /proc/$pid/maps - also useful for bypassing ASLR
       /proc/$pid/cwd - useful for gaining access to restricted
           directories that contain files with lax permissions, e.g. in
           this scenario:
           lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
           drwx------ root root /root
           drwxr-xr-x root root /root/foobar
           -rw-r--r-- root root /root/foobar/secret
      
      Therefore, on a system where a root-owned mode 6755 binary changes its
      effective credentials as described and then dumps a user-specified file,
      this could be used by an attacker to reveal the memory layout of root's
      processes or reveal the contents of files he is not allowed to access
      (through /proc/$pid/cwd).
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      caaee623
  4. 02 Feb, 2014 1 commit
    • H. Peter Anvin's avatar
      compat: Get rid of (get|put)_compat_time(val|spec) · 81993e81
      H. Peter Anvin authored
      We have two APIs for compatiblity timespec/val, with confusingly
      similar names.  compat_(get|put)_time(val|spec) *do* handle the case
      where COMPAT_USE_64BIT_TIME is set, whereas
      (get|put)_compat_time(val|spec) do not.  This is an accident waiting
      to happen.
      
      Clean it up by favoring the full-service version; the limited version
      is replaced with double-underscore versions static to kernel/compat.c.
      
      A common pattern is to convert a struct timespec to kernel format in
      an allocation on the user stack.  Unfortunately it is open-coded in
      several places.  Since this allocation isn't actually needed if
      COMPAT_USE_64BIT_TIME is true (since user format == kernel format)
      encapsulate that whole pattern into the function
      compat_convert_timespec().  An equivalent function should be written
      for struct timeval if it is needed in the future.
      
      Finally, get rid of compat_(get|put)_timeval_convert(): each was only
      used once, and the latter was not even doing what the function said
      (no conversion actually was being done.)  Moving the conversion into
      compat_sys_settimeofday() itself makes the code much more similar to
      sys_settimeofday() itself.
      
      v3: Remove unused compat_convert_timeval().
      
      v2: Drop bogus "const" in the destination argument for
          compat_convert_time*().
      
      Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Tested-by: default avatarH.J. Lu <hjl.tools@gmail.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      81993e81
  5. 19 Feb, 2013 1 commit
    • Thomas Gleixner's avatar
      futex: Revert "futex: Mark get_robust_list as deprecated" · fe2b05f7
      Thomas Gleixner authored
      This reverts commit ec0c4274.
      
      get_robust_list() is in use and a removal would break existing user
      space. With the permission checks in place it's not longer a security
      hole. Remove the deprecation warnings.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: akpm@linux-foundation.org
      Cc: paul.gortmaker@windriver.com
      Cc: davej@redhat.com
      Cc: keescook@chromium.org
      Cc: stable@vger.kernel.org
      Cc: ebiederm@xmission.com
      fe2b05f7
  6. 03 Feb, 2013 1 commit
  7. 29 Mar, 2012 2 commits
  8. 24 Mar, 2011 1 commit
  9. 10 Nov, 2010 1 commit
    • Darren Hart's avatar
      futex: Address compiler warnings in exit_robust_list · 4c115e95
      Darren Hart authored
      Since commit 1dcc41bb (futex: Change 3rd arg of fetch_robust_entry()
      to unsigned int*) some gcc versions decided to emit the following
      warning:
      
      kernel/futex.c: In function ‘exit_robust_list’:
      kernel/futex.c:2492: warning: ‘next_pi’ may be used uninitialized in this function
      
      The commit did not introduce the warning as gcc should have warned
      before that commit as well. It's just gcc being silly.
      
      The code path really can't result in next_pi being unitialized (or
      should not), but let's keep the build clean. Annotate next_pi as an
      uninitialized_var.
      
      [ tglx: Addressed the same issue in futex_compat.c and massaged the
        	changelog ]
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Tested-by: default avatarMatt Fleming <matt@console-pimps.org>
      Tested-by: default avatarUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <1288897200-13008-1-git-send-email-dvhart@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      4c115e95
  10. 18 Sep, 2010 1 commit
  11. 09 Dec, 2009 1 commit
  12. 10 Aug, 2009 1 commit
  13. 13 Nov, 2008 3 commits
  14. 30 Mar, 2008 1 commit
  15. 24 Feb, 2008 1 commit
    • Thomas Gleixner's avatar
      futex: runtime enable pi and robust functionality · a0c1e907
      Thomas Gleixner authored
      Not all architectures implement futex_atomic_cmpxchg_inatomic().  The default
      implementation returns -ENOSYS, which is currently not handled inside of the
      futex guts.
      
      Futex PI calls and robust list exits with a held futex result in an endless
      loop in the futex code on architectures which have no support.
      
      Fixing up every place where futex_atomic_cmpxchg_inatomic() is called would
      add a fair amount of extra if/else constructs to the already complex code.  It
      is also not possible to disable the robust feature before user space tries to
      register robust lists.
      
      Compile time disabling is not a good idea either, as there are already
      architectures with runtime detection of futex_atomic_cmpxchg_inatomic support.
      
      Detect the functionality at runtime instead by calling
      cmpxchg_futex_value_locked() with a NULL pointer from the futex initialization
      code.  This is guaranteed to fail, but the call of
      futex_atomic_cmpxchg_inatomic() happens with pagefaults disabled.
      
      On architectures, which use the asm-generic implementation or have a runtime
      CPU feature detection, a -ENOSYS return value disables the PI/robust features.
      
      On architectures with a working implementation the call returns -EFAULT and
      the PI/robust features are enabled.
      
      The relevant syscalls return -ENOSYS and the robust list exit code is blocked,
      when the detection fails.
      
      Fixes http://lkml.org/lkml/2008/2/11/149
      Originally reported by: Lennart Buytenhek
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Lennert Buytenhek <buytenh@wantstofly.org>
      Cc: Riku Voipio <riku.voipio@movial.fi>
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a0c1e907
  16. 14 Feb, 2008 1 commit
  17. 01 Feb, 2008 1 commit
    • Thomas Gleixner's avatar
      futex: Add bitset conditional wait/wakeup functionality · cd689985
      Thomas Gleixner authored
      To allow the implementation of optimized rw-locks in user space, glibc
      needs a possibility to select waiters for wakeup depending on a bitset
      mask.
      
      This requires two new futex OPs: FUTEX_WAIT_BITS and FUTEX_WAKE_BITS
      These OPs are basically the same as FUTEX_WAIT and FUTEX_WAKE plus an
      additional argument - a bitset. Further the FUTEX_WAIT_BITS OP is
      expecting an absolute timeout value instead of the relative one, which
      is used for the FUTEX_WAIT OP.
      
      FUTEX_WAIT_BITS calls into the kernel with a bitset. The bitset is
      stored in the futex_q structure, which is used to enqueue the waiter
      into the hashed futex waitqueue.
      
      FUTEX_WAKE_BITS also calls into the kernel with a bitset. The wakeup
      function logically ANDs the bitset with the bitset stored in each
      waiters futex_q structure. If the result is zero (i.e. none of the set
      bits in the bitsets is matching), then the waiter is not woken up. If
      the result is not zero (i.e. one of the set bits in the bitsets is
      matching), then the waiter is woken.
      
      The bitset provided by the caller must be non zero. In case the
      provided bitset is zero the kernel returns EINVAL.
      
      Internaly the new OPs are only extensions to the existing FUTEX_WAIT
      and FUTEX_WAKE functions. The existing OPs hand a bitset with all bits
      set into the futex_wait() and futex_wake() functions.
      Signed-off-by: default avatarThomas Gleixner <tgxl@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      cd689985
  18. 10 Nov, 2007 1 commit
    • David Miller's avatar
      [FUTEX] Fix address computation in compat code. · 3c5fd9c7
      David Miller authored
      compat_exit_robust_list() computes a pointer to the
      futex entry in userspace as follows:
      
      	(void __user *)entry + futex_offset
      
      'entry' is a 'struct robust_list __user *', and
      'futex_offset' is a 'compat_long_t' (typically a 's32').
      
      Things explode if the 32-bit sign bit is set in futex_offset.
      
      Type promotion sign extends futex_offset to a 64-bit value before
      adding it to 'entry'.
      
      This triggered a problem on sparc64 running 32-bit applications which
      would lock up a cpu looping forever in the fault handling for the
      userspace load in handle_futex_death().
      
      Compat userspace runs with address masking (wherein the cpu zeros out
      the top 32-bits of every effective address given to a memory operation
      instruction) so the sparc64 fault handler accounts for this by
      zero'ing out the top 32-bits of the fault address too.
      
      Since the kernel properly uses the compat_uptr interfaces, kernel side
      accesses to compat userspace work too since they will only use
      addresses with the top 32-bit clear.
      
      Because of this compat futex layer bug we get into the following loop
      when executing the get_user() load near the top of handle_futex_death():
      
      1) load from address '0xfffffffff7f16bd8', FAULT
      2) fault handler clears upper 32-bits, processes fault
         for address '0xf7f16bd8' which succeeds
      3) goto #1
      
      I want to thank Bernd Zeimetz, Josip Rodin, and Fabio Massimo Di Nitto
      for their tireless efforts helping me track down this bug.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3c5fd9c7
  19. 19 Oct, 2007 2 commits
    • Pavel Emelyanov's avatar
      Uninline find_task_by_xxx set of functions · 228ebcbe
      Pavel Emelyanov authored
      The find_task_by_something is a set of macros are used to find task by pid
      depending on what kind of pid is proposed - global or virtual one.  All of
      them are wrappers above the most generic one - find_task_by_pid_type_ns() -
      and just substitute some args for it.
      
      It turned out, that dereferencing the current->nsproxy->pid_ns construction
      and pushing one more argument on the stack inline cause kernel text size to
      grow.
      
      This patch moves all this stuff out-of-line into kernel/pid.c.  Together
      with the next patch it saves a bit less than 400 bytes from the .text
      section.
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      228ebcbe
    • Pavel Emelyanov's avatar
      pid namespaces: changes to show virtual ids to user · b488893a
      Pavel Emelyanov authored
      This is the largest patch in the set. Make all (I hope) the places where
      the pid is shown to or get from user operate on the virtual pids.
      
      The idea is:
       - all in-kernel data structures must store either struct pid itself
         or the pid's global nr, obtained with pid_nr() call;
       - when seeking the task from kernel code with the stored id one
         should use find_task_by_pid() call that works with global pids;
       - when showing pid's numerical value to the user the virtual one
         should be used, but however when one shows task's pid outside this
         task's namespace the global one is to be used;
       - when getting the pid from userspace one need to consider this as
         the virtual one and use appropriate task/pid-searching functions.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: nuther build fix]
      [akpm@linux-foundation.org: yet nuther build fix]
      [akpm@linux-foundation.org: remove unneeded casts]
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b488893a
  20. 01 Oct, 2007 1 commit
  21. 12 Sep, 2007 1 commit
    • Arnd Bergmann's avatar
      futex_compat: fix list traversal bugs · 179c85ea
      Arnd Bergmann authored
      The futex list traversal on the compat side appears to have
      a bug.
      
      It's loop termination condition compares:
      
              while (compat_ptr(uentry) != &head->list)
      
      But that can't be right because "uentry" has the special
      "pi" indicator bit still potentially set at bit 0.  This
      is cleared by fetch_robust_entry() into the "entry"
      return value.
      
      What this seems to mean is that the list won't terminate
      when list iteration gets back to the the head.  And we'll
      also process the list head like a normal entry, which could
      cause all kinds of problems.
      
      So we should check for equality with "entry".  That pointer
      is of the non-compat type so we have to do a little casting
      to keep the compiler and sparse happy.
      
      The same problem can in theory occur with the 'pending'
      variable, although that has not been reported from users
      so far.
      
      Based on the original patch from David Miller.
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      179c85ea
  22. 18 Jun, 2007 1 commit
    • Thomas Gleixner's avatar
      Revert "futex_requeue_pi optimization" · bd197234
      Thomas Gleixner authored
      This reverts commit d0aa7a70.
      
      It not only introduced user space visible changes to the futex syscall,
      it is also non-functional and there is no way to fix it proper before
      the 2.6.22 release.
      
      The breakage report ( http://lkml.org/lkml/2007/5/12/17 ) went
      unanswered, and unfortunately it turned out that the concept is not
      feasible at all.  It violates the rtmutex semantics badly by introducing
      a virtual owner, which hacks around the coupling of the user-space
      pi_futex and the kernel internal rt_mutex representation.
      
      At the moment the only safe option is to remove it fully as it contains
      user-space visible changes to broken kernel code, which we do not want
      to expose in the 2.6.22 release.
      
      The patch reverts the original patch mostly 1:1, but contains a couple
      of trivial manual cleanups which were necessary due to patches, which
      touched the same area of code later.
      
      Verified against the glibc tests and my own PI futex tests.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Acked-by: default avatarUlrich Drepper <drepper@redhat.com>
      Cc: Pierre Peiffer <pierre.peiffer@bull.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bd197234
  23. 01 Jun, 2007 1 commit
  24. 09 May, 2007 2 commits
    • Pierre Peiffer's avatar
      futex_requeue_pi optimization · d0aa7a70
      Pierre Peiffer authored
      This patch provides the futex_requeue_pi functionality, which allows some
      threads waiting on a normal futex to be requeued on the wait-queue of a
      PI-futex.
      
      This provides an optimization, already used for (normal) futexes, to be used
      with the PI-futexes.
      
      This optimization is currently used by the glibc in pthread_broadcast, when
      using "normal" mutexes.  With futex_requeue_pi, it can be used with
      PRIO_INHERIT mutexes too.
      Signed-off-by: default avatarPierre Peiffer <pierre.peiffer@bull.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d0aa7a70
    • Pierre Peiffer's avatar
      Make futex_wait() use an hrtimer for timeout · c19384b5
      Pierre Peiffer authored
      This patch modifies futex_wait() to use an hrtimer + schedule() in place of
      schedule_timeout().
      
      schedule_timeout() is tick based, therefore the timeout granularity is the
      tick (1 ms, 4 ms or 10 ms depending on HZ).  By using a high resolution timer
      for timeout wakeup, we can attain a much finer timeout granularity (in the
      microsecond range).  This parallels what is already done for futex_lock_pi().
      
      The timeout passed to the syscall is no longer converted to jiffies and is
      therefore passed to do_futex() and futex_wait() as an absolute ktime_t
      therefore keeping nanosecond resolution.
      
      Also this removes the need to pass the nanoseconds timeout part to
      futex_lock_pi() in val2.
      
      In futex_wait(), if there is no timeout then a regular schedule() is
      performed.  Otherwise, an hrtimer is fired before schedule() is called.
      
      [akpm@linux-foundation.org: fix `make headers_check']
      Signed-off-by: default avatarSebastien Dugue <sebastien.dugue@bull.net>
      Signed-off-by: default avatarPierre Peiffer <pierre.peiffer@bull.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c19384b5
  25. 10 Oct, 2006 1 commit
  26. 06 Aug, 2006 1 commit
  27. 29 Jul, 2006 1 commit
  28. 28 Jun, 2006 2 commits
    • Ingo Molnar's avatar
      [PATCH] pi-futex: futex_lock_pi/futex_unlock_pi support · c87e2837
      Ingo Molnar authored
      This adds the actual pi-futex implementation, based on rt-mutexes.
      
      [dino@in.ibm.com: fix an oops-causing race]
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: default avatarDinakar Guniguntala <dino@in.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      c87e2837
    • Ingo Molnar's avatar
      [PATCH] pi-futex: futex code cleanups · e2970f2f
      Ingo Molnar authored
      We are pleased to announce "lightweight userspace priority inheritance" (PI)
      support for futexes.  The following patchset and glibc patch implements it,
      ontop of the robust-futexes patchset which is included in 2.6.16-mm1.
      
      We are calling it lightweight for 3 reasons:
      
       - in the user-space fastpath a PI-enabled futex involves no kernel work
         (or any other PI complexity) at all.  No registration, no extra kernel
         calls - just pure fast atomic ops in userspace.
      
       - in the slowpath (in the lock-contention case), the system call and
         scheduling pattern is in fact better than that of normal futexes, due to
         the 'integrated' nature of FUTEX_LOCK_PI.  [more about that further down]
      
       - the in-kernel PI implementation is streamlined around the mutex
         abstraction, with strict rules that keep the implementation relatively
         simple: only a single owner may own a lock (i.e.  no read-write lock
         support), only the owner may unlock a lock, no recursive locking, etc.
      
        Priority Inheritance - why, oh why???
        -------------------------------------
      
      Many of you heard the horror stories about the evil PI code circling Linux for
      years, which makes no real sense at all and is only used by buggy applications
      and which has horrible overhead.  Some of you have dreaded this very moment,
      when someone actually submits working PI code ;-)
      
      So why would we like to see PI support for futexes?
      
      We'd like to see it done purely for technological reasons.  We dont think it's
      a buggy concept, we think it's useful functionality to offer to applications,
      which functionality cannot be achieved in other ways.  We also think it's the
      right thing to do, and we think we've got the right arguments and the right
      numbers to prove that.  We also believe that we can address all the
      counter-arguments as well.  For these reasons (and the reasons outlined below)
      we are submitting this patch-set for upstream kernel inclusion.
      
      What are the benefits of PI?
      
        The short reply:
        ----------------
      
      User-space PI helps achieving/improving determinism for user-space
      applications.  In the best-case, it can help achieve determinism and
      well-bound latencies.  Even in the worst-case, PI will improve the statistical
      distribution of locking related application delays.
      
        The longer reply:
        -----------------
      
      Firstly, sharing locks between multiple tasks is a common programming
      technique that often cannot be replaced with lockless algorithms.  As we can
      see it in the kernel [which is a quite complex program in itself], lockless
      structures are rather the exception than the norm - the current ratio of
      lockless vs.  locky code for shared data structures is somewhere between 1:10
      and 1:100.  Lockless is hard, and the complexity of lockless algorithms often
      endangers to ability to do robust reviews of said code.  I.e.  critical RT
      apps often choose lock structures to protect critical data structures, instead
      of lockless algorithms.  Furthermore, there are cases (like shared hardware,
      or other resource limits) where lockless access is mathematically impossible.
      
      Media players (such as Jack) are an example of reasonable application design
      with multiple tasks (with multiple priority levels) sharing short-held locks:
      for example, a highprio audio playback thread is combined with medium-prio
      construct-audio-data threads and low-prio display-colory-stuff threads.  Add
      video and decoding to the mix and we've got even more priority levels.
      
      So once we accept that synchronization objects (locks) are an unavoidable fact
      of life, and once we accept that multi-task userspace apps have a very fair
      expectation of being able to use locks, we've got to think about how to offer
      the option of a deterministic locking implementation to user-space.
      
      Most of the technical counter-arguments against doing priority inheritance
      only apply to kernel-space locks.  But user-space locks are different, there
      we cannot disable interrupts or make the task non-preemptible in a critical
      section, so the 'use spinlocks' argument does not apply (user-space spinlocks
      have the same priority inversion problems as other user-space locking
      constructs).  Fact is, pretty much the only technique that currently enables
      good determinism for userspace locks (such as futex-based pthread mutexes) is
      priority inheritance:
      
      Currently (without PI), if a high-prio and a low-prio task shares a lock [this
      is a quite common scenario for most non-trivial RT applications], even if all
      critical sections are coded carefully to be deterministic (i.e.  all critical
      sections are short in duration and only execute a limited number of
      instructions), the kernel cannot guarantee any deterministic execution of the
      high-prio task: any medium-priority task could preempt the low-prio task while
      it holds the shared lock and executes the critical section, and could delay it
      indefinitely.
      
        Implementation:
        ---------------
      
      As mentioned before, the userspace fastpath of PI-enabled pthread mutexes
      involves no kernel work at all - they behave quite similarly to normal
      futex-based locks: a 0 value means unlocked, and a value==TID means locked.
      (This is the same method as used by list-based robust futexes.) Userspace uses
      atomic ops to lock/unlock these mutexes without entering the kernel.
      
      To handle the slowpath, we have added two new futex ops:
      
        FUTEX_LOCK_PI
        FUTEX_UNLOCK_PI
      
      If the lock-acquire fastpath fails, [i.e.  an atomic transition from 0 to TID
      fails], then FUTEX_LOCK_PI is called.  The kernel does all the remaining work:
      if there is no futex-queue attached to the futex address yet then the code
      looks up the task that owns the futex [it has put its own TID into the futex
      value], and attaches a 'PI state' structure to the futex-queue.  The pi_state
      includes an rt-mutex, which is a PI-aware, kernel-based synchronization
      object.  The 'other' task is made the owner of the rt-mutex, and the
      FUTEX_WAITERS bit is atomically set in the futex value.  Then this task tries
      to lock the rt-mutex, on which it blocks.  Once it returns, it has the mutex
      acquired, and it sets the futex value to its own TID and returns.  Userspace
      has no other work to perform - it now owns the lock, and futex value contains
      FUTEX_WAITERS|TID.
      
      If the unlock side fastpath succeeds, [i.e.  userspace manages to do a TID ->
      0 atomic transition of the futex value], then no kernel work is triggered.
      
      If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), then
      FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the behalf of
      userspace - and it also unlocks the attached pi_state->rt_mutex and thus wakes
      up any potential waiters.
      
      Note that under this approach, contrary to other PI-futex approaches, there is
      no prior 'registration' of a PI-futex.  [which is not quite possible anyway,
      due to existing ABI properties of pthread mutexes.]
      
      Also, under this scheme, 'robustness' and 'PI' are two orthogonal properties
      of futexes, and all four combinations are possible: futex, robust-futex,
      PI-futex, robust+PI-futex.
      
        glibc support:
        --------------
      
      Ulrich Drepper and Jakub Jelinek have written glibc support for PI-futexes
      (and robust futexes), enabling robust and PI (PTHREAD_PRIO_INHERIT) POSIX
      mutexes.  (PTHREAD_PRIO_PROTECT support will be added later on too, no
      additional kernel changes are needed for that).  [NOTE: The glibc patch is
      obviously inofficial and unsupported without matching upstream kernel
      functionality.]
      
      the patch-queue and the glibc patch can also be downloaded from:
      
        http://redhat.com/~mingo/PI-futex-patches/
      
      Many thanks go to the people who helped us create this kernel feature: Steven
      Rostedt, Esben Nielsen, Benedikt Spranger, Daniel Walker, John Cooper, Arjan
      van de Ven, Oleg Nesterov and others.  Credits for related prior projects goes
      to Dirk Grambow, Inaky Perez-Gonzalez, Bill Huey and many others.
      
      Clean up the futex code, before adding more features to it:
      
       - use u32 as the futex field type - that's the ABI
       - use __user and pointers to u32 instead of unsigned long
       - code style / comment style cleanups
       - rename hash-bucket name from 'bh' to 'hb'.
      
      I checked the pre and post futex.o object files to make sure this
      patch has no code effects.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      e2970f2f
  29. 31 Mar, 2006 1 commit
  30. 28 Mar, 2006 1 commit
    • Andrew Morton's avatar
      [PATCH] compat_sys_futex() warning fix · ec7e15d6
      Andrew Morton authored
      kernel/futex_compat.c: In function `compat_sys_futex':
      kernel/futex_compat.c:140: warning: passing arg 1 of `do_futex' makes integer from pointer without a cast
      kernel/futex_compat.c:140: warning: passing arg 5 of `do_futex' makes integer from pointer without a cast
      
      Not sure what Ingo was thinking of here.  Put the casts back in.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      ec7e15d6
  31. 27 Mar, 2006 2 commits