1. 14 Mar, 2017 2 commits
  2. 02 Mar, 2017 2 commits
  3. 28 Feb, 2017 1 commit
  4. 13 Feb, 2017 1 commit
    • Yang Yang's avatar
      futex: Move futex_init() to core_initcall · 25f71d1c
      Yang Yang authored
      The UEVENT user mode helper is enabled before the initcalls are executed
      and is available when the root filesystem has been mounted.
      
      The user mode helper is triggered by device init calls and the executable
      might use the futex syscall.
      
      futex_init() is marked __initcall which maps to device_initcall, but there
      is no guarantee that futex_init() is invoked _before_ the first device init
      call which triggers the UEVENT user mode helper.
      
      If the user mode helper uses the futex syscall before futex_init() then the
      syscall crashes with a NULL pointer dereference because the futex subsystem
      has not been initialized yet.
      
      Move futex_init() to core_initcall so futexes are initialized before the
      root filesystem is mounted and the usermode helper becomes available.
      
      [ tglx: Rewrote changelog ]
      Signed-off-by: default avatarYang Yang <yang.yang29@zte.com.cn>
      Cc: jiang.biao2@zte.com.cn
      Cc: jiang.zhengxiong@zte.com.cn
      Cc: zhong.weidong@zte.com.cn
      Cc: deng.huali@zte.com.cn
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cnSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      25f71d1c
  5. 25 Dec, 2016 1 commit
    • Thomas Gleixner's avatar
      ktime: Get rid of the union · 2456e855
      Thomas Gleixner authored
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
      variant for 32bit machines. The Y2038 cleanup removed the timespec variant
      and switched everything to scalar nanoseconds. The union remained, but
      become completely pointless.
      
      Get rid of the union and just keep ktime_t as simple typedef of type s64.
      
      The conversion was done with coccinelle and some manual mopping up.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      2456e855
  6. 21 Nov, 2016 1 commit
  7. 05 Sep, 2016 1 commit
  8. 29 Jul, 2016 1 commit
  9. 08 Jun, 2016 1 commit
    • Mel Gorman's avatar
      futex: Calculate the futex key based on a tail page for file-based futexes · 077fa7ae
      Mel Gorman authored
      Mike Galbraith reported that the LTP test case futex_wake04 was broken
      by commit 65d8fc77 ("futex: Remove requirement for lock_page()
      in get_futex_key()").
      
      This test case uses futexes backed by hugetlbfs pages and so there is an
      associated inode with a futex stored on such pages. The problem is that
      the key is being calculated based on the head page index of the hugetlbfs
      page and not the tail page.
      
      Prior to the optimisation, the page lock was used to stabilise mappings and
      pin the inode is file-backed which is overkill. If the page was a compound
      page, the head page was automatically looked up as part of the page lock
      operation but the tail page index was used to calculate the futex key.
      
      After the optimisation, the compound head is looked up early and the page
      lock is only relied upon to identify truncated pages, special pages or a
      shmem page moving to swapcache. The head page is looked up because without
      the page lock, special care has to be taken to pin the inode correctly.
      However, the tail page is still required to calculate the futex key so
      this patch records the tail page.
      
      On vanilla 4.6, the output of the test case is;
      
      futex_wake04    0  TINFO  :  Hugepagesize 2097152
      futex_wake04    1  TFAIL  :  futex_wake04.c:126: Bug: wait_thread2 did not wake after 30 secs.
      
      With the patch applied
      
      futex_wake04    0  TINFO  :  Hugepagesize 2097152
      futex_wake04    1  TPASS  :  Hi hydra, thread2 awake!
      
      Fixes: 65d8fc77 "futex: Remove requirement for lock_page() in get_futex_key()"
      Reported-and-tested-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarDavidlohr Bueso <dave@stgolabs.net>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160608132522.GM2469@suse.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      077fa7ae
  10. 23 May, 2016 1 commit
    • Linus Torvalds's avatar
      x86: remove more uaccess_32.h complexity · bd28b145
      Linus Torvalds authored
      I'm looking at trying to possibly merge the 32-bit and 64-bit versions
      of the x86 uaccess.h implementation, but first this needs to be cleaned
      up.
      
      For example, the 32-bit version of "__copy_from_user_inatomic()" is
      mostly the special cases for the constant size, and it's actually almost
      never relevant.  Most users aren't actually using a constant size
      anyway, and the few cases that do small constant copies are better off
      just using __get_user() instead.
      
      So get rid of the unnecessary complexity.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bd28b145
  11. 21 Apr, 2016 1 commit
  12. 20 Apr, 2016 1 commit
  13. 08 Mar, 2016 1 commit
  14. 17 Feb, 2016 2 commits
    • Mel Gorman's avatar
      futex: Remove requirement for lock_page() in get_futex_key() · 65d8fc77
      Mel Gorman authored
      When dealing with key handling for shared futexes, we can drastically reduce
      the usage/need of the page lock. 1) For anonymous pages, the associated futex
      object is the mm_struct which does not require the page lock. 2) For inode
      based, keys, we can check under RCU read lock if the page mapping is still
      valid and take reference to the inode. This just leaves one rare race that
      requires the page lock in the slow path when examining the swapcache.
      
      Additionally realtime users currently have a problem with the page lock being
      contended for unbounded periods of time during futex operations.
      
      Task A
           get_futex_key()
           lock_page()
          ---> preempted
      
      Now any other task trying to lock that page will have to wait until
      task A gets scheduled back in, which is an unbound time.
      
      With this patch, we pretty much have a lockless futex_get_key().
      
      Experiments show that this patch can boost/speedup the hashing of shared
      futexes with the perf futex benchmarks (which is good for measuring such
      change) by up to 45% when there are high (> 100) thread counts on a 60 core
      Westmere. Lower counts are pretty much in the noise range or less than 10%,
      but mid range can be seen at over 30% overall throughput (hash ops/sec).
      This makes anon-mem shared futexes much closer to its private counterpart.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      [ Ported on top of thp refcount rework, changelog, comments, fixes. ]
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Chris Mason <clm@fb.com>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: dave@stgolabs.net
      Link: http://lkml.kernel.org/r/1455045314-8305-3-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      65d8fc77
    • Davidlohr Bueso's avatar
      futex: Rename barrier references in ordering guarantees · 8ad7b378
      Davidlohr Bueso authored
      Ingo suggested we rename how we reference barriers A and B
      regarding futex ordering guarantees. This patch replaces,
      for both barriers, MB (A) with smp_mb(); (A), such that:
      
       - We explicitly state that the barriers are SMP, and
      
       - We standardize how we reference these across futex.c
         helping readers follow what barrier does what and where.
      Suggested-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Chris Mason <clm@fb.com>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: dave@stgolabs.net
      Link: http://lkml.kernel.org/r/1455045314-8305-2-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8ad7b378
  15. 26 Jan, 2016 1 commit
    • Thomas Gleixner's avatar
      rtmutex: Make wait_lock irq safe · b4abf910
      Thomas Gleixner authored
      Sasha reported a lockdep splat about a potential deadlock between RCU boosting
      rtmutex and the posix timer it_lock.
      
      CPU0					CPU1
      
      rtmutex_lock(&rcu->rt_mutex)
        spin_lock(&rcu->rt_mutex.wait_lock)
      					local_irq_disable()
      					spin_lock(&timer->it_lock)
      					spin_lock(&rcu->mutex.wait_lock)
      --> Interrupt
          spin_lock(&timer->it_lock)
      
      This is caused by the following code sequence on CPU1
      
           rcu_read_lock()
           x = lookup();
           if (x)
           	spin_lock_irqsave(&x->it_lock);
           rcu_read_unlock();
           return x;
      
      We could fix that in the posix timer code by keeping rcu read locked across
      the spinlocked and irq disabled section, but the above sequence is common and
      there is no reason not to support it.
      
      Taking rt_mutex.wait_lock irq safe prevents the deadlock.
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      b4abf910
  16. 21 Jan, 2016 1 commit
    • Jann Horn's avatar
      ptrace: use fsuid, fsgid, effective creds for fs access checks · caaee623
      Jann Horn authored
      By checking the effective credentials instead of the real UID / permitted
      capabilities, ensure that the calling process actually intended to use its
      credentials.
      
      To ensure that all ptrace checks use the correct caller credentials (e.g.
      in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
      flag), use two new flags and require one of them to be set.
      
      The problem was that when a privileged task had temporarily dropped its
      privileges, e.g.  by calling setreuid(0, user_uid), with the intent to
      perform following syscalls with the credentials of a user, it still passed
      ptrace access checks that the user would not be able to pass.
      
      While an attacker should not be able to convince the privileged task to
      perform a ptrace() syscall, this is a problem because the ptrace access
      check is reused for things in procfs.
      
      In particular, the following somewhat interesting procfs entries only rely
      on ptrace access checks:
      
       /proc/$pid/stat - uses the check for determining whether pointers
           should be visible, useful for bypassing ASLR
       /proc/$pid/maps - also useful for bypassing ASLR
       /proc/$pid/cwd - useful for gaining access to restricted
           directories that contain files with lax permissions, e.g. in
           this scenario:
           lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
           drwx------ root root /root
           drwxr-xr-x root root /root/foobar
           -rw-r--r-- root root /root/foobar/secret
      
      Therefore, on a system where a root-owned mode 6755 binary changes its
      effective credentials as described and then dumps a user-specified file,
      this could be used by an attacker to reveal the memory layout of root's
      processes or reveal the contents of files he is not allowed to access
      (through /proc/$pid/cwd).
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      caaee623
  17. 16 Jan, 2016 2 commits
  18. 20 Dec, 2015 6 commits
  19. 04 Oct, 2015 1 commit
  20. 22 Sep, 2015 1 commit
  21. 20 Jul, 2015 3 commits
  22. 19 Jun, 2015 1 commit
  23. 19 May, 2015 1 commit
  24. 08 May, 2015 1 commit
    • Davidlohr Bueso's avatar
      futex: Implement lockless wakeups · 1d0dcb3a
      Davidlohr Bueso authored
      Given the overall futex architecture, any chance of reducing
      hb->lock contention is welcome. In this particular case, using
      wake-queues to enable lockless wakeups addresses very much real
      world performance concerns, even cases of soft-lockups in cases
      of large amounts of blocked tasks (which is not hard to find in
      large boxes, using but just a handful of futex).
      
      At the lowest level, this patch can reduce latency of a single thread
      attempting to acquire hb->lock in highly contended scenarios by a
      up to 2x. At lower counts of nr_wake there are no regressions,
      confirming, of course, that the wake_q handling overhead is practically
      non existent. For instance, while a fair amount of variation,
      the extended pef-bench wakeup benchmark shows for a 20 core machine
      the following avg per-thread time to wakeup its share of tasks:
      
      	nr_thr	ms-before	ms-after
      	16 	0.0590		0.0215
      	32 	0.0396		0.0220
      	48 	0.0417		0.0182
      	64 	0.0536		0.0236
      	80 	0.0414		0.0097
      	96 	0.0672		0.0152
      
      Naturally, this can cause spurious wakeups. However there is no core code
      that cannot handle them afaict, and furthermore tglx does have the point
      that other events can already trigger them anyway.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Mason <clm@fb.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: George Spelvin <linux@horizon.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1430494072-30283-3-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1d0dcb3a
  25. 22 Apr, 2015 1 commit
  26. 18 Feb, 2015 1 commit
  27. 13 Feb, 2015 1 commit
    • Andy Lutomirski's avatar
      all arches, signal: move restart_block to struct task_struct · f56141e3
      Andy Lutomirski authored
      If an attacker can cause a controlled kernel stack overflow, overwriting
      the restart block is a very juicy exploit target.  This is because the
      restart_block is held in the same memory allocation as the kernel stack.
      
      Moving the restart block to struct task_struct prevents this exploit by
      making the restart_block harder to locate.
      
      Note that there are other fields in thread_info that are also easy
      targets, at least on some architectures.
      
      It's also a decent simplification, since the restart code is more or less
      identical on all architectures.
      
      [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: default avatarRichard Weinberger <richard@nod.at>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f56141e3
  28. 19 Jan, 2015 1 commit
    • Michael Kerrisk's avatar
      futex: Fix argument handling in futex_lock_pi() calls · 996636dd
      Michael Kerrisk authored
      This patch fixes two separate buglets in calls to futex_lock_pi():
      
        * Eliminate unused 'detect' argument
        * Change unused 'timeout' argument of FUTEX_TRYLOCK_PI to NULL
      
      The 'detect' argument of futex_lock_pi() seems never to have been
      used (when it was included with the initial PI mutex implementation
      in Linux 2.6.18, all checks against its value were disabled by
      ANDing against 0 (i.e., if (detect... && 0)), and with
      commit 778e9a9c, any mention of
      this argument in futex_lock_pi() went way altogether. Its presence
      now serves only to confuse readers of the code, by giving the
      impression that the futex() FUTEX_LOCK_PI operation actually does
      use the 'val' argument. This patch removes the argument.
      
      The futex_lock_pi() call that corresponds to FUTEX_TRYLOCK_PI includes
      'timeout' as one of its arguments. This misleads the reader into thinking
      that the FUTEX_TRYLOCK_PI operation does employ timeouts for some sensible
      purpose; but it does not.  Indeed, it cannot, because the checks at the
      start of sys_futex() exclude FUTEX_TRYLOCK_PI from the set of operations
      that do copy_from_user() on the timeout argument. So, in the
      FUTEX_TRYLOCK_PI futex_lock_pi() call it would be simplest to change
      'timeout' to 'NULL'. This patch does that.
      Signed-off-by: default avatarMichael Kerrisk <mtk.manpages@gmail.com>
      Reviewed-by: default avatarDarren Hart <darren@dvhart.com>
      Link: http://lkml.kernel.org/r/54B96646.8010200@gmail.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      996636dd
  29. 26 Oct, 2014 1 commit
    • Brian Silverman's avatar
      futex: Fix a race condition between REQUEUE_PI and task death · 30a6b803
      Brian Silverman authored
      free_pi_state and exit_pi_state_list both clean up futex_pi_state's.
      exit_pi_state_list takes the hb lock first, and most callers of
      free_pi_state do too. requeue_pi doesn't, which means free_pi_state
      can free the pi_state out from under exit_pi_state_list. For example:
      
      task A                            |  task B
      exit_pi_state_list                |
        pi_state =                      |
            curr->pi_state_list->next   |
                                        |  futex_requeue(requeue_pi=1)
                                        |    // pi_state is the same as
                                        |    // the one in task A
                                        |    free_pi_state(pi_state)
                                        |      list_del_init(&pi_state->list)
                                        |      kfree(pi_state)
        list_del_init(&pi_state->list)  |
      
      Move the free_pi_state calls in requeue_pi to before it drops the hb
      locks which it's already holding.
      
      [ tglx: Removed a pointless free_pi_state() call and the hb->lock held
        	debugging. The latter comes via a seperate patch ]
      Signed-off-by: default avatarBrian Silverman <bsilver16384@gmail.com>
      Cc: austin.linux@gmail.com
      Cc: darren@dvhart.com
      Cc: peterz@infradead.org
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1414282837-23092-1-git-send-email-bsilver16384@gmail.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      30a6b803