  1. Jul 01, 2021
  2. May 23, 2021
    • ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry · a11ddb37
      Varad Gautam authored
      do_mq_timedreceive calls wq_sleep with a stack local address.  The
      sender (do_mq_timedsend) uses this address to later call pipelined_send.
      
      This leads to a very hard to trigger race where a do_mq_timedreceive
      call might return and leave do_mq_timedsend to rely on an invalid
      address, causing the following crash:
      
        RIP: 0010:wake_q_add_safe+0x13/0x60
        Call Trace:
         __x64_sys_mq_timedsend+0x2a9/0x490
         do_syscall_64+0x80/0x680
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f5928e40343
      
      The race occurs as:
      
      1. do_mq_timedreceive calls wq_sleep with the address of `struct
         ext_wait_queue` on function stack (aliased as `ewq_addr` here) - it
         holds a valid `struct ext_wait_queue *` as long as the stack has not
         been overwritten.
      
      2. `ewq_addr` gets added to info->e_wait_q[RECV].list in wq_add, and
         do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call
         __pipelined_op.
      
      3. Sender calls __pipelined_op::smp_store_release(&this->state,
         STATE_READY).  Here is where the race window begins.  (`this` is
         `ewq_addr`.)
      
      4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it
         will see `state == STATE_READY` and break.
      
      5. do_mq_timedreceive returns, and `ewq_addr` is no longer guaranteed
         to be a `struct ext_wait_queue *` since it was on do_mq_timedreceive's
         stack.  (Although the address may not get overwritten until another
         function happens to touch it, which means it can persist around for an
         indefinite time.)
      
      6. do_mq_timedsend::__pipelined_op() still believes `ewq_addr` is a
         `struct ext_wait_queue *`, and uses it to find a task_struct to pass to
         the wake_q_add_safe call.  In the lucky case where nothing has
         overwritten `ewq_addr` yet, `ewq_addr->task` is the right task_struct.
         In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a
         bogus address as the receiver's task_struct causing the crash.
      
      do_mq_timedsend::__pipelined_op() should not dereference `this` after
      setting STATE_READY, as the receiver counterpart is now free to return.
      Change __pipelined_op to call wake_q_add_safe on the receiver's
      task_struct returned by get_task_struct, instead of dereferencing `this`
      which sits on the receiver's stack.
      
      As Manfred pointed out, the race potentially also exists in
      ipc/msg.c::expunge_all and ipc/sem.c::wake_up_sem_queue_prepare.  Fix
      those in the same way.
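
The ordering rule described above can be sketched in userspace with C11 atomics. This is only an illustration of the pattern, not the kernel patch: `struct task`, `struct waiter`, `publish_ready()` and `waiter_is_ready()` are hypothetical stand-ins, and the real code uses `smp_store_release()`, `get_task_struct()` and `wake_q_add_safe()` instead.

```c
#include <stdatomic.h>
#include <stddef.h>

enum { STATE_NONE = 0, STATE_READY = 1 };

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct task { int id; };

struct waiter {
    struct task *task;   /* owner of the wait entry (lives on its stack) */
    atomic_int state;    /* STATE_NONE until the waker publishes STATE_READY */
};

/*
 * Waker side: copy everything needed out of *w BEFORE the release store.
 * Once STATE_READY is visible, the waiter may return and its stack frame
 * may be reused, so *w must not be dereferenced again.
 */
static struct task *publish_ready(struct waiter *w)
{
    struct task *t = w->task;   /* read while *w is still guaranteed valid */

    atomic_store_explicit(&w->state, STATE_READY, memory_order_release);
    return t;                   /* the caller wakes t, never w->task */
}

/* Waiter side: the acquire load pairs with the release store above. */
static int waiter_is_ready(struct waiter *w)
{
    return atomic_load_explicit(&w->state, memory_order_acquire) == STATE_READY;
}
```

The point of the sketch is only the order of operations in `publish_ready()`: the task pointer is captured first, and nothing reads through `w` after the store.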
      
      Link: https://lkml.kernel.org/r/20210510102950.12551-1-varad.gautam@suse.com
      
      
      Fixes: c5b2cbdb ("ipc/mqueue.c: update/document memory barriers")
      Fixes: 8116b54e ("ipc/sem.c: document and update memory barriers")
      Fixes: 0d97a82b ("ipc/msg.c: update and document memory barriers")
      Signed-off-by: Varad Gautam <varad.gautam@suse.com>
      Reported-by: Matthias von Faber <matthias.vonfaber@aox-tech.de>
      Acked-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. May 07, 2021
  4. Apr 30, 2021
  5. Jan 24, 2021
  6. Dec 15, 2020
    • vm_ops: rename .split() callback to .may_split() · dd3b614f
      Dmitry Safonov authored
      Rename the callback to reflect that it's not called *on* or *after* split,
      but rather some time before the splitting to check if it's possible.
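
A minimal sketch of what a "check before splitting" callback looks like, under stated assumptions: `struct region`, `struct region_ops`, `aligned_may_split()` and `split_region()` are hypothetical names for illustration and do not mirror the kernel's `vm_operations_struct` or its split path.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; the kernel's structs differ. */
struct region { unsigned long start, end, align; };

struct region_ops {
    /* Called before a split is attempted: 0 allows it, negative errno vetoes it. */
    int (*may_split)(struct region *r, unsigned long addr);
};

/* Example policy: only allow splits at multiples of the region's alignment. */
static int aligned_may_split(struct region *r, unsigned long addr)
{
    return (addr % r->align) ? -EINVAL : 0;
}

/* Split r at addr into [start, addr) and [addr, end); tail receives the second part. */
static int split_region(struct region *r, const struct region_ops *ops,
                        unsigned long addr, struct region *tail)
{
    if (addr <= r->start || addr >= r->end)
        return -EINVAL;

    if (ops && ops->may_split) {
        int err = ops->may_split(r, addr);

        if (err)
            return err;   /* vetoed: nothing has been modified yet */
    }

    *tail = *r;
    tail->start = addr;
    r->end = addr;
    return 0;
}
```

Because the callback runs before any state changes, a veto leaves the region untouched, which is the behavior the new name is meant to convey.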
      
      Link: https://lkml.kernel.org/r/20201013013416.390574-5-dima@arista.com
      
      
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. Sep 05, 2020
  8. Aug 23, 2020
  9. Aug 19, 2020
    • ipc: Use generic ns_common::count · 137ec390
      Kirill Tkhai authored
      
      Switch over ipc namespaces to use the newly introduced common lifetime
      counter.
      
      Currently every namespace type has its own lifetime counter which is stored
      in the specific namespace struct. The lifetime counters are used
      identically for all namespace types. Namespaces may of course have
      additional unrelated counters and these are not altered.
      
      This introduces a common lifetime counter into struct ns_common. The
      ns_common struct encompasses information that all namespaces share. That
      should include the lifetime counter since it's common for all of them.
      
      It also allows us to unify the type of the counters across all namespaces.
      Most of them use refcount_t but one uses atomic_t and at least one uses
      kref. Especially the last one doesn't make much sense since it's just a
      wrapper around refcount_t since 2016 and actually complicates cleanup
      operations by having to use container_of() to cast the correct namespace
      struct out of struct ns_common.
      
      Having the lifetime counter for the namespaces in one place reduces
      maintenance cost. Not just because after switching all namespaces over we
      will have removed more code than we added but also because the logic is
      more easily understandable and we indicate to the user that the basic
      lifetime requirements for all namespaces are currently identical.
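
The idea of a lifetime counter living in the shared struct can be sketched as follows. This is a userspace approximation under assumptions: the `*_sketch` names are hypothetical, and C11 atomics stand in for the kernel's `refcount_t`, which additionally saturates on overflow.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ns_common: state shared by all namespace types. */
struct ns_common_sketch {
    atomic_uint count;   /* common lifetime counter */
};

/* Hypothetical namespace type embedding the common part. */
struct ipc_ns_sketch {
    int shm_total;                /* type-specific state, unrelated to lifetime */
    struct ns_common_sketch ns;
};

static void ns_init(struct ns_common_sketch *ns)
{
    atomic_init(&ns->count, 1);   /* the creator holds the first reference */
}

static void ns_get(struct ns_common_sketch *ns)
{
    atomic_fetch_add(&ns->count, 1);
}

/* Returns 1 when the last reference was dropped and the owner may be freed. */
static int ns_put(struct ns_common_sketch *ns)
{
    return atomic_fetch_sub(&ns->count, 1) == 1;
}

/* Recover the enclosing namespace from a pointer to its embedded common part. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
```

With the counter in the common struct, get and put helpers no longer need to know which namespace type they are operating on; only the final cleanup needs the `container_of`-style cast.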
      
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Link: https://lore.kernel.org/r/159644978697.604812.16592754423881032385.stgit@localhost.localdomain
      
      
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
  10. Aug 12, 2020
  11. Aug 07, 2020
  12. Jun 09, 2020
  13. Jun 08, 2020
  14. May 14, 2020
    • ipc/util.c: sysvipc_find_ipc() incorrectly updates position index · 5e698222
      Vasily Averin authored
      
      Commit 89163f93 ("ipc/util.c: sysvipc_find_ipc() should increase
      position index") is causing this bug (seen on 5.6.8):
      
         # ipcs -q
      
         ------ Message Queues --------
         key        msqid      owner      perms      used-bytes   messages
      
         # ipcmk -Q
         Message queue id: 0
         # ipcs -q
      
         ------ Message Queues --------
         key        msqid      owner      perms      used-bytes   messages
         0x82db8127 0          root       644        0            0
      
         # ipcmk -Q
         Message queue id: 1
         # ipcs -q
      
         ------ Message Queues --------
         key        msqid      owner      perms      used-bytes   messages
         0x82db8127 0          root       644        0            0
         0x76d1fb2a 1          root       644        0            0
      
         # ipcrm -q 0
         # ipcs -q
      
         ------ Message Queues --------
         key        msqid      owner      perms      used-bytes   messages
         0x76d1fb2a 1          root       644        0            0
         0x76d1fb2a 1          root       644        0            0
      
         # ipcmk -Q
         Message queue id: 2
         # ipcrm -q 2
         # ipcs -q
      
         ------ Message Queues --------
         key        msqid      owner      perms      used-bytes   messages
         0x76d1fb2a 1          root       644        0            0
         0x76d1fb2a 1          root       644        0            0
      
         # ipcmk -Q
         Message queue id: 3
         # ipcrm -q 1
         # ipcs -q
      
         ------ Message Queues --------
         key        msqid      owner      perms      used-bytes   messages
         0x7c982867 3          root       644        0            0
         0x7c982867 3          root       644        0            0
         0x7c982867 3          root       644        0            0
         0x7c982867 3          root       644        0            0
      
      Whenever an IPC item with a low id is deleted, the items with higher ids
      are duplicated, as if filling a hole.
      
      new_pos should jump over the hole of unused ids; pos can be updated
      inside the "for" loop.
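
A minimal sketch of an iterator that skips holes correctly, assuming a plain array in place of the kernel's IDR: `find_from()` and `count_entries()` are hypothetical helpers, not the body of `sysvipc_find_ipc()`.

```c
#include <stddef.h>

/*
 * Hypothetical sparse table: a NULL slot marks an unused id.
 * Returns the first used entry at index >= pos and stores in *new_pos the
 * index to resume from, i.e. one past the entry that was actually found.
 */
static const char *find_from(const char *const *slots, size_t nslots,
                             size_t pos, size_t *new_pos)
{
    for (; pos < nslots; pos++) {
        if (slots[pos]) {
            *new_pos = pos + 1;   /* jump over any hole that was skipped */
            return slots[pos];
        }
    }
    *new_pos = pos;               /* past the end: iteration is finished */
    return NULL;
}

/* Walk the whole table the way a seq_file-style reader would. */
static size_t count_entries(const char *const *slots, size_t nslots)
{
    size_t pos = 0, n = 0;

    while (find_from(slots, nslots, pos, &pos))
        n++;
    return n;
}
```

If the resume position were advanced by only one from the requested position rather than from the found position, an entry located after a hole would be returned again on the next call, which matches the duplicated rows in the `ipcs -q` output above.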
      
      Fixes: 89163f93 ("ipc/util.c: sysvipc_find_ipc() should increase position index")
      Reported-by: Andreas Schwab <schwab@suse.de>
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Waiman Long <longman@redhat.com>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/4921fe9b-9385-a2b4-1dc4-1099be6d2e39@virtuozzo.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. May 09, 2020
    • nsproxy: add struct nsset · f2a8d52e
      Christian Brauner authored
      
      Add a simple struct nsset. It holds all necessary pieces to switch to a new
      set of namespaces without leaving a task in a half-switched state which we
      will make use of in the next patch. This patch switches the existing setns
      logic over without causing a change in setns() behavior. This brings
      setns() closer to how unshare() works. The prepare_ns() function is
      responsible for preparing all necessary information. There are two reasons
      for this. First, it minimizes dependencies between individual namespaces,
      i.e. every install handler can expect that all fields are properly
      initialized, independent of the order in which the handlers are called.
      Second, it makes the code easier to maintain and easier to follow if it
      needs to be changed.
      
      The prepare_ns() helper will only be switched over to use a flags argument
      in the next patch. Here it still uses nstype as a simple integer
      argument, which was argued to be clearer. I'm not particularly
      opinionated about whether this really helps. The struct nsset itself
      already contains the flags field since its name already indicates that it
      can contain information required by different namespaces. None of this
      should have functional consequences.
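
The "no half-switched state" property can be illustrated with a stage-then-commit sketch. Everything here is hypothetical (`struct proxy`, `struct nsset_sketch`, `install_a()`, `install_b()`, `switch_ns()`); it shows the general pattern only and is not the kernel's `struct nsset` or its install handlers.

```c
#include <errno.h>

#define NS_A 0x1u
#define NS_B 0x2u

/* Hypothetical "set of namespaces" a task currently uses. */
struct proxy { int a, b; };

/* Hypothetical staging area: flags plus a private copy to modify. */
struct nsset_sketch {
    unsigned int flags;
    struct proxy staged;
};

/* Fully initialize the set up front so handlers can run in any order. */
static void prepare(struct nsset_sketch *set, const struct proxy *cur,
                    unsigned int flags)
{
    set->flags = flags;
    set->staged = *cur;
}

/* Install handlers only ever touch the staged copy. */
static int install_a(struct nsset_sketch *set, int target)
{
    if (target < 0)
        return -EINVAL;
    set->staged.a = target;
    return 0;
}

static int install_b(struct nsset_sketch *set, int target)
{
    if (target < 0)
        return -EINVAL;
    set->staged.b = target;
    return 0;
}

/* Either every requested switch succeeds and is committed, or *cur is unchanged. */
static int switch_ns(struct proxy *cur, unsigned int flags, int a, int b)
{
    struct nsset_sketch set;
    int err;

    prepare(&set, cur, flags);
    if ((flags & NS_A) && (err = install_a(&set, a)) != 0)
        return err;
    if ((flags & NS_B) && (err = install_b(&set, b)) != 0)
        return err;

    *cur = set.staged;   /* single commit point */
    return 0;
}
```

A failure in any handler returns before the commit, so the caller never observes one namespace switched and another not.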
      
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Reviewed-by: Serge Hallyn <serge@hallyn.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Link: https://lore.kernel.org/r/20200505140432.181565-2-christian.brauner@ubuntu.com
  16. May 08, 2020
  17. Apr 27, 2020
  18. Apr 10, 2020
  19. Apr 07, 2020
  20. Feb 21, 2020
    • Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()" · edf28f40
      Ioanna Alifieraki authored
      This reverts commit a9795584.
      
      Commit a9795584 ("ipc,sem: remove uneeded sem_undo_list lock usage
      in exit_sem()") removes a lock that is needed.  This leads to a process
      looping infinitely in exit_sem() and can also lead to a crash.  There is
      a reproducer available in [1] and with the commit reverted the issue
      does not reproduce anymore.
      
      With the reproducer found in [1], it is fairly easy to reach a point where
      one of the child processes is looping infinitely in exit_sem() between
      the for(;;) and the if (semid == -1) block, while it's trying to free its
      last sem_undo structure, which has already been freed by freeary().
      
      Each sem_undo struct is on two lists: one per semaphore set (list_id)
      and one per process (list_proc).  The list_id list tracks undos by
      semaphore set, and the list_proc by process.
      
      Undo structures are removed either by freeary() or by exit_sem().  The
      freeary function is invoked when the user invokes a syscall to remove a
      semaphore set.  During this operation freeary() traverses the list_id
      associated with the semaphore set and removes the undo structures from
      both the list_id and list_proc lists.
      
      For this case, exit_sem() is called at process exit.  Each process
      contains a struct sem_undo_list (referred to as "ulp") which contains
      the head for the list_proc list.  When the process exits, exit_sem()
      traverses this list to remove each sem_undo struct.  As in freeary(),
      whenever a sem_undo struct is removed from list_proc, it is also removed
      from the list_id list.
      
      Removing elements from list_id is safe for both exit_sem() and freeary()
      due to sem_lock().  Removing elements from list_proc is not safe;
      freeary() locks &un->ulp->lock when it performs
      list_del_rcu(&un->list_proc) but exit_sem() does not (locking was
      removed by commit a9795584 ("ipc,sem: remove uneeded sem_undo_list
      lock usage in exit_sem()")).
      
      This can result in the following situation while executing the
      reproducer [1] : Consider a child process in exit_sem() and the parent
      in freeary() (because of semctl(sid[i], NSEM, IPC_RMID)).
      
       - The list_proc for the child contains the last two undo structs A and
         B (the rest have been removed either by exit_sem() or freeary()).
      
       - The semid for A is 1 and semid for B is 2.
      
       - exit_sem() removes A and at the same time freeary() removes B.
      
       - Since A and B have different semids, sem_lock() will acquire different
         locks for each process and both can proceed.
      
      The bug is that they remove A and B from the same list_proc at the same
      time because only freeary() acquires the ulp lock. When exit_sem()
      removes A, it makes ulp->list_proc.next point at B, and at the same
      time freeary() removes B, setting B->semid = -1.
      
      At the next iteration of the for(;;) loop, exit_sem() will try to remove B.
      
      The only way to break out of for(;;) is for (&un->list_proc ==
      &ulp->list_proc) to be true, which it is not. exit_sem() will then check
      whether B->semid == -1, which it is, and will keep looping in for(;;) until
      the memory for B is reallocated and the value at B->semid changes.
      
      At that point, exit_sem() will crash attempting to unlink B from the
      lists (this can be easily triggered by running the reproducer [1] a
      second time).
      
      To prove this scenario, instrumentation was added to keep information
      about each sem_undo (un) struct that is removed per process and per
      semaphore set (sma).
      
                CPU0                                CPU1
        [caller holds sem_lock(sma for A)]      ...
        freeary()                               exit_sem()
        ...                                     ...
        ...                                     sem_lock(sma for B)
        spin_lock(A->ulp->lock)                 ...
        list_del_rcu(un_A->list_proc)           list_del_rcu(un_B->list_proc)
      
      Undo structures A and B have different semids, so both sem_lock()
      operations proceed.  However, they belong to the same list_proc list and
      are removed at the same time.  This results in ulp->list_proc.next
      pointing to the address of B, which has already been removed.
      
      After reverting commit a9795584 ("ipc,sem: remove uneeded
      sem_undo_list lock usage in exit_sem()") the issue was no longer
      reproducible.
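
The invariant the revert restores is that every path removing an entry from a process's list takes the same per-list lock. The following userspace sketch illustrates that invariant only: `struct undo_list`, `ul_add()`, `ul_del()` and friends are hypothetical, a C11 `atomic_flag` spinlock stands in for `spin_lock(&ulp->lock)`, and RCU is not modeled.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical doubly linked node standing in for sem_undo's list_proc linkage. */
struct node {
    struct node *prev, *next;
    int semid;
};

/* Hypothetical per-process list with its own lock (the "ulp" lock in the text). */
struct undo_list {
    atomic_flag lock;
    struct node head;
};

static void ul_init(struct undo_list *ul)
{
    atomic_flag_clear(&ul->lock);
    ul->head.prev = ul->head.next = &ul->head;
    ul->head.semid = 0;
}

static void ul_lock(struct undo_list *ul)
{
    while (atomic_flag_test_and_set_explicit(&ul->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void ul_unlock(struct undo_list *ul)
{
    atomic_flag_clear_explicit(&ul->lock, memory_order_release);
}

static void ul_add(struct undo_list *ul, struct node *n)
{
    ul_lock(ul);
    n->next = &ul->head;
    n->prev = ul->head.prev;
    ul->head.prev->next = n;
    ul->head.prev = n;
    ul_unlock(ul);
}

/* Used by BOTH removal paths, so two unlinks can never interleave. */
static void ul_del(struct undo_list *ul, struct node *n)
{
    ul_lock(ul);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->semid = -1;   /* mark as removed, as the text describes for B */
    ul_unlock(ul);
}

static int ul_empty(struct undo_list *ul)
{
    int empty;

    ul_lock(ul);
    empty = (ul->head.next == &ul->head);
    ul_unlock(ul);
    return empty;
}
```

If one of the two callers skipped `ul_lock()`, its unlink could overlap with the other's and leave `head.next` pointing at an already removed node, which is the stale-pointer state described above.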
      
      [1] https://bugzilla.redhat.com/show_bug.cgi?id=1694779
      
      Link: http://lkml.kernel.org/r/20191211191318.11860-1-ioanna-maria.alifieraki@canonical.com
      
      
      Fixes: a9795584 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()")
      Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Acked-by: Herton R. Krzesinski <herton@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: <malat@debian.org>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. Feb 04, 2020
  22. Dec 09, 2019
  23. Nov 15, 2019
  24. Sep 26, 2019