1. 26 Dec, 2018 1 commit
    • ipc: Convert mqueue fs to fs_context · e1b836f6
      David Howells authored
      Convert the mqueue filesystem to use the filesystem context stuff.
      
      Notes:
      
       (1) The relevant ipc namespace is selected when the context is
           initialised (and it defaults to the current task's ipc namespace).
           The caller can override this before calling vfs_get_tree().
      
       (2) Rather than simply calling kern_mount_data(), mq_init_ns() and
           mq_internal_mount() create a context, adjust it and then do the rest
           of the mount procedure.
      
       (3) The lazy mqueue mounting on creation of a new namespace is retained
           from a previous patch, but the avoidance of sget() if no superblock
           yet exists is reverted and the superblock is again keyed on the
           namespace pointer.
      
           Yes, there was a performance gain in not searching the superblock
           hash, but it's only paid once per ipc namespace - and only if someone
           uses mqueue within that namespace, so I'm not sure it's worth it,
           especially as calling sget() allows avoidance of recursion.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 31 Oct, 2018 2 commits
  3. 05 Oct, 2018 1 commit
  4. 03 Oct, 2018 1 commit
    • signal: Distinguish between kernel_siginfo and siginfo · ae7795bc
      Eric W. Biederman authored
      Linus recently observed that if we did not worry about the padding
      member in struct siginfo it is only about 48 bytes, and 48 bytes is
      much nicer than 128 bytes for allocating on the stack and copying
      around in the kernel.
      
      The obvious thing of only adding the padding when userspace is
      including siginfo.h won't work as there are sigframe definitions in
      the kernel that embed struct siginfo.
      
      So split siginfo in two: kernel_siginfo and siginfo.  The userspace
      definition keeps the traditional name, while the version that is used
      internally to the kernel, and ultimately will not be padded to
      128 bytes, is called kernel_siginfo.
      
      The definition of struct kernel_siginfo I have put in include/linux/signal_types.h.
      
      A set of buildtime checks has been added to verify the two structures have
      the same field offsets.
      
      To make it easy to verify the change, kernel_siginfo retains the same
      size as siginfo.  The reduction in size comes in a following change.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
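The build-time offset checks described in the message above can be sketched in plain C11. This is a minimal userspace illustration, not the kernel's code: the `demo_` structures, their fields and the `DEMO_CHECK_OFFSET` macro are all hypothetical. It also shows the kernel-side structure already being smaller, which per the message only arrives in a following change.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_USER_INFO_SIZE 128

/* Hypothetical compact layout for internal use: no padding member. */
struct demo_kernel_info {
	int signo;
	int err;
	int code;
	long value;
};

/* Hypothetical userspace layout: same fields, padded to a fixed ABI size. */
struct demo_user_info {
	union {
		struct {
			int signo;
			int err;
			int code;
			long value;
		};
		char pad[DEMO_USER_INFO_SIZE];
	};
};

/* Fail the build if a field sits at different offsets in the two layouts. */
#define DEMO_CHECK_OFFSET(field)                                  \
	static_assert(offsetof(struct demo_user_info, field) ==   \
		      offsetof(struct demo_kernel_info, field),   \
		      "offset mismatch: " #field)

DEMO_CHECK_OFFSET(signo);
DEMO_CHECK_OFFSET(err);
DEMO_CHECK_OFFSET(code);
DEMO_CHECK_OFFSET(value);

/* Bytes saved per object by using the compact layout. */
size_t demo_bytes_saved(void)
{
	return sizeof(struct demo_user_info) - sizeof(struct demo_kernel_info);
}
```

Because the checks are static assertions, a layout mismatch is caught at compile time rather than at run time, which is the property the message relies on.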
  5. 04 Sep, 2018 1 commit
  6. 27 Aug, 2018 1 commit
    • y2038: globally rename compat_time to old_time32 · 9afc5eee
      Arnd Bergmann authored
      Christoph Hellwig suggested a slightly different path for handling
      backwards compatibility with the 32-bit time_t based system calls:
      
      Rather than simply reusing the compat_sys_* entry points on 32-bit
      architectures unchanged, we get rid of those entry points and the
      compat_time types by renaming them to something that makes more sense
      on 32-bit architectures (which don't have a compat mode otherwise),
      and then share the entry points under the new name with the 64-bit
      architectures that use them for implementing the compatibility.
      
      The following types and interfaces are renamed here, and moved
      from linux/compat_time.h to linux/time32.h:
      
      old				new
      ---				---
      compat_time_t			old_time32_t
      struct compat_timeval		struct old_timeval32
      struct compat_timespec		struct old_timespec32
      struct compat_itimerspec	struct old_itimerspec32
      ns_to_compat_timeval()		ns_to_old_timeval32()
      get_compat_itimerspec64()	get_old_itimerspec32()
      put_compat_itimerspec64()	put_old_itimerspec32()
      compat_get_timespec64()		get_old_timespec32()
      compat_put_timespec64()		put_old_timespec32()
      
      As we already have aliases in place, this patch addresses only the
      instances that are relevant to the system call interface in particular,
      not those that occur in device drivers and other modules. Those
      will get handled separately, while providing the 64-bit version
      of the respective interfaces.
      
      I'm not renaming the timex, rusage and itimerval structures, as we are
      still debating what the new interface will look like, and whether we
      will need a replacement at all.
      
      This also doesn't change the names of the syscall entry points, which can
      be done more easily when we actually switch over the 32-bit architectures
      to use them.  At that point we need to change COMPAT_SYSCALL_DEFINEx to
      SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix.
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
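To make the renamed types more concrete, here is a small userspace sketch of what a 32-bit time representation and its widening to 64 bits might look like. The `demo_` names are hypothetical stand-ins, not the kernel definitions, and the range check is an illustration rather than something the kernel helpers are claimed to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins; the demo_ prefix marks them as hypothetical. */
typedef int32_t demo_old_time32_t;

struct demo_old_timespec32 {
	demo_old_time32_t tv_sec;
	int32_t tv_nsec;
};

struct demo_timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Widen a 32-bit timespec; the sign of tv_sec is preserved. */
struct demo_timespec64 demo_widen(struct demo_old_timespec32 in)
{
	struct demo_timespec64 out = {
		.tv_sec = in.tv_sec,
		.tv_nsec = in.tv_nsec,
	};
	return out;
}

/* Report whether a 64-bit seconds value still fits the 32-bit type. */
bool demo_fits_time32(int64_t sec)
{
	return sec >= INT32_MIN && sec <= INT32_MAX;
}
```

The second helper shows why the old types cannot simply be kept: any time after the signed 32-bit limit has no representation in them.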
  7. 22 Aug, 2018 10 commits
  8. 02 Aug, 2018 1 commit
  9. 27 Jul, 2018 1 commit
  10. 12 Jul, 2018 2 commits
  11. 22 Jun, 2018 1 commit
    • rhashtable: split rhashtable.h · 0eb71a9d
      NeilBrown authored
      Due to the use of rhashtables in net namespaces,
      rhashtable.h is included in lots of the kernel,
      so a small change can require a large recompilation.
      This makes development painful.
      
      This patch splits out rhashtable-types.h which just includes
      the major type declarations, and does not include (non-trivial)
      inline code.  rhashtable.h is no longer included by anything
      in the include/ directory.
      Common include files only include rhashtable-types.h so a large
      recompilation is only triggered when that changes.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
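The types-only header split can be illustrated in a single file, with comments marking where each part would live. Everything here is hypothetical (the `demo_` names, the two header names in the comments, and the arbitrary 75% threshold); it only sketches the general pattern, not the contents of the rhashtable headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ---- would live in a hypothetical "demo_table-types.h": types only ---- */
struct demo_table {
	size_t nelems;
	size_t nbuckets;
};

/* ---- a widely included header needs only the type, to embed it ---- */
struct demo_namespace {
	struct demo_table sockets;
};

/* ---- would live in a hypothetical "demo_table.h": the inline code ---- */
static inline bool demo_table_should_grow(const struct demo_table *t)
{
	/* Arbitrary threshold for this sketch: more than 75% full. */
	return t->nelems > t->nbuckets / 4 * 3;
}
```

Editing the inline function then only forces a rebuild of files that include the second header, while files that merely embed the type are unaffected.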
  12. 14 Jun, 2018 2 commits
  13. 12 Jun, 2018 1 commit
    • treewide: kvmalloc() -> kvmalloc_array() · 344476e1
      Kees Cook authored
      The kvmalloc() function has a 2-factor argument form, kvmalloc_array(). This
      patch replaces cases of:
      
              kvmalloc(a * b, gfp)
      
      with:
      
              kvmalloc_array(a, b, gfp)
      
      as well as handling cases of:
      
              kvmalloc(a * b * c, gfp)
      
      with:
      
              kvmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kvmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kvmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kvmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kvmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kvmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kvmalloc
      + kvmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kvmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kvmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kvmalloc(C1 * C2 * C3, ...)
      |
        kvmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kvmalloc(sizeof(THING) * C2, ...)
      |
        kvmalloc(sizeof(TYPE) * C2, ...)
      |
        kvmalloc(C1 * C2 * C3, ...)
      |
        kvmalloc(C1 * C2, ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: Kees Cook <keescook@chromium.org>
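The reason for preferring the two-factor and `array3_size()` forms is that an open-coded `a * b` can silently wrap around and yield a too-small allocation. The following userspace sketch shows the idea with saturating helpers; the `demo_` functions are hypothetical and use `malloc()` in place of the kernel allocator, so they illustrate the technique rather than reproduce the kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Multiply a by b; return true (leaving *out untouched) if it would wrap. */
bool demo_mul_overflows(size_t a, size_t b, size_t *out)
{
	if (a != 0 && b > SIZE_MAX / a)
		return true;
	*out = a * b;
	return false;
}

/* Three-factor size that saturates to SIZE_MAX instead of wrapping. */
size_t demo_array3_size(size_t a, size_t b, size_t c)
{
	size_t n;

	if (demo_mul_overflows(a, b, &n) || demo_mul_overflows(n, c, &n))
		return SIZE_MAX;
	return n;
}

/* Two-factor allocation that fails cleanly on overflow. */
void *demo_alloc_array(size_t n, size_t size)
{
	size_t bytes;

	if (demo_mul_overflows(n, size, &bytes))
		return NULL;
	return malloc(bytes);
}
```

A saturated size of `SIZE_MAX` can never be satisfied by an allocator, so the overflow turns into an ordinary allocation failure that callers already have to handle.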
  14. 26 May, 2018 2 commits
  15. 20 Apr, 2018 5 commits
    • y2038: ipc: Redirect ipc(SEMTIMEDOP, ...) to compat_ksys_semtimedop · 5dc0b152
      Arnd Bergmann authored
      32-bit architectures implementing 64BIT_TIME and COMPAT_32BIT_TIME
      need to have the traditional semtimedop() behavior with 32-bit timestamps
      for sys_ipc() by calling compat_ksys_semtimedop(), while those that
      are not yet converted need to keep using ksys_semtimedop() like
      64-bit architectures do.
      
      Note that I chose to not implement a new SEMTIMEDOP64 function that
      corresponds to the new sys_semtimedop() with 64-bit timeouts. The reason
      here is that sys_ipc() should no longer be used for new system calls,
      and libc should just call the semtimedop syscall directly.
      
      One open question remains as to whether we want to completely avoid the
      sys_ipc() system call for architectures that do not yet have all the
      individual calls as they get converted to 64-bit time_t. Doing that
      would require adding several extra system calls on m68k, mips, powerpc,
      s390, sh, sparc, and x86-32.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • y2038: ipc: Enable COMPAT_32BIT_TIME · b0d17578
      Arnd Bergmann authored
      Three ipc syscalls (mq_timedsend, mq_timedreceive and semtimedop)
      take a timespec argument. After we move 32-bit architectures over to
      using 64-bit time_t based syscalls, we need separate entry points for
      the old 32-bit based interfaces.
      
      This changes the #ifdef guards for the existing 32-bit compat syscalls
      to check for CONFIG_COMPAT_32BIT_TIME instead, which will then be
      enabled on all existing 32-bit architectures.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • y2038: ipc: Use __kernel_timespec · 21fc538d
      Arnd Bergmann authored
      This is a preparation for changing over __kernel_timespec to 64-bit
      times, which involves assigning new system call numbers for mq_timedsend(),
      mq_timedreceive() and semtimedop() for compatibility with future
      y2038-proof user space.
      
      The existing ABIs will remain available through compat code.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • y2038: ipc: Report long times to user space · c2ab975c
      Arnd Bergmann authored
      The shmid64_ds/semid64_ds/msqid64_ds data structures have been extended
      to contain extra fields for storing the upper bits of the time stamps.
      This patch does the other half of the job and fills the new fields on
      32-bit architectures as well as 32-bit tasks running on a 64-bit kernel
      in compat mode.
      
      There should be no change for native 64-bit tasks.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
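Splitting a 64-bit timestamp across an existing 32-bit field and an added "high" field can be sketched as follows. The structure and function names are hypothetical; only the general low/high split is taken from the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit layout: the original field plus an added high field. */
struct demo_stat32 {
	uint32_t atime;      /* low 32 bits, in the field old programs read */
	uint32_t atime_high; /* upper 32 bits, the added field */
};

/* Store a 64-bit time as two 32-bit halves. */
void demo_store_time(struct demo_stat32 *s, int64_t t)
{
	s->atime = (uint32_t)t;
	s->atime_high = (uint32_t)((uint64_t)t >> 32);
}

/* Reassemble the 64-bit time from both halves. */
int64_t demo_load_time(const struct demo_stat32 *s)
{
	return (int64_t)(((uint64_t)s->atime_high << 32) | s->atime);
}
```

A program that only knows the old field keeps reading the low half unchanged, while updated programs can combine both halves to get the full value.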
    • y2038: ipc: Use ktime_get_real_seconds consistently · 2a70b787
      Arnd Bergmann authored
      In some places, we still used get_seconds() instead of
      ktime_get_real_seconds(), and I'm changing the remaining ones now to
      all use ktime_get_real_seconds() so we use the full available range for
      timestamps instead of overflowing the 'unsigned long' return value in
      year 2106 on 32-bit kernels.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
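The year 2106 figure follows from a 32-bit 'unsigned long' holding at most 2^32 - 1 seconds since 1970. A small self-contained calculation, using hypothetical `demo_` helpers and plain calendar arithmetic in UTC, confirms both that limit and the better-known signed 32-bit limit in 2038.

```c
#include <assert.h>
#include <stdint.h>

/* Gregorian leap-year rule. */
int demo_is_leap(int year)
{
	return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Calendar year (UTC) containing the given number of seconds since 1970. */
int demo_year_of_epoch_seconds(uint64_t secs)
{
	uint64_t days = secs / 86400;
	int year = 1970;

	for (;;) {
		uint64_t days_in_year = demo_is_leap(year) ? 366 : 365;

		if (days < days_in_year)
			return year;
		days -= days_in_year;
		year++;
	}
}
```

Passing `UINT32_MAX` gives the last year an unsigned 32-bit seconds counter can express, and `INT32_MAX` gives the corresponding year for a signed one.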
  16. 14 Apr, 2018 1 commit
    • ipc/shm: fix use-after-free of shm file via remap_file_pages() · 3f05317d
      Eric Biggers authored
      syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
      shm_get_unmapped_area(), called via sys_remap_file_pages().
      
      Unfortunately it couldn't generate a reproducer, but I found a bug which
      I think caused it.  When remap_file_pages() is passed a full System V
      shared memory segment, the memory is first unmapped, then a new map is
      created using the ->vm_file.  Between these steps, the shm ID can be
      removed and reused for a new shm segment.  But, shm_mmap() only checks
      whether the ID is currently valid before calling the underlying file's
      ->mmap(); it doesn't check whether it was reused.  Thus it can use the
      wrong underlying file, one that was already freed.
      
      Fix this by making the "outer" shm file (the one that gets put in
      ->vm_file) hold a reference to the real shm file, and by making
      __shm_open() require that the file associated with the shm ID matches
      the one associated with the "outer" file.
      
      Taking the reference to the real shm file is needed to fully solve the
      problem, since otherwise sfd->file could point to a freed file, which
      then could be reallocated for the reused shm ID, causing the wrong shm
      segment to be mapped (and without the required permission checks).
      
      Commit 1ac0b6de ("ipc/shm: handle removed segments gracefully in
      shm_mmap()") almost fixed this bug, but it didn't go far enough because
      it didn't consider the case where the shm ID is reused.
      
      The following program usually reproduces this bug:
      
      	#include <stdlib.h>
      	#include <sys/shm.h>
      	#include <sys/syscall.h>
      	#include <unistd.h>
      
      	int main()
      	{
      		int is_parent = (fork() != 0);
      		srand(getpid());
      		for (;;) {
      			int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
      			if (is_parent) {
      				void *addr = shmat(id, NULL, 0);
      				usleep(rand() % 50);
      				while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
      			} else {
      				usleep(rand() % 50);
      				shmctl(id, IPC_RMID, NULL);
      			}
      		}
      	}
      
      It causes the following NULL pointer dereference due to a 'struct file'
      being used while it's being freed.  (I couldn't actually get a KASAN
      use-after-free splat like in the syzbot report.  But I think it's
      possible with this bug; it would just take a more extraordinary race...)
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
      	PGD 0 P4D 0
      	Oops: 0000 [#1] SMP NOPTI
      	CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16 #189
      	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
      	RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
      	RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
      	[...]
      	Call Trace:
      	 file_accessed include/linux/fs.h:2063 [inline]
      	 shmem_mmap+0x25/0x40 mm/shmem.c:2149
      	 call_mmap include/linux/fs.h:1789 [inline]
      	 shm_mmap+0x34/0x80 ipc/shm.c:465
      	 call_mmap include/linux/fs.h:1789 [inline]
      	 mmap_region+0x309/0x5b0 mm/mmap.c:1712
      	 do_mmap+0x294/0x4a0 mm/mmap.c:1483
      	 do_mmap_pgoff include/linux/mm.h:2235 [inline]
      	 SYSC_remap_file_pages mm/mmap.c:2853 [inline]
      	 SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
      	 do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
      	 entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      [ebiggers@google.com: add comment]
        Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com
      Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com
      Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439@syzkaller.appspotmail.com
      Fixes: c8d78c18 ("mm: replace remap_file_pages() syscall with emulation")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 11 Apr, 2018 5 commits
    • ipc/shm.c: shm_split(): remove unneeded test for NULL shm_file_data.vm_ops · a61fc2cb
      Andrew Morton authored
      This was added by the recent "ipc/shm.c: add split function to
      shm_vm_ops", but it is not necessary.
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/msg: introduce msgctl(MSG_STAT_ANY) · 23c8cec8
      Davidlohr Bueso authored
      There is a permission discrepancy when consulting msq ipc object
      metadata between /proc/sysvipc/msg (0444) and the MSG_STAT msgctl
      command.  The latter does permission checks for the object vs S_IRUGO.
      As such there can be cases where EACCES is returned via syscall but the
      info is displayed anyway in the procfs files.
      
      While this might have security implications via info leaking (albeit no
      writing to the msq metadata), this behavior goes way back and showing
      all the objects regardless of the permissions was most likely an
      oversight - so we are stuck with it.  Furthermore, modifying either the
      syscall or the procfs file can cause userspace programs to break (e.g.
      ipcs).  Some applications require getting the procfs info (without root
      privileges), which can be rather slow in comparison with a syscall -- up
      to 500x in some reported cases for shm.
      
      This patch introduces a new MSG_STAT_ANY command such that the msq ipc
      object permissions are ignored, and only audited instead.  In addition,
      I've left the lsm security hook checks in place, as if some policy can
      block the call, then the user has no other choice than just parsing the
      procfs file.
      
      Link: http://lkml.kernel.org/r/20180215162458.10059-4-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reported-by: Robert Kettler <robert.kettler@outlook.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem: introduce semctl(SEM_STAT_ANY) · a280d6dc
      Davidlohr Bueso authored
      There is a permission discrepancy when consulting sem ipc object
      metadata between /proc/sysvipc/sem (0444) and the SEM_STAT semctl
      command.  The latter does permission checks for the object vs S_IRUGO.
      As such there can be cases where EACCES is returned via syscall but the
      info is displayed anyway in the procfs files.
      
      While this might have security implications via info leaking (albeit no
      writing to the sma metadata), this behavior goes way back and showing
      all the objects regardless of the permissions was most likely an
      oversight - so we are stuck with it.  Furthermore, modifying either the
      syscall or the procfs file can cause userspace programs to break (e.g.
      ipcs).  Some applications require getting the procfs info (without root
      privileges), which can be rather slow in comparison with a syscall -- up
      to 500x in some reported cases for shm.
      
      This patch introduces a new SEM_STAT_ANY command such that the sem ipc
      object permissions are ignored, and only audited instead.  In addition,
      I've left the lsm security hook checks in place, as if some policy can
      block the call, then the user has no other choice than just parsing the
      procfs file.
      
      Link: http://lkml.kernel.org/r/20180215162458.10059-3-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reported-by: Robert Kettler <robert.kettler@outlook.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/shm: introduce shmctl(SHM_STAT_ANY) · c21a6970
      Davidlohr Bueso authored
      Patch series "sysvipc: introduce STAT_ANY commands", v2.
      
      The following patches add the discussed (see [1]) new command for shm
      as well as for sems and msq as they are subject to the same
      discrepancies for ipc object permission checks between the syscall and
      via procfs.  These new commands are justified in that (1) we are stuck
      with these semantics as changing syscall and procfs can break userland;
      and (2) some users can benefit from performance (for large amounts of
      shm segments, for example) from not having to parse the procfs
      interface.
      
      Once merged, I will submit the necessary manpage updates.  But I'm thinking
      something like:
      
      : diff --git a/man2/shmctl.2 b/man2/shmctl.2
      : index 7bb503999941..bb00bbe21a57 100644
      : --- a/man2/shmctl.2
      : +++ b/man2/shmctl.2
      : @@ -41,6 +41,7 @@
      :  .\" 2005-04-25, mtk -- noted aberrant Linux behavior w.r.t. new
      :  .\"	attaches to a segment that has already been marked for deletion.
      :  .\" 2005-08-02, mtk: Added IPC_INFO, SHM_INFO, SHM_STAT descriptions.
      : +.\" 2018-02-13, dbueso: Added SHM_STAT_ANY description.
      :  .\"
      :  .TH SHMCTL 2 2017-09-15 "Linux" "Linux Programmer's Manual"
      :  .SH NAME
      : @@ -242,6 +243,18 @@ However, the
      :  argument is not a segment identifier, but instead an index into
      :  the kernel's internal array that maintains information about
      :  all shared memory segments on the system.
      : +.TP
      : +.BR SHM_STAT_ANY " (Linux-specific)"
      : +Return a
      : +.I shmid_ds
      : +structure as for
      : +.BR SHM_STAT .
      : +However, the
      : +.I shm_perm.mode
      : +is not checked for read access for
      : +.IR shmid ,
      : +resembling the behaviour of
      : +/proc/sysvipc/shm.
      :  .PP
      :  The caller can prevent or allow swapping of a shared
      :  memory segment with the following \fIcmd\fP values:
      : @@ -287,7 +300,7 @@ operation returns the index of the highest used entry in the
      :  kernel's internal array recording information about all
      :  shared memory segments.
      :  (This information can be used with repeated
      : -.B SHM_STAT
      : +.B SHM_STAT/SHM_STAT_ANY
      :  operations to obtain information about all shared memory segments
      :  on the system.)
      :  A successful
      : @@ -328,7 +341,7 @@ isn't accessible.
      :  \fIshmid\fP is not a valid identifier, or \fIcmd\fP
      :  is not a valid command.
      :  Or: for a
      : -.B SHM_STAT
      : +.B SHM_STAT/SHM_STAT_ANY
      :  operation, the index value specified in
      :  .I shmid
      :  referred to an array slot that is currently unused.
      
      This patch (of 3):
      
      There is a permission discrepancy when consulting shm ipc object metadata
      between /proc/sysvipc/shm (0444) and the SHM_STAT shmctl command.  The
      latter does permission checks for the object vs S_IRUGO.  As such there can
      be cases where EACCES is returned via syscall but the info is displayed
      anyway in the procfs files.
      
      While this might have security implications via info leaking (albeit no
      writing to the shm metadata), this behavior goes way back and showing all
      the objects regardless of the permissions was most likely an oversight - so
      we are stuck with it.  Furthermore, modifying either the syscall or the
      procfs file can cause userspace programs to break (e.g. ipcs).  Some
      applications require getting the procfs info (without root privileges),
      which can be rather slow in comparison with a syscall -- up to 500x in some
      reported cases.
      
      This patch introduces a new SHM_STAT_ANY command such that the shm ipc
      object permissions are ignored, and only audited instead.  In addition,
      I've left the lsm security hook checks in place, as if some policy can
      block the call, then the user has no other choice than just parsing the
      procfs file.
      
      [1] https://lkml.org/lkml/2017/12/19/220
      
      Link: http://lkml.kernel.org/r/20180215162458.10059-2-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Robert Kettler <robert.kettler@outlook.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
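From userspace, the new command could be used roughly as sketched below, following the index-walking pattern the proposed man page text describes for SHM_STAT. The function name is hypothetical, and the fallback definition of `SHM_STAT_ANY` is an assumption for C libraries whose headers predate the command. On kernels without the command, each `SHM_STAT_ANY` call is expected to fail and the count stays at zero.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for SHM_INFO and struct shm_info on glibc */
#endif
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_STAT_ANY
#define SHM_STAT_ANY 15 /* assumed value, for headers that lack it */
#endif

/*
 * Count shared memory segments visible through SHM_STAT_ANY.
 * Returns -1 if the highest used index cannot be obtained.
 */
int demo_count_shm_segments(void)
{
	struct shm_info info;
	int max_idx = shmctl(0, SHM_INFO, (struct shmid_ds *)&info);
	int count = 0;

	if (max_idx < 0)
		return -1;

	/* Walk the kernel's index space; unused slots simply fail. */
	for (int idx = 0; idx <= max_idx; idx++) {
		struct shmid_ds ds;

		if (shmctl(idx, SHM_STAT_ANY, &ds) >= 0)
			count++;
	}
	return count;
}
```

Swapping `SHM_STAT_ANY` for `SHM_STAT` in the loop would skip segments the caller has no read permission on, which is the discrepancy with /proc/sysvipc/shm that the series addresses.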
    • proc: move /proc/sysvipc creation to where it belongs · e74a0eff
      Alexey Dobriyan authored
      Move the proc_mkdir() call within the sysvipc subsystem such that we
      avoid polluting proc_root_init() with petty cpp.
      
      [dave@stgolabs.net: contributed changelog]
      Link: http://lkml.kernel.org/r/20180216161732.GA10297@avx2
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 02 Apr, 2018 2 commits