1. 06 Feb, 2018 6 commits
    • idr: Make 1-based IDRs more efficient · 6ce711f2
      Matthew Wilcox authored
      About 20% of the IDR users in the kernel want the allocated IDs to start
      at 1.  The implementation currently searches all the way down the left
      hand side of the tree, finds no free ID other than ID 0, walks all the
      way back up, and then all the way down again.  This patch 'rebases' the
      ID so we fill the entire radix tree, rather than leave a gap at 0.
      
      Chris Wilson says: "I did the quick hack of allocating index 0 of the
      idr and that eradicated idr_get_free() from being at the top of the
      profiles for the many-object stress tests. This improvement will be
      much appreciated."
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
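A minimal userspace sketch of the 'rebasing' idea described above, not the kernel's actual code. The `toy_idr` structure, its `base` field, the flat slot array and both helper names are assumptions made for illustration: caller-visible IDs are offset by `base`, so ID 1 lands in slot 0 and no slot is left permanently unused.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 64	/* hypothetical capacity of the toy backing store */

/* Toy stand-in for an IDR; 'base' is the lowest ID this instance hands out. */
struct toy_idr {
	unsigned int base;
	void *slot[TOY_SLOTS];
};

/* Store ptr at the lowest free ID >= start; return the ID, or -1 if full. */
int toy_idr_alloc(struct toy_idr *idr, void *ptr, unsigned int start)
{
	unsigned int i;

	assert(ptr != NULL);
	/* Rebase: caller-visible ID 'id' lives in slot 'id - base'. */
	i = start > idr->base ? start - idr->base : 0;
	for (; i < TOY_SLOTS; i++) {
		if (!idr->slot[i]) {
			idr->slot[i] = ptr;
			return (int)(i + idr->base);
		}
	}
	return -1;
}

/* Look up an ID; IDs below 'base' can never have been allocated. */
void *toy_idr_find(const struct toy_idr *idr, unsigned int id)
{
	if (id < idr->base || id - idr->base >= TOY_SLOTS)
		return NULL;
	return idr->slot[id - idr->base];
}
```

With `base` set to 1, the first allocation fills slot 0 directly instead of skipping over an always-empty ID 0.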
    • idr: Warn if old iterators see large IDs · 72fd6c7b
      Matthew Wilcox authored
      Now that the IDR can be used to store large IDs, it is possible somebody
      might only partially convert their old code and use the iterators which
      can only handle IDs up to INT_MAX.  It's probably unwise to show them a
      truncated ID, so settle for spewing warnings to dmesg, and terminating
      the iteration.
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
    • idr: Rename idr_for_each_entry_ext · 7a457577
      Matthew Wilcox authored
      In most places in the kernel where we need to distinguish functions by the
      type of their arguments, we use '_ul' as a suffix for the unsigned long
      variant, not '_ext'.  Also add kernel-doc.
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
    • idr: Remove idr_alloc_ext · 460488c5
      Matthew Wilcox authored
      It has no more users, so remove it.  Move idr_alloc() back into idr.c,
      move the guts of idr_alloc_cmn() into idr_alloc_u32(), remove the
      wrappers around idr_get_free_cmn() and rename it to idr_get_free().
      While there is now no interface to allocate IDs larger than a u32,
      the IDR internals remain ready to handle a larger ID should a need arise.
      
      These changes make it possible to provide the guarantee that, if the
      nextid pointer points into the object, the object's ID will be initialised
      before a concurrent lookup can find the object.
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
    • idr: Add idr_alloc_u32 helper · e096f6a7
      Matthew Wilcox authored
      All current users of idr_alloc_ext() actually want to allocate a u32
      and idr_alloc_u32() fits their needs better.
      
      Like idr_get_next(), it uses a 'nextid' argument which serves as both
      a pointer to the start ID and the assigned ID (instead of a separate
      minimum and pointer-to-assigned-ID argument).  It uses a 'max' argument
      rather than 'end' because the semantics that idr_alloc has for 'end'
      don't work well for unsigned types.
      
      Since idr_alloc_u32() returns an errno instead of the allocated ID, mark
      it as __must_check to help callers use it correctly.  Include copious
      kernel-doc.  Chris Mi <chrism@mellanox.com> has promised to contribute
      test-cases for idr_alloc_u32.
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
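The calling convention described above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel implementation: `toy_ids`, `toy_alloc_u32` and the flat slot array are hypothetical. It shows `nextid` acting as both the starting ID and the assigned ID, an inclusive `max`, and an errno-style return value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_IDS 128	/* hypothetical capacity of the toy ID space */

struct toy_ids {
	void *slot[TOY_MAX_IDS];
};

/*
 * On entry *nextid is the lowest acceptable ID; on success it is
 * overwritten with the ID that was assigned.  'max' is inclusive.
 * Returns 0 on success, -ENOSPC if no ID in [*nextid, max] is free.
 */
int toy_alloc_u32(struct toy_ids *ids, void *ptr, uint32_t *nextid,
		  uint32_t max)
{
	uint32_t id;

	assert(ptr != NULL);
	for (id = *nextid; id <= max && id < TOY_MAX_IDS; id++) {
		if (!ids->slot[id]) {
			/*
			 * Write the ID first, then store the pointer.  This
			 * only mirrors the ordering; real concurrent code
			 * would also need memory barriers.
			 */
			*nextid = id;
			ids->slot[id] = ptr;
			return 0;
		}
	}
	return -ENOSPC;	/* *nextid is left untouched on failure */
}
```

Because the return value carries only success or an errno, a caller that ignores it cannot tell whether `*nextid` was updated, which is the reason the commit gives for marking the real function `__must_check`. An inclusive `max` can also name the largest u32 value, which an exclusive bound of the same type could not.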
    • idr: Delete idr_replace_ext function · 234a4624
      Matthew Wilcox authored
      Changing idr_replace's 'id' argument to 'unsigned long' works for all
      callers.  Callers which passed a negative ID now get -ENOENT instead of
      -EINVAL.  No callers relied on this error value.
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
  2. 01 Feb, 2018 1 commit
    • lib/strscpy: Shut up KASAN false-positives in strscpy() · 1a3241ff
      Andrey Ryabinin authored
      strscpy() performs optimistic word-at-a-time reads, so it may
      access memory past the end of the object. This is perfectly fine
      since strscpy() doesn't use that (past-the-end) data and makes sure the
      optimistic read won't cross a page boundary.
      
      Use the new read_word_at_a_time() to shut up KASAN.
      
      Note that this could potentially hide some bugs.  In the example below,
      strscpy() will copy more than it should (1-3 extra uninitialized bytes):
      
              char dst[8];
              char *src;
      
              src = kmalloc(5, GFP_KERNEL);
              memset(src, 0xff, 5);
              strscpy(dst, src, 8);
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 27 Jan, 2018 1 commit
  4. 23 Jan, 2018 2 commits
    • vsprintf: Do not have bprintf dereference pointers · 841a915d
      Steven Rostedt (VMware) authored
      When trace_printk() was introduced, it was decided that, to keep its
      overhead as low as possible, the processing of the format string should be
      delayed until the buffer is read. That is, trace_printk() should not convert
      the %d into numbers and so on, but instead save the fmt string and all the
      args in the buffer at the time of recording. When the trace_printk() data is
      read, it would then parse the format string and do the conversions of the
      saved arguments in the tracing buffer.
      
      The code to perform this was added to vsprintf where vbin_printf() would
      save the arguments of a specified format string in a buffer, then
      bstr_printf() could be used to convert the buffer with the same format
      string into the final output, as if vsprintf() was called in one go.
      
      The issue arises when dereferenced pointers are used. Something like
      %*pbl, which reads a bitmask, will save the pointer to the bitmask in the
      buffer. Reading the buffer via bstr_printf() will then follow that pointer
      to produce the final output. Obviously the contents behind that pointer
      could have changed between the time it was recorded and the time the
      buffer is read. Worse yet, the bitmask could be unmapped, and the
      reading of the trace buffer could actually cause a kernel oops.
      
      Another problem is that user space tools such as perf and trace-cmd do not
      have access to the contents of these pointers, and they become useless when
      the tracing buffer is extracted.
      
      Instead of having vbin_printf() simply save the pointer in the buffer for
      later processing, have it perform the formatting at the time vbin_printf()
      is called. This fixes the issue of dereferencing pointers at a later time,
      and has the extra benefit of letting user space tools understand these
      values.
      
      Since perf and trace-cmd already can handle %p[sSfF] via saving kallsyms,
      their pointers are saved and not processed during vbin_printf(). If they
      were converted, it would break perf and trace-cmd, as they would not know
      how to deal with the conversion.
      
      Link: http://lkml.kernel.org/r/20171228204025.14a71d8f@gandalf.local.home
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
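The record-then-replay scheme, and the fix of dereferencing at record time, can be illustrated with a small userspace sketch. Everything here is an assumption for illustration: `toy_bin_record()` and `toy_bin_replay()` are hypothetical stand-ins for the vbin_printf()/bstr_printf() pair, they understand only "%d" and "%s", and "%s" stands in for any specifier that dereferences a pointer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Record the arguments of 'fmt' into 'buf' without formatting them.
 * The target of "%s" is copied now, not when the buffer is read.
 * Returns the number of bytes used, or 0 if 'buf' is too small.
 */
size_t toy_bin_record(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	size_t used = 0;

	assert(buf != NULL && fmt != NULL);
	va_start(ap, fmt);
	for (; *fmt; fmt++) {
		if (fmt[0] != '%' || fmt[1] == '\0')
			continue;
		fmt++;
		if (*fmt == 'd') {
			int v = va_arg(ap, int);

			if (size - used < sizeof(v))
				goto overflow;
			memcpy(buf + used, &v, sizeof(v));
			used += sizeof(v);
		} else if (*fmt == 's') {
			const char *s = va_arg(ap, const char *);
			size_t len = strlen(s) + 1;

			if (size - used < len)
				goto overflow;
			memcpy(buf + used, s, len);	/* dereference at record time */
			used += len;
		}
	}
	va_end(ap);
	return used;
overflow:
	va_end(ap);
	return 0;
}

/* Replay a recorded buffer against the same 'fmt'; returns bytes written. */
int toy_bin_replay(char *out, size_t size, const char *fmt, const char *buf)
{
	size_t n = 0;

	if (size == 0)
		return 0;
	out[0] = '\0';
	for (; *fmt && n < size - 1; fmt++) {
		if (fmt[0] == '%' && fmt[1] == 'd') {
			int v;

			memcpy(&v, buf, sizeof(v));
			buf += sizeof(v);
			n += (size_t)snprintf(out + n, size - n, "%d", v);
			fmt++;
		} else if (fmt[0] == '%' && fmt[1] == 's') {
			n += (size_t)snprintf(out + n, size - n, "%s", buf);
			buf += strlen(buf) + 1;
			fmt++;
		} else {
			out[n++] = *fmt;
			out[n] = '\0';
		}
		if (n >= size)
			n = size - 1;	/* output was truncated */
	}
	return (int)n;
}
```

Because the string is copied during recording, overwriting or freeing the original afterwards does not change what the replay prints, and the recorded bytes stay meaningful to a reader that has no access to the original memory.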
    • kobject: Export kobj_ns_grab_current() and kobj_ns_drop() · 172856ea
      Bart Van Assche authored
      Make it possible to call these two functions from a kernel module.
      Note: despite their name, these two functions can be used meaningfully
      independent of kobjects. A later patch will add calls to these
      functions from the SRP driver because this patch series modifies the
      SRP driver such that it can hold a reference to a namespace that can
      last longer than the lifetime of the process through which the
      namespace reference was obtained.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  5. 22 Jan, 2018 2 commits
  6. 20 Jan, 2018 1 commit
  7. 19 Jan, 2018 1 commit
    • lib/scatterlist: Fix chaining support in sgl_alloc_order() · 8c7a8d1c
      Bart Van Assche authored
      This patch prevents workloads with large block sizes (megabytes)
      from triggering the following call stack with the ib_srpt driver (that
      driver is the only driver that chains scatterlists allocated by
      sgl_alloc_order()):
      
      BUG: Bad page state in process kworker/0:1H  pfn:2423a78
      page:fffffb03d08e9e00 count:-3 mapcount:0 mapping:          (null) index:0x0
      flags: 0x57ffffc0000000()
      raw: 0057ffffc0000000 0000000000000000 0000000000000000 fffffffdffffffff
      raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
      page dumped because: nonzero _count
      CPU: 0 PID: 733 Comm: kworker/0:1H Tainted: G          I      4.15.0-rc7.bart+ #1
      Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
      Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
      Call Trace:
       dump_stack+0x5c/0x83
       bad_page+0xf5/0x10f
       get_page_from_freelist+0xa46/0x11b0
       __alloc_pages_nodemask+0x103/0x290
       sgl_alloc_order+0x101/0x180
       target_alloc_sgl+0x2c/0x40 [target_core_mod]
       srpt_alloc_rw_ctxs+0x173/0x2d0 [ib_srpt]
       srpt_handle_new_iu+0x61e/0x7f0 [ib_srpt]
       __ib_process_cq+0x55/0xa0 [ib_core]
       ib_cq_poll_work+0x1b/0x60 [ib_core]
       process_one_work+0x141/0x340
       worker_thread+0x47/0x3e0
       kthread+0xf5/0x130
       ret_from_fork+0x1f/0x30
      
      Fixes: e80a0af4 ("lib/scatterlist: Introduce sgl_alloc() and sgl_free()")
      Reported-by: Laurence Oberman <loberman@redhat.com>
      Tested-by: Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 15 Jan, 2018 16 commits
  9. 13 Jan, 2018 3 commits
    • error-injection: Support fault injection framework · 4b1a29a7
      Masami Hiramatsu authored
      Support the in-kernel fault-injection framework via debugfs.
      This allows you to inject a conditional error into a specified
      function using debugfs interfaces.
      
      Here is the result of the test script described in
      Documentation/fault-injection/fault-injection.txt
      
        ===========
        # ./test_fail_function.sh
        1+0 records in
        1+0 records out
        1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0227404 s, 46.1 MB/s
        btrfs-progs v4.4
        See http://btrfs.wiki.kernel.org for more information.
      
        Label:              (null)
        UUID:               bfa96010-12e9-4360-aed0-42eec7af5798
        Node size:          16384
        Sector size:        4096
        Filesystem size:    1001.00MiB
        Block group profiles:
          Data:             single            8.00MiB
          Metadata:         DUP              58.00MiB
          System:           DUP              12.00MiB
        SSD detected:       no
        Incompat features:  extref, skinny-metadata
        Number of devices:  1
        Devices:
           ID        SIZE  PATH
            1  1001.00MiB  /dev/loop2
      
        mount: mount /dev/loop2 on /opt/tmpmnt failed: Cannot allocate memory
        SUCCESS!
        ===========
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • error-injection: Add injectable error types · 663faf9f
      Masami Hiramatsu authored
      Add injectable error types for each error-injectable function.
      
      One motivation for error-injection testing is to find software flaws,
      mistakes or mishandling of expected errors. If the test finds such
      flaws, that is a program bug, so we need to fix it.
      
      But if the tester injects the wrong kind of error (e.g. just returns a
      success code without processing anything), it causes unexpected behavior
      even if the caller is correctly programmed to handle any errors.
      That is not what we want to test by error injection.
      
      To clarify what type of errors the caller must expect for each
      injectable function, this introduces injectable error types:
      
       - EI_ETYPE_NULL : means the function will return NULL if it
      		    fails. No ERR_PTR, just a NULL.
       - EI_ETYPE_ERRNO : means the function will return -ERRNO
      		    if it fails.
       - EI_ETYPE_ERRNO_NULL : means the function will return -ERRNO
      		       (ERR_PTR) or NULL.
      
      ALLOW_ERROR_INJECTION() macro is expanded to get one of
      NULL, ERRNO, ERRNO_NULL to record the error type for
      each function. e.g.
      
       ALLOW_ERROR_INJECTION(open_ctree, ERRNO)
      
      These error types are shown in debugfs as below.
      
        ====
        / # cat /sys/kernel/debug/error_injection/list
        open_ctree [btrfs]	ERRNO
        io_ctl_init [btrfs]	ERRNO
        ====
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
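One way to read the three error types above is as a filter on which injected return values make sense for a given function. The sketch below is a userspace illustration under that reading, not the kernel's code: the `toy_` names, the `TOY_MAX_ERRNO` bound and the checking function are assumptions.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: largest errno magnitude that may be encoded in a return value. */
#define TOY_MAX_ERRNO 4095

enum toy_ei_etype {
	TOY_EI_ETYPE_NULL,		/* fails by returning NULL */
	TOY_EI_ETYPE_ERRNO,		/* fails by returning -ERRNO */
	TOY_EI_ETYPE_ERRNO_NULL,	/* fails by returning -ERRNO (ERR_PTR) or NULL */
};

/* Return 1 if 'retval' is a failure value the caller must already expect. */
int toy_ei_retval_ok(enum toy_ei_etype type, long retval)
{
	int is_errno = retval < 0 && retval >= -TOY_MAX_ERRNO;

	switch (type) {
	case TOY_EI_ETYPE_NULL:
		return retval == 0;
	case TOY_EI_ETYPE_ERRNO:
		return is_errno;
	case TOY_EI_ETYPE_ERRNO_NULL:
		return is_errno || retval == 0;
	}
	assert(0 && "unknown error type");
	return 0;
}
```

For a function declared with the ERRNO type, such as open_ctree in the listing above, this check would reject an injected 0, which is the "return success without processing anything" case the commit message warns about.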
    • error-injection: Separate error-injection from kprobe · 540adea3
      Masami Hiramatsu authored
      The error-injection framework is not limited to use by kprobes or
      bpf. Other kernel subsystems, e.g. livepatch, ftrace etc., can use it
      freely for checking the safety of error injection.
      So this separates the error-injection framework from kprobes.
      
      Some changes have been made:
      
      - "kprobe" word is removed from any APIs/structures.
      - BPF_ALLOW_ERROR_INJECTION() is renamed to
        ALLOW_ERROR_INJECTION() since it is not limited to BPF either.
      - CONFIG_FUNCTION_ERROR_INJECTION is the config item of this
        feature. It is automatically enabled if the arch supports
        error injection feature for kprobe or ftrace etc.
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  10. 12 Jan, 2018 1 commit
  11. 10 Jan, 2018 1 commit
    • dma-mapping: move swiotlb arch helpers to a new header · ea8c64ac
      Christoph Hellwig authored
      phys_to_dma, dma_to_phys and dma_capable are helpers published by
      architecture code for use by swiotlb and xen-swiotlb only.  Drivers are
      not supposed to use these directly and should use the DMA API instead.
      
      Move these to a new asm/dma-direct.h helper, included by a
      linux/dma-direct.h wrapper that provides the default linear mapping
      unless the architecture wants to override it.
      
      In the MIPS case the existing dma-coherent.h is reused for now as
      untangling it will take a bit of work.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Robin Murphy <robin.murphy@arm.com>
  12. 09 Jan, 2018 3 commits
    • bpf: introduce BPF_JIT_ALWAYS_ON config · 290af866
      Alexei Starovoitov authored
      The BPF interpreter has been used as part of the Spectre variant 2 attack
      (CVE-2017-5715).
      
      A quote from the Google Project Zero blog:
      "At this point, it would normally be necessary to locate gadgets in
      the host kernel code that can be used to actually leak data by reading
      from an attacker-controlled location, shifting and masking the result
      appropriately and then using the result of that as offset to an
      attacker-controlled address for a load. But piecing gadgets together
      and figuring out which ones work in a speculation context seems annoying.
      So instead, we decided to use the eBPF interpreter, which is built into
      the host kernel - while there is no legitimate way to invoke it from inside
      a VM, the presence of the code in the host kernel's text section is sufficient
      to make it usable for the attack, just like with ordinary ROP gadgets."
      
      To make the attacker's job harder, introduce the BPF_JIT_ALWAYS_ON config
      option, which removes the interpreter from the kernel in favor of JIT-only
      mode. So far the eBPF JIT is supported by:
      x64, arm64, arm32, sparc64, s390, powerpc64, mips64
      
      The start of a JITed program is randomized and its code page is marked as
      read-only. In addition, "constant blinding" can be turned on with
      net.core.bpf_jit_harden.
      
      v2->v3:
      - move __bpf_prog_ret0 under ifdef (Daniel)
      
      v1->v2:
      - fix init order, test_bpf and cBPF (Daniel's feedback)
      - fix offloaded bpf (Jakub's feedback)
      - add 'return 0' dummy in case something can invoke prog->bpf_func
      - retarget bpf tree. For bpf-next the patch would need one extra hunk.
        It will be sent when the trees are merged back to net-next
      
      Considered doing:
        int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
      but it seems better to land the patch as-is and in bpf-next remove
      bpf_jit_enable global variable from all JITs, consolidate in one place
      and remove this jit_init() function.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • treewide: Use DEVICE_ATTR_RW · b6b996b6
      Joe Perches authored
      Convert DEVICE_ATTR uses to DEVICE_ATTR_RW where possible.
      
      Done with perl script:
      
      $ git grep -w --name-only DEVICE_ATTR | \
        xargs perl -i -e 'local $/; while (<>) { s/\bDEVICE_ATTR\s*\(\s*(\w+)\s*,\s*\(?(\s*S_IRUGO\s*\|\s*S_IWUSR|\s*S_IWUSR\s*\|\s*S_IRUGO\s*|\s*0644\s*)\)?\s*,\s*\1_show\s*,\s*\1_store\s*\)/DEVICE_ATTR_RW(\1)/g; print;}'
      Signed-off-by: Joe Perches <joe@perches.com>
      Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Acked-by: Zhang Rui <rui.zhang@intel.com>
      Acked-by: Jarkko Nikula <jarkko.nikula@bitmer.com>
      Acked-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
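The substitution performed by the perl script can be illustrated with simplified userspace stand-ins. The `toy_`/`TOY_` types and macros and the `power` attribute are hypothetical, and the kernel's real structures carry more fields. The point is that the RW form derives mode 0644 and the `_show`/`_store` callback names from the attribute name alone, as the regular expression above requires.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in; the kernel's device_attribute has different fields. */
struct toy_device_attribute {
	const char *name;
	unsigned int mode;
	long (*show)(char *buf);
	long (*store)(const char *buf, size_t count);
};

#define TOY_DEVICE_ATTR(_name, _mode, _show, _store) \
	struct toy_device_attribute dev_attr_##_name = \
		{ #_name, _mode, _show, _store }

/* The RW form fixes the mode and derives both callback names from _name. */
#define TOY_DEVICE_ATTR_RW(_name) \
	TOY_DEVICE_ATTR(_name, 0644, _name##_show, _name##_store)

/* Hypothetical attribute callbacks following the <name>_show/_store pattern. */
long power_show(char *buf)
{
	assert(buf != NULL);
	buf[0] = '1';
	buf[1] = '\0';
	return 1;
}

long power_store(const char *buf, size_t count)
{
	assert(buf != NULL);
	return (long)count;
}

/* Before: TOY_DEVICE_ATTR(power, 0644, power_show, power_store); */
TOY_DEVICE_ATTR_RW(power);
```

Only declarations whose callbacks already follow the naming pattern and whose mode is 0644 (or the equivalent S_IRUGO | S_IWUSR) can be converted this way, which matches the "where possible" wording of the commit.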
    • symbol lookup: introduce dereference_symbol_descriptor() · 04b8eb7a
      Sergey Senozhatsky authored
      dereference_symbol_descriptor() invokes appropriate ARCH specific
      function descriptor dereference callbacks:
      - dereference_kernel_function_descriptor() if the pointer is a
        kernel symbol;
      
      - dereference_module_function_descriptor() if the pointer is a
        module symbol.
      
      This is the last step needed to make '%pS/%ps' smart enough to
      handle function descriptor dereference on affected ARCHs and
      to retire '%pF/%pf'.
      
      To recap:
        Some architectures (ia64, ppc64, parisc64) use an indirect pointer
        for C function pointers - the function pointer points to a function
        descriptor and we need to dereference it to get the actual function
        pointer.
      
        Function descriptors live in .opd elf section and all affected
        ARCHs (ia64, ppc64, parisc64) handle it properly for kernel and
        modules. So we, technically, can decide if the dereference is
        needed by simply looking at the pointer: if it belongs to .opd
        section then we need to dereference it.
      
        The kernel and modules have their own .opd sections, obviously,
        that's why we need to split dereference_function_descriptor()
        and use separate kernel and module dereference arch callbacks.
      
      Link: http://lkml.kernel.org/r/20171206043649.GB15885@jagdpanzerIV
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: James Bottomley <jejb@parisc-linux.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Tested-by: Tony Luck <tony.luck@intel.com> #ia64
      Tested-by: Santosh Sivaraj <santosh@fossix.org> #powerpc
      Tested-by: Helge Deller <deller@gmx.de> #parisc64
      Signed-off-by: Petr Mladek <pmladek@suse.com>
  13. 08 Jan, 2018 1 commit
  14. 06 Jan, 2018 1 commit