1. 15 Jan, 2018 1 commit
  2. 06 Jan, 2018 1 commit
  3. 18 Nov, 2017 1 commit
  4. 13 Nov, 2017 1 commit
    • kconfig: kill off GENERIC_IO option · 9de8da47
      Rob Herring authored
      The GENERIC_IO option is set for every architecture except tile and score
      as those define NO_IOMEM. The option only controls visibility of
      CONFIG_MTD which doesn't appear to be necessary for any reason, so let's
      just remove GENERIC_IO.
      Signed-off-by: Rob Herring <robh@kernel.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
      Cc: Marek Vasut <marek.vasut@gmail.com>
      Cc: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: user-mode-linux-user@lists.sourceforge.net
      Cc: linux-mtd@lists.infradead.org
      Acked-by: Richard Weinberger <richard@nod.at>
      Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      9de8da47
  5. 25 Sep, 2017 1 commit
  6. 09 Sep, 2017 1 commit
  7. 31 Aug, 2017 1 commit
    • libnvdimm, nd_blk: remove mmio_flush_range() · 5deb67f7
      Robin Murphy authored
      mmio_flush_range() suffers from a lack of clearly-defined semantics,
      and is somewhat ambiguous to port to other architectures where the
      scope of the writeback implied by "flush" and ordering might matter,
      but MMIO would tend to imply non-cacheable anyway. Per the rationale
      in 67a3e8fe ("nd_blk: change aperture mapping from WC to WB"), the
      only existing use is actually to invalidate clean cache lines for
      ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
      cleanup of the pmem API, that also now happens to be the exact purpose
      of arch_invalidate_pmem(), which would be a far more well-defined tool
      for the job.
      
      Rather than risk potentially inconsistent implementations of
      mmio_flush_range() for the sake of one callsite, streamline things by
      removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
      definitions up to the libnvdimm level, so they can be shared by NFIT
      as well. This allows NFIT to be enabled for arm64.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      5deb67f7
  8. 15 Aug, 2017 2 commits
    • lib: Add zstd modules · 73f3d1b4
      Nick Terrell authored
      Add zstd compression and decompression kernel modules.
      zstd offers a wide variety of compression speed and quality trade-offs.
      It can compress at speeds approaching lz4, with quality approaching lzma.
      zstd decompresses at speeds more than twice as fast as zlib, and its
      decompression speed remains roughly the same across all compression levels.
      
      The code was ported from the upstream zstd source repository. The
      `linux/zstd.h` header was modified to match linux kernel style.
      The cross-platform and allocation code was stripped out. Instead zstd
      requires the caller to pass a preallocated workspace. The source files
      were clang-formatted [1] to match the Linux Kernel style as much as
      possible. Otherwise, the code was unmodified. We would like to avoid
      as much further manual modification to the source code as possible, so it
      will be easier to keep the kernel zstd up to date.
      
      I benchmarked zstd compression as a special character device. I ran zstd
      and zlib compression at several levels, as well as performing no
      compression, which measure the time spent copying the data to kernel space.
      Data is passed to the compressor 4096 B at a time. The benchmark file is
      located in the upstream zstd source repository under
      `contrib/linux-kernel/zstd_compress_test.c` [2].
      
      I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
      The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
      16 GB of RAM, and an SSD. I benchmarked using `silesia.tar` [3], which is
      211,988,480 B large. Run the following commands for the benchmark:
      
          sudo modprobe zstd_compress_test
          sudo mknod zstd_compress_test c 245 0
          sudo time cp silesia.tar zstd_compress_test
      
      The time is reported by the time of the userland `cp`.
      The MB/s is computed with

          211,988,480 B / time(level)

      which includes the time to copy from userland.
      The Adjusted MB/s is computed with

          211,988,480 B / (time(level) - time(none)).
      
      The memory reported is the amount of memory the compressor requests.
      
      | Method   | Size (B) | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
      |----------|----------|----------|-------|---------|----------|----------|
      | none     | 211988480 |   0.100 |     1 | 2119.88 |        - |        - |
      | zstd -1  | 73645762 |    1.044 | 2.878 |  203.05 |   224.56 |     1.23 |
      | zstd -3  | 66988878 |    1.761 | 3.165 |  120.38 |   127.63 |     2.47 |
      | zstd -5  | 65001259 |    2.563 | 3.261 |   82.71 |    86.07 |     2.86 |
      | zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |    16.13 |    13.22 |
      | zstd -15 | 58009756 |   47.601 | 3.654 |    4.45 |     4.46 |    21.61 |
      | zstd -19 | 54014593 |  102.835 | 3.925 |    2.06 |     2.06 |    60.15 |
      | zlib -1  | 77260026 |    2.895 | 2.744 |   73.23 |    75.85 |     0.27 |
      | zlib -3  | 72972206 |    4.116 | 2.905 |   51.50 |    52.79 |     0.27 |
      | zlib -6  | 68190360 |    9.633 | 3.109 |   22.01 |    22.24 |     0.27 |
      | zlib -9  | 67613382 |   22.554 | 3.135 |    9.40 |     9.44 |     0.27 |
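As a sanity check on the table, the Adjusted MB/s column can be recomputed from the Time column and the 211,988,480 B file size (a small illustrative helper, not part of the patch):

```c
#include <math.h>

/*
 * Recompute Adjusted MB/s: subtract the time spent copying to kernel
 * space (the "none" row) before dividing the file size by the time.
 */
static double adjusted_mbps(double file_bytes, double time_s, double none_time_s)
{
	return file_bytes / (time_s - none_time_s) / 1e6;
}
```

For example, the `zstd -1` row gives 211,988,480 / (1.044 - 0.100) / 1e6 ≈ 224.56, matching the table.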
      
      I benchmarked zstd decompression using the same method on the same machine.
      The benchmark file is located in the upstream zstd repo under
      `contrib/linux-kernel/zstd_decompress_test.c` [4]. The memory reported is
      the amount of memory required to decompress data compressed with the given
      compression level. If you know the maximum size of your input, you can
      reduce the memory usage of decompression irrespective of the compression
      level.
      
      | Method   | Time (s) | MB/s    | Adjusted MB/s | Memory (MB) |
      |----------|----------|---------|---------------|-------------|
      | none     |    0.025 | 8479.54 |             - |           - |
      | zstd -1  |    0.358 |  592.15 |        636.60 |        0.84 |
      | zstd -3  |    0.396 |  535.32 |        571.40 |        1.46 |
      | zstd -5  |    0.396 |  535.32 |        571.40 |        1.46 |
      | zstd -10 |    0.374 |  566.81 |        607.42 |        2.51 |
      | zstd -15 |    0.379 |  559.34 |        598.84 |        4.61 |
      | zstd -19 |    0.412 |  514.54 |        547.77 |        8.80 |
      | zlib -1  |    0.940 |  225.52 |        231.68 |        0.04 |
      | zlib -3  |    0.883 |  240.08 |        247.07 |        0.04 |
      | zlib -6  |    0.844 |  251.17 |        258.84 |        0.04 |
      | zlib -9  |    0.837 |  253.27 |        287.64 |        0.04 |
      
      Tested in userland using the test-suite in the zstd repo under
      `contrib/linux-kernel/test/UserlandTest.cpp` [5] by mocking the kernel
      functions. Fuzz tested using libfuzzer [6] with the fuzz harnesses under
      `contrib/linux-kernel/test/{RoundTripCrash.c,DecompressCrash.c}` [7] [8]
      with ASAN, UBSAN, and MSAN. Additionally, it was tested while testing the
      BtrFS and SquashFS patches coming next.
      
      [1] https://clang.llvm.org/docs/ClangFormat.html
      [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_compress_test.c
      [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
      [4] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_decompress_test.c
      [5] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/UserlandTest.cpp
      [6] http://llvm.org/docs/LibFuzzer.html
      [7] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/RoundTripCrash.c
      [8] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/DecompressCrash.c
      
      zstd source repository: https://github.com/facebook/zstd
      Signed-off-by: Nick Terrell <terrelln@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      73f3d1b4
    • lib: Add xxhash module · 5d240522
      Nick Terrell authored
      Adds xxhash kernel module with xxh32 and xxh64 hashes. xxhash is an
      extremely fast non-cryptographic hash algorithm for checksumming.
      The zstd compression and decompression modules added in the next patch
      require xxhash. I extracted it out from zstd since it is useful on its
      own. I copied the code from the upstream XXHash source repository and
      translated it into kernel style. I ran benchmarks and tests in the kernel
      and tests in userland.
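As an illustration of the algorithm's shape, here is a simplified userland sketch of a one-shot xxh32 routine following the published reference algorithm (little-endian hosts assumed; the kernel's `xxhash.c` differs in naming and also supports streaming, so treat this as an approximation, not the kernel code):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PRIME32_1 2654435761U
#define PRIME32_2 2246822519U
#define PRIME32_3 3266489917U
#define PRIME32_4  668265263U
#define PRIME32_5  374761393U

static uint32_t xxh_rotl32(uint32_t x, int r)
{
	return (x << r) | (x >> (32 - r));
}

static uint32_t xxh_read32(const uint8_t *p)
{
	uint32_t v;
	memcpy(&v, p, sizeof(v));	/* little-endian host assumed */
	return v;
}

uint32_t xxh32(const void *input, size_t len, uint32_t seed)
{
	const uint8_t *p = input;
	const uint8_t *end = p + len;
	uint32_t h;

	if (len >= 16) {
		const uint8_t *limit = end - 16;
		uint32_t v1 = seed + PRIME32_1 + PRIME32_2;
		uint32_t v2 = seed + PRIME32_2;
		uint32_t v3 = seed;
		uint32_t v4 = seed - PRIME32_1;

		do {	/* four independent lanes, 16 B per iteration */
			v1 = xxh_rotl32(v1 + xxh_read32(p) * PRIME32_2, 13) * PRIME32_1; p += 4;
			v2 = xxh_rotl32(v2 + xxh_read32(p) * PRIME32_2, 13) * PRIME32_1; p += 4;
			v3 = xxh_rotl32(v3 + xxh_read32(p) * PRIME32_2, 13) * PRIME32_1; p += 4;
			v4 = xxh_rotl32(v4 + xxh_read32(p) * PRIME32_2, 13) * PRIME32_1; p += 4;
		} while (p <= limit);

		h = xxh_rotl32(v1, 1) + xxh_rotl32(v2, 7) +
		    xxh_rotl32(v3, 12) + xxh_rotl32(v4, 18);
	} else {
		h = seed + PRIME32_5;
	}

	h += (uint32_t)len;

	while (p + 4 <= end) {	/* leftover 4-byte words */
		h = xxh_rotl32(h + xxh_read32(p) * PRIME32_3, 17) * PRIME32_4;
		p += 4;
	}
	while (p < end) {	/* trailing bytes */
		h = xxh_rotl32(h + *p * PRIME32_5, 11) * PRIME32_1;
		p++;
	}

	/* final avalanche */
	h ^= h >> 15; h *= PRIME32_2;
	h ^= h >> 13; h *= PRIME32_3;
	h ^= h >> 16;
	return h;
}
```

The speed comes from the four accumulator lanes, which keep multiple multiplies in flight per cycle on superscalar CPUs.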
      
      I benchmarked xxhash as a special character device. I ran in four modes,
      no-op, xxh32, xxh64, and crc32. The no-op mode simply copies the data to
      kernel space and ignores it. The xxh32, xxh64, and crc32 modes compute
      hashes on the copied data. I also ran it with four different buffer sizes.
      The benchmark file is located in the upstream zstd source repository under
      `contrib/linux-kernel/xxhash_test.c` [1].
      
      I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
      The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
      16 GB of RAM, and an SSD. I benchmarked using the file `filesystem.squashfs`
      from `ubuntu-16.10-desktop-amd64.iso`, which is 1,536,217,088 B large.
      Run the following commands for the benchmark:
      
          modprobe xxhash_test
          mknod xxhash_test c 245 0
          time cp filesystem.squashfs xxhash_test
      
      The time is reported by the time of the userland `cp`.
      The GB/s is computed with
      
          1,536,217,088 B / time(buffer size, hash)
      
      which includes the time to copy from userland.
      The Adjusted GB/s is computed with
      
          1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).
      
      | Buffer Size (B) | Hash  | Time (s) | GB/s | Adjusted GB/s |
      |-----------------|-------|----------|------|---------------|
      |            1024 | none  |    0.408 | 3.77 |             - |
      |            1024 | xxh32 |    0.649 | 2.37 |          6.37 |
      |            1024 | xxh64 |    0.542 | 2.83 |         11.46 |
      |            1024 | crc32 |    1.290 | 1.19 |          1.74 |
      |            4096 | none  |    0.380 | 4.04 |             - |
      |            4096 | xxh32 |    0.645 | 2.38 |          5.79 |
      |            4096 | xxh64 |    0.500 | 3.07 |         12.80 |
      |            4096 | crc32 |    1.168 | 1.32 |          1.95 |
      |            8192 | none  |    0.351 | 4.38 |             - |
      |            8192 | xxh32 |    0.614 | 2.50 |          5.84 |
      |            8192 | xxh64 |    0.464 | 3.31 |         13.60 |
      |            8192 | crc32 |    1.163 | 1.32 |          1.89 |
      |           16384 | none  |    0.346 | 4.43 |             - |
      |           16384 | xxh32 |    0.590 | 2.60 |          6.30 |
      |           16384 | xxh64 |    0.466 | 3.30 |         12.80 |
      |           16384 | crc32 |    1.183 | 1.30 |          1.84 |
      
      Tested in userland using the test-suite in the zstd repo under
      `contrib/linux-kernel/test/XXHashUserlandTest.cpp` [2] by mocking the
      kernel functions. A line in each branch of every function in `xxhash.c`
      was commented out to ensure that the test-suite fails. Additionally
      tested while testing zstd and with SMHasher [3].
      
      [1] https://phabricator.intern.facebook.com/P57526246
      [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/XXHashUserlandTest.cpp
      [3] https://github.com/aappleby/smhasher
      
      zstd source repository: https://github.com/facebook/zstd
      XXHash source repository: https://github.com/cyan4973/xxhash
      Signed-off-by: Nick Terrell <terrelln@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      5d240522
  9. 09 Jun, 2017 2 commits
    • x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations · 0aed55af
      Dan Williams authored
      The pmem driver has a need to transfer data with a persistent memory
      destination and be able to rely on the fact that the destination writes are not
      cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
      (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
      to ensure data-writes have reached a power-fail-safe zone in the platform. The
      fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
      around and fence previous writes with an "sfence".
      
      Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
      memcpy_flushcache, that guarantee that the destination buffer is not dirty in
      the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
      will be used to replace the "pmem api" (include/linux/pmem.h +
      arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
      and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
      config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
      otherwise.
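The config gating described above can be sketched as follows (CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE is the real symbol, but the declaration and fallback body here are illustrative stand-ins, not the kernel's headers):

```c
#include <stddef.h>
#include <string.h>

/*
 * Sketch: when the architecture provides no cache-bypassing copy,
 * the symbol degrades to a plain cached memcpy(), as the commit
 * message describes for the !CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE case.
 */
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
void *memcpy_flushcache(void *dst, const void *src, size_t n);
#else
static inline void *memcpy_flushcache(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);	/* cached fallback */
}
#endif
```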
      
      This is meant to satisfy the concern from Linus that if a driver wants to do
      something beyond the normal nocache semantics it should be something private to
      that driver [1], and Al's concern that anything uaccess related belongs with
      the rest of the uaccess code [2].
      
      The first consumer of this interface is a new 'copy_from_iter' dax operation so
      that pmem can inject cache maintenance operations without imposing this
      overhead on other dax-capable drivers.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      0aed55af
    • lib: Add crc4 module · 0cbaa448
      Jeremy Kerr authored
      Add a little helper for crc4 calculations. This works 4-bits-at-a-time,
      using a simple table approach.
      
      We will need this in the FSI core code, as well as any master
      implementations that need to calculate CRCs in software.
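A 4-bits-at-a-time, table-driven CRC4 might look like the following sketch. The polynomial here (x^4 + x + 1, i.e. 0x3) is an assumption for illustration; the kernel's lib/crc4.c may use a different one:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical polynomial x^4 + x + 1 (0x3); for illustration only. */
#define CRC4_POLY 0x3

static uint8_t crc4_tab[16];

/* Precompute one table entry per input nibble. */
void crc4_init(void)
{
	int n, bit;

	for (n = 0; n < 16; n++) {
		uint8_t crc = n;

		for (bit = 0; bit < 4; bit++)
			crc = (crc & 0x8) ? ((crc << 1) ^ CRC4_POLY) & 0xF
					  : (crc << 1) & 0xF;
		crc4_tab[n] = crc;
	}
}

/* Fold in a buffer one nibble at a time, high nibble first. */
uint8_t crc4(uint8_t crc, const uint8_t *data, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		crc = crc4_tab[(crc ^ (data[i] >> 4)) & 0xF];
		crc = crc4_tab[(crc ^ (data[i] & 0xF)) & 0xF];
	}
	return crc;
}
```

The 16-entry table is the whole trick: one lookup replaces four bit-serial shift/XOR steps.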
      Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
      Signed-off-by: Chris Bostic <cbostic@linux.vnet.ibm.com>
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cbaa448
  10. 27 Feb, 2017 1 commit
  11. 25 Feb, 2017 2 commits
  12. 24 Feb, 2017 1 commit
  13. 23 Feb, 2017 1 commit
  14. 03 Feb, 2017 1 commit
    • lib: Introduce priority array area manager · 44091d29
      Jiri Pirko authored
      This introduces an infrastructure for the management of linear priority
      areas. Priority order in an array matters; however, the order of items
      inside a priority group does not.
      
      As an initial implementation, L-sort algorithm is used. It is quite
      trivial. More advanced algorithm called P-sort will be introduced as a
      follow-up. The infrastructure is prepared for other algos.
      
      Alongside this, a testing module is introduced as well.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      44091d29
  15. 24 Jan, 2017 2 commits
  16. 27 Dec, 2016 1 commit
  17. 08 Oct, 2016 1 commit
    • atomic64: no need for CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE · 51a02124
      Vineet Gupta authored
      This came to light when implementing native 64-bit atomics for ARCv2.
      
      The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
      to check whether atomic64_dec_if_positive() is available.  It seems it
      was needed when not every arch defined it.  However, as of the current
      code the Kconfig option seems needless:
      
       - for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig and a
         generic definition of API is present lib/atomic64.c
       - arches with native 64-bit atomics select it in arch/*/Kconfig and
         define the API in their headers
      
      So I see no point in keeping the Kconfig option.
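The semantics the self-test exercises can be sketched in userland with C11 atomics (an approximation of the generic fallback, not the lib/atomic64.c code, which takes a spinlock instead):

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Userland approximation of atomic64_dec_if_positive(): decrement
 * atomically, but only store the result when it stays non-negative.
 * Returns the old value minus one either way, so a negative return
 * means the counter was left untouched.
 */
int64_t atomic64_dec_if_positive(_Atomic int64_t *v)
{
	int64_t old = atomic_load(v);
	int64_t dec;

	do {
		dec = old - 1;
		if (dec < 0)
			return dec;	/* would go negative: do not store */
	} while (!atomic_compare_exchange_weak(v, &old, dec));

	return dec;
}
```

On CAS failure `old` is refreshed with the current value, so the loop retries against the latest state.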
      
      Compile tested for:
       - blackfin (CONFIG_GENERIC_ATOMIC64)
       - x86 (!CONFIG_GENERIC_ATOMIC64)
       - ia64
      
      Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.com
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ming Lin <ming.l@ssi.samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      51a02124
  18. 17 Sep, 2016 1 commit
  19. 21 May, 2016 1 commit
  20. 15 Apr, 2016 1 commit
  21. 25 Mar, 2016 1 commit
    • mm, kasan: stackdepot implementation. Enable stackdepot for SLAB · cd11016e
      Alexander Potapenko authored
      Implement the stack depot and provide CONFIG_STACKDEPOT.  Stack depot
      will allow KASAN to store allocation/deallocation stack traces for memory
      chunks.  The stack traces are stored in a hash table and referenced by
      handles which reside in the kasan_alloc_meta and kasan_free_meta
      structures in the allocated memory chunks.
      
      IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
      duplication.
      
      Right now stackdepot support is only enabled in SLAB allocator.  Once
      KASAN features in SLAB are on par with those in SLUB we can switch SLUB
      to stackdepot as well, thus removing the dependency on SLUB stack
      bookkeeping, which wastes a lot of memory.
      
      This patch is based on the "mm: kasan: stack depots" patch originally
      prepared by Dmitry Chernenkov.
      
      Joonsoo has said that he plans to reuse the stackdepot code for the
      mm/page_owner.c debugging facility.
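The core idea, a hash table that deduplicates stack traces and hands back compact handles, can be modeled with a small userland sketch (all names and sizes here are made up for illustration and do not match the kernel's stackdepot API):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEPOT_SLOTS 1024	/* toy fixed-size table */
#define MAX_DEPTH   16

struct depot_entry {
	uintptr_t trace[MAX_DEPTH];
	int depth;
	int used;
};

static struct depot_entry depot[DEPOT_SLOTS];

/* FNV-1a over the raw return addresses. */
static uint32_t trace_hash(const uintptr_t *trace, int depth)
{
	const unsigned char *p = (const unsigned char *)trace;
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < depth * sizeof(uintptr_t); i++)
		h = (h ^ p[i]) * 16777619u;
	return h;
}

/* Return a handle (slot index), storing the trace on first sight. */
int depot_save(const uintptr_t *trace, int depth)
{
	uint32_t slot = trace_hash(trace, depth) % DEPOT_SLOTS;
	int i;

	/* Linear probing; a real depot grows and synchronizes properly. */
	for (i = 0; i < DEPOT_SLOTS; i++, slot = (slot + 1) % DEPOT_SLOTS) {
		struct depot_entry *e = &depot[slot];

		if (!e->used) {
			memcpy(e->trace, trace, depth * sizeof(uintptr_t));
			e->depth = depth;
			e->used = 1;
			return slot;
		}
		if (e->depth == depth &&
		    !memcmp(e->trace, trace, depth * sizeof(uintptr_t)))
			return slot;	/* already stored: same handle */
	}
	return -1;			/* table full */
}
```

Storing a handle per object instead of a full trace is what removes the per-chunk bookkeeping cost the commit message mentions.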
      
      [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
      [aryabinin@virtuozzo.com: comment style fixes]
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cd11016e
  22. 18 Jan, 2016 1 commit
    • lib: sw842: select crc32 · 5b571677
      Arnd Bergmann authored
      The sw842 library code was merged in linux-4.1 and causes a very rare randconfig
      failure when CONFIG_CRC32 is not set:
      
          lib/built-in.o: In function `sw842_compress':
          oid_registry.c:(.text+0x12ddc): undefined reference to `crc32_be'
          lib/built-in.o: In function `sw842_decompress':
          oid_registry.c:(.text+0x137e4): undefined reference to `crc32_be'
      
      This adds an explicit 'select CRC32' statement, similar to what the other users
      of the crc32 code have. In practice, CRC32 is always enabled anyway because
      over 100 other symbols select it.
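In Kconfig terms the fix amounts to a one-line select on the library's config entry, roughly like this (the symbol name is quoted from memory and may not match lib/Kconfig exactly):

```
config 842_COMPRESS
	bool
	select CRC32
```

With the select in place, randconfig can no longer produce a kernel where sw842 is built but `crc32_be` is missing.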
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: 2da572c9 ("lib: add software 842 compression/decompression")
      Acked-by: Dan Streetman <ddstreet@ieee.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      5b571677
  23. 11 Dec, 2015 1 commit
  24. 08 Dec, 2015 1 commit
  25. 16 Oct, 2015 1 commit
  26. 27 Aug, 2015 1 commit
    • nd_blk: change aperture mapping from WC to WB · 67a3e8fe
      Ross Zwisler authored
      This should result in a pretty sizeable performance gain for reads.  For
      rough comparison I did some simple read testing using PMEM to compare
      reads of write combining (WC) mappings vs write-back (WB).  This was
      done on a random lab machine.
      
      PMEM reads from a write combining mapping:
      	# dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000
      	100000+0 records in
      	100000+0 records out
      	409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s
      
      PMEM reads from a write-back mapping:
      	# dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000
      	1000000+0 records in
      	1000000+0 records out
      	4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s
      
      To be able to safely support a write-back aperture I needed to add
      support for the "read flush" _DSM flag, as outlined in the DSM spec:
      
      http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
      
      This flag tells the ND BLK driver that it needs to flush the cache lines
      associated with the aperture after the aperture is moved but before any
      new data is read.  This ensures that any stale cache lines from the
      previous contents of the aperture will be discarded from the processor
      cache, and the new data will be read properly from the DIMM.  We know
      that the cache lines are clean and will be discarded without any
      writeback because either a) the previous aperture operation was a read,
      and we never modified the contents of the aperture, or b) the previous
      aperture operation was a write and we must have written back the dirtied
      contents of the aperture to the DIMM before the I/O was completed.
      
      In order to add support for the "read flush" flag I needed to add a
      generic routine to invalidate cache lines, mmio_flush_range().  This is
      protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently
      only supported on x86.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      67a3e8fe
  27. 24 Aug, 2015 1 commit
    • lib: scatterlist: add sg splitting function · f8bcbe62
      Robert Jarzmik authored
      Sometimes a scatter-gather has to be split into several chunks, or sub
      scatter lists. This happens for example if a scatter list will be
      handled by multiple DMA channels, each one filling a part of it.
      
      A concrete example comes with the media V4L2 API, where the scatter list
      is allocated from userspace to hold an image, with no knowledge of how
      many DMAs will fill it:
       - in a simple RGB565 case, one DMA will pump data from the camera ISP
         to memory
       - in the trickier YUV422 case, 3 DMAs will pump data from the camera
         ISP pipes, one for pipe Y, one for pipe U and one for pipe V
      
      For these cases, it is necessary to split the original scatter list into
      multiple scatter lists, which is the purpose of this patch.
      
      The guarantees that are required for this patch are:
       - the intersection of spans of any couple of resulting scatter lists is
         empty.
       - the union of spans of all resulting scatter lists is a subrange of
         the span of the original scatter list.
       - streaming DMA API operations (mapping, unmapping) should not happen
         on both the resulting and the original scatter lists; it's either
         the former or the latter.
       - the caller is responsible for calling kfree() on the resulting
         scatterlists.
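The splitting itself is conceptually simple. A toy userland model, cutting a list of segment lengths at a byte offset and splitting the straddling segment if needed (a hypothetical helper, not the actual sg_split API):

```c
#include <stddef.h>

/*
 * Divide a list of segment lengths into two halves at `offset` bytes,
 * cutting one segment in the middle when the offset falls inside it.
 * Returns the number of segments written to `first`; `second` and
 * `nsecond` receive the rest. Output buffers are assumed to hold
 * nseg + 1 entries.
 */
size_t split_segments(const size_t *seg, size_t nseg, size_t offset,
		      size_t *first, size_t *second, size_t *nsecond)
{
	size_t i = 0, nfirst = 0, j = 0;

	for (; i < nseg && offset > 0; i++) {
		size_t take = seg[i] < offset ? seg[i] : offset;

		first[nfirst++] = take;
		offset -= take;
		if (take < seg[i])	/* segment straddles the cut */
			second[j++] = seg[i] - take;
	}
	for (; i < nseg; i++)
		second[j++] = seg[i];
	*nsecond = j;
	return nfirst;
}
```

For example, segments {100, 200, 300} cut at 250 B yield {100, 150} and {50, 300}, satisfying the non-overlap and coverage guarantees above.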
      Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      f8bcbe62
  28. 20 Aug, 2015 1 commit
  29. 15 Aug, 2015 1 commit
  30. 26 Jun, 2015 1 commit
  31. 11 May, 2015 1 commit
    • lib: add software 842 compression/decompression · 2da572c9
      Dan Streetman authored
      Add 842-format software compression and decompression functions.
      Update the MAINTAINERS 842 section to include the new files.
      
      The 842 compression function can compress any input data into the 842
      compression format.  The 842 decompression function can decompress any
      standard-format 842 compressed data - specifically, either a compressed
      data buffer created by the 842 software compression function, or a
      compressed data buffer created by the 842 hardware compressor (located
      in PowerPC coprocessors).
      
      The 842 compressed data format is explained in the header comments.
      
      This is used in a later patch to provide a full software 842 compression
      and decompression crypto interface.
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      2da572c9
  32. 17 Apr, 2015 1 commit
  33. 10 Mar, 2015 1 commit
  34. 17 Feb, 2015 1 commit
  35. 07 Jan, 2015 1 commit
  36. 22 Dec, 2014 1 commit