1. 09 Apr, 2012 1 commit
  2. 05 Apr, 2012 1 commit
  3. 02 Apr, 2012 1 commit
  4. 29 Mar, 2012 3 commits
  5. 23 Mar, 2012 1 commit
  6. 20 Mar, 2012 1 commit
  7. 14 Mar, 2012 7 commits
    • Jussi Kivilinna's avatar
      crypto: camellia - add assembler implementation for x86_64 · 0b95ec56
      Jussi Kivilinna authored
      Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of
      functions are provided. First set is regular 'one-block at time' encrypt/decrypt
      functions. Second is 'two-block at time' functions that gain performance increase
      on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way
      functions with in-order CPUs.
      
      Patch has been tested with tcrypt and automated filesystem tests.
      
      Tcrypt benchmark results:
      
      AMD Phenom II 1055T (fam:16, model:10):
      
      camellia-asm vs camellia_generic:
      128bit key:                                             (lrw:256bit)    (xts:256bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.27x   1.22x   1.30x   1.42x   1.30x   1.34x   1.19x   1.05x   1.23x   1.24x
      64B     1.74x   1.79x   1.43x   1.87x   1.81x   1.87x   1.48x   1.38x   1.55x   1.62x
      256B    1.90x   1.87x   1.43x   1.94x   1.94x   1.95x   1.63x   1.62x   1.67x   1.70x
      1024B   1.96x   1.93x   1.43x   1.95x   1.98x   2.01x   1.67x   1.69x   1.74x   1.80x
      8192B   1.96x   1.96x   1.39x   1.93x   2.01x   2.03x   1.72x   1.64x   1.71x   1.76x
      
      256bit key:                                             (lrw:384bit)    (xts:512bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.23x   1.23x   1.33x   1.39x   1.34x   1.38x   1.04x   1.18x   1.21x   1.29x
      64B     1.72x   1.69x   1.42x   1.78x   1.81x   1.89x   1.57x   1.52x   1.56x   1.65x
      256B    1.85x   1.88x   1.42x   1.86x   1.93x   1.96x   1.69x   1.65x   1.70x   1.75x
      1024B   1.88x   1.86x   1.45x   1.95x   1.96x   1.95x   1.77x   1.71x   1.77x   1.78x
      8192B   1.91x   1.86x   1.42x   1.91x   2.03x   1.98x   1.73x   1.71x   1.78x   1.76x
      
      camellia-asm vs aes-asm (8kB block):
               128bit  256bit
      ecb-enc  1.15x   1.22x
      ecb-dec  1.16x   1.16x
      cbc-enc  0.85x   0.90x
      cbc-dec  1.20x   1.23x
      ctr-enc  1.28x   1.30x
      ctr-dec  1.27x   1.28x
      lrw-enc  1.12x   1.16x
      lrw-dec  1.08x   1.10x
      xts-enc  1.11x   1.15x
      xts-dec  1.14x   1.15x
      
      Intel Core2 T8100 (fam:6, model:23, step:6):
      
      camellia-asm vs camellia_generic:
      128bit key:                                             (lrw:256bit)    (xts:256bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.10x   1.12x   1.14x   1.16x   1.16x   1.15x   1.02x   1.02x   1.08x   1.08x
      64B     1.61x   1.60x   1.17x   1.68x   1.67x   1.66x   1.43x   1.42x   1.44x   1.42x
      256B    1.65x   1.73x   1.17x   1.77x   1.81x   1.80x   1.54x   1.53x   1.58x   1.54x
      1024B   1.76x   1.74x   1.18x   1.80x   1.85x   1.85x   1.60x   1.59x   1.65x   1.60x
      8192B   1.77x   1.75x   1.19x   1.81x   1.85x   1.86x   1.63x   1.61x   1.66x   1.62x
      
      256bit key:                                             (lrw:384bit)    (xts:512bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.10x   1.07x   1.13x   1.16x   1.11x   1.16x   1.03x   1.02x   1.08x   1.07x
      64B     1.61x   1.62x   1.15x   1.66x   1.63x   1.68x   1.47x   1.46x   1.47x   1.44x
      256B    1.71x   1.70x   1.16x   1.75x   1.69x   1.79x   1.58x   1.57x   1.59x   1.55x
      1024B   1.78x   1.72x   1.17x   1.75x   1.80x   1.80x   1.63x   1.62x   1.65x   1.62x
      8192B   1.76x   1.73x   1.17x   1.78x   1.80x   1.81x   1.64x   1.62x   1.68x   1.64x
      
      camellia-asm vs aes-asm (8kB block):
               128bit  256bit
      ecb-enc  1.17x   1.21x
      ecb-dec  1.17x   1.20x
      cbc-enc  0.80x   0.82x
      cbc-dec  1.22x   1.24x
      ctr-enc  1.25x   1.26x
      ctr-dec  1.25x   1.26x
      lrw-enc  1.14x   1.18x
      lrw-dec  1.13x   1.17x
      xts-enc  1.14x   1.18x
      xts-dec  1.14x   1.17x
      Signed-off-by: default avatarJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      0b95ec56
    • Jussi Kivilinna's avatar
    • Jussi Kivilinna's avatar
      crypto: camellia - fix checkpatch warnings · e2861a71
      Jussi Kivilinna authored
      Fix checkpatch warnings before renaming file.
      Signed-off-by: default avatarJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      e2861a71
    • Jussi Kivilinna's avatar
      crypto: camellia - rename camellia module to camellia_generic · 075e39df
      Jussi Kivilinna authored
      Rename camellia module to camellia_generic to allow optimized assembler
      implementations to autoload with module-alias.
      Signed-off-by: default avatarJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      075e39df
    • Jussi Kivilinna's avatar
      crypto: tcrypt - add more camellia tests · 4de59337
      Jussi Kivilinna authored
      Add tests for CTR, LRW and XTS modes.
      Signed-off-by: default avatarJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      4de59337
    • Jussi Kivilinna's avatar
      crypto: testmgr - add more camellia test vectors · 0840605e
      Jussi Kivilinna authored
      New ECB, CBC, CTR, LRW and XTS test vectors for camellia. Larger ECB/CBC test
      vectors needed for parallel 2-way camellia implementation.
      Signed-off-by: default avatarJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      0840605e
    • Jussi Kivilinna's avatar
      crypto: camellia - simplify key setup and CAMELLIA_ROUNDSM macro · c9b56d33
      Jussi Kivilinna authored
      camellia_setup_tail() applies 'inverse of the last half of P-function' to
      subkeys, which is unneeded if keys are applied directly to yl/yr in
      CAMELLIA_ROUNDSM.
      
      Patch speeds up key setup and should speed up CAMELLIA_ROUNDSM as applying
      key to yl/yr early has less register dependencies.
      
      Quick tcrypt camellia results:
       x86_64, AMD Phenom II, ~5% faster
       x86_64, Intel Core 2, ~0.5% faster
       i386, Intel Atom N270, ~1% faster
      Signed-off-by: default avatarJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      c9b56d33
  8. 26 Feb, 2012 1 commit
  9. 16 Feb, 2012 1 commit
  10. 05 Feb, 2012 2 commits
  11. 26 Jan, 2012 2 commits
  12. 15 Jan, 2012 3 commits
    • Alexey Dobriyan's avatar
      crypto: sha512 - use standard ror64() · b85a088f
      Alexey Dobriyan authored
      Use standard ror64() instead of hand-written.
      There is no standard ror64, so create it.
      
      The difference is shift value being "unsigned int" instead of uint64_t
      (for which there is no reason). gcc starts to emit native ROR instructions
      which it doesn't do for some reason currently. This should make the code
      faster.
      
      Patch survives in-tree crypto test and ping flood with hmac(sha512) on.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      b85a088f
    • Alexey Dobriyan's avatar
      crypto: sha512 - reduce stack usage to safe number · 51fc6dc8
      Alexey Dobriyan authored
      For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15] and W[i - 16].
      Consequently, keeping all W[80] array on stack is unnecessary,
      only 16 values are really needed.
      
      Using W[16] instead of W[80] greatly reduces stack usage
      (~750 bytes to ~340 bytes on x86_64).
      
      Line by line explanation:
      * BLEND_OP
        array is "circular" now, all indexes have to be modulo 16.
        Round number is positive, so remainder operation should be
        without surprises.
      
      * initial full message scheduling is trimmed to first 16 values which
        come from data block, the rest is calculated before it's needed.
      
      * original loop body is unrolled version of new SHA512_0_15 and
        SHA512_16_79 macros, unrolling was done to not do explicit variable
        renaming. Otherwise it's the very same code after preprocessing.
        See sha1_transform() code which does the same trick.
      
      Patch survives in-tree crypto test and original bugreport test
      (ping flood with hmac(sha512).
      
      See FIPS 180-2 for SHA-512 definition
      http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdfSigned-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      51fc6dc8
    • Alexey Dobriyan's avatar
      crypto: sha512 - make it work, undo percpu message schedule · 84e31fdb
      Alexey Dobriyan authored
      commit f9e2bca6
      aka "crypto: sha512 - Move message schedule W[80] to static percpu area"
      created global message schedule area.
      
      If sha512_update will ever be entered twice, hash will be silently
      calculated incorrectly.
      
      Probably the easiest way to notice incorrect hashes being calculated is
      to run 2 ping floods over AH with hmac(sha512):
      
      	#!/usr/sbin/setkey -f
      	flush;
      	spdflush;
      	add IP1 IP2 ah 25 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000025;
      	add IP2 IP1 ah 52 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000052;
      	spdadd IP1 IP2 any -P out ipsec ah/transport//require;
      	spdadd IP2 IP1 any -P in  ipsec ah/transport//require;
      
      XfrmInStateProtoError will start ticking with -EBADMSG being returned
      from ah_input(). This never happens with, say, hmac(sha1).
      
      With patch applied (on BOTH sides), XfrmInStateProtoError does not tick
      with multiple bidirectional ping flood streams like it doesn't tick
      with SHA-1.
      
      After this patch sha512_transform() will start using ~750 bytes of stack on x86_64.
      This is OK for simple loads, for something more heavy, stack reduction will be done
      separatedly.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      84e31fdb
  13. 20 Dec, 2011 5 commits
  14. 30 Nov, 2011 3 commits
  15. 21 Nov, 2011 3 commits
  16. 13 Nov, 2011 1 commit
  17. 10 Nov, 2011 1 commit
  18. 09 Nov, 2011 3 commits