Commit b71acb0e authored by Linus Torvalds's avatar Linus Torvalds

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Add 1472-byte test to tcrypt for IPsec
   - Reintroduced crypto stats interface with numerous changes
   - Support incremental algorithm dumps

  Algorithms:
   - Add xchacha12/20
   - Add nhpoly1305
   - Add adiantum
   - Add streebog hash
   - Mark cts(cbc(aes)) as FIPS allowed

  Drivers:
   - Improve performance of arm64/chacha20
   - Improve performance of x86/chacha20
   - Add NEON-accelerated nhpoly1305
   - Add SSE2 accelerated nhpoly1305
   - Add AVX2 accelerated nhpoly1305
   - Add support for 192/256-bit keys in gcmaes AVX
   - Add SG support in gcmaes AVX
   - ESN for inline IPsec tx in chcr
   - Add support for CryptoCell 703 in ccree
   - Add support for CryptoCell 713 in ccree
   - Add SM4 support in ccree
   - Add SM3 support in ccree
   - Add support for chacha20 in caam/qi2
   - Add support for chacha20 + poly1305 in caam/jr
   - Add support for chacha20 + poly1305 in caam/qi2
   - Add AEAD cipher support in cavium/nitrox"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (130 commits)
  crypto: skcipher - remove remnants of internal IV generators
  crypto: cavium/nitrox - Fix build with !CONFIG_DEBUG_FS
  crypto: salsa20-generic - don't unnecessarily use atomic walk
  crypto: skcipher - add might_sleep() to skcipher_walk_virt()
  crypto: x86/chacha - avoid sleeping under kernel_fpu_begin()
  crypto: cavium/nitrox - Added AEAD cipher support
  crypto: mxc-scc - fix build warnings on ARM64
  crypto: api - document missing stats member
  crypto: user - remove unused dump functions
  crypto: chelsio - Fix wrong error counter increments
  crypto: chelsio - Reset counters on cxgb4 Detach
  crypto: chelsio - Handle PCI shutdown event
  crypto: chelsio - cleanup:send addr as value in function argument
  crypto: chelsio - Use same value for both channel in single WR
  crypto: chelsio - Swap location of AAD and IV sent in WR
  crypto: chelsio - remove set but not used variable 'kctx_len'
  crypto: ux500 - Use proper enum in hash_set_dma_transfer
  crypto: ux500 - Use proper enum in cryp_set_dma_transfer
  crypto: aesni - Add scatter/gather avx stubs, and use them in C
  crypto: aesni - Introduce partial block macro
  ..
parents e0c38a4d c79b411e
Programming Interface
=====================
Please note that the kernel crypto API contains the AEAD givcrypt API
(crypto_aead_giv\* and aead_givcrypt\* function calls in
include/crypto/aead.h). This API is obsolete and will be removed in the
future. To obtain the functionality of an AEAD cipher with internal IV
generation, use the IV generator as a regular cipher. For example,
rfc4106(gcm(aes)) is the AEAD cipher with external IV generation and
seqniv(rfc4106(gcm(aes))) implies that the kernel crypto API generates
the IV. Different IV generators are available.
.. class:: toc-title
Table of contents
......
......@@ -157,10 +157,6 @@ applicable to a cipher, it is not displayed:
- rng for random number generator
- givcipher for cipher with associated IV generator (see the geniv
entry below for the specification of the IV generator type used by
the cipher implementation)
- kpp for a Key-agreement Protocol Primitive (KPP) cipher such as
an ECDH or DH implementation
......@@ -174,16 +170,7 @@ applicable to a cipher, it is not displayed:
- digestsize: output size of the message digest
- geniv: IV generation type:
- eseqiv for encrypted sequence number based IV generation
- seqiv for sequence number based IV generation
- chainiv for chain iv generation
- <builtin> is a marker that the cipher implements IV generation and
handling as it is specific to the given cipher
- geniv: IV generator (obsolete)
Key Sizes
---------
......@@ -218,10 +205,6 @@ the aforementioned cipher types:
- CRYPTO_ALG_TYPE_ABLKCIPHER Asynchronous multi-block cipher
- CRYPTO_ALG_TYPE_GIVCIPHER Asynchronous multi-block cipher packed
together with an IV generator (see geniv field in the /proc/crypto
listing for the known IV generators)
- CRYPTO_ALG_TYPE_KPP Key-agreement Protocol Primitive (KPP) such as
an ECDH or DH implementation
......@@ -338,18 +321,14 @@ uses the API applicable to the cipher type specified for the block.
The following call sequence is applicable when the IPSEC layer triggers
an encryption operation with the esp_output function. During
configuration, the administrator set up the use of rfc4106(gcm(aes)) as
the cipher for ESP. The following call sequence is now depicted in the
ASCII art above:
configuration, the administrator set up the use of seqiv(rfc4106(gcm(aes)))
as the cipher for ESP. The following call sequence is now depicted in
the ASCII art above:
1. esp_output() invokes crypto_aead_encrypt() to trigger an
encryption operation of the AEAD cipher with IV generator.
In case of GCM, the SEQIV implementation is registered as GIVCIPHER
in crypto_rfc4106_alloc().
The SEQIV performs its operation to generate an IV where the core
function is seqiv_geniv().
The SEQIV generates the IV.
2. Now, SEQIV uses the AEAD API function calls to invoke the associated
AEAD cipher. In our case, during the instantiation of SEQIV, the
......
Arm TrustZone CryptoCell cryptographic engine
Required properties:
- compatible: Should be one of: "arm,cryptocell-712-ree",
"arm,cryptocell-710-ree" or "arm,cryptocell-630p-ree".
- compatible: Should be one of -
"arm,cryptocell-713-ree"
"arm,cryptocell-703-ree"
"arm,cryptocell-712-ree"
"arm,cryptocell-710-ree"
"arm,cryptocell-630p-ree"
- reg: Base physical address of the engine and length of memory mapped region.
- interrupts: Interrupt number for the device.
......
......@@ -6,6 +6,8 @@ Required properties:
- interrupts : Should contain MXS DCP interrupt numbers, VMI IRQ and DCP IRQ
must be supplied, optionally Secure IRQ can be present, but
is currently not implemented and not used.
- clocks : Clock reference (only required on some SOCs: 6ull and 6sll).
- clock-names : Must be "dcp".
Example:
......
......@@ -3484,6 +3484,7 @@ F: include/linux/spi/cc2520.h
F: Documentation/devicetree/bindings/net/ieee802154/cc2520.txt
CCREE ARM TRUSTZONE CRYPTOCELL REE DRIVER
M: Yael Chemla <yael.chemla@foss.arm.com>
M: Gilad Ben-Yossef <gilad@benyossef.com>
L: linux-crypto@vger.kernel.org
S: Supported
......@@ -7147,7 +7148,9 @@ F: crypto/842.c
F: lib/842/
IBM Power in-Nest Crypto Acceleration
M: Paulo Flabiano Smorigo <pfsmorigo@linux.ibm.com>
M: Breno Leitão <leitao@debian.org>
M: Nayna Jain <nayna@linux.ibm.com>
M: Paulo Flabiano Smorigo <pfsmorigo@gmail.com>
L: linux-crypto@vger.kernel.org
S: Supported
F: drivers/crypto/nx/Makefile
......@@ -7211,7 +7214,9 @@ S: Supported
F: drivers/scsi/ibmvscsi_tgt/
IBM Power VMX Cryptographic instructions
M: Paulo Flabiano Smorigo <pfsmorigo@linux.ibm.com>
M: Breno Leitão <leitao@debian.org>
M: Nayna Jain <nayna@linux.ibm.com>
M: Paulo Flabiano Smorigo <pfsmorigo@gmail.com>
L: linux-crypto@vger.kernel.org
S: Supported
F: drivers/crypto/vmx/Makefile
......
......@@ -69,6 +69,15 @@ config CRYPTO_AES_ARM
help
Use optimized AES assembler routines for ARM platforms.
On ARM processors without the Crypto Extensions, this is the
fastest AES implementation for single blocks. For multiple
blocks, the NEON bit-sliced implementation is usually faster.
This implementation may be vulnerable to cache timing attacks,
since it uses lookup tables. However, as countermeasures it
disables IRQs and preloads the tables; it is hoped this makes
such attacks very difficult.
config CRYPTO_AES_ARM_BS
tristate "Bit sliced AES using NEON instructions"
depends on KERNEL_MODE_NEON
......@@ -117,9 +126,14 @@ config CRYPTO_CRC32_ARM_CE
select CRYPTO_HASH
config CRYPTO_CHACHA20_NEON
tristate "NEON accelerated ChaCha20 symmetric cipher"
tristate "NEON accelerated ChaCha stream cipher algorithms"
depends on KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_CHACHA20
config CRYPTO_NHPOLY1305_NEON
tristate "NEON accelerated NHPoly1305 hash function (for Adiantum)"
depends on KERNEL_MODE_NEON
select CRYPTO_NHPOLY1305
endif
......@@ -9,7 +9,8 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha-neon.o
obj-$(CONFIG_CRYPTO_NHPOLY1305_NEON) += nhpoly1305-neon.o
ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
......@@ -52,7 +53,8 @@ aes-arm-ce-y := aes-ce-core.o aes-ce-glue.o
ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
chacha-neon-y := chacha-neon-core.o chacha-neon-glue.o
nhpoly1305-neon-y := nh-neon-core.o nhpoly1305-neon-glue.o
ifdef REGENERATE_ARM_CRYPTO
quiet_cmd_perl = PERL $@
......
......@@ -10,7 +10,6 @@
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/hwcap.h>
#include <crypto/aes.h>
#include <crypto/internal/simd.h>
#include <crypto/internal/skcipher.h>
......
......@@ -10,6 +10,7 @@
*/
#include <linux/linkage.h>
#include <asm/assembler.h>
#include <asm/cache.h>
.text
......@@ -41,7 +42,7 @@
.endif
.endm
.macro __hround, out0, out1, in0, in1, in2, in3, t3, t4, enc, sz, op
.macro __hround, out0, out1, in0, in1, in2, in3, t3, t4, enc, sz, op, oldcpsr
__select \out0, \in0, 0
__select t0, \in1, 1
__load \out0, \out0, 0, \sz, \op
......@@ -73,6 +74,14 @@
__load t0, t0, 3, \sz, \op
__load \t4, \t4, 3, \sz, \op
.ifnb \oldcpsr
/*
* This is the final round and we're done with all data-dependent table
* lookups, so we can safely re-enable interrupts.
*/
restore_irqs \oldcpsr
.endif
eor \out1, \out1, t1, ror #24
eor \out0, \out0, t2, ror #16
ldm rk!, {t1, t2}
......@@ -83,14 +92,14 @@
eor \out1, \out1, t2
.endm
.macro fround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op
.macro fround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op, oldcpsr
__hround \out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1, \sz, \op
__hround \out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, \sz, \op
__hround \out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, \sz, \op, \oldcpsr
.endm
.macro iround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op
.macro iround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op, oldcpsr
__hround \out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0, \sz, \op
__hround \out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0, \sz, \op
__hround \out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0, \sz, \op, \oldcpsr
.endm
.macro __rev, out, in
......@@ -118,13 +127,14 @@
.macro do_crypt, round, ttab, ltab, bsz
push {r3-r11, lr}
// Load keys first, to reduce latency in case they're not cached yet.
ldm rk!, {r8-r11}
ldr r4, [in]
ldr r5, [in, #4]
ldr r6, [in, #8]
ldr r7, [in, #12]
ldm rk!, {r8-r11}
#ifdef CONFIG_CPU_BIG_ENDIAN
__rev r4, r4
__rev r5, r5
......@@ -138,6 +148,25 @@
eor r7, r7, r11
__adrl ttab, \ttab
/*
* Disable interrupts and prefetch the 1024-byte 'ft' or 'it' table into
* L1 cache, assuming cacheline size >= 32. This is a hardening measure
* intended to make cache-timing attacks more difficult. They may not
* be fully prevented, however; see the paper
* https://cr.yp.to/antiforgery/cachetiming-20050414.pdf
* ("Cache-timing attacks on AES") for a discussion of the many
* difficulties involved in writing truly constant-time AES software.
*/
save_and_disable_irqs t0
.set i, 0
.rept 1024 / 128
ldr r8, [ttab, #i + 0]
ldr r9, [ttab, #i + 32]
ldr r10, [ttab, #i + 64]
ldr r11, [ttab, #i + 96]
.set i, i + 128
.endr
push {t0} // oldcpsr
tst rounds, #2
bne 1f
......@@ -151,8 +180,21 @@
\round r4, r5, r6, r7, r8, r9, r10, r11
b 0b
2: __adrl ttab, \ltab
\round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b
2: .ifb \ltab
add ttab, ttab, #1
.else
__adrl ttab, \ltab
// Prefetch inverse S-box for final round; see explanation above
.set i, 0
.rept 256 / 64
ldr t0, [ttab, #i + 0]
ldr t1, [ttab, #i + 32]
.set i, i + 64
.endr
.endif
pop {rounds} // oldcpsr
\round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b, rounds
#ifdef CONFIG_CPU_BIG_ENDIAN
__rev r4, r4
......@@ -175,7 +217,7 @@
.endm
ENTRY(__aes_arm_encrypt)
do_crypt fround, crypto_ft_tab, crypto_ft_tab + 1, 2
do_crypt fround, crypto_ft_tab,, 2
ENDPROC(__aes_arm_encrypt)
.align 5
......
/*
* ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
* ChaCha/XChaCha NEON helper functions
*
* Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
*
......@@ -27,9 +27,9 @@
* (d) vtbl.8 + vtbl.8 (multiple of 8 bits rotations only,
* needs index vector)
*
* ChaCha20 has 16, 12, 8, and 7-bit rotations. For the 12 and 7-bit
* rotations, the only choices are (a) and (b). We use (a) since it takes
* two-thirds the cycles of (b) on both Cortex-A7 and Cortex-A53.
* ChaCha has 16, 12, 8, and 7-bit rotations. For the 12 and 7-bit rotations,
* the only choices are (a) and (b). We use (a) since it takes two-thirds the
* cycles of (b) on both Cortex-A7 and Cortex-A53.
*
* For the 16-bit rotation, we use vrev32.16 since it's consistently fastest
* and doesn't need a temporary register.
......@@ -52,30 +52,20 @@
.fpu neon
.align 5
ENTRY(chacha20_block_xor_neon)
// r0: Input state matrix, s
// r1: 1 data block output, o
// r2: 1 data block input, i
//
// This function encrypts one ChaCha20 block by loading the state matrix
// in four NEON registers. It performs matrix operation on four words in
// parallel, but requireds shuffling to rearrange the words after each
// round.
//
// x0..3 = s0..3
add ip, r0, #0x20
vld1.32 {q0-q1}, [r0]
vld1.32 {q2-q3}, [ip]
vmov q8, q0
vmov q9, q1
vmov q10, q2
vmov q11, q3
/*
* chacha_permute - permute one block
*
* Permute one 64-byte block where the state matrix is stored in the four NEON
* registers q0-q3. It performs matrix operations on four words in parallel,
* but requires shuffling to rearrange the words after each round.
*
* The round count is given in r3.
*
* Clobbers: r3, ip, q4-q5
*/
chacha_permute:
adr ip, .Lrol8_table
mov r3, #10
vld1.8 {d10}, [ip, :64]
.Ldoubleround:
......@@ -139,9 +129,31 @@ ENTRY(chacha20_block_xor_neon)
// x3 = shuffle32(x3, MASK(0, 3, 2, 1))
vext.8 q3, q3, q3, #4
subs r3, r3, #1
subs r3, r3, #2
bne .Ldoubleround
bx lr
ENDPROC(chacha_permute)
ENTRY(chacha_block_xor_neon)
// r0: Input state matrix, s
// r1: 1 data block output, o
// r2: 1 data block input, i
// r3: nrounds
push {lr}
// x0..3 = s0..3
add ip, r0, #0x20
vld1.32 {q0-q1}, [r0]
vld1.32 {q2-q3}, [ip]
vmov q8, q0
vmov q9, q1
vmov q10, q2
vmov q11, q3
bl chacha_permute
add ip, r2, #0x20
vld1.8 {q4-q5}, [r2]
vld1.8 {q6-q7}, [ip]
......@@ -166,15 +178,33 @@ ENTRY(chacha20_block_xor_neon)
vst1.8 {q0-q1}, [r1]
vst1.8 {q2-q3}, [ip]
bx lr
ENDPROC(chacha20_block_xor_neon)
pop {pc}
ENDPROC(chacha_block_xor_neon)
ENTRY(hchacha_block_neon)
// r0: Input state matrix, s
// r1: output (8 32-bit words)
// r2: nrounds
push {lr}
vld1.32 {q0-q1}, [r0]!
vld1.32 {q2-q3}, [r0]
mov r3, r2
bl chacha_permute
vst1.32 {q0}, [r1]!
vst1.32 {q3}, [r1]
pop {pc}
ENDPROC(hchacha_block_neon)
.align 4
.Lctrinc: .word 0, 1, 2, 3
.Lrol8_table: .byte 3, 0, 1, 2, 7, 4, 5, 6
.align 5
ENTRY(chacha20_4block_xor_neon)
ENTRY(chacha_4block_xor_neon)
push {r4-r5}
mov r4, sp // preserve the stack pointer
sub ip, sp, #0x20 // allocate a 32 byte buffer
......@@ -184,9 +214,10 @@ ENTRY(chacha20_4block_xor_neon)
// r0: Input state matrix, s
// r1: 4 data blocks output, o
// r2: 4 data blocks input, i
// r3: nrounds
//
// This function encrypts four consecutive ChaCha20 blocks by loading
// This function encrypts four consecutive ChaCha blocks by loading
// the state matrix in NEON registers four times. The algorithm performs
// each operation on the corresponding word of each state matrix, hence
// requires no word shuffling. The words are re-interleaved before the
......@@ -219,7 +250,6 @@ ENTRY(chacha20_4block_xor_neon)
vdup.32 q0, d0[0]
adr ip, .Lrol8_table
mov r3, #10
b 1f
.Ldoubleround4:
......@@ -417,7 +447,7 @@ ENTRY(chacha20_4block_xor_neon)
vsri.u32 q5, q8, #25
vsri.u32 q6, q9, #25
subs r3, r3, #1
subs r3, r3, #2
bne .Ldoubleround4
// x0..7[0-3] are in q0-q7, x10..15[0-3] are in q10-q15.
......@@ -527,4 +557,4 @@ ENTRY(chacha20_4block_xor_neon)
pop {r4-r5}
bx lr
ENDPROC(chacha20_4block_xor_neon)
ENDPROC(chacha_4block_xor_neon)
/*
* ARM NEON accelerated ChaCha and XChaCha stream ciphers,
* including ChaCha20 (RFC7539)
*
* Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* Based on:
* ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
*
* Copyright (C) 2015 Martin Willi
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*/
#include <crypto/algapi.h>
#include <crypto/chacha.h>
#include <crypto/internal/skcipher.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
asmlinkage void chacha_block_xor_neon(const u32 *state, u8 *dst, const u8 *src,
int nrounds);
asmlinkage void chacha_4block_xor_neon(const u32 *state, u8 *dst, const u8 *src,
int nrounds);
asmlinkage void hchacha_block_neon(const u32 *state, u32 *out, int nrounds);
static void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes, int nrounds)
{
u8 buf[CHACHA_BLOCK_SIZE];
while (bytes >= CHACHA_BLOCK_SIZE * 4) {
chacha_4block_xor_neon(state, dst, src, nrounds);
bytes -= CHACHA_BLOCK_SIZE * 4;
src += CHACHA_BLOCK_SIZE * 4;
dst += CHACHA_BLOCK_SIZE * 4;
state[12] += 4;
}
while (bytes >= CHACHA_BLOCK_SIZE) {
chacha_block_xor_neon(state, dst, src, nrounds);
bytes -= CHACHA_BLOCK_SIZE;
src += CHACHA_BLOCK_SIZE;
dst += CHACHA_BLOCK_SIZE;
state[12]++;
}
if (bytes) {
memcpy(buf, src, bytes);
chacha_block_xor_neon(state, buf, buf, nrounds);
memcpy(dst, buf, bytes);
}
}
static int chacha_neon_stream_xor(struct skcipher_request *req,
struct chacha_ctx *ctx, u8 *iv)
{
struct skcipher_walk walk;
u32 state[16];
int err;
err = skcipher_walk_virt(&walk, req, false);
crypto_chacha_init(state, ctx, iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
if (nbytes < walk.total)
nbytes = round_down(nbytes, walk.stride);
kernel_neon_begin();
chacha_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
nbytes, ctx->nrounds);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
return err;
}
static int chacha_neon(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd())
return crypto_chacha_crypt(req);
return chacha_neon_stream_xor(req, ctx, req->iv);
}
static int xchacha_neon(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
struct chacha_ctx subctx;
u32 state[16];
u8 real_iv[16];
if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd())
return crypto_xchacha_crypt(req);
crypto_chacha_init(state, ctx, req->iv);
kernel_neon_begin();
hchacha_block_neon(state, subctx.key, ctx->nrounds);
kernel_neon_end();
subctx.nrounds = ctx->nrounds;
memcpy(&real_iv[0], req->iv + 24, 8);
memcpy(&real_iv[8], req->iv + 16, 8);
return chacha_neon_stream_xor(req, &subctx, real_iv);
}
static struct skcipher_alg algs[] = {
{
.base.cra_name = "chacha20",
.base.cra_driver_name = "chacha20-neon",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = CHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.walksize = 4 * CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = chacha_neon,
.decrypt = chacha_neon,
}, {
.base.cra_name = "xchacha20",
.base.cra_driver_name = "xchacha20-neon",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = XCHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.walksize = 4 * CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = xchacha_neon,
.decrypt = xchacha_neon,
}, {
.base.cra_name = "xchacha12",
.base.cra_driver_name = "xchacha12-neon",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = XCHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.walksize = 4 * CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha12_setkey,
.encrypt = xchacha_neon,
.decrypt = xchacha_neon,
}
};
static int __init chacha_simd_mod_init(void)
{
if (!(elf_hwcap & HWCAP_NEON))
return -ENODEV;
return crypto_register_skciphers(algs, ARRAY_SIZE(algs));
}
static void __exit chacha_simd_mod_fini(void)
{
crypto_unregister_skciphers(algs, ARRAY_SIZE(algs));
}
module_init(chacha_simd_mod_init);
module_exit(chacha_simd_mod_fini);
MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (NEON accelerated)");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-neon");
MODULE_ALIAS_CRYPTO("xchacha20");