| #
4b908403
|
| 18-Feb-2026 |
Eric Biggers <ebiggers@kernel.org> |
lib/crypto: arm64/aes: Move assembly code for AES modes into libaes
To migrate the support for CBC-based MACs into libaes, the corresponding arm64 assembly code needs to be moved there. However, th
lib/crypto: arm64/aes: Move assembly code for AES modes into libaes
To migrate the support for CBC-based MACs into libaes, the corresponding arm64 assembly code needs to be moved there. However, the arm64 AES assembly code groups many AES modes together; individual modes aren't easily separable. (This isn't unique to arm64; other architectures organize their AES modes similarly.)
Since the other AES modes will be migrated into the library eventually too, just move the full assembly files for the AES modes into the library. (This is similar to what I already did for PowerPC and SPARC.)
Specifically: move the assembly files aes-ce.S, aes-modes.S, and aes-neon.S and their build rules; declare the assembly functions in <crypto/aes.h>; and export the assembly functions from libaes.
Note that the exports and public declarations of the assembly functions are temporary. They exist only to keep arch/arm64/crypto/ working until the AES modes are fully moved into the library.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260218213501.136844-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
571e557c
|
| 15-Apr-2024 |
Ard Biesheuvel <ardb@kernel.org> |
crypto: arm64/aes-ce - Simplify round key load sequence
Tweak the round key logic so that they can be loaded using a single branchless sequence using overlapping loads. This is shorter and simpler,
crypto: arm64/aes-ce - Simplify round key load sequence
Tweak the round key logic so that they can be loaded using a single branchless sequence using overlapping loads. This is shorter and simpler, and puts the conditional branches based on the key size further apart, which might benefit microarchitectures that cannot record taken branches at every instruction. For these branches, use test-bit-branch instructions that don't clobber the condition flags.
Note that none of this has any impact on performance, positive or otherwise (and the branch prediction benefit would only benefit AES-192 which nobody uses). It does make for nicer code, though.
While at it, use \@ to generate the labels inside the macros, which is more robust than using fixed numbers, which could clash inadvertently. Also, bring aes-neon.S in line with these changes, including the switch to test-and-branch instructions, to avoid surprises in the future when we might start relying on the condition flags being preserved in the chaining mode wrappers in aes-modes.S
Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
| #
b8e50548
|
| 18-Feb-2020 |
Mark Brown <broonie@kernel.org> |
arm64: crypto: Modernize names for AES function macros
Now that the rest of the code has been converted to the modern START/END macros the AES_ENTRY() and AES_ENDPROC() macros look out of place and
arm64: crypto: Modernize names for AES function macros
Now that the rest of the code has been converted to the modern START/END macros the AES_ENTRY() and AES_ENDPROC() macros look out of place and like they need updating. Rename them to AES_FUNC_START() and AES_FUNC_END() to line up with the modern style assembly macros.
Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
show more ...
|
| #
0e89640b
|
| 13-Dec-2019 |
Mark Brown <broonie@kernel.org> |
crypto: arm64 - Use modern annotations for assembly functions
In an effort to clarify and simplify the annotation of assembly functions in the kernel new macros have been introduced. These replace E
crypto: arm64 - Use modern annotations for assembly functions
In an effort to clarify and simplify the annotation of assembly functions in the kernel new macros have been introduced. These replace ENTRY and ENDPROC and also add a new annotation for static functions which previously had no ENTRY equivalent. Update the annotations in the crypto code to the new macros.
There are a small number of files imported from OpenSSL where the assembly is generated using perl programs, these are not currently annotated at all and have not been modified.
Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
| #
67cfa5d3
|
| 03-Sep-2019 |
Ard Biesheuvel <ard.biesheuvel@linaro.org> |
crypto: arm64/aes-neonbs - implement ciphertext stealing for XTS
Update the AES-XTS implementation based on NEON instructions so that it can deal with inputs whose size is not a multiple of the ciph
crypto: arm64/aes-neonbs - implement ciphertext stealing for XTS
Update the AES-XTS implementation based on NEON instructions so that it can deal with inputs whose size is not a multiple of the cipher block size. This is part of the original XTS specification, but was never implemented before in the Linux kernel.
Since the bit slicing driver is only faster if it can operate on at least 7 blocks of input at the same time, let's reuse the alternate path we are adding for CTS to process any data tail whose size is not a multiple of 128 bytes.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
| #
4d2fa8b4
|
| 09-Jul-2019 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu: "Here is the crypto update for 5.3:
API: - Test shash interface d
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu: "Here is the crypto update for 5.3:
API: - Test shash interface directly in testmgr - cra_driver_name is now mandatory
Algorithms: - Replace arc4 crypto_cipher with library helper - Implement 5 way interleave for ECB, CBC and CTR on arm64 - Add xxhash - Add continuous self-test on noise source to drbg - Update jitter RNG
Drivers: - Add support for SHA204A random number generator - Add support for 7211 in iproc-rng200 - Fix fuzz test failures in inside-secure - Fix fuzz test failures in talitos - Fix fuzz test failures in qat"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (143 commits) crypto: stm32/hash - remove interruptible condition for dma crypto: stm32/hash - Fix hmac issue more than 256 bytes crypto: stm32/crc32 - rename driver file crypto: amcc - remove memset after dma_alloc_coherent crypto: ccp - Switch to SPDX license identifiers crypto: ccp - Validate the the error value used to index error messages crypto: doc - Fix formatting of new crypto engine content crypto: doc - Add parameter documentation crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR crypto: arm64/aes-ce - add 5 way interleave routines crypto: talitos - drop icv_ool crypto: talitos - fix hash on SEC1. crypto: talitos - move struct talitos_edesc into talitos.h lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE crypto/NX: Set receive window credits to max number of CRBs in RxFIFO crypto: asymmetric_keys - select CRYPTO_HASH where needed crypto: serpent - mark __serpent_setkey_sbox noinline crypto: testmgr - dynamically allocate crypto_shash crypto: testmgr - dynamically allocate testvec_config crypto: talitos - eliminate unneeded 'done' functions at build time ...
show more ...
|
| #
7367bfeb
|
| 24-Jun-2019 |
Ard Biesheuvel <ard.biesheuvel@linaro.org> |
crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR
This implements 5-way interleaving for ECB, CBC decryption and CTR, resulting in a speedup of ~11% on Marvell ThunderX2, which
crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR
This implements 5-way interleaving for ECB, CBC decryption and CTR, resulting in a speedup of ~11% on Marvell ThunderX2, which has a very deep pipeline and therefore a high issue latency for NEON instructions operating on the same registers.
Note that XTS is left alone: implementing 5-way interleave there would either involve spilling of the calculated tweaks to the stack, or recalculating them after the encryption operation, and doing either of those would most likely penalize low end cores.
For ECB, this is not a concern at all, given that we have plenty of spare registers. For CTR and CBC decryption, we take advantage of the fact that v16 is not used by the CE version of the code (which is the only one targeted by the optimization), and so we can reshuffle the code a bit and avoid having to spill to memory (with the exception of one extra reload in the CBC routine)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
| #
e2174139
|
| 24-Jun-2019 |
Ard Biesheuvel <ard.biesheuvel@linaro.org> |
crypto: arm64/aes-ce - add 5 way interleave routines
In preparation of tweaking the accelerated AES chaining mode routines to be able to use a 5-way stride, implement the core routines to support pr
crypto: arm64/aes-ce - add 5 way interleave routines
In preparation of tweaking the accelerated AES chaining mode routines to be able to use a 5-way stride, implement the core routines to support processing 5 blocks of input at a time. While at it, drop the 2 way versions, which have been unused for a while now.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
| #
d2912cb1
|
| 04-Jun-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of th
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
| #
2e5d2f33
|
| 10-Sep-2018 |
Ard Biesheuvel <ard.biesheuvel@linaro.org> |
crypto: arm64/aes-blk - improve XTS mask handling
The Crypto Extension instantiation of the aes-modes.S collection of skciphers uses only 15 NEON registers for the round key array, whereas the pure
crypto: arm64/aes-blk - improve XTS mask handling
The Crypto Extension instantiation of the aes-modes.S collection of skciphers uses only 15 NEON registers for the round key array, whereas the pure NEON flavor uses 16 NEON registers for the AES S-box.
This means we have a spare register available that we can use to hold the XTS mask vector, removing the need to reload it at every iteration of the inner loop.
Since the pure NEON version does not permit this optimization, tweak the macros so we can factor out this functionality. Also, replace the literal load with a short sequence to compose the mask vector.
On Cortex-A53, this results in a ~4% speedup.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
| #
0c8f838a
|
| 30-Apr-2018 |
Ard Biesheuvel <ard.biesheuvel@linaro.org> |
crypto: arm64/aes-blk - yield NEON after every block of input
Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input.
Signed-off-by: Ard Bieshe
crypto: arm64/aes-blk - yield NEON after every block of input
Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
| #
f402e311
|
| 24-Jul-2017 |
Ard Biesheuvel <ard.biesheuvel@linaro.org> |
crypto: arm64/aes-ce-cipher - match round key endianness with generic code
In order to be able to reuse the generic AES code as a fallback for situations where the NEON may not be used, update the k
crypto: arm64/aes-ce-cipher - match round key endianness with generic code
In order to be able to reuse the generic AES code as a fallback for situations where the NEON may not be used, update the key handling to match the byte order of the generic code: it stores round keys as sequences of 32-bit quantities rather than streams of bytes, and so our code needs to be updated to reflect that.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
| #
caf4b9e2
|
| 11-Oct-2016 |
Ard Biesheuvel <ard.biesheuvel@linaro.org> |
crypto: arm64/aes-xts-ce: fix for big endian
Emit the XTS tweak literal constants in the appropriate order for a single 128-bit scalar literal load.
Fixes: 49788fe2a128 ("arm64/crypto: AES-ECB/CBC/
crypto: arm64/aes-xts-ce: fix for big endian
Emit the XTS tweak literal constants in the appropriate order for a single 128-bit scalar literal load.
Fixes: 49788fe2a128 ("arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
| #
4a97abd4
|
| 17-Mar-2015 |
Ard Biesheuvel <ard.biesheuvel@linaro.org> |
arm64/crypto: issue aese/aesmc instructions in pairs
This changes the AES core transform implementations to issue aese/aesmc (and aesd/aesimc) in pairs. This enables a micro-architectural optimizati
arm64/crypto: issue aese/aesmc instructions in pairs
This changes the AES core transform implementations to issue aese/aesmc (and aesd/aesimc) in pairs. This enables a micro-architectural optimization in recent Cortex-A5x cores that improves performance by 50-90%.
Measured performance in cycles per byte (Cortex-A57):
CBC enc CBC dec CTR before 3.64 1.34 1.32 after 1.95 0.85 0.93
Note that this results in a ~5% performance decrease for older cores.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
show more ...
|
| #
49788fe2
|
| 21-Mar-2014 |
Ard Biesheuvel <ard.biesheuvel@linaro.org> |
arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions
This adds ARMv8 implementations of AES in ECB, CBC, CTR and XTS modes, both for ARMv8 with Crypto Extensions and for plain AR
arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions
This adds ARMv8 implementations of AES in ECB, CBC, CTR and XTS modes, both for ARMv8 with Crypto Extensions and for plain ARMv8 NEON.
The Crypto Extensions version can only run on ARMv8 implementations that have support for these optional extensions.
The plain NEON version is a table based yet time invariant implementation. All S-box substitutions are performed in parallel, leveraging the wide range of ARMv8's tbl/tbx instructions, and the huge NEON register file, which can comfortably hold the entire S-box and still have room to spare for doing the actual computations.
The key expansion routines were borrowed from aes_generic.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|