Commit 4b6d196

ebiggers authored and herbertx committed
crypto: arm64/chacha - fix chacha_4block_xor_neon() for big endian
The change to encrypt a fifth ChaCha block using scalar instructions caused the chacha20-neon, xchacha20-neon, and xchacha12-neon self-tests to start failing on big endian arm64 kernels. The bug is that the keystream block produced in 32-bit scalar registers is directly XOR'd with the data words, which are loaded and stored in native endianness. Thus in big endian mode the data bytes end up XOR'd with the wrong bytes. Fix it by byte-swapping the keystream words in big endian mode.

Fixes: 2fe5598 ("crypto: arm64/chacha - use combined SIMD/ALU routine for more speed")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

1 parent c643165 commit 4b6d196

1 file changed, 16 additions and 0 deletions
arch/arm64/crypto/chacha-neon-core.S

Lines changed: 16 additions & 0 deletions
@@ -532,6 +532,10 @@ ENTRY(chacha_4block_xor_neon)
         add v3.4s, v3.4s, v19.4s
         add a2, a2, w8
         add a3, a3, w9
+CPU_BE( rev a0, a0 )
+CPU_BE( rev a1, a1 )
+CPU_BE( rev a2, a2 )
+CPU_BE( rev a3, a3 )

         ld4r {v24.4s-v27.4s}, [x0], #16
         ld4r {v28.4s-v31.4s}, [x0]
@@ -552,6 +556,10 @@ ENTRY(chacha_4block_xor_neon)
         add v7.4s, v7.4s, v23.4s
         add a6, a6, w8
         add a7, a7, w9
+CPU_BE( rev a4, a4 )
+CPU_BE( rev a5, a5 )
+CPU_BE( rev a6, a6 )
+CPU_BE( rev a7, a7 )

         // x8[0-3] += s2[0]
         // x9[0-3] += s2[1]
@@ -569,6 +577,10 @@ ENTRY(chacha_4block_xor_neon)
         add v11.4s, v11.4s, v27.4s
         add a10, a10, w8
         add a11, a11, w9
+CPU_BE( rev a8, a8 )
+CPU_BE( rev a9, a9 )
+CPU_BE( rev a10, a10 )
+CPU_BE( rev a11, a11 )

         // x12[0-3] += s3[0]
         // x13[0-3] += s3[1]
@@ -586,6 +598,10 @@ ENTRY(chacha_4block_xor_neon)
         add v15.4s, v15.4s, v31.4s
         add a14, a14, w8
         add a15, a15, w9
+CPU_BE( rev a12, a12 )
+CPU_BE( rev a13, a13 )
+CPU_BE( rev a14, a14 )
+CPU_BE( rev a15, a15 )

         // interleave 32-bit words in state n, n+1
         ldp w6, w7, [x2], #64
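
The endianness problem described in the commit message can be illustrated outside the kernel with a small C sketch. This is not the kernel's code: every name below (`bswap32`, `host_is_big_endian`, `ks_to_native`, `xor_word_native`, `xor_word_reference`, `chacha_endian_self_check`) is hypothetical and chosen for illustration. It assumes a keystream word is a 32-bit value whose little-endian serialization gives the keystream bytes, and that data is loaded and stored as native-endian words. Under those assumptions the keystream word must be byte-swapped on a big endian host before the XOR, which appears to be the role of the added `CPU_BE( rev ... )` lines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the bytes of a 32-bit word,
 * the same effect as a `rev` on a 32-bit register. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Detect host byte order at run time by inspecting the first byte of 1. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Convert a keystream word to the value that, stored natively,
 * yields the keystream bytes in little-endian order. */
static uint32_t ks_to_native(uint32_t ks_word)
{
    return host_is_big_endian() ? bswap32(ks_word) : ks_word;
}

/* XOR four data bytes with a keystream word using a native-endian
 * word load and store (assumed to mirror the scalar data path). */
static void xor_word_native(uint8_t data[4], uint32_t ks_word)
{
    uint32_t w;

    memcpy(&w, data, 4);            /* native-endian load */
    w ^= ks_to_native(ks_word);     /* swap only on big endian hosts */
    memcpy(data, &w, 4);            /* native-endian store */
}

/* Reference: byte-by-byte XOR against the little-endian serialization. */
static void xor_word_reference(uint8_t data[4], uint32_t ks_word)
{
    for (int i = 0; i < 4; i++)
        data[i] ^= (uint8_t)(ks_word >> (8 * i));
}

/* Usage example: both paths must agree regardless of host byte order. */
static void chacha_endian_self_check(void)
{
    uint8_t a[4] = { 0x11, 0x22, 0x33, 0x44 };
    uint8_t b[4] = { 0x11, 0x22, 0x33, 0x44 };

    xor_word_native(a, 0xdeadbeefu);
    xor_word_reference(b, 0xdeadbeefu);
    assert(memcmp(a, b, 4) == 0);
}
```

Dropping the `ks_to_native()` call would leave little endian hosts unaffected, but on a big endian host the native-word path would no longer match the byte-wise reference, which is the failure mode the commit message describes.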
