Discussion: inconsistent handling of unaligned hash inputs
David Miller
2014-10-01 05:10:34 UTC
Bob Picco sent me a bootup trace showing that we've started to get
unaligned accesses when the generic sha256 is tested on sparc64. I
believe this is introduced by:

commit 950e4e1c1b334c4975b91106f23fd343be3eb7a0
Author: Jussi Kivilinna <***@iki.fi>
Date: Sat Apr 12 15:35:29 2014 +0300

crypto: testmgr - add empty and large test vectors for SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512

That change looks perfectly correct; it just adds new legitimate
tests to run. But when I went to look at how unaligned inputs are
handled, I see:

1) SHA1 uses get_unaligned_be32()
2) SHA256/SHA224 use direct u32 dereferencing (see the sketch below
   the list)
3) SHA384/SHA512 likewise
4) MD5 always operates on the md5 context's mctx->block, which is
   u32 aligned.
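
For illustration, the difference between 1) and 2) boils down to
something like this userspace sketch, where be32toh() from <endian.h>
stands in for __be32_to_cpu() and the function names are invented for
the example:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <endian.h>     /* be32toh(), standing in for __be32_to_cpu() */

/* Style 2)/3): reinterpret the input as an array of big-endian words.
 * The 32-bit load this produces traps on strict-alignment CPUs such as
 * sparc64 whenever 'input' is not 4-byte aligned, which is exactly what
 * the new unaligned test vectors trigger. */
static uint32_t load_be32_deref(const uint8_t *input, int i)
{
        return be32toh(((const uint32_t *)input)[i]);
}

/* Style 1) (SHA1 via get_unaligned_be32()), approximated here with a
 * memcpy into a local: safe for any alignment, and compilers turn it
 * into a single load on CPUs that handle unaligned accesses in hardware. */
static uint32_t load_be32_any(const uint8_t *input, int i)
{
        uint32_t w;

        memcpy(&w, input + 4 * i, sizeof(w));
        return be32toh(w);
}

int main(void)
{
        _Alignas(4) uint8_t buf[12] = { 0xde, 0xad, 0xbe, 0xef, 0x01 };

        printf("%08x\n", load_be32_deref(buf, 0));     /* aligned: fine everywhere */
        printf("%08x\n", load_be32_any(buf + 1, 0));   /* misaligned: still fine */
        return 0;
}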

The sparc64 assembler code that uses the chip's crypto instructions
doesn't have this problem because each routine has two code paths,
one for aligned data and one for unaligned data.

Anyways, I suspect that we need to use get_unaligned_be{32,64}() in
generic SHA256 and SHA512.

The following seems to fix things for me:

====================
crypto: Handle unaligned input data in generic sha256 and sha512.

Like SHA1, use get_unaligned_be*() on the raw input data.

Reported-by: Bob Picco <***@oracle.com>
Signed-off-by: David S. Miller <***@davemloft.net>

diff --git a/crypto/sha256_generic.c b/crypto/sha256_generic.c
index 5433667..0bb5583 100644
--- a/crypto/sha256_generic.c
+++ b/crypto/sha256_generic.c
@@ -24,6 +24,7 @@
#include <linux/types.h>
#include <crypto/sha.h>
#include <asm/byteorder.h>
+#include <asm/unaligned.h>

static inline u32 Ch(u32 x, u32 y, u32 z)
{
@@ -42,7 +43,7 @@ static inline u32 Maj(u32 x, u32 y, u32 z)

static inline void LOAD_OP(int I, u32 *W, const u8 *input)
{
- W[I] = __be32_to_cpu( ((__be32*)(input))[I] );
+ W[I] = get_unaligned_be32((__u32 *)input + I);
}

static inline void BLEND_OP(int I, u32 *W)
diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index 6ed124f..6dde57d 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -20,6 +20,7 @@
#include <crypto/sha.h>
#include <linux/percpu.h>
#include <asm/byteorder.h>
+#include <asm/unaligned.h>

static inline u64 Ch(u64 x, u64 y, u64 z)
{
@@ -68,7 +69,7 @@ static const u64 sha512_K[80] = {

static inline void LOAD_OP(int I, u64 *W, const u8 *input)
{
- W[I] = __be64_to_cpu( ((__be64*)(input))[I] );
+ W[I] = get_unaligned_be64((__u64 *)input + I);
}

static inline void BLEND_OP(int I, u64 *W)
Herbert Xu
2014-10-02 00:33:53 UTC
Post by David Miller
Anyways, I suspect that we need to use get_unaligned_be{32,64}() in
generic SHA256 and SHA512.
The other option is to set cra_alignmask and have the crypto API
handle this. So the question is what is the cost benefit of using
a normal load vs. get_unaligned_*?

Most in-kernel crypto uses should be using aligned input/output.
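
For reference, the cra_alignmask route would look roughly like this in
the shash registration (a sketch of the usual pattern, not a verbatim
copy of sha256_generic.c; the callback names are placeholders):

static struct shash_alg sha256_alg = {
        .digestsize     = SHA256_DIGEST_SIZE,
        .init           = sha256_init,          /* placeholder names */
        .update         = sha256_update,
        .final          = sha256_final,
        .descsize       = sizeof(struct sha256_state),
        .base           = {
                .cra_name        = "sha256",
                .cra_driver_name = "sha256-generic",
                .cra_blocksize   = SHA256_BLOCK_SIZE,
                /* Require 4-byte-aligned input so the u32 loads in
                 * LOAD_OP() are always legal; the API copies unaligned
                 * requests through an aligned buffer first, which is
                 * the extra pass over the data discussed below. */
                .cra_alignmask   = 3,
                .cra_module      = THIS_MODULE,
        }
};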

Cheers,
--
Email: Herbert Xu <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
David Miller
2014-10-02 01:06:14 UTC
From: Herbert Xu <***@gondor.apana.org.au>
Date: Thu, 2 Oct 2014 08:33:53 +0800
Post by Herbert Xu
Post by David Miller
Anyways, I suspect that we need to use get_unaligned_be{32,64}() in
generic SHA256 and SHA512.
The other option is to set cra_alignmask and have the crypto API
handle this. So the question is what is the cost benefit of using
a normal load vs. get_unaligned_*?
Most in-kernel crypto uses should be using aligned input/output.
In these specific hash functions we only read the u32/u64 inputs
once, a byte at a time, to get them into the work buffer.

If we have the crypto layer do it, we'll bounce the data around
once to the crypto layer bounce buffer, then once again into
the hash implementation's work buffer.

I think it's better to avoid the extra copy, right?
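
Spelled out (illustrative C only, with invented helper names, not the
crypto API's actual code; be32toh() again stands in for __be32_to_cpu()):

#include <stdint.h>
#include <string.h>
#include <endian.h>     /* be32toh(), standing in for __be32_to_cpu() */

#define SHA256_BLOCK    64

/* alignmask path: the API copies unaligned input into an aligned bounce
 * buffer (pass 1), and the hash then reads those same bytes again to
 * fill its message schedule W[] (pass 2). */
void fill_W_via_bounce(uint32_t W[16], const uint8_t *unaligned_in)
{
        _Alignas(4) uint8_t bounce[SHA256_BLOCK];

        memcpy(bounce, unaligned_in, SHA256_BLOCK);             /* pass 1 */
        for (int i = 0; i < 16; i++)                            /* pass 2 */
                W[i] = be32toh(((const uint32_t *)bounce)[i]);
}

/* get_unaligned_be32() path: the input bytes are traversed exactly once,
 * straight into W[], whatever their alignment. */
void fill_W_direct(uint32_t W[16], const uint8_t *in)
{
        for (int i = 0; i < 16; i++) {
                uint32_t w;

                memcpy(&w, in + 4 * i, sizeof(w));      /* unaligned-safe load */
                W[i] = be32toh(w);
        }
}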
Herbert Xu
2014-10-02 02:04:39 UTC
Post by David Miller
In these specific hash functions we only read the u32/u64 inputs
once, a byte at a time, to get them into the work buffer.
If we have the crypto layer do it, we'll bounce the data around
once to the crypto layer bounce buffer, then once again into
the hash implementation's work buffer.
Oh of course if your data is unaligned it'll be worse. But most
in-kernel input should be aligned. So we need to balance this
against the cost of unaligned loads on aligned data. If the cost
of unaligned loads on aligned data is negligible then sure let's
just do unaligned loads unconditionally.
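
A quick way to put a number on that is a userspace sketch along these
lines (the sizes and iteration counts are arbitrary, and a memcpy into
a local stands in for get_unaligned_*()):

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define WORDS   (1 << 20)
#define ROUNDS  200

/* Portable stand-in for get_unaligned_*(): a memcpy the compiler is
 * free to turn into a plain load when unaligned accesses are cheap. */
static inline uint32_t load_u32(const void *p)
{
        uint32_t v;

        memcpy(&v, p, sizeof(v));
        return v;
}

static double seconds(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
        static uint32_t buf[WORDS];     /* naturally aligned data */
        volatile uint32_t sink;
        uint32_t acc;
        double t;

        for (int i = 0; i < WORDS; i++)
                buf[i] = i;

        acc = 0;
        t = seconds();
        for (int r = 0; r < ROUNDS; r++)
                for (int i = 0; i < WORDS; i++)
                        acc += buf[i];                  /* direct loads */
        sink = acc;
        printf("direct loads:      %.3f s\n", seconds() - t);

        acc = 0;
        t = seconds();
        for (int r = 0; r < ROUNDS; r++)
                for (int i = 0; i < WORDS; i++)
                        acc += load_u32(&buf[i]);       /* unaligned-capable loads */
        sink = acc;
        printf("unaligned-capable: %.3f s\n", seconds() - t);

        (void)sink;
        return 0;
}

On most compilers and CPUs the two loops end up as essentially the same
machine code for aligned data, which is the case where doing
get_unaligned_*() unconditionally would cost nothing; where they differ,
that difference is the overhead being weighed here.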

PS: Sorry for the earlier HTML email; I sent it from Android and it
seems there is no way to disable HTML.

Cheers,
--
Email: Herbert Xu <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
David Miller
2014-10-02 02:35:57 UTC
From: Herbert Xu <***@gondor.apana.org.au>
Date: Thu, 2 Oct 2014 09:30:28 +0800
Post by Herbert Xu
Post by David Miller
In these specific hash functions we only read the u32/u64 inputs
once, a byte at a time, to get them into the work buffer.
If we have the crypto layer do it, we'll bounce the data around
once to the crypto layer bounce buffer, then once again into
the hash implementation's work buffer.
Oh of course if your data is unaligned it'll be worse. But most
in-kernel input should be aligned. So we need to balance this
against the cost of unaligned loads on aligned data. If the cost
of unaligned loads on aligned data is negligible then sure let's
just do unaligned loads unconditionally.
I see what you're saying.

Probably things are aligned most of the time.

Actually the "cost" of the unaligned load is variable, in that it
depends upon the host cpu endianness. By doing the byte loads that
part of the byte shuffling is sort of free. :-) But if the native
endianness matches what the SHA code wants (big endian) then there
isn't any such ammortization going on.
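
Concretely, the two shapes are something like this (illustrative, not
the kernel's actual get_unaligned_be32()):

#include <stdint.h>

/* Byte-assembly form: alignment-safe, and the shifts *are* the
 * big-endian conversion, so on a little-endian host the byte shuffling
 * is folded into the loads. */
uint32_t be32_byte_assembly(const uint8_t *p)
{
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Native-load form: a single aligned load, plus an explicit byteswap on
 * little-endian hosts (a no-op on big-endian ones such as sparc64,
 * where the byte-assembly version above is pure extra work). */
uint32_t be32_native_load(const uint32_t *aligned_p)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
        return __builtin_bswap32(*aligned_p);
#else
        return *aligned_p;
#endif
}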

Furthermore, this doesn't take into consideration the case where the
cpu has endian-swapping load/store instructions.

It would be nice to have all the modules specify the alignment;
however, in the SHA1 case the code is under lib/, so getting rid of
the get_unaligned_*() usage would be difficult without code
duplication.

Therefore, for now, it's probably best to use my patch, and use
get_unaligned_*() consistently throughout the sha* implementations.
Herbert Xu
2014-10-02 07:15:32 UTC
Post by David Miller
Therefore, for now, it's probably best to use my patch, and use
get_unaligned_*() consistently throughout the sha* implementations.
OK, I've applied your patch to cryptodev. Thanks!
--
Email: Herbert Xu <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt