Hi,
Nikos Mavrogiannopoulos has been looking into support for Galois Counter Mode (GCM), see http://www.cryptobarn.com/papers/gcm-spec.pdf
My understanding of GCM is that the main point is a new MAC function which allows efficient hardware implementation. As far as I see, there's no clear advantage of using GCM instead of plain CTR mode combined with the same MAC function (applied to the plaintext).
For Nettle, I think the first step ought to be to properly support the MAC function, GMAC. The most fundamental difference from other MAC functions is that it takes two input strings (besides the key). When used as a plain MAC, the second input is empty, while when used with GCM, the first input is auxiliary data to be authenticated, and the second input is the cryptotext.
Some questions:
* Naming: Is "gmac" a good enough name? Or "ghash" (the name of the primitive which takes a key and two inputs, in the paper)? Or do we need something more verbose, like galois_mac or gmac128 or so?
* Specification: It's not entirely clear to me how the spec is to be interpreted when one of the input strings is empty. The most reasonable interpretation would be that there should be zero blocks to process (n or m equal to zero). This requires some bending of the notation in equation (2), for example, with m = 0, n = 1, we should have
X_0 = 0
X_1 = C_1^* · H
X_2 = (X_1 + (0 || len(C))) · H
and with m = 1, n = 0,
X_0 = 0
X_1 = A_1^* · H
X_2 = (X_1 + (len(A) || 0)) · H
Do you agree?
* Interface: I think the basic use case with empty second input should be just like other MAC:s,
struct gmac_ctx;
/* Key size fixed to GMAC_KEY_SIZE == 16 */
void gmac_set_key(struct gmac_ctx *ctx, const uint8_t *key);
void gmac_update(struct gmac_ctx *ctx, unsigned length, const uint8_t *data);
void gmac_digest(struct gmac_ctx *ctx, unsigned length, uint8_t *digest);
The context struct and the set_key function are essential to be able to do any optimizations using key-dependent tables.
But then we need a function to mark the end of the first input and the start of the second. Name for that one?
void gmac_next(struct gmac_ctx *ctx);
This will pad the current input to a block boundary, and switch to using a different length counter.
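On the specification question, here is a minimal C sketch of the "zero blocks" reading: a bit-at-a-time GHASH in which an empty input simply contributes no loop iterations, so only the final length block is hashed. The names gf_mul, absorb and ghash are made up for this illustration and are not an existing Nettle interface; the multiplication is the plain unoptimized one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* x = x * y in GF(2^128), bit by bit, in GCM's bit order
   (bit 0 is the most significant bit of byte 0). */
static void gf_mul(uint8_t x[BLOCK], const uint8_t y[BLOCK])
{
  uint8_t z[BLOCK] = {0}, v[BLOCK];
  memcpy(v, y, BLOCK);
  for (int i = 0; i < 128; i++) {
    if (x[i / 8] & (0x80 >> (i % 8)))
      for (int j = 0; j < BLOCK; j++) z[j] ^= v[j];
    /* v = v * alpha: shift right one bit, reduce if a bit fell off. */
    int carry = v[BLOCK - 1] & 1;
    for (int j = BLOCK - 1; j > 0; j--)
      v[j] = (uint8_t)((v[j] >> 1) | (v[j - 1] << 7));
    v[0] >>= 1;
    if (carry) v[0] ^= 0xe1;
  }
  memcpy(x, z, BLOCK);
}

/* Absorb one input string; a trailing partial block is zero-padded.
   With len == 0 the loop body never runs, i.e. zero blocks. */
static void absorb(uint8_t x[BLOCK], const uint8_t h[BLOCK],
                   const uint8_t *data, size_t len)
{
  assert(data != NULL || len == 0);
  while (len > 0) {
    size_t n = len < BLOCK ? len : BLOCK;
    for (size_t j = 0; j < n; j++) x[j] ^= data[j];
    gf_mul(x, h);
    data += n;
    len -= n;
  }
}

/* out = GHASH(h, a, c); lengths are given in bytes. */
void ghash(uint8_t out[BLOCK], const uint8_t h[BLOCK],
           const uint8_t *a, size_t alen, const uint8_t *c, size_t clen)
{
  uint64_t abits = (uint64_t)alen * 8, cbits = (uint64_t)clen * 8;
  memset(out, 0, BLOCK);
  absorb(out, h, a, alen);
  absorb(out, h, c, clen);
  /* Final block: len(A) || len(C), as 64-bit big-endian bit counts. */
  for (int j = 0; j < 8; j++) {
    out[j]     ^= (uint8_t)(abits >> (56 - 8 * j));
    out[8 + j] ^= (uint8_t)(cbits >> (56 - 8 * j));
  }
  gf_mul(out, h);
}
```

Under this reading, two empty inputs give an all-zero result, since the state is still zero when the (all-zero) length block is added and multiplied by H.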
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
Hi,
Nikos Mavrogiannopoulos has been looking into support for Galois Counter Mode (GCM), see http://www.cryptobarn.com/papers/gcm-spec.pdf
Hi! NIST has some other links for it:
http://csrc.nist.gov/groups/ST/toolkit/BCM/modes_development.html
My understanding of GCM is that the main point is a new MAC function which allows efficient hardware implementation. As far as I see, there's no clear advantage of using GCM instead of plain CTR mode combined with the same MAC function (applied to the plaintext).
Shouldn't you MAC the ciphertext? That's the proved secure approach.
- Naming: Is "gmac" a good enough name? Or "ghash" (the name of the primitive which takes a key and two inputs, in the paper)? Or do we need something more verbose, like galois_mac or gmac128 or so?
The name GMAC is well established:
http://en.wikipedia.org/wiki/Galois/Counter_Mode
- Specification: It's not entirely clear to me how the spec is to be interpreted when one of the input strings is empty. The most reasonable interpretation would be that there should be zero blocks to process (n or m equal to zero). This requires some bending of the notation in equation (2), for example, with m = 0, n = 1, we should have
If this is a real problem with latest specification, it might make sense to bring this up somewhere.
Interface: I think the basic use case with empty second input should be just like other MAC:s,
struct gmac_ctx;
/* Key size fixed to GMAC_KEY_SIZE == 16 */
void gmac_set_key(struct gmac_ctx *ctx, const uint8_t *key);
void gmac_update(struct gmac_ctx *ctx, unsigned length, const uint8_t *data);
void gmac_digest(struct gmac_ctx *ctx, unsigned length, uint8_t *digest);
The context struct and the set_key function are essential to be able to do any optimizations using key-dependent tables.
But then we need a function to mark the end of the first input and the start of the second. Name for that one?
void gmac_next(struct gmac_ctx *ctx);
This will pad the current input to a block boundary, and switch to using a different length counter.
How about gmac_authenticate?
Further, I'm wondering whether some other authenticating MACs can process the two inputs in parallel, which would argue for an interface like this:
struct gmac_ctx;
/* Key size fixed to GMAC_KEY_SIZE == 16 */
void gmac_set_key(struct gmac_ctx *ctx, const uint8_t *key);
void gmac_update(struct gmac_ctx *ctx, unsigned length, const uint8_t *data);
void gmac_digest(struct gmac_ctx *ctx, unsigned length, uint8_t *digest);
void gmac_authenticate(struct gmac_ctx *ctx, unsigned length, const uint8_t *data);
Or something.
/Simon
Simon Josefsson simon@josefsson.org writes:
http://csrc.nist.gov/groups/ST/toolkit/BCM/modes_development.html
I was just wondering what an "authoritative" reference would be.
Shouldn't you MAC the ciphertext? That's the proved secure approach.
I'm not familiar with those subtleties. Intuitively, it makes sense to me to MAC the cleartext, since that's closest to the *meaning* of the message. IIRC I picked up this (possibly outdated) advice from Applied Cryptography years ago.
If you can trick the receiver into using the wrong key or IV for decryption, then the receiver gets a garbled message, and if the MAC is applied to cryptotext rather than cleartext, the message will still appear to be authentic. So some care is needed when applying the MAC to ciphertext only (and I'm talking about the general combination of encryption and MAC, not the specific combination in GCM which I hope gets things right).
E.g., in ssh the mac is done as
mac = MAC(key, sequence_number || unencrypted_packet)
and replacing the unencrypted_packet by the corresponding ciphertext, with no other changes, would likely cause some trouble.
- Naming: Is "gmac" a good enough name? Or "ghash" (the name of the primitive which takes a key and two inputs, in the paper)? Or do we need something more verbose, like galois_mac or gmac128 or so?
The name GMAC is well established:
Ok. gmac it should be then. Or perhaps gmac128, in case anyone is using 64-bit gmac or planning for larger sizes?
If this is a real problem with latest specification, it might make sense to bring this up somewhere.
I'd have to check the NIST version of the spec.
Further, I'm wondering if some other authenticating MACs cannot process data in parallel, which would argue for an interface like this:
struct gmac_ctx;
/* Key size fixed to GMAC_KEY_SIZE == 16 */
void gmac_set_key(struct gmac_ctx *ctx, const uint8_t *key);
void gmac_update(struct gmac_ctx *ctx, unsigned length, const uint8_t *data);
void gmac_digest(struct gmac_ctx *ctx, unsigned length, uint8_t *digest);
void gmac_authenticate(struct gmac_ctx *ctx, unsigned length, const uint8_t *data);
Or something.
I'm not sure I understand what you are referring to. At least for gmac, I don't think one can mix the two inputs, one must complete one before starting on the other. And I'd prefer that this restriction is clearly expressed in the interface.
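To make the ordering restriction concrete, here is a rough sketch of the bookkeeping a gmac_next-style call would do: pad the first input to a block boundary, then switch length counters, and refuse to go back. Everything here (the gmac_sketch_* names, the struct layout) is hypothetical, and the actual hash update is replaced by a simple block counter so that only the state handling is shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GMAC_BLOCK_SIZE 16

/* Hypothetical context; the real state update X = (X + block) * H
   is replaced by counting blocks. */
struct gmac_sketch_ctx {
  uint8_t  block[GMAC_BLOCK_SIZE]; /* buffered partial block */
  unsigned index;                  /* bytes currently buffered */
  int      second;                 /* 0: first input, 1: second input */
  uint64_t length[2];              /* byte count of each input */
  unsigned blocks;                 /* number of blocks "hashed" so far */
};

void gmac_sketch_init(struct gmac_sketch_ctx *ctx)
{
  memset(ctx, 0, sizeof(*ctx));
}

void gmac_sketch_update(struct gmac_sketch_ctx *ctx,
                        unsigned length, const uint8_t *data)
{
  ctx->length[ctx->second] += length;
  while (length > 0) {
    unsigned n = GMAC_BLOCK_SIZE - ctx->index;
    if (n > length) n = length;
    memcpy(ctx->block + ctx->index, data, n);
    ctx->index += n;
    data += n;
    length -= n;
    if (ctx->index == GMAC_BLOCK_SIZE) {
      ctx->blocks++;               /* process a full block */
      ctx->index = 0;
    }
  }
}

/* End of the first input: zero-pad any partial block, process it,
   and switch to the second length counter. */
void gmac_sketch_next(struct gmac_sketch_ctx *ctx)
{
  assert(!ctx->second);            /* the two inputs cannot be mixed */
  if (ctx->index > 0) {
    memset(ctx->block + ctx->index, 0, GMAC_BLOCK_SIZE - ctx->index);
    ctx->blocks++;
    ctx->index = 0;
  }
  ctx->second = 1;
}
```

A digest function would do the same padding step for the second input before hashing the two lengths.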
Regards, /Niels
On Thu, Feb 3, 2011 at 7:40 AM, Niels Möller nisse@lysator.liu.se wrote:
Simon Josefsson simon@josefsson.org writes:
http://csrc.nist.gov/groups/ST/toolkit/BCM/modes_development.html
I was just wondering what an "authoritative" reference would be.
NIST SP800-38D. See http://csrc.nist.gov/groups/ST/toolkit/BCM/current_modes.html and http://csrc.nist.gov/publications/nistpubs/800-38D/SP-800-38D.pdf.
[SNIP]
Jeff
Simon Josefsson simon@josefsson.org writes:
The name GMAC is well established:
And if I understand the spec correctly, T = GMAC(K, M) is computed roughly as follows
H = E_K(0...0)
T = GHASH_H(M || ...) XOR E_K(IV)
I.e., the MAC key K is converted to the "hash subkey H" using the encryption function of some block cipher (typically AES), and this block cipher is also used together with the IV to get a value XOR:ed to the output of GHASH.
I imagine the final XOR is essential for the MAC security (to hide the otherwise very regular algebraic structure of GHASH).
When writing the previous mail, I hadn't realized that also the MAC part depends on the block cipher, and should be parametrized by the block cipher used. This makes it less natural to view GMAC as an independent algorithm.
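A small sketch of that dependence, with the block cipher passed in as a callback. The callback type block_encrypt_func and the function names are invented for this example (Nettle's nettle_crypt_func has a different signature), the caller is assumed to supply the already-formed counter block derived from the IV, and toy_encrypt is a deliberately insecure placeholder that only exists so the code can be exercised.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Hypothetical block-cipher callback: encrypt one 16-byte block. */
typedef void block_encrypt_func(const void *cipher_ctx,
                                uint8_t dst[BLOCK], const uint8_t src[BLOCK]);

/* H = E_K(0...0): the hash subkey depends on the cipher and its key. */
void gmac_derive_h(const void *cipher_ctx, block_encrypt_func *encrypt,
                   uint8_t h[BLOCK])
{
  static const uint8_t zero[BLOCK] = {0};
  assert(encrypt);
  encrypt(cipher_ctx, h, zero);
}

/* T = GHASH output XOR E_K(counter block derived from the IV). */
void gmac_finish(const void *cipher_ctx, block_encrypt_func *encrypt,
                 const uint8_t iv_block[BLOCK], const uint8_t ghash[BLOCK],
                 uint8_t tag[BLOCK])
{
  uint8_t mask[BLOCK];
  assert(encrypt);
  encrypt(cipher_ctx, mask, iv_block);
  for (int i = 0; i < BLOCK; i++)
    tag[i] = ghash[i] ^ mask[i];
}

/* Placeholder "cipher" for demonstration only: XOR with one key byte.
   Not a block cipher in any meaningful sense and not secure. */
void toy_encrypt(const void *cipher_ctx,
                 uint8_t dst[BLOCK], const uint8_t src[BLOCK])
{
  uint8_t k = *(const uint8_t *)cipher_ctx;
  for (int i = 0; i < BLOCK; i++)
    dst[i] = src[i] ^ k;
}
```

Both steps go through the cipher, so a GMAC interface needs either a cipher-specific context or a cipher context plus function pointer supplied by the caller.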
Also, the need for an IV (never repeated with the same key) necessarily makes the interface more complex than, e.g., the HMAC interface.
Just like for DSA, there would be some use for a deterministic variant where the IV (or random number in the case of DSA) is determined as some function of the message (and possibly also of the key, although the dependence on the key clearly should be one-way).
Regards, /Niels
Replying to myself:
The context struct and the set_key function are essential to be able to do any optimizations using key-dependent tables.
It actually gets a bit complicated. I think we need several context/state structs.
First, the key. The only way the key is directly used is for encrypting things using the underlying block cipher. So it makes sense to represent the key as
const void *cipher_ctx;
nettle_crypt_func *encrypt;
(except that for historical reasons, the first argument for nettle_crypt_func is not const).
Next, we have tables derived from the key. These are needed for optimized implementations of ghash, and should be reused when several messages are processed using the same key. E.g., this could be done as
struct gcm128_ctx {
  const void *cipher_ctx;
  nettle_crypt_func *encrypt;

  uint8_t H[16];       /* Hash subkey */
  uint8_t M0[16][256]; /* Key-dependent table. */
};
void gcm128_ctx_init(struct gcm128_ctx *ctx, const void *cipher_ctx, nettle_crypt_func *encrypt);
(It would be more consistent with the rest of nettle to not store those pointers in gcm128_ctx, since it's supposed to be ok to memcpy context structs around. One could either pass them as arguments to all functions, or inline an aes_ctx if aes is all we care about. I include them here, to simplify function prototypes).
Functions to process complete gcm messages can take this ctx as argument. The main ghash iteration would also take this context struct as argument,
/* Set state = (state XOR data) dot H */
void ghash_update(const struct gcm128_ctx *ctx, uint8_t *state, const uint8_t *data);
For streaming operations, we also need a per-message state struct, something like
struct gcm128_msg {
  uint8_t Y[16];    /* counter */
  uint8_t J[16];    /* encryption of that */
  uint8_t J0[16];   /* first encrypted counter block to be used for
                       constructing the digest. */
  uint8_t hash[16]; /* hashing state */

  unsigned index;   /* Index when doing partial blocks */
  uint32_t auth_length;
  uint32_t msg_length;
};
void gcm128_msg_init(struct gcm128_msg *msg, const struct gcm128_ctx *ctx, uint32_t iv_length, const uint8_t *iv);
/* Process auxiliary auth data */
void gcm128_msg_auth(struct gcm128_msg *msg, const struct gcm128_ctx *ctx, uint32_t length, const uint8_t *data);
/* Only difference between encrypt and decrypt is if data is hashed
   before or after xoring with the key stream. */
void gcm128_msg_encrypt(struct gcm128_msg *msg, const struct gcm128_ctx *ctx, uint32_t length, uint8_t *dst, const uint8_t *src);
void gcm128_msg_decrypt(struct gcm128_msg *msg, const struct gcm128_ctx *ctx, uint32_t length, uint8_t *dst, const uint8_t *src);
void gcm128_msg_digest(struct gcm128_msg *msg, const struct gcm128_ctx *ctx, uint32_t length, uint8_t *dst);
Comments? It gets more complicated than almost anything currently in Nettle.
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
Nikos Mavrogiannopoulos has been looking into support for Galois Counter Mode (GCM), see http://www.cryptobarn.com/papers/gcm-spec.pdf
I've checked in a first version, based on Nikos' code. Tentative interface as follows:
#define GCM_BLOCK_SIZE 16
#define GCM_IV_SIZE (GCM_BLOCK_SIZE - 4)
#define GCM_TABLE_BITS 0
struct gcm_ctx {
  /* Key-dependent state. */
  /* Hashing subkey */
  uint8_t h[GCM_BLOCK_SIZE];
#if GCM_TABLE_BITS
  uint8_t h_table[1 << GCM_TABLE_BITS][GCM_BLOCK_SIZE];
#endif
  /* Per-message state, depending on the iv */
  /* Original counter block */
  uint8_t iv[GCM_BLOCK_SIZE];
  /* Updated for each block. */
  uint8_t ctr[GCM_BLOCK_SIZE];
  /* Hashing state */
  uint8_t x[GCM_BLOCK_SIZE];
  uint64_t auth_size;
  uint64_t data_size;
};
/* FIXME: Should use const for the cipher context. Then needs const for
   nettle_crypt_func, which also rules out using that abstraction for
   arcfour. */
void gcm_set_key(struct gcm_ctx *ctx, void *cipher, nettle_crypt_func *f);
void gcm_set_iv(struct gcm_ctx *ctx, unsigned length, const uint8_t *iv);
void gcm_auth(struct gcm_ctx *ctx, unsigned length, const uint8_t *data);
void gcm_encrypt(struct gcm_ctx *ctx, void *cipher, nettle_crypt_func *f, unsigned length, uint8_t *dst, const uint8_t *src);
void gcm_decrypt(struct gcm_ctx *ctx, void *cipher, nettle_crypt_func *f, unsigned length, uint8_t *dst, const uint8_t *src);
void gcm_digest(struct gcm_ctx *ctx, void *cipher, nettle_crypt_func *f, unsigned length, uint8_t *digest);
Comments on both structure and naming are welcome.
My understanding of GCM is that the main point is a new MAC function which allows efficient hardware implementation.
The unoptimized GF(2^128) multiply function really is awfully slow. On x86_64, gmac takes 830 cycles/byte! We can compare to the sha functions, where sha1, sha256 and sha512 take respectively 8, 18 and 12 cycles/byte, so the current code is two orders of magnitude slower than hmac-sha1.
It remains to see how much table space and/or assembly hacking is needed to get reasonable performance.
Regards, /Niels
On 02/06/2011 12:08 AM, Niels Möller wrote:
The unoptimized GF(2^128) multiply function really is awfully slow. On x86_64, gmac takes 830 cycles/byte! We can compare to the sha functions, where sha1, sha256 and sha512 take respectively 8, 18 and 12 cycles/byte, so the current code is two orders of magnitude slower than hmac-sha1. It remains to see how much table space and/or assembly hacking is needed to get reasonable performance.
There is a special instruction for that on new intel and AMD CPUs...
http://software.intel.com/en-us/articles/intel-carry-less-multiplication-ins...
http://en.wikipedia.org/wiki/CLMUL_instruction_set
Unfortunately I don't have anything close to those cpus...
regards, Nikos
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
On 02/06/2011 12:08 AM, Niels Möller wrote:
It remains to see how much table space and/or assembly hacking is needed to get reasonable performance.
There is a special instruction for that on new intel and AMD CPUs... http://software.intel.com/en-us/articles/intel-carry-less-multiplication-ins... http://en.wikipedia.org/wiki/CLMUL_instruction_set
Interesting. I haven't played with any such special instructions (even if it ought to make a bit of difference also for aes).
Anyway, I've been hacking a bit on the C-implementation over the day, and the galois hashing (gmac) is now 18 times(!) faster. Summary of changes:
Original unoptimized code:
Algorithm    mode   Mbyte/s  cycles/byte  cycles/block
gmac         auth      1.49       829.83      13277.27
Optimized rshift, rewritten to use word-sized operations:
Algorithm    mode   Mbyte/s  cycles/byte  cycles/block
gmac         auth      4.62       268.14       4290.23
Optimized gf_mul, rewritten to use separate byte and bit loops:
Algorithm    mode   Mbyte/s  cycles/byte  cycles/block
gmac         auth      5.46       227.18       3634.90
Moved reduction into shift function:
Algorithm    mode   Mbyte/s  cycles/byte  cycles/block
gmac         auth      6.79       182.69       2923.02
Introduced 4-bit tables:
Algorithm    mode   Mbyte/s  cycles/byte  cycles/block
gmac         auth     27.14        45.68        730.82
Remaining tricks:
* Try 8-bit tables. This increases the storage need a lot: the modest 4-bit tables need only 256 additional bytes per key for gf_mul, plus a 32-byte constant table for gf_shift, and extending to 8 bits makes both tables 16 times larger.
* See if it makes sense to write any assembler for the hashing function.
* Various smaller microoptimizations, like a public memxor-variant for when areas are known to be word-aligned. Or inline that xor:ing.
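For readers following along, the 4-bit table idea can be sketched roughly as below: precompute n · H for all 16 four-bit polynomials n (16 × 16 = 256 bytes per key), then evaluate x · H by Horner's rule over the 32 nibbles of x. This is not the checked-in code; the function names are made up, and the multiplication by alpha^4 is done here with four one-bit shifts rather than with the 32-byte constant shift table mentioned above. A bitwise multiply is included only as a reference to compare against.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* v = v * alpha: shift right by one bit (GCM bit order) and reduce. */
static void gf_shift(uint8_t v[BLOCK])
{
  int carry = v[BLOCK - 1] & 1;
  for (int j = BLOCK - 1; j > 0; j--)
    v[j] = (uint8_t)((v[j] >> 1) | (v[j - 1] << 7));
  v[0] >>= 1;
  if (carry) v[0] ^= 0xe1;
}

static void gf_add(uint8_t r[BLOCK], const uint8_t a[BLOCK],
                   const uint8_t b[BLOCK])
{
  for (int j = 0; j < BLOCK; j++) r[j] = a[j] ^ b[j];
}

/* Reference: x = x * h, one bit at a time. */
void gf_mul_bitwise(uint8_t x[BLOCK], const uint8_t h[BLOCK])
{
  uint8_t z[BLOCK] = {0}, v[BLOCK];
  memcpy(v, h, BLOCK);
  for (int i = 0; i < 128; i++) {
    if (x[i / 8] & (0x80 >> (i % 8))) gf_add(z, z, v);
    gf_shift(v);
  }
  memcpy(x, z, BLOCK);
}

/* table[n] = n * h for every 4-bit polynomial n, built with shifts
   and adds only. Bit 0x8 of n is the alpha^0 coefficient. */
void gf_table_init(uint8_t table[16][BLOCK], const uint8_t h[BLOCK])
{
  memset(table[0], 0, BLOCK);
  memcpy(table[8], h, BLOCK);
  for (int i = 4; i > 0; i /= 2) {
    memcpy(table[i], table[2 * i], BLOCK);
    gf_shift(table[i]);
  }
  for (int i = 2; i < 16; i *= 2)
    for (int j = 1; j < i; j++)
      gf_add(table[i + j], table[i], table[j]);
}

/* x = x * h via the table: Horner's rule over nibbles, last first. */
void gf_mul_table(uint8_t x[BLOCK], uint8_t table[16][BLOCK])
{
  uint8_t z[BLOCK] = {0};
  for (int j = BLOCK - 1; j >= 0; j--) {
    const int nibble[2] = { x[j] & 0x0f, x[j] >> 4 };
    for (int k = 0; k < 2; k++) {
      for (int s = 0; s < 4; s++) gf_shift(z);  /* z = z * alpha^4 */
      gf_add(z, z, table[nibble[k]]);
      assert(nibble[k] < 16);
    }
  }
  memcpy(x, z, BLOCK);
}
```

The 8-bit variant has the same structure with a 256-entry table and one lookup per byte, which is where the 4 KB per key comes from.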
Regards, /Niels
On 02/06/2011 10:23 PM, Niels Möller wrote:
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
On 02/06/2011 12:08 AM, Niels Möller wrote:
It remains to see how much table space and/or assembly hacking is needed to get reasonable performance.
There is a special instruction for that on new intel and AMD CPUs... http://software.intel.com/en-us/articles/intel-carry-less-multiplication-ins... http://en.wikipedia.org/wiki/CLMUL_instruction_set
Interesting. I haven't played with any such special instructions (even if it ought to make a bit of difference also for aes).
Anyway, I've been hacking a bit on the C-implementation over the day, and the galois hashing (gmac) is now 18 times(!) faster. Summary of changes:
[...]
Introduced 4-bit tables:
Algorithm    mode   Mbyte/s  cycles/byte  cycles/block
gmac         auth     27.14        45.68        730.82
That's pretty impressive!
On 02/06/2011 10:23 PM, Niels Möller wrote:
Interesting. I haven't played with any such special instructions (even if it ought to make a bit of difference also for aes). Anyway, I've been hacking a bit on the C-implementation over the day, and the galois hashing (gmac) is now 18 times(!) faster. Summary of changes:
I've also done a comparison benchmark of AES-GCM (the 4-bit table one) versus HMAC-SHAx+AES-CBC... AES-GCM in software is disappointing...
Checking AES-128-GCM (16kb payload)...
Encrypted 97.67 Mb in 5.00 secs: 19.53 Mb/sec
Checking AES-128-CBC with SHA256 (16kb payload)...
Encrypted and hashed 246.14 Mb in 5.00 secs: 49.23 Mb/sec
Checking AES-128-CBC with SHA1 (16kb payload)...
Encrypted and hashed 354.16 Mb in 5.00 secs: 70.83 Mb/sec
regards, Nikos
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
I've also done a comparison benchmark of AES-GCM (the 4-bit table one) versus HMAC-SHAx+AES-CBC... AES-GCM in software is disappointing...
Now I've tried 8-bit tables. Then I get into the same ballpark as md5 and the sha functions (benchmarking on intel x86_64):
Algorithm    mode    Mbyte/s  cycles/byte  cycles/block
md5          update   174.20         7.12        455.48
sha1         update   158.09         7.84        501.89
sha256       update    68.36        18.14       1160.65
sha512       update   104.99        11.81       1511.55
gmac         auth      65.93        18.80        300.87
I think both sha512 and gmac benefit from 64-bit wide registers, while md5, sha1 and sha256 do not. And I think there are still a couple of microoptimizations left to do for gmac.
(I'm only benchmarking gmac; the encryption should be about the same as AES in ECB or CTR mode, which is roughly 17 cycles/byte on the same hardware).
Now the question is if it's a good tradeoff to expand the key to a 4 KB table.
BTW, I hadn't noticed before that sha512 is faster per byte than sha256.
Regards, /Niels
On 02/07/2011 01:20 PM, Niels Möller wrote:
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
I've also done a comparison benchmark of AES-GCM (the 4-bit table one) versus HMAC-SHAx+AES-CBC... AES-GCM in software is disappointing...
Now I've tried 8-bit tables. Then I get into the same ballpark as md5 and the sha functions (benchmarking on intel x86_64):
Algorithm    mode    Mbyte/s  cycles/byte  cycles/block
md5          update   174.20         7.12        455.48
sha1         update   158.09         7.84        501.89
sha256       update    68.36        18.14       1160.65
sha512       update   104.99        11.81       1511.55
gmac         auth      65.93        18.80        300.87
I think both sha512 and gmac benefit from 64-bit wide registers, while md5, sha1 and sha256 do not. And I think there are still a couple of microoptimizations left to do for gmac.
(I'm only benchmarking gmac; the encryption should be about the same as AES in ECB or CTR mode, which is roughly 17 cycles/byte on the same hardware).
Now the question is if it's a good tradeoff to expand the key to a 4 KB table.
4kb is not much on a desktop. There are constrained systems where this might be a problem though. Libtomcrypt had a definition to cope with this issue (e.g. LOW_FOOTPRINT or so).
On the other hand, systems that might have an assembler-optimized version would need to share the same big state as well... I don't know. That's why I like hiding that stuff :)
regards, Nikos
I've done some further updates.
* I've introduced a specialized function gcm_gf_add, used instead of memxor when blocks are aligned, and also avoiding looping overhead and (if it is inlined, which I think it should be) call overhead. Current performance on x86_64 is 28.5 cycles / byte with 4-bit tables (current default), and 8.5 cycles / byte with 8-bit tables. Close to a factor of two improvement.
* I've introduced a union gcm_block, which is used internally to ensure that the gf elements have the right alignment. Tested on sparc32 and sparc64 (big endian and pickier about alignment).
* I've split out the message-independent state to a separate struct gcm_key, which needs to be passed as argument to all gcm functions.
* I've added a struct gcm_aes_ctx and related functions. This is an all-in-one context, including all of the cipher context, the hashing subkey, and message state.
* I've added support for IV:s of arbitrary lengths, and added the rest of the testcases from http://www.cryptobarn.com/papers/gcm-spec.pdf
* I've simplified the configuration of internal multiplication routines a bit, and rewritten the table generation to use just shifts and adds (as suggested in http://www.cryptobarn.com/papers/gcm-spec.pdf), which means that when tables are used, there's no need to keep the bitwise multiplication function which doesn't use tables.
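As a rough illustration of the first two items, a union can force word alignment on the 16-byte field element so that addition in GF(2^128), which is just XOR, can be done a word at a time with no loop over bytes. The member layout below is an assumption made for this sketch; only the names gcm_block and gcm_gf_add come from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GCM_BLOCK_SIZE 16

/* Assumed layout: the word member forces word alignment of the bytes. */
union gcm_block {
  uint8_t b[GCM_BLOCK_SIZE];
  unsigned long w[GCM_BLOCK_SIZE / sizeof(unsigned long)];
};

/* r = x + y in GF(2^128), i.e. XOR, one machine word at a time.
   r may be the same object as x or y. */
static inline void gcm_gf_add(union gcm_block *r,
                              const union gcm_block *x,
                              const union gcm_block *y)
{
  assert(r && x && y);
  for (unsigned i = 0; i < GCM_BLOCK_SIZE / sizeof(unsigned long); i++)
    r->w[i] = x->w[i] ^ y->w[i];
}
```

Since XOR acts on each bit independently, the result is the same regardless of word size or byte order, which is why the same code can be used on both little-endian x86_64 and big-endian sparc.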
I think the code is stabilizing a bit now.
One naming question: Should gcm_aes_auth be renamed to gcm_aes_update, for consistency with other hash and mac functions? I'm tempted to do that.
Regards, /Niels
On 02/06/2011 12:08 AM, Niels Möller wrote:
void gcm_set_key(struct gcm_ctx *ctx, void *cipher, nettle_crypt_func *f);
I don't like the name of the function. It doesn't reveal anything about its purpose. There is no key to set there. I'd suggest the original gcm_init.
Moreover, since the block size cannot be set as an option, any extension of that code to work with 64-bit ciphers will require an ABI break or a new gcm64 mode... (what if 256-bit ciphers are added in the future?)
void gcm_encrypt(struct gcm_ctx *ctx, void *cipher, nettle_crypt_func *f, unsigned length, uint8_t *dst, const uint8_t *src);
void gcm_digest(struct gcm_ctx *ctx, void *cipher, nettle_crypt_func *f, unsigned length, uint8_t *digest);
As I already mentioned, I prefer having the cipher and f stored in the context, to avoid supplying them on individual calls. There is no advantage (that I can see) in having them as parameters of each function; it just delegates the storage of those two pointers to the caller's structures instead. It's no big deal, but it is an inconvenience.
regards, Nikos
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
On 02/06/2011 12:08 AM, Niels Möller wrote:
void gcm_set_key(struct gcm_ctx *ctx, void *cipher, nettle_crypt_func *f);
I don't like the name of the function name. It doesn't reveal anything about its purpose. There is no key to set there. I'd suggest the original gcm_init.
I see your point, and I think I'll change the name. If we introduce a gcm_aes wrapper, that will have a _set_key method.
Moreover by not allowing the setting the blocksize as option any extension on that code to work with 64-bit ciphers, will require an abi break, or a new gcm64 mode... (what if 256-bit ciphers are added in the future?)
I don't think it's useful to be that general here. Variants with different block sizes will likely need a different context struct anyway.
I could rename gcm to gcm128 in the interface if that's clearer.
As I already mentioned I prefer having the cipher and f, to context to avoid supplying on individual calls. There is no advantage (that I can see) on having on each function parameters, and it just delegates the storage of those two pointers, to caller's structures instead. It's no big deal but it is inconvenience.
I haven't yet made up my mind on this, but let me explain the reason for having these pointers as function arguments.
The idea is that context structs in nettle should be non-magic with no pointers, so that it can be copied or relocated in memory at will. Say we implement gcm_aes as
struct gcm_aes_ctx {
  struct gcm_ctx gcm;
  struct aes_ctx aes;
};
void gcm_aes_encrypt(struct gcm_aes_ctx *ctx, unsigned length, uint8_t *dst, const uint8_t *src)
{
  gcm_encrypt(&ctx->gcm, &ctx->aes, (nettle_crypt_func *) aes_encrypt, length, dst, src);
}
Then the context struct is still non-magic. We can call gcm_aes_set_key, and then create multiple copies of gcm_aes_ctx (using plain memcpy) which are independent. If we add some pointers to struct gcm_ctx (which *does* increase the storage size of gcm_aes_ctx, although that's maybe not a big deal), plain copying will leave pointers pointing to other objects, and we'll have to introduce a function gcm_aes_copy.
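A toy illustration of what "non-magic" buys: because the combined context holds only plain data, a byte-wise copy yields a second, fully independent context with the same key material. The toy_* structs and toy_update below are invented stand-ins, not the real gcm_ctx or aes_ctx, and the "processing" is a meaningless placeholder that merely touches per-message state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins for aes_ctx and gcm_ctx: plain data, no pointers. */
struct toy_cipher_ctx { uint8_t round_keys[16]; };
struct toy_gcm_ctx    { uint8_t h[16]; uint8_t x[16]; uint64_t data_size; };

struct toy_gcm_aes_ctx {
  struct toy_gcm_ctx gcm;
  struct toy_cipher_ctx cipher;
};

/* Placeholder for message processing: reads the key-dependent part,
   modifies only the per-message part. */
void toy_update(struct toy_gcm_aes_ctx *ctx,
                unsigned length, const uint8_t *data)
{
  assert(data || length == 0);
  for (unsigned i = 0; i < length; i++)
    ctx->gcm.x[i % 16] ^= data[i] ^ ctx->cipher.round_keys[i % 16];
  ctx->gcm.data_size += length;
}
```

After `memcpy(&b, &a, sizeof(a))`, calls to toy_update on a leave b untouched, and no dedicated copy function is needed. With a stored `void *cipher` pointing at some other object, the copy would still alias that object.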
And if you think copying here is the wrong thing (since we waste memory with multiple identical copies of an aes_ctx representing the key), then I think one should also split gcm_ctx into a key-dependent part (which should be shared rather than copied, in particular if we go for larger key-dependent tables) and a message-dependent part (which also shouldn't be copied, instead multiple instances should be independently initialized).
We can compare with the hmac code; there's no context struct for the general construction, but hmac_sha1_ctx and others defined using the HMAC_CTX macro put both the key-dependent parts (.inner, .outer) and the message-dependent part (.state) in a single struct.
When thinking about it, maybe the right thing is to redesign the general gcm-code to use a separate struct for the hash subkey, passed as argument to the functions needing it.
Regards, /Niels
On 02/07/2011 10:26 AM, Niels Möller wrote:
Moreover by not allowing the setting the blocksize as option any extension on that code to work with 64-bit ciphers, will require an abi break, or a new gcm64 mode... (what if 256-bit ciphers are added in the future?)
I don't think it's useful to be that general here. Variants with different block sizes will likely need a different context struct anyway.
Indeed...
I could rename gcm to gcm128 in the interface if that's clearer.
I like the plain gcm...
As I already mentioned I prefer having the cipher and f, to context to avoid supplying on individual calls. There is no advantage (that I can see) on having on each function parameters, and it just delegates the storage of those two pointers, to caller's structures instead. It's no big deal but it is inconvenience.
I haven't yet made up my mind on this, but let me explain the reason for having these pointers as function arguments. The idea is that context structs in nettle should be non-magic with no pointers, so that it can be copied or relocated in memory at will. Say we implement gcm_aes as
[...] It makes sense, and although I've never used context structs like that I could understand if someone did. I like to see context structures as things that will take away my burden of maintaining several pointers and data that relate to the operation.
Moreover different AEAD modes might have different requirements, but it might be nice to have a consistent low level interface on them (if possible of course). If an AEAD mode doesn't require the encryption function at the _digest operation, it would mean it would have different function parameters. For me it would be best if it was consistent, even if it is low-level... but it's your call...
[...]
When thinking about it, maybe the right thing is to redesign the general gcm-code to use a separate struct for the hash subkey, passed as argument to the functions needing it.
I would like less of the internals of gcm exposed to the user rather than more. As a user of nettle I wouldn't even want to know that there is a hash subkey on gcm.
regards, Nikos
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
I would like less of the internals of gcm exposed to the user rather than more. As a user of nettle I wouldn't even want to know that there is a hash subkey on gcm.
In any case we should probably have a gcm_aes interface (and whatever other variants are relevant) that is easier to use than the lowest level gcm interface.
Regards, /Niels
On 02/07/2011 12:21 PM, Niels Möller wrote:
I would like less of the internals of gcm exposed to the user rather than more. As a user of nettle I wouldn't even want to know that there is a hash subkey on gcm.
In any case we should probably have a gcm_aes interface (and whatever other variants are relevant) that is easier to use than the lowest level gcm interface.
Could be... Another thing. I've implicitly used gcm_set_iv() as a way to reset the GCM mode. Unfortunately it is not enough. The auth_size and data_size have to be set to zero as well. Do you think that should be done in the set_iv function as well?
I've currently done that in gnutls, and with that change gnutls talks GCM with other servers.
regards, Nikos
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
Could be... Another thing. I've implicitly used gcm_set_iv() as a way to reset the GCM mode. Unfortunately it is not enough.
It's intended to work, current gcm_set_iv in cvs does
/* Reset the rest of the message-dependent state. */
memset(ctx->x, 0, sizeof(ctx->x));
ctx->auth_size = ctx->data_size = 0;
Is there something I'm missing?
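For context, a sketch of what a complete per-message reset could look like for the common 96-bit IV case, so that set_iv alone is enough to start a new message. This is not the code in cvs: the struct holds only the per-message fields, the _sketch names are invented, and storing the first data counter (initial block plus one) in ctr at this point is an assumption of the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GCM_BLOCK_SIZE 16
#define GCM_IV_SIZE (GCM_BLOCK_SIZE - 4)

/* Per-message fields only; key-dependent fields are left out. */
struct gcm_msg_sketch {
  uint8_t iv[GCM_BLOCK_SIZE];   /* original counter block */
  uint8_t ctr[GCM_BLOCK_SIZE];  /* updated for each block */
  uint8_t x[GCM_BLOCK_SIZE];    /* hashing state */
  uint64_t auth_size;
  uint64_t data_size;
};

void gcm_set_iv_sketch(struct gcm_msg_sketch *ctx,
                       unsigned length, const uint8_t *iv)
{
  /* Other IV lengths need a GHASH pass; not shown here. */
  assert(length == GCM_IV_SIZE);

  /* Initial counter block: IV || 0x00000001. */
  memcpy(ctx->iv, iv, GCM_IV_SIZE);
  memset(ctx->iv + GCM_IV_SIZE, 0, 3);
  ctx->iv[GCM_BLOCK_SIZE - 1] = 1;

  /* Assumed here: ctr starts at the initial block plus one. */
  memcpy(ctx->ctr, ctx->iv, GCM_BLOCK_SIZE);
  ctx->ctr[GCM_BLOCK_SIZE - 1] = 2;

  /* Reset the rest of the message-dependent state. */
  memset(ctx->x, 0, sizeof(ctx->x));
  ctx->auth_size = ctx->data_size = 0;
}
```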
Regards, /Niels
On 02/07/2011 05:01 PM, Niels Möller wrote:
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
Could be... Another thing. I've implicitly used gcm_set_iv() as a way to reset the GCM mode. Unfortunately it is not enough.
It's intended to work, current gcm_set_iv in cvs does
/* Reset the rest of the message-dependent state. */
memset(ctx->x, 0, sizeof(ctx->x));
ctx->auth_size = ctx->data_size = 0;
Is there something I'm missing?
No, forget it. I was mistaken about the reason for the issue I had. The current version is ok and interoperable with other TLS-GCM versions.
regards, Nikos
On 02/07/2011 10:26 AM, Niels Möller wrote:
I haven't yet made up my mind on this, but let me explain the reason for having these pointers as function arguments. The idea is that context structs in nettle should be non-magic with no pointers, so that it can be copied or relocated in memory at will. Say we implement gcm_aes as
What I was wondering is how would you think of implementing the cpu-specific optimizations in assembly? What I had in mind is that the _init function would detect the particular instructions present and set some function pointers to the structure that will assist the operations such as gf_mul to select the proper variant... However as it seems, this is not how it can be done if function pointers are not stored...
regards, Nikos
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
What I was wondering is how would you think of implementing the cpu-specific optimizations in assembly?
If/when we do that, and if we want to be able to select which code to use at runtime (rather than compile time), I think we should use global pointers, one for each routine that can make use of special instructions. Either set up automagically at first use, or at library load time (using the same mechanisms as C++ constructors).
Regards, /Niels
On 02/08/2011 12:11 PM, Niels Möller wrote:
What I was wondering is how would you think of implementing the cpu-specific optimizations in assembly?
If/when we do that, and if we want to be able to select which code to use at runtime (rather than compile time),
Compile-time selection will cause problems for distributions that ship a single library across compatible architectures. Given that they will ship the version without special instructions for compatibility, most systems will not be affected by such optimizations.
I think we should use global pointers, one for each routine that can make use special instructions. Either setup automagically at first use, or at library load time (using the same mechanisms as C++ constructors).
I like the latter... An explicit global library initialization function might also do.
regards, Nikos
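The load-time alternative could look roughly like the sketch below. It assumes a GCC-compatible compiler for the constructor attribute, which is a compiler extension and not standard C; the function names and the feature probe are hypothetical. An explicit initialization function is kept as well, so that applications on other toolchains can call it themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routine type for a bit-counting primitive. */
typedef unsigned popcount_func(uint64_t x);

/* Portable implementation: clear the lowest set bit until none remain. */
static unsigned
popcount_generic(uint64_t x)
{
  unsigned count = 0;
  while (x)
    {
      x &= x - 1;
      count++;
    }
  return count;
}

/* Placeholder for a variant using a special instruction. */
static unsigned
popcount_special(uint64_t x)
{
  return popcount_generic(x);
}

/* Hypothetical feature probe; real code would examine cpuid flags. */
static int
cpu_has_popcount(void)
{
  return 0;
}

static popcount_func *popcount_ptr = NULL;

/* Explicit global initialization; safe to call more than once. */
void
example_lib_init(void)
{
  popcount_ptr = cpu_has_popcount() ? popcount_special : popcount_generic;
}

#if defined(__GNUC__)
/* Assumption: GCC/Clang extension, runs before main() or at dlopen(). */
static void __attribute__((constructor))
example_lib_init_at_load(void)
{
  example_lib_init();
}
#endif

unsigned
example_popcount(uint64_t x)
{
  assert(popcount_ptr != NULL);  /* Initialization must have happened. */
  return popcount_ptr(x);
}
```

The trade-off compared to setup at first use is that no resolver stub is needed, at the price of relying on a toolchain-specific mechanism or on the application calling the init function.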
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
On 02/08/2011 12:11 PM, Niels Möller wrote:
I think we should use global pointers, one for each routine that can make use of special instructions. Either set up automagically at first use, or at library load time (using the same mechanisms as C++ constructors).
I like the latter... An explicit global library initialization function might also do.
GMP does the former. Not as pretty, but seems to work well, and maybe a bit easier to do portably.
Regards /Niels
On 02/08/2011 01:26 PM, Niels Möller wrote:
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
On 02/08/2011 12:11 PM, Niels Möller wrote:
I think we should use global pointers, one for each routine that can make use of special instructions. Either set up automagically at first use, or at library load time (using the same mechanisms as C++ constructors).
I like the latter... An explicit global library initialization function might also do.
GMP does the former. Not as pretty, but seems to work well, and maybe a bit easier to do portably.
How does it deal with multi-threaded access?
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
How does it deal with multi-threaded access?
The worst thing that can happen is that several threads examine the cpuid flags, and then write identical pointers to the same locations at approximately the same time.
And as far as I have understood, that is no problem, because reading and writing a pointer value is an atomic operation on the architectures concerned.
Regards, /Niels
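Where a C11 compiler is available, the same "several threads may write identical pointers" behaviour can be expressed without relying on architecture-specific atomicity. The following is only a sketch under that assumption (the code discussed in the thread predates C11); the names and the feature probe are hypothetical. Relaxed ordering is enough here because every thread computes the same value and the pointer refers to code, not to data that must be published.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routine type for a 32-bit byte swap. */
typedef uint32_t swap_func(uint32_t x);

static uint32_t
bswap_generic(uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0xff00) | ((x << 8) & 0xff0000) | (x << 24);
}

/* Placeholder for a variant using a special instruction. */
static uint32_t
bswap_special(uint32_t x)
{
  return bswap_generic(x);
}

/* Hypothetical feature probe; real code would examine cpuid flags. */
static int
cpu_has_bswap(void)
{
  return 0;
}

/* NULL means "not selected yet". */
static _Atomic(swap_func *) bswap_ptr = NULL;

uint32_t
example_bswap(uint32_t x)
{
  swap_func *f = atomic_load_explicit(&bswap_ptr, memory_order_relaxed);
  if (f == NULL)
    {
      /* Several threads may get here at once; each computes the same
         answer and stores the same pointer, so the race is harmless. */
      f = cpu_has_bswap() ? bswap_special : bswap_generic;
      atomic_store_explicit(&bswap_ptr, f, memory_order_relaxed);
    }
  assert(f != NULL);
  return f(x);
}
```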
I looked at recent GCM code and noticed this:
/* FIXME: Should use const for the cipher context. Then needs const for
   nettle_crypt_func, which also rules out using that abstraction for
   arcfour. */
void gcm_set_key(struct gcm_key *key, void *cipher, nettle_crypt_func *f);
However GCM (like CCM) is only specified for block ciphers, and further, only for 128-bit block ciphers. Thus I wonder if avoiding use of const just to let the abstraction support a stream cipher is wise?
/Simon
Simon Josefsson simon@josefsson.org writes:
/* FIXME: Should use const for the cipher context. Then needs const for nettle_crypt_func, which also rules out using that abstraction for arcfour. */
However GCM (like CCM) is only specified for block ciphers, and further, only for 128-bit block ciphers. Thus I wonder if avoiding use of const just to let the abstraction support a stream cipher is wise?
This nettle_crypt_func is not gcm-specific. It is used primarily for the nettle_cipher class in nettle-meta.h:
struct nettle_cipher {
  const char *name;
  unsigned context_size;
  /* Zero for stream ciphers */
  unsigned block_size;
  /* Suggested key size; other sizes are sometimes possible. */
  unsigned key_size;
  nettle_set_key_func *set_encrypt_key;
  nettle_set_key_func *set_decrypt_key;
  nettle_crypt_func *encrypt;
  nettle_crypt_func *decrypt;
};
This currently is used to represent both block and stream ciphers,
[...] extern const struct nettle_cipher nettle_aes256;
extern const struct nettle_cipher nettle_arcfour128;
extern const struct nettle_cipher nettle_camellia128; [...]
Currently, arcfour is the only supported stream cipher (they seem to be out of fashion, are there any other stream ciphers in use? A5 maybe?)
So the question is, should we decide that nettle_cipher is for block ciphers only (where the encrypt and decrypt functions don't change any state)? Fitting arcfour and block ciphers into the same abstraction doesn't make much sense anyway, since they should be used very differently. Then we can make the context argument const for nettle_crypt_func, but we'd also have to delete
extern const struct nettle_cipher nettle_arcfour128;
or replace it with something else, which is an incompatible interface change. As long as it's the only supported stream cipher, it doesn't make much sense to me to create a new general stream cipher construction.
Regards, /Niels
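To make the const question concrete, here is a minimal sketch of what a const-qualified function type would and would not admit. The type const_crypt_func and the two toy "ciphers" are hypothetical illustrations (neither is a real cipher, and this is not nettle's actual declaration): the block function only reads its context, while the stream function must advance its state on every call and therefore cannot take a const context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical const-qualified variant of nettle_crypt_func. */
typedef void const_crypt_func(const void *ctx, unsigned length,
                              uint8_t *dst, const uint8_t *src);

#define TOY_BLOCK_SIZE 16

/* Toy block "cipher": XOR with a fixed key. NOT a real cipher. */
struct toy_block_ctx { uint8_t key[TOY_BLOCK_SIZE]; };

static void
toy_block_encrypt(const void *ctx, unsigned length,
                  uint8_t *dst, const uint8_t *src)
{
  const struct toy_block_ctx *c = ctx;
  unsigned i;
  assert(length % TOY_BLOCK_SIZE == 0);
  for (i = 0; i < length; i++)
    dst[i] = src[i] ^ c->key[i % TOY_BLOCK_SIZE];
}

/* Toy stream "cipher": the context changes on every call, as with
   arcfour, so the context argument has to be non-const. */
struct toy_stream_ctx { uint8_t counter; };

static void
toy_stream_crypt(void *ctx, unsigned length,
                 uint8_t *dst, const uint8_t *src)
{
  struct toy_stream_ctx *c = ctx;
  unsigned i;
  for (i = 0; i < length; i++)
    dst[i] = src[i] ^ c->counter++;
}

/* A mode that promises not to modify the cipher context. */
static void
example_mode_encrypt(const void *cipher, const_crypt_func *f,
                     unsigned length, uint8_t *dst, const uint8_t *src)
{
  f(cipher, length, dst, src);
}
```

Passing toy_stream_crypt as the const_crypt_func argument would be rejected or at least diagnosed by the compiler, which is exactly the incompatibility with arcfour described above.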
nisse@lysator.liu.se (Niels Möller) writes:
Currently, arcfour is the only supported stream cipher (they seem to be out of fashion, are there any other stream ciphers in use? A5 maybe?)
There are newer stream ciphers, mostly due to eSTREAM:
http://www.ecrypt.eu.org/stream/
I don't understand the rationale for stream ciphers today though. The traditional argument for stream ciphers was speed but you get 10GBps+ with nice modes like AES-GCM. Further, you can build a secure key stream generator from any secure block cipher (see for example [1]).
Maybe the argument today is cost of hardware, but for that to be effective in the long run you have to beat Moore's law.
/Simon
[1] http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/kfb/kfb-s...
nisse@lysator.liu.se (Niels Möller) writes:
So the question is, should we decide that nettle_cipher is for block ciphers only (where the encrypt and decrypt functions don't change any state)? Fitting arcfour and block ciphers into the same abstraction doesn't make much sense anyway, since they should be used very differently. Then we can make the context argument const for nettle_crypt_func, but we'd also have to delete
extern const struct nettle_cipher nettle_arcfour128;
or replace it with something else, which is an incompatible interface change. As long as it's the only supported stream cipher, it doesn't make much sense to me to create a new general stream cipher construction.
Breaking API compatibility is painful. Other libraries like libgcrypt also try to use the same interface for both stream ciphers and block ciphers. However, in my experience this makes things difficult at a higher level -- the distinction percolates up because stream ciphers don't (for example) have a block length, so you have to say they have a block length of 1 byte (or 1 bit if you support bit-lengths), which can cause problems if you want to do MAC or other processing on a block-by-block basis (MAC:ing each byte is not a good idea).
So I would support making stream ciphers a different beast than block ciphers as far as the API goes, unless the API change is too painful.
If we had used an object-oriented language, there could be a super-class "cipher" and two sub-classes "block cipher" and "stream cipher". Then some functions could take any "cipher" and some (like GCM) could take any "block cipher". Fortunately we aren't using OO here though. :-)
/Simon
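Even without OO, plain C can express "this function takes block ciphers only" by giving the two kinds of cipher distinct descriptor types. The sketch below is hypothetical (the struct and function names are not nettle's, and the toy cipher is not a real cipher): a block-by-block routine accepts only the block cipher descriptor, so passing a stream cipher descriptor draws a compiler diagnostic instead of a runtime check on block_size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Descriptor for block ciphers: block_size is always nonzero and the
   encrypt function does not modify the context. */
struct example_block_cipher {
  const char *name;
  unsigned block_size;
  void (*encrypt)(const void *ctx, unsigned length,
                  uint8_t *dst, const uint8_t *src);
};

/* Separate descriptor for stream ciphers: no block size at all, and
   the context is modified by every call. */
struct example_stream_cipher {
  const char *name;
  void (*crypt)(void *ctx, unsigned length,
                uint8_t *dst, const uint8_t *src);
};

/* Toy block "cipher": inverts every byte. NOT a real cipher. */
static void
toy_encrypt(const void *ctx, unsigned length,
            uint8_t *dst, const uint8_t *src)
{
  unsigned i;
  (void) ctx;
  for (i = 0; i < length; i++)
    dst[i] = src[i] ^ 0xff;
}

static const struct example_block_cipher toy_block =
  { "toy", 16, toy_encrypt };

/* Processes the input one block at a time and returns the number of
   blocks. Only a block cipher descriptor is accepted. */
static unsigned
example_encrypt_blocks(const struct example_block_cipher *cipher,
                       const void *ctx, unsigned length,
                       uint8_t *dst, const uint8_t *src)
{
  unsigned blocks = 0;
  assert(length % cipher->block_size == 0);
  for (; length > 0; length -= cipher->block_size)
    {
      cipher->encrypt(ctx, cipher->block_size, dst, src);
      dst += cipher->block_size;
      src += cipher->block_size;
      blocks++;
    }
  return blocks;
}
```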