RFC 4648 (https://tools.ietf.org/html/rfc4648) standardizes two Base-64 alphabets. Nettle currently supports only the traditional base-64 alphabet from section 4.
New protocol definitions and extensions, particularly in the HTTP area, increasingly use the URL-safe alphabet instead of the classical Base-64 alphabet.
The attached patch implements a proposed API/ABI extension adding support for RFC 4648 section 5 "Base 64 Encoding with URL and Filename Safe Alphabet"
External code simply calls the init() function for the alphabet it needs to encode or decode with. The library internally uses the context to select which lookup table to use for later base64 function calls.
The base64_encode_raw() and base64_encode_group() functions which do not use contexts are left untouched for now.
Amos Jeffries Treehouse Networks Ltd.
Amos Jeffries squid3@treenet.co.nz writes:
The attached patch implements a proposed API/ABI extension adding support for RFC 4648 section 5 "Base 64 Encoding with URL and Filename Safe Alphabet"
Thanks. It makes sense to me to add support for this in one way or the other.
External code simply calls the init() function for the alphabet it needs to encode or decode with. The library internally uses the context to select which lookup table to use for later base64 function calls.
Passing the alphabet to the init function and recording it in the context is nice in some ways, but it breaks the ABI (since the size of the context struct is part of the ABI). Now, maybe we have to break the ABI for 3.1 anyway, to introduce versioned symbols. I don't yet know for sure if an soname change is necessary or not.
The alternative to storing this in the context, is to introduce some general encoding/decoding functions which support arbitrary alphabets and take the needed table as an additional argument. Then base64_* and base64url_* would be simple wrapper functions passing the right table.
The base64_encode_raw() and base64_encode_group() functions which do not use contexts are left untouched for now.
Both functions are undocumented, so we can consider changing them if needed. As for encode_group, that seems to be used only in the openpgp code, which is not really in a working state. So we can probably ignore that one.
But base64_encode_raw is used by the main encoding function, base64_encode_update. So to support other alphabets one either needs to pass some kind of alphabet argument to base64_encode_raw, or introduce a new function with such an argument.
Does anyone know if applications are using base64_encode_raw, despite its status as undocumented?
+/* which alphabet to use */
+#define BASE64_ALPHABET 0
+#define BASE64URL_ALPHABET 1
I think I'd prefer to avoid an enumeration of available alphabets. And in the API you sketch, applications will never use these constants, right?
/* Base64 encoding */
 /* Maximum length of output for base64_encode_update. NOTE: Doesn't

@@ -73,11 +79,17 @@
 struct base64_encode_ctx
 {
   unsigned word;     /* Leftover bits */
   unsigned bits;     /* Number of bits, always 0, 2, or 4. */
+  unsigned alphabet; /* which alphabet to use for encoding */
 };
Assuming we go with adding stuff to the context struct, I'd prefer to add a pointer directly to the alphabet (and for decoding, a pointer directly to the lookup table). So we don't need any switch on an alphabet enum.
diff --git a/base64-decode.c b/base64-decode.c
index f622baa..fbaf54f 100644
--- a/base64-decode.c
+++ b/base64-decode.c
@@ -43,7 +43,7 @@
 #define TABLE_END -3

 static const signed char
-decode_table[0x100] =
+default_decode_table[0x100] =
 {
   /* White space is HT, VT, FF, CR, LF and SPC */
   -1, -1, -1, -1, -1, -1, -1, -1, -1, -2, -2, -2, -2, -2, -1, -1,
@@ -64,10 +64,40 @@
   -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
 };
+static const signed char
+urlextended_decode_table[0x100] =
+{
+  /* White space is HT, VT, FF, CR, LF and SPC */
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -2, -2, -2, -2, -2, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -2, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1,
+  52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -3, -1, -1,
+  -1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
+  15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, 63,
+  -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
+  41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
I think the base64url decoding functions, and their table, should be put in their own object file. Maybe static linking is obscure these days, but it's desirable that if one uses only the standard base64 alphabet, one shouldn't get dependencies causing the linker to drag in tables for other alphabets.
 void
 base64_decode_init(struct base64_decode_ctx *ctx)
 {
   ctx->word = ctx->bits = ctx->padding = 0;
+  ctx->alphabet = BASE64_ALPHABET;
+}
+
+void
+base64url_decode_init(struct base64_decode_ctx *ctx)
+{
+  ctx->word = ctx->bits = ctx->padding = 0;
+  ctx->alphabet = BASE64URL_ALPHABET;
 }
And then the assignment of ctx->alphabet would be
ctx->table = base64_decode_table;
and
ctx->table = base64url_decode_table;
in the respective init functions.
@@ -76,8 +106,11 @@
 int
 base64_decode_single(struct base64_decode_ctx *ctx,
                      uint8_t src)
 {
   int data;

-  data = decode_table[src];
+  if (ctx->alphabet == BASE64URL_ALPHABET)
+    data = urlextended_decode_table[src];
+  else
+    data = default_decode_table[src];
And this piece of code would then simplify to
data = ctx->table[src];
@@ -155,6 +171,7 @@ base64_encode_single(struct base64_encode_ctx *ctx,
   unsigned done = 0;
   unsigned word = ctx->word << 8 | src;
   unsigned bits = ctx->bits + 8;
+  const uint8_t *encode_table = (ctx->alphabet == BASE64URL_ALPHABET ? urlextended_encode_table : default_encode_table);

   while (bits >= 6)
     {
I think this works, but only because the ENCODE macro, which refers to the name "encode_table", now gets a local variable instead of the global table in the old code. At a casual look, the local variable appears unused. I think it would be easier to understand if this dependency were made more explicit.
Finally, are there other alphabets in use? You mentioned, earlier, the alphabet used by crypt(3) (where is that documented???), and IIRC, SRP files also use some unusual alphabet for base64.
Maybe we should make it reasonably easy to let applications supply custom alphabets? This should be almost trivial for encoding, while for decoding, we would either need to document the format of the lookup table, or provide a function to generate a decoding table for a given alphabet.
Best regards, /Niels
On 12/12/2014 01:49 PM, nisse@lysator.liu.se (Niels Möller) wrote:
Now, maybe we have to break the ABI for 3.1 anyway, to introduce versioned symbols. I don't yet know for sure if an soname change is necessary or not.
My experience is that tools built against a library with unversioned symbols can work fine when linked against a library with versioned symbols, but the dynamic linker produces warnings. This came up recently for libgpg-error, which added symbol versioning:
https://bugs.debian.org/771100 https://bugs.debian.org/765430
In short, I don't think an SONAME bump is functionally necessary just for the introduction of versioned symbols, though the lack of an SONAME bump may introduce some cosmetic changes for people who don't rebuild against the newer version.
Does anyone know if applications are using base64_encode_raw, despite its status as undocumented?
In the Debian project, it looks like there are code examples from aria2 that use base64_encode_raw, and it's also used in rtmpdump's librtmp:
https://codesearch.debian.net/results/base64_encode_raw/page_0
We can probably get the aria2 examples fixed, but librtmp seems more problematic.
--dkg
Daniel Kahn Gillmor dkg@fifthhorseman.net writes:
In short, I don't think an SONAME bump is functionally necessary just for the introduction of versioned symbols, though the lack of an SONAME bump may introduce some cosmetic changes for people who don't rebuild against the newer version.
And in this case, I think the cosmetic problem is less important, since I'd expect most users to upgrade from 2.7.1 to 3.1, and never install 3.0. But I guess, for the same reason, another soname change should also be relatively painless. Do you agree?
Hmm. So if we conclude that it really would make a better interface with extended base64 contexts, we could do that and bump the soname.
(And API changes are a different matter, I think it is important to not break existing source code using base64. So the idea is that code using base64_init and friends should only need a recompile to work with nettle-3.1).
Does anyone know if applications are using base64_encode_raw, despite its status as undocumented?
In the Debian project, it looks like there are code examples from aria2 that use base64_encode_raw, and it's also used in rtmpdump's librtmp:
Thanks for looking this up. Seems like we should keep (and document?) base64_encode_raw unchanged. And we then have to come up with a new name for a base64_encode_raw_with_alphabet function.
And doing a similar search for base64_encode_group luckily gives no matches at all, outside of nettle.
Regards, /Niels
On 13/12/2014 7:49 a.m., Niels Möller wrote:
Amos Jeffries squid3@treenet.co.nz writes:
The attached patch implements a proposed API/ABI extension adding support for RFC 4648 section 5 "Base 64 Encoding with URL and Filename Safe Alphabet"
Thanks. It makes sense to me to add support for this in one way or the other.
External code simply calls the init() function for the alphabet it needs to encode or decode with. The library internally uses the context to select which lookup table to use for later base64 function calls.
Passing the alphabet to the init function and recording it in the context is nice in some ways, but it breaks the ABI (since the size of the context struct is part of the ABI). Now, maybe we have to break the ABI for 3.1 anyway, to introduce versioned symbols. I don't yet know for sure if an soname change is necessary or not.
The alternative to storing this in the context, is to introduce some general encoding/decoding functions which support arbitrary alphabets and take the needed table as an additional argument. Then base64_* and base64url_* would be simple wrapper functions passing the right table.
The base64_encode_raw() and base64_encode_group() functions which do not use contexts are left untouched for now.
Both functions are undocumented, so we can consider changing them if needed. As for encode_group, that seems to be used only in the openpgp code, which is not really in a working state. So we can probably ignore that one.
But base64_encode_raw is used by the main encoding function, base64_encode_update. So to support other alphabets one either needs to pass some kind of alphabet argument to base64_encode_raw, or introduce a new function with such an argument.
Does anyone know if applications are using base64_encode_raw, despite its status as undocumented?
+/* which alphabet to use */
+#define BASE64_ALPHABET 0
+#define BASE64URL_ALPHABET 1
I think I'd prefer to avoid an enumeration of available alphabets. And in the API you sketch, applications will never use these constants, right?
Yes.
/* Base64 encoding */
 /* Maximum length of output for base64_encode_update. NOTE: Doesn't

@@ -73,11 +79,17 @@
 struct base64_encode_ctx
 {
   unsigned word;     /* Leftover bits */
   unsigned bits;     /* Number of bits, always 0, 2, or 4. */
+  unsigned alphabet; /* which alphabet to use for encoding */
 };
Assuming we go with adding stuff to the context struct, I'd prefer to add a pointer directly to the alphabet (and for decoding, a pointer directly to the lookup table). So we don't need any switch on an alphabet enum.
diff --git a/base64-decode.c b/base64-decode.c
index f622baa..fbaf54f 100644
--- a/base64-decode.c
+++ b/base64-decode.c
@@ -43,7 +43,7 @@
 #define TABLE_END -3

 static const signed char
-decode_table[0x100] =
+default_decode_table[0x100] =
 {
   /* White space is HT, VT, FF, CR, LF and SPC */
   -1, -1, -1, -1, -1, -1, -1, -1, -1, -2, -2, -2, -2, -2, -1, -1,
@@ -64,10 +64,40 @@
   -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
 };
+static const signed char
+urlextended_decode_table[0x100] =
+{
+  /* White space is HT, VT, FF, CR, LF and SPC */
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -2, -2, -2, -2, -2, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -2, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1,
+  52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -3, -1, -1,
+  -1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
+  15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, 63,
+  -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
+  41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
I think the base64url decoding functions, and their table, should be put in their own object file. Maybe static linking is obscure these days, but it's desirable that if one uses only the standard base64 alphabet, one shouldn't get dependencies causing the linker to drag in tables for other alphabets.
Are you sure you want that?
Except for the init function and the table, the remainder of the code is all shared. Separating the tables from the code means we would end up with four extra files just holding lookup tables, and we would have to move the table symbols/pointers into a .h.
 void
 base64_decode_init(struct base64_decode_ctx *ctx)
 {
   ctx->word = ctx->bits = ctx->padding = 0;
+  ctx->alphabet = BASE64_ALPHABET;
+}
+
+void
+base64url_decode_init(struct base64_decode_ctx *ctx)
+{
+  ctx->word = ctx->bits = ctx->padding = 0;
+  ctx->alphabet = BASE64URL_ALPHABET;
 }
And then the assignment of ctx->alphabet would be
ctx->table = base64_decode_table;
and
ctx->table = base64url_decode_table;
in the respective init functions.
@@ -76,8 +106,11 @@
 int
 base64_decode_single(struct base64_decode_ctx *ctx,
                      uint8_t src)
 {
   int data;

-  data = decode_table[src];
+  if (ctx->alphabet == BASE64URL_ALPHABET)
+    data = urlextended_decode_table[src];
+  else
+    data = default_decode_table[src];
And this piece of code would then simplify to
data = ctx->table[src];
@@ -155,6 +171,7 @@ base64_encode_single(struct base64_encode_ctx *ctx,
   unsigned done = 0;
   unsigned word = ctx->word << 8 | src;
   unsigned bits = ctx->bits + 8;
+  const uint8_t *encode_table = (ctx->alphabet == BASE64URL_ALPHABET ? urlextended_encode_table : default_encode_table);

   while (bits >= 6)
     {
I think this works, but only because the ENCODE macro, which refers to the name "encode_table", now gets a local variable instead of the global table in the old code. At a casual look, the local variable appears unused. I think it would be easier to understand if this dependency were made more explicit.
Would you like that done as code documentation, by manually unrolling ENCODE, or by replacing it with a static function?
Finally, are there other alphabets in use? You mentioned, earlier, the alphabet used by crypt(3) (where is that documented???), and IIRC, SRP files also use some unusual alphabet for base64.
Maybe we should make it reasonably easy to let applications supply custom alphabets? This should be almost trivial for encoding, while for decoding, we would either need to document the format of the lookup table, or provide a function to generate a decoding table for a given alphabet.
When the alphabet pointer is stored in the context the caller is able to change that however and whenever they wish. For my first submission I thought that was slightly too dangerous to propose.
But if you are happy to do it, then it resolves both the existence of the enumeration and the ability for callers to use custom alphabets.
Amos
RFC 4648 (https://tools.ietf.org/html/rfc4648) standardizes two Base-64 alphabets. Nettle currently supports only the traditional base-64 alphabet from section 4.
New protocol definitions and extensions, particularly in the HTTP area, increasingly use the URL-safe alphabet instead of the classical Base-64 alphabet.
The attached patch implements a proposed API/ABI extension adding support for RFC 4648 section 5 "Base 64 Encoding with URL and Filename Safe Alphabet"
For the standardized alphabets, external code simply calls the init() function for the alphabet it needs to encode or decode with. The library internally uses the context to select which lookup table to use for later base64 function calls.
For custom or non-standard alphabets, a pointer to the alphabet lookup table is included in the encode/decode contexts. External code can memset() a context to empty and provide the alphabet lookup table pointer.
Amos Jeffries Treehouse Networks Ltd.
PS, I have also removed the dead code wrapped in "#if 0" rather than updating it.
On 29/01/2015 5:51 p.m., Amos Jeffries wrote:
RFC 4648 (https://tools.ietf.org/html/rfc4648) standardizes two Base-64 alphabets. Nettle currently supports only the traditional base-64 alphabet from section 4.
New protocol definitions and extensions, particularly in the HTTP area, increasingly use the URL-safe alphabet instead of the classical Base-64 alphabet.
The attached patch implements a proposed API/ABI extension adding support for RFC 4648 section 5 "Base 64 Encoding with URL and Filename Safe Alphabet"
For the standardized alphabets, external code simply calls the init() function for the alphabet it needs to encode or decode with. The library internally uses the context to select which lookup table to use for later base64 function calls.
For custom or non-standard alphabets, a pointer to the alphabet lookup table is included in the encode/decode contexts. External code can memset() a context to empty and provide the alphabet lookup table pointer.
Amos Jeffries Treehouse Networks Ltd.
PS, I have also removed the dead code wrapped in "#if 0" rather than updating it.
Meh. Nix that earlier patch; here is an updated version without the stupid syntax errors.
This passes the base64 unit tests, but there are no specific tests for the difference in lookup tables (and no test vectors in the RFC either). If you want, I can look at adding some.
Amos
Amos Jeffries squid3@treenet.co.nz writes:
The attached patch implements a proposed API/ABI extension adding support for RFC 4648 section 5 "Base 64 Encoding with URL and Filename Safe Alphabet"
Thanks. I have some comments below. Also, testcases are essential (and I suspect that the definition of base64_encode_group is not really accepted by the compiler).
For the standardized alphabets external code simply calls the init() function relevant to the alphabet it is needing to encode/decode with. The library internally uses the context to select which lookup table to use for later base64 function calls.
Or we could introduce a generalized init function, if that need arises.
-#define ENCODE(x) (encode_table[0x3F & (x)])
+#define ENCODE(x) (ctx->alphabet[0x3F & (x)])
I think
#define ENCODE(alphabet, x) ((alphabet)[0x3F & (x)])
is better, not relying on "ctx" variable being defined where the macro is used.
 void
 base64_encode_raw(uint8_t *dst, size_t length, const uint8_t *src)
 {
+  struct base64_encode_ctx ctx;
+  base64_encode_init(&ctx);
+  _base64_encode_raw(&cxt, dst, length, src);
+}
I think it would be better to let _base64_encode_raw take the alphabet as an argument. base64_encode_raw would then be simply
void
base64_encode_raw(uint8_t *dst, size_t length, const uint8_t *src)
{
  _base64_encode_raw(default_encode_table, dst, length, src);
}
 void
 base64_encode_group(uint8_t *dst, uint32_t group)
 {
+  struct base64_encode ctx;
+  base64_encode_init(&ctx);
   *dst++ = ENCODE(group >> 18);
   *dst++ = ENCODE(group >> 12);
   *dst++ = ENCODE(group >> 6);
@@ -144,6 +111,7 @@
 void
And similarly here, I think it would be better to not use any ctx struct, and instead do
*dst++ = ENCODE(default_encode_table, group >> 18);
*dst++ = ENCODE(default_encode_table, group >> 12);
*dst++ = ENCODE(default_encode_table, group >> 6);
(And this function is of questionable utility...)
Regards, /Niels
On 29/01/2015 9:08 p.m., Niels Möller wrote:
Amos Jeffries writes:
The attached patch implements a proposed API/ABI extension adding support for RFC 4648 section 5 "Base 64 Encoding with URL and Filename Safe Alphabet"
Thanks. I have some comments below. Also, testcases are essential (and I suspect that the definition of base64_encode_group is not really accepted by the compiler).
I have added a fuzz tester that runs a set of randomly generated strings past the default encoder and decoder, then the extended one.
I am using rand() to generate 1020 input octets each loop. Then, for each alphabet: encode, checking that the output length matches the encoded string length; decode, checking for success; then compare that the round-trip output matches the original input. If you have a better random generator you would like to use, please let me know where to find a code example or docs for it.
I've tested the unit test itself in a few ways:
* eyeball comparison of several base64 vs base64url encoded strings to verify they match modulo the different alphabet characters.
* eyeball check to ensure the fuzz does include the alphabet-unique characters fairly often.
* inverting test assertions to verify the checks are asserting on the right boundary conditions.
* growing the fuzz length and count from small to large values, checking for consistent pass/fail.
* seeding rand() with different fixed values, and leaving the default seed value.
The test and the coders seem to be working fine, except that when I increase the *count* of fuzz test runs it asserts, regular as clockwork, on the 950th loop:
base64-test: base64-encode.c:96: _base64_encode_raw: Assertion `in == src' failed.
For the standardized alphabets external code simply calls the init() function relevant to the alphabet it is needing to encode/decode with. The library internally uses the context to select which lookup table to use for later base64 function calls.
Or we could introduce a generalized init function, if that need arises.
-#define ENCODE(x) (encode_table[0x3F & (x)])
+#define ENCODE(x) (ctx->alphabet[0x3F & (x)])
I think
#define ENCODE(alphabet, x) ((alphabet)[0x3F & (x)])
is better, not relying on "ctx" variable being defined where the macro is used.
 void
 base64_encode_raw(uint8_t *dst, size_t length, const uint8_t *src)
 {
+  struct base64_encode_ctx ctx;
+  base64_encode_init(&ctx);
+  _base64_encode_raw(&cxt, dst, length, src);
+}
I think it would be better to let _base64_encode_raw take the alphabet as an argument. base64_encode_raw would then be simply
void
base64_encode_raw(uint8_t *dst, size_t length, const uint8_t *src)
{
  _base64_encode_raw(default_encode_table, dst, length, src);
}
 void
 base64_encode_group(uint8_t *dst, uint32_t group)
 {
+  struct base64_encode ctx;
+  base64_encode_init(&ctx);
   *dst++ = ENCODE(group >> 18);
   *dst++ = ENCODE(group >> 12);
   *dst++ = ENCODE(group >> 6);
Okay, done all that.
@@ -144,6 +111,7 @@
 void
And similarly here, I think it would be better to not use any ctx struct, and instead do
*dst++ = ENCODE(default_encode_table, group >> 18);
*dst++ = ENCODE(default_encode_table, group >> 12);
*dst++ = ENCODE(default_encode_table, group >> 6);
(And this function is of questionable utility...)
Do you wish me to remove base64_encode_group entirely?
Cheers Amos
Amos Jeffries squid3@treenet.co.nz writes:
I have added a fuzz tester that runs a set of randomly generated strings past the default encoder and decoder, then the extended one.
Nice. It would be good to have a couple of short manual examples as well. A pity that RFC 4648 doesn't include any test vectors for base64url. (Or am I missing something? Simon?)
If you have a better random generator you would like to use, please let me know where to find a code example or docs for it.
I'd suggest using the knuth_lfib generator in nettle. And maybe you already do that, but it's good to randomize the length of the data as well.
The test and the coders seem to be working fine, except that when I increase the *count* of fuzz test runs it asserts, regular as clockwork, on the 950th loop:
base64-test: base64-encode.c:96: _base64_encode_raw: Assertion `in == src' failed.
This needs investigation. Have you run the tests under valgrind? ("make check EMULATOR='$(VALGRIND)'")?
Do you wish me to remove base64_encode_group entirely?
No, not at this time (it's used in one place, in the unfinished openpgp code).
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
Amos Jeffries squid3@treenet.co.nz writes:
I have added a fuzz tester that runs a set of randomly generated strings past the default encoder and decoder, then the extended one.
Nice. It would be good to have a couple of short manual examples as well. A pity that RFC 4648 doesn't include any test vectors for base64url. (Or am I missing something? Simon?)
No, there aren't any -- you can s/+/-/g and s,/,_,g the base64 test vectors. Btw, if you produce more complete test vectors, I'd be interested in adding them to some future update of that document.
/Simon
On 30/01/2015 1:27 a.m., Niels Möller wrote:
Amos Jeffries writes:
I have added a fuzz tester that runs a set of randomly generated strings past the default encoder and decoder, then the extended one.
Nice. It would be good to have a couple of short manual examples as well. A pity that RFC 4648 doesn't include any test vectors for base64url. (Or am I missing something? Simon?)
If you have a better random generator you would like to use, please let me know where to find a code example or docs for it.
I'd suggest using the knuth_lfib generator in nettle. And maybe you already do that, but it's good to randomize the length of the data as well.
Aha. Thank you. I missed that in the docs. (Kudos to the person who designed that function set; it's probably the loveliest I've seen in a long while.)
Yes, I was randomizing the length, which actually turned out to be the assert problem: rand()%N-4 [vs. rand()%(N-4)] will eventually become negative and break the stack. The consistency in the face of changing seed values has also confirmed my aversion to staying with rand() for the final version.
It would seem worthwhile to add some additional compiler flags, such as stack protectors, specifically for the unit tests, to catch other mistakes like mine.
The test and the coders seem to be working fine, except that when I increase the *count* of fuzz test runs it asserts, regular as clockwork, on the 950th loop:
base64-test: base64-encode.c:96: _base64_encode_raw: Assertion `in == src' failed.
This needs investigation. Have you run the tests under valgrind? ("make check EMULATOR='$(VALGRIND)'")?
Do you wish me to remove base64_encode_group entirely?
No, not at this time (it's used in one place, in the unfinished openpgp code).
Regards, /Niels
Okay. v3 patch about to follow.
Amos
RFC 4648 (https://tools.ietf.org/html/rfc4648) standardizes two Base-64 alphabets. Nettle currently supports only the traditional base-64 alphabet from section 4.
New protocol definitions and extensions, particularly in the HTTP area, increasingly use the URL-safe alphabet instead of the classical Base-64 alphabet.
The attached patch implements a proposed API/ABI extension adding support for RFC 4648 section 5 "Base 64 Encoding with URL and Filename Safe Alphabet"
For the standardized alphabets, external code simply calls the init() function for the alphabet it needs to encode or decode with. The library internally uses the context to select which lookup table to use for later base64 function calls.
For custom or non-standard alphabets, a pointer to the alphabet lookup table is included in the encode/decode contexts. External code can memset() a context to empty and provide the alphabet lookup table pointer.
Additionally, this patch adds a simple fuzz unit test for both the Base-64 and Base-64 URL extended alphabet encoders and decoders.
Amos Jeffries Treehouse Networks Ltd.
PS, I have also removed the dead code wrapped in "#if 0" rather than updating it.
Amos Jeffries squid3@treenet.co.nz writes:
The attached patch implements a proposed API/ABI extension adding support for RFC 4648 section 5 "Base 64 Encoding with URL and Filename Safe Alphabet"
Committed now, with some edits. Thanks!
I also added a
struct nettle_armor nettle_base64_url;
Looking at the base64_{en,de}code_ctx, maybe one could micro-optimize them while at it, like changing
struct base64_decode_ctx
{
  unsigned word;
  unsigned bits;
  unsigned padding;
  const signed char *table;
};
(typically 16 bytes on 32-bit machines, 24 (incl. alignment padding) on 64-bit machines) to
struct base64_decode_ctx
{
  const signed char *table;
  unsigned short word;
  unsigned char bits;
  unsigned char padding;
};
which would reduce the size to 8 and 16 bytes, respectively. At least, I think it's prettier to put the large pointer first in the struct.
Regards, /Niels
On 11/02/2015 10:24 a.m., Niels Möller wrote:
Amos Jeffries writes:
The attached patch implements a proposed API/ABI extension adding support for RFC 4648 section 5 "Base 64 Encoding with URL and Filename Safe Alphabet"
Committed now, with some edits. Thanks!
I also added a
struct nettle_armor nettle_base64_url;
Looking at the base64_{en,de}code_ctx, maybe one could micro-optimize them while at it, like changing
struct base64_decode_ctx
{
  unsigned word;
  unsigned bits;
  unsigned padding;
  const signed char *table;
};
(typically 16 bytes on 32-bit machines, 24 (incl. alignment padding) on 64-bit machines) to
struct base64_decode_ctx
{
  const signed char *table;
  unsigned short word;
  unsigned char bits;
  unsigned char padding;
};
which would reduce the size to 8 and 16 bytes, respectively. At least, I think it's prettier to put the large pointer first in the struct.
Agreed. I will leave that commit to you, though, since you seem to already have the patch above.
Amos
Amos Jeffries squid3@treenet.co.nz writes:
I think the base64url decoding functions, and their table, should be put in their own object file. Maybe static linking is obscure these days, but it's desirable that if one uses only the standard base64 alphabet, one shouldn't get dependencies causing the linker to drag in tables for other alphabets.
Are you sure you want that?
Except for the init function and the table, the remainder of the code is all shared. Separating the tables from the code means we would end up with four extra files just holding lookup tables, and we would have to move the table symbols/pointers into a .h.
The idea is that the tables should be statically defined in the file that uses them. (I think you did it like that in the updated patch.)
I think this works, but only because the ENCODE macro, which refers to the name "encode_table", now gets a local variable instead of the global table in the old code. At a casual look, the local variable appears unused. I think it would be easier to understand if this dependency were made more explicit.
Would you like that done as code documentation, manually unrolling ENCODE, or replacing it with a static function?
I think a simple and clear way is to add the table as an argument to the ENCODE macro.
Regards, /Niels