Michael Weiser <michael.weiser@gmx.de> writes:
Sorry for the delay; I've been on vacation.
No problem. If you can test and debug ARM big-endian, that's appreciated.
We still have the ARM BE CI ready to go. Is it maybe time to activate it on GitLab? I've put it in an MR for reference (https://git.lysator.liu.se/nettle/nettle/-/merge_requests/8), but I can also submit it via the list once we've decided where to host the container images for good. I'd still vote for GnuTLS's build-images, as for the other images.
If the gnutls people are willing to host it, that would be nice. Do you think that can happen soon? Otherwise, I'd be happy to merge as is.
I think Nikos wrote a while back that he's less active in gnutls, so I'm not sure who we'd need to coordinate with. (And I haven't followed all the details of how you generate the buildroot images.)
BTW, I've noticed that the Debian qemu-user package does include qemu-armeb, but there's still no packaged armeb cross compiler, as far as I'm aware.
master indeed fails:
https://gitlab.com/michaelweiser/nettle/-/jobs/648334928
    libnettle: cpu features: arch:6,neon
    libnettle: enabling armv6 code.
    libnettle: enabling neon code.
    Assert failed: testutils.c:831: MEMEQ(length, data, ciphertext->data)
    qemu: uncaught target signal 6 (Aborted) - core dumped
    Aborted (core dumped)
    FAIL: chacha-poly1305
Is this about what you've expected? Then I'll look into it.
I expect anything calling the new functions _chacha_3core and _salsa20_2core to fail. The easiest way to debug and fix them is to run the test cases salsa20-test and chacha-test. The new functions are exercised by test_chacha_core and test_salsa20_core, which I recently added to those tests.
Those tests have the advantage that they set the input to 0, 1, 2, ..., 15 (except that one counter word is set to 0xffffffff, to test carry propagation), so it should be fairly easy to follow the permutations at the top of the functions. At least, I found that very helpful when debugging the most recent NEON and x86 code.
_chacha_3core interleaves three blocks, with 4 separate state registers for each block, so the big-endian fixes should be very similar to what you did for _chacha_core (which I believe still works). _salsa20_2core, on the other hand, uses a somewhat different register allocation, with each register holding corresponding words from two input blocks.
Any other branches I should try?
The new code has been pushed to master, so that's the most relevant branch.
Regards, /Niels