On Wed, Nov 25, 2020 at 10:15 AM Niels Möller nisse@lysator.liu.se wrote:
Maamoun TK maamoun.tk@googlemail.com writes:
Let's leave that as is, then. Do you want to make another pull request with only the fixes for register usage?
Sure. I updated the pull request.
I was thinking of something similar to how the unaligned input is handled in arm/v6/sha1-compress.asm. And then, to handle leftovers at the end, one would need to compare leftover size with the alignment related address bits, to decide whether or not to load one more word. But perhaps only worth the effort if there's a performance advantage in avoiding unaligned loads also in the main loop.
Yes, it makes sense to avoid unaligned loads in the main loop by checking the low-order bits of the address, but I still can't imagine it would be simpler in this case. Stack buffers are allocated very often over the lifetime of a process, and I think it's OK to use one for this purpose.
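For reference, here is a rough C model of the aligned-load scheme being discussed. It is only a sketch, not the code from arm/v6/sha1-compress.asm or from the pull request. The names load_aligned and copy_with_aligned_loads are hypothetical, a little-endian host and 4-byte words are assumed, and it reads the whole aligned words that contain the first and last input bytes (fine in assembly, but not strictly portable C if those bytes lie outside the caller's object).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: one 32-bit load from a 4-byte-aligned address. */
static uint32_t load_aligned(const uint8_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Copy len bytes from src (any alignment) to dst, touching the source
   only with aligned 4-byte loads. Assumes a little-endian host. */
static void copy_with_aligned_loads(uint8_t *dst, const uint8_t *src, size_t len)
{
    unsigned off = (unsigned)((uintptr_t)src & 3); /* low-order address bits */
    const uint8_t *p = src - off;                  /* aligned base address */
    size_t nfull = len / 4, r = len % 4, i;
    uint32_t w0, w1, out;

    if (len == 0)
        return;

    if (off == 0) {
        /* Already aligned: plain word loop, then one partial word. */
        for (i = 0; i < nfull; i++) {
            out = load_aligned(p + 4 * i);
            memcpy(dst + 4 * i, &out, 4);
        }
        if (r > 0) {
            out = load_aligned(p + 4 * nfull);
            memcpy(dst + 4 * nfull, &out, r);
        }
        return;
    }

    /* Main loop: combine the carried word with the next aligned word. */
    w0 = load_aligned(p);
    for (i = 0; i < nfull; i++) {
        w1 = load_aligned(p + 4 * (i + 1));
        out = (w0 >> (8 * off)) | (w1 << (32 - 8 * off));
        memcpy(dst + 4 * i, &out, 4);
        w0 = w1;
    }

    /* Leftovers: the carried word still holds 4 - off unread bytes, so
       one more load is needed only if the leftover is larger than that. */
    if (r > 0) {
        out = w0 >> (8 * off);
        if (r > 4 - off) {
            w1 = load_aligned(p + 4 * (nfull + 1));
            out |= w1 << (32 - 8 * off);
        }
        memcpy(dst + 4 * nfull, &out, r);
    }
}
```

The `r > 4 - off` test is the comparison of leftover size against the alignment bits mentioned above. The stack-buffer alternative replaces all of this with a copy into an aligned temporary, which is why it looks simpler to me.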
regards, Mamone