On Sun, May 9, 2021 at 9:49 PM Niels Möller nisse@lysator.liu.se wrote:
This seems to confirm that cbc encrypt is the operation that gains the most from assembly for the combined operation. That aes decrypt can also gain a factor of two in performance, does that mean that both aes-cbc and memxor run at a speed limited by memory bandwidth? And then the gain comes from one less pass loading and storing data from memory?
I can't think of another reason.
What unit is "cbp"?
Yes, Cycles per byte. I spelled it wrong in the last message.
If it's cycles per byte, 0.77 cycles/byte for memxor (the cost of "Basic AES-Accelerator with memxor" minus the cost of "CBC-Accelerator") sounds unexpectedly slow compared to, e.g., x86_64, where I get 0.08 cycles per byte (regardless of alignment), or 0.64 cycles per 64-bit word.
I'm calculating cycles per byte as follows: Frequency / (Buf_size / Elapsed_time), with units of Hz, bytes, and seconds respectively. Measuring cycles per byte for memxor on z15, I got:

  2.8 cpb for the C implementation
  0.9 cpb for the optimized memxor

If my calculation is correct, then accessing memory on z/Architecture processors is quite expensive compared to other architectures.
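For reference, the cycles-per-byte formula above can be written as a small C helper. This is only a sketch of the arithmetic, not the code used for the measurements; the function name cycles_per_byte and its parameter names are made up for illustration, and the numbers in the comment are arbitrary round values, not benchmark results.

```c
#include <stddef.h>

/* Cycles per byte = clock frequency / throughput,
   where throughput = bytes processed / elapsed time.

   freq_hz    clock frequency in Hz
   buf_size   number of bytes processed
   elapsed_s  wall-clock time for processing them, in seconds

   Example with arbitrary values: at 1e9 Hz, processing 1000 bytes
   in 1e-6 seconds is a throughput of 1e9 bytes/s, i.e. about
   1 cycle per byte. */
double
cycles_per_byte(double freq_hz, size_t buf_size, double elapsed_s)
{
  double bytes_per_second = (double) buf_size / elapsed_s;
  return freq_hz / bytes_per_second;
}
```

Note that the result is only as accurate as the frequency passed in: if the CPU is not actually running at the assumed clock during the benchmark, the figure will be off by the same ratio.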
regards, Mamone