On Mon, Jul 20, 2020 at 8:41 PM Niels Möller nisse@lysator.liu.se wrote:
Latency less than one cycle sounds wrong. Usually, simple ALU
instructions like xor have a latency of exactly one cycle (i.e., when an instruction starts executing, with all inputs available, the result is available to dependent instructions exactly one cycle later), while deeply pipelined instructions, e.g., multiplication, can have a latency of several cycles but still a throughput of one or a few instructions per cycle.
I had the same concern. I measured the clock time from the start of an instruction's execution until the start of the next dependent instruction. I'm sure about the latency numbers, but I'm not sure how to convert them into cycle counts.
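One way such a conversion could be done, shown here only as a minimal sketch: time a long chain of dependent instructions, divide by the chain length, and multiply by the core clock frequency. The function name, its parameters, and the numbers in the usage note are illustrative assumptions, not measurements from this thread, and the result is only meaningful if the clock frequency is the one the core actually ran at during the measurement.

```python
def latency_in_cycles(elapsed_ns, chain_length, clock_ghz):
    """Estimate per-instruction latency in cycles.

    elapsed_ns   -- measured time for the whole dependent chain, in nanoseconds
    chain_length -- number of dependent instructions in the chain
    clock_ghz    -- core clock during the measurement, in GHz (assumed known)
    """
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    # Time spent per dependent instruction, in nanoseconds.
    ns_per_instruction = elapsed_ns / chain_length
    # Nanoseconds times GHz gives cycles (1 GHz = 1 cycle per nanosecond).
    return ns_per_instruction * clock_ghz
```

For example, a hypothetical chain of 4000 dependent instructions that takes 1000 ns on a 4 GHz core would give `latency_in_cycles(1000.0, 4000, 4.0)`, i.e. one cycle per instruction. A result well below one cycle would suggest that the instructions in the chain were not truly dependent or that the assumed clock frequency is off.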
I take it "P8" in the path is for power 8? Are the crypto extensions
always available for power 8? If not, directory should be named differently.
Yes, it stands for POWER8, which is the minimum processor that supports the crypto extensions, so tying the crypto extensions to POWER8 is fine.
To get going, I've merged this and the machine.m4 patch to a development
branch. I'd like to do things stepwise, first do the minimal configure changes to get AES working (and maybe with default on, to get it exercised by the .gitlab-ci machinery), then add ghash and fat builds (not sure in which order). I wanted to also merge the README patch right away, but that failed due to line breaks in the email.
Great, I will resend the README file without the incompatible line breaks.
BTW, about fat tests, I'm considering adding a make target "check-fat"
which will run make check with some different settings of NETTLE_FAT_OVERRIDE (platform specific, and determined by configure).
I can help implement this feature if you give me more details on how you would like to approach it.
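The idea behind a "check-fat" target can be sketched as a loop that runs `make check` once per NETTLE_FAT_OVERRIDE setting. This is only an illustration of the control flow, not the proposed Makefile rule: the helper name `run_check_fat` and the override values in the usage note are hypothetical, and the real list would be platform specific and determined by configure.

```python
import os
import subprocess


def run_check_fat(overrides, runner=subprocess.call):
    """Run `make check` once per override value.

    overrides -- iterable of NETTLE_FAT_OVERRIDE settings to exercise
                 (hypothetical here; configure would supply the real list)
    runner    -- callable taking (command, env=...) and returning an exit
                 status; defaults to subprocess.call
    Returns the list of override values for which the check failed.
    """
    failed = []
    for value in overrides:
        # Copy the current environment and set the override for this run.
        env = dict(os.environ, NETTLE_FAT_OVERRIDE=value)
        if runner(["make", "check"], env=env) != 0:
            failed.append(value)
    return failed
```

Calling `run_check_fat(["setting_a", "setting_b"])` from the build directory would run the test suite twice, once per placeholder setting, and return the settings that failed. Passing a different `runner` makes it possible to try the loop without invoking make.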
Regards, Mamone