Hi all,
On Sat, Feb 22, 2020 at 07:43:18PM +0100, Michael Weiser wrote:
> - Eliminate use of rev in the armbe code.
... I've been looking at the revs and they now strike me as taking the
easy way out anyway. They work around the implicit LE order in which
Updated code is now at
https://git.lysator.liu.se/michaelweiser/nettle/-/tree/arm-memxor-generic
and inline below for comments.
It now compiles and runs the testsuite fine on my native armv7veb when
configured with:
CFLAGS="-march=armv6" LDFLAGS="-march=armv6" \
../configure --disable-documentation \
--host=armv6b-unknown-linux-gnueabihf
[...]
Assembly files: arm/neon arm/v6 arm
and:
CFLAGS="-march=armv5te" LDFLAGS="-march=armv5te -Wl,--be8" \
../configure --disable-documentation \
--host=armv5b-unknown-linux-gnueabihf
[...]
Assembly files: arm
LDFLAGS "-Wl,--be8" is necessary for armv5teb to work on my system
because the system is BE8. The gcc linker driver defaults to BE8 when
run with -march=armv6 but not with -march=armv5, which causes the
resulting binaries to be BE32 and segfault or bus-error in
ld-linux.so.3 on startup. For a (likely wrong) explanation of BE8 vs.
BE32 see
https://lists.lysator.liu.se/pipermail/nettle-bugs/2018/007059.html.
A quick check can be done with file:
$ echo "int main(void) {}" > t.c
$ gcc -march=armv5te -o t t.c
$ ./t
Segmentation fault
$ file t
t: ELF 32-bit MSB shared object, ARM, EABI5 version 1 (SYSV),
dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux
3.2.0, not stripped
$ gcc -march=armv5te -Wl,--be8 -o t t.c
$ ./t
$ file t
t: ELF 32-bit MSB shared object, ARM, EABI5 BE8 version 1 (SYSV),
dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux
3.2.0, not stripped
The qemu environment is still churning through the compilation as I
write this. The previous assembler error, for reference:
$ make -j4
[...]
/usr/bin/m4 ../asm.m4 machine.m4 config.m4 memxor.asm >memxor.s
/usr/bin/m4 ../asm.m4 machine.m4 config.m4 memxor3.asm >memxor3.s
gcc -I. -DHAVE_CONFIG_H -march=armv5te -ggdb3 -Wall -W
-Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes
-Wpointer-arith -Wbad-function-cast -Wnested-externs -fpic -MT memxor.o
-MD -MP -MF memxor.o.d -c memxor.s
gcc -I. -DHAVE_CONFIG_H -march=armv5te -ggdb3 -Wall -W
-Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes
-Wpointer-arith -Wbad-function-cast -Wnested-externs -fpic -MT memxor3.o
-MD -MP -MF memxor3.o.d -c memxor3.s
memxor.s: memxor3.s: Assembler messages:
memxor3.s:146: Error: selected processor does not support `rev r4,r4' in ARM mode
Assembler messages:
memxor3.s:256: Error: selected processor does not support `rev r4,r4' in ARM mode
memxor.s:126: Error: selected processor does not support `rev r3,r3' in ARM mode
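To illustrate the change in the patch below: the two leftover
byte-store strategies can be sketched in C (hypothetical function
names, standalone illustration, not nettle code). A 32-bit register
holds the last few bytes of the result, and they must be written out
with strb one at a time, in memory order. On little-endian the next
byte in memory order is the low byte of the register, so a plain
right shift (lsr #8) brings each following byte down; on big-endian
it is the high byte, so a rotate (ror #24) moves it into the low byte
while preserving the remaining bytes for later iterations:

```c
#include <stdint.h>

/* LE: store low byte, then shift the next byte down. */
static void store_leftover_le(uint8_t *dst, uint32_t w, unsigned n)
{
  while (n--)
    {
      *dst++ = (uint8_t) w;  /* strb: store low byte */
      w >>= 8;               /* lsr #8: next byte, no need to preserve */
    }
}

/* BE: rotate the uppermost byte down first, keeping the rest intact. */
static void store_leftover_be(uint8_t *dst, uint32_t w, unsigned n)
{
  while (n--)
    {
      w = (w >> 24) | (w << 8); /* ror #24: top byte down, rest preserved */
      *dst++ = (uint8_t) w;     /* strb: store low byte */
    }
}
```

Both variants emit the same byte sequence for the same memory image:
the bytes 11 22 33 44 load as 0x44332211 on LE and 0x11223344 on BE,
and both loops then store 11, 22, 33 for n = 3, which is why the rev
can go away entirely.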
--
Thanks,
Michael
From 3e2118d41472842c368bb5bb56d71023b861b59d Mon Sep 17 00:00:00 2001
From: Michael Weiser <michael.weiser@gmx.de>
Date: Sun, 23 Feb 2020 15:22:51 +0100
Subject: [PATCH] arm: Fix memxor for non-armv6+ big-endian systems
The ARM assembly adjustments for big-endian systems contained an
armv6+-only instruction (rev) in the generic arm memxor code. Replace
it with an actual conversion of the leftover byte-store routines for
big-endian systems. This also provides a slight optimisation by
removing the additional instruction, as well as increased symmetry
between the little- and big-endian implementations.
Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
---
arm/memxor.asm | 12 ++++++------
arm/memxor3.asm | 27 ++++++++++++++-------------
2 files changed, 20 insertions(+), 19 deletions(-)
diff --git a/arm/memxor.asm b/arm/memxor.asm
index 239a4034..b802e95c 100644
--- a/arm/memxor.asm
+++ b/arm/memxor.asm
@@ -138,24 +138,24 @@ PROLOGUE(nettle_memxor)
adds N, #8
beq .Lmemxor_odd_done
- C We have TNC/8 left-over bytes in r4, high end
+ C We have TNC/8 left-over bytes in r4, (since working upwards) low
+ C end on LE and high end on BE
S0ADJ r4, CNT
ldr r3, [DST]
eor r3, r4
- C memxor_leftover does an LSB store
- C so we need to reverse if actually BE
-IF_BE(< rev r3, r3>)
-
pop {r4,r5,r6}
C Store bytes, one by one.
.Lmemxor_leftover:
+ C bring uppermost byte down for saving while preserving lower ones
+IF_BE(< ror r3, #24>)
strb r3, [DST], #+1
subs N, #1
beq .Lmemxor_done
subs TNC, #8
- lsr r3, #8
+ C bring down next byte, no need to preserve
+IF_LE(< lsr r3, #8>)
bne .Lmemxor_leftover
b .Lmemxor_bytes
.Lmemxor_odd_done:
diff --git a/arm/memxor3.asm b/arm/memxor3.asm
index 69598e1c..76b8aae6 100644
--- a/arm/memxor3.asm
+++ b/arm/memxor3.asm
@@ -159,21 +159,21 @@ PROLOGUE(nettle_memxor3)
adds N, #8
beq .Lmemxor3_done
- C Leftover bytes in r4, low end
+ C Leftover bytes in r4, (since working downwards) in high end on LE and
+ C low end on BE
ldr r5, [AP, #-4]
eor r4, r5, r4, S1ADJ ATNC
- C leftover does an LSB store
- C so we need to reverse if actually BE
-IF_BE(< rev r4, r4>)
-
.Lmemxor3_au_leftover:
C Store a byte at a time
- ror r4, #24
+ C bring uppermost byte down for saving while preserving lower ones
+IF_LE(< ror r4, #24>)
strb r4, [DST, #-1]!
subs N, #1
beq .Lmemxor3_done
subs ACNT, #8
+ C bring down next byte, no need to preserve
+IF_BE(< lsr r4, #8>)
sub AP, #1
bne .Lmemxor3_au_leftover
b .Lmemxor3_bytes
@@ -273,18 +273,19 @@ IF_BE(< rev r4, r4>)
adds N, #8
beq .Lmemxor3_done
- C leftover does an LSB store
- C so we need to reverse if actually BE
-IF_BE(< rev r4, r4>)
-
- C Leftover bytes in a4, low end
- ror r4, ACNT
+ C Leftover bytes in r4, (since working downwards) in high end on LE and
+ C low end on BE after preparatory alignment correction
+IF_LE(< ror r4, ACNT>)
+IF_BE(< ror r4, ATNC>)
.Lmemxor3_uu_leftover:
- ror r4, #24
+ C bring uppermost byte down for saving while preserving lower ones
+IF_LE(< ror r4, #24>)
strb r4, [DST, #-1]!
subs N, #1
beq .Lmemxor3_done
subs ACNT, #8
+ C bring down next byte, no need to preserve
+IF_BE(< lsr r4, #8>)
bne .Lmemxor3_uu_leftover
b .Lmemxor3_bytes
--
2.25.0