In pikefarm, the devel2 host has gcc-6 and gcc-7 (Debian 7.2.0-19); by default it picks gcc-7 and then fails. Manually compiling with gcc-6 works; I think I already tried gcc-7 using -O1, and that also works.
Still trying to investigate why exactly. The only odd warning visible during compilation is this one:
Compiling backend.c
In file included from /var/src/roxen/81pike/src/array.h:10:0,
                 from /var/src/roxen/81pike/src/callback.h:10,
                 from /var/src/roxen/81pike/src/backend.h:11,
                 from /var/src/roxen/81pike/src/backend.cmod:13:
/var/src/roxen/81pike/src/backend.cmod: In function ‘f_Backend_get_stats’:
/var/src/roxen/81pike/src/svalue.h:753:13: warning: iteration 9223372036854775807 invokes undefined behavior [-Waggressive-loop-optimizations]
        while(num_--) \
              ~~~~^
/var/src/roxen/81pike/src/interpret.h:265:6: note: in expansion of macro ‘free_mixed_svalues’
      free_mixed_svalues(_sp_, x_); \
      ^~~~~~~~~~~~~~~~~~
/var/src/roxen/81pike/src/interpret.h:279:7: note: in expansion of macro ‘pop_n_elems’
       pop_n_elems (x2_ - 1); \
       ^~~~~~~~~~~
/var/src/roxen/81pike/src/interpret.h:283:39: note: in expansion of macro ‘stack_unlink’
 #define stack_pop_n_elems_keep_top(X) stack_unlink(X)
                                       ^~~~~~~~~~~~
/var/src/roxen/81pike/src/backend.cmod:947:1: note: in expansion of macro ‘stack_pop_n_elems_keep_top’
 stack_pop_n_elems_keep_top(args);
 ^ ~~~~~~~~~~~~~~~~~~~~~~
/var/src/roxen/81pike/src/svalue.h:753:8: note: within this loop
        while(num_--) \
        ^
/var/src/roxen/81pike/src/interpret.h:265:6: note: in expansion of macro ‘free_mixed_svalues’
      free_mixed_svalues(_sp_, x_); \
      ^~~~~~~~~~~~~~~~~~
/var/src/roxen/81pike/src/interpret.h:279:7: note: in expansion of macro ‘pop_n_elems’
       pop_n_elems (x2_ - 1); \
       ^~~~~~~~~~~
/var/src/roxen/81pike/src/interpret.h:283:39: note: in expansion of macro ‘stack_unlink’
 #define stack_pop_n_elems_keep_top(X) stack_unlink(X)
                                       ^~~~~~~~~~~~
/var/src/roxen/81pike/src/backend.cmod:947:1: note: in expansion of macro ‘stack_pop_n_elems_keep_top’
 stack_pop_n_elems_keep_top(args);
 ^ ~~~~~~~~~~~~~~~~~~~~~~
Stephen R. van den Berg wrote:
> In pikefarm, the devel2 host has gcc-6 and gcc-7 (Debian 7.2.0-19); by default it picks gcc-7 and then fails. Manually compiling with gcc-6 works; I think I already tried gcc-7 using -O1, and that also works.
> Still trying to investigate why exactly.
The culprit appears to be pike_memory.c.
Stephen R. van den Berg wrote:
> The culprit appears to be pike_memory.c.
More specifically, with gcc-7 the 16-byte special case in reorder() copies each element through the xmm0 register.
I'm not quite sure why this goes wrong. Most likely explanations:
a. Somehow the rest of the system does not expect xmm0 to be clobbered.
b. The gcc-7 compiler gets some of the address calculations wrong.
Looking at the assembly code, I decode the following:
gcc-6:
--- pike_memory6.s  2018-02-01 13:39:43.825976318 +0100
; case 16:
;   B16_T *from = (B16_T *) memory;
;   B16_T *to = (B16_T *) tmp;
-       leal    -1(%r13), %eax
-       leaq    4(,%rax,4), %rcx        ; nitems = 8 + nitems * 4 ??
-       xorl    %eax, %eax              ; e = 0
; for(e=0;e<nitems;e++)
; Register assignment %rbx : order
.L173:
-       movslq  (%rbx,%rax), %rdx       ; %rdx = order[e]
        salq    $4, %rdx                ; %rdx *= 16
; to[e]=from[order[e]];
; Register assignment %r14 : from
; Register assignment %r12 : to
-       movq    (%r14,%rdx), %rsi       ; %rsi = from[%rdx]
-       movq    8(%r14,%rdx), %rdi      ; %rdi = from[%rdx + 8]
-       movq    %rsi, (%r12,%rax,4)     ; to[e*4] = %rsi
-       movq    %rdi, 8(%r12,%rax,4)    ; to[e*4+8] = %rdi
-       addq    $4, %rax                ; e += 4
        cmpq    %rax, %rcx              ; while e != nitems
        jne     .L173
gcc-7:
+++ pike_memory7.s  2018-02-01 13:40:11.505289854 +0100
; case 16:
;   B16_T *from = (B16_T *) memory;
;   B16_T *to = (B16_T *) tmp;
+       leal    -1(%r13), %edx
+       salq    $4, %rdx                ; * 16
+       leaq    16(%rax,%rdx), %rcx     ; nitems = 16 + nitems * 4 ??
; for(e=0;e<nitems;e++)
; Register assignment %rbx : order
.L173:
+       movslq  (%rbx), %rdx            ; %rdx = *order
+       addq    $16, %rax               ; to += 16
+       addq    $4, %rbx                ; order += 4
        salq    $4, %rdx                ; %rdx *= 16
; to[e]=from[order[e]];
; Register assignment %r14 : from
; Register assignment %r12 : to
+       movdqa  (%r14,%rdx), %xmm0      ; %xmm0 = from[%rdx]
+       movaps  %xmm0, -16(%rax)        ; to[-16] = %xmm0
        cmpq    %rax, %rcx              ; while to != to_end
        jne     .L173
I'm not entirely sure about the instructions annotated with "??". My x86 assembly-fu is not perfect.
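Decoding the "??" lines as plain arithmetic, assuming %r13 holds nitems and %rax holds tmp on entry to the gcc-7 loop (neither is visible in the excerpt): in the gcc-6 code, %rcx = 4*(nitems-1) + 4 = 4*nitems, the byte length of the int order[] array. %rax is a byte offset into order[] that advances by 4 per element, which is why "to" is addressed as (%r12,%rax,4), i.e. to + 16*e. In the gcc-7 code, %rcx = %rax + 16*(nitems-1) + 16 = tmp + 16*nitems, an end pointer for to[]. Both loops therefore appear to compute the same addresses; the visible difference is the 16-byte movdqa/movaps pair. The C sketch below restates the gcc-7 loop shape next to the plain indexed loop. The function names are hypothetical (not Pike code), and elements are copied with memcpy so no alignment is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reference: the source loop to[e] = from[order[e]] over 16-byte elements. */
static void reorder16_indexed(unsigned char *to, const unsigned char *from,
                              const int *order, size_t nitems)
{
    for (size_t e = 0; e < nitems; e++)
        memcpy(to + 16 * e, from + 16 * (size_t)order[e], 16);
}

/* The same loop restructured the way the gcc-7 assembly walks it:
 * bump "to" and "order" directly and stop when "to" reaches an end pointer. */
static void reorder16_pointer_walk(unsigned char *to, const unsigned char *from,
                                   const int *order, size_t nitems)
{
    if (nitems == 0)
        return;                              /* the emitted loop body assumes nitems > 0 */
    /* leal -1(%r13),%edx; salq $4,%rdx; leaq 16(%rax,%rdx),%rcx */
    unsigned char *end = to + 16 * (nitems - 1) + 16;   /* == to + 16 * nitems */
    do {
        assert(*order >= 0);                 /* order[] holds element indices */
        size_t off = (size_t)*order * 16;    /* movslq (%rbx),%rdx; salq $4,%rdx */
        to += 16;                            /* addq $16,%rax */
        order++;                             /* addq $4,%rbx */
        memcpy(to - 16, from + off, 16);     /* the movdqa/movaps pair, minus any alignment demand */
    } while (to != end);                     /* cmpq %rax,%rcx; jne .L173 */
}
```

For any valid order[], both functions fill "to" identically; the pointer-walking form is only a restatement of what the compiler emitted, not a fix.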
Does anybody have any ideas?
Martin Nilsson (Coppermist) @ Pike (-) developers forum wrote:
> What is B16_T defined to?
On both gcc-6 and gcc-7, B16_T expands to __int128. Case in point:
    case 16:
      {
        __int128 *from=(__int128 *) memory;
        __int128 *to=(__int128 *) tmp;
        for(e=0;e<nitems;e++)
          to[e]=from[order[e]];
        break;
      }
Identical for both compilers.
Stephen R. van den Berg wrote:
> On both gcc-6 and gcc-7, B16_T expands to __int128. Case in point:
>     case 16:
>       {
>         __int128 *from=(__int128 *) memory;
>         __int128 *to=(__int128 *) tmp;
>         for(e=0;e<nitems;e++)
>           to[e]=from[order[e]];
>         break;
>       }
The plot thickens. If I disable __int128 and let it use struct b16_t_s { B8_T x,y; }; instead, everything stays the same except for the following change in reorder():
--- pike_memory7int128.s  2018-02-01 16:29:58.093322706 +0100
+++ pike_memory7struct.s  2018-02-01 16:30:15.302896717 +0100
@@ -64,8 +64,8 @@
        addq    $16, %rax
        addq    $4, %r15
        salq    $4, %rdx
-       movdqa  0(%r13,%rdx), %xmm0
-       movaps  %xmm0, -16(%rax)
+       movdqu  0(%r13,%rdx), %xmm0
+       movups  %xmm0, -16(%rax)
 .LVL175:
        cmpq    %rcx, %rax
        jne     .L172
But this time, everything works (gcc-7 -O2, both cases).
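A sketch of why the struct variant is treated differently, assuming B8_T is a 64-bit integer type (its definition is not shown in the thread): a struct of two 8-byte members only needs 8-byte alignment, so gcc cannot prove the elements are 16-byte aligned and has to use the unaligned movdqu/movups forms. On x86-64, __int128 itself is 16-byte aligned, which lets gcc pick movdqa/movaps. The function name is hypothetical; note that even this version formally requires 8-byte-aligned buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: B8_T is a 64-bit integer; Pike's actual typedef may differ. */
typedef uint64_t B8_T;

/* The fallback element type from the thread: 16 bytes, but only B8_T's alignment. */
struct b16_t_s { B8_T x, y; };
_Static_assert(sizeof(struct b16_t_s) == 16, "b16_t_s should be 16 bytes");

/* Hypothetical stand-alone version of reorder()'s 16-byte case over the
 * struct type: to[e] = from[order[e]].  The compiler may only assume
 * _Alignof(struct b16_t_s) for these pointers, so it cannot use the
 * aligned movdqa/movaps forms unless it can prove more. */
static void reorder16_struct(void *tmp, const void *memory,
                             const int *order, size_t nitems)
{
    /* The struct still has an alignment requirement, just a smaller one. */
    assert((uintptr_t)memory % _Alignof(struct b16_t_s) == 0);
    assert((uintptr_t)tmp % _Alignof(struct b16_t_s) == 0);

    const struct b16_t_s *from = memory;
    struct b16_t_s *to = tmp;
    for (size_t e = 0; e < nitems; e++)
        to[e] = from[order[e]];
}
```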
a=arithmetic, u=unsigned (as in movdqa vs. movdqu)? __int128 sounds like it would be a signed type. Does it work with "unsigned __int128"?
Ah, no, it's "aligned" and "unaligned". The cast is breaking the alignment rules of __int128 without telling the compiler.
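What the cast promises the compiler: on x86-64, gcc gives __int128 a 16-byte alignment, so after (__int128 *) memory it may assume every element address is a multiple of 16 and use movdqa/movaps, which fault on addresses that are not. A minimal sketch with hypothetical helper names (not Pike API): a checked cast that asserts the promise, and an alignment-agnostic load via memcpy, which only needs byte alignment. The test values assume a little-endian target such as x86-64.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if p is a multiple of "align" bytes. */
static bool is_aligned(const void *p, size_t align)
{
    return (uintptr_t)p % align == 0;
}

/* Hypothetical checked version of "(__int128 *) memory": the cast is only
 * valid when the address meets __int128's alignment requirement. */
static const __int128 *as_int128_ptr(const unsigned char *memory)
{
    assert(is_aligned(memory, alignof(__int128)));
    return (const __int128 *)memory;
}

/* Alignment-agnostic read of element idx from a packed __int128 array:
 * memcpy only requires byte alignment, so the compiler cannot assume more. */
static __int128 load_int128(const unsigned char *memory, size_t idx)
{
    __int128 v;
    memcpy(&v, memory + idx * sizeof v, sizeof v);
    return v;
}
```

With a 16-byte-aligned buffer, buf passes the check while buf + 8 would not; load_int128(buf + 8, 1) still reads the element safely.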
Stephen R. van den Berg wrote:
> Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
> > Ah, no, it's "aligned" and "unaligned". The cast is breaking the alignment rules of __int128 without telling the compiler.
> Yes, already figured that. Currently working on a workaround.
The workaround is in.
It is a bit of a kludge, though. The proper solution would involve checking the compiler's alignment expectations in configure, I guess.
pike-devel@lists.lysator.liu.se