The following test program leads to SIGSEGV in 'get_x86_features' while loading libnettle.so from nettle-3.1 built with --enable-fat:
--8<---------------cut here---------------start------------->8---
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int
main (int argc, char *argv[])
{
  void *handle;

  handle = dlopen ("libnettle.so", RTLD_NOW);
  if (!handle)
    {
      fprintf (stderr, "%s\n", dlerror ());
      exit (EXIT_FAILURE);
    }
  dlclose (handle);
  exit (EXIT_SUCCESS);
}
--8<---------------cut here---------------end--------------->8---
Here's the backtrace:
--8<---------------cut here---------------start------------->8---
#0  0x0000000000009006 in ?? ()
#1  0x00007ffff73f6a37 in get_x86_features (features=<synthetic pointer>) at fat-x86_64.c:94
#2  fat_init () at fat-x86_64.c:133
#3  0x00007ffff7412475 in nettle_memxor_resolve () at fat-x86_64.c:185
#4  0x00007ffff7de8c41 in elf_machine_rela (reloc=0x7ffff73f5170, reloc=0x7ffff73f5170, skip_ifunc=0, reloc_addr_arg=<optimized out>, version=<optimized out>, sym=0x7ffff73ee2d0, map=0x601050) at ../sysdeps/x86_64/dl-machine.h:286
#5  elf_dynamic_do_Rela (skip_ifunc=0, lazy=0, nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>, map=0x601050) at do-rel.h:137
#6  _dl_relocate_object (scope=<optimized out>, reloc_mode=reloc_mode@entry=0, consider_profiling=<optimized out>, consider_profiling@entry=0) at dl-reloc.c:264
#7  0x00007ffff7defb36 in dl_open_worker (a=a@entry=0x7fffffffcea8) at dl-open.c:418
#8  0x00007ffff7deb704 in _dl_catch_error (objname=objname@entry=0x7fffffffce98, errstring=errstring@entry=0x7fffffffcea0, mallocedp=mallocedp@entry=0x7fffffffce97, operate=operate@entry=0x7ffff7def800 <dl_open_worker>, args=args@entry=0x7fffffffcea8) at dl-error.c:187
#9  0x00007ffff7def2db in _dl_open (file=0x4009f4 "libnettle.so", mode=-2147483646, caller_dlopen=<optimized out>, nsid=-2, argc=1, argv=0x7fffffffd1f8, env=0x7fffffffd208) at dl-open.c:652
#10 0x00007ffff7bd9fab in dlopen_doit (a=a@entry=0x7fffffffd0c0) at dlopen.c:66
#11 0x00007ffff7deb704 in _dl_catch_error (objname=0x7ffff7ddc0f0 <last_result+16>, errstring=0x7ffff7ddc0f8 <last_result+24>, mallocedp=0x7ffff7ddc0e8 <last_result+8>, operate=0x7ffff7bd9f50 <dlopen_doit>, args=0x7fffffffd0c0) at dl-error.c:187
#12 0x00007ffff7bda55d in _dlerror_run (operate=operate@entry=0x7ffff7bd9f50 <dlopen_doit>, args=args@entry=0x7fffffffd0c0) at dlerror.c:163
#13 0x00007ffff7bda041 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#14 0x0000000000400924 in main ()
--8<---------------cut here---------------end--------------->8---
The SIGSEGV happens in the following call:
_nettle_cpuid (0, cpuid_data);
The problem appears to be that the PLT entry for '_nettle_cpuid' has not yet been initialized when 'fat_init' is called via 'nettle_memxor_resolve':
--8<---------------cut here---------------start------------->8---
mhw@jojen:~$ LD_LIBRARY_PATH=$HOME/.guix-profile/lib gdb ./test
GNU gdb (GDB) 7.9.1
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./test...(no debugging symbols found)...done.
(gdb) directory /home/mhw/nettle-3.1
Source directories searched: /home/mhw/nettle-3.1:$cdir:$cwd
(gdb) run
Starting program: /home/mhw/test

Program received signal SIGSEGV, Segmentation fault.
0x0000000000009006 in ?? ()
(gdb) up
#1  0x00007ffff73f6a37 in get_x86_features (features=<synthetic pointer>) at fat-x86_64.c:94
warning: Source file is more recent than executable.
94        _nettle_cpuid (0, cpuid_data);
(gdb) break 94
Breakpoint 1 at 0x7ffff73f6a1f: file fat-x86_64.c, line 94.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/mhw/test

Breakpoint 1, fat_init () at fat-x86_64.c:133
warning: Source file is more recent than executable.
133       get_x86_features (&features);
(gdb) si
0x00007ffff73f6a24 in get_x86_features (features=<synthetic pointer>) at fat-x86_64.c:94
94        _nettle_cpuid (0, cpuid_data);
(gdb)
96            features->vendor = X86_INTEL;
(gdb)
94        _nettle_cpuid (0, cpuid_data);
(gdb)
95        if (memcmp (cpuid_data + 1, "Genu" "ntel" "ineI", 12) == 0)
(gdb)
94        _nettle_cpuid (0, cpuid_data);
(gdb)
0x00007ffff73f6000 in _nettle_cpuid@plt () from /home/mhw/.guix-profile/lib/libnettle.so
(gdb) disassemble /m 0x00007ffff73f6000
Dump of assembler code for function _nettle_cpuid@plt:
=> 0x00007ffff73f6000 <+0>:     jmpq   *0x22c722(%rip)        # 0x7ffff7622728 _nettle_cpuid@got.plt
   0x00007ffff73f6006 <+6>:     pushq  $0x26
   0x00007ffff73f600b <+11>:    jmpq   0x7ffff73f5d90
End of assembler dump.
(gdb) disassemble /m 0x7ffff7622728
Dump of assembler code for function _nettle_cpuid@got.plt:
   0x00007ffff7622728 <+0>:     (bad)
   0x00007ffff7622729 <+1>:     nop
   0x00007ffff762272a <+2>:     add    %al,(%rax)
   0x00007ffff762272c <+4>:     add    %al,(%rax)
   0x00007ffff762272e <+6>:     add    %al,(%rax)
End of assembler dump.
(gdb) si
0x0000000000009006 in ?? ()
(gdb)
--8<---------------cut here---------------end--------------->8---
This is a problem for GNU Guix because 'gstreamer' loads its modules using the glib function 'g_module_open' (based on 'dlopen') and the 'gst-libav' module is linked with nettle. This breaks 'gst-libav' on x86_64.
Details
=======

System type: x86_64-unknown-linux-gnu

nettle-3.1 compiled with:
  gcc-4.8.4
  binutils-2.25
  glibc-2.21

Nettle configure flags:
  CONFIG_SHELL=/gnu/store/wxcgfy43r6lmxhm2m7xk1vsgyddhx3y0-bash-4.3.33/bin/bash
  SHELL=/gnu/store/wxcgfy43r6lmxhm2m7xk1vsgyddhx3y0-bash-4.3.33/bin/bash
  --prefix=/gnu/store/k0bhgy2l5kj1hp2vxx3ys42pn4kr599h-nettle-3.1
  --enable-fast-install
  --enable-fat
  LDFLAGS=-Wl,-rpath=/gnu/store/k0bhgy2l5kj1hp2vxx3ys42pn4kr599h-nettle-3.1/lib
Here's the build log for the nettle used in the above tests:
http://hydra.gnu.org/build/381502/log/raw
If needed, I can explain how to reproduce this exact environment using GNU Guix.
Mark
Mark H Weaver mhw@netris.org writes:
> The SIGSEGV happens in the following call:
>
>   _nettle_cpuid (0, cpuid_data);
>
> The problem appears to be that the PLT entry for '_nettle_cpuid' has not yet been initialized when 'fat_init' is called via 'nettle_memxor_resolve':
Sounds pretty bad... We really need to fix this one way or another. I'm not 100% sure I understand what's going on, but from your gdb session, I think I agree with your analysis.
I haven't seen any documentation explaining precisely what one can and cannot do in an ifunc resolver. Do you know?
Is RTLD_NOW part of the problem (i.e., does it work if you change the test program to use RTLD_LAZY and then call nettle_memxor)? I would expect calls from the resolver to work if RTLD_NOW either

1. resolved all normal (i.e., not ifunc) symbols first, before calling the ifunc resolvers, or

2. first initialized the plt entries in the same way as for RTLD_LAZY, and then replaced the entries by resolving one symbol at a time.
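The RTLD_LAZY experiment asked about above could be sketched roughly as follows. This is an untested illustration, not something taken from the thread: the helper name try_lazy_memxor is hypothetical, and the prototype assumed for nettle_memxor (void *, const void *, size_t) is an assumption that should be checked against memxor.h. Whether the call succeeds with a fat build is exactly the open question.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed prototype of nettle_memxor; verify against memxor.h.  */
typedef void *memxor_func (void *dst, const void *src, size_t n);

/* Hypothetical helper: load LIBNAME lazily, call nettle_memxor once,
   and return 0 if the call completed with the expected result,
   -1 otherwise.  */
int
try_lazy_memxor (const char *libname)
{
  unsigned char dst[4] = { 1, 2, 3, 4 };
  const unsigned char src[4] = { 1, 2, 3, 4 };
  memxor_func *f;
  void *handle;

  handle = dlopen (libname, RTLD_LAZY);
  if (!handle)
    {
      fprintf (stderr, "%s\n", dlerror ());
      return -1;
    }

  /* Looking up the symbol is what triggers the ifunc resolver here.  */
  f = (memxor_func *) dlsym (handle, "nettle_memxor");
  if (!f)
    {
      fprintf (stderr, "%s\n", dlerror ());
      dlclose (handle);
      return -1;
    }

  f (dst, src, sizeof dst);
  dlclose (handle);

  /* x XOR x is zero, so every byte of dst should now be 0.  */
  return (dst[0] | dst[1] | dst[2] | dst[3]) == 0 ? 0 : -1;
}
```

Calling try_lazy_memxor ("libnettle.so") from main, with the same LD_LIBRARY_PATH as in the gdb session, would show whether the lazy path avoids the crash. On older glibc the program needs to be linked with -ldl.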
Some things you could try:
* Undefine HAVE_LINK_IFUNC, falling back to the non-ifunc code.
* Declare _nettle_cpuid as having visibility hidden (then I think the call should not jump via the plt). Might need corresponding pseudo-ops also in x86_64/fat/cpuid.asm, I'm not sure.
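A minimal sketch of the hidden-visibility idea, assuming GCC-style attributes. The names example_cpuid and example_probe are hypothetical stand-ins (the real _nettle_cpuid is written in assembly), and the C body below is only a placeholder so that the sketch compiles on its own. With hidden visibility the symbol cannot be interposed from outside the shared object, so the call can be bound directly instead of going through a PLT entry. This only affects calls to functions defined inside the library itself.

```c
#include <stdint.h>

/* Hypothetical stand-in for _nettle_cpuid.  The attribute on the
   declaration marks the symbol as internal to the shared object.  */
__attribute__ ((visibility ("hidden")))
void example_cpuid (uint32_t input, uint32_t regs[4]);

/* Placeholder body; the real function would execute CPUID.  */
void
example_cpuid (uint32_t input, uint32_t regs[4])
{
  regs[0] = input;
  regs[1] = regs[2] = regs[3] = 0;
}

/* Exported caller: its call to example_cpuid needs no PLT entry
   when this file is compiled with -fPIC into a shared library.  */
uint32_t
example_probe (void)
{
  uint32_t regs[4];
  example_cpuid (0, regs);
  return regs[0];
}
```

For a function implemented in assembly, the matching step would presumably be a .hidden directive for the symbol in the .asm file, which is the "corresponding pseudo-ops" caveat above.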
We'd really need to ask some glibc guru about the ordering. To me, it seems like a bug if ifunc resolver functions can't call any other functions in the library.
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
> * Declare _nettle_cpuid as having visibility hidden (then I think the call should not jump via the plt). Might need corresponding pseudo-ops also in x86_64/fat/cpuid.asm, I'm not sure.
I tried this idea first, and that allowed the _nettle_cpuid call to succeed. However, it then crashed in the call to 'memcmp' on the next line, again due to an uninitialized PLT entry.
More later...
Mark
Mark H Weaver mhw@netris.org writes:
> I tried this idea first, and that allowed the _nettle_cpuid call to succeed. However, it then crashed in the call to 'memcmp' on the next line, again due to an uninitialized PLT entry.
If we can't even call libc functions, that makes things more difficult... I'll mail libc-help.
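One conceivable way to avoid the libc call for the vendor check is to compare the CPUID words as integers instead of calling memcmp. The sketch below is an assumption-laden illustration, not a proposed patch: it assumes cpuid_data is an array of four uint32_t in the layout implied by the memcmp call quoted above, and little-endian byte order as on x86_64. PACK4 and vendor_is_intel are hypothetical names.

```c
#include <stdint.h>

/* Pack four characters into a 32-bit word in little-endian order,
   which is how the vendor string bytes sit in each CPUID register.  */
#define PACK4(a, b, c, d)                              \
  ((uint32_t) (a) | ((uint32_t) (b) << 8)              \
   | ((uint32_t) (c) << 16) | ((uint32_t) (d) << 24))

/* Equivalent to
     memcmp (cpuid_data + 1, "Genu" "ntel" "ineI", 12) == 0
   but done with three integer comparisons, so no call into libc
   (and hence no PLT entry) is involved.  */
static int
vendor_is_intel (const uint32_t cpuid_data[4])
{
  return cpuid_data[1] == PACK4 ('G', 'e', 'n', 'u')
    && cpuid_data[2] == PACK4 ('n', 't', 'e', 'l')
    && cpuid_data[3] == PACK4 ('i', 'n', 'e', 'I');
}
```

This only removes the one memcmp call; any other libc function called from the resolver would need similar treatment.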
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
> Is RTLD_NOW part of the problem (i.e., does it work if you change the test program to use RTLD_LAZY and then call nettle_memxor)?
I haven't tried this, because in order to solve my original problem (gstreamer is unable to load the gst-libav plugin in GNU Guix on x86_64) this workaround would have to be applied to gstreamer to work around a problem in nettle. I'd rather fix the problem in nettle itself.
> * Undefine HAVE_LINK_IFUNC, falling back to the non-ifunc code.
This is what I ended up doing for GNU Guix, and it has solved all of the problems for me.
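For readers unfamiliar with what a non-ifunc fallback can look like, here is a minimal sketch of dispatch through a function pointer that is set up on the first call. It illustrates the general technique only and is not nettle's actual code; all names (example_memxor, memxor_vec, memxor_init, memxor_generic) are hypothetical. The point is that selection happens at first use, after the dynamic linker has finished relocating the library, so the selection code can call other functions safely.

```c
#include <stddef.h>

typedef void *memxor_func (void *dst, const void *src, size_t n);

/* Portable implementation; a fat build would have several variants.  */
static void *
memxor_generic (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  size_t i;

  for (i = 0; i < n; i++)
    d[i] ^= s[i];
  return dst;
}

static void *memxor_init (void *dst, const void *src, size_t n);

/* The pointer starts out at the init stub.  */
static memxor_func *memxor_vec = memxor_init;

/* Runs on the first call only: pick an implementation, store it in
   the pointer, then forward the call.  A real fat build would choose
   based on detected CPU features; this sketch always picks the
   generic version.  */
static void *
memxor_init (void *dst, const void *src, size_t n)
{
  memxor_vec = memxor_generic;
  return memxor_vec (dst, src, n);
}

/* Public entry point: one indirect call, no ifunc relocation.  */
void *
example_memxor (void *dst, const void *src, size_t n)
{
  return memxor_vec (dst, src, n);
}
```

The cost compared with ifunc is an extra indirect call on every invocation, which is the trade-off for not depending on resolver ordering in the dynamic linker.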
Thanks! Mark
nettle-bugs@lists.lysator.liu.se