The idea of the interface
  /* null-terminated list of digests implemented by this version of nettle */
  extern const struct nettle_hash * const nettle_hashes[];
was that the size of the array isn't part of the ABI; a new Nettle version should be able to extend it with more entries, without breaking the ABI.
However, I've recently learnt that it does break, in a subtle way. (See https://bugs.gentoo.org/show_bug.cgi?id=601512 and http://trofi.github.io/posts/195-dynamic-linking-ABI-is-hard.html).
The problematic case is a traditional non-PIC executable linking with a nettle shared library; in this post, I care exclusively about this case.
All symbol references in the executable are resolved at link time, and at load time it is mapped at a fixed address chosen by the linker (not 0; the GNU ld default for x86_64 is 0x400000), without any relocations.
Now, if the executable links with libnettle.so and contains a reference to the symbol nettle_hashes, the address where libnettle.so is going to be mapped isn't known at link time. So how can the executable refer to it without load-time relocation of its references? The linker solves that problem using the curious relocation type R_X86_64_COPY.
What the linker does is allocate space for a copy of the data in the BSS segment of the executable, and resolve all references to point to that copy. At the time libnettle.so is loaded, the list in the library is copied (after relocating it, but that's a minor complication in this context) to the space in the BSS. I imagine the dynamic linker also adjusts the pointers in libnettle.so's GOT to refer to the copy rather than the original.
Now the problem is that the allocation in the BSS segment, as well as the copying operation, are based on the size of the data object as recorded in the version of libnettle.so available at the time the executable was linked.
If the array is larger in the version of libnettle.so actually loaded, the copy operation truncates it, which is particularly bad when it's NULL-terminated: the terminator is among the entries that get cut off. So the array size, which was intended to not be part of the ABI, creeps into the ABI.
So what to do about this? We have to break the ABI, but I'd prefer if we keep the API unchanged. Some alternatives:
1. Define nettle_hashesp as a constant pointer to the current nettle_hashes list, and
#define nettle_hashes (*nettle_hashesp)
Then nettle_hashesp will still get a R_X86_64_COPY relocation, but now the size is always a single pointer, regardless of the array size. At load time, it will be set to point directly to the list in the data segment of the loaded libnettle.so.
2. Define a function get_nettle_hashes returning a pointer to the list, and
#define nettle_hashes (get_nettle_hashes())
In this case, the indirection is via a PLT entry in the executable.
3. Define the array with a size explicitly part of the ABI,
extern const struct nettle_hash * const nettle_hashes[17];
Add some reasonable number of reserved NULL entries at the end, and make an ABI break whenever we run out of reserved places and have to increase the size.
We also have other public data, e.g., individual nettle_hash structs, like
extern const struct nettle_hash nettle_sha256;
These will also get a R_X86_64_COPY relocation if referenced (and all internal references within libnettle.so will be relocated to the copy in the executable's BSS, I guess). But that's less of a problem, since the size and layout are already part of the ABI.
More problematic are the objects declared in ecc-curve.h; the size and layout of struct ecc_curve were intended to be an implementation detail, not part of the ABI, but that no longer holds when R_X86_64_COPY is involved.
Advice appreciated. I'd also like to hear if anyone on the list knows how these things work with Windows DLLs. I read at some point that exporting data (in contrast to functions) from a DLL is extra tricky, but I don't remember any details. If we make any changes to fix the exported-data issues with ELF, it would be good if we could ensure that we solve any related problems for libnettle.dll too.
I understand why R_X86_64_COPY is needed, but it works in a way that was pretty counter-intuitive to me. The effect is, more or less, that the library's data is *statically* linked into the executable. And then initialized at load time based on the contents of the loaded library. And we then get a mix of statically linked and dynamically linked parts which might originate in different versions of the library.
It would be prettier if we could force the executable to always access library data via the GOT (like it works for PIC code), and never use R_X86_64_COPY. But I guess that has to be known at code generation time, and it's likely too late to fix up at link time, which is when we know which external data objects are defined by some shared library, and which aren't. But if anyone knows how to fix the issue in this way, I'd be delighted.
Regards, /Niels