Subtle ABI problem with nettle_hashes

21 Dec 2016


The idea of the interface

  /* null-terminated list of digests implemented by this version of nettle */
  extern const struct nettle_hash * const nettle_hashes[];

was that the size of the array isn't part of the ABI; a new Nettle version
should be able to extend it with more entries, without breaking the ABI.
However, I've recently learnt that it does break, in a subtle way. (See
https://bugs.gentoo.org/show_bug.cgi?id=601512 and
http://trofi.github.io/posts/195-dynamic-linking-ABI-is-hard.html).

The problematic case is a traditional non-PIC executable linking with a
nettle shared library; in this post, I care exclusively about this case.
All symbol references in the executable are resolved at link time, and
at load time it is mapped at a fixed address (traditionally 0, I don't
know if current systems skip the first page), without any relocations.
Now, if the executable links with libnettle.so, and contains a reference
to the symbol nettle_hashes, the address where libnettle.so is going to
be mapped isn't known at link time. So how can the executable refer to it
without load-time relocation of its references? The linker solves that
problem using the curious relocation type R_X86_64_COPY.

What the linker does is that it allocates space for a copy of the data
in the BSS segment of the executable, and resolves all references to
point to that copy. At the time libnettle.so is loaded, the list in the
library is copied (after relocating it, but that's a minor complication
in this context) to the space in the BSS. I imagine the dynamic linker
also adjusts the pointers in libnettle.so's GOT table to refer to the
copy rather than the original.

Now the problem is that the allocation in the BSS segment, as well as
the copying operation, are based on the size of the data object as
recorded in the version of libnettle.so available at the time the
executable was linked.
If the array size is larger in the version of libnettle.so actually
loaded, the copy operation truncates it, which is particularly bad
when it's NULL-terminated. So the array size, which was intended to not
be part of the ABI, creeps into the ABI.
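
The truncation can be imitated in plain C without involving the dynamic
linker at all. The following is only a sketch of the effect, not of what
ld.so actually executes: struct demo_hash, LINK_TIME_ENTRIES and
demo_copy_is_terminated are made-up names, and the memcpy stands in for
the copy relocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct nettle_hash; only the name is needed here. */
struct demo_hash { const char *name; };

static const struct demo_hash demo_md5 = { "md5" };
static const struct demo_hash demo_sha1 = { "sha1" };
static const struct demo_hash demo_sha256 = { "sha256" };

/* Size of the list in the library version seen at link time:
   two digests plus the NULL terminator. The static linker reserves
   exactly this many pointers in the executable's BSS. */
#define LINK_TIME_ENTRIES 3

/* List in the (newer) library version found at load time:
   three digests plus the NULL terminator. */
static const struct demo_hash *const load_time_list[] = {
  &demo_md5, &demo_sha1, &demo_sha256, NULL
};

/* Imitates the copy relocation: copy only as many bytes as the
   link-time object had. Returns 1 if the copy still contains a NULL
   terminator, 0 if the terminator was cut off. */
int
demo_copy_is_terminated(const struct demo_hash **bss_copy)
{
  size_t i;

  /* The scenario of interest: the list grew after linking. */
  assert(sizeof(load_time_list)
         > LINK_TIME_ENTRIES * sizeof(load_time_list[0]));

  memcpy(bss_copy, load_time_list,
         LINK_TIME_ENTRIES * sizeof(load_time_list[0]));

  for (i = 0; i < LINK_TIME_ENTRIES; i++)
    if (bss_copy[i] == NULL)
      return 1;
  return 0;
}
```

Called with a buffer of LINK_TIME_ENTRIES pointers, the function returns
0: the third slot holds the new digest and the terminator is lost, so a
loop over the copy that stops at NULL would read past the end of the
buffer.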

So what to do about this? We have to break the ABI, but I'd prefer if we
keep the API unchanged. Some alternatives:

1. Define nettle_hashesp as a constant pointer to the current
   nettle_hashes list, and

     #define nettle_hashes (*nettle_hashesp)

   Then nettle_hashesp will still get a R_X86_64_COPY relocation, but now
   the size is always a single pointer, regardless of the array size. At
   load time, it will be set to point directly to the list in the
   data segment of the loaded libnettle.so.

2. Define a function get_nettle_hashes returning a pointer to the list,
   and

     #define nettle_hashes (get_nettle_hashes())

   In this case, the indirection is via a PLT entry in the executable.

3. Define the array with a size explicitly part of the ABI,

     extern const struct nettle_hash * const nettle_hashes[17];

   Add some reasonable number of reserved NULL entries at the end, and
   make an ABI break whenever we run out of reserved places and have to
   increase the size.
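
Alternatives 1 and 2 could look roughly as follows when library, header
and application are squeezed into one file. This is a sketch under
assumptions: the struct is a one-field stand-in, demo_list and demo_count
are made-up names, nettle_hashesp is assumed to be a pointer to the whole
array (so that dereferencing it yields something indexable), and the two
macros carry _V1/_V2 suffixes only so that both can coexist here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real struct nettle_hash, which has more fields. */
struct nettle_hash { const char *name; };

/* ---- Library side (would live in libnettle.so) ---- */

static const struct nettle_hash demo_sha1 = { "sha1" };
static const struct nettle_hash demo_sha256 = { "sha256" };

/* The list itself is no longer an exported symbol, so its size cannot
   be baked into an executable. */
static const struct nettle_hash *const demo_list[] = {
  &demo_sha1, &demo_sha256, NULL
};

/* Alternative 1: export one pointer-sized object referring to the list. */
const struct nettle_hash *const (*const nettle_hashesp)[] = &demo_list;

/* Alternative 2: export a function returning the list. */
const struct nettle_hash *const *
get_nettle_hashes(void)
{
  return demo_list;
}

/* ---- Header side ---- */

/* A real header would define nettle_hashes in one of these two ways. */
#define NETTLE_HASHES_V1 (*nettle_hashesp)
#define NETTLE_HASHES_V2 (get_nettle_hashes())

/* ---- Application side: the usual NULL-terminated iteration ---- */

size_t
demo_count(const struct nettle_hash *const *list)
{
  size_t n = 0;

  assert(list != NULL);
  while (list[n] != NULL)
    n++;
  return n;
}
```

Both demo_count(NETTLE_HASHES_V1) and demo_count(NETTLE_HASHES_V2) walk
the list inside the library, so application code that indexes
nettle_hashes or loops until NULL compiles unchanged. Code that takes
sizeof nettle_hashes would not, but with an array of unspecified size
that was never possible anyway.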

We also have other public data, e.g., individual nettle_hash structs,
like

  extern const struct nettle_hash nettle_sha256;

These will also get a R_X86_64_COPY relocation if referenced (and all
internal references within libnettle.so will be relocated to the copy in
the executable's BSS, I guess). But that's less of a problem, since the
size and layout is already part of the ABI.

More problematic are the objects declared in ecc-curve.h; the size and
layout of struct ecc_curve was intended to be an implementation detail,
not part of the ABI, but it does become part of the ABI when
R_X86_64_COPY is involved.
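
One direction that would keep the layout private, in the same spirit as
alternative 2 above, is to export only functions and leave the struct
incomplete in the public header. The sketch below is an assumption about
how that might look, not a description of Nettle's headers; the
accessor names demo_get_secp_256r1 and demo_curve_bit_size and the
single bit_size field are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Public header: only a forward declaration is visible ---- */

struct ecc_curve;

/* Hypothetical accessors; applications never see the struct layout. */
const struct ecc_curve *demo_get_secp_256r1(void);
unsigned demo_curve_bit_size(const struct ecc_curve *ecc);

/* ---- Library internals: free to change between versions ---- */

struct ecc_curve
{
  unsigned short bit_size;
  /* The real struct has many more fields; omitted in this sketch. */
};

/* Not exported, so no copy relocation can ever refer to it. */
static const struct ecc_curve demo_secp_256r1 = { 256 };

const struct ecc_curve *
demo_get_secp_256r1(void)
{
  return &demo_secp_256r1;
}

unsigned
demo_curve_bit_size(const struct ecc_curve *ecc)
{
  assert(ecc != NULL);
  return ecc->bit_size;
}
```

Since the executable only ever holds a pointer obtained through the PLT,
the object stays in the library's own data segment and its size is never
recorded in the executable.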

Advice appreciated. I'd also like to hear if anyone on the list knows
how these things work with Windows DLLs. I've read at some point that
exporting data (in contrast to functions) from a DLL is extra tricky,
but I don't remember any details. If we make any changes to fix the
exported-data issues with ELF, it would be good if we could ensure that
we solve any related problems for libnettle.dll too.

I understand why R_X86_64_COPY is needed, but it works in a way that was
pretty counter-intuitive to me. The effect is, more or less, that the
library's data is *statically* linked into the executable. And then
initialized at load time based on the contents of the loaded library.
And we then get a mix of statically linked and dynamically linked parts
which might originate in different versions of the library.

It would be prettier if we could force the executable to always access
library data via GOT (like it works for PIC code), and never use
R_X86_64_COPY. But I guess that has to be known at code generation time,
and it's likely too late to fix up at link time, which is when we know which
external data objects are defined by some shared library, and which
aren't. But if anyone knows how to fix the issue in this way, I'd be
delighted.

Regards,
/Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
