I just pushed a few changes to master that get the build working (on 64-bit Debian Linux, no other platforms tested). They're mostly small changes and I hope they won't break anything. This actually gets "make doc" happy again after a good while, so it would be really awesome if I haven't just broken something unrelated :)
There's another, more significant, change that happened in the past few months. It's not a simple fix. The default behaviour of Protocols.DNS.async_host_to_ip, which is used by Protocols.HTTP, changed from "only look up A records" to "look up AAAA and A records, and return the first AAAA if found, otherwise the first A". Unfortunately everything is built to assume that there's at most one IP address for that host name.
The ultimate solution would be to have an asynchronous "establish connection" call, which will do an async DNS lookup, then attempt to connect to each IP in turn until one succeeds. That would require some other changes, and I'm not sure how it would interact with HTTP keep-alive, for instance. As a lesser solution, would it be reasonable to have a method like Protocols.DNS.prefer_ipv4() to invert the return order of DNS records?
ChrisA
On Wed, Jun 9, 2021 at 3:48 PM Chris Angelico rosuav@gmail.com wrote:
There's another, more significant, change that happened in the past few months. It's not a simple fix. The default behaviour of Protocols.DNS.async_host_to_ip, which is used by Protocols.HTTP, changed from "only look up A records" to "look up AAAA and A records, and return the first AAAA if found, otherwise the first A". Unfortunately everything is built to assume that there's at most one IP address for that host name.
Sounds like a good enough reason to revert that change to me...
On Thu, Jun 10, 2021 at 1:41 AM Tobias S. Josefowitz t.josefowitz@gmail.com wrote:
On Wed, Jun 9, 2021 at 3:48 PM Chris Angelico rosuav@gmail.com wrote:
There's another, more significant, change that happened in the past few months. It's not a simple fix. The default behaviour of Protocols.DNS.async_host_to_ip, which is used by Protocols.HTTP, changed from "only look up A records" to "look up AAAA and A records, and return the first AAAA if found, otherwise the first A". Unfortunately everything is built to assume that there's at most one IP address for that host name.
Sounds like a good enough reason to revert that change to me...
Needn't revert completely. If it's decided that the preference should continue to be A, then it's just a matter of reversing the order of returned values - a simple matter of switching two numbers.
ChrisA
Chris Angelico wrote:
few months. It's not a simple fix. The default behaviour of Protocols.DNS.async_host_to_ip, which is used by Protocols.HTTP, changed from "only look up A records" to "look up AAAA and A records, and return the first AAAA if found, otherwise the first A".
In accordance to the rule that if a connection over IPv6 is possible, it should be preferred.
Unfortunately everything is built to assume that there's at most one IP address for that host name.
I'd say, the way forward should be changing the connecting functions to iterate through the IP addresses and find the first that actually connects.
Needn't revert completely. If it's decided that the preference should continue to be A, then it's just a matter of reversing the order of returned values - a simple matter of switching two numbers.
That would be an acceptable workaround until the connection code is fixed, I think.
On Wed, Jun 9, 2021 at 10:49 PM Stephen R. van den Berg srb@cuci.nl wrote:
I'd say, the way forward should be changing the connecting functions to iterate through the IP addresses and find the first that actually connects.
Needn't revert completely. If it's decided that the preference should continue to be A, then it's just a matter of reversing the order of returned values - a simple matter of switching two numbers.
That would be an acceptable workaround until the connection code is fixed, I think.
Maybe I am the one missing something here, but even if "the connection code" were to learn to try to establish connections to all IPv4+IPv6 addresses resolved for a hostname, it could not do that on top of Protocols.DNS.async_host_to_ip, as that returns a single result. Changing it to result in something other than a single result will break every single user of Protocols.DNS.async_host_to_ip.
As Protocols.DNS.async_host_to_ip is thus constrained to providing a single response, changing it to respond with an IPv6 IP preferred at this stage of IPv6 deployment is breaking more than it fixes. I would assume there are still much more IPv4-only than IPv6 hosts in the world, and the amount of IPv6-only hosts will be severely limited - at least some 6to4 translation should normally be available. Responding with IPv6 addresses first, esp. without applying any heuristics whatsoever to determine if a system might be IPv6-enabled, can thus only be seen as a severe regression.
Responding with an IPv6-address if no IPv4-address can be resolved might not be a regression, but is neither terribly consistent or elegant, so it might be best to not do that either.
All in all, I think the solution - beyond restoring useful and expected behaviour of Protocols.DNS.async_host_to_ip - would be to introduce a more suitable API - if necessary - and to start using that instead.
On Thu, Jun 10, 2021 at 8:21 AM Tobias S. Josefowitz t.josefowitz@gmail.com wrote:
On Wed, Jun 9, 2021 at 10:49 PM Stephen R. van den Berg srb@cuci.nl wrote:
I'd say, the way forward should be changing the connecting functions to iterate through the IP addresses and find the first that actually connects.
Needn't revert completely. If it's decided that the preference should continue to be A, then it's just a matter of reversing the order of returned values - a simple matter of switching two numbers.
That would be an acceptable workaround until the connection code is fixed, I think.
Maybe I am the one missing something here, but even if "the connection code" were to learn to try to establish connections to all IPv4+IPv6 addresses resolved for a hostname, it could not do that on top of Protocols.DNS.async_host_to_ip, as that returns a single result.
True, but if you call Protocols.HTTP.get_url(), the way that it does its DNS lookups is an implementation detail.
Changing it to result in something other than a single result will break every single user of Protocols.DNS.async_host_to_ip.
As Protocols.DNS.async_host_to_ip is thus constrained to providing a single response, changing it to respond with an IPv6 IP preferred at this stage of IPv6 deployment is breaking more than it fixes. I would assume there are still much more IPv4-only than IPv6 hosts in the world, and the amount of IPv6-only hosts will be severely limited - at least some 6to4 translation should normally be available. Responding with IPv6 addresses first, esp. without applying any heuristics whatsoever to determine if a system might be IPv6-enabled, can thus only be seen as a severe regression.
Responding with an IPv6-address if no IPv4-address can be resolved might not be a regression, but is neither terribly consistent or elegant, so it might be best to not do that either.
All in all, I think the solution - beyond restoring useful and expected behaviour of Protocols.DNS.async_host_to_ip - would be to introduce a more suitable API - if necessary - and to start using that instead.
Okay. Here's a proposal:
1) Have a Protocols.DNS.prefer_ipv6() that chooses whether IPv6 is preferred over IPv4 for simple lookups. Calling prefer_ipv6(1) will result in the current behaviour, calling prefer_ipv6(0) will return to previous behaviour. This just changes the order of responses, nothing else.
2) Create host_to_ips() which returns (or in the case of a promise, yields) an array with all of the results. By definition, host_to_ip() == host_to_ips()[0] or "".
3) Change Protocols.HTTP.Query() to use async_host_to_ips(). This will change its hostname_cache to carry arrays instead of strings - is that used externally? I found one reference in Protocols.HTTP.Session and one in Web.Crawler, but they're just synchronizing caches, and they don't care what's actually stored in it.
4) Possibly change Protocols.HTTP.dns_lookup to use Protocols.DNS for consistency?? Currently that one uses gethostbyname, so there's a notable behavioural difference between sync_request and thread_request (which use dns_lookup and gethostbyname) and async_request (which uses dns_lookup_async and Protocols.DNS). Otherwise, just tweak dns_lookup to store an array in the cache.
5) Add another phase to asynchronous HTTP connection after obtaining DNS results: it maintains an array of IPs and attempts to connect to them sequentially. The success callback will be the same, the failure callback will attempt the next IP.
I'm part-way inclined to start with #4, since there's a weird inconsistency there already. If you mix and match sync_request/thread_request and async_request, the behaviour may depend on which one populates the cache.
Is this worth working on?
ChrisA
Okay. Here's a proposal:
- Have a Protocols.DNS.prefer_ipv6() that chooses whether IPv6 is
preferred over IPv4 for simple lookups. Calling prefer_ipv6(1) will result in the current behaviour, calling prefer_ipv6(0) will return to previous behaviour. This just changes the order of responses, nothing else.
No, add new methods for new behavior. The old function continues to return IPv4 and the new one does IPv6+IPv4. Or have some new parameter to control preference on a per call basis. We don't want global states in modules.
On Thu, Jun 10, 2021 at 6:49 AM Stephen R. van den Berg srb@cuci.nl wrote:
Chris Angelico wrote:
few months. It's not a simple fix. The default behaviour of Protocols.DNS.async_host_to_ip, which is used by Protocols.HTTP, changed from "only look up A records" to "look up AAAA and A records, and return the first AAAA if found, otherwise the first A".
In accordance to the rule that if a connection over IPv6 is possible, it should be preferred.
Unfortunately everything is built to assume that there's at most one IP address for that host name.
I'd say, the way forward should be changing the connecting functions to iterate through the IP addresses and find the first that actually connects.
Branch: rosuav/http-multi-connect
New APIs in Protocols.DNS.async_client - host_to_ips in both callback and Promise variants.
Change of behaviour in Protocols.HTTP.Query and the high level functions that call on it (eg Protocols.HTTP.get_url_data and do_async_method): All forms will now look up the name with potentially multiple results, and will attempt connections on IPv6 before IPv4 (previously, synchronous requests would prefer IPv4 unless unavailable, and async would prefer IPv6 unless unavailable).
Not merged into master as yet, but I will be using this on my own system.
Code review would be appreciated.
ChrisA
On Fri, Jun 11, 2021 at 1:31 PM Chris Angelico rosuav@gmail.com wrote:
Branch: rosuav/http-multi-connect
New APIs in Protocols.DNS.async_client - host_to_ips in both callback and Promise variants.
Change of behaviour in Protocols.HTTP.Query and the high level functions that call on it (eg Protocols.HTTP.get_url_data and do_async_method): All forms will now look up the name with potentially multiple results, and will attempt connections on IPv6 before IPv4 (previously, synchronous requests would prefer IPv4 unless unavailable, and async would prefer IPv6 unless unavailable).
Not merged into master as yet, but I will be using this on my own system.
Code review would be appreciated.
This looks good to me, but we still need to undo the host_to_ip() changes of course. I also wonder if implementers of Happy Eyeballs[1] really are just too careful, or if we ultimately would want to implement Happy Eyeballs, too; using a "get me a connection to TARGET please" type API.
On Thu, Jun 17, 2021 at 6:57 AM Tobias S. Josefowitz t.josefowitz@gmail.com wrote:
On Fri, Jun 11, 2021 at 1:31 PM Chris Angelico rosuav@gmail.com wrote:
Branch: rosuav/http-multi-connect
New APIs in Protocols.DNS.async_client - host_to_ips in both callback and Promise variants.
Change of behaviour in Protocols.HTTP.Query and the high level functions that call on it (eg Protocols.HTTP.get_url_data and do_async_method): All forms will now look up the name with potentially multiple results, and will attempt connections on IPv6 before IPv4 (previously, synchronous requests would prefer IPv4 unless unavailable, and async would prefer IPv6 unless unavailable).
Not merged into master as yet, but I will be using this on my own system.
Code review would be appreciated.
This looks good to me, but we still need to undo the host_to_ip() changes of course. I also wonder if implementers of Happy Eyeballs[1] really are just too careful, or if we ultimately would want to implement Happy Eyeballs, too; using a "get me a connection to TARGET please" type API.
Should host_to_ip be put completely back how it was (IPv4 only), or should it return IPv4 if available, IPv6 else?
Happy Eyeballs, if implemented, should probably be an option - it's a tradeoff between faster connections and more load. Might be worth actually pushing that one down in the stack a bit, maybe a Stdio or Protocols function that establishes a connection and returns it?
ChrisA
On Wed, Jun 16, 2021 at 11:08 PM Chris Angelico rosuav@gmail.com wrote:
Should host_to_ip be put completely back how it was (IPv4 only), or should it return IPv4 if available, IPv6 else?
In my opinion returning IPv6 if no IPv4 can be found would neither be something that even new users (i.e. people calling the method after that behaviour would have been introduced) would expect, or for that matter, desire. I suppose the proposed utility in behaving like this is in the case where some code blindly passing on the result to e.g. connect() or similar would magically and/or accidentally simply keep working with IPv6-only hostnames. I would think that that is not worth the confusion, surprise and wholly unexpected system states (in user programs never expecting this to happen).
Happy Eyeballs, if implemented, should probably be an option - it's a tradeoff between faster connections and more load. Might be worth actually pushing that one down in the stack a bit, maybe a Stdio or Protocols function that establishes a connection and returns it?
I would really like that.
On Thu, Jun 17, 2021 at 7:49 AM Tobias S. Josefowitz t.josefowitz@gmail.com wrote:
On Wed, Jun 16, 2021 at 11:08 PM Chris Angelico rosuav@gmail.com wrote:
Should host_to_ip be put completely back how it was (IPv4 only), or should it return IPv4 if available, IPv6 else?
In my opinion returning IPv6 if no IPv4 can be found would neither be something that even new users (i.e. people calling the method after that behaviour would have been introduced) would expect, or for that matter, desire. I suppose the proposed utility in behaving like this is in the case where some code blindly passing on the result to e.g. connect() or similar would magically and/or accidentally simply keep working with IPv6-only hostnames. I would think that that is not worth the confusion, surprise and wholly unexpected system states (in user programs never expecting this to happen).
Fair enough. The change is a very simple one. My recommendation is: Use host_to_ips (in the plural) for most purposes.
I think this is ready for master branch now. Last call for comments or objections.
Happy Eyeballs, if implemented, should probably be an option - it's a tradeoff between faster connections and more load. Might be worth actually pushing that one down in the stack a bit, maybe a Stdio or Protocols function that establishes a connection and returns it?
I would really like that.
Shouldn't be too hard. I'll just do the asynchronous version, but if a threaded version is wanted, that might be of value too.
That'll be a separate project, but I'll see about creating Stdio.establish_connection(name_or_ips) to do this.
ChrisA
On Thu, Jun 17, 2021 at 8:29 AM Chris Angelico rosuav@gmail.com wrote:
On Thu, Jun 17, 2021 at 7:49 AM Tobias S. Josefowitz t.josefowitz@gmail.com wrote:
On Wed, Jun 16, 2021 at 11:08 PM Chris Angelico rosuav@gmail.com wrote:
Should host_to_ip be put completely back how it was (IPv4 only), or should it return IPv4 if available, IPv6 else?
In my opinion returning IPv6 if no IPv4 can be found would neither be something that even new users (i.e. people calling the method after that behaviour would have been introduced) would expect, or for that matter, desire. I suppose the proposed utility in behaving like this is in the case where some code blindly passing on the result to e.g. connect() or similar would magically and/or accidentally simply keep working with IPv6-only hostnames. I would think that that is not worth the confusion, surprise and wholly unexpected system states (in user programs never expecting this to happen).
Fair enough. The change is a very simple one. My recommendation is: Use host_to_ips (in the plural) for most purposes.
I think this is ready for master branch now. Last call for comments or objections.
Happy Eyeballs, if implemented, should probably be an option - it's a tradeoff between faster connections and more load. Might be worth actually pushing that one down in the stack a bit, maybe a Stdio or Protocols function that establishes a connection and returns it?
I would really like that.
Shouldn't be too hard. I'll just do the asynchronous version, but if a threaded version is wanted, that might be of value too.
That'll be a separate project, but I'll see about creating Stdio.establish_connection(name_or_ips) to do this.
ChrisA
Took me a couple of months to get around to making sure this was good, but there's now a Protocols.TCP.HappyEyeballs class for establishing connections.
It's currently promise-only, but a callback version would be easy enough to add if needed.
I haven't changed Protocols.HTTP to use this, will try to make more use of Happy Eyeballs in my own code before making that change. Also, not sure how much inconsistency between the sync, thread, and async versions is acceptable. Maybe this should be a feature of Protocols.HTTP.Promise only?
ChrisA
pike-devel@lists.lysator.liu.se