Switches use binary search on a sorted array. Using floating point NaN either as a case entry or as a lookup value therefore does not work properly. I think the current situation can lead to bugs easily, e.g. when using case ranges with floats as in this rather artificial example:
    int is_in_range(float f) {
      switch (f) {
        case 0.0..1.0: return 1;
        default: return 0;
      }
    }
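To illustrate the failure mode in general terms, here is a minimal Python sketch (not Pike's actual implementation) of a binary search driven by a C-style three-way comparator. The names cmp3 and find_case are hypothetical. The only assumption is that the comparator reports "equal" when neither < nor > holds, which is exactly what happens for NaN.

```python
import math


def cmp3(a, b):
    """Hypothetical three-way compare: -1, 0 or 1."""
    if a < b:
        return -1
    if a > b:
        return 1
    # NaN ends up here, because both comparisons above are false for it.
    return 0


def find_case(sorted_labels, value):
    """Binary search over sorted case labels; returns an index or -1."""
    lo, hi = 0, len(sorted_labels) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        c = cmp3(value, sorted_labels[mid])
        if c == 0:
            return mid
        if c < 0:
            hi = mid - 1
        else:
            lo = mid + 1
    return -1


labels = [0.0, 1.0, 2.0]
hit = find_case(labels, 1.0)            # ordinary lookup
miss = find_case(labels, 1.5)           # no such label
nan_hit = find_case(labels, math.nan)   # NaN "matches" the first probed label
```

With this comparator a NaN lookup is reported as a match for whichever label the search probes first, even though NaN == label is false.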
I think we should (1) issue a warning when Math.nan is used inside a case and (2) make sure that switch (Math.nan) does not match any case.
Any comments?
arne
I believe that the compiler should:
- Special case NaN if there's an explicit case-label with NaN.
- Map NaN to the default label (if any) otherwise.
This should be rather simple to do with a minor modification of switch_svalue_cmpfun().
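The two rules above can be sketched in Python as follows. This is only an illustration of the proposed behaviour, not the change to switch_svalue_cmpfun() itself; the function switch_target and its has_nan_label flag are hypothetical, and the sketch assumes a NaN label is kept out of the sorted label array.

```python
import bisect
import math


def switch_target(sorted_labels, value, has_nan_label=False):
    """Return "nan", an index into sorted_labels, or "default".

    sorted_labels must not contain NaN; an explicit NaN case label is
    represented by has_nan_label instead (an assumption of this sketch).
    """
    if math.isnan(value):
        # Rule 1: an explicit NaN label wins. Rule 2: otherwise use default.
        return "nan" if has_nan_label else "default"
    i = bisect.bisect_left(sorted_labels, value)
    if i < len(sorted_labels) and sorted_labels[i] == value:
        return i
    return "default"


labels = [0.0, 1.0, 2.0]
plain = switch_target(labels, 2.0)
missing = switch_target(labels, 1.5)
nan_default = switch_target(labels, math.nan)
nan_label = switch_target(labels, math.nan, has_nan_label=True)
```

Handling NaN before the search means the binary search itself never has to compare against NaN.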
/grubba
On Wed, 8 Jul 2015, Henrik Grubbström (Lysator) @ Pike (-) developers forum wrote:
> I believe that the compiler should:
> - Special case NaN if there's an explicit case-label with NaN.
> - Map NaN to the default label (if any) otherwise.
We should definitely do the second; the current behavior is just a bug. I am personally not sure about the first, because it would make the switch incompatible with standard equality.
Container types have similar problems with nan. Trivial example:
    > ([ Math.nan : 1 ]);
    (1) Result: ([ /* 1 element */ nan: ** gone ** ])
Of course, this could be fixed, too, by fixing describe_mapping. Similar, but more subtle:
    > mapping m = ([ Math.nan : 1 ]);
    > equal(m, m+([]));
    (1) Result: 1
    > equal(m+([1:2]), m+([1:2]));
    (2) Result: 0
What about arrays?
    > ({ 1, 2, 3, Math.nan }) ^ ({ Math.nan });
    (1) Result: ({ /* 3 elements */ 1, 2, 3 })
    > ({ 1, 2, 3, Math.nan }) - ({ Math.nan });
    (2) Result: ({ /* 4 elements */ 1, 2, 3, nan })
uh??
Of course all this is nothing new. Python afaik tries to treat NaN in containers as if it compared equal to itself. Is that the best one can do?
arne
What about the following radical proposal: define Math.nan == Math.nan to be true. It would immediately fix all the inconsistencies with container types. The only purpose of NaN != NaN in the IEEE standard seems to have been to make it simple to detect NaN without having isnan().
Would this be a compatibility problem? Do people detect Math.nan using x != x?
On Thu, Jul 9, 2015 at 2:09 AM, Arne Goedeke <el@laramies.com> wrote:
>> Of course all this is nothing new. Python afaik tries to treat NaN in containers as if it compared equal to itself. Is that the best one can do?
> What about the following radical proposal: Define Math.nan == Math.nan to be true. It would immediately fix all the inconsistencies with container types. The only purpose of NaN != NaN in the ieee standard seemed to have been to make it simple to detect NaN without having isnan().
> Would this be a compatibility problem? Do people detect Math.nan using x != x?
Violating IEEE without a good reason would introduce other problems, and I'm sure there'll be discussions around the place of exactly why nan has to be unequal to itself. Incidentally, Python's rule about NaN in containers isn't that it compares equal to itself, but that container membership is based on a two-part check of identity and then equality. Among other benefits, that allows automatic optimization of the common case of iterating over the keys and retrieving the values, since the keys will identity-match when you look them up. I think it'd be a good model for Pike to imitate.
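The identity-then-equality rule described above can be observed directly in Python. This is a small demonstration, not Pike code; the variable names are only for illustration.

```python
nan = float("nan")
other = float("nan")  # a second, distinct NaN object

# NaN never compares equal, not even to itself.
self_equal = (nan == nan)

# Membership checks identity first, so the very same object is found ...
same_object_in_list = nan in [nan]
# ... while a different NaN object is not.
other_object_in_list = other in [nan]

# Dict lookup follows the same rule.
d = {nan: "value"}
lookup_same = d.get(nan)
lookup_other = d.get(other)
```

Because the key object pulled out of a container is identical to the stored one, iterating over keys and looking each one up always works, even for NaN.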
ChrisA
The reason for NaN != NaN in IEEE is afaik the one I mentioned. Thanks for the clarification of what python does, I am not much of a python programmer.
I am not so sure about the semantics of id(). It seems to be basically the pointer value of the storage location of the variable. Are floating point values in python passed by reference? Are they objects? This would not work in pike, because floats (as ints) are passed by value.
On Thu, Jul 9, 2015 at 2:37 AM, Arne Goedeke <el@laramies.com> wrote:
>> Violating IEEE without a good reason would introduce other problems, and I'm sure there'll be discussions around the place of exactly why nan has to be unequal to itself. Incidentally, Python's rule about NaN in containers isn't that it compares equal to itself, but that container membership is based on a two-part check of identity and then equality. Among other benefits, that allows automatic optimization of the common case of iterating over the keys and retrieving the values, since the keys will identity-match when you look them up. I think it'd be a good model for Pike to imitate.
> The reason for NaN != NaN in IEEE is afaik the one I mentioned. Thanks for the clarification of what python does, I am not much of a python programmer.
> I am not so sure about the semantics of id(). It seems to be basically the pointer value of the storage location of the variable. Are floating point values in python passed by reference? Are they objects? This would not work in pike, because floats (as ints) are passed by value.
The semantics of id() in Python are deliberately nonspecific about any precise meaning for that number; in CPython (the most common interpreter) it's the address, but other Pythons use arbitrary sequential numbers, or other schemes. All that matters is:
1) Every object has an identity.
2) If "x is y", then "id(x) == id(y)".
3) If "x is not y", then "id(x) != id(y)", as long as x and y exist concurrently.
And yes, Python floats are objects - everything in Python is an object. In Pike, with floats being value types, the notion of "identity" might have to be expanded to "bit-pattern", but that's slightly less ideal, as it could result in two separately-generated NaNs matching (which otherwise shouldn't happen). But stuffing two different NaNs into a single mapping is going to be pretty rare.
ChrisA
On Thu, 9 Jul 2015, Chris Angelico wrote:
> The semantics of id() in Python are deliberately nonspecific about any precise meaning for that number; in CPython (the most common interpreter) it's the address, but other Pythons use arbitrary sequential numbers, or other schemes. All that matters is:
> - Every object has an identity.
> - If "x is y", then "id(x) == id(y)"
> - If "x is not y", then "id(x) != id(y)", as long as x and y exist concurrently.
> And yes, Python floats are objects - everything in Python is an object. In Pike, with floats being value types, the notion of "identity" might have to be expanded to "bit-pattern", but that's slightly less ideal, as it could result in two separately-generated NaNs matching (which otherwise shouldn't happen). But stuffing two different NaNs into a single mapping is going to be pretty rare.
I think the most useful way to define identity in containers is x == y || isnan(x) && isnan(y) because the NaN payload is not visible from pike.
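As a sketch of that rule, here it is written out in Python rather than Pike. The helper name key_equal is hypothetical, and the isinstance guard is an addition of this sketch so that non-float keys can be passed without an error.

```python
import math


def key_equal(x, y):
    """Container key equality: ordinary ==, plus any NaN matches any NaN."""
    if x == y:
        return True
    both_floats = isinstance(x, float) and isinstance(y, float)
    return both_floats and math.isnan(x) and math.isnan(y)


def remove_all(items, unwanted):
    """Array subtraction using key_equal instead of plain equality."""
    return [i for i in items if not any(key_equal(i, u) for u in unwanted)]


remaining = remove_all([1, 2, 3, math.nan], [math.nan])
```

Under this definition the array subtraction shown earlier in the thread would drop the NaN element instead of keeping it.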
On Thu, Jul 9, 2015 at 4:08 AM, Arne Goedeke <el@laramies.com> wrote:
> I think the most useful way to define identity in containers is x == y || isnan(x) && isnan(y) because the NaN payload is not visible from pike.
The intention of nan!=nan is that any two calculations that yield NaN are guaranteed to compare unequal, even if they happen to produce the same payload. (As IEEE floating point formats have finite storage, yet there are infinite non-numbers, collisions can occur.) Container handling in Python presumes upon having another reference to the exact same NaN object, even if two of them happen to have the same bit-pattern; Pike can approximate to this by requiring that the payloads be identical, which gives roughly one chance in 2**53 that arbitrarily-generated NaNs will errantly match in a container. Allowing _any_ NaN to match _any_ other NaN seems to be an unnecessary violation of IEEE principles, while not giving any benefit in terms of container handling. Example:
    mapping m=([]);
    float f1=get_a_number();
    float f2=get_a_number();
    m[f1] = "f1";
    m[f2] = "f2";
    foreach (indices(m), float key)
      write("m[%O] = %O\n", key, m[key]);
    write("Expecting size %d, actually %d\n", 1 + (f1!=f2), sizeof(m));
In the absence of NaNs, this should always produce sane results. Either the two numbers are equal and one overwrote the other, or they're not. If both are NaN and their payloads happen to collide, then it'll produce odd results (unequal but overwritten). But if they're both NaN and their payloads do not collide, then everything should happen sanely - you look up the two NaNs and get back "f1" and "f2" from the mapping, which has a length of 2, which is expected (as the two are unequal). Yes, the payload itself may not be visible from Pike, but you can pull a key out of the mapping and then use it to look the value up (which guarantees that the payload hasn't changed, since the value hasn't changed in any way), and payload checking gives at least a chance that NaNs will behave properly. Of course, it's entirely possible that the payloads aren't random, so they'll collide frequently; but that probably depends on the specific hardware, and unless Pike specifically invents a concept of NaN identity, it's a limitation that can't be broken.
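The payload-based behaviour described above can be modelled in Python as a rough sketch. Everything here is hypothetical: BitKeyMap, bits and nan_with_payload are illustration names, and the sketch assumes the platform preserves quiet-NaN payloads when a double is packed and unpacked.

```python
import math
import struct

QUIET_NAN = 0x7FF8000000000000  # exponent all ones, quiet bit set


def bits(x):
    """Raw 64-bit pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def nan_with_payload(payload):
    """Build a quiet NaN carrying the given payload bits."""
    return struct.unpack("<d", struct.pack("<Q", QUIET_NAN | payload))[0]


class BitKeyMap:
    """Mapping in which NaN keys match by bit pattern, others by ==."""

    def __init__(self):
        self._items = {}

    @staticmethod
    def _slot(key):
        # NaNs are keyed by their bits; other floats keep normal equality.
        return ("nan", bits(key)) if math.isnan(key) else key

    def __setitem__(self, key, value):
        self._items[self._slot(key)] = value

    def __getitem__(self, key):
        return self._items[self._slot(key)]

    def __len__(self):
        return len(self._items)


# Two NaNs with different payloads: unequal, and stored separately.
f1, f2 = nan_with_payload(1), nan_with_payload(2)
m = BitKeyMap()
m[f1] = "f1"
m[f2] = "f2"

# Two NaNs with colliding payloads: still unequal, but one overwrites the other.
g1, g2 = nan_with_payload(7), nan_with_payload(7)
c = BitKeyMap()
c[g1] = "g1"
c[g2] = "g2"
```

The first mapping behaves as the "Expecting size" check predicts; the second shows the "unequal but overwritten" case that arises when payloads collide.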
ChrisA
On 07/08/15 20:22, Chris Angelico wrote:
> The intention of nan!=nan is that any two calculations that yield NaN are guaranteed to compare unequal, even if they happen to produce the same payload. (As IEEE floating point formats have finite storage, yet there are infinite non-numbers, collisions can occur.) Container handling in Python presumes upon having another reference to the exact same NaN object, even if two of them happen to have the same bit-pattern; Pike can approximate to this by requiring that the payloads be identical, which gives roughly one chance in 2**53 that arbitrarily-generated NaNs will errantly match in a container. Allowing _any_ NaN to match _any_ other NaN seems to be an unnecessary violation of IEEE principles, while not giving any benefit in terms of container handling. Example:
>
>     mapping m=([]);
>     float f1=get_a_number();
>     float f2=get_a_number();
>     m[f1] = "f1";
>     m[f2] = "f2";
>     foreach (indices(m), float key)
>       write("m[%O] = %O\n", key, m[key]);
>     write("Expecting size %d, actually %d\n", 1 + (f1!=f2), sizeof(m));
>
> In the absence of NaNs, this should always produce sane results. Either the two numbers are equal and one overwrote the other, or they're not. If both are NaN and their payloads happen to collide, then it'll produce odd results (unequal but overwritten). But if they're both NaN and their payloads do not collide, then everything should happen sanely - you look up the two NaNs and get back "f1" and "f2" from the mapping, which has a length of 2, which is expected (as the two are unequal). Yes, the payload itself may not be visible from Pike, but you can pull a key out of the mapping and then use it to look the value up (which guarantees that the payload hasn't changed, since the value hasn't changed in any way), and payload checking gives at least a chance that NaNs will behave properly. Of course, it's entirely possible that the payloads aren't random, so they'll collide frequently; but that probably depends on the specific hardware, and unless Pike specifically invents a concept of NaN identity, it's a limitation that can't be broken.
If we defined NaN floats to compare equal, all the inconsistencies would go away, so your example above would work just fine with NaNs. However, it would be quite a radical choice, and I will not lobby for it any further ;)
On the other hand, we do not have the option of implementing an identity similar to what Python has, so we _do_ have to invent a concept of NaN identity based on the value of the payload. We agree that it has to be restricted to containers. I personally don't like the idea of having something that breaks sometimes. To me it seems that it would make NaN floats inside containers no more useful than they currently are. Of course, other people might have different priorities. That leaves me with only one option, namely having the key comparison treat all NaNs as equal.
As a side note: the ECMAScript 6 Map and Set types seem to treat key equality roughly as I propose.
Arne
pike-devel@lists.lysator.liu.se