Crosspost from user mailing list.
This is the outcome of the technical side of this year's Pike conference. We came to the conclusion that once the following list of items is fixed we are ready to release Pike 7.8.
1. New syntax for getters and setters. The previous syntax, where
  mixed `->foo() {}
  void `->=foo(mixed v) {}
created a virtual symbol foo in the class, is now replaced with
  mixed `foo() {}
  void `=foo(mixed v) {}
I say replaced, but the old syntax remains. Perhaps it should be changed to return a syntax error before the release.
Also there was a discussion about how modifiers should be handled. Currently the actual getter and setter functions are made private and the "union" of them is used as the visibility for the virtual symbol. This was argued against at the conference and I think the consensus was that the modifiers should apply to the getter and setter functions themselves and a "variable prototype" should be used to change the visibility of the virtual symbol (with the assumption that typically you would want the virtual symbol public anyway). One (bizarre) example would be
  protected void foo;
  public int `foo() { return x; }
  private void `=foo(int y) { x = y*2; }
The new syntax was implemented promptly, but as I said the old one still remains and nothing has been done w.r.t. the modifiers.
2. Adding a method for deprecating methods and symbols. This topic sparked a debate of how new modifiers can be introduced without having to reserve more keywords, with lots of ideas from other languages tossed around. These syntaxes (syntacies? :) were brought up
  [weak, shared] int f();
  @weak @shared int f();
  weak shared int f();
where the last one (the keyword look-alike solution) was the most discussed, for several reasons. It would not modify the look of the language by introducing extra syntax, but it would also be complicated to implement, as the entire prototype has to be parsed backwards. It was also noted that parameterization of modifiers would be desirable, which would complicate the grammar even more:
weak shared(Top) int(1..3) f(string s);
The conclusion was that further studies needed to be done to solve the new keyword issue. This was not considered a problem for introducing a new deprecation method though, as it was not something anyone was concerned about the looks of. The agreed upon new keyword is __deprecated__, which when added to a symbol should generate a warning when this symbol is used.
To suppress deprecation warnings a new pragma directive, no_deprecated_warnings, should be added.
3. To pave the way for "real" static (or "shared" as we would call it to avoid confusion) the keyword "static" will be deprecated. While it technically has nothing to do with the __deprecated__ mechanism, its warnings will also be suppressed by the new pragma.
The (hopefully) not used keyword "nomask", which is an alias for "final", should also be deprecated.
It was agreed that a semantic change in protected, making protected symbols accessible in other objects of the same class, was a good idea.
4. The iterator API should be changed to not use LFUNs, but instead more user friendly symbols. No compatibility layer that would enable both old and new iterators at the same time will be implemented due to the complexity (4 combinations of old/new iterator objects being looped over by code using the old/new API). There will of course be a #pike 7.6 mode to use the old API.
5. Support for multiple backends to be selected between at runtime is something that is needed, since a Pike compiled with epoll support will not work on a system with only poll.
6. In some situations the Stdio.File and file.fd objects reference each other and leak objects/file descriptors. This should be fixed before Pike 7.8.
7. Martin Stjernholm wanted his work with Locale.Charset.EncodeError / DecodeError to be finished before 7.8. It was also suggested that Charset be moved to a top level module.
8. We need to ensure that there are no performance regressions in Pike 7.8. A quick profiling shows that we do have some performance problems which need to be addressed. Some work began immediately on automatically finding the versions between which performance dropped. Another action item was to make Pikefarm parse and save the benchmark data produced by xenofarm clients into a database and provide an interface to be able to query and graph the status.
9. The HTTPS regressions since Pike 7.6 should be fixed before Pike 7.8. This was most likely fixed during the conference.
10. The precompiler should move to Tools.Standalone so that cmods work as external modules properly.
11. One item that was discussed during the last day, and thus didn't make the "official" list of 7.8 items, was how zero_type/UNDEFINED worked in Pike. After a lot of discussion this was agreed upon:
  class A { int x = 0; int y = UNDEFINED; }
  A a = A();

  zero_type( a->x ); // This should be 0
  zero_type( a->y ); // This should be 1
  zero_type( a->z ); // This should be 1

  has_index( a, "x" ); // This should be 1
  has_index( a, "y" ); // This should be 1
  has_index( a, "z" ); // This should be 0
(So far this is how Pike currently behaves.)
  mapping m = ([ "x" : 0, "y" : UNDEFINED, ]);

  zero_type( m->x ); // This should be 0
  zero_type( m->y ); // This should be 1
  zero_type( m->z ); // This should be 1

  has_index( m, "x" ); // This should be 1
  has_index( m, "y" ); // This should be 1
  has_index( m, "z" ); // This should be 0
Here the expression m->y will return 0 with zero type 0. This is due to all subtype flags being cleared by the mapping. For symmetry with the object case this was agreed to be changed.
On Sun, Nov 18, 2007 at 05:45:02PM +0000, Martin Nilsson (Opera Mini - AFK!) @ Pike (-) developers forum wrote:
Also there was a discussion about how modifiers should be handled. Currently the actual getter and setter functions are made private and the "union" of them is used as the visibility for the virtual symbol. This was argued against at the conference
could someone please repeat the rationale for changing that? martin stjernholm?
i can't think of a case that would benefit from that.
if a symbol is readable or writable, like in this example:
  protected void foo;
  public int `foo() { return x; }
  private void `=foo(int y) { x = y*2; }
then it should be visible, separating that would only add the ability to make invisible public symbols or publicly visible private symbols. both seem to only create confusion without gain. what am i missing?
Adding a method for deprecating methods and symbols.
  [weak, shared] int f();
  @weak @shared int f();
  weak shared int f();
The conclusion was that further studies needed to be done to solve the new keyword issue. This was not considered a problem for introducing a new deprecation method though, as it was not something anyone was concerned about the looks of. The agreed upon new keyword is __deprecated__,
i had the impression that this was just another syntax suggestion or a suggestion for a convention. i don't mind either way. the __ make the keyword stand out and make me feel like not wanting to use it, which is probably a good thing. deprecating things should be done rarely and with a lot of consideration only.
- Martin Stjernholm wanted his work with Locale.Charset.EncodeError / DecodeError to be finished before 7.8.
could you elaborate what needs to be done here? it sounds like something that doesn't require in-depth experience with the pike internals.
One item that was discussed during the last day, and thus didn't make the "official" list of 7.8 items was how zero_type/UNDEFINED worked in Pike.
mapping m = ([ "y" : UNDEFINED ]);
Here the expression m->y will return 0 with zero type 0. This is due to all subtype flags being cleared by the mapping. For symmetry with the object case this was agreed to be changed.
does this mean that UNDEFINED becomes a generally storable value?
how will the behaviour be with arrays or multisets?
  zero_type(({ UNDEFINED })[0]); // this is 1
  array a = ({ UNDEFINED });
  zero_type(a[0]); // this is 0

  zero_type(indices((< UNDEFINED >))[0]); // this is 1
  multiset m = (< UNDEFINED >);
  zero_type(indices(m)[0]); // this is 1
seems like multisets can store UNDEFINED already.
greetings, martin.
if a symbol is readable or writable, like in this example:
  protected void foo;
  public int `foo() { return x; }
  private void `=foo(int y) { x = y*2; }
then it should be visible, separating that would only add the ability to make invisible public symbols or publicly visible private symbols. both seem to only create confusion without gain. what am i missing?
The virtual symbol foo is not publicly readable or writable. It is protected. The function `foo is publicly visible, while the function `=foo is private.
how will the behaviour be with arrays or multisets?
  zero_type(({ UNDEFINED })[0]); // this is 1
  array a = ({ UNDEFINED });
  zero_type(a[0]); // this is 0
Well, it's not really an interesting case, because when you try to index something that has no index in an array you get an exception (like a[1] in this case).
  zero_type(indices((< UNDEFINED >))[0]); // this is 1
  multiset m = (< UNDEFINED >);
  zero_type(indices(m)[0]); // this is 1
seems like multisets can store UNDEFINED already.
Well, zero_type(m[UNDEFINED]) gives you 0.
how will the behaviour be with arrays or multisets?
  zero_type(({ UNDEFINED })[0]); // this is 1
  array a = ({ UNDEFINED });
  zero_type(a[0]); // this is 0
Well, it's not really an interesting case, because when you try to index something that has no index in an array you get an exception (like a[1] in this case).
This seems like a rather pike centric view arguing that UNDEFINED is a technical storage property of something (_the index foo_ of an array is not defined, rather than _the value stored at index foo_ is the undefined value often called null -- where the latter is more useful behaviour in practice).
At least this quoted example led me into believing that we are indeed upgrading the UNDEFINED value to a storable NULL value:
mapping m = ([ "x" : 0, "y" : UNDEFINED, ]);
  zero_type( m->x ); // This should be 0
  zero_type( m->y ); // This should be 1
  zero_type( m->z ); // This should be 1

  has_index( m, "x" ); // This should be 1
  has_index( m, "y" ); // This should be 1
  has_index( m, "z" ); // This should be 0
...where zero_type, in a scenario where this instead held:
has_index( m, "z" ); // This should be 1
would have let us use ->= to do m_delete in scenarios like m->z=foo(); (which could be useful at times, if probably more rarely).
If the aim is not to make UNDEFINED a proper storable null value, what is the goal with these changes? I'm not sure I've figured it out yet.
On Mon, Nov 19, 2007 at 02:25:02PM +0000, Johan Sundström (Achtung Liebe!) @ Pike (-) developers forum wrote:
At least this quoted example led me into believing that we are indeed upgrading the UNDEFINED value to a storable NULL value:
right, that's what i thought too.
greetings, martin.
On Mon, Nov 19, 2007 at 11:35:02AM +0000, Martin Nilsson (Opera Mini - AFK!) @ Pike (-) developers forum wrote:
  zero_type(({ UNDEFINED })[0]); // this is 1
  array a = ({ UNDEFINED });
  zero_type(a[0]); // this is 0
Well, it's not really an interesting case, because when you try to index something that has no index in an array you get an exception (like a[1] in this case).
since indexing here gives an exception then there is no harm in returning UNDEFINED otherwise.
i am partly confused about the difference in the above results though.
greetings, martin.
Guess I was a bit tired when I wrote that.
The idea behind changing the mapping case was to have a consistent behaviour between mappings and objects. With virtually no time spent considering the issue I see no reason to disallow subtypes in array entries either.
Oh, and the setter/getter syntax is `foo=(), not `=foo().
On Mon, Nov 19, 2007 at 04:50:02PM +0000, Martin Nilsson (Opera Mini - AFK!) @ Pike (-) developers forum wrote:
Oh, and the setter/getter syntax is `foo=(), not `=foo().
wasn't there a suggestion to change that from `foo=() to `=foo()?
I don't remember that. There was a suggestion (now implemented) to change from `->foo=() to `foo=().
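For reference, a minimal sketch of the new-style accessors in the form corrected above (`foo() and `foo=()), assuming a Pike version where the new syntax is available. The class Temperature and its member names are hypothetical and exist only for illustration.

```pike
// Hypothetical example class; only the `name() / `name=() forms come
// from the thread, everything else is an assumption for illustration.
class Temperature
{
  protected float kelvin = 273.15;

  // Getter: evaluated when code reads t->celsius.
  float `celsius() { return kelvin - 273.15; }

  // Setter: evaluated when code assigns to t->celsius.
  void `celsius=(float c) { kelvin = c + 273.15; }
}

int main()
{
  Temperature t = Temperature();
  t->celsius = 25.0;             // goes through `celsius=()
  write("%.2f\n", t->celsius);   // goes through `celsius()
  return 0;
}
```

Code using the object sees an ordinary-looking variable celsius, while the stored state stays in kelvin.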
One other thing to tag on to the end of the UNDEFINED discussion was the idea that we should add undefinedp() or similar, as zero_type can be tough to get comfortable with.
Bill
I've already implemented undefinedp() and destructedp(), but I feel they are a bit clumsy to write. Perhaps remove the "p"?
Another option would be "is_undefined()" and "is_destructed()", as we already have e.g. has_index(). OTOH, the "p" notation is already used for other type testing functions.
or reversing it? definedp() is_defined()
maybe even valuep() or typep() like is it a string, here asking is it any value or type at all.
greetings, martin.
We should probably stop with the "p"-ing already. It doesn't make any sense unless you are familiar with the concept from another programming language.
or maybe isdestructed(), etc? not having a qualifier might tend to suggest to the initiated that undefined() generates an undefined value rather than test for it. but, that could just be an educational point.
bill
On Nov 20, 2007, at 4:45 AM, Martin Nilsson (Opera Mini - AFK!) @ Pike (-) developers forum wrote:
I've already implemented undefinedp() and destructedp(), but I feel they are a bit clumsy to write. Perhaps remove the "p"?
Also there was a discussion about how modifiers should be handled. Currently the actual getter and setter functions are made private and the "union" of them is used as the visibility for the virtual symbol. This was argued against at the conference
could someone please repeat the rationale for changing that? martin stjernholm?
Ok. The reason is mainly to be consistent:
All other lfuns (`+, create, __hash, etc) are normal functions in this sense; they define an identifier with a normal declaration and the modifiers apply to that function, not the special characteristic it also enables. E.g:
class X (int n) { private X `+ (int n2) {return X (n + n2);} }
Here the function `+ is made private, but not the ability to use the object in an addition; you can still do X(3) + 4. I think this should apply to these functions too.
The argument has been raised that it never would be interesting to call the setter and getter functions with their function syntax (i.e. `foo=(17) or x->`foo=(17)), but to that I respond: Prove it. All other special functions have been useful in function form, so it's not safe to say these would never be useful that way. And if they are used in function form then it follows that it's interesting to control access to them too.
Another reason is that other modifiers of a completely different nature could be added in the future. In that situation we could get into the murky situation that some modifiers apply to the functions themselves while others apply to the symbol, or maybe both.
A third reason is that it completely avoids the ambiguity when the getter and the setter are given different visibility. Which visibility should apply to the "virtual variable" in that case?
Following this reasoning, modifiers should apply to the getters and setters themselves and the "virtual variable" should always be public. That I think would work well, but in case the virtual variable needs protection, it could be solved using the void declaration syntax. I don't consider that very important to fix right away, though.
- Martin Stjernholm wanted his work with Locale.Charset.EncodeError / DecodeError to be finished before 7.8.
could you elaborate what needs to be done here? it sounds like something that doesn't require in-depth experience with the pike internals.
The job is mainly to fix the Charset module to actually throw these errors. It's just a SMOP.
class X (int n) { private X `+ (int n2) {return X (n + n2);} }
Here the function `+ is made private, but not the ability to use the object in an addition; you can still do X(3) + 4. I think this should
Actually, you can't. lfuns that are private won't be used.
apply to these functions too.
This works in both 7.6.117 and 7.7.39 (although 7.7 gives a warning about an unused private function):
  class X (int n) {
    private X `+ (int n2) {return X (n + n2);}
    static string _sprintf() {return "X(" + n + ")";}
  }
  int main() { werror ("%O\n", X(3) + 5); }
I think changing it would have some annoying compatibility impact.
This works in both 7.6.117 and 7.7.39 (although 7.7 gives a warning about an unused private function):
[...]
Strange; program.c:low_find_lfun() does a lookup with SEE_STATIC, and no SEE_PRIVATE. The reason seems to be that really_low_find_shared_string_identifier() only cares about SEE_PRIVATE if the symbol has been inherited (i.e. comes from a private inherit).
I think changing it would have some annoying compatibility impact.
Ok. Maybe it's buggy but it'd still be unwise to change, I believe.
Besides, I don't consider it any less odd that static lfuns still alter public behavior. I.e. the following illustrates the point I'm arguing just as well:
  class X (int n) {
    static X `+ (int n2) {return X (n + n2);}
    static string _sprintf() {return "X(" + n + ")";}
  }
  int main() { werror ("%O\n", X(3) + 5); }
  mixed `->foo() {}
  void `->=foo(mixed v) {}
created a virtual symbol foo in the class are now replaced with
  mixed `foo() {}
  void `=foo(mixed v) {}
The setter syntaxes above are incorrect. They should read `->foo= and `foo=, respectively.
  mapping m = ([ "x" : 0, "y" : UNDEFINED, ]);
  zero_type( m->x ); // This should be 0
  zero_type( m->y ); // This should be 1
  zero_type( m->z ); // This should be 1
  has_index( m, "x" ); // This should be 1
  has_index( m, "y" ); // This should be 1
  has_index( m, "z" ); // This should be 0
Here the expression m->y will return 0 with zero type 0. This is due to all subtype flags being cleared by the mapping. For symmetry with the object case this was agreed to be changed.
Agreed upon, indeed? That's news to me. I object to this strongly, since it effectively and completely defeats the purpose of UNDEFINED. We could just as well remove it completely instead. That'd make one less zero type, at least.
The object case might be justified to keep since one can just as well use has_index, but not this.
A possibility is maybe that
a->b = UNDEFINED;
actually removed the index "b" from the mapping a, but even that is dubious for compatibility reasons (places where this change would break stuff are very difficult to locate).
Agreed upon by those within earshot anyway. I guess that is the problem when sessions spill over into dinner. I don't have a strong enough opinion to argue either way. What is your view of how UNDEFINED should work in objects, mappings, multisets and arrays?
I think the prevailing opinion was that one should use has_index() to determine if a mapping contains an index, and that undefinedp() would be used to determine if the value of the specified index is UNDEFINED. The rationale provided by those in favor is that currently, one must use a special object to represent an undefined value. This is mostly useful for the case where you want to know whether an integer 0 was passed, or no value at all, particularly when mapping between languages (SQL or JSON for example) where there is a difference between 0 and "no value".
Can you elaborate on how you envision this would break things?
Bill
Agreed upon, indeed? That's news to me. I object to this strongly, since it effectively and completely defeats the purpose of UNDEFINED. We could just as well remove it completely instead. That'd make one less zero type, at least.
The object case might be justified to keep since one can just as well use has_index, but not this.
A possibility is maybe that
a->b = UNDEFINED;
actually removed the index "b" from the mapping a, but even that is dubious for compatibility reasons (places where this change would break stuff are very difficult to locate).
I agree that has_index is a neater way to check existence than zero_type, but that's beside the point.
The original intention with UNDEFINED (from long before it even had that name) was to be able to check the existence of an index and to get its value if it exists, both with a single indexing operation. UNDEFINED is the special result that signifies nonexistence, otherwise the value is returned.
For this to work it follows that it must be impossible to store UNDEFINED in anything indexable. In the mapping case, the choice was made to silently strip off the subtype. Another more stringent approach would have been to throw an error instead. In the object case I suspect the problem was simply overlooked, but I don't know.
I think UNDEFINED should keep this use. If the goal is to add nil, NULL, UNINITIALIZED or whatever then that's another value and another discussion.
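The single-indexing idiom described above can be sketched as follows. This is only an illustration of the mapping behaviour discussed in this thread; the helper name lookup and the fallback argument are hypothetical.

```pike
// Hypothetical helper: return the stored value, or fallback when the
// index does not exist in the mapping.
mixed lookup(mapping m, mixed key, mixed fallback)
{
  mixed v = m[key];                    // a single indexing operation
  if (zero_type(v)) return fallback;   // zero type set: no such index
  return v;                            // index exists, possibly with value 0
}

int main()
{
  mapping m = ([ "x": 0 ]);
  write("%O\n", lookup(m, "x", "missing"));   // index exists, value is 0
  write("%O\n", lookup(m, "z", "missing"));   // index is absent
  return 0;
}
```

The idiom relies on an existing index never yielding a value with the zero type set, which is why the mapping strips the subtype on store.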
This is mostly useful for the case where you want to know whether an integer 0 was passed, or no value at all, particularly when mapping between languages (SQL or JSON for example) where there is a difference between 0 and "no value".
In cases like that you really ought to have a different "no value" value on each abstraction level. Pike is one level, the sql/json module is another on top of it. A higher level should have its own "no value" which actually is a value on lower levels, so you can handle it in a natural way on the level where you are implementing the new level.
In the general case there can be an arbitrary amount of abstraction levels and hence an arbitrary amount of "no value" values. E.g. in Roxen there can be three: Pike (with UNDEFINED), rxml (with RXML.nil) and the sql module (with SqlNull, coming in 5.0). All attempts to avoid this have only led to tricky code, confusion, and bugs (I regret making RXML.nil and UNDEFINED interchangeable in many of the variable support functions in the rxml framework).
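A rough sketch of a level-specific "no value" that is an ordinary value on the Pike level. The class NoValue, the variable no_value and the row mapping are hypothetical names, not taken from Roxen or any sql module.

```pike
// Hypothetical marker for "no value" on a higher abstraction level.
// On the Pike level it is a plain object that can be stored anywhere.
class NoValue {}

NoValue no_value = NoValue();

int main()
{
  mapping row = ([ "name": "pike", "age": no_value ]);

  // The index exists and holds the marker, so the higher level can tell
  // "column is null" apart from "column not present".
  write("%d\n", row->age == no_value);
  write("%d\n", has_index(row, "age"));
  write("%d\n", has_index(row, "missing"));
  return 0;
}
```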
I strongly disagree about enforcing (which is what we do, as a language, by not offering one neutral or canonic one) different NULLs on different levels. It's useful when you want it, and then nothing is there to prevent you from crafting one, but mostly not. I argue that the same applies to values like NaN, Infinity, -Infinity, et cetera.
But I agree that adding a NULL works best as a separate discussion, and that zero_type is a well defined and rooted function that would be a can of worms to change incompatibly. (And it's not like anyone would find it suddenly intuitive or descriptive under the new behaviour; its name and mode of functioning is much too esoteric and unguessable.)
I don't agree, perhaps not strongly, but still not. Having the same null for different context is no better than having the same 0 in different contexts. Was the result from the SQL query null, or wasn't it in the cache I stored it in? But that is not the important point of this discussion.
Looks like we are facing one of three non-appealing alternatives.
1. Keep things as they are. Mappings and objects don't behave symmetrically.
2. Allow mappings to store UNDEFINED. This will be abused as a null value with ill defined behaviour.
3. Don't allow UNDEFINED to be indexed from an object variable. This will allow for direct determination of whether a variable exists or not while getting its value at the same time. This would make variables have different zero type depending on how they were accessed. Overridable by index lfun and getters (I would assume).
I don't agree, perhaps not strongly, but still not. Having the same null for different context is no better than having the same 0 in different contexts.
My point is that it's exactly as good (neither better nor worse): zero is zero, so you should not define a cache API that is supposed to be able to store a zero and assign zero any out of bounds semantics, as is the case with zero_type adding an out of bounds data property you can ostensibly inspect. To see if it's in the cache, add a test method returning a bool. For mappings, the name of this method is has_index.
Substitute "null" for zero above and strike the bit on zero_type, and the same argument holds. Null is null, no more, no less.
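A minimal sketch of such a cache API, with presence reported by a separate test method instead of by the zero type of the fetched value. The class Cache and its method names are hypothetical.

```pike
// Hypothetical cache wrapper; class and method names are illustrative.
class Cache
{
  protected mapping(string:mixed) data = ([]);

  void store(string key, mixed value) { data[key] = value; }

  // Presence test that does not depend on the stored value.
  int contains(string key) { return has_index(data, key); }

  mixed fetch(string key) { return data[key]; }
}

int main()
{
  Cache c = Cache();
  c->store("hits", 0);   // a stored zero is still "in the cache"
  write("%d %d\n", c->contains("hits"), c->contains("misses"));
  return 0;
}
```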
Context switching back to the zero_type topic from the unrelated null topic, I think my response to the difficult problem is that 3 is best, 1 acceptable and (after hearing Mast's thoughts) 2 senseless.
I guess the conversation has come full circle... I think it comes down to a decision between a) doing nothing and having to work around the problem in awkward ways and b) implementing something that either breaks existing behavior (true null) or acts inconsistently (null value that is comparable to itself but acts like a zero in all other ways).
My biggest problem in all of this is that there's no way to differentiate between an integer that hasn't been specified (UNDEFINED or nil) and one whose value is simply zero. I think the introduction of UNDEFINED was bound to cause it to be misused, especially since a) there's a gap here in the language that (right or wrong) many other languages don't have and b) the discussion introducing UNDEFINED suggested it had a nil-like quality when used as an argument to methods.
It's further complicated by the multiple identities of zero: both 0 == UNDEFINED and 0 == 0 are true... that's why I suggested that there be a new zero type to signify a value that's not been set, and have it be NULL or something. It would also be nice if UNDEFINED == UNDEFINED were true and 0 == UNDEFINED were false, but I haven't thought through all of the implications of that. It certainly would break some existing behavior that relies on that initially set value acting as a zero. It's probably enough to be able to query specifically the "intent" rather than to have comparisons look at the intent (undefined, index not present, object destructed).
Additionally, as I think someone has mentioned, it's fine to have abstractions use their own entities for things like NULL and true/false, but a big problem is that pike doesn't have a way to express these things that doesn't overlap with another datatype (is a zero 0, or null or false?).
Anyhow, I guess I don't have anything more to contribute.
Bill
On Nov 24, 2007, at 12:20 AM, Johan Sundström (Achtung Liebe!) @ Pike (-) developers forum wrote:
I don't agree, perhaps not strongly, but still not. Having the same null for different context is no better than having the same 0 in different contexts.
My point is that it's exactly as good (neither better nor worse): zero is zero, so you should not define a cache API that is supposed to be able to store a zero and assign zero any out of bounds semantics, as is the case with zero_type adding an out of bounds data property you can ostensibly inspect. To see if it's in the cache, add a test method returning a bool. For mappings, the name of this method is has_index.
Substitute "null" for zero above and strike the bit on zero_type, and the same argument holds. Null is null, no more, no less.
Context switching back to the zero_type topic from the unrelated null topic, I think my response to the difficult problem is that 3 is best, 1 acceptable and (after hearing Mast's thoughts) 2 senseless.
I strongly disagree about enforcing /.../ different NULLs on different levels. /.../
I wasn't talking about NULL, I was talking about UNDEFINED. Granted, in the sql case I mentioned, the newly added SqlNull is indeed a NULL and not an UNDEFINED, and that's because there's never any case in sql where a column conditionally exists or not.
However, both UNDEFINED and RXML.nil are of the same nature which is inherently bound to an abstraction level (in hindsight the name RXML.nil I chose was a bit unfortunate; the distinction between an UNDEFINED and a NULL value hasn't always been clear to me either).
If a proper NULL value was added to pike, it could indeed be used on different abstraction levels, and this new SqlNull value could be made an alias for it.
Ah, mixing up discussions again. :)
I think what ecmascript does is kind of nice, though a bit late to add to pike: there is a "null" singleton and an "undefined" singleton (and "true" and "false"), and all variables start their life as undefined, unless initialized:
  assert( null == null );
  assert( null == undefined );
  assert( null != false );
  assert( null != 0 );

  assert( undefined == undefined );
  assert( undefined == null );
  assert( undefined != false );
  assert( undefined != 0 );
Changing not explicitly defined variables to be such an "undefined" value in Pike is bound to break lots of code and do more harm than good for the foreseeable future. But I think both of these values would be useful. They are storable and testable, as seen above (it's a bit crufty, in how that language needs a === comparison operator too, as a few values type coerce, much like how `+ et al behave in Pike, but we would of course not borrow the bad ideas :-).
pike-devel@lists.lysator.liu.se