(misdirected to pike@roxen.com .. updated version)
hi there pikers.. we're currently in the process of making the lpc code of the psyc muve server executable for both lpc and pike drivers. that is we're making a hybrid language with a stack of #define macros which compile to the respective syntaxes.
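just to give the flavour, an invented example of such a compatibility macro (not actual muve code, the names are made up):

#ifdef LPC
# define INDEX_OF(arr, x)  member(arr, x)
#else /* Pike */
# define INDEX_OF(arr, x)  search(arr, x)
#endif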
there is one thing, however, that cannot be solved with current pike compilers. lpc has a syntax for runtime-generated bytecode lambdas (something pike doesn't provide, by the way.. so what), which goes #'symbol.
all we hope for is to be able to keep this stuff inside an #ifdef LPC, but that doesn't work currently. martin nilsson has submitted a cvs patch which tells preprocessor.h to ignore unknown #' directives when in !OUTP(). that looks good and works fine, but a new problem is produced by the ' in #':
the lexer will additionally want to see a closing tick for an apparent char definition. that can be circumvented by appending a //' to the line, as in input_to(#'bla); //'
it will be a mess of circumventions in our code if we keep it like this, so i tried to change the behaviour. unfortunately the tick gobbler is deep in calcC() and i would have to pass the flags variable all the way down the chain of calcN() functions. any suggestions for this one?
i was surprised the preprocessor is parsing the ifdef'd out code at all, it could run ahead looking for the next ^(\s*)#e(lse|ndif) and thus improve compilation time. then tobij pointed out that multiline string syntaxes like #" and "\ could theoretically mask an #endif in a string definition. ugh.
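for illustration, a made-up case where a naive skip-ahead would go wrong - the first #endif below sits inside the #" string, it is not a real directive:

#if 0
string s = #"
#endif
this line is still part of the string
";
#endif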
the other issue i was trying to address today is support for the LPC syntax for ranges "from behind": string[2..<2]. looks like i would have to add a new opt_int_relative_range to the basic_type definition in language.yacc, then copy the code for opt_int_range with maybe a negated 2nd argument, and then handle the case once the string is in our hands. the funny problem is, i couldn't find the code where the interpreter actually applies the range to the string. can anyone give me a hint where that happens?
the virtues of the [x..-y] and [x..<y] syntaxes have been discussed before, so i'll simply presume the majority supports the string[2..<2] plan. i just realized, however, that it doesn't stop at that.. i would have to add all the variants of [..<], [<..] and [<..<] to the grammar, with or without arguments. maybe a bunch of define macros in our code are less work after all.. ;-(
sorry for rushing in and going straight for the language core.. but that's what we need
I don't think it's wrong to require that deactivated #if blocks are still syntactically correct at the token level. If it's a pike file it's still pike code in those parts too - things like comments and string literals must be considered to correctly decide whether something that looks like a cpp directive really is one or not. The preprocessor is not language independent.
The problem you run into is that you want to use two languages with different tokenization rules in the same file. The imho logical way to cope with that is to run some kind of transformation on them before they are ever fed to the language-specific preprocessor and compiler. Maybe you can use a language-independent preprocessor like m4, or a custom loader that rewrites the files before feeding them to cpp() and compile(). Make a custom master and override compile_file and compile_string, perhaps?
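A minimal sketch of that idea (transform_lpc is just a placeholder name for whatever rewriting you end up doing; a real solution would hook this into a custom master's compile_string/compile_file):

string transform_lpc (string data)
{
  // translate or strip the LPC-only constructs, e.g. #'symbol closures
  return data;
}

program compile_hybrid (string data, void|string name)
{
  // textual transform first, then the normal cpp + compile chain
  return compile_string (transform_lpc (data), name);
}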
I'm not sure the "fix" that Nilsson made to ignore invalid cpp directives in deactivated blocks is a good one. It can make errors harder to detect, e.g. if you happen to write "#elsif" in there.
the virtues of [x..-y] and [x..<y] syntaxes have been discussed before, so i'll simply presume the majority supports the string[2..<2] plan.
I'm for it.
i would have to add all variants of [..<] [<..] and [<..<] to the grammar with or without arguments.
I can't think of anything useful for those variants to do. What do they do in lpc?
On Tue, Oct 26, 2004 at 09:10:05PM +0200, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
are syntactically correct at the token level. If it's a pike file it's still pike code in those parts too - things like comments and string
Sometimes #ifdefs are used to isolate unfinished (and thus probably syntactically incorrect) code from the compiler. Once I was surprised to get a warning like "Unfinished string" when I used #ifdef around some text which was not Pike code (a string like `blabla'), but I couldn't use regular comments to isolate it (there were some */ inside).
IMHO, the preprocessor must be language independent, unless it is completely integrated into the language, or, at least, it should be possible to turn off the syntax checks in some cases (#pragma or #comment __delimiter__).
Regards, /Al
The code doesn't have to be completely syntactically correct, only tokenization-wise correct. It's perfectly fine to have syntactic errors like unbalanced parens and missing semicolons in there.
Anyway, I don't want the preprocessor to be language independent because I don't want to watch out for cpp-directive-like things in comments and string literals.
I've also run into the situation you describe a couple of times in all my years with Pike, but for me it's not nearly often enough to warrant some kind of language support.
well, if we introduce counting from the end, why not allow a range that starts relative to the end too?
a[x..<y]  == a[x..sizeof(a)-y]
a[<x..y]  == a[sizeof(a)-x..y]
a[<x..<y] == a[sizeof(a)-x..sizeof(a)-y]
may be off by one, but you get the idea.
I've gathered that. What I don't understand is what the variants with "<" but without arguments are supposed to do. I.e. a[<..x], a[x..<] and a[<..<].
And for the record I think the construct should behave like this:
a[x..<y]  == a[x..sizeof(a)-1-y]
a[<x..y]  == a[sizeof(a)-1-x..y]
a[<x..<y] == a[sizeof(a)-1-x..sizeof(a)-1-y]
It's that way in lpc, I hope?
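A worked example with the -1 bias proposed above (purely illustrative, and of course subject to the outcome of this discussion):

array a = ({10, 20, 30, 40, 50});
a[1..<1] == a[1..sizeof(a)-1-1] == a[1..3] == ({20, 30, 40})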
The remaining problem when it comes to implementing this is how to call lfuns in a good way, whether it's some extension to `[] or perhaps one or more completely new lfuns.
I think the easiest would be completely new lfuns, so you get an error if it's not implemented. Or you could fall back to doing the sizeof (in predef::`[]) before calling `[] if the new lfun isn't implemented.
Yes, I'm thinking along the same lines. Right now this is my best shot:
A new lfun `[..] to handle all range operations. In the long run `[] will no longer be used for that, so that it can concentrate on single element indexing (which really is a fairly different operation).
`[..] would get this type:
mixed `[..] (int lower_bound_type, int lower_bound, int upper_bound_type, int upper_bound);
The bound type arguments say what type of bound was given, either a normal from-the-start bound, a from-the-end bound, or unspecified (i.e. as the first bound in a[..x]). This gives complete control to the object so that it's possible to implement data types with indeterminate lengths and/or with "real" indexing below zero.
lower_bound_type and upper_bound_type could be combined to a single bit field, but it's probably overall more expensive to do the necessary bit operations on that instead of passing them separately.
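A sketch of what a class implementing the proposal could look like; the bound type constants below (FROM_START, FROM_END, UNSPECIFIED) are placeholder names only, nothing about their spelling or values is decided:

constant FROM_START  = 0;
constant FROM_END    = 1;
constant UNSPECIFIED = 2;

class Buffer
{
  array data = ({});

  int _sizeof() { return sizeof (data); }

  mixed `[..] (int lower_bound_type, int lower_bound,
               int upper_bound_type, int upper_bound)
  {
    int low = lower_bound, high = upper_bound;
    if (lower_bound_type == UNSPECIFIED) low = 0;
    else if (lower_bound_type == FROM_END) low = sizeof (data) - 1 - lower_bound;
    if (upper_bound_type == UNSPECIFIED) high = sizeof (data) - 1;
    else if (upper_bound_type == FROM_END) high = sizeof (data) - 1 - upper_bound;
    return data[low..high];
  }
}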
For compatibility, there'll be a fallback to `[] which then is called like this for all types of range operations:
o[a..b]   => o->`[] (a, b)
o[a..<b]  => o->`[] (a, o->_sizeof()-1-b)
o[a..]    => o->`[] (a, Pike.NATIVE_MAX)
o[<a..b]  => o->`[] (o->_sizeof()-1-a, b)
o[<a..<b] => o->`[] (o->_sizeof()-1-a, o->_sizeof()-1-b)
o[<a..]   => o->`[] (o->_sizeof()-1-a, Pike.NATIVE_MAX)
o[..b]    => o->`[] (0, b)
o[..<b]   => o->`[] (0, o->_sizeof()-1-b)
o[..]     => o->`[] (0, Pike.NATIVE_MAX)
The unfortunate thing with this fallback is that one can't define a class with element indexing but without range indexing, which is a very common case. Maybe there could be a runtime type check that `[] takes more than one argument before trying this fallback.
Is there a problem in not having a fallback for <-cases? If `[] for ranges is to be phased out, it would make sense to not have it called for situations it wasn't called before.
And I'd rather have Int.Inf than Pike.NATIVE_MAX.
Yes, it would make the from-the-end functionality not work for old data types (assuming they implement array-like behavior, but with the current `[] calling convention they more or less have to do that).
As for Pike.NATIVE_MAX, it's how it already behaves. It'd of course be better to use an infinity symbol if there were one.
Is Int.inf added as a constant for that value too? Otherwise we should do that.. ;)
I see that you've added an Int.inf. Nice, but isn't it quite severely restricted by the fact that it still isn't taken as a "real" int by the type system?
void foo (int x) {}
foo (Int.inf);
Compiler Error: 1:Bad argument 1 to foo.
Compiler Error: 1:Expected: function(int : void)
Compiler Error: 1:Got : function(object(is 65695) : void | mixed)
When I contemplated how this could be done, I was thinking along the lines of adding a couple of special bignum instances in mpz_glue.c or thereabouts.
I wanted somewhere to begin. Ideally I would like to have integer inf support as low as in GMP itself.
i suppose "Done." refers to Int.inf but not the whole [x..<y] thing.
could you explain to us what needs to be done to implement [x..<y] in small steps?
- adding the `[..]() function
- create all the fallbacks to `[]()
- add the <x syntax to the parser
is there anything else? can you break these up even more?
adding the <x syntax seems to be the hardest part, any hints?
greetings, martin.
It refers to the whole thing. Read the CVS. http://pike.ida.liu.se/development/cvs/combined.xml
No, actually the opposite. Int.inf is still on the to-do list. ;)
wow, you guys never cease to amaze me!
greetings, martin.
I'm not quite sure if it's right to use sizeof(a)-1 as the end bias anymore. It's consistent in one way since 0 is the first element from the start and <0 the first from the end.
On the other hand, it's off-by-one wrt single indexing (not considering the sign). I.e. the last element through indexing is -1 (a[-1]), while it's <0 through subranging (a[<0..]). Seen that way it appears to be more appropriate to let <1 refer to the last element.
Which way is better? How is it in lpc? Should it perhaps even be that < only selects the negative range, so that the index still has to be negative, i.e. to select the last index one would have to write <-1?
I think we should leave this open for now. Please play around with it, but be advised that it might very well change.
What, we don't get to have the index-from-the-end-discussion every year any more? Excellent! Well done!
On the other hand, it's off-by-one wrt single indexing (not considering the sign). I.e. the last element through indexing is -1 (a[-1]), while it's <0 through subranging (a[<0..]). Seen that way it appears to be more appropriate to let <1 refer to the last element.
It appears to me that <-1 should refer to the last element, if you want consistency.
On the other hand, maybe a[<0] should refer to the last element too?
If you can index from the end with both positive and negative numbers then you'll get the difficult special cases near the bounds. It's the same reason why we don't use negative indices in ranging in the first place.
I don't understand. Of course you can use negative index in range operators in the first place. arr[-17..42] works quite well, why wouldn't you be able to address the end of the array with both positive and negative indexes?
Sorry, I didn't read carefully; ignore the previous answer.
Yes, it wouldn't be bad to extend the indexing operator with < too, regardless of off-by-one issues. For consistency, if nothing else. The problem with that is that one then would expect an index_type argument to `[] too, i.e. the reasonable prototype would then be
mixed `[] (int index, int index_type)
where index_type is a flag that says whether the < variant is used or not. That clashes a bit with the current `[] operator since the second argument means a range operation.
Well, I think the solution can be the same as for the range operator - introduce a new lfun callback for item indexing with or without the end flag, and fallback to `[] (as usual) if it isn't present.
What I meant was that arr[-17..42] == arr[0..42], which is a good thing. The same should apply to the from-the-end variant too, of course.
Yes, but you can also see < as marking the size of the array (even if not totally syntactically true), and then <-1 is logical, and <1 would simply mean "to the end of the array" and the direction would be the same as when addressing from the start of the array.
Ie,
arr[ ..<-2] == arr[ ..sizeof(arr)-2]
arr[ ..<-1] == arr[ ..sizeof(arr)-1]
arr[ ..< 0] == arr[ ..sizeof(arr)+0]
arr[ ..< 1] == arr[ ..sizeof(arr)+1]
No, please don't change the meaning of the lfuns.
I meant like a mixed `[<](mixed index,int end_flag)
just as the
mixed `whateveritwas(mixed index1,int end_flag1, mixed index2,int end_flag2)
you already made?
You mean like a `[<] or something? Perhaps, but then there would by extension be `[..<], `[<..] and `[<..<] too, and I think that would be unwieldy. I think it'd be unwieldy to have to write `[] and `[<] as two separate functions too, for that matter.
In that case I think it's better to simply change the `[] calling convention incompatibly. It's after all something that #pike can cope quite well with.
The lfun name doesn't make sense in that case. It's tough when the only natural name is already taken. In this case I'm more inclined to introduce an incompatibility to stay with the natural name, since it's cleaner in the long run.
I think changing `[] is the best solution, since the current API is neither well used nor good. But I could live with having e.g. `[.] instead.
Either that, or `[,] :)
Changing APIs is *bad*, and #pike isn't an excuse.
Besides, it's *very* convenient to have `[] if you are too lazy to implement the '<'-operator parts yourself.
On Sat, Oct 30, 2004 at 03:05:02PM +0200, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
How is it in lpc?
the psyc guys are working on a test case, from what i can gather it may be getting more complex.
from ldmud lang.y:
/* This is used to parse and return the indexing operation
 * of an array or mapping.
 * .inst gives the type of the operation:
 *   F_INDEX:     [x]
 *   F_RINDEX:    [<x]
 *   F_AINDEX:    [>x]
 *   F_RANGE:     [ x.. y]
 *   F_RN_RANGE:  [<x.. y]
 *   F_NR_RANGE:  [ x..<y]
 *   F_RR_RANGE:  [<x..<y]
 *   F_AN_RANGE:  [>x.. y]
 *   F_AR_RANGE:  [>x..<y]
 *   F_NA_RANGE:  [ x..>y]
 *   F_RA_RANGE:  [<x..>y]
 *   F_AA_RANGE:  [>x..>y]
 *   F_NX_RANGE:  [ x.. ]
 *   F_RX_RANGE:  [<x.. ]
 *   F_AX_RANGE:  [>x.. ]
 * .start and .end are the bytecode limits of the whole
 * operation.
 * .type1 and optionally .type2 are the types of the
 * index values.
 */
[>x] appears to be the same as [x], so unless full lpc compatibility is the goal, this can be ignored.
greetings, martin.
In this case I actually think #pike is a reasonable excuse. It's not _that_ big a deal to put a couple into your old code:
#pike 7.6
mixed `[] (int a, void|int b)
#pike __REAL_VERSION__
{
  // Feel free to use fancy new features here.
}
A case in point is that the necessary places to change are easily found with a grep.
I think the convenience argument is a more compelling one: If you've got a _sizeof in your class you don't have to deal with pesky from-the-end flags at all.
Another thing is that `[] wouldn't really be the convenience alternative in the future either: One would expect sizeof magic in the single argument case too, and that isn't compatible. Consider old style code calling a new style conveniency `[] that tries to operate on that principle.
In this case I actually think #pike is a reasonable excuse. It's not _that_ big a deal to put a couple into your old code:
To me, #pike is for emergency use only. If possible, I write programs that are compatible with any pike (at least upwards). "You can solve that with grepping and inserting #pike" is never ever a good argument, in my opinion.
I think the convenience argument is a more compelling one: If you've got a _sizeof in your class you don't have to deal with pesky from-the-end flags at all.
Yes.
Another thing is that `[] wouldn't really be the convenience alternative in the future either: One would expect sizeof magic in the single argument case too, and that isn't compatible. Consider old style code calling a new style conveniency `[] that tries to operate on that principle.
*thinks* Explain.
I never said it was a _good_ argument, only a reasonable alternative in this case. I'm certainly not the guy that casually throws compatibility away justifying it with #pike, as anyone who has followed this forum should be able to attest.
*thinks* Explain.
An `[] lfun still can't keep it simple by assuming all indices are between 0 and _sizeof()-1 since there's a lot of code that uses negative indexing and it wouldn't be compatible to start converting negative indexes using sizeof.
I'm actually getting doubtful if it's a good idea to try to meddle with the negative indexing paradigm at all. Seems like it'll add too much confusion.
Don't meddle with it then; assume that `[] is capable of handling negative indices as usual and transform
a[<x] -> a[-1-x]
and skip making another lfun.
Besides the restricted-to-number issue, that'd also make it effectively impossible to make a well working data type with a negative index base. Kind of a pity since it's perfectly reasonable to want to implement e.g. a coefficient vector which is centered around zero.
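For concreteness, a hypothetical sketch of such a type (invented, just to show the kind of class I mean):

class CoeffVector
{
  private array(float) coeffs;
  private int offset;   // position of coefficient 0 within coeffs

  void create (int n) { coeffs = allocate (2*n + 1, 0.0); offset = n; }

  int _sizeof() { return sizeof (coeffs); }

  // plain indexing works for -n <= i <= n
  float `[] (int i) { return coeffs[i + offset]; }
  float `[]= (int i, float v) { return coeffs[i + offset] = v; }
}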
That restriction could almost alleviate the flags in the range case too, btw: Map from-the-end indices from -1 and downwards there too, and cap from-the-start indices so that they never go below zero, and from-the-end indices never go above -1. The remaining cases are a[..-1] and a[sizeof(a)..] which should produce an empty sequence. But all of them could summarily be represented as a[1..0] or something.
But is the convenience worth such a restricted system? It'll undoubtedly cause headaches for some people in the future.
First of all, thanks Martin(s), for supporting our efforts to port psycMUVE to Pike.
Our testcase (as mentioned by Martin and running under LDMud):
http://psyc.rebelsex.com/syntaxtest (source at http://psyc.rebelsex.com/syntaxtest?source - quick&dirty)
a[..*] is of course equal to a[0..*].
LPC does a[x..<y] == a[x..sizeof(a) - y] so a[0..<1] is the whole string/array.
fippo looked through the psycMUVE and found us using cases 1, 2, 4, 6 and 13 more or less frequently. The others... well... arcane :)
Case 2 (a[<1]) is already achieved using a[-1] in Pike and as we use it approximately 15 times we can handle it using a macro ... but if someone should think that a[<positive_integer] is nice to have... we'd make use of it :).
Cheers
You rule! I thought it wouldn't ever see the light of day...
Sure we get to have them again.
So far, I don't think anyone has opposed the concept as such. The big problem has always been syntax and obscure semantics. And those can resurface at any time... ;)
Thanks for the info.
LPC does a[x..<y] == a[x..sizeof(a) - y] so a[0..<1] is the whole string/array.
Do you know the rationale behind this? I can see the logic behind letting <-1 be the last element (at least in Pike where the single-element indexing operator works the way it does), and behind <0, but counting from 1 and upwards seems odd.
a[x..<y] means a[x..sizeof(a) - y] and as "sizeof(str) - 1" is the last element of str, <1 is the last element.
There's nothing more to it than that? What I'm looking for are reasons more like that it generally makes common operations simpler, less off-by-one adjustments, etc.
For comparison, in the positive array index case, experience has shown that it's generally more convenient to start from zero, and to use python style ranges where the bounds are specified as low <= index < high, i.e. the high bound is one off the end of the range.
So I thought there might be similar experience in the from-the-end indexing case that is behind the choice to start at 1.
One such use case: Consider that you have a string x and another string y that is a suffix to x. You want to get the substring of x that ends where the suffix starts.
o If from-the-end indexing starts at -1 going downwards, you have to write like this:
x[..< -sizeof (y) - 1]
I.e. both a negation and an off-by-one adjustment are necessary.
o If from-the-end indexing starts at 1 going upwards:
x[..< sizeof (y) + 1]
o If from-the-end indexing starts at 0 going upwards:
x[..< sizeof (y)]
So in this specific case, the last alternative leads to the simplest code. There must be more such common use cases.
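To put numbers on that last case: with x == "foobar.txt" and y == ".txt", x[..<sizeof (y)] would be x[..sizeof (x)-1-sizeof (y)] == x[..5] == "foobar".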
I'm not sure if I'm commenting the correct part of the thread. Anyway, this is a comment regarding negative indices to `[], with one index given.
I see many uses of objects where negative indices are fully appropriate. Thus, if the standard `[] lfun is changed to take an index type argument, it must not clamp the range to 0<=x<sizeof(Object), as it can be fully valid for an object to have sizeof(Object)==10 while its indices range from -5 to 4.
May I suggest that no change is made to the LFUN API, and that the < > stuff is handled with wrappers.
Something like:
class FromEnd(mixed x)
{
  int `()(mixed y) { return sizeof(y) - x; }
}
o[x..y]   == o->`[](x, y)
o[<x..y]  == o->`[](FromEnd(x), y)
o[<x..<y] == o->`[](FromEnd(x), FromEnd(y))
o[x..<y]  == o->`[](x, FromEnd(y))
o[<x]     == o->`[](FromEnd(x))
This should be 100% compatible with old code. The drawback is that all `[] functions will have to be updated to support the < syntax.
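Roughly, such an update could look like this (a rough sketch only; the point is the wrapper resolution step at the top of `[]):

class Container
{
  array data = ({});

  mixed `[] (mixed a, void|mixed b)
  {
    // resolve FromEnd wrappers into plain integer indices
    if (objectp (a)) a = a (data);
    if (!zero_type (b) && objectp (b)) b = b (data);
    return zero_type (b) ? data[a] : data[a..b];
  }
}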
Yes, and so it can hardly be considered 100% compatible; it's afaics only in the cases where the arguments are passed straight on to some internal `[] that it works without any change.
The main benefit of your alternative is afaics that the ugly separate flag arguments are avoided; the flags are there nevertheless and need to be handled. The price is more object cloning. I'm not so sure that won't have a noticeable impact.