Is it possible to find the system's size of an int from within Pike? I have to do some parsing of binary data and don't want to do it inside a cmod.
arne
Ints in Pike have arbitrary size. If you put in a larger number, they will grow. So using e.g. sscanf("%8c") will work fine even on a 32-bit machine.
That said, Int.NATIVE_MAX should give you the largest number you can put in a native integer on the platform.
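A minimal sketch of that (the 8-byte buffer here is made up):

  // Parse 8 bytes of big-endian binary data into one int. Thanks to
  // the automatic bignums this works even when the result exceeds
  // the native integer range of a 32-bit machine.
  string data = "\1\2\3\4\5\6\7\10";  // hypothetical input buffer
  int value;
  sscanf(data, "%8c", value);
  // value == 0x0102030405060708, far above Int.NATIVE_MAX on 32-bit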
Yes, but it's not useful for parsing binary data. (I'm kind of sceptical that it is useful ever... ;)
It's useful if you want to limit your data to data that doesn't require objects (e.g. to reduce memory footprint).
If some other application running on the local machine is feeding you arch-dependent binary data, I'd expect it's pretty useful information.
It's useful as "something larger than anything I'll process here". Ideally Int.inf should be used for that, but it's an object and doesn't work as input to most C functions.
Well, such a value should be derived from the domain of the values you will process. Otherwise you have no guarantees that the value will be large enough anyway.
The domains are always considerably smaller, of course. There's typically no need to make an effort to choose a value that's just narrowly higher, and Pike.NATIVE_MAX works reasonably well as an idiom for this.
Use the value 100000. Since the domains are "of course" always considerably smaller, there's no need to make an effort to choose a different value.
That's silly. One can look at it this way too: The domains have no fixed limits, but if Pike.NATIVE_MAX were to be reached in some exceedingly rare circumstance, then there would be other problems besides my choice of a "sufficiently large" value. Typically said C functions will cease to work. I'm content with using a value that is large enough to leave the problem of a limited integer range somewhere else.
You're the one being silly. The whole point of the auto bignums is that you should _not_ automatically get "other problems" just because your integers get large. Using an internal constant to _make sure_ that you'll get (much more difficult to detect) problems seems rather misguided to me.
[I attributed "silly" to your rather nonconstructive argument, not to you.]
Seems like you choose to ignore the key issue here, namely C functions which are limited to native integers. There are still plenty of them in Pike.
For example, one important class of bounded native integers are the sizes of things in memory. If I want a number that in practice always is larger than (and in theory never less than) the size of any pike object, then Pike.NATIVE_MAX is a good choice. Even if it never is passed to any non-bignum C function, using a larger value or Int.inf would just incur unnecessary runtime overhead.
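A minimal sketch of the idiom (the strings array is just a made-up stand-in for real data):

  // Find the length of the shortest string. Pike.NATIVE_MAX serves
  // as "larger than the size of anything in memory" without any
  // bignum overhead in the comparisons.
  array(string) strings = ({ "foo", "ab", "quux" });
  int min_len = Pike.NATIVE_MAX;
  foreach (strings, string s)
    if (sizeof(s) < min_len)
      min_len = sizeof(s);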
[I attributed "silly" to your rather nonconstructive argument, not to you.]
And my point was that I just turned your own argument around, substituting one arbitrary constant for another (incidentally an invariant one, which is better)...
Seems like you choose to ignore the key issue here, namely C functions which are limited to native integers.
Not really, but I was a bit tired yesterday, so I guess my point didn't get across very well.
Let me make another, more structured attempt. :-)
The assumption made here is that you have
1) Some input set I, the elements of which are truly unbounded.
2) An algorithm A, operating on I, which requires an element î which is larger than any element in I.
3) A function (implemented in C or otherwise) f which has a limited input range [..L], which is fed the elements of an output set O produced by A.
If A produces outputs used for multiple functions with bounded inputs, simply generalize the above to O1, O2, f1, f2, L1, L2 etc.
Your argument was that since elements in I are unbounded, you can get elements in O which are also unbounded, which may cause f to fail, and so you are doomed anyway.
Well, there are two ways you can cope with unsuitable input. You can either detect it early and handle it gracefully, or you can ignore it and just let the code crash and burn. The appropriate choice depends on the situation of course (how likely the unsuitable inputs are, whether the code is intended for production use, etc.). So I'll cover both alternatives separately.
I'll take the "crash and burn" approach first, since that's the one you appeared to be proposing.
If your algorithm A is capable of working with large numbers, you'll get an output set O which is always correct, but which may contain elements o' not supported by f. The result of such an element will be that f throws an error, saying that the input parameter is out of bounds. This makes it easy to see exactly what went wrong, and since the o':s are from the correctly computed O, their values should come as no surprise.
If the algorithm on the other hand uses an î which is actually less than one of the elements in I, an assumption upon which the algorithm itself is based has been violated, and the contents of O might be anything. It could still contain values larger than L, in which case you'd still get the backtrace, but now potentially with a mystery value which wasn't supposed to be in O. Or it could just be wrong while still containing only legal values, in which case your program would keep running and you might not discover the error until much later, when it will require much head scratching to track down.
So the use of an incorrect î here will only make errors more difficult to find. A modified algorithm which does not require an î larger than all values in I is clearly preferable, and also easy to construct from A with simple transformations (although putting some more thought into it can produce something more efficient).
For the second option, you need to do a little more work.

Firstly, you need to find out what L really is. It may be both larger and smaller than Pike.NATIVE_MAX. For example, the argument to the seek() function is bounded, but on some systems the bound is much larger than Pike.NATIVE_MAX. On the other hand, the first argument to the bind() function has a bound which is always smaller than Pike.NATIVE_MAX. Having constants for the actual bounds of built-in functions might be nice, actually.

Secondly, you need to consider the transfer functor T, which is used to construct the elements in O from the elements in I. It might be the identity, but it might also be something else. The crux is that the input values need to be checked against the value of L under T^-1, not L itself. Let's call this value L'.

Now that you know L', you can test the inputs against L' and do some appropriate error recovery (print a message or whatever, possibly proceeding with the offending element removed from I). By putting this check before A, you can safely use L' as î, because now the input of A isn't unbounded anymore, but rather has a domain-specific bound. Again, note that L' may very well be different from Pike.NATIVE_MAX, especially since you need to consider the Ln:s and Tn:s of all fn:s.
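The check itself can be as simple as this (a rough sketch; T is assumed to be the identity here so L' == L, and the recovery action is of course application specific):

  // Drop elements that would exceed the bound L' before running A,
  // reporting each offending element as it is removed.
  array(int) check_inputs(array(int) inputs, int limit)
  {
    return filter(inputs,
                  lambda(int i) {
                    if (i <= limit) return 1;
                    werror("Skipping out-of-range input %d\n", i);
                    return 0;
                  });
  }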
For a quick hack, you may of course choose to ignore the above and use an incorrect î (Pike.NATIVE_MAX, 100000, time() or whatever), fully knowing that it may cause weirdness, but plugging it as "an idiom" is in rather bad taste, IMO.
I realize I have been vague. I should have made it clearer that the input values have typically been through C functions limited to native integers to begin with. The canonical situation is the one I already mentioned, namely the size of something in memory, where Pike.NATIVE_MAX works well as an upper bound since it scales with the addressable memory (unless possibly if someone compiles their Pike with very odd configure options, but I couldn't care less about that).
When I said "no fixed limits" I meant in that case that I won't make an effort to limit the input so that no overallocation of memory can occur. I just conclude that memory will run out long before Pike.NATIVE_MAX size is reached, and hence that choice of upper bound value is safe.
Doing a quick search through a large chunk of my code shows that the memory size limited domain pretty much covers my use of Pike.NATIVE_MAX. There is however one more case when I've chosen to use it: As a value larger than any maximum cache time value (in seconds) for things in a RAM cache.
That is a case where I think it serves well as an idiom: Anything larger would incur unnecessary runtime overhead, and anything smaller would raise a suspicion that there could be some relevance to the value as an actual limit in itself.
It's an example of a domain that "of course" is considerably smaller. Sure, I am strictly speaking making an error in that case: if someone managed to keep the same process running for about a lifetime, then cache entries could time out prematurely on a 32-bit architecture. Even so, I think the clarity of the idiom outweighs that theoretical problem.
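In code the idiom amounts to no more than this (sketch):

  // "Longer than any sensible cache time": an entry with this
  // timeout is in practice never expired by the cache sweeper.
  int cache_timeout = Pike.NATIVE_MAX;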
The size of a memory object is a bounded variable, yes (an example of a domain specific bound). However, even without "very odd configure options", the bound is not Pike.NATIVE_MAX, but more likely Pike.NATIVE_MAX*2, due to the sizes being unsigned. On a 64-bit system the difference is rather academic, but not so on a 32-bit system. Creating an object larger than 2GB is possible on many 32-bit systems.
It's still not really a problem in practice. Especially since Pike currently can't represent such sizes due to the integers used internally for size fields (INT32).
Hm. The size of a string is not an INT32 but a ptrdiff_t, but that's also a signed type. I wonder if that should be considered a bug... What will happen if you attempt to create a larger string?
For arrays, the fact that the size type is signed is not a problem in itself, since it counts elements and each element is more than 2 bytes large. However, the fact that it is hard-coded as 32 bit seems a little bit problematic. You should be able to create larger arrays than that on a 64-bit system...
Pike v7.8 release 116 running Hilfe v3.5 (Incremental Pike Frontend)
> array(int) a = allocate(1<<32);
Bad argument 1 to allocate(). Integer too large to use as array size.
Unknown program: allocate(4294967296)
Well, the problem is caught at least, but it would have been better if it had worked instead...
Yes, I agree the size types should be fixed. But I actually see some merit in limiting sizes to half the maximum address space: It's more likely to catch bugs that otherwise could cause the whole system to grind to a halt, and it avoids bignum overhead in the memory size domain.
I started working on an inotify module. I have to parse an inotify event struct which contains an int (aside from some uint32 fields). For that int I need to know the size in bytes; that's why I was asking. It would maybe be helpful if Pike's sscanf/printf had a modifier for the system's native int (32/64 bit).
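In pure Pike that parse might look roughly like this (a sketch assuming a little-endian machine and a 32-bit int, which holds under LP64; buf is a hypothetical buffer read off the inotify fd):

  // Parse one struct inotify_event: int wd; uint32 mask, cookie,
  // len; then len bytes of NUL-padded name. %-4c reads 4 bytes
  // little-endian (unsigned).
  int wd, mask, cookie, len;
  string rest;
  sscanf(buf, "%-4c%-4c%-4c%-4c%s", wd, mask, cookie, len, rest);
  if (wd >= 1 << 31) wd -= 1 << 32;  // %c is unsigned; restore sign
  string name = rest[..len - 1] - "\0";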
Anyhow, I decided to do it inside the cmod now; it's just much simpler in this case to use a cast.
best
arne
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Ints in Pike have arbitrary size. If you put in a larger number, they will grow. So using e.g. sscanf("%8c") will work fine even on a 32-bit machine.
I'm not sure if you should use Pike's int size for that - it's a compile-time setting, and most systems can have Pike with 32- or 64-bit ints.
Isn't "int" on all linux systems 32-bit anyway?
Mirar @ Pike developers forum wrote:
I'm not sure if you should use Pike's int size for that - it's a compile-time setting, and most systems can have Pike with 32- or 64-bit ints.
Isn't "int" on all linux systems 32-bit anyway?
I guess you are right. I actually just don't know. Isn't there some 64-bit hardware out there actually using a 64-bit int? If not, even better.
It's not so much a hardware thing as a compiler thing. Linux uses LP64*, so ints should be 32-bit.
Is it possible to find the system's size of an int from within Pike? I have to do some parsing of binary data and don't want to do it inside a cmod.
An int on most platforms nowadays is 32 bits/4 bytes.
To determine some of the characteristics of the Pike runtime you can call Pike.get_runtime_info():
32-bit Pike 7.8 on x86:
| > Pike.get_runtime_info();
| (1) Result: ([ /* 6 elements */
|     "abi": 32,
|     "auto_bignum": 1,
|     "bytecode_method": "ia32",
|     "float_size": 32,
|     "int_size": 32,
|     "native_byteorder": 1234
| ])

64-bit Pike 7.8 on sparc:
| > Pike.get_runtime_info();
| (1) Result: ([ /* 6 elements */
|     "abi": 64,
|     "auto_bignum": 1,
|     "bytecode_method": "sparc",
|     "float_size": 64,
|     "int_size": 64,
|     "native_byteorder": 4321
| ])

64-bit Pike 7.8 on x86_64:
| > Pike.get_runtime_info();
| (1) Result: ([ /* 6 elements */
|     "abi": 64,
|     "auto_bignum": 1,
|     "bytecode_method": "default",
|     "float_size": 64,
|     "int_size": 64,
|     "native_byteorder": 1234
| ])
int_size and float_size above indicate the sizes of the C-level types INT_TYPE and FLOAT_TYPE.
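So, to pick an sscanf width matching the native int at runtime, something like this should work (a sketch; whether the int in your struct actually follows Pike's INT_TYPE is a separate question):

  // Derive an sscanf modifier from the runtime's int size,
  // e.g. "%-4c" on a 32-bit build, "%-8c" on a 64-bit one.
  int bytes = Pike.get_runtime_info()->int_size / 8;
  string fmt = sprintf("%%-%dc", bytes);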
Thanks. Didn't find this one. I was looking for constants in the Int module and the global scope.
arne