hi,
i just wondered if pike does any optimization on int(0..x). i mean, any case where x is below 4294967296 should fit into an unsigned int, so there would be no need to transition to bignum for these. and int(0..255) fits into a byte. does or can pike make use of that?
greetings, martin.
Integers which can fit in an INT_TYPE are stored as such, not as bignums. This also applies to results of computations involving bignums. Smaller integer types are not used, as there would be no point; it would not save any memory, because the values are put in an svalue which has a fixed size, and it would generate slower code on most systems. (In strings, smaller integer types are used as appropriate, because there it actually saves memory.)
On Sat, Oct 04, 2008 at 02:05:02PM +0000, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
> Integers which can fit in an INT_TYPE are stored as such, not as bignums.
yes, but there is a difference between signed and unsigned int. an integer larger than 2147483647 and smaller than 4294967296 does not fit into a signed 32-bit INT_TYPE,
but it would fit very well into an unsigned int type.
if pike handles an array of a few thousand of these, the use of bignum vs an unsigned int type could make a difference.
> (In strings smaller integer types are used as appropriate, because there it actually saves memory.)
right, i was thinking of something like array(int(0..255)), because this could be saved as a string.
greetings, martin.
> yes, but there is a difference between signed and unsigned int. an integer larger than 2147483647 and smaller than 4294967296 does not fit into a signed 32-bit INT_TYPE
It does on a 64-bit machine.
In short, adding a new pike type for "unsigned INT_TYPE" would create lots of work for rather small benefits. You can still use 4294967296 distinct integer values on a 32-bit machine without involving bignums. If you have a use-case where you would benefit from unsigned INT_TYPEs, I can show you how to rewrite it to benefit from signed INT_TYPEs instead.
On Sat, Oct 04, 2008 at 02:25:03PM +0000, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
> In short, adding a new pike type for "unsigned INT_TYPE" would create lots of work for rather small benefits.
you mean on the C level? since on the pike level that type already exists. anyway, that makes sense and is good enough for me.
> If you have a use-case where you would benefit from unsigned INT_TYPEs, I can show you how to rewrite it to benefit from signed INT_TYPEs instead.
if i can rewrite it why can't i let pike do it for me?
just to keep it in line with: "All so that you morons can write crappy code and still have it running fast as a breeze.... "
how would you rewrite an array of ip addresses, for example?
something like this?
class IPAddressList {
  array(int(-2147483648..2147483647)) ipaddresslist = ({});
  int `[](int index) { return ipaddresslist[index] + 2147483648; }
}
greetings, martin.
> if i can rewrite it why can't i let pike do it for me?
Because then we would have to rewrite all of pike as opposed to some specific operations in your program. Integers are used in a lot of places.
> just to keep it in line with: "All so that you morons can write crappy code and still have it running fast as a breeze.... "
It still won't run fast as a breeze if you go over 4294967295 (on a 32-bit machine). There are rather few cases where numbers in the range 2147483648..4294967295 occur frequently in the problem domain.
> how would you rewrite an array of ip addresses, for example?
> something like this?
> class IPAddressList {
>   array(int(-2147483648..2147483647)) ipaddresslist = ({});
>   int `[](int index) { return ipaddresslist[index] + 2147483648; }
> }
Your example is incomplete. Why would you want the index operator of an "IPAddressList" to return an integer in the range 0..4294967295? That doesn't make any sense. Without seeing the whole program, I can't know what the best way to rewrite it is, but I imagine a more natural and useful version of the class would be this:
class IPAddressList {
  array(int) ipaddresslist = ({});
  string `[](int index) {
    int n = ipaddresslist[index];
    return sprintf("%d.%d.%d.%d", (n>>24)&255, (n>>16)&255, (n>>8)&255, n&255);
  }
}
> if pike handles an array of a few thousand of these the use of bignum vs unsigned int type could make a difference.
The tradeoff is the added complexity in all the parts on the C level that handle integers of any sort visible to the language (thousands of places in the code). You don't really want that added burden everywhere for the (relatively) small gain in saved storage.
Some custom applications that handle tons and tons of numbers in that range, and don't want to adapt their logic to use signed integers to get the range right, may well benefit from using C and unsigned integers, for instance, but that does not make it the right decision for a much more general language like Pike.
On Sat, Oct 04, 2008 at 03:05:02PM +0000, Johan Sundström (Achtung Liebe!) @ Pike (-) developers forum wrote:
> Some custom applications that handle tons and tons of numbers in that range and don't want to adapt their logic to use signed integers to get the range right may well benefit from using C, for instance, and unsigned integers
i like that answer, thank you :-)
greetings, martin.
Adding builtin support for unsigned native integers would mean adding another core runtime type, and then adding another special case to every single place that handles native integers. That's a lot of places. I'll bet pike as a whole would get significantly slower due to the general code bloat.
Btw, I think we've had this discussion before, when Alexander Demenshin proposed the same thing. (Where is he now?)
On Sat, Oct 04, 2008 at 03:35:01PM +0000, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
> Adding builtin support for unsigned native integers would mean adding another core runtime type
isn't int(0..4294967295) good enough for that?
> Btw, I think we've had this discussion before, when Alexander Demenshin proposed the same thing. (Where is he now?)
we had the discussion in terms of adding a new type, yes, but i am specifically asking about the feasibility of internally optimizing int(0..4294967295) without changing anything on the pike level.
i am happy with no as an answer, i am just trying to understand all the details and alternative solutions so that i can pass them on to people interested in pike who are asking about this.
greetings, martin.
> isn't int(0..4294967295) good enough for that?
That is a compile-time type, not a core runtime type. The runtime values are stored as one of the core runtime types T_INT or T_OBJECT, depending on whether they fit in an INT_TYPE or not. What you suggest would require a new core runtime type T_UINT, and support for it in basically all C code which supports T_INT.
> > Adding builtin support for unsigned native integers would mean adding another core runtime type
> isn't int(0..4294967295) good enough for that?
You think on the pike language "user" level, but pike has to implement it on the C level, where compile-time types are much more of an issue.
If you want to understand it better, you might try working in C a bit, which is a completely different reality, devoid of the dynamic typing that makes int(0..255), int(-128..127), and so on feel like the mere change of a single declaration line rather than something with big implications for every operation on the value. In pike code we never have to do any bounds checking to cater to the value ranges -- one of the comforts not offered to the poor C-level programmers.
(Changing languages once in a while is a good reminder of what you like in your favourite one. And, occasionally, what it lacks. :-)
On Sat, Oct 04, 2008 at 04:00:02PM +0000, Johan Sundström (Achtung Liebe!) @ Pike (-) developers forum wrote:
> (Changing languages once in a while is a good reminder of what you like in your favourite one. And, occasionally, what it lacks. :-)
indeed it is. unfortunately it takes a lot of work; even after 4 months in a python job i haven't written more than a handful of lines of code, so it will take a while before i get some reasonable experience.
greetings, martin.
> i am happy with no as an answer, i am just trying to understand all the details and alternative solutions so that i can pass them on to people interested in pike who are asking about this.
To view this from a more general perspective, Pike offers a limited set of runtime data types: int, float, string, array, mapping, and multiset(*). It doesn't provide a plethora of different integer types like C, just as it doesn't provide a multitude of hash table variants like HashMap, TreeMap, LinkedHashMap, etc in Java. There is only one integer type and one mapping type. They are designed to be reasonably efficient for general use, but you can't modify them for your specific needs.
Since these types are built directly into the language with special syntax (e.g. "([ ])"), special types, special behaviors in operators, etc, they are very convenient to use. That's both positive and negative, the negative side being that you can't add new builtin types without a lot of effort. You can make objects that mimic them in various situations by implementing `+, __hash, etc, but those objects never become quite the real thing, and they'll be slower.
*) program and type are also runtime types, but they aren't data containers in the sense I'm talking about here, and object is a "catch-all" for everything else.
On Sat, Oct 04, 2008 at 04:40:03PM +0000, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
> just as it doesn't provide a multitude of hash table variants like HashMap, TreeMap, LinkedHashMap, etc in Java.
aren't those classes, and thus comparable with pike classes?
greetings, martin.
Well, if you want them in pike they have to be objects.
My point is that people might argue that mapping should have all those variants, just as they argue that int should have an unsigned variant (or that string should have an unshared variant, or that float should have a longer variant, or whatever).
All such arguments don't fit very well with the very deep integration of the basic data types. One could possibly argue for modifying (slightly) the behavior of one of these types, but adding another one is basically a no-no.
Worth noting that it wouldn't necessarily have to be that way, though. The pike internals could have been built with a different architecture so that all referenced data types (i.e. everything except native integers and floats) are basically handled as objects, and all the special behaviors of the type system and operators etc would be controlled by properties in those objects (like in any pure object oriented language).
That would have been pretty, and possibly lessen the code bloat in the core parts. But what we're talking about then is such a fundamental change that almost nothing would remain the same on the C level.
On Sun, Oct 05, 2008 at 02:10:02PM +0000, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
> Well, if you want them in pike they have to be objects.
i meant: aren't they objects in java too?
> My point is that people might argue that mapping should have all those variants, just as they argue that int should have an unsigned variant (or that string should have an unshared variant, or that float should have a longer variant, or whatever).
> All such arguments don't fit very well with the very deep integration of the basic data types. One could possibly argue for modifying (slightly) the behavior of one of these types, but adding another one is basically a no-no.
ahh, i see where this is going. essentially i think it can be argued that the basic types are designed to be simple and optimized, and having more such types would reduce that.
> Worth noting that it wouldn't necessarily have to be that way, though. The pike internals could have been built with a different architecture so that all referenced data types (i.e. everything except native integers and floats) are basically handled as objects, and all the special behaviors of the type system and operators etc would be controlled by properties in those objects (like in any pure object oriented language).
a pure object pike would indeed be nice :-) but wouldn't that cost optimization?
greetings, martin.
> i meant: aren't they objects in java too?
Yes, but the reason I mentioned them is that they are all mapping-like data containers. In Java, C++ and many more you can choose which mapping implementation you want. In Pike you can't (unless you're prepared to take the cost of emulating a mapping with an object, of course).
> a pure object pike would indeed be nice :-) but wouldn't that cost optimization?
Maybe, maybe not. It's hard to say. It's mostly a matter of doing the same things but in different places: Instead of having a big global predef::`+ which knows how to add all these types in various combinations, it'd instead be the job of each type to know how to get itself added.
pike-devel@lists.lysator.liu.se