I noticed the checkin of a new module the other day and was reminded that we're long overdue for a discussion of Pike distribution bloat. This has come up at the last two conferences, and we always "table" the discussion for later.
I think now is a good time to have it since we're not coming up on an impending new stable release. It seems to me that there are a fairly large number of modules distributed with Pike that have a corresponding low number of potential users. I'm not suggesting that the modules don't have merit, but am suggesting that perhaps the audience is so small that it becomes clutter to others. Does it really make sense to keep adding these modules to the core distribution when they could just as easily be distributed separately?
As it stands now, Pike distributions are 10MB, and take a painfully long amount of time to compile. I'm sure I'm not the only one who wonders why they have to configure and compile things like HTTPLoop and SDL... (just picking two that I'd remove, not that they're at the top of my list) I realize that there was a time when including things in the core was a necessary evil due to compilation difficulty, but that's not been the case for at least a year or two...
Any thoughts?
Bill
HTTPLoop* and SDL are two of the most used modules, so other examples are in order. Exclusions are a 7.7/7.8 question though, so it has no impact on the 7.6 release.
* Unless Roxen is using something else now.
It could be used by Protocols.HTTP.Server when it's available. It's much faster than the above mentioned code.
I see no point in making the dist smaller just to make it smaller. Most modules are very small and quite useful (SDL belongs in this category).
Others are very small but perhaps not useful (HTTPLoop is a prime example).
My idea of a point would be to make it possible to have different release cycles of modules and the main pike, not to make the distribution smaller.
In the last episode (May 01), Mirar @ Pike developers forum said:
My idea of a point would be to make it possible to have different release cycles of modules and the main pike, not to make the distribution smaller.
How about something like GCC's distribution, where you have
Pike-7.6.tar.gz Pike-core-7.6.tar.gz Pike-modules-7.6.tar.gz
, where the first file is everything, the second would be just enough to build pike itself, and the third is the rest of the modules?
My vision is that Pike-modules wouldn't exist at all... if you wanted all of the modules, you could either start with Pike-full-7.6.tar.gz, which is pike plus all of the modules (similar to the way pike is currently distributed), or you could start with Pike-core-7.6.tar.gz or Pike-7.6.tar.gz, which would be the middle of the road that we've been batting around, with some of the more esoteric modules removed, and use monger to add the rest in...
Bill
On Sun, 1 May 2005, Dan Nelson wrote:
In the last episode (May 01), Mirar @ Pike developers forum said:
My idea of a point would be to make it possible to have different release cycles of modules and the main pike, not to make the distribution smaller.
How about something like GCC's distribution, where you have
Pike-7.6.tar.gz Pike-core-7.6.tar.gz Pike-modules-7.6.tar.gz
, where the first file is everything, the second would be just enough to build pike itself, and the third is the rest of the modules?
-- Dan Nelson dnelson@allantgroup.com
there are two degrees of separation: one is the way things are distributed and the other is the way things are maintained.
the main pike cvs tree is about code that is maintained by the pike team and whose copyright is owned by IDA.
modules.gotpike.org and other places have stuff maintained elsewhere and owned by others.
currently what is maintained together is also distributed together. however this does not have to be the case. the current debian packages are split up, and show that separation is possible without changing the way things are maintained.
following this, separation of the pike core would simply mean splitting the pike download into separate files where a user can pick and choose what he/she needs or wants. a minimal pike that only includes the bare necessities to run pike code would certainly be nice to have.
otoh, pike developers rely on a lot of things being available. in the #pike channel on irc i lately hear a lot of people who like pike exactly because its distribution is so inclusive and they do not have to bug the user to get certain modules installed before their code can run.
as a result, if the distribution gets split up it will be important to make it easy to identify missing components and make it easy to install them (preferably through monger)
in the short run splitting up things can only mean providing additional download targets, which means additional work. not that i want to discourage anyone, perhaps the whole thing is as easy as adding a few make targets that will build several pike packages instead of a single one.
to conclude: instead of arguing which modules should be in the core, the more interesting question is how do we actually implement the split. only after that is solved can we start asking what the split packages on the pike site should look like.
the following would be nice: a make target that creates split packages according to a simple config file which defines the packages and what goes inside:
could also be in makefile format:

all: (well, everything :-)
core: the pike binary itself
minimal: core, stdin (the bare core)
default: minimal, gtk, ...
recommended:
suggested:
...
and so on, then everyone can group packages to their liking and the question of what goes into the default is limited to those packages which are actually distributed from pike.ida.liu.se and which go into distributions.
(later, some standard for that could say: if you package pike for the general public it should contain the following if the package wants to call itself a full pike install, if you do not have these you must warn the user that your package is incomplete, but that's future, let's solve the technical problems first)
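A rough sketch of how such group definitions could be expanded into flat module lists, written in Python only to keep it short. The group contents are hypothetical: "pike-binary", "stdlib" and "sdl" are placeholder names chosen for illustration and are not taken from any real Pike build file.

```python
# Hypothetical package definitions, loosely following the makefile-style
# listing above. An entry that is itself a key refers to another group;
# anything else is treated as a module name.
GROUPS = {
    "core": ["pike-binary"],
    "minimal": ["core", "stdlib"],
    "default": ["minimal", "gtk", "sdl"],
}


def expand(group, groups, _path=None):
    """Return the sorted list of modules that a group contains."""
    path = set() if _path is None else _path
    if group in path:
        raise ValueError("circular group definition: " + group)
    path.add(group)
    modules = set()
    for entry in groups[group]:
        if entry in groups:
            # Nested group: pull in everything it contains.
            modules.update(expand(entry, groups, path))
        else:
            modules.add(entry)
    path.discard(group)
    return sorted(modules)


if __name__ == "__main__":
    for name in GROUPS:
        print(name + ": " + ", ".join(expand(name, GROUPS)))
```

A make target could then build one tarball per group from the expanded lists; that part is left out here.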
greetings, martin.
Good analysis.
The idea of distribution in tiny bits isn't unwelcome, as has been seen; it just needs some work, care and developer love. Providing that is most welcome; aspiring to inject it into the non-converts will most likely fail, judging by evidence of historical similar efforts. A few µPike make targets, a few Pikefarm clients who run them and some eager developers who monitor them will probably do wonders. Other notes (not arguments, just factors):
Development support:
* From a maintenance point of view, base Pike has a rather well set-up framework for testing, detection and follow-up on problems that creep up. This makes it very comfortable for core developers to stay with the infrastructural support (perhaps most prominently Pikefarm and Code Librarian; Bug Crunch might qualify too) provided for the main repository.
Driving forces:
* Reasoning from a core developer's perspective, fixing bugs and adding capabilities are powered largely by self-interest in various forms. I'm sure similar forces exist to power separating Pike into tiny bits of distribution (Debian does so by power of policy, with some success), but little of it seems present in the current Pike maintainers.
Retracing the other aspect of the issue (no real news brought up here, but I think there still is some merit to the subject):
to conclude: instead of arguing which modules should be in the core, the more interesting question is how do we actually implement the split.
The question "What is (constitutes) Pike?" actually _is_ crucial to which choices we make in distributing Pike (ourselves), for the same reasons you mentioned - as long as we mind our image of how you need to go about to run a Pike application. Some core developers still do, some don't. I like Bill's angle of naming the close-to-today's version "Pike", the ultra-slim version "core" and a massive be-all-end-all "full".
A Debian installation (of the package "pike") does not meet the same standards a new pike.ida.liu.se Pike release does today (since some of the people who perform the release cycle work won't consider the work done until it supports SDL and mysql and has a working pike -x module, to name a few criteria). Users working solely within the Debian package management system when running Pike applications won't notice this, as long as they only run packaged pre-built applications that pull in the parts of Pike they lack by power of debian sub-package dependencies (pike-mysql et al).
Running an application that requires some aspect of Pike that the Pike on your system does not fulfil (pike -x module is broken, there is no GTK or SDL module, for instance) would ideally fail quite gracefully, pointing out what is wrong. Core Pike may not excel in this department, but it is very easy to make that situation worse when stripping out hereditarily assumed-always-present modules. (Especially since we don't have the Python tradition of starting every file with a manually maintained list of component dependencies. OTOH our compiler is probably good enough not to need any such help.)
The fact that more or less _everyone_ that installs xemacs installs the sumo package too should tell us something about how useful separately distributed modules are, though.
The main advantage of pike is that pike is pike, given a specific pike version you can be fairly certain that the modules are included (such as bignum, image handling etc).
I can see that it might be advantageous from a maintenance perspective to split up pike in multiple modules. Doing so requires a module owner for each module, that actually maintains it, though.
Currently the core developers are the only ones developing pike actively. I don't see that changing just because each module is split into its own CVS repository with separate releases.
I don't believe that anyone's suggested moving anything into their own CVS repositories, at least not right away. And I think you're missing the point. There won't be a separately distributed "modules" package. It's only a matter of how many modules are included with the pike you download.
Thought needs to be given to the fact that you (I don't presume to include myself in this group, yet :)) are all a very special class of Pike users. As someone has already pointed out, you write the code because you have a particular need for it. Certainly then, the 10 or 15 users that have contributed all of the code will find their contributions indispensable. However, the non-developer users have told us repeatedly that they couldn't care less about some of these modules. Why should we (I include myself here because I've added bloat to the modules myself in the past) be so self-absorbed as to think that they really do want these things even though they'll never use them? There are users who'd like to use pike in embedded applications. They don't need (and don't have room for) all of the modules that ordinary users have use for. We could take proactive steps to make life easier for them by giving them a download that's "pre-neutered" rather than forcing them to figure out which modules are absolutely necessary.
Along a similar line, my bet is that a general user will probably not have a use for (aside from the pike-required modules like Gmp and Crypto) much beyond database access, XML parsing and maybe the Image module and GTK. That's more than a lot of other languages bundle, and if a user really wants to write their own BitTorrent client, or work with their Satellite video, they can use monger to add that functionality in about 20 seconds.
As to the "pike users expect all this stuff", I would respond by saying that if they're smart enough to use DVB and GL and all that other stuff, they're certainly capable of reading the download page for 7.8 to see that they'll have to either download -full or use monger to install those things they need.
As to the module maintenance sorts of issues, I think they're completely manageable, even without a designated maintainer. Even before that, the "first step" should be to make modules available separately outside of the Pike distribution, as well as through Monger. This will also replace the currently clumsy way of recompiling a module when it doesn't catch your libraries (the mysql glue comes to mind here). Similarly, we should be able to start making -core distributions without too much heartache right away.
Making life easier for users will enhance the image, usability and user population of Pike. I realize that's not something some here care about (which is fine,) but for those that do, we should be doing everything reasonable to satisfy users.
Bill
On Mon, 2 May 2005, Per Hedbor () @ Pike (-) developers forum wrote:
The fact that more or less _everyone_ that installs xemacs installs the sumo package too should tell us something about how useful separately distributed modules are, though....
I really think that modules written in pike can always be included, as they don't really use any space on disk or resources at compilation time at all. Nor do they slow anything down while running pike.
And pike is not _really_ bloated, unless you refer to the installed size, and that is mainly due to the default -g compile of pike.
Compare it to the other scripting languages. And where is the disk-space crisis anyway?
python: /usr/lib/python2.3: 102M Executable: 1M (including libpython.so)
perl: /usr/lib/perl5: 45M Executable: 1.5M (including libperl.so)
pike: /usr/local/pike/7.6.6/lib: 18M (stripped -g from .so-files) Executable: 1.7M (-g gone)
We need to work on configure times, though. The largest gain (about 400% or so) can be had by generating configure from an old version of autoconf.
On Mon, May 02, 2005 at 07:20:01AM +0000, Johan Sundström (Achtung Liebe!) @ Pike (-) developers forum wrote:
A Debian installation (of the package "pike") does not meet the same standards a new pike.ida.liu.se Pike release does today (since some of the people who perform the release cycle work won't consider the work done until it supports SDL and mysql and has a working pike -x module, to name a few criteria).
debian's packaging problems do not relate to the issue of which modules get packaged.
what's wrong with debian's mysql support?
Running an application that requires some aspect of Pike that the Pike on your system does not fulfil (pike -x module is broken, there is no GTK or SDL module, for instance)
huh? please check again! GTK and SDL are there.
i was citing debian to show that separating out the core is a packaging and not a development issue. so this is going off on a tangent.
there was mention of someone else packaging pike for debian. where are those packages? i'd like to use and test them (they don't need to be official debian packages to be usable)
greetings, martin.
If we do distribute modules separately pike -x module _must_ work. It does not work on debian.
And SDL and GTK are not included by default.
i did not say anything about pike -x module, i am talking about how the modules are packaged.
and for GTK and SDL not being installed by default, that is a completely different thing from them not being available.
why are we now bashing debian's packaging?
i am NOT saying that pike should generally be packaged like on debian. i am trying to point out that it is possible to do fine-grained packaging without changing the current pike repository.
pike's default packaging is one extreme, and debian's packaging is the other. please try to see what we can learn from this instead of dismissing it just because you feel that GTK and SDL should be in the default.
which packages are in the default is quite academic as long as we don't even have the tools to split up default and non-default, while debian has them. solving the problem on debian's side is as simple as adding GTK and SDL to the pike default dependencies while changing caudium and roxen to depend on pike-core and not on pike (because they do not need GTK or SDL).
greetings, martin.
Actually, debian does not go to the extreme in the other direction; it's possible to split off the pike-level modules too, as has been discussed here, and all C-modules. Debian only has some 10 packages (I don't remember exactly how many).
I don't mind the debian way of packaging, btw, it's quite useful. What I dread is that, say, Protocols.HTTP will be removed (and not necessarily that specific module, perhaps Calendar instead) from the default distribution, and suddenly most 'cool' small pike-hacks that can be downloaded will also require a secondary download of pike-modules, just as is already the case for perl and python.
It really raises the bar for 'normal' users as far as running pike-scripts is concerned.
And actually, one of the top-ten widely distributed pike-applications I know of (AIDO) requires SDL, GL, MySQL, HTTP and much more.
I'm not particularly worried about Debian (or any other distribution) if only the packages that they package worked. They haven't split things up into that many packages, and if a Pike application is installed as a package the missing modules will be automatically pulled in.
In my nightmares we have a system like perl where you have to pull five modules from CPAN to run any random application, and then discover that the modules are not compatible with each other and the application can't handle the new versions you get by default. Much like modules included in the Linux kernel get fixed if they are included in Linus' tree and die in ABI hell if not; separate releases of modules are going to add pain.
yes, that is a very good point, that incidentally has recently brought us some new pike users who are rather active perl users.
if you want to learn what perl users like about pike then join us on irc. :-)
greetings, martin.
it doesn't? try a few more times, nameserver rotation should get you a working one. or try irc.freenode.net or ve.symlynx.com, they have gateways.
greetings, martin.
I don't mind the debian way of packaging, btw, it's quite useful. What I dread is that, say, Protocols.HTTP will be removed (and not necessarily that specific module, perhaps Calendar instead) from the default distribution, and suddenly most 'cool' small pike-hacks that can be downloaded will also require a secondary download of pike-modules, just as is already the case for perl and python.
Believe me, I hate having to download 10 perl modules just to get anything done. I'm not suggesting that we should gut the module tree by default. Rather, I'm suggesting that perhaps we need to clean house a little bit.
It really raises the bar for 'normal' users as far as running pike-scripts is concerned.
What about an enhancement to the resolver that would ease identifying and fetching missing modules:
pike --module_check myapp.pike
You seem to be missing the following modules:
Foo.bar Gazonk.client
Would you like to try to locate them?
...
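The check sketched in that session comes down to comparing the modules a program refers to against what the local installation can resolve. A minimal illustration in Python, assuming both lists are already known; how the resolver would collect them is not covered, and the function names here are hypothetical rather than part of any existing Pike tool.

```python
def missing_modules(required, installed):
    """Return required module names that are not installed, in first-seen order."""
    missing = []
    for name in required:
        if name not in installed and name not in missing:
            missing.append(name)
    return missing


def report(required, installed):
    """Build a message in the style of the sample session."""
    missing = missing_modules(required, installed)
    if not missing:
        return "All required modules are present."
    return ("You seem to be missing the following modules:\n"
            + " ".join(missing)
            + "\nWould you like to try to locate them?")


if __name__ == "__main__":
    print(report(["Stdio", "Foo.bar", "Gazonk.client"], {"Stdio"}))
```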
And actually, one of the top-ten widely distributed pike-applications I know of (AIDO) requires SDL, GL, MySQL, HTTP and much more.
I have trouble buying that argument. Besides, is it so hard to add "pike -x monger" commands to your installation script? I get that you're convinced that everyone needs SDL and GL. I'm not. Perhaps we can just move on?
Bill
Believe me, I hate having to download 10 perl modules just to get anything done. I'm not suggesting that we should gut the module tree by default. Rather, I'm suggesting that perhaps we need to clean house a little bit.
Well, that's more or less the same thing, isn't it?
pike --module_check myapp.pike
You seem to be missing the following modules:
Foo.bar Gazonk.client
Would you like to try to locate them?
This is the situation I dread, yes.
I have trouble buying that argument. Besides, is it so hard to add "pike -x monger" commands to your installation script? I get that you're convinced that everyone needs SDL and GL. I'm not. Perhaps we can just move on?
No, it's not needed by everyone. On the other hand, as an example SDL is a whopping total of 2800 lines of CMOD-code, most of which is documentation, and it takes about 1.5 seconds to compile on my computer with optimizations (just checked, arguments -O3 -mcpu=pentiumpro).
My point of view is that it's pointless to remove things just because it's 'clean' to have a small distribution. It will not make pike easier to maintain if we move things from the main repository, quite the opposite, actually, and the download will be some 100K smaller if we remove SDL _and_ GL.
It is OK not to include them in binary packages (such as the debian packages) since the dependencies are extreme for those modules (SDL depends on GL and GTK, GTK depends on half the world), but removing them from source distributions seems rather pointless.
Most of the module size (for almost all modules) is the refdoc and configure scripts, btw. Removing the per-C-module 'configure' scripts would shrink the pike distribution quite handily. And having a separate 'pike-refdoc' package (can be done by post-processing) would shrink pike a lot too.
I can agree that a cleanup might be needed, though, but I really don't want a split.
I only add things I consider to be at least marginally useful to a random program writer to Pike. I have added the AIDO modules to aido directly, as an example (MPlayer bindings etc), not to Pike.
And then some modules are very server or client oriented.
But I think it's not a good idea to have a pike-server and pike-client distribution, though. As an example, client applications generally don't need Shuffler, Databases (mostly), Java, Kerberos, Yp or Fuse.
Fuse is useful for server-applications on Linux, btw, you can very easily provide a normal filesystem view of your data, with editing and all. Saves a lot of time otherwise spent writing UI:s (eg, edit your data and metadata for your files in your HTTP-based content manager with emacs or word or whatever)
(It's also very pleasing to write a new filesystem in 10 minutes. :-))
Server applications tend to avoid GTK, Gnome, SDL, GL and the other UI modules.
As for cleaning up, I think these modules no longer have to be kept in Pike:
* Pipe (Shuffler replaces pipe, pipe kept for compatibility)
* GLUT (SDL is much better, and provides similar services)
* Perl (But it's sort of cute to embed Perl in Pike)
* SSleay (who uses this when the built-in SSL module works better)
And then we have the modules with very interesting names:
* _Roxen (this is mostly HTTP utility functions)
* spider (XML parser, compat HTML parser)
Running an application that requires some aspect of Pike that the Pike on your system does not fulfil (pike -x module is broken, there is no GTK or SDL module, for instance)
huh? please check again! GTK and SDL are there.
The example was of the situation where you install a pike and did not pick the sumo variant, so you ended up with a pike minus some parts in other packages (in this case, installing a pike sans SDL/GTK/Mysql).
Debian is relevant to the discussion mostly as an example of what can be done and which problems might arise. The issue above is that the resolver that the Debian smallish pike uses does not help the user to add stripped components. We might want to consider that to make Monger better equipped to handle the issue.
Monger and distribution-specific packaging are not really related to one another.
On debian you want to run apt-get install pike-mysql, not pike -x monger install mysql.
Regardless of how parts of Pike were stripped away though, you want to help users find out about what their pike lacks to get an application running.
Regardless of how parts of Pike were stripped away though, you want to help users find out about what their pike lacks to get an application running
Yes. That's rather unrelated, though.
On debian (and hypothetical other systems using similar splitting) we want to know about the names of the packages, otherwise we want to print the name of the missing module and then abort (and not write 10000 other follow-up error messages)
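As a rough illustration of that behaviour, here is a Python sketch that stops at the first missing module and prints a single line, with a package hint when one is known. The mapping table is hypothetical; only the pike-mysql package name and the apt-get command come from this thread, and the "Mysql" module key is an assumption.

```python
# Hypothetical mapping from Pike module name to distribution package.
# Only the pike-mysql entry is taken from the thread; a real table would
# have to be maintained by the packager.
DEBIAN_PACKAGES = {"Mysql": "pike-mysql"}


def first_missing_error(required, installed, packages=None):
    """Return one error line for the first missing module, or None."""
    for name in required:
        if name in installed:
            continue
        if packages and name in packages:
            return "Missing module %s (try: apt-get install %s)" % (
                name, packages[name])
        return "Missing module %s" % name
    return None


if __name__ == "__main__":
    error = first_missing_error(["Stdio", "Mysql", "SDL"], {"Stdio"},
                                DEBIAN_PACKAGES)
    if error:
        # One message, then abort: no cascade of follow-up errors.
        raise SystemExit(error)
```

On a system without package metadata the same function falls back to naming the module only.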
I think we actually agree on the subject here. My only point was that debian does less to help than it ought to, and that we might try to do better.
pike-devel@lists.lysator.liu.se