In the poll() branch, files/file.c:file_peek() raises the minimum poll time from zero to one millisecond. This is wreaking havoc on poll-performance. Is there any reason to do so?
I was trying to refactor _PGsql and downsize it as much as possible, i.e. to use Pike native modules instead of direct system calls. That seems to work rather nicely, except for this poll mess: I could take out the special poll code if the standard files/file.c module did not introduce that extra delay.
This is what we're talking about:
diff --git a/src/modules/files/file.c b/src/modules/files/file.c
index 6daf5c3..642f951 100644
--- a/src/modules/files/file.c
+++ b/src/modules/files/file.c
@@ -844,14 +844,20 @@ static void file_peek(INT32 args)
     struct pollfd fds;
     int timeout;
     timeout = (int)(tf*1000); /* ignore overflow for now */
+#if 0
     if (!timeout) timeout = 1;
+#endif
     fds.fd=FD;
     fds.events=POLLIN;
     fds.revents=0;
 
-    THREADS_ALLOW();
-    ret=poll(&fds, 1, timeout);
-    THREADS_DISALLOW();
+    if(timeout) {
+      THREADS_ALLOW();
+      ret=poll(&fds, 1, timeout);
+      THREADS_DISALLOW();
+    }
+    else
+      ret=poll(&fds, 1, 0);
 
     if(ret < 0)
     {
It's a 40% speed difference caused by extra latency in the pgsql driver case.
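To make the latency point concrete, here is a minimal sketch of a zero-timeout readability check with poll(). It is not Pike's file_peek(): the names peek_fd() and peek_demo() are hypothetical, and the interpreter lock handling (THREADS_ALLOW/THREADS_DISALLOW) is left out. With a timeout of 0, poll() only reports the current state and returns at once; with a timeout of 1 it can block for a millisecond or more when nothing is pending.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper (not Pike's API): returns 1 if fd has data to read,
 * 0 if it does not, and -1 if poll() fails.  timeout_ms == 0 means
 * "just look, never sleep". */
int peek_fd(int fd, int timeout_ms)
{
    struct pollfd fds;
    int ret;

    assert(timeout_ms >= 0);   /* a negative timeout would block forever */

    fds.fd = fd;
    fds.events = POLLIN;
    fds.revents = 0;

    ret = poll(&fds, 1, timeout_ms);
    if (ret < 0)
        return -1;
    return (ret > 0 && (fds.revents & POLLIN)) ? 1 : 0;
}

/* Illustration on a pipe: nothing readable before the write, one byte
 * readable after it.  Returns 0 if both zero-timeout peeks behave that way. */
int peek_demo(void)
{
    int p[2];
    int before, after;

    if (pipe(p) != 0)
        return -1;

    before = peek_fd(p[0], 0);          /* empty pipe: expect 0 */
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    after = peek_fd(p[0], 0);           /* one byte pending: expect 1 */

    close(p[0]);
    close(p[1]);
    return (before == 0 && after == 1) ? 0 : 1;
}
```

Calling peek_demo() from a small main() should be enough to try this on any POSIX system. Whether skipping the thread unlock for the zero-timeout case is safe inside Pike is a separate question that this sketch does not address.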
This is the patch in question, it is simple, reduces latency *a lot* (and improves pgsql.pike performance by a factor of 2 or so; I could imagine the boost is similar in other I/O-type of applications):
From 7601bd31847562755a1688a99aa2bdc2a340414f Mon Sep 17 00:00:00 2001
From: Stephen R. van den Berg srb@cuci.nl
Date: Sat, 26 Jul 2008 12:51:23 +0200
Subject: Decrease latency in zero-timeout polls/selects

---
 src/modules/files/file.c |   23 ++++++++++++++---------
 1 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/src/modules/files/file.c b/src/modules/files/file.c
index 6daf5c3..a3e830e 100644
--- a/src/modules/files/file.c
+++ b/src/modules/files/file.c
@@ -844,14 +844,17 @@ static void file_peek(INT32 args)
     struct pollfd fds;
     int timeout;
     timeout = (int)(tf*1000); /* ignore overflow for now */
-    if (!timeout) timeout = 1;
     fds.fd=FD;
     fds.events=POLLIN;
     fds.revents=0;
 
-    THREADS_ALLOW();
-    ret=poll(&fds, 1, timeout);
-    THREADS_DISALLOW();
+    if(timeout) {
+      THREADS_ALLOW();
+      ret=poll(&fds, 1, timeout);
+      THREADS_DISALLOW();
+    }
+    else
+      ret=poll(&fds, 1, 0);
 
     if(ret < 0)
     {
@@ -876,8 +879,6 @@ static void file_peek(INT32 args)
     fd_set tmp;
     struct timeval tv;
 
-    tv.tv_usec=1;
-    tv.tv_sec=0;
     fd_FD_ZERO(&tmp);
     fd_FD_SET(FD, &tmp);
     ret = FD;
@@ -887,9 +888,13 @@ static void file_peek(INT32 args)
 
     /* FIXME: Handling of EOF and not_eof */
 
-    THREADS_ALLOW();
-    ret = fd_select(ret+1,&tmp,0,0,&tv);
-    THREADS_DISALLOW();
+    if(tv.tv_sec || tv.tv_usec) {
+      THREADS_ALLOW();
+      ret = fd_select(ret+1,&tmp,0,0,&tv);
+      THREADS_DISALLOW();
+    }
+    else
+      ret = fd_select(ret+1,&tmp,0,0,&tv);
 
     if(ret < 0)
     {
Stephen R. van den Berg wrote:
This is the patch in question, it is simple, reduces latency *a lot* (and improves pgsql.pike performance by a factor of 2 or so; I could imagine the boost is similar in other I/O-type of applications):
I have to correct myself. Apparently my refactoring of the pgsql driver made the difference less.
I.e. without the patch there is an 80% chance of being 22% slower, and a 20% chance of having roughly the same performance. With or without the patch, the pgsql driver still beats the old postgres driver by a landslide.
Yes, it sounds like a very good idea to get rid of a deliberate sleep in peek(). But it must have been put there for a reason - some digging(*) reveals that it was introduced long ago:
revision 1.109 date: 1998/07/10 18:58:55; author: grubba; state: Exp; lines: +5 -5 Made file_peek() somewhat more paranoid.
Not the most illuminating message. Maybe Grubba can recall something more about this? Otherwise I suggest we remove the timeout in the next dev branch to let it brew there for a while. It was probably a kludge to work around a bug in some old poll implementation on a strange OS that's no longer used anyway.
*) Tried out git for this. git diff and blame are fast and work nicely, but is there some gui tool to do this kind of thing even more conveniently? I tried to use "git-gui blame" but couldn't make it go past the latest change.
*) Tried out git for this. git diff and blame are fast and work nicely, but is there some gui tool to do this kind of thing even more conveniently? I tried to use "git-gui blame" but couldn't make it go past the latest change.
What is the repository URL? I'd like to try this a little myself.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
*) Tried out git for this. git diff and blame are fast and work nicely, but is there some gui tool to do this kind of thing even more conveniently? I tried to use "git-gui blame" but couldn't make it go past the latest change.
What is the repository URL? I'd like to try this a little myself.
git clone git://git.cuci.nl/pike
to get started, the repository trails pikefarm by an hour or so at most.
git clone git://git.cuci.nl/pike
Ok, that seems to give me 7.7. How do I get for example nt-tools or 7.0?
How do I get bin/rsqld.pike from 2004-04-24?
How do I use log to find that src/modules/_math was originally called src/modules/math?
chiyo:/tmp/pike% git log --full-diff -M -C --until=1999-02-01 src/modules/_math
commit abca4daf0d9e7fd4965f63e5dc45c3173cc74230
Merge: ab24694... c35e2fe...
Author: Fredrik Hübinette (Hubbe) hubbe@pike.ida.liu.se
Date:   Fri Jan 1 01:03:35 1999 +0000

    merged some fixes from 0.6
chiyo:/tmp/pike%
Ok, but what was actually changed in this commit? And what about the split 1998-12-21 which also modified src/modules/_math (by renaming it from src/modules/math)? That is also before 1999-02-01...
chiyo:/tmp/pike% git log --full-diff -M -C --until=1998-12-20 src/modules/math
fatal: ambiguous argument 'src/modules/math': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions
chiyo:/tmp/pike%
So how do I specify a repository path rather than a working tree path then?
On Mon, Jul 28, 2008 at 01:40:03PM +0000, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
chiyo:/tmp/pike% git log --full-diff -M -C --until=1999-02-01 src/modules/_math
commit abca4daf0d9e7fd4965f63e5dc45c3173cc74230
Merge: ab24694... c35e2fe...
Ok, but what was actually changed in this commit?
you hit a merge commit, for those the diff display is usually suppressed, you can get it with git show abca4daf0d9e7fd4965f63e5dc45c3173cc74230
not sure why there is only one result.
chiyo:/tmp/pike% git log --full-diff -M -C --until=1998-12-20 src/modules/math
fatal: ambiguous argument 'src/modules/math': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions
So how do I specify a repository path rather than a working tree path then?
as the message says, use --: git log --full-diff -M -C --until=1998-12-20 -- src/modules/math
greetings, martin.
you hit a merge commit, for those the diff display is usually suppressed, you can get it with git show abca4daf0d9e7fd4965f63e5dc45c3173cc74230
I assume "++" in the diffs means "added with history"? How do I see where it is added from? The "---" side is shown as "/dev/null".
Also, that it should list this commit at all is strange, because the commit "merged some fixes from 0.6" by hubbe on 1999-01-01 did not touch the math module at all. It modified the following files:
  NT/tools/lib.pike
  NT/tools/rntcc
  NT/tools/sprshd
  NT/init_nt
  src/modules/Perl/configure.in
  src/modules/files/efuns.c
  src/modules/files/file.c
  src/modules/spider/xml.c
  src/configure.in
  src/fdlib.c
Is this a result of misinterpreting the first CVS commit after the split as the split itself? I think this should be fixed. I have a list of good dates to insert synthetic commits for all the splits.
as the message says, use --: git log --full-diff -M -C --until=1998-12-20 -- src/modules/math
Ok, that works. And if I omit the --until, it stops on the same commit as when I log _math (but from the other direction, so to speak), which kind of makes sense. But I still can't get log or show to actually display the rename as such.
Blame output seems ok for src/modules/_math/math.c at least.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Is this a result of misinterpreting the first CVS commit after the split as the split itself? I think this should be fixed. I have a list of good dates to insert synthetic commits for all the splits.
What is the point of inserting synthetic commits for the splits? I.e. it's not clear to me if it solves anything.
The splits contain changes to the repository (renames and deletes) which are not CVS commits but done directly in the cvsroot. So if you check out 7.2 and 7.3 from immediately after the 7.1 split, they will differ (7.2 will contain lib/modules/String.pmod, but 7.3 will contain lib/modules/_String.pmod instead).
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
The splits contain changes to the repository (renames and deletes) which are not CVS commits but done directly in the cvsroot. So if you check out 7.2 and 7.3 from immediately after the 7.1 split, they will differ (7.2 will contain lib/modules/String.pmod, but 7.3 will contain lib/modules/_String.pmod instead).
I see. Well, I'll look into that; if I find differences, I'll create fake split commits.
On Mon, Jul 28, 2008 at 01:00:02PM +0000, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
git clone git://git.cuci.nl/pike
Ok, that seems to give me 7.7. How do I get for example nt-tools or 7.0?
git branch -a to see all branches,
git makes a difference between remote and local branches. remote branches are used to track the branches from the remote repo (designated as origin), local branches are where you make your own changes. to update (aka cvs/svn up) you merge the remote branch.
git by default checks out the active branch hence you get a local branch for that. to get other local branches run git checkout -b nt-tools origin/nt-tools
greetings, martin.
How do I get bin/rsqld.pike from 2004-04-24?
How do I use log to find that src/modules/_math was originally called src/modules/math?
Does the current export at srb's server correctly represent the 7.4/7.5/7.6/etc branches? I haven't been able to go back to e.g. the 7.6/7.7 split. Thought it would be possible using gitk.
I guess a good start for a git export would be your svn export where you've sorted all that out. Just using git instead wouldn't magically make all those issues disappear, would it?
Btw, I still haven't been able to get a good overview of the branches and how they connect to each other. Is there a view for that in gitk? (Or some other tool?)
i guess you started gitk without arguments. by default it will only show the active branch. to see all branches use --all. -d is also nice, to sort commits by date.
gitk -d --all & is the way i run gitk every time.
greetings, martin.
Thank you, that helped. Is there some way to collapse the view so that only branches and their splits and merges are shown?
such a feature would be really nice indeed, but at the moment it does not exist. i wonder how hard it would be to add it to gitk.
greetings, martin.
Martin Baehr wrote:
such a feature would be really nice indeed, but at the moment it does not exist. i wonder how hard it would be to add it to gitk.
Try:
gitk --first-parent --all
--first-parent seems to only reduce part of the history of merges but not collapse all linear sequences of commits. (only commits that have more than one parent or child should be shown.)
greetings, martin.
Martin Baehr wrote:
--first-parent seems to only reduce part of the history of merges but not collapse all linear sequences of commits. (only commits that have more than one parent or child should be shown.)
I see what you mean. Wouldn't know the magic incantations for that, not quite sure if it's not possible though, try "man git-rev-list". If you can get "git log" or "git rev-list" to show the relevant commits, then the same arguments supplied to gitk will give the same treeview.
In any case, the tags like 7.1 7.3 7.5 allow you to jump to the right spot in gitk.
well, i asked in the #git irc channel, and such a feature does not yet seem to exist. i think it could be added to gitk though, as it loads every commit it could check and then hide the irrelevant ones.
greetings, martin.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Does the current export at srb's server correctly represent the 7.4/7.5/7.6/etc branches? I haven't been able to go back to e.g. the 7.6/7.7 split. Thought it would be possible using gitk.
That already is correct.
The branches are: 0.5 0.6 7.0 7.2 7.4 7.6 7.7 extra_tests nt-tools
There are tags for other old branches that never lived as separate branches, e.g.:
v0 ulpc.old ulpc 0.7 7.1 7.3 7.5
I.e. git log 7.5 shows the commit exactly prior to the 7.6/7.7 split.
I guess a good start for a git export would be your svn export where you've sorted all that out. Just using git instead wouldn't magically make all those issues disappear, would it?
I have used the blessed SVN export as a base, but it still contained numerous errors which I had to fix (some missing files, CR/LF mistakes *including* in binary files). At the moment I consider the git repo to be more accurate than the SVN export. I will do an automated backcheck with *all* CVS checkouts though, and find (and fix) all remaining differences.
Btw, I still haven't been able to get a good overview of the branches and how they connect to each other. Is there a view for that in gitk? (Or some other tool?)
gitk --all or gitk 0.5 0.6 7.0 7.2 7.4 7.6 7.7 nt-tools extra_tests (provided that you created local branches for all origin/... remotes).
should go a long way. However, for a more complete view, you need the rsynced version of my git repo; since it contains the majority of the backports as merges. Nevertheless, the git://git.cuci.nl/pike repository contains a lot of backports and all branches in full correctly.
I have used the blessed SVN export as a base, but it still contained numerous errors which I had to fix (some missing files, CR/LF mistakes *including* in binary files).
It would be nice if you could report any such errors you find. Which missing files, for example?
CR/LF mistakes might be due to the lack of a svn:eol-style (or svn:content-type, in the case of binaries) property. All files are stored as BLOBs in the repository, but can get CR/LF converted upon checkout. Please indicate the files and I'll add the properties.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
I have used the blessed SVN export as a base, but it still contained numerous errors which I had to fix (some missing files, CR/LF mistakes *including* in binary files).
It would be nice if you could report any such errors you find. Which
I did make some noise about that at the beginning of my svn/cvs -> git odyssey on this list.
missing files, for example?
I'll have to look it back up in the mailinglist logs, although I didn't mention every missing file; I'll see what I can hand you.
CR/LF mistakes might be due to the lack of a svn:eol-style (or svn:content-type, in the case of binaries) property. All files are stored as BLOBs in the repository, but can get CR/LF converted upon checkout. Please indicate the files and I'll add the properties.
Well, from memory, please add it to *all* binary files (graphics mostly), and all files which have CR/LF endings because they were created or intended for DOS/Windows. Especially the graphics files (png/gif) were a PITA to correct.
I'll have to look it back up in the mailinglist logs, although I didn't mention every missing file; I'll see what I can hand you.
Thanks.
Well, from memory, please add it to *all* binary files (graphics mostly), and all files which have CR/LF endings because they were created or intended for DOS/Windows.
I see now that I've forgotten to give the script a --mime-types file. That would mean it can't guess filetypes correctly for graphics etc. That's probably the root of the problem. Without any content-type information, it appears to set eol-style "native". This means that it will convert LF to CR/LF if you check out on a DOS/Windows machine.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Well, from memory, please add it to *all* binary files (graphics mostly), and all files which have CR/LF endings because they were created or intended for DOS/Windows.
I see now that I've forgotten to give the script a --mime-types file. That would mean it can't guess filetypes correctly for graphics etc. That's probably the root of the problem. Without any content-type information, it appears to set eol-style "native". This means that it will convert LF to CR/LF if you check out on a DOS/Windows machine.
I did the conversion on Linux, so the files were damaged on import. I actually verified several times, and manually, that the corruption was in the SVN repository and didn't occur on checkout (you don't want to know how many times I ran the import scripts, every time with slightly different parameters).
Ah, I see that if the eol-style is set to "native", the import script also tries to normalize the files wrt EOL. So that should also be fixed once mime-detection is ok.
Were there any non-binary files where this was a problem? In that case it might be necessary to select a different eol-style for those files.
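To illustrate why EOL normalization is destructive for binaries, the sketch below applies a naive "CR LF becomes LF" rewrite to the 8-byte PNG signature, which deliberately contains a CR LF pair. The names normalize_eol() and png_signature_damage() are hypothetical, and the rewrite rule is an assumption made for illustration; the import script's actual logic is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Naive EOL normalization (an assumed rule, for illustration only):
 * rewrite every CR LF pair as a single LF, in place.
 * Returns the new length of the buffer. */
size_t normalize_eol(unsigned char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);

    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;               /* drop the CR; the LF is copied next */
        buf[out++] = buf[in];
    }
    return out;
}

/* Returns how many bytes the PNG signature loses when it is treated as
 * text.  Any loss at all makes the file unreadable as a PNG. */
size_t png_signature_damage(void)
{
    /* The standard PNG signature: 89 50 4E 47 0D 0A 1A 0A */
    unsigned char sig[8] = { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n' };

    return sizeof sig - normalize_eol(sig, sizeof sig);
}
```

The signature shrinks by one byte, so a text-style conversion during import damages the file no matter which platform later checks it out. That matches the observation above that the corruption was already present in the repository.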
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Were there any non-binary files where this was a problem? In that case it might be necessary to select a different eol-style for those files.
Some files intended for or created on DOS/Windows: batch files, or C source intended for the Windows compiler, if I recall correctly, and one config/definition file (also in the Windows region); I forgot which exactly.
Some files intended for or created on DOS/Windows: batch files, or C source intended for the Windows compiler, if I recall correctly, and one config/definition file (also in the Windows region); I forgot which exactly.
Well, for a C-source it shouldn't make any difference. And as I said, if you check it out on a W*ndows machine, you'll get CR/LF back. For the config file it might matter.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Some files intended for or created on DOS/Windows: batch files, or C source intended for the Windows compiler, if I recall correctly, and one config/definition file (also in the Windows region); I forgot which exactly.
Well, for a C-source it shouldn't make any difference. And as I said, if you check it out on a W*ndows machine, you'll get CR/LF back. For the config file it might matter.
True. But when you actually try to verify correctness of the import against the CVS checkout, it is kind of a dealbreaker.
As for other files, I'm not sure, but *any* binary file is suspect. I suggest you briefly run through the repository and look for anything binary. I only remember the graphics files as a recurring event, not quite sure if there weren't any other binary files in there which got corrupted.
True. But when you actually try to verify correctness of the import against the CVS checkout, it is kind of a dealbreaker.
Well, yes, but it is also a feature which is nice to have. It's much easier for people to edit the files if they get them with their native EOL format.
Then again, we _could_ set eol-style "crlf" on all files which are obviously Windows-related. That shouldn't reduce the usability much, and would allow an automatic verification to be made, as long as you run it on UNIX.
As for other files, I'm not sure, but *any* binary file is suspect. I suggest you briefly run through the repository and look for anything binary. I only remember the graphics files as a recurring event, not quite sure if there weren't any other binary files in there which got corrupted.
As I said, I had forgotten to give a MIME database to the script. When running without a MIME database, the script relies on the -kb indicator in CVS to determine if the file is binary or not. But as you have discovered, that's not very reliable at all. When it _has_ a MIME database, it will instead treat all files as binary unless they have a text/* MIME-type.
I wonder, if we were to start to use either of your svn or git exports as the live repository, and it later turns out that the history is flakey somewhere, is it then possible to amend it without affecting the recent history?
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
I wonder, if we were to start to use either of your svn or git exports as the live repository, and it later turns out that the history is flakey somewhere, is it then possible to amend it without affecting the recent history?
Yes. At the moment you're talking to the person with more experience in juggling repository history (most notably git's, but I have tried forcing SVN in the distant past) than probably any other (at least on the (immensely active) git mailinglist, I haven't met anyone with more rewriting experience yet).
Yes, thank you very much for your effort with this. The git repo is indeed nice for personal use to juggle with temporary branches and to dig around in all the history (I've been practicing a bit in the weekend).
Still, with git it appears to me that it would be more complicated to fix a problem in the history, since if a commit is split in two or something like that then it's unavoidable that the hashes for all later commits change, right? Then everyone synching against it would have to migrate their local branches to the new repo, and hashes posted in email messages etc would no longer be usable either.
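The ripple effect described here can be shown with a toy model in which each commit id is a hash of its parent's id plus its own content. This is only a sketch: git uses SHA-1 over a richer commit object, whereas the code below uses 64-bit FNV-1a, and commit_id(), chain_ids() and changed_after_rewrite() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a over a byte range, continuing from state h. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;      /* FNV prime */
    }
    return h;
}

/* Toy commit id: depends on the parent's id and on this commit's content. */
uint64_t commit_id(uint64_t parent_id, const char *content)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */

    h = fnv1a(h, &parent_id, sizeof parent_id);
    return fnv1a(h, content, strlen(content));
}

/* Compute the ids of a linear history of n commits into ids[0..n-1]. */
void chain_ids(const char *const *contents, size_t n, uint64_t *ids)
{
    uint64_t parent = 0;            /* the root commit has no parent */
    size_t i;

    assert(contents != NULL && ids != NULL);

    for (i = 0; i < n; i++) {
        parent = commit_id(parent, contents[i]);
        ids[i] = parent;
    }
}

/* Repair only the oldest commit and count how many ids end up different. */
size_t changed_after_rewrite(void)
{
    const char *const before[] = { "import", "fix peek", "add docs" };
    const char *const after[]  = { "import (repaired)", "fix peek", "add docs" };
    uint64_t a[3], b[3];
    size_t i, changed = 0;

    chain_ids(before, 3, a);
    chain_ids(after, 3, b);
    for (i = 0; i < 3; i++)
        if (a[i] != b[i])
            changed++;
    return changed;
}
```

Although only the first commit's content differs, every later id changes as well, because each id feeds into the next one. That is why local branches have to be moved onto the rewritten history and why old hashes quoted in mail stop resolving.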
I'm only asking because I was curious about how important it is to get it right from the start. If it's that difficult in the git case I guess one would never bother to fix the problems after the repo is taken into active use.
Come to think about it, problems would be similar for svn if the revision series change.
Come to think about it, problems would be similar for svn if the revision series change.
If you need to insert a new revision, then all following revisions need to be renumbered, yes. (Deleting a revision is simpler, since the import tool will renumber all revisions anyway, so it doesn't matter if there are "holes".) I have pike tools to handle this though.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Still, with git it appears to me that it would be more complicated to fix a problem in the history, since if a commit is split in two or something like that then it's unavoidable that the hashes for all later commits change, right? Then everyone synching against it would have to migrate their local branches to the new repo, and hashes posted in email messages etc would no longer be usable either.
Well, let's put it this way, one proposed way forward would be the following:
a. Setup an official git repository somewhere (basically it involves copying my current tree to some place official).
b. Turn on the git-cvs compatibility server for those still wanting to use CVS based tools.
c. Declare the git repository to be official and turn the old CVS repositories into read-only archives for reference only.
d. Start committing to the new git repository (either through git, preferred, or through the CVS emulation layer).
e. In parallel we verify the correctness of the current history, and make amends where needed. No impact to the current operation.
f. At some point in time, we declare history to be near perfect. At that time we archive the current repository, and refactor it entirely to get all the historical fixes in.
g. Anyone having local branches which are based off of commits in the old git repository will have to rebase those branches onto the correct new hashes of the old commits (this actually is quite easy to do).
h. Any old hashes mentioned in emails will only refer to the old git repository and are likely not to be found in the new one.
i. Proceed with development as usual.
j. If necessary, collect late fixes to history which have been forgotten, if deemed necessary, at some point in the future, repeat steps f through h to get the fixes in.
I've personally exercised all these steps myself in the past few months, several times, so did Martin, in order to keep up with my fixes. It's not difficult to do.
b. Turn on the git-cvs compatibility server for those still wanting to use CVS based tools.
That doesn't work. It doesn't present the proper RCS files for our tools, and no one is using pserver. If CVS was no longer the main repository that might not matter, but I'm just saying that git's CVS compatibility is next to useless.
Peter Bortas @ Pike developers forum wrote:
b. Turn on the git-cvs compatibility server for those still wanting to use CVS based tools.
That doesn't work. It doesn't present the proper RCS files for our tools, and no one is using pserver. If CVS was no longer the main repository that might not matter, but I'm just saying that git's CVS compatibility is next to useless.
For the tools, no, perhaps. But I could imagine that some still would like to use CVS command line?
I have a hard time seeing who that would be. Resistance to git is not based (a lot) on command-line familiarity.
g. Anyone having local branches which are based off of commits in the old git repository will have to rebase those branches onto the correct new hashes of the old commits (this actually is quite easy to do).
How do you go about that? Manually rebasing every branch, or is it easier? (Guess I'll get some experience on that too when you rebuild the repo next time. ;)
Even if it's simple to rebase, I think it'd be best to avoid steps f-h as much as possible. If nothing else, it's inconvenient (and confusing) to have to keep several old repositories around just to be able to follow old mail threads.
My experience with git so far:
It's a very good tool to work locally with. In the long run this is clearly the way to go to organize one's own hacking, imo. Even if we don't go for git on the server for some time yet, I think I'll be using it for myself.
It shows quickly that it isn't a mature system yet, though. As some kind of measurement, I can mention the time until I found it necessary to pull home the source to fix the flaws that started to annoy me: For git it was two days, which I think is a record. (Btw, I've got home built Ubuntu Hardy packages for 1.5.6.4 if anyone's interested.)
I don't agree very much with the intended collaboration model, though. Apparently there should be no central repository, only developers that pull each other's changes in some ad-hoc way. We're supposed to appoint a maintainer who everyone else has to ask to manually pull in their patches. This has "so many advantages" over letting developers just update a shared repository by themselves. Well..
But there's no need to import this model, of course; git can work just fine with a shared repository too. I wouldn't like to see the plethora of branches and merges on the server, though. A nice linear sequence of commits on each version branch keeps things simple. We've been working that way for years now and I haven't seen any real reason to change that. But that shouldn't be a big problem to accomplish just by dictating policy, I guess.
Zino has good points that svn fits better on the server: It's more mature, it's closer to cvs so that the server tools around it don't need much change, and it's after all built with that sharing model in mind.
Maybe a good way to accomplish both ends is to use the bidirectional git-svn bridge? Afaict it's made for the case when the primary repo is in svn. It advises against doing git-style heavily branched development, which is good on the server side (see above). And using svn for storage allows properties, which git lacks. I will play around a bit with git-svn.
On Mon, Jul 28, 2008 at 11:15:02PM +0000, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
g. Anyone having local branches which are based off of commits in the old git repository will have to rebase those
How do you go about that? Manually rebasing every branch
yes.
Even if it's simple to rebase, I think it'd be best to avoid steps f-h as much as possible.
absolutely.
If nothing else, it's inconvenient (and confusing) to have to keep several old repositories around
you don't need several repositories, you can keep them all in one repo. but that doesn't ease the confusion, especially once you get multiple such branches.
I don't agree very much with the intended collaboration model, though.
there is no intended collaboration model, you can choose any model you like.
Apparently there should be no central repository, only developers that pull each other's changes in some ad-hoc way.
that is just one way to do it, and this model is most often explained as it is a new model that dvcs makes possible. in other words, if you want this model, then dvcs is the only way, but that doesn't mean that other models are less well supported or inferior or anything.
git can work just fine with a shared repository too. I wouldn't like to see the plethora of branches and merges on the server, though.
agreed.
A nice linear sequence of commits on each version branch keeps things simple. We've been working that way for years now and I haven't seen any real reason to change that. But that shouldn't be a big problem to accomplish just by dictating policy, I guess.
i think for us a mixed model makes the most sense. people with write access push into the core branches, but other potential contributors can still be pulled from by a core developer and then pushed from there.
Zino has good points that svn fits better on the server: It's more mature,
more mature in what sense? if everyone uses git on the client side then the maturity of svn on the server does not buy anything because you still have to deal with git. it only adds hassle which wouldn't exist otherwise.
it's closer to cvs so that the server tools around it don't need much change,
the largest change should be in the fact that the rcs files are not accessible anymore but have to be replaced with calls to the svn module. at the same time calls to git can be added.
and it's after all built with that sharing model in mind.
it has been built with a "this is all we know, so this is all you get" mindset. git is being built with a "there are many ways to do it, and we want to support all of them" mindset.
Maybe a good way to accomplish both ends is to use the bidirectional git-svn bridge? Afaict it's made for the case when the primary repo is in svn. It advises against doing git-style heavily branched development, which is good on the server side (see above). And using svn for storage allows properties, which git lacks. I will play around a bit with git-svn.
the only thing that would accomplish is to enforce limitations that svn has. with a small group as pike devs are, i think such enforcement is not necessary. policy should be enough.
git-svn is the most featureful dvcs-svn bridge i came across (only bzr might be a bit better, but it is less mature), but it has its limitations. the largest one is lack of merge tracking: you can't merge into a branch that is to be committed to svn, even though it could be made possible (the reason for that is actually that git can't rebase merges (yet)). another is that it is limited to a certain branch layout (not a problem for the pike repo).
greetings, martin.
Martin Bähr wrote:
On Mon, Jul 28, 2008 at 11:15:02PM +0000, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
g. Anyone having local branches which are based off of commits in the old git repository will have to rebase those
How do you go about that? Manually rebasing every branch
yes.
Say you have the following two aliases in /etc/gitconfig:
    [alias]
        makepatch = format-patch -k --stdout
        applypatch = am -k --whitespace=nowarn
Say you have a local branch called mybranch with commits:
    A-B-c-d-e
Where A-B are part of the public repository. Then I'd do a:
    git checkout mybranch
    git makepatch B >/tmp/mybranchinpatchformat
Then resync the repository/branches to the history-changed master repository. Then:
    git checkout mybranch
    git applypatch </tmp/mybranchinpatchformat
And you'd end up with: A'-B'-c'-d'-e'
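A minimal sketch of that sequence, run end to end in a throwaway repository. Everything about the setup is an assumption made for illustration: the `upstream` branch, the `old-B` tag, the file names, and the way the history change is simulated (only B is amended into B', whereas a real rewrite would also replace A). Only the `format-patch -k --stdout` / `am -k --whitespace=nowarn` pairing is taken from the aliases above.

```shell
#!/bin/sh
# Sketch only: repository layout, branch names and file names are invented;
# the history rewrite is simulated by amending commit B.

carry_branch_demo() (
    set -e
    work=$(mktemp -d)
    git init -q "$work/demo"
    cd "$work/demo"
    git config user.name "Demo User"
    git config user.email "demo@example.invalid"
    git symbolic-ref HEAD refs/heads/upstream

    # Public history: A-B
    echo A > public.txt
    git add public.txt
    git commit -q -m A
    echo B >> public.txt
    git commit -q -a -m B
    git tag old-B

    # Local branch with commits c-d-e on top of B
    git checkout -q -b mybranch
    for name in c d e; do
        echo "$name" >> local.txt
        git add local.txt
        git commit -q -m "$name"
    done

    # Step 1: save the local commits in patch format (what makepatch does)
    git format-patch -k --stdout old-B > "$work/mybranch.patch"

    # Step 2: simulate the history-changed master repository (B becomes B')
    git checkout -q upstream
    git commit -q --amend -m "B-rewritten"

    # Step 3: restart mybranch from the new history and replay the patches
    # (what applypatch does)
    git checkout -q -B mybranch upstream
    git am -q -k --whitespace=nowarn < "$work/mybranch.patch"

    # Show the resulting commit subjects, newest first
    git log --format=%s
    cd /
    rm -rf "$work"
)
```

Calling `carry_branch_demo` should list the subjects e, d and c sitting on top of the rewritten B, which corresponds to the A'-B'-c'-d'-e' picture. Note that the sketch explicitly resets mybranch onto the new history before replaying; the description above leaves that part of the resync implicit.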
Even if it's simple to rebase, I think it'd be best to avoid steps f-h as much as possible.
I don't think there is any arguing here. I was actually planning to fix the repository once, and be done with it, we just have to get all hands on deck for people to check most of it. And only if we missed a large enough amount of things, we'd ever repeat it, but most likely not.
git can work just fine with a shared repository too. I wouldn't like to see the plethora of branches and merges on the server, though.
agreed.
Let me emphasize this: IMO the best model would be to have a central git repository with just (currently) the main branches:
0.5 0.6 7.0 7.2 7.4 7.6 7.7 nt-tools extra_tests
Nothing more, nothing less. Commits on those branches are linear, but backports or forward ports/merges can be done directly and hence are visible when viewed in gitk (unless you use --first-parent which strips all the merge/porting links).
A nice linear sequence of commits on each version branch keeps things simple. We've been working that way for years now and I haven't seen any real reason to change that. But that shouldn't be a big problem to accomplish just by dictating policy, I guess.
i think for us a mixed model makes the most sense. people with write access push into the core branches, but other potential contributors can still be pulled from by a core developer and then pushed from there.
Quite. On the central repository new and/or temporary branches should not be created. People can do that on their own repos as much as they want; temporary branches on the central repo should be the exception rather than the rule.
Zino has good points that svn fits better on the server: It's more mature,
more mature in what sense? if everyone uses git on the client side then the maturity of svn on the server does not buy anything because you still have to deal with git. it only adds hassle which wouldn't exist otherwise.
Same question. More mature in which way? Git is maturing at an amazing rate (codebase wise), most open source projects either start with git or move from CVS or SVN to git these days.
it's closer to cvs so that the server tools around it don't need much change,
the largest change should be in the fact that the rcs files are not accessible anymore but have to be replaced with calls to the svn module. at the same time calls to git can be added.
Indeed, for every CVS/SVN command, there is an equivalent git command. So rewriting the tools to SVN or git is just as much effort.
and it's after all built with that sharing model in mind.
it has been built with a "this is all we know, so this is all you get" mindset. git is being built with a "there are many ways to do it, and we want to support all of them" mindset.
I dare say that most deployments of Git use the central repository model (like CVS/SVN). It's just that Linux development uses the more purely distributed model due to its sheer size and depth of the project.
Maybe a good way to accomplish both ends is to use the bidirectional git-svn bridge? Afaict it's made for the case when the primary repo is in svn. It advises against doing git-style heavily branched development, which is good on the server side (see above). And using svn for storage allows properties, which git lacks. I will play around a bit with git-svn.
Which properties are you missing (git has properties, like file modes, BTW)?
git-svn is the most featureful dvcs-svn bridge i came across (only bzr might be a bit better, but it is less mature), but it has its
Having SVN as a central repository would work, of course, but it would be inconvenient at best, because SVN is not as good with merges (both the actual process, as well as recording the merge history).
Also, a rather large drawback (I think) for SVN is that it is immutable. Case in point: if someone (by mistake) checks in a large 50MB Yahoo-UI binary blob, then this blob will be part of the SVN repository forever and cannot easily be removed. The only thing you can do is painstakingly dump the whole repository, filter out the bad commits (which sometimes is complicated enough) and read it back in again.
Git gives you tools to actually fix that with a small price to pay: anyone who already synced from that branch, will have to rebase, but other than that, there is no downtime, no complicated dump-editing; it's all less-filling and easy to use.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Same question. More mature in which way? Git is maturing at an amazing rate (codebase wise), most open source projects either start with git or move from CVS or SVN to git these days.
[citation needed] :-) What statistical material do you base this on?
FWIW, the Debian popularity contest shows:
http://people.debian.org/~igloo/popcon-graphs/index.php?packages=darcs%2Cgit...
that:
- The percentage of people picking svn nowadays is roughly constant (after a sharp drop as git came along).
- The percentage of people picking git is rising steadily.
- The percentage of people picking CVS is dropping steadily.
- The other VCSes play no role of significance.
At this rate, git will overtake SVN in percentage by the end of 2009 (provided the rate doesn't accelerate more). If you view the charts without the percentage view, you get absolute numbers, which are difficult to interpret, since the number of voters constantly rises over time.
I'm not saying that this graph says everything, but I do know that the rate of development of git is an order of magnitude larger than that of any of the other VCSes.
If anyone needs pointers regarding which git commands/sequences substitute for which cvs sequences, let us know. Incidentally there are several packages out there (some rough, some polished) that give you more traditional interfaces to git. In any case, if you want to appreciate the day-to-day capabilities of git, you should learn what the index/staging area means and how it works.
On Tue, Jul 29, 2008 at 02:58:01PM +0200, Stephen R. van den Berg wrote:
- The percentage of people picking svn nowadays is roughly constant (after a sharp drop as git came along).
that drop is actually in all packages, so i think it is just a case of changing the counting
In any case, if you want to appreciate the day-to-day capabilities of git, you should learn what the index/staging area means and how it works.
git add -p
my favourite command!
greetings, martin.
Well, the user base of the tools doesn't necessarily reflect the distribution of repository types among projects. There only has to be one project which many are interested in (say, the Linux kernel, a project which would be of special interest to users of a Linux distribution such as Debian) using a specific tool for people (and this includes also those who are not actively participating in the development of the project) to install the client tool. It doesn't mean that they use that tool for their own projects (if any). I, for one, have CVS, svn, git, darcs and bzr installed, and a popularity contest would have no way of knowing which I use for starting new projects, or moving existing projects to.
Even if we were to infer that the relative number of repositories is changing in favour of git, I don't see how you can draw the conclusion that _most_ projects start with git or move to git. If we were to draw any conclusion from this graph it would be that _most_ projects are shutting down their repositories completely, since the number of users who have any VCS at all has decreased from >=45% to <=20% (the exact figure depends on the overlap caused by people having more than one installed), meaning that the number of VCS installations (and by your logic the number of open-source projects) has been cut in half. So _most_ of the projects (>50%) must have been discontinued?
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Well, the user base of the tools doesn't necessarily reflect the distribution of repository types among projects. There only has to be
Very true. The statistic shown is just that, a statistic, and since I don't have any others, that's all I can show. I know it's just as bad as most other statistics; the other information I have is even more vague than this silly graph.
that _most_ projects start with git or move to git. If we were to draw any conclusion from this graph it would be that _most_ projects are shutting down their repositories completely, since the number of users who have any VCS at all has decreased from >=45% to <=20% (the exact figure depends on the overlap caused by people having more than one installed), meaning that the number of VCS installations (and by your logic the number of open-source projects) has been cut in half. So _most_ of the projects (>50%) must have been discontinued?
Well, like I said, don't overweight this graph... But since you are starting to pick it apart, we might as well do it properly... What it probably means, is that the number of people installing Debian are increasingly non-developers. I.e. the growth rate of Debian installs is higher than the growth rate of the number of developers amongst them. This would be expected, not everyone has a need to use VCS systems.
The statistic shown is just that, a statistic, and since I don't have any others, that's all I can show. I know it's just as bad as most other statistics; the other information I have is even more vague than this silly graph.
Fair enough. I was hoping for something more substantial since you were able to quantify your claim, but I guess that was just a case of accidental wording. The part I have a hard time believing is that most existing projects are switching to git. Even if a lot of projects are, there are tons of open source projects. sourceforge alone contains 290808 projects, and they don't provide git repositories (only CVS and svn).
Well, like I said, don't overweight this graph... But since you are starting to pick it apart, we might as well do it properly... What it probably means, is that the number of people installing Debian are increasingly non-developers.
Yup. This is a hypothesis which the graph _does_ give support to at least. :-) Or possibly that there are now more non-developers using popularity contest. (Didn't they change so that it always asks during installation now, instead of you having to enable it manually?)
i think yes, and i think also ubuntu added it probably in april last year (which would explain the sudden drop around that time)
greetings, martin.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
accidental wording. The part I have a hard time believing is that most existing projects are switching to git. Even if a lot of projects are, there are tons of open source projects. sourceforge alone contains 290808 projects, and they don't provide git repositories (only CVS and svn).
True. What I probably mean by "most open source projects" is that, of the people I know from the Open Source community who have been around since at least 1994, the majority speaks highly of git and is moving all their projects into it or will do so in a short while.
You're completely right that in sheer numbers, that doesn't account for "most" open source projects. But it's the projects I care about, and they are the people whose opinion I value, hence the somewhat rapid and probably biased conclusion (I guess I consider them trendsetters, and the rest of the projects trendfollowers).
E.g. most GNU projects use git as their master repositories these days.
Yup. This is a hypothesis which the graph _does_ give support to at least. :-) Or possibly that there are now more non-developers using popularity contest. (Didn't they change so that it always asks during installation now, instead of you having to enable it manually?)
I think yes. But I rarely perform raw installations these days, I simply copy a similar system, and then remove/add packages to taste.
I pulled up the list of official GNU projects on Savannah (349 projects), and manually checked the first 25.
CVS: 18
Git: 1
Both CVS and Git: 3
Both CVS and Svn: 1
No repository at savannah: 2
Still looks like CVS is in majority to me.
I also checked the latest 5 created official GNU projects:
CVS: 1
Both CVS and Mercurial: 1
Both CVS and Subversion: 2
Both CVS and git: 1
A closer race, but only one for git and two for svn... :-)
Am I the only one who thinks this discussion really belongs elsewhere? Pros and cons about various VC systems as well as tutorials about their command-line flags feels very off-topic to me.
Well, we can of course make the repository switch without prior discussion. In that case I suppose we'd go ahead with the (now several years old) plan to switch to svn.
I think you can learn about git in other places and later bring facts here instead of asking all git newbie questions to the Pike developer audience.
Well, when "facts" which turn out to be unsubstantiated are brought here, it seems appropriate to scrutinize them, no?
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Well, we can of course make the repository switch without prior discussion. In that case I suppose we'd go ahead with the (now several years old) plan to switch to svn.
Switching to SVN is always better than continuing to use CVS. The point just is that Git does everything SVN does already, but adds more at no additional cost. So why settle for less?
What? Last time I checked it did not have any version controlled meta data handling at all.
Peter Bortas @ Pike developers forum wrote:
What? Last time I checked it did not have any version controlled meta data handling at all.
True. Personally I don't miss it, keeping the metadata accurate usually was more of a hassle than a feature to me (in SVN). I tended to create rules to autotag based on content or name of the file. Once you do that it is just as easy to actually use the rules on the fly, without ever storing the attributes.
Another note of interest (not necessarily my view, but interesting enough to read, I'd think) with respect to metadata is:
http://plasmasturm.org/log/487/
Another note of interest (not necessarily my view, but interesting enough to read, I'd think) with respect to metadata is:
He's got a point that it's a good concept to track content without regard to the file that contains it, but that applies only to metadata for file identity. At least content type is inherently file bound and is best expressed as a property.
Peter Bortas @ Pike developers forum wrote:
What? Last time I checked it did not have any version controlled meta data handling at all.
Well, then it must be christmas :-)...
"man gitattributes" reveals version controlled data handling (version controlled, because it is easily managed inside a .gitattributes file which can be put in every subdir(tree) you'd like).
You can actually choose to have them version controlled or not, i.e. they can go into .git/info/attributes or they can go into .gitattributes files that are placed inside the repository.
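For illustration, a minimal sketch of what such a version-controlled attributes file could look like. The helper name and the file patterns are hypothetical and are not the Pike repository's actual rules; `ident` and `crlf` are the attribute names documented in "man gitattributes" at the time of this thread (later git releases describe line-ending handling under the `text` attribute instead).

```shell
# Hypothetical helper: write a sample .gitattributes into directory $1.
# The patterns below are illustrative assumptions only.
write_sample_gitattributes() {
    cat > "$1/.gitattributes" <<'EOF'
# Expand $Id$ on checkout for C sources
*.c     ident
# Convert line endings for text files ("crlf" is the 2008-era attribute name)
*.txt   crlf
# Never touch line endings in images
*.png   -crlf
EOF
}
```

Committing the resulting file makes the rules version controlled; putting the same lines into .git/info/attributes keeps them local to one clone.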
Since the discussion is for/prior to switching repository, I actually do think it belongs in the developers forum... :p
It's a bit TLDR though, can we have an executive summary when you're done?
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
E.g. most GNU projects use git as their master repositories these days.
Are there any statistics on e.g. Savannah to support this?
Well, same thing here, in this case when I say "most", I actually mean that I was looking at the page that lists the git projects, and most of the core GNU tools I still remember from long ago were on it.
Look here to see what they moved to git:
http://git.savannah.gnu.org/gitweb/
Well, same thing here, in this case when I say "most", I actually mean that I was looking at the page that lists the git projects, and most of the core GNU tools I still remember from long ago were on it.
Ok, so you actually mean that most of the projects that use git are GNU projects?
Looking at a page that lists git projects you are of course going to see 100% git. :-)
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Well, same thing here, in this case when I say "most", I actually mean that I was looking at the page that lists the git projects, and most of the core GNU tools I still remember from long ago were on it.
Ok, so you actually mean that most of the projects that use git are GNU projects?
No :-).
Looking at a page that lists git projects you are of course going to see 100% git. :-)
Well, basically, when looking at that list, and I see that the maintainers of the following GNU tools thought git to be good:
autoconf, automake, bison, coreutils, dejagnu, diffutils, diffutils, findutils, gnugo, gsasl, guile, libtool, mailutils, mcron, procmail-lib, radius, screen, sed, smalltalk, tar, tpop3d, w3, z80asm
That means something to me.
I see that the maintainer of diffutils is extra influential on you, since you list his/her project twice. :-)
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
I see that the maintainer of diffutils is extra influential on you, since you list his/her project twice. :-)
Must be a woman then :-). Sorry, didn't notice that mistake until after sending.
more mature in what sense? if everyone uses git on the client side then the maturity of svn on the server does not buy anything because you still have to deal with git. it only adds hassle which wouldn't exist otherwise.
Same question. More mature in which way? Git is maturing at an amazing rate (codebase wise), most open source projects either start with git or move from CVS or SVN to git these days.
We're clearly getting into a hand-waving area here. I don't have any solid evidence where either git or svn might cause integration problems or breakage. I'd have to implement pelix-NG with both tools to provide that.
But it is as you say: It is maturing very fast, hence it's not mature yet. I found several new and semi-essential features just by moving from the latest Ubuntu package to the latest upstream release.
I think it's reasonable to say that a lot more is changing, and some of it at deeper levels, in git right now than in svn. That means both that there's still a need for such changes, and that the changes increase the risk for bugs.
Anyway, arguments like that are a bit FUDish. To get to an earthlier level it's probably required to look in more detail how the server setup would be in either case, which tools around the repo would be used, how much work there is to fix it, and how willing people are to do that. Personally I don't have much to contribute to that debate.
Indeed, for every CVS/SVN command, there is an equivalent git command. /.../
You can't honestly believe it's that simple. The commands are different, the output is different, many core concepts are different, there are subtle semantic differences in the kind of data you put into the commands and get back.
Which properties are you missing (git has properties, like file modes, BTW)?
The eol handling property is nice when the same file is edited from both unix and windows. Content type is also good to allow better diffing, annotation and merging. (In svn it's currently only used to tell text and binary files apart, basically. But the possibility for more content-specific plugins exist. A fully structural diff/blame/merge for xml files would be quite neat, for instance.)
In-file expansion of $Id$ stuff can be controlled with them too. Which, btw, is something I haven't found in git. Putting the commit hash inside the file would invariably change it, so that's not possible. I guess the best one could do is to expand it with a timestamp and the closest tag, but even so it'd be a useful feature.
Git gives you tools to actually fix that with a small price to pay: anyone who already synced from that branch, will have to rebase, but other than that, there is no downtime, no complicated dump-editing; it's all less-filling and easy to use.
I wouldn't call that price small, though. It's good that the possibility exists, but it should be used only in extraordinary cases.
Another detail in the comparison is that windows support for svn is infinitely better. Not that windows is a very important platform for us, but we are after all attempting to change that a bit.
Btw, git-svn choked on the pike svn repo. I let it run through the night and it had only gotten to 0.7 by the morning, and each imported commit and tagging was becoming slower and slower. Something there isn't scaling properly. Haven't dug into it very much, though.
On Tue, Jul 29, 2008 at 05:15:02PM +0000, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
But it is as you say: It is maturing very fast, hence it's not mature yet. I found several new and semi-essential features just by moving from the latest Ubuntu package to the latest upstream release.
yes, that's true. i also keep discovering new features in new releases that i don't want to live without once i know them.
just out of curiosity which features are those for you? (also what was the annoyance that caused you to get the git source to fix it?)
The eol handling property is nice when the same file is edited from both unix and windows.
i believe there is something for eol handling in git, check the hooks.
In-file expansion of $Id$ stuff can be controlled with them too.
there is an approximation for this, i believe it works by adding the id at checkout, and removing it at checkin. again, see the hooks
anyone who already synced from that branch, will have to rebase
I wouldn't call that price small, though.
smaller than the svn price.
Btw, git-svn choked on the pike svn repo.
hmm, i think i managed to get through it a year ago. did you access the repo locally? i set up a copy of the repo on my machine to do the import.
greetings, martin.
anyone who already synced from that branch, will have to rebase
I wouldn't call that price small, though.
smaller than the svn price.
If the git fix for this was to diff your local changes to a file, checkout the repos again, and apply the patch, then the price should be the same, because you can do exactly that with svn as well.
i was including the price to actually implement the change in the repo. dump, edit, reload is way more expensive than fixing a commit and rebasing.
greetings, martin.
Yes, but what was being talked about now was what people who had the repository checked out had to do. The cost to implement the change in the repo only has to be paid once, the cost to rebase needs to be paid for each checked out tree.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Yes, but what was being talked about now was what people who had the repository checked out had to do. The cost to implement the change in the repo only has to be paid once, the cost to rebase needs to be paid for each checked out tree.
Not quite.
What I said was:
Git gives you tools to actually fix that with a small price to pay: anyone who already synced from that branch, will have to rebase, but other than that, there is no downtime, no complicated dump-editing; it's all less-filling and easy to use.
In git you can fix a commit, even if you notice the mistake an hour after checking things in. You can fix it without dumping/restoring the repository. The only problem would be people that already committed new commits on top of your commit, those commits get new hashes and need to be rebased.
just out of curiosity which features are those for you?
More subcommands to git-stash so that stashes actually get useful: Earlier the only option was to clear all of them or nothing, which means that one could only use them for very temporary things. Now I've got 12 stashes with assorted small hacks in them. For small things I think they're better than branches because you can give them longer descriptions and they don't clutter up the view in gitk.
There were some other odds and ends too which I don't quite remember.
(also what was the annoyance that caused you to get the git source to fix it?)
I haven't actually fixed anything yet. I pulled down the source to make me newer ubuntu packages. I took a look at fixing so that the pager isn't used for short output though; it's annoying that git status has started to use the pager too all the time.
Overall I mostly miss more configurability: Ways to set default arguments to various commands, better control over the blame line format. I also miss a counterpart to the -l option to cvs diff.
i believe there is something for eol handling in git, check the hooks.
They don't help for associating specific behaviors with specific files.
there is an approximation for this, i believe it works by adding the id at checkout, and removing it at checkin. again, see the hooks
Ok. That'd be nice to have by default. Would require properties or something like them, though.
Btw, git-svn choked on the pike svn repo.
hmm, i think i managed to get through it a year ago. did you access the repo locally? i set up a copy of the repo on my machine to do the import.
No, over the net. Do you think the network traffic increases with each imported commit? That'd be even worse. My theory is that git-svn doesn't use a good indexed storage to map between hashes and svn revisions, or something like that.
On Tue, Jul 29, 2008 at 05:50:02PM +0000, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
More subcommands to git-stash so that stashes actually get useful: Earlier the only option was to clear all of them or nothing, which means that one could only use them for very temporary things. Now I've got 12 stashes with assorted small hacks in them. For small things I think they're better than branches because you can give them longer descriptions and they don't clutter up the view in gitk.
hmm, i actually think they clutter up the gitk view more than a branch, but then, gitk only shows the latest stash, not all of them.
I took a look at fixing so that the pager isn't used for short output though; it's annoying that git status has started to use the pager too all the time.
oh, doesn't happen for me.
No, over the net. Do you think the network traffic increases with each imported commit?
no, come to think of it, i probably had to restart git-svn a few times. i did here at work where importing a repo over the (local) network took 10 days.
greetings, martin.
hmm, i actually think they clutter up the gitk view more than a branch, but then, gitk only shows the latest stash, not all of them.
The problem is that all those "bogus" branches are shown for every commit back to the beginning - it gets difficult to see which "real" branches a commit far back is on.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
hmm, i actually think they clutter up the gitk view more than a branch, but then, gitk only shows the latest stash, not all of them.
The problem is that all those "bogus" branches are shown for every commit back to the beginning - it gets difficult to see which "real" branches a commit far back is on.
gitk 0.5 0.6 7.0 7.2 7.4 7.6 7.7
should solve that; just name the branches you want to see. Another thing is, branches can be placed in a subdirectory; e.g. mainbranches/7.0 is a valid branchname. It allows you to group main branches or tempbranches or both.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
make me newer ubuntu packages. I took a look at fixing so that the pager isn't used for short output though; it's annoying that git status has started to use the pager too all the time.
Try: LESS=-inqMSFXx4 in your environment (or a suitable subset), it solves that problem.
Overall I mostly miss more configurability: Ways to set default arguments to various commands,
Check out "man git-config" (if you haven't already), and "man gitattributes".
better control over the blame line format. I also miss a counterpart to the -l option to cvs diff.
git diff .
(note the trailing dot) will limit the diff to the current directory and subdirs. It's not quite the same as -l because it *does* recurse.
i believe there is something for eol handling in git, check the hooks.
They don't help for associating specific behaviors with specific files.
"man gitattributes", crlf attribute.
there is an approximation for this, i believe it works by adding the id at checkout, and removing it at checkin. again, see the hooks
Ok. That'd be nice to have by default. Would require properties or something like them, though.
"man gitattributes", ident attribute.
Btw, git-svn choked on the pike svn repo.
hmm, i think i managed to get through it a year ago. did you access the repo locally? i set up a copy of the repo on my machine to do the import.
No, over the net. Do you think the network traffic increases with each imported commit? That'd be even worse. My theory is that git-svn doesn't use a good indexed storage to map between hashes and svn revisions, or something like that.
git-svn keeps a lot of state in memory, maybe even has memory leaks, don't know for sure. What helps, every time, is simply killing and restarting git-svn. Git-svn is *very* good at restarting and picking up right where it left off.
Try: LESS=-inqMSFXx4 in your environment (or a suitable subset), it solves that problem.
More precisely, -F is the relevant flag. Unfortunately it doesn't work with -c. I've bug reported that. Still I think I'd prefer to configure git status to not use the pager at all.
It's not quite the same as -l because it *does* recurse.
Yep, that's the problem.
"man gitattributes", crlf attribute.
Thanks for the tip about attributes. The svn solution for it, where the properties are directly associated with the files themselves, is still more elegant imo. Anyway, .gitattributes files are not any worse than things like .cvsignore. It's workable.
"man gitattributes", ident attribute.
Ok, good. I think this should be turned on by default in the git repo so that all those $Id$ get back in business again.
git-svn keeps a lot of state in memory, maybe even has memory leaks, don't know for sure. What helps, every time, is simply killing and restarting git-svn. Git-svn is *very* good at restarting and picking up right where it left off.
Tried that, didn't help the speed at all. It also gc'd by itself every once in a while, and doing an extra gc didn't improve things either.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
More precisely, -F is the relevant flag. Unfortunately it doesn't work with -c. I've bug reported that. Still I think I'd prefer to configure git status to not use the pager at all.
There were elaborate discussions about this on the git mailing list about three months ago, I'll try and look up what the consensus was.
git-svn keeps a lot of state in memory, maybe even has memory leaks, don't know for sure. What helps, every time, is simply killing and restarting git-svn. Git-svn is *very* good at restarting and picking up right where it left off.
Tried that, didn't help the speed at all. It also gc'd by itself every once in a while, and doing an extra gc didn't improve things either.
Well, I do know that I always ran it on a local SVN repo (when doing huge imports). So for Pike, either copy the repo locally using the dumpfile, or use SVK to mirror the SVN repo, then run git-svn.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Try: LESS=-inqMSFXx4 in your environment (or a suitable subset), it solves that problem.
More precisely, -F is the relevant flag. Unfortunately it doesn't work with -c. I've bug reported that. Still I think I'd prefer to configure git status to not use the pager at all.
From the 1.6.0 relnotes:
* pager.<cmd> configuration variable can be used to enable/disable the default paging behaviour per command.
I.e. if you track the git.git sourcetree, simply install from master, the relevant patch went in on July 3rd 2008.
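As a rough illustration of the pager.<cmd> variable quoted from the relnotes, the sketch below turns the pager off for git status only. It assumes git 1.6.0 or later; the scratch repository is hypothetical and only there so that no existing configuration is touched.

```shell
#!/bin/sh
# Sketch: disable the pager for "git status" only, via pager.<cmd>.
# A throwaway repository keeps the setting away from real configuration.
repo=$(mktemp -d)
(
  cd "$repo" || exit 1
  git init -q
  # pager.status=false: "git status" writes directly to the terminal,
  # while other commands keep their default paging behaviour.
  git config pager.status false
)

# Read the setting back from the scratch repository.
pager_status=$(cd "$repo" && git config --get pager.status)
echo "pager.status = $pager_status"
```

To apply the same setting to every repository of one user, git config --global pager.status false should do it.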
It's not quite the same as -l because it *does* recurse.
Yep, that's the problem.
In /etc/gitconfig:
[alias]
	diffl = !git-diffl
And a small script called git-diffl in your path:
#!/bin/sh
exec git diff "$@" -- $( git ls-files --exclude-standard . | fgrep -v / )
Should more or less take care of the problem.
"man gitattributes", ident attribute.
Ok, good. I think this should be turned on by default in the git repo so that all those $Id$ get back in business again.
It clutters the diff output, IMHO. I.e. the diff heading already mentions the two hashes the diff is between. But it's a matter of taste, of course; I'll play around with it a bit and see what is workable and/or desirable.
Tried that, didn't help the speed at all. It also gc'd by itself every once in a while, and doing an extra gc didn't improve things either.
Well, the gc is just the git gc for the repo, that will hardly help check-in speed. Getting the SVN repo locally probably is the only thing that works.
And a small script called git-diffl in your path:
/.../
Thanks for the effort, but that was a tad too kludgy for me (doesn't work with multiple dirs, for instance). I hope they'll add an option to git-diff eventually.
It clutters the diff output, IMHO. I.e. the diff heading already mentions the two hashes the diff is between. But it's a matter of taste, of course; I'll play around with it a bit and see what is workable and/or desirable.
Would be best if git-diff didn't expand when diffing (considering that the expansion is done after checkout I reckon that's what one would get anyway, unless they actively do the extra work to expand $Id$ before diffing too).
The usefulness is in bug reports etc, so working expansions are most important in dists. Can occasionally be useful in bug reports between developers as well I guess, but of course anyone is free to disable expansion in one's own trees. Still think it should be the default, though.
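For reference, a minimal sketch of what the ident attribute does. Everything here is a hypothetical demo (scratch repository, file name demo.c, placeholder identity), not the Pike repository setup. Per the gitattributes documentation, the value expanded into $Id$ is the blob object name of the file, not the commit hash, and the expansion is applied when the file is checked out.

```shell
#!/bin/sh
# Sketch: enable $Id$ expansion for C files with the ident attribute.
repo=$(mktemp -d)
(
  cd "$repo" || exit 1
  git init -q
  # Placeholder identity so the commit works on an unconfigured machine.
  git config user.name "Example"
  git config user.email "example@example.invalid"

  # Turn on ident for all .c files in this repository.
  echo '*.c ident' > .gitattributes
  printf '/* $Id$ */\nint main(void) { return 0; }\n' > demo.c
  git add .gitattributes demo.c
  git commit -q -m "Add demo.c with an Id keyword"

  # Expansion happens on checkout, so force the file to be written again.
  rm demo.c
  git checkout -- demo.c
)

# First line of the freshly checked-out file, now carrying the blob name.
id_line=$(head -n 1 "$repo/demo.c")
echo "$id_line"
```

Because the blob name depends only on the file contents, the expanded keyword does not change from commit to commit unless the file itself changes.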
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
And a small script called git-diffl in your path:
/.../
Thanks for the effort, but that was a tad too kludgy for me (doesn't work with multiple dirs, for instance). I hope they'll add an option to git-diff eventually.
Well, Linus himself thinks it's a five line patch or something. So if you care to add it, it'll probably be accepted in mainstream (I can assist in getting it in the main distribution).
Would be best if git-diff didn't expand when diffing (considering that the expansion is done after checkout I reckon that's what one would get anyway, unless they actively do the extra work to expand $Id$ before diffing too).
Probably, yes.
The usefulness is in bug reports etc, so working expansions are most important in dists. Can occasionally be useful in bug reports between developers as well I guess, but of course anyone is free to disable expansion in one's own trees. Still think it should be the default, though.
You mean it's most relevant in binaries when people reveal the revision of a tool or module using a version command?
This would not include the practice of the $Id$ in the comment header of a file, or would it?
You mean it's most relevant in binaries when people reveal the revision of a tool or module using a version command?
Well, perhaps that too. Main use is that it can be included in backtraces, as we do in Roxen. Would have to shorten the hashes, of course. (Gonna miss cvs' short revision numbers. Oh well, you can't have it all..)
In this case it'll work when they are in comments too.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Same question. More mature in which way? Git is maturing at an amazing rate (codebase wise), most open source projects either start with git or move from CVS or SVN to git these days.
But it is as you say: It is maturing very fast, hence it's not mature yet. I found several new and semi-essential features just by moving from the latest Ubuntu package to the latest upstream release.
Please keep in mind that many, if not all, of those initially missing features are things which never existed before in any VCS. So it's not really fair to compare that to e.g. SVN where some of those features are simply not viable.
I think it's reasonable to say that a lot more is changing, and some of it at deeper levels, in git right now than in svn. That means both that there's still a need for such changes, and that the changes increase the risk for bugs.
In theory, you're right. In practice, however, I've seen corrupted SVN repositories more than once (myself and on the net at large), with both FSFS and BDB backends, simply spontaneously (because you killed an SVN update at an unfortunate time, or ran two updating commands simultaneously and the locking got mixed up); and also because I tried to fix history by editing the SVN-dump files, some changes in there are highly non-trivial and rather error-prone.
Contrary to that, the git core repository format layout hasn't changed since the start, is very simple and does not break even when updated concurrently (it was designed with robustness in mind). Even if it should break due to disk-sectors going south, most of the repository is still retrievable (there have been documented cases, the standard git tools usually recover it quite nicely). So the development in git concentrates mostly on the user-interface, *not* on the repository layout. I.e. the number of bugs you're going to be encountering has zero effect on your already committed history.
Even stronger: to my knowledge, released git versions have never (at least not in the past year) resulted in corrupted repositories due to bugs. Development versions have in some cases resulted in corruption; however, that corruption was always in the repacking code, not in the base repository, and was therefore always 100% recoverable.
Indeed, for every CVS/SVN command, there is an equivalent git command. /.../
You can't honestly believe it's that simple. The commands are different, the output is different, many core concepts are different, there are subtle semantic differences in the kind of data you put into the commands and get back.
Git can be made to emulate SVN/CVS, the other way around is close to impossible. Let's just say that the amount of effort required to make git behave similar enough to CVS/SVN is not easily determined.
Which properties are you missing (git has properties, like file modes, BTW)?
The eol handling property is nice when the same file is edited from both unix and windows. Content type is also good to allow better diffing, annotation and merging.
"man gitattributes" addresses most of these issues.
(In svn it's currently only used to tell text and binary files apart, basically. But the possibility for more content-specific plugins exist. A fully structural diff/blame/merge for xml files would be quite neat, for instance.)
"man gitattributes" allows you to specify custom merge strategies for xml files, for example.
In-file expansion of $Id$ stuff can be controlled with them too. Which, btw, is something I haven't found in git. Putting the commit hash inside the file would invariably change it, so that's not possible. I guess the best one could do is to expand it with a timestamp and the closest tag, but even so it'd be a useful feature.
"man gitattributes" solves that too, ident attribute.
Git gives you tools to actually fix that with a small price to pay: anyone who already synced from that branch, will have to rebase, but other than that, there is no downtime, no complicated dump-editing; it's all less-filling and easy to use.
I wouldn't call that price small, though. It's good that the possibility exists, but it should be used only in extraordinary cases.
That price is small in comparison to the effort needed in SVN (editing and reimporting every time); in git you can fix immediately if the fix is in the vicinity of the tip of a branch; fixing things deeper down can be checked in immediately in git and be verified/accumulated as you go along, and then at an appropriate time the whole repository could be rebased (but the latter, I agree, should be rare if ever; in SVN though, it's not really workable).
Another detail in the comparison is that windows support for svn is infinitely better. Not that windows is a very important platform for us, but we are after all attempting to change that a bit.
That statement would have been true in December 2007. Windows support for git has come a long way since then. I'd say "infinitely better" is way too strong these days, but I admit that SVN still has a slight edge there; git is closing that gap faster than you'd think though.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Same question. More mature in which way? Git is maturing at an amazing rate (codebase wise), most open source projects either start with git or move from CVS or SVN to git these days.
But it is as you say: It is maturing very fast, hence it's not mature yet. I found several new and semi-essential features just by moving from the latest Ubuntu package to the latest upstream release.
Please keep in mind that many, if not all, of those initially missing features are things which never existed before in any VCS. So it's not really fair to compare that to e.g. SVN where some of those features are simply not viable.
Maturity does not have a lot to do with number of features. It has a lot to do with having those features for a long time to the point where no new features are added and stability has been achieved. CVS is mature and stable. SVN is mature and fairly stable. git is not mature and not stable.
Peter Bortas @ Pike developers forum wrote:
Please keep in mind that many, if not all, of those initially missing features are things which never existed before in any VCS. So it's not really fair to compare that to e.g. SVN where some of those features are simply not viable.
Maturity does not have a lot to do with number of features. It has a lot to do with having those features for a long time to the point where no new features are added and stability has been achieved. CVS is mature and stable. SVN is mature and fairly stable. git is not mature and not stable.
I agree. But, as explained in another post, the parts of git that deal with actually taking source-tree snapshots and storing them in the repository are already mature and stable. The rest (the user interface) is not (the higher level you go, the more unstable it is).
In the case of svn, it is possible to export the repository as a giant textfile, where you can tweak individual commits, and then read it back. You need to take the repository offline while you do this, though.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
In the case of svn, it is possible to export the repository as a giant textfile, where you can tweak individual commits, and then read it back. You need to take the repository offline while you do this, though.
Been there, done that, even got the T-shirt. Editing history like that is an excruciatingly painful experience though. The downsides are:
- The svn-dump is always complete, and never partial.
- The svn-dump is huge (for a repository like Pike).
- In order to check if changes are good, it needs to be read into svn again (both dumping and reading take very long).
- Mistakes in editing the dump are easy to make, especially when moving files around in history.
The net result is, that after trying that *real hard* for some time, I gave up in disgust, then didn't touch the project until a year later, when I latched onto git.
Does the current export at srb's server correctly represent the 7.4/7.5/7.6/etc branches? I haven't been able to go back to e.g. the 7.6/7.7 split. Thought it would be possible using gitk.
I guess a good start for a git export would be your svn export where you've sorted all that out. Just using git instead wouldn't magically make all those issues disappear, would it?
I think that, depending on how the git import script works, some issues might actually disappear "magically". But there are others that won't. For example the fact that split points are not associated with a CVS commit, so you have to fake one (something that the current git import apparently doesn't).
Importing from the svn export should work, I expect it's mainly a question of defining how repository paths should be mapped to git branches. (Are git branch names version controlled? In the svn export the main development branch is renamed 0.5->0.6->0.7->7.0->7.1->7.3->7.5->7.7 at appropriate times.)
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
I think that, depending on how the git import script works, some issues might actually disappear "magically". But there are others that won't. For example the fact that split points are not associated with a CVS commit, so you have to fake one (something that the current git import apparently doesn't).
I have specifically and manually removed the fake split commits after importing from SVN, because they didn't add information AFAICS. Why would you want them in the repository?
Importing from the svn export should work, I expect it's mainly a question of defining how repository paths should be mapped to git branches. (Are git branch names version controlled? In the svn export the main development branch is renamed 0.5->0.6->0.7->7.0->7.1->7.3->7.5->7.7 at appropriate times.)
Git branches can be demoted to tags at the flip of a switch at anytime, and that probably is what we should be doing. I.e. in git, after the 7.8/7.9 split, I'll be changing the 7.7 branch to become a tag, and then copy that tag into 2 branches, 7.8 and 7.9. Whereas "master" is then pointing to 7.9, but that is just convention in git, and doesn't have a lot of realworld implications.
I have specifically and manually removed the fake split commits after importing from SVN, because they didn't add information AFAICS. Why would you want them in the repository?
They add the information about when the changes to the repository done at the split were made. In the current git repository, those changes seem to be erroneously reported as part of a completely unrelated commit several days later. (The exact date isn't all that important, but I don't want totally unrelated changes grouped together as a single commit.)
Git branches can be demoted to tags at the flip of a switch at anytime, and that probably is what we should be doing. I.e. in git, after the 7.8/7.9 split, I'll be changing the 7.7 branch to become a tag, and then copy that tag into 2 branches, 7.8 and 7.9. Whereas "master" is then pointing to 7.9, but that is just convention in git, and doesn't have a lot of realworld implications.
Well, that makes sense I guess. Btw, what is the difference (implied by the convention) between "master" and "HEAD"?
On Mon, Jul 28, 2008 at 03:20:04PM +0000, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Well, that makes sense I guess. Btw, what is the difference (implied by the convention) between "master" and "HEAD"?
master is just a default branch name and completely irrelevant if you don't want to use it (i actually think stephen should remove the master branch from the git repo, since it doesn't make much sense for us)
HEAD is name for the current active, checked out commit. it is not a branch. if you switch branches, then your HEAD changes. (see git reflog for where HEAD has been) there should not actually be a branch with that name, i wonder what happened here. stephen?
greetings, martin.
Martin Bähr wrote:
HEAD is name for the current active, checked out commit. it is not a branch. if you switch branches, then your HEAD changes. (see git reflog for where HEAD has been) there should not actually be a branch with that name, i wonder what happened here. stephen?
I think that this is a cosmetic wart in the git cloning code. Actually the repository you're cloning from has a checked out source tree here. A real official repository should probably be "bare" (git-speak), and not have a checked out source tree.
For all intents and purposes this works the same as cloning from a bare repository, you should just get a "fake" branch called HEAD, along with the branches BuGless and debian-be which are my personal development branches; just ignore the "extra" branches, and you'll be fine.
With respect to the current "master" branch... It's debatable, I left it in as a convenience for someone unfamiliar with the repository layout/branches, this way he'll get the bleeding-edge development branch by default, if he clones the repository. "master" is an alias for 7.7 at the moment, that'll change to 7.9 after the split.
On Mon, Jul 28, 2008 at 03:00:03PM +0000, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
For example the fact that split points are not associated with a CVS commit, so you have to fake one (something that the current git import apparently doesn't).
because git doesn't need it. only svn makes a commit for the split. in git the split is represented by the fact that a parent commit has two children. the parent is before the split, and the two children are after. what extra information would a split commit give you? the exact time when you decided to make the split? how is that relevant?
for git, a split does not become relevant until you actually make a commit to the new branch, as such the split time is the time when the second child is created, until then the history is linear, and no split actually occurred before that point, regardless of when you made the decision to split.
(Are git branch names version controlled? In the svn export the main development branch is renamed 0.5->0.6->0.7->7.0->7.1->7.3->7.5->7.7 at appropriate times.)
git branches are simple references. just like symlinks, that point to the head of the branch. there is a reflog for each branch (if activated) which shows where a branch-head has been pointing to at previous times. but that information is purely local and won't be reproduced when someone clones the repo.
so when you get the repo you do not get any branch name history, only the current state. there is also nothing that tells you which branch a given commit used to be part of, there are only ways to show which branch a commit is part of at this moment (git branch --contains <commit>)
greetings, martin.
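To make the "one parent, two children" picture concrete, here is a small sketch in a scratch repository. The branch names 7.8 and 7.9, the file name and the identity are placeholders chosen for illustration; nothing here touches the real Pike repository. It uses git merge-base to find the commit where the two branches diverge, and git branch --contains to list the branches that include it.

```shell
#!/bin/sh
# Sketch: a "split" is simply one commit that ends up with two children.
repo=$(mktemp -d)
(
  cd "$repo" || exit 1
  git init -q
  git config user.name "Example"
  git config user.email "example@example.invalid"

  # The last commit shared by both branches.
  echo base > file
  git add file
  git commit -q -m "last commit before the split"

  # Creating branches only adds references; no commit is made here.
  git branch 7.8
  git branch 7.9

  # The split becomes visible once each branch gets a commit of its own.
  git checkout -q 7.8
  echo a >> file
  git commit -q -a -m "first 7.8-only commit"
  git checkout -q 7.9
  echo b >> file
  git commit -q -a -m "first 7.9-only commit"
)

# The split point is the common ancestor of both branch heads.
split=$(cd "$repo" && git merge-base 7.8 7.9)
# Parent of the 7.8 head, i.e. the shared commit made above.
base=$(cd "$repo" && git rev-parse '7.8^')
# Every branch whose history includes the split point.
contains=$(cd "$repo" && git branch --contains "$split")

echo "split point: $split"
echo "$contains"
```

No separate "split commit" is stored anywhere: the shared commit is found purely from the parent links of the two branch heads.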
git branch -a --contains origin/HEAD
shows which branches origin/HEAD is contained in.
that means those branches are the same or newer than origin/HEAD. from there you can investigate further.
i usually just look at gitk to see the branch relationship.
greetings, martin.
Is it possible to generate some kind of tree without resorting to GTK applications?
gitk is tcl/tk
you can get an ascii tree using git log --graph, everything else would involve graphics of some kind.
greetings, martin.
On Mon, Jul 28, 2008 at 03:25:03PM +0000, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
you can get an ascii tree using git log --graph,
I meant of the relationships between all the branches. Wouldn't that be "git branch" rather than "git log"?
sorry, you of course also need to specify here that you want to see all branches:
git log --abbrev-commit --pretty=oneline --graph --date-order --all
is the most convenient way i came up with.
Either way, neither log nor branch seemed to recognize the "--graph" option.
then your version of git is too old. --graph is a very new addition, i believe in 1.5.6 (my version is 1.5.6.1)
greetings, martin.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
then your version of git is too old. --graph is a very new addition, i believe in 1.5.6 (my version is 1.5.6.1)
1.5.4.3. I have an up-to-date Ubuntu Hardy Heron.
Try adding backports. I'm not quite sure where the latest version of git is in ubuntu. In debian it's in testing.
Or, use: git clone git://git.kernel.org/pub/scm/git/git.git and make install it from the master branch.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Try adding backports. I'm not quite sure where the latest version of git is in ubuntu.
I have backports already. And intrepid has the same version as hardy.
Then either find a more current ubuntu repository for git, wait or compile from source (it's relatively painless).
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
git clone git://git.cuci.nl/pike
Ok, that seems to give me 7.7.
It actually gives you more, you just don't see it yet.
How do I get for example nt-tools or 7.0?
git branch
git branch -r
shows you the landscape you have access to.
Either access the branches directly with e.g.:
git log origin/7.7
git checkout origin/7.7
or preferably create local versions of them:
git branch 7.7 origin/7.7 # create local branch 7.7 from the remote branch
git fetch origin 7.7:7.7 # sync local branch with remote repository

git branch 7.0 origin/7.0
git fetch origin 7.0:7.0

git branch nt-tools origin/nt-tools
git fetch origin nt-tools:nt-tools
How do I get bin/rsqld.pike from 2004-04-24?
For starters try: git blame lib/modules/Tools.pmod/Standalone.pmod/rsqld.pike
You'll notice commit hash-ids in the leftmost column and original filenames in which those lines were contained back then.
E.g. to show the file at a particular point in time in the past try:
git show 559221f8:bin/rsqld.pike
How do I use log to find that src/modules/_math was originally called src/modules/math?
Try: git blame src/modules/_math/math.c
How do I use log to find that src/modules/_math was originally called src/modules/math?
Try: git blame src/modules/_math/math.c
Well, ok, I guess I underspecified the question. Yes, you can see from this output that the file has at some point had this name (but you can't know that it's the _original_ name, it might have had another name before that). What I meant was, how can I find the actual rename operation, which should contain the time of the rename, the old pathname and the new pathname.
(The blame trick would not have worked if all the lines had (eventually) changed after the rename.)
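One possible way to get at the rename operation itself, not taken from the thread, is git log with --follow and --name-status, which in reasonably recent git versions lists renames as R lines together with the commit and its date. The sketch below uses a scratch repository with made-up paths (math/math.c renamed to _math/math.c) and a placeholder identity. Keep in mind that git does not record renames; it detects them from content similarity, so a file that was heavily rewritten in the same commit may not be reported as a rename.

```shell
#!/bin/sh
# Sketch: locate the commit that renamed a file, with old and new path.
repo=$(mktemp -d)
(
  cd "$repo" || exit 1
  git init -q
  git config user.name "Example"
  git config user.email "example@example.invalid"

  # Original location of the file.
  mkdir math
  printf 'int add(int a, int b) { return a + b; }\n' > math/math.c
  git add math/math.c
  git commit -q -m "Add math module"

  # Rename the directory, as in the math -> _math case discussed above.
  git mv math _math
  git commit -q -m "Rename math to _math"
)

# --follow keeps tracking the file across renames (single path only);
# --name-status prints "R<score> <old path> <new path>" for the rename.
renames=$(cd "$repo" && git log --follow --name-status \
            --format='%h %ad %s' --date=short -- _math/math.c)
echo "$renames"
```

Since --follow only accepts a single path, a whole-tree search could instead use something like git log -M --name-status --diff-filter=R, which lists every commit in which a rename was detected.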
Yes, it sounds like a very good idea to get rid of a deliberate sleep in peek(). But it must have been put there for a reason - some digging(*) reveals that it was introduced long ago:
revision 1.109 date: 1998/07/10 18:58:55; author: grubba; state: Exp; lines: +5 -5 Made file_peek() somewhat more paranoid.
As far as I remember it was to avoid bugs in some early implementations of poll(2) and select(2). Note that at the time the patch was done we targeted supporting OSes like Dynix and ULTRIX.
Not the most illuminating message. Maybe Grubba can recall something more about this? Otherwise I suggest we remove the timeout in the next dev branch to let it brew there for a while. It was probably a kludge to work around a bug in some old poll implementation on a strange OS that's no longer used anyway.
Most likely.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
*) Tried out git for this. git diff and blame are fast and work nicely, but is there some gui tool to do this kind of thing even more conveniently? I tried to use "git-gui blame" but couldn't make it go past the latest change.
Personally I have used gitk a lot, but not specifically for blame/annotation traversal, maybe it supports it though. Sorry, can't offer more specific pointers here currently.
pike-devel@lists.lysator.liu.se