I'm trying to fully understand the history as presented by git. One of the things I can't work out is this:
Take a look at src/pike_memory.c on the 7.8 branch in gitk. For the last few commits it says 7.7 and 7.8 branches only, which is what I expect. But at the commit "Added DMALLOC_USE_HASHBASE mode" it suddenly claims 7.4 and 7.6 too, and that that commit precedes both tags v7.4.512 and v7.7.40. Why is that? I can't see this commit or any descendant on the 7.4 branch.
All commits after that one are also claimed to be on essentially all branches. This is something I see in general all over the repository. If the history of several branches is viewed at once then this makes it difficult to see on which branch(es) a commit is really made.
"Added DMALLOC_USE_HASHBASE mode" is fe982070cac0283b79a3a4a3a54b7865537acab7.
Also regarding src/pike_memory.c, its history starts with "memory.{c,h} renamed to pike_memory.{c,h}" (4e86f944a018c5397e3c55693b0637f63987e7d7). Regardless of -M, -C and --find-copies-harder flags, I can't see the history past that point.
On Sat, Sep 06, 2008 at 12:40:02PM +0000, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
"Added DMALLOC_USE_HASHBASE mode" is fe982070cac0283b79a3a4a3a54b7865537acab7.
ah, i think i see the problem. it looks like the backports somehow get indicated as whole merges.
not sure what stephen has done here, i thought that backports were added as grafts which is extra data not part of the commits themselves.
take a look at 3ddc88c0499fc084d5ddc3a38cb2620e3cfb4a63 and you see the merge lines that cause the confusion.
stephen, can you elaborate on this?
greetings, martin.
Martin Baehr wrote:
On Sat, Sep 06, 2008 at 12:40:02PM +0000, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
"Added DMALLOC_USE_HASHBASE mode" is fe982070cac0283b79a3a4a3a54b7865537acab7.
ah, i think i see the problem. it looks like the backports somehow get indicated as whole merges.
I know the problem. It's a flaw in the way gitk determines the (main) branch a certain commit is on. I.e. the graphical display is correct, the notification of which tags lie ahead and in the past is confused by the backport links.
The tree is essentially correct, I have a patch to gitk in the works to actually cleanup this behaviour and show the correct values.
As for now, until the fully backport-connected history is finished and the patch to gitk is in, you can reduce the clutter in the gitk view by using: gitk --first-parent 7.8 7.6 7.4 7.2 7.0 0.6 0.5
As for now, until the fully backport-connected history is finished and the patch to gitk is in, you can reduce the clutter in the gitk view by using: gitk --first-parent 7.8 7.6 7.4 7.2 7.0 0.6 0.5
That doesn't make any difference for my little test case src/pike_memory.c. I use git 1.5.4.3.
Like embee, I'm curious about those backport grafts that appear as merges. Is this only a problem of representation in gitk and git-log? To me it seems odd to describe them as merges since I reckon that content would be identical after a merge.
Also regarding src/pike_memory.c, its history starts with "memory.{c,h} renamed to pike_memory.{c,h}"
This was also the case in the Subversion repository, but I have fixed that now. It now ends with the "Initial revision" checkin of Pike/0.5 (r146). It doesn't carry over into the ulpc source though. Should I fix that?
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
As for now, until the fully backport-connected history is finished and the patch to gitk is in, you can reduce the clutter in the gitk view by using: gitk --first-parent 7.8 7.6 7.4 7.2 7.0 0.6 0.5
That doesn't make any difference for my little test case src/pike_memory.c. I use git 1.5.4.3.
Well, it should, because it removes all the backport info, but then again maybe not all commands you are using are honouring the --first-parent flag. However, instead of pursuing this...
Like embee, I'm curious about those backport grafts that appear as merges. Is this only a problem of representation in gitk and git-log? To me it seems odd to describe them as merges since I reckon that content would be identical after a merge.
I did some backchecks on the git mailinglist now, and it appears that I was a bit overly optimistic in my assumption that git is able to distinguish between backports and merges based on parenthood alone.
The recommended practice is that for back/forwardports the commit id's of the originating cherry-picked patches are mentioned at the bottom of the new commit message.
Gitweb already automatically transforms them into clickable links, gitk probably still needs patches to make them both clickable as well as draw a visual dotted line in the treegraph to show the relation.
This means that I'll have to regenerate the backport/forwardport way of linking to modify the commit messages instead of the graft/parent list.
Ok good, that should make the history more straight and simple.
Still, it would be nice if git had some builtin concept of cherry-picked patches between branches, so it'd be possible to write e.g. "git log 7.6..7.8" to see all commits that really are in 7.8 only.
actually, there is a concept that could help: in addition to the normal commit hash there is also an object hash that is only made out of the diff and maybe the comment (not sure). now all that is needed is tools to find multiple occurrences of these and connect them.
that search could be expensive though, so any commits found would need to be recorded somewhere.
greetings, martin.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Ok good, that should make the history more straight and simple.
Still, it would be nice if git had some builtin concept of cherry-picked patches between branches, so it'd be possible to write e.g. "git log 7.6..7.8" to see all commits that really are in 7.8 only.
Well, since I fully agree with you, as we speak, I'm trying to convince the git community to allow me to add that builtin.
A problem is that the diff might not be byte-for-byte equivalent (considering it presumably is stored with offsets and/or context).
There would only be a "semantic" equivalence, which might be abused: People could be tempted to connect wildly different patches in the belief that they have the "same effect", which is probably not a good thing to do. But that's rather a question of policy and judgement.
Like embee, I'm curious about those backport grafts that appear as merges. Is this only a problem of representation in gitk and git-log? To me it seems odd to describe them as merges since I reckon that content would be identical after a merge.
I did some backchecks on the git mailinglist now, and it appears that I was a bit overly optimistic in my assumption that git is able to distinguish between backports and merges based on parenthood alone.
The recommended practice is that for back/forwardports the commit id's of the originating cherry-picked patches are mentioned at the bottom of the new commit message.
Unfortunately that won't help (if we) in the future use git for backports, since the new backports will get the same graphs once again.
I believe the problem lies in how gitk determines what branch(es) a commit is on rather than on our use of grafts for backports.
This means that I'll have to regenerate the backport/forwardport way of linking to modify the commit messages instead of the graft/parent list. -- Sincerely, Stephen R. van den Berg.
true, but if the backport was done with a git cherry-pick they would have at least the same author and timestamp. (ok, at that point one might as well explicitly mark the cherry pick...)
greetings, martin.
Unfortunately that won't help (if we) in the future use git for backports, since the new backports will get the same graphs once again.
I don't understand this. What graphs?
I believe the problem lies in how gitk determines what branch(es) a commit is on rather than on our use of grafts for backports.
Correct me if I'm wrong, but a graft is a way to artificially add a parent/child relation, isn't it?
If a commit has two (or more) parents, isn't that per definition a merge of them, which should cause a join of their respective branches? Cherry-picked patches are rather different since only the patch itself is taken from the other branch, not the whole content of that branch. If parent pointers are used for both of these, then how would git tell a merge from a cherry-pick?
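To make the question concrete, here is a minimal sketch in a throwaway repository (the branch names "old" and "new" are hypothetical, and this is not the actual Pike conversion). It records a backport as a commit with an extra parent, the way a graft would, and then asks git which branches contain the original fix. Git only stores parent pointers, so in terms of reachability the result looks exactly like a merge:

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
git symbolic-ref HEAD refs/heads/new   # hypothetical branch name

echo base > file.txt
git add file.txt
git commit -q -m "Initial revision"
git branch old                         # hypothetical maintenance branch

echo fix >> file.txt
git commit -q -a -m "Fix on new"
fix=$(git rev-parse HEAD)

# Apply the same change on "old", but record the original fix as a second parent.
git checkout -q old
echo fix >> file.txt
git add file.txt
tree=$(git write-tree)
backport=$(echo "Fix backported to old" | git commit-tree "$tree" -p HEAD -p "$fix")
git reset -q --hard "$backport"

# The original fix is now reachable from both branch tips.
parents=$(git log -1 --pretty=%P | wc -w | tr -d ' ')
branches=$(echo $(git branch --contains "$fix" | tr -d '* '))
echo "parents of backport: $parents"
echo "branches containing the fix: $branches"
```

With the extra parent in place, every ancestor of the fix on "new" is also reported as being on "old", which matches what gitk shows for the grafted history.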
Henrik Grubbström (Lysator) @ Pike (-) developers forum wrote:
The recommended practice is that for back/forwardports the commit id's of the originating cherry-picked patches are mentioned at the bottom of the new commit message.
Unfortunately that won't help (if we) in the future use git for backports, since the new backports will get the same graphs once again.
No. The backports in Pike would be done in git using "git cherry-pick". And as-is "git cherry-pick" does not create a parent reference, but gives you the option of a textual reference in the free-form commit message. This means that the current practice of creating extra parents for the backports has to be altered into either:
- Textual references at the bottom of the commit message.
- Or usage of a new (native git) link which I'm trying to get included into git as we speak.
I believe the problem lies in how gitk determines what branch(es) a commit is on rather than on our use of grafts for backports.
That's what I thought first as well, and it might even be true, but it's not what the git community wants to fix. It's either the first or the second option above, and I'm pushing for the second at the moment.
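As an illustration of the first option, a minimal sketch in a throwaway repository (the branch names "devel" and "stable" are hypothetical): "git cherry-pick -x" creates an ordinary single-parent commit and appends a "(cherry picked from commit ...)" line to the message.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
git symbolic-ref HEAD refs/heads/devel   # hypothetical branch name

echo base > file.txt
git add file.txt
git commit -q -m "Initial revision"
git branch stable                        # hypothetical maintenance branch

echo fix >> file.txt
git commit -q -a -m "Fix a bug"
fix=$(git rev-parse HEAD)

# Backport the fix; -x records the origin in the commit message only.
git checkout -q stable
git cherry-pick -x "$fix" >/dev/null

parents=$(git log -1 --pretty=%P | wc -w | tr -d ' ')
message=$(git log -1 --pretty=%B)
echo "parents: $parents"
echo "$message"
```

Because the reference lives only in the message text, it does not affect which branches a commit is considered to be on.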
This means that I'll have to regenerate the backport/forwardport way of linking to modify the commit messages instead of the graft/parent list.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Unfortunately that won't help (if we) in the future use git for backports, since the new backports will get the same graphs once again.
I don't understand this. What graphs?
I presume Grubba means the parent-pointer(s).
I believe the problem lies in how gitk determines what branch(es) a commit is on rather than on our use of grafts for backports.
Correct me if I'm wrong, but a graft is a way to artificially add a parent/child relation, isn't it?
Correct. And apparently it was/is intended for merges only, even though the diff-machinery already delivers correct results even if it's a cherry-pick instead of a merge.
If a commit has two (or more) parents, isn't that per definition a merge of them, which should cause a join of their respective branches? Cherry-picked patches are rather different since only the patch itself is taken from the other branch, not the whole content of that branch.
This is indeed how the current git developers and most of its users expect it.
If parent pointers are used for both of these, then how would git tell a merge from a cherry-pick?
Well, the difference can be inferred from the content, but it's not the way the git community wants to go, so I'm proposing the cherry-pick links instead.
The properties I'm going to propose are that the link:
- Is called something like "origin".
- That it contains the SHA1 of the commit we're picking from in addition to the --mainline parent-number (in case there are multiple parents).
- A commit can have an arbitrary number of "origin" links.
- The origin link is weak by default, i.e. once inside a repository it will keep alive any referenced commits, but it will not cause the linked to commit to be fetched or pulled from a remote repository automatically.
Yes, I know this is not Pike specific, but since we're going to make heavy use of this feature in the Pike repository (due to the frequent backports), this is the time to tell me if I missed something in that design definition, because it might make the difference to get the feature accepted into git in a way which works for all typical use cases (or not).
Correct. And apparently it was/is intended for merges only, even though the diff-machinery already delivers correct results even if it's a cherry-pick instead of a merge.
Does it? That's also a thing that confuses me: Consider commit 1cc21ef0320ecb734d4db5b85dbdc0406815e38e, which is a grafted merge from 7.8 to 7.6.
In gitk, it shows a three-way merge, where the diff from the 7.6 parent shows the patch, as expected. The diff from 7.8 apparently shows whatever differences there are between 7.8 and 7.6 that happens to be in the context of the 7.6 parent diff (e.g. the "verify_mexec_hdr" line).
That's odd imho. Assuming parent pointers mean merges, I'd expect to see all differences between the 7.8 and 7.6 versions of the file. If this was a real merge rather than a cherry-pick, the three-way diff would have a lot of essential information missing.
If, otoh, gitk can tell that the parent pointer into the 7.8 branch is a cherry-pick relation (using a property I've yet to understand), then I'd expect to see a two-way diff on the 7.6 branch only - the differences to 7.8 would just be garbage.
I also get confused with git-log: Look at the same commit with "git log -p 1cc21ef0". Then it doesn't show any diff at all, neither against 7.6 nor 7.8.
What's worse, if I use only "git log -p" on the 7.6 branch and search for the commit there then I don't see the diff either. If I look at the log on a branch, I really expect to see _all_ diffs that lead up to the current content. Maybe this can be explained by the log skipping over to the 7.8 tree after a while due to all the backport merges, but using --first-parent doesn't help.
Btw, speaking of git-log, how does one make it print out the branches like gitk does?
If parent pointers are used for both of these, then how would git tell a merge from a cherry-pick?
Well, the difference can be inferred from the content, /.../
I'm curious, how does that work? I guess this could explain the (imo) strange results I'm discussing above.
The properties I'm going to propose are that the link:
- Is called something like "origin".
- That it contains the SHA1 of the commit we're picking from in addition to the --mainline parent-number (in case there are multiple parents).
- A commit can have an arbitrary number of "origin" links.
- The origin link is weak by default, i.e. once inside a repository it will keep alive any referenced commits, but it will not cause the linked to commit to be fetched or pulled from a remote repository automatically.
Sounds sane to me, but I'm only a n00b.
Btw, my main use case for cherry-pick links is, as I've said elsewhere, to get an accurate log of the difference with "git log A..B". I see now that git-log has a --cherry-pick option which claims to detect cherry-picked patches anyway. How does it work? The commit diffs have to be byte-for-byte identical?
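As far as I know, --cherry-pick compares patch IDs as computed by "git patch-id", which hashes the diff while ignoring line numbers and whitespace. The diffs therefore need not be byte-for-byte identical, but the change itself must be the same. A minimal sketch in a throwaway repository (the branch names "old" and "new" are hypothetical, and --right-only needs a reasonably recent git):

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
git symbolic-ref HEAD refs/heads/new   # hypothetical branch name

printf 'a\nb\nc\n' > file.txt
git add file.txt
git commit -q -m "Initial revision"
git branch old                         # hypothetical maintenance branch

echo fix >> file.txt
git commit -q -a -m "Fix"
echo feature > feature.txt
git add feature.txt
git commit -q -m "Feature"

# "old" gets a change of its own, then the fix is backported without a parent link.
git checkout -q old
echo old > old.txt
git add old.txt
git commit -q -m "Old-only change"
git cherry-pick new~1 >/dev/null

# Same change, different commits: the patch IDs match.
id_new=$(git show new~1 | git patch-id | cut -d' ' -f1)
id_old=$(git show old | git patch-id | cut -d' ' -f1)

# Commits on "new" whose change has no equivalent on "old".
only_new=$(git log --cherry-pick --right-only --pretty=%s old...new)
echo "$only_new"
```

Note the three-dot (symmetric difference) range: --cherry-pick only has something to compare against when both sides are part of the revision set.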
On Tue, Sep 09, 2008 at 04:45:02PM +0000, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Btw, speaking of git-log, how does one make it print out the branches like gitk does?
i think you are looking for --decorate Print out the ref names of any commits that are shown.
greetings, martin.
--decorate helps a bit, but it still doesn't show the branch(es) each commit is made on. Thanks for the tip anyway.
I've now reconnected the history of ulpc(.old) to the main Pike development line in the Subversion repository, so if you do svn log -v http://pike-svn.lysator.liu.se/Pike/7.8/src/pike_memory.c you'll go alllll the way back to r2. :-) (Also, you can use blame to find out that the hashstr() function still looks almost exactly like it did in the beginning.)
it would be nice to get an overview of how much of the original code is still left. maybe even for every version, to see the amount of change happening.
greetings, martin.
Should be possible to find out with a script. I assume the interesting metric would be "how much of the current code is original" rather than "how much of the original code is current"?
both actually. while 10% of the current code being original means that there is a lot of new code, it could be that 90% of the original code is still current, which would mean that most of the original code survived but the whole codebase grew by an order of magnitude.
do that for each major revision and we get a nice graph of how the code evolved over time.
greetings, martin.
Ok, the results are in:
7.8       7481 / 873478   0.9%  15.9%
7.6       7965 / 690785   1.2%  16.9%
7.4       8761 / 652559   1.3%  18.6%
7.2      11612 / 466609   2.5%  24.7%
7.0      12801 / 341128   3.8%  27.2%
0.6      15764 / 210659   7.5%  33.5%
0.5      20061 / 132486  15.1%  42.6%
ulpc     42087 /  91730  45.9%  89.4%
ulpc.old 43881 /  52093  84.2%  93.2%
ulpc.0   47101 /  47101 100.0% 100.0%
"ulpc.0" is the original checkin of ulpc into the Infovav repository, so it's 100% original by definition. :-) The numbers for each repository are:
* Number of lines which are "original"
* Total number of lines
* Percentage of lines which are "original"
* Percentage of the "original" lines which remain
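As a worked example of the two percentages, the following sketch recomputes the 7.8 row from the counts in the table (the variable names are only illustrative):

```shell
#!/bin/sh
# Values taken from the 7.8 row and the ulpc.0 row of the table.
orig=7481      # lines in 7.8 still attributed to the original checkin
total=873478   # all lines in 7.8
base=47101     # all lines in ulpc.0

# First figure: share of 7.8 that is original.
# Second figure: share of the original lines that remain in 7.8.
result=$(awk -v l="$orig" -v t="$total" -v b="$base" \
  'BEGIN { printf "%.1f%% %.1f%%", 100 * l / t, 100 * l / b }')
echo "$result"
```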
Binary files are not taken into account, but the only binary file which exists in "ulpc.0" is doc/manual/ulpc-inside3.gif, and this file obviously did not survive the transition to Pike.
Oh, and there's 13 years and 1 month of development between "ulpc.0" and today ("ulpc.0" commit date was 1995-08-09).
very nice!
do we know the date when ulpc was started?
greetings, martin.
The copyright statement says 1994, and the archived releases of LPC4 I can find date from April 1994 to September 1994, so end of 1994 sounds about right.
could you post the script you used for that? i'd like to do the same for each major version, and then make a nice <diagram> out of it.
greetings, martin.
Helper script "count_r2lines" (counts total and original lines for a particular path and revision):
--8<--
#!/bin/sh

if [ "$#" != 2 ]; then
  echo >&2 "Usage: $0 dir rev"
  exit 1
fi

dir="$1"
rev="$2"

repos="file://"$HOME"/repos/Pike/"

hit=0
tot=0
# List all files (directory entries end in "/" and are dropped), then count
# the lines annotated with revision 2 and the total lines in each file.
svn ls -r"$rev" -R "$repos"Pike/"$dir"@"$rev" | sed -e '/\/$/d' | while read f; do
  set `svn annotate -r"$rev" "$repos"Pike/"$dir"/"$f"@"$rev" | sed -e 's/^ *\([0-9]*\) *.*$/\1/' | /usr/xpg4/bin/awk 'BEGIN {x=0} $1=="2" { x++ } END {print x,NR}'`
  hit=`expr $hit + $1`
  tot=`expr $tot + $2`
  echo "$hit" "$tot"
done | tail -1
--8<--
Main script (runs the helper script for each pike branch at head, and for the original ulpc checkin):
--8<--
#!/bin/sh

doit() {
  dir="$1"
  rev="$2"
  name="$3"
  set `./count_r2lines "$dir" "$rev"`
  echo "$name" "$1" "$2"
}

for i in 7.8 7.6 7.4 7.2 7.0 0.6 0.5 ulpc ulpc.old; do
  doit $i head $i
done
doit ulpc 2 ulpc.0
--8<--
Pike script for presentation of the raw data generated by the main script:
--8<--
#!/home/marcus/bin/pike

int lcnt0;

int main()
{
  array(string) x = Stdio.read_file("r2lines.data")/"\n"-({""});
  sscanf(x[-1], "%*s %*d %d", lcnt0);
  foreach(x, string y) {
    string n;
    int l, t;
    sscanf(y, "%s %d %d", n, l, t);
    write("%-8s %5d / %6d %5.1f%% %5.1f%%\n", n, l, t, 100.0*l/t, 100.0*l/lcnt0);
  }
  return 0;
}
--8<--
I ran this on eureka-svn, but if you change the assignment of "repos" to "http://eureka-svn.lysator.liu.se/" it should run anywhere.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Helper script "count_r2lines" (counts total and original lines for a particular path and revision):
Does it disregard empty lines in the count?
Nope. Adapting the sed command to do that is left as an exercise. :-)
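One possible take on that exercise, as a sketch rather than a tested change to the helper script: let awk look at the raw annotate lines and skip those that have nothing after the revision and author columns. This assumes the usual "revision author content" layout of svn annotate output; the sample input below is made up, with "someone" as a placeholder author, and whitespace-only lines are treated as empty too.

```shell
#!/bin/sh
# Count non-blank lines: prints "<lines from r2> <total lines>".
count_nonblank() {
  awk 'NF > 2 { total++; if ($1 == "2") hit++ } END { print hit + 0, total + 0 }'
}

# Simulated annotate output (hypothetical data, not from the Pike repository).
result=$(printf '%s\n' \
  '     2    someone #include "global.h"' \
  '     2    someone ' \
  '   146    someone int x;' \
  '     2    someone return 0;' | count_nonblank)
echo "$result"
```

In the helper script this awk program would replace both the sed that extracts the revision number and the original awk call.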
a rough cut of the diagram is now here: https://pike.ida.liu.se/development/history.xml i'll leave beautification to the graphicians
i made this graph using git diff --shortstat to compare all the branches. 0.3b and 0.4 are included because there are some changes going on that would otherwise not be visible.
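For reference, a minimal sketch of what such a comparison looks like, using a throwaway repository with two hypothetical tags "v1" and "v2" (how the numbers for the actual graph were collected is not shown in this thread):

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'a\nb\nc\n' > file.txt
git add file.txt
git commit -q -m "First release"
git tag v1                      # hypothetical release tag

printf 'a\nB\nc\nd\ne\n' > file.txt
git commit -q -a -m "Second release"
git tag v2                      # hypothetical release tag

# One summary line: files changed, insertions, deletions.
stat=$(git diff --shortstat v1 v2)
echo "$stat"
```

The same command accepts branch names in place of tags, so each pair of adjacent releases yields one data point.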
greetings, martin.
Infotastic! Would be interesting to space out the x axis according to release times too, for yet another perspective on the data.
yes, i considered doing a similar graph that not only has the releases but compares changes from month to month (or even week to week, given one pixel width per week this should make for a nice graph :-)
greetings, martin.
pike-devel@lists.lysator.liu.se