This is a heads-up regarding replacing the main Pike git repository with a filtered copy.
The main reason we'd want to do that is a syntax error in the timestamp fields of the commit objects, which causes "git fsck" to complain about most commits. To fix that, I have run "git filter-branch" on the repository. It transforms the commit objects to fix the timestamp issue, and it also fixes some other issues that ought to be addressed if we are filtering anyway. These include replacing some obsolete email addresses, removing empty commits, and fixing some whitespace issues. See http://pike-svn.lysator.liu.se/twiki/bin/view/Main/RepoFilter for a complete list of changes made to the repository.
Please look through the list of changes, and if there is something which should be added or removed, please say so. A filtered version of the current repository can also be examined at git://pike-git.lysator.liu.se/pike-filtered.git.
If there are no objections, we could declare a flag day (say, next Saturday) when the filtered repository replaces the current one. Of course, a new filtering would then be performed so that any commits made after today are also included. It would make things simpler if no new commits were pushed during the 5 hours it takes to run the filtering operation, though.
If the repository is replaced with the filtered version, basically all commits get new SHAs. If you have local branches that track the upstream branches with rebase enabled, a "git pull" should be able to sort things out automatically. However, if you have non-tracking branches, they may need to be rebased manually. Once all your work is rebased onto the new repository, running "git gc" to get rid of the old duplicate commits is recommended.
Questions and comments are welcome.
Hello again.
Grubba suggested one more change to perform during the filtering: merges of single commits are converted to cherry-picks. Technically, this means the following. If a commit has two parents, and only one commit C is reachable from commit^2 but not from commit^1, then the link to commit^2 is removed, and the commit message and author info of the commit are replaced with those of C. I have only done this for commits created after the migration to git. The complete list (25 commits) is on the Wiki page (http://pike-svn.lysator.liu.se/twiki/bin/view/Main/RepoFilter).
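In terms of sets, the test above works like this: take the commits reachable from commit^2, remove those reachable from commit^1, and check whether exactly one commit remains. Here is a minimal Python sketch on a toy in-memory graph. The dict representation and the function names are hypothetical and not taken from the actual filter script. The sketch only detects candidates; it does not rewrite messages or author info.

```python
# Toy model: a commit graph as {commit_id: [parent_ids]}, first parent first.
# Names and representation are illustrative only, not the actual filter script.


def ancestors(graph, start):
    """Return all commits reachable from `start`, including `start` itself."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def single_merged_commit(graph, merge):
    """Return C if `merge` has exactly two parents and C is the only commit
    reachable from merge^2 but not from merge^1; otherwise return None."""
    parents = graph.get(merge, [])
    if len(parents) != 2:
        return None
    side_only = ancestors(graph, parents[1]) - ancestors(graph, parents[0])
    if len(side_only) == 1:
        return next(iter(side_only))
    return None


history = {
    "A": [],
    "B": ["A"],       # mainline commit
    "C": ["A"],       # the single commit on a side branch
    "M": ["B", "C"],  # merge that only brings in C: a candidate for conversion
    "E": ["A"],
    "F": ["E"],
    "P": ["M", "F"],  # merge that brings in two commits (E, F): left alone
}

print(single_merged_commit(history, "M"))  # C
print(single_merged_commit(history, "P"))  # None
```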
A new filtered repository which contains this additional fix is at git://pike-git.lysator.liu.se/pike-filtered-unmerged.git.
As before, questions and comments are welcome.
That's good work! Thank you for your effort.
On the subject of such cherry picks, I noticed not very long ago that there is a flag -x to git cherry-pick, which is supposed to be used to record its origin:
-x When recording the commit, append to the original commit message a note that indicates which commit this change was cherry-picked from. Append the note only for cherry picks without conflicts. Do not use this option if you are cherry-picking from your private branch because the information is useless to the recipient. If on the other hand you are cherry-picking between two publicly visible branches (e.g. backporting a fix to a maintenance branch for an older release from a development branch), adding this information can be useful.
That's encouraging, but unfortunately the note added by this flag is quite ugly. It's added as a parenthesis with only a single newline as a separator. This is particularly bothersome if the original commit had a single-line message, since e.g. git log --oneline shows it all as the title:
git log -1 --oneline
1b1124a Some bugfix with a short message. (cherry picked from commit f1055202a823fe348a4445870a791bf395b22ba3)
This is bad enough to keep me from using this feature.
If it had a better format, I'd consider using it for these merges-turned-cherry-picks. I might also use it for earlier sufficiently identical commits that have been applied to several branches (where they have been identified). Still, in theory something like it could also be added later on by using note objects (cf. git-notes(1)).
Well, if it's just the formatting, one could reformat it manually with git commit --amend (or even have a hook do it automatically).
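Here is a sketch of such a reformatting. It assumes the note is the last line of the message, as `git cherry-pick -x` writes it. The Python helpers below have hypothetical names and are not part of git. The first moves the note into its own paragraph. The second approximates how `git log --oneline` builds its title: it takes the first paragraph and joins its lines with spaces.

```python
import re

# Assumed shape of the note that `git cherry-pick -x` appends as the last line.
NOTE_RE = re.compile(r"\n(\(cherry picked from commit [0-9a-f]{40}\))\s*$")


def pad_cherry_pick_note(message: str) -> str:
    """Put a blank line before a trailing "(cherry picked from commit ...)" note.

    Messages without such a note are returned unchanged; already padded
    messages come out the same, so the function is safe to apply twice.
    """
    match = NOTE_RE.search(message)
    if match is None:
        return message
    body = message[: match.start()].rstrip("\n")
    return body + "\n\n" + match.group(1) + "\n"


def oneline_title(message: str) -> str:
    """Approximate the title shown by `git log --oneline`: the first
    paragraph of the message with its line breaks folded into spaces."""
    first_paragraph = message.strip("\n").split("\n\n", 1)[0]
    return " ".join(line.strip() for line in first_paragraph.splitlines())


msg = ("Some bugfix with a short message.\n"
       "(cherry picked from commit f1055202a823fe348a4445870a791bf395b22ba3)\n")
print(oneline_title(msg))                        # note ends up in the title
print(oneline_title(pad_cherry_pick_note(msg)))  # Some bugfix with a short message.
```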
In this case though, the following caveat comes into play:
| Do not use this option if you are cherry-picking from your
| private branch because the information is useless to the
| recipient.
The commits in question were made on a private branch. Because they are merged, the commits from the private branch are part of the repository. But once the merges are converted to cherry-picks, there will be no more references to the commits on the private branch, and they will be garbage-collected.
The thing is that the format would be some sort of standard that git core and other tools would use more extensively in the future. So there would be little point if we were to reformat it.
It'd hardly be a problem to pad with an extra newline before it, though. But even so it's an eyesore imo.
The commits in question were made on a private branch. /.../
Those private branches were made to apply the same patch onto several mainline branches, right? In that case there should be corresponding cherry picks on multiple branches in the repository, and it'd then be those that should be linked together. That linking would be in essentially arbitrary directions, but it doesn't matter much, as the point here is to record that the "same" patch occurs on several branches.
In that case there should be corresponding cherry picks on multiple branches in the repository, and it'd then be those that should be linked together.
Yes, but the commit filter doesn't know which those other commits are. I guess it could try to build up some kind of database, so that if it sees the same commit being merged a second time it would look that up in the database to find the sha of the result of the previous cherry-pick. But it's non-trivial, so I won't implement it unless it's actually decided that it's a good idea...
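The database could be as simple as a dictionary keyed on the SHA of the merged-in commit. Below is a minimal Python sketch with hypothetical class and method names. It assumes the filter visits commits in topological order, so the new SHA of an earlier cherry-pick is already known when a later merge of the same commit is rewritten.

```python
class CherryPickIndex:
    """Remember, for each merged-in commit, the new SHAs of the
    cherry-picks it has already been turned into (hypothetical helper)."""

    def __init__(self):
        self._picks = {}  # merged commit SHA -> list of new cherry-pick SHAs

    def earlier_picks(self, merged_sha):
        """Return the cherry-picks already made from `merged_sha` (maybe none)."""
        return list(self._picks.get(merged_sha, []))

    def record(self, merged_sha, new_pick_sha):
        """Register that `merged_sha` was rewritten into `new_pick_sha`."""
        self._picks.setdefault(merged_sha, []).append(new_pick_sha)


# Placeholder SHAs, shortened for readability.
index = CherryPickIndex()
print(index.earlier_picks("c0ffee"))  # [] -> first time this commit is merged
index.record("c0ffee", "aaa111")      # cherry-pick onto the first branch
print(index.earlier_picks("c0ffee"))  # ['aaa111'] -> link the next pick to it
index.record("c0ffee", "bbb222")      # cherry-pick onto the second branch
```

The message is written before the new commit's SHA exists, so this approach can only link a later pick back to an earlier one. As noted above, the direction of such links hardly matters.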
So, is it a good idea to add such annotations to the commit messages of the cherry-picks? Or should I generate notes instead? (Are notes copied by git fetch?)
I'd prefer to use notes. However, as far as I can see, notes are not copied by the default fetch rule:
[remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*
AFAIK the reason that tags are copied is that they refer to commits that are reached from the heads. The easiest way to propagate a global set of notes seems to be to have them in something like "refs/heads/notes/commits", and configure core.notesRef accordingly at the clones.
Is it of any use before the git folks have decided on a format for it? With notes it's possible to add the information later on. A conservative, and in this case adequate, heuristic would then be to look for commits with identical author, author date, and commit message.
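The heuristic amounts to grouping on the (author, author date, message) triple. Here is a minimal Python version. It assumes the commit metadata has already been extracted (for instance with `git log --format`) into the hypothetical `Commit` records below. The example data is made up.

```python
from collections import defaultdict
from typing import NamedTuple


class Commit(NamedTuple):
    sha: str
    author: str
    author_date: str  # raw "seconds timezone" form, e.g. "1292163381 +0100"
    message: str


def likely_same_patch(commits):
    """Group commits whose author, author date and message are all identical.

    Only groups with two or more members are returned, as lists of SHAs.
    """
    groups = defaultdict(list)
    for commit in commits:
        groups[(commit.author, commit.author_date, commit.message)].append(commit.sha)
    return [shas for shas in groups.values() if len(shas) > 1]


# Made-up example data.
commits = [
    Commit("111", "A U Thor <author@example.org>", "1292163381 +0100", "Fix crash.\n"),
    Commit("222", "A U Thor <author@example.org>", "1292163381 +0100", "Fix crash.\n"),
    Commit("333", "A U Thor <author@example.org>", "1292163999 +0100", "Fix crash.\n"),
]
print(likely_same_patch(commits))  # [['111', '222']]
```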
Unless somebody protests, the plan is now as follows:
On Sunday (the 12th), sometime between 10:00 and 18:00 UTC, I will replace pike.git with a filtered version. The original pike.git will become available under the name pike-old.git. Complete mappings between the commit sha:s of the two repositories will also be published, to aid manual rebasing and similar housekeeping.
I would appreciate it if no new commits were pushed to the repository during this time, until I have announced that the activity has been either completed or cancelled.
As for the cherry-pick conversion, no notes or extra annotations to the commit messages will be added. Instead, I will log the relevant sha:s to facilitate adding notes at a later time.
The deed is done.
pike.git has been filtered, and is now open for pushing again.
Of course, you'll need to rebase your work first. As I said before, a simple "git pull" should take care of that if you have a branch that tracks a pike.git branch by rebasing. (Do not use "git rebase @{upstream}"; it doesn't use the reflog, so it won't work.) For more advanced branch structures, some manual work might be involved. If your branch was previously based on unfiltered commit X, you can find the filtered version Y of that commit in the following file. Each line contains the old SHA and the new SHA, or the old SHA and "-" if the commit was deleted:
http://pike-svn.lysator.liu.se/twiki/pub/Main/RepoFilter/sha_map
Then run "git rebase --onto Y X" and all should be fine.
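Looking up Y and building the rebase command can be scripted. Below is a minimal Python sketch with hypothetical function names. It assumes sha_map holds one whitespace-separated pair of full SHAs per line, as described above, and it requires X to be given as a full SHA.

```python
def load_sha_map(text):
    """Parse sha_map: each line is "<old SHA> <new SHA>" or "<old SHA> -".

    Returns {old_sha: new_sha}, with None for commits that were deleted.
    """
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        old_sha, new_sha = fields
        mapping[old_sha] = None if new_sha == "-" else new_sha
    return mapping


def rebase_onto_command(mapping, old_base):
    """Build "git rebase --onto Y X" for a branch based on unfiltered commit X."""
    if old_base not in mapping:
        raise KeyError(f"{old_base} not found in sha_map (use the full SHA)")
    new_base = mapping[old_base]
    if new_base is None:
        raise ValueError(f"{old_base} was deleted by the filtering")
    return ["git", "rebase", "--onto", new_base, old_base]


OLD = "1" * 40   # placeholder for unfiltered commit X
NEW = "2" * 40   # placeholder for its filtered version Y
GONE = "3" * 40  # placeholder for a commit that was deleted
sha_map = load_sha_map(f"{OLD} {NEW}\n{GONE} -\n")
print(" ".join(rebase_onto_command(sha_map, OLD)))
```

The returned list could be passed to `subprocess.run(cmd, check=True)` from inside the repository, with the branch to be rebased checked out.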
Once all your branches are rebased, I recommend a "git gc" to get rid of all those unfiltered commits.
For future reference, the following file
http://pike-svn.lysator.liu.se/twiki/pub/Main/RepoFilter/rebased_commits
details the commits which were changed from merges to cherry-picks. There are four SHA:s per line, namely
* Old SHA of merge
* New SHA of cherry-pick
* Old SHA of commit which was merged
* New SHA of commit which was cherry-picked
So finding cherry-picks of the same commit only amounts to grouping on column 3 or 4.
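The grouping can be done in a few lines of Python. The function name is hypothetical, and the sketch assumes the four columns are whitespace-separated. It collects the new cherry-pick SHAs (column 2) per merged commit and keeps only the commits that were picked more than once. The sample data uses placeholder SHAs.

```python
from collections import defaultdict


def picks_by_merged_commit(text, column=3):
    """Group the new cherry-pick SHAs (column 2) by the merged commit.

    `column` is 1-based: 3 groups on the old SHA of the merged commit,
    4 on its new SHA. Only commits picked more than once are returned.
    """
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        groups[fields[column - 1]].append(fields[1])
    return {key: picks for key, picks in groups.items() if len(picks) > 1}


sample = (
    "m1 p1 c1 n1\n"
    "m2 p2 c1 n1\n"
    "m3 p3 c2 n2\n"
)
print(picks_by_merged_commit(sample))  # {'c1': ['p1', 'p2']}
```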
Not sure what happened here but it doesn't seem successful:
dark-castle:pike79 $ git config branch.7.9.rebase true
dark-castle:pike79 $ git status
# On branch 7.9
nothing to commit (working directory clean)
dark-castle:pike79 $ git pull
warning: no common commits
remote: Counting objects: 211536, done.
remote: Compressing objects: 100% (41175/41175), done.
remote: Total 211536 (delta 167690), reused 211330 (delta 167501)
Receiving objects: 100% (211536/211536), 36.77 MiB | 1.32 MiB/s, done.
Resolving deltas: 100% (167690/167690), done.
From pike-git.lysator.liu.se:pike
 + c70644b...1a6873b 0.5 -> origin/0.5 (forced update)
 + 0e1ab2d...40f9ab1 0.6 -> origin/0.6 (forced update)
[...]
 + 48df3ba...1b31891 branches/heddas_polypatchar -> origin/branches/heddas_polypatchar (forced update)
error: Ref refs/remotes/origin/branches/hubbe is at 06983fde1434ab5470b2fb656c40db32b2e6a971 but expected 108b128dfb32fbdf2a5bc6335266a1676d64e9f9
 ! 108b128...5c8e890 branches/hubbe -> origin/branches/hubbe (unable to update local ref)
 + d9322fc...5891e75 branches/hubbes_image_polygon -> origin/branches/hubbes_image_polygon (forced update)
 + 4be3c11...a52502a branches/hubbes_working_branch -> origin/branches/hubbes_working_branch (forced update)
error: Ref refs/remotes/origin/branches/infovav is at 22c4b59ea865fbb2be49b1656ec7f85040ea65df but expected 4cff4c19627b5a237fc6f50b17f80556821797e7
 ! 4cff4c1...68b88f5 branches/infovav -> origin/branches/infovav (unable to update local ref)
[...]
 + b409bde...2fc7962 spider -> origin/spider (forced update)
 + 94446de...c96524a ulpc -> origin/ulpc (forced update)
dark-castle:pike79 $ git status
# On branch 7.9
# Your branch and 'origin/7.9' have diverged,
# and have 28817 and 28783 different commit(s) each, respectively.
#
nothing to commit (working directory clean)
dark-castle:pike79 $ git --version
git version 1.7.3.2
I have no problem wiping that tree and checking it out from scratch again but I'm curious what I should have done differently.
Hm, I'm not sure what happened there either. It looks like it fetched origin, but never performed the rebase. What does your .git/config look like?
If you didn't have any unmerged commits on your local 7.9 branch, then you can just "git reset --hard origin/7.9" to get back on track. There should be no need to wipe the tree.
Here's the config (the rebase flag was set just prior to the procedure):
dark-castle:pike79 $ less .git/config
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
[remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*
	url = git-pike@pike-git.lysator.liu.se:pike.git
[branch "7.9"]
	remote = origin
	merge = refs/heads/7.9
	rebase = true
Looks ok. Maybe the reflog didn't contain the right things because the branch wasn't set to rebase the last time you pulled? What do .git/logs/refs/heads/7.9 and .git/logs/refs/remotes/origin/7.9 look like?
I've already performed reset --hard now so I don't know if forensics will be successful.
dark-castle:pike79 $ less .git/logs/refs/heads/7.9
0000000000000000000000000000000000000000 37f43e3a6e557681ba693d5a8a587a84bf649414 Jonas Walldén jonasw@roxen.com 1291069055 +0100 clone: from git-pike@pike-git.lysator.liu.se:pike.git
37f43e3a6e557681ba693d5a8a587a84bf649414 db4cc7a442096b9a14b47c64bf8e3bea1c596037 Jonas Walldén jonasw@roxen.com 1292163381 +0100 origin/7.9: updating HEAD
dark-castle:pike79 $ less .git/logs/refs/remotes/origin/7.9 37f43e3a6e557681ba693d5a8a587a84bf649414 db4cc7a442096b9a14b47c64bf8e3bea1c596037 Jonas Walldén jonasw@roxen.com 1292160905 +0100 pull : forced-update
Well, that looks fine as well; 37f43e3a6e557681ba693d5a8a587a84bf649414, from which your old HEAD derives, is mentioned in .git/logs/refs/remotes/origin/7.9, so pull should have been able to figure out that "rebase --onto db4cc7 37f43e" was the right thing to do.
The really weird thing is that it didn't even say anything. It just did a fetch and then exited silently. It should have said _something_ after that...
Well, if you want to continue digging, you could always do 'git reset --hard 37f43e3' to get back to where you were, and then try a new 'git pull' (optionally with '-r'), and see what happens...
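The reflog lookup that pull relies on can be modelled roughly as follows. This is a simplified sketch of what the git-pull script of that era did, not git's actual code. Walk the reflog of the remote-tracking branch from newest to oldest, and take the first entry that is an ancestor of the local branch. That entry is the X in "rebase --onto <new upstream> X". The commit names in the toy history are made up.

```python
def ancestors(graph, start):
    """Return all commits reachable from `start`, including `start`."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def old_upstream(graph, upstream_reflog, branch_head):
    """Return the first reflog entry (newest first) that is an ancestor of
    `branch_head`, i.e. the upstream commit the local work was built on."""
    history = ancestors(graph, branch_head)
    for entry in upstream_reflog:
        if entry in history:
            return entry
    return None


# Toy history: O1-O2 is the unfiltered 7.9, N1-N2 its filtered rewrite,
# and L1 is a local commit on top of the old O2.
graph = {"O1": [], "O2": ["O1"], "L1": ["O2"], "N1": [], "N2": ["N1"]}
reflog = ["N2", "O2"]  # origin/7.9 after and before the forced update

base = old_upstream(graph, reflog, "L1")
print(["git", "rebase", "--onto", "N2", base])
```

If the reflog no longer contains the old upstream value, nothing matches. That is also why a plain rebase against the rewritten upstream, without the reflog, cannot find the right starting point.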
Hm, looking back at your original pull output, I see that some refs were not updated by fetch. Probably that's why fetch failed, and the pull bailed out. For example, it says
error: Ref refs/remotes/origin/branches/hubbe is at 06983fde1434ab5470b2fb656c40db32b2e6a971 but expected 108b128dfb32fbdf2a5bc6335266a1676d64e9f9
 ! 108b128...5c8e890 branches/hubbe -> origin/branches/hubbe (unable to update local ref)
But this is really weird. Why was your branch origin/branches/hubbe pointing to 06983fde1434ab5470b2fb656c40db32b2e6a971? That commit does not even exist in the old repository. It does exist in the new one though, and branches/Hubbe points to it.
Eh, you don't happen to run on a case-insensitive filesystem, do you?
Well, that explains the mystery then. A case-insensitive filesystem doesn't work when there are branches whose names differ only by case. Git writes each ref into a file named after it (e.g. .git/refs/heads/branchname). If .git/refs/heads/branch and .git/refs/heads/Branch refer to the same file, git cannot keep track of both the branches "branch" and "Branch".
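Refs that will collide on such a filesystem can be spotted in advance. Below is a small Python sketch with a hypothetical function name. It uses `str.casefold()`, which only approximates the filesystem's own case folding.

```python
from collections import defaultdict


def case_collisions(ref_names):
    """Return groups of ref names that differ only by letter case.

    On a case-insensitive filesystem each group maps to a single loose ref
    file, so git cannot keep the refs apart. str.casefold() is only an
    approximation of the filesystem's own case folding.
    """
    groups = defaultdict(list)
    for name in ref_names:
        groups[name.casefold()].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


refs = [
    "refs/heads/7.9",
    "refs/heads/branches/Hubbe",
    "refs/heads/branches/hubbe",
]
print(case_collisions(refs))
# [['refs/heads/branches/Hubbe', 'refs/heads/branches/hubbe']]
```

The ref names could be taken from the second column of `git ls-remote` output for the remote.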
Indeed, but don't keep your hopes up for a fix. If you tell them "this approach does not work if you want to do X", the usual response will be "don't want to do X then"...
There seems to be a core.ignorecase config option that looks relevant. It may be worth trying for anyone on OS X who hasn't updated yet.
Yep, seems that pulled in the conflicting branches:
dark-castle:pike79 $ git config --global core.ignorecase true
dark-castle:pike79 $ git pull
From pike-git.lysator.liu.se:pike
 + 108b128...5c8e890 branches/hubbe -> origin/branches/hubbe (forced update)
 + 4cff4c1...68b88f5 branches/infovav -> origin/branches/infovav (forced update)
Current branch 7.9 is up to date.
Eh, according to the config file you pasted, you already had core.ignorecase set to true the first time. What does 'git show-ref branches/hubbe branches/Hubbe' say?
Indeed, that's odd.
dark-castle:pike79 $ git show-ref branches/hubbe branches/Hubbe
06983fde1434ab5470b2fb656c40db32b2e6a971 refs/remotes/origin/branches/Hubbe
5c8e89060d917cb3b06f831ed7df43e52b125c5c refs/remotes/origin/branches/hubbe
Well, those are the correct refs at least. Maybe the first set has been moved into .git/packed-refs (which has no case issues, naturally), and that's why the problem went away?
Anyway, it seems like "try once more" would be a good recommendation if pull fails on OS X. :-)
Oh, one more thing: You'll want to run "git fetch -t" to update all tags when you're done with the branches. Otherwise they will still point to the old SHA:s, and you will not be able to gc those commits.
Also note that commits reachable from the reflog will be preserved for (by default) 30 days. So unless you manually expire your reflogs, the old commits will not be garbage collected until next year.
Actually, the default reflog expire time is 90 days. It's unreachable commits that have a default 30 days expire time. Anyway, the blobs ought to be mostly the same, so I don't think the old history takes that much more space. Haven't measured it, though.
Note that old stashes also keep references to the old history. So if one wants to get rid of it then it's necessary to pop each stash into the new branches and stash them again.
Actually, the default reflog expire time is 90 days. It's unreachable commits that have a default 30 days expire time.
Yes, but the old commits _will_ be unreachable once you have pulled all refs and fixed your own local branches.
Anyway, the blobs ought to be mostly the same, so I don't think the old history takes that much more space. Haven't measured it, though.
About 50% more, according to my measurements, if you repack everything.
Note that old stashes also keep references to the old history. So if one wants to get rid of it then it's necessary to pop each stash into the new branches and stash them again.
Good catch, I hadn't thought about stashes.
On Fri, Dec 24, 2010 at 03:40:02PM +0000, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Actually, the default reflog expire time is 90 days. It's unreachable commits that have a default 30 days expire time.
Yes, but the old commits _will_ be unreachable once you have pulled all refs and fixed your own local branches.
they would still be reachable from the reflog though... so it should be a total of 120 days until they disappear by themselves.
greetings, martin.
they would still be reachable from the reflog though...
"Unreachable" here means "not part of the current branch" (see the documentation). The two settings control how long commits _in the reflog_ are kept depending on whether they are "reachable" or not. If the commits can not be reached from the reflog, none of these settings apply, and only the gc.pruneExpire setting (which defaults to 2 weeks) will be considered.
so it should be a total of 120 days until they disappear by themselves.
Why 120? After 30 days both the gc.reflogExpireUnreachable and gc.pruneExpire limits will have been met, so there is no reason for the commit to be retained any longer.
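Here is a simplified model of the argument in this reply. It assumes the old commit is referenced only by reflog entries that are no longer reachable from the branch tip. It also assumes the reflog entry and the loose object age from the same day. The function names are made up, and git's exact boundary handling may differ. The model shows that the two limits run concurrently, so the later one decides; they do not add up.

```python
# Defaults in days. gc.reflogExpire (90 days) is deliberately absent: it only
# covers reflog entries still reachable from the branch tip, which the old,
# filtered-away commits no longer are.
REFLOG_EXPIRE_UNREACHABLE = 30  # gc.reflogExpireUnreachable
PRUNE_EXPIRE = 14               # gc.pruneExpire (default "2.weeks.ago")


def kept_by_gc(entry_age_days, object_age_days):
    """Simplified model: is an old commit, referenced only by an unreachable
    reflog entry, still kept by `git gc`?"""
    if entry_age_days < REFLOG_EXPIRE_UNREACHABLE:
        return True  # the reflog entry still protects it
    # Entry expired: the object is unreferenced and goes once old enough.
    return object_age_days < PRUNE_EXPIRE


def first_day_collected():
    """Assume the reflog entry and the loose object age from the same day."""
    day = 0
    while kept_by_gc(day, day):
        day += 1
    return day


print(first_day_collected())  # 30, i.e. max(30, 14) rather than 90 + 30
```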
pike-devel@lists.lysator.liu.se