The above file is now available via git://pike-git.lysator.liu.se/pcvs2git.git/
Note: Requires rcs (and git).
Note that the tool is by no means finished, but it seems to generate an acceptable git archive from the repositories that I've tested with.
Caveat emptor: I've not tested with any repositories larger than the Pike 0.5 repository yet (~7000 file revisions).
Interesting. Judging from the comments in the beginning, I take it that it still doesn't try to sort out moves and copies in the cvs repository, right? Is it possible to script it to e.g. operate on specific ranges at a time to be able to cope with such things manually?
Or perhaps you're planning a better way to accomplish that, e.g. by feeding it a sort of event log that describes such things, along with stuff like joins from other repos, forks into different repos (i.e. Roxen-style branching), and other things that require out-of-band "constructed" commits in git?
> Interesting. Judging from the comments in the beginning, I take it that it still doesn't try to sort out moves and copies in the cvs repository, right? Is it possible to script it to e.g. operate on specific ranges at a time to be able to cope with such things manually?
Correct, I've not looked at that stuff yet; I considered getting a correct commit graph more important.
I've just started adding code to analyze the $Id$ tags. When they differ from the expected value, they give hints about renames, copies, merges, etc. Unfortunately, it looks like the information can't be used as is, since the difference is sometimes due to the file having originally been developed in a different repository (or e.g. in a different main branch of Pike). I guess the best approach would be to require the user to specify how each of the cases should be handled.
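To make the $Id$ idea above concrete, here is a minimal sketch of such a check. It is not taken from pcvs2git.pike: the function names, the result strings, and the sample data are all hypothetical, and it assumes the usual expanded keyword layout "$Id: <file>,v <rev> <date> <time> <author> <state> $".

```pike
// Hypothetical sketch; not part of pcvs2git.pike.

// Extract ({ filename, revision }) from the first expanded $Id$ keyword
// in data, or return 0 if no expanded keyword is found.
array(string) parse_id_keyword(string data)
{
  string file, rev;
  // Assumed layout: "$Id: <file>,v <rev> <date> <time> <author> <state> $"
  sscanf(data, "%*s$Id: %s,v %s ", file, rev);
  if (!file || !rev) return 0;
  return ({ file, rev });
}

// Compare the keyword with what the RCS file claims for this revision.
// A mismatch is only a hint; it does not say which case applies.
string check_id(string data, string expected_file, string expected_rev)
{
  array(string) id = parse_id_keyword(data);
  if (!id) return "no-id";
  if (id[0] != expected_file) return "file-differs";    // rename or copy?
  if (id[1] != expected_rev) return "revision-differs"; // merge or import?
  return "ok";
}

int main()
{
  // Made-up sample content, for illustration only.
  string data =
    "/* $Id: module.pike,v 1.12 1998/03/01 12:00:00 user Exp $ */\n";
  write("%s\n", check_id(data, "module.pike", "1.12"));
  write("%s\n", check_id(data, "newname.pike", "1.12"));
  return 0;
}
```

A mismatch reported this way would still need a user decision, since the same symptom can come from a rename, a copy, or a file imported from another repository.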
> Or perhaps you're planning a better way to accomplish that, e.g. by feeding it a sort of event log that describes such things, along with stuff like joins from other repos, forks into different repos (i.e. Roxen-style branching), and other things that require out-of-band "constructed" commits in git?
Yes, I'm considering supporting a config file where such stuff can be specified.
> I guess the best approach would be to require the user to specify how each of the cases should be handled.
Yes, I doubt automatic detection can work out such cases well enough. Same applies to cases where branch forks and tags simply aren't well defined because of manually moved tags, for instance.
Btw, here's a small patch to keep it from failing miserably in case one doesn't give an author file:
--- i/pcvs2git.pike
+++ w/pcvs2git.pike
@@ -515,8 +515,8 @@ class GitRepository
       if (sizeof(authors)) {
         werror("Warning: %s: Author %O is not in the authors file.\n",
                c->uuid, login);
-        res = authors[login] = parse_email_addr(login, login);
       }
+      res = authors[login] = parse_email_addr(login, login);
     }
     return res;
   }
Tried to run it on the Roxen/4.5 repo, but after an hour or so it failed with an out-of-memory error. It had consumed at least 3.1 GB before that, but it managed to get to the point where it started to fork git.
pike-devel@lists.lysator.liu.se