I guess the best approach would be to require the user to specify how each of the cases should be handled.
Yes, I doubt automatic detection can work out such cases well enough. The same applies to cases where branch forks and tags simply aren't well defined, for instance because of manually moved tags.
Btw, here's a small patch to keep it from failing miserably when no authors file is given:
--- i/pcvs2git.pike
+++ w/pcvs2git.pike
@@ -515,8 +515,8 @@ class GitRepository
       if (sizeof(authors)) {
         werror("Warning: %s: Author %O is not in the authors file.\n",
                c->uuid, login);
-        res = authors[login] = parse_email_addr(login, login);
       }
+      res = authors[login] = parse_email_addr(login, login);
     }
     return res;
   }
I tried to run it on the Roxen/4.5 repo, but after an hour or so it failed with an out-of-memory error. It had consumed at least 3.1 GB before that, but it did get to the point where it started to fork git.