Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
It does this only when doing diffs or annotations, not when retrieving logs.
So the log will not show the original filename of a copy then, only for a "resurrection"? Can this be fixed by using the resurrection method (whatever it is) instead of regular cp to copy the file? I mean, you should be able to "resurrect" a file regardless of whether it was actually deleted in a later commit or not, and since you can "resurrect" it to a new filename, that would serve as a copy operation.
Maybe we should avoid getting lost in terminology. Your idea of what a log is and git's idea of what a log is may differ. It might help to understand what git does, in essence, when committing. As a matter of fact, every time you commit changes, what git basically does is the following:
a. Take a snapshot of the *entire* source tree (not just the changes).
b. Take the SHA1 hash of the previous commit and store it as the first parent of the current commit.
c. Collect the SHA1 hashes of any other commits that are being merged from (regardless of whether they are part of the current branch or not).
d. Generate a new SHA1 for the current commit, which hashes the source-tree snapshot together with the collected parent SHA1s.
e. Make the newly calculated SHA1 the head of the current branch.
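You can actually see this structure directly: a commit object records exactly a tree, its parents, and some metadata. For example (the hashes, names and dates shown here are made up):

    $ git cat-file -p HEAD
    tree 8f2c3d1...         <- snapshot of the *entire* source tree
    parent a94a8fe...       <- SHA1 of the previous commit (first parent)
    parent 5d41402...       <- extra parent, only present for merges
    author A U Thor <author@example.com> 1112911993 +0200
    committer A U Thor <author@example.com> 1112911993 +0200

    Commit message goes here.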
In diff/annotate or other views of the history, all the copies are inferred from the full tree snapshots and their parent relationships. The snapshots themselves do not carry information on which file went from where to where. Yet when you run git diff, git annotate, or git blame, it will accurately and swiftly give you information on copied, renamed, and moved files or pieces of source.
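For instance, a sketch (the commit and file names are placeholders; -M asks for rename detection, -C for copy detection):

    $ git diff -M -C commitA commitB    # report renames and copies between the two snapshots
    $ git blame -C somefile.c           # attribute lines even if they were copied from another file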
No, it is not a performance problem, because everything internally is hashed; it is not comparing large amounts of text, it is mostly comparing hashes.
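As an illustration (the path and blob hashes are made up), an unchanged file can be skipped by comparing blob IDs alone:

    $ git ls-tree commitA -- src/main.c
    100644 blob 9ae2f0c...    src/main.c
    $ git ls-tree commitB -- src/main.c
    100644 blob 9ae2f0c...    src/main.c    <- same blob ID, so the contents need not be read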
Ok, now I'm confused. You previously implied that I could make local changes (up to 50%) before committing and it would still be recognized as a copy. That would mean that it has to compare more than just a hash.
Say you have a 30MB checked-out source tree containing about 8000 files. Say you committed commit A, then copied one file, altered 40% of it, and committed that as commit B, where commit A is the parent of commit B. When you subsequently run git diff A..B, git looks at both source-tree snapshots. It notices that the first 30MB snapshot has 8000 files and the second 30MB snapshot has 8001 files, quickly compares hashes for all files, figures out that 8000 files are identical, and thus knows for which file it is missing history. Then git begins to search for the new content: it computes partial hashes for chunks of the file and searches for those partial hashes in the other files, possibly helped by a heuristic that prefers files in the same directory or with similar names first. Then it tells you what it finds, and gives you the verdict that the file is a copy of another, or just that chunks of the file appear in other files (and connects history there).
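A rough sketch of that scenario on the command line (all file and commit names here are made up). Note that by default -C only considers files that were themselves modified in the diff as copy sources; doubling it (-C -C) also inspects unmodified files, which is what this case needs since the original file did not change between A and B:

    $ cp src/parser.c src/parser2.c
    $ $EDITOR src/parser2.c             # rewrite roughly 40% of it
    $ git add src/parser2.c
    $ git commit -m "fork the parser"   # this becomes commit B
    $ git diff -C -C --stat A HEAD      # should report src/parser.c => src/parser2.c as a copy

The similarity threshold is tunable, e.g. -C60% demands at least 60% identical content before something is reported as a copy.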
Also, I'm not entirely comfortable with the thought that the history of an already committed file might spontaneously change later.
It doesn't. There are such things as regression tests; development and quality assurance on git are *very* active.
You said previously that the "usual" way to fix an incorrect guess was to change the logic in git, and that this had indeed happened. How do regression tests help if people are deliberately changing the behaviour?
Bad matches are a rare event in practice. Whenever someone encounters one, improvements to the search and match algorithms can be offered for inclusion in mainstream git; the patches are only accepted if they do not cause mismatches in the regression tests that already exercise the corner-case matches found and fixed earlier.
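Roughly, such a regression test (a hypothetical sketch in the style of git's own shell-based test suite; file names and contents are made up) pins a previously misdetected corner case down, so that later changes to the matching heuristics cannot silently reintroduce it:

    test_expect_success 'copy with heavy edits is still detected as a copy' '
        test_write_lines a b c d e f g h i j >original.txt &&
        git add original.txt &&
        git commit -m "original" &&
        cp original.txt copied.txt &&
        test_write_lines a b c d e f g x y z >copied.txt &&
        git add copied.txt &&
        git commit -m "edited copy" &&
        git diff -C -C --name-status HEAD~1 HEAD >actual &&
        grep "^C" actual
    '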