Say you have a 30 MB checked-out source tree containing about 8000 files. You make commit A, then copy one file, alter 40% of the copy, and commit the result as commit B, where A is the parent of B. When you later run git diff A..B, git looks at both tree snapshots, notices that the first 30 MB snapshot has 8000 files and the second has 8001, and quickly compares hashes for all files. It finds that 8000 of them are identical, so it knows exactly which file is missing history. Git then searches for the origin of the new content: it computes partial hashes over chunks of the file and looks for those chunk hashes in the other files, possibly guided by a heuristic that prefers files in the same directory or with similar names first.
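A minimal sketch of that two-phase search, in Python rather than git's actual C code: first pair up byte-identical files via a full-content hash, then score any leftover new file against every old file using per-line "partial hashes". The helper names (content_hash, chunk_hashes, find_copy_source, detect_copies), the line-sized chunks, and the 0.5 similarity threshold are illustrative assumptions, not git's real internals.

    import hashlib
    from collections import Counter

    def content_hash(data: bytes) -> str:
        """Hash of the full file contents, analogous to a blob hash."""
        return hashlib.sha1(data).hexdigest()

    def chunk_hashes(data: bytes) -> Counter:
        """Partial hashes: one hash per line, so an edit only invalidates
        the chunks it actually touches."""
        return Counter(hashlib.sha1(line).hexdigest() for line in data.splitlines())

    def similarity(old: bytes, new: bytes) -> float:
        """Fraction of the new file's chunks that also occur in the old file."""
        ho, hn = chunk_hashes(old), chunk_hashes(new)
        return sum((ho & hn).values()) / max(sum(hn.values()), 1)

    def find_copy_source(old_tree: dict, new_data: bytes, threshold: float = 0.5):
        """Return the old path whose content is most similar to the new file,
        or None if nothing clears the threshold."""
        best_path, best_score = None, threshold
        for path, data in old_tree.items():
            score = similarity(data, new_data)
            if score > best_score:
                best_path, best_score = path, score
        return best_path

    def detect_copies(old_tree: dict, new_tree: dict) -> dict:
        """Phase 1: pair up byte-identical files by full-content hash.
        Phase 2: for each remaining new file, search the old tree for a source."""
        old_hashes = {content_hash(d) for d in old_tree.values()}
        return {
            path: find_copy_source(old_tree, data)
            for path, data in new_tree.items()
            if content_hash(data) not in old_hashes  # only files with no exact match
        }

Calling detect_copies(tree_of_A, tree_of_B) on the example above would map the one new path in B to the file it was copied from in A, provided enough of its chunks survived the 40% rewrite.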
How much of a file is covered by each "partial hash"? It seems to me that even if only 40% of the file is changed, you might still end up with at least one changed byte in every chunk.
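Whether that happens depends on how big each chunk is relative to how finely the edits are scattered. With line-sized chunks, as in the sketch above, a changed byte only spoils the hash of the line it sits in, so a 40% rewrite of whole lines still leaves about 60% of the chunk hashes intact. A small, self-contained illustration with made-up content (again an assumption about chunking, not git's code):

    import hashlib
    from collections import Counter

    def line_hashes(data: bytes) -> Counter:
        # One partial hash per line: a changed byte only spoils its own line's hash.
        return Counter(hashlib.sha1(line).hexdigest() for line in data.splitlines())

    # Hypothetical file of 1000 distinct lines; rewrite 40% of them outright.
    original = b"\n".join(b"original line %d" % i for i in range(1000))
    lines = original.splitlines()
    for i in range(400):
        lines[i] = b"rewritten line %d" % i
    modified = b"\n".join(lines)

    old, new = line_hashes(original), line_hashes(modified)
    shared = sum((old & new).values())
    print(shared / sum(new.values()))   # 0.6: the untouched chunks still match

If the edits were instead scattered so that every single line contained at least one changed byte, the chunk overlap would indeed collapse, which is exactly the worry raised above.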
Bad matches are rare in practice. Whenever someone does encounter one, improvements to git's search and match algorithms can be offered for inclusion in mainstream git; such patches are only accepted if they do not cause mismatches in the regression tests that already exercise the corner-case matches found and fixed earlier.
But such a patch will cause mismatches (with respect to the previous logic) in some cases, otherwise it wouldn't fix the problem that was encountered. The fact that it doesn't change the behaviour for previously fixed cases doesn't mean it doesn't change the behaviour for some cases which _didn't_ need fixing before.