* jc/rev-list:
rev-list --objects: use full pathname to help hashing.
* jc/pack-thin:
pack-objects: hash basename and direname a bit differently.
pack-objects: allow "thin" packs to exceed depth limits
pack-objects: use full pathname to help hashing with "thin" pack.
* np/delta:
Revert "diff-delta: produce optimal pack data"
Tweak break/merge score to adjust to the new delta generation code.
count-delta: fix counting of copied source.
This reverts 6b7d25d97b commit.
It turns out that the new algorithm has a really bad corner
case, that literally spends minutes for inputs that takes less
than a quater seconds to delta with the old algorithm. The
resulting delta is 50% smaller which is admirable, but the
performance degradation is simply unacceptable for unconditional
use.
Some example cases are these blobs in Linux 2.6 repository:
4917ec509720a42846d513addc11cbd25e0e3c4f
9af06ba723df75fed49f7ccae5b6c9c34bc5115f
dfc9cd58dc065d17030d875d3fea6e7862ede143
Signed-off-by: Junio C Hamano <junkio@cox.net>
...so that "Makefile"s from different revs are sorted together,
separate from "t/Makefile"s, but close enough.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This helps to group the same files from different revs together,
while spreading files with the same basename in different
directories, to help pack-object.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When creating a new pack to be used in .git/objects/pack/
directory, we carefully count the depth of deltified objects to
be reused, so that the generated pack does not to exceed the
specified depth limit for runtime efficiency. However, when we
are generating a thin pack that does not contain base objects,
such a pack can only be used during network transfer that is
expanded on the other end upon reception, so being careful and
artificially cutting the delta chain does not buy us anything
except increased bandwidth requirement. This patch disables the
delta chain depth limit check when reusing an existing delta.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Since i wanted to limit the graph box size i was resetting
the window after an index of 5. This result in line joining
commit nodes to pass over nodes which are not related. The
changes fixes the same
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Running "git-am --resolved" without doing anything can create an empty
commit. Prevent it.
Thanks for Eric W. Biederman for spotting this.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This lowers the default merge threshold score to 75% from
earlier 80%. The break threshold stays the same at 50% for now,
but we might want to revisit it (and the rename detection limit
as well).
* break score: this much edit (both insertion of new material
and deletion of old material) needs to be there in the file
before we consider this _might_ be a rewrite and break the
filepair.
* merge score: after a filepair is broken by the above criteria
and goes through rename detection, if their pieces did not
match with other files as rename/copy, we merge them back
into one as if nothing happened. If the filepair had at
least this much deletion of old material, however, we say
this is completely rewritten with dissimilarity index X% when
we do so.
The updated delta code by Nico is so good that what we earlier
thought to be complete rewrite now reuses a lot more from the
source material (reducing the counted "delete"), so this
adjustment is needed to keep the perceived behaviour similar to
what we had earlier.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The previous one wrongly coalesced a span with the next one
even though the span being added does not reach it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
In windows you cannot remove current or opened directory,
an opened file, a running program, a loaded library, etc...
[jc: signoffs? With a minor quoting fix.]
Signed-off-by: Junio C Hamano <junkio@cox.net>
With the finer grained delta algorithm, count-delta algorithm
started overcounting copied source material, since the new delta
output tends to reuse the same source range more than once and
more aggressively. This broke an earlier assumption that the
number of bytes copied out from the source buffer is a good
approximation how much source material is actually remaining in
the result.
This uses fairly inefficient algorithm to keep track of ranges
of source material that are actually copied out to the
destination buffer. With this tweak, the obvious rename/break
detection tests in the testsuite start to work again.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This uses the same hashing algorithm to the "preferred base
tree" objects and the incoming pathnames, to group the same
files from different revs together, while spreading files with
the same basename in different directories.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Since we sort objects by type, hash, preferredness and then
size, after we have a delta against preferred base, there is no
point trying a delta with non-preferred base. This seems to
save expensive calls to diff-delta and it also seems to save the
output space as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This implements "eye candy" similar to the pack-object/unpack-object
to entertain users while a large tree is being checked out after
a clone or a pull.
Signed-off-by: Junio C Hamano <junkio@cox.net>
New tests are added to the git-rm test case to cover this as well.
Signed-off-by: Carl Worth <cworth@cworth.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds a git-rm command which provides convenience similar to
git-add, (and a bit more since it takes care of the rm as well if
given -f).
Like git-add, git-rm expands the given path names through
git-ls-files. This means it only acts on files listed in the
index. And it does act recursively on directories by default, (no -r
needed as in the case of rm itself). When it recurses, it does not
remove empty directories that are left behind.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Unless --no-tags flag was given, git-fetch tried to always
follow remote tags that point at the commits we picked up.
It is not very useful to pick up tags from remote unless storing
the fetched branch head in a local tracking branch. This is
especially true if the fetch is done to merge the remote branch
into our current branch as one-shot basis (i.e. "please pull"),
and is even harmful if the remote repository has many irrelevant
tags.
This proposed update disables the automated tag following unless
we are storing the a fetched branch head in a local tracking
branch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This updates the progress output to match "every one second or
every percent whichever comes early" used by unpack-objects, as
discussed on the list.
Signed-off-by: Junio C Hamano <junkio@cox.net>
If that pack is big, it takes significant time to write and might
benefit from some more eye candies as well. This is however disabled
when the pack is written to stdout since in that case the output is
usually piped into unpack_objects which already does its own progress
reporting.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This provides a stable and simpler progress reporting mechanism that
updates progress as often as possible but accurately not updating more
than once a second. The deltification phase is also made more
interesting to watch (since repacking a big repository and only seeing a
dot appear once every many seconds is rather boring and doesn't provide
much food for anticipation).
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
"empty ident not allowed" error makes commit-tree fail, so we
are already safer in that we would not end up with commit
objects that have bogus names on the author or committer fields.
However, before commit-tree is called there are already changes
made to the index file and the working tree. The operation can
be resumed after fixing the environment problem, but when this
triggers to a newcomer with unusable gecos, the first question
becomes "what did I lose and how would I recover".
This patch modifies some Porcelainish commands to verify
GIT_COMMITTER_IDENT as soon as we know we are going to make some
commits before doing much damage to prevent confusion.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Previous one warned people upfront to encourage fixing their
environment early, but some people just use repositories and git
tools read-only without making any changes, and in such a case
there is not much point insisting on them having a usable ident.
This round attempts to move the error until either "git-var"
asks for the ident explicitly or "commit-tree" wants to use it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
It appears that some people who did not care about having bogus
names in their own commit messages are bitten by the recent
change to require a sane environment [*1*].
While it was a good idea to prevent people from using bogus
names to create commits and doing sign-offs, the error message
is not very informative. This patch attempts to warn things
upfront and hint people how to fix their environments.
[Footnote]
*1* The thread is this one.
http://marc.theaimsgroup.com/?t=113868084800004
Especially this message.
http://marc.theaimsgroup.com/?m=113932830015032
Signed-off-by: Junio C Hamano <junkio@cox.net>
This tries to rework the solution for the excess delta chain
problem. An earlier commit worked it around ``cheaply'', but
repeated repacking risks unbound growth of delta chains.
This version counts the length of delta chain we are reusing
from the existing pack, and makes sure a base object that has
sufficiently long delta chain does not get deltified.
Signed-off-by: Junio C Hamano <junkio@cox.net>
A new flag -q makes underlying pack-objects less chatty.
A new flag -f forces delta to be recomputed from scratch.
Signed-off-by: Junio C Hamano <junkio@cox.net>