Martin Koegler noted that create_delta() performs a new hash lookup
after every block copy encoding which are currently limited to 64KB.
In case of larger identical blocks, the next hash lookup would normally
point to the next 64KB block in the reference buffer and multiple block
copy operations will be consecutively encoded.
It is however possible that the reference buffer be sparsely indexed if
hash buckets have been trimmed down in create_delta_index() when hashing
of the reference buffer isn't well balanced. In that case the hash
lookup following a block copy might fail to match anything and the fact
that the reference buffer still matches beyond the previous 64KB block
will be missed.
Let's rework the code so that buffer comparison isn't bounded to 64KB
anymore. The match size should be as large as possible up front and
only then should multiple block copy be encoded to cover it all.
Also, fewer hash lookups will be performed in the end.
According to Martin, this patch should reduce his 92MB pack down to 75MB
with the dataset he has.
Tests performed on the Linux kernel repo show a slightly smaller pack and
a slightly faster repack.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This just basically creates a "pack_refs()" function that could be used by
anybody. You pass it in the flags you want as a bitmask (PACK_REFS_ALL and
PACK_REFS_PRUNE), and it will do all the heavy lifting.
Of course, it's still static, and it's all in the builtin-pack-refs.c
file, so it's not actually visible to the outside, but the next step would
be to just move it all to a library file (probably refs.c) and expose it.
Then we could easily make "git gc" do this too.
While I did it, I also made it check the return value of the fflush and
fsync stage, to make sure that we don't overwrite the old packed-refs file
with something that got truncated due to write errors!
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* maint:
Fix git-svn to handle svn not reporting the md5sum of a file, and test.
Fix mishandling of $Id$ expanded in the repository copy in convert.c
More echo "$user_message" fixes.
Add tests for the last two fixes.
git-commit: use printf '%s\n' instead of echo on user-supplied strings
git-am: use printf instead of echo on user-supplied strings
Documentation: Add definition of "evil merge" to GIT Glossary
Replace the last 'dircache's by 'index'
Documentation: Clean up links in GIT Glossary
* maint-1.5.1:
Fix git-svn to handle svn not reporting the md5sum of a file, and test.
More echo "$user_message" fixes.
Add tests for the last two fixes.
git-commit: use printf '%s\n' instead of echo on user-supplied strings
git-am: use printf instead of echo on user-supplied strings
Documentation: Add definition of "evil merge" to GIT Glossary
Replace the last 'dircache's by 'index'
Documentation: Clean up links in GIT Glossary
If the repository contained an expanded ident keyword (i.e. $Id:XXXX$),
then the wrong bytes were discarded, and the Id keyword was not
expanded. The fault was in convert.c:ident_to_worktree().
Previously, when a "$Id:" was found in the repository version,
ident_to_worktree() would search for the next "$" after this, and
discarded everything it found until then. That was done with the loop:
do {
ch = *cp++;
if (ch == '$')
break;
rem--;
} while (rem);
The above loop left cp pointing one character _after_ the final "$"
(because of ch = *cp++). This was different from the non-expanded case,
were cp is left pointing at the "$", and was different from the comment
which stated "discard up to but not including the closing $". This
patch fixes that by making the loop:
do {
ch = *cp;
if (ch == '$')
break;
cp++;
rem--;
} while (rem);
That is, cp is tested _then_ incremented.
This loop exits if it finds a "$" or if it runs out of bytes in the
source. After this loop, if there was no closing "$" the expansion is
skipped, and the outer loop is allowed to continue leaving this
non-keyword as it was. However, when the "$" is found, size is
corrected, before running the expansion:
size -= (cp - src);
This is wrong; size is going to be corrected anyway after the expansion,
so there is no need to do it here. This patch removes that redundant
correction.
To help find this bug, I heavily commented the routine; those comments
are included here as a bonus.
Signed-off-by: Andy Parkins <andyparkins@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This updates t4014 to check the two fixes for git-am and git-commit
we observed with "echo" that does backslash interpolation by default
without being asked with -e option.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This fixes the same issue git-am had, which was fixed by Jeff
King in the previous commit. Cleverly enough, this commit's log
message is a good test case at the same time.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Under some implementations of echo (such as that provided by
dash), backslash escapes are recognized without any other
options. This means that echo-ing user-supplied strings may
cause any backslash sequences in them to be converted. Using
printf resolves the ambiguity.
This bug can be seen when using git-am to apply a patch
whose subject contains the character sequence "\n"; the
characters are converted to a literal newline. Noticed by
Szekeres Istvan.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now that the default delta depth is 50, it is a good idea to also bump
MAX_CHAIN to 50.
While at it, make the display a bit prettier by making the MAX_CHAIN
limit inclusive, and display the number of deltas that are above that
limit at the end instead of the beginning.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Ensure that the same link is not repeated in single glossary entry,
and that there is no self-link i.e. link to current entry.
Add links to other definitions in git glossary.
Remove inappropriate (nonsense) links, or change link to link to
correct definition (to correct term).
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* maint:
fix memory leak in parse_object when check_sha1_signature fails
name-rev: tolerate clock skew in committer dates
Update bash completion for git-config options
Teach bash completion about recent log long options
Teach bash completion about 'git remote update'
Update bash completion header documentation
Remove a duplicate --not option in bash completion
Teach bash completion about git-shortlog
Hide the plumbing diff-{files,index,tree} from bash completion
Update bash completion to ignore some more plumbing commands
* 'master' of git://repo.or.cz/git/fastimport:
Update bash completion for git-config options
Teach bash completion about recent log long options
Teach bash completion about 'git remote update'
Update bash completion header documentation
Remove a duplicate --not option in bash completion
Teach bash completion about git-shortlog
Hide the plumbing diff-{files,index,tree} from bash completion
Update bash completion to ignore some more plumbing commands
I've taught myself to use "git gc" instead of doing the repack explicitly,
but it doesn't actually do what I think it should do.
We've had packed refs for a long time now, and I think it just makes sense
to pack normal branches too. So I end up having to do
git pack-refs --all --prune
in order to get a nice git repo that doesn't have any unnecessary files.
So why not just do that in "git gc"? It's not as if there really is any
downside to packing branches, even if they end up changing later. Quite
often they don't, and even if they do, so what?
Also, make the default for refs packing just be an unambiguous "do it",
rather than "do it by default only for non-bare repositories". If you want
that behaviour, you can always just add a
[gc]
packrefs = notbare
in your ~/.gitconfig file, but I don't actually see why bare would be any
different (except for the broken reason that http-fetching used to be
totally broken, and not doing it just meant that it didn't even get
fixed in a timely manner!).
So here's a trivial patch to make "git gc" do a better job. Hmm?
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When check_sha1_signature fails, program is not terminated:
it prints an error message and returns NULL, so the
buffer returned by read_sha1_file should be freed before.
Signed-off-by: Carlos Rica <jasampler@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
In git.git repository, "git-name-rev v1.3.0~158" cannot name the
rev, while adjacent revs can be named.
This was because it gives up traversal from the tips of existing
refs as soon as it sees a commit that has older commit timestamp
than what is being named. This is usually a good heuristics,
but v1.3.0~158 has a slightly older commit timestamp than
v1.3.0~157 (i.e. it's child), as these two were made in a
separate repostiory (in fact, in a different continent).
This adds a hardcoded slop value (1 day) to the cut-off
heuristics to work this kind of problem around. The current
algorithm essentially runs around from the available tips down
to ancient commits and names every single rev available that are
newer than cut-off date, so a single day slop would not add that
much overhead in repositories with long enough history where the
performance of name-rev matters.
I think the algorithm could be made a bit smarter by deepening
the graph on demand as a new commit is asked to be named (this
would require rewriting of name_rev() function not to recurse
itself but use a traversal list like revision.c traverser does),
but that would be a separate issue.
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
A few new configuration options grew out of the woodwork during the
1.5.2 series. Most of these are pretty easy to support a completion
of, so we do so.
I wanted to also add completion support for the <driver> part of
merge.<driver>.name but to do that we have to look at all of the
.gitattributes files and guess what the unique set of <driver>
strings would be. Since this appears to be non-trivial I'm punting
on it at this time.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
(Somewhat) recently git-log learned about --reverse (to show commits
in the opposite order) and a looong time ago I think it learned
about --raw (to show the raw diff, rather than a unified diff).
These are both useful options, so we should make them easy for the
user to complete.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Recently the git-remote command grew an update subcommand, which
can be used to execute git-fetch across multiple repositories
in a single step. These can be configured with the 'remotes.*'
configuration options, so we can offer completion for any name that
matches and appears to be useful to git-remote update.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
1) Added a note about supporting the long options for most commands,
as we have been doing so for quite some time.
2) Include a notice that these routines are covered by the GPL,
as that may not be obvious, even though they are distributed
as part of the core Git distribution.
3) Added a short section on how to send patches to the routines,
and to whom they should get sent to. Currently that is me,
as I am the active maintainer.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This was just me being silly; I put the --not option into the
completion list twice. There's no duplicates shown in the shell
as the shell removes them before showing them to the user. But we
really don't need the duplicates in the source script either.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We've had completion for git-log for quite some time, but just
today I noticed we don't have it for the new builtin shortlog
that runs git-log internally. This is indeed a handy thing to
have completion for, especially when your branch names are of
the Very-Very-Long-and-Hard/To-Type/Variety/That-Some-Use.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The diff-* programs are meant to be plumbing for the diff frontend;
most end users aren't invoking these commands directly. Consequently
we should avoid showing them as possible completions.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When e8438420bb allowed us to reload
the marks table on subsequent runs of fast-import we really broke
things, as we set pack_id to MAX_PACK_ID for any objects we imported
into the marks table. Creating a branch from that mark should fail
as we attempt to read the object through a non-existant packed_git
pointer. Instead we have to use the normal Git object system to
locate the older commit, as we ourselves do not have a reference
to the packed_git it resides in.
This bug only occurred because t9300 was not complete enough.
When we added the --import-marks feature we didn't actually test
its implementation enough to verify the function worked as intended.
I have corrected that, and included the changes as part of this fix.
Prior versions of fast-import fail the new test(s); this commit
allows them to pass.
Credit for this bug find goes to Simon Hausmann <simon@lst.de> as
he recently identified a similiar bug in the tree lazy-loading path.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
To resolve a corner case uncovered by Simon Hausmann I need to
reuse the logic for the SHA-1 expression version of the 'from '
command within the mark version of the 'from ' command. This change
doesn't alter any functionality, but is merely breaking the common
code out to a function that I can reuse.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Commit a5c1780a03 sets the pack_id of existing
objects to MAX_PACK_ID. When the same object is referenced later again it is
found in the local object hash. With such a pack_id fast-import should not try
to locate that object in the newly created pack(s).
Signed-off-by: Simon Hausmann <simon@lst.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fix uninitialized last_object->no_free variable that is accessed in
store_object.
Signed-off-by: Simon Hausmann <simon@lst.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
git-archive already knows how to generate an archive as a tar or a zip
file, but gitweb did not. zip archvies are much more usable in a Windows
environment due to native support and this patch allows a site admin the
option to deliver zip rather than tar files. The selection is done by
inserting
$feature{'snapshot'}{'default'} = ['x-zip', 'zip', ''];
in gitweb_config.perl.
Tar files remain the default option.
Signed-off-by: Mark Levedahl <mdl123@verizon.net>
Acked-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This works in repositories that have their refs packed by
"git-pack-refs --all --prune" whereas testing the file
$git_dir/refs/heads/$opt_o does not.
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The parser was inconsistently done, in that it did not look at
the last command line parameter to see if it could be an unknown
option, although it was designed to notice unknown options if
they were given in positions the command expects to find them
(i.e. everything except the last parameter, which ought to be
<commit-ish>). This prevented a very natural invocation
$ git cherry-pick --usage
from issuing the usage help.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch documents the branch.autosetupmerge config option, added
by commit 0746d19a.
Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
While we can easily test the cvs <-> git-cvsserver
communication with :fork: and git-cvsserver server
there are some pserver specifics we should test, too.
Currently this are two tests of the pserver authentication.
Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add some cvs update tests that include various merge
situations. Also add a basic test for update -C
since it fits so well in there.
Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add a few test cases for the config file parsing
done by git-cvsserver.
Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>