When we pack everything into one big pack with "git repack
-Ad", any unreferenced objects in to-be-deleted packs are
exploded into loose objects, with the intent that they will
be examined and possibly cleaned up by the next run of "git
prune".
Since the exploded objects will receive the mtime of the
pack from which they come, if the source pack is old, those
loose objects will end up pruned immediately. In that case,
it is much more efficient to skip the exploding step
entirely for these objects.
This patch teaches pack-objects to receive the expiration
information and avoid writing these objects out. It also
teaches "git gc" to pass the value of gc.pruneexpire to
repack (which in turn learns to pass it along to
pack-objects) so that this optimization happens
automatically during "git gc" and "git gc --auto".
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-pack-objects is already careful to start out its temporary packs
under .git/objects/pack/ (cf. 8b4eb6b, Do not perform cross-directory
renames when creating packs, 2008-09-22), but git-repack did not
respond in kind so the effort was lost when the filesystem boundary is
exactly at that directory.
Let git-repack pass a path under .git/objects/pack/ as the base for
its temporary packs.
This means we might need the $PACKDIR sooner (before the pack-objects
invocation), so move the mkdir up just to be safe.
Also note that the only use of *.pack is in the find invocation way
before the pack-objects call, so the temporary packs will not suddenly
show up in any wildcards because of the directory change.
Reported-by: Marat Radchenko <marat@slonopotamus.org>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 479b56ba ('make "repack -f" imply "pack-objects --no-reuse-object"'),
git repack -f was changed to include recompressing all objects on the
zlib level on the assumption that if the user wants to spend that much
time already, some more time won't hurt (and recompressing is useful if
the user changed the zlib compression level).
However, "some more time" can be quite long with very big repositories,
so some users are going to appreciate being able to choose. If we are
going to give them the choice, --no-reuse-object will probably be
interesting a lot less frequently than --no-reuse-delta. Hence, this
reverts -f to the old behaviour (--no-reuse-delta) and adds a new -F
option that replaces the current -f.
Measurements taken using this patch on a current clone of git.git
indicate a 17% decrease in time being made available to users:
git repack -Adf 34.84s user 0.56s system 145% cpu 24.388 total
git repack -AdF 38.79s user 0.56s system 133% cpu 29.394 total
Signed-off-by: Jan Krüger <jk@jk.gs>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* js/maint-graft-unhide-true-parents:
git repack: keep commits hidden by a graft
Add a test showing that 'git repack' throws away grafted-away parents
Conflicts:
git-repack.sh
When you have grafts that pretend that a given commit has different
parents than the ones recorded in the commit object, it is dangerous
to let 'git repack' remove those hidden parents, as you can easily
remove the graft and end up with a broken repository.
So let's play it safe and keep those parent objects and everything
that is reachable by them, in addition to the grafted parents.
As this behavior can only be triggered by git pack-objects, and as that
command handles duplicate parents gracefully, we do not bother to cull
duplicated parents that may result by using both true and grafted
parents.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that there is say() in git-sh-setup, these scripts don't need to use
their own. Migrate them over by setting GIT_QUIET and removing their
custom say() functions.
Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-1.6.0-keep-pack:
pack-objects: don't loosen objects available in alternate or kept packs
t7700: demonstrate repack flaw which may loosen objects unnecessarily
Remove --kept-pack-only option and associated infrastructure
pack-objects: only repack or loosen objects residing in "local" packs
git-repack.sh: don't use --kept-pack-only option to pack-objects
t7700-repack: add two new tests demonstrating repacking flaws
Conflicts:
t/t7700-repack.sh
The --kept-pack-only option to pack-objects treats all kept packs as equal.
This results in objects that reside in an alternate pack that has a .keep
file, not being packed into a newly created pack when the user specifies the
-a option to repack. Since the user may not have any control over the
alternate database, git should not refrain from repacking those objects
even though they are in a pack with a .keep file.
This fixes the 'packed obs in alternate ODB kept pack are repacked' test in
t7700.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-1.6.0-keep-pack:
is_kept_pack(): final clean-up
Simplify is_kept_pack()
Consolidate ignore_packed logic more
has_sha1_kept_pack(): take "struct rev_info"
has_sha1_pack(): refactor "pretend these packs do not exist" interface
git-repack: resist stray environment variable
This removes --unpacked=<packfile> parameter from the revision parser, and
rewrites its use in git-repack to pass a single --kept-pack-only option
instead.
The new --kept-pack-only option means just that. When this option is
given, is_kept_pack() that used to say "not on the --unpacked=<packfile>
list" now says "the packfile has corresponding .keep file".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The script used $args and $existing without initializing it to empty. It
would have been confused by an environment variable the end user had
before running it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some platforms refuse to rename a file that is open. When repacking an
already packed repository without adding any new object, the resulting
pack will contain the same set of objects as an existing pack, and on such
platforms, a newly created packfile cannot replace the existing one.
The logic detected this issue but did not try hard enough to recover from
it. Especially because the files that needs renaming come in pairs, there
potentially are different failure modes that one can be renamed but the
others cannot. Asking manual recovery to end users were error prone.
This patch tries to make it more robust by first making sure all the
existing files that need to be renamed have been renamed before
continuing, and attempts to roll back if some failed to rename.
This is based on an initial patch by Robin Rosenberg.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The -A option calls pack-objects with the --unpack-unreachable option so
that the unreachable objects in local packs are left in the local object
store loose. But if the -d option to repack was _not_ used, then these
unpacked loose objects are redundant and unnecessary.
Update tests in t7701.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When repack is called with either the -a or -A option, the user has
requested to repack all objects including those referenced by the
alternates mechanism. Currently, if there are no local packs without
.keep files, then repack will call pack-objects with the
'--unpacked --incremental' options which causes it to exclude alternate
packed objects. So, remove this fallback.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the user created a .keep file for a local pack, then it can be inferred
that the user does not want those objects repacked.
This fixes the repack bug tested by t7700.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you misuse a git command, you are shown the usage string.
But this is currently shown in the dashed form. So if you just
copy what you see, it will not work, when the dashed form
is no longer supported.
This patch makes git commands show the dash-less version.
For shell scripts that do not specify OPTIONS_SPEC, git-sh-setup.sh
generates a dash-less usage string now.
Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As announced for 1.6.0.
Access over the native protocol by old git versions is unaffected as
this capability is negociated by the protocol. Otherwise setting this
config option to "false" and doing a 'git repack -a -d' is enough to
remain compatible with ancient git versions (older than 1.4.4).
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the pack-files are now always created stably on disk, there is no
need to sync() before pruning lose objects or old stale pack-files.
[jc: with Nico's clean-up]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bc/repack:
Documentation/git-repack.txt: document new -A behaviour
let pack-objects do the writing of unreachable objects as loose objects
add a force_object_loose() function
builtin-gc.c: deprecate --prune, it now really has no effect
git-gc: always use -A when manually repacking
repack: modify behavior of -A option to leave unreferenced objects unpacked
Conflicts:
builtin-pack-objects.c
Commit ccc1297226 changed the behavior
of 'git repack -A' so unreachable objects are stored as loose objects.
However it did so in a naive and inn efficient way by making packs
about to be deleted inaccessible and feeding their content through
'git unpack-objects'. While this works, there are major flaws with
this approach:
- It is unacceptably sloooooooooooooow.
In the Linux kernel repository with no actual unreachable objects,
doing 'git repack -A -d' before:
real 2m33.220s
user 2m21.675s
sys 0m3.510s
And with this change:
real 0m36.849s
user 0m24.365s
sys 0m1.950s
For reference, here's the timing for 'git repack -a -d':
real 0m35.816s
user 0m22.571s
sys 0m2.011s
This is explained by the fact that 'git unpack-objects' was used to
unpack _every_ objects even if (almost) 100% of them were thrown away.
- There is a black out period.
Between the removal of the .idx file for the redundant pack and the
completion of its unpacking, the unreachable objects become completely
unaccessible. This is not a big issue as we're talking about unreachable
objects, but some consistency is always good.
- There is no way to easily set a sensible mtime for the newly created
unreachable loose objects.
So, while having a command called "pack-objects" to perform object
unpacking looks really odd, this is probably the best compromize to be
able to solve the above issues in an efficient way.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous behavior of the -A option was to retain any previously
packed objects which had become unreferenced, and place them into the newly
created pack file. Since git-gc, when run automatically with the --auto
option, calls repack with the -A option, this had the effect of retaining
unreferenced packed objects indefinitely. To avoid this scenario, the
user was required to run git-gc with the little known --prune option or
to manually run repack with the -a option.
This patch changes the behavior of the -A option so that unreferenced
objects that exist in any pack file being replaced, will be unpacked into
the repository. The unreferenced loose objects can then be garbage collected
by git-gc (i.e. git-prune) based on the gc.pruneExpire setting.
Also add new tests for checking whether unreferenced objects which were
previously packed are properly left in the repository unpacked after
repacking.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit 5715d0b (Migrate git-repack.sh to use git-rev-parse --parseopt,
2007-11-04), parsing of the '-n' command line option was accidentally lost
when git-repack.sh was migrated to use git-rev-parse --parseopt. This adds
it back.
Signed-off-by: A Large Angry SCM <gitzilla@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Discussion on the list tonight came to the conclusion that showing
the name of the packfile we just created during git-repack is not
a very useful message for any end-user. For the really technical
folk who need to have the name of the newest packfile they can use
something such as `ls -t .git/objects/pack | head -2` to find the
most recently created packfile.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* jc/autogc:
git-gc --auto: run "repack -A -d -l" as necessary.
git-gc --auto: restructure the way "repack" command line is built.
git-gc --auto: protect ourselves from accumulated cruft
git-gc --auto: add documentation.
git-gc --auto: move threshold check to need_to_gc() function.
repack -A -d: use --keep-unreachable when repacking
pack-objects --keep-unreachable
Export matches_pack_name() and fix its return value
Invoke "git gc --auto" from commit, merge, am and rebase.
Implement git gc --auto
A lot of shell scripts contained stuff starting with
while case "$#" in 0) break ;; esac
and similar. I consider breaking out of the condition instead of the
body od the loop ugly, and the implied "true" value of the
non-matching case is not really obvious to humans at first glance. It
happens not to be obvious to some BSD shells, either, but that's
because they are not POSIX-compliant. In most cases, this has been
replaced by a straight condition using "test". "case" has the
advantage of being faster than "test" on vintage shells where "test"
is not a builtin. Since none of them is likely to run the git
scripts, anyway, the added readability should be worth the change.
A few loops have had their termination condition expressed
differently.
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packfile portion of the "remove redundant" code
near the bottom of git-repack.sh is broken when
pack splitting occurs. Particularly since this is
the only place where we automatically delete packfiles,
make sure it works properly for all cases, old or new.
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add --max-pack-size parsing and usage messages.
Upgrade git-repack.sh to handle multiple packfile names,
and build packfiles in GIT_OBJECT_DIRECTORY not GIT_DIR.
Update documentation.
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Recomputing delta is much more expensive than recompressing
anyway, and when the user says 'repack -f', it is a sign that
the user is willing to spend CPU cycles.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Steven Grimm noticed that git-repack's verbosity is inconsistent
because pack-objects is chatty and prune-packed is not. This
makes the latter a bit more chatty and gives -q option to
squelch it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds a new option --reflog to pack-objects and revision
machinery; do not bother documenting it for now, since this is
only useful for local repacking.
When the option is passed, objects reachable from reflog entries
are marked as interesting while computing the set of objects to
pack.
Signed-off-by: Junio C Hamano <junkio@cox.net>
During `git repack -a -d` only repack objects which are loose or
which reside in an active (a non-kept) pack. This allows the user
to keep large packs as-is without continuous repacking and can be
very helpful on large repositories. It should also help us resolve
a race condition between `git repack -a -d` and the new pack store
functionality in `git-receive-pack`.
Kept packs are those which have a corresponding .keep file in
$GIT_OBJECT_DIRECTORY/pack. That is pack-X.pack will be kept
(not repacked and not deleted) if pack-X.keep exists in the same
directory when `git repack -a -d` starts.
Currently this feature is not documented and there is no user
interface to keep an existing pack.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* np/pack:
add the capability for index-pack to read from a stream
index-pack: compare only the first 20-bytes of the key.
git-repack: repo.usedeltabaseoffset
pack-objects: document --delta-base-offset option
allow delta data reuse even if base object is a preferred base
zap a debug remnant
let the GIT native protocol use offsets to delta base when possible
make pack data reuse compatible with both delta types
make git-pack-objects able to create deltas with offset to base
teach git-index-pack about deltas with offset to base
teach git-unpack-objects about deltas with offset to base
introduce delta objects with offset to base
When configuration variable `repack.UseDeltaBaseOffset` is set
for the repository, the command passes `--delta-base-offset`
option to `git-pack-objects`; this typically results in slightly
smaller packs, but the generated packs are incompatible with
versions of git older than (and including) v1.4.3.
We will make it default to true sometime in the future, but not
for a while.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now that we explicitly create all tmpfiles below $GIT_DIR, there's no reason
to care about which directory we're in.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Avoid failing when cwd is !writable by writing the
packfiles in $GIT_DIR, which is more in line with other commands.
Without this, git-repack was failing when run from crontab
by non-root user accounts. For large repositories, this
also makes the mv operation a lot cheaper, and avoids leaving
temp packfiles around the fs upon failure.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This fixes the following warning:
git-repack: line 42: cd: .git/objects/pack: No such file or directory
This happens only, when git-repack -a is run without any packs in the
repository.
Signed-off-by: Matthias Kestenholz <matthias@spinlock.ch>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We are trying to catch error condition of git-rev-list and cause
the downstream pack-objects to barf, but if you run rev-list
with anything that mucks with its stderr (such as GIT_TRACE),
any stderr output would cause the pipeline to fail.
[jc: originally from Matthias Lederhofer, with a reworded error message.]
Signed-off-by: Junio C Hamano <junkio@cox.net>
After a clone, packfiles are read-only by default and "mv" to
replace the pack with a new one goes interactive, asking if the
user wants to replace it. If one is successfully moved and the
other is not, the pack and its idx would become out-of-sync and
corrupts the repository.
Recovering is straightforward -- it is just the matter of
finding the remaining .tmp-pack-* and make sure they are both
moved -- but we should be extra careful not to do something so
alarming to the users.
Signed-off-by: Junio C Hamano <junkio@cox.net>