1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-11-15 05:33:04 +01:00
Commit graph

29 commits

Author SHA1 Message Date
Jeff King
d3038d22f9 prune: keep objects reachable from recent objects
Our current strategy with prune is that an object falls into
one of three categories:

  1. Reachable (from ref tips, reflogs, index, etc).

  2. Not reachable, but recent (based on the --expire time).

  3. Not reachable and not recent.

We keep objects from (1) and (2), but prune objects in (3).
The point of (2) is that these objects may be part of an
in-progress operation that has not yet updated any refs.

However, it is not always the case that objects for an
in-progress operation will have a recent mtime. For example,
the object database may have an old copy of a blob (from an
abandoned operation, a branch that was deleted, etc). If we
create a new tree that points to it, a simultaneous prune
will leave our tree, but delete the blob. Referencing that
tree with a commit will then work (we check that the tree is
in the object database, but not that all of its referred
objects are), as will mentioning the commit in a ref. But
the resulting repo is corrupt; we are missing the blob
reachable from a ref.

One way to solve this is to be more thorough when
referencing a sha1: make sure that not only do we have that
sha1, but that we have objects it refers to, and so forth
recursively. The problem is that this is very expensive.
Creating a parent link would require traversing the entire
object graph!

Instead, this patch pushes the extra work onto prune, which
runs less frequently (and has to look at the whole object
graph anyway). It creates a new category of objects: objects
which are not recent, but which are reachable from a recent
object. We do not prune these objects, just like the
reachable and recent ones.

This lets us avoid the recursive check above, because if we
have an object, even if it is unreachable, we should have
its referent. We can make a simple inductive argument that
with this patch, this property holds (that there are no
objects with missing referents in the repository):

  0. When we have no objects, we have nothing to refer or be
     referred to, so the property holds.

  1. If we add objects to the repository, their direct
     referents must generally exist (e.g., if you create a
     tree, the blobs it references must exist; if you create
     a commit to point at the tree, the tree must exist).
     This is already the case before this patch. And it is
     not 100% foolproof (you can make bogus objects using
     `git hash-object`, for example), but it should be the
     case for normal usage.

     Therefore for any sequence of object additions, the
     property will continue to hold.

  2. If we remove objects from the repository, then we will
     not remove a child object (like a blob) if an object
     that refers to it is being kept. That is the part
     implemented by this patch.

     Note, however, that our reachability check and the
     actual pruning are not atomic. So it _is_ still
     possible to violate the property (e.g., an object
     becomes referenced just as we are deleting it). This
     patch is shooting for eliminating problems where the
     mtimes of dependent objects differ by hours or days,
     and one is dropped without the other. It does nothing
     to help with short races.

Naively, the simplest way to implement this would be to add
all recent objects as tips to the reachability traversal.
However, this does not perform well. In a recently-packed
repository, all reachable objects will also be recent, and
therefore we have to look at each object twice. This patch
instead performs the reachability traversal, then follows up
with a second traversal for recent objects, skipping any
that have already been marked.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:42 -07:00
Jeff King
27e1e22d5e prune: factor out loose-object directory traversal
Prune has to walk $GIT_DIR/objects/?? in order to find the
set of loose objects to prune. Other parts of the code
(e.g., count-objects) want to do the same. Let's factor it
out into a reusable for_each-style function.

Note that this is not quite a straight code movement. The
original code had strange behavior when it found a file of
the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex
digits. It executed a "break" from the loop, meaning that we
stopped pruning in that directory (but still pruned other
directories!). This was probably a bug; we do not want to
process the file as an object, but we should keep going
otherwise (and that is how the new code handles it).

We are also a little more careful with loose object
directories which fail to open. The original code silently
ignored any failures, but the new code will complain about
any problems besides ENOENT.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:39 -07:00
Junio C Hamano
3e30cb0fbf Merge branch 'mh/replace-refs-variable-rename'
* mh/replace-refs-variable-rename:
  Document some functions defined in object.c
  Add docstrings for lookup_replace_object() and do_lookup_replace_object()
  rename read_replace_refs to check_replace_refs
2014-03-14 14:27:06 -07:00
Nguyễn Thái Ngọc Duy
754dbc43f0 i18n: mark all progress lines for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 09:08:37 -08:00
Michael Haggerty
afc711b8e1 rename read_replace_refs to check_replace_refs
The semantics of this flag was changed in commit

    e1111cef23 inline lookup_replace_object() calls

but wasn't renamed at the time to minimize code churn.  Rename it now,
and add a comment explaining its use.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-20 14:16:55 -08:00
Junio C Hamano
92251b1b5b Merge branch 'nd/shallow-clone'
Fetching from a shallow-cloned repository used to be forbidden,
primarily because the codepaths involved were not carefully vetted
and we did not bother supporting such usage. This attempts to allow
object transfer out of a shallow-cloned repository in a controlled
way (i.e. the receiver become a shallow repository with truncated
history).

* nd/shallow-clone: (31 commits)
  t5537: fix incorrect expectation in test case 10
  shallow: remove unused code
  send-pack.c: mark a file-local function static
  git-clone.txt: remove shallow clone limitations
  prune: clean .git/shallow after pruning objects
  clone: use git protocol for cloning shallow repo locally
  send-pack: support pushing from a shallow clone via http
  receive-pack: support pushing to a shallow clone via http
  smart-http: support shallow fetch/clone
  remote-curl: pass ref SHA-1 to fetch-pack as well
  send-pack: support pushing to a shallow clone
  receive-pack: allow pushes that update .git/shallow
  connected.c: add new variant that runs with --shallow-file
  add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
  receive/send-pack: support pushing from a shallow clone
  receive-pack: reorder some code in unpack()
  fetch: add --update-shallow to accept refs that update .git/shallow
  upload-pack: make sure deepening preserves shallow roots
  fetch: support fetching from a shallow repository
  clone: support remote shallow repository
  ...
2014-01-17 12:21:20 -08:00
Junio C Hamano
061614b309 Merge branch 'mh/path-max'
A few places where we relied on a fixed length buffer to hold
pathnames in these two programs have been converted to use strbuf.

* mh/path-max:
  builtin/prune.c: use strbuf to avoid having to worry about PATH_MAX
  prune-packed: use strbuf to avoid having to worry about PATH_MAX
2014-01-10 10:32:21 -08:00
Jeff King
4454e9cb59 builtin/prune.c: use strbuf to avoid having to worry about PATH_MAX
While at it, rename prune_tmp_object(), which used to be a helper to
remove temporary files that were created to become loose object
files, to prune_tmp_file(), as the function is also used to remove
any random cruft whose name begins with tmp_ directly in .git/object
or .git/object/pack directories these days.

Noticed-by:  Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-18 15:53:56 -08:00
Nguyễn Thái Ngọc Duy
eab3296c7e prune: clean .git/shallow after pruning objects
This patch teaches "prune" to remove shallow roots that are no longer
reachable from any refs (e.g. when the relevant refs are removed).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:19 -08:00
Christian Couder
5955654823 replace {pre,suf}fixcmp() with {starts,ends}_with()
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:13:21 -08:00
Junio C Hamano
9d54f97e34 Merge branch 'nd/prune-packed-dryrun-verbose'
* nd/prune-packed-dryrun-verbose:
  prune-packed: avoid implying "1" is DRY_RUN in prune_packed_objects()
2013-06-06 12:17:52 -07:00
Nguyễn Thái Ngọc Duy
af0b4a3b59 prune-packed: avoid implying "1" is DRY_RUN in prune_packed_objects()
Commit b60daf0 (Make git-prune-packed a bit more chatty. - 2007-01-12)
changes the meaning of prune_packed_objects()'s argument, from "dry
run or not dry run" to a bitmap.

It however forgot to update prune_packed_objects() caller in
builtin/prune.c to use new DRY_RUN macro. It's fine (for a long time!)
but there is a risk that someday someone may change the value of
DRY_RUN to something else and builtin/prune.c suddenly breaks. Avoid
that possibility.

While at there, change "opts == VERBOSE" to "opts & VERBOSE" as there
is no obvious reason why we only be chatty when DRY_RUN is not set.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28 09:20:54 -07:00
Junio C Hamano
27ec394a97 prune: introduce OPT_EXPIRY_DATE() and use it
Earlier we added support for --expire=all (or --expire=now) that
considers all crufts, regardless of their age, as eligible for
garbage collection by turning command argument parsers that use
approxidate() to use parse_expiry_date(), but "git prune" used a
built-in parse-options facility OPT_DATE() and did not benefit from
the new function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-25 11:42:10 -07:00
Junio C Hamano
f1ad05f3a5 Merge branch 'jk/fully-peeled-packed-ref' into maint-1.8.1
* jk/fully-peeled-packed-ref:
  pack-refs: add fully-peeled trait
  pack-refs: write peeled entry for non-tags
  use parse_object_or_die instead of die("bad object")
  avoid segfaults on parse_object failure
2013-04-03 08:43:03 -07:00
Junio C Hamano
870987dec7 Merge branch 'jk/fully-peeled-packed-ref'
Not that we do not actively encourage having annotated tags outside
refs/tags/ hierarchy, but they were not advertised correctly to the
ls-remote and fetch with recent version of Git.

* jk/fully-peeled-packed-ref:
  pack-refs: add fully-peeled trait
  pack-refs: write peeled entry for non-tags
  use parse_object_or_die instead of die("bad object")
  avoid segfaults on parse_object failure
2013-03-25 14:01:07 -07:00
Jeff King
f7892d1817 use parse_object_or_die instead of die("bad object")
Some call-sites do:

  o = parse_object(sha1);
  if (!o)
	  die("bad object %s", some_name);

We can now handle that as a one-liner, and get more
consistent output.

In the third case of this patch, it looks like we are losing
information, as the existing message also outputs the sha1
hex; however, parse_object will already have written a more
specific complaint about the sha1, so there is no point in
repeating it here.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-17 12:52:14 -07:00
Junio C Hamano
8c11b25de4 Merge branch 'rj/path-cleanup'
* rj/path-cleanup:
  Call mkpathdup() rather than xstrdup(mkpath(...))
  Call git_pathdup() rather than xstrdup(git_path("..."))
  path.c: Use vsnpath() in the implementation of git_path()
  path.c: Don't discard the return value of vsnpath()
  path.c: Remove the 'git_' prefix from a file scope function
2012-09-14 11:53:53 -07:00
Junio C Hamano
096bbd6537 Merge branch 'nd/i18n-parseopt-help'
A lot of i18n mark-up for the help text from "git <cmd> -h".

* nd/i18n-parseopt-help: (66 commits)
  Use imperative form in help usage to describe an action
  Reduce translations by using same terminologies
  i18n: write-tree: mark parseopt strings for translation
  i18n: verify-tag: mark parseopt strings for translation
  i18n: verify-pack: mark parseopt strings for translation
  i18n: update-server-info: mark parseopt strings for translation
  i18n: update-ref: mark parseopt strings for translation
  i18n: update-index: mark parseopt strings for translation
  i18n: tag: mark parseopt strings for translation
  i18n: symbolic-ref: mark parseopt strings for translation
  i18n: show-ref: mark parseopt strings for translation
  i18n: show-branch: mark parseopt strings for translation
  i18n: shortlog: mark parseopt strings for translation
  i18n: rm: mark parseopt strings for translation
  i18n: revert, cherry-pick: mark parseopt strings for translation
  i18n: rev-parse: mark parseopt strings for translation
  i18n: reset: mark parseopt strings for translation
  i18n: rerere: mark parseopt strings for translation
  i18n: status: mark parseopt strings for translation
  i18n: replace: mark parseopt strings for translation
  ...
2012-09-07 11:09:09 -07:00
Ramsay Jones
4e2d094dde Call mkpathdup() rather than xstrdup(mkpath(...))
In addition to updating the xstrdup(mkpath(...)) call sites with
mkpathdup(), we also fix a memory leak (in merge_3way()) caused by
neglecting to free the memory allocated to the 'base_name' variable.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-04 13:34:46 -07:00
Nguyễn Thái Ngọc Duy
8f5b7281b4 i18n: prune: mark parseopt strings for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-20 12:23:19 -07:00
Brandon Casey
90b29cb7a8 prune.c: only print informational message in show_only or verbose mode
"git prune" reports removal of loose object files that are no longer
necessary only under the "-v" option, but unconditionally reports
removal of temporary files that are no longer needed.

The original thinking was that the presence of a leftover temporary
file should be an unusual occurrence that may indicate an earlier
failure of some sort, and the user may want to be reminded of it.
Removing an unnecessary loose object file, on the other hand, is
just part of the normal operation.  That is why the former is always
printed out and the latter only when -v is used.

But neither report is particularly useful.  Hide both of these
behind the "-v" option for consistency.

Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-07 15:01:37 -07:00
Karsten Blees
d34e70d6b8 fix deletion of .git/objects sub-directories in git-prune/repack
Both git-prune and git-repack (and thus, git-gc) try to rmdir while
holding a DIR* handle on the directory.  This can leave dangling
empty directories in the .git/objects on platforms where directory
cannot be removed while they are open.

First call closedir() and then rmdir(); that is more logical ordering.

Reported-by: John Chen <john0312@gmail.com>
Reported-by: Stefan Naewe <stefan.naewe@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Improved-and-Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-07 10:24:33 -08:00
Jeff King
bf0a59b387 prune: handle --progress/no-progress
And have "git gc" pass no-progress when quiet.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-07 22:12:19 -08:00
Nguyễn Thái Ngọc Duy
dc347195cc prune: show progress while marking reachable objects
prune already shows progress meter while pruning. The marking part may
take a few seconds or more, depending on repository size. Show
progress meter during this time too.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-07 22:12:19 -08:00
René Scharfe
e21adb8c10 add description parameter to OPT__DRY_RUN
Allows better help text to be defined than "dry run".  Also make use
of the macro in places that already had a different description.  No
object code changes intended.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-15 09:57:37 -08:00
René Scharfe
fd03881a48 add description parameter to OPT__VERBOSE
Allows better help text to be defined than "be verbose".  Also make use
of the macro in places that already had a different description.  No
object code changes intended.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-15 09:56:51 -08:00
René Scharfe
24aea03313 prune: allow --dry-run for -n and --verbose for -v
For consistency with other git commands, let git prune accept the long
options --dry-run and --verbose for the respective short ones -n and -v.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-09 10:13:18 -07:00
Junio C Hamano
2e0e8b68e3 Merge branch 'lt/deepen-builtin-source'
* lt/deepen-builtin-source:
  Move 'builtin-*' into a 'builtin/' subdirectory

Conflicts:
	Makefile
2010-03-10 15:25:18 -08:00
Linus Torvalds
81b50f3ce4 Move 'builtin-*' into a 'builtin/' subdirectory
This shrinks the top-level directory a bit, and makes it much more
pleasant to use auto-completion on the thing. Instead of

	[torvalds@nehalem git]$ em buil<tab>
	Display all 180 possibilities? (y or n)
	[torvalds@nehalem git]$ em builtin-sh
	builtin-shortlog.c     builtin-show-branch.c  builtin-show-ref.c
	builtin-shortlog.o     builtin-show-branch.o  builtin-show-ref.o
	[torvalds@nehalem git]$ em builtin-shor<tab>
	builtin-shortlog.c  builtin-shortlog.o
	[torvalds@nehalem git]$ em builtin-shortlog.c

you get

	[torvalds@nehalem git]$ em buil<tab>		[type]
	builtin/   builtin.h
	[torvalds@nehalem git]$ em builtin		[auto-completes to]
	[torvalds@nehalem git]$ em builtin/sh<tab>	[type]
	shortlog.c     shortlog.o     show-branch.c  show-branch.o  show-ref.c     show-ref.o
	[torvalds@nehalem git]$ em builtin/sho		[auto-completes to]
	[torvalds@nehalem git]$ em builtin/shor<tab>	[type]
	shortlog.c  shortlog.o
	[torvalds@nehalem git]$ em builtin/shortlog.c

which doesn't seem all that different, but not having that annoying
break in "Display all 180 possibilities?" is quite a relief.

NOTE! If you do this in a clean tree (no object files etc), or using an
editor that has auto-completion rules that ignores '*.o' files, you
won't see that annoying 'Display all 180 possibilities?' message - it
will just show the choices instead.  I think bash has some cut-off
around 100 choices or something.

So the reason I see this is that I'm using an odd editory, and thus
don't have the rules to cut down on auto-completion.  But you can
simulate that by using 'ls' instead, or something similar.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-22 14:29:41 -08:00
Renamed from builtin-prune.c (Browse further)