On Wed, 8 Apr 2009, Björn Steinbrink wrote:
>
> The name of the processed object was duplicated for passing it to
> add_object(), but that already calls path_name, which allocates a new
> string anyway. So the memory allocated by the xstrdup calls just went
> nowhere, leaking memory.
Ack, ack.
There's another easy 5% or so for the built-in object walker: once we've
created the hash from the name, the name isn't interesting any more, and
so something trivial like this can help a bit.
Does it matter? Probably not on its own. But a few more memory saving
tricks and it might all make a difference.
Linus
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-1.6.0-keep-pack:
pack-objects: don't loosen objects available in alternate or kept packs
t7700: demonstrate repack flaw which may loosen objects unnecessarily
Remove --kept-pack-only option and associated infrastructure
pack-objects: only repack or loosen objects residing in "local" packs
git-repack.sh: don't use --kept-pack-only option to pack-objects
t7700-repack: add two new tests demonstrating repacking flaws
is_kept_pack(): final clean-up
Simplify is_kept_pack()
Consolidate ignore_packed logic more
has_sha1_kept_pack(): take "struct rev_info"
has_sha1_pack(): refactor "pretend these packs do not exist" interface
git-repack: resist stray environment variable
Conflicts:
t/t7700-repack.sh
On a 32-bit system, the maximum possible size for an object is less than
4GB, while 64-bit systems may cope with larger objects. Due to this
limitation, variables holding object sizes are using an unsigned long
type (32 bits on 32-bit systems, or 64 bits on 64-bit systems).
When large objects are encountered, and/or people play with large delta
depth values, it is possible for the maximum allowed delta size
computation to overflow, especially on a 32-bit system. When this
occurs, surviving result bits may represent a value much smaller than
what it is supposed to be, or even zero. This prevents some objects
from being deltified although they do get deltified when a smaller depth
limit is used. Fix this by always performing a 64-bit multiplication.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If pack-objects is called with the --unpack-unreachable option then it
will unpack (i.e. loosen) all unreferenced objects from local not-kept
packs, including those that also exist in packs residing in an alternate
object database or a locally kept pack. The only user of this option is
git-repack.
In this case, repack will follow the call to pack-objects with a call to
prune-packed, which will delete these newly loosened objects, making the
act of loosening a waste of time. The unnecessary loosening can be
avoided by checking whether an object exists in a non-local pack or a
locally kept pack before loosening it.
This fixes the 'local packed unreachable obs that exist in alternate ODB
are not loosened' test in t7700.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option to pack-objects/rev-list was created to improve the -A and -a
options of repack. It was found to be lacking in that it did not provide
the ability to differentiate between local and non-local kept packs, and
found to be unnecessary since objects residing in local kept packs can be
filtered out by the --honor-pack-keep option.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These two features were invented for use by repack when repack will delete
the local packs that have been made redundant. The packs accessible
through alternates are not deleted by repack, so the objects contained in
them are still accessible after the local packs are deleted. They do not
need to be repacked into the new pack or loosened. For the case of
loosening they would immediately be deleted by the subsequent prune-packed
that is called by repack anyway.
This fixes the test
'packed unreachable obs in alternate ODB are not loosened' in t7700.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now is_kept_pack() is just a member lookup into a structure, we can write
it as such.
Also rewrite the sole caller of has_sha1_kept_pack() to switch on the
criteria the callee uses (namely, revs->kept_pack_only) between calling
has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not
have to take a pointer to struct rev_info as an argument.
This removes the header file dependency issue temporarily introduced by
the earlier commit, so we revert changes associated to that as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This removes --unpacked=<packfile> parameter from the revision parser, and
rewrites its use in git-repack to pass a single --kept-pack-only option
instead.
The new --kept-pack-only option means just that. When this option is
given, is_kept_pack() that used to say "not on the --unpacked=<packfile>
list" now says "the packfile has corresponding .keep file".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This refactors three loops that check if a given packfile is on the
ignore_packed list into a function is_kept_pack(). The function returns
false for a pack on the list, and true for a pack not on the list, because
this list is solely used by "git repack" to pass list of packfiles that do
not have corresponding .keep files, i.e. a packfile not on the list is
"kept".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a repository created with git older than f49fb35 (git-init-db: create
"pack" subdirectory under objects, 2005-06-27), objects/pack/ directory is
not created upon initialization. It was Ok because subdirectories are
created as needed inside directories init-db creates, and back then,
packfiles were recent invention.
After the said commit, new codepaths started relying on the presense of
objects/pack/ directory in the repository. This was exacerbated with
8b4eb6b (Do not perform cross-directory renames when creating packs,
2008-09-22) that moved the location temporary pack files are created from
objects/ directory to objects/pack/ directory, because moving temporary to
the final location was done carefully with lazy leading directory creation.
Many packfile related operations in such an old repository can fail
mysteriously because of this.
This commit introduces two helper functions to make things work better.
- odb_mkstemp() is a specialized version of mkstemp() to refactor the
code and teach it to create leading directories as needed;
- odb_pack_keep() refactors the code to create a ".keep" file while
create leading directories as needed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
R. Tyler Ballance reported a mysterious transient repository corruption;
after much digging, it turns out that we were not catching and reporting
memory allocation errors from some calls we make to zlib.
This one _just_ wraps things; it doesn't do the "retry on low memory
error" part, at least not yet. It is an independent issue from the
reporting. Some of the errors are expected and passed back to the caller,
but we die when zlib reports it failed to allocate memory for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If there are few objects to deltify, they might be split amongst threads
so that there is simply no other objects left to delta against within
the same thread. Let's use the same 2*window treshold as used for the
final load balancing to allow extra threads to be created.
This fixes the benign t5300 test failure.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Tested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
... and display the actual number of threads used when locally
repacking. A remote server still won't tell you how many threads it
uses during a fetch though.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Especially on Windows where an opened file cannot be replaced, make
sure pack-objects always close packs it is about to replace. Even on
non Windows systems, this could save potential bad results if ever
objects were to be read from the new pack file using offset from the old
index.
This should fix t5303 on Windows.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Tested-by: Johannes Sixt <j6t@kdbg.org> (MinGW)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bc/maint-keep-pack:
repack: only unpack-unreachable if we are deleting redundant packs
t7700: test that 'repack -a' packs alternate packed objects
pack-objects: extend --local to mean ignore non-local loose objects too
sha1_file.c: split has_loose_object() into local and non-local counterparts
t7700: demonstrate mishandling of loose objects in an alternate ODB
builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
repack: do not fall back to incremental repacking with [-a|-A]
repack: don't repack local objects in packs with .keep file
pack-objects: new option --honor-pack-keep
packed_git: convert pack_local flag into a bitfield and add pack_keep
t7700: demonstrate mishandling of objects in packs with a .keep file
* np/pack-safer:
t5303: fix printf format string for portability
t5303: work around printf breakage in dash
pack-objects: don't leak pack window reference when splitting packs
extend test coverage for latest pack corruption resilience improvements
pack-objects: allow "fixing" a corrupted pack without a full repack
make find_pack_revindex() aware of the nasty world
make check_object() resilient to pack corruptions
make packed_object_info() resilient to pack corruptions
make unpack_object_header() non fatal
better validation on delta base object offsets
close another possibility for propagating pack corruption
* bc/maint-keep-pack:
t7700: test that 'repack -a' packs alternate packed objects
pack-objects: extend --local to mean ignore non-local loose objects too
sha1_file.c: split has_loose_object() into local and non-local counterparts
t7700: demonstrate mishandling of loose objects in an alternate ODB
builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
repack: do not fall back to incremental repacking with [-a|-A]
repack: don't repack local objects in packs with .keep file
pack-objects: new option --honor-pack-keep
packed_git: convert pack_local flag into a bitfield and add pack_keep
t7700: demonstrate mishandling of objects in packs with a .keep file
* maint:
Start 1.6.0.5 cycle
Fix pack.packSizeLimit and --max-pack-size handling
checkout: Fix "initial checkout" detection
Remove the period after the git-check-attr summary
Conflicts:
RelNotes
If the limit was sufficiently low, having a single object written
could bust the limit (by design), but caused the remaining allowed
size to go negative for subsequent objects, which for an unsigned
variable is a rather huge limit.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With this patch, --local means pack only local objects that are not already
packed.
Additionally, this fixes t7700 testing whether loose objects in an alternate
object database are repacked.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a new option to pack-objects which will cause it to ignore an
object which appears in a local pack which has a .keep file, even if it
was specified for packing.
This option will be used by the porcelain repack.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the pack data to be reused is found to be bad, let's fall back to
full object access through the generic path which has its own strategies
to find alternate object sources in that case. This allows for "fixing"
a corrupted pack simply by copying either another pack containing the
object(s) found to be bad, or the loose object itself, into the object
store and launch a repack without the need for -f.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It currently calls die() whenever given offset is not found thinking
that such thing should never happen. But this offset may come from a
corrupted pack whych _could_ happen and not be found. Callers should
deal with this possibility gracefully instead.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The check_object() function tries to get away with the least amount of
pack access possible when it already has partial information on given
object rather than calling the more costly packed_object_info().
When things don't look right, it should just give up and fall back to
packed_object_info() directly instead of die()'ing.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible to have pack corruption in the object header. Currently
unpack_object_header() simply die() on them instead of letting the caller
deal with that gracefully.
So let's have unpack_object_header() return an error instead, and find
a better name for unpack_object_header_gently() in that context. All
callers of unpack_object_header() are ready for it.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In one case, it was possible to have a bad offset equal to 0 effectively
pointing a delta onto itself and crashing git after too many recursions.
In the other cases, a negative offset could result due to off_t being
signed. Catch those.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Abstract
--------
With index v2 we have a per object CRC to allow quick and safe reuse of
pack data when repacking. This, however, doesn't currently prevent a
stealth corruption from being propagated into a new pack when _not_
reusing pack data as demonstrated by the modification to t5302 included
here.
The Context
-----------
The Git database is all checksummed with SHA1 hashes. Any kind of
corruption can be confirmed by verifying this per object hash against
corresponding data. However this can be costly to perform systematically
and therefore this check is often not performed at run time when
accessing the object database.
First, the loose object format is entirely compressed with zlib which
already provide a CRC verification of its own when inflating data. Any
disk corruption would be caught already in this case.
Then, packed objects are also compressed with zlib but only for their
actual payload. The object headers and delta base references are not
deflated for obvious performance reasons, however this leave them
vulnerable to potentially undetected disk corruptions. Object types
are often validated against the expected type when they're requested,
and deflated size must always match the size recorded in the object header,
so those cases are pretty much covered as well.
Where corruptions could go unnoticed is in the delta base reference.
Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to
get corrupted so it actually matches the SHA1 of another object with the
same size (the delta header stores the expected size of the base object
to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the
reference is a pack offset which would have to match the start boundary
of a different base object but still with the same size, and although this
is relatively much more "probable" than in the OBJ_REF_DELTA case, the
probability is also about zero in absolute terms. Still, the possibility
exists as demonstrated in t5302 and is certainly greater than a SHA1
collision, especially in the OBJ_OFS_DELTA case which is now the default
when repacking.
Again, repacking by reusing existing pack data is OK since the per object
CRC provided by index v2 guards against any such corruptions. What t5302
failed to test is a full repack in such case.
The Solution
------------
As unlikely as this kind of stealth corruption can be in practice, it
certainly isn't acceptable to propagate it into a freshly created pack.
But, because this is so unlikely, we don't want to pay the run time cost
associated with extra validation checks all the time either. Furthermore,
consequences of such corruption in anything but repacking should be rather
visible, and even if it could be quite unpleasant, it still has far less
severe consequences than actively creating bad packs.
So the best compromize is to check packed object CRC when unpacking
objects, and only during the compression/writing phase of a repack, and
only when not streaming the result. The cost of this is minimal (less
than 1% CPU time), and visible only with a full repack.
Someone with a stats background could provide an objective evaluation of
this, but I suspect that it's bad RAM that has more potential for data
corruptions at this point, even in those cases where this extra check
is not performed. Still, it is best to prevent a known hole for
corruption when recreating object data into a new pack.
What about the streamed pack case? Well, any client receiving a pack
must always consider that pack as untrusty and perform full validation
anyway, hence no such stealth corruption could be propagated to remote
repositoryes already. It is therefore worthless doing local validation
in that case.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Start 1.6.0.4 cycle
add instructions on how to send patches to the mailing list with Gmail
Documentation/gitattributes: Add subsection header for each attribute
git send-email: avoid leaking directory file descriptors.
send-pack: do not send out single-level refs such as refs/stash
fix overlapping memcpy in normalize_absolute_path
pack-objects: avoid reading uninitalized data
correct cache_entry allocation
Conflicts:
RelNotes
In the main loop of find_deltas, we do:
struct object_entry *entry = *list++;
...
if (!*list_size)
...
break
Because we look at and increment *list _before_ the check of
list_size, in the very last iteration of the loop we will
look at uninitialized data, and increment the pointer beyond
one past the end of the allocated space. Since we don't
actually do anything with the data until after the check,
this is not a problem in practice.
But since it technically violates the C standard, and
because it provokes a spurious valgrind warning, let's just
move the initialization of entry to a safe place.
This fixes valgrind errors in t5300, t5301, t5302, t303, and
t9400.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many call sites immediately initialize allocated memory with zero after
calling xmalloc. A single call to xcalloc can replace this two-call
sequence.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
A comment on top of create_tmpfile() describes caveats ('can have
problems on various systems (FAT, NFS, Coda)') that should apply
in this situation as well. This in the end did not end up solving
any of my personal problems, but it might be a useful cleanup patch
nevertheless.
Signed-off-by: Petr Baudis <pasky@suse.cz>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
User notifications are presented as 'git cmd', and code comments
are presented as '"cmd"' or 'git's cmd', rather than 'git-cmd'.
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* np/maint-safer-pack:
fixup_pack_header_footer(): use nicely aligned buffer sizes
index-pack: use fixup_pack_header_footer()'s validation mode
pack-objects: use fixup_pack_header_footer()'s validation mode
improve reliability of fixup_pack_header_footer()
pack-objects: improve returned information from write_one()
This improves commit 6d6f9cddbe a bit by simply not including missing
bases in the list of objects to process at all.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* np/maint-safer-pack:
fixup_pack_header_footer(): use nicely aligned buffer sizes
index-pack: use fixup_pack_header_footer()'s validation mode
pack-objects: use fixup_pack_header_footer()'s validation mode
improve reliability of fixup_pack_header_footer()
pack-objects: improve returned information from write_one()
When limiting the pack size, a new header has to be written to the
pack and a new SHA1 computed. Make sure that the SHA1 of what is being
read back matches the SHA1 of what was written.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, this function has the potential to read corrupted pack data
from disk and give it a valid SHA1 checksum. Let's add the ability to
validate SHA1 checksum of existing data along the way, including before
and after any arbitrary point in the pack.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>