Code to avoid adding the same alternate object store twice was
subtly broken for a long time, but nobody seems to have noticed.
* rs/fix-alt-odb-path-comparison:
sha1_file: avoid overrunning alternate object base string
While checking if a new alternate object database is a duplicate make
sure that old and new base paths have the same length before comparing
them with memcmp. This avoids overrunning the buffer of the existing
entry if the new one is longer and it stops rejecting foobar/ after
foo/ was already added.
Signed-off-by: Rene Scharfe <ls.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reworded the error message given upon a failure to open an existing
loose object file due to e.g. permission issues; it was reported as
the object being corrupt, but that is not quite true.
* jk/report-fail-to-read-objects-better:
open_sha1_file: report "most interesting" errno
When we try to open a loose object file, we first attempt to
open in the local object database, and then try any
alternates. This means that the errno value when we return
will be from the last place we looked (and due to the way
the code is structured, simply ENOENT if we do not have have
any alternates).
This can cause confusing error messages, as read_sha1_file
checks for ENOENT when reporting a missing object. If errno
is something else, we report that. If it is ENOENT, but
has_loose_object reports that we have it, then we claim the
object is corrupted. For example:
$ chmod 0 .git/objects/??/*
$ git rev-list --all
fatal: loose object b2d6fab18b92d49eac46dc3c5a0bcafabda20131 (stored in .git/objects/b2/d6fab18b92d49eac46dc3c5a0bcafabda20131) is corrupt
This patch instead keeps track of the "most interesting"
errno we receive during our search. We consider ENOENT to be
the least interesting of all, and otherwise report the first
error found (so problems in the object database take
precedence over ones in alternates). Here it is with this
patch:
$ git rev-list --all
fatal: failed to read object b2d6fab18b92d49eac46dc3c5a0bcafabda20131: Permission denied
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eradicate mistaken use of "nor" (that is, essentially "nor" used
not in "neither A nor B" ;-)) from in-code comments, command output
strings, and documentations.
* jl/nor-or-nand-and:
code and test: fix misuses of "nor"
comments: fix misuses of "nor"
contrib: fix misuses of "nor"
Documentation: fix misuses of "nor"
Replace open-coded reallocation with ALLOC_GROW() macro.
* dd/use-alloc-grow:
sha1_file.c: use ALLOC_GROW() in pretend_sha1_file()
read-cache.c: use ALLOC_GROW() in add_index_entry()
builtin/mktree.c: use ALLOC_GROW() in append_to_tree()
attr.c: use ALLOC_GROW() in handle_attr_line()
dir.c: use ALLOC_GROW() in create_simplify()
reflog-walk.c: use ALLOC_GROW()
replace_object.c: use ALLOC_GROW() in register_replace_object()
patch-ids.c: use ALLOC_GROW() in add_commit()
diffcore-rename.c: use ALLOC_GROW()
diff.c: use ALLOC_GROW()
commit.c: use ALLOC_GROW() in register_commit_graft()
cache-tree.c: use ALLOC_GROW() in find_subtree()
bundle.c: use ALLOC_GROW() in add_to_ref_list()
builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()
Fix a small leak in the delta stack used when resolving a long
delta chain at runtime.
* nd/sha1-file-delta-stack-leakage-fix:
sha1_file: fix delta_stack memory leak in unpack_entry
* mh/object-code-cleanup:
sha1_file.c: document a bunch of functions defined in the file
sha1_file_name(): declare to return a const string
find_pack_entry(): document last_found_pack
replace_object: use struct members instead of an array
Helped-by: He Sun <sunheehnus@gmail.com>
Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
* jk/pack-bitmap: (26 commits)
ewah: unconditionally ntohll ewah data
ewah: support platforms that require aligned reads
read-cache: use get_be32 instead of hand-rolled ntoh_l
block-sha1: factor out get_be and put_be wrappers
do not discard revindex when re-preparing packfiles
pack-bitmap: implement optional name_hash cache
t/perf: add tests for pack bitmaps
t: add basic bitmap functionality tests
count-objects: recognize .bitmap in garbage-checking
repack: consider bitmaps when performing repacks
repack: handle optional files created by pack-objects
repack: turn exts array into array-of-struct
repack: stop using magic number for ARRAY_SIZE(exts)
pack-objects: implement bitmap writing
rev-list: add bitmap mode to speed up object lists
pack-objects: use bitmaps when packing objects
pack-objects: split add_object_entry
pack-bitmap: add support for bitmap indexes
documentation: add documentation for the bitmap format
ewah: compressed bitmap implementation
...
Change the return value of sha1_file_name() to (const char *).
(Callers have no business mucking about here.) Change callers
accordingly, deleting a few superfluous temporary variables along the
way.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a comment at the declaration of last_found_pack and where it is
used in find_pack_entry(). In the latter, separate the cases (1) to
make a place for the new comment and (2) to turn the success case into
affirmative logic.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This delta_stack array can grow to any length depending on the actual
delta chain, but we forget to free it. Normally it does not matter
because we use small_delta_stack[] from stack and small_delta_stack
can hold 64-delta chains, more than standard --depth=50 in pack-objects.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git clone $origin foo\bar\baz" on Windows failed to create the
leading directories (i.e. a moral-equivalent of "mkdir -p").
* ss/safe-create-leading-dir-with-slash:
safe_create_leading_directories(): on Windows, \ can separate path components
Code clean-up and protection against concurrent write access to the
ref namespace.
* mh/safe-create-leading-directories:
rename_tmp_log(): on SCLD_VANISHED, retry
rename_tmp_log(): limit the number of remote_empty_directories() attempts
rename_tmp_log(): handle a possible mkdir/rmdir race
rename_ref(): extract function rename_tmp_log()
remove_dir_recurse(): handle disappearing files and directories
remove_dir_recurse(): tighten condition for removing unreadable dir
lock_ref_sha1_basic(): if locking fails with ENOENT, retry
lock_ref_sha1_basic(): on SCLD_VANISHED, retry
safe_create_leading_directories(): add new error value SCLD_VANISHED
cmd_init_db(): when creating directories, handle errors conservatively
safe_create_leading_directories(): introduce enum for return values
safe_create_leading_directories(): always restore slash at end of loop
safe_create_leading_directories(): split on first of multiple slashes
safe_create_leading_directories(): rename local variable
safe_create_leading_directories(): add explicit "slash" pointer
safe_create_leading_directories(): reduce scope of local variable
safe_create_leading_directories(): fix format of "if" chaining
When cloning to a directory "C:\foo\bar" from Windows' cmd.exe where
"foo" does not exist yet, Git would throw an error like
fatal: could not create work tree dir 'c:\foo\bar'.: No such file or directory
Fix this by not hard-coding a platform specific directory separator
into safe_create_leading_directories().
This patch, including its entire commit message, is derived from a
patch by Sebastian Schuberth.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an object lookup fails, we re-read the objects/pack
directory to pick up any new packfiles that may have been
created since our last read. We also discard any pack
revindex structs we've allocated.
The discarding is a problem for the pack-bitmap code, which keeps
a pointer to the revindex for the bitmapped pack. After the
discard, the pointer is invalid, and we may read free()d
memory.
Other revindex users do not keep a bare pointer to the
revindex; instead, they always access it through
revindex_for_pack(), which lazily builds the revindex. So
one solution is to teach the pack-bitmap code a similar
trick. It would be slightly less efficient, but probably not
all that noticeable.
However, it turns out this discarding is not actually
necessary. When we call reprepare_packed_git, we do not
throw away our old pack list. We keep the existing entries,
and only add in new ones. So there is no safety problem; we
will still have the pack struct that matches each revindex.
The packfile itself may go away, of course, but we are
already prepared to handle that, and it may happen outside
of reprepare_packed_git anyway.
Throwing away the revindex may save some RAM if the pack
never gets reused (about 12 bytes per object). But it also
wastes some CPU time (to regenerate the index) if the pack
does get reused. It's hard to say which is more valuable,
but in either case, it happens very rarely (only when we
race with a simultaneous repack). Just leaving the revindex
in place is simple and safe both for current and future
code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "cat-file --batch" to show delta-base object name for a
packed object that is represented as a delta.
* jk/oi-delta-base:
cat-file: provide %(deltabase) batch format
sha1_object_info_extended: provide delta base sha1s
When we figure out how many file descriptors to allocate for
keeping packfiles open, a system with non-working getrlimit() could
cause us to die(), but because we make this call only to get a
rough estimate of how many is available and we do not even attempt
to use up all file descriptors available ourselves, it is nicer to
fall back to a reasonable low value rather than dying.
* jh/rlimit-nofile-fallback:
get_max_fd_limit(): fall back to OPEN_MAX upon getrlimit/sysconf failure
read_sha1_file() that is the workhorse to read the contents given
an object name honoured object replacements, but there is no
corresponding mechanism to sha1_object_info() that is used to
obtain the metainfo (e.g. type & size) about the object, leading
callers to weird inconsistencies.
* cc/replace-object-info:
replace info: rename 'full' to 'long' and clarify in-code symbols
Documentation/git-replace: describe --format option
builtin/replace: unset read_replace_refs
t6050: add tests for listing with --format
builtin/replace: teach listing using short, medium or full formats
sha1_file: perform object replacement in sha1_object_info_extended()
t6050: show that git cat-file --batch fails with replace objects
sha1_object_info_extended(): add an "unsigned flags" parameter
sha1_file.c: add lookup_replace_object_extended() to pass flags
replace_object: don't check read_replace_refs twice
rename READ_SHA1_FILE_REPLACE flag to LOOKUP_REPLACE_OBJECT
Add a new possible error result that can be returned by
safe_create_leading_directories() and
safe_create_leading_directories_const(): SCLD_VANISHED. This value
indicates that a file or directory on the path existed at one point
(either it already existed or the function created it), but then it
disappeared. This probably indicates that another process deleted the
directory while we were working. If SCLD_VANISHED is returned, the
caller might want to retry the function call, as there is a chance
that a new attempt will succeed.
Why doesn't safe_create_leading_directories() do the retrying
internally? Because an empty directory isn't really ever safe until
it holds a file. So even if safe_create_leading_directories() were
absolutely sure that the directory existed before it returned, there
would be no guarantee that the directory still existed when the caller
tried to write something in it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of returning magic integer values (which a couple of callers
go to the trouble of distinguishing), return values from an enum. Add
a docstring.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Always restore the slash that we scribbled over at the end of the
loop, rather than also fixing it up at each premature exit from the
loop. This makes it harder to forget to do the cleanup as new paths
are added to the code.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the input path has multiple slashes between path components (e.g.,
"foo//bar"), then the old code was breaking the path at the last
slash, not the first one. So in the above example, the second slash
was overwritten with NUL, resulting in the parent directory being
sought as "foo/".
When stat() is called on "foo/", it fails with ENOTDIR if "foo" exists
but is not a directory. This caused the wrong path to be taken in the
subsequent logic.
So instead, split path components at the first intercomponent slash
rather than the last one.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename "pos" to "next_component", because now it always points at the
next component of the path name that has to be processed.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Keep track of the position of the slash character independently of
"pos", thereby making the purpose of each variable clearer and
working towards other upcoming changes.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes it more obvious that values of "st" don't persist across
loop iterations.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Count-objects will report any "garbage" files in the packs
directory, including files whose extensions it does not
know (case 1), and files whose matching ".pack" file is
missing (case 2). Without having learned about ".bitmap"
files, the current code reports all such files as garbage
(case 1), even if their pack exists. Instead, they should be
treated as case 2.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A caller of sha1_object_info_extended technically has enough
information to determine the base sha1 from the results of
the call. It knows the pack, offset, and delta type of the
object, which is sufficient to find the base.
However, the functions to do so are not publicly available,
and the code itself is intimate enough with the pack details
that it should be abstracted away. We could add a public
helper to allow callers to query the delta base separately,
but it is simpler and slightly more efficient to optionally
grab it along with the rest of the object_info data.
For cases where the object is not stored as a delta, we
write the null sha1 into the query field. A careful caller
could check "oi.whence == OI_PACKED && oi.u.packed.is_delta"
before looking at the base sha1, but using the null sha1
provides a simple alternative (and gives a better sanity
check for a non-careful caller than simply returning random
bytes).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On broken systems where RLIMIT_NOFILE is visible by the compliers
but underlying getrlimit() system call does not behave, we used to
simply die() when we are trying to decide how many file descriptors
to allocate for keeping packfiles open. Instead, allow the fallback
codepath to take over when we get such a failure from getrlimit().
The same issue exists with _SC_OPEN_MAX and sysconf(); restructure
the code in a similar way to prepare for a broken sysconf() as well.
Noticed-by: Joey Hess <joey@kitenet.net>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Two processes creating loose objects at the same time could have
failed unnecessarily when the name of their new objects started
with the same byte value, due to a race condition.
* jh/loose-object-dirs-creation-race:
sha1_file.c:create_tmpfile(): Fix race when creating loose object dirs
"git cat-file --batch-check=ok" did not check the existence of the
named object.
* sb/sha1-loose-object-info-check-existence:
sha1_loose_object_info(): do not return success on missing object
sha1_object_info_extended() should perform object replacement
if it is needed.
The simplest way to do that is to make it call
lookup_replace_object_extended().
And now its "unsigned flags" parameter is used as it is passed
to lookup_replace_object_extended().
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This parameter is not used yet, but it will be used to tell
sha1_object_info_extended() if it should perform object
replacement or not.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, there is only one caller to lookup_replace_object()
that can benefit from passing it some flags, but we expect
that there could be more.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The READ_SHA1_FILE_REPLACE flag is more related to using the
lookup_replace_object() function rather than the
read_sha1_file() function.
We also need such a flag to be used with sha1_object_info()
instead of read_sha1_file().
The name LOOKUP_REPLACE_OBJECT is therefore better for this
flag.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git cat-file --batch-check=ok" did not check the existence of the
named object.
* sb/sha1-loose-object-info-check-existence:
sha1_loose_object_info(): do not return success on missing object
When two processes created one loose object file each, which fell
into the same fan-out bucket that previously did not have any
objects, they both tried to do an equivalent of
mkdir .git/objects/$fanout &&
chmod $shared_perm .git/objects/$fanout
before writing into their file .git/objects/$fanout/$remainder,
one of which could have failed unnecessarily when the second
invocation of mkdir found that the directory already has been
created by the first one.
* jh/loose-object-dirs-creation-race:
sha1_file.c:create_tmpfile(): Fix race when creating loose object dirs
In git v1.4.3, we introduced a new loose object format that
encoded some object information outside of the zlib stream.
Ultimately the format was dropped in v1.5.3, but we kept the
reading side around to help people migrate objects. Each
time we open a loose object, we use a heuristic to check
whether it is in the normal loose format, or the
experimental one.
This heuristic is robust in the face of valid data, but it
tends to treat corrupted or garbage data as an experimental
object. With the regular format, we would notice quickly
that zlib's crc does not check out and complain. With the
experimental object, we are likely to extract a nonsensical
object size and try to allocate a huge buffer, resulting in
xmalloc calling "die".
This latter behavior is much worse, for two reasons. One,
git reports an allocation error when the real error is
corruption. And two, the program dies unconditionally, so
you cannot even run fsck (which would otherwise ignore the
broken object and keep going).
We could try to improve the heuristic to err on the side of
normal objects in the face of corruption, but there is
really little point. The experimental format is long-dead,
and was never enabled by default to begin with. We can
instead simply remove it. The only affected repository would
be one that explicitly set core.legacyheaders in 2007, and
then never repacked in the intervening 6 years.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 052fe5ea (sha1_loose_object_info: make type lookup optional,
2013-07-12), sha1_loose_object_info() returns happily without
checking if the object in question exists, which is not what the the
caller sha1_object_info_extended() expects; the caller does not even
bother checking the existence of the object itself.
Noticed-by: Sven Brauch <svenbrauch@googlemail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are cases (e.g. when running concurrent fetches in a repo) where
multiple Git processes concurrently attempt to create loose objects
within the same objects/XX/ dir. The creation of the loose object files
is (AFAICS) safe from races, but the creation of the objects/XX/ dir in
which the loose objects reside is unsafe, for example:
Two concurrent fetches - A and B. As part of its fetch, A needs to store
12aaaaa as a loose object. B, on the other hand, needs to store 12bbbbb
as a loose object. The objects/12 directory does not already exist.
Concurrently, both A and B determine that they need to create the
objects/12 directory (because their first call to git_mkstemp_mode()
within create_tmpfile() fails witn ENOENT). One of them - let's say A -
executes the following mkdir() call before the other. This first call
returns success, and A moves on. When B gets around to calling mkdir(),
it fails with EEXIST, because A won the race. The mkdir() error causes B
to return -1 from create_tmpfile(), which propagates all the way,
resulting in the fetch failing with:
error: unable to create temporary file: File exists
fatal: failed to write object
fatal: unpack-objects failed
Although it's hard to add a testcase reproducing this issue, it's easy
to provoke if we insert a sleep after the
if (mkdir(buffer, 0777) || adjust_shared_perm(buffer))
return -1;
block, and then run two concurrent "git fetch"es against the same repo.
The fix is to simply handle mkdir() failing with EEXIST as a success.
If EEXIST is somehow returned for the wrong reasons (because the relevant
objects/XX is not a directory, or is otherwise unsuitable for object
storage), the following call to adjust_shared_perm(), or ultimately the
retried call to git_mkstemp_mode() will fail, and we end up returning
error from create_tmpfile() in any case.
Note that there are still cases where two users with unsuitable umasks
in a shared repo can end up in two races where one user first wins the
mkdir() race to create an objects/XX/ directory, and then the other user
wins the adjust_shared_perms() race to chmod() that directory, but fails
because it is (transiently, until the first users completes its chmod())
unwriteable to the other user. However, (an equivalent of) this race also
exists before this patch, and is made no worse by this patch.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 5b0864070 (sha1_object_info_extended: make type calculation
optional, Jul 12 2013) changed the return value of the
sha1_object_info_extended function to 0/-1 for success/error.
Previously this function returned the object type for success or
-1 for error. But unfortunately the above commit forgot to change
or move the comment above this function that says "returns enum
object_type or negative".
To fix this inconsistency, let's move the comment above the
sha1_object_info function where it is still true.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git_open_noatime` helper can be of general interest for other
consumers of git's different on-disk formats.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>