Clear the git_zstream variable at the start of git_deflate_init() etc.
so that callers don't have to do that.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In v2.2.0, we broke "git prune" that runs in a repository that
borrows from an alternate object store.
* jk/prune-mtime:
sha1_file: fix iterating loose alternate objects
for_each_loose_file_in_objdir: take an optional strbuf path
The string in 'base' contains a path suffix to a specific object;
when its value is used, the suffix must either be filled (as in
stat_sha1_file, open_sha1_file, check_and_freshen_nonlocal) or
cleared (as in prepare_packed_git) to avoid junk at the end.
660c889e (sha1_file: add for_each iterators for loose and packed
objects, 2014-10-15) introduced loose_from_alt_odb(), but this did
neither and treated 'base' as a complete path to the "base" object
directory, instead of a pointer to the "base" of the full path
string.
The trailing path after 'base' is still initialized to NUL, hiding
the bug in some common cases. Additionally the descendent
for_each_file_in_obj_subdir() function swallows ENOENT, so an error
only shows if the alternate's path was last filled with a valid
object (where statting /path/to/existing/00/0bjectfile/00 fails).
Signed-off-by: Jonathon Mah <me@JonathonMah.com>
Helped-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We feed a root "objdir" path to this iterator function,
which then copies the result into a strbuf, so that it can
repeatedly append the object sub-directories to it. Let's
make it easy for callers to just pass us a strbuf in the
first place.
We leave the original interface as a convenience for callers
who want to just pass a const string like the result of
get_object_directory().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new name is more consistent with the names of other
string_list-related functions.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tighten the logic to decide that an unreachable cruft is
sufficiently old by covering corner cases such as an ancient object
becoming reachable and then going unreachable again, in which case
its retention period should be prolonged.
* jk/prune-mtime: (28 commits)
drop add_object_array_with_mode
revision: remove definition of unused 'add_object' function
pack-objects: double-check options before discarding objects
repack: pack objects mentioned by the index
pack-objects: use argv_array
reachable: use revision machinery's --indexed-objects code
rev-list: add --indexed-objects option
rev-list: document --reflog option
t5516: test pushing a tag of an otherwise unreferenced blob
traverse_commit_list: support pending blobs/trees with paths
make add_object_array_with_context interface more sane
write_sha1_file: freshen existing objects
pack-objects: match prune logic for discarding objects
pack-objects: refactor unpack-unreachable expiration check
prune: keep objects reachable from recent objects
sha1_file: add for_each iterators for loose and packed objects
count-objects: use for_each_loose_file_in_objdir
count-objects: do not use xsize_t when counting object size
prune-packed: use for_each_loose_file_in_objdir
reachable: mark index blobs as SEEN
...
When we try to write a loose object file, we first check
whether that object already exists. If so, we skip the
write as an optimization. However, this can interfere with
prune's strategy of using mtimes to mark files in progress.
For example, if a branch contains a particular tree object
and is deleted, that tree object may become unreachable, and
have an old mtime. If a new operation then tries to write
the same tree, this ends up as a noop; we notice we
already have the object and do nothing. A prune running
simultaneously with this operation will see the object as
old, and may delete it.
We can solve this by "freshening" objects that we avoid
writing by updating their mtime. The algorithm for doing so
is essentially the same as that of has_sha1_file. Therefore
we provide a new (static) interface "check_and_freshen",
which finds and optionally freshens the object. It's trivial
to implement freshening and simple checking by tweaking a
single parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We typically iterate over the reachable objects in a
repository by starting at the tips and walking the graph.
There's no easy way to iterate over all of the objects,
including unreachable ones. Let's provide a way of doing so.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prune has to walk $GIT_DIR/objects/?? in order to find the
set of loose objects to prune. Other parts of the code
(e.g., count-objects) want to do the same. Let's factor it
out into a reusable for_each-style function.
Note that this is not quite a straight code movement. The
original code had strange behavior when it found a file of
the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex
digits. It executed a "break" from the loop, meaning that we
stopped pruning in that directory (but still pruned other
directories!). This was probably a bug; we do not want to
process the file as an object, but we should keep going
otherwise (and that is how the new code handles it).
We are also a little more careful with loose object
directories which fail to open. The original code silently
ignored any failures, but the new code will complain about
any problems besides ENOENT.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We check the return value of the callback and stop iterating
if it is non-zero. However, we do not make the non-zero
return value available to the caller, so they have no way of
knowing whether the operation succeeded or not (technically
they can keep their own error flag in the callback data, but
that is unlike our other for_each functions).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lockfile API and its users have been cleaned up.
* mh/lockfile: (38 commits)
lockfile.h: extract new header file for the functions in lockfile.c
hold_locked_index(): move from lockfile.c to read-cache.c
hold_lock_file_for_append(): restore errno before returning
get_locked_file_path(): new function
lockfile.c: rename static functions
lockfile: rename LOCK_NODEREF to LOCK_NO_DEREF
commit_lock_file_to(): refactor a helper out of commit_lock_file()
trim_last_path_component(): replace last_path_elm()
resolve_symlink(): take a strbuf parameter
resolve_symlink(): use a strbuf for internal scratch space
lockfile: change lock_file::filename into a strbuf
commit_lock_file(): use a strbuf to manage temporary space
try_merge_strategy(): use a statically-allocated lock_file object
try_merge_strategy(): remove redundant lock_file allocation
struct lock_file: declare some fields volatile
lockfile: avoid transitory invalid states
git_config_set_multivar_in_file(): avoid call to rollback_lock_file()
dump_marks(): remove a redundant call to rollback_lock_file()
api-lockfile: document edge cases
commit_lock_file(): rollback lock file on failure to rename
...
When running a required clean filter, we do not have to mmap the
original before feeding the filter. Instead, stream the file
contents directly to the filter and process its output.
* sp/stream-clean-filter:
sha1_file: don't convert off_t to size_t too early to avoid potential die()
convert: stream from fd to required clean filter to reduce used address space
copy_fd(): do not close the input file descriptor
mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size
memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMIT
config.c: add git_env_ulong() to parse environment variable
convert: drop arguments other than 'path' from would_convert_to_git()
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).
Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xsize_t() checks if an off_t argument can be safely converted to
a size_t return value. If the check is executed too early, it could
fail for large files on 32-bit architectures even if the size_t code
path is not taken. Other paths might be able to handle the large file.
Specifically, index_stream_convert_blob() is able to handle a large file
if a filter is configured that returns a small result.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach a few codepaths to punt (instead of dying) when large blobs
that would not fit in core are involved in the operation.
* nd/large-blobs:
diff: shortcut for diff'ing two binary SHA-1 objects
diff --stat: mark any file larger than core.bigfilethreshold binary
diff.c: allow to pass more flags to diff_populate_filespec
sha1_file.c: do not die failing to malloc in unpack_compressed_entry
wrapper.c: introduce gentle xmallocz that does not die()
Reduce the use of fixed sized buffer passed to getcwd() calls
by introducing xgetcwd() helper.
* rs/strbuf-getcwd:
use strbuf_add_absolute_path() to add absolute paths
abspath: convert absolute_path() to strbuf
use xgetcwd() to set $GIT_DIR
use xgetcwd() to get the current directory or die
wrapper: add xgetcwd()
abspath: convert real_path_internal() to strbuf
abspath: use strbuf_getcwd() to remember original working directory
setup: convert setup_git_directory_gently_1 et al. to strbuf
unix-sockets: use strbuf_getcwd()
strbuf: add strbuf_getcwd()
The data is streamed to the filter process anyway. Better avoid mapping
the file if possible. This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media. The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit. The new
implementation can filter files of any size as long as the filter output
is small enough.
The new code path is only taken if the filter is required. The filter
consumes data directly from the fd. If it fails, the original data is
not immediately available. The condition can easily be handled as
a fatal error, which is expected for a required filter anyway.
If the filter was not required, the condition would need to be handled
in a different way, like seeking to 0 and reading the data. But this
would require more restructuring of the code and is probably not worth
it. The obvious approach of falling back to reading all data would not
help achieving the main purpose of this patch, which is to handle large
files with limited address space. If reading all data is an option, we
can simply take the old code path right away and mmap the entire file.
The environment variable GIT_MMAP_LIMIT, which has been introduced in
a previous commit is used to test that the expected code path is taken.
A related test that exercises required filters is modified to verify
that the data actually has been modified on its way from the file system
to the object store.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to test expectations about mmap in a way similar to testing
expectations about malloc with GIT_ALLOC_LIMIT introduced by
d41489a6 (Add more large blob test cases, 2012-03-07), introduce a
new environment variable GIT_MMAP_LIMIT to limit the largest allowed
mmap length.
xmmap() is modified to check the size of the requested region and
fail it if it is beyond the limit. Together with GIT_ALLOC_LIMIT
tests can now confirm expectations about memory consumption.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is only the path that matters in the decision whether to filter
or not. Clarify this by making path the only argument of
would_convert_to_git().
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fewer die() gives better control to the caller, provided that the
caller _can_ handle it. And in unpack_compressed_entry() case, it can,
because unpack_compressed_entry() already returns NULL if it fails to
inflate data.
A side effect from this is fsck continues to run when very large blobs
are present (and do not fit in memory).
Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* kb/perf-trace:
api-trace.txt: add trace API documentation
progress: simplify performance measurement by using getnanotime()
wt-status: simplify performance measurement by using getnanotime()
git: add performance tracing for git's main() function to debug scripts
trace: add trace_performance facility to debug performance issues
trace: add high resolution timer function to debug performance issues
trace: add 'file:line' to all trace output
trace: move code around, in preparation to file:line output
trace: add current timestamp to all trace output
trace: disable additional trace output for unit tests
trace: add infrastructure to augment trace output with additional info
sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API
Documentation/git.txt: improve documentation of 'GIT_TRACE*' variables
trace: improve trace performance
trace: remove redundant printf format attribute
trace: consistently name the format parameter
trace: move trace declarations from cache.h to new trace.h
* jk/strip-suffix:
prepare_packed_git_one: refactor duplicate-pack check
verify-pack: use strbuf_strip_suffix
strbuf: implement strbuf_strip_suffix
index-pack: use strip_suffix to avoid magic numbers
use strip_suffix instead of ends_with in simple cases
replace has_extension with ends_with
implement ends_with via strip_suffix
add strip_suffix function
sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()
Code to avoid adding the same alternate object store twice was
subtly broken for a long time, but nobody seems to have noticed.
* rs/fix-alt-odb-path-comparison:
sha1_file: avoid overrunning alternate object base string
When adding alternate object directories, we try not to add the
directory of the current repository to avoid cycles. Unfortunately,
that test was broken, since it compared an absolute with a relative
path.
Signed-off-by: Ephrim Khong <dr.khong@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This changes GIT_TRACE_PACK_ACCESS functionality as follows:
* supports the same options as GIT_TRACE (e.g. printing to stderr)
* no longer supports relative paths
* appends to the trace file rather than overwriting
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While checking if a new alternate object database is a duplicate make
sure that old and new base paths have the same length before comparing
them with memcmp. This avoids overrunning the buffer of the existing
entry if the new one is longer and it stops rejecting foobar/ after
foo/ was already added.
Signed-off-by: Rene Scharfe <ls.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are reloading the list of packs, we check whether a
particular pack has been loaded. This is slightly tricky,
because we load packs based on the presence of their ".idx"
files, but record the name of the matching ".pack" file.
Therefore we want to compare their bases.
The existing code stripped off ".idx" from a file we found,
then compared that whole base length to strings containing
the ".pack" version. This meant we could end up comparing
bytes past what the ".pack" string contained, if the ".idx"
file name was much longer.
In practice, it worked OK because memcmp would end up seeing
a difference in the two strings and would return early
before hitting the full length. However, memcmp may
sometimes read extra bytes past a difference (e.g., because
it is comparing 64-bit words), or is even free to compare in
reverse order.
Furthermore, our memcmp made no guarantees that we matched
the whole pack name, up to ".pack". So "foo.idx" would match
"foo-bar.pack", which is wrong (but does not typically
happen, because our pack names have a fixed size).
We can fix both issues, avoid magic numbers, and document
that we expect to compare against a string with ".pack" by
using strip_suffix.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These two are almost the same function, with the exception
that has_extension only matches if there is content before
the suffix. So ends_with(".exe", ".exe") is true, but
has_extension would not be.
This distinction does not matter to any of the callers,
though, and we can just replace uses of has_extension with
ends_with. We prefer the "ends_with" name because it is more
generic, and there is nothing about the function that
requires it to be used for file extensions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using strbuf to create a message string in case a path is
too long for our fixed-size buffer, replace that buffer with a strbuf
and thus get rid of the limitation.
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reworded the error message given upon a failure to open an existing
loose object file due to e.g. permission issues; it was reported as
the object being corrupt, but that is not quite true.
* jk/report-fail-to-read-objects-better:
open_sha1_file: report "most interesting" errno
When we try to open a loose object file, we first attempt to
open in the local object database, and then try any
alternates. This means that the errno value when we return
will be from the last place we looked (and due to the way
the code is structured, simply ENOENT if we do not have have
any alternates).
This can cause confusing error messages, as read_sha1_file
checks for ENOENT when reporting a missing object. If errno
is something else, we report that. If it is ENOENT, but
has_loose_object reports that we have it, then we claim the
object is corrupted. For example:
$ chmod 0 .git/objects/??/*
$ git rev-list --all
fatal: loose object b2d6fab18b92d49eac46dc3c5a0bcafabda20131 (stored in .git/objects/b2/d6fab18b92d49eac46dc3c5a0bcafabda20131) is corrupt
This patch instead keeps track of the "most interesting"
errno we receive during our search. We consider ENOENT to be
the least interesting of all, and otherwise report the first
error found (so problems in the object database take
precedence over ones in alternates). Here it is with this
patch:
$ git rev-list --all
fatal: failed to read object b2d6fab18b92d49eac46dc3c5a0bcafabda20131: Permission denied
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eradicate mistaken use of "nor" (that is, essentially "nor" used
not in "neither A nor B" ;-)) from in-code comments, command output
strings, and documentations.
* jl/nor-or-nand-and:
code and test: fix misuses of "nor"
comments: fix misuses of "nor"
contrib: fix misuses of "nor"
Documentation: fix misuses of "nor"
Replace open-coded reallocation with ALLOC_GROW() macro.
* dd/use-alloc-grow:
sha1_file.c: use ALLOC_GROW() in pretend_sha1_file()
read-cache.c: use ALLOC_GROW() in add_index_entry()
builtin/mktree.c: use ALLOC_GROW() in append_to_tree()
attr.c: use ALLOC_GROW() in handle_attr_line()
dir.c: use ALLOC_GROW() in create_simplify()
reflog-walk.c: use ALLOC_GROW()
replace_object.c: use ALLOC_GROW() in register_replace_object()
patch-ids.c: use ALLOC_GROW() in add_commit()
diffcore-rename.c: use ALLOC_GROW()
diff.c: use ALLOC_GROW()
commit.c: use ALLOC_GROW() in register_commit_graft()
cache-tree.c: use ALLOC_GROW() in find_subtree()
bundle.c: use ALLOC_GROW() in add_to_ref_list()
builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()
Fix a small leak in the delta stack used when resolving a long
delta chain at runtime.
* nd/sha1-file-delta-stack-leakage-fix:
sha1_file: fix delta_stack memory leak in unpack_entry
* mh/object-code-cleanup:
sha1_file.c: document a bunch of functions defined in the file
sha1_file_name(): declare to return a const string
find_pack_entry(): document last_found_pack
replace_object: use struct members instead of an array
Helped-by: He Sun <sunheehnus@gmail.com>
Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
* jk/pack-bitmap: (26 commits)
ewah: unconditionally ntohll ewah data
ewah: support platforms that require aligned reads
read-cache: use get_be32 instead of hand-rolled ntoh_l
block-sha1: factor out get_be and put_be wrappers
do not discard revindex when re-preparing packfiles
pack-bitmap: implement optional name_hash cache
t/perf: add tests for pack bitmaps
t: add basic bitmap functionality tests
count-objects: recognize .bitmap in garbage-checking
repack: consider bitmaps when performing repacks
repack: handle optional files created by pack-objects
repack: turn exts array into array-of-struct
repack: stop using magic number for ARRAY_SIZE(exts)
pack-objects: implement bitmap writing
rev-list: add bitmap mode to speed up object lists
pack-objects: use bitmaps when packing objects
pack-objects: split add_object_entry
pack-bitmap: add support for bitmap indexes
documentation: add documentation for the bitmap format
ewah: compressed bitmap implementation
...
Change the return value of sha1_file_name() to (const char *).
(Callers have no business mucking about here.) Change callers
accordingly, deleting a few superfluous temporary variables along the
way.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a comment at the declaration of last_found_pack and where it is
used in find_pack_entry(). In the latter, separate the cases (1) to
make a place for the new comment and (2) to turn the success case into
affirmative logic.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This delta_stack array can grow to any length depending on the actual
delta chain, but we forget to free it. Normally it does not matter
because we use small_delta_stack[] from stack and small_delta_stack
can hold 64-delta chains, more than standard --depth=50 in pack-objects.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git clone $origin foo\bar\baz" on Windows failed to create the
leading directories (i.e. a moral-equivalent of "mkdir -p").
* ss/safe-create-leading-dir-with-slash:
safe_create_leading_directories(): on Windows, \ can separate path components
Code clean-up and protection against concurrent write access to the
ref namespace.
* mh/safe-create-leading-directories:
rename_tmp_log(): on SCLD_VANISHED, retry
rename_tmp_log(): limit the number of remote_empty_directories() attempts
rename_tmp_log(): handle a possible mkdir/rmdir race
rename_ref(): extract function rename_tmp_log()
remove_dir_recurse(): handle disappearing files and directories
remove_dir_recurse(): tighten condition for removing unreadable dir
lock_ref_sha1_basic(): if locking fails with ENOENT, retry
lock_ref_sha1_basic(): on SCLD_VANISHED, retry
safe_create_leading_directories(): add new error value SCLD_VANISHED
cmd_init_db(): when creating directories, handle errors conservatively
safe_create_leading_directories(): introduce enum for return values
safe_create_leading_directories(): always restore slash at end of loop
safe_create_leading_directories(): split on first of multiple slashes
safe_create_leading_directories(): rename local variable
safe_create_leading_directories(): add explicit "slash" pointer
safe_create_leading_directories(): reduce scope of local variable
safe_create_leading_directories(): fix format of "if" chaining
When cloning to a directory "C:\foo\bar" from Windows' cmd.exe where
"foo" does not exist yet, Git would throw an error like
fatal: could not create work tree dir 'c:\foo\bar'.: No such file or directory
Fix this by not hard-coding a platform specific directory separator
into safe_create_leading_directories().
This patch, including its entire commit message, is derived from a
patch by Sebastian Schuberth.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>