* jk/path-name-safety-2.6:
list-objects: pass full pathname to callbacks
list-objects: drop name_path entirely
list-objects: convert name_path to a strbuf
show_object_with_name: simplify by using path_name()
http-push: stop using name_path
tree-diff: catch integer overflow in combine_diff_path allocation
add helpers for detecting size_t overflow
* jk/path-name-safety-2.5:
list-objects: pass full pathname to callbacks
list-objects: drop name_path entirely
list-objects: convert name_path to a strbuf
show_object_with_name: simplify by using path_name()
http-push: stop using name_path
tree-diff: catch integer overflow in combine_diff_path allocation
add helpers for detecting size_t overflow
* jk/path-name-safety-2.4:
list-objects: pass full pathname to callbacks
list-objects: drop name_path entirely
list-objects: convert name_path to a strbuf
show_object_with_name: simplify by using path_name()
http-push: stop using name_path
tree-diff: catch integer overflow in combine_diff_path allocation
add helpers for detecting size_t overflow
When we find a blob at "a/b/c", we currently pass this to
our show_object_fn callbacks as two components: "a/b/" and
"c". Callbacks which want the full value then call
path_name(), which concatenates the two. But this is an
inefficient interface; the path is a strbuf, and we could
simply append "c" to it temporarily, then roll back the
length, without creating a new copy.
So we could improve this by teaching the callsites of
path_name() this trick (and there are only 3). But we can
also notice that no callback actually cares about the
broken-down representation, and simply pass each callback
the full path "a/b/c" as a string. The callback code becomes
even simpler, then, as we do not have to worry about freeing
an allocated buffer, nor rolling back our modification to
the strbuf.
This is theoretically less efficient, as some callbacks
would not bother to format the final path component. But in
practice this is not measurable. Since we use the same
strbuf over and over, our work to grow it is amortized, and
we really only pay to memcpy a few bytes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous commit, we left name_path as a thin wrapper
around a strbuf. This patch drops it entirely. As a result,
every show_object_fn callback needs to be adjusted. However,
none of their code needs to be changed at all, because the
only use was to pass it to path_name(), which now handles
the bare strbuf.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to read the pack data using the offsets stored in the pack
idx file has been made more carefully check the validity of the
data in the idx.
* jk/pack-idx-corruption-safety:
sha1_file.c: mark strings for translation
use_pack: handle signed off_t overflow
nth_packed_object_offset: bounds-check extended offset
t5313: test bounds-checks of corrupted/malicious pack/idx files
"git config section.var value" to set a value in per-repository
configuration file failed when it was run outside any repository,
but didn't say the reason correctly.
* js/config-set-in-non-repository:
git config: report when trying to modify a non-existing repo config
A helper function "git submodule" uses since v2.7.0 to list the
modules that match the pathspec argument given to its subcommands
(e.g. "submodule add <repo> <path>") has been fixed.
* sb/submodule-module-list-fix:
submodule helper list: respect correct path prefix
* jk/tighten-alloc: (23 commits)
compat/mingw: brown paper bag fix for 50a6c8e
ewah: convert to REALLOC_ARRAY, etc
convert ewah/bitmap code to use xmalloc
diff_populate_gitlink: use a strbuf
transport_anonymize_url: use xstrfmt
git-compat-util: drop mempcpy compat code
sequencer: simplify memory allocation of get_message
test-path-utils: fix normalize_path_copy output buffer size
fetch-pack: simplify add_sought_entry
fast-import: simplify allocation in start_packfile
write_untracked_extension: use FLEX_ALLOC helper
prepare_{git,shell}_cmd: use argv_array
use st_add and st_mult for allocation size computation
convert trivial cases to FLEX_ARRAY macros
use xmallocz to avoid size arithmetic
convert trivial cases to ALLOC_ARRAY
convert manual allocations to argv_array
argv-array: add detach function
add helpers for allocating flex-array structs
harden REALLOC_ARRAY and xcalloc against size_t overflow
...
The "v(iew)" subcommand of the interactive "git am -i" command was
broken in 2.6.0 timeframe when the command was rewritten in C.
* jc/am-i-v-fix:
am -i: fix "v"iew
pager: factor out a helper to prepare a child process to run the pager
pager: lose a separate argv[]
"git rev-parse --git-common-dir" used in the worktree feature
misbehaved when run from a subdirectory.
* nd/git-common-dir-fix:
rev-parse: take prefix into account in --git-common-dir
"git show 'HEAD:Foo[BAR]Baz'" did not interpret the argument as a
rev, i.e. the object named by the the pathname with wildcard
characters in a tree object.
* nd/dwim-wildcards-as-pathspecs:
get_sha1: don't die() on bogus search strings
check_filename: tighten dwim-wildcard ambiguity
checkout: reorder check_filename conditional
Many codepaths forget to check return value from git_config_set();
the function is made to die() to make sure we do not proceed when
setting a configuration variable failed.
* ps/config-error:
config: rename git_config_set_or_die to git_config_set
config: rename git_config_set to git_config_set_gently
compat: die when unable to set core.precomposeunicode
sequencer: die on config error when saving replay opts
init-db: die on config errors when initializing empty repo
clone: die on config error in cmd_clone
remote: die on config error when manipulating remotes
remote: die on config error when setting/adding branches
remote: die on config error when setting URL
submodule--helper: die on config error when cloning module
submodule: die on config error when linking modules
branch: die on config error when editing branch description
branch: die on config error when unsetting upstream
branch: report errors in tracking branch setup
config: introduce set_or_die wrappers
If a pack .idx file has a corrupted offset for an object, we
may try to access an offset in the .idx or .pack file that
is larger than the file's size. For the .pack case, we have
use_pack() to protect us, which realizes the access is out
of bounds. But if the corrupted value asks us to look in the
.idx file's secondary 64-bit offset table, we blindly add it
to the mmap'd index data and access arbitrary memory.
We can fix this with a simple bounds-check compared to the
size we found when we opened the .idx file.
Note that there's similar code in index-pack that is
triggered only during "index-pack --verify". To support
both, we pull the bounds-check into a separate function,
which dies when it sees a corrupted file.
It would be nice if we could return an error, so that the
pack code could try to find a good copy of the object
elsewhere. Currently nth_packed_object_offset doesn't have
any way to return an error, but it could probably use "0" as
a sentinel value (since no object can start there). This is
the minimal fix, and we can improve the resilience later on
top.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a pilot error to call `git config section.key value` outside of
any Git worktree. The message
error: could not lock config file .git/config: No such file or
directory
is not very helpful in that situation, though. Let's print a helpful
message instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a regression introduced by 74703a1e4d (submodule: rewrite
`module_list` shell function in C, 2015-09-02).
Add a test to ensure we list the right submodule when giving a
specific pathspec.
Reported-By: Caleb Jorden <cjorden@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two variants of this function, one that takes a
string and one that takes a ptr/len combo. But we only call
the latter with the length of a NUL-terminated string, so
our first simplification is to drop it in favor of the
string variant.
Since we know we have a string, we can also replace the
manual memory computation with a call to alloc_ref().
Furthermore, we can rely on get_oid_hex() to complain if it
hits the end of the string. That means we can simplify the
check for "<sha1> <ref>" versus just "<ref>". Rather than
manage the ptr/len pair, we can just bump the start of our
string forward. The original code over-allocated based on
the original "namelen" (which wasn't _wrong_, but was simply
wasteful and confusing).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If our size computation overflows size_t, we may allocate a
much smaller buffer than we expected and overflow it. It's
probably impossible to trigger an overflow in most of these
sites in practice, but it is easy enough convert their
additions and multiplications into overflow-checking
variants. This may be fixing real bugs, and it makes
auditing the code easier.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using FLEX_ARRAY macros reduces the amount of manual
computation size we have to do. It also ensures we don't
overflow size_t, and it makes sure we write the same number
of bytes that we allocated.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We frequently allocate strings as xmalloc(len + 1), where
the extra 1 is for the NUL terminator. This can be done more
simply with xmallocz, which also checks for integer
overflow.
There's no case where switching xmalloc(n+1) to xmallocz(n)
is wrong; the result is the same length, and malloc made no
guarantees about what was in the buffer anyway. But in some
cases, we can stop manually placing NUL at the end of the
allocated buffer. But that's only safe if it's clear that
the contents will always fill the buffer.
In each case where this patch does so, I manually examined
the control flow, and I tried to err on the side of caution.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:
1. It automatically checks the array-size multiplication
for overflow.
2. It always uses sizeof(*array) for the element-size,
so that it can never go out of sync with the declared
type of the array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are many manual argv allocations that predate the
argv_array API. Switching to that API brings a few
advantages:
1. We no longer have to manually compute the correct final
array size (so it's one less thing we can screw up).
2. In many cases we had to make a separate pass to count,
then allocate, then fill in the array. Now we can do it
in one pass, making the code shorter and easier to
follow.
3. argv_array handles memory ownership for us, making it
more obvious when things should be free()d and and when
not.
Most of these cases are pretty straightforward. In some, we
switch from "run_command_v" to "run_command" which lets us
directly use the argv_array embedded in "struct
child_process".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Paths that have been told the index about with "add -N" are not
quite yet in the index, but a few commands behaved as if they
already are in a harmful way.
* nd/ita-cleanup:
grep: make it clear i-t-a entries are ignored
add and use a convenience macro ce_intent_to_add()
blame: remove obsolete comment
Rename git_config_set_or_die functions to git_config_set, leading
to the new default behavior of dying whenever a configuration
error occurs.
By now all callers that shall die on error have been transitioned
to the _or_die variants, thus making this patch a simple rename
of the functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The desired default behavior for `git_config_set` is to die
whenever an error occurs. Dying is the default for a lot of
internal functions when failures occur and is in this case the
right thing to do for most callers as otherwise we might run into
inconsistent repositories without noticing.
As some code may rely on the actual return values for
`git_config_set` we still require the ability to invoke these
functions without aborting. Rename the existing `git_config_set`
functions to `git_config_set_gently` to keep them available for
those callers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating an empty repository with `git init-db` we do not
check for error codes returned by `git_config_set` functions.
This may cause the user to end up with an inconsistent repository
without any indication for the user.
Fix this problem by dying early with an error message when we are
unable to write the configuration files to disk.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The clone command does not check for error codes returned by
`git_config_set` functions. This may cause the user to end up
with an inconsistent repository without any indication with what
went wrong.
Fix this problem by dying with an error message when we are
unable to write the configuration files to disk.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When manipulating remotes we try to set various configuration
values without checking if the values were persisted correctly,
possibly leaving the remote in an inconsistent state.
Fix this issue by dying early and notifying the user about the
error.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we add or set new branches (e.g. by `git remote add -f` or
`git remote set-branches`) we do not check for error codes when
writing the branches to the configuration file. When persisting
the configuration failed we are left with a remote that has none
or not all of the branches that should have been set without
notifying the user.
Fix this issue by dying early on configuration error.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When invoking `git-remote --set-url` we do not check the return
value when writing the actual new URL to the configuration file,
pretending to the user that the configuration has been set while
it was in fact not persisted.
Fix this problem by dying early when setting the config fails.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When setting the 'core.worktree' option for a newly cloned
submodule we ignore the return value of `git_config_set_in_file`.
As this leaves the submodule in an inconsistent state, we instead
want to inform the user that something has gone wrong by printing
an error and aborting the program.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we try to unset upstream configurations we do not check
return codes for the `git_config_set` functions. As those may
indicate that we were unable to unset the respective
configuration we may exit successfully without any error message
while in fact the upstream configuration was not unset.
Fix this by dying with an error message when we cannot unset the
configuration.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
You can tweak the reflog expiration for a particular subset
of refs by configuring gc.foo.reflogexpire. We keep a linked
list of reflog_expire_cfg structs, each of which holds the
pattern and a "len" field for the length of the pattern. The
pattern itself is _not_ NUL-terminated.
However, we feed the pattern directly to wildmatch(), which
expects a NUL-terminated string, meaning it may keep reading
random junk after our struct.
We can fix this by allocating an extra byte for the NUL
(which is already zero because we use xcalloc). Let's also
drop the misleading "len" field, which is no longer
necessary. The existing use of "len" can be converted to use
strncmp().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'v'iew subcommand of the interactive mode of "git am -i" was
broken by the rewrite to C we did at around 2.6.0 timeframe at
7ff26832 (builtin-am: implement -i/--interactive, 2015-08-04); we
used to spawn the pager via the shell, accepting things like
PAGER='less -S'
in the environment, but the rewrite forgot and tried to directly
spawn a command whose name is the entire string.
The previous refactoring of the new helper function makes it easier
for us to do the right thing.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the time, get_git_common_dir() returns an absolute path so
prefix is irrelevant. If it returns a relative path (e.g. from the
main worktree) then prefixing is required.
Noticed-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When specifying both revisions and pathnames, we allow
"<rev> -- <pathspec>" to be spelled without the "--" as long
as it is not ambiguous. The original logic was something
like:
1. Resolve each item with get_sha1(). If successful,
we know it can be a <rev>. Verify that it _isn't_ a
filename, using verify_non_filename(), and complain of
ambiguity otherwise.
2. If get_sha1() didn't succeed, make sure that it _is_
a file, using verify_filename(). If not, complain
that it is neither a <rev> nor a <pathspec>.
Both verify_filename() and verify_non_filename() rely on
check_filename(), which definitely said "yes, this is a
file" or "no, it is not" using lstat().
Commit 28fcc0b (pathspec: avoid the need of "--" when
wildcard is used, 2015-05-02) introduced a convenience
feature: check_filename() will consider anything with
wildcard meta-characters as a possible filename, without
even checking the filesystem.
This works well for case 2. For such a wildcard, we would
previously have died and said "it is neither". Post-28fcc0b,
we assume it's a pathspec and proceed.
But it makes some instances of case 1 worse. We may have an
extended sha1 expression that contains meta-characters
(e.g., "HEAD^{/foo.*bar}"), and we now complain that it's
also a filename, due to the wildcard characters (even though
that wildcard would not match anything in the filesystem).
One solution would be to actually expand the pathname and
see if it matches anything on the filesystem. But that's
potentially expensive, and we do not have to be so rigorous
for this DWIM magic (if you want rigor, use "--").
Instead, we can just use different rules for cases 1 and 2.
When we know something is a rev, we will complain only if it
meets a much higher standard for "this is also a file";
namely that it actually exists in the filesystem. Case 2
remains the same: we use the looser "it could be a filename"
standard introduced by 28fcc0b.
We can accomplish this by pulling the wildcard logic out of
check_filename() and putting it into verify_filename(). Its
partner verify_non_filename() does not need a change, since
check_filename() goes back to implementing the "higher
standard".
Besides these two callers of check_filename(), there is one
other: git-checkout does a similar DWIM itself. It hits this
code path only after get_sha1() has returned failure, making
it case 2, which gets the special wildcard treatment.
Note that we drop the tests in t2019 in favor of a more
complete set in t6133. t2019 was not the right place for
them (it's about refname ambiguity, not dwim parsing
ambiguity), and the second test explicitly checked for the
opposite result of the case we are fixing here (which didn't
really make any sense; as shown by the test_must_fail in the
test, it would only serve to annoy people).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we have a "--" flag, we should not be doing DWIM magic
based on whether arguments can be filenames. Reorder the
conditional to avoid the check_filename() call entirely in
this case. The outcome is the same, but the short-circuit
makes the dependency more clear.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The underlying machinery used by "ls-files -o" and other commands
have been taught not to create empty submodule ref cache for a
directory that is not a submodule. This removes a ton of wasted
CPU cycles.
* jk/ref-cache-non-repository-optim:
resolve_gitlink_ref: ignore non-repository paths
clean: make is_git_repository a public function
A few options of "git diff" did not work well when the command was
run from a subdirectory.
* nd/diff-with-path-params:
diff: make -O and --output work in subdirectory
diff-no-index: do not take a redundant prefix argument
"git tag" started listing a tag "foo" as "tags/foo" when a branch
named "foo" exists in the same repository; remove this unnecessary
disambiguation, which is a regression introduced in v2.7.0.
* jk/list-tag-2.7-regression:
tag: do not show ambiguous tag names as "tags/foo"
t6300: use test_atom for some un-modern tests
Many codepaths that run "gc --auto" before exiting kept packfiles
mapped and left the file descriptors to them open, which was not
friendly to systems that cannot remove files that are open. They
now close the packs before doing so.
* js/close-packs-before-gc:
receive-pack: release pack files before garbage-collecting
merge: release pack files before garbage-collecting
am: release pack files before garbage-collecting
fetch: release pack files before garbage-collecting
Some codepaths used fopen(3) when opening a fixed path in $GIT_DIR
(e.g. COMMIT_EDITMSG) that is meant to be left after the command is
done. This however did not work well if the repository is set to
be shared with core.sharedRepository and the umask of the previous
user is tighter. They have been made to work better by calling
unlink(2) and retrying after fopen(3) fails with EPERM.
* js/fopen-harder:
Handle more file writes correctly in shared repos
commit: allow editing the commit message even in shared repos
A few unportable C construct have been spotted by clang compiler
and have been fixed.
* jk/clang-pedantic:
bswap: add NO_UNALIGNED_LOADS define
avoid shifting signed integers 31 bits
I couldn't find any other examples of people referring to this
character as a "blank".
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since b7cc53e9 (tag.c: use 'ref-filter' APIs, 2015-07-11),
git-tag has started showing tags with ambiguous names (i.e.,
when both "heads/foo" and "tags/foo" exists) as "tags/foo"
instead of just "foo". This is both:
- pointless; the output of "git tag" includes only
refs/tags, so we know that "foo" means the one in
"refs/tags".
and
- ambiguous; in the original output, we know that the line
"foo" means that "refs/tags/foo" exists. In the new
output, it is unclear whether we mean "refs/tags/foo" or
"refs/tags/tags/foo".
The reason this happens is that commit b7cc53e9 switched
git-tag to use ref-filter's "%(refname:short)" output
formatting, which was adapted from for-each-ref. This more
general code does not know that we care only about tags, and
uses shorten_unambiguous_ref to get the short-name. We need
to tell it that we care only about "refs/tags/", and it
should shorten with respect to that value.
In theory, the ref-filter code could figure this out by us
passing FILTER_REFS_TAGS. But there are two complications
there:
1. The handling of refname:short is deep in formatting
code that does not even have our ref_filter struct, let
alone the arguments to the filter_ref struct.
2. In git v2.7.0, we expose the formatting language to the
user. If we follow this path, it will mean that
"%(refname:short)" behaves differently for "tag" versus
"for-each-ref" (including "for-each-ref refs/tags/"),
which can lead to confusion.
Instead, let's add a new modifier to the formatting
language, "strip", to remove a specific set of prefix
components. This fixes "git tag", and lets users invoke the
same behavior from their own custom formats (for "tag" or
"for-each-ref") while leaving ":short" with its same
consistent meaning in all places.
We introduce a test in t7004 for "git tag", which fails
without this patch. We also add a similar test in t3203 for
"git branch", which does not actually fail. But since it is
likely that "branch" will eventually use the same formatting
code, the test helps defend against future regressions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have always had is_git_directory(), for looking at a
specific directory to see if it contains a git repo. In
0179ca7 (clean: improve performance when removing lots of
directories, 2015-06-15), we added is_git_repository() which
checks for a non-bare repository by looking at its ".git"
entry.
However, the fix in 0179ca7 needs to be applied other
places, too. Let's make this new helper globally available.
We need to give it a better name, though, to avoid confusion
with is_git_directory(). This patch does that, documents
both functions with a comment to reduce confusion, and
removes the clean-specific references in the comments.
Based-on-a-patch-by: Andreas Krey <a.krey@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prefix is already set up in "revs". The same prefix should be used for
all options parsing. So kill the last argument. This patch does not
actually change anything because the only caller does use the same
prefix for init_revisions() and diff_no_index().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>