Some context before we talk about the removed code.
This paint_down() is part of step 6 of 58babff (shallow.c: the 8 steps
to select new commits for .git/shallow - 2013-12-05). When we fetch from
a shallow repository, we need to know if one of the new/updated refs
needs new "shallow commits" in .git/shallow (because we don't have
enough history of those refs) and which one.
The question at step 6 is, what (new) shallow commits are required in
order to maintain reachability throughout the repository _without_
cutting our history short? To answer, we mark all commits reachable from
existing refs with UNINTERESTING ("rev-list --not --all"), mark shallow
commits with BOTTOM, then for each new/updated ref, walk through the
commit graph until we either hit UNINTERESTING or BOTTOM, marking the
ref on the commit as we walk.
After all the walking is done, we check the new shallow commits. If we
have not seen any new ref marked on a new shallow commit, we know all
new/updated refs are reachable using just our history and .git/shallow.
The shallow commit in question is not needed and can be thrown away.
So, the code.
The loop here (to walk through commits) is basically
1. get one commit from the queue
2. ignore if it's SEEN or UNINTERESTING
3. mark it
4. go through all the parents and...
5a. mark it if it's never marked before
5b. put it back in the queue
What we do in this patch is drop step 5a because it is not
necessary. The commit being marked at 5a is put back on the queue, and
will be marked at step 3 at the next iteration. The only case it will
not be marked is when the commit is already marked UNINTERESTING (5a
does not check this), which will be ignored at step 2.
But we don't care about refs marking on UNINTERESTING. We care about the
marking on _shallow commits_ that are not reachable from our current
history (and having UNINTERESTING on it means it's reachable). So it's
ok for an UNINTERESTING not to be ref-marked.
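
As a rough illustration of the loop after step 5a is dropped, here is a
toy sketch. It is not the code in shallow.c: struct toy_commit,
toy_paint_down(), the fixed-size stack and the single 32-bit "refs" word
are all hypothetical simplifications, and the flag values are made up.
It assumes at most MAX_COMMITS commits and a ref id below 32.

```c
#include <stdint.h>

#define SEEN          (1u << 0)
#define UNINTERESTING (1u << 1)
#define BOTTOM        (1u << 2)
#define MAX_COMMITS   16
#define MAX_PARENTS   2

/* Hypothetical stand-in for a commit: flags, ref marks, parent indices. */
struct toy_commit {
	unsigned flags;
	uint32_t refs;            /* one bit per new/updated ref */
	int nparents;
	int parents[MAX_PARENTS]; /* indices into the commit array */
};

/* Walk from "start", marking ref "id" on every commit we visit. */
static void toy_paint_down(struct toy_commit *c, int start, unsigned id)
{
	/* Each commit queues its parents at most once, so this is enough. */
	int stack[MAX_COMMITS * MAX_PARENTS + 1];
	int top = 0;

	stack[top++] = start;
	while (top > 0) {
		struct toy_commit *cur = &c[stack[--top]];  /* step 1 */

		if (cur->flags & (SEEN | UNINTERESTING))
			continue;                           /* step 2 */
		cur->flags |= SEEN;
		cur->refs |= (uint32_t)1 << id;             /* step 3 */
		if (cur->flags & BOTTOM)
			continue;  /* shallow commit: do not walk past it */
		for (int i = 0; i < cur->nparents; i++)
			stack[top++] = cur->parents[i];     /* step 5b only */
	}
}
```

A parent that is pushed here without being marked is either marked when
it is popped (step 3) or skipped because it is UNINTERESTING (step 2),
which is the argument made above for why 5a is redundant.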
Reported-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
First of all, 1 << 31 is technically undefined behaviour, so let's just
use an unsigned literal.
If i is 'signed int' and gcc doesn't know that i is positive, gcc
generates code to compute the C99-mandated values of "i / 32" and "i %
32", which is a lot more complicated than a simple shift/mask.
The only caller of paint_down actually passes an "unsigned int" value,
but the prototype of paint_down causes (completely well-defined)
conversion to signed int, and gcc has no way of knowing that the
converted value is non-negative. Just make the id parameter unsigned.
In update_refstatus, the change in generated code is much smaller,
presumably because gcc is smart enough to see that i starts as 0 and is
only incremented, so it is allowed (per the undefined behaviour of signed overflow) to
assume that i is always non-negative. But let's just help less smart
compilers generate good code anyway.
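
For illustration, a minimal sketch of the bitmap indexing with an
unsigned index. The helper names set_ref_bit() and test_ref_bit() are
hypothetical and do not appear in shallow.c. With an unsigned operand,
"id / 32" and "id % 32" are just a shift and a mask, and the shifted
value is unsigned so bit 31 is well-defined.

```c
#include <stdint.h>

/* Set bit "id" in a bitmap made of 32-bit words. */
static void set_ref_bit(uint32_t *bitmap, unsigned int id)
{
	/* unsigned 1, so shifting by 31 is fine */
	bitmap[id / 32] |= (uint32_t)1 << (id % 32);
}

/* Return 1 if bit "id" is set, 0 otherwise. */
static int test_ref_bit(const uint32_t *bitmap, unsigned int id)
{
	return (bitmap[id / 32] >> (id % 32)) & 1;
}
```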
Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The expression info->free+size is technically undefined behaviour in
exactly the case we want to test for. Moreover, the compiler is likely
to translate the expression to
(unsigned long)info->free + size > (unsigned long)info->end
where there's at least a theoretical chance that the LHS could wrap
around 0, giving a false negative.
This might as well be written using pointer subtraction avoiding these
issues.
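
A sketch of the subtraction-based check, assuming "free" and "end" both
point into (or one past the end of) the same block. struct toy_pool and
pool_has_room() are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* Hypothetical pool: "free" is the next unused byte, "end" is one past the block. */
struct toy_pool {
	char *free;
	char *end;
};

/* Nonzero if "size" bytes still fit between free and end. */
static int pool_has_room(const struct toy_pool *p, size_t size)
{
	/* end - free never leaves the block, so there is nothing to wrap */
	return size <= (size_t)(p->end - p->free);
}
```

Unlike "free + size > end", this never forms a pointer beyond the end of
the block, even for an absurdly large size.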
Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
paint_alloc() allocates a big block of memory and splits it into
smaller, fixed size, chunks of memory whenever it's called. Each chunk
contains enough bits to represent all "new refs" [1] in a fetch from a
shallow repository.
We do not check if the new "big block" is smaller than the requested
memory chunk though. If it happens, we'll happily pass back a memory
region smaller than expected, which will lead to problems eventually.
A normal fetch may add/update a dozen new refs. Let's stay on the
"reasonably extreme" side and say we need 16k refs (or bits from
paint_alloc's perspective). Each chunk of memory would be 2k, much
smaller than the memory pool (512k).
So, normally, the under-allocation situation should never happen. A bad
guy, however, could make a fetch that adds more than 4m new/updated refs
to this code which results in a memory chunk larger than pool size.
Check this case and abort.
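
The arithmetic above can be checked with a small sketch. TOY_POOL_SIZE,
chunk_bytes() and chunk_fits_pool() are hypothetical names; the sketch
assumes one bit per ref, rounded up to whole 32-bit words, and the 512k
pool size quoted above.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_POOL_SIZE (512 * 1024) /* pool size in bytes, as quoted above */

/* Bytes needed for one chunk holding one bit per ref. */
static size_t chunk_bytes(size_t nr_refs)
{
	return ((nr_refs + 31) / 32) * sizeof(uint32_t);
}

/* Nonzero if a chunk for nr_refs refs fits in a single pool block. */
static int chunk_fits_pool(size_t nr_refs)
{
	return chunk_bytes(nr_refs) <= TOY_POOL_SIZE;
}
```

With these assumptions, 16k refs need a 2k chunk, and exactly 4m refs
fill the pool; one more ref no longer fits.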
Noticed-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Reviewed-by: Jeff King <peff@peff.net>
[1] Details are in commit message of 58babff (shallow.c: the 8 steps to
select new commits for .git/shallow - 2013-12-05), step 6.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need to allocate a "big" block of memory in paint_alloc(). The exact
size does not really matter. But the pool size has no relation with
commit-slab. Stop using that macro here.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
paint_alloc() is basically malloc(), tuned for allocating a fixed number
of bits on every call without worrying about freeing any individual
allocation since all will be freed at the end. It does it by allocating
a big block of memory every time it runs out of "free memory". "slab" is
a poor choice of name, at least poorer than "pool".
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a stash, we need to look at the diff between
the working tree and HEAD, and do so using the git-diff
porcelain. Because git-diff enables porcelain config like
renames by default, this causes at least one problem. The
--name-only format will not mention the source side of a
rename, meaning we will fail to stash a deletion that is
part of a rename.
We could fix that case by passing --no-renames, but this is
a symptom of a larger problem. We should be using the
diff-index plumbing here, which does not have renames
enabled by default, and also does not respect any
potentially confusing config options.
Reported-by: Matthew Patey <matthew.patey2167@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since commit 17966c0a6 (http: avoid disconnecting on 404s
for loose objects, 2016-07-11), we turn off curl's
FAILONERROR option and instead manually deal with failing
HTTP codes.
However, the logic to do so only recognizes HTTP 404 as a
failure. This is probably the most common result, but if we
were to get another code, the curl result remains CURLE_OK,
and we treat it as success. We still end up detecting the
failure when we try to zlib-inflate the object (which will
fail), but instead of reporting the HTTP error, we just
claim that the object is corrupt.
Instead, let's catch anything in the 300's or above as an
error (300's are redirects which are not an error at the
HTTP level, but are an indication that we've explicitly
disabled redirects, so we should treat them as such; we
certainly don't have the resulting object content).
Note that we also fill in req->errorstr, which we didn't do
before. Without FAILONERROR, curl will not have filled this
in, and it will remain a blank string. This never mattered
for the 404 case, because in the logic below we hit the
"missing_target()" branch and print nothing. But for other
errors, we'd want to say _something_, if only to fill in the
blank slot in the error message.
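
A minimal sketch of the check described above, not the actual code in
http.c: http_code_is_error() is a hypothetical helper, and the wording
of the message is made up for the example.

```c
#include <stdio.h>

/*
 * Treat any HTTP status of 300 or above as a failure and fill in an
 * error string, since curl will not do so without FAILONERROR.
 * Returns 1 on error, 0 on success (errorstr is left untouched).
 */
static int http_code_is_error(long http_code, char *errorstr, size_t len)
{
	if (http_code < 300)
		return 0;
	/* hypothetical message format */
	snprintf(errorstr, len, "request failed with HTTP code %ld", http_code);
	return 1;
}
```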
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ew/http-walker:
list: avoid incompatibility with *BSD sys/queue.h
http-walker: reduce O(n) ops with doubly-linked list
http: avoid disconnecting on 404s for loose objects
http-walker: remove unused parameter from fetch_object
The previous commit made HTTP redirects more obvious and
tightened up the default behavior. However, there's another
way for a server to ask a git client to fetch arbitrary
content: by having an http-alternates file (or a regular
alternates file, which is used as a backup).
Similar to the HTTP redirect case, a malicious server can
claim to have refs pointing at object X, return a 404 when
the client asks for X, but point to some other URL via
http-alternates, which the client will transparently fetch.
The end result is that it looks from the user's perspective
like the objects came from the malicious server, as the
other URL is not mentioned at all.
Worse, because we feed the new URL to curl ourselves, the
usual protocol restrictions do not kick in (neither curl's
default of disallowing file://, nor the protocol
whitelisting in f4113cac0 (http: limit redirection to
protocol-whitelist, 2015-09-22)).
Let's apply the same rules here as we do for HTTP redirects.
Namely:
- unless http.followRedirects is set to "always", we will
not follow remote redirects from http-alternates (or
alternates) at all
- set CURLOPT_PROTOCOLS alongside CURLOPT_REDIR_PROTOCOLS
to restrict ourselves to a known-safe set and respect any
user-provided whitelist.
- mention alternate object stores on stderr so that the
user is aware another source of objects may be involved
The first item may prove to be too restrictive. The most
common use of alternates is to point to another path on the
same server. While it's possible for a single-server
redirect to be an attack, it takes a fairly obscure setup
(victim and evil repository on the same host, host speaks
dumb http, and evil repository has access to edit its own
http-alternates file).
So we could make the checks more specific, and only cover
cross-server redirects. But that means parsing the URLs
ourselves, rather than letting curl handle them. This patch
goes for the simpler approach. Given that they are only used
with dumb http, http-alternates are probably pretty rare.
And there's an escape hatch: the user can allow redirects on
a specific server by setting http.<url>.followRedirects to
"always".
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We instruct curl to always follow HTTP redirects. This is
convenient, but it creates opportunities for malicious
servers to create confusing situations. For instance,
imagine Alice is a git user with access to a private
repository on Bob's server. Mallory runs her own server and
wants to access objects from Bob's repository.
Mallory may try a few tricks that involve asking Alice to
clone from her, build on top, and then push the result:
1. Mallory may simply redirect all fetch requests to Bob's
server. Git will transparently follow those redirects
and fetch Bob's history, which Alice may believe she
got from Mallory. The subsequent push seems like it is
just feeding Mallory back her own objects, but is
actually leaking Bob's objects. There is nothing in
git's output to indicate that Bob's repository was
involved at all.
The downside (for Mallory) of this attack is that Alice
will have received Bob's entire repository, and is
likely to notice that when building on top of it.
2. If Mallory happens to know the sha1 of some object X in
Bob's repository, she can instead build her own history
that references that object. She then runs a dumb http
server, and Alice's client will fetch each object
individually. When it asks for X, Mallory redirects her
to Bob's server. The end result is that Alice obtains
objects from Bob, but they may be buried deep in
history. Alice is less likely to notice.
Both of these attacks are fairly hard to pull off. There's a
social component in getting Mallory to convince Alice to
work with her. Alice may be prompted for credentials in
accessing Bob's repository (but not always, if she is using
a credential helper that caches). Attack (1) requires a
certain amount of obliviousness on Alice's part while making
a new commit. Attack (2) requires that Mallory knows a sha1
in Bob's repository, that Bob's server supports dumb http,
and that the object in question is loose on Bob's server.
But we can probably make things a bit more obvious without
any loss of functionality. This patch does two things to
that end.
First, when we encounter a whole-repo redirect during the
initial ref discovery, we now inform the user on stderr,
making attack (1) much more obvious.
Second, the decision to follow redirects is now
configurable. The truly paranoid can set the new
http.followRedirects to false to avoid any redirection
entirely. But for a more practical default, we will disallow
redirects only after the initial ref discovery. This is
enough to thwart attacks similar to (2), while still
allowing the common use of redirects at the repository
level. Since c93c92f30 (http: update base URLs when we see
redirects, 2013-09-28) we re-root all further requests from
the redirect destination, which should generally mean that
no further redirection is necessary.
As an escape hatch, in case there really is a server that
needs to redirect individual requests, the user can set
http.followRedirects to "true" (and this can be done on a
per-server basis via http.*.followRedirects config).
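
The resulting policy has three states, which can be sketched as follows.
The enum and may_follow_redirect() are hypothetical names for this
illustration and are not the identifiers used in http.c.

```c
/* Hypothetical names for the three settings described above. */
enum toy_follow {
	TOY_FOLLOW_NEVER,        /* http.followRedirects = false */
	TOY_FOLLOW_INITIAL_ONLY, /* the default described above */
	TOY_FOLLOW_ALWAYS        /* http.followRedirects = true */
};

/* Nonzero if a redirect may be followed for this request. */
static int may_follow_redirect(enum toy_follow cfg, int is_initial_request)
{
	switch (cfg) {
	case TOY_FOLLOW_NEVER:
		return 0;
	case TOY_FOLLOW_INITIAL_ONLY:
		return is_initial_request != 0;
	case TOY_FOLLOW_ALWAYS:
		return 1;
	}
	return 0; /* unknown setting: be conservative */
}
```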
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The discover_refs() function has a local "options" variable
to hold the http_get_options we pass to http_get_strbuf().
But this shadows the global "struct options" that holds our
program-level options, which cannot be accessed from this
function.
Let's give the local one a more descriptive name so we can
tell the two apart.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a malicious server redirects the initial ref
advertisement, it may be able to leak sha1s from other,
unrelated servers that the client has access to. For
example, imagine that Alice is a git user, she has access to
a private repository on a server hosted by Bob, and Mallory
runs a malicious server and wants to find out about Bob's
private repository.
Mallory asks Alice to clone an unrelated repository from her
over HTTP. When Alice's client contacts Mallory's server for
the initial ref advertisement, the server issues an HTTP
redirect for Bob's server. Alice contacts Bob's server and
gets the ref advertisement for the private repository. If
there is anything to fetch, she then follows up by asking
the server for one or more sha1 objects. But who is the
server?
If it is still Mallory's server, then Alice will leak the
existence of those sha1s to her.
Since commit c93c92f30 (http: update base URLs when we see
redirects, 2013-09-28), the client usually rewrites the base
URL such that all further requests will go to Bob's server.
But this is done by textually matching the URL. If we were
originally looking for "http://mallory/repo.git/info/refs",
and we got pointed at "http://bob/other.git/info/refs", then
we know that the right root is "http://bob/other.git".
If the redirect appears to change more than just the root,
we punt and continue to use the original server. E.g.,
imagine the redirect adds a URL component that Bob's server
will ignore, like "http://bob/other.git/info/refs?dummy=1".
We can solve this by aborting in this case rather than
silently continuing to use Mallory's server. In addition to
protecting from sha1 leakage, it's arguably safer and more
sane to refuse a confusing redirect like that in general.
For example, part of the motivation in c93c92f30 is
avoiding accidentally sending credentials over clear http,
just to get a response that says "try again over https". So
even in a non-malicious case, we'd prefer to err on the side
of caution.
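
The textual matching, and the case where it fails, can be sketched like
this. redirected_base_len() is a hypothetical helper written for this
illustration; it is not the function in http.c.

```c
#include <string.h>

/*
 * "asked" is base + tail. If "got" ends with the same tail, return the
 * length of the new base inside "got"; otherwise return -1, which is
 * the case where the caller would now abort.
 */
static long redirected_base_len(const char *base, const char *asked,
				const char *got)
{
	size_t base_len = strlen(base);
	size_t asked_len = strlen(asked);
	size_t got_len = strlen(got);
	size_t tail_len;

	if (asked_len < base_len || strncmp(asked, base, base_len))
		return -1; /* "asked" was not built from "base" */
	tail_len = asked_len - base_len;
	if (got_len < tail_len ||
	    memcmp(got + got_len - tail_len, asked + base_len, tail_len))
		return -1; /* redirect changed more than the root */
	return (long)(got_len - tail_len);
}
```

For the URLs above, a redirect to "http://bob/other.git/info/refs"
yields the 20-byte base "http://bob/other.git", while the variant with
"?dummy=1" appended no longer shares the tail and is rejected.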
The downside is that it's possible this will break a
legitimate but complicated server-side redirection scheme.
The setup given in the newly added test does work, but it's
convoluted enough that we don't need to care about it. A
more plausible case would be a server which redirects a
request for "info/refs?service=git-upload-pack" to just
"info/refs" (because it does not do smart HTTP, and for some
reason really dislikes query parameters). Right now we
would transparently downgrade to dumb-http, but with this
patch, we'd complain (and the user would have to set
GIT_SMART_HTTP=0 to fetch).
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function looks for a common tail between what we asked
for and where we were redirected to, but it open-codes the
comparison. We can avoid some confusing subtractions by
using strip_suffix_mem().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A pathname value in a clean/smudge filter process "key=value" pair can
contain the '=' character (introduced in edcc858). Make the user aware
of this issue in the docs, add a corresponding test case, and fix the
issue in filter process value parser of the example implementation in
contrib.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove superfluous .gitignore pattern and invalid '.' in `git commit`
calls.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If git-p4 tried to store an empty file in GitLFS then it crashed while
parsing the pointer file:
oid = re.search(r'^oid \w+:(\w+)', pointerFile, re.MULTILINE).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
This happens because GitLFS does not create a pointer file for an empty
file. Teach git-p4 this behavior to fix the problem and add a test case.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update Travis-CI dependencies to the latest available versions in
Linux build.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
--only is implied when paths are present, and requires
them unless --amend is given. But with --allow-empty it should
be allowed as well - it is the only way to create an
empty commit in the presence of staged changes.
Signed-off-by: Andreas Krey <a.krey@gmx.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next line the `actual` is overwritten again, so no need to redirect
the output of checkout into that file.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Noticed-by: David Turner <dturner@twosigma.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Working with a repo that used to be all CRLF. At some point it
was changed to all LF, with `text=auto` in .gitattributes.
Trying to cherry-pick a commit from before the switchover fails:
$ git cherry-pick -Xrenormalize <commit>
fatal: CRLF would be replaced by LF in [path]
Commit 65237284 "unify the "auto" handling of CRLF" introduced
a regression:
Whenever crlf_action is CRLF_TEXT_XXX and not CRLF_AUTO_XXX,
SAFE_CRLF_RENORMALIZE was fed into check_safe_crlf(). This is
wrong because here everything other than SAFE_CRLF_WARN is treated as
SAFE_CRLF_FAIL.
Call check_safe_crlf() only if checksafe is SAFE_CRLF_WARN or
SAFE_CRLF_FAIL.
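
The guard amounts to the following sketch. The enum is re-declared here
only so the example is self-contained; its numeric values and the
SAFE_CRLF_FALSE member are assumptions, and should_check_safe_crlf() is
a hypothetical helper name.

```c
/* Assumed shape of the setting; values are illustrative only. */
enum toy_safe_crlf {
	SAFE_CRLF_FALSE,
	SAFE_CRLF_FAIL,
	SAFE_CRLF_WARN,
	SAFE_CRLF_RENORMALIZE
};

/* Only WARN and FAIL ask for the safety check; RENORMALIZE must not. */
static int should_check_safe_crlf(enum toy_safe_crlf checksafe)
{
	return checksafe == SAFE_CRLF_WARN || checksafe == SAFE_CRLF_FAIL;
}
```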
Reported-by: Eevee (Lexy Munroe) <eevee@veekun.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pull --rebase" always runs "git rebase" after fetching the
commit to serve as the new base, even when the new base is a
descendant of the current HEAD, i.e. we haven't done any work.
In such a case, we can instead fast-forward to the new base without
invoking the rebase process.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve the rule to convert "unsigned char [20]" into "struct
object_id *" in contrib/coccinelle/
* rs/cocci:
cocci: avoid self-references in object_id transformations
Update to the test framework made in 2.9 timeframe broke running
the tests under valgrind, which has been fixed.
* nd/test-helpers:
valgrind: support test helpers
Portability update and workaround for builds on recent Mac OS X.
* ls/macos-update:
travis-ci: disable GIT_TEST_HTTPD for macOS
Makefile: set NO_OPENSSL on macOS by default
Fix for a racy false-positive test failure.
* as/merge-attr-sleep:
t6026: clarify the point of "kill $(cat sleep.pid)"
t6026: ensure that long-running script really is
Revert "t6026-merge-attr: don't fail if sleep exits early"
Revert "t6026-merge-attr: ensure that the merge driver was called"
t6026-merge-attr: ensure that the merge driver was called
t6026-merge-attr: don't fail if sleep exits early
Recent update to git-sh-setup (a library of shell functions that
are used by our in-tree scripted Porcelain commands) included
another shell library git-sh-i18n without specifying where it is,
relying on the $PATH. This has been fixed to be more explicit by
prefixing $(git --exec-path) output in front.
* ak/sh-setup-dot-source-i18n-fix:
git-sh-setup: be explicit where to dot-source git-sh-i18n from.
"git daemon" used fixed-length buffers to turn URL to the
repository the client asked for into the server side directory
path, using snprintf() to avoid overflowing these buffers, but
allowed possibly truncated paths to the directory. This has been
tightened to reject such a request that causes overlong path to be
required to serve.
* jk/daemon-path-ok-check-truncation:
daemon: detect and reject too-long paths
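
As an illustration of detecting truncation rather than silently
accepting it, here is a minimal sketch. format_repo_path() is a
hypothetical helper, not the code in daemon.c; it relies only on the
standard guarantee that snprintf() returns the length the full output
would have had.

```c
#include <stdio.h>

/*
 * Build "base/dir" in buf. Returns 0 on success, or -1 if the result
 * did not fit (or formatting failed), so the caller can reject the
 * request instead of using a truncated path.
 */
static int format_repo_path(char *buf, size_t len,
			    const char *base, const char *dir)
{
	int n = snprintf(buf, len, "%s/%s", base, dir);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```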
The code that we have used for the past 10+ years to cycle
4-element ring buffers turns out to be not quite portable in
theory.
* rs/ring-buffer-wraparound:
hex: make wraparound of the index into ring-buffer explicit
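
A sketch of what an explicit wraparound can look like, assuming the
concern is an index that was incremented without bound and only masked
on use. toy_bufs, toy_next and next_ring_buffer() are hypothetical names
and not the code in hex.c.

```c
#define TOY_NR_BUFS 4

static char toy_bufs[TOY_NR_BUFS][64];
static unsigned int toy_next; /* always kept in the range 0..3 */

/* Hand out the four buffers in rotation. */
static char *next_ring_buffer(void)
{
	char *buf = toy_bufs[toy_next];

	/* wrap explicitly instead of letting the index grow forever */
	toy_next = (toy_next + 1) % TOY_NR_BUFS;
	return buf;
}
```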
"git send-email" attempts to pick up valid e-mails from the
trailers, but people in real world write non-addresses there, like
"Cc: Stable <add@re.ss> # 4.8+", which broke the output depending
on the availability and vintage of Mail::Address perl module.
* mm/send-email-cc-cruft-after-address:
Git.pm: add comment pointing to t9000
t9000-addresses: update expected results after fix
parse_mailboxes: accept extra text after <...> address
The command-line completion script (in contrib/) learned to
complete "git cmd ^mas<HT>" to complete the negative end of
reference to "git cmd ^master".
* cp/completion-negative-refs:
completion: support excluding refs
Extract a small helper out of the function that reads the authors
script file "git am" internally uses.
This by itself is not useful until a second caller appears in the
future for "rebase -i" helper.
* jc/am-read-author-file:
am: refactor read_author_script()
Since 650c44925 (common-main: call git_extract_argv0_path(),
2016-07-01), the argv[0] that is seen in cmd_main() of
individual programs is always the basename of the
executable, as common-main strips off the full path. This
can produce confusing results for git-daemon, which wants to
re-exec itself.
For instance, if the program was originally run as
"/usr/lib/git/git-daemon", it will try just re-execing
"git-daemon", which will find the first instance in $PATH.
If git's exec-path has not been prepended to $PATH, we may
find the git-daemon from a different version (or no
git-daemon at all).
Normally this isn't a problem. Git commands are run as "git
daemon", the git wrapper puts the exec-path at the front of
$PATH, and argv[0] is already "daemon" anyway. But running
git-daemon via its full exec-path, while not really a
recommended method, did work prior to 650c44925. Let's make
it work again.
The real goal of 650c44925 was not to munge argv[0], but to
reliably set the argv0_path global. The only reason it
munges at all is that one caller, the git.c wrapper,
piggy-backed on that computation to find the command
basename. Instead, let's leave argv[0] untouched in
common-main, and have git.c do its own basename computation.
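
For illustration, the basename computation can be as small as the
following sketch. toy_basename() is a hypothetical helper, and it only
looks for '/', whereas real code would also have to cope with
platform-specific directory separators.

```c
#include <string.h>

/* Return the part of argv0 after the last '/', or argv0 itself. */
static const char *toy_basename(const char *argv0)
{
	const char *slash = strrchr(argv0, '/');

	return slash ? slash + 1 : argv0;
}
```

The caller keeps the original argv[0] intact, so a program that wants to
re-exec itself by full path can still do so.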
While we're at it, let's drop the return value from
git_extract_argv0_path(). It was only ever used in this one
callsite, and its dual purpose is what led to this
confusion in the first place.
Note that by changing the interface, the compiler can
confirm for us that there are no other callers storing the
return value. But the compiler can't tell us whether any of
the cmd_main() functions (besides git.c) were relying on the
basename munging. However, we can observe that prior to
650c44925, no other cmd_main() functions did that munging,
and no new cmd_main() functions have been introduced since
then. So we can't be regressing any of those cases.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lazy prereq for MKTEMP uses "mktemp -t" to see if
mergetool's internal mktemp call will be able to run. But
unlike the call inside mergetool, we do not ever bother to
clean up the result, and the /tmp of git developers will
slowly fill up with "foo.XXXXXX" directories as they run the
test suite over and over. Let's clean up the directory
after we've verified its creation.
Note that we don't use test_when_finished here, and instead
just make rmdir part of the &&-chain. We should only remove
something that we're confident we just created. A failure in
the middle of the chain either means there's nothing to
clean up, or we are very confused and should err on the side
of caution.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>