Now that we can read packet data from memory as easily as a
descriptor, get_remote_heads can take either one as a
source. This will allow further refactoring in remote-curl.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packet_read function reads from a descriptor. The
packet_get_line function is similar, but reads from an
in-memory buffer, and uses a completely separate
implementation. This patch teaches the generic packet_read
function to accept either source, and we can do away with
packet_get_line's implementation.
There are two other differences to account for between the
old and new functions. The first is that we used to read
into a strbuf, but now read into a fixed size buffer. The
only two callers are fine with that, and in fact it
simplifies their code, since they can use the same
static-buffer interface as the rest of the packet_read_line
callers (and we provide a similar convenience wrapper for
reading from a buffer rather than a descriptor).
This is technically an externally-visible behavior change in
that we used to accept arbitrary sized packets up to 65532
bytes, and now cap out at LARGE_PACKET_MAX, 65520. In
practice this doesn't matter, as we use it only for parsing
smart-http headers (of which there is exactly one defined,
and it is small and fixed-size). And any extension headers
would be breaking the protocol to go over LARGE_PACKET_MAX
anyway.
The other difference is that packet_get_line would return
on error rather than dying. However, both callers of
packet_get_line are actually improved by dying.
The first caller does its own error checking, but we can
drop that; as a result, we'll actually get more specific
reporting about protocol breakage when packet_read dies
internally. The only downside is that packet_read will not
print the smart-http URL that failed, but that's not a big
deal; anybody not debugging can already see the remote's URL
already, and anybody debugging would want to run with
GIT_CURL_VERBOSE anyway to see way more information.
The second caller, which is just trying to skip past any
extra smart-http headers (of which there are none defined,
but which we allow to keep room for future expansion), did
not error check at all. As a result, it would treat an error
just like a flush packet. The resulting mess would generally
cause an error later in get_remote_heads, but now we get
error reporting much closer to the source of the problem.
Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packets sent during ref negotiation are all terminated
by newline; even though the code to chomp these newlines is
short, we end up doing it in a lot of places.
This patch teaches packet_read_line to auto-chomp the
trailing newline; this lets us get rid of a lot of inline
chomping code.
As a result, some call-sites which are not reading
line-oriented data (e.g., when reading chunks of packfiles
alongside sideband) transition away from packet_read_line to
the generic packet_read interface. This patch converts all
of the existing callsites.
Since the function signature of packet_read_line does not
change (but its behavior does), there is a possibility of
new callsites being introduced in later commits, silently
introducing an incompatibility. However, since a later
patch in this series will change the signature, such a
commit would have to be merged directly into this commit,
not to the tip of the series; we can therefore ignore the
issue.
This is an internal cleanup and should produce no change of
behavior in the normal case. However, there is one corner
case to note. Callers of packet_read_line have never been
able to tell the difference between a flush packet ("0000")
and an empty packet ("0004"), as both cause packet_read_line
to return a length of 0. Readers treat them identically,
even though Documentation/technical/protocol-common.txt says
we must not; it also says that implementations should not
send an empty pkt-line.
By stripping out the newline before the result gets to the
caller, we will now treat the newline-only packet ("0005\n")
the same as an empty packet, which in turn gets treated like
a flush packet. In practice this doesn't matter, as neither
empty nor newline-only packets are part of git's protocols
(at least not for the line-oriented bits, and readers who
are not expecting line-oriented packets will be calling
packet_read directly, anyway). But even if we do decide to
care about the distinction later, it is orthogonal to this
patch. The right place to tighten would be to stop treating
empty packets as flush packets, and this change does not
make doing so any harder.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is just write_or_die by another name. The one
distinction is that write_or_die will treat EPIPE specially
by suppressing error messages. That's fine, as we die by
SIGPIPE anyway (and in the off chance that it is disabled,
write_or_die will simulate it).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before parsing a suspected smart-HTTP response verify the returned
Content-Type matches the standard. This protects a client from
attempting to process a payload that smells like a smart-HTTP
server response.
JGit has been doing this check on all responses since the dawn of
time. I mistakenly failed to include it in git-core when smart HTTP
was introduced. At the time I didn't know how to get the Content-Type
from libcurl. I punted, meant to circle back and fix this, and just
plain forgot about it.
Signed-off-by: Shawn Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In particular, gcc issues an "'gzip_size' might be used uninitialized"
warning (-Wuninitialized). However, this warning is a false positive,
since the 'gzip_size' variable would not, in fact, be used uninitialized.
In order to suppress the warning, we simply initialise the variable to
zero in it's declaration.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fixes fetch from servers that ask for auth only during the actual
packing phase. This is not really a recommended configuration, but it
cleans up the code at the same time.
* jk/maint-http-half-auth-fetch:
remote-curl: retry failed requests for auth even with gzip
remote-curl: hoist gzip buffer size to top of post_rpc
Commit b81401c taught the post_rpc function to retry the
http request after prompting for credentials. However, it
did not handle two cases:
1. If we have a large request, we do not retry. That's OK,
since we would have sent a probe (with retry) already.
2. If we are gzipping the request, we do not retry. That
was considered OK, because the intended use was for
push (e.g., listing refs is OK, but actually pushing
objects is not), and we never gzip on push.
This patch teaches post_rpc to retry even a gzipped request.
This has two advantages:
1. It is possible to configure a "half-auth" state for
fetching, where the set of refs and their sha1s are
advertised, but one cannot actually fetch objects.
This is not a recommended configuration, as it leaks
some information about what is in the repository (e.g.,
an attacker can try brute-forcing possible content in
your repository and checking whether it matches your
branch sha1). However, it can be slightly more
convenient, since a no-op fetch will not require a
password at all.
2. It future-proofs us should we decide to ever gzip more
requests.
Signed-off-by: Jeff King <peff@peff.net>
When we gzip the post data for a smart-http rpc request, we
compute the gzip body and its size inside the "use_gzip"
conditional. We keep track of the body after the conditional
ends, but not the size. Let's remember both, which will
enable us to retry failed gzip requests in a future patch.
Signed-off-by: Jeff King <peff@peff.net>
Further clean-up to the http codepath that picks up results after
cURL library is done with one request slot.
* jk/maint-http-init-not-in-result-handler:
http: do not set up curl auth after a 401
remote-curl: do not call run_slot repeatedly
Fixes a regression in maint-1.7.11 (v1.7.11.7), maint (v1.7.12.1)
and master (v1.8.0-rc0).
* jk/maint-http-half-auth-push:
http: fix segfault in handle_curl_result
When we get an http 401, we prompt for credentials and put
them in our global credential struct. We also feed them to
the curl handle that produced the 401, with the intent that
they will be used on a retry.
When the code was originally introduced in commit 42653c0,
this was a necessary step. However, since dfa1725, we always
feed our global credential into every curl handle when we
initialize the slot with get_active_slot. So every further
request already feeds the credential to curl.
Moreover, accessing the slot here is somewhat dubious. After
the slot has produced a response, we don't actually control
it any more. If we are using curl_multi, it may even have
been re-initialized to handle a different request.
It just so happens that we will reuse the curl handle within
the slot in such a case, and that because we only keep one
global credential, it will be the one we want. So the
current code is not buggy, but it is misleading.
By cleaning it up, we can remove the slot argument entirely
from handle_curl_result, making it much more obvious that
slots should not be accessed after they are marked as
finished.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit b81401c (http: prompt for credentials on failed POST)
taught post_rpc to call run_slot in a loop in order to retry
a request after asking the user for credentials. However,
after a call to run_slot we will have called
finish_active_slot. This means we have released the slot,
and we should no longer look at it.
As it happens, this does not cause any bugs in the current
code, since we know that we are not using curl_multi in this
code path, and therefore nobody will have taken over our
slot in the meantime. However, it is good form to actually
call get_active_slot again. It also future proofs us against
changes in the http code.
We can do this by jumping back to a retry label at the top
of our function. We just need to reorder a few setup lines
that should not be repeated; everything else within the loop
is either idempotent, needs to be repeated, or in a path we
do not follow (e.g., we do not even try when large_request
is set, because we don't know how much data we might have
streamed from our helper program).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we create an http active_request_slot, we can set its
"results" pointer back to local storage. The http code will
fill in the details of how the request went, and we can
access those details even after the slot has been cleaned
up.
Commit 8809703 (http: factor out http error code handling)
switched us from accessing our local results struct directly
to accessing it via the "results" pointer of the slot. That
means we're accessing the slot after it has been marked as
finished, defeating the whole purpose of keeping the results
storage separate.
Most of the time this doesn't matter, as finishing the slot
does not actually clean up the pointer. However, when using
curl's multi interface with the dumb-http revision walker,
we might actually start a new request before handing control
back to the original caller. In that case, we may reuse the
slot, zeroing its results pointer, and leading the original
caller to segfault while looking for its results inside the
slot.
Instead, we need to pass a pointer to our local results
storage to the handle_curl_result function, rather than
relying on the pointer in the slot struct. This matches what
the original code did before the refactoring (which did not
use a separate function, and therefore just accessed the
results struct directly).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allows users to turn off smart-http when talking to dumb-only
servers.
* jk/smart-http-switch:
remote-curl: let users turn off smart http
remote-curl: rename is_http variable
Usually there is no need for users to specify whether an
http remote is smart or dumb; the protocol is designed so
that a single initial request is made, and the client can
determine the server's capability from the response.
However, some misconfigured dumb-only servers may not like
the initial request by a smart client, as it contains a
query string. Until recently, commit 703e6e7 worked around
this by making a second request. However, that commit was
recently reverted due to its side effect of masking the
initial request's error code.
Since git has had that workaround for several years, we
don't know exactly how many such misconfigured servers are
out there. The reversion of 703e6e7 assumes they are rare
enough not to worry about. Still, that reversion leaves
somebody who does run into such a server with no escape
hatch at all. Let's give them an environment variable they
can tweak to perform the "dumb" request.
This is intentionally not a documented interface. It's
overly simple and is really there for debugging in case
somebody does complain about git not working with their
server. A real user-facing interface would entail a
per-remote or per-URL config variable.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't actually care whether the connection is http or
not; what we care about is whether it might be smart http.
Rename the variable to be more accurate, which will make it
easier to later make smart-http optional.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some HTTP servers try to use gzip compression on the /info/refs
request to save transfer bandwidth. Repositories with many tags
may find the /info/refs request can be gzipped to be 50% of the
original size due to the few but often repeated bytes used (hex
SHA-1 and commonly digits in tag names).
For most HTTP requests enable "Accept-Encoding: gzip" ensuring
the /info/refs payload can use this encoding format.
Only request gzip encoding from servers. Although deflate is
supported by libcurl, most servers have standardized on gzip
encoding for compression as that is what most browsers support.
Asking for deflate increases request sizes by a few bytes, but is
unlikely to ever be used by a server.
Disable the Accept-Encoding header on probe RPCs as response bodies
are supposed to be exactly 4 bytes long, "0000". The HTTP headers
requesting and indicating compression use more space than the data
transferred in the body.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 703e6e76a1.
Retrying without the query parameter was added as a workaround
for a single broken HTTP server at git.debian.org[1]. The server
was misconfigured to route every request with a query parameter
into gitweb.cgi. Admins fixed the server's configuration within
16 hours of the bug report to the Git mailing list, but we still
patched Git with this fallback and have been paying for it since.
Most Git hosting services configure the smart HTTP protocol and the
retry logic confuses users when there is a transient HTTP error as
Git dropped the real error from the smart HTTP request. Removing the
retry makes root causes easier to identify.
[1] http://thread.gmane.org/gmane.comp.version-control.git/137609
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of the smart-http GET requests go through the http_get_*
functions, which will prompt for credentials and retry if we
see an HTTP 401.
POST requests, however, do not go through any central point.
Moreover, it is difficult to retry in the general case; we
cannot assume the request body fits in memory or is even
seekable, and we don't know how much of it was consumed
during the attempt.
Most of the time, this is not a big deal; for both fetching
and pushing, we make a GET request before doing any POSTs,
so typically we figure out the credentials during the first
request, then reuse them during the POST. However, some
servers may allow a client to get the list of refs from
receive-pack without authentication, and then require
authentication when the client actually tries to POST the
pack.
This is not ideal, as the client may do a non-trivial amount
of work to generate the pack (e.g., delta-compressing
objects). However, for a long time it has been the
recommended example configuration in git-http-backend(1) for
setting up a repository with anonymous fetch and
authenticated push. This setup has always been broken
without putting a username into the URL. Prior to commit
986bbc0, it did work with a username in the URL, because git
would prompt for credentials before making any requests at
all. However, post-986bbc0, it is totally broken. Since it
has been advertised in the manpage for some time, we should
make sure it works.
Unfortunately, it is not as easy as simply calling post_rpc
again when it fails, due to the input issue mentioned above.
However, we can still make this specific case work by
retrying in two specific instances:
1. If the request is large (bigger than LARGE_PACKET_MAX),
we will first send a probe request with a single flush
packet. Since this request is static, we can freely
retry it.
2. If the request is small and we are not using gzip, then
we have the whole thing in-core, and we can freely
retry.
That means we will not retry in some instances, including:
1. If we are using gzip. However, we only do so when
calling git-upload-pack, so it does not apply to
pushes.
2. If we have a large request, the probe succeeds, but
then the real POST wants authentication. This is an
extremely unlikely configuration and not worth worrying
about.
While it might be nice to cover those instances, doing so
would be significantly more complex for very little
real-world gain. In the long run, we will be much better off
when curl learns to internally handle authentication as a
callback, and we can cleanly handle all cases that way.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git push" over smart-http lost progress output a few releases ago.
By Jeff King
* jk/maint-push-progress:
t5541: test more combinations of --progress
teach send-pack about --[no-]progress
send-pack: show progress when isatty(2)
When "git fetch" encounters repositories with too many references, the
command line of "fetch-pack" that is run by a helper e.g. remote-curl, may
fail to hold all of them. Now such an internal invocation can feed the
references through the standard input of "fetch-pack".
By Ivan Todoroski
* it/fetch-pack-many-refs:
remote-curl: main test case for the OS command line overflow
fetch-pack: test cases for the new --stdin option
remote-curl: send the refs to fetch-pack on stdin
fetch-pack: new --stdin option to read refs from stdin
Conflicts:
t/t5500-fetch-pack.sh
The send_pack function gets a "progress" flag saying "yes,
definitely show progress" or "no, definitely do not show
progress". This gets set properly by transport_push when
send_pack is called directly.
However, when the send-pack command is executed separately
(as it is for the remote-curl helper), there is no way to
tell it "definitely do this". As a result, we do not
properly respect "git push --no-progress" for smart-http
remotes; you will still get progress if stderr is a tty.
This patch teaches send-pack --progress and --no-progress,
and teaches remote-curl to pass the appropriate option to
override send-pack's isatty check. This fixes the
--no-progress case above, and as a bonus, also makes "git
push --progress" work when stderr is not a tty.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we can throw an arbitrary number of refs at fetch-pack using
its --stdin option, we use it in the remote-curl helper to bypass the
OS command line length limit.
Signed-off-by: Ivan Todoroski <grnch@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The protocol between transport-helper.c and remote-curl requires
remote-curl to always print a blank line after the push command
has run. If the blank line is ommitted, transport-helper kills its
container process (the git push the user started) with exit(128)
and no message indicating a problem, assuming the helper already
printed reasonable error text to the console.
However if the remote rejects all branches with "ng" commands in the
report-status reply, send-pack terminates with non-zero status, and
in turn remote-curl exited with non-zero status before outputting
the blank line after the helper status printed by send-pack. No
error messages reach the user.
This caused users to see the following from git push over HTTP
when the remote side's update hook rejected the branch:
$ git push http://... master
Counting objects: 4, done.
Delta compression using up to 6 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 301 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
$
Always print a blank line after the send-pack process terminates,
ensuring the helper status report (if it was output) will be
correctly parsed by the calling transport-helper.c. This ensures
the helper doesn't abort before the status report can be shown to
the user.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, git push --quiet produces some non-error output, e.g.:
$ git push --quiet
Unpacking objects: 100% (3/3), done.
This fixes a bug reported for the fedora git package:
https://bugzilla.redhat.com/show_bug.cgi?id=725593
Reported-by: Jesse Keating <jkeating@redhat.com>
Cc: Todd Zullinger <tmz@pobox.com>
Commit 90a6c7d4 (propagate --quiet to send-pack/receive-pack)
introduced the --quiet option to receive-pack and made send-pack
pass that option. Older versions of receive-pack do not recognize
the option, however, and terminate immediately. The commit was
therefore reverted.
This change instead adds a 'quiet' capability to receive-pack,
which is a backwards compatible.
In addition, this fixes push --quiet via http: A verbosity of 0
means quiet for remote helpers.
Reported-by: Tobias Ulmer <tobiasu@tmux.org>
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When receive-pack advertises its list of refs, it generally hides the
capabilities information after a NUL at the end of the first ref.
However, when we have an empty repository, there are no refs, and
therefore receive-pack writes a fake ref "capabilities^{}" with the
capabilities afterwards.
On the client side, git reads the result with get_remote_heads(). We pick
the capabilities from the end of the line, and then call check_ref() to
make sure the ref name is valid. We see that it isn't, and don't bother
adding it to our list of refs.
However, the call to check_ref() is enabled by passing the REF_NORMAL flag
to get_remote_heads. For the regular git transport, we pass REF_NORMAL in
get_refs_via_connect() if we are doing a push (since only receive-pack
uses this fake ref). But in remote-curl, we never use this flag, and we
accept the fake ref as a real one, passing it back from the helper to the
parent git-push.
Most of the time this bug goes unnoticed, as the fake ref won't match our
refspecs. However, if "--mirror" is used, then we see it as remote cruft
to be pruned, and try to pass along a deletion refspec for it. Of course
this refspec has bogus syntax (because of the ^{}), and the helper
complains, aborting the push.
Let's have remote-curl mirror what the builtin get_refs_via_connect() does
(at least for the case of using git protocol; we can leave the dumb
info/refs reader as it is).
This also fixes pushing with --mirror to a smart-http remote that uses
alternates. The fake ".have" refs the server gives to avoid unnecessary
network transfer has a similar bad interactions with the machinery.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before commit 986bbc08, git was proactive about asking for
http passwords. It assumed that if you had a username in
your URL, you would also want a password, and asked for it
before making any http requests.
However, this could interfere with the use of .netrc (see
986bbc08 for details). And it was also unnecessary, since
the http fetching code had learned to recognize an HTTP 401
and prompt the user then. Furthermore, the proactive prompt
could interfere with the usage of .netrc (see 986bbc08 for
details).
Unfortunately, the http push-over-DAV code never learned to
recognize HTTP 401, and so was broken by this change. This
patch does a quick fix of re-enabling the "proactive auth"
strategy only for http-push, leaving the dumb http fetch and
smart-http as-is.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The get_remote_heads function reads the list of remote refs
during git protocol session. It dates all the way back to
def88e9 (Commit first cut at "git-fetch-pack", 2005-07-04).
At that time, the idea was to come up with a list of refs we
were interested in, and then filter the list as we got it
from the remote side.
Later, 1baaae5 (Make maximal use of the remote refs,
2005-10-28) stopped filtering at the get_remote_heads layer,
letting us use the non-matching refs to find common history.
As a result, all callers now simply pass an empty match
list (and any future callers will want to do the same). So
let's drop these now-useless parameters.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/http-auth:
http_init: accept separate URL parameter
http: use hostname in credential description
http: retry authentication failures for all http requests
remote-curl: don't retry auth failures with dumb protocol
improve httpd auth tests
url: decode buffers that are not NUL-terminated
The http_init function takes a "struct remote". Part of its
initialization procedure is to look at the remote's url and
grab some auth-related parameters. However, using the url
included in the remote is:
- wrong; the remote-curl helper may have a separate,
unrelated URL (e.g., from remote.*.pushurl). Looking at
the remote's configured url is incorrect.
- incomplete; http-fetch doesn't have a remote, so passes
NULL. So http_init never gets to see the URL we are
actually going to use.
- cumbersome; http-push has a similar problem to
http-fetch, but actually builds a fake remote just to
pass in the URL.
Instead, let's just add a separate URL parameter to
http_init, and all three callsites can pass in the
appropriate information.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the HTTP connection is broken in the middle of a fetch or clone
body, the client presented a useless error message due to part of
the upload-pack->remote-curl pkt-line protocol leaking out of the
helper as the helper's "fetch result":
error: RPC failed; result=18, HTTP code = 200
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: unpack-objects failed
warning: https unexpectedly said: '0000'
Instead when the HTTP RPC fails discard all remaining data from
upload-pack and report nothing to the transport helper. Errors
were already sent to stderr.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit ffa69e61d3, reversing
changes made to 4a13c4d148.
Adding a new command line option to receive-pack and feed it from
send-pack is not an acceptable way to add features, as there is no
guarantee that your updated send-pack will be talking to updated
receive-pack. New features need to be added via the capability mechanism
negotiated over the protocol.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* cb/maint-quiet-push:
receive-pack: do not overstep command line argument array
propagate --quiet to send-pack/receive-pack
Conflicts:
Documentation/git-receive-pack.txt
Documentation/git-send-pack.txt
* cb/maint-quiet-push:
receive-pack: do not overstep command line argument array
propagate --quiet to send-pack/receive-pack
Conflicts:
Documentation/git-receive-pack.txt
Documentation/git-send-pack.txt
* jc/zlib-wrap:
zlib: allow feeding more than 4GB in one go
zlib: zlib can only process 4GB at a time
zlib: wrap deflateBound() too
zlib: wrap deflate side of the API
zlib: wrap inflateInit2 used to accept only for gzip format
zlib: wrap remaining calls to direct inflate/inflateEnd
zlib wrapper: refactor error message formatter
* sr/transport-helper-fix: (21 commits)
transport-helper: die early on encountering deleted refs
transport-helper: implement marks location as capability
transport-helper: Use capname for refspec capability too
transport-helper: change import semantics
transport-helper: update ref status after push with export
transport-helper: use the new done feature where possible
transport-helper: check status code of finish_command
transport-helper: factor out push_update_refs_status
fast-export: support done feature
fast-import: introduce 'done' command
git-remote-testgit: fix error handling
git-remote-testgit: only push for non-local repositories
remote-curl: accept empty line as terminator
remote-helpers: export GIT_DIR variable to helpers
git_remote_helpers: push all refs during a non-local export
transport-helper: don't feed bogus refs to export push
git-remote-testgit: import non-HEAD refs
t5800: document some non-functional parts of remote helpers
t5800: use skip_all instead of prereq
t5800: factor out some ref tests
...
Currently, git push --quiet produces some non-error output, e.g.:
$ git push --quiet
Unpacking objects: 100% (3/3), done.
Add the --quiet option to send-pack/receive-pack and pass it to
unpack-objects in the receive-pack codepath and to receive-pack in
the push codepath.
This fixes a bug reported for the fedora git package:
https://bugzilla.redhat.com/show_bug.cgi?id=725593
Reported-by: Jesse Keating <jkeating@redhat.com>
Cc: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
doc/fast-import: clarify notemodify command
Documentation: minor grammatical fix in rev-list-options.txt
Documentation: git-filter-branch honors replacement refs
remote-curl: Add a format check to parsing of info/refs
git-config: Remove extra whitespaces
When parsing info/refs, no checks were applied that the file was in
the requried format. Since the file is read from a remote webserver,
this isn't guarenteed to be true. Add a check that the file at least
only contains lines that consist of 40 characters followed by a tab
and then the ref name.
Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching an http URL, we first try fetching info/refs
with an extra "service" parameter. This will work for a
smart-http server, or a dumb server which ignores extra
parameters when fetching files. If that fails, we retry
without the extra parameter to remain compatible with dumb
servers which didn't like our first request.
If the server returned a "401 Unauthorized", indicating that
the credentials we provided were not good, there is not much
point in retrying. With the current code, we just waste an
extra round trip to the HTTP server before failing.
But as the http code becomes smarter about throwing away
rejected credentials and re-prompting the user for new ones
(which it will later in this series), this will become more
confusing. At some point we will stop asking for credentials
to retry smart http, and will be asking for credentials to
retry dumb http. So now we're not only wasting an extra HTTP
round trip for something that is unlikely to work, but we're
making the user re-type their password for it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This went unnoticed because the transport helper infrastructore did
not check the return value of the helper, nor did the helper print
anything before exiting.
While at it also make sure that the stream doesn't end unexpectedly.
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/zlib-wrap:
zlib: allow feeding more than 4GB in one go
zlib: zlib can only process 4GB at a time
zlib: wrap deflateBound() too
zlib: wrap deflate side of the API
zlib: wrap inflateInit2 used to accept only for gzip format
zlib: wrap remaining calls to direct inflate/inflateEnd
zlib wrapper: refactor error message formatter
Conflicts:
sha1_file.c
The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.
But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.
In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size. The worst part of this story is that we often pass around
z_stream (the state object used by zlib) to keep track of the number of
used bytes in input/output buffer by inspecting these two fields, which
practically limits our callchain to the same 4GB limit.
Wrap z_stream in another structure git_zstream that can express avail_in
and avail_out in unsigned long. For now, just die() when the caller gives
a size that cannot be given to a single zlib call. In later patches in the
series, we would make git_inflate() and git_deflate() internally loop to
give callers an illusion that our "improved" version of zlib interface can
operate on a buffer larger than 4GB in one go.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
of deflateInit2 in remote-curl.c to tell the library to use gzip header
and trailer in git_deflate_init_gzip().
There is only one caller that cares about the status from deflateEnd().
Introduce git_deflate_end_gently() to let that sole caller retrieve the
status and act on it (i.e. die) for now, but we would probably want to
make inflate_end/deflate_end die when they ran out of memory and get
rid of the _gently() kind.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Yes, these don't match perfectly with the void* first parameter of the
fread/fwrite in the standard library, but they do match the curl
expected method signature. This is needed when a refactor passes a
curl_write_callback around, which would otherwise give incorrect
parameter warnings.
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
libcurl may choose to try and use Expect: 100-continue for
any type of POST, not just a Transfer: chunked-encoding type.
Force it to disable this feature, as not all proxy servers support
100-continue and leaving it enabled can cause 1 second stalls during
the negotiation phase of fetch-pack/upload-pack.
In ("206b099d26 smart-http: Don't use Expect: 100-Continue") we
tried to disable this for only large POST bodies, but it should be
disabled for every POST body.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some HTTP/1.1 servers or proxies don't correctly implement the
100-Continue feature of HTTP/1.1. Its a difficult feature to
implement right, and isn't commonly used by browsers, so many
developers may not even be aware that their server (or proxy)
doesn't honor it.
Within the smart HTTP protocol for Git we only use this newer
"Expect: 100-Continue" feature to probe for missing authentication
before uploading a large payload like a pack file during push.
If authentication is necessary, we expect the server to send the
401 Not Authorized response before the bulk data transfer starts,
thus saving the client bandwidth during the retry.
A different method to probe for working authentication is to send an
empty command list (that is just "0000") to $URL/git-receive-pack.
or $URL/git-upload-pack. All versions of both receive-pack and
upload-pack since the introduction of smart HTTP in Git 1.6.6
cleanly accept just a flush-pkt under --stateless-rpc mode, and
exit with success.
If HTTP level authentication is successful, the backend will return
an empty response, but with HTTP status code 200. This enables
the client to continue with the transfer.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the remote HTTP server fails (e.g. returns 404 or 500) when we
posted the RPC to it, we won't have sent anything to the background
Git process that is supposed to handle the stream. Because we
didn't send anything, its waiting for input from remote-curl, and
remote-curl cannot read its response payload because doing so would
lead to a deadlock.
Send the background task EOF on its input before we try to read
its response back, that way it will break out of its read loop
and terminate.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rc/maint-curl-helper:
remote-curl: ensure that URLs have a trailing slash
http: make end_url_with_slash() public
t5541-http-push: add test for URLs with trailing slash
Conflicts:
remote-curl.c
Previously, we blindly assumed that URLs passed to the remote-curl
helper did not end with a trailing slash.
Use the convenience function end_url_with_slash() from http.[ch] to
ensure that URLs have a trailing slash on invocation of the remote-curl
helper, and use the URL as one with a trailing slash throughout.
It is possible for users to pass a URL with a trailing slash to
remote-curl, by, say, setting it in remote.<name>.url in their git
config. The resulting requests have an empty path component (//) and may
break implementations of the http git protocol.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an HTTP request returns a 401, Git will currently fail with a
confusing message saying that it got a 401, which is not very
descriptive.
Currently if a user wants to use Git over HTTP, they have to use one
URL with the username in the URL (e.g. "http://user@host.com/repo.git")
for write access and another without the username for unauthenticated
read access (unless they want to be prompted for the password each
time). However, since the HTTP servers will return a 401 if an action
requires authentication, we can prompt for username and password if we
see this, allowing us to use a single URL for both purposes.
This patch changes http_request to prompt for the username and password,
then return HTTP_REAUTH so http_get_strbuf can try again. If it gets
a 401 even when a user/pass is supplied, http_request will now return
HTTP_NOAUTH which remote_curl can then use to display a more
intelligent error message that is less confusing.
Signed-off-by: Scott Chacon <schacon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tc/http-cleanup:
remote-curl: init walker only when needed
remote-curl: use http_fetch_ref() instead of walker wrapper
http: init and cleanup separately from http-walker
http-walker: cleanup more thoroughly
http-push: remove "|| 1" to enable verbose check
t554[01]-http-push: refactor, add non-ff tests
t5541-http-push: check that ref is unchanged for non-ff test
* sp/maint-push-sideband:
receive-pack: Send internal errors over side-band #2
t5401: Use a bare repository for the remote peer
receive-pack: Send hook output over side band #2
receive-pack: Wrap status reports inside side-band-64k
receive-pack: Refactor how capabilities are shown to the client
send-pack: demultiplex a sideband stream with status data
run-command: support custom fd-set in async
run-command: Allow stderr to be a caller supplied pipe
Conflicts:
builtin-receive-pack.c
run-command.c
t/t5401-update-hooks.sh
Invoke get_http_walker() only when fetching with the dumb protocol.
Additionally, add an invocation to walker_free() after we're done using
the walker.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The http-walker implementation of walker->fetch_ref() doesn't do
anything special compared to http_fetch_ref() anyway.
Remove init_walker() invocation before fetching the ref, since we aren't
using the walker wrapper and don't need a walker instance anymore.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, all our http operations were done with http-walker. With the
new remote-curl helper, we find ourselves using http methods outside of
http-walker - for example, fetching info/refs.
Accomodate this by separating http_init() and http_cleanup() invocations
from http-walker.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sp/maint-push-sideband:
receive-pack: Send hook output over side band #2
receive-pack: Wrap status reports inside side-band-64k
receive-pack: Refactor how capabilities are shown to the client
send-pack: demultiplex a sideband stream with status data
run-command: support custom fd-set in async
run-command: Allow stderr to be a caller supplied pipe
Update git fsck --full short description to mention packs
Conflicts:
run-command.c
This patch adds the possibility to supply a set of non-0 file
descriptors for async process communication instead of the
default-created pipe.
Additionally, we now support bi-directional communiction with the
async procedure, by giving the async function both read and write
file descriptors.
To retain compatiblity and similar "API feel" with start_command,
we require start_async callers to set .out = -1 to get a readable
file descriptor. If either of .in or .out is 0, we supply no file
descriptor to the async process.
[sp: Note: Erik started this patch, and a huge bulk of it is
his work. All bugs were introduced later by Shawn.]
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "info/refs" is a static file and not behind a CGI handler, some
servers may not handle a GET request for it with a query string
appended (eg. "?foo=bar") properly.
If such a request fails, retry it sans the query string. In addition,
ensure that the "smart" http protocol is not used (a service has to be
specified with "?service=<service name>" to be conformant).
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Reported-and-tested-by: Yaroslav Halchenko <debian@onerussian.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/symbol-static:
date.c: mark file-local function static
Replace parse_blob() with an explanatory comment
symlinks.c: remove unused functions
object.c: remove unused functions
strbuf.c: remove unused function
sha1_file.c: remove unused function
mailmap.c: remove unused function
utf8.c: mark file-local function static
submodule.c: mark file-local function static
quote.c: mark file-local function static
remote-curl.c: mark file-local function static
read-cache.c: mark file-local functions static
parse-options.c: mark file-local function static
entry.c: mark file-local function static
http.c: mark file-local functions static
pretty.c: mark file-local function static
builtin-rev-list.c: mark file-local function static
bisect.c: mark file-local function static
* maint:
remote-curl: Fix Accept header for smart HTTP connections
grep: -L should show empty files
rebase--interactive: Ignore comments and blank lines in peek_next_command
We actually expect to see an application/x-git-upload-pack-result
but we lied and said we Accept *-response. This was a typo on my
part when I was writing the code.
Fortunately the wrong Accept header had no real impact, as the
deployed git-http-backend servers were not testing the Accept
header before they returned their content.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using multi-pass authentication methods, the curl library may need
to rewind the read buffers used for providing data to HTTP POST, if data
has been output before a 401 error is received.
This is needed only when the first request (when the multi-pass
authentication method isn't initialized and hasn't received its challenge
yet) for a certain curl session is a chunked HTTP POST.
As long as the current rpc read buffer is the first one, we're able to
rewind without need for additional buffering.
The curl library currently starts sending data without waiting for a
response to the Expect: 100-continue header, due to a bug in curl that
exists up to curl version 7.19.7.
If the HTTP server doesn't handle Expect: 100-continue headers properly
(e.g. Lighttpd), the library has to start sending data without knowing
if the request will be successfully authenticated. In this case, this
rewinding solution is not sufficient - the whole request will be sent
before the 401 error is received.
Signed-off-by: Martin Storsjo <martin@martin.st>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the extraneous semicolon (';') at the end of the if statement
that allowed the code in its block to execute regardless of the
condition.
This fixes pushing to a smart http backend with chunked encoding.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This works around a bug in curl versions up to 7.19.4, where disabling the
CURLOPT_NOBODY option sets the internal state incorrectly considering that
CURLOPT_PUT was enabled earlier.
The bug is discussed at http://curl.haxx.se/bug/view.cgi?id=2727981 and is
corrected in the latest version of curl in CVS.
This bug usually has no impact on git, but may surface if using multi-pass
authentication methods.
Signed-off-by: Martin Storsjo <martin@martin.st>
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sp/smart-http: (37 commits)
http-backend: Let gcc check the format of more printf-type functions.
http-backend: Fix access beyond end of string.
http-backend: Fix bad treatment of uintmax_t in Content-Length
t5551-http-fetch: Work around broken Accept header in libcurl
t5551-http-fetch: Work around some libcurl versions
http-backend: Protect GIT_PROJECT_ROOT from /../ requests
Git-aware CGI to provide dumb HTTP transport
http-backend: Test configuration options
http-backend: Use http.getanyfile to disable dumb HTTP serving
test smart http fetch and push
http tests: use /dumb/ URL prefix
set httpd port before sourcing lib-httpd
t5540-http-push: remove redundant fetches
Smart HTTP fetch: gzip requests
Smart fetch over HTTP: client side
Smart push over HTTP: client side
Discover refs via smart HTTP server when available
http-backend: more explict LocationMatch
http-backend: add example for gitweb on same URL
http-backend: use mod_alias instead of mod_rewrite
...
Conflicts:
.gitignore
remote-curl.c
The git-remote-curl backend detects if the remote server supports
the git-upload-pack service, and if so, runs git-fetch-pack locally
in a pipe to generate the want/have commands.
The advertisements from the server that were obtained during the
discovery are passed into git-fetch-pack before the POST request
starts, permitting server capability discovery and enablement.
Common objects that are discovered are appended onto the request as
have lines and are sent again on the next request. This allows the
remote side to reinitialize its in-memory list of common objects
during the next request.
Because all requests are relatively short, below git-remote-curl's
1 MiB buffer limit, requests will use the standard Content-Length
header and be valid HTTP/1.0 POST requests. This makes the fetch
client more tolerant of proxy servers which don't support HTTP/1.1
or the chunked transfer encoding.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-remote-curl backend detects if the remote server supports
the git-receive-pack service, and if so, runs git-send-pack in a
pipe to dump the command and pack data as a single POST request.
The advertisements from the server that were obtained during the
discovery are passed into git-send-pack before the POST request
starts. This permits git-send-pack to operate largely unmodified.
For smaller packs (those under 1 MiB) a HTTP/1.0 POST with a
Content-Length is used, permitting interaction with any server.
The 1 MiB limit is arbitrary, but is sufficent to fit most deltas
created by human authors against text sources with the occasional
small binary file (e.g. few KiB icon image). The configuration
option http.postBuffer can be used to increase (or shink) this
buffer if the default is not sufficient.
For larger packs which cannot be spooled entirely into the helper's
memory space (due to http.postBuffer being too small), the POST
request requires HTTP/1.1 and sets "Transfer-Encoding: chunked".
This permits the client to upload an unknown amount of data in one
HTTP transaction without needing to pregenerate the entire pack
file locally.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The upload-pack requests are mostly plain text and they compress
rather well. Deflating them with Content-Encoding: gzip can easily
drop the size of the request by 50%, reducing the amount of data
to transfer as we negotiate the common commits.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of loading the cached info/refs, try to use the smart HTTP
version when the server supports it. Since the smart variant is
actually the pkt-line stream from the start of either upload-pack
or receive-pack we need to parse these through get_remote_heads,
which requires a background thread to feed its pipe.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's okay to use the curl helper without a local repository, so long
as you don't use "fetch". There aren't any git programs that would try
to use it, and it doesn't make sense to try it (since there's nowhere
to write the results), but we may as well be clear.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The remote helper interface now supports the push capability,
which can be used to ask the implementation to push one or more
specs to the remote repository. For remote-curl we implement this
by calling the existing WebDAV based git-http-push executable.
Internally the helper interface uses the push_refs transport hook
so that the complexity of the refspec parsing and matching can be
reused between remote implementations. When possible however the
helper protocol uses source ref name rather than the source SHA-1,
thereby allowing the helper to access this name if it is useful.
>From Clemens Buchacher <drizzd@aon.at>:
update http tests according to remote-curl capabilities
o Pushing packed refs is now fixed.
o The transport helper fails if refs are already up-to-date. Add
a test for that.
o The transport helper will notice if refs are already
up-to-date. We therefore need to update server info in the
unpacked-refs test.
o The transport helper will purge deleted branches automatically.
o Use a variable ($ORIG_HEAD) instead of full SHA-1 name.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Daniel Barkalow <barkalow@iabervon.org>
CC: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some transports, like the native pack transport implemented by
fetch-pack, support useful features like depth or include tags.
These should be exposed if the underlying helper knows how to
use them.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some network protocols (e.g. native git://) are able to fetch more
than one ref at a time and reduce the overall transfer cost by
combining the requests into a single exchange. Instead of feeding
each fetch request one at a time to the helper, feed all of them
at once so the helper can decide whether or not it should batch them.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will need the walker, url and remote in other functions as the
code grows larger to support smart HTTP. Extract this out into a
set of globals we can easily reference once configured.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we use -b <branch>, we may checkout something else than what the
remote's HEAD references, but we still used remote_head to supply the
new ref value to the post-checkout hook, which is wrong.
So instead of using remote_head to find the value to be passed to the
post-checkout hook, we have to use our_head_points_at, which is always
correctly setup, even if -b is not used.
This also fixes a segfault when "clone -b <branch>" is used with a
remote repo that doesn't have a valid HEAD, as in such a case
remote_head is NULL, but we still tried to access it.
Reported-by: Devin Cofer <ranguvar@archlinux.us>
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All programs, in particular also the stand-alone programs (non-builtins)
must call git_extract_argv0_path(argv[0]) in order to help builds that
derive the installation prefix at runtime, such as the MinGW build.
Without this call, the program segfaults (or raises an assertion
failure).
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Tested-by: Michael Wookey <michaelwookey@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the transport native helper mechanism to fetch by http (and ftp, etc).
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>