Communication between the HTTP server and http_backend process can
lead to a dead-lock when relaying a large ref negotiation request.
Diagnose the situation better, and mitigate it by reading such a
request first into core (to a reasonable limit).
* jk/http-backend-deadlock:
http-backend: spool ref negotiation requests to buffer
t5551: factor out tag creation
http-backend: fix die recursion with custom handler
When http-backend spawns "upload-pack" to do ref
negotiation, it streams the http request body to
upload-pack, who then streams the http response back to the
client as it reads. In theory, git can go full-duplex; the
client can consume our response while it is still sending
the request. In practice, however, HTTP is a half-duplex
protocol. Even if our client is ready to read and write
simultaneously, we may have other HTTP infrastructure in the
way, including the webserver that spawns our CGI, or any
intermediate proxies.
In at least one documented case[1], this leads to deadlock
when trying a fetch over http. What happens is basically:
1. Apache proxies the request to the CGI, http-backend.
2. http-backend gzip-inflates the data and sends
the result to upload-pack.
3. upload-pack acts on the data and generates output over
the pipe back to Apache. Apache isn't reading because
it's busy writing (step 1).
This works fine most of the time, because the upload-pack
output ends up in a system pipe buffer, and Apache reads
it as soon as it finishes writing. But if both the request
and the response exceed the system pipe buffer size, then we
deadlock (Apache blocks writing to http-backend,
http-backend blocks writing to upload-pack, and upload-pack
blocks writing to Apache).
We need to break the deadlock by spooling either the input
or the output. In this case, it's ideal to spool the input,
because Apache does not start reading either stdout _or_
stderr until we have consumed all of the input. So until we
do so, we cannot even get an error message out to the
client.
The solution is fairly straight-forward: we read the request
body into an in-memory buffer in http-backend, freeing up
Apache, and then feed the data ourselves to upload-pack. But
there are a few important things to note:
1. We limit the in-memory buffer to prevent an obvious
denial-of-service attack. This is a new hard limit on
requests, but it's unlikely to come into play. The
default value is 10MB, which covers even the ridiculous
100,000-ref negotation in the included test (that
actually caps out just over 5MB). But it's configurable
on the off chance that you don't mind spending some
extra memory to make even ridiculous requests work.
2. We must take care only to buffer when we have to. For
pushes, the incoming packfile may be of arbitrary
size, and we should connect the input directly to
receive-pack. There's no deadlock problem here, though,
because we do not produce any output until the whole
packfile has been read.
For upload-pack's initial ref advertisement, we
similarly do not need to buffer. Even though we may
generate a lot of output, there is no request body at
all (i.e., it is a GET, not a POST).
[1] http://article.gmane.org/gmane.comp.version-control.git/269020
Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes sure that AsciiDoc does not turn them into links.
Regular AsciiDoc does not catch these cases, but AsciiDoctor
does treat them as links.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We decided at 48bb914e (doc: drop author/documentation sections from
most pages, 2011-03-11) to remove "author" and "documentation"
sections from our documentation. Remove a few stragglers.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When setting up a "half-auth" repository in which reads can
be done anonymously but writes require authentication, it is
best if the server can require authentication for both the
ref advertisement and the actual receive-pack POSTs. This
alleviates the need for the admin to set http.receivepack in
the repositories, and means that the client is challenged
for credentials immediately, instead of partway through the
push process (and git clients older than v1.7.11.7 had
trouble handling these challenges).
Since detecting a push during the ref advertisement requires
matching the query string, and this is non-trivial to do in
Apache, we have traditionally punted and instructed users to
just protect "/git-receive-pack$". This patch provides the
mod_rewrite recipe to actually match the ref advertisement,
which is preferred.
While we're at it, let's add the recipe to our test scripts
so that we can be sure that it works, and doesn't get broken
(either by our changes or by changes in Apache).
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The examples in the documentation are all for Apache. Let's
at least cover the basics: an anonymous server, an
authenticated server, and a "half auth" server with
anonymous read and authenticated write.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the http-backend is set up to allow anonymous read but
authenticated write, the http-backend manual suggests
catching only the "/git-receive-pack" POST of the packfile,
not the initial "info/refs?service=git-receive-pack" GET in
which we advertise refs.
This does work and is secure, as we do not allow any write
during the info/refs request, and the information in the ref
advertisement is the same that you would get from a fetch.
However, the configuration required by the server is
slightly more complex. The default `http.receivepack`
setting is to allow pushes if the webserver tells us that
the user authenticated, and otherwise to return a 403
("Forbidden"). That works fine if authentication is turned
on completely; the initial request requires authentication,
and http-backend realizes it is OK to do a push.
But for this "half-auth" state, no authentication has
occurred during the initial ref advertisement. The
http-backend CGI therefore does not think that pushing
should be enabled, and responds with a 403. The client
cannot continue, even though the server would have allowed
it to run if it had provided credentials.
It would be much better if the server responded with a 401,
asking for credentials during the initial contact. But
git-http-backend does not know about the server's auth
configuration (so a 401 would be confusing in the case of a
true anonymous server). Unfortunately, configuring Apache to
recognize the query string and apply the auth appropriately
to receive-pack (but not upload-pack) initial requests is
non-trivial.
The site admin can work around this by just turning on
http.receivepack explicitly in its repositories. Let's
document this workaround.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document the namespace mechanism in a new gitnamespaces(7) page.
Reference it from receive-pack and upload-pack.
Document the new --namespace option and GIT_NAMESPACE environment
variable in git(1), and reference gitnamespaces(7).
Add a sample Apache configuration to http-backend(1) to support
namespaced repositories, and reference gitnamespaces(7).
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jamey Sharp <jamey@minilop.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the description of http.getanyfile, replace the vague "older Git
clients" with the earliest release whose client is able to use the
upload pack service.
Signed-off-by: Greg Bacon <gbacon@dbresearch.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* remotes/trast-doc/for-next:
Documentation: spell 'git cmd' without dash throughout
Documentation: format full commands in typewriter font
Documentation: warn prominently against merging with dirty trees
Documentation/git-merge: reword references to "remote" and "pull"
Conflicts:
Documentation/config.txt
Documentation/git-config.txt
Documentation/git-merge.txt
The documentation was quite inconsistent when spelling 'git cmd' if it
only refers to the program, not to some specific invocation syntax:
both 'git-cmd' and 'git cmd' spellings exist.
The current trend goes towards dashless forms, and there is precedent
in 647ac70 (git-svn.txt: stop using dash-form of commands.,
2009-07-07) to actively eliminate the dashed variants.
Replace 'git-cmd' with 'git cmd' throughout, except where git-shell,
git-cvsserver, git-upload-pack, git-receive-pack, and
git-upload-archive are concerned, because those really live in the
$PATH.
Similar to how git-daemon checks whether a repository is OK to be
exported, smart-http should also check. This check can be satisfied
in two different ways: the environmental variable GIT_HTTP_EXPORT_ALL
may be set to export all repositories, or the individual repository
may have the file git-daemon-export-ok.
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Tarmigan Casebolt <tarmigan+git@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some repository owners may wish to enable smart HTTP, but disallow
dumb content serving. Disallowing dumb serving might be because
the owners want to rely upon reachability to control which objects
clients may access from the repository, or they just want to
encourage clients to use the more bandwidth efficient transport.
If http.getanyfile is set to false the backend CGI will return with
'403 Forbidden' when an object file is accessed by a dumb client.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the git-http-backend examples, only match git-receive-pack within
/git/.
Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the git-http-backend documentation, add an example of how to set up
gitweb and git-http-backend on the same URL by using a series of
mod_alias commands.
Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the git-http-backend documentation, use mod_alias exlusively, instead
of using a combination of mod_alias and mod_rewrite. This makes the
example slightly shorted and a bit more clear.
Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clarify some of the git-http-backend documentation, particularly:
* In the Description, state that smart/dumb HTTP fetch and smart HTTP
push are supported, state that authenticated clients allow push, and
remove the note that this is only suited for read-only updates.
* At the start of Examples, state explicitly what URL is mapping to what
location on disk.
Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new environment variable, GIT_PROJECT_ROOT, to override the
method of using PATH_TRANSLATED to find the git repository on disk.
This makes it much easier to configure the web server, especially when
the web server's DocumentRoot does not contain the git repositories,
which is the usual case.
Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Requests for $GIT_URL/git-receive-pack and $GIT_URL/git-upload-pack
are forwarded to the corresponding backend process by directly
executing it and leaving stdin and stdout connected to the invoking
web server. Prior to starting the backend process the HTTP response
headers are sent, thereby freeing the backend from needing to know
about the HTTP protocol.
Requests that are encoded with Content-Encoding: gzip are
automatically inflated before being streamed into the backend.
This is primarily useful for the git-upload-pack backend, which
receives highly repetitive text data from clients that easily
compresses to 50% of its original size.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-http-backend CGI can be configured into any Apache server
using ScriptAlias, such as with the following configuration:
LoadModule cgi_module /usr/libexec/apache2/mod_cgi.so
LoadModule alias_module /usr/libexec/apache2/mod_alias.so
ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/
Repositories are accessed via the translated PATH_INFO.
The CGI is backwards compatible with the dumb client, allowing all
older HTTP clients to continue to download repositories which are
managed by the CGI.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>