1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-11-01 06:47:52 +01:00
git/http-backend.c

704 lines
16 KiB
C
Raw Normal View History

#include "cache.h"
#include "refs.h"
#include "pkt-line.h"
#include "object.h"
#include "tag.h"
#include "exec_cmd.h"
#include "run-command.h"
#include "string-list.h"
#include "url.h"
#include "argv-array.h"
static const char content_type[] = "Content-Type";
static const char content_length[] = "Content-Length";
static const char last_modified[] = "Last-Modified";
static int getanyfile = 1;
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
static unsigned long max_request_buffer = 10 * 1024 * 1024;
static struct string_list *query_params;
struct rpc_service {
const char *name;
const char *config_name;
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
unsigned buffer_input : 1;
signed enabled : 2;
};
static struct rpc_service rpc_service[] = {
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
{ "upload-pack", "uploadpack", 1, 1 },
{ "receive-pack", "receivepack", 0, -1 },
};
static struct string_list *get_parameters(void)
{
if (!query_params) {
const char *query = getenv("QUERY_STRING");
query_params = xcalloc(1, sizeof(*query_params));
while (query && *query) {
char *name = url_decode_parameter_name(&query);
char *value = url_decode_parameter_value(&query);
struct string_list_item *i;
i = string_list_lookup(query_params, name);
if (!i)
i = string_list_insert(query_params, name);
else
free(i->util);
i->util = value;
}
}
return query_params;
}
static const char *get_parameter(const char *name)
{
struct string_list_item *i;
i = string_list_lookup(get_parameters(), name);
return i ? i->util : NULL;
}
__attribute__((format (printf, 2, 3)))
static void format_write(int fd, const char *fmt, ...)
{
static char buffer[1024];
va_list args;
unsigned n;
va_start(args, fmt);
n = vsnprintf(buffer, sizeof(buffer), fmt, args);
va_end(args);
if (n >= sizeof(buffer))
die("protocol error: impossibly long line");
write_or_die(fd, buffer, n);
}
static void http_status(unsigned code, const char *msg)
{
format_write(1, "Status: %u %s\r\n", code, msg);
}
static void hdr_str(const char *name, const char *value)
{
format_write(1, "%s: %s\r\n", name, value);
}
static void hdr_int(const char *name, uintmax_t value)
{
format_write(1, "%s: %" PRIuMAX "\r\n", name, value);
}
static void hdr_date(const char *name, unsigned long when)
{
convert "enum date_mode" into a struct In preparation for adding date modes that may carry extra information beyond the mode itself, this patch converts the date_mode enum into a struct. Most of the conversion is fairly straightforward; we pass the struct as a pointer and dereference the type field where necessary. Locations that declare a date_mode can use a "{}" constructor. However, the tricky case is where we use the enum labels as constants, like: show_date(t, tz, DATE_NORMAL); Ideally we could say: show_date(t, tz, &{ DATE_NORMAL }); but of course C does not allow that. Likewise, we cannot cast the constant to a struct, because we need to pass an actual address. Our options are basically: 1. Manually add a "struct date_mode d = { DATE_NORMAL }" definition to each caller, and pass "&d". This makes the callers uglier, because they sometimes do not even have their own scope (e.g., they are inside a switch statement). 2. Provide a pre-made global "date_normal" struct that can be passed by address. We'd also need "date_rfc2822", "date_iso8601", and so forth. But at least the ugliness is defined in one place. 3. Provide a wrapper that generates the correct struct on the fly. The big downside is that we end up pointing to a single global, which makes our wrapper non-reentrant. But show_date is already not reentrant, so it does not matter. This patch implements 3, along with a minor macro to keep the size of the callers sane. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-25 18:55:02 +02:00
const char *value = show_date(when, 0, DATE_MODE(RFC2822));
hdr_str(name, value);
}
static void hdr_nocache(void)
{
hdr_str("Expires", "Fri, 01 Jan 1980 00:00:00 GMT");
hdr_str("Pragma", "no-cache");
hdr_str("Cache-Control", "no-cache, max-age=0, must-revalidate");
}
static void hdr_cache_forever(void)
{
unsigned long now = time(NULL);
hdr_date("Date", now);
hdr_date("Expires", now + 31536000);
hdr_str("Cache-Control", "public, max-age=31536000");
}
static void end_headers(void)
{
write_or_die(1, "\r\n", 2);
}
__attribute__((format (printf, 1, 2)))
static NORETURN void not_found(const char *err, ...)
{
va_list params;
http_status(404, "Not Found");
hdr_nocache();
end_headers();
va_start(params, err);
if (err && *err)
vfprintf(stderr, err, params);
va_end(params);
exit(0);
}
__attribute__((format (printf, 1, 2)))
static NORETURN void forbidden(const char *err, ...)
{
va_list params;
http_status(403, "Forbidden");
hdr_nocache();
end_headers();
va_start(params, err);
if (err && *err)
vfprintf(stderr, err, params);
va_end(params);
exit(0);
}
static void select_getanyfile(void)
{
if (!getanyfile)
forbidden("Unsupported service: getanyfile");
}
static void send_strbuf(const char *type, struct strbuf *buf)
{
hdr_int(content_length, buf->len);
hdr_str(content_type, type);
end_headers();
write_or_die(1, buf->buf, buf->len);
}
static void send_local_file(const char *the_type, const char *name)
{
char *p = git_pathdup("%s", name);
size_t buf_alloc = 8192;
char *buf = xmalloc(buf_alloc);
int fd;
struct stat sb;
fd = open(p, O_RDONLY);
if (fd < 0)
not_found("Cannot open '%s': %s", p, strerror(errno));
if (fstat(fd, &sb) < 0)
die_errno("Cannot stat '%s'", p);
hdr_int(content_length, sb.st_size);
hdr_str(content_type, the_type);
hdr_date(last_modified, sb.st_mtime);
end_headers();
for (;;) {
ssize_t n = xread(fd, buf, buf_alloc);
if (n < 0)
die_errno("Cannot read '%s'", p);
if (!n)
break;
write_or_die(1, buf, n);
}
close(fd);
free(buf);
free(p);
}
static void get_text_file(char *name)
{
select_getanyfile();
hdr_nocache();
send_local_file("text/plain", name);
}
static void get_loose_object(char *name)
{
select_getanyfile();
hdr_cache_forever();
send_local_file("application/x-git-loose-object", name);
}
static void get_pack_file(char *name)
{
select_getanyfile();
hdr_cache_forever();
send_local_file("application/x-git-packed-objects", name);
}
static void get_idx_file(char *name)
{
select_getanyfile();
hdr_cache_forever();
send_local_file("application/x-git-packed-objects-toc", name);
}
static void http_config(void)
{
int i, value = 0;
struct strbuf var = STRBUF_INIT;
git_config_get_bool("http.getanyfile", &getanyfile);
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
git_config_get_ulong("http.maxrequestbuffer", &max_request_buffer);
for (i = 0; i < ARRAY_SIZE(rpc_service); i++) {
struct rpc_service *svc = &rpc_service[i];
strbuf_addf(&var, "http.%s", svc->config_name);
if (!git_config_get_bool(var.buf, &value))
svc->enabled = value;
strbuf_reset(&var);
}
strbuf_release(&var);
}
static struct rpc_service *select_service(const char *name)
{
const char *svc_name;
struct rpc_service *svc = NULL;
int i;
if (!skip_prefix(name, "git-", &svc_name))
forbidden("Unsupported service: '%s'", name);
for (i = 0; i < ARRAY_SIZE(rpc_service); i++) {
struct rpc_service *s = &rpc_service[i];
if (!strcmp(s->name, svc_name)) {
svc = s;
break;
}
}
if (!svc)
forbidden("Unsupported service: '%s'", name);
if (svc->enabled < 0) {
const char *user = getenv("REMOTE_USER");
svc->enabled = (user && *user) ? 1 : 0;
}
if (!svc->enabled)
forbidden("Service not enabled: '%s'", svc->name);
return svc;
}
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
/*
* This is basically strbuf_read(), except that if we
* hit max_request_buffer we die (we'd rather reject a
* maliciously large request than chew up infinite memory).
*/
static ssize_t read_request(int fd, unsigned char **out)
{
size_t len = 0, alloc = 8192;
unsigned char *buf = xmalloc(alloc);
if (max_request_buffer < alloc)
max_request_buffer = alloc;
while (1) {
ssize_t cnt;
cnt = read_in_full(fd, buf + len, alloc - len);
if (cnt < 0) {
free(buf);
return -1;
}
/* partial read from read_in_full means we hit EOF */
len += cnt;
if (len < alloc) {
*out = buf;
return len;
}
/* otherwise, grow and try again (if we can) */
if (alloc == max_request_buffer)
die("request was larger than our maximum size (%lu);"
" try setting GIT_HTTP_MAX_REQUEST_BUFFER",
max_request_buffer);
alloc = alloc_nr(alloc);
if (alloc > max_request_buffer)
alloc = max_request_buffer;
REALLOC_ARRAY(buf, alloc);
}
}
static void inflate_request(const char *prog_name, int out, int buffer_input)
{
2011-06-10 20:52:15 +02:00
git_zstream stream;
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
unsigned char *full_request = NULL;
unsigned char in_buf[8192];
unsigned char out_buf[8192];
unsigned long cnt = 0;
memset(&stream, 0, sizeof(stream));
git_inflate_init_gzip_only(&stream);
while (1) {
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
ssize_t n;
if (buffer_input) {
if (full_request)
n = 0; /* nothing left to read */
else
n = read_request(0, &full_request);
stream.next_in = full_request;
} else {
n = xread(0, in_buf, sizeof(in_buf));
stream.next_in = in_buf;
}
if (n <= 0)
die("request ended in the middle of the gzip stream");
stream.avail_in = n;
while (0 < stream.avail_in) {
int ret;
stream.next_out = out_buf;
stream.avail_out = sizeof(out_buf);
ret = git_inflate(&stream, Z_NO_FLUSH);
if (ret != Z_OK && ret != Z_STREAM_END)
die("zlib error inflating request, result %d", ret);
n = stream.total_out - cnt;
if (write_in_full(out, out_buf, n) != n)
die("%s aborted reading request", prog_name);
cnt += n;
if (ret == Z_STREAM_END)
goto done;
}
}
done:
git_inflate_end(&stream);
close(out);
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
free(full_request);
}
static void copy_request(const char *prog_name, int out)
{
unsigned char *buf;
ssize_t n = read_request(0, &buf);
if (n < 0)
die_errno("error reading request body");
if (write_in_full(out, buf, n) != n)
die("%s aborted reading request", prog_name);
close(out);
free(buf);
}
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
static void run_service(const char **argv, int buffer_input)
{
const char *encoding = getenv("HTTP_CONTENT_ENCODING");
const char *user = getenv("REMOTE_USER");
const char *host = getenv("REMOTE_ADDR");
int gzipped_request = 0;
struct child_process cld = CHILD_PROCESS_INIT;
if (encoding && !strcmp(encoding, "gzip"))
gzipped_request = 1;
else if (encoding && !strcmp(encoding, "x-gzip"))
gzipped_request = 1;
if (!user || !*user)
user = "anonymous";
if (!host || !*host)
host = "(none)";
if (!getenv("GIT_COMMITTER_NAME"))
argv_array_pushf(&cld.env_array, "GIT_COMMITTER_NAME=%s", user);
if (!getenv("GIT_COMMITTER_EMAIL"))
argv_array_pushf(&cld.env_array,
"GIT_COMMITTER_EMAIL=%s@http.%s", user, host);
cld.argv = argv;
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
if (buffer_input || gzipped_request)
cld.in = -1;
cld.git_cmd = 1;
if (start_command(&cld))
exit(1);
close(1);
if (gzipped_request)
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
inflate_request(argv[0], cld.in, buffer_input);
else if (buffer_input)
copy_request(argv[0], cld.in);
else
close(0);
if (finish_command(&cld))
exit(1);
}
static int show_text_ref(const char *name, const struct object_id *oid,
int flag, void *cb_data)
{
const char *name_nons = strip_namespace(name);
struct strbuf *buf = cb_data;
struct object *o = parse_object(oid->hash);
if (!o)
return 0;
strbuf_addf(buf, "%s\t%s\n", oid_to_hex(oid), name_nons);
if (o->type == OBJ_TAG) {
o = deref_tag(o, name, 0);
if (!o)
return 0;
strbuf_addf(buf, "%s\t%s^{}\n", sha1_to_hex(o->sha1),
name_nons);
}
return 0;
}
static void get_info_refs(char *arg)
{
const char *service_name = get_parameter("service");
struct strbuf buf = STRBUF_INIT;
hdr_nocache();
if (service_name) {
const char *argv[] = {NULL /* service name */,
"--stateless-rpc", "--advertise-refs",
".", NULL};
struct rpc_service *svc = select_service(service_name);
strbuf_addf(&buf, "application/x-git-%s-advertisement",
svc->name);
hdr_str(content_type, buf.buf);
end_headers();
packet_write(1, "# service=git-%s\n", svc->name);
packet_flush(1);
argv[0] = svc->name;
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
run_service(argv, 0);
} else {
select_getanyfile();
for_each_namespaced_ref(show_text_ref, &buf);
send_strbuf("text/plain", &buf);
}
strbuf_release(&buf);
}
static int show_head_ref(const char *refname, const struct object_id *oid,
int flag, void *cb_data)
{
struct strbuf *buf = cb_data;
if (flag & REF_ISSYMREF) {
struct object_id unused;
const char *target = resolve_ref_unsafe(refname,
RESOLVE_REF_READING,
unused.hash, NULL);
const char *target_nons = strip_namespace(target);
strbuf_addf(buf, "ref: %s\n", target_nons);
} else {
strbuf_addf(buf, "%s\n", oid_to_hex(oid));
}
return 0;
}
static void get_head(char *arg)
{
struct strbuf buf = STRBUF_INIT;
select_getanyfile();
head_ref_namespaced(show_head_ref, &buf);
send_strbuf("text/plain", &buf);
strbuf_release(&buf);
}
static void get_info_packs(char *arg)
{
size_t objdirlen = strlen(get_object_directory());
struct strbuf buf = STRBUF_INIT;
struct packed_git *p;
size_t cnt = 0;
select_getanyfile();
prepare_packed_git();
for (p = packed_git; p; p = p->next) {
if (p->pack_local)
cnt++;
}
strbuf_grow(&buf, cnt * 53 + 2);
for (p = packed_git; p; p = p->next) {
if (p->pack_local)
strbuf_addf(&buf, "P %s\n", p->pack_name + objdirlen + 6);
}
strbuf_addch(&buf, '\n');
hdr_nocache();
send_strbuf("text/plain; charset=utf-8", &buf);
strbuf_release(&buf);
}
static void check_content_type(const char *accepted_type)
{
const char *actual_type = getenv("CONTENT_TYPE");
if (!actual_type)
actual_type = "";
if (strcmp(actual_type, accepted_type)) {
http_status(415, "Unsupported Media Type");
hdr_nocache();
end_headers();
format_write(1,
"Expected POST with Content-Type '%s',"
" but received '%s' instead.\n",
accepted_type, actual_type);
exit(0);
}
}
static void service_rpc(char *service_name)
{
const char *argv[] = {NULL, "--stateless-rpc", ".", NULL};
struct rpc_service *svc = select_service(service_name);
struct strbuf buf = STRBUF_INIT;
strbuf_reset(&buf);
strbuf_addf(&buf, "application/x-git-%s-request", svc->name);
check_content_type(buf.buf);
hdr_nocache();
strbuf_reset(&buf);
strbuf_addf(&buf, "application/x-git-%s-result", svc->name);
hdr_str(content_type, buf.buf);
end_headers();
argv[0] = svc->name;
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
run_service(argv, svc->buffer_input);
strbuf_release(&buf);
}
2015-05-15 08:29:27 +02:00
static int dead;
static NORETURN void die_webcgi(const char *err, va_list params)
{
2015-05-15 08:29:27 +02:00
if (dead <= 1) {
vreportf("fatal: ", err, params);
http_status(500, "Internal Server Error");
hdr_nocache();
end_headers();
}
exit(0); /* we successfully reported a failure ;-) */
}
2015-05-15 08:29:27 +02:00
static int die_webcgi_recursing(void)
{
return dead++ > 1;
}
static char* getdir(void)
{
struct strbuf buf = STRBUF_INIT;
char *pathinfo = getenv("PATH_INFO");
char *root = getenv("GIT_PROJECT_ROOT");
char *path = getenv("PATH_TRANSLATED");
if (root && *root) {
if (!pathinfo || !*pathinfo)
die("GIT_PROJECT_ROOT is set but PATH_INFO is not");
if (daemon_avoid_alias(pathinfo))
die("'%s': aliased", pathinfo);
end_url_with_slash(&buf, root);
if (pathinfo[0] == '/')
pathinfo++;
strbuf_addstr(&buf, pathinfo);
return strbuf_detach(&buf, NULL);
} else if (path && *path) {
return xstrdup(path);
} else
die("No GIT_PROJECT_ROOT or PATH_TRANSLATED from server");
return NULL;
}
static struct service_cmd {
const char *method;
const char *pattern;
void (*imp)(char *);
} services[] = {
{"GET", "/HEAD$", get_head},
{"GET", "/info/refs$", get_info_refs},
{"GET", "/objects/info/alternates$", get_text_file},
{"GET", "/objects/info/http-alternates$", get_text_file},
{"GET", "/objects/info/packs$", get_info_packs},
{"GET", "/objects/[0-9a-f]{2}/[0-9a-f]{38}$", get_loose_object},
{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.pack$", get_pack_file},
{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file},
{"POST", "/git-upload-pack$", service_rpc},
{"POST", "/git-receive-pack$", service_rpc}
};
int main(int argc, char **argv)
{
char *method = getenv("REQUEST_METHOD");
char *dir;
struct service_cmd *cmd = NULL;
char *cmd_arg = NULL;
int i;
i18n: add infrastructure for translating Git with gettext Change the skeleton implementation of i18n in Git to one that can show localized strings to users for our C, Shell and Perl programs using either GNU libintl or the Solaris gettext implementation. This new internationalization support is enabled by default. If gettext isn't available, or if Git is compiled with NO_GETTEXT=YesPlease, Git falls back on its current behavior of showing interface messages in English. When using the autoconf script we'll auto-detect if the gettext libraries are installed and act appropriately. This change is somewhat large because as well as adding a C, Shell and Perl i18n interface we're adding a lot of tests for them, and for those tests to work we need a skeleton PO file to actually test translations. A minimal Icelandic translation is included for this purpose. Icelandic includes multi-byte characters which makes it easy to test various edge cases, and it's a language I happen to understand. The rest of the commit message goes into detail about various sub-parts of this commit. = Installation Gettext .mo files will be installed and looked for in the standard $(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to override that, but that's only intended to be used to test Git itself. = Perl Perl code that's to be localized should use the new Git::I18n module. It imports a __ function into the caller's package by default. Instead of using the high level Locale::TextDomain interface I've opted to use the low-level (equivalent to the C interface) Locale::Messages module, which Locale::TextDomain itself uses. Locale::TextDomain does a lot of redundant work we don't need, and some of it would potentially introduce bugs. It tries to set the $TEXTDOMAIN based on package of the caller, and has its own hardcoded paths where it'll search for messages. I found it easier just to completely avoid it rather than try to circumvent its behavior. In any case, this is an issue wholly internal Git::I18N. Its guts can be changed later if that's deemed necessary. See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for a further elaboration on this topic. = Shell Shell code that's to be localized should use the git-sh-i18n library. It's basically just a wrapper for the system's gettext.sh. If gettext.sh isn't available we'll fall back on gettext(1) if it's available. The latter is available without the former on Solaris, which has its own non-GNU gettext implementation. We also need to emulate eval_gettext() there. If neither are present we'll use a dumb printf(1) fall-through wrapper. = About libcharset.h and langinfo.h We use libcharset to query the character set of the current locale if it's available. I.e. we'll use it instead of nl_langinfo if HAVE_LIBCHARSET_H is set. The GNU gettext manual recommends using langinfo.h's nl_langinfo(CODESET) to acquire the current character set, but on systems that have libcharset.h's locale_charset() using the latter is either saner, or the only option on those systems. GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either, but MinGW and some others need to use libcharset.h's locale_charset() instead. =Credits This patch is based on work by Jeff Epler <jepler@unpythonic.net> who did the initial Makefile / C work, and a lot of comments from the Git mailing list, including Jonathan Nieder, Jakub Narebski, Johannes Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and others. [jc: squashed a small Makefile fix from Ramsay] Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 00:14:42 +01:00
git_setup_gettext();
git_extract_argv0_path(argv[0]);
set_die_routine(die_webcgi);
2015-05-15 08:29:27 +02:00
set_die_is_recursing_routine(die_webcgi_recursing);
if (!method)
die("No REQUEST_METHOD from server");
if (!strcmp(method, "HEAD"))
method = "GET";
dir = getdir();
for (i = 0; i < ARRAY_SIZE(services); i++) {
struct service_cmd *c = &services[i];
regex_t re;
regmatch_t out[1];
if (regcomp(&re, c->pattern, REG_EXTENDED))
die("Bogus regex in service table: %s", c->pattern);
if (!regexec(&re, dir, 1, out, 0)) {
size_t n;
if (strcmp(method, c->method)) {
const char *proto = getenv("SERVER_PROTOCOL");
if (proto && !strcmp(proto, "HTTP/1.1")) {
http_status(405, "Method Not Allowed");
hdr_str("Allow", !strcmp(c->method, "GET") ?
"GET, HEAD" : c->method);
} else
http_status(400, "Bad Request");
hdr_nocache();
end_headers();
return 0;
}
cmd = c;
n = out[0].rm_eo - out[0].rm_so;
cmd_arg = xmemdupz(dir + out[0].rm_so + 1, n - 1);
dir[out[0].rm_so] = 0;
break;
}
regfree(&re);
}
if (!cmd)
not_found("Request not supported: '%s'", dir);
setup_path();
if (!enter_repo(dir, 0))
not_found("Not a git repository: '%s'", dir);
if (!getenv("GIT_HTTP_EXPORT_ALL") &&
access("git-daemon-export-ok", F_OK) )
not_found("Repository not exported: '%s'", dir);
http_config();
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
max_request_buffer = git_env_ulong("GIT_HTTP_MAX_REQUEST_BUFFER",
max_request_buffer);
cmd->imp(cmd_arg);
return 0;
}