The smudge/clean filter API expect an external process is spawned
to filter the contents for each path that has a filter defined. A
new type of "process" filter API has been added to allow the first
request to run the filter for a path to spawn a single process, and
all filtering need is served by this single process for multiple
paths, reducing the process creation overhead.
* ls/filter-process:
contrib/long-running-filter: add long running filter example
convert: add filter.<driver>.process option
convert: prepare filter.<driver>.process option
convert: make apply_filter() adhere to standard Git error handling
pkt-line: add functions to read/write flush terminated packet streams
pkt-line: add packet_write_gently()
pkt-line: add packet_flush_gently()
pkt-line: add packet_write_fmt_gently()
pkt-line: extract set_packet_header()
pkt-line: rename packet_write() to packet_write_fmt()
run-command: add clean_on_exit_handler
run-command: move check_pipe() from write_or_die to run_command
convert: modernize tests
convert: quote filter names in error messages
"git daemon" used fixed-length buffers to turn URL to the
repository the client asked for into the server side directory
path, using snprintf() to avoid overflowing these buffers, but
allowed possibly truncated paths to the directory. This has been
tightened to reject such a request that causes overlong path to be
required to serve.
* jk/daemon-path-ok-check-truncation:
daemon: detect and reject too-long paths
When we are checking the path via path_ok(), we use some
fixed PATH_MAX buffers. We write into them via snprintf(),
so there's no possibility of overflow, but it does mean we
may silently truncate the path, leading to potentially
confusing errors when the partial path does not exist.
We're better off to reject the path explicitly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
packet_write() should be called packet_write_fmt() because it is a
printf-like function that takes a format string as first parameter.
packet_write_fmt() should be used for text strings only. Arbitrary
binary data should use a new packet_write() function that is introduced
in a subsequent patch.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent update to "git daemon" tries to enable the socket-level
KEEPALIVE, but when it is spawned via inetd, the standard input
file descriptor may not necessarily be connected to a socket.
Suppress an ENOTSOCK error from setsockopt().
* ew/daemon-socket-keepalive:
Windows: add missing definition of ENOTSOCK
daemon: ignore ENOTSOCK from setsockopt
There are certain house-keeping tasks that need to be performed at
the very beginning of any Git program, and programs that are not
built-in commands had to do them exactly the same way as "git"
potty does. It was easy to make mistakes in one-off standalone
programs (like test helpers). A common "main()" function that
calls cmd_main() of individual program has been introduced to
make it harder to make mistakes.
* jk/common-main:
mingw: declare main()'s argv as const
common-main: call git_setup_gettext()
common-main: call restore_sigpipe_to_default()
common-main: call sanitize_stdfds()
common-main: call git_extract_argv0_path()
add an extra level of indirection to main()
In inetd mode, we are not guaranteed stdin or stdout is a
socket; callers could filter the data through a pipe
or be testing with regular files.
This prevents t5802 from polluting syslog.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This should be part of every program, as otherwise users do
not get translated error messages. However, some external
commands forgot to do so (e.g., git-credential-store). This
fixes them, and eliminates the repeated code in programs
that did remember to use it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is setup that should be done in every program for
safety, but we never got around to adding it everywhere (so
builtins benefited from the call in git.c, but any external
commands did not). Putting it in the common main() gives us
this safety everywhere.
Note that the case in daemon.c is a little funny. We wait
until we know whether we want to daemonize, and then either:
- call daemonize(), which will close stdio and reopen it to
/dev/null under the hood
- sanitize_stdfds(), to fix up any odd cases
But that is way too late; the point of sanitizing is to give
us reliable descriptors on 0/1/2, and we will already have
executed code, possibly called die(), etc. The sanitizing
should be the very first thing that happens.
With this patch, git-daemon will sanitize first, and can
remove the call in the non-daemonize case. It does mean that
daemonize() may just end up closing the descriptors we
opened, but that's not a big deal (it's not wrong to do so,
nor is it really less optimal than the case where our parent
process redirected us from /dev/null ahead of time).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Every program which links against libgit.a must call this
function, or risk hitting an assert() in system_path() that
checks whether we have configured argv0_path (though only
when RUNTIME_PREFIX is defined, so essentially only on
Windows).
Looking at the diff, you can see that putting it into the
common main() saves us having to do it individually in each
of the external commands. But what you can't see are the
cases where we _should_ have been doing so, but weren't
(e.g., git-credential-store, and all of the t/helper test
programs).
This has been an accident-waiting-to-happen for a long time,
but wasn't triggered until recently because it involves one
of those programs actually calling system_path(). That
happened with git-credential-store in v2.8.0 with ae5f677
(lazily load core.sharedrepository, 2016-03-11). The
program:
- takes a lock file, which...
- opens a tempfile, which...
- calls adjust_shared_perm to fix permissions, which...
- lazy-loads the config (as of ae5f677), which...
- calls system_path() to find the location of
/etc/gitconfig
On systems with RUNTIME_PREFIX, this means credential-store
reliably hits that assert() and cannot be used.
We never noticed in the test suite, because we set
GIT_CONFIG_NOSYSTEM there, which skips the system_path()
lookup entirely. But if we were to tweak git_config() to
find /etc/gitconfig even when we aren't going to open it,
then the test suite shows multiple failures (for
credential-store, and for some other test helpers). I didn't
include that tweak here because it's way too specific to
this particular call to be worth carrying around what is
essentially dead code.
The implementation is fairly straightforward, with one
exception: there is exactly one caller (git.c) that actually
cares about the result of the function, and not the
side-effect of setting up argv0_path. We can accommodate
that by simply replacing the value of argv[0] in the array
we hand down to cmd_main().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are certain startup tasks that we expect every git
process to do. In some cases this is just to improve the
quality of the program (e.g., setting up gettext()). In
others it is a requirement for using certain functions in
libgit.a (e.g., system_path() expects that you have called
git_extract_argv0_path()).
Most commands are builtins and are covered by the git.c
version of main(). However, there are still a few external
commands that use their own main(). Each of these has to
remember to include the correct startup sequence, and we are
not always consistent.
Rather than just fix the inconsistencies, let's make this
harder to get wrong by providing a common main() that can
run this standard startup.
We basically have two options to do this:
- the compat/mingw.h file already does something like this by
adding a #define that replaces the definition of main with a
wrapper that calls mingw_startup().
The upside is that the code in each program doesn't need
to be changed at all; it's rewritten on the fly by the
preprocessor.
The downside is that it may make debugging of the startup
sequence a bit more confusing, as the preprocessor is
quietly inserting new code.
- the builtin functions are all of the form cmd_foo(),
and git.c's main() calls them.
This is much more explicit, which may make things more
obvious to somebody reading the code. It's also more
flexible (because of course we have to figure out _which_
cmd_foo() to call).
The downside is that each of the builtins must define
cmd_foo(), instead of just main().
This patch chooses the latter option, preferring the more
explicit approach, even though it is more invasive. We
introduce a new file common-main.c, with the "real" main. It
expects to call cmd_main() from whatever other objects it is
linked against.
We link common-main.o against anything that links against
libgit.a, since we know that such programs will need to do
this setup. Note that common-main.o can't actually go inside
libgit.a, as the linker would not pick up its main()
function automatically (it has no callers).
The rest of the patch is just adjusting all of the various
external programs (mostly in t/helper) to use cmd_main().
I've provided a global declaration for cmd_main(), which
means that all of the programs also need to match its
signature. In particular, many functions need to switch to
"const char **" instead of "char **" for argv. This effect
ripples out to a few other variables and functions, as well.
This makes the patch even more invasive, but the end result
is much better. We should be treating argv strings as const
anyway, and now all programs conform to the same signature
(which also matches the way builtins are defined).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While --init-timeout and --timeout options exist and I've never
run git-daemon without them, some users may forget to set them
and encounter hung daemon processes when connections fail.
Enable socket-level timeouts so the kernel can send keepalive
probes as necessary to detect failed connections.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update various codepaths to avoid manually-counted malloc().
* jk/tighten-alloc: (22 commits)
ewah: convert to REALLOC_ARRAY, etc
convert ewah/bitmap code to use xmalloc
diff_populate_gitlink: use a strbuf
transport_anonymize_url: use xstrfmt
git-compat-util: drop mempcpy compat code
sequencer: simplify memory allocation of get_message
test-path-utils: fix normalize_path_copy output buffer size
fetch-pack: simplify add_sought_entry
fast-import: simplify allocation in start_packfile
write_untracked_extension: use FLEX_ALLOC helper
prepare_{git,shell}_cmd: use argv_array
use st_add and st_mult for allocation size computation
convert trivial cases to FLEX_ARRAY macros
use xmallocz to avoid size arithmetic
convert trivial cases to ALLOC_ARRAY
convert manual allocations to argv_array
argv-array: add detach function
add helpers for allocating flex-array structs
harden REALLOC_ARRAY and xcalloc against size_t overflow
tree-diff: catch integer overflow in combine_diff_path allocation
...
There are many manual argv allocations that predate the
argv_array API. Switching to that API brings a few
advantages:
1. We no longer have to manually compute the correct final
array size (so it's one less thing we can screw up).
2. In many cases we had to make a separate pass to count,
then allocate, then fill in the array. Now we can do it
in one pass, making the code shorter and easier to
follow.
3. argv_array handles memory ownership for us, making it
more obvious when things should be free()d and and when
not.
Most of these cases are pretty straightforward. In some, we
switch from "run_command_v" to "run_command" which lets us
directly use the argv_array embedded in "struct
child_process".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The strbuf_getline() interface allows a byte other than LF or NUL as
the line terminator, but this is only because I wrote these
codepaths anticipating that there might be a value other than NUL
and LF that could be useful when I introduced line_termination long
time ago. No useful caller that uses other value has emerged.
By now, it is clear that the interface is overly broad without a
good reason. Many codepaths have hardcoded preference to read
either LF terminated or NUL terminated records from their input, and
then call strbuf_getline() with LF or NUL as the third parameter.
This step introduces two thin wrappers around strbuf_getline(),
namely, strbuf_getline_lf() and strbuf_getline_nul(), and
mechanically rewrites these call sites to call either one of
them. The changes contained in this patch are:
* introduction of these two functions in strbuf.[ch]
* mechanical conversion of all callers to strbuf_getline() with
either '\n' or '\0' as the third parameter to instead call the
respective thin wrapper.
After this step, output from "git grep 'strbuf_getline('" would
become a lot smaller. An interim goal of this series is to make
this an empty set, so that we can have strbuf_getline_crlf() take
over the shorter name strbuf_getline().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git daemon" uses "run_command()" without "finish_command()", so it
needs to release resources itself, which it forgot to do.
* rs/daemon-plug-child-leak:
daemon: plug memory leak
run-command: factor out child_process_clear()
Call child_process_clear() when a child ends to release the memory
allocated for its environment. This is necessary because unlike all
other users of start_command() we don't call finish_command(), which
would have taken care of that for us.
This leak was introduced by f063d38b (daemon: use cld->env_array
when re-spawning).
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This avoids an ugly strcat into a fixed-size buffer. It's
not wrong (the buffer is plenty large enough for an IPv6
address plus some minor formatting), but it takes some
effort to verify that.
Unfortunately we are still stuck with some fixed-size
buffers to hold the output of inet_ntop. But at least we now
pass very easy-to-verify parameters, rather than doing a
manual computation to account for other data in the buffer.
As a side effect, this also fixes the case where we might
pass an uninitialized portbuf buffer through the
environment. This probably couldn't happen in practice, as
it would mean that addr->sa_family was neither AF_INET nor
AF_INET6 (and that is all we are listening on).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes sprintf into fixed-size buffers when we know
that the buffer is large enough to fit the input (either
because it's a constant, or because it's numeric input that
is bounded in size). Likewise with strcpy of constant
strings.
However, these sites make it hard to audit sprintf and
strcpy calls for buffer overflows, as a reader has to
cross-reference the size of the array with the input. Let's
use xsnprintf instead, which communicates to a reader that
we don't expect this to overflow (and catches the mistake in
case we do).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of the callsites covered by this change call write_file() or
write_file_gently() to create a one-liner file. Drop the caller
supplied LF and let these callees to append it as necessary.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All callers except three passed 1 for the "fatal" parameter to ask
this function to die upon error, but to a casual reader of the code,
it was not all obvious what that 1 meant. Instead, split the
function into two based on a common write_file_v() that takes the
flag, introduce write_file_gently() as a new way to attempt creating
a file without dying on error, and make three callers to call it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git daemon" fails to build from the source under NO_IPV6
configuration (regression in 2.4).
* jc/daemon-no-ipv6-for-2.4.1:
daemon: unbreak NO_IPV6 build regression
A replacement for contrib/workdir/git-new-workdir that does not
rely on symbolic links and make sharing of objects and refs safer
by making the borrowee and borrowers aware of each other.
* nd/multiple-work-trees: (41 commits)
prune --worktrees: fix expire vs worktree existence condition
t1501: fix test with split index
t2026: fix broken &&-chain
t2026 needs procondition SANITY
git-checkout.txt: a note about multiple checkout support for submodules
checkout: add --ignore-other-wortrees
checkout: pass whole struct to parse_branchname_arg instead of individual flags
git-common-dir: make "modules/" per-working-directory directory
checkout: do not fail if target is an empty directory
t2025: add a test to make sure grafts is working from a linked checkout
checkout: don't require a work tree when checking out into a new one
git_path(): keep "info/sparse-checkout" per work-tree
count-objects: report unused files in $GIT_DIR/worktrees/...
gc: support prune --worktrees
gc: factor out gc.pruneexpire parsing code
gc: style change -- no SP before closing parenthesis
checkout: clean up half-prepared directories in --to mode
checkout: reject if the branch is already checked out elsewhere
prune: strategies for linked checkouts
checkout: support checking out into a new working directory
...
When 01cec54e (daemon: deglobalize hostname information, 2015-03-07)
wrapped the global variables such as hostname inside a struct, it
forgot to convert one location that spelled "hostname" that needs to
be updated to "hi->hostname".
This was inside NO_IPV6 block, and was not caught by anybody.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the variables related to the client-supplied hostname into its own
struct, let execute() own an instance of that instead of storing the
information in global variables and pass the struct to any function that
needs to access it as a parameter.
The lifetime of the variables is easier to see this way. Allocated
memory is released within execute(). The strbufs don't have to be reset
anymore because they are written to only once at most: parse_host_arg()
is only called once by execute() and lookup_hostname() guards against
being called twice using hostname_lookup_done.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert hostname, canon_hostname, ip_address and tcp_port to strbuf.
This allows to get rid of the helpers strbuf_addstr_or_null() and STRARG
because a strbuf always represents a valid (initially empty) string.
sanitize_client() is not needed anymore and sanitize_client_strbuf()
takes its place and name.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "interpolated-path" option of "git daemon" inserted any string
client declared on the "host=" capability request without checking.
Sanitize and limit %H and %CH to a saner and a valid DNS name.
* jk/daemon-interpolate:
daemon: sanitize incoming virtual hostname
t5570: test git-daemon's --interpolated-path option
git_connect: let user override virtual-host we send to daemon
"git daemon" looked up the hostname even when "%CH" and "%IP"
interpolations are not requested, which was unnecessary.
* rs/daemon-interpolate:
daemon: use callback to build interpolated path
daemon: look up client-supplied hostname lazily
Provide a callback function for strbuf_expand() instead of using the
helper strbuf_expand_dict_cb(). While the resulting code is longer, it
only looks up the canonical hostname and IP address if at least one of
the placeholders %CH and %IP are used with --interpolated-path.
Use a struct for passing the directory to the callback function instead
of passing it directly to avoid having to cast away its const qualifier.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Look up canonical hostname and IP address using getaddrinfo(3) or
gethostbyname(3) only if --interpolated-path or --access-hook were
specified.
Do that by introducing getter functions for canon_hostname and
ip_address and using them for all read accesses. These wrappers call
the new helper lookup_hostname(), which sets the variables only at its
first call.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use the daemon_avoid_alias function to make sure that the
pathname the user gives us is sane. However, after applying
that check, we might then interpolate the path using a
string given by the server admin, but which may contain more
untrusted data from the client. We should be sure to
sanitize this data, as well.
We cannot use daemon_avoid_alias here, as it is more strict
than we need in requiring a leading '/'. At the same time,
we can be much more strict here. We are interpreting a
hostname, which should not contain slashes or excessive runs
of dots, as those things are not allowed in DNS names.
Note that in addition to cleansing the hostname field, we
must check the "canonical hostname" (%CH) as well as the
port (%P), which we take as a raw string. For the canonical
hostname, this comes from an actual DNS lookup on the
accessed IP, which makes it a much less likely vector for
problems. But it does not hurt to sanitize it in the same
way. Unfortunately we cannot test this case easily, as it
would involve a custom hostname lookup.
We do not need to check %IP, as it comes straight from
inet_ntop, so must have a sane form.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes common problems in these code about error handling,
forgetting to close the file handle after fprintf() fails, or not
printing out the error string..
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git daemon" (with NO_IPV6 build configuration) used to incorrectly
use the hostname even when gethostbyname() reported that the given
hostname is not found.
* rs/daemon-fixes:
daemon: remove write-only variable maxfd
daemon: fix error message after bind()
daemon: handle gethostbyname() error
It became unused when 6573faff (NO_IPV6 support for git daemon) replaced
select() with poll().
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the user-supplied hostname can't be found then we should not use it.
We already avoid doing that in the non-NO_IPV6 case by checking if the
return value of getaddrinfo() is zero (success). Do the same in the
NO_IPV6 case and make sure the return value of gethostbyname() isn't
NULL before dereferencing this pointer.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most struct child_process variables are cleared using memset first after
declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to
initialize them statically instead. That's shorter, doesn't require a
function call and is slightly more readable (especially given that we
already have STRBUF_INIT, ARGV_ARRAY_INIT etc.).
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use `git_config_get_bool()` family instead of `git_config()` to take advantage of
the config-set API which provides a cleaner control flow.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "git replace" an "--edit" mode.
* cc/replace-edit:
replace: use argv_array in export_object
avoid double close of descriptors handed to run_command
replace: replace spaces with tabs in indentation
When a file descriptor is given to run_command via the
"in", "out", or "err" parameters, run_command takes
ownership. The descriptor will be closed in the parent
process whether the process is spawned successfully or not,
and closing it again is wrong.
In practice this has not caused problems, because we usually
close() right after start_command returns, meaning no other
code has opened a descriptor in the meantime. So we just get
EBADF and ignore it (rather than accidentally closing
somebody else's descriptor!).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like earlier cases, we can use skip_prefix to avoid magic
numbers that must match the length of starts_with prefixes.
However, the numbers are a little more complicated here, as
we keep parsing past the prefix. We can solve it by keeping
a running pointer as we parse; its final value is the
location we want.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's a common idiom to match a prefix and then skip past it
with a magic number, like:
if (starts_with(foo, "bar"))
foo += 3;
This is easy to get wrong, since you have to count the
prefix string yourself, and there's no compiler check if the
string changes. We can use skip_prefix to avoid the magic
numbers here.
Note that some of these conversions could be much shorter.
For example:
if (starts_with(arg, "--foo=")) {
bar = arg + 6;
continue;
}
could become:
if (skip_prefix(arg, "--foo=", &bar))
continue;
However, I have left it as:
if (skip_prefix(arg, "--foo=", &v)) {
bar = v;
continue;
}
to visually match nearby cases which need to actually
process the string. Like:
if (skip_prefix(arg, "--foo=", &v)) {
bar = atoi(v);
continue;
}
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
None of these strings is modified; marking them as const
will help later refactoring.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two implementations of the same function; let's drop
that to one. We take the name from daemon.c, but the
implementation (which is just slightly more efficient) from
the config code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use strcmp() instead of starts_with()/!prefixcmp() to stop accepting
--informative-errors-just-a-little
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>