Now that the index can block pathnames that can be mistaken
to mean ".git" on NTFS and FAT32, it would be helpful for
fsck to notice such problematic paths. This lets servers
which use receive.fsckObjects block them before the damage
spreads.
Note that the fsck check is always on, even for systems
without core.protectNTFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
NTFS.
However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on NTFS themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git or git~1, meaning mischief is almost
certainly what the tree author had in mind).
Ideally these would be controlled by a separate
"fsck.protectNTFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for NTFS
and FAT32; let's use it in verify_path().
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on NTFS nor FAT32.
In practice this probably doesn't matter, though, as
the restricted names are rather obscure and almost
certainly would never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectNTFS config
option. Though this is expected to be most useful on Windows,
we allow it to be set everywhere, as NTFS may be mounted on
other platforms. The variable does default to on for Windows,
though.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not allow paths with a ".git" component to be added to
the index, as that would mean repository contents could
overwrite our repository files. However, asking "is this
path the same as .git" is not as simple as strcmp() on some
filesystems.
On NTFS (and FAT32), there exist so-called "short names" for
backwards-compatibility: 8.3 compliant names that refer to the same files
as their long names. As ".git" is not an 8.3 compliant name, a short name
is generated automatically, typically "git~1".
Depending on the Windows version, any combination of trailing spaces and
periods are ignored, too, so that both "git~1." and ".git." still refer
to the Git directory. The reason is that 8.3 stores file names shorter
than 8 characters with trailing spaces. So literally, it does not matter
for the short name whether it is padded with spaces or whether it is
shorter than 8 characters, it is considered to be the exact same.
The period is the separator between file name and file extension, and
again, an empty extension consists just of spaces in 8.3 format. So
technically, we would need only take care of the equivalent of this
regex:
(\.git {0,4}|git~1 {0,3})\. {0,3}
However, there are indications that at least some Windows versions might
be more lenient and accept arbitrary combinations of trailing spaces and
periods and strip them out. So we're playing it real safe here. Besides,
there can be little doubt about the intention behind using file names
matching even the more lenient pattern specified above, therefore we
should be fine with disallowing such patterns.
Extra care is taken to catch names such as '.\\.git\\booh' because the
backslash is marked as a directory separator only on Windows, and we want
to use this new helper function also in fsck on other platforms.
A big thank you goes to Ed Thomson and an unnamed Microsoft engineer for
the detailed analysis performed to come up with the corresponding fixes
for libgit2.
This commit adds a function to detect whether a given file name can refer
to the Git directory by mistake.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the index can block pathnames that case-fold to
".git" on HFS+, it would be helpful for fsck to notice such
problematic paths. This lets servers which use
receive.fsckObjects block them before the damage spreads.
Note that the fsck check is always on, even for systems
without core.protectHFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
HFS+.
However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on HFS+ themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git with invisible Unicode code-points mixed
in, meaning mischief is almost certainly what the tree
author had in mind).
Ideally these would be controlled by a separate
"fsck.protectHFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for HFS+;
let's use it in verify_path.
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on HFS+. In practice
this probably doesn't matter, though, as the restricted
names are rather obscure and almost certainly would
never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectHFS config
option. Though this is expected to be most useful on OS X,
we allow it to be set everywhere, as HFS+ may be mounted on
other platforms. The variable does default to on for OS X,
though.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not allow paths with a ".git" component to be added to
the index, as that would mean repository contents could
overwrite our repository files. However, asking "is this
path the same as .git" is not as simple as strcmp() on some
filesystems.
HFS+'s case-folding does more than just fold uppercase into
lowercase (which we already handle with strcasecmp). It may
also skip past certain "ignored" Unicode code points, so
that (for example) ".gi\u200ct" is mapped ot ".git".
The full list of folds can be found in the tables at:
https://www.opensource.apple.com/source/xnu/xnu-1504.15.3/bsd/hfs/hfscommon/Unicode/UCStringCompareData.h
Implementing a full "is this path the same as that path"
comparison would require us importing the whole set of
tables. However, what we want to do is much simpler: we
only care about checking ".git". We know that 'G' is the
only thing that folds to 'g', and so on, so we really only
need to deal with the set of ignored code points, which is
much smaller.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We complain about ".git" in a tree because it cannot be
loaded into the index or checked out. Since we now also
reject ".GIT" case-insensitively, fsck should notice the
same, so that errors do not propagate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We check that fsck notices and complains about confusing
paths in trees. However, there are a few shortcomings:
1. We check only for these paths as file entries, not as
intermediate paths (so ".git" and not ".git/foo").
2. We check "." and ".." together, so it is possible that
we notice only one and not the other.
3. We repeat a lot of boilerplate.
Let's use some loops to be more thorough in our testing, and
still end up with shorter code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not allow ".git" to enter into the index as a path
component, because checking out the result to the working
tree may causes confusion for subsequent git commands.
However, on case-insensitive file systems, ".Git" or ".GIT"
is the same. We should catch and prevent those, too.
Note that technically we could allow this for repos on
case-sensitive filesystems. But there's not much point. It's
unlikely that anybody cares, and it creates a repository
that is unexpectedly non-portable to other systems.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We should prevent nonsense paths from entering the index in
the first place, as they can cause confusing results if they
are ever checked out into the working tree. We already do
so, but we never tested it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When unpack_trees tries to write an entry to the index,
add_index_entry may report an error to stderr, but we ignore
its return value. This leads to us returning a successful
exit code for an operation that partially failed. Let's make
sure to propagate this code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GnuPG homedir is generated on the fly and keys are imported from
armored key file. Make comment match available key info and new key
generation procedure.
Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add --[no-]xmailer that allows a user to disable adding the 'X-Mailer:'
header to the email being sent.
Signed-off-by: Luis Henriques <henrix@camandro.org>
Acked-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The main hunk loop for add--interactive will loop if it does
not get a known input. This is a good thing if the user
typed some invalid input. However, if we have an
uncorrectable read error, we'll end up looping infinitely.
We can fix this by noticing read errors (i.e., <STDIN>
returns undef) and breaking out of the loop.
One easy way to trigger this is if you have an editor that
does not take over the terminal (e.g., one that spawns a
window in an existing process and waits), start the editor
with the hunk-edit command, and hit ^C to send SIGINT. The
editor process dies due to SIGINT, but the perl
add--interactive process does not (perl suspends SIGINT for
the duration of our system() call).
We return to the main loop, but further reads from stdin
don't work. The SIGINT _also_ killed our parent git process,
which orphans our process group, meaning that further reads
from the terminal will always fail. We loop infinitely,
getting EIO on each read.
Note that there are several other spots where we read from
stdin, too. However, in each of those cases, we do something
sane when the read returns undef (breaking out of the loop,
taking the input as "no", etc). They don't need similar
treatment.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation in the completion scripts for Bash and Zsh state the wrong filenames.
Signed-off-by: Peter van der Does <peter@avirtualhome.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The RFC says that they are to be concatenated after decoding (i.e. the
intervening whitespace is ignored).
Signed-off-by: Роман Донченко <dpb@corrigendum.ru>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More specifically:
* Add "\" to the list of characters not allowed in a token (see RFC 2047
errata).
* Share regexes between unquote_rfc2047 and is_rfc2047_quoted. Besides
removing duplication, this also makes unquote_rfc2047 more stringent.
* Allow both "q" and "Q" to identify the encoding.
* Allow lowercase hexadecimal digits in the "Q" encoding.
And, more on the cosmetic side:
* Change the "encoded-text" regex to exclude rather than include characters,
for clarity and consistency with "token".
Signed-off-by: Роман Донченко <dpb@corrigendum.ru>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function uses (non-local) $f to store the value of its first parameter.
This can interfere with the user's environment.
Signed-off-by: Justin Guenther <jguenther@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git 2.0 was supposed to make the "simple" mode for the default of
"git push", but it didn't.
* jk/push-simple:
push: truly use "simple" as default, not "upstream"
The build procedure did not bother fixing perl and python scripts
when NO_PERL and NO_PYTHON build-time configuration changed.
* jk/rebuild-perl-scripts-with-no-perl-seting-change:
Makefile: have python scripts depend on NO_PYTHON setting
Makefile: simplify by using SCRIPT_{PERL,SH}_GEN macros
Makefile: have perl scripts depend on NO_PERL setting
Some tests that depend on perl lacked PERL prerequisite to protect
them, breaking build with NO_PERL configuration.
* jk/no-perl-tests:
t960[34]: mark cvsimport tests as requiring perl
t0090: mark add-interactive test with PERL prerequisite
It is distracting to let the GPG message while setting up the test
gpghome leak into the test output, especially without running these
tests with "-v" option.
The splitting of RFC1991 prerequiste part is about future-proofing.
When we want to define other kinds of specific prerequisites in the
future, we'd prefer to see it done separately from the basic set-up
code.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Importing PGP key public and security ring works, but we do not have
all secret keys in one binary blob and all public keys in another.
Instead import public and secret keys for one key pair from a text
file that holds ASCII-armored export of them.
Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call strbuf_complete_line() instead of open-coding it. Also remove
surrounding comments indicating the intent to complete a line since
this information is already included in the function name.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GnuPG >= 2.1.0 no longer supports RFC1991, so skip these tests.
Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GnuPG 2.1 homedir looks different, so just create it on the fly by
importing needed private and public keys and ownertrust.
This solves an issue with gnupg 2.1 running interactive pinentry
when old secret key is present.
Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To figure out the author ident for a commit, we call
determine_author_info(). This function collects information
from the environment, other commits (in the case of
"--amend" or "-c/-C"), and the "--author" option. It then
uses fmt_ident to generate the final ident string that goes
into the commit object. fmt_ident is therefore responsible
for any quality or validation checks on what is allowed to
go into a commit.
Before returning, though, we call split_ident_line on the
result, and feed the individual components to hooks via the
GIT_AUTHOR_* variables. Furthermore, we do extra validation
by feeding the split to sane_ident_split(), which is pickier
than fmt_ident (in particular, it will complain about an empty
email field). If this parsing or validation fails, we skip
updating the environment variables.
This is bad, because it means that hooks may silently see a
different ident than what we are putting into the commit. We
should drop the extra sane_ident_split checks entirely, and
take whatever fmt_ident has fed us (and what will go into
the commit object).
If parsing fails, we should actually abort here rather than
continuing (and feeding the hooks bogus data). However,
split_ident_line should never fail here. The ident was just
generated by fmt_ident, so we know that it's sane. We can
use assert_split_ident to double-check this.
Note that we also teach that assertion to check that we
found a date (it always should, but until now, no caller
cared whether we found a date or not). Checking the return
value of sane_ident_split is enough to ensure we have the
name/email pointers set, and checking date_begin is enough
to know that all of the date/tz variables are set.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we generate the commit-message template, we try to
report an author or committer ident that will be of interest
to the user: an author that does not match the committer, or
a committer that was auto-configured.
When doing so, if we encounter what we consider to be a
bogus ident, we immediately die. This is a bad idea, because
our use of the idents here is purely informational. Any
ident rules should be enforced elsewhere, because commits
that do not invoke the editor will not even hit this code
path (e.g., "git commit -mfoo" would work, but "git commit"
would not). So at best, we are redundant with other checks,
and at worse, we actively prevent commits that should
otherwise be allowed.
We should therefore do the minimal parsing we can to get a
value and not do any validation (i.e., drop the call to
sane_ident_split()).
In theory we could notice when even our minimal parsing
fails to work, and do the sane thing for each check (e.g.,
if we have an author but can't parse the committer, assume
they are different and print the author). But we can
actually simplify this even further.
We know that the author and committer strings we are parsing
have been generated by us earlier in the program, and
therefore they must be parseable. We could just call
split_ident_line without even checking its return value,
knowing that it will put _something_ in the name/mail
fields. Of course, to protect ourselves against future
changes to the code, it makes sense to turn this into an
assert, so we are not surprised if our assumption fails.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When git is compiled with "-fsanitize=address" (using clang
or gcc >= 4.8), all invocations of git will check for buffer
overflows. This is similar to running with valgrind, except
that it is more thorough (because of the compiler support,
function-local buffers can be checked, too) and runs much
faster (making it much less painful to run the whole test
suite with the checks turned on).
Unlike valgrind, the magic happens at compile-time, so we
don't need the same infrastructure in the test suite that we
did to support --valgrind. But there are two things we can
help with:
1. On some platforms, the leak-detector is on by default,
and causes every invocation of "git init" (and thus
every test script) to fail. Since running git with
the leak detector is pointless, let's shut it off
automatically in the tests, unless the user has already
configured it.
2. When apache runs a CGI, it clears the environment of
unknown variables. This means that the $ASAN_OPTIONS
config doesn't make it to git-http-backend, and it
dies due to the leak detector. Let's mark the variable
as OK for apache to pass.
With these two changes, running
make CC=clang CFLAGS=-fsanitize=address test
works out of the box.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If "git update-ref --stdin" was given a "verify" command with no
"<newvalue>" at all (not even zeros), the code was mistakenly setting
have_old=0 (and leaving old_sha1 uninitialized). But this is
incorrect: this command is supposed to verify that the reference
doesn't exist. So in this case we really need old_sha1 to be set to
null_sha1 and have_old to be set to 1.
Moreover, since have_old was being set to zero, *no* check of the old
value was being done, so the new value of the reference was being set
unconditionally to the value in new_sha1. new_sha1, in turn, was set
to null_sha1 in the expectation that that was the old value and it
shouldn't be changed. But because the precondition was not being
checked, the result was that the reference was being deleted
unconditionally.
So, if <oldvalue> is missing, set have_old unconditionally and set
old_sha1 to null_sha1.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Acked-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>