Our is_hfs_dotgit function relies on the hackily-implemented
next_hfs_char to give us the next character that an HFS+
filename comparison would look at. It's hacky because it
doesn't implement the full case-folding table of HFS+; it
gives us just enough to see if the path matches ".git".
At the end of next_hfs_char, we use tolower() to convert our
32-bit code point to lowercase. Our tolower() implementation
only takes an 8-bit char, though; it throws away the upper
24 bits. This means we can't have any false negatives for
is_hfs_dotgit. We only care about matching 7-bit ASCII
characters in ".git", and we will correctly process 'G' or
'g'.
However, we _can_ have false positives. Because we throw
away the upper bits, code point \u{0147} (for example) will
look like 'G' and get downcased to 'g'. It's not known
whether a sequence of code points whose truncation ends up
as ".git" is meaningful in any language, but it does not
hurt to be more accurate here. We can just pass out the full
32-bit code point, and compare it manually to the upper and
lowercase characters we care about.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
New tag object format validation added in 2.2 showed garbage
after a tagname it reported in its error message.
* js/fsck-tag-validation:
index-pack: terminate object buffers with NUL
fsck: properly bound "invalid tag name" error message
Now that the index can block pathnames that can be mistaken
to mean ".git" on NTFS and FAT32, it would be helpful for
fsck to notice such problematic paths. This lets servers
which use receive.fsckObjects block them before the damage
spreads.
Note that the fsck check is always on, even for systems
without core.protectNTFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
NTFS.
However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on NTFS themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git or git~1, meaning mischief is almost
certainly what the tree author had in mind).
Ideally these would be controlled by a separate
"fsck.protectNTFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the index can block pathnames that case-fold to
".git" on HFS+, it would be helpful for fsck to notice such
problematic paths. This lets servers which use
receive.fsckObjects block them before the damage spreads.
Note that the fsck check is always on, even for systems
without core.protectHFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
HFS+.
However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on HFS+ themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git with invisible Unicode code-points mixed
in, meaning mischief is almost certainly what the tree
author had in mind).
Ideally these would be controlled by a separate
"fsck.protectHFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We complain about ".git" in a tree because it cannot be
loaded into the index or checked out. Since we now also
reject ".GIT" case-insensitively, fsck should notice the
same, so that errors do not propagate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We check that fsck notices and complains about confusing
paths in trees. However, there are a few shortcomings:
1. We check only for these paths as file entries, not as
intermediate paths (so ".git" and not ".git/foo").
2. We check "." and ".." together, so it is possible that
we notice only one and not the other.
3. We repeat a lot of boilerplate.
Let's use some loops to be more thorough in our testing, and
still end up with shorter code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we detect an invalid tag-name header in a tag object,
like, "tag foo bar\n", we feed the pointer starting at "foo
bar" to a printf "%s" formatter. This shows the name, as we
want, but then it keeps printing the rest of the tag buffer,
rather than stopping at the end of the line.
Our tests did not notice because they look only for the
matching line, but the bug is that we print much more than
we wanted to. So we also adjust the test to be more exact.
Note that when fscking tags with "index-pack --strict", this
is even worse. index-pack does not add a trailing
NUL-terminator after the object, so we may actually read
past the buffer and print uninitialized memory. Running
t5302 with valgrind does notice the bug for that reason.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using "hash-object --literally", test one of the new breakages
js/fsck-tag-validation topic teaches "fsck" to catch is caught.
* jc/hash-object-fsck-tag:
t1450: make sure fsck detects a malformed tagger line
Teach "git fsck" to inspect the contents of annotated tag objects.
* js/fsck-tag-validation:
Make sure that index-pack --strict checks tag objects
Add regression tests for stricter tag fsck'ing
fsck: check tag objects' headers
Make sure fsck_commit_buffer() does not run out of the buffer
fsck_object(): allow passing object data separately from the object itself
Refactor type_from_string() to allow continuing after detecting an error
With "hash-object --literally", write a tag object that is not
supposed to pass one of the new checks added to "fsck", and make
sure that the new check catches the breakage.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fsck tries hard to detect missing objects, and will complain
(and exit non-zero) about any inter-object links that are
missing. However, it will not exit non-zero for any missing
ref tips, meaning that a severely broken repository may
still pass "git fsck && echo ok".
The problem is that we use for_each_ref to iterate over the
ref tips, which hides broken tips. It does at least print an
error from the refs.c code, but fsck does not ever see the
ref and cannot note the problem in its exit code. We can solve
this by using for_each_rawref and noting the error ourselves.
In addition to adding tests for this case, we add tests for
all types of missing-object links (all of which worked, but
which we were not testing).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The intent of the new test case is to catch general breakages in
the fsck_tag() function, not so much to test it extensively, trying to
strike the proper balance between thoroughness and speed.
While it *would* have been nice to test the code path where fsck_object()
encounters an invalid tag object, this is not possible using git fsck: tag
objects are parsed already before fsck'ing (and the parser already fails
upon such objects).
Even worse: we would not even be able write out invalid tag objects
because git hash-object parses those objects, too, unless we resorted to
really ugly hacks such as using something like this in the unit tests
(essentially depending on Perl *and* Compress::Zlib):
hash_invalid_object () {
contents="$(printf '%s %d\0%s' "$1" ${#2} "$2")" &&
sha1=$(echo "$contents" | test-sha1) &&
suffix=${sha1#??} &&
mkdir -p .git/objects/${sha1%$suffix} &&
echo "$contents" |
perl -MCompress::Zlib -e 'undef $/; print compress(<>)' \
> .git/objects/${sha1%$suffix}/$suffix &&
echo $sha1
}
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Upon finding a corrupt loose object, we forgot to note the error to
signal it with the exit status of the entire process.
[jc: adjusted t1450 and added another test]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we check commit objects, we complain if commit->date is
ULONG_MAX, which is an indication that we saw integer
overflow when parsing it. However, we do not do any check at
all for author lines, which also contain a timestamp.
Let's actually check the timestamps on each ident line
with strtoul. This catches both author and committer lines,
and we can get rid of the now-redundant commit->date check.
Note that like the existing check, we compare only against
ULONG_MAX. Now that we are calling strtoul at the site of
the check, we could be slightly more careful and also check
that errno is set to ERANGE. However, this will make further
refactoring in future patches a little harder, and it
doesn't really matter in practice.
For 32-bit systems, one would have to create a commit at the
exact wrong second in 2038. But by the time we get close to
that, all systems will hopefully have moved to 64-bit (and
if they haven't, they have a real problem one second later).
For 64-bit systems, by the time we get close to ULONG_MAX,
all systems will hopefully have been consumed in the fiery
wrath of our expanding Sun.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Having a ".git" entry inside a tree can cause confusing
results on checkout. At the top-level, you could not
checkout such a tree, as it would complain about overwriting
the real ".git" directory. In a subdirectory, you might
check it out, but performing operations in the subdirectory
would confusingly consider the in-tree ".git" directory as
the repository.
The regular git tools already make it hard to accidentally
add such an entry to a tree, and do not allow such entries
to enter the index at all. Teaching fsck about it provides
an additional safety check, and let's us avoid propagating
any such bogosity when transfer.fsckObjects is on.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A tree with meta-paths like '.' or '..' does not work well
with git; the index will refuse to load it or check it out
to the filesystem (and even if we did not have that safety,
it would look like we were overwriting an untracked
directory). For the same reason, it is difficult to create
such a tree with regular git.
Let's warn about these dubious entries during fsck, just in
case somebody has created a bogus tree (and this also lets
us prevent them from propagating when transfer.fsckObjects
is set).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a tag T points at an object X that is of a type that is
different from what the tag records as, fsck should report it as an
error.
However, depending on the order X and T are checked individually,
the actual error message can be different. If X is checked first,
fsck remembers X's type and then when it checks T, it notices that T
records X as a wrong type (i.e. the complaint is about a broken tag
T). If T is checked first, on the other hand, fsck remembers that we
need to verify X is of the type tag records, and when it later
checks X, it notices that X is of a wrong type (i.e. the complaint
is about a broken object X).
The important thing is that fsck notices such an error and diagnoses
the issue on object X, but the test was expecting that we happen to
check objects in the order to make us detect issues with tag T, not
with object X. Remove this unwarranted assumption.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" had a confusion between taking data from a path in the
working tree and taking data from an object that happens to have
name 0{40} recorded in a tree.
* jk/maint-null-in-trees:
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
Short of somebody happening to beat the 1 in 2^160 odds of
actually generating content that hashes to the null sha1, we
should never see this value in a tree entry. So let's have
fsck warn if it it seen.
As in the previous commit, we test both blob and submodule
entries to future-proof the test suite against the
implementation depending on connectivity to notice the
error.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default output from "fsck" is often overwhelmed by informational
message on dangling objects, especially if you do not repack often, and a
real error can easily be buried.
Add "--no-dangling" option to omit them, and update the user manual to
demonstrate its use.
Based on a patch by Clemens Buchacher, but reverted the part to change
the default to --no-dangling, which is unsuitable for the first patch.
The usual three-step procedure to break the backward compatibility over
time needs to happen on top of this, if we were to go in that direction.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git rev-list passes rev_list_info, not rev_list objects. Without this
fix, rev-list enables or disables the --verify-objects option depending
on a read from an undefined memory location.
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck allows a name with > character in it like "name> <email>". Also for
"name email>" fsck says "missing space before email".
More precisely, it seeks for a first '<', checks that ' ' preceeds it.
Then seeks to '<' or '>' and checks that it is the '>'. Missing space is
reported if either '<' is not found or it's not preceeded with ' '.
Change it to following. Seek to '<' or '>', check that it is '<' and is
preceeded with ' '. Seek to '<' or '>' and check that it is '>'. So now
"name> <email>" is rejected as "bad name". More strict name check is the
only change in what is accepted.
Report 'missing space' only if '<' is found and is not preceeded with a
space.
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck reports "missing space before <email>" for committer string equal
to "name email>" or to "". It'd be nicer to say "missing email" for
the second string and "name is bad" (has > in it) for the first one.
Add a failing test for these messages.
For "name> <email>" no error is reported. Looks like a bug, so add
such a failing test."
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Breaks in a test assertion's && chain can potentially hide
failures from earlier commands in the chain.
Commands intended to fail should be marked with !, test_must_fail, or
test_might_fail. The examples in this patch do not require that.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The fsck test is generally careful to remove the corrupt objects
it inserts, but dangling objects are left behind due to some typos
and omissions. It is better to clean up more completely, to
simplify the addition of later tests. So:
- guard setup and cleanup with test_expect_success to catch
typos and errors;
- check both stdout and stderr when checking for empty fsck
output;
- use test_cmp empty file in place of test $(wc -l <file) = 0,
for better debugging output when running tests with -v;
- add a remove_object () helper and use it to replace broken
object removal code that forgot about the fanout in
.git/objects;
- disable gc.auto, to avoid tripping up object removal if the
number of objects ever reaches that threshold.
- use test_when_finished to ensure cleanup tasks are run and
succeed when tests fail;
- add a new final test that no breakage or dangling objects
was left behind.
While at it, add a brief description to test_description of the
history that is expected to persist between tests.
Part of a campaign to clean up subshell usage in tests.
Cc: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
daae1922 (fsck: check ident lines in commit objects, 2010-04-24)
taught fsck to expect commit objects to have the form
tree <object name>
<parents>
author <valid ident string>
committer <valid ident string>
log message
The check is overly strict: for example, it errors out with the
message “expected blank line” for perfectly valid commits with an
"encoding ISO-8859-1" line.
Later it might make sense to teach fsck about the rest of the header
and warn about unrecognized header lines, but for simplicity, let’s
accept arbitrary trailing lines for now.
Reported-by: Tuncer Ayaz <tuncer.ayaz@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Check that email addresses do not contain <, >, or newline so they can
be quickly scanned without trouble. The copy() function in ident.c
already ensures that ordinary git commands will not write email
addresses without this property.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Almost exactly a year ago in 02a6552 (Test fsck a bit harder), I
introduced two testcases that were expecting failure.
However, the only bug was that the testcases wrote *blobs* because I
forgot to pass -t tag to hash-object. Fix this, and then adjust the
rest of the test to properly check the result.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-fsck, of all tools, has very few tests. This adds some more:
* a corrupted object;
* a branch pointing to a non-commit;
* a tag pointing to a nonexistent object;
* and a tag pointing to an object of a type other than what the tag
itself claims.
Only the first two are caught. At least the third probably should,
too, but currently slips through.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fsck" used to validate only loose objects that are local and nothing
else by default. This is not just too little when a repository is
borrowing objects from other object stores, but also caused the
connectivity check to mistakenly declare loose objects borrowed from them
to be missing.
The rationale behind the default mode that validates only loose objects is
because these objects are still young and more unlikely to have been
pushed to other repositories yet. That holds for loose objects borrowed
from alternate object stores as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default we looked at all refs but not HEAD. The only thing that made
fsck not lose sight of commits that are only reachable from a detached
HEAD was the reflog for the HEAD.
This fixes it, with a new test.
Signed-off-by: Junio C Hamano <gitster@pobox.com>