Some functions from the refs module were still declared in cache.h.
Move them to refs.h.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes users want to report a bug they experience on
their repository, but they are not at liberty to share the
contents of the repository. It would be useful if they could
produce a repository that has a similar shape to its history
and tree, but without leaking any information. This
"anonymized" repository could then be shared with developers
(assuming it still replicates the original problem).
This patch implements an "--anonymize" option to
fast-export, which generates a stream that can recreate such
a repository. Producing a single stream makes it easy for
the caller to verify that they are not leaking any useful
information. You can get an overview of what will be shared
by running a command like:
    git fast-export --anonymize --all |
    perl -pe 's/\d+/X/g' |
    sort -u |
    less
which will show every unique line we generate, modulo any
numbers (each anonymized token is assigned a number, like
"User 0", and we replace it consistently in the output).
In addition to anonymizing, this produces test cases that
are relatively small (compared to the original repository)
and fast to generate (compared to using filter-branch, or
modifying the output of fast-export yourself). Here are
numbers for git.git:
    $ time git fast-export --anonymize --all \
          --tag-of-filtered-object=drop >output
    real    0m2.883s
    user    0m2.828s
    sys     0m0.052s
    $ gzip output
    $ ls -lh output.gz | awk '{print $5}'
    2.9M
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
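The consistent numbering described in the message above, and the digit-collapsing overview pipeline, can be illustrated roughly as follows. This is a minimal Python sketch, not fast-export's implementation; the `Anonymizer` class and the `overview` function are hypothetical names chosen for the example.

```python
import re


class Anonymizer:
    """Map each distinct original token to a stable, numbered placeholder."""

    def __init__(self, label):
        self.label = label
        self.seen = {}

    def anonymize(self, token):
        # The first sighting assigns the next number; later sightings reuse it.
        if token not in self.seen:
            self.seen[token] = f"{self.label} {len(self.seen)}"
        return self.seen[token]


def overview(lines):
    """Collapse every run of digits to X and keep the unique lines, sorted.

    Mirrors the spirit of: perl -pe 's/\\d+/X/g' | sort -u
    """
    return sorted({re.sub(r"\d+", "X", line) for line in lines})


users = Anonymizer("User")
stream = [f"author {users.anonymize(name)}" for name in ["alice", "bob", "alice"]]
summary = overview(stream)
```

Because the same input always maps to the same placeholder, the shape of the history survives while the overview shrinks to one line per kind of token.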
Move "commit->buffer" out of the in-core commit object and keep
track of their lengths. Use this to optimize the code paths to
validate GPG signatures in commit objects.
* jk/commit-buffer-length:
reuse cached commit buffer when parsing signatures
commit: record buffer length in cache
commit: convert commit->buffer to a slab
commit-slab: provide a static initializer
use get_commit_buffer everywhere
convert logmsg_reencode to get_commit_buffer
use get_commit_buffer to avoid duplicate code
use get_cached_commit_buffer where appropriate
provide helpers to access the commit buffer
provide a helper to set the commit buffer
provide a helper to free commit buffer
sequencer: use logmsg_reencode in get_message
logmsg_reencode: return const buffer
do not create "struct commit" with xcalloc
commit: push commit_index update into alloc_commit_node
alloc: include any-object allocations in alloc_report
replace dangerous uses of strbuf_attach
commit_tree: take a pointer/len pair rather than a const strbuf
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.
For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.
This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
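To see why the original length matters for signature checks, consider a buffer with an embedded NUL. The following Python sketch is only an illustration of the problem, with a hypothetical `c_strlen` helper standing in for C's strlen(); it does not model get_commit_buffer itself.

```python
def c_strlen(buf: bytes) -> int:
    """Length a NUL-terminated reader would see: bytes before the first NUL."""
    nul = buf.find(b"\0")
    return len(buf) if nul < 0 else nul


# A made-up commit-like buffer with bytes hidden behind a NUL.
raw = b"tree abc\n\nsigned text\0hidden trailing bytes"

visible = raw[:c_strlen(raw)]   # what string-based code would examine
true_size = len(raw)            # what a separately recorded size reports
```

A verifier that only looks at `visible` never examines the bytes after the NUL, which is the kind of trickery the recorded size guards against.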
Each of these sites assumes that commit->buffer is valid.
Since they would segfault if this was not the case, they are
likely to be correct in practice. However, we can
future-proof them by using get_commit_buffer.
And as a side effect, we abstract away the final bare uses
of commit->buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
So that we can convert the exported ref names.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't want to pass arguments specific to fast-export to
setup_revisions.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a few duplicate implementations of prefix/suffix comparison
functions, and rename them to starts_with and ends_with.
* cc/starts-n-ends-with:
replace {pre,suf}fixcmp() with {starts,ends}_with()
strbuf: introduce starts_with() and ends_with()
builtin/remote: remove postfixcmp() and use suffixcmp() instead
environment: normalize use of prefixcmp() by removing " != 0"
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of prefixcmp() and suffixcmp() with the new API
functions.
The change can be recreated by mechanically applying this:
    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
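The order of the substitutions in the recipe above matters: the negated calls have to be rewritten first, because prefixcmp() returns 0 on a match while starts_with() returns true. The same four rewrites can be sketched in Python; the `convert` function is a hypothetical name, and the sample C lines are invented for illustration.

```python
import re


def convert(line: str) -> str:
    """Apply the four mechanical rewrites in the same order as the perl recipe."""
    # Negated calls first: "!prefixcmp(" meant "matches", i.e. starts_with().
    line = re.sub(r"!prefixcmp\(", "starts_with(", line)
    # Any remaining bare call meant "does not match" when used as a truth value.
    line = re.sub(r"prefixcmp\(", "!starts_with(", line)
    line = re.sub(r"!suffixcmp\(", "ends_with(", line)
    line = re.sub(r"suffixcmp\(", "!ends_with(", line)
    return line


matched = convert('if (!prefixcmp(name, "refs/"))')
unmatched = convert('if (prefixcmp(name, "refs/"))')
suffix = convert('if (!suffixcmp(path, ".git"))')
```

Running the bare-call rule first would turn `!prefixcmp(` into `!!starts_with(`, which is still correct C but leaves a double negation behind.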
* jk/robustify-parse-commit:
checkout: do not die when leaving broken detached HEAD
use parse_commit_or_die instead of custom message
use parse_commit_or_die instead of segfaulting
assume parse_commit checks for NULL commit
assume parse_commit checks commit->object.parsed
log_tree_diff: die when we fail to parse a commit
Some unchecked calls to parse_commit should obviously die on
error, because their next step is to start looking at the
parsed fields, which will cause a segfault. These are
obvious candidates for parse_commit_or_die, which will be a
strict improvement in behavior.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert most uses of OPT_BOOLEAN/OPTION_BOOLEAN that can use
OPT_BOOL/OPTION_SET_INT, which have much saner semantics, and turn the
remaining ones into OPT_SET_INT, OPT_COUNTUP, etc. as necessary.
* sb/parseopt-boolean-removal:
revert: use the OPT_CMDMODE for parsing, reducing code
checkout-index: fix negations of even numbers of -n
config parsing options: allow one flag multiple times
hash-object: replace stdin parsing OPT_BOOLEAN by OPT_COUNTUP
branch, commit, name-rev: ease up boolean conditions
checkout: remove superfluous local variable
log, format-patch: parsing uses OPT__QUIET
Replace deprecated OPT_BOOLEAN by OPT_BOOL
Remove deprecated OPTION_BOOLEAN for parsing arguments
Split into a separate helper function get_commit() so that the part that
finds the relevant commit, and the part that does something with it
(handle tag object, etc.) are in different places.
No functional changes.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no need to pass it around everywhere. This would make further
refactoring that uses this variable easier.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This task emerged from b04ba2bb (parse-options: deprecate OPT_BOOLEAN,
2011-09-27). All occurrences of the respective variables have
been reviewed and none of them relied on the counting up mechanism,
but all of them were using the variable as a true boolean.
This patch does not change semantics of any command intentionally.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
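The difference between the counting-up behaviour of OPT_BOOLEAN and a true boolean can be modelled in a few lines. This is a simplified Python sketch under the assumption that a repeated flag increments a counting option while a boolean option is just set to 1, and that the "--no-" form resets either to 0; `parse_flag` is a hypothetical helper, not part of parse-options.

```python
def parse_flag(args, name, counting):
    """Return the final value of one flag after scanning args left to right."""
    value = 0
    for arg in args:
        if arg == f"--{name}":
            # Counting options accumulate; boolean options saturate at 1.
            value = value + 1 if counting else 1
        elif arg == f"--no-{name}":
            value = 0
    return value


twice = ["--force", "--force"]
as_counter = parse_flag(twice, "force", counting=True)
as_bool = parse_flag(twice, "force", counting=False)
negated = parse_flag(twice + ["--no-force"], "force", counting=False)
```

Code that only tests the variable for truth behaves the same either way, which is why the conversion is safe for callers that never relied on the count.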
It's wrong, and also inefficient, to call get_sha1() on values that
should already be SHA-1s.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't need the parsed objects at this point, merely the
information that they have marks.
Seems to be three times faster in my setup with lots of objects.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We read from the marks file and keep only marked commits, but in
order to find the type of object, we are parsing the whole thing,
which is slow, especially in big repositories with lots of big files.
There's no need for that; we can query the object information with
sha1_object_info().
Before this, loading the objects of a fresh emacs import with 260598
blobs took 14 minutes; after this patch, it takes 3 seconds.
This is the way fast-import does it. Also die if the object is not
found (like fast-import).
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
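The saving comes from asking only for an object's type instead of reading and parsing its whole content. As a rough illustration, the Python sketch below inflates just the first bytes of a zlib-compressed loose object, whose payload starts with a "type size" header followed by a NUL. It is an assumption-laden model: the real sha1_object_info() also handles packed objects, and `make_loose_object` and `object_type` are hypothetical helpers.

```python
import zlib


def make_loose_object(obj_type: str, content: bytes) -> bytes:
    """Build a compressed loose-object payload: "<type> <size>\\0<content>"."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return zlib.compress(header + content)


def object_type(compressed: bytes) -> str:
    """Read the type by inflating at most 32 bytes, not the whole object."""
    inflater = zlib.decompressobj()
    head = inflater.decompress(compressed, 32)
    return head.split(b" ", 1)[0].decode()


big_blob = make_loose_object("blob", b"x" * 1_000_000)
small_commit = make_loose_object("commit", b"tree 0\n")
```

However large the blob is, the type check touches only the start of the stream, which is the reason the cost no longer scales with file size.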
This issues a warning while stripping signatures from signed tags, which
allows us to use it as default behaviour for remote helpers which cannot
specify how to handle signed tags.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Correct common spelling mistakes in comments and tests
kwset: fix spelling in comments
precompose-utf8: fix spelling of "want" in error message
compat/nedmalloc: fix spelling in comments
compat/regex: fix spelling and grammar in comments
obstack: fix spelling of similar
contrib/subtree: fix spelling of accidentally
git-remote-mediawiki: spelling fixes
doc: various spelling fixes
fast-export: fix argument name in error messages
Documentation: distinguish between ref and offset deltas in pack-format
i18n: make the translation of -u advice in one go
The --signed-tags argument is plural, while error messages referred
to --signed-tag (singular). Tweak error messages to correspond to the
argument.
Signed-off-by: Paul Price <price@astro.princeton.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fast-export can fail because of a pruned reference when importing a
mark file.
The problem happens in the following scenario:
    $ git fast-export --export-marks=MARKS master
    (rewrite master)
    $ git prune
    $ git fast-export --import-marks=MARKS master
This might fail if some references have been removed by prune
because some marks will refer to no longer existing commits.
git-fast-export will not need these objects anyway as they were no
longer reachable.
We still need to update last_numid so we don't change the mapping
between marks and objects for remote-helpers.
Unfortunately, the mark file should not be rewritten without the lost
marks if no new objects have been exported, as we could lose track of
the latest last_numid.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Reviewed-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
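The behaviour described above, skipping marks whose objects were pruned while still advancing last_numid, can be sketched as follows. This is a minimal Python model that assumes the usual ":<number> <sha1>" marks-file lines; `import_marks` and the `object_exists` callback are hypothetical names, and the object ids are made up.

```python
def import_marks(lines, object_exists):
    """Return (marks, last_numid) from marks-file lines.

    Marks that point at missing objects are dropped, but their numbers
    still count toward last_numid so later marks are not renumbered.
    """
    marks = {}
    last_numid = 0
    for line in lines:
        mark, sha1 = line.split()
        if not mark.startswith(":"):
            raise ValueError(f"corrupt mark line: {line!r}")
        number = int(mark[1:])
        last_numid = max(last_numid, number)
        if not object_exists(sha1):
            continue  # pruned object: unreachable anyway, so skip it
        marks[number] = sha1
    return marks, last_numid


surviving = {"aaa", "bbb"}
marks, last_numid = import_marks([":1 aaa", ":2 bbb", ":3 ccc"], surviving.__contains__)
```

Here mark :3 is lost, yet the next new object would still be numbered 4, so the mapping seen by a remote helper stays stable.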
When fast-export wants to export a blob object, it first
calls parse_object to get a "struct object" and check
whether we have already shown the object. If we haven't
shown it, we then use read_sha1_file to pull it from disk
and write it out.
That means we load each blob from disk twice: once for
parse_object to find its type and check its sha1, and a
second time when we actually output it. We can drop this to
a single load by using lookup_object to check the SHOWN
flag, and then checking the signature on and outputting a
single buffer.
This provides modest speedups on git.git (best-of-five, "git
fast-export HEAD >/dev/null"):
    [before]             [after]
    real    0m14.347s    real    0m13.780s
    user    0m14.084s    user    0m13.620s
    sys     0m0.208s     sys     0m0.100s
and somewhat more on more blob-heavy repos (this is a
repository full of media files):
    [before]             [after]
    real    0m52.236s    real    0m44.451s
    user    0m50.568s    user    0m43.000s
    sys     0m1.536s     sys     0m1.284s
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The handle_object function is rather vaguely named; it only
operates on blobs, and its purpose is to export the blob to
the output stream. Let's call it "export_blob" to make it
more clear what it does.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an object has already been exported (and thus is in the marks), it's
flagged as SHOWN, so it will not be exported again, even if at a later
time it's exported through a different ref.
We don't need the object to be exported again, but we want the ref
updated, which doesn't happen.
Since we can't know if a ref was exported or not, let's just assume that
if the commit was marked (flags & SHOWN), the user still wants the ref
updated.
IOW: If it's specified in the command line, it will get updated,
regardless of whether or not the object was marked.
So:
    % git branch test master
    % git fast-export $mark_flags master
    % git fast-export $mark_flags test
Would export 'test' properly.
Additionally, this fixes issues with remote helpers; now they can push
refs whose objects have already been exported, and a few other issues as
well. Update the tests accordingly.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
They have been marked as UNINTERESTING for a reason, let's respect
that. Currently the first ref is handled properly, but not the
rest. Assuming that all the refs point at the same commit in the
following example:
% git fast-export master ^uninteresting ^foo ^bar
reset refs/heads/bar
from :0
reset refs/heads/foo
from :0
reset refs/heads/uninteresting
from :0
% git fast-export ^uninteresting ^foo ^bar master
reset refs/heads/master
from :0
reset refs/heads/bar
from :0
reset refs/heads/foo
from :0
Clearly this is wrong; the negative refs should be ignored.
After this patch:
% git fast-export ^uninteresting ^foo ^bar master
# nothing
% git fast-export master ^uninteresting ^foo ^bar
# nothing
And even more, it would only happen if the ref is pointing to exactly
the same commit, but not otherwise:
% git fast-export ^next next
reset refs/heads/next
from :0
% git fast-export ^next next^{commit}
# nothing
% git fast-export ^next next~0
# nothing
% git fast-export ^next next~1
# nothing
% git fast-export ^next next~2
# nothing
The reason this happens is that before traversing the commits,
fast-export checks if any of the refs point to the same object, and any
duplicated ref gets added to a list in order to issue 'reset' commands
after the traversing. Unfortunately, it's not even checking if the
commit is flagged as UNINTERESTING. The fix, of course, is to check it.
However, in order to do it properly we need to get the UNINTERESTING
flag from the command line, not from the commit object, because
"^foo bar" will mark the commit 'bar' uninteresting if foo and bar
point at the same commit. rev_cmdline_info, which was introduced
exactly to handle this situation, contains all the information we
need for get_tags_and_duplicates(), plus the ref flag. This way the
rest of the positive refs will remain untouched; it's only the
negative ones that change in behavior.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
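The duplicate-ref handling described above can be modelled very roughly in Python. The sketch assumes each command-line ref is a (name, commit, is_negative) triple taken in command-line order, which loosely mirrors what rev_cmdline_info records; `duplicate_resets` is a hypothetical function and not the real get_tags_and_duplicates().

```python
def duplicate_resets(cmdline, respect_negative=True):
    """List refs that would get a 'reset' because their commit was already seen."""
    seen_commits = set()
    resets = []
    for name, commit, is_negative in cmdline:
        if respect_negative and is_negative:
            continue  # the fix: negative refs are never reset
        if commit in seen_commits:
            resets.append(name)
        else:
            seen_commits.add(commit)
    return resets


# Models "^uninteresting ^foo ^bar master", all pointing at the same commit.
args = [
    ("uninteresting", "c1", True),
    ("foo", "c1", True),
    ("bar", "c1", True),
    ("master", "c1", False),
]
before_fix = duplicate_resets(args, respect_negative=False)
after_fix = duplicate_resets(args)
```

With the check disabled, the model reproduces the stray resets for foo, bar and master from the example; with the check enabled, only positive refs can ever be reported.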
Setting 'commit' to 'commit' is a no-op. It might have been there to
avoid a compiler warning, but if so, the compiler was to blame, and
it's certainly not there any more.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to be able to import, and then export, using the same marks, so
that we don't push things that the other side already received.
Unfortunately, fast-export doesn't store blobs in the marks, but
fast-import does. This creates a mismatch when fast-export is reusing a
mark that was previously stored by fast-import.
There is no point in one tool saving blobs, and the other not, but for
now let's just check in fast-export that the objects are indeed commits.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fast-export" produced an input stream for fast-import without
properly quoting pathnames when they contain SPs in them.
* js/fast-export-paths-with-spaces:
fast-export: quote paths with spaces
A path containing a space must be quoted when used as an
argument to either the copy or rename commands (because
unlike other commands, the path is not the final thing on
the line for those commands).
Commit 6280dfdc3b (fast-export: quote paths in output,
2011-08-05) previously attempted to fix fast-export's
quoting by passing all paths through quote_c_style().
However, that function does not consider the space to be a
character which requires quoting, so let's special-case the
space inside print_path(). This will cause space-containing
paths to also be quoted in other commands where such quoting
is not strictly necessary, but it does not hurt to do so.
The test from 6280dfdc3b did not detect this because, while
it does introduce renames in the export stream, it does not
actually turn on rename detection, so they were presented as
pairs of deletions/adds. Using "-M" reveals the bug.
Signed-off-by: Jay Soffian <jaysoffian@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
PARSE_OPT_NEGHELP is confusing because short options defined with that
flag do the opposite of what the helptext says. It is also not needed
anymore now that options starting with no- can be negated by removing
that prefix. Convert its only two users to OPT_NEGBIT() and OPT_BOOL()
and then remove support for PARSE_OPT_NEGHELP.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin/fast-export.c we'd assign to variables of the
tag_of_filtered_mode enum type with constants defined for the
signed_tag_mode enum.
We'd get the intended value since both the value we were assigning
with and the one we actually wanted had the same position within
their respective enums, but doing it this way makes no sense.
This issue was spotted by Sun Studio 12 Update 1:
"builtin/fast-export.c", line 54: warning: enum type mismatch: op "=" (E_ENUM_TYPE_MISMATCH_OP)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many pathnames in a fast-import stream need to be quoted. In
particular:
1. Pathnames at the end of an "M" or "D" line need quoting
if they contain a LF or start with double-quote.
2. Pathnames on a "C" or "R" line need quoting as above,
but also if they contain spaces.
For (1), we weren't quoting at all. For (2), we put
double-quotes around the paths to handle spaces, but ignored
the possibility that they would need further quoting.
This patch checks whether each pathname needs c-style
quoting, and uses it. This is slightly overkill for (1),
which doesn't actually need to quote many characters that
vanilla c-style quoting does. However, it shouldn't hurt, as
any implementation needs to be ready to handle quoted
strings anyway.
In addition to adding a test, we have to tweak a test which
blindly assumed that case (2) would always use
double-quotes, whether it needed to or not.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
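The two quoting cases listed above can be illustrated with a small Python sketch. It implements only the rules named in the message (LF, a leading double-quote, and optionally spaces) and escapes just backslash, double-quote and LF; the real quote_c_style() covers more characters. `quote_path` and its `quote_spaces` parameter are hypothetical.

```python
def quote_path(path: str, quote_spaces: bool = False) -> str:
    """Quote a pathname for a fast-import stream when the rules require it.

    quote_spaces=False models "M"/"D" lines (path ends the line);
    quote_spaces=True models "C"/"R" lines (two paths on one line).
    """
    needs_quoting = path.startswith('"') or "\n" in path
    if quote_spaces and " " in path:
        needs_quoting = True
    if not needs_quoting:
        return path
    escaped = (
        path.replace("\\", "\\\\")  # escape backslashes before adding new ones
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


plain = quote_path("file2", quote_spaces=True)
spaced = quote_path("my file", quote_spaces=True)
newline = quote_path("odd\nname")
```

Quoting only when needed is what forces the test mentioned above to stop assuming that copy and rename lines always carry double-quotes.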
If fast-export is being used to generate a fast-import stream that
will be used afterwards it is desirable to indicate the end of the
stream with the new 'done' command.
Add a flag that causes fast-export to end with 'done'.
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
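A consumer can use the trailing 'done' to tell a complete stream from a truncated one. The Python sketch below is a simplified illustration: it assumes the stream announces the behaviour with a "feature done" line, and it scans lines naively without skipping data blocks. `stream_is_complete` is a hypothetical helper.

```python
def stream_is_complete(lines):
    """Return False when a stream promised 'done' but ended without it."""
    commands = [line.strip() for line in lines if line.strip()]
    if "feature done" not in commands:
        return True  # no promise was made, so plain EOF is acceptable
    return commands[-1] == "done"


complete = ["feature done", "reset refs/heads/master", "from :1", "done"]
truncated = ["feature done", "reset refs/heads/master"]
legacy = ["reset refs/heads/master", "from :1"]
```

Without the terminator, a writer that dies halfway through is indistinguishable from one that finished normally.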
* mg/placeholders-are-lowercase:
Make <identifier> lowercase in Documentation
Make <identifier> lowercase as per CodingGuidelines
Make <identifier> lowercase as per CodingGuidelines
Make <identifier> lowercase as per CodingGuidelines
CodingGuidelines: downcase placeholders in usage messages
t9350 sets up a commit where a file is both copied and renamed. The output
of fast-export for this commit should look like this:
    author ...
    committer ...
    from :19
    C "file2" "file4"
    R "file2" "file5"
The order of the two modification lines is derived from the result that
the diff machinery produces.
060df62 (fast-export: Fix output order of D/F changes) inserted a qsort
call that modifies the order of the diff result. Unfortunately, qsort need
not be stable. Therefore, it is possible that the 'R' line appears before
the 'C' line and the resulting fast-import stream is incorrect.
Fix it by forcing that the rename entry is printed after all other
modification lines with the same file name.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
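The fix amounts to adding a tie-breaker to the comparison so that the result no longer depends on whether the sort is stable. A minimal Python sketch of such a comparator follows; it compares plain names instead of the depth-first ordering used by fast-export, and `compare_entries` is a hypothetical name.

```python
from functools import cmp_to_key


def compare_entries(a, b):
    """Order (status, name) pairs by name, with renames last for equal names."""
    status_a, name_a = a
    status_b, name_b = b
    if name_a != name_b:
        return -1 if name_a < name_b else 1
    # Tie-breaker: an 'R' entry sorts after any other status for the same file.
    return (status_a == "R") - (status_b == "R")


# The rename is listed first on purpose, so the tie-breaker has to move it.
entries = [("R", "file2"), ("C", "file2"), ("M", "file1")]
ordered = sorted(entries, key=cmp_to_key(compare_entries))
```

With the tie-breaker in place, the copy of file2 is always emitted before file2 is renamed away, whatever order the sort routine visits equal keys in.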
* en/d-f-conflict-fix:
merge-recursive: Avoid excessive output for and reprocessing of renames
merge-recursive: Fix multiple file rename across D/F conflict
t6031: Add a testcase covering multiple renames across a D/F conflict
merge-recursive: Fix typo
Mark tests that use symlinks as needing SYMLINKS prerequisite
t/t6035-merge-dir-to-symlink.sh: Remove TODO on passing test
fast-import: Improve robustness when D->F changes provided in wrong order
fast-export: Fix output order of D/F changes
merge_recursive: Fix renames across paths below D/F conflicts
merge-recursive: Fix D/F conflicts
Add a rename + D/F conflict testcase
Add additional testcases for D/F conflicts
Conflicts:
merge-recursive.c