Convert as many instances of unsigned char [20] as possible. Update the
callers of notes_cache_get and notes_cache_put to use the new interface.
Among the functions updated are callers of
lookup_commit_reference_gently, which we will soon convert.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from unsigned char [40] to struct object_id continues.
* bc/object-id:
Documentation: update and rename api-sha1-array.txt
Rename sha1_array to oid_array
Convert sha1_array_for_each_unique and for_each_abbrev to object_id
Convert sha1_array_lookup to take struct object_id
Convert remaining callers of sha1_array_lookup to object_id
Make sha1_array_append take a struct object_id *
sha1-array: convert internal storage for struct sha1_array to object_id
builtin/pull: convert to struct object_id
submodule: convert check_for_new_submodule_commits to object_id
sha1_name: convert disambiguate_hint_fn to take object_id
sha1_name: convert struct disambiguate_state to object_id
test-sha1-array: convert most code to struct object_id
parse-options-cb: convert sha1_array_append caller to struct object_id
fsck: convert init_skiplist to struct object_id
builtin/receive-pack: convert portions to struct object_id
builtin/pull: convert portions to struct object_id
builtin/diff: convert to struct object_id
Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
Define new hash-size constants for allocating memory
To generate a patch id, we format the diff header into a
fixed-size buffer, and then feed the result to our sha1
computation. The fixed buffer has size '4*PATH_MAX + 20',
which in theory accommodates the four filenames plus some
extra data. Except:
1. The filenames may not be constrained to PATH_MAX. The
static value may not be a real limit on the current
filesystem. Moreover, we may compute patch-ids for
names stored only in git, without touching the current
filesystem at all.
2. The 20 bytes is not nearly enough to cover the
extra content we put in the buffer.
As a result, the data we feed to the sha1 computation may be
truncated, and it's possible that a commit with a very long
filename could erroneously collide in the patch-id space
with another commit. For instance, if one commit modified
"really-long-filename/foo" and another modified "bar" in the
same directory.
In practice this is unlikely. Because the filenames are
repeated, and because there's a single cutoff at the end of
the buffer, the offending filename would have to be on the
order of four times larger than PATH_MAX.
We could fix this by moving to a strbuf. However, we can
observe that the purpose of formatting this in the first
place is to feed it to git_SHA1_Update(). So instead, let's
just feed each part of the formatted string directly. This
actually ends up more readable, and we can even factor out
some duplicated bits from the various conditional branches.
Technically this may change the output of patch-id for very
long filenames, but it's not worth making an exception for
this in the --stable output. It was a bug, and one that only
affected an unlikely set of paths. And anyway, the exact
value would have varied from platform to platform depending
on the value of PATH_MAX, so there is no "stable" value.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we will likely be introducing a new hash function at some point,
and that hash function might be longer than 40 hex characters, use the
constant GIT_MAX_HEXSZ, which is designed to be suitable for
allocations, instead of GIT_SHA1_HEXSZ. This will ease the transition
down the line by distinguishing between places where we need to allocate
memory suitable for the largest hash from those where we need to handle
the current hash.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The prefix_filename() function returns a pointer to static
storage, which makes it easy to use dangerously. We already
fixed one buggy caller in hash-object recently, and the
calls in apply.c are suspicious (I didn't dig in enough to
confirm that there is a bug, but we call the function once
in apply_all_patches() and then again indirectly from
parse_chunk()).
Let's make it harder to get wrong by allocating the return
value. For simplicity, we'll do this even when the prefix is
empty (and we could just return the original file pointer).
That will cause us to allocate sometimes when we wouldn't
otherwise need to, but this function isn't called in
performance critical code-paths (and it already _might_
allocate on any given call, so a caller that cares about
performance is questionable anyway).
The downside is that the callers need to remember to free()
the result to avoid leaking. Most of them already used
xstrdup() on the result, so we know they are OK. The
remainder have been converted to use free() as appropriate.
I considered retaining a prefix_filename_unsafe() for cases
where we know the static lifetime is OK (and handling the
cleanup is awkward). This is only a handful of cases,
though, and it's not worth the mental energy in worrying
about whether the "unsafe" variant is OK to use in any
situation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function takes the prefix as a ptr/len pair, but in
every caller the length is exactly strlen(ptr). Let's
simplify the interface and just take the string. This saves
callers specifying it (and in some cases handling a NULL
prefix).
In a handful of cases we had the length already without
calling strlen, so this is technically slower. But it's not
likely to matter (after all, if the prefix is non-empty
we'll allocate and copy it into a buffer anyway).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --quiet" relies on the size field in diff_filespec to be
correctly populated, but diff_populate_filespec() helper function
made an incorrect short-cut when asked only to populate the size
field for paths that need to go through convert_to_git() (e.g. CRLF
conversion).
* jc/diff-populate-filespec-size-only-fix:
diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec()
Callers of diff_populate_filespec() can choose to ask only for the
size of the blob without grabbing the blob data, and the function,
after running lstat() when the filespec points at a working tree
file, returns by copying the value in size field of the stat
structure into the size field of the filespec when this is the case.
However, this short-cut cannot be taken if the contents from the
path needs to go through convert_to_git(), whose resulting real blob
data may be different from what is in the working tree file.
As "git diff --quiet" compares the .size fields of filespec
structures to skip content comparison, this bug manifests as a
false "there are differences" for a file that needs eol conversion,
for example.
Reported-by: Mike Crowe <mac@mcrowe.com>
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log --graph" did not work well with "--name-only", even though
other forms of "diff" output were handled correctly.
* jk/log-graph-name-only:
diff: print line prefix for --name-only output
If you run "git log --graph --name-only", the pathnames are
not indented to go along with their matching commits (unlike
all of the other diff formats). We need to output the line
prefix for each item before writing it.
The tests cover both --name-status and --name-only. The
former actually gets this right already, because it builds
on the --raw format functions. It's only --name-only which
uses its own code (and this fix mirrors the code in
diff_flush_raw()).
Note that the tests don't follow our usual style of setting
up the "expect" output inside the test block. This matches
the surrounding style, but more importantly it is easier to
read: we don't have to worry about embedded single-quotes,
and the leading indentation is more obvious.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Patch generated by Coccinelle and contrib/coccinelle/object_id.cocci.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the macro SWAP to exchange the value of pairs of variables instead
of swapping them manually with the help of a temporary variable. The
resulting code is shorter and easier to read.
The two cases were not transformed by the semantic patch swap.cocci
because it's extra careful and handles only cases where the types of all
variables are the same -- and here we swap two ints and use an unsigned
temporary variable for that. Nevertheless the conversion is safe, as
the value range is preserved with and without the patch.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply the semantic patch swap.cocci to convert hand-rolled swaps to use
the macro SWAP. The resulting code is shorter and easier to read, the
object code is effectively unchanged.
The patch for object.c had to be hand-edited in order to preserve the
comment before the change; Coccinelle tried to eat it for some reason.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --inter-hunk-context= option was added in commit 6d0e674a57
("diff: add option to show context between close hunks"). This patch
allows configuring a default for this option.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" and its family had two experimental heuristics to shift
the contents of a hunk to make the patch easier to read. One of
them turns out to be better than the other, so leave only the
"--indent-heuristic" option and remove the other one.
* jc/retire-compaction-heuristics:
diff: retire "compaction" heuristics
When a patch inserts a block of lines, whose last lines are the
same as the existing lines that appear before the inserted block,
"git diff" can choose any place between these existing lines as the
boundary between the pre-context and the added lines (adjusting the
end of the inserted block as appropriate) to come up with variants
of the same patch, and some variants are easier to read than others.
We have been trying to improve the choice of this boundary, and Git
2.11 shipped with an experimental "compaction-heuristic". Since
then another attempt to improve the logic further resulted in a new
"indent-heuristic" logic. It is agreed that the latter gives better
result overall, and the former outlived its usefulness.
Retire "compaction", and keep "indent" as an experimental feature.
The latter hopefully will be turned on by default in a future
release, but that should be done as a separate step.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two different places where the --no-abbrev option is parsed,
and two different places where SHA-1s are abbreviated. We normally parse
--no-abbrev with setup_revisions(), but in the no-index case, "git diff"
calls diff_opt_parse() directly, and diff_opt_parse() didn't handle
--no-abbrev until now. (It did handle --abbrev, however.) We normally
abbreviate SHA-1s with find_unique_abbrev(), but commit 4f03666 ("diff:
handle sha1 abbreviations outside of repository, 2016-10-20) recently
introduced a special case when you run "git diff" outside of a
repository.
setup_revisions() does also call diff_opt_parse(), but not for --abbrev
or --no-abbrev, which it handles itself. setup_revisions() sets
rev_info->abbrev, and later copies that to diff_options->abbrev. It
handles --no-abbrev by setting abbrev to zero. (This change doesn't
touch that.)
Setting abbrev to zero was broken in the outside-of-a-repository special
case, which until now resulted in a truly zero-length SHA-1, rather than
taking zero to mean do not abbreviate. The only way to trigger this bug,
however, was by running "git diff --raw" without either the --abbrev or
--no-abbrev options, because 1) without --raw it doesn't respect abbrev
(which is bizarre, but has been that way forever), 2) we silently clamp
--abbrev=0 to MINIMUM_ABBREV, and 3) --no-abbrev wasn't handled until
now.
The outside-of-a-repository case is one of three no-index cases. The
other two are when one of the files you're comparing is outside of the
repository you're in, and the --no-index option.
Signed-off-by: Jack Bates <jack@nottheoilrig.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The delta_limit parameter to diffcore_count_changes() has been unused
since commit ba23bbc8e ("diffcore-delta: make change counter to byte
oriented again.", 2006-03-04).
Remove the parameter and adjust all callers.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup.
* rs/cocci:
use strbuf_add_unique_abbrev() for adding short hashes, part 3
remove unnecessary NULL check before free(3)
coccicheck: make transformation for strbuf_addf(sb, "...") more precise
use strbuf_add_unique_abbrev() for adding short hashes, part 2
use strbuf_addstr() instead of strbuf_addf() with "%s", part 2
gitignore: ignore output files of coccicheck make target
use strbuf_addstr() for adding constant strings to a strbuf, part 2
add coccicheck make target
contrib/coccinelle: fix semantic patch for oid_to_hex_r()
When new paths were added by "git add -N" to the index, it was
enough to circumvent the check by "git commit" to refrain from
making an empty commit without "--allow-empty". The same logic
prevented "git status" to show such a path as "new file" in the
"Changes not staged for commit" section.
* nd/ita-empty-commit:
commit: don't be fooled by ita entries when creating initial commit
commit: fix empty commit creation when there's no changes but ita entries
diff: add --ita-[in]visible-in-index
diff-lib: allow ita entries treated as "not yet exist in index"
Update "git diff --no-index" codepath not to try to peek into .git/
directory that happens to be under the current directory, when we
know we are operating outside any repository.
* jk/no-looking-at-dotgit-outside-repo:
diff: handle sha1 abbreviations outside of repository
diff_aligned_abbrev: use "struct oid"
diff_unique_abbrev: rename to diff_aligned_abbrev
find_unique_abbrev: use 4-buffer ring
test-*-cache-tree: setup git dir
read info/{attributes,exclude} only when in repository
Allow the default abbreviation length, which has historically been
7, to scale as the repository grows. The logic suggests to use 12
hexdigits for the Linux kernel, and 9 to 10 for Git itself.
* lt/abbrev-auto:
abbrev: auto size the default abbreviation
abbrev: prepare for new world order
abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing
When generating diffs outside a repository (e.g., with "diff
--no-index"), we may write abbreviated sha1s as part of
"--raw" output or the "index" lines of "--patch" output.
Since we have no object database, we never find any
collisions, and these sha1s get whatever static abbreviation
length is configured (typically 7).
However, we do blindly look in ".git/objects" to see if any
objects exist, even though we know we are not in a
repository. This is usually harmless because such a
directory is unlikely to exist, but could be wrong in rare
circumstances.
Let's instead notice when we are not in a repository and
behave as if the object database is empty (i.e., just use
the default abbrev length). It would perhaps make sense to
be conservative and show full sha1s in that case, but
showing the default abbreviation is what we've always done
(and is certainly less ugly).
Note that this does mean that:
cd /not/a/repo
GIT_OBJECT_DIRECTORY=/some/real/objdir git diff --no-index ...
used to look for collisions in /some/real/objdir but now
does not. This could be considered either a bugfix (we do
not look at objects if we have no repository) or a
regression, but it seems unlikely that anybody would care
much either way.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we're modifying this function anyway, it's a good time
to update it to the more modern "struct oid". We can also
drop some of the magic numbers in favor of GIT_SHA1_HEXSZ,
along with some descriptive comments.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The word "align" describes how the function actually differs
from find_unique_abbrev, and will make it less confusing
when we add more diff-specific abbrevation functions that do
not do this alignment.
Since this is a globally available function, let's also move
its descriptive comment to the header file, where we
typically document function interfaces.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More i18n.
* va/i18n:
i18n: diff: mark warnings for translation
i18n: credential-cache--daemon: mark advice for translation
i18n: convert mark error messages for translation
i18n: apply: mark error message for translation
i18n: apply: mark error messages for translation
i18n: apply: mark info messages for translation
i18n: apply: mark plural string for translation
"git diff/log --ws-error-highlight=<kind>" lacked the corresponding
configuration variable to set it by default.
* jc/ws-error-highlight:
diff: introduce diff.wsErrorHighlight option
diff.c: move ws-error-highlight parsing helpers up
diff.c: refactor parse_ws_error_highlight()
t4015: split out the "setup" part of ws-error-highlight test
The option --ita-invisible-in-index exposes the "ita_invisible_in_index"
diff flag to outside to allow easier experimentation with this new mode.
The "plan" is to make --ita-invisible-in-index default to keep consistent
behavior with 'status' and 'commit', but a bunch other commands like
'apply', 'merge', 'reset'.... need to be taken into consideration as well.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark rename_limit_warning and degrade_cc_to_c_warning and
rename_limit_warning for translation.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We call "qsort(array, nelem, sizeof(array[0]), fn)", and most of
the time third parameter is redundant. A new QSORT() macro lets us
omit it.
* rs/qsort:
show-branch: use QSORT
use QSORT, part 2
coccicheck: use --all-includes by default
remove unnecessary check before QSORT
use QSORT
add QSORT
Code clean-up with help from coccinelle tool continues.
* rs/cocci:
coccicheck: make transformation for strbuf_addf(sb, "...") more precise
use strbuf_add_unique_abbrev() for adding short hashes, part 2
use strbuf_addstr() instead of strbuf_addf() with "%s", part 2
gitignore: ignore output files of coccicheck make target
With the preparatory steps, it has become trivial to teach the
system a new diff.wsErrorHighlight configuration that gives the
default value for --ws-error-highlight command line option.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These need to be usable from git_diff_ui_config() code to help
parsing a configuration variable, so move them up.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the function to parse_ws_error_highlight_opt(), because it is
meant to parse a command line option, and then refactor the meat of
the function into a helper function that reports the parsed result
which is typically a small unsigned int (these are OR'ed bitmask
after all), or a negative offset that indicates where in the input
string a parse error happened.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code that sets custom abbreviation length, in response to
command line argument, often does something like this:
if (skip_prefix(arg, "--abbrev=", &arg))
abbrev = atoi(arg);
else if (!strcmp("--abbrev", &arg))
abbrev = DEFAULT_ABBREV;
/* make the value sane */
if (abbrev < 0 || 40 < abbrev)
abbrev = ... some sane value ...
However, it is pointless to sanity-check and tweak the value
obtained from DEFAULT_ABBREV. We are going to allow it to be
initially set to -1 to signal that the default abbreviation length
must be auto sized upon the first request to abbreviate, based on
the number of objects in the repository, and when that happens,
rejecting or tweaking a negative value to a "saner" one will
negatively interfere with the auto sizing. The codepaths for
git rev-parse --short <object>
git diff --raw --abbrev
do exactly that; allow them to pass possibly negative abbrevs
intact, that will come from DEFAULT_ABBREV in the future.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is used to add "..." to displayed object names in
"diff --raw --abbrev[=<n>]" output. It bases its behaviour on an
untold assumption that the abbreviation length requested by the
caller is "reasonble", i.e. most of the objects will abbreviate
within the requested length and the resulting length would never
exceed it by more than a few hexdigits (otherwise the resulting
columns would not align). Explain that in a comment.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some codepaths in "git diff" used regexec(3) on a buffer that was
mmap(2)ed, which may not have a terminating NUL, leading to a read
beyond the end of the mapped region. This was fixed by introducing
a regexec_buf() helper that takes a <ptr,len> pair with REG_STARTEND
extension.
* js/regexec-buf:
regex: use regexec_buf()
regex: add regexec_buf() that can work on a non NUL-terminated string
regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code
base, replacing calls of qsort(3) with QSORT. The resulting code is
shorter and supports empty arrays with NULL pointers.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call strbuf_add_unique_abbrev() to add abbreviated hashes to strbufs
instead of taking detours through find_unique_abbrev() and its static
buffer. This is shorter and a bit more efficient.
1eb47f167d already converted six cases,
this patch covers three more.
A semantic patch for Coccinelle is included for easier checking for
new cases that might be introduced in the future.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some codepaths in "git diff" used regexec(3) on a buffer that was
mmap(2)ed, which may not have a terminating NUL, leading to a read
beyond the end of the mapped region. This was fixed by introducing
a regexec_buf() helper that takes a <ptr,len> pair with REG_STARTEND
extension.
* js/regexec-buf:
regex: use regexec_buf()
regex: add regexec_buf() that can work on a non NUL-terminated string
regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
Even more i18n.
* va/i18n-more:
i18n: stash: mark messages for translation
i18n: notes-merge: mark die messages for translation
i18n: ident: mark hint for translation
i18n: i18n: diff: mark die messages for translation
i18n: connect: mark die messages for translation
i18n: commit: mark message for translation
Output from "git diff" can be made easier to read by selecting
which lines are common and which lines are added/deleted
intelligently when the lines before and after the changed section
are the same. A command line option is added to help with the
experiment to find a good heuristics.
* mh/diff-indent-heuristic:
blame: honor the diff heuristic options and config
parse-options: add parse_opt_unknown_cb()
diff: improve positioning of add/delete blocks in diffs
xdl_change_compact(): introduce the concept of a change group
recs_match(): take two xrecord_t pointers as arguments
is_blank_line(): take a single xrecord_t as argument
xdl_change_compact(): only use heuristic if group can't be matched
xdl_change_compact(): fix compaction heuristic to adjust ixo
The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.
We need to use this function whenever the buffer we want to pass to
regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated).
Note: the original motivation for this patch was to fix a bug where
`git diff -G <regex>` would crash. This patch converts more callers,
though, some of which allocated to construct NUL-terminated strings,
or worse, modified buffers to temporarily insert NULs while calling
regexec(3). By converting them to use regexec_buf(), the code has
become much cleaner.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While marking individual messages for translation, consolidate some
messages "option 'foo' requires a value" that is used for many
options into one by introducing a helper function to die with the
message with the option name embedded in it, and ask the translators
to localize that single message instead.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "unsigned char sha1[20]" to "struct object_id" conversion
continues. Notable changes in this round includes that ce->sha1,
i.e. the object name recorded in the cache_entry, turns into an
object_id.
It had merge conflicts with a few topics in flight (Christian's
"apply.c split", Dscho's "cat-file --filters" and Jeff Hostetler's
"status --porcelain-v2"). Extra sets of eyes double-checking for
mismerges are highly appreciated.
* bc/object-id:
builtin/reset: convert to use struct object_id
builtin/commit-tree: convert to struct object_id
builtin/am: convert to struct object_id
refs: add an update_ref_oid function.
sha1_name: convert get_sha1_mb to struct object_id
builtin/update-index: convert file to struct object_id
notes: convert init_notes to use struct object_id
builtin/rm: convert to use struct object_id
builtin/blame: convert file to use struct object_id
Convert read_mmblob to take struct object_id.
notes-merge: convert struct notes_merge_pair to struct object_id
builtin/checkout: convert some static functions to struct object_id
streaming: make stream_blob_to_fd take struct object_id
builtin: convert textconv_object to use struct object_id
builtin/cat-file: convert some static functions to struct object_id
builtin/cat-file: convert struct expand_data to use struct object_id
builtin/log: convert some static functions to use struct object_id
builtin/blame: convert struct origin to use struct object_id
builtin/apply: convert static functions to struct object_id
cache: convert struct cache_entry to use struct object_id
Teach "git blame" and "git annotate" the --compaction-heuristic and
--indent-heuristic options that are now supported by "git diff".
Also teach them to honor the `diff.compactionHeuristic` and
`diff.indentHeuristic` configuration options.
It would be conceivable to introduce separate configuration options for
"blame" and "annotate"; for example `blame.compactionHeuristic` and
`blame.indentHeuristic`. But it would be confusing to users if blame
output is inconsistent with diff output, so it makes more sense for them
to respect the same configuration.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some groups of added/deleted lines in diffs can be slid up or down,
because lines at the edges of the group are not unique. Picking good
shifts for such groups is not a matter of correctness but definitely has
a big effect on aesthetics. For example, consider the following two
diffs. The first is what standard Git emits:
--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
}
if (!$smtp_server) {
+ $smtp_server = $repo->config('sendemail.smtpserver');
+}
+if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {
$smtp_server = $_;
The following diff is equivalent, but is obviously preferable from an
aesthetic point of view:
--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
$initial_reply_to =~ s/(^\s+|\s+$)//g;
}
+if (!$smtp_server) {
+ $smtp_server = $repo->config('sendemail.smtpserver');
+}
if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {
This patch teaches Git to pick better positions for such "diff sliders"
using heuristics that take the positions of nearby blank lines and the
indentation of nearby lines into account.
The existing Git code basically always shifts such "sliders" as far down
in the file as possible. The only exception is when the slider can be
aligned with a group of changed lines in the other file, in which case
Git favors depicting the change as one add+delete block rather than one
add and a slightly offset delete block. This naive algorithm often
yields ugly diffs.
Commit d634d61ed6 improved the situation somewhat by preferring to
position add/delete groups to make their last line a blank line, when
that is possible. This heuristic does more good than harm, but (1) it
can only help if there are blank lines in the right places, and (2)
always picks the last blank line, even if there are others that might be
better. The end result is that it makes perhaps 1/3 as many errors as
the default Git algorithm, but that still leaves a lot of ugly diffs.
This commit implements a new and much better heuristic for picking
optimal "slider" positions using the following approach: First observe
that each hypothetical positioning of a diff slider introduces two
splits: one between the context lines preceding the group and the first
added/deleted line, and the other between the last added/deleted line
and the first line of context following it. It tries to find the
positioning that creates the least bad splits.
Splits are evaluated based only on the presence and locations of nearby
blank lines, and the indentation of lines near the split. Basically, it
prefers to introduce splits adjacent to blank lines, between lines that
are indented less, and between lines with the same level of indentation.
In more detail:
1. It measures the following characteristics of a proposed splitting
position in a `struct split_measurement`:
* the number of blank lines above the proposed split
* whether the line directly after the split is blank
* the number of blank lines following that line
* the indentation of the nearest non-blank line above the split
* the indentation of the line directly below the split
* the indentation of the nearest non-blank line after that line
2. It combines the measured attributes using a bunch of
empirically-optimized weighting factors to derive a `struct
split_score` that measures the "badness" of splitting the text at
that position.
3. It combines the `split_score` for the top and the bottom of the
slider at each of its possible positions, and selects the position
that has the best `split_score`.
I determined the initial set of weighting factors by collecting a corpus
of Git histories from 29 open-source software projects in various
programming languages. I generated many diffs from this corpus, and
determined the best positioning "by eye" for about 6600 diff sliders. I
used about half of the repositories in the corpus (corresponding to
about 2/3 of the sliders) as a training set, and optimized the weights
against this corpus using a crude automated search of the parameter
space to get the best agreement with the manually-determined values.
Then I tested the resulting heuristic against the full corpus. The
results are summarized in the following table, in column `indent-1`:
| repository | count | Git 2.9.0 | compaction | compaction-fixed | indent-1 | indent-2 |
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| afnetworking | 109 | 89 (81.7%) | 37 (33.9%) | 37 (33.9%) | 2 (1.8%) | 2 (1.8%) |
| alamofire | 30 | 18 (60.0%) | 14 (46.7%) | 15 (50.0%) | 0 (0.0%) | 0 (0.0%) |
| angular | 184 | 127 (69.0%) | 39 (21.2%) | 23 (12.5%) | 5 (2.7%) | 5 (2.7%) |
| animate | 313 | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) |
| ant | 380 | 356 (93.7%) | 152 (40.0%) | 148 (38.9%) | 15 (3.9%) | 15 (3.9%) | *
| bugzilla | 306 | 263 (85.9%) | 109 (35.6%) | 99 (32.4%) | 14 (4.6%) | 15 (4.9%) | *
| corefx | 126 | 91 (72.2%) | 22 (17.5%) | 21 (16.7%) | 6 (4.8%) | 6 (4.8%) |
| couchdb | 78 | 44 (56.4%) | 26 (33.3%) | 28 (35.9%) | 6 (7.7%) | 6 (7.7%) | *
| cpython | 937 | 158 (16.9%) | 50 (5.3%) | 49 (5.2%) | 5 (0.5%) | 5 (0.5%) | *
| discourse | 160 | 95 (59.4%) | 42 (26.2%) | 36 (22.5%) | 18 (11.2%) | 13 (8.1%) |
| docker | 307 | 194 (63.2%) | 198 (64.5%) | 253 (82.4%) | 8 (2.6%) | 8 (2.6%) | *
| electron | 163 | 132 (81.0%) | 38 (23.3%) | 39 (23.9%) | 6 (3.7%) | 6 (3.7%) |
| git | 536 | 470 (87.7%) | 73 (13.6%) | 78 (14.6%) | 16 (3.0%) | 16 (3.0%) | *
| gitflow | 127 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| ionic | 133 | 89 (66.9%) | 29 (21.8%) | 38 (28.6%) | 1 (0.8%) | 1 (0.8%) |
| ipython | 482 | 362 (75.1%) | 167 (34.6%) | 169 (35.1%) | 11 (2.3%) | 11 (2.3%) | *
| junit | 161 | 147 (91.3%) | 67 (41.6%) | 66 (41.0%) | 1 (0.6%) | 1 (0.6%) | *
| lighttable | 15 | 5 (33.3%) | 0 (0.0%) | 2 (13.3%) | 0 (0.0%) | 0 (0.0%) |
| magit | 88 | 75 (85.2%) | 11 (12.5%) | 9 (10.2%) | 1 (1.1%) | 0 (0.0%) |
| neural-style | 28 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| nodejs | 781 | 649 (83.1%) | 118 (15.1%) | 111 (14.2%) | 4 (0.5%) | 5 (0.6%) | *
| phpmyadmin | 491 | 481 (98.0%) | 75 (15.3%) | 48 (9.8%) | 2 (0.4%) | 2 (0.4%) | *
| react-native | 168 | 130 (77.4%) | 79 (47.0%) | 81 (48.2%) | 0 (0.0%) | 0 (0.0%) |
| rust | 171 | 128 (74.9%) | 30 (17.5%) | 27 (15.8%) | 16 (9.4%) | 14 (8.2%) |
| spark | 186 | 149 (80.1%) | 52 (28.0%) | 52 (28.0%) | 2 (1.1%) | 2 (1.1%) |
| tensorflow | 115 | 66 (57.4%) | 48 (41.7%) | 48 (41.7%) | 5 (4.3%) | 5 (4.3%) |
| test-more | 19 | 15 (78.9%) | 2 (10.5%) | 2 (10.5%) | 1 (5.3%) | 1 (5.3%) | *
| test-unit | 51 | 34 (66.7%) | 14 (27.5%) | 8 (15.7%) | 2 (3.9%) | 2 (3.9%) | *
| xmonad | 23 | 22 (95.7%) | 2 (8.7%) | 2 (8.7%) | 1 (4.3%) | 1 (4.3%) | *
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| totals | 6668 | 4391 (65.9%) | 1496 (22.4%) | 1491 (22.4%) | 150 (2.2%) | 144 (2.2%) |
| totals (training set) | 4552 | 3195 (70.2%) | 1053 (23.1%) | 1061 (23.3%) | 86 (1.9%) | 88 (1.9%) |
| totals (test set) | 2116 | 1196 (56.5%) | 443 (20.9%) | 430 (20.3%) | 64 (3.0%) | 56 (2.6%) |
In this table, the numbers are the count and percentage of human-rated
sliders that the corresponding algorithm got *wrong*. The columns are
* "repository" - the name of the repository used. I used the diffs
between successive non-merge commits on the HEAD branch of the
corresponding repository.
* "count" - the number of sliders that were human-rated. I chose most,
but not all, sliders to rate from those among which the various
algorithms gave different answers.
* "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0.
* "compaction" - the heuristic used by `git diff --compaction-heuristic`
in Git 2.9.0.
* "compaction-fixed" - the heuristic used by `git diff
--compaction-heuristic` after the fixes from earlier in this patch
series. Note that the results are not dramatically different than
those for "compaction". Both produce non-ideal diffs only about 1/3 as
often as the default `git diff`.
* "indent-1" - the new `--indent-heuristic` algorithm, using the first
set of weighting factors, determined as described above.
* "indent-2" - the new `--indent-heuristic` algorithm, using the final
set of weighting factors, determined as described below.
* `*` - indicates that repo was part of training set used to determine
the first set of weighting factors.
The fact that the heuristic performed nearly as well on the test set as
on the training set in column "indent-1" is a good indication that the
heuristic was not over-trained. Given that fact, I ran a second round of
optimization, using the entire corpus as the training set. The resulting
set of weights gave the results in column "indent-2". These are the
weights included in this patch.
The final result gives consistently and significantly better results
across the whole corpus than either `git diff` or `git diff
--compaction-heuristic`. It makes only about 1/30 as many errors as the
former and about 1/10 as many errors as the latter. (And a good fraction
of the remaining errors are for diffs that involve weirdly-formatted
code, sometimes apparently machine-generated.)
The tools that were used to do this optimization and analysis, along
with the human-generated data values, are recorded in a separate project
[1].
This patch adds a new command-line option `--indent-heuristic`, and a
new configuration setting `diff.indentHeuristic`, that activate this
heuristic. This interface is only meant for testing purposes, and should
be finalized before including this change in any release.
[1] https://github.com/mhagger/diff-slider-tools
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git diff --submodule={short,log}" mechanism has been enhanced
to allow "--submodule=diff" to show the patch between the submodule
commits bound to the superproject.
* jk/diff-submodule-diff-inline:
diff: teach diff to display submodule difference with an inline diff
submodule: refactor show_submodule_summary with helper function
submodule: convert show_submodule_summary to use struct object_id *
allow do_submodule_path to work even if submodule isn't checked out
diff: prepare for additional submodule formats
graph: add support for --line-prefix on all graph-aware output
diff.c: remove output_prefix_length field
cache: add empty_tree_oid object and helper function
When `len < 1`, len has to be 0 or negative, emit_line will then remove the
first character and by then `len` would be negative. As this doesn't
happen, it is safe to assume it is dead code.
This continues to simplify the code, which was started in b8d9c1a66b
(2009-09-03, diff.c: the builtin_diff() deals with only two-file
comparison).
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We keep the actual data in the diff options, which are just as accessible.
Remove the pointer stored in struct emit_callback for readability.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The value of `ecbdata->opt` is accessible via the short variable `o`
already, so let's use that instead.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:
@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash
@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach git-diff and friends a new format for displaying the difference
of a submodule. The new format is an inline diff of the contents of the
submodule between the commit range of the update. This allows the user
to see the actual code change caused by a submodule update.
Add tests for the new format and option.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we're going to be changing this function in a future patch, lets
go ahead and convert this to use object_id now.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A future patch will add a new format for displaying the difference of
a submodule. Make it easier by changing how we store the current
selected format. Replace the DIFF_OPT flag with an enumeration, as each
format will be mutually exclusive.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an extension to git-diff and git-log (and any other graph-aware
displayable output) such that "--line-prefix=<string>" will print the
additional line-prefix on every line of output.
To make this work, we have to fix a few bugs in the graph API that force
graph_show_commit_msg to be used only when you have a valid graph.
Additionally, we extend the default_diff_output_prefix handler to work
even when no graph is enabled.
This is somewhat of a hack on top of the graph API, but I think it
should be acceptable here.
This will be used by a future extension of submodule display which
displays the submodule diff as the actual diff between the pre and post
commit in the submodule project.
Add some tests for both git-log and git-diff to ensure that the prefix
is honored correctly.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"diff/log --stat" has a logic that determines the display columns
available for the diffstat part of the output and apportions it for
pathnames and diffstat graph automatically.
5e71a84a (Add output_prefix_length to diff_options, 2012-04-16)
added the output_prefix_length field to diff_options structure to
allow this logic to subtract the display columns used for the
history graph part from the total "terminal width"; this matters
when the "git log --graph -p" option is in use.
The field must be set to the number of display columns needed to
show the output from the output_prefix() callback, which is error
prone. As there is only one user of the field, and the user has the
actual value of the prefix string, let's get rid of the field and
have the user count the display width itself.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git rebase" tries to compare set of changes on the updated
upstream and our own branch, it computes patch-id for all of these
changes and attempts to find matches. This has been optimized by
lazily computing the full patch-id (which is expensive) to be
compared only for changes that touch the same set of paths.
* kw/patch-ids-optim:
rebase: avoid computing unnecessary patch IDs
patch-ids: add flag to create the diff patch id using header only data
patch-ids: replace the seen indicator with a commit pointer
patch-ids: stop using a hand-rolled hashmap implementation
There is an optimization used in "git diff $treeA $treeB" to borrow
an already checked-out copy in the working tree when it is known to
be the same as the blob being compared, expecting that open/mmap of
such a file is faster than reading it from the object store, which
involves inflating and applying delta. This however kicked in even
when the checked-out copy needs to go through the convert-to-git
conversion (including the clean filter), which defeats the whole
point of the optimization. The optimization has been disabled when
the conversion is necessary.
* jk/diff-do-not-reuse-wtf-needs-cleaning:
diff: do not reuse worktree files that need "clean" conversion
There is an optimization used in "git diff $treeA $treeB" to borrow
an already checked-out copy in the working tree when it is known to
be the same as the blob being compared, expecting that open/mmap of
such a file is faster than reading it from the object store, which
involves inflating and applying delta. This however kicked in even
when the checked-out copy needs to go through the convert-to-git
conversion (including the clean filter), which defeats the whole
point of the optimization. The optimization has been disabled when
the conversion is necessary.
* jk/diff-do-not-reuse-wtf-needs-cleaning:
diff: do not reuse worktree files that need "clean" conversion
This will allow a diff patch id to be created using only the header data
so that the contents of the file will not have to be loaded.
Signed-off-by: Kevin Willford <kcwillford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When accessing a blob for a diff, we may try to reuse file
contents in the working tree, under the theory that it is
faster to mmap those file contents than it would be to
extract the content from the object database.
When we have to filter those contents, though, that
assumption does not hold. Even for our internal conversions
like CRLF, we have to allocate and fill a new buffer anyway.
But much worse, for external clean filters we have to exec
an arbitrary script, and we have no idea how expensive it
may be to run.
So let's skip this optimization when conversion into git's
"clean" form is required. This applies whenever the
"want_file" flag is false. When it's true, the caller
actually wants the smudged worktree contents, which the
reused file by definition already has (in fact, this is a
key optimization going the other direction, since reusing
the worktree file there lets us skip smudge filters).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of the callers of this function use struct object_id, so convert it
to use struct object_id in its arguments and internally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that this struct's sha1 member is called "oid", update the comment
and the sha1_valid member to be called "oid_valid" instead. The
following Coccinelle semantic patch was used to implement this, followed
by the transformations in object_id.cocci:
@@
struct diff_filespec o;
@@
- o.sha1_valid
+ o.oid_valid
@@
struct diff_filespec *p;
@@
- p->sha1_valid
+ p->oid_valid
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert struct diff_filespec's sha1 member to use a struct object_id
called "oid" instead. The following Coccinelle semantic patch was used
to implement this, followed by the transformations in object_id.cocci:
@@
struct diff_filespec o;
@@
- o.sha1
+ o.oid.hash
@@
struct diff_filespec *p;
@@
- p->sha1
+ p->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
hashcpy with null_sha1 as the source is equivalent to hashclr. In
addition to being simpler, using hashclr may give the compiler a chance
to optimize better. Convert instances of hashcpy with the source
argument of null_sha1 to hashclr.
This transformation was implemented using the following semantic patch:
@@
expression E1;
@@
-hashcpy(E1, null_sha1);
+hashclr(E1);
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --output=<file> --color=auto" used to show the ANSI color
sequence in the resulting file when the standard output is connected
to a terminal, because --color=auto check always checks the standard
output, not the actual file that receives the output.
We could correct this by using freopen(3) to redirect the standard
output to the specified file, which is in like with how format-patch
used to match the world order, but following the same reasoning as
the earlier "format-patch: explicitly switch off color when writing
to files", let's be more strict by bypassing the "auto" check when
the --output=<file> option is in use.
Strictly speaking, this is a backwards-incompatible change, but
it is highly unlikely that any user would want to see ANSI color
sequences in a file.
The reason this was not caught earlier is most likely that either
--output=<file> is not used, or only when stdout is redirected
anyway.
Users can still give --color=always if they want a colored diff in
the resulting file.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It turns out that the earlier effort to update the heuristics may
want to use a bit more time to mature. Turn it off by default.
* jk/diff-compact-heuristic:
diff: disable compaction heuristic for now
http://lkml.kernel.org/g/20160610075043.GA13411@sigill.intra.peff.net
reports that a change to add a new "function" with common ending
with the existing one at the end of the file is shown like this:
def foo
do_foo_stuff()
+ common_ending()
+end
+
+def bar
+ do_bar_stuff()
+
common_ending()
end
when the new heuristic is in use. In reality, the change is to add
the blank line before "def bar" and everything below, which is what
the code without the new heuristic shows.
Disable the heuristics by default, and resurrect the documentation
for the option and the configuration variables, while clearly
marking the feature as still experimental.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Patch output from "git diff" and friends has been tweaked to be
more readable by using a blank line as a strong hint that the
contents before and after it belong to a logically separate unit.
* jk/diff-compact-heuristic:
diff: undocument the compaction heuristic knobs for experimentation
xdiff: implement empty line chunk heuristic
xdiff: add recs_match helper function
In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:
...
/*
+ *
+ *
+ */
+
+/*
...
instead of the more readable equivalent of
...
+/*
+ *
+ *
+ */
+
/*
...
Implement the following heuristic to (optionally) produce the desired
output.
If there are diff chunks which can be shifted around, shift each hunk
such that the last common empty line is below the chunk with the rest
of the context above.
This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. However, as
with any heuristic it is not really known whether this will always be
more optimal. Thus, it can be disabled via diff.compactionHeuristic.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The end-user facing Porcelain level commands like "diff" and "log"
now enables the rename detection by default.
* mm/diff-renames-default:
diff: activate diff.renames by default
log: introduce init_log_defaults()
t: add tests for diff.renames (true/false/unset)
t4001-diff-rename: wrap file creations in a test
Documentation/diff-config: fix description of diff.renames
Update various codepaths to avoid manually-counted malloc().
* jk/tighten-alloc: (22 commits)
ewah: convert to REALLOC_ARRAY, etc
convert ewah/bitmap code to use xmalloc
diff_populate_gitlink: use a strbuf
transport_anonymize_url: use xstrfmt
git-compat-util: drop mempcpy compat code
sequencer: simplify memory allocation of get_message
test-path-utils: fix normalize_path_copy output buffer size
fetch-pack: simplify add_sought_entry
fast-import: simplify allocation in start_packfile
write_untracked_extension: use FLEX_ALLOC helper
prepare_{git,shell}_cmd: use argv_array
use st_add and st_mult for allocation size computation
convert trivial cases to FLEX_ARRAY macros
use xmallocz to avoid size arithmetic
convert trivial cases to ALLOC_ARRAY
convert manual allocations to argv_array
argv-array: add detach function
add helpers for allocating flex-array structs
harden REALLOC_ARRAY and xcalloc against size_t overflow
tree-diff: catch integer overflow in combine_diff_path allocation
...
The memory ownership rule of fill_textconv() API, which was a bit
tricky, has been documented a bit better.
* jk/more-comments-on-textconv:
diff: clarify textconv interface
Rename detection is a very convenient feature, and new users shouldn't
have to dig in the documentation to benefit from it.
Potential objections to activating rename detection are that it
sometimes fail, and it is sometimes slow. But rename detection is
already activated by default in several cases like "git status" and "git
merge", so activating diff.renames does not fundamentally change the
situation. When the rename detection fails, it now fails consistently
between "git diff" and "git status".
This setting does not affect plumbing commands, hence well-written
scripts will not be affected.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allocate 100 bytes to hold the "Submodule commit ..."
text. This is enough, but it's not immediately obvious that
this is the case, and we have to repeat the magic 100 twice.
We could get away with xstrfmt here, but we want to know the
size, as well, so let's use a real strbuf. And while we're
here, we can clean up the logic around size_only. It
currently sets and clears the "data" field pointlessly, and
leaves the "should_free" flag on even after we have cleared
the data.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using FLEX_ARRAY macros reduces the amount of manual
computation size we have to do. It also ensures we don't
overflow size_t, and it makes sure we write the same number
of bytes that we allocated.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The memory allocation scheme for the textconv interface is a
bit tricky, and not well documented. It was originally
designed as an internal part of diff.c (matching
fill_mmfile), but gradually was made public.
Refactoring it is difficult, but we can at least improve the
situation by documenting the intended flow and enforcing it
with an in-code assertion.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few options of "git diff" did not work well when the command was
run from a subdirectory.
* nd/diff-with-path-params:
diff: make -O and --output work in subdirectory
diff-no-index: do not take a redundant prefix argument
A few options of "git diff" did not work well when the command was
run from a subdirectory.
* nd/diff-with-path-params:
diff: make -O and --output work in subdirectory
diff-no-index: do not take a redundant prefix argument
After switching to use the tempfile module in commit 284098f1
(diff: use tempfile module), no declarations from sigchain.h are used in
diff.c anymore. Thus, remove the #include.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many allocations that is manually counted (correctly) that are
followed by strcpy/sprintf have been replaced with a less error
prone constructs such as xstrfmt.
Macintosh-specific breakage was noticed and corrected in this
reroll.
* jk/war-on-sprintf: (70 commits)
name-rev: use strip_suffix to avoid magic numbers
use strbuf_complete to conditionally append slash
fsck: use for_each_loose_file_in_objdir
Makefile: drop D_INO_IN_DIRENT build knob
fsck: drop inode-sorting code
convert strncpy to memcpy
notes: document length of fanout path with a constant
color: add color_set helper for copying raw colors
prefer memcpy to strcpy
help: clean up kfmclient munging
receive-pack: simplify keep_arg computation
avoid sprintf and strcpy with flex arrays
use alloc_ref rather than hand-allocating "struct ref"
color: add overflow checks for parsing colors
drop strcpy in favor of raw sha1_to_hex
use sha1_to_hex_r() instead of strcpy
daemon: use cld->env_array when re-spawning
stat_tracking_info: convert to argv_array
http-push: use an argv_array for setup_revisions
fetch-pack: use argv_array for index-pack / unpack-objects
...
Before sha1_to_hex_r() existed, a simple way to get hex
sha1 into a buffer was with:
strcpy(buf, sha1_to_hex(sha1));
This isn't wrong (assuming the buf is 41 characters), but it
makes auditing the code base for bad strcpy() calls harder,
as these become false positives.
Let's convert them to sha1_to_hex_r(), and likewise for
some calls to find_unique_abbrev(). While we're here, we'll
double-check that all of the buffers are correctly sized,
and use the more obvious GIT_SHA1_HEXSZ constant.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call into xdiff to perform a diff, we generally lose
the return code completely. Typically by ignoring the return
of our xdi_diff wrapper, but sometimes we even propagate
that return value up and then ignore it later. This can
lead to us silently producing incorrect diffs (e.g., "git
log" might produce no output at all, not even a diff header,
for a content-level diff).
In practice this does not happen very often, because the
typical reason for xdiff to report failure is that it
malloc() failed (it uses straight malloc, and not our
xmalloc wrapper). But it could also happen when xdiff
triggers one our callbacks, which returns an error (e.g.,
outf() in builtin/rerere.c tries to report a write failure
in this way). And the next patch also plans to add more
failure modes.
Let's notice an error return from xdiff and react
appropriately. In most of the diff.c code, we can simply
die(), which matches the surrounding code (e.g., that is
what we do if we fail to load a file for diffing in the
first place). This is not that elegant, but we are probably
better off dying to let the user know there was a problem,
rather than simply generating bogus output.
We could also just die() directly in xdi_diff, but the
callers typically have a bit more context, and can provide a
better message (and if we do later decide to pass errors up,
we're one step closer to doing so).
There is one interesting case, which is in diff_grep(). Here
if we cannot generate the diff, there is nothing to match,
and we silently return "no hits". This is actually what the
existing code does already, but we make it a little more
explicit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes sprintf into fixed-size buffers when we know
that the buffer is large enough to fit the input (either
because it's a constant, or because it's numeric input that
is bounded in size). Likewise with strcpy of constant
strings.
However, these sites make it hard to audit sprintf and
strcpy calls for buffer overflows, as a reader has to
cross-reference the size of the array with the input. Let's
use xsnprintf instead, which communicates to a reader that
we don't expect this to overflow (and catches the mistake in
case we do).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The gitmodules API accessed from the C code learned to cache stuff
lazily.
* hv/submodule-config:
submodule: allow erroneous values for the fetchRecurseSubmodules option
submodule: use new config API for worktree configurations
submodule: extract functions for config set and lookup
submodule: implement a config API for lookup of .gitmodules values
The "lockfile" API has been rebuilt on top of a new "tempfile" API.
* mh/tempfile:
credential-cache--daemon: use tempfile module
credential-cache--daemon: delete socket from main()
gc: use tempfile module to handle gc.pid file
lock_repo_for_gc(): compute the path to "gc.pid" only once
diff: use tempfile module
setup_temporary_shallow(): use tempfile module
write_shared_index(): use tempfile module
register_tempfile(): new function to handle an existing temporary file
tempfile: add several functions for creating temporary files
prepare_tempfile_object(): new function, extracted from create_tempfile()
tempfile: a new module for handling temporary files
commit_lock_file(): use get_locked_file_path()
lockfile: add accessor get_lock_file_path()
lockfile: add accessors get_lock_file_fd() and get_lock_file_fp()
create_bundle(): duplicate file descriptor to avoid closing it twice
lockfile: move documentation to lockfile.h and lockfile.c
We remove the extracted functions and directly parse into and read out
of the cache. This allows us to have one unified way of accessing
submodule configuration values specific to single submodules. Regardless
whether we need to access a configuration from history or from the
worktree.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also add some code comments explaining how the fields in "struct
diff_tempfile" are used.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new configuration variable to enable "--follow" automatically
when "git log" is run with one pathspec argument.
* dt/log-follow-config:
log: add "log.follow" configuration variable
Check if a matched token is followed by a delimiter before advancing the
pointer arg. This avoids accepting composite words like "allnew" or
"defaultcontext" and misparsing them as "new" or "context".
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
People who work on projects with mostly linear history with frequent
whole file renames may want to always use "git log --follow" when
inspecting the life of the content that live in a single path.
Teach the command to behave as if "--follow" was given from the
command line when log.follow configuration variable is set *and*
there is one (and only one) path on the command line.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"color.diff.plain" was a misnomer; give it 'color.diff.context' as
a more logical synonym.
* jk/color-diff-plain-is-context:
diff.h: rename DIFF_PLAIN color slot to DIFF_CONTEXT
diff: accept color.diff.context as a synonym for "plain"
"color.diff.plain" was a misnomer; give it 'color.diff.context' as
a more logical synonym.
* jk/color-diff-plain-is-context:
diff.h: rename DIFF_PLAIN color slot to DIFF_CONTEXT
diff: accept color.diff.context as a synonym for "plain"
Allow whitespace breakages in deleted and context lines to be also
painted in the output.
* jc/diff-ws-error-highlight:
diff.c: --ws-error-highlight=<kind> option
diff.c: add emit_del_line() and emit_context_line()
t4015: separate common setup and per-test expectation
t4015: modernise style
The latter is a much more descriptive name (and we support
"color.diff.context" now). This also updates the name of any
local variables which were used to store the color.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The term "plain" is a bit ambiguous; let's allow the more
specific "context", but keep "plain" around for
compatibility.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Traditionally, we only cared about whitespace breakages introduced
in new lines. Some people want to paint whitespace breakages on old
lines, too. When they see a whitespace breakage on a new line, they
can spot the same kind of whitespace breakage on the corresponding
old line and want to say "Ah, those breakages are there but they
were inherited from the original, so let's not touch them for now."
Introduce `--ws-error-highlight=<kind>` option, that lets them pass
a comma separated list of `old`, `new`, and `context` to specify
what lines to highlight whitespace errors on.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Traditionally, we only had emit_add_line() helper, which knows how
to find and paint whitespace breakages on the given line, because we
only care about whitespace breakages introduced in new lines. The
context lines and old (i.e. deleted) lines are emitted with a
simpler emit_line_0() that paints the entire line in plain or old
colors.
Identify callers of emit_line_0() that show deleted lines and
context lines, have them call new helpers, emit_del_line() and
emit_context_line(), so that we can later tweak what is done to
these two classes of lines.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --shortstat --dirstat=changes" showed a dirstat based on
lines that was never asked by the end user in addition to the
dirstat that the user asked for.
* mk/diff-shortstat-dirstat-fix:
diff --shortstat --dirstat: remove duplicate output
"git diff --shortstat --dirstat=changes" showed a dirstat based on
lines that was never asked by the end user in addition to the
dirstat that the user asked for.
* mk/diff-shortstat-dirstat-fix:
diff --shortstat --dirstat: remove duplicate output
Clear the git_zstream variable at the start of git_deflate_init() etc.
so that callers don't have to do that.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --shortstat is used in conjunction with --dirstat=changes, git diff will
output the dirstat information twice: first as calculated by the 'lines'
algorithm, then as calculated by the 'changes' algorithm:
$ git diff --dirstat=changes,10 --shortstat v2.2.0..v2.2.1
23 files changed, 453 insertions(+), 54 deletions(-)
33.5% Documentation/RelNotes/
26.2% t/
46.6% Documentation/RelNotes/
16.6% t/
The same duplication happens for --shortstat together with --dirstat=files, but
not for --shortstat together with --dirstat=lines.
Limit output to only include one dirstat part, calculated as specified
by the --dirstat parameter. Also, add test for this.
Signed-off-by: Mårten Kongstad <marten.kongstad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Originally the color-parsing function was used only for
config variables. It made sense to pass the variable name so
that the die() message could be something like:
$ git -c color.branch.plain=bogus branch
fatal: bad color value 'bogus' for variable 'color.branch.plain'
These days we call it in other contexts, and the resulting
error messages are a little confusing:
$ git log --pretty='%C(bogus)'
fatal: bad color value 'bogus' for variable '--pretty format'
$ git config --get-color foo.bar bogus
fatal: bad color value 'bogus' for variable 'command line'
This patch teaches color_parse to complain only about the
value, and then return an error code. Config callers can
then propagate that up to the config parser, which mentions
the variable name. Other callers can provide a custom
message. After this patch these three cases now look like:
$ git -c color.branch.plain=bogus branch
error: invalid color value: bogus
fatal: unable to parse 'color.branch.plain' from command-line config
$ git log --pretty='%C(bogus)'
error: invalid color value: bogus
fatal: unable to parse --pretty format
$ git config --get-color foo.bar bogus
error: invalid color value: bogus
fatal: unable to parse default color value
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach a few codepaths to punt (instead of dying) when large blobs
that would not fit in core are involved in the operation.
* nd/large-blobs:
diff: shortcut for diff'ing two binary SHA-1 objects
diff --stat: mark any file larger than core.bigfilethreshold binary
diff.c: allow to pass more flags to diff_populate_filespec
sha1_file.c: do not die failing to malloc in unpack_compressed_entry
wrapper.c: introduce gentle xmallocz that does not die()
Most struct child_process variables are cleared using memset first after
declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to
initialize them statically instead. That's shorter, doesn't require a
function call and is slightly more readable (especially given that we
already have STRBUF_INIT, ARGV_ARRAY_INIT etc.).
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we are given two SHA-1 and asked to determine if they are different
(but not _what_ differences), we know right away by comparing SHA-1.
A side effect of this patch is, because large files are marked binary,
diff-tree will not need to unpack them. 'diff-index --cached' will not
either. But 'diff-files' still does.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Too large files may lead to failure to allocate memory. If it happens
here, it could impact quite a few commands that involve
diff. Moreover, too large files are inefficient to compare anyway (and
most likely non-text), so mark them binary and skip looking at their
content.
Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid code duplication and let strbuf_addstr() call strlen() for us.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git format-patch" did not enforce the rule that the "--follow"
option from the log/diff family of commands must be used with
exactly one pathspec.
* jk/diff-follow-must-take-one-pathspec:
move "--follow needs one pathspec" rule to diff_setup_done
As in earlier commits, the diff option parser uses
starts_with to find that an argument starts with "--stat-",
and then adds strlen("stat-") to find the rest of the
option.
However, in this case the starts_with and the strlen are
separated across functions, making it easy to call the
latter without the former. Let's use skip_prefix instead of
raw pointer arithmetic to catch such a case.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's a common idiom to match a prefix and then skip past it
with strlen, like:
if (starts_with(foo, "bar"))
foo += strlen("bar");
This avoids magic numbers, but means we have to repeat the
string (and there is no compiler check that we didn't make a
typo in one of the strings).
We can use skip_prefix to handle this case without repeating
ourselves.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's a common idiom to match a prefix and then skip past it
with a magic number, like:
if (starts_with(foo, "bar"))
foo += 3;
This is easy to get wrong, since you have to count the
prefix string yourself, and there's no compiler check if the
string changes. We can use skip_prefix to avoid the magic
numbers here.
Note that some of these conversions could be much shorter.
For example:
if (starts_with(arg, "--foo=")) {
bar = arg + 6;
continue;
}
could become:
if (skip_prefix(arg, "--foo=", &bar))
continue;
However, I have left it as:
if (skip_prefix(arg, "--foo=", &v)) {
bar = v;
continue;
}
to visually match nearby cases which need to actually
process the string. Like:
if (skip_prefix(arg, "--foo=", &v)) {
bar = atoi(v);
continue;
}
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function originally took a whole config variable name
("var") and an offset ("ofs"). It checked "var+ofs" against
each color slot, but reported errors using the whole "var".
However, since 8b8e862 (ignore unknown color configuration,
2009-12-12), it returns -1 rather than printing its own
error, and therefore only cares about var+ofs. We can drop
the ofs parameter and teach its sole caller to derive the
pointer itself.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up (and a bugfix which has been merged for 2.0).
* jk/external-diff-use-argv-array:
run_external_diff: refactor cmdline setup logic
run_external_diff: hoist common bits out of conditional
run_external_diff: drop fflush(NULL)
run_external_diff: clean up error handling
run_external_diff: use an argv_array for the environment
Instead of running N pair-wise diff-trees when inspecting a
N-parent merge, find the set of paths that were touched by walking
N+1 trees in parallel. These set of paths can then be turned into
N pair-wise diff-tree results to be processed through rename
detections and such. And N=2 case nicely degenerates to the usual
2-way diff-tree, which is very nice.
* ks/tree-diff-nway:
mingw: activate alloca
combine-diff: speed it up, by using multiparent diff tree-walker directly
tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
Portable alloca for Git
tree-diff: reuse base str(buf) memory on sub-tree recursion
tree-diff: no need to call "full" diff_tree_sha1 from show_path()
tree-diff: rework diff_tree interface to be sha1 based
tree-diff: diff_tree() should now be static
tree-diff: remove special-case diff-emitting code for empty-tree cases
tree-diff: simplify tree_entry_pathcmp
tree-diff: show_path prototype is not needed anymore
tree-diff: rename compare_tree_entry -> tree_entry_pathcmp
tree-diff: move all action-taking code out of compare_tree_entry()
tree-diff: don't assume compare_tree_entry() returns -1,0,1
tree-diff: consolidate code for emitting diffs and recursion in one place
tree-diff: show_tree() is not needed
tree-diff: no need to pass match to skip_uninteresting()
tree-diff: no need to manually verify that there is no mode change for a path
combine-diff: move changed-paths scanning logic into its own function
combine-diff: move show_log_first logic/action out of paths scanning
xcalloc() takes two arguments: the number of elements and their size.
diffstat_add() passes the arguments in reverse order, passing the
size of a diffstat_file*, followed by the number of diffstat_file* to
be allocated.
Rearrange them so they are in the correct order.
Signed-off-by: Brian Gesiak <modocache@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because of the way "--follow" is implemented, we must have
exactly one pathspec. "git log" enforces this restriction,
but other users of the revision traversal code do not. For
example, "git format-patch --follow" will segfault during
try_to_follow_renames, as we have no pathspecs at all.
We can push this check down into diff_setup_done, which is
probably a better place anyway. It is the diff code that
introduces this restriction, so other parts of the code
should not need to care themselves.
Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Crash fix for codepath that miscounted the necessary size for an
array when spawning an external diff program.
* 'jk/external-diff-use-argv-array' (early part):
run_external_diff: use an argv_array for the command line
The current logic makes it hard to see what gets put onto
the command line in which cases. Pulling out a helper
function lets us see that we have two sets of file data, and
the second set either uses the original name, or the "other"
renamed/copy name.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whether we have diff_filespecs to give to the diff command
or not, we always are going to run the program and pass it
the pathname. Let's pull that duplicated part out of the
conditional to make it more obvious.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fflush was added in d5535ec (Use run_command() to spawn
external diff programs instead of fork/exec., 2007-10-19),
because flushing buffers before forking is a good habit.
But later, 7d0b18a (Add output flushing before fork(),
2008-08-04) added it to the generic run-command interface,
meaning that our flush here is redundant.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the external diff reports an error, we try to clean up
and die. However, we can make this process a bit simpler:
1. We do not need to bother freeing memory, since we are
about to exit. Nor do we need to clean up our
tempfiles, since the atexit() handler will do it for
us. So we can die as soon as we see the error.
3. We can just call die() rather than fprintf/exit. This
does technically change our exit code, but the exit
code of "1" is not meaningful here. In fact, it is
probably wrong, since "1" from diff usually means
"completed successfully, but there were differences".
And while we're there, we can mark the error message for
translation, and drop the full stop at the end to make it
more like our other messages.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently use static buffers and a static array for
formatting the environment passed to the external diff.
There's nothing wrong in the code, but it is much easier to
verify that it is correct if we use a dynamic argv_array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently generate the command-line for the external
command using a fixed-length array of size 10. But if there
is a rename, we actually need 11 elements (10 items, plus a
NULL), and end up writing a random NULL onto the stack.
Rather than bump the limit, let's just use an argv_array, which
makes this sort of error impossible.
Noticed-by: Max L <infthi.inbox@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we do not translate diffstat any more, remove the obsolete comments.
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eradicate mistaken use of "nor" (that is, essentially "nor" used
not in "neither A nor B" ;-)) from in-code comments, command output
strings, and documentations.
* jl/nor-or-nand-and:
code and test: fix misuses of "nor"
comments: fix misuses of "nor"
contrib: fix misuses of "nor"
Documentation: fix misuses of "nor"
As was recently shown in "combine-diff: optimize
combine_diff_path sets intersection", combine-diff runs very slowly. In
that commit we optimized paths sets intersection, but that accounted
only for ~ 25% of the slowness, and as my tracing showed, for linux.git
v3.10..v3.11, for merges a lot of time is spent computing
diff(commit,commit^2) just to only then intersect that huge diff to
almost small set of files from diff(commit,commit^1).
In previous commit, we described the problem in more details, and
reworked the diff tree-walker to be general one - i.e. to work in
multiple parent case too. Now is the time to take advantage of it for
finding paths for combine diff.
The implementation is straightforward - if we know, we can get generated
diff paths directly, and at present that means no diff filtering or
rename/copy detection was requested(*), we can call multiparent tree-walker
directly and get ready paths.
(*) because e.g. at present, all diffcore transformations work on
diff_filepair queues, but in the future, that limitation can be
lifted, if filters would operate directly on combine_diff_paths.
Timings for `git log --raw --no-abbrev --no-renames` without `-c` ("git log")
and with `-c` ("git log -c") and with `-c --merges` ("git log -c --merges")
before and after the patch are as follows:
linux.git v3.10..v3.11
log log -c log -c --merges
before 1.9s 16.4s 15.2s
after 1.9s 2.4s 1.1s
The result stayed the same.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously diff_tree(), which is now named ll_diff_tree_sha1(), was
generating diff_filepair(s) for two trees t1 and t2, and that was
usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
a commit introduces.
In Git, however, we have fundamentally built flexibility in that a
commit can have many parents - 1 for a plain commit, 2 for a simple merge,
but also more than 2 for merging several heads at once.
For merges there is a so called combine-diff, which shows diff, a merge
introduces by itself, omitting changes done by any parent. That works
through first finding paths, that are different to all parents, and then
showing generalized diff, with separate columns for +/- for each parent.
The code lives in combine-diff.c .
There is an impedance mismatch, however, in that a commit could
generally have any number of parents, and that while diffing trees, we
divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
is no special casing for multiple parents commits in e.g.
revision-walker .
That impedance mismatch *hurts* *performance* *badly* for generating
combined diffs - in "combine-diff: optimize combine_diff_path
sets intersection" I've already removed some slowness from it, but from
the timings provided there, it could be seen, that combined diffs still
cost more than an order of magnitude more cpu time, compared to diff for
usual commits, and that would only be an optimistic estimate, if we take
into account that for e.g. linux.git there is only one merge for several
dozens of plain commits.
That slowness comes from the fact that currently, while generating
combined diff, a lot of time is spent computing diff(commit,commit^2)
just to only then intersect that huge diff to almost small set of files
from diff(commit,commit^1).
That's because at present, to compute combine-diff, for first finding
paths, that "every parent touches", we use the following combine-diff
property/definition:
D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn) (w.r.t. paths)
where
D(A,P1...Pn) is combined diff between commit A, and parents Pi
and
D(A,Pi) is usual two-tree diff Pi..A
So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n
1-parent diffs and intersecting results will be slow.
And usually, for linux.git and other topic-based workflows, that
D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
of merges (from A, via first parent) below, that D(A,P2) will be diffing
sum of merges from several subsystems to 1 subsystem.
The solution is to avoid computing n 1-parent diffs, and to find
changed-to-all-parents paths via scanning A's and all Pi's trees
simultaneously, at each step comparing their entries, and based on that
comparison, populate paths result, and deduce we could *skip*
*recursing* into subdirectories, if at least for 1 parent, sha1 of that
dir tree is the same as in A. That would save us from doing significant
amount of needless work.
Such approach is very similar to what diff_tree() does, only there we
deal with scanning only 2 trees simultaneously, and for n+1 tree, the
logic is a bit more complex:
D(T,P1...Pn) calculation scheme
-------------------------------
D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn) (regarding resulting paths set)
D(T,Pj) - diff between T..Pj
D(T,P1...Pn) - combined diff from T to parents P1,...,Pn
We start from all trees, which are sorted, and compare their entries in
lock-step:
T P1 Pn
- - -
|t| |p1| |pn|
|-| |--| ... |--| imin = argmin(p1...pn)
| | | | | |
|-| |--| |--|
|.| |. | |. |
. . .
. . .
at any time there could be 3 cases:
1) t < p[imin];
2) t > p[imin];
3) t = p[imin].
Schematic deduction of what every case means, and what to do, follows:
1) t < p[imin] -> ∀j t ∉ Pj -> "+t" ∈ D(T,Pj) -> D += "+t"; t↓
2) t > p[imin]
2.1) ∃j: pj > p[imin] -> "-p[imin]" ∉ D(T,Pj) -> D += ø; ∀ pi=p[imin] pi↓
2.2) ∀i pi = p[imin] -> pi ∉ T -> "-pi" ∈ D(T,Pi) -> D += "-p[imin]"; ∀i pi↓
3) t = p[imin]
3.1) ∃j: pj > p[imin] -> "+t" ∈ D(T,Pj) -> only pi=p[imin] remains to investigate
3.2) pi = p[imin] -> investigate δ(t,pi)
|
|
v
3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø ->
⎧δ(t,pi) - if pi=p[imin]
-> D += ⎨
⎩"+t" - if pi>p[imin]
in any case t↓ ∀ pi=p[imin] pi↓
~
For comparison, here is how diff_tree() works:
D(A,B) calculation scheme
-------------------------
A B
- -
|a| |b| a < b -> a ∉ B -> D(A,B) += +a a↓
|-| |-| a > b -> b ∉ A -> D(A,B) += -b b↓
| | | | a = b -> investigate δ(a,b) a↓ b↓
|-| |-|
|.| |.|
. .
. .
~~~~~~~~
This patch generalizes diff tree-walker to work with arbitrary number of
parents as described above - i.e. now there is a resulting tree t, and
some parents trees tp[i] i=[0..nparent). The generalization builds on
the fact that usual diff
D(A,B)
is by definition the same as combined diff
D(A,[B]),
so if we could rework the code for common case and make it be not slower
for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
multiparent diff tree-walker would greatly benefit generating
combine-diff.
What we do is as follows:
1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be
a paths generator (new name diff_tree_paths()), with each generated path
being `struct combine_diff_path` with info for path, new sha1,mode and for
every parent which sha1,mode it was in it.
2) From that info, we can still generate usual diff queue with
struct diff_filepairs, via "exporting" generated
combine_diff_path, if we know we run for nparent=1 case.
(see emit_diff() which is now named emit_diff_first_parent_only())
3) In order for diff_can_quit_early(), which checks
DIFF_OPT_TST(opt, HAS_CHANGES))
to work, that exporting have to be happening not in bulk, but
incrementally, one diff path at a time.
For such consumers, there is a new callback in diff_options
introduced:
->pathchange(opt, struct combine_diff_path *)
which, if set to !NULL, is called for every generated path.
(see new compat ll_diff_tree_sha1() wrapper around new paths
generator for setup)
4) The paths generation itself, is reworked from previous
ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation
scheme" provided above:
On the start we allocate [nparent] arrays in place what was
earlier just for one parent tree.
then we just generalize loops, and comparison according to the
algorithm.
Some notes(*):
1) alloca(), for small arrays, is used for "runs not slower for
nparent=1 case than before" goal - if we change it to xmalloc()/free()
the timings get ~1% worse. For alloca() we use just-introduced
xalloca/xalloca_free compatibility wrappers, so it should not be a
portability problem.
2) For every parent tree, we need to keep a tag, whether entry from that
parent equals to entry from minimal parent. For performance reasons I'm
keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ.
Not doing so, we'd need to alloca another [nparent] array, which hurts
performance.
3) For emitted paths, memory could be reused, if we know the path was
processed via callback and will not be needed later. We use efficient
hand-made realloc-style path_appendnew(), that saves us from ~1-1.5%
of potential additional slowdown.
4) goto(s) are used in several places, as the code executes a little bit
faster with lowered register pressure.
Also
- we should now check for FIND_COPIES_HARDER not only when two entries
names are the same, and their hashes are equal, but also for a case,
when a path was removed from some of all parents having it.
The reason is, if we don't, that path won't be emitted at all (see
"a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants
all paths - with diff or without - to be emitted, to be later analyzed
for being copies sources.
The new check is only necessary for nparent >1, as for nparent=1 case
xmin_eqtotal always =1 =nparent, and a path is always added to diff as
removal.
~~~~~~~~
Timings for
# without -c, i.e. testing only nparent=1 case
`git log --raw --no-abbrev --no-renames`
before and after the patch are as follows:
navy.git linux.git v3.10..v3.11
before 0.611s 1.889s
after 0.619s 1.907s
slowdown 1.3% 0.9%
This timings show we did no harm to usual diff(tree1,tree2) generation.
From the table we can see that we actually did ~1% slowdown, but I think
I've "earned" that 1% in the previous patch ("tree-diff: reuse base
str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case,
net timings stays approximately the same.
The output also stayed the same.
(*) If we revert 1)-4) to more usual techniques, for nparent=1 case,
we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as
"do no harm for nparent=1 case" rule.
For linux.git, combined diff will run an order of magnitude faster and
appropriate timings will be provided in the next commit, as we'll be
taking advantage of the new diff tree-walker for combined-diff
generation there.
P.S. and combined diff is not some exotic/for-play-only stuff - for
example for a program I write to represent Git archives as readonly
filesystem, there is initial scan with
`git log --reverse --raw --no-abbrev --no-renames -c`
to extract log of what was created/changed when, as a result building a
map
{} sha1 -> in which commit (and date) a content was added
that `-c` means also show combined diff for merges, and without them, if
a merge is non-trivial (merges changes from two parents with both having
separate changes to a file), or an evil one, the map will not be full,
i.e. some valid sha1 would be absent from it.
That case was my initial motivation for combined diffs speedup.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --external-diff" incorrectly fed the submodule directory
in the working tree to the external diff driver when it knew it is
the same as one of the versions being compared.
* tr/diff-submodule-no-reuse-worktree:
diff: do not reuse_worktree_file for submodules
"git diff --quiet -- pathspec1 pathspec2" sometimes did not return
correct status value.
* nd/diff-quiet-stat-dirty:
diff: do not quit early on stat-dirty files
diff.c: move diffcore_skip_stat_unmatch core logic out for reuse later
Replace open-coded reallocation with ALLOC_GROW() macro.
* dd/use-alloc-grow:
sha1_file.c: use ALLOC_GROW() in pretend_sha1_file()
read-cache.c: use ALLOC_GROW() in add_index_entry()
builtin/mktree.c: use ALLOC_GROW() in append_to_tree()
attr.c: use ALLOC_GROW() in handle_attr_line()
dir.c: use ALLOC_GROW() in create_simplify()
reflog-walk.c: use ALLOC_GROW()
replace_object.c: use ALLOC_GROW() in register_replace_object()
patch-ids.c: use ALLOC_GROW() in add_commit()
diffcore-rename.c: use ALLOC_GROW()
diff.c: use ALLOC_GROW()
commit.c: use ALLOC_GROW() in register_commit_graft()
cache-tree.c: use ALLOC_GROW() in find_subtree()
bundle.c: use ALLOC_GROW() in add_to_ref_list()
builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()
"git diff --external-diff" incorrectly fed the submodule directory
in the working tree to the external diff driver when it knew it is
the same as one of the versions being compared.
* tr/diff-submodule-no-reuse-worktree:
diff: do not reuse_worktree_file for submodules
Avoid scanning strings twice, once with strchr() and then with
strlen(), by using strchrnul().
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rohit Mani <rohit.mani@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use ALLOC_GROW() instead of open-coding it in diffstat_add() and
diff_q().
Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --quiet -- pathspec1 pathspec2" sometimes did not return
correct status value.
* nd/diff-quiet-stat-dirty:
diff: do not quit early on stat-dirty files
diff.c: move diffcore_skip_stat_unmatch core logic out for reuse later
When QUICK is set (i.e. with --quiet) we try to do as little work as
possible, stopping after seeing the first change. stat-dirty is
considered a "change" but it may turn out not, if no actual content is
changed. The actual content test is performed too late in the process
and the shortcut may be taken prematurely, leading to incorrect return
code.
Assume we do "git diff --quiet". If we have a stat-dirty file "a" and
a really dirty file "b". We break the loop in run_diff_files() and
stop after "a" because we have got a "change". Later in
diffcore_skip_stat_unmatch() we find out "a" is actually not
changed. But there's nothing else in the diff queue, we incorrectly
declare "no change", ignoring the fact that "b" is changed.
This also happens to "git diff --quiet HEAD" when it hits
diff_can_quit_early() in oneway_diff().
This patch does the content test earlier in order to keep going if "a"
is unchanged. The test result is cached so that when
diffcore_skip_stat_unmatch() is done in the end, we spend no cycles on
re-testing "a".
Reported-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The GIT_EXTERNAL_DIFF calling code attempts to reuse existing worktree
files for the worktree side of diffs, for performance reasons.
However, that code also tries to do the same with submodules. This
results in calls to $GIT_EXTERNAL_DIFF where the old-file is a file of
the form "Submodule commit $sha1", but the new-file is a directory in
the worktree.
Fix it by never reusing a worktree "file" in the submodule case.
Reported-by: Grégory Pakosz <gregory.pakosz@gmail.com>
Signed-off-by: Thomas Rast <tr@thomasrast.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/diff-filespec-cleanup:
diff_filespec: use only 2 bits for is_binary flag
diff_filespec: reorder is_binary field
diff_filespec: drop xfrm_flags field
diff_filespec: drop funcname_pattern_ident field
diff_filespec: reorder dirty_submodule macro definitions
The only mention of this field in the code is by some
debugging code which prints it out (and it will always be
zero, since we never touch it otherwise). It was obsoleted
very early on by 25d5ea4 ([PATCH] Redo rename/copy detection
logic., 2005-05-24).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow "git diff -O<file>" to be configured with a new configuration
variable.
* sb/diff-orderfile-config:
diff: add diff.orderfile configuration variable
diff: let "git diff -O" read orderfile from any file and fail properly
t4056: add new tests for "git diff -O"
Show the total number of paths and the number of paths shown so far
when "git difftool" prompts to launch an external diff tool, which
would give users some sense of progress.
* zk/difftool-counts:
diff.c: fix some recent whitespace style violations
difftool: display the number of files in the diff queue in the prompt
diff.orderfile acts as a default for the -O command line option.
[sb: split up aw's original patch; rework tests and docs, treat option
as pathname]
Signed-off-by: Anders Waldenborg <anders@0x63.nu>
Signed-off-by: Samuel Bronson <naesten@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a few duplicate implementations of prefix/suffix comparison
functions, and rename them to starts_with and ends_with.
* cc/starts-n-ends-with:
replace {pre,suf}fixcmp() with {starts,ends}_with()
strbuf: introduce starts_with() and ends_with()
builtin/remote: remove postfixcmp() and use suffixcmp() instead
environment: normalize use of prefixcmp() by removing " != 0"
When --prompt option is set, git-difftool displays a prompt for each
modified file to be viewed in an external diff program. At that
point, it could be useful to display a counter and the total number
of files in the diff queue.
Below is the current difftool prompt for the first of 5 modified files:
Viewing: 'diff.c'
Launch 'vimdiff' [Y/n]:
Consider the modified prompt:
Viewing (1/5): 'diff.c'
Launch 'vimdiff' [Y/n]:
The current GIT_EXTERNAL_DIFF mechanism does not tell the number of
paths in the diff queue nor the current counter. To make this
"counter/total" info available for GIT_EXTERNAL_DIFF programs
without breaking existing ones by doing the following:
- Keep track of the number of paths shown so far in diff_options;
- Export two new environment variables from run_external_diff() to
show the total number of paths (from diff_queue_struct) and the
current value of the counter (from diff_options); and
- Update git-difftool--helper to use these two environment variables.
Signed-off-by: Zoltan Klinger <zoltan.klinger@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make "git grep" and "git show" pay attention to --textconv when
dealing with blob objects.
* mg/more-textconv:
grep: honor --textconv for the case rev:path
grep: allow to use textconv filters
t7008: demonstrate behavior of grep with textconv
cat-file: do not die on --textconv without textconv filters
show: honor --textconv for blobs
diff_opt: track whether flags have been set explicitly
t4030: demonstrate behavior of show with textconv
Teach "git diff --diff-filter" to express "I do not want to see
these classes of changes" more directly by listing only the
unwanted ones in lowercase (e.g. "--diff-filter=d" will show
everything but deletion) and deprecate "diff-files -q" which did
the same thing as "--diff-filter=d".
* jc/diff-filter-negation:
diff: deprecate -q option to diff-files
diff: allow lowercase letter to specify what change class to exclude
diff: reject unknown change class given to --diff-filter
diff: preparse --diff-filter string argument
diff: factor out match_filter()
diff: pass the whole diff_options to diffcore_apply_filter()
The condition in the ternary operator was wrong, hence the wrong char
pointer could be used as the parameter for show_submodule_summary.
one->path may be null, but we definitely need a non null path given
to the function.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Acked-By: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The line being changed is deep inside the function builtin_diff.
The variable name_b, which is used to evaluate the ternary expression
must evaluate to true at that position, hence the replacement with
just name_b.
The name_b variable only occurs a few times in that lengthy function:
As a parameter to the function itself:
static void builtin_diff(const char *name_a,
const char *name_b,
...
The next occurrences are at:
/* Never use a non-valid filename anywhere if at all possible */
name_a = DIFF_FILE_VALID(one) ? name_a : name_b;
name_b = DIFF_FILE_VALID(two) ? name_b : name_a;
a_one = quote_two(a_prefix, name_a + (*name_a == '/'));
b_two = quote_two(b_prefix, name_b + (*name_b == '/'));
In the last line of this block 'name_b' is dereferenced and compared
to '/'. This would crash if name_b was NULL. Hence in the following code
we can assume name_b being non-null.
The next occurrence is just as a function argument, which doesn't change
the memory, which name_b points to, so the assumption name_b being not
null still holds:
emit_rewrite_diff(name_a, name_b, one, two,
textconv_one, textconv_two, o);
The next occurrence would be the line of this patch. As name_b still must
be not null, we can remove the ternary operator.
Inside the emit_rewrite_diff function there is a also a line
ecbdata.ws_rule = whitespace_rule(name_b ? name_b : name_a);
which was also simplified as there is also a dereference before the
ternary operator.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Assorted code cleanups and a minor fix.
* sb/misc-fixes:
diff.c: Do not initialize a variable, which gets reassigned anyway.
commit: Fix a memory leak in determine_author_info
daemon.c:handle: Remove unneeded check for null pointer.
"git show -s" was less discoverable than it should be.
* mm/diff-no-patch-synonym-to-s:
Documentation/git-log.txt: capitalize section names
Documentation: move description of -s, --no-patch to diff-options.txt
Documentation/git-show.txt: include common diff options, like git-log.txt
diff: allow --patch & cie to override -s/--no-patch
diff: allow --no-patch as synonym for -s
t4000-diff-format.sh: modernize style
This was inherited from "show-diff -q" that was invented to tell
comparison between the index and the working tree to ignore only
removals in 2005.
These days, it is spelled as "--diff-filter=d".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reimplements the ancient "-q" option to "git diff-files" that
was inherited from "show-diff -q" in terms of "--diff-filter=d". We
will be deprecating the "-q" option, so let's issue a warning when
we do so.
Incidentally this also tentatively fixes "git diff --no-index" to
honor "-q" and hide deletions; the use will get the same warning.
We should remove the support for "-q" in a future version but it is
not that urgent.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All options that trigger a patch output now override --no-patch.
The case of --binary deserves extra attention: the name may suggest that
it turns a normal patch into a binary patch, but it actually already
enables patch output when normally disabled (e.g. "git log --binary"
displays a patch), hence it makes sense for "git show --no-patch
--binary" to display the binary patch.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This follows the usual convention of having a --no-foo option to negate
--foo.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to express "we do not care about deletions", we had to say
"--diff-filter=ACMRTXUB", giving all the possible change class
except for the one we do not want, "D".
This is cumbersome. As all the change classes are in uppercase,
allow their lowercase counterpart to selectively exclude the class
from the output. When such a negated change class is in the input,
start the filter option with the full bits set.
This would allow us to express the old "show-diff -q" with
"git diff-files --diff-filter=d".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We used to accept "git diff --diff-filter=Q" (note that there is no
such change class 'Q') silently and showed no output (because there
is no such change class 'Q').
Error out when such an input is given.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of running strchr() on the list of status characters over
and over again, parse the --diff-filter option into bitfields and
use the bits to see if the change to the filepair matches the status
requested.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diffcore_apply_filter() checks if a filepair matches the filter
given with the "--diff-filter" option for each input filepairs with
a fairly complex expression in two places.
Create a helper function and call it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --diff-filter=<arg> option given by the user is kept as a
string, and passed to the underlying diffcore_apply_filter()
function as a string for each resulting path we run number of
strchr() to see if each class of change among ACDMRTXUB is meant to
be given.
Change the function signature to pass the whole diff_options, so
that we can pre-parse this string in the next patch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" refused to even show difference when core.safecrlf is
set to true (i.e. error out) and there are offending lines in the
working tree files.
* jc/maint-diff-core-safecrlf:
diff: demote core.safecrlf=true to core.safecrlf=warn
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is
- diff-lib.c: setting CE_UPTODATE
- name-hash.c: setting CE_HASHED
- preload-index.c, read-cache.c, unpack-trees.c and
builtin/update-index: obvious
- entry.c: write_entry() may refresh the checked out entry via
fill_stat_cache_info(). This causes "non-const struct cache_entry
*" in builtin/apply.c, builtin/checkout-index.c and
builtin/checkout.c
- builtin/ls-files.c: --with-tree changes stagemask and may set
CE_UPDATE
Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.
So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.
The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:
diff --git a/cache.h b/cache.h
index 430d021..1692891 100644
--- a/cache.h
+++ b/cache.h
@@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
#define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
struct index_state {
- struct cache_entry **cache;
+ const struct cache_entry **cache;
unsigned int version;
unsigned int cache_nr, cache_alloc, cache_changed;
struct string_list *resolve_undo;
will help quickly identify them without bogus warnings.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Otherwise the user will not be able to start to guess where in the
contents in the working tree the offending unsafe CR lies.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The goal of the patch is to introduce the GNU diff
-B/--ignore-blank-lines as closely as possible. The short option is not
available because it's already used for "break-rewrites".
When this option is used, git-diff will not create hunks that simply
add or remove empty lines, but will still show empty lines
addition/suppression if they are close enough to "valuable" changes.
There are two differences between this option and GNU diff -B option:
- GNU diff doesn't have "--inter-hunk-context", so this must be handled
- The following sequence looks like a bug (context is displayed twice):
$ seq 5 >file1
$ cat <<EOF >file2
change
1
2
3
4
5
change
EOF
$ diff -u -B file1 file2
--- file1 2013-06-08 22:13:04.471517834 +0200
+++ file2 2013-06-08 22:13:23.275517855 +0200
@@ -1,5 +1,7 @@
+change
1
2
+
3
4
5
@@ -3,3 +5,4 @@
3
4
5
+change
So here is a more thorough description of the option:
- real changes are interesting
- blank lines that are close enough (less than context size) to
interesting changes are considered interesting (recursive definition)
- "context" lines are used around each hunk of interesting changes
- If two hunks are separated by less than "inter-hunk-context", they
will be merged into one.
The implementation does the "interesting changes selection" in a single
pass.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff_opt infrastructure sets flags based on defaults and command
line options. It is impossible to tell whether a flag has been set
as a default or on explicit request. Update the structure so that
this detection is possible:
* Add an extra "opt->touched_flags" that keeps track of all the
fields that have been touched by DIFF_OPT_SET and DIFF_OPT_CLR.
* You may continue setting the default values to the flags, like
commands in the "log" family do in cmd_log_init_defaults(), but
after you finished setting the defaults, you clear the
touched_flags field;
* And then you let the usual callchain call diff_opt_parse(),
allowing the opt->flags be set or unset, while keeping track of
which bits the user touched;
* There is an optional callback "opt->set_default" that is called
at the very beginning to let you inspect touched_flags and update
opt->flags appropriately, before the remainder of the diffcore
machinery is set up, taking the opt->flags value into account.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --diff-algorithm=algo" was understood by the command line
parser, but "git diff --diff-algorithm algo" was not.
* jk/diff-algo-finishing-touches:
diff: allow unstuck arguments with --diff-algorithm
git-merge(1): document diff-algorithm option to merge-recursive
"git diff --diff-algorithm algo" is also understood as "git diff
--diff-algorithm=algo".
* jk/diff-algo-finishing-touches:
diff: allow unstuck arguments with --diff-algorithm
git-merge(1): document diff-algorithm option to merge-recursive
Most of these were found using Lucas De Marchi's codespell tool.
Signed-off-by: Stefano Lattarini <stefano.lattarini@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running "git log -p --submodule=log", the submodule log is not
indented by the graph output, although all other lines are. Fix this by
prepending the current line prefix to each line of the submodule log.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The argument to --diff-algorithm is mandatory, so there is no reason to
require the argument to be stuck to the option with '='. Change this
for consistency with other Git commands.
Note that this does not change the handling of diff-algorithm in
merge-recursive.c since the primary interface to that is via the -X
option to 'git merge' where the unstuck form does not make sense.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ap/maint-diff-rename-avoid-overlap:
tests: make sure rename pretty print works
diff: prevent pprint_rename from underrunning input
diff: Fix rename pretty-print when suffix and prefix overlap
The logic used by "git diff -M --stat" to shorten the names of
files before and after a rename did not work correctly when the
common prefix and suffix between the two filenames overlapped.
* ap/maint-diff-rename-avoid-overlap:
tests: make sure rename pretty print works
diff: prevent pprint_rename from underrunning input
diff: Fix rename pretty-print when suffix and prefix overlap
In the warning message printed when rename or unmodified copy
detection was skipped due to too many files, change "diff.renamelimit"
to "diff.renameLimit", in order to make it consistent with git
documentation, which consistently uses "diff.renameLimit".
Signed-off-by: Max Nanasy <max.nanasy@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic described in d020e27 (diff: Fix rename pretty-print when
suffix and prefix overlap, 2013-02-23) is wrong: The proof in the
comment is valid only if both strings are the same length. *One* of
old/new can reach a-1 (b-1, resp.) if 'a' is a suffix of 'b' (or vice
versa).
Since the intent was to let the loop run down to the '/' at the end of
the common prefix, fix it by making that distinction explicit: if
there is no prefix, allow no underrun.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When considering a rename for two files that have a suffix and a prefix
that can overlap, a confusing line is shown. As an example, renaming
"a/b/b/c" to "a/b/c" shows "a/b/{ => }/b/c".
Currently, what we do is calculate the common prefix ("a/b/"), and the
common suffix ("/b/c"), but the same "/b/" is actually counted both in
prefix and suffix. Then when calculating the size of the non-common part,
we end-up with a negative value which is reset to 0, thus the "{ => }".
Do not allow the common suffix to overlap the common prefix and stop
when reaching a "/" that would be in both.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add diff.algorithm configuration so that the user does not type
"diff --histogram".
* mp/diff-algo-config:
diff: Introduce --diff-algorithm command line option
config: Introduce diff.algorithm variable
git-completion.bash: Autocomplete --minimal and --histogram for git-diff
Refactors a lot of repetitive code sequence from the graph drawing
code and adds it to the combined diff output.
* jk/diff-graph-cleanup:
combine-diff.c: teach combined diffs about line prefix
diff.c: use diff_line_prefix() where applicable
diff: add diff_line_prefix function
diff.c: make constant string arguments const
diff: write prefix to the correct file
graph: output padding for merge subsequent parents
This is a helper function to call the diff output_prefix function and
return its value as a C string, allowing us to greatly simplify
everywhere that needs to get the output prefix.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Write the prefix for an output line to the same file as the actual
content.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since command line options have higher priority than config file
variables and taking previous commit into account, we need a way
how to specify myers algorithm on command line. However,
inventing `--myers` is not the right answer. We need far more
general option, and that is `--diff-algorithm`.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some users or projects prefer different algorithms over others, e.g.
patience over myers or similar. However, specifying appropriate
argument every time diff is to be used is impractical. Moreover,
creating an alias doesn't play nicely with other tools based on diff
(git-show for instance). Hence, a configuration variable which is able
to set specific algorithm is needed. For now, these four values are
accepted: 'myers' (which has the same effect as not setting the config
variable at all), 'minimal', 'patience' and 'histogram'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --stat" miscounted the total number of changed lines when
binary files were involved and hidden beyond --stat-count. It also
miscounted the total number of changed files when there were
unmerged paths.
* lt/diff-stat-show-0-lines:
t4049: refocus tests
diff --shortstat: do not count "unmerged" entries
diff --stat: do not count "unmerged" entries
diff --stat: move the "total count" logic to the last loop
diff --stat: use "file" temporary variable to refer to data->files[i]
diff --stat: status of unmodified pair in diff-q is not zero
test: add failing tests for "diff --stat" to t4049
Even though we show a separate *UNMERGED* entry in the patch and
diffstat output (or in the --raw format, for that matter) in
addition to and separately from the diff against the specified stage
(defaulting to #2) for unmerged paths, they should not be counted in
the total number of files affected---that would lead to counting the
same path twice.
The separation done by the previous step makes this fix simple and
straightforward. Among the filepairs in diff_queue, paths that
weren't modified, and the extra "unmerged" entries do not count as
total number of files.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diffstat generation logic, with --stat-count limit, is
implemented as three loops.
- The first counts the width necessary to show stats up to
specified number of entries, and notes up to how many entries in
the data we need to iterate to show the graph;
- The second iterates that many times to draw the graph, adjusts
the number of "total modified files", and counts the total
added/deleted lines for the part that was shown in the graph;
- The third iterates over the remainder and only does the part to
count "total added/deleted lines" and to adjust "total modified
files" without drawing anything.
Move the logic to count added/deleted lines and modified files from
the second loop to the third loop.
This incidentally fixes a bug. The third loop was not filtering
binary changes (counted in bytes) from the total added/deleted as it
should. The second loop implemented this correctly, so if a binary
change appeared earlier than the --stat-count cutoff, the code
counted number of added/deleted lines correctly, but if it appeared
beyond the cutoff, the number of lines would have mixed with the
byte count in the buggy third loop.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow "git diff --submodule=log" to set to be the default via
configuration.
* rr/submodule-diff-config:
submodule: display summary header in bold
diff: rename "set" variable
diff: introduce diff.submodule configuration variable
Documentation: move diff.wordRegex from config.txt to diff-config.txt
We failed to mention a file without any content change but whose
permission bit was modified, or (worse yet) a new file without any
content in the "git diff --stat" output.
* lt/diff-stat-show-0-lines:
Fix "git diff --stat" for interesting - but empty - file changes
Currently, 'git diff --submodule' displays output with a bold diff
header for non-submodules. So this part is in bold:
diff --git a/file1 b/file1
index 30b2f6c..2638038 100644
--- a/file1
+++ b/file1
For submodules, the header looks like this:
Submodule submodule1 012b072..248d0fd:
Unfortunately, it's easy to miss in the output because it's not bold.
Change this.
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once upon a time the builtin_diff function used one color, and the color
variables were called "set" and "reset". Nowadays it is a much longer
function and we use several colors (e.g., "add", "del"). Rename "set" to
"meta" to show that it is the color for showing diff meta-info (it still
does not indicate that it is a "color", but at least it matches the
scheme of the other color variables).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a diff.submodule configuration variable corresponding to the
'--submodule' command-line option of 'git diff'.
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanups so that libgit.a does not depend on anything in the
builtin/ directory.
* nd/builtin-to-libgit:
fetch-pack: move core code to libgit.a
fetch-pack: remove global (static) configuration variable "args"
send-pack: move core code to libgit.a
Move setup_diff_pager to libgit.a
Move print_commit_list to libgit.a
Move estimate_bisect_steps to libgit.a
Move try_merge_command and checkout_fast_forward to libgit.a
This is used by diff-no-index.c, part of libgit.a while it stays in
builtin/diff.c. Move it to diff.c so that we won't get undefined
reference if a program that uses libgit.a happens to pull it in.
While at it, move check_pager from git.c to pager.c. It makes more
sense there and pager.c is also part of libgit.a
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Use string_list_split_in_place() to split the comma-separated
parameters string. This simplifies the code and also fixes a bug: the
old code made calls like
memcmp(p, "lines", p_len)
which needn't work if p_len is different than the length of the
constant string (and could illegally access memory if p_len is larger
than the length of the constant string).
When p_len was less than the length of the constant string, the old
code would have allowed some abbreviations to be accepted (e.g., "cha"
for "changes") but this seems to have been a bug rather than a
feature, because (1) it is not documented; (2) no attempt was made to
handle ambiguous abbreviations, like "c" for "changes" vs
"cumulative".
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
The behavior of "git diff --stat" is rather odd for files that have
zero lines of changes: it will discount them entirely unless they were
renames.
Which means that the stat output will simply not show files that only
had "other" changes: they were created or deleted, or their mode was
changed.
Now, those changes do show up in the summary, but so do renames, so
the diffstat logic is inconsistent. Why does it show renames with zero
lines changed, but not mode changes or added files with zero lines
changed?
So change the logic to not check for "is_renamed", but for
"is_interesting" instead, where "interesting" is judged to be any
action but a pure data change (because a pure data change with zero
data changed really isn't worth showing, if we ever get one in our
diffpairs).
So if you did
chmod +x Makefile
git diff --stat
before, it would show empty (" 0 files changed"), with this it shows
Makefile | 0
1 file changed, 0 insertions(+), 0 deletions(-)
which I think is a more correct diffstat (and then with "--summary" it
shows *what* the metadata change to Makefile was - this is completely
consistent with our handling of renamed files).
Side note: the old behavior was *really* odd. With no changes at all,
"git diff --stat" output was empty. With just a chmod, it said "0
files changed". No way is our legacy behavior sane.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a configuration variable diff.context that tells
Porcelain commands to use a non-default number of context
lines instead of 3 (the default). With this variable, users
do not have to keep repeating "git log -U8" from the command
line; instead, it becomes sufficient to say "git config
diff.context 8" just once.
Signed-off-by: Jeff Muizelaar <jmuizelaar@mozilla.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once you do
$ alias glogone git log --follow
there is no way to say
$ glogone --no-follow ...
Not that "log --follow" is all that useful, but it is cheap to
support the common "you can defeat an undesirable option with a
'no-' variant of it later on the command line" pattern.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Turn many file-scope private symbols to static to reduce the
global namespace contamination.
* jc/make-static:
sequencer.c: mark a private file-scope symbol as static
ident.c: mark private file-scope symbols as static
trace.c: mark a private file-scope symbol as static
wt-status.c: mark a private file-scope symbol as static
read-cache.c: mark a private file-scope symbol as static
strbuf.c: mark a private file-scope symbol as static
sha1-array.c: mark a private file-scope symbol as static
symlinks.c: mark private file-scope symbols as static
notes.c: mark a private file-scope symbol as static
rerere.c: mark private file-scope symbols as static
graph.c: mark private file-scope symbols as static
diff.c: mark a private file-scope symbol as static
commit.c: mark a file-scope private symbol as static
builtin/notes.c: mark file-scope private symbols as static
Earlier we made the diffstat summary line that shows the number of
lines added/deleted localizable, but it was found irritating having
to see them in various languages on a list whose discussion language
is English.
The original had trivial thinko in reverting Q_(), which has been
fixed.
* nd/maint-diffstat-summary:
Revert diffstat back to English
This reverts the i18n part of 7f81463 (Use correct grammar in diffstat
summary line - 2012-02-01) but still keeps the grammar correctness for
English. It also reverts b354f11 (Fix tests under GETTEXT_POISON on
diffstat - 2012-08-27). The result is diffstat always in English
for all commands.
This helps stop users from accidentally sending localized
format-patch'd patches.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-1.7.11:
Almost 1.7.11.6
gitweb: URL-decode $my_url/$my_uri when stripping PATH_INFO
rebase -i: use full onto sha1 in reflog
sh-setup: protect from exported IFS
receive-pack: do not leak output from auto-gc to standard output
t/t5400: demonstrate breakage caused by informational message from prune
setup: clarify error messages for file/revisions ambiguity
send-email: improve RFC2047 quote parsing
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
"git diff" had a confusion between taking data from a path in the
working tree and taking data from an object that happens to have
name 0{40} recorded in a tree.
* jk/maint-null-in-trees:
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
The output from "git diff -B" for a file that ends with an
incomplete line did not put "\ No newline..." on a line of its own.
* ab/diff-write-incomplete-line:
Fix '\ No newline...' annotation in rewrite diffs
We do not want a link to 0{40} object stored anywhere in our objects.
* jk/maint-null-in-trees:
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
When a file that ends with an incomplete line is expressed as a
complete rewrite with the -B option, git diff incorrectly
appends the incomplete line indicator "\ No newline at end of
file" after such a line, rather than writing it on a line of its
own (the output codepath for normal output without -B does not
have this problem). Add a LF after the incomplete line before
writing the "\ No newline ..." out to fix this.
Add a couple of tests to confirm that the indicator comment is
generated on its own line in both plain diff and rewrite mode.
Signed-off-by: Adam Butcher <dev.lists@jessamine.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff_setup_done() has historically returned an error code, but lost
the last nonzero return in 943d5b7 (allow diff.renamelimit to be set
regardless of -M/-C, 2006-08-09). The callers were in a pretty
confused state: some actually checked for the return code, and some
did not.
Let it return void, and patch all callers to take this into account.
This conveniently also gets rid of a handful of different(!) error
messages that could never be triggered anyway.
Note that the function can still die().
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>