This teaches combine-diff machinery to feed a combined merge to a callback
function when DIFF_FORMAT_CALLBACK is specified.
So far, format callback functions are not used for anything but 2-way
diffs. A callback is given a diff_queue_struct, which is an array of
diff_filepair. As its name suggests, a diff_filepair is a _pair_ of
diff_filespec that represents a single preimage and a single postimage.
Since "diff -c" is to compare N parents with a single merge result and
filter out any paths whose result match one (or more) of the parent(s),
its output has to be able to represent N preimages and 1 postimage. For
this reason, a callback function that inspects a diff_filepair that
results from this new infrastructure can and is expected to view the
preimage side (i.e. pair->one) as an array of diff_filespec. Each element
in the array, except for the last one, is marked with "has_more_entries"
bit, so that the same callback function can be used for 2-way diffs and
combined diffs.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The combined diff machinery can be used to compare:
- a merge commit with its parent commits;
- a working-tree file with multiple stages in an unmerged index; or
- a working-tree file with the HEAD and the index.
The internal function combine-diff.c:show_patch_diff() checked if it needs
to read the "result" from the working tree by looking at the object name
of the result --- if it is null_sha1, it read from the working tree.
This mistook a merge that records a deletion as the conflict resolution
as if it is a cue to read from the working tree. Pass this information
explicitly from the caller instead.
Noticed and reported by Johan Herland.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When doing a combined diff, we did not respect textconv attributes at
all. This generally lead to us printing "Binary files differ" when we
could show a combined diff of the converted text.
This patch converts file contents according to textconv attributes. The
implementation is slightly ugly; because the textconv code is tightly
linked with the diff_filespec code, we temporarily create a diff_filespec
during conversion. In practice, though, this should not create a
performance problem.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The combined diff code path is totally different from the
regular diff code path, and didn't handle binary files at
all. The results of a combined diff on a binary file could
range from annoying (since we spewed binary garbage,
possibly upsetting the user's terminal), to wrong (embedded
NULs caused us to show incorrect diffs, with lines truncated
at the NUL character), to potential security problems
(embedded NULs could interfere with "-z" output, possibly
defeating policy hooks which parse diff output).
Instead, we consider a combined diff to be binary if any of
the input blobs is binary. To show a binary combined diff,
we indicate "Binary blobs differ"; the "index" meta line
will show which parents had which blob.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One loop combined both the patch generation and checking
whether there was any mode change to report. Let's factor
that into two separate loops, as we may care about the mode
change even if we are not generating patches (e.g., because
we are showing a binary diff, which will come in a future
patch).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a pretty big logical chunk, so it makes the function
a bit more readable to have it split out. In addition, it
will make it easier to add an alternate code path for binary
diffs in a future patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdi_diff_outf() overrides the structure members of its last parameter,
ignoring any value that callers pass in. It's no surprise then that all
callers pass a pointer to an uninitialized structure. They also don't
read it after the call, so the parameter is neither used for input nor
for output. Turn it into a local variable of xdi_diff_outf().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since the xdiff library had been introduced to git, all its callers
have used the flag XDF_NEED_MINIMAL. It makes sure that the smallest
possible diff is produced, but that takes quite some time if there are
lots of differences that can be expressed in multiple ways.
This flag makes a difference for only 0.1% of the non-merge commits in
the git repo of Linux, both in terms of diff size and execution time.
The patches there are mostly nice and small.
SungHyun Nam however reported a case in a different repo where a diff
took more than 20 times longer to generate with XDF_NEED_MINIMAL than
without. Rebasing became really slow.
This patch removes this flag from all callers. The default of xdiff is
saner because it has minimal to no impact in the normal case of small
diffs and doesn't incur that much of a speed penalty for large ones.
A follow-up patch may introduce a command line option to set the flag if
the user needs it, similar to GNU diff's -d/--minimal.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Consider an evil merge of two commits A and B, both of which have a
file 'foo', but the merge result does not have that file.
The combined-diff code learned in 4462731 (combine-diff: do not punt
on removed or added files., 2006-02-06) to concisely show only the
removal, since that is the evil part and the previous contents are
presumably uninteresting.
However, to diagnose an empty merge result, it overloaded the variable
that holds the file's length. This means that the check also triggers
for truncated files. Consequently, such files were not shown in the
diff at all despite the merge being clearly evil.
Fix this by adding a new variable that distinguishes whether the file
was deleted (which is the case 4462731 handled) or truncated. In the
truncated case, we show the full combined diff again, which is rather
spammy but at least does not hide the evilness.
Reported-by: David Martínez Martí <desarrollo@gestiweb.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Inspired by the coloring of quilt.
Introduce a separate color and paint the hunk comment part, i.e. the name
of the function, in a separate color "diff.func" (defaults to plain).
Whitespace between hunk header and hunk comment is printed in plain color.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Trailing whitespace and no newline fix
diff --cc: a lost line at the beginning of the file is shown incorrectly
combine-diff.c: fix performance problem when folding common deleted lines
When combine-diff inspected the diff from one parent to the merge result,
it misinterpreted a header in the form @@ -l,k +0,0 @@.
This hunk header means that K lines were removed from the beginning of the
file, so the lost lines must be queued to the sline that represents the
first line of the merge result, but we incremented our pointer incorrectly
and ended up queuing it to the second line, which in turn made the lossage
appear _after_ the first line.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For a deleted line in a patch with the parent we are looking at, the
append_lost() function finds the same line among a run of lines that were
deleted from the same location by patches from parents we previously
checked. This is so that patches with two parents
@@ -1,4 +1,3 @@ @@ -1,4 +1,3 @@
one one
-two -two
three three
-quatro -fyra
+four +four
can be coalesced into this sequence, reusing one line that describes the
removal of "two" for both parents.
@@@ -1,4 -1,4 +1,3 @@@
one
--two
three
- quatro
-frya
++four
While reading the second patch (that removes "two" and then "fyra"), after
finding where removal of the "two" matches, we need to find existing
removal of "fyra" (if exists) in the removal list, but the match has to
happen after all the existing matches (in this case "two"). The code used
a naïve O(n^2) algorithm to compute this by scanning the whole removal
list over and over again.
This patch remembers where the next scan should be started in the existing
removal list to avoid this.
Noticed by Linus Torvalds.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Lots of die() calls did not actually report the kind of error, which
can leave the user confused as to the real problem. Use die_errno()
where we check a system/library call that sets errno on failure, or
one of the following that wrap such calls:
Function Passes on error from
-------- --------------------
odb_pack_keep open
read_ancestry fopen
read_in_full xread
strbuf_read xread
strbuf_read_file open or strbuf_read_file
strbuf_readlink readlink
write_in_full xwrite
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Essentially; s/type* /type */ as per the coding guidelines.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The combine diff logic knew only about blobs (and their checked-out form
in the work tree, either regular files or symlinks), and barfed when fed
submodules. This "externalizes" gitlinks in the same way as the normal
patch generation codepath does (i.e. "Subproject commit Xxx\n") to fix the
issue.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* kb/checkout-optim:
Revert "lstat_cache(): print a warning if doing ping-pong between cache types"
checkout bugfix: use stat.mtime instead of stat.ctime in two places
Makefile: Set compiler switch for USE_NSEC
Create USE_ST_TIMESPEC and turn it on for Darwin
Not all systems use st_[cm]tim field for ns resolution file timestamp
Record ns-timestamps if possible, but do not use it without USE_NSEC
write_index(): update index_state->timestamp after flushing to disk
verify_uptodate(): add ce_uptodate(ce) test
make USE_NSEC work as expected
fix compile error when USE_NSEC is defined
check_updates(): effective removal of cache entries marked CE_REMOVE
lstat_cache(): print a warning if doing ping-pong between cache types
show_patch_diff(): remove a call to fstat()
write_entry(): use fstat() instead of lstat() when file is open
write_entry(): cleanup of some duplicated code
create_directories(): remove some memcpy() and strchr() calls
unlink_entry(): introduce schedule_dir_for_removal()
lstat_cache(): swap func(length, string) into func(string, length)
lstat_cache(): generalise longest_match_lstat_cache()
lstat_cache(): small cleanup and optimisation
These weren't used outside and can be safely moved
Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently inside show_patch_diff() we have an fstat() call after an
ok lstat() call. Since before the call to fstat() we have already
tested for the link case with S_ISLNK(), the fstat() can be removed.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When showing combined diff using work tree contents, use strbuf_readlink()
to read symbolic links.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
We're going to be adding some parameters to this, so we can't have
any uninitialized data in it.
Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many call sites use strbuf_init(&foo, 0) to initialize local
strbuf variable "foo" which has not been accessed since its
declaration. These can be replaced with a static initialization
using the STRBUF_INIT macro which is just as readable, saves a
function call, and takes up fewer lines.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
With a new configuration "diff.mnemonicprefix", "git diff" shows the
differences between various combinations of preimage and postimage trees
with prefixes different from the standard "a/" and "b/". Hopefully this
will make the distinction stand out for some people.
"git diff" compares the (i)ndex and the (w)ork tree;
"git diff HEAD" compares a (c)ommit and the (w)ork tree;
"git diff --cached" compares a (c)ommit and the (i)ndex;
"git-diff HEAD:file1 file2" compares an (o)bject and a (w)ork tree entity;
"git diff --no-index a b" compares two non-git things (1) and (2).
Because these mnemonics now have meanings, they are swapped when reverse
diff is in effect and this feature is enabled.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the tracked contents have CRLF line endings, colored diff output
shows "^M" at the end of output lines, which is distracting, even though
the pager we use by default ("less") knows to hide them.
The problem is that "less" hides a carriage-return only at the end of the
line, immediately before a line feed. The colored diff output does not
take this into account, and emits four element sequence for each line:
- force this color;
- the line up to but not including the terminating line feed;
- reset color
- line feed.
By including the carriage return at the end of the line in the second
item, we are breaking the smart our pager has in order not to show "^M".
This can be fixed by changing the sequence to:
- force this color;
- the line up to but not including the terminating end-of-line;
- reset color
- end-of-line.
where end-of-line is either a single linefeed or a CRLF pair. When the
output is not colored, "force this color" and "reset color" sequences are
both empty, so we won't have this problem with or without this patch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix git-diff to make it produce useful 3-way diffs for merge conflicts in
repositories with autocrlf enabled. Otherwise it always reports that the
whole file was changed, because it uses the contents from the working tree
without necessary conversion.
Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This further enhances xdi_diff_outf() interface so that it takes two
common parameters: the callback function that processes one line at a
time, and a pointer to its application specific callback data structure.
xdi_diff_outf() creates its own "xdiff_emit_state" structure and stashes
these two away inside it, which is used by the lowest level output
function in the xdiff_outf() callchain, consume_one(), to call back to the
application layer. With this restructuring, we lift the requirement that
the caller supplied callback data structure embeds xdiff_emit_state
structure as its first member.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To prepare for the need to initialize and release resources for an
xdi_diff with the xdiff_outf output function, make a new function to
wrap this usage.
Old:
ecb.outf = xdiff_outf;
ecb.priv = &state;
...
xdi_diff(file_p, file_o, &xpp, &xecfg, &ecb);
New:
xdi_diff_outf(file_p, file_o, &state.xm, &xpp, &xecfg, &ecb);
Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we include a few uninteresting lines before the interesting ones as
context, we are only interested in seeing the surviving lines themselves
and not the deleted lines that are before them. Mark the added leading
context lines in give_context() and not show deleted lines form them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These variables were made unnecessary by commit
3969cf7db1.
Signed-off-by: Adam Simpkins <adam@adamsimpkins.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The resulting data is zero terminated after the read loop, but
the subsequent loop that scans for '\n' will overrun the buffer.
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This moves the logic to quote two paths (prefix + path) in
C-style introduced in the previous commit from the
dump_quoted_path() in combine-diff.c to quote.c, and uses it to
fix rewrite_diff() that never C-quoted the pathnames correctly.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier when showing combined diff, the filenames on the ---/+++
header lines were quoted incorrectly. a/ (or b/) prefix was
output literally and then the path was output, with c-quoting.
This fixes the quoting logic, and while at it, adjusts the code
to use the customizable prefix (a_prefix and b_prefix)
introduced recently.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This inserts a new function xdi_diff() that currently does not
do anything other than calling the underlying xdl_diff() to the
callchain of current callers of xdl_diff() function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
reverse_diff was a bit-value in disguise, it's merged in the flags now.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* quote_c_style works on a strbuf instead of a wild buffer.
* quote_c_style is now clever enough to not add double quotes if not needed.
* write_name_quoted inherits those advantages, but also take a different
set of arguments. Now instead of asking for quotes or not, you pass a
"terminator". If it's \0 then we assume you don't want to escape, else C
escaping is performed. In any case, the terminator is also appended to the
stream. It also no longer takes the prefix/prefix_len arguments, as it's
seldomly used, and makes some optimizations harder.
* write_name_quotedpfx is created to work like write_name_quoted and take
the prefix/prefix_len arguments.
Thanks to those API changes, diff.c has somehow lost weight, thanks to the
removal of functions that were wrappers around the old write_name_quoted
trying to give it a semantics like the new one, but performing a lot of
allocations for this goal. Now we always write directly to the stream, no
intermediate allocation is performed.
As a side effect of the refactor in builtin-apply.c, the length of the bar
graphs in diffstats are not affected anymore by the fact that the path was
clipped.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
The instances of xdemitconf_t were initialized member by member.
Instead, initialize them to all zero, so we do not have
to update those places each time we introduce a new member.
[jc: minimally fixed by getting rid of a new global]
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch fixes all calls to xread() where the return value is not
stored into an ssize_t. The patch should not have any effect whatsoever,
other than putting better/more appropriate type names on variables.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This enhances the attributes mechanism so that external programs
meant for existing GIT_EXTERNAL_DIFF interface can be specifed
per path.
To configure such a custom diff driver, first define a custom
diff driver in the configuration:
[diff "my-c-diff"]
command = <<your command string comes here>>
Then mark the paths that you want to use this custom driver
using the attribute mechanism.
*.c diff=my-c-diff
The intent of this separation is that the attribute mechanism is
used for specifying the type of the contents, while the
configuration mechanism is used to define what needs to be done
to that type of the contents, which would be specific to both
platform and personal taste.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Some systems have sizeof(off_t) == 8 while sizeof(size_t) == 4.
This implies that we are able to access and work on files whose
maximum length is around 2^63-1 bytes, but we can only malloc or
mmap somewhat less than 2^32-1 bytes of memory.
On such a system an implicit conversion of off_t to size_t can cause
the size_t to wrap, resulting in unexpected and exciting behavior.
Right now we are working around all gcc warnings generated by the
-Wshorten-64-to-32 option by passing the off_t through xsize_t().
In the future we should make xsize_t on such problematic platforms
detect the wrapping and die if such a file is accessed.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>