Add the option -W/--function-context to git diff. It is similar to
the same option of git grep and expands the context of change hunks
so that the whole surrounding function is shown. This "natural"
context can allow changes to be understood better.
Note: GNU patch doesn't like diffs generated with the new option;
it seems to expect context lines to be the same before and after
changes. git apply doesn't complain.
This implementation has the same shortcoming as the one in grep,
namely that there is no way to explicitly find the end of a
function. That means that a few lines of extra context are shown,
right up to the next recognized function begins. It's already
useful in its current form, though.
The function get_func_line() in xdiff/xemit.c is extended to work
forward as well as backward to find post-context as well as
pre-context. It returns the position of the first found matching
line. The func_line parameter is made optional, as we don't need
it for -W.
The enhanced function is then used in xdl_emit_diff() to extend
the context as needed. If the added context overlaps with the
next change, it is merged into the current hunk.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the code to search for a function line to be shown in the hunk
header into its own function and to make returning the length-limited
result string easier, introduce struct func_line.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
27af01d (xdiff/xprepare: improve O(n*m) performance in
xdl_cleanup_records(), 2011-08-17) was supposed to be a performance
boost only. However, it unexpectedly changed the behaviour of diff.
Revert a part of 27af01d that removes logic that mark lines as
"multi-match" (ie. dis[i] == 2). This was preventing the multi-match
discard heuristic (performed in xdl_cleanup_records() and
xdl_clean_mmatch()) from executing.
Reported-by: Alexander Pepper <pepper@inf.fu-berlin.de>
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ensure that the xdl_free_classifier() call on xdlclassifier_t cf is safe
even if xdl_init_classifier() isn't called. This may occur in the case
where diff is run with --histogram and a call to, say, xdl_prepare_ctx()
fails.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rc/histogram-diff:
xdiff/xhistogram: drop need for additional variable
xdiff/xhistogram: rely on xdl_trim_ends()
xdiff/xhistogram: rework handling of recursed results
xdiff: do away with xdl_mmfile_next()
Make test number unique
xdiff/xprepare: use a smaller sample size for histogram diff
xdiff/xprepare: skip classification
teach --histogram to diff
t4033-diff-patience: factor out tests
xdiff/xpatience: factor out fall-back-diff function
xdiff/xprepare: refactor abort cleanups
xdiff/xprepare: use memset()
Conflicts:
xdiff/xprepare.c
In xdl_cleanup_records(), we see O(n*m) performance, where n is the
number of records from xdf->dstart to xdf->dend, and m is the size of a
bucket in xdf->rhash (<= by mlim).
Here, we improve this to O(n) by pre-computing nm (in rcrec->len(1|2))
in xdl_classify_record().
Reported-by: Marat Radchenko <marat@slonopotamus.org>
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Having an additional variable (ptr) instead of changing line(1|2) and
count(1|2) was for debugging purposes.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do away with reduce_common_start_end() and use xdf->dstart and xdf->dend
set by xdl_trim_ends() that similarly tells us where the first unmatched
line from the start and end occurs.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously we were over-complicating matters by trying to combine the
recursed results. Now, terminate immediately if a recursive call failed
and return its result.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Given our simple mmfile structure, xdl_mmfile_next() calls are
redundant. Do away with calls to them.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For histogram diff, we can afford a smaller sample size and thus a
poorer estimate of the number of lines, as the hash table (rhash) won't
be filled up/grown. This is safe as the final count of lines (xdf.nrecs)
will be updated correctly anyway by xdl_prepare_ctx().
This gives us a small boost in performance.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdiff performs "classification" of records (xdl_classify_record()),
replacing hashes (xrecord_t.ha) with a unique identifier of the
record/line and building a hash table (xrecord_t.rhash) of records. This
is then used to "cleanup" records (xdl_cleanup_records()).
We don't need any of that in histogram diff, so we omit calls to these
functions. We also skip allocating memory to the hash table, rhash, as
it is no longer used.
This gives us a small boost in performance.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show
that it is faster than its --patience cousin, as well as the default
Meyers algorithm.
The implementation has been reworked to use structs and pointers,
instead of bitmasks, thus doing away with JGit's 2^28 line limit.
We also use xdiff's default hash table implementation (xdl_hash_bits()
with XDL_HASHLONG()) for convenience.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is in preparation for the histogram diff algorithm, which will also
re-use much of the code to call the default Meyers diff algorithm.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Group free()'s that are called when a malloc() fails in
xdl_prepare_ctx(), making for more readable code.
Also add a free() on ha, in case future git hackers add allocs after the
ha malloc.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use memset() instead of a for loop to initialize. This could give a
performance advantage.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ctype functions isspace(), isalnum(), et al take an integer
argument representing an unsigned character, or -1 for EOF. On
platforms with a signed char, it is unsafe to pass a char to them
without casting it to unsigned char first.
Most of git is already shielded against this by the ctype
implementation in git-compat-util.h, but xdiff, which uses libc
ctype.h, ought to be fixed.
Noticed-by: der Mouse <mouse@Rodents-Montreal.ORG>
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For each hunk, xdl_find_func searches the preimage for a function name
until the beginning of the file. If the file does not contain any
function names, this search has complexity O(n^2) in the number of
hunks n.
Instead, inline xdl_find_func() and keep track of up to which line we
have scanned already and the contents of the last funcname line that
we have found.
Noticed and a different approach proposed by Clemens Buchacher.
This alternative solution was done by René Scharfe.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In xdl_recmatch, do the memcmp to check if the two lines are equal before
checking if whitespace flags are set. If the lines are identical, then
there is no need to check if they differ only in whitespace.
This makes the common case (there is no whitespace difference) faster.
It costs the case where lines are the same length and contain
whitespace differences, but the common case is more than 20% faster.
Signed-off-by: Dylan Reid <dgreid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
memset() is heavily optimized, and resulting assembler code
is about 150 lines less for that file.
Signed-off-by: Alexey Mahotkin <squadette@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The labels for the three participants in a potential conflict are all
optional arguments for the xdiff merge routine; if they are NULL, then
xdl_merge() can cope by omitting the labels from its output. Move
them to the xmparam structure to allow new callers to save some
keystrokes where they are not needed.
This also has the virtue of making the xdiff merge interface more
similar to merge_trees, which might make it easier to learn.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ‘git checkout --conflict=diff3’ command can be used to
present conflicts hunks including text from the common ancestor:
<<<<<<< ours
ourside
|||||||
original
=======
theirside
>>>>>>> theirs
The added information is helpful for resolving merges by hand, and
merge tools can usually grok it because it is very similar to the
output from diff3 -m.
A subtle change can help more tools to understand the output. ‘diff3’
includes the name of the merge base on the ||||||| line of the output,
and some tools misparse the conflict hunks without it. Add a new
xmp->ancestor parameter to xdl_merge() for use with conflict style
XDL_MERGE_DIFF3 as a label on the ||||||| line for any conflict hunks.
If xmp->ancestor is NULL, the output format is unchanged. Thus, this
change only provides unexposed plumbing for the new feature; it does
not affect the outward behavior of git.
Requested-by: Stefan Monnier <monnier@iro.umontreal.ca>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Bert Wesarg <Bert.Wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Include the merge level, favor, and style flags into the xmparam_t struct.
This removes the bit twiddling with these three values into the one flags
parameter.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current union merge driver is implemented as an post process. But the
xdl_merge code is quite capable to produce the result by itself. Therefore
move it there.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/conflict-marker-size:
rerere: honor conflict-marker-size attribute
rerere: prepare for customizable conflict marker length
conflict-marker-size: new attribute
rerere: use ll_merge() instead of using xdl_merge()
merge-tree: use ll_merge() not xdl_merge()
xdl_merge(): allow passing down marker_size in xmparam_t
xdl_merge(): introduce xmparam_t for merge specific parameters
git_attr(): fix function signature
Conflicts:
builtin-merge-file.c
ll-merge.c
xdiff/xdiff.h
xdiff/xmerge.c
This allows the callers of xdl_merge() to pass marker_size (defaults to 7)
in xmparam_t argument, to use conflict markers of non-default length.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
So far we have only needed to be able to pass an option that is generic to
xdiff family of functions to this function. Extend the interface so that
we can give it merge specific parameters.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes people want their conflicting merges autoresolved by
favouring upstream changes. The standard answer they are given is
to run "git diff --name-only | xargs git checkout MERGE_HEAD --" in
such a case. This is to accept automerge results for the paths that
are fully resolved automatically, while taking their version of the
file in full for paths that have conflicts.
This is problematic on two counts.
One is that this is not exactly what these people want. It discards
all changes they did on their branch for any paths that conflicted.
They usually want to salvage as much automerge result as possible in
a conflicted file, and want to take the upstream change only in the
conflicted part.
This patch teaches two new modes of operation to the lowest-lever
merge machinery, xdl_merge(). Instead of leaving the conflicted
lines from both sides enclosed in <<<, ===, and >>> markers, the
conflicts are resolved favouring our side or their side of changes.
A larger problem is that this tends to encourage a bad workflow by
allowing people to record such a mixed up half-merged result as a
full commit without auditing. This commit does not tackle this
issue at all. In git, we usually give long enough rope to users
with strange wishes as long as the risky features are not enabled by
default, and this is such a risky feature.
Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tf/diff-whitespace-incomplete-line:
xutils: Fix xdl_recmatch() on incomplete lines
xutils: Fix hashing an incomplete line with whitespaces at the end
Thell Fowler noticed that various "ignore whitespace" options to git diff
do not work well on an incomplete line.
The loop control of the function responsible for these bugs was extremely
difficult to follow. This patch restructures the loops for three variants
of "ignore whitespace" logic.
The basic idea of the re-written logic is:
- A loop runs while the characters from both strings we are looking at
match. We declare unmatch immediately when we find something that does
not match and return false from the function. We break out of the loop
if we ran out of either side of the string.
The way we skip spaces inside this loop varies depending on the style
of ignoring whitespaces.
- After the above loop breaks, we know that the parts of the strings we
inspected so far match, ignoring the whitespaces. The lines can match
only if the remainder consists of nothing but whitespaces. This part
of the logic is shared across all three styles.
The new code is more obvious and should be much easier to follow.
Tested-by: Thell Fowler <git@tbfowler.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Upon seeing a whitespace, xdl_hash_record_with_whitespace() first skipped
the run of whitespaces (excluding LF) that begins there, ensuring that the
pointer points at the last whitespace character in the run, and assumed
that the next character must be LF at the end of the line. This does not
work when hashing an incomplete line, which lacks the LF at the end.
Introduce "at_eol" variable that is true when either we are at the end of
line (looking at LF) or at the end of an incomplete line, and use that
instead throughout the code.
Noticed by Thell Fowler.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* cb/maint-1.6.0-xdl-merge-fix:
Change xdl_merge to generate output even for null merges
t6023: merge-file fails to output anything for a degenerate merge
Conflicts:
xdiff/xmerge.c
xdl_merge used to have a check to ensure that there was at least
some change in one or other side being merged but this suppressed
output for the degenerate case when base, local and remote
contents were all identical.
Removing this check enables correct output in the degenerate case
and xdl_free_script handles freeing NULL scripts so there is no
need to have the check for these calls.
Signed-off-by: Charles Bailey <charles@hashpling.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
http-push.c::finish_request():
request is initialized by the for loop
index-pack.c::free_base_data():
b is initialized by the for loop
merge-recursive.c::process_renames():
move compare to narrower scope, and remove unused assignments to it
remove unused variable renames2
xdiff/xdiffi.c::xdl_recs_cmp():
remove unused variable ec
xdiff/xemit.c::xdl_emit_diff():
xche is always overwritten
Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code used to misbehave when options to ignore certain whitespaces
(-w -b and --ignore-at-eol) were combined.
Signed-off-by: Keith Cascio <keith@cs.ucla.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patience diff algorithm produces slightly more intuitive output
than the classic Myers algorithm, as it does not try to minimize the
number of +/- lines first, but tries to preserve the lines that are
unique.
To this end, it first determines lines that are unique in both files,
then the maximal sequence which preserves the order (relative to both
files) is extracted.
Starting from this initial set of common lines, the rest of the lines
is handled recursively, with Myers' algorithm as a fallback when
the patience algorithm fails (due to no common unique lines).
This patch includes memory leak fixes by Pierre Habouzit.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge two hunks if there is only the specified number of otherwise unshown
context between them. For --inter-hunk-context=1, the resulting patch has
the same number of lines but shows uninterrupted context instead of a
context header line in between.
Patches generated with this option are easier to read but are also more
likely to conflict if the file to be patched contains other changes.
This patch keeps the default for this option at 0. It is intended to just
make the feature available in order to see its advantages and downsides.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a corner case of large files whose lines do not match uniquely, the
loop to eliminate a line that matches multiple locations adjacent to a run
of lines that do not uniquely match wasted too much cycles. Fix this by
giving up early after scanning 100 lines in both direction.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a corner case of large files whose lines do not match uniquely, the
loop to eliminate a line that matches multiple locations adjacent to a run
of lines that do not uniquely match wasted too much cycles. Fix this by
giving up early after scanning 100 lines in both direction.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For some users (e.g. git blame), getting textual patch output is just
extra work, as they can get all the information they need from the low-
level diff structures. Allow for an alternate low-level emit function
to be defined to allow bypassing the textual patch generation; set
xemitconf_t's emit_func member to enable this.
The (void (*)()) type is pretty ugly, but the alternative would be to
include most of the private xdiff headers in xdiff.h to get the types
required for the "proper" function prototype. Also, a (void *) won't
work, as ANSI C doesn't allow a function pointer to be cast to an
object pointer.
Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>