1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-11-18 15:04:49 +01:00
Commit graph

766 commits

Author SHA1 Message Date
Junio C Hamano
c0599f6993 Merge branch 'jk/diff-no-rename-empty'
Forbids rename detection logic from matching two empty files as renames
during merge-recursive to prevent mismerges.

By Jeff King
* jk/diff-no-rename-empty:
  merge-recursive: don't detect renames of empty files
  teach diffcore-rename to optionally ignore empty content
  make is_empty_blob_sha1 available everywhere
  drop casts from users EMPTY_TREE_SHA1_BIN
2012-04-16 12:41:49 -07:00
Junio C Hamano
3bec29bb07 Merge branch 'tr/maint-word-diff-regex-sticky'
The regexp configured with wordregex was incorrectly reused across files.

By Thomas Rast (2) and Johannes Sixt (1)
* tr/maint-word-diff-regex-sticky:
  diff: tweak a _copy_ of diff_options with word-diff
  diff: refactor the word-diff setup from builtin_diff_cmd
  t4034: diff.*.wordregex should not be "sticky" in --word-diff
2012-04-15 22:51:34 -07:00
Junio C Hamano
86c340e082 Merge branch 'jc/diff-algo-cleanup'
Resurrects the preparatory clean-up patches from another topic that was
discarded, as this would give a saner foundation to build on diff.algo
configuration option series.

* jc/diff-algo-cleanup:
  xdiff: PATIENCE/HISTOGRAM are not independent option bits
  xdiff: remove XDL_PATCH_* macros
2012-04-15 22:51:15 -07:00
Jeff King
90d43b0768 teach diffcore-rename to optionally ignore empty content
Our rename detection is a heuristic, matching pairs of
removed and added files with similar or identical content.
It's unlikely to be wrong when there is actual content to
compare, and we already take care not to do inexact rename
detection when there is not enough content to produce good
results.

However, we always do exact rename detection, even when the
blob is tiny or empty. It's easy to get false positives with
an empty blob, simply because it is an obvious content to
use as a boilerplate (e.g., when telling git that an empty
directory is worth tracking via an empty .gitignore).

This patch lets callers specify whether or not they are
interested in using empty files as rename sources and
destinations. The default is "yes", keeping the original
behavior. It works by detecting the empty-blob sha1 for
rename sources and destinations.

One more flexible alternative would be to allow the caller
to specify a minimum size for a blob to be "interesting" for
rename detection. But that would catch small boilerplate
files, not large ones (e.g., if you had the GPL COPYING file
in many directories).

A better alternative would be to allow a "-rename"
gitattribute to allow boilerplate files to be marked as
such. I'll leave the complexity of that solution until such
time as somebody actually wants it. The complaints we've
seen so far revolve around empty files, so let's start with
the simple thing.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-23 13:52:49 -07:00
Thomas Rast
6440d3417c diff: tweak a _copy_ of diff_options with word-diff
When using word diff, the code sets the word_regex from various
defaults if it was not set already.  The problem is that it does this
on the original diff_options, which will also be used in subsequent
diffs.

This means that when the word_regex is not given on the command line,
only the first diff for which a setting for word_regex (either from
attributes or diff.wordRegex) ever takes effect.  This value then
propagates to the rest of the diff runs and in particular prevents
further attribute lookups.

Fix the problem of changing diff state once and for all, by working
with a _copy_ of the diff_options.

Noticed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-14 14:41:20 -07:00
Thomas Rast
77d1a520fb diff: refactor the word-diff setup from builtin_diff_cmd
Quite a chunk of builtin_diff_cmd deals with word-diff setup, defaults
and such.  This makes the function a bit hard to read, but is also
asymmetric because the corresponding teardown lives in free_diff_words_data
already.

Refactor into a new function init_diff_words_data.  For simplicity,
also shuffle around some functions it depends on.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-14 14:40:15 -07:00
Junio C Hamano
fce8b5d82f Merge branch 'jc/maint-diff-patch-header' into maint
"git diff-index" and its friends at the plumbing level showed the
"diff --git" header and nothing else for a path whose cached stat
info is dirty without actual difference when asked to produce a
patch. This was a longstanding bug that we could have fixed long
time ago.

By Junio C Hamano
* jc/maint-diff-patch-header:
  diff -p: squelch "diff --git" header for stat-dirty paths
  t4011: illustrate "diff-index -p" on stat-dirty paths
  t4011: modernise style
2012-03-12 15:46:32 -07:00
Junio C Hamano
239d6eddcd Merge branch 'jc/maint-diff-patch-header'
By Junio C Hamano
* jc/maint-diff-patch-header:
  diff -p: squelch "diff --git" header for stat-dirty paths
  t4011: illustrate "diff-index -p" on stat-dirty paths
  t4011: modernise style
2012-03-06 14:53:07 -08:00
Junio C Hamano
af050219e4 Merge branch 'zj/diff-stat-dyncol'
By Zbigniew Jędrzejewski-Szmek (8) and Junio C Hamano (1)
* zj/diff-stat-dyncol:
  : This breaks tests. Perhaps it is not worth using the decimal-width stuff
  : for this series, at least initially.
  diff --stat: add config option to limit graph width
  diff --stat: enable limiting of the graph part
  diff --stat: add a test for output with COLUMNS=40
  diff --stat: use a maximum of 5/8 for the filename part
  merge --stat: use the full terminal width
  log --stat: use the full terminal width
  show --stat: use the full terminal width
  diff --stat: use the full terminal width
  diff --stat: tests for long filenames and big change counts
2012-03-06 14:53:06 -08:00
Junio C Hamano
b3f01ff29f diff -p: squelch "diff --git" header for stat-dirty paths
The plumbing "diff" commands look at the working tree files without
refreshing the index themselves for performance reasons (the calling
script is expected to do that upfront just once, before calling one or
more of them).  In the early days of git, they showed the "diff --git"
header before they actually ask the xdiff machinery to produce patches,
and ended up showing only these headers if the real contents are the same
and the difference they noticed was only because the stat info cached in
the index did not match that of the working tree. It was too late for the
implementation to take the header that it already emitted back.

But 3e97c7c (No diff -b/-w output for all-whitespace changes, 2009-11-19)
introduced necessary logic to keep the meta-information headers in a
strbuf and delay their output until the xdiff machinery noticed actual
changes. This was primarily in order to generate patches that ignore
whitespaces. When operating under "-w" mode, we wouldn't know if the
header is needed until we actually look at the resulting patch, so it was
a sensible thing to do, but we did not realize that the same reasoning
applies to stat-dirty paths.

Later, 296c6bb (diff: fix "git show -C -C" output when renaming a binary
file, 2010-05-26) generalized this machinery and added must_show_header
toggle.  This is turned on when the header must be shown even when there
is no patch to be produced, e.g. only the mode was changed, or the path
was renamed, without changing the contents.  However, when it did so, it
still kept the special case for the "-w" mode, which meant that the
plumbing would keep showing these phantom changes.

This corrects this historical inconsistency by allowing the plumbing to
omit paths that are only stat-dirty from its output in the same way as it
handles whitespace only changes under "-w" option.

The change in the behaviour can be seen in the updated test.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-01 12:00:01 -08:00
Zbigniew Jędrzejewski-Szmek
df44483a5d diff --stat: add config option to limit graph width
Config option diff.statGraphWidth=<width> is equivalent to
--stat-graph-width=<width>, except that the config option is ignored
by format-patch.

For the graph-width limiting to be usable, it should happen
'automatically' once configured, hence the config option.
Nevertheless, graph width limiting only makes sense when used on a
wide terminal, so it should not influence the output of format-patch,
which adheres to the 80-column standard.

Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-01 09:15:58 -08:00
Zbigniew Jędrzejewski-Szmek
969fe57b84 diff --stat: enable limiting of the graph part
A new option --stat-graph-width=<width> can be used to limit the width
of the graph part even is more space is available. Up to <width>
columns will be used for the graph.

If commits changing a lot of lines are displayed in a wide terminal
window (200 or more columns), and the +- graph uses the full width,
the output can be hard to comfortably scan with a horizontal movement
of human eyes. Messages wrapped to about 80 columns would be
interspersed with very long +- lines. It makes sense to limit the
width of the graph part to a fixed value (e.g. 70 columns), even if
more columns are available.

Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-01 09:15:47 -08:00
Zbigniew Jędrzejewski-Szmek
1b058bc30d diff --stat: use a maximum of 5/8 for the filename part
The way that available columns are divided between the filename part
and the graph part is modified to use as many columns as necessary for
the filenames and the rest for the graph.

If there isn't enough columns to print both the filename and the
graph, at least 5/8 of available space is devoted to filenames. On a
standard 80 column terminal, or if not connected to a terminal and
using the default of 80 columns, this gives the same partition as
before.

The effect of this change is visible in the patch to the test vector
in t4052; with a small change with long filename, it stops truncating
the name part too short, and also allocates a bit more columns to the
graph for larger changes.

Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-01 09:14:58 -08:00
Zbigniew Jędrzejewski-Szmek
af9fedc128 diff --stat: use the full terminal width
Default to the real terminal width for diff --stat output, instead
of the hard-coded 80 columns.

Some projects (especially in Java), have long filename paths, with
nested directories or long individual filenames. When files are
renamed, the filename part in stat output can be almost useless. If
the middle part between { and } is long (because the file was moved to
a completely different directory), then most of the path would be
truncated.

It makes sense to detect and use the full terminal width and display
full filenames if possible.

The are commands like diff, show, and log, which can adapt the output
to the terminal width. There are also commands like format-patch,
whose output should be independent of the terminal width. Since it is
safer to use the 80-column default, the real terminal width is only
used if requested by the calling code by setting diffopts.stat_width=-1.
Normally this value is 0, and can be set by the user only to a
non-negative value, so -1 is safe to use internally.

This patch only changes the diff builtin to use the full terminal width.

Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-01 09:13:06 -08:00
Junio C Hamano
db65f0fc3b Merge branches zj/decimal-width, zj/term-columns and jc/diff-stat-scaler 2012-02-24 16:07:04 -08:00
Junio C Hamano
a67c235448 Merge branch 'jc/diff-stat-scaler' into maint
* jc/diff-stat-scaler:
  diff --stat: show bars of same length for paths with same amount of changes
2012-02-21 15:00:33 -08:00
Junio C Hamano
8c60fcbcfd Merge branch 'jc/diff-stat-scaler'
* jc/diff-stat-scaler:
  diff --stat: show bars of same length for paths with same amount of changes
2012-02-20 00:15:15 -08:00
Junio C Hamano
307ab20b33 xdiff: PATIENCE/HISTOGRAM are not independent option bits
Because the default Myers, patience and histogram algorithms cannot be in
effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not
independent bits.  Instead of wasting one bit per algorithm, define a few
macros to access the few bits they occupy and update the code that access
them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-19 15:36:55 -08:00
Junio C Hamano
2eeeef24ff diff --stat: show bars of same length for paths with same amount of changes
When commit 3ed74e6 (diff --stat: ensure at least one '-' for deletions,
and one '+' for additions, 2006-09-28) improved the output for files with
tiny modifications, we accidentally broke the logic to ensure that two
equal sized changes are shown with the bars of the same length, even when
rounding errors exist.

Compute the length of the graph bars, using the same "non-zero changes is
shown with at least one column" scaling logic, but by scaling the sum of
additions and deletions to come up with the total length of the bar (this
ensures that two equal sized changes result in bars of the same length),
and then scaling the smaller of the additions or deletions. The other side
is computed as the difference between the two.

This makes the apportioning between additions and deletions less accurate
due to rounding errors, but it is much less noticeable than two files with
the same amount of change showing bars of different length.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-14 14:21:49 -08:00
Junio C Hamano
63d37c3062 Merge branch 'jk/userdiff-config-simplify'
* jk/userdiff-config-simplify:
  drop odd return value semantics from userdiff_config
2012-02-14 12:57:17 -08:00
Jeff King
6680a0874f drop odd return value semantics from userdiff_config
When the userdiff_config function was introduced in be58e70
(diff: unify external diff and funcname parsing code,
2008-10-05), it used a return value convention unlike any
other config callback. Like other callbacks, it used "-1" to
signal error. But it returned "1" to indicate that it found
something, and "0" otherwise; other callbacks simply
returned "0" to indicate that no error occurred.

This distinction was necessary at the time, because the
userdiff namespace overlapped slightly with the color
configuration namespace. So "diff.color.foo" could mean "the
'foo' slot of diff coloring" or "the 'foo' component of the
"color" userdiff driver". Because the color-parsing code
would die on an unknown color slot, we needed the userdiff
code to indicate that it had matched the variable, letting
us bypass the color-parsing code entirely.

Later, in 8b8e862 (ignore unknown color configuration,
2009-12-12), the color-parsing code learned to silently
ignore unknown slots. This means we no longer need to
protect userdiff-matched variables from reaching the
color-parsing code.

We can therefore change the userdiff_config calling
convention to a more normal one. This drops some code from
each caller, which is nice. But more importantly, it reduces
the cognitive load for readers who may wonder why
userdiff_config is unlike every other config callback.

There's no need to add a new test confirming that this
works; t4020 already contains a test that sets
diff.color.external.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-07 10:44:54 -08:00
Nguyễn Thái Ngọc Duy
7f814632f5 Use correct grammar in diffstat summary line
"git diff --stat" and "git apply --stat" now learn to print the line
"%d files changed, %d insertions(+), %d deletions(-)" in singular form
whenever applicable. "0 insertions" and "0 deletions" are also omitted
unless they are both zero.

This matches how versions of "diffstat" that are not prehistoric produced
their output, and also makes this line translatable.

[jc: with help from Thomas Dickey in archaeology of "diffstat"]
[jc: squashed Jonathan's updates to illustrations in tutorials and a test]

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-03 23:19:42 -08:00
Junio C Hamano
05c65cb116 Merge branch 'tr/maint-word-diff-incomplete-line'
* tr/maint-word-diff-incomplete-line:
  word-diff: ignore '\ No newline at eof' marker
2012-01-18 15:16:19 -08:00
Thomas Rast
c7c2bc0ac9 word-diff: ignore '\ No newline at eof' marker
The word-diff logic accumulates + and - lines until another line type
appears (normally [ @\]), at which point it generates the word diff.
This is usually correct, but it breaks when the preimage does not have
a newline at EOF:

  $ printf "%s" "a a a" >a
  $ printf "%s\n" "a ab a" >b
  $ git diff --no-index --word-diff a b
  diff --git 1/a 2/b
  index 9f68e94..6a7c02f 100644
  --- 1/a
  +++ 2/b
  @@ -1 +1 @@
  [-a a a-]
   No newline at end of file
  {+a ab a+}

Because of the order of the lines in a unified diff

  @@ -1 +1 @@
  -a a a
  \ No newline at end of file
  +a ab a

the '\' line flushed the buffers, and the - and + lines were never
matched with each other.

A proper fix would defer such markers until the end of the hunk.
However, word-diff is inherently whitespace-ignoring, so as a cheap
fix simply ignore the marker (and hide it from the output).

We use a prefix match for '\ ' to parallel the logic in
apply.c:parse_fragment().  We currently do not localize this string
(just accept other variants of it in git-apply), but this should be
future-proof.

Noticed-by: Ivan Shirokoff <shirokoff@yandex-team.ru>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-12 11:27:41 -08:00
Junio C Hamano
9b55aa03da Merge branch 'rs/diff-whole-function'
* rs/diff-whole-function:
  diff: add option to show whole functions as context
  xdiff: factor out get_func_line()
2011-10-19 10:49:13 -07:00
Junio C Hamano
7a63a920fd Merge branch 'rs/diff-cleanup-records-fix'
* rs/diff-cleanup-records-fix:
  diff: resurrect XDF_NEED_MINIMAL with --minimal
  Revert removal of multi-match discard heuristic in 27af01
2011-10-13 19:03:22 -07:00
Junio C Hamano
7ddd582402 Merge branch 'jc/maint-diffstat-numstat-context'
* jc/maint-diffstat-numstat-context:
  diff: teach --stat/--numstat to honor -U$num
2011-10-10 15:56:18 -07:00
René Scharfe
14937c2c06 diff: add option to show whole functions as context
Add the option -W/--function-context to git diff.  It is similar to
the same option of git grep and expands the context of change hunks
so that the whole surrounding function is shown.  This "natural"
context can allow changes to be understood better.

Note: GNU patch doesn't like diffs generated with the new option;
it seems to expect context lines to be the same before and after
changes.  git apply doesn't complain.

This implementation has the same shortcoming as the one in grep,
namely that there is no way to explicitly find the end of a
function.  That means that a few lines of extra context are shown,
right up to the next recognized function begins.  It's already
useful in its current form, though.

The function get_func_line() in xdiff/xemit.c is extended to work
forward as well as backward to find post-context as well as
pre-context.  It returns the position of the first found matching
line.  The func_line parameter is made optional, as we don't need
it for -W.

The enhanced function is then used in xdl_emit_diff() to extend
the context as needed.  If the added context overlaps with the
next change, it is merged into the current hunk.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-10 12:05:07 -07:00
Junio C Hamano
81b568c839 diff: resurrect XDF_NEED_MINIMAL with --minimal
Earlier, 582aa00 (git diff too slow for a file, 2010-05-02)
unconditionally dropped XDF_NEED_MINIMAL option from the internal xdiff
invocation to help performance on pathological cases, while hinting that a
follow-up patch could reintroduce it with "--minimal" option from the
command line.

Make it so.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-03 11:58:18 -07:00
Junio C Hamano
f01cae918f diff: teach --stat/--numstat to honor -U$num
"git diff -p" piped to external diffstat and "git diff --stat" may see
different patch text (both are valid and describe the same change
correctly) when counting the number of added and deleted lines, arriving
at different results to confuse the users, as --stat/--numstat codepath
always uses the hardcoded -U0 as the context length.

Make --stat/--numstat codepath to honor the context length the same way
as the textual patch codepath does to avoid this problem.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-22 10:54:47 -07:00
Junio C Hamano
f946b465d7 Merge branch 'jk/color-and-pager'
* jk/color-and-pager:
  want_color: automatically fallback to color.ui
  diff: don't load color config in plumbing
  config: refactor get_colorbool function
  color: delay auto-color decision until point of use
  git_config_colorbool: refactor stdout_is_tty handling
  diff: refactor COLOR_DIFF from a flag into an int
  setup_pager: set GIT_PAGER_IN_USE
  t7006: use test_config helpers
  test-lib: add helper functions for config
  t7006: modernize calls to unset

Conflicts:
	builtin/commit.c
	parse-options.c
2011-08-28 21:19:16 -07:00
Jeff King
3e1dd17a89 diff: don't load color config in plumbing
The diff config callback is split into two functions: one
which loads "ui" config, and one which loads "basic" config.
The former chains to the latter, as the diff UI config is a
superset of the plumbing config.

The color.diff variable is only loaded in the UI config.
However, the basic config actually chains to
git_color_default_config, which loads color.ui. This doesn't
actually cause any bugs, because the plumbing diff code does
not actually look at the value of color.ui.

However, it is somewhat nonsensical, and it makes it
difficult to refactor the color code. It probably came about
because there is no git_color_config to load only color
config, but rather just git_color_default_config, which
loads color config and chains to git_default_config.

This patch splits out the color-specific portion of
git_color_default_config so that the diff UI config can call
it directly. This is perhaps better explained by the
chaining of callbacks. Before we had:

  git_diff_ui_config
    -> git_diff_basic_config
      -> git_color_default_config
        -> git_default_config

Now we have:

  git_diff_ui_config
    -> git_color_config
    -> git_diff_basic_config
      -> git_default_config

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-19 15:51:38 -07:00
Jeff King
daa0c3d971 color: delay auto-color decision until point of use
When we read a color value either from a config file or from
the command line, we use git_config_colorbool to convert it
from the tristate always/never/auto into a single yes/no
boolean value.

This has some timing implications with respect to starting
a pager.

If we start (or decide not to start) the pager before
checking the colorbool, everything is fine. Either isatty(1)
will give us the right information, or we will properly
check for pager_in_use().

However, if we decide to start a pager after we have checked
the colorbool, things are not so simple. If stdout is a tty,
then we will have already decided to use color. However, the
user may also have configured color.pager not to use color
with the pager. In this case, we need to actually turn off
color. Unfortunately, the pager code has no idea which color
variables were turned on (and there are many of them
throughout the code, and they may even have been manipulated
after the colorbool selection by something like "--color" on
the command line).

This bug can be seen any time a pager is started after
config and command line options are checked. This has
affected "git diff" since 89d07f7 (diff: don't run pager if
user asked for a diff style exit code, 2007-08-12). It has
also affect the log family since 1fda91b (Fix 'git log'
early pager startup error case, 2010-08-24).

This patch splits the notion of parsing a colorbool and
actually checking the configuration. The "use_color"
variables now have an additional possible value,
GIT_COLOR_AUTO. Users of the variable should use the new
"want_color()" wrapper, which will lazily determine and
cache the auto-color decision.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-19 15:51:34 -07:00
Jeff King
e269eb7946 git_config_colorbool: refactor stdout_is_tty handling
Usually this function figures out for itself whether stdout
is a tty. However, it has an extra parameter just to allow
git-config to override the auto-detection for its
--get-colorbool option.

Instead of an extra parameter, let's just use a global
variable. This makes calling easier in the common case, and
will make refactoring the colorbool code much simpler.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-18 14:48:29 -07:00
Jeff King
f1c9626105 diff: refactor COLOR_DIFF from a flag into an int
This lets us store more than just a bit flag for whether we
want color; we can also store whether we want automatic
colors. This can be useful for making the automatic-color
decision closer to the point of use.

This mostly just involves replacing DIFF_OPT_* calls with
manipulations of the flag. The biggest exception is that
calls to DIFF_OPT_TST must check for "o->use_color > 0",
which lets an "unknown" value (i.e., the default) stay at
"no color". In the previous code, a value of "-1" was not
propagated at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-18 14:35:53 -07:00
Junio C Hamano
ca01600306 Merge branch 'rc/histogram-diff'
* rc/histogram-diff:
  xdiff/xhistogram: drop need for additional variable
  xdiff/xhistogram: rely on xdl_trim_ends()
  xdiff/xhistogram: rework handling of recursed results
  xdiff: do away with xdl_mmfile_next()
  Make test number unique
  xdiff/xprepare: use a smaller sample size for histogram diff
  xdiff/xprepare: skip classification
  teach --histogram to diff
  t4033-diff-patience: factor out tests
  xdiff/xpatience: factor out fall-back-diff function
  xdiff/xprepare: refactor abort cleanups
  xdiff/xprepare: use memset()
2011-08-17 17:36:06 -07:00
Junio C Hamano
a35d78c0f4 Merge branch 'jc/zlib-wrap' into maint
* jc/zlib-wrap:
  zlib: allow feeding more than 4GB in one go
  zlib: zlib can only process 4GB at a time
  zlib: wrap deflateBound() too
  zlib: wrap deflate side of the API
  zlib: wrap inflateInit2 used to accept only for gzip format
  zlib: wrap remaining calls to direct inflate/inflateEnd
  zlib wrapper: refactor error message formatter
2011-08-16 11:23:26 -07:00
Junio C Hamano
e10e476fb1 Merge branch 'jk/combine-diff-binary-etc' into maint
* jk/combine-diff-binary-etc:
  combine-diff: respect textconv attributes
  refactor get_textconv to not require diff_filespec
  combine-diff: handle binary files as binary
  combine-diff: calculate mode_differs earlier
  combine-diff: split header printing into its own function
2011-08-16 11:23:24 -07:00
Junio C Hamano
eb4f4076aa Merge branch 'jc/zlib-wrap'
* jc/zlib-wrap:
  zlib: allow feeding more than 4GB in one go
  zlib: zlib can only process 4GB at a time
  zlib: wrap deflateBound() too
  zlib: wrap deflate side of the API
  zlib: wrap inflateInit2 used to accept only for gzip format
  zlib: wrap remaining calls to direct inflate/inflateEnd
  zlib wrapper: refactor error message formatter

Conflicts:
	sha1_file.c
2011-07-19 09:33:04 -07:00
Tay Ray Chuan
8c912eea94 teach --histogram to diff
Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show
that it is faster than its --patience cousin, as well as the default
Meyers algorithm.

The implementation has been reworked to use structs and pointers,
instead of bitmasks, thus doing away with JGit's 2^28 line limit.

We also use xdiff's default hash table implementation (xdl_hash_bits()
with XDL_HASHLONG()) for convenience.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-12 09:29:20 -07:00
Junio C Hamano
a852aac48d Merge branch 'mg/diff-stat-count'
* mg/diff-stat-count:
  diff --stat-count: finishing touches
  diff-options.txt: describe --stat-{width,name-width,count}
  diff: introduce --stat-lines to limit the stat lines
  diff.c: omit hidden entries from namelen calculation with --stat
2011-06-29 17:03:10 -07:00
Junio C Hamano
dbae1a1336 Merge branch 'jk/combine-diff-binary-etc'
* jk/combine-diff-binary-etc:
  combine-diff: respect textconv attributes
  refactor get_textconv to not require diff_filespec
  combine-diff: handle binary files as binary
  combine-diff: calculate mode_differs earlier
  combine-diff: split header printing into its own function
2011-06-29 17:03:10 -07:00
Junio C Hamano
ef49a7a012 zlib: zlib can only process 4GB at a time
The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.

But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.

In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size. The worst part of this story is that we often pass around
z_stream (the state object used by zlib) to keep track of the number of
used bytes in input/output buffer by inspecting these two fields, which
practically limits our callchain to the same 4GB limit.

Wrap z_stream in another structure git_zstream that can express avail_in
and avail_out in unsigned long. For now, just die() when the caller gives
a size that cannot be given to a single zlib call. In later patches in the
series, we would make git_inflate() and git_deflate() internally loop to
give callers an illusion that our "improved" version of zlib interface can
operate on a buffer larger than 4GB in one go.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:52:15 -07:00
Junio C Hamano
225a6f1068 zlib: wrap deflateBound() too
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:18:17 -07:00
Junio C Hamano
55bb5c9147 zlib: wrap deflate side of the API
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
of deflateInit2 in remote-curl.c to tell the library to use gzip header
and trailer in git_deflate_init_gzip().

There is only one caller that cares about the status from deflateEnd().
Introduce git_deflate_end_gently() to let that sole caller retrieve the
status and act on it (i.e. die) for now, but we would probably want to
make inflate_end/deflate_end die when they ran out of memory and get
rid of the _gently() kind.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:10:29 -07:00
Junio C Hamano
456a4c08b8 Merge branch 'jk/diff-not-so-quick'
* jk/diff-not-so-quick:
  diff: futureproof "stop feeding the backend early" logic
  diff_tree: disable QUICK optimization with diff filter

Conflicts:
	diff.c
2011-06-06 11:40:14 -07:00
Junio C Hamano
b3c89315a3 Merge branch 'jc/rename-degrade-cc-to-c' into maint
* jc/rename-degrade-cc-to-c:
  diffcore-rename: fall back to -C when -C -C busts the rename limit
  diffcore-rename: record filepair for rename src
  diffcore-rename: refactor "too many candidates" logic
  builtin/diff.c: remove duplicated call to diff_result_code()
2011-05-31 12:00:02 -07:00
Junio C Hamano
28b9264dd6 diff: futureproof "stop feeding the backend early" logic
Refactor the "do not stop feeding the backend early" logic into a small
helper function and use it in both run_diff_files() and diff_tree() that
has the stop-early optimization. We may later add other types of diffcore
transformation that require to look at the whole result like diff-filter
does, and having the logic in a single place is essential for longer term
maintainability.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-31 09:21:36 -07:00
Junio C Hamano
e5f85df87e diff --stat-count: finishing touches
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-27 21:50:39 -07:00
Michael J Gruber
808e1db231 diff: introduce --stat-lines to limit the stat lines
Often one is interested in the full --stat output only for commits which
change a few files, but not others, because larger restructuring gives a
--stat which fills a few screens.

Introduce a new option --stat-count=<count> which limits the --stat output
to the first <count> lines, followed by a "..." line. It can
also be given as the third parameter in
--stat=<width>,<name-width>,<count>.

Also, the unstuck form is supported analogous to the other two stat
parameters.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-27 10:44:34 -07:00