"git blame" has been optimized greatly by reorganising the data
structure that is used to keep track of the work to be done, thanks
to David Karstrup <dak@gnu.org>.
* dk/blame-reorg:
blame: large-scale performance rewrite
The previous implementation used a single sorted linear list of blame
entries for organizing all partial or completed work. Every subtask had
to scan the whole list, with most entries not being relevant to the
task. The resulting run-time was quadratic to the number of separate
chunks.
This change gives every subtask its own data to work with. Subtasks are
organized into "struct origin" chains hanging off particular commits.
Commits are organized into a priority queue, processing them in commit
date order in order to keep most of the work affecting a particular blob
collated even in the presence of an extensive merge history.
For large files with a diversified history, a speedup by a factor of 3
or more is not unusual.
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When show date in relative date format for git-blame, the max display
width of datetime is set as the length of the string "Thu Oct 19
16:00:04 2006 -0700" (30 characters long). But actually the max width
for C locale is only 22 (the length of string "x years, xx months ago").
And for other locale, it maybe smaller. E.g. For Chinese locale, only
needs a half (16-character width).
Set blame_date_width as the display width of _("4 years, 11 months
ago"), so that translators can make the choice.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Command `git blame --date relative` aligns the date field with a
fixed-width (defined by blame_date_width), and if time_str is shorter
than that, it adds spaces for padding. But there are two bugs in the
following codes:
time_len = strlen(time_str);
...
memset(time_buf + time_len, ' ', blame_date_width - time_len);
1. The type of blame_date_width is size_t, which is unsigned. If
time_len is greater than blame_date_width, the result of
"blame_date_width - time_len" will never be a negative number, but a
really big positive number, and will cause memory overwrite.
This bug can be triggered if either l10n message for function
show_date_relative() in date.c is longer than 30 characters, then
`git blame --date relative` may exit abnormally.
2. When show blame information with relative time, the UTF-8 characters
in time_str will break the alignment of columns after the date field.
This is because the time_buf padding with spaces should have a
constant display width, not a fixed strlen size. So we should call
utf8_strwidth() instead of strlen() for width calibration.
Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Attempts to show where a single-strand-of-pearls break in "git log"
output.
* nd/log-show-linear-break:
log: add --show-linear-break to help see non-linear history
object.h: centralize object flag allocation
While the field "flags" is mainly used by the revision walker, it is
also used in many other places. Centralize the whole flag allocation to
one place for a better overview (and easier to move flags if we have
too).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* dk/blame-janitorial:
builtin/blame.c::find_copy_in_blob: no need to scan for region end
blame.c: prepare_lines should not call xrealloc for every line
builtin/blame.c::prepare_lines: fix allocation size of sb->lineno
builtin/blame.c: eliminate same_suspect()
builtin/blame.c: struct blame_entry does not need a prev link
Shrink lifetime of variables by moving their definitions to an
inner scope where appropriate.
* ep/varscope:
builtin/gc.c: reduce scope of variables
builtin/fetch.c: reduce scope of variable
builtin/commit.c: reduce scope of variables
builtin/clean.c: reduce scope of variable
builtin/blame.c: reduce scope of variables
builtin/apply.c: reduce scope of variables
bisect.c: reduce scope of variable
Making a single preparation run for counting the lines will avoid memory
fragmentation. Also, fix the allocated memory size which was wrong
when sizeof(int *) != sizeof(int), and would have been too small
for sizeof(int *) < sizeof(int), admittedly unlikely.
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we are calling xrealloc on every single line, the least we can do
is get the right allocation size.
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the origin pointers are "interned" and reference-counted, comparing
the pointers rather than the content is enough. The only uninterned
origins are cached values kept in commit->util, but same_suspect is not
called on them.
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no reason to have a hardcoded upper limit of the number of
parents for an octopus merge, created via the graft mechanism.
* js/lift-parent-count-limit:
Remove the line length limit for graft files
Support for grafts predates Git's strbuf, and hence it is understandable
that there was a hard-coded line length limit of 1023 characters (which
was chosen a bit awkwardly, given that it is *exactly* one byte short of
aligning with the 41 bytes occupied by a commit name and the following
space or new-line character).
While regular commit histories hardly win comprehensibility in general
if they merge more than twenty-two branches in one go, it is not Git's
business to limit grafts in such a way.
In this particular developer's case, the use case that requires
substantially longer graft lines to be supported is the visualization of
the commits' order implied by their changes: commits are considered to
have an implicit relationship iff exchanging them in an interactive
rebase would result in merge conflicts.
Thusly implied branches tend to be very shallow in general, and the
resulting thicket of implied branches is usually very wide; It is
actually quite common that *most* of the commits in a topic branch have
not even one implied parent, so that a final merge commit has about as
many implied parents as there are commits in said branch.
[jc: squashed in tests by Jonathan]
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/robustify-parse-commit:
checkout: do not die when leaving broken detached HEAD
use parse_commit_or_die instead of custom message
use parse_commit_or_die instead of segfaulting
assume parse_commit checks for NULL commit
assume parse_commit checks commit->object.parsed
log_tree_diff: die when we fail to parse a commit
Normally parse_pathspec() is used on command line arguments where it
can do fancy thing like parsing magic on each argument or adding magic
for all pathspecs based on --*-pathspecs options.
There's another use of parse_pathspec(), where pathspec is needed, but
the input is known to be pure paths. In this case we usually don't
want --*-pathspecs to interfere. And we definitely do not want to
parse magic in these paths, regardless of --literal-pathspecs.
Add new flag PATHSPEC_LITERAL_PATH for this purpose. When it's set,
--*-pathspecs are ignored, no magic is parsed. And if the caller
allows PATHSPEC_LITERAL (i.e. the next calls can take literal magic),
then PATHSPEC_LITERAL will be set.
This fixes cases where git chokes when GIT_*_PATHSPECS are set because
parse_pathspec() indicates it won't take any magic. But
GIT_*_PATHSPECS add them anyway. These are
export GIT_LITERAL_PATHSPECS=1
git blame -- something
git log --follow something
git log --merge
"git ls-files --with-tree=path" (aka parse_pathspec() in
overlay_tree_on_cache()) is safe because the input is empty, and
producing one pathspec due to PATHSPEC_PREFER_CWD does not take any
magic into account.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The parse_commit function will check the "parsed" flag of
the object and do nothing if it is set. There is no need
for callers to check the flag themselves, and doing so only
clutters the code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.
* jl/submodule-mv: (53 commits)
rm: delete .gitmodules entry of submodules removed from the work tree
mv: update the path entry in .gitmodules for moved submodules
submodule.c: add .gitmodules staging helper functions
mv: move submodules using a gitfile
mv: move submodules together with their work trees
rm: do not set a variable twice without intermediate reading.
t6131 - skip tests if on case-insensitive file system
parse_pathspec: accept :(icase)path syntax
pathspec: support :(glob) syntax
pathspec: make --literal-pathspecs disable pathspec magic
pathspec: support :(literal) syntax for noglob pathspec
kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
parse_pathspec: make sure the prefix part is wildcard-free
rename field "raw" to "_raw" in struct pathspec
tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
remove match_pathspec() in favor of match_pathspec_depth()
remove init_pathspec() in favor of parse_pathspec()
remove diff_tree_{setup,release}_paths
convert common_prefix() to use struct pathspec
...
Teaches "git blame" to take more than one -L ranges.
* es/blame-L-twice:
line-range: reject -L line numbers less than 1
t8001/t8002: blame: add tests of -L line numbers less than 1
line-range: teach -L^:RE to search from start of file
line-range: teach -L:RE to search from end of previous -L range
line-range: teach -L^/RE/ to search from start of file
line-range-format.txt: document -L/RE/ relative search
log: teach -L/RE/ to search from end of previous -L range
blame: teach -L/RE/ to search from end of previous -L range
line-range: teach -L/RE/ to search relative to anchor point
blame: document multiple -L support
t8001/t8002: blame: add tests of multiple -L options
blame: accept multiple -L ranges
blame: inline one-line function into its lone caller
range-set: publish API for re-use by git-blame -L
line-range-format.txt: clarify -L:regex usage form
git-log.txt: place each -L option variation on its own line
Range specification -L/RE/ for blame/log unconditionally begins
searching at line one. Mailing list discussion [1] suggests that, in the
presence of multiple -L options, -L/RE/ should search relative to the
endpoint of the previous -L range, if any.
Teach the parsing machinery underlying blame's and log's -L options to
accept a start point for -L/RE/ searches. Follow-up patches will upgrade
blame and log to take advantage of this ability.
[1]: http://thread.gmane.org/gmane.comp.version-control.git/229755/focus=229966
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-blame accepts only a single -L option or none. Clients requiring
blame information for multiple disjoint ranges are therefore forced
either to invoke git-blame multiple times, once for each range, or only
once with no -L option to cover the entire file, both of which can be
costly. Teach git-blame to accept multiple -L ranges. Overlapping and
out-of-order ranges are accepted.
In this patch, the X in -LX,Y is absolute (for instance, /RE/ patterns
search from line 1), and Y is relative to X. Follow-up patches provide
more flexibility over how X is anchored.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of 25ed3412 (Refactor parse_loc; 2013-03-28),
blame.c:prepare_blame_range() became effectively a one-line function
which merely passes its arguments along to another function. This
indirection does not bring clarity to the code. Simplify by inlining
prepare_blame_range() into its lone caller.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since inception, -LX,Y has correctly reported an out-of-range error when
Y is beyond end of file, however, X was not checked, and an out-of-range
X would cause a crash. 92f9e273 (blame: prevent a segv when -L given
start > EOF; 2010-02-08) attempted to rectify this shortcoming but has
its own off-by-one error which allows X to extend one line past end of
file. For example, given a file with 5 lines:
git blame -L5 foo # OK, blames line 5
git blame -L6 foo # accepted, no error, no output, huh?
git blame -L7 foo # error "fatal: file foo has only 5 lines"
Fix this bug.
In order to avoid regressing "blame foo" when foo is an empty file, the
fix is slightly more complicated than changing '<' to '<='.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This task emerged from b04ba2bb (parse-options: deprecate OPT_BOOLEAN,
2011-09-27). All occurrences of the respective variables have
been reviewed and none of them relied on the counting up mechanism,
but all of them were using the variable as a true boolean.
This patch does not change semantics of any command intentionally.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at there, move free_pathspec() to pathspec.c
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tr/line-log:
git-log(1): remove --full-line-diff description
line-log: fix documentation formatting
log -L: improve comments in process_all_files()
log -L: store the path instead of a diff_filespec
log -L: test merge of parallel modify/rename
t4211: pass -M to 'git log -M -L...' test
log -L: fix overlapping input ranges
log -L: check range set invariants when we look it up
Speed up log -L... -M
log -L: :pattern:file syntax to find by funcname
Implement line-history search (git log -L)
Export rewrite_parents() for 'log -L'
Refactor parse_loc
pretty-printing body of the commit that is stored in non UTF-8
encoding did not work well. The early part of this series fixes
it. And then it adds %C(auto) specifier that turns the coloring on
when we are emitting to the terminal, and adds column-aligning
format directives.
* nd/pretty-formats:
pretty: support %>> that steal trailing spaces
pretty: support truncating in %>, %< and %><
pretty: support padding placeholders, %< %> and %><
pretty: add %C(auto) for auto-coloring
pretty: split color parsing into a separate function
pretty: two phase conversion for non utf-8 commits
utf8.c: add reencode_string_len() that can handle NULs in string
utf8.c: add utf8_strnwidth() with the ability to skip ansi sequences
utf8.c: move display_mode_esc_sequence_len() for use by other functions
pretty: share code between format_decoration and show_decorations
pretty-formats.txt: wrap long lines
pretty: get the correct encoding for --pretty:format=%e
pretty: save commit encoding from logmsg_reencode if the caller needs it
The commit encoding is parsed by logmsg_reencode, there's no need for
the caller to re-parse it again. The reencoded message now has the new
encoding, not the original one. The caller would need to read commit
object again before parsing.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
split_ident_line() can leave us with the pointers date_begin, date_end,
tz_begin and tz_end all set to NULL. Check them before use and supply
the same fallback values as in the case of a negative return code from
split_ident_line().
The "(unknown)" is not actually shown in the output, though, because it
will be converted to a number (zero) eventually.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new syntax finds a funcname matching /pattern/, and then takes from there
up to (but not including) the next funcname. So you can say
git log -L:main:main.c
and it will dig up the main() function and show its line-log, provided
there are no other funcnames matching 'main'.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to use the same style of -L n,m argument for 'git log -L' as
for git-blame. Refactor the argument parsing of the range arguments
from builtin/blame.c to the (new) file that will hold the 'git log -L'
logic.
To accommodate different data structures in blame and log -L, the file
contents are abstracted away; parse_range_arg takes a callback that it
uses to get the contents of a line of the (notional) file.
The new test is for a case that made me pause during debugging: the
'blame -L with invalid end' test was the only one that noticed an
outright failure to parse the end *at all*. So make a more explicit
test for that.
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usually a commit that makes it to logmsg_reencode will have
been parsed, and the commit->buffer struct member will be
valid. However, some code paths will free commit buffers
after having used them (for example, the log traversal
machinery will do so to keep memory usage down).
Most of the time this is fine; log should only show a commit
once, and then exits. However, there are some code paths
where this does not work. At least two are known:
1. A commit may be shown as part of a regular ref, and
then it may be shown again as part of a submodule diff
(e.g., if a repo contains refs to both the superproject
and subproject).
2. A notes-cache commit may be shown during "log --all",
and then later used to access a textconv cache during a
diff.
Lazily loading in logmsg_reencode does not necessarily catch
all such cases, but it should catch most of them. Users of
the commit buffer tend to be either parsing for structure
(in which they will call parse_commit, and either we will
already have parsed, or we will load commit->buffer lazily
there), or outputting (either to the user, or fetching a
part of the commit message via format_commit_message). In
the latter case, we should always be using logmsg_reencode
anyway (and typically we do so via the pretty-print
machinery).
If there are any cases that this misses, we can fix them up
to use logmsg_reencode (or handle them on a case-by-case
basis if that is inappropriate).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logmsg_reencode function will return the reencoded
commit buffer, or NULL if reencoding failed or no reencoding
was necessary. Since every caller then ends up checking for NULL
and just using the commit's original buffer, anyway, we can
be a bit more helpful and just return that buffer when we
would have returned NULL.
Since the resulting string may or may not need to be freed,
we introduce a logmsg_free, which checks whether the buffer
came from the commit object or not (callers either
implemented the same check already, or kept two separate
pointers, one to mark the buffer to be used, and one for the
to-be-freed string).
Pushing this logic into logmsg_* simplifies the callers, and
will let future patches lazily load the commit buffer in a
single place.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach commands in the "log" family to optionally pay attention to
the mailmap.
* ap/log-mailmap:
log --use-mailmap: optimize for cases without --author/--committer search
log: add log.mailmap configuration option
log: grep author/committer using mailmap
test: add test for --use-mailmap option
log: add --use-mailmap option
pretty: use mailmap to display username and email
mailmap: add mailmap structure to rev_info and pp
mailmap: simplify map_user() interface
mailmap: remove email copy and length limitation
Use split_ident_line to parse author and committer
string-list: allow case-insensitive string list
Simplify map_user(), mostly to avoid copies of string buffers. It
also simplifies caller functions.
map_user() directly receive pointers and length from the commit buffer
as mail and name. If mapping of the user and mail can be done, the
pointer is updated to a new location. Lengths are also updated if
necessary.
The caller of map_user() can then copy the new email and name if
necessary.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently blame.c::get_acline(), pretty.c::pp_user_info() and
shortlog.c::insert_one_record() are parsing author name, email, time
and tz themselves.
Use ident.c::split_ident_line() for better code reuse.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function has only two callsites, and is a thin wrapper whose
usefulness is dubious. When the caller needs to learn the log
output encoding, it should be able to do so by directly calling
get_log_output_encoding() and calling the underlying
logmsg_reencode() with it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you know your history did not have renames, or if you care only
about the history after a large rename that happened some time ago,
"git blame --no-follow $path" is a way to tell the command not to
bother about renames.
When you use -C, the lines that came from the renamed file will
still be found without the whole-file rename detection, so it is not
all that interesting either way, though.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame MAKEFILE" run in a history that has "Makefile" but not
"MAKEFILE" should say "No such file MAKEFILE in HEAD", but got
confused on a case insensitive filesystem and failed to do so.
Even during a conflicted merge, "git blame $path" always meant to
blame uncommitted changes to the "working tree" version; make it
more useful by showing cleanly merged parts as coming from the other
branch that is being merged.
* jc/maint-blame-no-such-path:
blame: allow "blame file" in the middle of a conflicted merge
blame $path: avoid getting fooled by case insensitive filesystems
"git blame file" has always meant "find the origin of each line of
the file in the history leading to HEAD, oh by the way, blame the
lines that are modified locally to the working tree".
This teaches "git blame" that during a conflicted merge, some
uncommitted changes may have come from the other history that is
being merged.
The verify_working_tree_path() function introduced in the previous
patch to notice a typo in the filename (primarily on case insensitive
filesystems) has been updated to allow a filename that does not exist
in HEAD (i.e. the tip of our history) as long as it exists one of the
commits being merged, so that a "we deleted, the other side modified"
case tracks the history of the file in the history of the other side.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame MAKEFILE" run in a history that has "Makefile" but not
MAKEFILE can get confused on a case insensitive filesystem, because
the check we run to see if there is a corresponding file in the
working tree with lstat("MAKEFILE") succeeds. In addition to that
check, we have to make sure that the given path also exists in the
commit we start digging history from (i.e. "HEAD").
Note that this reveals the breakage in a test added in cd8ae20
(git-blame shouldn't crash if run in an unmerged tree, 2007-10-18),
which expects the entire merge-in-progress path to be blamed to the
working tree when it did not exist in our tree. As it is clear in
the log message of that commit, the old breakage was that it was
causing an internal error and the fix was about avoiding it.
Just check that the command does not die an uncontrolled death. For
this particular case, the blame should fail, as the history for the
file in that contents has not been committed yet at the point in the
test.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" had a confusion between taking data from a path in the
working tree and taking data from an object that happens to have
name 0{40} recorded in a tree.
* jk/maint-null-in-trees:
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
A lot of i18n mark-up for the help text from "git <cmd> -h".
* nd/i18n-parseopt-help: (66 commits)
Use imperative form in help usage to describe an action
Reduce translations by using same terminologies
i18n: write-tree: mark parseopt strings for translation
i18n: verify-tag: mark parseopt strings for translation
i18n: verify-pack: mark parseopt strings for translation
i18n: update-server-info: mark parseopt strings for translation
i18n: update-ref: mark parseopt strings for translation
i18n: update-index: mark parseopt strings for translation
i18n: tag: mark parseopt strings for translation
i18n: symbolic-ref: mark parseopt strings for translation
i18n: show-ref: mark parseopt strings for translation
i18n: show-branch: mark parseopt strings for translation
i18n: shortlog: mark parseopt strings for translation
i18n: rm: mark parseopt strings for translation
i18n: revert, cherry-pick: mark parseopt strings for translation
i18n: rev-parse: mark parseopt strings for translation
i18n: reset: mark parseopt strings for translation
i18n: rerere: mark parseopt strings for translation
i18n: status: mark parseopt strings for translation
i18n: replace: mark parseopt strings for translation
...