Luben Tuikov changed 'lineno' link from leading to commit which gave
current version of given block of lines, to leading to parent of this
commit in 244a70e (Blame "linenr" link jumps to previous state at
"orig_lineno"). This made possible data mining using 'blame' view.
The current implementation calls rev-parse once per each blamed line
to find parent revision of blamed commit, even when the same commit
appears more than once, which is inefficient.
This patch mitigates this issue by caching $parent_commit info in
%metainfo, which makes gitweb call rev-parse only once per each
unique commit in the output from "git blame".
In the tables below you can see simple benchmark comparing gitweb
performance before and after this patch
File | L[1] | C[2] || Time0[3] | Before[4] | After[4]
====================================================================
blob.h | 18 | 4 || 0m1.727s | 0m2.545s | 0m2.474s
GIT-VERSION-GEN | 42 | 13 || 0m2.165s | 0m2.448s | 0m2.071s
README | 46 | 6 || 0m1.593s | 0m2.727s | 0m2.242s
revision.c | 1923 | 121 || 0m2.357s | 0m30.365s | 0m7.028s
gitweb/gitweb.perl | 6291 | 428 || 0m8.080s | 1m37.244s | 0m20.627s
File | L/C | Before/After
=========================================
blob.h | 4.5 | 1.03
GIT-VERSION-GEN | 3.2 | 1.18
README | 7.7 | 1.22
revision.c | 15.9 | 4.32
gitweb/gitweb.perl | 14.7 | 4.71
As you can see the greater ratio of lines in file to unique commits
in blame output, the greater gain from the new implementation.
Legend:
[1] Number of lines:
$ wc -l <file>
[2] Number of unique commits in the blame output:
$ git blame -p <file> | grep author-time | wc -l
[3] Time for running "git blame -p" (user time, single run):
$ time git blame -p <file> >/dev/null
[4] Time to run gitweb as Perl script from command line:
$ gitweb-run.sh "p=.git;a=blame;f=<file>" > /dev/null 2>&1
The gitweb-run.sh script includes slightly modified (with adjusted
pathnames) code from gitweb_run() function from the test script
t/t9500-gitweb-standalone-no-errors.sh; gitweb config file
gitweb_config.perl contents (again up to adjusting pathnames; in
particular $projectroot variable should point to top directory of git
repository) can be found in the same place.
Discussion
~~~~~~~~~~
A possible future improvement would be to open a bidi pipe to
"git cat-file --batch-check", (like in Git::Repo in gitweb caching by
Lea Wiemann), feed $long_rev^ to it, and parse its output, which is
in the following form:
926b07e694599d86cec668475071b32147c95034 commit 637
This would mean one call to git-cat-file for the whole 'blame' view,
instead of one call to git-rev-parse per each unique commit in blame
output.
Yet another solution would be to change use of validate_refname() to
validate_revision() when checking script parameters (CGI query or
path_info), with validate_revision being something like the following:
sub validate_revision {
my $rev = shift;
return validate_refname(strip_rev_suffixes($rev));
}
so we don't need to calculate $long_rev^, but can pass "$long_rev^" as
'hb' parameter.
This solution has the advantage that it can be easily adapted to future
incremental blame output.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Acked-by: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Among others, here are the highlights:
* move variable declaration closer to the place it is set and used,
if possible,
* uniquify and simplify coding style a bit, which includes removing
unnecessary '()'.
* check type only if $hash was defined, as otherwise from the way
git_get_hash_by_path() is called (and works), we know that it is
a blob,
* use modern calling convention for git-blame,
* remove unused variable,
* don't use implicit variables ($_),
* add some comments
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Acked-by: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move l<line number> ID from <a> link element inside table row (inside
cell element for column with line numbers), to encompassing <tr> table
row element. It was done to make it easier to manipulate result HTML
with DOM, and to be able write 'blame_incremental' view with the same,
or nearly the same result.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Acked-by: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A handful of fixes have been backmerged to 'maint' and are now contained
in 1.6.0.X series as the result, so drop them from this document.
Also contains typofix and duplicate removal pointed out by
Bjørn Lindeijer and Jakub Narebski.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
GIT 1.6.0.5
"git diff <tree>{3,}": do not reverse order of arguments
tag: delete TAG_EDITMSG only on successful tag
gitweb: Make project specific override for 'grep' feature work
http.c: use 'git_config_string' to get 'curl_http_proxy'
fetch-pack: Avoid memcpy() with src==dst
According to the message of commit 0fe7c1de16,
"git diff" with three or more trees expects the merged tree first followed by
the parents, in order. However, this command reversed the order of its
arguments, resulting in confusing diffs. A comment /* Again, the revs are all
reverse */ suggested there was a reason for this, but I can't figure out the
reason, so I removed the reversal of the arguments. Test case included.
Signed-off-by: Matt McCutchen <matt@mattmccutchen.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The user may put some effort into writing an annotated tag
message. When the tagging process later fails (which can
happen fairly easily, since it may be dependent on gpg being
correctly configured and used), there is no record left on
disk of the tag message.
Instead, let's keep the TAG_EDITMSG file around until we are
sure the tag has been created successfully. If we die
because of an error, the user can recover their text from
that file. Leaving the file in place causes no conflicts;
it will be silently overwritten by the next annotated tag
creation.
This matches the behavior of COMMIT_EDITMSG, which stays
around in case of error.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'grep' feature was marked in the comments as having project
specific config, but it lacked 'sub' key required for it to work.
Kind-of-Noticed-by: Matt Kraai <kraai@ftbfs.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
memcpy() may only be used for disjoint memory areas, but when invoked
from cmd_fetch_pack(), we have my_args == &args. (The argument cannot
be removed entirely because transport.c invokes with its own
variable.)
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/am-options:
git-am: rename apply_opt_extra file to apply-opt
Test that git-am does not lose -C/-p/--whitespace options
git-am: propagate --3way options as well
git-am: propagate -C<n>, -p<n> options as well
git-am --whitespace: do not lose the command line option
All other state files use dash in their names, not underscores.
Also, there is no reason to call this "extra". Drop it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These tests make sure that "git am" does not lose command line options
specified when it was started, after it is interrupted by a patch that
does not apply earlier in the series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reasoning is the same as the previous patch, where we made -C<n>
and -p<n> propagate across a failure.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These options are meant to deal with patches that do not apply cleanly
due to the differences between the version the patch was based on and
the version "git am" is working on.
Because a series of patches applied in the same "git am" run tends to
come from the same source, it is more useful to propagate these options
after the application stops.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you start "git am --whitespace=fix" and the patch application process
is interrupted by an unapplicable patch early in the series, after
fixing the offending patch, the remainder of the patch should be processed
still with --whitespace=fix when restarted with "git am --resolved" (or
dropping the offending patch with "git am --skip").
The breakage was introduced by the commit 67dad68 (add -C[NUM] to git-am,
2007-02-08); this should fix it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running:
p4 where //depot/SomePath/...
The result can in some situations look like:
//depot/SomePath/... //client/SomePath/... /home/user/p4root/SomePath/...
-//depot/SomePath/UndesiredSubdir/... //client/SomePath/UndesiredSubdir/... /home/user/p4root/SomePath/UndesiredSubdir/...
This depends on the users Client view. The current p4Where method will now
return /home/user/p4root/SomePath/UndesiredSubdir/... which is not what we
want. This patch loops through the results from "p4 where", and picks the one
where the depotFile exactly matches the given depotPath (//depot/SomePath/...
in this example).
Signed-off-by: Tor Arvid Lund <torarvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
So that full filesystem conditions or permissions problems won't go
unnoticed.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a generated builtin since 24b1f65f (Install git-stage in
exec-path).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'branch' subcommand incorrectly had the svn-remote to use hardcoded
as 'svn', the default remote name. This meant that branches derived
from other svn-remotes would try to use the branch and tag configuration
for the 'svn' remote, potentially copying would-be branches to the wrong
place in SVN, into the branch namespace for another project.
Fix this by using the remote name extracted from the svn info for the
specified git ref. Add a testcase for this behaviour.
[jc: squashed in a fix to test from Michael J Gruber for older svn (1.4)]
Signed-off-by: Deskin Miller <deskinm@umich.edu>
Acked-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/rm-i-t-a:
git add --intent-to-add: do not let an empty blob be committed by accident
git add --intent-to-add: fix removal of cached emptiness
builtin-rm.c: explain and clarify the "local change" logic
Extend index to save more flags
Earlier the plan was to eventually eradicate git-foo executables from the
filesystem for all the built-in commands, but when we released 1.6.0 we
decided not to do so. Instead, it has been promised that by prepending
the output from $(git --exec-path) to your $PATH, you can keep using the
dashed form of commands.
This also allows "git stage" to appear in the autogenerated command list,
which is used to offer man pages by "git help" command.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a corner case of large files whose lines do not match uniquely, the
loop to eliminate a line that matches multiple locations adjacent to a run
of lines that do not uniquely match wasted too much cycles. Fix this by
giving up early after scanning 100 lines in both direction.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bc/maint-keep-pack:
repack: only unpack-unreachable if we are deleting redundant packs
t7700: test that 'repack -a' packs alternate packed objects
pack-objects: extend --local to mean ignore non-local loose objects too
sha1_file.c: split has_loose_object() into local and non-local counterparts
t7700: demonstrate mishandling of loose objects in an alternate ODB
builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
repack: do not fall back to incremental repacking with [-a|-A]
repack: don't repack local objects in packs with .keep file
pack-objects: new option --honor-pack-keep
packed_git: convert pack_local flag into a bitfield and add pack_keep
t7700: demonstrate mishandling of objects in packs with a .keep file
Use new insert_file() subroutine to insert HTML chunks from external
files: $site_header, $home_text (by default indextext.html),
$site_footer, and $projectroot/$project/REAME.html.
All non-ASCII chars of those files will be broken by Perl IO layer
without decoding to utf8, so insert_file() does to_utf8() on each
printed line; alternate solution would be to open those files with
"binmode $fh, ':utf8'", or even all files with "use open qw(:std :utf8)".
Note that inserting README.html lost one of checks for simplicity.
Noticed-by: Tatsuki Sugiura <sugi@nemui.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't redirect stderr to /dev/null, use -q to suppress the output on
stderr.
Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are printfs around that do not grok '\1', but need '\01'.
Discovered on AIX 4.3.x.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This comes from conversation at the GitTogether where we thought it would
be helpful to be able to teach people to 'stage' files because it tends
to cause confusion when told that they have to keep 'add'ing them.
This continues the movement to start referring to the index as a
staging area (eg: the --staged alias to 'git diff'). Also adds a
doc file for 'git stage' that basically points to the docs for
'git add'.
Signed-off-by: Scott Chacon <schacon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We remove crud characters at the beginning and end of real-names so that
when we see email addresses like
From: "David S. Miller" <davem@davemloft.net>
we drop the quotes around the name when we parse that and split it up into
name and email.
However, the list of crud characters was basically just a random list of
common things that are found around names, and it didn't contain the
backslash character that some insane scripts seem to use when quoting
things. So now the kernel has a number of authors listed like
Author: \"Rafael J. Wysocki\ <rjw@sisk.pl>
because the author name had started out as
From: \"Rafael J. Wysocki\" <rjw@sisk.pl>
and the only "crud" character we noticed and removed was the final
double-quote at the end.
We should probably do better quote removal from names anyway, but this is
the minimal obvious patch.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This introduces make variable NO_PTHREADS for platforms that lack the
support for pthreads library or people who do not want to use it for
whatever reason. When defined, it makes the multi-threaded index
preloading into a no-op, and also disables threaded delta searching by
pack-objects.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Mike Ralphson <mike@abacus.co.uk>
Tested-by: Johannes Sixt <j6t@kdbg.org> (AIX 4.3.x)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patch that allows "git bisect skip" to be passed a range of
commits using the "<commit1>..<commit2>" notation is flawed because
it introduces a regression when it was passed a simple rev or commit.
"git bisect skip <commit>" doesn't work any more, because <commit>
is quoted but not properly unquoted.
This patch fixes that and add tests cases to better check when it is
passed commits and range of commits.
While at it, this patch also properly quotes the non range arguments
using the "sq" function.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
This is taken from a patch from Giuseppe but unfortunately it came
too late to replace the series that was already on "next". The comment
he updated here is better than the version we had previously, so I am
cherry-picking this bit not to lose it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>