mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-02 15:28:21 +01:00

Author	SHA1	Message	Date
Junio C Hamano	17501ba1cd	Merge branch 'nd/fbsd-lazy-mtime' FreeBSD can lie when asked mtime of a directory, which made the untracked cache code to fall back to a slow-path, which in turn caused tests in t7063 to fail because it wanted to verify the behaviour of the fast-path. * nd/fbsd-lazy-mtime: t7063: work around FreeBSD's lazy mtime update feature	2016-08-08 14:48:42 -07:00
Junio C Hamano	3a3338d373	Merge branch 'nd/log-decorate-color-head-arrow' An entry "git log --decorate" for the tip of the current branch is shown as "HEAD -> name" (where "name" is the name of the branch); paint the arrow in the same color as "HEAD", not in the color for commits. * nd/log-decorate-color-head-arrow: log: decorate HEAD -> branch with the same color for arrow and HEAD	2016-08-08 14:48:42 -07:00
Junio C Hamano	940622bc8b	Merge branch 'rs/use-strbuf-addstr' * rs/use-strbuf-addstr: use strbuf_addstr() instead of strbuf_addf() with "%s" use strbuf_addstr() for adding constant strings to a strbuf	2016-08-08 14:48:41 -07:00
Junio C Hamano	68e80da479	Merge branch 'rs/st-mult' Micro optimization of st_mult() facility used to check the integer overflow coming from multiplication to compute size of memory allocation. * rs/st-mult: pass constants as first argument to st_mult()	2016-08-08 14:48:41 -07:00
Junio C Hamano	09ee6444f2	Merge branch 'ib/t3700-add-chmod-x-updates' The t3700 test about "add --chmod=-x" have been made a bit more robust and generally cleaned up. * ib/t3700-add-chmod-x-updates: t3700: add a test_mode_in_index helper function t3700: merge two tests into one t3700: remove unwanted leftover files before running new tests	2016-08-08 14:48:40 -07:00
Junio C Hamano	ae674b0130	Merge branch 'ab/gitweb-link-html-escape' The characters in the label shown for tags/refs for commits in "gitweb" output are now properly escaped for proper HTML output. * ab/gitweb-link-html-escape: gitweb: escape link body in format_ref_marker	2016-08-08 14:48:40 -07:00
Junio C Hamano	78849622ec	Merge branch 'jk/pack-objects-optim' "git pack-objects" has a few options that tell it not to pack objects found in certain packfiles, which require it to scan .idx files of all available packs. The codepaths involved in these operations have been optimized for a common case of not having any non-local pack and/or any .kept pack. * jk/pack-objects-optim: pack-objects: compute local/ignore_pack_keep early pack-objects: break out of want_object loop early find_pack_entry: replace last_found_pack with MRU cache add generic most-recently-used list sha1_file: drop free_pack_by_name t/perf: add tests for many-pack scenarios	2016-08-08 14:48:39 -07:00
Junio C Hamano	7647537e3e	Merge branch 'jk/difftool-in-subdir' "git difftool <paths>..." started in a subdirectory failed to interpret the paths relative to that directory, which has been fixed. * jk/difftool-in-subdir: difftool: use Git::* functions instead of passing around state difftool: avoid $GIT_DIR and $GIT_WORK_TREE difftool: fix argument handling in subdirs	2016-08-08 14:48:39 -07:00
Junio C Hamano	768ededa9c	Merge branch 'va/i18n' More i18n marking. * va/i18n: i18n: config: unfold error messages marked for translation i18n: notes: mark comment for translation	2016-08-08 14:48:38 -07:00
Junio C Hamano	6b9114c649	Merge branch 'js/rebase-i-progress-tidy' Regression fix for an i18n topic already in 'master'. * js/rebase-i-progress-tidy: rebase-interactive: trim leading whitespace from progress count	2016-08-08 14:48:38 -07:00
Junio C Hamano	0d3279962a	Merge branch 'jk/reflog-date' The reflog output format is documented better, and a new format --date=unix to report the seconds-since-epoch (without timezone) has been added. * jk/reflog-date: date: clarify --date=raw description date: add "unix" format date: document and test "raw-local" mode doc/pretty-formats: explain shortening of %gd doc/pretty-formats: describe index/time formats for %gd doc/rev-list-options: explain "-g" output formats doc/rev-list-options: clarify "commit@{Nth}" for "-g" option	2016-08-08 14:48:37 -07:00
Junio C Hamano	4c30ad8cc6	Merge branch 'cp/completion-clone-recurse-submodules' * cp/completion-clone-recurse-submodules: completion: add option '--recurse-submodules' to 'git clone'	2016-08-08 14:48:37 -07:00
Junio C Hamano	4d7f59aece	Merge branch 'jk/t4205-cleanup' Test modernization. * jk/t4205-cleanup: t4205: indent here documents t4205: drop top-level &&-chaining	2016-08-08 14:48:36 -07:00
Junio C Hamano	f2be3b73e0	Merge branch 'da/subtree-modernize' Style fixes for "git subtree" (in contrib/). * da/subtree-modernize: subtree: adjust function definitions to match CodingGuidelines subtree: adjust style to match CodingGuidelines	2016-08-08 14:48:35 -07:00
Junio C Hamano	abbf7bd495	Merge branch 'nd/fetch-ref-summary' Hotfix of a test in a topic that has already been merged to 'master'. * nd/fetch-ref-summary: t5510: skip tests under GETTEXT_POISON build	2016-08-08 14:48:35 -07:00
Junio C Hamano	612c3dfb06	Merge branch 'ew/git-svn-http-tests' Tests for "git svn" have been taught to reuse the lib-httpd test infrastructure when testing the subversion integration that interacts with subversion repositories served over the http:// protocol. * ew/git-svn-http-tests: git svn: migrate tests to use lib-httpd t/t91*: do not say how to avoid the tests	2016-08-08 14:48:34 -07:00
Junio C Hamano	3819fb9ab4	Merge branch 'js/t4130-rename-without-ino' Windows port was failing some tests in t4130, due to the lack of inum in the returned values by its lstat(2) emulation. * js/t4130-rename-without-ino: t4130: work around Windows limitation	2016-08-08 14:48:33 -07:00
René Scharfe	bc57b9c0cc	use strbuf_addstr() instead of strbuf_addf() with "%s" Call strbuf_addstr() for adding a simple string to a strbuf instead of using the heavier strbuf_addf(). This is shorter and documents the intent more clearly. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-05 15:09:25 -07:00
Junio C Hamano	c6b0597e9a	Tenth batch for 2.10 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-04 14:40:34 -07:00
Junio C Hamano	b422d99658	Merge branch 'jc/grep-commandline-vs-configuration' "git -c grep.patternType=extended log --basic-regexp" misbehaved because the internal API to access the grep machinery was not designed well. * jc/grep-commandline-vs-configuration: grep: further simplify setting the pattern type	2016-08-04 14:39:18 -07:00
Junio C Hamano	1e9a4856fb	Merge branch 'sb/submodule-clone-retry' An earlier tweak to make "submodule update" retry a failing clone of submodules was buggy and caused segfault, which has been fixed. * sb/submodule-clone-retry: submodule-helper: fix indexing in clone retry error reporting path git-submodule: forward exit code of git-submodule--helper more faithfully	2016-08-04 14:39:17 -07:00
Junio C Hamano	10881f076e	Merge branch 'sb/pack-protocol-doc-nak' A doc update. * sb/pack-protocol-doc-nak: Documentation: pack-protocol correct NAK response	2016-08-04 14:39:16 -07:00
Nguyễn Thái Ngọc Duy	6b7728db81	t7063: work around FreeBSD's lazy mtime update feature Let's start with the commit message of [1] from freebsd.git [2] Sync timestamp changes for inodes of special files to disk as late as possible (when the inode is reclaimed). Temporarily only do this if option UFS_LAZYMOD configured and softupdates aren't enabled. UFS_LAZYMOD is intentionally left out of /sys/conf/options. This is mainly to avoid almost useless disk i/o on battery powered machines. It's silly to write to disk (on the next sync or when the inode becomes inactive) just because someone hit a key or something wrote to the screen or /dev/null. PR: 5577 [3] The short version of that, in the context of t7063, is that when a directory is updated, its mtime may be updated later, not immediately. This can be shown with a simple command sequence date; sleep 1; touch abc; rm abc; sleep 10; ls -lTd . One would expect that the date shown in `ls` would be one second from `date`, but it's 10 seconds later. If we put another `ls -lTd .` in front of `sleep 10`, then the date of the last `ls` comes as expected. The first `ls` somehow forces mtime to be updated. t7063 is really sensitive to directory mtime. When mtime is too "new", git code suspects racy timestamps and will not trigger the shortcut in untracked cache, in t7063.24 and eventually be detected in t7063.27 We have two options thanks to this special FreeBSD feature: 1) Stop supporting untracked cache on FreeBSD. Skip t7063 entirely when running on FreeBSD 2) Work around this problem (using the same 'ls' trick) and continue to support untracked cache on FreeBSD I initially wanted to go with 1) because I didn't know the exact nature of this feature and feared that it would make untracked cache work unreliably, using the cached version when it should not. Since the behavior of this thing is clearer now. The picture is not that bad. If this indeed happens often, untracked cache would assume racy condition more often and _fall back_ to non-untracked cache code paths. Which means it may be less effective, but it will not show wrong things. This patch goes with option 2. PS. For those who want to look further in FreeBSD source code, this flag is now called IN_LAZYMOD. I can see it's effective in ext2 and ufs. zfs is not affected. [1] 660e6408e6df99a20dacb070c5e7f9739efdf96d [2] git://github.com/freebsd/freebsd.git [3] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=5577 Reported-by: Eric Wong <e@80x24.org> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-04 09:51:42 -07:00
Junio C Hamano	80460f513e	Ninth batch of topics for 2.10 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-03 15:13:16 -07:00
Junio C Hamano	767da54bf8	Merge branch 'jk/diff-do-not-reuse-wtf-needs-cleaning' There is an optimization used in "git diff $treeA $treeB" to borrow an already checked-out copy in the working tree when it is known to be the same as the blob being compared, expecting that open/mmap of such a file is faster than reading it from the object store, which involves inflating and applying delta. This however kicked in even when the checked-out copy needs to go through the convert-to-git conversion (including the clean filter), which defeats the whole point of the optimization. The optimization has been disabled when the conversion is necessary. * jk/diff-do-not-reuse-wtf-needs-cleaning: diff: do not reuse worktree files that need "clean" conversion	2016-08-03 15:10:29 -07:00
Junio C Hamano	f4fa8a9b18	Merge branch 'rs/submodule-config-code-cleanup' Code cleanup. * rs/submodule-config-code-cleanup: submodule-config: fix test binary crashing when no arguments given submodule-config: combine early return code into one goto submodule-config: passing name reference for .gitmodule blobs submodule-config: use explicit empty string instead of strbuf in config_from()	2016-08-03 15:10:28 -07:00
Junio C Hamano	a58a8e3f71	Merge branch 'jk/push-progress' "git push" and "git clone" learned to give better progress meters to the end user who is waiting on the terminal. * jk/push-progress: receive-pack: send keepalives during quiet periods receive-pack: turn on connectivity progress receive-pack: relay connectivity errors to sideband receive-pack: turn on index-pack resolving progress index-pack: add flag for showing delta-resolution progress clone: use a real progress meter for connectivity check check_connected: add progress flag check_connected: relay errors to alternate descriptor check_everything_connected: use a struct with named options check_everything_connected: convert to argv_array rev-list: add optional progress reporting check_everything_connected: always pass --quiet to rev-list	2016-08-03 15:10:28 -07:00
Junio C Hamano	67b3a5d4c0	Merge branch 'jt/fetch-large-handshake-window-on-http' "git fetch" exchanges batched have/ack messages between the sender and the receiver, initially doubling every time and then falling back to enlarge the window size linearly. The "smart http" transport, being an half-duplex protocol, outgrows the preset limit too quickly and becomes inefficient when interacting with a large repository. The internal mechanism learned to grow the window size more aggressively when working with the "smart http" transport. * jt/fetch-large-handshake-window-on-http: fetch-pack: grow stateless RPC windows exponentially	2016-08-03 15:10:27 -07:00
Junio C Hamano	5569c01be8	Merge branch 'jk/git-jump' "git jump" script (in contrib/) has been updated a bit. * jk/git-jump: contrib/git-jump: fix typo in README contrib/git-jump: add whitespace-checking mode contrib/git-jump: fix greedy regex when matching hunks	2016-08-03 15:10:27 -07:00
Junio C Hamano	5a2f4d3eef	Merge branch 'mm/status-suggest-merge-abort' "git status" learned to suggest "merge --abort" during a conflicted merge, just like it already suggests "rebase --abort" during a conflicted rebase. * mm/status-suggest-merge-abort: status: suggest 'git merge --abort' when appropriate	2016-08-03 15:10:26 -07:00
Junio C Hamano	d083d420b7	Merge branch 'jk/parse-options-concat' Users of the parse_options_concat() API function need to allocate extra slots in advance and fill them with OPT_END() when they want to decide the set of supported options dynamically, which makes the code error-prone and hard to read. This has been corrected by tweaking the API to allocate and return a new copy of "struct option" array. * jk/parse-options-concat: parse_options: allocate a new array when concatenating	2016-08-03 15:10:25 -07:00
Junio C Hamano	cf27c7996e	Merge branch 'sb/push-options' "git push" learned to accept and pass extra options to the receiving end so that hooks can read and react to them. * sb/push-options: add a test for push options push: accept push options receive-pack: implement advertising and receiving push options push options: {pre,post}-receive hook learns about push options	2016-08-03 15:10:24 -07:00
Junio C Hamano	4067a45438	Merge branch 'ew/http-walker' Dumb http transport on the client side has been optimized. * ew/http-walker: list: avoid incompatibility with *BSD sys/queue.h http-walker: reduce O(n) ops with doubly-linked list http: avoid disconnecting on 404s for loose objects http-walker: remove unused parameter from fetch_object	2016-08-03 15:10:24 -07:00
Junio C Hamano	6395499a68	Merge branch 'pm/build-persistent-https-with-recent-go' The build procedure for "git persistent-https" helper (in contrib/) has been updated so that it can be built with more recent versions of Go. * pm/build-persistent-https-with-recent-go: contrib/persistent-https: use Git version for build label contrib/persistent-https: update ldflags syntax for Go 1.7+	2016-08-03 15:10:23 -07:00
Junio C Hamano	c0728edfb6	Merge branch 'da/subtree-2.9-regression' "git merge" in Git v2.9 was taught to forbid merging an unrelated lines of history by default, but that is exactly the kind of thing the "--rejoin" mode of "git subtree" (in contrib/) wants to do. "git subtree" has been taught to use the "--allow-unrelated-histories" option to override the default. * da/subtree-2.9-regression: subtree: fix "git subtree split --rejoin" t7900-subtree.sh: fix quoting and broken && chains	2016-08-03 15:10:22 -07:00
Junio C Hamano	a35031240b	Merge branch 'os/no-verify-skips-commit-msg-too' "git commit --help" said "--no-verify" is only about skipping the pre-commit hook, and failed to say that it also skipped the commit-msg hook. * os/no-verify-skips-commit-msg-too: commit: describe that --no-verify skips the commit-msg hook in the help text	2016-08-03 15:10:22 -07:00
Johannes Sixt	54956df9bc	t4130: work around Windows limitation On Windows, it is already pretty expensive to try to recreate the stat() data that Git assumes is cheap to obtain. To make things halfway decent in performance, we even have to skip emulating the inode and to determine the number of hard links. This is not a huge problem, usually, as either the size or the mtime or the ctime are tell-tale enough to say when a file has changed, and even if not, those changes are typically made after the index file was written, triggering a rehashing of the files' contents. The t4130-apply-criss-cross-rename test case, however, requires the inode to determine that files of equal size were swapped, as renaming files does not update their mtime. Every once in a while, t4130 fails on Windows because of this missing piece. Equal file sizes are not crucial for the test cases, however. Hence, generate files with different sizes so that there is some property that the swapped files can be discovered reliably even on Windows. Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-03 08:47:38 -07:00
Ingo Brückl	766cdc4147	t3700: add a test_mode_in_index helper function The case statement to check the file mode of a staged file appears a number of times. Simplify the test by utilizing a test_mode_in_index helper function. Signed-off-by: Ingo BrÃ¼ckl <ib@wupperonline.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:25:30 -07:00
Ingo Brückl	b38ab197c2	t3700: merge two tests into one Depending on the underlying platform a chmod may be a noop. Although it wouldn't harm the result of the '--chmod=-x' test, there is a more robust way to make sure the --chmod option works both ways. Merge the two separate tests for the --chmod option into one, checking both permissions on the same file. Signed-off-by: Ingo BrÃ¼ckl <ib@wupperonline.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:20:53 -07:00
Ingo Brückl	c0fa44d8f1	t3700: remove unwanted leftover files before running new tests When an earlier test that has prerequisite is skipped, files used by later tests may be left in the working tree in an unexpected state. For example, a test runs this sequence: echo foo >xfoo1 && chmod 755 xfoo1 to create an executable file xfoo1, expecting that xfoo1 does not exist before it runs in the test sequence. However, the absence of this file depends on "git reset --hard" done in an earlier test, that is skipped when SANITY prerequisite is not met, and worse yet, xfoo1 originally is created as a symbolic link, which means the chmod does not affect the modes of xfoo1 as this test expects. Fix this by starting the test with "rm -f xfoo1" to make sure the file is created from scratch, and do the same to other similar tests. Signed-off-by: Ingo BrÃ¼ckl <ib@wupperonline.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:20:51 -07:00
René Scharfe	50492f7b38	pass constants as first argument to st_mult() The result of st_mult() is the same no matter the order of its arguments. It invokes the macro unsigned_mult_overflows(), which divides the second parameter by the first one. Pass constants first to allow that division to be done already at compile time. Signed-off-by: Rene Scharfe <l.s.r@web.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:01:03 -07:00
René Scharfe	02962d3684	use strbuf_addstr() for adding constant strings to a strbuf Replace uses of strbuf_addf() for adding strings with more lightweight strbuf_addstr() calls. In http-push.c it becomes easier to see what's going on without having to verfiy that the definition of PROPFIND_ALL_REQUEST doesn't contain any format specifiers. Signed-off-by: Rene Scharfe <l.s.r@web.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 13:42:10 -07:00
Andreas Brauchli	77947bbe24	gitweb: escape link body in format_ref_marker Fix a case where an html link can be generated from unescaped input resulting in invalid strict xhtml or potentially injected code. An overview of a repo with a tag "1.0.0&0.0.1" would previously result in an unescaped ampersand in the link body. Signed-off-by: Andreas Brauchli <a.brauchli@elementarea.net> Acked-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 12:55:40 -07:00
Jeff King	56dfeb6263	pack-objects: compute local/ignore_pack_keep early In want_object_in_pack(), we can exit early from our loop if neither "local" nor "ignore_pack_keep" are set. If they are, however, we must examine each pack to see if it has the object and is non-local or has a ".keep". It's quite common for there to be no non-local or .keep packs at all, in which case we know ahead of time that looking further will be pointless. We can pre-compute this by simply iterating over the list of packs ahead of time, and dropping the flags if there are no packs that could match. Another similar strategy would be to modify the loop in want_object_in_pack() to notice that we have already found the object once, and that we are looping only to check for "local" and "keep" attributes. If a pack has neither of those, we can skip the call to find_pack_entry_one(), which is the expensive part of the loop. This has two advantages: - it isn't all-or-nothing; we still get some improvement when there's a small number of kept or non-local packs, and a large number of non-kept local packs - it eliminates any possible race where we add new non-local or kept packs after our initial scan. In practice, I don't think this race matters; we already cache the packed_git information, so somebody who adds a new pack or .keep file after we've started will not be noticed at all, unless we happen to need to call reprepare_packed_git() because a lookup fails. In other words, we're already racy, and the race is not a big deal (losing the race means we might include an object in the pack that would not otherwise be, which is an acceptable outcome). However, it also has a disadvantage: we still loop over the rest of the packs for each object to check their flags. This is much less expensive than doing the object lookup, but still not free. So if we wanted to implement that strategy to cover the non-all-or-nothing cases, we could do so in addition to this one (so you get the most speedup in the all-or-nothing case, and the best we can do in the other cases). But given that the all-or-nothing case is likely the most common, it is probably not worth the trouble, and we can revisit this later if evidence points otherwise. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:08 -07:00
Jeff King	cd37996795	pack-objects: break out of want_object loop early When pack-objects collects the list of objects to pack (either from stdin, or via its internal rev-list), it filters each one through want_object_in_pack(). This function loops through each existing packfile, looking for the object. When we find it, we mark the pack/offset combo for later use. However, we can't just return "yes, we want it" at that point. If --honor-pack-keep is in effect, we must keep looking to find it in _all_ packs, to make sure none of them has a .keep. Likewise, if --local is in effect, we must make sure it is not present in any non-local pack. As a result, the sum effort of these calls is effectively O(nr_objects * nr_packs). In an ordinary repository, we have only a handful of packs, and this doesn't make a big difference. But in pathological cases, it can slow the counting phase to a crawl. This patch notices the case that we have neither "--local" nor "--honor-pack-keep" in effect and breaks out of the loop early, after finding the first instance. Note that our worst case is still "objects * packs" (i.e., we might find each object in the last pack we look in), but in practice we will often break out early. On an "average" repo, my git.git with 8 packs, this shows a modest 2% (a few dozen milliseconds) improvement in the counting-objects phase of "git pack-objects --all <foo" (hackily instrumented by sticking exit(0) right after list_objects). But in a much more pathological case, it makes a bigger difference. I ran the same command on a real-world example with ~9 million objects across 1300 packs. The counting time dropped from 413s to 45s, an improvement of about 89%. Note that this patch won't do anything by itself for a normal "git gc", as it uses both --honor-pack-keep and --local. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:07 -07:00
Jeff King	a73cdd21c4	find_pack_entry: replace last_found_pack with MRU cache Each pack has an index for looking up entries in O(log n) time, but if we have multiple packs, we have to scan through them linearly. This can produce a measurable overhead for some operations. We dealt with this long ago in `f7c22cc` (always start looking up objects in the last used pack first, 2007-05-30), which keeps what is essentially a 1-element most-recently-used cache. In theory, we should be able to do better by keeping a similar but longer cache, that is the same length as the pack-list itself. Since we now have a convenient generic MRU structure, we can plug it in and measure. Here are the numbers for running p5303 against linux.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5303.3: rev-list (1) 31.56(31.28+0.27) 31.30(31.08+0.20) -0.8% 5303.4: repack (1) 40.62(39.35+2.36) 40.60(39.27+2.44) -0.0% 5303.6: rev-list (50) 31.31(31.06+0.23) 31.23(31.00+0.22) -0.3% 5303.7: repack (50) 58.65(69.12+1.94) 58.27(68.64+2.05) -0.6% 5303.9: rev-list (1000) 38.74(38.40+0.33) 31.87(31.62+0.24) -17.7% 5303.10: repack (1000) 367.20(441.80+4.62) 342.00(414.04+3.72) -6.9% The main numbers of interest here are the rev-list ones (since that is exercising the normal object lookup code path). The single-pack case shouldn't improve at all; the 260ms speedup there is just part of the run-to-run noise (but it's important to note that we didn't make anything worse with the overhead of maintaining our cache). In the 50-pack case, we see similar results. There may be a slight improvement, but it's mostly within the noise. The 1000-pack case does show a big improvement, though. That carries over to the repack case, as well. Even though we haven't touched its pack-search loop yet, it does still do a lot of normal object lookups (e.g., for the internal revision walk), and so improves. As a point of reference, I also ran the 1000-pack test against a version of HEAD^ with the last_found_pack optimization disabled. It takes ~60s, so that gives an indication of how much even the single-element cache is helping. For comparison, here's a smaller repository, git.git: Test HEAD^ HEAD --------------------------------------------------------------------- 5303.3: rev-list (1) 1.56(1.54+0.01) 1.54(1.51+0.02) -1.3% 5303.4: repack (1) 1.84(1.80+0.10) 1.82(1.80+0.09) -1.1% 5303.6: rev-list (50) 1.58(1.55+0.02) 1.59(1.57+0.01) +0.6% 5303.7: repack (50) 2.50(3.18+0.04) 2.50(3.14+0.04) +0.0% 5303.9: rev-list (1000) 2.76(2.71+0.04) 2.24(2.21+0.02) -18.8% 5303.10: repack (1000) 13.21(19.56+0.25) 11.66(18.01+0.21) -11.7% You can see that the percentage improvement is similar. That's because the lookup we are optimizing is roughly O(nr_objects * nr_packs). Since the number of packs is constant in both tests, we'd expect the improvement to be linear in the number of objects. But the whole process is also linear in the number of objects, so the improvement is a constant factor. The exact improvement does also depend on the contents of the packs. In p5303, the extra packs all have 5 first-parent commits in them, which is a reasonable simulation of a pushed-to repository. But it also means that only 250 first-parent commits are in those packs (compared to almost 50,000 total in linux.git), and the rest are in the huge "base" pack. So once we start looking at history in taht big pack, that's where we'll find most everything, and even the 1-element cache gets close to 100% cache hits. You could almost certainly show better numbers with a more pathological case (e.g., distributing the objects more evenly across the packs). But that's simply not that realistic a scenario, so it makes more sense to focus on these numbers. The implementation itself is a straightforward application of the MRU code. We provide an MRU-ordered list of packs that shadows the packed_git list. This is easy to do because we only create and revise the pack list in one place. The "reprepare" code path actually drops the whole MRU and replaces it for simplicity. It would be more efficient to just add new entries, but there's not much point in optimizing here; repreparing happens rarely, and only after doing a lot of other expensive work. The key things to keep optimized are traversal (which is just a normal linked list, albeit with one extra level of indirection over the regular packed_git list), and marking (which is a constant number of pointer assignments, though slightly more than the old last_found_pack was; it doesn't seem to create a measurable slowdown, though). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:07 -07:00
Jeff King	002f206faf	add generic most-recently-used list There are a few places in Git that would benefit from a fast most-recently-used cache (e.g., the list of packs, which we search linearly but would like to order based on locality). This patch introduces a generic list that can be used to store arbitrary pointers in most-recently-used order. The implementation is just a doubly-linked list, where "marking" an item as used moves it to the front of the list. Insertion and marking are O(1), and iteration is O(n). There's no lookup support provided; if you need fast lookups, you are better off with a different data structure in the first place. There is also no deletion support. This would not be hard to do, but it's not necessary for handling pack structs, which are created and never removed. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:07 -07:00
Jeff King	3157c880f6	sha1_file: drop free_pack_by_name The point of this function is to drop an entry from the "packed_git" cache that points to a file we might be overwriting, because our contents may not be the same (and hence the only caller was pack-objects as it moved a temporary packfile into place). In older versions of git, this could happen because the names of packfiles were derived from the set of objects they contained, not the actual bits on disk. But since `1190a1a` (pack-objects: name pack files after trailer hash, 2013-12-05), the name reflects the actual bits on disk, and any two packfiles with the same name can be used interchangeably. Dropping this function not only saves a few lines of code, it makes the lifetime of "struct packed_git" much easier to reason about: namely, we now do not ever free these structs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:06 -07:00
Jeff King	77023ea3c3	t/perf: add tests for many-pack scenarios Git's pack storage does efficient (log n) lookups in a single packfile's index, but if we have multiple packfiles, we have to linearly search each for a given object. This patch introduces some timing tests for cases where we have a large number of packs, so that we can measure any improvements we make in the following patches. The main thing we want to time is object lookup. To do this, we measure "git rev-list --objects --all", which does a fairly large number of object lookups (essentially one per object in the repository). However, we also measure the time to do a full repack, which is interesting for two reasons. One is that in addition to the usual pack lookup, it has its own linear iteration over the list of packs. And two is that because it it is the tool one uses to go from an inefficient many-pack situation back to a single pack, we care about its performance not only at marginal numbers of packs, but at the extreme cases (e.g., if you somehow end up with 5,000 packs, it is the only way to get back to 1 pack, so we need to make sure it performs well). We measure the performance of each command in three scenarios: 1 pack, 50 packs, and 1,000 packs. The 1-pack case is a baseline; any optimizations we do to handle multiple packs cannot possibly perform better than this. The 50-pack case is as far as Git should generally allow your repository to go, if you have auto-gc enabled with the default settings. So this represents the maximum performance improvement we would expect under normal circumstances. The 1,000-pack case is hopefully rare, though I have seen it in the wild where automatic maintenance was broken for some time (and the repository continued to receive pushes). This represents cases where we care less about general performance, but want to make sure that a full repack command does not take excessively long. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:06 -07:00
Junio C Hamano	f8f7adce9f	Sync with maint * maint: Some fixes for 2.9.3	2016-07-28 14:21:18 -07:00

1 2 3 4 5 ...