mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-01 06:47:52 +01:00

Author	SHA1	Message	Date
Junio C Hamano	00d27937bf	Merge branch 'jh/status-v2-porcelain' Enhance "git status --porcelain" output by collecting more data on the state of the index and the working tree files, which may further be used to teach git-prompt (in contrib/) to make fewer calls to git. * jh/status-v2-porcelain: status: unit tests for --porcelain=v2 test-lib-functions.sh: add lf_to_nul helper git-status.txt: describe --porcelain=v2 format status: print branch info with --porcelain=v2 --branch status: print per-file porcelain v2 status data status: collect per-file data for --porcelain=v2 status: support --porcelain[=<version>] status: cleanup API to wt_status_print status: rename long-format print routines	2016-09-08 21:49:50 -07:00
Junio C Hamano	d0b61dc65f	Merge branch 'po/range-doc' Clarify various ways to specify the "revision ranges" in the documentation. * po/range-doc: doc: revisions: sort examples and fix alignment of the unchanged doc: revisions: show revision expansion in examples doc: revisions - clarify reachability examples doc: revisions - define `reachable` doc: gitrevisions - clarify 'latter case' is revision walk doc: gitrevisions - use 'reachable' in page description doc: revisions: single vs multi-parent notation comparison doc: revisions: extra clarification of <rev>^! notation effects doc: revisions: give headings for the two and three dot notations doc: show the actual left, right, and boundary marks doc: revisions - name the left and right sides doc: use 'symmetric difference' consistently	2016-09-08 21:49:49 -07:00
Junio C Hamano	d7ed183a91	Merge branch 'rt/help-unknown' "git nosuchcommand --help" said "No manual entry for gitnosuchcommand", which was not intuitive, given that "git nosuchcommand" said "git: 'nosuchcommand' is not a git command". * rt/help-unknown: help: make option --help open man pages only for Git commands help: introduce option --exclude-guides	2016-09-08 21:49:48 -07:00
Junio C Hamano	da3b6f06e1	Merge branch 'cc/receive-pack-limit' An incoming "git push" that attempts to push too many bytes can now be rejected by setting a new configuration variable at the receiving end. * cc/receive-pack-limit: receive-pack: allow a maximum input size to be specified unpack-objects: add --max-input-size=<size> option index-pack: add --max-input-size=<size> option	2016-09-08 21:49:47 -07:00
Junio C Hamano	452a9073ba	Merge branch 'jk/format-patch-number-singleton-patch-with-cover' "git format-patch --cover-letter HEAD^" to format a single patch with a separate cover letter now numbers the output as [PATCH 0/1] and [PATCH 1/1] by default. * jk/format-patch-number-singleton-patch-with-cover: format-patch: show 0/1 and 1/1 for singleton patch with cover letter	2016-09-08 21:49:47 -07:00
Junio C Hamano	c4071eace9	Merge branch 'jk/delta-base-cache' The delta-base-cache mechanism has been a key to the performance in a repository with a tightly packed packfile, but it did not scale well even with a larger value of core.deltaBaseCacheLimit. * jk/delta-base-cache: t/perf: add basic perf tests for delta base cache delta_base_cache: use hashmap.h delta_base_cache: drop special treatment of blobs delta_base_cache: use list.h for LRU release_delta_base_cache: reuse existing detach function clear_delta_base_cache_entry: use a more descriptive name cache_or_unpack_entry: drop keep_cache parameter	2016-09-08 21:49:46 -07:00
Junio C Hamano	6ebdac1bab	Git 2.10 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-02 09:05:47 -07:00
Junio C Hamano	dd39dfcf8a	l10n-2.10.0-rnd2.2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXyYCAAAoJEMek6Rt1RHoouFkP/0n836xBjmhMQAd7ksJ6UrVO YoWMq1Qa7Ox4FLSgpmxS+Q80/vgWgI4ESN9kAoyfR4axbhaK8WzRTkR8F5LA1jmf 0k2EXXNrJqf1LTTztx+fEleSMptMlVun0NJnHT7w0dQcXoeH9dHDbO/p7WZKCxXO qQ7KkmHMVW1EnO+QkUtujehfI1oUggc3Crc1pxSG1BvCyodtXYIjUD6wUN00yyeI PFzsPyLjk1uVUQESiiSMwr2kj5EqEKx/S8g0I+kLxiEnVCww8Or5a5TYPKG1IDiR jnlmCQF8Y3hoNdOyHHV2xaSPIA6OsoJcvYzmMOIDcp2IbGFnGc5ceaTQs5SlS3wP GTjwaA0ttYB1JOYjvojlsUIEOYNopaZvphws02iYv10kIL1gkbaBLlXj+roTJ0wP 6huthQhjYpVu11iCBnRH8/dXwbIs2h86V5l9e5Yj/OVyK9R08LVnV0RaEWrPnwgd FdGC1JdOgczIE6tBMoJSRtIf1pbQiEsh0wgj+Vh0bPqB6nJJSCa605Lamhm61FyU eoH6pEG+14CmPMpbP8jkctj17FlIQcYaR6LyR/qBLgcYkt9EpxTH1n9RTv+C0bRx yR2w/uCwrhSBFmaTQW2cPKrHYdGpyBdvLXUiT1ewCEJgffFun5IT+r6Qhg7B1yXb AbAJfibXGeCVgZuRKIQC =e4TB -----END PGP SIGNATURE----- Merge tag 'l10n-2.10.0-rnd2.2' of git://github.com/git-l10n/git-po l10n-2.10.0-rnd2.2 * tag 'l10n-2.10.0-rnd2.2' of git://github.com/git-l10n/git-po: l10n: Updated Vietnamese translation for v2.10.0-rc2 (2757t)	2016-09-02 08:48:14 -07:00
Jiang Xin	e8e349249c	Merge branch 'master' of https://github.com/vnwildman/git * 'master' of https://github.com/vnwildman/git: l10n: Updated Vietnamese translation for v2.10.0-rc2 (2757t)	2016-09-02 21:29:48 +08:00
Junio C Hamano	5b18e70009	A few more fixes before the final 2.10 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-31 10:21:05 -07:00
Junio C Hamano	934b1caa7a	l10n-2.10.0-rnd2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXxuOXAAoJEMek6Rt1RHooEa8QAKO9TwEgUu3RcjoEhoHuncWw 450/uVcz4rhCZsupzBahcSUI3qcE+GuunezASWQ0IkEXGogLC1CwnBkYzEUYuOvU 0BH/gao4FfIZmBddCBn/6tUZJG+wTgCsD2ept6oR0RgPo85bHdN3K12S4FTHVmGj AK894O60ZJJZuUelbErOzHh52Pgad5BEJ9V6Bt5hBPfWLV/wPvrWm8efk4T88odw ErOVUv49gSUqvwA+ot8sq7bcPVhH06CTvw15iWTPmD8hLYdQDTYKL1+PkGJa0uXR Qoe4vLhUV8Rz0RGi9aWnlOzAYgIQ9FzkLwDfqz4KCyV1qa5EqorfjwkK1vOug9VY Qv5p0fWLIKqCGEyJjy1y87xMi/GgydRI8mZYAaRhHgCc4qMI/E7n4I4+vc6/9i/0 C7tRW8ylb7UUPENdvZvcOxWgP9y/VMiw/I8wyv3wuzifM5DRxCUCfwMJDQdIzBVI T+sPO8cjYjKPATieZ563LBFtmiVoms4U1DDDqH65SzZtsOa2GOruAW3eIFwFmQHE hYhyZOFpDhg20EgSKO9owzA5IjtiuplPYJgQmUyiVEeOfcr/Gw+a3CzgCEDNJ8EF orjSHIwO0N7BmHDMJeJlGyCuhgG6JLdZDI9d/AiaPFuGiNnnINH5odSz8M5tE1F4 erUKZgZDpifOYHsSurcF =lwbs -----END PGP SIGNATURE----- Merge tag 'l10n-2.10.0-rnd2' of git://github.com/git-l10n/git-po l10n-2.10.0-rnd2 * tag 'l10n-2.10.0-rnd2' of git://github.com/git-l10n/git-po: l10n: zh_CN: for git v2.10.0 l10n round 2 l10n: ca.po: update translation l10n: fr.po v2.10.0-rc2 l10n: sv.po: Update Swedish translation (2757t0f0u) l10n: git.pot: v2.10.0 round 2 (12 new, 44 removed) l10n: Updated Vietnamese translation for v2.10.0 (2789t) l10n: pt_PT: update Portuguese translation l10n: pt_PT: merge git.pot l10n: ko.po: Update Korean translation l10n: git.pot: v2.10.0 round 1 (248 new, 56 removed)	2016-08-31 10:04:14 -07:00
Junio C Hamano	58e72a2179	Merge branch 'ls/packet-line-protocol-doc-fix' Correct an age-old calco (is that a typo-like word for calc) in the documentation. * ls/packet-line-protocol-doc-fix: pack-protocol: fix maximum pkt-line size	2016-08-31 10:03:51 -07:00
Junio C Hamano	4762bf36d9	Merge branch 'mh/blame-worktree' * mh/blame-worktree: blame: fix segfault on untracked files	2016-08-31 10:03:50 -07:00
Junio C Hamano	9010077be2	Merge branch 'kw/patch-ids-optim' * kw/patch-ids-optim: p3400: make test script executable	2016-08-31 10:03:49 -07:00
Ralf Thielow	2c6b6d9f7d	help: make option --help open man pages only for Git commands If option --help is passed to a Git command, we try to open the man page of that command. However, we do it for both commands and concepts. Make sure it is an actual command. This makes "git <concept> --help" not working anymore, while "git help <concept>" still works. Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-30 16:09:41 -07:00
Ralf Thielow	af74128f4a	help: introduce option --exclude-guides Introduce option --exclude-guides to the help command. With this option being passed, "git help" will open man pages only for actual commands. Since we know it is a command, we can use function help_unknown_command to give the user advice on typos. Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-30 16:09:41 -07:00
Lars Schneider	7841c4801c	pack-protocol: fix maximum pkt-line size According to LARGE_PACKET_MAX in pkt-line.h the maximal length of a pkt-line packet is 65520 bytes. The pkt-line header takes 4 bytes and therefore the pkt-line data component must not exceed 65516 bytes. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-30 11:00:29 -07:00
Jiang Xin	5c57d7622e	l10n: zh_CN: for git v2.10.0 l10n round 2 Update 215 translations (2757t0f0u) for git v2.10.0-rc2. Signed-off-by: Jiang Xin <worldhello.net@gmail.com>	2016-08-31 00:11:13 +08:00
René Scharfe	ba67504fa8	p3400: make test script executable Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-29 12:57:16 -07:00
Thomas Gummerer	bc6b13a7d2	blame: fix segfault on untracked files Since `3b75ee9` ("blame: allow to blame paths freshly added to the index", 2016-07-16) git blame also looks at the index to determine if there is a file that was freshly added to the index. cache_name_pos returns -pos - 1 in case there is no match is found, or if the name matches, but the entry has a stage other than 0. As git blame should work for unmerged files, it uses strcmp to determine whether the name of the returned position matches, in which case the file exists, but is merely unmerged, or if the file actually doesn't exist in the index. If the repository is empty, or if the file would lexicographically be sorted as the last file in the repository, -cache_name_pos - 1 is outside of the length of the active_cache array, causing git blame to segfault. Guard against that, and die() normally to restore the old behaviour. Reported-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-29 11:57:33 -07:00
Alex Henrie	63b8265402	l10n: ca.po: update translation Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>	2016-08-28 10:32:56 -06:00
Jean-Noel Avila	b67e63067d	l10n: fr.po v2.10.0-rc2 Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>	2016-08-28 11:36:14 +02:00
Tran Ngoc Quan	800d88e2b3	l10n: Updated Vietnamese translation for v2.10.0-rc2 (2757t) Signed-off-by: Tran Ngoc Quan <vnwildman@gmail.com>	2016-08-28 07:23:30 +07:00
Peter Krefting	8ed2d3fb15	l10n: sv.po: Update Swedish translation (2757t0f0u) Signed-off-by: Peter Krefting <peter@softwolves.pp.se>	2016-08-27 20:42:50 +01:00
Jiang Xin	b30eec1a69	Merge branch 'master' of https://github.com/vnwildman/git * 'master' of https://github.com/vnwildman/git: l10n: Updated Vietnamese translation for v2.10.0 (2789t)	2016-08-27 23:36:16 +08:00
Jiang Xin	5bd166d8af	l10n: git.pot: v2.10.0 round 2 (12 new, 44 removed) Generate po/git.pot from v2.10.0-rc2 for git v2.10.0 l10n round 2. Signed-off-by: Jiang Xin <worldhello.net@gmail.com>	2016-08-27 23:23:26 +08:00
Jiang Xin	fe1280decc	Merge branch 'master' of git://github.com/git-l10n/git-po * 'master' of git://github.com/git-l10n/git-po: l10n: pt_PT: update Portuguese translation l10n: pt_PT: merge git.pot l10n: ko.po: Update Korean translation l10n: git.pot: v2.10.0 round 1 (248 new, 56 removed)	2016-08-27 23:14:27 +08:00
Tran Ngoc Quan	b9252573c4	l10n: Updated Vietnamese translation for v2.10.0 (2789t) Signed-off-by: Tran Ngoc Quan <vnwildman@gmail.com>	2016-08-27 09:15:28 +07:00
Junio C Hamano	d5cb9cbd64	Git 2.10-rc2 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-26 13:59:20 -07:00
Torsten Bögershausen	e28eae3184	gitattributes: Document the unified "auto" handling Update the documentation about text=auto: text=auto now follows the core.autocrlf handling when files are not normalized in the repository. For a cross platform project recommend the usage of attributes for line-ending conversions. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-26 13:54:16 -07:00
Junio C Hamano	3b1c6a9b6e	Merge branch 'js/no-html-bypass-on-windows' into rt/help-unknown * js/no-html-bypass-on-windows: Revert "display HTML in default browser using Windows' shell API"	2016-08-26 11:29:07 -07:00
Junio C Hamano	5cb0d5ad05	Prepare for 2.10.0-rc2 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-25 13:56:51 -07:00
Junio C Hamano	0fd6c99bdf	Merge branch 'ja/i18n' The recent i18n patch we added during this cycle did a bit too much refactoring of the messages to avoid word-legos; the repetition has been reduced to help translators. * ja/i18n: i18n: simplify numeric error reporting i18n: fix git rebase interactive commit messages i18n: fix typos for translation	2016-08-25 13:55:07 -07:00
Junio C Hamano	3dc01702df	Merge branch 'bw/mingw-avoid-inheriting-fd-to-lockfile' The tempfile (hence its user lockfile) API lets the caller to open a file descriptor to a temporary file, write into it and then finalize it by first closing the filehandle and then either removing or renaming the temporary file. When the process spawns a subprocess after obtaining the file descriptor, and if the subprocess has not exited when the attempt to remove or rename is made, the last step fails on Windows, because the subprocess has the file descriptor still open. Open tempfile with O_CLOEXEC flag to avoid this (on Windows, this is mapped to O_NOINHERIT). * bw/mingw-avoid-inheriting-fd-to-lockfile: mingw: ensure temporary file handles are not inherited by child processes t6026-merge-attr: child processes must not inherit index.lock handles	2016-08-25 13:55:07 -07:00
Junio C Hamano	a8998453be	Merge branch 'dg/document-git-c-in-git-config-doc' The "git -c var[=val] cmd" facility to append a configuration variable definition at the end of the search order was described in git(1) manual page, but not in git-config(1), which was more likely place for people to look for when they ask "can I make a one-shot override, and if so how?" * dg/document-git-c-in-git-config-doc: doc: mention `git -c` in git-config(1)	2016-08-25 13:55:07 -07:00
Junio C Hamano	13e11ff707	Merge branch 'js/no-html-bypass-on-windows' On Windows, help.browser configuration variable used to be ignored, which has been corrected. * js/no-html-bypass-on-windows: Revert "display HTML in default browser using Windows' shell API"	2016-08-25 13:55:06 -07:00
Junio C Hamano	a1f0b4e286	Merge branch 'hv/doc-commit-reference-style' A small doc update. * hv/doc-commit-reference-style: SubmittingPatches: document how to reference previous commits	2016-08-25 13:55:06 -07:00
Torsten Bögershausen	41a616dada	git ls-files: text=auto eol=lf is supported in Git 2.10 The man page for `git ls-files --eol` mentions the combination of text attributes "text=auto eol=lf" or "text=auto eol=crlf" as not supported yet, but may be in the future. Now they are supported. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-25 13:38:18 -07:00
Vasco Almeida	9d83143621	l10n: pt_PT: update Portuguese translation Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>	2016-08-25 13:33:17 +00:00
Vasco Almeida	587dae416d	l10n: pt_PT: merge git.pot Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>	2016-08-25 13:33:17 +00:00
Jeff King	c08db5a2d0	receive-pack: allow a maximum input size to be specified Receive-pack feeds its input to either index-pack or unpack-objects, which will happily accept as many bytes as a sender is willing to provide. Let's allow an arbitrary cutoff point where we will stop writing bytes to disk. Cleaning up what has already been written to disk is a related problem that is not addressed by this patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-24 12:31:05 -07:00
Christian Couder	5ad2186733	unpack-objects: add --max-input-size=<size> option When receiving a pack-file, it can be useful to abort the `git unpack-objects`, if the pack-file is too big. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-24 12:31:05 -07:00
Jeff King	411481be6f	index-pack: add --max-input-size=<size> option When receiving a pack-file, it can be useful to abort the `git index-pack`, if the pack-file is too big. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-24 12:31:05 -07:00
Jean-Noel Avila	078fe30523	i18n: simplify numeric error reporting Signed-off-by: Jean-Noel Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-24 08:47:20 -07:00
Jean-Noel Avila	8aa6dc1d9e	i18n: fix git rebase interactive commit messages For proper i18n, the logic cannot embed english specific processing. Signed-off-by: Jean-Noel Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-24 08:43:27 -07:00
Jean-Noel Avila	cd3e4677cf	i18n: fix typos for translation Signed-off-by: Jean-Noel Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-24 08:41:22 -07:00
Jacob Keller	957ed3a56c	format-patch: show 0/1 and 1/1 for singleton patch with cover letter Change the default behavior of git-format-patch to generate numbered sequence of 0/1 and 1/1 when generating both a cover-letter and a single patch. This standardizes the cover letter to have 0/N which helps distinguish the cover letter from the patch itself. Since the behavior is easily changed via configuration as well as the use of -n and -N this should be acceptable default behavior. Add tests for the new default behavior. Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-23 15:59:11 -07:00
Jeff King	c7df68cbca	t/perf: add basic perf tests for delta base cache This just shows off the improvements done by the last few patches, and gives us a baseline for noticing regressions in the future. Here are the results with linux.git as the perf "large repo": Test origin HEAD ------------------------------------------------------------------- 0003.1: log --raw 43.41(40.36+2.69) 33.86(30.96+2.41) -22.0% 0003.2: log -S 313.61(309.74+3.78) 298.75(295.58+3.00) -4.7% (for a large repo, the "log -S" improvements are greater if you bump the delta base cache limit, but I think it makes sense to test the "stock" behavior, since that is what most people will see). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-23 15:26:16 -07:00
Jeff King	8261e1f139	delta_base_cache: use hashmap.h The fundamental data structure of the delta base cache is a hash table mapping pairs of "(packfile, offset)" into structs containing the actual object data. The hash table implementation dates back to `e5e0161` (Implement a simple delta_base cache, 2007-03-17), and uses a fixed-size table. The current size is a hard-coded 256 entries. Because we need to be able to remove objects from the hash table, entry lookup does not do any kind of probing to handle collisions. Colliding items simply replace whatever is in their slot. As a result, we have fewer usable slots than even the 256 we allocate. At half full, each new item has a 50% chance of displacing another one. Or another way to think about it: every item has a 1/256 chance of being ejected due to hash collision, without regard to our LRU strategy. So it would be interesting to see the effect of increasing the cache size on the runtime for some common operations. As with the previous patch, we'll measure "git log --raw" for tree-only operations, and "git log -Sfoo --raw" for operations that touch trees and blobs. All times are wall-clock best-of-3, done against fully packed repos with --depth=50, and the default core.deltaBaseCacheLimit of 96MB. Here are timings for various values of MAX_DELTA_CACHE against git.git (the asterisk marks the minimum time for each operation): MAX_DELTA_CACHE log-raw log-S --------------- --------- --------- 256 0m02.227s 0m12.821s 512 0m02.143s 0m10.602s 1024 0m02.127s 0m08.642s 2048 0m02.148s 0m07.123s 4096 0m02.194s 0m06.448s* 8192 0m02.239s 0m06.504s 16384 0m02.144s* 0m06.502s 32768 0m02.202s 0m06.622s 65536 0m02.230s 0m06.677s The log-raw case isn't changed much at all here (probably because our trees just aren't that big in the first place, or possibly because we have so _few_ trees in git.git that the 256-entry cache is enough). But once we start putting blobs in the cache, too, we see a big improvement (almost 50%). 
The curve levels off around 4096, which means that we can hold about that many entries before hitting the 96MB memory limit (or possibly that the workload is small enough that there is simply no more work to be optimized out by caching more). (As a side note, I initially timed my existing git.git pack, which was a base of --aggressive combined with some pulls on top. So it had quite a few deeper delta chains. The 256-cache case was more like 15s, and it still dropped to ~6.5s in the same way). Here are the timings for linux.git: MAX_DELTA_CACHE log-raw log-S --------------- --------- --------- 256 0m41.661s 5m12.410s 512 0m39.547s 5m07.920s 1024 0m37.054s 4m54.666s 2048 0m35.871s 4m41.194s* 4096 0m34.646s 4m51.648s 8192 0m33.881s 4m55.342s 16384 0m35.190s 5m00.122s 32768 0m35.060s 4m58.851s 65536 0m33.311s* 4m51.420s As we grow we see a nice 20% speedup in the tree traversal, and more modest 10% in the log-S. This is probably an indication that we are bound less by the number of entries, and more by the memory limit (more on that below). What is interesting is that the numbers bounce around a bit; increasing the number of entries isn't always a strict improvement. Partially this is due to noise in the measurement. But it may also be an indication that our LRU ejection scheme is not optimal. The smaller cache sizes introduce some randomness into the ejection (due to collisions), which may sometimes work in our favor (and sometimes not!). So what is the optimal setting of MAX_DELTA_CACHE? The "bouncing" in the linux.git log-S numbers notwithstanding, it mostly seems like bigger is better. And even if we were to try to find a "sweet spot", these are just two repositories, that are not necessarily representative. The shape of history, the size of trees and blobs, the memory limit configuration, etc, all will affect the outcome. 
Rather than trying to find the "right" number, another strategy is to just switch to a hash table that can actually store collisions: namely our hashmap.h implementation. Here are numbers for that compared to the "best" we saw from adjusting MAX_DELTA_CACHE: \| log-raw \| log-S \| best hashmap \| best hashmap \| --------- --------- \| --------- --------- git \| 0m02.144s 0m02.144s \| 0m06.448s 0m06.688s linux \| 0m33.311s 0m33.092s \| 4m41.194s 4m57.172s We can see the results are similar in most cases, which is what we'd expect. We're not ejecting due to collisions at all, so this is purely representing the LRU. So really, we'd expect this to model most closely the larger values of the static MAX_DELTA_CACHE limit. And that does seem to be what's happening, including the "bounce" in the linux log-S case. So while the value for that case _isn't_ as good as the optimal one measured above (which was 2048 entries), given the bouncing I'm hesitant to suggest that 2048 is any kind of optimum (not even for linux.git, let alone as a general rule). The generic hashmap has the appeal that it drops the number of tweakable numbers by one, which means we can focus on tuning other elements, like the LRU strategy or the core.deltaBaseCacheLimit setting. And indeed, if we bump the cache limit to 1G (which is probably silly for general use, but maybe something people with big workstations would want to do), the linux.git log-S time drops to 3m32s. That's something you really _can't_ do easily with the static hash table, because the number of entries needs to grow in proportion to the memory limit (so 2048 is almost certainly not going to be the right value there). This patch takes that direction, and drops the static hash table entirely in favor of using the hashmap.h API. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-23 15:18:50 -07:00
Jeff King	6d9617f4f7	delta_base_cache: drop special treatment of blobs When the delta base cache runs out of allowed memory, it has to drop entries. It does so by walking an LRU list, dropping objects until we are under the memory limit. But we actually walk the list twice: once to drop blobs, and then again to drop other objects (which are generally trees). This comes from `18bdec1` (Limit the size of the new delta_base_cache, 2007-03-19). This performs poorly as the number of entries grows, because any time dropping blobs does not satisfy the limit, we have to walk the _entire_ list, trees included, looking for blobs to drop, before starting to drop any trees. It's not generally a problem now, as the cache is limited to only 256 entries. But as we could benefit from increasing that in a future patch, it's worth looking at how it performs as the cache size grows. And the answer is "not well". The table below shows times for various operations with different values of MAX_DELTA_CACHE (which is not a run-time knob; I recompiled with -DMAX_DELTA_CACHE=$n for each). I chose "git log --raw" ("log-raw" in the table) because it will access all of the trees, but no blobs at all (so in a sense it is a worst case for this problem, because we will always walk over the entire list of trees once before realizing there are no blobs to drop). This is also representative of other tree-only operations like "rev-list --objects" and "git log -- <path>". I also timed "git log -Sfoo --raw" ("log-S" in the table). It similarly accesses all of the trees, but also the blobs for each commit. It's representative of "git log -p", though it emphasizes the cost of blob access more, as "-S" is cheaper than computing an actual blob diff. All timings are best-of-3 wall-clock times (though they all were CPU bound, so the user CPU times are similar). The repositories were fully packed with --depth=50, and the default core.deltaBaseCacheLimit of 96M was in effect. 
The current value of MAX_DELTA_CACHE is 256, so I started there and worked up by factors of 2. First, here are values for git.git (the asterisk signals the fastest run for each operation): MAX_DELTA_CACHE log-raw log-S --------------- --------- --------- 256 0m02.212s 0m12.634s 512 0m02.136s* 0m10.614s 1024 0m02.156s 0m08.614s 2048 0m02.208s 0m07.062s 4096 0m02.190s 0m06.484s* 8192 0m02.176s 0m07.635s 16384 0m02.913s 0m19.845s 32768 0m03.617s 1m05.507s 65536 0m04.031s 1m18.488s You can see that for the tree-only log-raw case, we don't actually benefit that much as the cache grows (all the differences up through 8192 are basically just noise; this is probably because we don't actually have that many distinct trees in git.git). But for log-S, we get a definite speed improvement as the cache grows, but the improvements are lost as cache size grows and the linear LRU management starts to dominate. Here's the same thing run against linux.git: MAX_DELTA_CACHE log-raw log-S --------------- --------- ---------- 256 0m40.987s 5m13.216s 512 0m37.949s 5m03.243s 1024 0m35.977s 4m50.580s 2048 0m33.855s 4m39.818s 4096 0m32.913s 4m47.299s* 8192 0m32.176s* 5m14.650s 16384 0m32.185s 6m31.625s 32768 0m38.056s 9m31.136s 65536 1m30.518s 17m38.549s The pattern is similar, though the effect in log-raw is more pronounced here. The times dip down in the middle, and then go back up as we keep growing. So we know there's a problem. What's the solution? The obvious one is to improve the data structure to avoid walking over tree entries during the looking-for-blobs traversal. We can do this by keeping _two_ LRU lists: one for blobs, and one for other objects. We drop items from the blob LRU first, and then from the tree LRU (if necessary). 
Here's git.git using that strategy: MAX_DELTA_CACHE log-raw log-S --------------- --------- ---------- 256 0m02.264s 0m12.830s 512 0m02.201s 0m10.771s 1024 0m02.181s 0m08.593s 2048 0m02.205s 0m07.116s 4096 0m02.158s 0m06.537s* 8192 0m02.213s 0m07.246s 16384 0m02.155s* 0m10.975s 32768 0m02.159s 0m16.047s 65536 0m02.181s 0m16.992s The upswing on log-raw is gone completely. But log-S still has it (albeit much better than without this strategy). Let's see what linux.git shows: MAX_DELTA_CACHE log-raw log-S --------------- --------- --------- 256 0m42.519s 5m14.654s 512 0m39.106s 5m04.708s 1024 0m36.802s 4m51.454s 2048 0m34.685s 4m39.378s* 4096 0m33.663s 4m44.047s 8192 0m33.157s 4m50.644s 16384 0m33.090s* 4m49.648s 32768 0m33.458s 4m53.371s 65536 0m33.563s 5m04.580s The results are similar. The tree-only case again performs well (not surprising; we're literally just dropping the one useless walk, and not otherwise changing the cache eviction strategy at all). But the log-S case again does a bit worse as the cache grows (though possibly that's within the noise, which is much larger for this case). Perhaps this is an indication that the "remove blobs first" strategy is not actually optimal. The intent of it is to avoid blowing out the tree cache when we see large blobs, but it also means we'll throw away useful, recent blobs in favor of older trees. Let's run the same numbers without caring about object type at all (i.e., one LRU list, and always evicting whatever is at the head, regardless of type). Here's git.git: MAX_DELTA_CACHE log-raw log-S --------------- --------- --------- 256 0m02.227s 0m12.821s 512 0m02.143s 0m10.602s 1024 0m02.127s 0m08.642s 2048 0m02.148s 0m07.123s 4096 0m02.194s 0m06.448s* 8192 0m02.239s 0m06.504s 16384 0m02.144s* 0m06.502s 32768 0m02.202s 0m06.622s 65536 0m02.230s 0m06.677s Much smoother; there's no dramatic upswing as we increase the cache size (some remains, though it's small enough that it's mostly run-to-run noise. 
E.g., in the log-raw case, note how 8192 is 50-100ms higher than its neighbors). Note also that we stop getting any real benefit for log-S after about 4096 entries; that number will depend on the size of the repository, the size of the blob entries, and the memory limit of the cache. Let's see what linux.git shows for the same strategy: MAX_DELTA_CACHE log-raw log-S --------------- --------- --------- 256 0m41.661s 5m12.410s 512 0m39.547s 5m07.920s 1024 0m37.054s 4m54.666s 2048 0m35.871s 4m41.194s* 4096 0m34.646s 4m51.648s 8192 0m33.881s 4m55.342s 16384 0m35.190s 5m00.122s 32768 0m35.060s 4m58.851s 65536 0m33.311s* 4m51.420s It's similarly good. As with the "separate blob LRU" strategy, there's a lot of noise on the log-S run here. But it's certainly not any worse, is possibly a bit better, and the improvement over "separate blob LRU" on the git.git case is dramatic. So it seems like a clear winner, and that's what this patch implements. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-23 14:57:44 -07:00
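
The last two delta_base_cache messages above describe the approach they settled on: a hash table keyed by "(packfile, offset)" that keeps colliding entries, plus one LRU list that evicts from the least recently used end, regardless of object type, until the cache is back under its memory limit. The following is a minimal Python sketch of that idea, not Git's C implementation. The class name `DeltaBaseCache`, its method names, and the choice to evict after inserting are assumptions made for illustration; Python's `OrderedDict` stands in for both the hashmap and the LRU list.

```python
from collections import OrderedDict


class DeltaBaseCache:
    """Illustrative sketch only; names and structure are hypothetical."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes  # plays the role of core.deltaBaseCacheLimit
        self.used_bytes = 0
        # Maps (packfile, offset) -> object data, ordered oldest-used first.
        self.entries = OrderedDict()

    def get(self, packfile, offset):
        """Return cached data, marking the entry as most recently used."""
        key = (packfile, offset)
        data = self.entries.get(key)
        if data is not None:
            self.entries.move_to_end(key)
        return data

    def add(self, packfile, offset, data):
        """Insert an entry, then evict from the LRU end until under the limit."""
        key = (packfile, offset)
        old = self.entries.pop(key, None)
        if old is not None:
            self.used_bytes -= len(old)
        self.entries[key] = data
        self.used_bytes += len(data)
        # Single LRU walk: no separate pass for blobs versus trees.
        while self.used_bytes > self.limit_bytes and self.entries:
            _, evicted = self.entries.popitem(last=False)
            self.used_bytes -= len(evicted)


cache = DeltaBaseCache(limit_bytes=10)
cache.add("pack-a", 0, b"tree")    # 4 bytes used
cache.add("pack-a", 12, b"blob")   # 8 bytes used
cache.get("pack-a", 0)             # touch the first entry; "blob" is now oldest
cache.add("pack-a", 40, b"more")   # 12 bytes exceeds the limit, so "blob" is evicted
```

Because nothing is ever displaced by a hash collision, the only tunable left in this sketch is the byte limit, which matches the message's point that the generic hashmap removes one tweakable number.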
