1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-11-01 06:47:52 +01:00
Commit graph

44183 commits

Author SHA1 Message Date
Junio C Hamano
00d27937bf Merge branch 'jh/status-v2-porcelain'
Enhance "git status --porcelain" output by collecting more data on
the state of the index and the working tree files, which may
further be used to teach git-prompt (in contrib/) to make fewer
calls to git.

* jh/status-v2-porcelain:
  status: unit tests for --porcelain=v2
  test-lib-functions.sh: add lf_to_nul helper
  git-status.txt: describe --porcelain=v2 format
  status: print branch info with --porcelain=v2 --branch
  status: print per-file porcelain v2 status data
  status: collect per-file data for --porcelain=v2
  status: support --porcelain[=<version>]
  status: cleanup API to wt_status_print
  status: rename long-format print routines
2016-09-08 21:49:50 -07:00
Junio C Hamano
d0b61dc65f Merge branch 'po/range-doc'
Clarify various ways to specify the "revision ranges" in the
documentation.

* po/range-doc:
  doc: revisions: sort examples and fix alignment of the unchanged
  doc: revisions: show revision expansion in examples
  doc: revisions - clarify reachability examples
  doc: revisions - define `reachable`
  doc: gitrevisions - clarify 'latter case' is revision walk
  doc: gitrevisions - use 'reachable' in page description
  doc: revisions: single vs multi-parent notation comparison
  doc: revisions: extra clarification of <rev>^! notation effects
  doc: revisions: give headings for the two and three dot notations
  doc: show the actual left, right, and boundary marks
  doc: revisions - name the left and right sides
  doc: use 'symmetric difference' consistently
2016-09-08 21:49:49 -07:00
Junio C Hamano
d7ed183a91 Merge branch 'rt/help-unknown'
"git nosuchcommand --help" said "No manual entry for gitnosuchcommand",
which was not intuitive, given that "git nosuchcommand" said "git:
'nosuchcommand' is not a git command".

* rt/help-unknown:
  help: make option --help open man pages only for Git commands
  help: introduce option --exclude-guides
2016-09-08 21:49:48 -07:00
Junio C Hamano
da3b6f06e1 Merge branch 'cc/receive-pack-limit'
An incoming "git push" that attempts to push too many bytes can now
be rejected by setting a new configuration variable at the receiving
end.

* cc/receive-pack-limit:
  receive-pack: allow a maximum input size to be specified
  unpack-objects: add --max-input-size=<size> option
  index-pack: add --max-input-size=<size> option
2016-09-08 21:49:47 -07:00
Junio C Hamano
452a9073ba Merge branch 'jk/format-patch-number-singleton-patch-with-cover'
"git format-patch --cover-letter HEAD^" to format a single patch
with a separate cover letter now numbers the output as [PATCH 0/1]
and [PATCH 1/1] by default.

* jk/format-patch-number-singleton-patch-with-cover:
  format-patch: show 0/1 and 1/1 for singleton patch with cover letter
2016-09-08 21:49:47 -07:00
Junio C Hamano
c4071eace9 Merge branch 'jk/delta-base-cache'
The delta-base-cache mechanism has been a key to the performance in
a repository with a tightly packed packfile, but it did not scale
well even with a larger value of core.deltaBaseCacheLimit.

* jk/delta-base-cache:
  t/perf: add basic perf tests for delta base cache
  delta_base_cache: use hashmap.h
  delta_base_cache: drop special treatment of blobs
  delta_base_cache: use list.h for LRU
  release_delta_base_cache: reuse existing detach function
  clear_delta_base_cache_entry: use a more descriptive name
  cache_or_unpack_entry: drop keep_cache parameter
2016-09-08 21:49:46 -07:00
Junio C Hamano
6ebdac1bab Git 2.10
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-02 09:05:47 -07:00
Junio C Hamano
dd39dfcf8a l10n-2.10.0-rnd2.2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXyYCAAAoJEMek6Rt1RHoouFkP/0n836xBjmhMQAd7ksJ6UrVO
 YoWMq1Qa7Ox4FLSgpmxS+Q80/vgWgI4ESN9kAoyfR4axbhaK8WzRTkR8F5LA1jmf
 0k2EXXNrJqf1LTTztx+fEleSMptMlVun0NJnHT7w0dQcXoeH9dHDbO/p7WZKCxXO
 qQ7KkmHMVW1EnO+QkUtujehfI1oUggc3Crc1pxSG1BvCyodtXYIjUD6wUN00yyeI
 PFzsPyLjk1uVUQESiiSMwr2kj5EqEKx/S8g0I+kLxiEnVCww8Or5a5TYPKG1IDiR
 jnlmCQF8Y3hoNdOyHHV2xaSPIA6OsoJcvYzmMOIDcp2IbGFnGc5ceaTQs5SlS3wP
 GTjwaA0ttYB1JOYjvojlsUIEOYNopaZvphws02iYv10kIL1gkbaBLlXj+roTJ0wP
 6huthQhjYpVu11iCBnRH8/dXwbIs2h86V5l9e5Yj/OVyK9R08LVnV0RaEWrPnwgd
 FdGC1JdOgczIE6tBMoJSRtIf1pbQiEsh0wgj+Vh0bPqB6nJJSCa605Lamhm61FyU
 eoH6pEG+14CmPMpbP8jkctj17FlIQcYaR6LyR/qBLgcYkt9EpxTH1n9RTv+C0bRx
 yR2w/uCwrhSBFmaTQW2cPKrHYdGpyBdvLXUiT1ewCEJgffFun5IT+r6Qhg7B1yXb
 AbAJfibXGeCVgZuRKIQC
 =e4TB
 -----END PGP SIGNATURE-----

Merge tag 'l10n-2.10.0-rnd2.2' of git://github.com/git-l10n/git-po

l10n-2.10.0-rnd2.2

* tag 'l10n-2.10.0-rnd2.2' of git://github.com/git-l10n/git-po:
  l10n: Updated Vietnamese translation for v2.10.0-rc2 (2757t)
2016-09-02 08:48:14 -07:00
Jiang Xin
e8e349249c Merge branch 'master' of https://github.com/vnwildman/git
* 'master' of https://github.com/vnwildman/git:
  l10n: Updated Vietnamese translation for v2.10.0-rc2 (2757t)
2016-09-02 21:29:48 +08:00
Junio C Hamano
5b18e70009 A few more fixes before the final 2.10
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 10:21:05 -07:00
Junio C Hamano
934b1caa7a l10n-2.10.0-rnd2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXxuOXAAoJEMek6Rt1RHooEa8QAKO9TwEgUu3RcjoEhoHuncWw
 450/uVcz4rhCZsupzBahcSUI3qcE+GuunezASWQ0IkEXGogLC1CwnBkYzEUYuOvU
 0BH/gao4FfIZmBddCBn/6tUZJG+wTgCsD2ept6oR0RgPo85bHdN3K12S4FTHVmGj
 AK894O60ZJJZuUelbErOzHh52Pgad5BEJ9V6Bt5hBPfWLV/wPvrWm8efk4T88odw
 ErOVUv49gSUqvwA+ot8sq7bcPVhH06CTvw15iWTPmD8hLYdQDTYKL1+PkGJa0uXR
 Qoe4vLhUV8Rz0RGi9aWnlOzAYgIQ9FzkLwDfqz4KCyV1qa5EqorfjwkK1vOug9VY
 Qv5p0fWLIKqCGEyJjy1y87xMi/GgydRI8mZYAaRhHgCc4qMI/E7n4I4+vc6/9i/0
 C7tRW8ylb7UUPENdvZvcOxWgP9y/VMiw/I8wyv3wuzifM5DRxCUCfwMJDQdIzBVI
 T+sPO8cjYjKPATieZ563LBFtmiVoms4U1DDDqH65SzZtsOa2GOruAW3eIFwFmQHE
 hYhyZOFpDhg20EgSKO9owzA5IjtiuplPYJgQmUyiVEeOfcr/Gw+a3CzgCEDNJ8EF
 orjSHIwO0N7BmHDMJeJlGyCuhgG6JLdZDI9d/AiaPFuGiNnnINH5odSz8M5tE1F4
 erUKZgZDpifOYHsSurcF
 =lwbs
 -----END PGP SIGNATURE-----

Merge tag 'l10n-2.10.0-rnd2' of git://github.com/git-l10n/git-po

l10n-2.10.0-rnd2

* tag 'l10n-2.10.0-rnd2' of git://github.com/git-l10n/git-po:
  l10n: zh_CN: for git v2.10.0 l10n round 2
  l10n: ca.po: update translation
  l10n: fr.po v2.10.0-rc2
  l10n: sv.po: Update Swedish translation (2757t0f0u)
  l10n: git.pot: v2.10.0 round 2 (12 new, 44 removed)
  l10n: Updated Vietnamese translation for v2.10.0 (2789t)
  l10n: pt_PT: update Portuguese translation
  l10n: pt_PT: merge git.pot
  l10n: ko.po: Update Korean translation
  l10n: git.pot: v2.10.0 round 1 (248 new, 56 removed)
2016-08-31 10:04:14 -07:00
Junio C Hamano
58e72a2179 Merge branch 'ls/packet-line-protocol-doc-fix'
Correct an age-old calco (is that a typo-like word for calc)
in the documentation.

* ls/packet-line-protocol-doc-fix:
  pack-protocol: fix maximum pkt-line size
2016-08-31 10:03:51 -07:00
Junio C Hamano
4762bf36d9 Merge branch 'mh/blame-worktree'
* mh/blame-worktree:
  blame: fix segfault on untracked files
2016-08-31 10:03:50 -07:00
Junio C Hamano
9010077be2 Merge branch 'kw/patch-ids-optim'
* kw/patch-ids-optim:
  p3400: make test script executable
2016-08-31 10:03:49 -07:00
Ralf Thielow
2c6b6d9f7d help: make option --help open man pages only for Git commands
If option --help is passed to a Git command, we try to open
the man page of that command.  However, we do it for both commands
and concepts.  Make sure it is an actual command.

This makes "git <concept> --help" not working anymore, while
"git help <concept>" still works.

Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-30 16:09:41 -07:00
Ralf Thielow
af74128f4a help: introduce option --exclude-guides
Introduce option --exclude-guides to the help command.  With this option
being passed, "git help" will open man pages only for actual commands.

Since we know it is a command, we can use function help_unknown_command
to give the user advice on typos.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-30 16:09:41 -07:00
Lars Schneider
7841c4801c pack-protocol: fix maximum pkt-line size
According to LARGE_PACKET_MAX in pkt-line.h the maximal length of a
pkt-line packet is 65520 bytes. The pkt-line header takes 4 bytes and
therefore the pkt-line data component must not exceed 65516 bytes.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-30 11:00:29 -07:00
Jiang Xin
5c57d7622e l10n: zh_CN: for git v2.10.0 l10n round 2
Update 215 translations (2757t0f0u) for git v2.10.0-rc2.

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
2016-08-31 00:11:13 +08:00
René Scharfe
ba67504fa8 p3400: make test script executable
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-29 12:57:16 -07:00
Thomas Gummerer
bc6b13a7d2 blame: fix segfault on untracked files
Since 3b75ee9 ("blame: allow to blame paths freshly added to the index",
2016-07-16) git blame also looks at the index to determine if there is a
file that was freshly added to the index.

cache_name_pos returns -pos - 1 in case there is no match is found, or
if the name matches, but the entry has a stage other than 0.  As git
blame should work for unmerged files, it uses strcmp to determine
whether the name of the returned position matches, in which case the
file exists, but is merely unmerged, or if the file actually doesn't
exist in the index.

If the repository is empty, or if the file would lexicographically be
sorted as the last file in the repository, -cache_name_pos - 1 is
outside of the length of the active_cache array, causing git blame to
segfault.  Guard against that, and die() normally to restore the old
behaviour.

Reported-by: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-29 11:57:33 -07:00
Alex Henrie
63b8265402 l10n: ca.po: update translation
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
2016-08-28 10:32:56 -06:00
Jean-Noel Avila
b67e63067d l10n: fr.po v2.10.0-rc2
Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
2016-08-28 11:36:14 +02:00
Tran Ngoc Quan
800d88e2b3 l10n: Updated Vietnamese translation for v2.10.0-rc2 (2757t)
Signed-off-by: Tran Ngoc Quan <vnwildman@gmail.com>
2016-08-28 07:23:30 +07:00
Peter Krefting
8ed2d3fb15 l10n: sv.po: Update Swedish translation (2757t0f0u)
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
2016-08-27 20:42:50 +01:00
Jiang Xin
b30eec1a69 Merge branch 'master' of https://github.com/vnwildman/git
* 'master' of https://github.com/vnwildman/git:
  l10n: Updated Vietnamese translation for v2.10.0 (2789t)
2016-08-27 23:36:16 +08:00
Jiang Xin
5bd166d8af l10n: git.pot: v2.10.0 round 2 (12 new, 44 removed)
Generate po/git.pot from v2.10.0-rc2 for git v2.10.0 l10n round 2.

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
2016-08-27 23:23:26 +08:00
Jiang Xin
fe1280decc Merge branch 'master' of git://github.com/git-l10n/git-po
* 'master' of git://github.com/git-l10n/git-po:
  l10n: pt_PT: update Portuguese translation
  l10n: pt_PT: merge git.pot
  l10n: ko.po: Update Korean translation
  l10n: git.pot: v2.10.0 round 1 (248 new, 56 removed)
2016-08-27 23:14:27 +08:00
Tran Ngoc Quan
b9252573c4 l10n: Updated Vietnamese translation for v2.10.0 (2789t)
Signed-off-by: Tran Ngoc Quan <vnwildman@gmail.com>
2016-08-27 09:15:28 +07:00
Junio C Hamano
d5cb9cbd64 Git 2.10-rc2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-26 13:59:20 -07:00
Torsten Bögershausen
e28eae3184 gitattributes: Document the unified "auto" handling
Update the documentation about text=auto:
text=auto now follows the core.autocrlf handling when files are not
normalized in the repository.

For a cross platform project recommend the usage of attributes for
line-ending conversions.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-26 13:54:16 -07:00
Junio C Hamano
3b1c6a9b6e Merge branch 'js/no-html-bypass-on-windows' into rt/help-unknown
* js/no-html-bypass-on-windows:
  Revert "display HTML in default browser using Windows' shell API"
2016-08-26 11:29:07 -07:00
Junio C Hamano
5cb0d5ad05 Prepare for 2.10.0-rc2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-25 13:56:51 -07:00
Junio C Hamano
0fd6c99bdf Merge branch 'ja/i18n'
The recent i18n patch we added during this cycle did a bit too much
refactoring of the messages to avoid word-legos; the repetition has
been reduced to help translators.

* ja/i18n:
  i18n: simplify numeric error reporting
  i18n: fix git rebase interactive commit messages
  i18n: fix typos for translation
2016-08-25 13:55:07 -07:00
Junio C Hamano
3dc01702df Merge branch 'bw/mingw-avoid-inheriting-fd-to-lockfile'
The tempfile (hence its user lockfile) API lets the caller to open
a file descriptor to a temporary file, write into it and then
finalize it by first closing the filehandle and then either
removing or renaming the temporary file.  When the process spawns a
subprocess after obtaining the file descriptor, and if the
subprocess has not exited when the attempt to remove or rename is
made, the last step fails on Windows, because the subprocess has
the file descriptor still open.  Open tempfile with O_CLOEXEC flag
to avoid this (on Windows, this is mapped to O_NOINHERIT).

* bw/mingw-avoid-inheriting-fd-to-lockfile:
  mingw: ensure temporary file handles are not inherited by child processes
  t6026-merge-attr: child processes must not inherit index.lock handles
2016-08-25 13:55:07 -07:00
Junio C Hamano
a8998453be Merge branch 'dg/document-git-c-in-git-config-doc'
The "git -c var[=val] cmd" facility to append a configuration
variable definition at the end of the search order was described in
git(1) manual page, but not in git-config(1), which was more likely
place for people to look for when they ask "can I make a one-shot
override, and if so how?"

* dg/document-git-c-in-git-config-doc:
  doc: mention `git -c` in git-config(1)
2016-08-25 13:55:07 -07:00
Junio C Hamano
13e11ff707 Merge branch 'js/no-html-bypass-on-windows'
On Windows, help.browser configuration variable used to be ignored,
which has been corrected.

* js/no-html-bypass-on-windows:
  Revert "display HTML in default browser using Windows' shell API"
2016-08-25 13:55:06 -07:00
Junio C Hamano
a1f0b4e286 Merge branch 'hv/doc-commit-reference-style'
A small doc update.

* hv/doc-commit-reference-style:
  SubmittingPatches: document how to reference previous commits
2016-08-25 13:55:06 -07:00
Torsten Bögershausen
41a616dada git ls-files: text=auto eol=lf is supported in Git 2.10
The man page for `git ls-files --eol` mentions the combination
of text attributes "text=auto eol=lf" or "text=auto eol=crlf" as not
supported yet, but may be in the future.

Now they are supported.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-25 13:38:18 -07:00
Vasco Almeida
9d83143621 l10n: pt_PT: update Portuguese translation
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
2016-08-25 13:33:17 +00:00
Vasco Almeida
587dae416d l10n: pt_PT: merge git.pot
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
2016-08-25 13:33:17 +00:00
Jeff King
c08db5a2d0 receive-pack: allow a maximum input size to be specified
Receive-pack feeds its input to either index-pack or
unpack-objects, which will happily accept as many bytes as
a sender is willing to provide. Let's allow an arbitrary
cutoff point where we will stop writing bytes to disk.

Cleaning up what has already been written to disk is a
related problem that is not addressed by this patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 12:31:05 -07:00
Christian Couder
5ad2186733 unpack-objects: add --max-input-size=<size> option
When receiving a pack-file, it can be useful to abort the
`git unpack-objects`, if the pack-file is too big.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 12:31:05 -07:00
Jeff King
411481be6f index-pack: add --max-input-size=<size> option
When receiving a pack-file, it can be useful to abort the
`git index-pack`, if the pack-file is too big.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 12:31:05 -07:00
Jean-Noel Avila
078fe30523 i18n: simplify numeric error reporting
Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 08:47:20 -07:00
Jean-Noel Avila
8aa6dc1d9e i18n: fix git rebase interactive commit messages
For proper i18n, the logic cannot embed english specific processing.

Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 08:43:27 -07:00
Jean-Noel Avila
cd3e4677cf i18n: fix typos for translation
Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 08:41:22 -07:00
Jacob Keller
957ed3a56c format-patch: show 0/1 and 1/1 for singleton patch with cover letter
Change the default behavior of git-format-patch to generate numbered
sequence of 0/1 and 1/1 when generating both a cover-letter and a single
patch. This standardizes the cover letter to have 0/N which helps
distinguish the cover letter from the patch itself. Since the behavior
is easily changed via configuration as well as the use of -n and -N this
should be acceptable default behavior.

Add tests for the new default behavior.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-23 15:59:11 -07:00
Jeff King
c7df68cbca t/perf: add basic perf tests for delta base cache
This just shows off the improvements done by the last few
patches, and gives us a baseline for noticing regressions in
the future. Here are the results with linux.git as the perf
"large repo":

Test                origin                HEAD
-------------------------------------------------------------------
0003.1: log --raw   43.41(40.36+2.69)     33.86(30.96+2.41) -22.0%
0003.2: log -S      313.61(309.74+3.78)   298.75(295.58+3.00) -4.7%

(for a large repo, the "log -S" improvements are greater if
you bump the delta base cache limit, but I think it makes
sense to test the "stock" behavior, since that is what most
people will see).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-23 15:26:16 -07:00
Jeff King
8261e1f139 delta_base_cache: use hashmap.h
The fundamental data structure of the delta base cache is a
hash table mapping pairs of "(packfile, offset)" into
structs containing the actual object data. The hash table
implementation dates back to e5e0161 (Implement a simple
delta_base cache, 2007-03-17), and uses a fixed-size table.
The current size is a hard-coded 256 entries.

Because we need to be able to remove objects from the hash
table, entry lookup does not do any kind of probing to
handle collisions. Colliding items simply replace whatever
is in their slot.  As a result, we have fewer usable slots
than even the 256 we allocate. At half full, each new item
has a 50% chance of displacing another one. Or another way
to think about it: every item has a 1/256 chance of being
ejected due to hash collision, without regard to our LRU
strategy.

So it would be interesting to see the effect of increasing
the cache size on the runtime for some common operations. As
with the previous patch, we'll measure "git log --raw" for
tree-only operations, and "git log -Sfoo --raw" for
operations that touch trees and blobs. All times are
wall-clock best-of-3, done against fully packed repos with
--depth=50, and the default core.deltaBaseCacheLimit of
96MB.

Here are timings for various values of MAX_DELTA_CACHE
against git.git (the asterisk marks the minimum time for
each operation):

    MAX_DELTA_CACHE    log-raw      log-S
    ---------------   ---------   ---------
                256   0m02.227s   0m12.821s
                512   0m02.143s   0m10.602s
               1024   0m02.127s   0m08.642s
               2048   0m02.148s   0m07.123s
               4096   0m02.194s   0m06.448s*
               8192   0m02.239s   0m06.504s
              16384   0m02.144s*  0m06.502s
              32768   0m02.202s   0m06.622s
              65536   0m02.230s   0m06.677s

The log-raw case isn't changed much at all here (probably
because our trees just aren't that big in the first place,
or possibly because we have so _few_ trees in git.git that
the 256-entry cache is enough). But once we start putting
blobs in the cache, too, we see a big improvement (almost
50%). The curve levels off around 4096, which means that we
can hold about that many entries before hitting the 96MB
memory limit (or possibly that the workload is small enough
that there is simply no more work to be optimized out by
caching more).

(As a side note, I initially timed my existing git.git pack,
which was a base of --aggressive combined with some pulls on
top. So it had quite a few deeper delta chains. The
256-cache case was more like 15s, and it still dropped to
~6.5s in the same way).

Here are the timings for linux.git:

    MAX_DELTA_CACHE    log-raw      log-S
    ---------------   ---------   ---------
                256   0m41.661s   5m12.410s
                512   0m39.547s   5m07.920s
               1024   0m37.054s   4m54.666s
               2048   0m35.871s   4m41.194s*
               4096   0m34.646s   4m51.648s
               8192   0m33.881s   4m55.342s
              16384   0m35.190s   5m00.122s
              32768   0m35.060s   4m58.851s
              65536   0m33.311s*  4m51.420s

As we grow we see a nice 20% speedup in the tree traversal,
and more modest 10% in the log-S. This is probably an
indication that we are bound less by the number of entries,
and more by the memory limit (more on that below). What is
interesting is that the numbers bounce around a bit;
increasing the number of entries isn't always a strict
improvement.

Partially this is due to noise in the measurement. But it
may also be an indication that our LRU ejection scheme is
not optimal. The smaller cache sizes introduce some
randomness into the ejection (due to collisions), which may
sometimes work in our favor (and sometimes not!).

So what is the optimal setting of MAX_DELTA_CACHE? The
"bouncing" in the linux.git log-S numbers notwithstanding,
it mostly seems like bigger is better. And even if we were
to try to find a "sweet spot", these are just two
repositories, that are not necessarily representative. The
shape of history, the size of trees and blobs, the memory
limit configuration, etc, all will affect the outcome.

Rather than trying to find the "right" number, another
strategy is to just switch to a hash table that can actually
store collisions: namely our hashmap.h implementation.

Here are numbers for that compared to the "best" we saw from
adjusting MAX_DELTA_CACHE:

        |       log-raw        |       log-S
        | best       hashmap   | best       hashmap
	| ---------  --------- | ---------  ---------
  git   | 0m02.144s  0m02.144s | 0m06.448s  0m06.688s
  linux | 0m33.311s  0m33.092s | 4m41.194s  4m57.172s

We can see the results are similar in most cases, which is
what we'd expect. We're not ejecting due to collisions at
all, so this is purely representing the LRU. So really, we'd
expect this to model most closely the larger values of the
static MAX_DELTA_CACHE limit. And that does seem to be
what's happening, including the "bounce" in the linux log-S
case.

So while the value for that case _isn't_ as good as the
optimal one measured above (which was 2048 entries), given
the bouncing I'm hesitant to suggest that 2048 is any kind
of optimum (not even for linux.git, let alone as a general
rule). The generic hashmap has the appeal that it drops the
number of tweakable numbers by one, which means we can focus
on tuning other elements, like the LRU strategy or the
core.deltaBaseCacheLimit setting.

And indeed, if we bump the cache limit to 1G (which is
probably silly for general use, but maybe something people
with big workstations would want to do), the linux.git log-S
time drops to 3m32s. That's something you really _can't_ do
easily with the static hash table, because the number of
entries needs to grow in proportion to the memory limit (so
2048 is almost certainly not going to be the right value
there).

This patch takes that direction, and drops the static hash
table entirely in favor of using the hashmap.h API.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-23 15:18:50 -07:00
Jeff King
6d9617f4f7 delta_base_cache: drop special treatment of blobs
When the delta base cache runs out of allowed memory, it has
to drop entries. It does so by walking an LRU list, dropping
objects until we are under the memory limit. But we actually
walk the list twice: once to drop blobs, and then again to
drop other objects (which are generally trees). This comes
from 18bdec1 (Limit the size of the new delta_base_cache,
2007-03-19).

This performs poorly as the number of entries grows, because
any time dropping blobs does not satisfy the limit, we have
to walk the _entire_ list, trees included, looking for blobs
to drop, before starting to drop any trees.

It's not generally a problem now, as the cache is limited to
only 256 entries. But as we could benefit from increasing
that in a future patch, it's worth looking at how it
performs as the cache size grows. And the answer is "not
well".

The table below shows times for various operations with
different values of MAX_DELTA_CACHE (which is not a run-time
knob; I recompiled with -DMAX_DELTA_CACHE=$n for each).

I chose "git log --raw" ("log-raw" in the table) because it
will access all of the trees, but no blobs at all (so in a
sense it is a worst case for this problem, because we will
always walk over the entire list of trees once before
realizing there are no blobs to drop). This is also
representative of other tree-only operations like "rev-list
--objects" and "git log -- <path>".

I also timed "git log -Sfoo --raw" ("log-S" in the table).
It similarly accesses all of the trees, but also the blobs
for each commit. It's representative of "git log -p", though
it emphasizes the cost of blob access more, as "-S" is
cheaper than computing an actual blob diff.

All timings are best-of-3 wall-clock times (though they all
were CPU bound, so the user CPU times are similar). The
repositories were fully packed with --depth=50, and the
default core.deltaBaseCacheLimit of 96M was in effect.  The
current value of MAX_DELTA_CACHE is 256, so I started there
and worked up by factors of 2.

First, here are values for git.git (the asterisk signals the
fastest run for each operation):

    MAX_DELTA_CACHE    log-raw       log-S
    ---------------   ---------    ---------
                256   0m02.212s    0m12.634s
                512   0m02.136s*   0m10.614s
               1024   0m02.156s    0m08.614s
               2048   0m02.208s    0m07.062s
               4096   0m02.190s    0m06.484s*
               8192   0m02.176s    0m07.635s
              16384   0m02.913s    0m19.845s
              32768   0m03.617s    1m05.507s
              65536   0m04.031s    1m18.488s

You can see that for the tree-only log-raw case, we don't
actually benefit that much as the cache grows (all the
differences up through 8192 are basically just noise; this
is probably because we don't actually have that many
distinct trees in git.git). But for log-S, we get a definite
speed improvement as the cache grows, but the improvements
are lost as cache size grows and the linear LRU management
starts to dominate.

Here's the same thing run against linux.git:

    MAX_DELTA_CACHE    log-raw       log-S
    ---------------   ---------    ----------
                256   0m40.987s     5m13.216s
                512   0m37.949s     5m03.243s
               1024   0m35.977s     4m50.580s
               2048   0m33.855s     4m39.818s
               4096   0m32.913s     4m47.299s*
               8192   0m32.176s*    5m14.650s
              16384   0m32.185s     6m31.625s
              32768   0m38.056s     9m31.136s
              65536   1m30.518s    17m38.549s

The pattern is similar, though the effect in log-raw is more
pronounced here. The times dip down in the middle, and then
go back up as we keep growing.

So we know there's a problem. What's the solution?

The obvious one is to improve the data structure to avoid
walking over tree entries during the looking-for-blobs
traversal. We can do this by keeping _two_ LRU lists: one
for blobs, and one for other objects. We drop items from the
blob LRU first, and then from the tree LRU (if necessary).

Here's git.git using that strategy:

    MAX_DELTA_CACHE    log-raw      log-S
    ---------------   ---------   ----------
                256   0m02.264s   0m12.830s
                512   0m02.201s   0m10.771s
               1024   0m02.181s   0m08.593s
               2048   0m02.205s   0m07.116s
               4096   0m02.158s   0m06.537s*
               8192   0m02.213s   0m07.246s
              16384   0m02.155s*  0m10.975s
              32768   0m02.159s   0m16.047s
              65536   0m02.181s   0m16.992s

The upswing on log-raw is gone completely. But log-S still
has it (albeit much better than without this strategy).
Let's see what linux.git shows:

    MAX_DELTA_CACHE    log-raw       log-S
    ---------------   ---------    ---------
                256   0m42.519s    5m14.654s
                512   0m39.106s    5m04.708s
               1024   0m36.802s    4m51.454s
               2048   0m34.685s    4m39.378s*
               4096   0m33.663s    4m44.047s
               8192   0m33.157s    4m50.644s
              16384   0m33.090s*   4m49.648s
              32768   0m33.458s    4m53.371s
              65536   0m33.563s    5m04.580s

The results are similar. The tree-only case again performs
well (not surprising; we're literally just dropping the one
useless walk, and not otherwise changing the cache eviction
strategy at all). But the log-S case again does a bit worse
as the cache grows (though possibly that's within the noise,
which is much larger for this case).

Perhaps this is an indication that the "remove blobs first"
strategy is not actually optimal. The intent of it is to
avoid blowing out the tree cache when we see large blobs,
but it also means we'll throw away useful, recent blobs in
favor of older trees.

Let's run the same numbers without caring about object type
at all (i.e., one LRU list, and always evicting whatever is
at the head, regardless of type).

Here's git.git:

    MAX_DELTA_CACHE    log-raw      log-S
    ---------------   ---------   ---------
                256   0m02.227s   0m12.821s
                512   0m02.143s   0m10.602s
               1024   0m02.127s   0m08.642s
               2048   0m02.148s   0m07.123s
               4096   0m02.194s   0m06.448s*
               8192   0m02.239s   0m06.504s
              16384   0m02.144s*  0m06.502s
              32768   0m02.202s   0m06.622s
              65536   0m02.230s   0m06.677s

Much smoother; there's no dramatic upswing as we increase
the cache size (some remains, though it's small enough that
it's mostly run-to-run noise. E.g., in the log-raw case,
note how 8192 is 50-100ms higher than its neighbors). Note
also that we stop getting any real benefit for log-S after
about 4096 entries; that number will depend on the size of
the repository, the size of the blob entries, and the memory
limit of the cache.

Let's see what linux.git shows for the same strategy:

    MAX_DELTA_CACHE    log-raw      log-S
    ---------------   ---------   ---------
                256   0m41.661s   5m12.410s
                512   0m39.547s   5m07.920s
               1024   0m37.054s   4m54.666s
               2048   0m35.871s   4m41.194s*
               4096   0m34.646s   4m51.648s
               8192   0m33.881s   4m55.342s
              16384   0m35.190s   5m00.122s
              32768   0m35.060s   4m58.851s
              65536   0m33.311s*  4m51.420s

It's similarly good. As with the "separate blob LRU"
strategy, there's a lot of noise on the log-S run here. But
it's certainly not any worse, is possibly a bit better, and
the improvement over "separate blob LRU" on the git.git case
is dramatic.

So it seems like a clear winner, and that's what this patch
implements.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-23 14:57:44 -07:00