One of the indirect callers of make_virtual_commit() passes the result of
oid_to_hex() as the name, i.e. a pointer to a static buffer. Since the
function uses that string pointer directly in building a struct
merge_remote_desc, multiple entries can end up sharing the same name
inadvertently.
Fix that by calling set_merge_remote_desc(), which creates a copy of the
string, instead of building the struct by hand.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Export a helper function for allocating, populating and attaching a
merge_remote_desc to a commit.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation errors for the name member just like we already do
for the struct merge_remote_desc itself.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
handle_message_id() duplicates the contents of the strbuf that is passed
to it. Its only caller proceeds to release the strbuf immediately after
that. Reuse it instead and make that change of object ownership more
obvious by inlining this short function.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This section is about "The FLEXPTR_* variants", so use FLEXPTR_ALLOC_STR
in the example.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit adjusted the column alignment for revision
examples which show expansion. Fix the unchanged examples and sort
those that show expansions to the end of the list.
Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The revisions examples show the revison arguments and the selected
commits, but do not show the intermediate step of the expansion of
the special 'range' notations. Extend the examples, including an
all-parents multi-parent merge commit example.
Sort the examples and fix the alignment for those unaffected
in the next commit.
Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For the r1..r2 case, the exclusion of r1, rather than inclusion of r2,
would be the unexpected case in natural language for a simple linear
development, i.e. start..end excludes start.
Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do not self-define `reachable`, which can lead to misunderstanding.
Instead define `reachability` explictly.
Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The prior sentence has too many clauses for easy parsing.
Replace 'the latter case' with a direct quote.
Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The concerned test greps the error message in git_parse_source() which
contains "bad config line %d in submodule-blob %s".
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use test_i18ngrep function instead of grep for grepping strings.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The concerned test greps the output of exit_with_patch() in
git-rebase--interactive.sh script.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While there, also break out the other shorthand notations and
add a title for the revision range summary (which also appears
in git-rev-parse, so keep it mixed case).
We do not quote the notation within the headings as the asciidoc ->
docbook -> groff man viewer toolchain, particularly the docbook-groff
step, does not cope with two font changes, failing to return the heading
font to bold after the quotation of the notation.
Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git rebase" tries to compare set of changes on the updated
upstream and our own branch, it computes patch-id for all of these
changes and attempts to find matches. This has been optimized by
lazily computing the full patch-id (which is expensive) to be
compared only for changes that touch the same set of paths.
* kw/patch-ids-optim:
rebase: avoid computing unnecessary patch IDs
patch-ids: add flag to create the diff patch id using header only data
patch-ids: replace the seen indicator with a commit pointer
patch-ids: stop using a hand-rolled hashmap implementation
The http-backend (the server-side component of smart-http
transport) used to trickle the HTTP header one at a time. Now
these write(2)s are batched.
* ew/http-backend-batch-headers:
http-backend: buffer headers before sending
"git mv dir non-existing-dir/" did not work in some environments
the same way as existing mainstream platforms. The code now moves
"dir" to "non-existing-dir", without relying on rename("A", "B/")
that strips the trailing slash of '/'.
* js/mv-dir-to-new-directory:
git mv: do not keep slash in `git mv dir non-existing-dir/`
Various small fixups to the "GIT_TRACE" facility.
* jk/trace-fixup:
trace: do not fall back to stderr
write_or_die: drop write_or_whine_pipe()
trace: disable key after write error
trace: correct variable name in write() error message
trace: cosmetic fixes for error messages
trace: use warning() for printing trace errors
trace: stop using write_or_whine_pipe()
trace: handle NULL argument in trace_disable()
"import-tars" fast-import script (in contrib/) used to ignore a
hardlink target and replaced it with an empty file, which has been
corrected to record the same blob as the other file the hardlink is
shared with.
* js/import-tars-hardlinks:
import-tars: support hard links
"git difftool <paths>..." started in a subdirectory failed to
interpret the paths relative to that directory, which has been
fixed.
* jk/difftool-in-subdir:
difftool: use Git::* functions instead of passing around state
difftool: avoid $GIT_DIR and $GIT_WORK_TREE
difftool: fix argument handling in subdirs
Not-so-recent rewrite of "git am" that started making internal
calls into the commit machinery had an unintended regression, in
that no matter how many seconds it took to apply many patches, the
resulting committer timestamp for the resulting commits were all
the same.
* jk/reset-ident-time-per-commit:
am: reset cached ident date for each patch
The `rebase` family of Git commands avoid applying patches that were
already integrated upstream. They do that by using the revision walking
option that computes the patch IDs of the two sides of the rebase
(local-only patches vs upstream-only ones) and skipping those local
patches whose patch ID matches one of the upstream ones.
In many cases, this causes unnecessary churn, as already the set of
paths touched by a given commit would suffice to determine that an
upstream patch has no local equivalent.
This hurts performance in particular when there are a lot of upstream
patches, and/or large ones.
Therefore, let's introduce the concept of a "diff-header-only" patch ID,
compare those first, and only evaluate the "full" patch ID lazily.
Please note that in contrast to the "full" patch IDs, those
"diff-header-only" patch IDs are prone to collide with one another, as
adjacent commits frequently touch the very same files. Hence we now
have to be careful to allow multiple hash entries with the same hash.
We accomplish that by using the hashmap_add() function that does not even
test for hash collisions. This also allows us to evaluate the full patch ID
lazily, i.e. only when we found commits with matching diff-header-only
patch IDs.
We add a performance test that demonstrates ~1-6% improvement. In
practice this will depend on various factors such as how many upstream
changes and how big those changes are along with whether file system
caches are cold or warm. As Git's test suite has no way of catching
performance regressions, we also add a regression test that verifies
that the full patch ID computation is skipped when the diff-header-only
computation suffices.
Signed-off-by: Kevin Willford <kcwillford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit message is long and has lots of background and
numbers. The summary is: the current default of 250 doesn't
save much space, and costs CPU. It's not a good tradeoff.
Read on for details.
The "--aggressive" flag to git-gc does three things:
1. use "-f" to throw out existing deltas and recompute from
scratch
2. use "--window=250" to look harder for deltas
3. use "--depth=250" to make longer delta chains
Items (1) and (2) are good matches for an "aggressive"
repack. They ask the repack to do more computation work in
the hopes of getting a better pack. You pay the costs during
the repack, and other operations see only the benefit.
Item (3) is not so clear. Allowing longer chains means fewer
restrictions on the deltas, which means potentially finding
better ones and saving some space. But it also means that
operations which access the deltas have to follow longer
chains, which affects their performance. So it's a tradeoff,
and it's not clear that the tradeoff is even a good one.
The existing "250" numbers for "--aggressive" come
originally from this thread:
http://public-inbox.org/git/alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org/
where Linus says:
So when I said "--depth=250 --window=250", I chose those
numbers more as an example of extremely aggressive
packing, and I'm not at all sure that the end result is
necessarily wonderfully usable. It's going to save disk
space (and network bandwidth - the delta's will be re-used
for the network protocol too!), but there are definitely
downsides too, and using long delta chains may
simply not be worth it in practice.
There are some numbers in that thread, but they're mostly
focused on the improved window size, and measure the
improvement from --depth=250 and --window=250 together.
E.g.:
http://public-inbox.org/git/9e4733910712062006l651571f3w7f76ce64c6650dff@mail.gmail.com/
talks about the improved run-time of "git-blame", which
comes from the reduced pack size. But most of that reduction
is coming from --window=250, whereas most of the extra costs
come from --depth=250. There's a link in that thread showing
that increasing the depth beyond 50 doesn't seem to help
much with the size:
https://vcscompare.blogspot.com/2008/06/git-repack-parameters.html
but again, no discussion of the timing impact.
In an earlier thread from Ted Ts'o which discussed setting
the non-aggressive default (from 10 to 50):
http://public-inbox.org/git/20070509134958.GA21489%40thunk.org/
we have more numbers, with the conclusion that going past 50
does not help size much, and hurts the speed of normal
operations.
So from that, we might guess that 50 is actually a sweet
spot, even for aggressive, if we interpret aggressive to
"spend time now to make a better pack". It is not clear that
"--depth=250" is actually a better pack. It may be slightly
_smaller_, but it carries a run-time penalty.
Here are some more recent timings I did to verify that. They
show three things:
- the size of the resulting pack (so disk saved to store,
bandwidth saved on clones/fetches)
- the cost of "rev-list --objects --all", which shows the
effect of the delta chains on trees (commits typically
don't delta, and the command doesn't touch the blobs at
all)
- the cost of "log -Sfoo", which will additionally access
each blob
All cases were repacked with "git repack -adf --depth=$d
--window=250" (so basically, what would happen if we tweaked
the "gc --aggressive" default depth).
The timings are all wall-clock best-of-3. The machine itself
has plenty of RAM compared to the repositories (which is
probably typical of most workstations these days), so we're
really measuring CPU usage, as the whole thing will be in
disk cache after the first run.
The core.deltaBaseCacheLimit is at its default of 96MiB.
It's possible that tweaking it would have some impact on the
tests, as some of them (especially "log -S" on a large repo)
are likely to overflow that. But bumping that carries a
run-time memory cost, so for these tests, I focused on what
we could do just with the on-disk pack tradeoffs.
Each test is done for four depths: 250 (the current value),
50 (the current default that tested well previously), 100
(to show something on the larger side, which previous tests
showed was not a good tradeoff), and 10 (the very old
default, which previous tests showed was worse than 50).
Here are the numbers for linux.git:
depth | size | % | rev-list | % | log -Sfoo | %
-------+-------+-------+----------+--------+-----------+-------
250 | 967MB | n/a | 48.159s | n/a | 378.088 | n/a
100 | 971MB | +0.4% | 41.471s | -13.9% | 342.060 | -9.5%
50 | 979MB | +1.2% | 37.778s | -21.6% | 311.040s | -17.7%
10 | 1.1GB | +6.6% | 32.518s | -32.5% | 279.890s | -25.9%
and for git.git:
depth | size | % | rev-list | % | log -Sfoo | %
-------+-------+-------+----------+--------+-----------+-------
250 | 48MB | n/a | 2.215s | n/a | 20.922s | n/a
100 | 49MB | +0.5% | 2.140s | -3.4% | 17.736s | -15.2%
50 | 49MB | +1.7% | 2.099s | -5.2% | 15.418s | -26.3%
10 | 53MB | +9.3% | 2.001s | -9.7% | 12.677s | -39.4%
You can see that that the CPU savings for regular operations improves as we
decrease the depth. The savings are less for "rev-list" on a smaller repository
than they are for blob-accessing operations, or even rev-list on a larger
repository. This may mean that a larger delta cache would help (though setting
core.deltaBaseCacheLimit by itself doesn't).
But we can also see that the space savings are not that great as the depth goes
higher. Saving 5-10% between 10 and 50 is probably worth the CPU tradeoff.
Saving 1% to go from 50 to 100, or another 0.5% to go from 100 to 250 is
probably not.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few updates to "git submodule update".
Use of "| wc -l" break with BSD variant of 'wc'.
* sb/submodule-update-dot-branch:
t7406: fix breakage on OSX
submodule update: allow '.' for branch value
submodule--helper: add remote-branch helper
submodule-config: keep configured branch around
submodule--helper: fix usage string for relative-path
submodule update: narrow scope of local variable
submodule update: respect depth in subsequent fetches
t7406: future proof tests with hard coded depth
"git am -3" calls "git merge-recursive" when it needs to fall back
to a three-way merge; this call has been turned into an internal
subroutine call instead of spawning a separate subprocess.
* js/am-3-merge-recursive-direct:
merge-recursive: flush output buffer even when erroring out
merge_trees(): ensure that the callers release output buffer
merge-recursive: offer an option to retain the output in 'obuf'
merge-recursive: write the commit title in one go
merge-recursive: flush output buffer before printing error messages
am -3: use merge_recursive() directly again
merge-recursive: switch to returning errors instead of dying
merge-recursive: handle return values indicating errors
merge-recursive: allow write_tree_from_memory() to error out
merge-recursive: avoid returning a wholesale struct
merge_recursive: abort properly upon errors
prepare the builtins for a libified merge_recursive()
merge-recursive: clarify code in was_tracked()
die(_("BUG")): avoid translating bug messages
die("bug"): report bugs consistently
t5520: verify that `pull --rebase` shows the helpful advice when failing
"git format-patch" learned format.from configuration variable to
specify the default settings for its "--from" option.
* jt/format-patch-from-config:
format-patch: format.from gives the default for --from
"git push --force-with-lease" already had enough logic to allow
ensuring that such a push results in creation of a ref (i.e. the
receiving end did not have another push from sideways that would be
discarded by our force-pushing), but didn't expose this possibility
to the users. It does so now.
* jk/push-force-with-lease-creation:
t5533: make it pass on case-sensitive filesystems
push: allow pushing new branches with --force-with-lease
push: add shorthand for --force-with-lease branch creation
Documentation/git-push: fix placeholder formatting
Not-so-recent rewrite of "git am" that started making internal
calls into the commit machinery had an unintended regression, in
that no matter how many seconds it took to apply many patches, the
resulting committer timestamp for the resulting commits were all
the same.
* jk/reset-ident-time-per-commit:
am: reset cached ident date for each patch