Current text claims optimization, implying the use of
hardlinks, when this option ratchets down the level of
efficiency. This change explains the difference made by
using this option, namely copying instead of hardlinking,
and why it may be useful.
Signed-off-by: Albert L. Lash, IV <alash3@bloomberg.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All other man files have capitalized descriptions which
immediately follow the command's name. Let's capitalize
this one too for consistency.
Signed-off-by: Albert L. Lash, IV <alash3@bloomberg.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The term mismerges without hyphen is used a few other
places in the documentation. Let's update this to
be consistent.
Signed-off-by: Albert L. Lash, IV <alash3@bloomberg.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`gc --auto` takes time and can block the user temporarily (but not any
less annoyingly). Make it run in background on systems that support
it. The only thing lost with running in background is printouts. But
gc output is not really interesting. You can keep it in foreground by
changing gc.autodetach.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
See 3e63b21 (upload-pack: Implement no-done capability - 2011-03-14)
and 761ecf0 (fetch-pack: Implement no-done capability - 2011-03-14)
for more information.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-protocol.txt explains in detail how multi_ack_detailed works and
what's the difference between no multi_ack, multi_ack and
multi_ack_detailed. No need to repeat here.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's introduced in 1bd8c8f (git-upload-pack: Support the multi_ack
protocol - 2005-10-28) but probably better documented in the commit
message of 78affc4 (Add multi_ack_detailed capability to
fetch-pack/upload-pack - 2009-10-30).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --mixed is used, entries could be removed from index if the
target ref does not have them. When "reset" is used in preparation for
commit spliting (in a dirty worktree), it could be hard to track what
files to be added back. The new option --intent-to-add simplifies it
by marking all removed files intent-to-add.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
We wanted to call the upcoming release "Git 1.9", with its
maintenance track being "Git 1.9.1", "Git 1.9.2", etc., but various
third-party tools are reported to assume that there are at least
three dewey-decimal components in our version number.
Adjust the plan so that vX.Y.0 are feature releases while vX.Y.Z
(Z > 0) are maintenance releases.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation to "git pull" hinted there is an "-m" option
because it incorrectly shared the documentation with "git merge".
* jc/maint-pull-docfix:
Documentation: "git pull" does not have the "-m" option
Documentation: exclude irrelevant options from "git pull"
This goes far back to e84fb2f (branch --contains: default to HEAD -
2008-07-08) where the same parsing code is shared with
builtin/tag.c. git-branch.txt correctly states that <commit> for
--contains is optional while git-tag.txt does not. Correct it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nicolas Vigier <boklm@mars-attacks.org>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nicolas Vigier <boklm@mars-attacks.org>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ta/doc-http-protocol-in-html:
http-protocol.txt: don't use uppercase for variable names in "The Negotiation Algorithm"
Documentation: make it easier to maintain enumerated documents
create HTML for http-protocol.txt
Explicitly list $HOME/.config/git/ignore as one of the places you
can use to keep ignore patterns that depend on your personal choice
of tools, e.g. *~ for Emacs users.
* jn/ignore-doc:
gitignore doc: add global gitignore to synopsis
Instead of starting an enumeration of documents with a DOC = doc1
followed by DOC += doc2, DOC += doc3, ..., empty it with "DOC =" at
the beginning and consistently add them with "DOC += ...".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
./Documentation/technical/http-protocol.txt was missing from TECH_DOCS in Makefile.
Add it and also improve HTML formatting while still retaining good readability of the ASCII text:
- Use monospace font instead of italicized or roman font for machine output and source text
- Use roman font for things which should be body text
- Use double quotes consistently for "want" and "have" commands
- Use uppercase "C" / "S" consistently for "client" / "server";
also use "C:" / "S:" instead of "(C)" / "(S)" for consistency and
to avoid having formatted "(C)" as copyright symbol in HTML
- Use only spaces and not a combination of tabs and spaces for whitespace
Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We decided at 48bb914e (doc: drop author/documentation sections from
most pages, 2011-03-11) to remove "author" and "documentation"
sections from our documentation. Remove a few stragglers.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add cross-references between the manpages for git-for-each-ref(1) and
git-show-ref(1).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Thomas Rast noticed the docs have a mix of styles when
it comes to options with multiple spellings. Standardize
the couple in git-p4.txt that are odd.
Instead of:
-n, --dry-run::
Do this:
-n::
--dry-run::
See
http://thread.gmane.org/gmane.comp.version-control.git/219936/focus=219945
Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The @{-N} syntax always referred to the N-th last thing checked out,
which can be either a branch or a commit (for detached HEAD cases).
However, the documentation only mentioned branches.
Edit in a "/commit" in the appropriate places.
Reported-by: Kevin <ikke@ikke.info>
Signed-off-by: Thomas Rast <tr@thomasrast.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The -L option is the same as for git-log, so the entire block is just
copied from git-log.txt. However, until the parser is fixed we add a
caveat that gitk only understands the stuck form.
Signed-off-by: Thomas Rast <tr@thomasrast.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fetching from a shallow-cloned repository used to be forbidden,
primarily because the codepaths involved were not carefully vetted
and we did not bother supporting such usage. This attempts to allow
object transfer out of a shallow-cloned repository in a controlled
way (i.e. the receiver become a shallow repository with truncated
history).
* nd/shallow-clone: (31 commits)
t5537: fix incorrect expectation in test case 10
shallow: remove unused code
send-pack.c: mark a file-local function static
git-clone.txt: remove shallow clone limitations
prune: clean .git/shallow after pruning objects
clone: use git protocol for cloning shallow repo locally
send-pack: support pushing from a shallow clone via http
receive-pack: support pushing to a shallow clone via http
smart-http: support shallow fetch/clone
remote-curl: pass ref SHA-1 to fetch-pack as well
send-pack: support pushing to a shallow clone
receive-pack: allow pushes that update .git/shallow
connected.c: add new variant that runs with --shallow-file
add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
receive/send-pack: support pushing from a shallow clone
receive-pack: reorder some code in unpack()
fetch: add --update-shallow to accept refs that update .git/shallow
upload-pack: make sure deepening preserves shallow roots
fetch: support fetching from a shallow repository
clone: support remote shallow repository
...
The gitignore(5) manpage already documents $XDG_CONFIG_HOME/git/ignore
but it is easy to forget that it exists. Add a reminder to the
synopsis.
Noticed while looking for a place to put a list of scratch filenames
in the cwd used by one's editor of choice.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a `pull.ff` configuration option that is analogous
to the `merge.ff` option.
This allows us to control the fast-forward behavior for
pull-initiated merges only.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old text made it sound like macros are only allowed in the
.gitattributes file at the top-level of the working tree. Make it
clear that they are also allowed in $GIT_DIR/info/attributes and in
the global and system-wide gitattributes files.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even though "--[no-]edit" can be used with "git pull", the
explanation of the interaction between this option and the "-m"
option does not make sense within the context of "git pull". Use
the conditional inclusion mechanism to remove this part from "git
pull" documentation, while keeping it for "git merge".
Reported-by: Ivan Zakharyaschev
Signed-off-by: Junio C Hamano <gitster@pobox.com>
10eb64f5 (git pull manpage: don't include -n from fetch-options.txt,
2008-01-25) introduced a way to exclude some parts of included
source when building git-pull documentation, and later 409b8d82
(Documentation/git-pull: put verbosity options before merge/fetch
ones, 2010-02-24) attempted to use the mechanism to exclude some
parts of merge-options.txt when used from git-pull.txt.
However, the latter did not have an intended effect, because the
macro "git-pull" used to decide if the source is included in
git-pull documentation were defined a bit too late.
Define the macro before it is used to fix this.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With a submodule that was initialized in an old fashioned way
without gitlinks, switching branches in the superproject between
the one with and without the submodule may leave the submodule
working tree with its embedded repository behind, as there may be
unexpendable state there. Document and warn users about this.
* jl/submodule-mv-checkout-caveat:
rm: better document side effects when removing a submodule
mv: better document side effects when moving a submodule
Just like we give a reasonable default for "less" via the LESS
environment variable, specify a reasonable default for "lv" via the
"LV" environment variable when spawning the pager.
* jn/pager-lv-default-env:
pager: set LV=-c alongside LESS=FRSX
"git help $cmd" unnecessarily enumerated potential command names
from the filesystem, even when $cmd is known to be a built-in.
Ideas for further optimization, primarily by killing the use of
is_in_cmdlist(), were suggested in the discussion, but they can
come as follow-ups on top of this series.
* ss/builtin-cleanup:
builtin/help.c: speed up is_git_command() by checking for builtin commands first
builtin/help.c: call load_command_list() only when it is needed
git.c: consistently use the term "builtin" instead of "internal command"
Update the way the user-manual is formatted via AsciiDoc to save
trees.
* ta/format-user-manual-as-an-article:
user-manual: improve html and pdf formatting
Teach "cat-file --batch" to show delta-base object name for a
packed object that is represented as a delta.
* jk/oi-delta-base:
cat-file: provide %(deltabase) batch format
sha1_object_info_extended: provide delta base sha1s
Allow "git diff -O<file>" to be configured with a new configuration
variable.
* sb/diff-orderfile-config:
diff: add diff.orderfile configuration variable
diff: let "git diff -O" read orderfile from any file and fail properly
t4056: add new tests for "git diff -O"
read_sha1_file() that is the workhorse to read the contents given
an object name honoured object replacements, but there is no
corresponding mechanism to sha1_object_info() that is used to
obtain the metainfo (e.g. type & size) about the object, leading
callers to weird inconsistencies.
* cc/replace-object-info:
replace info: rename 'full' to 'long' and clarify in-code symbols
Documentation/git-replace: describe --format option
builtin/replace: unset read_replace_refs
t6050: add tests for listing with --format
builtin/replace: teach listing using short, medium or full formats
sha1_file: perform object replacement in sha1_object_info_extended()
t6050: show that git cat-file --batch fails with replace objects
sha1_object_info_extended(): add an "unsigned flags" parameter
sha1_file.c: add lookup_replace_object_extended() to pass flags
replace_object: don't check read_replace_refs twice
rename READ_SHA1_FILE_REPLACE flag to LOOKUP_REPLACE_OBJECT
Introduce "negative pathspec" magic, to allow "git log -- . ':!dir'" to
tell us "I am interested in everything but 'dir' directory".
* nd/negative-pathspec:
pathspec.c: support adding prefix magic to a pathspec with mnemonic magic
Support pathspec magic :(exclude) and its short form :!
glossary-content.txt: rephrase magic signature part
The "Submodules" section of the "git rm" documentation mentions what will
happen when a submodule with a gitfile gets removed with newer git. But it
doesn't talk about what happens when the user changes between commits
before and after the removal, which does not remove the submodule from the
work tree like using the rm command did the first time.
Explain what happens and what the user has to do manually to fix that in
the new BUGS section. Also document this behavior in a new test.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "Submodules" section of the "git mv" documentation mentions what will
happen when a submodule with a gitfile gets moved with newer git. But it
doesn't talk about what happens when the user changes between commits
before and after the move, which does not update the work tree like using
the mv command did the first time.
Explain what happens and what the user has to do manually to fix that in
the new BUGS section. Also document this behavior in a new test.
Reported-by: George Papanikolaou <g3orge.app@gmail.com>
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On systems with lv configured as the preferred pager (i.e.,
DEFAULT_PAGER=lv at build time, or PAGER=lv exported in the
environment) git commands that use color show control codes instead of
color in the pager:
$ git diff
^[[1mdiff --git a/.mailfilter b/.mailfilter^[[m
^[[1mindex aa4f0b2..17e113e 100644^[[m
^[[1m--- a/.mailfilter^[[m
^[[1m+++ b/.mailfilter^[[m
^[[36m@@ -1,11 +1,58 @@^[[m
"less" avoids this problem because git uses the LESS environment
variable to pass the -R option ('output ANSI color escapes in raw
form') by default. Use the LV environment variable to pass 'lv' the
-c option ('allow ANSI escape sequences for text decoration / color')
to fix it for lv, too.
Noticed when the default value for color.ui flipped to 'auto' in
v1.8.4-rc0~36^2~1 (2013-06-10).
Reported-by: Olaf Meeuwissen <olaf.meeuwissen@avasys.jp>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use asciidoc style 'article' instead of 'book' and change asciidoc
title level. This removes blank first page and superfluous "Part I"
page (there is no "Part II") in pdf output. Also pdf size is
decreased by this from 77 to 67 pages. In html output this removes
unnecessary sub-tocs and chapter numbering.
Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 2dce956 is_git_command() is a bit slow as it does file I/O in
the call to list_commands_in_dir(). Avoid the file I/O by adding an
early check for the builtin commands.
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Descriptions for all the settings fell under the initial "Each
submodule section also contains the following required keys:". The
example shows sections with just 'path' and 'url' entries, which are
indeed required, but we should still make the required/optional
distinction explicit to clarify that the rest of them are optional.
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Heiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enum names SHORT/MEDIUM/FULL were too broad to be descriptive. And
they clashed with built-in symbols on platforms like Windows.
Clarify by giving them REPLACE_FORMAT_ prefix.
Rename 'full' format in "git replace --format=<name>" to 'long', to
match others (i.e. 'short' and 'medium').
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we use pack bitmaps rather than walking the object
graph, we end up with the list of objects to include in the
packfile, but we do not know the path at which any tree or
blob objects would be found.
In a recently packed repository, this is fine. A fetch would
use the paths only as a heuristic in the delta compression
phase, and a fully packed repository should not need to do
much delta compression.
As time passes, though, we may acquire more objects on top
of our large bitmapped pack. If clients fetch frequently,
then they never even look at the bitmapped history, and all
works as usual. However, a client who has not fetched since
the last bitmap repack will have "have" tips in the
bitmapped history, but "want" newer objects.
The bitmaps themselves degrade gracefully in this
circumstance. We manually walk the more recent bits of
history, and then use bitmaps when we hit them.
But we would also like to perform delta compression between
the newer objects and the bitmapped objects (both to delta
against what we know the user already has, but also between
"new" and "old" objects that the user is fetching). The lack
of pathnames makes our delta heuristics much less effective.
This patch adds an optional cache of the 32-bit name_hash
values to the end of the bitmap file. If present, a reader
can use it to match bitmapped and non-bitmapped names during
delta compression.
Here are perf results for p5310:
Test origin/master HEAD^ HEAD
-------------------------------------------------------------------------------------------------
5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7%
5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5%
5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2%
5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9%
You can see that the time spent on an incremental fetch goes
down, as our delta heuristics are able to do their work.
And we save time on the partial bitmap clone for the same
reason.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since `pack-objects` will write a `.bitmap` file next to the `.pack` and
`.idx` files, this commit teaches `git-repack` to consider the new
bitmap indexes (if they exist) when performing repack operations.
This implies moving old bitmap indexes out of the way if we are
repacking a repository that already has them, and moving the newly
generated bitmap indexes into the `objects/pack` directory, next to
their corresponding packfiles.
Since `git repack` is now capable of handling these `.bitmap` files,
a normal `git gc` run on a repository that has `pack.writebitmaps` set
to true in its config file will generate bitmap indexes as part of the
garbage collection process.
Alternatively, `git repack` can be called with the `-b` switch to
explicitly generate bitmap indexes if you are experimenting
and don't want them on all the time.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.
If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:
1. `bitmap_writer_set_checksum`: this call stores the partial
checksum for the packfile being written; the checksum will be
written in the resulting bitmap index to verify its integrity
2. `bitmap_writer_build_type_index`: this call uses the array of
`struct object_entry` that has just been sorted when writing out
the actual packfile index to disk to generate 4 type-index bitmaps
(one for each object type).
These bitmaps have their nth bit set if the given object is of
the bitmap's type. E.g. the nth bit of the Commits bitmap will be
1 if the nth object in the packfile index is a commit.
This is a very cheap operation because the bitmap writing code has
access to the metadata stored in the `struct object_entry` array,
and hence the real type for each object in the packfile.
3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
index for one of the packfiles we're trying to repack, this call
will efficiently rebuild the existing bitmaps so they can be
reused on the new index. All the existing bitmaps will be stored
in a `reuse` hash table, and the commit selection phase will
prioritize these when selecting, as they can be written directly
to the new index without having to perform a revision walk to
fill the bitmap. This can greatly speed up the repack of a
repository that already has bitmaps.
4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
a given `pack-objects` run, the sequence of commits generated
during the Counting Objects phase will be stored in an array.
We then use that array to build up the list of selected commits.
Writing a bitmap in the index for each object in the repository
would be cost-prohibitive, so we use a simple heuristic to pick
the commits that will be indexed with bitmaps.
The current heuristics are a simplified version of JGit's
original implementation. We select a higher density of commits
depending on their age: the 100 most recent commits are always
selected, after that we pick 1 commit of each 100, and the gap
increases as the commits grow older. On top of that, we make sure
that every single branch that has not been merged (all the tips
that would be required from a clone) gets their own bitmap, and
when selecting commits between a gap, we tend to prioritize the
commit with the most parents.
Do note that there is no right/wrong way to perform commit
selection; different selection algorithms will result in
different commits being selected, but there's no such thing as
"missing a commit". The bitmap walker algorithm implemented in
`prepare_bitmap_walk` is able to adapt to missing bitmaps by
performing manual walks that complete the bitmap: the ideal
selection algorithm, however, would select the commits that are
more likely to be used as roots for a walk in the future (e.g.
the tips of each branch, and so on) to ensure a bitmap for them
is always available.
5. `bitmap_writer_build`: this is the computationally expensive part
of bitmap generation. Based on the list of commits that were
selected in the previous step, we perform several incremental
walks to generate the bitmap for each commit.
The walks begin from the oldest commit, and are built up
incrementally for each branch. E.g. consider this dag where A, B,
C, D, E, F are the selected commits, and a, b, c, e are a chunk
of simplified history that will not receive bitmaps.
A---a---B--b--C--c--D
\
E--e--F
We start by building the bitmap for A, using A as the root for a
revision walk and marking all the objects that are reachable
until the walk is over. Once this bitmap is stored, we reuse the
bitmap walker to perform the walk for B, assuming that once we
reach A again, the walk will be terminated because A has already
been SEEN on the previous walk.
This process is repeated for C, and D, but when we try to
generate the bitmaps for E, we can reuse neither the current walk
nor the bitmap we have generated so far.
What we do now is resetting both the walk and clearing the
bitmap, and performing the walk from scratch using E as the
origin. This new walk, however, does not need to be completed.
Once we hit B, we can lookup the bitmap we have already stored
for that commit and OR it with the existing bitmap we've composed
so far, allowing us to limit the walk early.
After all the bitmaps have been generated, another iteration
through the list of commits is performed to find the best XOR
offsets for compression before writing them to disk. Because of
the incremental nature of these bitmaps, XORing one of them with
its predecesor results in a minimal "bitmap delta" most of the
time. We can write this delta to the on-disk bitmap index, and
then re-compose the original bitmaps by XORing them again when
loaded.
This is a phase very similar to pack-object's `find_delta` (using
bitmaps instead of objects, of course), except the heuristics
have been greatly simplified: we only check the 10 bitmaps before
any given one to find best compressing one. This gives good
results in practice, because there is locality in the ordering of
the objects (and therefore bitmaps) in the packfile.
6. `bitmap_writer_finish`: the last step in the process is
serializing to disk all the bitmap data that has been generated
in the two previous steps.
The bitmap is written to a tmp file and then moved atomically to
its final destination, using the same process as
`pack-write.c:write_idx_file`.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bitmap reachability index used to speed up the counting objects
phase during `pack-objects` can also be used to optimize a normal
rev-list if the only thing required are the SHA1s of the objects during
the list (i.e., not the path names at which trees and blobs were found).
Calling `git rev-list --objects --use-bitmap-index [committish]` will
perform an object iteration based on a bitmap result instead of actually
walking the object graph.
These are some example timings for `torvalds/linux` (warm cache,
best-of-five):
$ time git rev-list --objects master > /dev/null
real 0m34.191s
user 0m33.904s
sys 0m0.268s
$ time git rev-list --objects --use-bitmap-index master > /dev/null
real 0m1.041s
user 0m0.976s
sys 0m0.064s
Likewise, using `git rev-list --count --use-bitmap-index` will speed up
the counting operation by building the resulting bitmap and performing a
fast popcount (number of bits set on the bitmap) on the result.
Here are some sample timings of different ways to count commits in
`torvalds/linux`:
$ time git rev-list master | wc -l
399882
real 0m6.524s
user 0m6.060s
sys 0m3.284s
$ time git rev-list --count master
399882
real 0m4.318s
user 0m4.236s
sys 0m0.076s
$ time git rev-list --use-bitmap-index --count master
399882
real 0m0.217s
user 0m0.176s
sys 0m0.040s
This also respects negative refs, so you can use it to count
a slice of history:
$ time git rev-list --count v3.0..master
144843
real 0m1.971s
user 0m1.932s
sys 0m0.036s
$ time git rev-list --use-bitmap-index --count v3.0..master
real 0m0.280s
user 0m0.220s
sys 0m0.056s
Though note that the closer the endpoints, the less it helps. In the
traversal case, we have fewer commits to cross, so we take less time.
But the bitmap time is dominated by generating the pack revindex, which
is constant with respect to the refs given.
Note that you cannot yet get a fast --left-right count of a symmetric
difference (e.g., "--count --left-right master...topic"). The slow part
of that walk actually happens during the merge-base determination when
we parse "master...topic". Even though a count does not actually need to
know the real merge base (it only needs to take the symmetric difference
of the bitmaps), the revision code would require some refactoring to
handle this case.
Additionally, a `--test-bitmap` flag has been added that will perform
the same rev-list manually (i.e. using a normal revwalk) and using
bitmaps, and verify that the results are the same. This can be used to
exercise the bitmap code, and also to verify that the contents of the
.bitmap file are sane.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the technical documentation for the JGit-compatible Bitmap v1
on-disk format.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Two packfiles that contain the same set of objects have
traditionally been named identically, but that made repacking a
repository that is already fully packed without any cruft with a
different packing parameter cumbersome. Update the convention to
name the packfile after the bytestream representation of the data,
not after the set of objects in it.
* jk/name-pack-after-byte-representation:
pack-objects doc: treat output filename as opaque
pack-objects: name pack files after trailer hash
sha1write: make buffer const-correct
Show the total number of paths and the number of paths shown so far
when "git difftool" prompts to launch an external diff tool, which
would give users some sense of progress.
* zk/difftool-counts:
diff.c: fix some recent whitespace style violations
difftool: display the number of files in the diff queue in the prompt
Make "git push origin master" update the same ref that would be
updated by our 'master' when "git push origin" (no refspecs) is run
while the 'master' branch is checked out, which makes "git push"
more symmetric to "git fetch" and more usable for the triangular
workflow.
* jc/push-refmap:
push: also use "upstream" mapping when pushing a single ref
push: use remote.$name.push as a refmap
builtin/push.c: use strbuf instead of manual allocation
It can be useful for debugging or analysis to see which
objects are stored as delta bases on top of others. This
information is available by running `git verify-pack`, but
that is extremely expensive (and is harder than necessary to
parse).
Instead, let's make it available as a cat-file query format,
which makes it fast and simple to get the bases for a subset
of the objects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff.orderfile acts as a default for the -O command line option.
[sb: split up aw's original patch; rework tests and docs, treat option
as pathname]
Signed-off-by: Anders Waldenborg <anders@0x63.nu>
Signed-off-by: Samuel Bronson <naesten@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The BFG is a tool specifically designed for the task of removing
unwanted data from Git repository history - a common use-case for which
git-filter-branch has been the traditional workhorse.
It's beneficial to let users know that filter-branch has an alternative
here:
* speed : The BFG is 10-50x faster
http://rtyley.github.io/bfg-repo-cleaner/#speed
* complexity of configuration : filter-branch is a very flexible tool,
but demands very careful usage in order to get the desired results
http://rtyley.github.io/bfg-repo-cleaner/#examples
Obviously, filter-branch has it's advantages too - it permits very
complex rewrites, and doesn't require a JVM - but for the common
use-case of deleting unwanted data, it's helpful to users to be aware
that an alternative exists.
The BFG was released under the GPL in February 2013, and has since seen
widespread production use (The Guardian, RedHat, Google, UK Government
Digital Service), been tested against large repos (~300K commits, ~5GB
packfiles) and received significant positive feedback from users:
http://rtyley.github.io/bfg-repo-cleaner/#feedback
Signed-off-by: Roberto Tyley <roberto.tyley@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow gitweb to be configured to show refs out of refs/heads/ as if
they were branches.
* kn/gitweb-extra-branch-refs:
gitweb: Denote non-heads, non-remotes branches
gitweb: Add a feature for adding more branch refs
gitweb: Return 1 on validation success instead of passed input
gitweb: Move check-ref-format code into separate function
After 1190a1a (pack-objects: name pack files after trailer hash,
2013-12-05), the SHA-1 used to determine the filename is calculated
differently. Update the documentation to not guarantee anything more
than that the SHA-1 depends on the pack content somehow.
Hopefully this will discourage readers from depending on the old or
the new calculation.
Reported-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow receive-pack to insist on receiving a fat pack from "git
push" clients.
* cn/thin-push-capability:
send-pack: don't send a thin pack to a server which doesn't support it