1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-11-15 21:53:44 +01:00
Commit graph

36115 commits

Author SHA1 Message Date
Jeff King
212f2ffbf0 t: add basic bitmap functionality tests
Now that we can read and write bitmaps, we can exercise them
with some basic functionality tests. These tests aren't
particularly useful for seeing the benefit, as the test
repo is too small for it to make a difference. However, we
can at least check that using bitmaps does not break anything.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:23 -08:00
Nguyễn Thái Ngọc Duy
d3d3e4c490 count-objects: recognize .bitmap in garbage-checking
Count-objects will report any "garbage" files in the packs
directory, including files whose extensions it does not
know (case 1), and files whose matching ".pack" file is
missing (case 2).  Without having learned about ".bitmap"
files, the current code reports all such files as garbage
(case 1), even if their pack exists. Instead, they should be
treated as case 2.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:23 -08:00
Vicent Marti
5cf2741c5a repack: consider bitmaps when performing repacks
Since `pack-objects` will write a `.bitmap` file next to the `.pack` and
`.idx` files, this commit teaches `git-repack` to consider the new
bitmap indexes (if they exist) when performing repack operations.

This implies moving old bitmap indexes out of the way if we are
repacking a repository that already has them, and moving the newly
generated bitmap indexes into the `objects/pack` directory, next to
their corresponding packfiles.

Since `git repack` is now capable of handling these `.bitmap` files,
a normal `git gc` run on a repository that has `pack.writebitmaps` set
to true in its config file will generate bitmap indexes as part of the
garbage collection process.

Alternatively, `git repack` can be called with the `-b` switch to
explicitly generate bitmap indexes if you are experimenting
and don't want them on all the time.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:23 -08:00
Jeff King
b77fcd1edc repack: handle optional files created by pack-objects
We ask pack-objects to pack to a set of temporary files, and
then rename them into place. Some files that pack-objects
creates may be optional (like a .bitmap file), in which case
we would not want to call rename(). We already call stat()
and make the chmod optional if the file cannot be accessed.
We could simply skip the rename step in this case, but that
would be a minor regression in noticing problems with
non-optional files (like the .pack and .idx files).

Instead, we can now annotate extensions as optional, and
skip them if they don't exist (and otherwise rely on
rename() to barf).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:23 -08:00
Jeff King
42a02d8529 repack: turn exts array into array-of-struct
This is slightly more verbose, but will let us annotate the
extensions with further options in future commits.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:23 -08:00
Jeff King
b328c2166e repack: stop using magic number for ARRAY_SIZE(exts)
We have a static array of extensions, but hardcode the size
of the array in our loops. Let's pull out this magic number,
which will make it easier to change.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:23 -08:00
Vicent Marti
7cc8f97108 pack-objects: implement bitmap writing
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.

If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.

Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:

    1. `bitmap_writer_set_checksum`: this call stores the partial
       checksum for the packfile being written; the checksum will be
       written in the resulting bitmap index to verify its integrity

    2. `bitmap_writer_build_type_index`: this call uses the array of
       `struct object_entry` that has just been sorted when writing out
       the actual packfile index to disk to generate 4 type-index bitmaps
       (one for each object type).

       These bitmaps have their nth bit set if the given object is of
       the bitmap's type. E.g. the nth bit of the Commits bitmap will be
       1 if the nth object in the packfile index is a commit.

       This is a very cheap operation because the bitmap writing code has
       access to the metadata stored in the `struct object_entry` array,
       and hence the real type for each object in the packfile.

    3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
       index for one of the packfiles we're trying to repack, this call
       will efficiently rebuild the existing bitmaps so they can be
       reused on the new index. All the existing bitmaps will be stored
       in a `reuse` hash table, and the commit selection phase will
       prioritize these when selecting, as they can be written directly
       to the new index without having to perform a revision walk to
       fill the bitmap. This can greatly speed up the repack of a
       repository that already has bitmaps.

    4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
       a given `pack-objects` run, the sequence of commits generated
       during the Counting Objects phase will be stored in an array.

       We then use that array to build up the list of selected commits.
       Writing a bitmap in the index for each object in the repository
       would be cost-prohibitive, so we use a simple heuristic to pick
       the commits that will be indexed with bitmaps.

       The current heuristics are a simplified version of JGit's
       original implementation. We select a higher density of commits
       depending on their age: the 100 most recent commits are always
       selected, after that we pick 1 commit of each 100, and the gap
       increases as the commits grow older. On top of that, we make sure
       that every single branch that has not been merged (all the tips
       that would be required from a clone) gets their own bitmap, and
       when selecting commits between a gap, we tend to prioritize the
       commit with the most parents.

       Do note that there is no right/wrong way to perform commit
       selection; different selection algorithms will result in
       different commits being selected, but there's no such thing as
       "missing a commit". The bitmap walker algorithm implemented in
       `prepare_bitmap_walk` is able to adapt to missing bitmaps by
       performing manual walks that complete the bitmap: the ideal
       selection algorithm, however, would select the commits that are
       more likely to be used as roots for a walk in the future (e.g.
       the tips of each branch, and so on) to ensure a bitmap for them
       is always available.

    5. `bitmap_writer_build`: this is the computationally expensive part
       of bitmap generation. Based on the list of commits that were
       selected in the previous step, we perform several incremental
       walks to generate the bitmap for each commit.

       The walks begin from the oldest commit, and are built up
       incrementally for each branch. E.g. consider this dag where A, B,
       C, D, E, F are the selected commits, and a, b, c, e are a chunk
       of simplified history that will not receive bitmaps.

            A---a---B--b--C--c--D
                     \
                      E--e--F

       We start by building the bitmap for A, using A as the root for a
       revision walk and marking all the objects that are reachable
       until the walk is over. Once this bitmap is stored, we reuse the
       bitmap walker to perform the walk for B, assuming that once we
       reach A again, the walk will be terminated because A has already
       been SEEN on the previous walk.

       This process is repeated for C, and D, but when we try to
       generate the bitmaps for E, we can reuse neither the current walk
       nor the bitmap we have generated so far.

       What we do now is resetting both the walk and clearing the
       bitmap, and performing the walk from scratch using E as the
       origin. This new walk, however, does not need to be completed.
       Once we hit B, we can lookup the bitmap we have already stored
       for that commit and OR it with the existing bitmap we've composed
       so far, allowing us to limit the walk early.

       After all the bitmaps have been generated, another iteration
       through the list of commits is performed to find the best XOR
       offsets for compression before writing them to disk. Because of
       the incremental nature of these bitmaps, XORing one of them with
       its predecesor results in a minimal "bitmap delta" most of the
       time. We can write this delta to the on-disk bitmap index, and
       then re-compose the original bitmaps by XORing them again when
       loaded.

       This is a phase very similar to pack-object's `find_delta` (using
       bitmaps instead of objects, of course), except the heuristics
       have been greatly simplified: we only check the 10 bitmaps before
       any given one to find best compressing one. This gives good
       results in practice, because there is locality in the ordering of
       the objects (and therefore bitmaps) in the packfile.

     6. `bitmap_writer_finish`: the last step in the process is
	serializing to disk all the bitmap data that has been generated
	in the two previous steps.

	The bitmap is written to a tmp file and then moved atomically to
	its final destination, using the same process as
	`pack-write.c:write_idx_file`.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:22 -08:00
Vicent Marti
aa32939fea rev-list: add bitmap mode to speed up object lists
The bitmap reachability index used to speed up the counting objects
phase during `pack-objects` can also be used to optimize a normal
rev-list if the only thing required are the SHA1s of the objects during
the list (i.e., not the path names at which trees and blobs were found).

Calling `git rev-list --objects --use-bitmap-index [committish]` will
perform an object iteration based on a bitmap result instead of actually
walking the object graph.

These are some example timings for `torvalds/linux` (warm cache,
best-of-five):

    $ time git rev-list --objects master > /dev/null

    real    0m34.191s
    user    0m33.904s
    sys     0m0.268s

    $ time git rev-list --objects --use-bitmap-index master > /dev/null

    real    0m1.041s
    user    0m0.976s
    sys     0m0.064s

Likewise, using `git rev-list --count --use-bitmap-index` will speed up
the counting operation by building the resulting bitmap and performing a
fast popcount (number of bits set on the bitmap) on the result.

Here are some sample timings of different ways to count commits in
`torvalds/linux`:

    $ time git rev-list master | wc -l
        399882

        real    0m6.524s
        user    0m6.060s
        sys     0m3.284s

    $ time git rev-list --count master
        399882

        real    0m4.318s
        user    0m4.236s
        sys     0m0.076s

    $ time git rev-list --use-bitmap-index --count master
        399882

        real    0m0.217s
        user    0m0.176s
        sys     0m0.040s

This also respects negative refs, so you can use it to count
a slice of history:

        $ time git rev-list --count v3.0..master
        144843

        real    0m1.971s
        user    0m1.932s
        sys     0m0.036s

        $ time git rev-list --use-bitmap-index --count v3.0..master
        real    0m0.280s
        user    0m0.220s
        sys     0m0.056s

Though note that the closer the endpoints, the less it helps. In the
traversal case, we have fewer commits to cross, so we take less time.
But the bitmap time is dominated by generating the pack revindex, which
is constant with respect to the refs given.

Note that you cannot yet get a fast --left-right count of a symmetric
difference (e.g., "--count --left-right master...topic"). The slow part
of that walk actually happens during the merge-base determination when
we parse "master...topic". Even though a count does not actually need to
know the real merge base (it only needs to take the symmetric difference
of the bitmaps), the revision code would require some refactoring to
handle this case.

Additionally, a `--test-bitmap` flag has been added that will perform
the same rev-list manually (i.e. using a normal revwalk) and using
bitmaps, and verify that the results are the same. This can be used to
exercise the bitmap code, and also to verify that the contents of the
.bitmap file are sane.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:22 -08:00
Vicent Marti
6b8fda2db1 pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).

For bitmaps to be used, the following must be true:

  1. We must be packing to stdout (as a normal `pack-objects` from
     `upload-pack` would do).

  2. There must be a .bitmap index containing at least one of the
     "have" objects that the client is asking for.

  3. Bitmaps must be enabled (they are enabled by default, but can be
     disabled by setting `pack.usebitmaps` to false, or by using
     `--no-use-bitmap-index` on the command-line).

If any of these is not true, we fall back to doing a normal walk of the
object graph.

Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:

    [existing graph traversal]
    $ time git pack-objects --all --stdout --no-use-bitmap-index \
			    </dev/null >/dev/null
    Counting objects: 3237103, done.
    Compressing objects: 100% (508752/508752), done.
    Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

    real    0m44.111s
    user    0m42.396s
    sys     0m3.544s

    [bitmaps only, without partial pack reuse; note that
     pack reuse is automatic, so timing this required a
     patch to disable it]
    $ time git pack-objects --all --stdout </dev/null >/dev/null
    Counting objects: 3237103, done.
    Compressing objects: 100% (508752/508752), done.
    Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

    real    0m5.413s
    user    0m5.604s
    sys     0m1.804s

    [bitmaps with pack reuse (what you get with this patch)]
    $ time git pack-objects --all --stdout </dev/null >/dev/null
    Reusing existing pack: 3237103, done.
    Total 3237103 (delta 0), reused 0 (delta 0)

    real    0m1.636s
    user    0m1.460s
    sys     0m0.172s

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:22 -08:00
Jeff King
ce2bc42456 pack-objects: split add_object_entry
This function actually does three things:

  1. Check whether we've already added the object to our
     packing list.

  2. Check whether the object meets our criteria for adding.

  3. Actually add the object to our packing list.

It's a little hard to see these three phases, because they
happen linearly in the rather long function. Instead, this
patch breaks them up into three separate helper functions.

The result is a little easier to follow, though it
unfortunately suffers from some optimization
interdependencies between the stages (e.g., during step 3 we
use the packing list index from step 1 and the packfile
information from step 2).

More importantly, though, the various parts can be
composed differently, as they will be in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:22 -08:00
Vicent Marti
fff42755ef pack-bitmap: add support for bitmap indexes
A bitmap index is a `.bitmap` file that can be found inside
`$GIT_DIR/objects/pack/`, next to its corresponding packfile, and
contains precalculated reachability information for selected commits.
The full specification of the format for these bitmap indexes can be found
in `Documentation/technical/bitmap-format.txt`.

For a given commit SHA1, if it happens to be available in the bitmap
index, its bitmap will represent every single object that is reachable
from the commit itself. The nth bit in the bitmap is the nth object in
the packfile; if it's set to 1, the object is reachable.

By using the bitmaps available in the index, this commit implements
several new functions:

	- `prepare_bitmap_git`
	- `prepare_bitmap_walk`
	- `traverse_bitmap_commit_list`
	- `reuse_partial_packfile_from_bitmap`

The `prepare_bitmap_walk` function tries to build a bitmap of all the
objects that can be reached from the commit roots of a given `rev_info`
struct by using the following algorithm:

- If all the interesting commits for a revision walk are available in
the index, the resulting reachability bitmap is the bitwise OR of all
the individual bitmaps.

- When the full set of WANTs is not available in the index, we perform a
partial revision walk using the commits that don't have bitmaps as
roots, and limiting the revision walk as soon as we reach a commit that
has a corresponding bitmap. The earlier OR'ed bitmap with all the
indexed commits can now be completed as this walk progresses, so the end
result is the full reachability list.

- For revision walks with a HAVEs set (a set of commits that are deemed
uninteresting), first we perform the same method as for the WANTs, but
using our HAVEs as roots, in order to obtain a full reachability bitmap
of all the uninteresting commits. This bitmap then can be used to:

	a) limit the subsequent walk when building the WANTs bitmap
	b) finding the final set of interesting commits by performing an
	   AND-NOT of the WANTs and the HAVEs.

If `prepare_bitmap_walk` runs successfully, the resulting bitmap is
stored and the equivalent of a `traverse_commit_list` call can be
performed by using `traverse_bitmap_commit_list`; the bitmap version
of this call yields the objects straight from the packfile index
(without having to look them up or parse them) and hence is several
orders of magnitude faster.

As an extra optimization, when `prepare_bitmap_walk` succeeds, the
`reuse_partial_packfile_from_bitmap` call can be attempted: it will find
the amount of objects at the beginning of the on-disk packfile that can
be reused as-is, and return an offset into the packfile. The source
packfile can then be loaded and the bytes up to `offset` can be written
directly to the result without having to consider the entires inside the
packfile individually.

If the `prepare_bitmap_walk` call fails (e.g. because no bitmap files
are available), the `rev_info` struct is left untouched, and can be used
to perform a manual rev-walk using `traverse_commit_list`.

Hence, this new set of functions are a generic API that allows to
perform the equivalent of

	git rev-list --objects [roots...] [^uninteresting...]

for any set of commits, even if they don't have specific bitmaps
generated for them.

In further patches, we'll use this bitmap traversal optimization to
speed up the `pack-objects` and `rev-list` commands.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:22 -08:00
Vicent Marti
0d4455a3ab documentation: add documentation for the bitmap format
This is the technical documentation for the JGit-compatible Bitmap v1
on-disk format.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:22 -08:00
Vicent Marti
e1273106f6 ewah: compressed bitmap implementation
EWAH is a word-aligned compressed variant of a bitset (i.e. a data
structure that acts as a 0-indexed boolean array for many entries).

It uses a 64-bit run-length encoding (RLE) compression scheme,
trading some compression for better processing speed.

The goal of this word-aligned implementation is not to achieve
the best compression, but rather to improve query processing time.
As it stands right now, this EWAH implementation will always be more
efficient storage-wise than its uncompressed alternative.

EWAH arrays will be used as the on-disk format to store reachability
bitmaps for all objects in a repository while keeping reasonable sizes,
in the same way that JGit does.

This EWAH implementation is a mostly straightforward port of the
original `javaewah` library that JGit currently uses. The library is
self-contained and has been embedded whole (4 files) inside the `ewah`
folder to ease redistribution.

The library is re-licensed under the GPLv2 with the permission of Daniel
Lemire, the original author. The source code for the C version can
be found on GitHub:

	https://github.com/vmg/libewok

The original Java implementation can also be found on GitHub:

	https://github.com/lemire/javaewah

[jc: stripped debug-only code per Peff's $gmane/239768]

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:17:20 -08:00
Junio C Hamano
8f29299136 merge-base --octopus: reduce the result from get_octopus_merge_bases()
Scripts that use "merge-base --octopus" could do the reducing
themselves, but most of them are expected to want to get the reduced
results without having to do any work themselves.

Tests are taken from a message by Василий Макаров
<einmalfel@gmail.com>

Signed-off-by: Junio C Hamano <gitster@pobox.com>

---

 We might want to vet the existing callers of the underlying
 get_octopus_merge_bases() and find out if _all_ of them are doing
 anything extra (like deduping) because the machinery can return
 duplicate results. And if that is the case, then we may want to
 move the dedupling down the callchain instead of having it here.
2013-12-30 11:58:54 -08:00
Junio C Hamano
e2f5df4244 merge-base: separate "--independent" codepath into its own helper
It piggybacks on an unrelated handle_octopus() function only because
there are some similarities between the way they need to preprocess
their input and output their result.  There is nothing similar in
the true logic between these two operations.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 11:37:49 -08:00
Johannes Schindelin
e228c1736f Remove the line length limit for graft files
Support for grafts predates Git's strbuf, and hence it is understandable
that there was a hard-coded line length limit of 1023 characters (which
was chosen a bit awkwardly, given that it is *exactly* one byte short of
aligning with the 41 bytes occupied by a commit name and the following
space or new-line character).

While regular commit histories hardly win comprehensibility in general
if they merge more than twenty-two branches in one go, it is not Git's
business to limit grafts in such a way.

In this particular developer's case, the use case that requires
substantially longer graft lines to be supported is the visualization of
the commits' order implied by their changes: commits are considered to
have an implicit relationship iff exchanging them in an interactive
rebase would result in merge conflicts.

Thusly implied branches tend to be very shallow in general, and the
resulting thicket of implied branches is usually very wide; It is
actually quite common that *most* of the commits in a topic branch have
not even one implied parent, so that a final merge commit has about as
many implied parents as there are commits in said branch.

[jc: squashed in tests by Jonathan]

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-27 16:46:25 -08:00
Junio C Hamano
53f3478d42 Merge git://git.bogomips.org/git-svn
* git://git.bogomips.org/git-svn:
  git-svn: workaround for a bug in svn serf backend
2013-12-27 14:58:35 -08:00
Junio C Hamano
36ec9e2173 Merge branch 'fc/remote-helper-fixes'
* fc/remote-helper-fixes:
  remote-hg: test 'shared_path' in a moved clone
  remote-hg: add tests for special filenames
  remote-hg: fix 'shared path' path
  remote-helpers: add extra safety checks
  remote-hg: avoid buggy strftime()
2013-12-27 14:58:25 -08:00
Junio C Hamano
97663a1e97 Merge branch 'js/gnome-keyring'
Style fix.

* js/gnome-keyring:
  contrib/git-credential-gnome-keyring.c: small stylistic cleanups
2013-12-27 14:58:23 -08:00
Junio C Hamano
53b2a5ff30 Merge branch 'jk/name-pack-after-byte-representation'
Two packfiles that contain the same set of objects have
traditionally been named identically, but that made repacking a
repository that is already fully packed without any cruft with a
different packing parameter cumbersome. Update the convention to
name the packfile after the bytestream representation of the data,
not after the set of objects in it.

* jk/name-pack-after-byte-representation:
  pack-objects doc: treat output filename as opaque
  pack-objects: name pack files after trailer hash
  sha1write: make buffer const-correct
2013-12-27 14:58:19 -08:00
Junio C Hamano
73b063130b Merge branch 'tg/diff-no-index-refactor'
"git diff ../else/where/A ../else/where/B" when ../else/where is
clearly outside the repository, and "git diff --no-index A B", do
not have to look at the index at all, but we used to read the index
unconditionally.

* tg/diff-no-index-refactor:
  diff: avoid some nesting
  diff: add test for --no-index executed outside repo
  diff: don't read index when --no-index is given
  diff: move no-index detection to builtin/diff.c
2013-12-27 14:58:17 -08:00
Junio C Hamano
6904f9aa5b Merge branch 'zk/difftool-counts'
Show the total number of paths and the number of paths shown so far
when "git difftool" prompts to launch an external diff tool, which
would give users some sense of progress.

* zk/difftool-counts:
  diff.c: fix some recent whitespace style violations
  difftool: display the number of files in the diff queue in the prompt
2013-12-27 14:58:13 -08:00
Junio C Hamano
604ada435b Merge branch 'jk/cat-file-regression-fix'
"git cat-file --batch=", an admittedly useless command, did not
behave very well.

* jk/cat-file-regression-fix:
  cat-file: handle --batch format with missing type/size
  cat-file: pass expand_data to print_object_or_die
2013-12-27 14:58:11 -08:00
Junio C Hamano
2b0a564e02 Merge branch 'jk/pull-rebase-using-fork-point'
* jk/pull-rebase-using-fork-point:
  rebase: use reflog to find common base with upstream
  pull: use merge-base --fork-point when appropriate
2013-12-27 14:58:08 -08:00
Junio C Hamano
e9ecee0423 Merge branch 'jk/rev-parse-double-dashes'
"git rev-parse <revs> -- <paths>" did not implement the usual
disambiguation rules the commands in the "git log" family used in
the same way.

* jk/rev-parse-double-dashes:
  rev-parse: be more careful with munging arguments
  rev-parse: correctly diagnose revision errors before "--"
2013-12-27 14:58:01 -08:00
Junio C Hamano
7cdebd8a20 Merge branch 'jc/push-refmap'
Make "git push origin master" update the same ref that would be
updated by our 'master' when "git push origin" (no refspecs) is run
while the 'master' branch is checked out, which makes "git push"
more symmetric to "git fetch" and more usable for the triangular
workflow.

* jc/push-refmap:
  push: also use "upstream" mapping when pushing a single ref
  push: use remote.$name.push as a refmap
  builtin/push.c: use strbuf instead of manual allocation
2013-12-27 14:57:50 -08:00
Roman Kagan
2394e94e83 git-svn: workaround for a bug in svn serf backend
Subversion serf backend in versions 1.8.5 and below has a bug(*) that the
function creating the descriptor of a file change -- add_file() --
doesn't make a copy of its third argument when storing it on the
returned descriptor.  As a result, by the time this field is used (in
transactions of file copying or renaming) it may well be released, and
the memory reused.

One of its possible manifestations is the svn assertion triggering on an
invalid path, with a message

svn_fspath__skip_ancestor: Assertion
`svn_fspath__is_canonical(child_fspath)' failed.

This patch works around this bug, by storing the value to be passed as
the third argument to add_file() in a local variable with the same scope
as the file change descriptor, making sure their lifetime is the same.

* [ew: fixed in Subversion r1553376 as noted by Jonathan Nieder]

Cc: Benjamin Pabst <benjamin.pabst85@gmail.com>
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Roman Kagan <rkagan@mail.ru>
2013-12-27 20:22:19 +00:00
Nguyễn Thái Ngọc Duy
8785b7654b commit.c: make "tree" a const pointer in commit_tree*()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-26 11:55:18 -08:00
Jeff King
65ea9c3c3d cat-file: provide %(deltabase) batch format
It can be useful for debugging or analysis to see which
objects are stored as delta bases on top of others. This
information is available by running `git verify-pack`, but
that is extremely expensive (and is harder than necessary to
parse).

Instead, let's make it available as a cat-file query format,
which makes it fast and simple to get the bases for a subset
of the objects.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-26 11:54:26 -08:00
Jeff King
5d642e7506 sha1_object_info_extended: provide delta base sha1s
A caller of sha1_object_info_extended technically has enough
information to determine the base sha1 from the results of
the call. It knows the pack, offset, and delta type of the
object, which is sufficient to find the base.

However, the functions to do so are not publicly available,
and the code itself is intimate enough with the pack details
that it should be abstracted away. We could add a public
helper to allow callers to query the delta base separately,
but it is simpler and slightly more efficient to optionally
grab it along with the rest of the object_info data.

For cases where the object is not stored as a delta, we
write the null sha1 into the query field. A careful caller
could check "oi.whence == OI_PACKED && oi.u.packed.is_delta"
before looking at the base sha1, but using the null sha1
provides a simple alternative (and gives a better sanity
check for a non-careful caller than simply returning random
bytes).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-26 11:53:32 -08:00
Jeff King
9af270e8c2 do not pretend sha1write returns errors
The sha1write function returns an int, but it will always be
"0". The failure-prone parts of the function happen in the
"flush" callback, which cannot pass an error back to us. So
we just end up calling die() during the flush.

Let's just drop the return value altogether, as it only
confuses callers into thinking that it might be useful.

Only one call site actually checked the return value. We can
drop that check, since it just led to a die() anyway.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-26 11:50:20 -08:00
Nguyễn Thái Ngọc Duy
64ed07cee0 add: don't complain when adding empty project root
This behavior was added in 07d7bed (add: don't complain when adding
empty project root - 2009-04-28) then broken by 84b8b5d (remove
match_pathspec() in favor of match_pathspec_depth() -
2013-07-14). Reinstate it.

Noticed-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-26 10:46:26 -08:00
Antoine Pelisse
1f7feb7753 remote-hg: test 'shared_path' in a moved clone
Since e71d1378 (remote-hg: fix 'shared path' path, 2013-12-07),
Mercurial 'shared_path' file is correctly updated whenever a clone is
moved. Make sure it keeps working, especially as this is depending on a
private Mercurial file.

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-26 10:43:56 -08:00
brian m. carlson
5e1361ccdb log: properly handle decorations with chained tags
git log did not correctly handle decorations when a tag object referenced
another tag object that was no longer a ref, such as when the second tag was
deleted.  The commit would not be decorated correctly because parse_object had
not been called on the second tag and therefore its tagged field had not been
filled in, resulting in none of the tags being associated with the relevant
commit.

Call parse_object to fill in this field if it is absent so that the chain of
tags can be dereferenced and the commit can be properly decorated.  Include
tests as well to prevent future regressions.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-20 14:37:03 -08:00
Nguyễn Thái Ngọc Duy
82246b765b daemon: be strict at parsing parameters --[no-]informative-errors
Use strcmp() instead of starts_with()/!prefixcmp() to stop accepting
--informative-errors-just-a-little

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-20 14:05:07 -08:00
Samuel Bronson
6d8940b562 diff: add diff.orderfile configuration variable
diff.orderfile acts as a default for the -O command line option.

[sb: split up aw's original patch; rework tests and docs, treat option
as pathname]

Signed-off-by: Anders Waldenborg <anders@0x63.nu>
Signed-off-by: Samuel Bronson <naesten@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-18 16:39:00 -08:00
Samuel Bronson
a21bae33d9 diff: let "git diff -O" read orderfile from any file and fail properly
The -O flag really shouldn't silently fail to do anything when given
a path that it can't read from.

However, it should be able to read from un-mmappable files, such as:

 * pipes/fifos

 * /dev/null:  It's a character device (at least on Linux)

 * ANY empty file:

   Quoting Linux mmap(2), "SUSv3 specifies that mmap() should fail if
   length is 0.  However, in kernels before 2.6.12, mmap() succeeded in
   this case: no mapping was created and the call returned addr.  Since
   kernel 2.6.12, mmap() fails with the error EINVAL for this case."

We especially want "-O/dev/null" to work, since we will be documenting
it as the way to cancel "diff.orderfile" when we add that.

(Note: "-O/dev/null" did have the right effect, since the existing error
handling essentially worked out to "silently ignore the orderfile".  But
this was probably more coincidence than anything else.)

So, lets toss all of that logic to get the file mmapped and just use
strbuf_read_file() instead, which gives us decent error handling
practically for free.

Signed-off-by: Samuel Bronson <naesten@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-18 16:29:05 -08:00
Samuel Bronson
b527773092 t4056: add new tests for "git diff -O"
Adapted from $gmane/236427 by Anders Waldenborg, "diff: Add
diff.orderfile configuration variable".

Signed-off-by: Anders Waldenborg <anders@0x63.nu>
Signed-off-by: Samuel Bronson <naesten@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-18 16:26:12 -08:00
Jeff King
4454e9cb59 builtin/prune.c: use strbuf to avoid having to worry about PATH_MAX
While at it, rename prune_tmp_object(), which used to be a helper to
remove temporary files that were created to become loose object
files, to prune_tmp_file(), as the function is also used to remove
any random cruft whose name begins with tmp_ directly in .git/object
or .git/object/pack directories these days.

Noticed-by:  Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-18 15:53:56 -08:00
Junio C Hamano
491a8dec44 get_max_fd_limit(): fall back to OPEN_MAX upon getrlimit/sysconf failure
On broken systems where RLIMIT_NOFILE is visible by the compliers
but underlying getrlimit() system call does not behave, we used to
simply die() when we are trying to decide how many file descriptors
to allocate for keeping packfiles open.  Instead, allow the fallback
codepath to take over when we get such a failure from getrlimit().

The same issue exists with _SC_OPEN_MAX and sysconf(); restructure
the code in a similar way to prepare for a broken sysconf() as well.

Noticed-by: Joey Hess <joey@kitenet.net>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-18 14:59:43 -08:00
Roberto Tyley
615b8f1a8d docs: add filter-branch notes on The BFG
The BFG is a tool specifically designed for the task of removing
unwanted data from Git repository history - a common use-case for which
git-filter-branch has been the traditional workhorse.

It's beneficial to let users know that filter-branch has an alternative
here:

* speed : The BFG is 10-50x faster
  http://rtyley.github.io/bfg-repo-cleaner/#speed
* complexity of configuration : filter-branch is a very flexible tool,
  but demands very careful usage in order to get the desired results
  http://rtyley.github.io/bfg-repo-cleaner/#examples

Obviously, filter-branch has it's advantages too - it permits very
complex rewrites, and doesn't require a JVM - but for the common
use-case of deleting unwanted data, it's helpful to users to be aware
that an alternative exists.

The BFG was released under the GPL in February 2013, and has since seen
widespread production use (The Guardian, RedHat, Google, UK Government
Digital Service), been tested against large repos (~300K commits, ~5GB
packfiles) and received significant positive feedback from users:

http://rtyley.github.io/bfg-repo-cleaner/#feedback

Signed-off-by: Roberto Tyley <roberto.tyley@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-18 10:41:41 -08:00
Junio C Hamano
7794a680e6 Sync with 1.8.5.2
* maint:
  Git 1.8.5.2
  cmd_repack(): remove redundant local variable "nr_packs"
2013-12-17 14:12:17 -08:00
Junio C Hamano
b10cd577d8 Update draft release notes to 1.9
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-17 14:05:50 -08:00
Junio C Hamano
173473c287 Merge branch 'kn/gitweb-extra-branch-refs'
Allow gitweb to be configured to show refs out of refs/heads/ as if
they were branches.

* kn/gitweb-extra-branch-refs:
  gitweb: Denote non-heads, non-remotes branches
  gitweb: Add a feature for adding more branch refs
  gitweb: Return 1 on validation success instead of passed input
  gitweb: Move check-ref-format code into separate function
2013-12-17 12:03:33 -08:00
Junio C Hamano
1945e8ac85 Merge branch 'tb/clone-ssh-with-colon-for-port'
Be more careful when parsing remote repository URL given in the
scp-style host:path notation.

* tb/clone-ssh-with-colon-for-port:
  git_connect(): use common return point
  connect.c: refactor url parsing
  git_connect(): refactor the port handling for ssh
  git fetch: support host:/~repo
  t5500: add test cases for diag-url
  git fetch-pack: add --diag-url
  git_connect: factor out discovery of the protocol and its parts
  git_connect: remove artificial limit of a remote command
  t5601: add tests for ssh
  t5601: remove clear_ssh, refactor setup_ssh_wrapper
2013-12-17 12:03:32 -08:00
Junio C Hamano
88cb2f96ac Merge branch 'nd/transport-positive-depth-only'
"git fetch --depth=0" was a no-op, and was silently
ignored. Diagnose it as an error.

* nd/transport-positive-depth-only:
  clone,fetch: catch non positive --depth option value
2013-12-17 12:03:29 -08:00
Junio C Hamano
ad70448576 Merge branch 'cc/starts-n-ends-with'
Remove a few duplicate implementations of prefix/suffix comparison
functions, and rename them to starts_with and ends_with.

* cc/starts-n-ends-with:
  replace {pre,suf}fixcmp() with {starts,ends}_with()
  strbuf: introduce starts_with() and ends_with()
  builtin/remote: remove postfixcmp() and use suffixcmp() instead
  environment: normalize use of prefixcmp() by removing " != 0"
2013-12-17 12:02:44 -08:00
Junio C Hamano
14a9c5f261 Merge branch 'jl/commit-v-strip-marker'
"git commit -v" appends the patch to the log message before
editing, and then removes the patch when the editor returned
control. However, the patch was not stripped correctly when the
first modified path was a submodule.

* jl/commit-v-strip-marker:
  commit -v: strip diffs and submodule shortlogs from the commit message
2013-12-17 11:47:18 -08:00
Junio C Hamano
433a30d0ba Merge branch 'tr/send-email-ssl'
SSL-related options were not passed correctly to underlying socket
layer in "git send-email".

* tr/send-email-ssl:
  send-email: set SSL options through IO::Socket::SSL::set_client_defaults
  send-email: --smtp-ssl-cert-path takes an argument
  send-email: pass Debug to Net::SMTP::SSL::new
2013-12-17 11:47:12 -08:00
Junio C Hamano
7dc8a65c86 Merge branch 'nd/gettext-vsnprintf'
* nd/gettext-vsnprintf:
  gettext.c: detect the vsnprintf bug at runtime
2013-12-17 11:47:10 -08:00