mirror of https://github.com/git/git.git synced 2024-11-08 02:03:12 +01:00
Jeff King 6859de45a9 fetch: avoid quadratic loop checking for updated submodules
Recent versions of git can be slow to fetch repositories with a
large number of refs (or when they already have a large
number of refs). For example, GitHub makes pull-requests
available as refs, which can lead to a large number of
available refs. This slowness goes away when submodule
recursion is turned off:

  $ git ls-remote git://github.com/rails/rails.git | wc -l
  3034

  [this takes ~10 seconds of CPU time to complete]
  git fetch --recurse-submodules=no \
    git://github.com/rails/rails.git "refs/*:refs/*"

  [this still isn't done after 10 _minutes_ of pegging the CPU]
  git fetch \
    git://github.com/rails/rails.git "refs/*:refs/*"

You can produce a quicker and simpler test case like this:

  doit() {
    head=$(git rev-parse HEAD)
    for i in $(seq 1 "$1"); do
      echo "$head refs/heads/ref$i"
    done >.git/packed-refs
    echo "==> $1"
    rm -rf dest
    git init -q --bare dest &&
      (cd dest && time git.compile fetch -q .. "refs/*:refs/*")
  }

  rm -rf repo
  git init -q repo && cd repo &&
  >file && git add file && git commit -q -m one

  doit 100
  doit 200
  doit 400
  doit 800
  doit 1600
  doit 3200

Which yields timings like:

  # refs  seconds of CPU
     100            0.06
     200            0.24
     400            0.95
     800            3.39
    1600           13.66
    3200           54.09

Notice that although the number of refs doubles in each
trial, the CPU time spent quadruples.
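
As a quick check of that claim, the sketch below divides each timing
from the table by the one before it. The helper name "ratios" is
hypothetical and only used for illustration; the figures are the ones
listed above.

```shell
# Print the ratio of each timing to the previous one.
# Arguments: CPU seconds for successive trials, in order.
ratios() {
  printf '%s\n' "$@" |
    awk 'NR > 1 { printf "%.2f\n", $1 / prev } { prev = $1 }'
}

# Timings for 100, 200, 400, 800, 1600 and 3200 refs.
ratios 0.06 0.24 0.95 3.39 13.66 54.09
```

Each ratio comes out close to 4 (between 3.57 and 4.03), which is what
quadratic growth predicts when the input doubles.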

The problem is that the submodule recursion code works
something like:

  - for each ref we fetch
    - for each commit in git rev-list $new_sha1 --not --all
      - add modified submodules to list
  - fetch any newly referenced submodules

But that means if we fetch N refs, we start N revision
walks. Worse, because we use "--all", the number of refs we
must process that constitute "--all" keeps growing, too. And
you end up doing O(N^2) ref resolutions.

Instead, this patch structures the code like this:

  - for each sha1 we already have
    - add $old_sha1 to list $old
  - for each ref we fetch
    - add $new_sha1 to list $new
  - for each commit in git rev-list $new --not $old
    - add modified submodules to list
  - fetch any newly referenced submodules
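
The difference between the two structures can be illustrated with a
toy counting model. This is a sketch under stated assumptions, not the
code in submodule.c: it assumes N refs are fetched into a repository
that starts with no refs, and that the i-th revision walk in the old
structure has to resolve the i - 1 refs stored before it. The function
names are hypothetical.

```shell
# Toy cost model (an assumption, not git's code): count ref resolutions
# when N refs are fetched into a repository that starts with no refs.

# Old structure: one revision walk per fetched ref; walk i resolves
# the i - 1 refs that "--all" already covers.
old_resolutions() {
  n=$1 total=0 i=1
  while [ "$i" -le "$n" ]; do
    total=$((total + i - 1))
    i=$((i + 1))
  done
  echo "$total"
}

# New structure: every fetched ref is added to a list once, followed
# by a single revision walk over the whole list.
new_resolutions() {
  echo "$1"
}

for n in 100 200 400; do
  echo "$n $(old_resolutions "$n") $(new_resolutions "$n")"
done
```

In this model, doubling N from 100 to 200 takes the old count from
4950 to 19900, roughly a factor of four, while the new count only
doubles.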

This yields timings like:

  # refs  seconds of CPU
     100            0.00
     200            0.04
     400            0.04
     800            0.10
    1600            0.21
    3200            0.39

Note that the amount of effort now roughly doubles as the number of
refs doubles. Similarly, the fetch of rails.git takes about as
much time as it does with --recurse-submodules=no.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-12 14:16:41 -07:00
block-sha1
builtin plug a few coverity-spotted leaks 2011-06-20 14:27:36 -07:00
compat compat/fnmatch/fnmatch.c: give a fall-back definition for NULL 2011-05-26 09:25:47 -07:00
contrib Merge branch 'maint' 2011-06-26 12:09:11 -07:00
Documentation Git 1.7.6 2011-06-26 12:41:16 -07:00
git-gui Merge git-gui 0.14.0 2011-03-26 10:42:35 -07:00
git_remote_helpers
gitk-git Merge git://git.kernel.org/pub/scm/gitk/gitk 2011-04-11 09:33:06 -07:00
gitweb Merge branch 'maint' 2011-06-21 14:56:59 -07:00
perl perl: command_bidi_pipe() method should set-up git environmens 2011-02-14 15:28:13 -08:00
po i18n: Makefile: "pot" target to extract messages marked for translation 2011-03-09 23:52:52 -08:00
ppc
t Merge branch 'mk/grep-pcre' 2011-06-20 14:49:44 -07:00
templates
vcs-svn Merge branch 'rj/sparse' 2011-04-27 11:36:42 -07:00
xdiff
.gitattributes
.gitignore Merge branch 'jn/gitweb-js' 2011-05-26 10:31:57 -07:00
.mailmap
abspath.c Name make_*_path functions more accurately 2011-03-17 16:08:30 -07:00
aclocal.m4 configure: use AC_LANG_PROGRAM consistently 2011-02-14 10:55:15 -08:00
advice.c
advice.h
alias.c
alloc.c unbreak and eliminate NO_C99_FORMAT 2011-03-17 15:30:49 -07:00
archive-tar.c
archive-zip.c
archive.c Convert read_tree{,_recursive} to support struct pathspec 2011-03-25 09:20:33 -07:00
archive.h
attr.c sparse: Fix some "symbol not declared" warnings 2011-04-22 10:04:27 -07:00
attr.h
base85.c
bisect.c bisect: refactor sha1_array into a generic sha1 list 2011-05-19 20:02:10 -07:00
bisect.h
blob.c
blob.h
branch.c Merge branch 'jh/maint-do-not-track-non-branches' 2011-03-15 14:22:13 -07:00
branch.h
builtin.h repo-config: add deprecation warning 2011-02-13 15:13:41 -08:00
bundle.c bundle: Use OFS_DELTA in bundle files 2011-02-06 22:50:26 -08:00
bundle.h
cache-tree.c
cache-tree.h
cache.h Merge branch 'jk/maint-config-alias-fix' into maint 2011-06-01 14:05:22 -07:00
check-builtins.sh
check-racy.c
check_bindir
color.c Share color list between graph and show-branch 2011-04-04 23:20:39 -07:00
color.h Share color list between graph and show-branch 2011-04-04 23:20:39 -07:00
combine-diff.c
command-list.txt
commit.c Add const to parse_{commit,tag}_buffer() 2011-02-07 15:04:42 -08:00
commit.h Merge branch 'jk/format-patch-am' 2011-05-31 12:19:11 -07:00
config.c Merge branch 'jk/maint-config-alias-fix' into maint 2011-06-01 14:05:22 -07:00
config.mak.in Merge branch 'kk/maint-prefix-in-config-mak' into maint 2011-06-01 14:02:39 -07:00
configure.ac configure: Check for libpcre 2011-05-09 16:29:46 -07:00
connect.c Merge branch 'jk/git-connection-deadlock-fix' into maint-1.7.4 2011-05-26 10:28:10 -07:00
convert.c convert: make it harder to screw up adding a conversion attribute 2011-05-09 14:59:09 -07:00
copy.c
COPYING
csum-file.c sparse: Fix errors and silence warnings 2011-04-03 10:14:53 -07:00
csum-file.h
ctype.c magic pathspec: futureproof shorthand form 2011-04-08 16:19:48 -07:00
daemon.c Fix sparse warnings 2011-03-22 10:16:54 -07:00
date.c date: avoid "X years, 12 months" in relative dates 2011-04-20 19:23:16 -07:00
decorate.c
decorate.h
delta.h
diff-delta.c
diff-lib.c Merge branch 'jk/diff-not-so-quick' 2011-06-06 11:40:14 -07:00
diff-no-index.c Convert struct diff_options to use struct pathspec 2011-02-03 12:28:15 -08:00
diff.c Merge branch 'jk/diff-not-so-quick' 2011-06-06 11:40:14 -07:00
diff.h Merge branch 'jk/diff-not-so-quick' 2011-06-06 11:40:14 -07:00
diffcore-break.c
diffcore-delta.c
diffcore-order.c
diffcore-pickaxe.c
diffcore-rename.c diffcore-rename.c: avoid set-but-not-used warning 2011-06-01 13:54:17 -07:00
diffcore.h
dir.c Merge branch 'nd/struct-pathspec' 2011-05-06 10:50:06 -07:00
dir.h Merge branch 'nd/maint-setup' 2011-05-02 15:58:30 -07:00
editor.c
entry.c
environment.c Merge branch 'jc/replacing' 2011-05-19 20:37:21 -07:00
exec_cmd.c Name make_*_path functions more accurately 2011-03-17 16:08:30 -07:00
exec_cmd.h
fast-import.c fast-import: fix option parser for no-arg options 2011-05-05 21:21:24 -07:00
fetch-pack.h standardize brace placement in struct definitions 2011-03-16 12:49:02 -07:00
fixup-builtins
fsck.c Merge branch 'jm/maint-misc-fix' into maint 2011-05-30 00:09:41 -07:00
fsck.h
generate-cmdlist.sh standardize brace placement in struct definitions 2011-03-16 12:49:02 -07:00
gettext.c i18n: do not poison translations unless GIT_GETTEXT_POISON envvar is set 2011-03-08 12:10:03 -08:00
gettext.h i18n: avoid parenthesized string as array initializer 2011-04-11 10:33:51 -07:00
git-add--interactive.perl add -i: ignore terminal escape sequences 2011-05-17 20:44:17 -07:00
git-am.sh
git-archimport.perl
git-bisect.sh bisect: visualize with git-log if gitk is unavailable 2011-03-21 10:23:45 -07:00
git-compat-util.h Merge branch 'jc/magic-pathspec' 2011-05-23 09:58:35 -07:00
git-cvsexportcommit.perl
git-cvsimport.perl Merge branch 'gr/cvsimport-alternative-cvspass-location' into maint 2011-05-13 10:44:54 -07:00
git-cvsserver.perl
git-difftool--helper.sh
git-difftool.perl
git-filter-branch.sh
git-instaweb.sh git-instaweb: Change how gitweb.psgi is made runnable as standalone app 2011-02-27 22:02:31 -08:00
git-lost-found.sh
git-merge-octopus.sh
git-merge-one-file.sh Merge branch 'jk/merge-one-file-working-tree' into maint 2011-05-13 10:44:19 -07:00
git-merge-resolve.sh
git-mergetool--lib.sh Pass empty file to p4merge where no base is suitable. 2011-05-01 15:56:05 -07:00
git-mergetool.sh mergetool: Teach about submodules 2011-04-13 12:21:45 -07:00
git-parse-remote.sh Merge branch 'mz/rebase' 2011-04-28 14:11:39 -07:00
git-pull.sh Merge branch 'mz/rebase' 2011-04-28 14:11:39 -07:00
git-quiltimport.sh
git-rebase--am.sh git-rebase--am: remove unnecessary --3way option 2011-02-10 14:08:10 -08:00
git-rebase--interactive.sh rebase: write a reflog entry when finishing 2011-05-27 15:52:03 -07:00
git-rebase--merge.sh rebase -m: don't print exit code 2 when merge fails 2011-02-10 14:08:09 -08:00
git-rebase.sh rebase: write a reflog entry when finishing 2011-05-27 15:52:03 -07:00
git-relink.perl
git-remote-testgit.py
git-repack.sh
git-request-pull.sh git-request-pull: open-code the only invocation of get_remote_url 2011-03-02 12:26:58 -08:00
git-send-email.perl git-send-email: fix missing space in error message 2011-04-29 11:34:32 -07:00
git-sh-i18n.sh git-sh-i18n.sh: add GIT_GETTEXT_POISON support 2011-05-14 20:29:11 -07:00
git-sh-setup.sh require-work-tree wants more than what its name says 2011-05-24 11:34:40 -07:00
git-stash.sh Merge branch 'jk/maint-stash-oob' into maint 2011-05-04 14:58:42 -07:00
git-submodule.sh Merge branch 'maint' 2011-05-30 00:09:55 -07:00
git-svn.perl Merge branch 'maint' 2011-05-20 18:50:29 -07:00
GIT-VERSION-GEN Git 1.7.6 2011-06-26 12:41:16 -07:00
git-web--browse.sh
git.c Merge branch 'jk/maint-config-alias-fix' into maint 2011-06-01 14:05:22 -07:00
git.spec.in
graph.c Share color list between graph and show-branch 2011-04-04 23:20:39 -07:00
graph.h
grep.c git-grep: Learn PCRE 2011-05-09 16:29:33 -07:00
grep.h git-grep: Learn PCRE 2011-05-09 16:29:33 -07:00
hash.c for_each_hash: allow passing a 'void *data' pointer to callback 2011-02-18 22:25:51 -08:00
hash.h for_each_hash: allow passing a 'void *data' pointer to callback 2011-02-18 22:25:51 -08:00
help.c
help.h
hex.c
http-backend.c
http-fetch.c Fix two unused variable warnings in gcc 4.6 2011-04-03 10:59:40 -07:00
http-push.c http-push: refactor curl_easy_setup madness 2011-05-04 13:30:28 -07:00
http-walker.c http: make curl callbacks match contracts from curl header 2011-05-04 13:30:28 -07:00
http.c Merge branch 'sp/maint-clear-postfields' into maint 2011-05-04 14:58:56 -07:00
http.h http: make curl callbacks match contracts from curl header 2011-05-04 13:30:28 -07:00
ident.c Merge branch 'rg/no-gecos-in-pwent' 2011-05-26 10:32:19 -07:00
imap-send.c sparse: Fix some "Using plain integer as NULL pointer" warnings 2011-04-11 10:35:25 -07:00
INSTALL
levenshtein.c
levenshtein.h
LGPL-2.1 provide a copy of the LGPLv2.1 2011-05-19 18:23:17 -07:00
list-objects.c Merge branch 'nd/struct-pathspec' 2011-05-06 10:50:06 -07:00
list-objects.h
ll-merge.c ll-merge: simplify opts == NULL case 2011-01-15 20:34:14 -08:00
ll-merge.h
lockfile.c Name make_*_path functions more accurately 2011-03-17 16:08:30 -07:00
log-tree.c Merge branch 'jk/format-patch-am' 2011-05-31 12:19:11 -07:00
log-tree.h
mailmap.c
mailmap.h
Makefile Merge branch 'mk/grep-pcre' 2011-05-30 00:00:07 -07:00
match-trees.c
merge-file.c sparse: Fix an "symbol 'merge_file' not decared" warning 2011-04-11 10:35:25 -07:00
merge-file.h sparse: Fix an "symbol 'merge_file' not decared" warning 2011-04-11 10:35:25 -07:00
merge-recursive.c Merge branch 'jc/rename-degrade-cc-to-c' into maint 2011-05-31 12:00:02 -07:00
merge-recursive.h Merge branch 'jk/merge-rename-ux' 2011-03-19 23:23:56 -07:00
name-hash.c
notes-cache.c
notes-cache.h
notes-merge.c index_fd(): turn write_object and format_check arguments into one flag 2011-05-09 11:58:19 -07:00
notes-merge.h
notes.c notes: refactor display notes default handling 2011-03-29 14:31:59 -07:00
notes.h notes: refactor display notes default handling 2011-03-29 14:31:59 -07:00
object.c read_sha1_file(): get rid of read_sha1_file_repl() madness 2011-05-15 15:23:33 -07:00
object.h object.h: Remove obsolete struct object_refs 2011-03-14 10:49:28 -07:00
pack-check.c sparse: Fix errors and silence warnings 2011-04-03 10:14:53 -07:00
pack-refs.c
pack-refs.h
pack-revindex.c
pack-revindex.h
pack-write.c
pack.h
pager.c
parse-options.c Fix sparse warnings 2011-03-22 10:16:54 -07:00
parse-options.h Make <identifier> lowercase as per CodingGuidelines 2011-02-15 11:53:09 -08:00
patch-delta.c compat: helper for detecting unsigned overflow 2011-02-10 13:47:56 -08:00
patch-ids.c
patch-ids.h
path.c Name make_*_path functions more accurately 2011-03-17 16:08:30 -07:00
pkt-line.c sparse: Fix errors and silence warnings 2011-04-03 10:14:53 -07:00
pkt-line.h
preload-index.c Convert ce_path_match() to use struct pathspec 2011-02-03 14:08:30 -08:00
pretty.c Merge branch 'jk/format-patch-am' 2011-05-31 12:19:11 -07:00
progress.c
progress.h
quote.c
quote.h quote.h: simplify the inclusion 2011-02-07 15:15:17 -08:00
reachable.c Remove unused variables 2011-03-22 11:43:27 -07:00
reachable.h
read-cache.c index_fd(): turn write_object and format_check arguments into one flag 2011-05-09 11:58:19 -07:00
README
reflog-walk.c
reflog-walk.h
refs.c Fix typo: existant->existent 2011-06-16 10:33:50 -07:00
refs.h
RelNotes Start 1.7.5.4 draft release notes 2011-05-31 12:06:40 -07:00
remote-curl.c plug a few coverity-spotted leaks 2011-06-20 14:27:36 -07:00
remote.c
remote.h
replace_object.c inline lookup_replace_object() calls 2011-05-15 15:23:33 -07:00
rerere.c Merge branch 'maint' 2011-05-30 00:09:55 -07:00
rerere.h rerere: libify rerere_clear() and rerere_gc() 2011-05-08 12:55:34 -07:00
resolve-undo.c
resolve-undo.h
revision.c Merge branch 'jc/notes-batch-removal' 2011-05-29 23:51:26 -07:00
revision.h Merge branch 'jk/format-patch-am' 2011-05-31 12:19:11 -07:00
run-command.c run-command: handle short writes and EINTR in die_child 2011-04-20 10:09:26 -07:00
run-command.h
send-pack.h
server-info.c
setup.c Merge branch 'maint' 2011-05-30 00:09:55 -07:00
sh-i18n--envsubst.c Merge branch 'ab/i18n-scripts-basic' 2011-06-17 11:40:32 -07:00
sha1-array.c receive-pack: eliminate duplicate .have refs 2011-05-19 20:02:31 -07:00
sha1-array.h receive-pack: eliminate duplicate .have refs 2011-05-19 20:02:31 -07:00
sha1-lookup.c
sha1-lookup.h
sha1_file.c Merge branch 'jc/bigfile' 2011-05-25 16:23:26 -07:00
sha1_name.c Merge branch 'jc/magic-pathspec' 2011-05-23 09:58:35 -07:00
shallow.c
shell.c shell: add missing initialization of argv0_path 2011-05-05 09:32:28 -07:00
shortlog.h
show-index.c
sideband.c
sideband.h
sigchain.c
sigchain.h
strbuf.c Merge branch 'ef/maint-strbuf-init' 2011-04-27 11:36:43 -07:00
strbuf.h strbuf: clarify assertion in strbuf_setlen() 2011-04-27 10:52:15 -07:00
string-list.c string_list_append: always set util pointer to NULL 2011-02-14 10:55:03 -08:00
string-list.h standardize brace placement in struct definitions 2011-03-16 12:49:02 -07:00
submodule.c fetch: avoid quadratic loop checking for updated submodules 2011-09-12 14:16:41 -07:00
submodule.h fetch/pull: Add the 'on-demand' value to the --recurse-submodules option 2011-03-09 13:10:35 -08:00
symlinks.c do not overwrite untracked symlinks 2011-02-21 22:51:07 -08:00
tag.c parse_tag_buffer(): do not prefixcmp() out of range 2011-02-16 10:05:14 -08:00
tag.h Add const to parse_{commit,tag}_buffer() 2011-02-07 15:04:42 -08:00
tar.h
test-chmtime.c
test-ctype.c
test-date.c
test-delta.c
test-dump-cache-tree.c
test-genrandom.c
test-index-version.c
test-line-buffer.c vcs-svn: remove buffer_read_string 2011-03-26 00:17:35 -05:00
test-match-trees.c
test-mktemp.c
test-obj-pool.c
test-parse-options.c Make <identifier> lowercase as per CodingGuidelines 2011-02-15 11:53:11 -08:00
test-path-utils.c Name make_*_path functions more accurately 2011-03-17 16:08:30 -07:00
test-run-command.c tests: check error message from run_command 2011-04-20 10:08:54 -07:00
test-sha1.c
test-sha1.sh
test-sigchain.c
test-string-pool.c
test-subprocess.c Remove unused variables 2011-03-22 11:43:27 -07:00
test-svn-fe.c
test-treap.c
thread-utils.c Fix sparse warnings 2011-03-22 10:16:54 -07:00
thread-utils.h
trace.c Fix sparse warnings 2011-03-22 10:16:54 -07:00
transport-helper.c Remove unused variables 2011-03-22 11:43:27 -07:00
transport.c Merge branch 'maint' 2011-05-30 00:09:55 -07:00
transport.h refactor refs_from_alternate_cb to allow passing extra data 2011-05-19 20:01:10 -07:00
tree-diff.c Merge branch 'jk/diff-not-so-quick' 2011-06-06 11:40:14 -07:00
tree-walk.c pathspec: rename per-item field has_wildcard to use_wildcard 2011-04-05 09:30:36 -07:00
tree-walk.h grep: drop pathspec_matches() in favor of tree_entry_interesting() 2011-02-03 14:08:31 -08:00
tree.c Convert read_tree{,_recursive} to support struct pathspec 2011-03-25 09:20:33 -07:00
tree.h Convert read_tree{,_recursive} to support struct pathspec 2011-03-25 09:20:33 -07:00
unimplemented.sh
unpack-trees.c unpack-trees: add the dry_run flag to unpack_trees_options 2011-05-25 14:32:02 -07:00
unpack-trees.h unpack-trees: add the dry_run flag to unpack_trees_options 2011-05-25 14:32:02 -07:00
upload-pack.c Merge branch 'jk/maint-upload-pack-shallow' into maint 2011-05-04 14:58:13 -07:00
url.c Fix sparse warnings 2011-03-22 10:16:54 -07:00
url.h
usage.c Fix sparse warnings 2011-03-22 10:16:54 -07:00
userdiff.c userdiff/perl: tighten BEGIN/END block pattern to reject here-doc delimiters 2011-05-23 11:39:13 -07:00
userdiff.h
utf8.c strbuf: add fixed-length version of add_wrapped_text 2011-02-23 13:44:36 -08:00
utf8.h strbuf: add fixed-length version of add_wrapped_text 2011-02-23 13:44:36 -08:00
walker.c
walker.h
wrap-for-bin.sh
wrapper.c read_in_full: always report errors 2011-05-26 13:54:18 -07:00
write_or_die.c
ws.c
wt-status.c Merge branch 'ab/i18n-st' 2011-04-01 17:55:55 -07:00
wt-status.h Merge branch 'jn/status-translatable' 2011-03-19 23:24:19 -07:00
xdiff-interface.c add, merge, diff: do not use strcasecmp to compare config variable names 2011-05-14 18:53:39 -07:00
xdiff-interface.h
zlib.c

////////////////////////////////////////////////////////////////

	GIT - the stupid content tracker

////////////////////////////////////////////////////////////////

"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command.  The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.

Git is an Open Source project covered by the GNU General Public License.
It was originally written by Linus Torvalds with the help of a group of
hackers around the net. It is currently maintained by Junio C Hamano.

Please read the file INSTALL for installation instructions.

See Documentation/gittutorial.txt to get started, then see
Documentation/everyday.txt for a useful minimum set of commands, and
Documentation/git-commandname.txt for documentation of each command.
If git has been correctly installed, then the tutorial can also be
read with "man gittutorial" or "git help tutorial", and the
documentation of each command with "man git-commandname" or "git help
commandname".

CVS users may also want to read Documentation/gitcvs-migration.txt
("man gitcvs-migration" or "git help cvs-migration" if git is
installed).

Many Git online resources are accessible from http://git-scm.com/,
including full documentation and Git-related tools.

The user discussion and development of Git take place on the Git
mailing list -- everyone is welcome to post bug reports, feature
requests, comments and patches to git@vger.kernel.org. To subscribe
to the list, send an email with just "subscribe git" in the body to
majordomo@vger.kernel.org. The mailing list archives are available at
http://marc.theaimsgroup.com/?l=git and other archival sites.

The messages titled "A note from the maintainer", "What's in
git.git (stable)" and "What's cooking in git.git (topics)" and
the discussion following them on the mailing list give a good
reference for project status, development direction and
remaining tasks.