1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-11-14 21:23:03 +01:00
Commit graph

58 commits

Author SHA1 Message Date
Linus Torvalds
4c068a9831 tree_entry(): new tree-walking helper function
This adds a "tree_entry()" function that combines the common operation of
doing a "tree_entry_extract()" + "update_tree_entry()".

It also has a simplified calling convention, designed for simple loops
that traverse over a whole tree: the arguments are pointers to the tree
descriptor and a name_entry structure to fill in, and it returns a boolean
"true" if there was an entry left to be gotten in the tree.

This allows tree traversal with

	struct tree_desc desc;
	struct name_entry entry;

	desc.buf = tree->buffer;
	desc.size = tree->size;
	while (tree_entry(&desc, &entry) {
		... use "entry.{path, sha1, mode, pathlen}" ...
	}

which is not only shorter than writing it out in full, it's hopefully less
error prone too.

[ It's actually a tad faster too - we don't need to recalculate the entry
  pathlength in both extract and update, but need to do it only once.
  Also, some callers can avoid doing a "strlen()" on the result, since
  it's returned as part of the name_entry structure.

  However, by now we're talking just 1% speedup on "git-rev-list --objects
  --all", and we're definitely at the point where tree walking is no
  longer the issue any more. ]

NOTE! Not everybody wants to use this new helper function, since some of
the tree walkers very much on purpose do the descriptor update separately
from the entry extraction. So the "extract + update" sequence still
remains as the core sequence, this is just a simplified interface.

We should probably add a silly two-line inline helper function for
initializing the descriptor from the "struct tree" too, just to cut down
on the noise from that common "desc" initializer.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30 23:03:01 -07:00
Linus Torvalds
2d9c58c69d Remove "tree->entries" tree-entry list from tree parser
Instead, just use the tree buffer directly, and use the tree-walk
infrastructure to walk the buffers instead of the tree-entry list.

The tree-entry list is inefficient, and generates tons of small
allocations for no good reason. The tree-walk infrastructure is
generally no harder to use than following a linked list, and allows
us to do most tree parsing in-place.

Some programs still use the old tree-entry lists, and are a bit
painful to convert without major surgery. For them we have a helper
function that creates a temporary tree-entry list on demand.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:06:59 -07:00
Linus Torvalds
3a7c352bd0 Make "tree_entry" have a SHA1 instead of a union of object pointers
This is preparatory work for further cleanups, where we try to make
tree_entry look more like the more efficient tree-walk descriptor.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:05:06 -07:00
Linus Torvalds
136f2e548a Make "struct tree" contain the pointer to the tree buffer
This allows us to avoid allocating information for names etc, because
we can just use the information from the tree buffer directly.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:05:02 -07:00
Linus Torvalds
91b452cba9 Fix memory leak in "git rev-list --objects"
Martin Langhoff points out that "git repack -a" ends up using up a lot of
memory for big archives, and that git cvsimport probably should do only
incremental repacks in order to avoid having repacking flush all the
caches.

The big majority of the memory usage of repacking is from git rev-list
tracking all objects, and this patch should go a long way in avoiding the
excessive memory usage: the bulk of it was due to the object names being
leaked from the tree parser.

For the historic Linux kernel archive, this simple patch does:

Before:
	/usr/bin/time git-rev-list --all --objects > /dev/null

	72.45user 0.82system 1:13.55elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+125376minor)pagefaults 0swaps

After:
	/usr/bin/time git-rev-list --all --objects > /dev/null

	75.22user 0.48system 1:16.34elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+43921minor)pagefaults 0swaps

where we do end up wasting a bit of time on some extra strdup()s (which
could be avoided, but that would require tracking where the pathnames came
from), but we avoid a lot of memory usage.

Minor page faults track maximum RSS very closely (each page fault maps in
one page into memory), so the reduction from 125376 page faults to 43921
means a rough reduction of VM footprint from almost half a gigabyte to
about a third of that. Those numbers were also double-checked by looking
at "top" while the process was running.

(Side note: at least part of the remaining VM footprint is the mapping of
the 177MB pack-file, so the remaining memory use is at least partly "well
behaved" from a project caching perspective).

For the current git archive itself, the memory usage for a "--all
--objects" rev-list invocation dropped from 7128 pages to 2318 (27MB to
9MB), so the reduction seems to hold for much smaller projects too.

For regular "git-rev-list" usage (ie without the "--objects" flag) this
patch has no impact.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-28 13:27:51 -07:00
Johannes Schindelin
698ce6f87e fmt-patch: Support --attach
This patch touches a couple of files, because it adds options to print a
custom text just after the subject of a commit, and just after the
diffstat.

[jc: made "many dashes" used as the boundary leader into a single
 variable, to reduce the possibility of later tweaks to miscount the
 number of dashes to break it.]

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-21 02:03:09 -07:00
Junio C Hamano
328b710d80 Merge branch 'master' into js/fmt-patch
* master: (119 commits)
  diff family: add --check option
  Document that "git add" only adds non-ignored files.
  Add a conversion tool to migrate remote information into the config
  fetch, pull: ask config for remote information
  Fix build procedure for builtin-init-db
  read-tree -m -u: do not overwrite or remove untracked working tree files.
  apply --cached: do not check newly added file in the working tree
  Implement a --dry-run option to git-quiltimport
  Implement git-quiltimport
  Revert "builtin-grep: workaround for non GNU grep."
  builtin-grep: workaround for non GNU grep.
  builtin-grep: workaround for non GNU grep.
  git-am: use apply --cached
  apply --cached: apply a patch without using working tree.
  apply --numstat: show new name, not old name.
  Documentation/Makefile: create tarballs for the man pages and html files
  Allow pickaxe and diff-filter options to be used by git log.
  Libify the index refresh logic
  Builtin git-init-db
  Remove unnecessary local in get_ref_sha1.
  ...
2006-05-21 01:34:54 -07:00
Linus Torvalds
5fb61b8dcf Make "git rev-list" be a builtin
This was surprisingly easy. The diff is truly minimal: rename "main()" to
"cmd_rev_list()" in rev-list.c, and rename the whole file to reflect its
new built-in status.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-18 15:46:11 -07:00
Renamed from rev-list.c (Browse further)