1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-11-01 06:47:52 +01:00
git/builtin
Nguyễn Thái Ngọc Duy c6458e60ed index-pack: kill union delta_base to save memory
Once we know the number of objects in the input pack, we allocate an
array of nr_objects of struct delta_entry. On x86-64, this struct is
32 bytes long. The union delta_base, which is part of struct
delta_entry, provides enough space to store either ofs-delta (8 bytes)
or ref-delta (20 bytes).

Because ofs-delta encoding is more efficient space-wise and more
performant at runtime than ref-delta encoding, Git packers try to use
ofs-delta whenever possible, and it is expected that objects encoded
as ref-delta are minority.

In the best clone case where no ref-delta object is present, we waste
(20-8) * nr_objects bytes because of this union. That's about 38MB out
of 100MB for deltas[] with 3.4M objects, or 38%. deltas[] would be
around 62MB without the waste.

This patch attempts to eliminate that. deltas[] array is split into
two: one for ofs-delta and one for ref-delta. Many functions are also
duplicated because of this split. With this patch, ofs_deltas[] array
takes 51MB. ref_deltas[] should remain unallocated in clone case (0
bytes). This array grows as we see ref-delta. We save about half in
this case, or 25% of total bookkeeping.

The saving is more than the calculation above because some padding in
the old delta_entry struct is removed. ofs_delta_entry is 16 bytes,
including the 4 bytes padding. That's 13MB for padding, but packing
the struct could break platforms that do not support unaligned
access. If someone on 32-bit is really low on memory and only deals
with packs smaller than 2G, using 32-bit off_t would eliminate the
padding and save 27MB on top.

A note about ofs_deltas allocation. We could use ref_deltas memory
allocation strategy for ofs_deltas. But that probably just adds more
overhead on top. ofs-deltas are generally the majority (1/2 to 2/3) in
any pack. Incremental realloc may lead to too many memcpy. And if we
preallocate, say 1/2 or 2/3 of nr_objects initially, the growth rate
of ALLOC_GROW() could make this array larger than nr_objects, wasting
more memory.

Brought-up-by: Matthew Sporleder <msporleder@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-18 17:48:32 -07:00
..
add.c add: ignore only ignored files 2014-11-21 10:19:14 -08:00
annotate.c annotate: use argv_array 2014-07-16 11:10:11 -07:00
apply.c Merge branch 'mh/simplify-repack-without-refs' 2014-12-22 12:26:50 -08:00
archive.c
bisect--helper.c
blame.c refs.c: change resolve_ref_unsafe reading argument to be a flags field 2014-10-15 10:47:24 -07:00
branch.c Merge branch 'mg/branch-d-m-f' 2014-12-22 12:27:36 -08:00
bundle.c
cat-file.c Merge branch 'ak/cat-file-clean-up' 2015-01-22 13:46:38 -08:00
check-attr.c Merge branch 'jc/check-attr-honor-working-tree' into maint 2014-03-18 14:03:03 -07:00
check-ignore.c
check-mailmap.c
check-ref-format.c
checkout-index.c Merge branch 'es/checkout-index-temp' 2015-01-12 11:38:28 -08:00
checkout.c Merge branch 'nd/ls-tree-pathspec' 2014-12-22 12:27:12 -08:00
clean.c clean: typofix 2014-12-22 09:57:42 -08:00
clone.c Merge branch 'jc/clone-borrow' 2015-01-07 12:42:13 -08:00
column.c
commit-tree.c commit-tree: simplify parsing of option -S using skip_prefix() 2014-12-29 09:32:45 -08:00
commit.c Merge branch 'jk/commit-date-approxidate' 2014-12-22 12:28:14 -08:00
config.c Merge branch 'jk/colors-fix' into maint 2014-12-22 12:16:58 -08:00
count-objects.c count-objects: use for_each_loose_file_in_objdir 2014-10-16 10:10:41 -07:00
credential.c
describe.c lockfile.h: extract new header file for the functions in lockfile.c 2014-10-01 13:56:14 -07:00
diff-files.c
diff-index.c
diff-tree.c diff-tree: avoid lookup_unknown_object 2014-07-28 10:14:34 -07:00
diff.c lockfile.h: extract new header file for the functions in lockfile.c 2014-10-01 13:56:14 -07:00
fast-export.c teach fast-export an --anonymize option 2014-08-27 10:42:16 -07:00
fetch-pack.c Merge branch 'nd/shallow-clone' 2014-01-17 12:21:20 -08:00
fetch.c Merge branch 'jk/fetch-reflog-df-conflict' 2014-11-06 10:52:32 -08:00
fmt-merge-msg.c Merge branch 'rs/use-strbuf-complete-line' 2014-12-22 12:28:22 -08:00
for-each-ref.c Merge branch 'rc/for-each-ref-tracking' 2015-01-14 12:39:02 -08:00
fsck.c refs.c: change resolve_ref_unsafe reading argument to be a flags field 2014-10-15 10:47:24 -07:00
gc.c lockfile.h: extract new header file for the functions in lockfile.c 2014-10-01 13:56:14 -07:00
get-tar-commit-id.c use skip_prefix() to avoid more magic numbers 2014-10-07 11:09:16 -07:00
grep.c make add_object_array_with_context interface more sane 2014-10-16 10:10:44 -07:00
hash-object.c hash-object: add --literally option 2014-09-11 14:23:51 -07:00
help.c Merge branch 'jc/exec-cmd-system-path-leak-fix' 2014-12-22 12:27:01 -08:00
index-pack.c index-pack: kill union delta_base to save memory 2015-04-18 17:48:32 -07:00
init-db.c Merge branch 'jc/exec-cmd-system-path-leak-fix' 2014-12-22 12:27:01 -08:00
interpret-trailers.c trailer: add interpret-trailers command 2014-10-13 13:55:27 -07:00
log.c Merge branch 'km/log-usage-string-i18n' 2015-01-14 12:32:39 -08:00
ls-files.c grammofix in user-facing messages 2014-09-02 12:00:30 -07:00
ls-remote.c builtin/ls-remote.c: rearrange xcalloc arguments 2014-05-27 14:00:43 -07:00
ls-tree.c ls-tree: disable negative pathspec because it's not supported 2014-12-01 11:33:45 -08:00
mailinfo.c git-mailinfo: add --message-id 2014-11-25 15:24:55 -08:00
mailsplit.c mailsplit: remove unnecessary unlink(2) call 2014-10-07 10:49:57 -07:00
merge-base.c get_merge_bases(): always clean-up object flags 2014-10-30 12:51:10 -07:00
merge-file.c
merge-index.c
merge-ours.c
merge-recursive.c
merge-tree.c merge-tree: remove unused df_conflict arguments 2014-09-02 11:02:58 -07:00
merge.c Merge branch 'rs/plug-strbuf-leak-in-merge' 2015-01-12 11:38:37 -08:00
mktag.c
mktree.c builtin/mktree.c: use ALLOC_GROW() in append_to_tree() 2014-03-03 14:54:45 -08:00
mv.c lockfile.h: extract new header file for the functions in lockfile.c 2014-10-01 13:56:14 -07:00
name-rev.c use xstrfmt to replace xmalloc + strcpy/strcat 2014-06-19 15:20:54 -07:00
notes.c builtin/notes: add --allow-empty, to allow storing empty notes 2014-11-12 11:00:11 -08:00
pack-objects.c pack-objects: use --objects-edge-aggressive for shallow repos 2014-12-29 09:58:25 -08:00
pack-redundant.c
pack-refs.c
patch-id.c patch-id: make it stable against hunk reordering 2014-06-10 13:09:24 -07:00
prune-packed.c prune-packed: use for_each_loose_file_in_objdir 2014-10-16 10:10:40 -07:00
prune.c prune: keep objects reachable from recent objects 2014-10-16 10:10:42 -07:00
push.c Merge branch 'jk/push-simple' into maint 2014-12-22 12:18:08 -08:00
read-tree.c lockfile.h: extract new header file for the functions in lockfile.c 2014-10-01 13:56:14 -07:00
receive-pack.c Merge branch 'js/push-to-deploy' 2014-12-22 12:27:04 -08:00
reflog.c Merge branch 'jk/prune-mtime' 2014-10-29 10:07:56 -07:00
remote-ext.c use skip_prefix() to avoid more magic numbers 2014-10-07 11:09:16 -07:00
remote-fd.c
remote.c git remote: allow adding remotes agreeing with url.<...>.insteadOf 2014-12-23 12:42:36 -08:00
repack.c Merge branch 'mh/simplify-repack-without-refs' 2014-12-22 12:26:50 -08:00
replace.c refs.c: pass the ref log message to _create/delete/update instead of _commit 2014-10-15 10:47:22 -07:00
rerere.c rerere: fix for merge.conflictstyle 2014-04-30 10:30:02 -07:00
reset.c lockfile.h: extract new header file for the functions in lockfile.c 2014-10-01 13:56:14 -07:00
rev-list.c commit: record buffer length in cache 2014-06-13 12:09:38 -07:00
rev-parse.c Merge branch 'jc/merge-bases' 2015-01-07 12:55:05 -08:00
revert.c parse-options: multi-word argh should use dash to separate words 2014-03-24 10:43:34 -07:00
rm.c lockfile.h: extract new header file for the functions in lockfile.c 2014-10-01 13:56:14 -07:00
send-pack.c Merge branch 'jc/push-cert' 2014-10-08 13:05:25 -07:00
shortlog.c
show-branch.c Merge branch 'ak/show-branch-usage-string' 2015-01-20 16:16:09 -08:00
show-ref.c
stripspace.c
symbolic-ref.c refs.c: change resolve_ref_unsafe reading argument to be a flags field 2014-10-15 10:47:24 -07:00
tag.c refs.c: pass the ref log message to _create/delete/update instead of _commit 2014-10-15 10:47:22 -07:00
unpack-file.c
unpack-objects.c index-pack: terminate object buffers with NUL 2014-12-09 11:56:37 -08:00
update-index.c lockfile.h: extract new header file for the functions in lockfile.c 2014-10-01 13:56:14 -07:00
update-ref.c update-ref: fix "verify" command with missing <oldvalue> 2014-12-11 11:56:53 -08:00
update-server-info.c
upload-archive.c
var.c
verify-commit.c verify-commit: scriptable commit signature verification 2014-06-23 15:50:31 -07:00
verify-pack.c run-command: introduce CHILD_PROCESS_INIT 2014-08-20 09:53:37 -07:00
verify-tag.c
write-tree.c